
Is there a maximum size of a u-form?


Strictly speaking, no. In practice, the maximum size of a u-form is limited by two considerations.

First, large individual attribute values are unwieldy. The standard repository interface provides fairly limited and not particularly well-optimized techniques for operating on a value without loading the entire value into RAM. Therefore, if an attribute value is larger than the RAM on a given machine, it is impractical to use on that machine. In practice, the real limit can be considerably lower: we have seen serious performance problems with attribute values larger than 100 megabytes, even on fairly powerful machines (circa 2004). Future repository implementations may mitigate this, and the practical limit will always depend on machine performance.
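To illustrate what operating on a value without loading it entirely into RAM can look like, here is a minimal sketch that computes a checksum over an attribute value in fixed-size chunks. It assumes, hypothetically, that the repository can expose a value as a readable binary stream; `digest_value_streaming` and `CHUNK_SIZE` are illustrative names, not part of the repository interface, and SHA-256 is an arbitrary choice of digest.

```python
import hashlib
import io
from typing import BinaryIO

CHUNK_SIZE = 1 << 20  # 1 MiB per read; hypothetical tuning value


def digest_value_streaming(stream: BinaryIO, chunk_size: int = CHUNK_SIZE) -> str:
    """Compute a SHA-256 digest of an attribute value by reading it in chunks.

    Peak memory is bounded by chunk_size rather than by the total size of the
    value. Assumes (hypothetically) the value is available as a binary stream.
    """
    h = hashlib.sha256()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # end of stream
            break
        h.update(chunk)
    return h.hexdigest()


# Usage: an in-memory stream stands in for a large stored value.
value = io.BytesIO(b"x" * 5_000_000)
print(digest_value_streaming(value))
```

Only operations that can be expressed incrementally (checksums, copies, sequential scans) fit this pattern; anything that needs random access to the whole value still runs into the RAM limit described above.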

Second, the first time a u-form is shepherded between venues, all its attribute-value pairs must be moved. Subsequent shepherd operations must still evaluate checksums on each attribute. So a u-form with a very large number of attributes is expensive to shepherd given current shepherd technology.
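The cost model can be made concrete with a small sketch. It assumes, hypothetically, that a u-form is a mapping from attribute names to byte values and that a shepherd compares per-attribute checksums to decide what to copy; `shepherd` and `checksum` are illustrative names, not the actual shepherd implementation. A real shepherd would likely store checksums at each venue rather than recompute them.

```python
import hashlib
from typing import Dict, List

UForm = Dict[str, bytes]  # hypothetical model: attribute name -> value


def checksum(value: bytes) -> str:
    return hashlib.sha256(value).hexdigest()


def shepherd(source: UForm, dest: UForm) -> List[str]:
    """Copy missing or changed attributes from source to dest.

    Returns the names of attributes actually transferred. Every attribute's
    checksum is evaluated on every call, so the cost grows with the number
    of attributes even when nothing has changed.
    """
    transferred = []
    for name, value in source.items():
        existing = dest.get(name)
        if existing is None or checksum(existing) != checksum(value):
            dest[name] = value
            transferred.append(name)
    return transferred


src = {"title": b"Report", "body": b"draft"}
dst: UForm = {}
print(shepherd(src, dst))  # first shepherd: every attribute is moved
print(shepherd(src, dst))  # nothing moved, but every checksum was still computed
```

The first call moves `title` and `body`; the second moves nothing, yet still touches every attribute, which is why a u-form with very many attributes stays expensive to shepherd.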

[Note from Pete: The same logic that we use to argue against "Whole u-form access" to repositories can be applied to suggest that "whole u-form shepherding" is not adequate. It may be that we need to support per-attribute shepherding. At minimum, it is fairly clear to me that one can never assume that all the attributes of a u-form are necessarily available at a given venue.]

Early versions of the repository stored UUIDs in a B-tree, but stored attribute-value pairs as a simple list keyed off the UUID. This required a linear scan to fetch a single attribute, meaning that large u-forms incurred a long lookup cost even if not all attributes were required. Subsequent versions of the repository store UUID-attribute pairs as keys in the B-tree, so that attribute lookup is inexpensive regardless of the number of attributes on a given u-form.
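The difference between the two layouts can be sketched as follows. This is a minimal model, not the repository's code: a sorted list searched with `bisect` stands in for the B-tree, giving the same key ordering and O(log n) lookups (though, unlike a real B-tree, inserts here cost O(n)). `lookup_linear` and `CompositeKeyIndex` are hypothetical names.

```python
import bisect
from typing import Dict, List, Optional, Tuple


def lookup_linear(store: Dict[str, List[Tuple[str, bytes]]],
                  uuid: str, attr: str) -> Optional[bytes]:
    """Early layout: UUID -> flat list of (attribute, value) pairs.
    Cost is linear in the number of attributes on the u-form."""
    for name, value in store.get(uuid, []):
        if name == attr:
            return value
    return None


class CompositeKeyIndex:
    """Later layout: (uuid, attribute) pairs as keys in one ordered index."""

    def __init__(self) -> None:
        self._keys: List[Tuple[str, str]] = []
        self._values: List[bytes] = []

    def put(self, uuid: str, attr: str, value: bytes) -> None:
        key = (uuid, attr)
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._values[i] = value  # overwrite existing attribute
        else:
            self._keys.insert(i, key)
            self._values.insert(i, value)

    def get(self, uuid: str, attr: str) -> Optional[bytes]:
        """Binary search on the composite key: O(log n) regardless of
        how many attributes the u-form has."""
        key = (uuid, attr)
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None

    def attributes(self, uuid: str) -> List[str]:
        """All attributes of one u-form form a contiguous key range."""
        lo = bisect.bisect_left(self._keys, (uuid, ""))
        names = []
        for key_uuid, name in self._keys[lo:]:
            if key_uuid != uuid:
                break
            names.append(name)
        return names
```

A side benefit of the composite key is that all attributes of one u-form sit next to each other in key order, so enumerating them is a range scan rather than a search.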