Grouping Entries

An abstract, three-dimensional coordinate system. The axes are labelled "subspace", "path", and "timestamp".
The three dimensions of a namespace.

Willow lets authors place Entries in namespaces, and within each namespace, Entries are arranged according to three orthogonal dimensions: subspace_id, path, and timestamp. This suggests a powerful way of thinking about Willow: a namespace is a collection of points (Entries) in a three-dimensional space. Or more accurately, a namespace is a mapping from points in this three-dimensional space to hashes and sizes of Payloads.

This viewpoint enables us to meaningfully group Entries together. An application might want to access all chess games that a certain author played in the past week. This kind of query corresponds to a box (a rectangular cuboid to use precise terminology) in the three-dimensional Willow space.

In this document, we develop and define a vocabulary for grouping Entries based on their subspace_ids, paths, and timestamps. These definitions are not necessary for defining and understanding the core data model, but we make heavy use of them in our recommended capability system and our recommended synchronisation protocol.

Ranges

Ranges are simple, one-dimensional ways of grouping Entries, they can express groupings such as “last week’s Entries”. A range is either a closed range or an open range. A closed range consists of a start value and an end value, an open range consists only of a start value. A range includes all values greater than or equal to its start value and strictly less than its end value (if it is has one). A range is empty if it includes no values.

A visualisation of both a closed range and an open range as contiguous segments of a line of numbers.
A closed range and an open range.

The Willow protocols use three types of ranges:

A SubspaceId must be greater than or equal to this to be included in the range. The ordering must be given by a protocol parameter.
If open, the range is an open range. Otherwise, a SubspaceId must be numerically strictly less than this to be included in the range. The ordering must be given by a protocol parameter.
A range of Paths.
struct PathRange
A Path must be lexicographically greater than or equal to this to be included in the range.
If open, the range is an open range. Otherwise, a Path must be lexicographically strictly less than this to be included in the range.
struct TimeRange
A Timestamp must be numerically greater than or equal to this to be included in the range.
If open, the range is an open range. Otherwise, a Timestamp must be numerically strictly less than this to be included in the range.

Let r1 and r2 be ranges (over the same types of values). The intersection of r1 and r2 is the range whose start value is the greater of the start values of r1 and r2, and whose end value is the lesser of the end values of r1 and r2 (if both are closed ranges), the one end value among r1 and r2 (if exactly one of them is a closed range), or no end value (if both r1 and r2 are open ranges).

A three-dimensional coordinate system. On each axis, there is a contiguous, highlighted one-dimensional subrange. On the time axis, the subrange is labeled "11:30 to 12:30". On the path axis, it is labelled "blog to chat". On the subspace axis, it is labelled "Alfie to Betty". These three one-dimensional ranges together describe a three-dimensional box. This box is highlighted in the depicted space.
A 3dRange composed of a SubspaceRange, PathRange, and TimeRange.

When we combine ranges of all three dimensions, we can delimit boxes in Willow space.

A three-dimensional range that includes every Entry included in all three of its ranges.
struct 3dRange

A 3dRange includes every Entry whose subspace_id, path, and timestamp are all included their respective range.

We define default_3d_range(default_subspace) to denote the 3dRange with the following members:

Areas

3dRanges are a natural way of grouping Entries, but they have certain drawbacks around encrypted data in Willow: when encrypting Paths, for example, the lexicographic ordering of the encrypted Paths is inconsistent with the ordering of the unencrypted Paths. Similarly, SubspaceRanges do not preserve their meaning under encryption either. Hence, user-specified 3dRanges are close to useless when dealing with encrypted data.

Fortunately, there do exist encryption techniques that preserve some weaker properties than arbitrary orderings.See here for information on encrypting Willow. Without going into the cryptographic details, we now define an alternative to 3dRanges that can be used even when encrypting Paths and SubspaceIds.

Every Area can be expressed as a 3dRange, but not the other way around. Areas always denote boxes in Willow space, but some boxes do not correspond to any Area.Like the diagram for 3dRanges, a coordinate system with a highlighted box. The time dimension of the box is given as a time range, but the path and subspace dimensions are specified from a single path ("blog") and a single subspace ("Alfie") respectively.
This diagram attempts to show the key difference between a 3dRange and an Area, namely that its dimensions are derived from its subspace_id and its path.
A grouping of Entries.
struct Area
To be included in this Area, an Entry’s subspace_id must be equal to the subspace_id, unless it is any.
To be included in this Area, an Entry’s path must be prefixed by the path.
To be included in this Area, an Entry’s timestamp must be included in the times.

An Area a includes an Entry e if

An Area is empty if it includes no Entries. This is the case if and only if its times is empty.

An Area includes another Area if the first Area includes all Entries that the second Area includes. In particular, every Area includes itself.

The full area is the Area whose subspace_id is any, whose path is the empty Path, and whose times is the open TimeRange with start It includes all Entries.

The subspace area of the SubspaceId sub is the Area whose subspace_id is sub, whose path is the empty Path, and whose times is the open TimeRange with start It includes exactly the Entries with subspace_id sub.

If two Areas overlap, the overlap is again an Area. Let a1 and a2 be Areas. If there exists at least one Entry included in both a1, and a2, then we define the (nonempty) intersection of a1, and a2 as the Area whose

Areas of Interest

Occasionally, we wish to group Entries based on the contents of some store. For example, a space-constrained peer might ask for the 100 newest Entries when synchronising data.

We serve these use cases by combining an Area with limits to restrict the contents to the Entries with the greatest timestamps.

A grouping of Entries that are among the newest in some store.
To be included in this AreaOfInterest, an Entry must be included in the area.
To be included in this AreaOfInterest, an Entry’s timestamp must be among the max_count greatest Timestamps, unless max_count is zero.
The total payload_lengths of all included Entries is at most max_size, unless max_size is zero.

An AreaOfInterest aoi includes an Entry e from a store store if

Let aoi1 and aoi2 be AreaOfInterests. If there exists at least one Entry included in both aoi1.area, and aoi2.area, then we define the (nonempty) intersection of aoi1, and aoi2 as the AreaOfInterest whose