In this post I am going to describe some work I’ve done recently on a system of Scala traits that support tree-based collection algorithms prefix-sum, nearest key query and value increment in a mixable format, all backed by Red-Black balanced tree logic, which is also a fully inheritable trait.
(update) Since I wrote this post, the code has evolved into a library on the isarn project. The original source files, containing the exact code fragments discussed in the remainder of this post, are preserved for posterity here.
This post eventually became a bit more sprawling and “tl/dr” than I was expecting, so by way of apology, here is a table of contents with links:
- Motivating Use Case
- Library Overview
- A Red-Black Tree Base Class
- Node Inheritance Example: NodeMap[K,V]
- Collection Trait Example: OrderedMapLike[K,V,IN,M]
- Collection Example: OrderedMap[K,V]
- Finale: Trait Mixing
A Motivating Use Case
The skeptical programmer may be wondering what the point of Yet Another Map Collection really is, much less an entire class hierarchy. The use case that inspired this work was my project of implementing the t-digest algorithm. Discussion of t-digest is beyond the scope of this post, but suffice it to say that constructing a t-digest requires the maintenance of a collection of “cluster” objects, that needs to satisfy the following several properties:
- an entry contains one or more cluster objects at a given numeric location
- entries are maintained in a numeric key order
- entries will be frequently inserted and deleted, in arbitrary order
- given a numeric key value, be able to find the entry nearest to that value
- given a key, compute a prefix-sum for that value
- all of the above should be bounded by logarithmic time complexity
Propreties 2,3 and 6 are commonly satisfied by a map structure backed by some variety of balanced tree representation, of which the best-known is the Red-Black tree.
Properties 1, 4 and 5 are more interesting. Property 1 – representing a collection of multiple objects at each entry – can be accomplished in a generalizable way by noting that a collection is representable as a monoid, and so supporting values that can be incremented with respect to a user-supplied monoid relation can satisfy property-1, but also can support many other kinds of update, including but not limited to classical numeric incrementing operations.
Properties 4 and 5 – nearest-entry queries and prefix-sum queries – are also both supportable in logarithmic time using a tree data structure, provided that tree is balanced. Again, the details of the algorithms are out of the current scope, however they are not extremely complex, and their implementations are available in the code.
A reader with their software engineering hat on will notice that these properties are orthogonal. A programmer might be interested in a data structure supporting any one of them, or in some mixed combination. This kind of situation fairly shouts “Scala traits” (or, alternatively, interfaces in Java, etc). With that idea in mind, I designed a system of Scala collection traits that support all of the above properties, in a pure trait form that is fully “mixable” by the programmer, so that one can use exactly the properties needed, but not pay for anything else.
The library consists broadly of 3 kinds of traits:
- tree node traits – implement core tree support for some functionality
- collection traits – provide additional collection API methods the user
- collections – instantiate a usable incarnation of a collection
For the programmer who wishes to either create a trait mixture, or add new mixable traits, the collections also function as reference implementations.
The three tables that follow summarize the currently available traits of each kind listed above. They are (at the time of this posting) all under the package namespace
|Node[K]||redblack.tree||Fundamental Red-Black tree functionality|
|NodeMap[K,V]||ordered.tree||Support a mapping from keys to values|
|NodeNear[K]||nearest.tree||Nearest-entry query (key-only)|
|NodeNearMap[K,V]||nearest.tree||Nearest-entry query for key/value maps|
|NodeInc[K,V]||increment.tree||Increment values w.r.t. a monoid|
|NodePS[K,V,P]||prefixsum.tree||Prefix sum queries by key (w.r.t. a monoid)|
|OrderedSetLike[K,IN,M]||ordered||ordered set of keys|
|OrderedMapLike[K,V,IN,M]||ordered||ordered key/value map|
|NearestSetLike[K,IN,M]||nearest||nearest entry query on keys|
|NearestMapLike[K,V,IN,M]||nearest||nearest entry query on key/value map|
|IncrementMapLike[K,V,IN,M]||increment||increment values w.r.t a monoid|
|PrefixSumMapLike[K,V,P,IN,M]||prefixsum||prefix sum queries w.r.t. a monoid|
|OrderedMap[K,V]||ordered||ordered key/value map|
|NearestSet[K]||nearest||ordered set with nearest-entry query|
|NearestMap[K,V]||nearest||ordred map with nearest-entry query|
|IncrementMap[K,V]||increment||ordered map with value increment w.r.t. a monoid|
|PrefixSumMap[K,V,P]||prefixsum||ordered map with prefix sum query w.r.t. a monoid|
The following diagram summarizes the organization and inheritance relationships of the classes.
A Red/Black Tree Base Class
The most fundamental trait in this hierarchy is the trait that embodies Red-Black balancing; a “red-black-ness” trait, as it were. This trait supplies the axiomatic tree operations of insertion, deletion and key lookup, where the Red-Black balancing operations are encapsulated for insertion (due to Chris Okasaki) and deletion (due to Stefan Kahrs) Note that Red-Black trees do not assume a separate value, as in a map, but require only keys (thus implementing an ordered set over the key type):
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
I will assume most readers are familiar with basic binary tree operations, and the Red-Black rules are described elsewhere (I adapted them from the Scala red-black implementation). For the purposes of this discussion, the most interesting feature is that this is a pure Scala trait. All
val declarations are abstract. This trait, by itself, cannot function without a subclass to eventually perform dependency injection. However, this abstraction allows the trait to be inherited freely – any programmer can inherit from this trait and get a basic Red-Black balanced tree for (nearly) free, as long as a few basic principles are adhered to for proper dependency injection.
Another detail to call out is the abstraction of the usual
key with a
Data element. This element represents any node payload that is moved around as a unit during tree structure manipulations, such as balancing pivots. In the case of a map-like subclass,
Data is extended to include a
value field as well as a
The other noteworthy detail is the abstract definition
def iNode(color: Color, d: Data[K], lsub: Node[K], rsub: Node[K]): INode[K] - this is the function called to create any new tree node. In fact, this function, when eventually instantiated, is what performs dependency injection of other tree node fields.
Node Inheritance Example: NodeMap[K,V]
A relatively simple example of node inheritance is hopefully instructive. Here is the definition for tree nodes supporting a key/value map:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
Note that in this case very little is added to the red/black functionality already provided by
DataMap[K,V] trait is defined to add a
value field in addition to the
key, and the internal node
INodeMap[K,V] refines the type of its
data field to be
DataMap[K,V]. The semantics is little more than “tree nodes now carry a value in addition to a key.”
A tree node trait inherits from its own parent class and the corresponding traits for any mixed-in functionality. So for example
INodeMap[K,V] inherits from
NodeMap[K,V] but also
Collection Trait Example: OrderedMapLike[K,V,IN,M]
Continuing with the ordered map example, here is the definition of the collection trait for an ordered map:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
You can see that this trait supplies collection API methods that a Scala programmer will recognize as being standard for any map-like collection. Note that this trait also inherits other standard methods from
OrderedLike[K,IN,M] (common to both sets and maps) and also inherits from
NodeMap[K,V]: In other words, a collection is effectively yet another kind of tree node, with additional collection API methods mixed in. Note also the use of “self types” (the type parameter
M), which allows the collection to return objects of its own kind. This is crucial for allowing operations like data insertion to return an object that also supports node insertion, and to maintain consistency of type across operations. The collection type is properly “closed” with respect to its own operations.
Collection Example: OrderedMap[K,V]
To conclude the ordered map example, consider the task of defining a concrete instantiation of an ordered map:
1 2 3 4 5 6
You can see that (aside from a convenience override of
toString) the trait
OrderedMap[K,V] is nothing more than a vehicle for instantiating a particular concrete
OrderedMapLike[K,V,IN,M] subtype, with particular concrete types for internal node (
INodeMap[K,V]) and its own self-type.
Things become a little more interesting inside the companion object
1 2 3 4 5 6
Note that the object returned by the factory method is upcast to
OrderedMap[K,V], but in fact has the more complicated type:
InjectMap[K,V] with LNodeMap[K,V] with OrderedMap[K,V]. There are a couple things going on here. The trait
LNodeMap[K,V] ensures that the new object is in particular a leaf node, which embodies a new empty tree in the Red-Black tree system.
InjectMap[K,V] has an even more interesting purpose. Here is its definition:
1 2 3 4 5 6 7 8 9 10
Firstly, note that it is a bona fide class, as opposed to a trait. This class is where, finally, all things abstract are made real – “dependency injection” in the parlance of Scala idioms. You can see that it defines the implementation of abstract method
iNode, and that it does this by returning yet another
InjectMap[K,V] object, mixed with both
OrderedMap[K,V], thus maintaining closure with respect to all three slices of functionality: dependency injection, the proper type of internal node, and map collection methods.
The various abstract
rsub are all given concrete values inside of
iNode. Here is where the value of concrete “reference” implementations manifests. Any fields in the relevant internal-node type must be instantiated here, and the logic of instantiation cannot be inherited while still preserving the ability to mix abstract traits. Therefore, any programmer wishing to create a new concrete sub-class must replicate the logic for instantiating all inherited in an internal node.
Another example makes the implications more clear. Here is the definition of injection for a collection that mixes in all three traits for incrementable values, nearest-key queries, and prefix-sum queries:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
Here you can see that all logic for both “basic” internal nodes and also for maintaining prefix sums, and key min/max information for nearest-entry queries, must be supplied. If there is a singularity in this design here is where it is. The saving grace is that it is localized into a single well defined place, and any logic can be transcribed from a proper reference implementation of whatever traits are being mixed.
Finale: Trait Mixing
I will conclude by showing the code for mixing tree node traits and collection traits, which is elegant. Here are type definitions for tree nodes and collection traits that inherit from incrementable values, nearest-key queries, and prefix-sum queries, and there is almost no code except the proper inheritances:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26