A hash-trie data structure for use inside other data structures.

Source file:

/Core/Loyc.Collections/Sets/InternalSet.cs

Remarks

InternalSet<T> is a dual-mode mutable/immutable "hash trie", which is a kind of tree that is built from the hashcode of the items it contains. It supports fast cloning, and is suitable as a persistent data structure.

InternalSet<T> is not designed to be used by itself, but as a building block for other data structures. It has no Count property because it does not know its own size; the outer data structure must track the size if the size is needed. The lack of a Count property allows an empty InternalSet to use a mere single word of memory!

This is my second implementation of InternalSet. The original version used memory very efficiently for reference types, but required boxing for value types; this version needs more memory, but is moderately faster in most cases and supports value types without boxing. I estimate that InternalSet (the second version) uses roughly the same amount of memory as HashSet<T>; the actual figure may be higher or lower depending on the number of items in the set and on the hashcode distribution.

Collection classes based on InternalSet are most efficient for small sets, but if you always need small sets then a simple wrapper around HashSet would suffice. In fact, despite my best efforts, this data type rarely outperforms HashSet, but this is because HashSet<T> is quite fast, not because InternalSet<T> is slow. Still, there are several reasons to consider using a collection class based on InternalSet instead of HashSet<T>:

All of my set collections offer read-only variants. You can instantly convert any mutable set or dictionary into an immutable one, and convert any immutable set or dictionary back into a mutable one in O(1) time; this relies on the same fast-cloning technique I developed for AList<T>*.
All of my set collections offer set operators that combine or intersect two sets without modifying the source sets ("|" for union, "&" for intersection, "-" for subtraction); these operators are available on both the mutable and immutable versions of the sets.
InternalSet<T> supports combined "get-and-replace" and "get-and-remove" operations, which is mainly useful when it is being used as a dictionary (i.e. when T is a key-value pair). There are two "Add" modes, "add if not present" and "add or replace"; both modes retrieve the existing value if the key was already present, and the "add or replace" mode furthermore changes the value. Also, when removing an item, you can get the value that was removed.
InternalSet<T>'s enumerator allows you to change or delete the current value (this feature is used internally by set operations such as UnionWith and IntersectWith).
InternalSet<T> was inspired by Clojure's PersistentHashMap, or rather by Karl Krukow's blog posts about PersistentHashMap**, and so it is designed so that you can use it as a fully persistent set, which means that you can keep a copy of every old version of the set that has ever existed, if you want. The "+" and "-" operators (provided on the wrapper classes, not on InternalSet<T> itself) allow you to add or remove a single item without modifying the original set. There is a substantial performance penalty for overusing these operators, but these operators are cheaper than duplicating a HashSet<T> every time you modify it.

* After developing AList<T> and Loyc trees, I realized that freezable classes are error-prone, because it is sometimes difficult for a developer to figure out (before run-time) whether a given object could be frozen. If an object is frozen and you modify it, the compiler will never detect your mistake in advance and warn you. The collections based on InternalSet<T> fix this problem by having separate data types for frozen and unfrozen (a.k.a. immutable and mutable) collections.
** http://blog.higher-order.net/2010/08/16/assoc-and-clojures-persistenthashmap-part-ii/

InternalSet is not efficient for Ts that are expensive to compare; unlike standard .NET collections, this data structure does not store the hashcode of each item inside the collection. The memory saved by not storing the hashcode compensates for the extra memory that InternalSet tends to require due to its structure.

As I was saying, this data structure is inspired by Clojure's PersistentHashMap. Whereas PersistentHashMap uses nodes of size 32, I chose to use nodes of size 16 in order to increase space efficiency for small sets; for some reason I tend to design programs that use many small collections and a few big ones, so I tend to prefer designs that stay efficient at small sizes.

So InternalSet is a tree of nodes, with each level of the tree representing 4 bits of the hashcode. Slots in the root node are selected based on bits 0 to 3 of the hashcode, slots in children of the root are selected based on bits 4 to 7 of the hashcode, and so forth. Here's a diagram:

                             _root*
* IsFrozen=true                |
                               |
      +---------+---------+----+----+---------+---------+
      |         |         |         |         |         |
     0x2       0x3       0x6       0x7       0x9       0xF
                |                   |         |
             +--+--+                |      +--+--+
             |     |                |      |     |
           0x13   0x73             0x57  0x09   0x59

Each of the 12 nodes on this diagram has 16 slots for items of type T, and the 4 nodes that have children have 16 additional slots for references to children. The numbers on the nodes represent their role in the tree; for example:

0x59 is at depth 2 and only holds items whose hashcodes end with 0x59.
0x9 is at depth 1 and only holds items whose hashcodes end with 0x9.
the root node is always at depth 0 and can hold any item regardless of hashcode.
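
As a rough illustration of the labelling used in the diagram, the following sketch computes which slot a hashcode selects at a given depth and whether a hashcode is allowed in a node with a given label. The class and method names (HashTrieDemo, SlotAt, BelongsInNode) are made up for this example and are not part of InternalSet<T>; only the 4-bits-per-level rule comes from the description above.

```csharp
using System;

static class HashTrieDemo
{
    const int BitsPerLevel = 4;                     // 4 bits of the hashcode per tree level
    const uint Mask = (1u << BitsPerLevel) - 1;     // 0xF

    // Slot (0..15) that a hashcode selects in a node at the given depth.
    public static int SlotAt(uint hashcode, int depth)
        => (int)((hashcode >> (depth * BitsPerLevel)) & Mask);

    // True if an item with this hashcode may live in the node labelled
    // `suffix` at `depth` (for example suffix 0x59 at depth 2).
    // The root (depth 0) accepts every hashcode.
    public static bool BelongsInNode(uint hashcode, uint suffix, int depth)
    {
        if (depth == 0) return true;
        // Guard depth 8: a shift by 32 bits would wrap around in C#.
        uint suffixMask = depth >= 8 ? uint.MaxValue
                                     : (1u << (depth * BitsPerLevel)) - 1;
        return (hashcode & suffixMask) == suffix;
    }
}
```

For instance, a hashcode ending in 0x59 selects slot 9 in the root and slot 5 in the root's child, which is why node 0x59 hangs below node 0x9 in the diagram.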

Technically, this data structure has O(log N) time complexity for search, insertion and removal. However, it's a base-16 logarithm and maxes out at 8 levels, so it is faster than typical O(log N) algorithms that are base-2. At smaller sizes, its speed is similar to a conventional hashtable, and some operations are still efficient at large sizes, too.

Unlike InternalList<T>, new InternalSet<T>() is a valid empty set. Moreover, because the root node is never changed after it is created (unless you modify it while it is frozen), all copies of an InternalSet<T> represent the same set unless the set is frozen with CloneFreeze; see Thaw() for more information.

The neatest feature of this data structure is fast cloning and subtree sharing. You can call CloneFreeze to freeze/clone the trie in O(1) time; this freezes the root node (a transitive property that implicitly affects all children), but still permits the hashtrie to be modified by copying nodes on-demand. Thus the trie is actually frozen, but copy-on-write behavior provides the illusion that it is still editable.

This data structure is designed to support classes that contain mutable data, so that it can be used to construct dictionaries; that is, it allows T values that have an immutable "key" part and a mutable "value" part. Call Find to retrieve the value associated with a key, and call Add with replaceIfPresent=true to change the "value" associated with a key. The Map<K,V> and MMap<K,V> classes rely on this feature to implement a dictionary.
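
The "immutable key, mutable value" idea can be sketched with a comparer that looks only at the key. The example below uses the standard HashSet<T> as a stand-in, since it does not reproduce InternalSet's own API; KeyOnlyComparer, KeyOnlyDemo, TryFind and AddOrReplace are hypothetical names, and whether Map<K,V> is built exactly this way is an assumption. HashSet<T>.TryGetValue requires .NET Core 2.0 / .NET Framework 4.7.2 or later.

```csharp
using System;
using System.Collections.Generic;

// Compares key-value pairs by key only, so a set of pairs behaves like a dictionary.
sealed class KeyOnlyComparer<K, V> : IEqualityComparer<KeyValuePair<K, V>>
{
    readonly IEqualityComparer<K> _keys = EqualityComparer<K>.Default;

    public bool Equals(KeyValuePair<K, V> a, KeyValuePair<K, V> b)
        => _keys.Equals(a.Key, b.Key);

    public int GetHashCode(KeyValuePair<K, V> p)
        => p.Key == null ? 0 : _keys.GetHashCode(p.Key);
}

static class KeyOnlyDemo
{
    // Looks up the stored pair for `key`; the value part of the probe is ignored.
    public static bool TryFind<K, V>(HashSet<KeyValuePair<K, V>> set, K key, out V value)
    {
        var probe = new KeyValuePair<K, V>(key, default(V));
        if (set.TryGetValue(probe, out var found))
        {
            value = found.Value;
            return true;
        }
        value = default(V);
        return false;
    }

    // "Add or replace": returns true if the key was not present before.
    public static bool AddOrReplace<K, V>(HashSet<KeyValuePair<K, V>> set, K key, V value)
    {
        var pair = new KeyValuePair<K, V>(key, value);
        bool existed = set.Remove(pair);   // removes any pair with the same key
        set.Add(pair);
        return !existed;
    }
}
```

With HashSet<T> the replacement takes two operations (Remove, then Add); the combined get-and-replace operation described above is what lets InternalSet<T> do it in one pass.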

How it works: I call this data structure a "hash-trie" because it blends properties of hashtables and tries. It places items into a tree by taking their hashcode and dividing it into 8 groups of 4 bits, starting at the least significant bits. Each group of 4 bits is used to select a location in the tree/trie, and each node of the tree always has 16 items (and 16 children, if it has any children at all.) For example, consider a tree with 7 items that have the following hash codes:

J: 0x89BC98B1
K: 0xB173A12C
L: 0x20913491
M: 0x1977FEB3
N: 0x01299451
O: 0x0732AF01
P: 0x0732AF01 (Note: O.Equals(P)==false, but the hashcodes are equal)

The top level of the trie represents the lowest 4 bits of the hashcode. Since each node has 16 items, 7 items can usually fit in a single node, but in this case there are too many hashcodes that end with "1", causing a node split:

                           |0|1|2|3|4|5|6|7|8|9|A|B|C|D|E|F|
       _root ==> _items    | |!|!|M|!| | | | | | | |K| | | |
                 _children | |*| | | | | | | | | | | | | | |

                           |0|1|2|3|4|5|6|7|8|9|A|B|C|D|E|F|
* child node ==> _items    |O|P| | | |N| | | |L| |J| | | | |
                 _children (null)

("!" represents the deleted flag, which indicates that an item was 
 once present at this location.)

The second level of the trie represents bits 4-7, which is the second-last hex digit. You can see, for example, that the second-last digit of N is 5, therefore N is stored at index 5 of the child node.

In case hashcodes of different objects collide at a particular digit, adjacent array elements can be used to hold the different objects that share the same 4-bit sub-hashcode; this is a bounded-time variation on the linearly-probed hashtable. In this example, both O and P have zero as their second-last digit. Assuming O is added first, it takes slot [0]; then P takes slot [1]. Up to 3 adjacent entries can be used for a given hashcode; therefore, when searching for an entry it is necessary to search up to 4 locations in each node: the preferred location, plus 3 adjacent locations.
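
To check the example by hand, the sketch below extracts the hex digit used at each level for the seven hashcodes listed earlier. DigitDemo, Digit and ItemsAt are illustrative names only; the hashcodes are copied from the list above.

```csharp
using System;
using System.Linq;

static class DigitDemo
{
    // Hashcodes copied from the example list (items J through P).
    public static readonly (string Name, uint Hash)[] Items =
    {
        ("J", 0x89BC98B1u), ("K", 0xB173A12Cu), ("L", 0x20913491u),
        ("M", 0x1977FEB3u), ("N", 0x01299451u), ("O", 0x0732AF01u),
        ("P", 0x0732AF01u),
    };

    // Hex digit of `hash` used at a given trie level (level 0 = lowest 4 bits).
    public static int Digit(uint hash, int level)
        => (int)((hash >> (4 * level)) & 0xF);

    // Names of the items whose hashcode selects `slot` at `level`.
    public static string[] ItemsAt(int level, int slot)
        => Items.Where(it => Digit(it.Hash, level) == slot)
                .Select(it => it.Name)
                .ToArray();
}
```

Calling DigitDemo.ItemsAt(0, 1) yields J, L, N, O and P: five items compete for the four slots reachable from index 1 of the root, which is what forces the node split. At level 1 their digits are B, 9, 5, 0 and 0, matching the child node's layout.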

For example, suppose we search for an item X that is not in the set and has hashcode 0xCCA9A241. In that case, the Find method starts with the least-significant digit, 1. This points us to the child slot; an invariant of our hashtrie is that if there is a child node, all items with the corresponding sub-hashcode must be placed in the child node. Therefore it is impossible, for example, that X could be located at index 2 of the root node; the existence of the child node guarantees that it is not there. So the Find method looks inside the child node, at index 4 (the second-last digit of X's hashcode) and finds nothing. It also looks at indexes 5, 6, and 7, comparing N to X in the process. Since none of these slots contain X, the Find method returns false.

Something unfortunate happens if five or more objects have the same hashcode: it forces the tree to have maximum depth. Since a particular hashcode can only be repeated four times in a single node, upon adding a fifth item with the same hashcode, child nodes are created for all 8 digits of the hashcode. At the 8th level, a special node type is allocated that contains, in addition to the usual 16 slots, a list of "overflow slots" that holds items that cannot fit in the normal slots due to excessive collisions. All of this has a substantial memory penalty; to avoid this problem, use a better hash function that does not create false collisions.

If there are more than 16 items that share the same 28 lower-order bits, the overflow area on the 8th level node will expand to hold all of these items; this is the only way that a node can have more than 16 items.

Fast cloning works by setting the "IsFrozen" flag on the root node. When a node is frozen, all its children are frozen implicitly; since the children are not marked right away, the CloneFreeze method can return immediately. The frozen flag will be propagated from parents to children lazily, when the tree is modified later.

To "thaw" a node, a copy is made of that node and all of its parents. For example, suppose that the following tree is frozen and cloned:

                             _root*
* IsFrozen=true                |
                               |
      +---------+---------+----+----+---------+---------+
      |         |         |         |         |         |
     0x2       0x3       0x6       0x7       0x9       0xF
                |                   |         |
             +--+--+                |      +--+--+
             |     |                |      |     |
           0x13   0x73             0x57  0x09   0x59

Remember, only the root's IsFrozen flag is set at first; all other nodes do not have the frozen flag yet.

Now suppose that an item is added to node 0x9 (e.g. something with hashcode 0x39 could go in this node). Before the new item can be placed in node 0x9, it must be thawed. To thaw it, an unfrozen copy is made, leaving the original untouched. The copy is not frozen, but it does point to the same frozen children (0x09 and 0x59), so a for-loop sets the IsFrozen flag of each child. Then, the new item is added to the copy of node 0x9. Next, the _root is also unfrozen by making a copy of it with IsFrozen=false. Again, a for-loop sets the IsFrozen flag of each frozen child, and then child slot [9] in the root is replaced with the new copy of 0x9 (which has the new item).

This concludes the thawing process. So at this point, just two nodes are actually unfrozen, and the modified tree looks like this:

! Unfrozen copy              _root!
* IsFrozen=true                |
                               |
      +---------+---------+----+----+---------+---------+
      |         |         |         |         |         |
     0x2*      0x3*      0x6*      0x7*      0x9!      0xF*
                |                   |         |
             +--+--+                |      +--+--+
             |     |                |      |     |
           0x13   0x73             0x57  0x09*  0x59*

There are 12 nodes here and 2 have been copied. The other 10 nodes are still shared between the modified tree and the clone. Next, if you add an item to node 0x6, only that one node has to be thawed; the root has already been thawed and there is no need to make another copy of it. Due to the random nature of hashcodes, as you modify the set after cloning it, each modification will typically require approximately one node to be thawed, until the majority of the nodes have been thawed.

InternalSet does not thaw unnecessarily. If you try to remove an item that is not present, none of the tree will be thawed. If you add an item that is already present in a frozen node (and you do not ask for replacement), that node will not be thawed. Contains and Find never cause thawing.
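
The thawing walk-through can be sketched as copy-on-write over a simplified node type. Everything here is hypothetical (CowNode, ThawedCopy, CowDemo.Set); items are plain ints, and the sketch copies nodes top-down along the path for simplicity, whereas the description above thaws the child first and the root second. The resulting tree is the same either way.

```csharp
using System;

// Hypothetical node type for illustration; not the library's Node class.
sealed class CowNode
{
    public bool IsFrozen;
    public readonly int[] Items = new int[16];            // payload, simplified to ints
    public readonly CowNode[] Children = new CowNode[16];

    // Returns an unfrozen copy that shares children with the original.
    // Sharing is safe only because each shared child is marked frozen first.
    public CowNode ThawedCopy()
    {
        var copy = new CowNode();
        Array.Copy(Items, copy.Items, 16);
        for (int i = 0; i < 16; i++)
        {
            var child = Children[i];
            if (child != null) child.IsFrozen = true;     // lazy flag propagation
            copy.Children[i] = child;
        }
        return copy;
    }
}

static class CowDemo
{
    // Writes `value` into slot `slot` of the node reached by following `path`
    // (a list of child indexes) from `root`, copying only the frozen nodes on
    // that path. Returns the root to use afterwards.
    public static CowNode Set(CowNode root, int[] path, int slot, int value)
    {
        if (root.IsFrozen) root = root.ThawedCopy();
        var node = root;
        foreach (int i in path)
        {
            var child = node.Children[i]
                ?? throw new ArgumentException("no child at index " + i);
            if (child.IsFrozen)
            {
                child = child.ThawedCopy();
                node.Children[i] = child;                 // node is unfrozen here
            }
            node = child;
        }
        node.Items[slot] = value;
        return root;
    }
}
```

After one Set call on a frozen tree, the old root and all of its descendants are untouched, while the new root shares every subtree that was not on the modified path.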

I am not aware whether a data structure quite like this has been described in the comp-sci literature or not (although it probably has). If you see something like this in a paper, let me know.

When attempting to insert a new item in a node, the first available empty slot will be used; and when searching for an item, the search stops at an empty slot. For example, suppose that the root node contains these items:

                    |0|1|2|3|4|5|6|7|8|9|A|B|C|D|E|F|
_root ==> _items    |A| |C| |E|F| | |I| |K|L| |N| | |

Now suppose that you are searching for, or adding, an item 'D' whose hashcode ends with '3'. Slot 3 is empty, and this data structure works in such a way that the search for 'D' can end immediately with a result of 'false', or 'D' can be added at slot 3 immediately without comparing it with slots 4, 5 and 6, which (if slot 3 were not empty) might already contain 'D'.

The reasoning behind this rule is that if 'D' already existed in the set, slot 3 should not be empty; since it is empty, 'D' must not be in the set already. However, deletions could violate this logic. For example, imagine that we add two items, first 'd' and then 'D', which both have a hashcode that ends in '3'. Then the node would look like this:

                    |0|1|2|3|4|5|6|7|8|9|A|B|C|D|E|F|
_root ==> _items    |A| |C|d|E|F|D| |I| |K|L| |N| | |

Next, you delete 'd'. Imagine that this leaves the node in the following state:

                    |0|1|2|3|4|5|6|7|8|9|A|B|C|D|E|F|
_root ==> _items    |A| |C| |E|F|D| |I| |K|L| |N| | |

Now 'D' is left outside its 'home' location of 3. If you then attempt to add 'D' to the set, a duplicate copy would be added at position '3'! Or if you search for 'D' instead, the result would be 'false' even though D is present in the set.

I thought of two solutions to this problem; the first was to 'fix' the node after a deletion so that 'D' would move from slot 6 to 3. But there's a big problem with this solution because InternalSet<T>.Enumerator has a RemoveCurrent() method which is supposed to delete the current item and move to the next one. If the node had to be rearranged in response to a deletion, it would be very difficult to guarantee that the enumerator still returns each item in the set exactly once.

The second solution, which I actually implemented, puts a special "deleted" marker in slot 3 (denoted ! on the first diagram). This marker forces the search routine to compare the item being added or searched for with other slots beyond the current one, but otherwise it behaves like an empty slot.

There is a third solution–always check all four possible slots. But the comparison is not always cheap, so InternalSet<T> does not use this solution unless you are using null as the value of the IEqualityComparer<T>.

Since InternalSet<T> can hold any value of type T, the "deleted" and "empty/in use" indicators cannot physically be stored in the slots of type T. Instead, these indicators are stored separately, with 16 bits for "deleted" flags and 16 bits for "used" flags.
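
A single node with separate "used" and "deleted" flags might look roughly like the sketch below. This is an illustration under assumptions, not the library's code: FlaggedNode and its members are made-up names, items are strings, the caller passes the home slot directly instead of a hashcode, and probing is assumed to wrap around from slot 15 to slot 0.

```csharp
using System;

// Hypothetical single-node sketch: 16 slots plus "used" and "deleted" flags.
sealed class FlaggedNode
{
    const int MaxProbes = 4;                 // home slot plus 3 adjacent slots
    readonly string[] _items = new string[16];
    uint _used, _deleted;                    // only the low 16 bits of each are used

    static int Adj(int i, int n) => (i + n) & 15;   // wrap-around is an assumption

    // Returns the slot holding `item`, or -1 if it is not in this node.
    public int IndexOf(string item, int home)
    {
        for (int n = 0; n < MaxProbes; n++)
        {
            int i = Adj(home, n);
            uint bit = 1u << i;
            if ((_used & bit) != 0)
            {
                if (_items[i] == item) return i;
            }
            else if ((_deleted & bit) == 0)
            {
                return -1;                   // truly empty slot: stop early
            }
            // Otherwise the slot carries a deleted marker: keep probing.
        }
        return -1;
    }

    public bool Add(string item, int home)
    {
        if (IndexOf(item, home) >= 0) return false;     // already present
        for (int n = 0; n < MaxProbes; n++)
        {
            int i = Adj(home, n);
            uint bit = 1u << i;
            if ((_used & bit) == 0)
            {
                _items[i] = item;
                _used |= bit;
                _deleted &= ~bit;
                return true;
            }
        }
        throw new InvalidOperationException("node full; a real trie would split here");
    }

    public bool Remove(string item, int home)
    {
        int i = IndexOf(item, home);
        if (i < 0) return false;
        uint bit = 1u << i;
        _items[i] = null;
        _used &= ~bit;
        _deleted |= bit;                     // leave a marker so later probes continue
        return true;
    }
}
```

Replaying the 'd' and 'D' scenario: with E and F in slots 4 and 5, adding "d" and then "D" with home slot 3 places them in slots 3 and 6; after removing "d", a search for "D" still passes over the marker in slot 3 and finds it in slot 6.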

During a normal delete operation, if a node has no children and is using only one or two slots after an item is deleted, the parent is checked for empty slots to find out whether the child is really necessary. If there are enough free slots in the parent node, the remaining items in the child are transferred back to the parent and the child is deleted (the reference to it is cleared to null).

Unfortunately, this behavior is not available when you call Enumerator.RemoveCurrent. In order to maintain the integrity of the enumerator, a child node will not be deleted during a call to RemoveCurrent unless the node is completely empty after the removal. Consequently, the tree will use extra memory if you remove most, but not all, items from the set using RemoveCurrent.

By the way, unlike the original implementation, this version of InternalSet allows 'null' to be a member of the set.

Interesting fact: it is possible for two sets to be equal (contain the same items), and yet for those items to be enumerated in different orders in the two sets.

Nested classes
struct	Enumerator

Public fields
Node	_root

Public static fields
static readonly InternalSet< T >	Empty = new InternalSet<T> { _root = FrozenEmptyRoot() }
	An empty set.

static readonly IEqualityComparer< T >	DefaultComparer = typeof(IReferenceEquatable).IsAssignableFrom(typeof(T)) ? null : EqualityComparer<T>.Default
	This is EqualityComparer<T>.Default, or null if T implements IReferenceEquatable.

const int	BitsPerLevel = 4

const int	FanOut = 1 << BitsPerLevel

const int	Mask = FanOut - 1

const int	MaxDepth = 7

const uint	FlagMask = (uint)((1L << FanOut) - 1)

const int	CounterPerChild = FanOut << 1

const short	OverflowFlag = 1 << 12

static readonly OnFoundExisting	AddIfNotPresent = _IgnoreExisting_

static readonly OnFoundExisting	AddOrReplace = _ReplaceExisting_

static readonly OnFoundExisting	RemoveMode = _DeleteExisting_

static Enumerator	_setOperationEnumerator

Properties
bool	IsRootFrozen `[get]`

bool	HasRoot `[get]`

Public Member Functions
	InternalSet (IEnumerable< T > list, IEqualityComparer< T > comparer, out int count)

	InternalSet (IEnumerable< T > list, IEqualityComparer< T > comparer)

int	GetSetHashCode (IEqualityComparer< T > comparer)

InternalSet< T >	CloneFreeze ()
	Freezes the hashtrie so that any further changes require paths in the tree to be copied.

void	Thaw ()
	Thaws a frozen root node by duplicating it, or creates the root node if the set doesn't have one.

bool	Add (ref T item, IEqualityComparer< T > comparer, bool replaceIfPresent)
	Tries to add an item to the set, and retrieves the existing item if present.

bool	Remove (ref T item, IEqualityComparer< T > comparer)
	Removes an item from the set.

delegate bool	OnFoundExisting (ref Node slots, int i, T item)

bool	Find (ref T item, IEqualityComparer< T > comparer)

Enumerator	GetEnumerator ()

IEnumerator< T > IEnumerable< T >.	GetEnumerator ()

System.Collections.IEnumerator System.Collections.IEnumerable.	GetEnumerator ()

void	CopyTo (T[] array, int arrayIndex)

void	Clear ()

bool	Contains (T item, IEqualityComparer< T > comparer)

int	Count ()

int	UnionWith (IEnumerable< T > other, IEqualityComparer< T > thisComparer, bool replaceIfPresent)
	Adds the contents of 'other' to this set.

int	UnionWith (InternalSet< T > other, IEqualityComparer< T > thisComparer, bool replaceIfPresent)

int	IntersectWith (InternalSet< T > other, IEqualityComparer< T > otherComparer)
	Removes all items from this set that are not present in 'other'.

int	IntersectWith (ISet< T > other)

int	IntersectWith (IEnumerable< T > other, IEqualityComparer< T > comparer)
	Removes all items from this set that are not present in 'other'.

int	ExceptWith (IEnumerable< T > other, IEqualityComparer< T > thisComparer)
	Removes all items from this set that are present in 'other'.

int	ExceptWith (InternalSet< T > other, IEqualityComparer< T > thisComparer)

int	SymmetricExceptWith (InternalSet< T > other, IEqualityComparer< T > thisComparer)

int	SymmetricExceptWith (IEnumerable< T > other, IEqualityComparer< T > comparer, bool xorDuplicates=true)
	Modifies the current set to contain only elements that were present either in this set or in the other collection, but not both.

bool	IsSubsetOf (ISet< T > other, int myMinCount)
	Returns true if all items in this set are present in the other set.

bool	IsSubsetOf (InternalSet< T > other, IEqualityComparer< T > otherComparer)

bool	IsSubsetOf (IEnumerable< T > other, IEqualityComparer< T > comparer, int myMinCount=0)

bool	IsSupersetOf (IEnumerable< T > other, IEqualityComparer< T > thisComparer, int myMaxCount=int.MaxValue)
	Returns true if all items in the other set are present in this set.

bool	IsSupersetOf (InternalSet< T > other, IEqualityComparer< T > thisComparer)

bool	Overlaps (IEnumerable< T > other, IEqualityComparer< T > thisComparer)
	Returns true if this set contains at least one item from 'other'.

bool	Overlaps (InternalSet< T > other, IEqualityComparer< T > thisComparer)

bool	IsProperSubsetOf (ISet< T > other, int myExactCount)
	Returns true if all items in this set are present in the other set, and the other set has at least one item that is not in this set.

bool	IsProperSubsetOf (IEnumerable< T > other, IEqualityComparer< T > comparer, int myExactCount)
	Returns true if all items in this set are present in the other set, and the other set has at least one item that is not in this set.

bool	IsProperSupersetOf (ISet< T > other, IEqualityComparer< T > thisComparer, int myExactCount)
	Returns true if all items in the other set are present in this set, and this set has at least one item that is not in the other set.

bool	IsProperSupersetOf (IEnumerable< T > other, IEqualityComparer< T > comparer, int myExactCount)
	Returns true if all items in the other set are present in this set, and this set has at least one item that is not in the other set.

bool	SetEquals (ISet< T > other, int myExactCount)
	Returns true if this set and the other set have the same items.

bool	SetEquals (IEnumerable< T > other, IEqualityComparer< T > comparer, int myExactCount)
	Returns true if this set and the other set have the same items.

int	CountMemory (int sizeOfT)
	Measures the total size of all objects allocated to this collection, in bytes, including the size of InternalSet<T> itself (which is one word).

int	CountMemory (int sizeOfT, out InternalSetStats stats)
	Measures the total size of all objects allocated to this collection, in bytes, and counts the number of nodes of different types.

Static Public Member Functions
static Node	FrozenEmptyRoot ()

static int	Adj (int i, int n)

static bool	Equals (T value, ref T item, IEqualityComparer< T > comparer)

static uint	GetHashCode (T item, IEqualityComparer< T > comparer)

static void	PropagateFrozenFlag (Node parent, Node child)

static void	ReplaceChild (ref Node slots, int iHome, Node newChild)

static bool	TryRemoveChild (ref Node slots, int iHome, Node child)

static bool	_IgnoreExisting_ (ref Node slots, int i, T item)

static bool	_ReplaceExisting_ (ref Node slots, int i, T item)

static bool	_DeleteExisting_ (ref Node slots, int i, T item)

static bool	AddOrRemove (ref Node slots, ref T item, uint hc, IEqualityComparer< T > comparer, OnFoundExisting mode)

static bool	OnFoundInOverflow (ref Node slots, int i, ref T item, OnFoundExisting mode, T existing)

static int	SelectBucketToSpill (Node slots, int i0, IEqualityComparer< T > comparer)

static void	Spill (Node parent, int i0, IEqualityComparer< T > comparer)

static Enumerator	SetOperationEnumerator ()

Member Function Documentation

◆ Add()

bool Loyc.Collections.Impl.InternalSet< T >.Add	(	ref T	item,
		IEqualityComparer< T >	comparer,
		bool	replaceIfPresent
	)

inline

Tries to add an item to the set, and retrieves the existing item if present.

Returns: true if the item was added, false if it was already present.

Referenced by Loyc.Collections.Impl.InternalSet< KeyValuePair< K, V > >.IsProperSupersetOf(), Loyc.Collections.Impl.InternalSet< KeyValuePair< K, V > >.SetEquals(), Loyc.Collections.Impl.InternalSet< KeyValuePair< K, V > >.SymmetricExceptWith(), and Loyc.Collections.Impl.InternalSet< KeyValuePair< K, V > >.UnionWith().

◆ CloneFreeze()

InternalSet<T> Loyc.Collections.Impl.InternalSet< T >.CloneFreeze ( )

inline

Freezes the hashtrie so that any further changes require paths in the tree to be copied.

This is an O(1) operation. It causes all existing copies of this InternalSet<T>, as well as any other copies you make in the future, to become independent of one another so that modifications to one copy do not affect any of the others.

To unfreeze the hashtrie, simply modify it as usual with (for example) a call to Add or Remove, or call Thaw. Frozen parts of the trie are copied on-demand.

◆ CountMemory() [1/2]

int Loyc.Collections.Impl.InternalSet< T >.CountMemory ( int sizeOfT )

inline

Measures the total size of all objects allocated to this collection, in bytes, including the size of InternalSet<T> itself (which is one word).

Parameters

sizeOfT Size of each T. C# provides no way to get this number so it must be supplied as a parameter. If T is a reference type such as String, IntPtr.Size tells you the size of each reference; please note that this method does not look "inside" each T, it just measures the "shallow" size of the collection. For instance, if this is a set of strings, then CountMemory(IntPtr.Size) is the size of the set including the references to the strings, but not including the strings themselves.

Referenced by Loyc.Collections.Set< Loyc.LLParserGenerator.AndPred >.CountMemory(), and Loyc.Collections.Impl.InternalSet< KeyValuePair< K, V > >.CountMemory().

◆ CountMemory() [2/2]

int Loyc.Collections.Impl.InternalSet< T >.CountMemory	(	int	sizeOfT,
		out InternalSetStats	stats
	)

inline

Measures the total size of all objects allocated to this collection, in bytes, and counts the number of nodes of different types.

◆ ExceptWith()

int Loyc.Collections.Impl.InternalSet< T >.ExceptWith	(	IEnumerable< T >	other,
		IEqualityComparer< T >	thisComparer
	)

inline

Removes all items from this set that are present in 'other'.

Parameters

other	The set whose members should be removed from this set.
thisComparer	The comparer for this set (not for 'other', which is simply enumerated).

Returns: the number of items that were removed.

◆ IntersectWith() [1/2]

int Loyc.Collections.Impl.InternalSet< T >.IntersectWith	(	IEnumerable< T >	other,
		IEqualityComparer< T >	comparer
	)

inline

Removes all items from this set that are not present in 'other'.

Parameters

other The set whose members should be kept in this set.

Returns: the number of items that were removed.

This method is costly if 'other' is not a set; a temporary set will be constructed to answer the query. Also, this overload has the same subtle assumption as the other overload.

◆ IntersectWith() [2/2]

int Loyc.Collections.Impl.InternalSet< T >.IntersectWith	(	InternalSet< T >	other,
		IEqualityComparer< T >	otherComparer
	)

inline

Removes all items from this set that are not present in 'other'.

Parameters

other	The set whose members should be kept in this set.
otherComparer	The comparer for 'other' (not for this set, which is simply enumerated).

Returns: the number of items that were removed from the set.

Referenced by Loyc.Collections.Impl.InternalSet< KeyValuePair< K, V > >.IntersectWith().

◆ IsProperSubsetOf() [1/2]

bool Loyc.Collections.Impl.InternalSet< T >.IsProperSubsetOf	(	IEnumerable< T >	other,
		IEqualityComparer< T >	comparer,
		int	myExactCount
	)

inline

Returns true if all items in this set are present in the other set, and the other set has at least one item that is not in this set.

This method is costly if 'other' is not a set; a temporary set will be constructed to answer the query. Also, this overload has the same subtle assumption as the other overload.

◆ IsProperSubsetOf() [2/2]

bool Loyc.Collections.Impl.InternalSet< T >.IsProperSubsetOf	(	ISet< T >	other,
		int	myExactCount
	)

inline

Returns true if all items in this set are present in the other set, and the other set has at least one item that is not in this set.

This implementation assumes that if the two sets use different definitions of equality (different IEqualityComparer<T>s), neither set contains duplicates from the point of view of the other set. If this rule is broken (meaning that, if either of the sets were constructed with the comparer of the other set, that set would shrink), then the results of this method are unreliable. If both sets use the same comparer, though, you have nothing to worry about.
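
One way to picture the assumption is to rebuild a set under the other set's comparer and see whether it loses items. The helper below is a hypothetical check written against the standard HashSet<T>; it is not part of InternalSet<T>, and ComparerAssumptionDemo and WouldShrink are made-up names.

```csharp
using System;
using System.Collections.Generic;

static class ComparerAssumptionDemo
{
    // True if rebuilding `items` under `otherComparer` would merge some items,
    // i.e. the collection contains duplicates from the other set's point of view.
    public static bool WouldShrink<T>(ICollection<T> items, IEqualityComparer<T> otherComparer)
        => new HashSet<T>(items, otherComparer).Count < items.Count;
}
```

For example, a case-sensitive set containing "x" and "X" would shrink under StringComparer.OrdinalIgnoreCase, so comparing it against a case-insensitive set falls outside the assumption.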

Referenced by Loyc.Collections.Set< Loyc.LLParserGenerator.AndPred >.IsProperSubsetOf(), and Loyc.Collections.Impl.InternalSet< KeyValuePair< K, V > >.IsProperSubsetOf().

◆ IsProperSupersetOf() [1/2]

bool Loyc.Collections.Impl.InternalSet< T >.IsProperSupersetOf	(	IEnumerable< T >	other,
		IEqualityComparer< T >	comparer,
		int	myExactCount
	)

inline

Returns true if all items in the other set are present in this set, and this set has at least one item that is not in the other set.

This method is costly if 'other' is not a set; a temporary set will be constructed to answer the query. Also, this overload has the same subtle assumption as the other overload.

◆ IsProperSupersetOf() [2/2]

bool Loyc.Collections.Impl.InternalSet< T >.IsProperSupersetOf	(	ISet< T >	other,
		IEqualityComparer< T >	thisComparer,
		int	myExactCount
	)

inline

Returns true if all items in the other set are present in this set, and this set has at least one item that is not in the other set.

This implementation assumes that if the two sets use different definitions of equality (different IEqualityComparer<T>s), neither set contains duplicates from the point of view of the other set. If this rule is broken (meaning that, if either of the sets were constructed with the comparer of the other set, that set would shrink), then the results of this method are unreliable. If both sets use the same comparer, though, you have nothing to worry about.

Referenced by Loyc.Collections.Set< Loyc.LLParserGenerator.AndPred >.IsProperSupersetOf(), and Loyc.Collections.Impl.InternalSet< KeyValuePair< K, V > >.IsProperSupersetOf().

◆ IsSubsetOf()

bool Loyc.Collections.Impl.InternalSet< T >.IsSubsetOf	(	ISet< T >	other,
		int	myMinCount
	)

inline

Returns true if all items in this set are present in the other set.

Parameters

myMinCount	Specifies the minimum number of items that this set contains (use 0 if unknown).
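
One plausible use of such a hint is an early exit: if this set is known to hold more items than 'other', it cannot be a subset. The sketch below assumes that reading of myMinCount; it is written against standard .NET interfaces, and the class and method are hypothetical rather than taken from InternalSet<T>.

```csharp
using System;
using System.Collections.Generic;

static class SubsetSketch
{
    public static bool IsSubsetOf<T>(IEnumerable<T> self, ISet<T> other, int myMinCount)
    {
        // Assumed purpose of the hint: skip the scan when this set is
        // already known to be larger than the other set.
        if (myMinCount > other.Count)
            return false;

        // Otherwise every item of this set must be found in the other set.
        foreach (T item in self)
            if (!other.Contains(item))
                return false;
        return true;
    }
}
```

Passing 0 simply disables the shortcut, which matches the "use 0 if unknown" advice.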

Referenced by Loyc.Collections.Impl.InternalSet< KeyValuePair< K, V > >.IsProperSubsetOf(), Loyc.Collections.Set< Loyc.LLParserGenerator.AndPred >.IsSubsetOf(), and Loyc.Collections.Impl.InternalSet< KeyValuePair< K, V > >.SetEquals().

◆ IsSupersetOf()

bool Loyc.Collections.Impl.InternalSet< T >.IsSupersetOf	(	IEnumerable< T >	other,
		IEqualityComparer< T >	thisComparer,
		int	myMaxCount = `int.MaxValue`
	)

inline

Returns true if all items in the other set are present in this set.

Referenced by Loyc.Collections.Impl.InternalSet< KeyValuePair< K, V > >.IsProperSupersetOf(), and Loyc.Collections.Set< Loyc.LLParserGenerator.AndPred >.IsSupersetOf().

◆ Overlaps()

bool Loyc.Collections.Impl.InternalSet< T >.Overlaps	(	IEnumerable< T >	other,
		IEqualityComparer< T >	thisComparer
	)

inline

Returns true if this set contains at least one item from 'other'.

Referenced by Loyc.Collections.Set< Loyc.LLParserGenerator.AndPred >.Overlaps().

◆ Remove()

bool Loyc.Collections.Impl.InternalSet< T >.Remove	(	ref T	item,
		IEqualityComparer< T >	comparer
	)

inline

Removes an item from the set.

Returns: true if the item was removed, false if it was not found.

Referenced by Loyc.Collections.Impl.InternalSet< KeyValuePair< K, V > >.ExceptWith(), and Loyc.Collections.Impl.InternalSet< KeyValuePair< K, V > >.SymmetricExceptWith().

◆ SetEquals() [1/2]

bool Loyc.Collections.Impl.InternalSet< T >.SetEquals	(	IEnumerable< T >	other,
		IEqualityComparer< T >	comparer,
		int	myExactCount
	)

inline

Returns true if this set and the other set have the same items.

This method is costly if 'other' is not a set, because a temporary set is constructed to answer the query. This overload also makes the same subtle assumption about comparers as the ISet<T> overload.

◆ SetEquals() [2/2]

bool Loyc.Collections.Impl.InternalSet< T >.SetEquals	(	ISet< T >	other,
		int	myExactCount
	)

inline

Returns true if this set and the other set have the same items.

This implementation assumes that if the two sets use different definitions of equality (different IEqualityComparer<T>s), neither set contains duplicates from the point of view of the other set. If this rule is broken (that is, if either set would shrink when reconstructed with the other set's comparer), the results of this method are unreliable. If both sets use the same comparer, though, you have nothing to worry about.

Referenced by Loyc.Collections.Set< Loyc.LLParserGenerator.AndPred >.SetEquals().

◆ SymmetricExceptWith()

int Loyc.Collections.Impl.InternalSet< T >.SymmetricExceptWith	(	IEnumerable< T >	other,
		IEqualityComparer< T >	comparer,
		bool	xorDuplicates = `true`
	)

inline

Modifies the current set to contain only elements that were present either in this set or in the other collection, but not both.

Parameters

xorDuplicates	Controls this function's behavior in case 'other' contains duplicates. If xorDuplicates is true, an even number of duplicates has no overall effect and an odd number is treated the same as if there were a single instance of the item. Setting xorDuplicates to false is costly, since a temporary set is constructed in order to eliminate any duplicates. The same comparer is used for the temporary set as for this set.

Returns: the change in set size (positive if items were added, negative if items were removed).
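
The effect of xorDuplicates can be illustrated with a small sketch built on the standard HashSet<T>. The helper below is hypothetical and is not the InternalSet<T> code; it only mirrors the documented behavior, including the returned change in size. It assumes 'other' is not the set being modified.

```csharp
using System;
using System.Collections.Generic;

static class XorDuplicatesSketch
{
    // Toggles membership of each item of 'other' in 'set' and returns the
    // change in set size.
    public static int SymmetricExceptWith<T>(HashSet<T> set, IEnumerable<T> other, bool xorDuplicates)
    {
        // With xorDuplicates == false, pay for a temporary set (using this
        // set's comparer) so that each distinct item is toggled exactly once.
        if (!xorDuplicates)
            other = new HashSet<T>(other, set.Comparer);

        int delta = 0;
        foreach (T item in other)
        {
            if (set.Remove(item))
                delta--;          // item was present: remove it
            else
            {
                set.Add(item);    // item was absent: add it
                delta++;
            }
        }
        return delta;
    }
}
```

Starting from {1, 2} with other = [2, 3, 3], xorDuplicates = true leaves {1} and returns -1 (the two 3s cancel out), while xorDuplicates = false leaves {1, 3} and returns 0.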

◆ Thaw()

void Loyc.Collections.Impl.InternalSet< T >.Thaw ( )

inline

Thaws a frozen root node by duplicating it, or creates the root node if the set doesn't have one.

Since InternalSet<T> is a structure rather than a class, it's not immediately obvious what happens when you copy it with the '=' operator. The InternalList<T> structure, for example, is unsafe to copy (in general) because as the list length changes, the two (or more) copies immediately go "out of sync": each copy has a separate Count property and a separate array pointer, and yet they will share the same array, at least temporarily, which can produce strange results.

It is mostly safe to copy InternalSet instances, however, because they only contain a single piece of data (a reference to the root node), and the root node only changes in two situations:

When the root node is null and you call Add or this method
When the root node is frozen and you modify the set or call this method

In the second case, when you have frozen a set with CloneFreeze(), all existing copies are frozen, and further changes affect only the specific copy that you change. You can also call Thaw() if you need to make copies that are kept in sync, without actually modifying the set first.

This method has no effect if the root node is already thawed.
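
The copy behavior described above can be reproduced with a toy model. ToyNode and ToySet below are hypothetical types, not part of Loyc: the struct holds a single root reference, creates it lazily, and duplicates it when it is frozen. Unlike a real hash trie, the toy copies the entire contents when thawing, so it demonstrates the semantics only, not the performance.

```csharp
using System;
using System.Collections.Generic;

// Toy stand-in for a root node: a frozen flag plus the stored items.
sealed class ToyNode
{
    public bool Frozen;
    public readonly HashSet<int> Items;
    public ToyNode(IEnumerable<int> items) { Items = new HashSet<int>(items); }
}

struct ToySet
{
    ToyNode _root; // the struct's only field: a reference to the root

    // Creates the root if there is none, duplicates it if it is frozen,
    // and does nothing if the root is already thawed.
    public void Thaw()
    {
        if (_root == null)
            _root = new ToyNode(new int[0]);
        else if (_root.Frozen)
            _root = new ToyNode(_root.Items);
    }

    public bool Add(int item)
    {
        Thaw(); // the root can only change here, in the two cases listed above
        return _root.Items.Add(item);
    }

    public bool Contains(int item)
    {
        return _root != null && _root.Items.Contains(item);
    }

    // Freezes the shared root (so every existing copy is frozen too)
    // and returns a copy of this struct.
    public ToySet CloneFreeze()
    {
        if (_root != null) _root.Frozen = true;
        return this;
    }
}
```

In this model, a copy taken while the root is still null diverges from the original on the first Add, whereas a copy taken after Thaw() shares the root and sees later additions. After CloneFreeze(), adding to one copy duplicates the root for that copy only and leaves the others unchanged.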

◆ UnionWith()

int Loyc.Collections.Impl.InternalSet< T >.UnionWith	(	IEnumerable< T >	other,
		IEqualityComparer< T >	thisComparer,
		bool	replaceIfPresent
	)

inline

Adds the contents of 'other' to this set.

Parameters

thisComparer	The comparer for this set (not for 'other', which is simply enumerated).
replaceIfPresent	If items in 'other' match items in this set, this flag causes those items in 'other' to replace the items in this set.
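
The replaceIfPresent flag only matters when the comparer treats distinguishable items as equal. A minimal sketch with the standard HashSet<T> and a case-insensitive comparer shows the difference; the helper is hypothetical, and returning the number of newly added items is an assumption made for this illustration.

```csharp
using System;
using System.Collections.Generic;

static class UnionSketch
{
    // Adds the contents of 'other' to 'set' and returns how many items
    // were newly added (an assumed meaning for the return value).
    public static int UnionWith<T>(HashSet<T> set, IEnumerable<T> other, bool replaceIfPresent)
    {
        int added = 0;
        foreach (T item in other)
        {
            if (set.Add(item))
                added++;
            else if (replaceIfPresent)
            {
                // An equal item is already stored: swap in the new instance.
                set.Remove(item);
                set.Add(item);
            }
        }
        return added;
    }
}
```

With a set created using StringComparer.OrdinalIgnoreCase that contains "hello", merging "HELLO" leaves "hello" stored when replaceIfPresent is false and stores "HELLO" instead when it is true; the set size is the same either way.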

Member Data Documentation

◆ DefaultComparer

readonly IEqualityComparer<T> Loyc.Collections.Impl.InternalSet< T >.DefaultComparer = typeof(IReferenceEquatable).IsAssignableFrom(typeof(T)) ? null : EqualityComparer<T>.Default

static

This is EqualityComparer<T>.Default, or null if T implements IReferenceEquatable.

◆ Empty

readonly InternalSet<T> Loyc.Collections.Impl.InternalSet< T >.Empty = new InternalSet<T> { _root = FrozenEmptyRoot() }

static

An empty set.

This field comes with a frozen, empty root node, which Set<T> uses as an "initialized" flag.
