C# Gets Pattern Matching, Algebraic Data Types, Tuples and Ranges
7 Mar 2016March 7, 2016
Introduction
This article will teach you how to use some of the “functional” features of the open-source Enhanced C# + LeMP project: pattern matching, algebraic data types, tuples and numeric ranges. To download the Visual Studio extension or learn more about LeMP, visit the LeMP home page.
Pattern matching!
There is a code pattern that pops up occasionally: “get an object from somewhere, see if it has type X, and if so, get/query its properties”. This can get tedious very quickly:
var obj = connection.DownloadNextObject();
if (obj is StatusReport)
{
StatusReport report = (StatusReport)obj;
if (report.IsValid) {
SaveReport(report);
}
}
else if (obj is DataPacket)
{
DataPacket packet = (DataPacket)obj;
DoStuffWith(packet);
}
I bet you’ve written code like this. Some of you write this style of code a lot. But now LeMP has a shortcut for patterns like these: it’s called match
. It’s like switch
, but for “pattern matching”. With match
, the code above becomes simply
match (connection.DownloadNextObject()) {
case $report is StatusReport(IsValid: true):
SaveReport(report);
case $packet is DataPacket:
DoStuffWith(packet);
}
easy, right? LeMP will translate this to C# code like
do {
var tmp_1 = connection.DownloadNextObject();
if (tmp_1 is StatusReport) {
StatusReport report = (StatusReport) tmp_1;
if (true.Equals(report.IsValid)) {
SaveReport(report);
break;
}
}
if (tmp_1 is DataPacket) {
DataPacket packet = (DataPacket) tmp_1;
DoStuffWith(packet);
break;
}
} while (false);
match
allows the case blocks to use break
to exit each branch, but unlike switch
it does not require the break
statements. match
wraps its output in do...while(false)
in case your case
-block includes a break
statement (and, well, it adds a break
if you don’t). Note that the logic here isn’t quite the same as the original if-else
code: match
behaves like a big “if-else” chain, so if the first case
doesn’t match, it always tries the second one. So if it’s a StatusReport
but IsValid
is false, it’ll go on and check whether it’s a DataPacket
(this may sound like a dumb thing to do, but think about it: it is possible that the object is both a StatusReport
and a DataPacket
. And if that’s impossible, maybe if we’re lucky the compiler will optimize it.)
match
has more features, which we’ll discuss later. But first, a word from its sponsor:
Algebraic Data Types!
Many languages offer data types that consist of a series of alternatives, like this simple representation of a binary search tree in Haskell:
data BinaryTree t = Leaf t
| Node t (BinaryTree t) (BinaryTree t)
This says that a value of the BinaryTree
type is generic (t
is a type parameter, like <T>
in C#), a BinaryTree
value is either a Leaf
(which contains a single value of type t
) or a Node
(which contains a value of type t
and two more values, each of type BinaryTree t
). Haskell folks call this an Algebraic Data Type.
Algebraic Data Types (ADTs) may have a dumb name, but they are also known by another name: disjoint unions.
With LeMP you can write ADTs using alt class
. You will see that C# ADTs these don’t work the same as disjoint unions in other languages; because of their increased flexibility, arguably they shouldn’t be called “disjoint unions” at all. But whatever you call them, they are quite useful, especially when productivity and concise code are your top priorities.
Under LeMP, an ADT is called alt class
and it looks like this:
public abstract alt class BinaryTree<T> where T: IComparable<T>
{
alt Leaf<T>(T Value);
alt Node(T Value, BinaryTree<T> Left, BinaryTree<T> Right);
// (you could also have written `Node<T>`)
}
In fact, this is nothing more or less than a really quick way to produce a class hierarchy of immutable types. The output is fairly massive:
public abstract class BinaryTree<T> where T: IComparable<T>
{
public BinaryTree() { }
}
class Leaf<T> : BinaryTree<T> where T: IComparable<T>
{
public Leaf(T Value)
{
this.Value = Value;
}
public T Value { get; private set; }
public Leaf<T> WithValue(T newValue)
{
return new Leaf<T>(newValue);
}
[System.ComponentModel.EditorBrowsable(System.ComponentModel.EditorBrowsableState.Never)]
public T Item1
{
get { return Value; }
}
}
static partial class Leaf
{
public static Leaf<T> New<T>(T Value) where T: IComparable<T>
{
return new Leaf<T>(Value);
}
}
class Node<T> : BinaryTree<T> where T: IComparable<T>
{
public Node(T Value, BinaryTree<T> Left, BinaryTree<T> Right)
{
this.Value = Value;
this.Left = Left;
this.Right = Right;
}
public T Value { get; private set; }
public BinaryTree<T> Left { get; private set; }
public BinaryTree<T> Right { get; private set; }
public Node<T> WithValue(T newValue)
{
return new Node<T>(newValue, Left, Right);
}
public Node<T> WithLeft(BinaryTree<T> newValue)
{
return new Node<T>(Value, newValue, Right);
}
public Node<T> WithRight(BinaryTree<T> newValue)
{
return new Node<T>(Value, Left, newValue);
}
// Additional code hidden to spare your eyes
}
static partial class Node
{
public static Node<T> New<T>(T Value, BinaryTree<T> Left, BinaryTree<T> Right) where T: IComparable<T>
{
return new Node<T>(Value, Left, Right);
}
}
As you can see, there’s a lot of helper code to make these types easy to use.
First of all, you don’t just get a Leaf<T>
and Node<T>
class, you also get Leaf
and Node
classes with no type parameters. These allow you to create leaves and nodes without mentioning the type T:
void TreeOfThree()
{
var tree = Node.New(42, Leaf.New(17), Leaf.New(99));
}
Note: A New
method is created only when an alt
uses generic type parameters.
You can “modify” individual properties of an ADT with the appropriate “With
” method, like this:
Node<T> node = Node.New(42, Leaf.New(17), Leaf.New(99));
// Use `WithRight` to change the right child to the leaf `101`:
node = node.WithRight(Leaf.New(101));
And of course, you can use match
to learn about your ADT. For example, if the binary tree is sorted, it could be searched like this:
public static bool Contains<T>(BinaryTree<T> tree, T item)
{
T value;
match (tree) {
case is Leaf<T>($value):
return Compare(value, item) == 0;
case is Node<T>($value, $left, $right):
int cmp = Compare(item, value);
if (cmp < 0)
return left != null && Contains(left, item);
else if (cmp > 0)
return right != null && Contains(right, item);
else
return true;
}
}
internal static int Compare<T>(T a, T b) where T:IComparable<T>
{ // It's null's fault that this method exists.
if (a != null)
return a.CompareTo(b);
else if (b != null)
return -b.CompareTo(a);
else
return 0;
}
Notice that this code says Leaf<T>($value)
instead of Leaf<T>(Value: $value)
, and similarly case is Node<T>
does not mention Value
, Left
or Right
. If you leave out the property names, match
will read items “by position” from Item1
, Item2
, etc., and that’s why the generated Leaf<T>
has a public T Item1
property, which is marked with EditorBrowsableState.Never
to hide it from IntelliSense.
But wait, there’s more! In fact, alt class
can do more than ADTs in other languages that support ADTs, because it “accepts” its identity as a class hierarchy instead of pretending to be a traditional mathematical disjoint union: it has the same capabilities as a normal class hierarchy.
For starters, notice that Leaf
and Node
both have a Value
property. We can move the common property into the base class like this:
public partial abstract alt class BinaryTree<T> where T: IComparable<T>
{
alt this(T Value);
alt Leaf();
alt Node(BinaryTree<T> Left, BinaryTree<T> Right);
}
alt this
provides a way to add data to the base class. Plus, if you give it a body in { braces }
, it becomes the code of the constructor.
Now, since Leaf()
contains no additional data beyond what the base class contains, we can go a step further and eliminate it completely, while removing the abstract
attribute from BinaryTree<T>
:
public alt class BinaryTree<T> where T: IComparable<T>
{
alt this(T Value);
alt Node(BinaryTree<T> Left, BinaryTree<T> Right);
}
This does have a disadvantage: leaf nodes are created with BinaryTree.New
instead of Leaf.New
. But it still works.
If we’d like to ensure that Node
is not initialized with two null
children, we can add validation code. This is done by adding a body to Node
, which is treated as a class body, and then adding a “constructor” named alt this
:
public alt class BinaryTree<T> where T: IComparable<T>
{
alt this(T Value);
alt Node(BinaryTree<T> Left, BinaryTree<T> Right)
{
public alt this() {
if (Left == null) throw new ArgumentNullException("Left");
if (Right == null) throw new ArgumentNullException("Right");
}
}
}
Note: admittedly, there is something odd about this: the new this
constructor has its own argument list, even though Node
already had one. As you saw from the first constructor (alt this(T Value)
), constructor arguments in an alt class
create new properties. So you are allowed to place new properties in either the Node
list or the inner this
list; it is recommended not to use both.
Also, instead of using match
, you could implement the Contains
method as a virtual method.
public alt class BinaryTree<T> where T: IComparable<T>
{
alt this(T Value);
alt Node(BinaryTree<T> Left, BinaryTree<T> Right)
{
public alt this() {
if (Left == null && Right == null) throw new ArgumentNullException("Both children");
}
public override bool Contains(T item)
{
int cmp = Compare(item, Value);
if (cmp < 0)
return Left != null && Left.Contains(item);
else if (cmp > 0)
return Right != null && Right.Contains(item);
else
return true;
}
}
public virtual bool Contains(T item)
{
return Compare(Value, item) == 0;
}
internal static int Compare(T a, T b)
{ // It's null's fault that this method exists.
if (a != null)
return a.CompareTo(b);
else if (b != null)
return -b.CompareTo(a);
else
return 0;
}
}
You can also define a multi-level class hierarchy. In the following example, MyTuple<T1,T2>
is derived from MyTuple<T1>
and MyTuple<T1,T2,T3>
is derived from MyTuple<T1,T2>
:
public alt class MyTuple<T1> {
public alt this(T1 Item1);
public alt MyTuple<T1,T2>(T2 Item2) {
public alt MyTuple<T1,T2,T3>(T3 Item3) { }
}
}
A more complex example would be a tree of GUI widget information:
public abstract alt class Widget {
alt this(Rectangle Location) {
if (Location == null) throw new ArgumentNullException("Location");
}
alt Button(string Text) { }
alt TextBox(string Text) { }
abstract alt StringListWidget(string[] subItems) {
alt ComboBox();
alt ListBox();
}
public abstract alt Container() {
alt TabControl(TabPage[] Children);
alt Panel(Widget[] Children) {
alt TabPage(string Title);
}
}
}
Just remember that the nested alt
s are not nested classes; in the output, each new class is brought out to the “top level”. This means, for example, that you should write new ComboBox(location, subItems)
and not new Widget.StringListWidget.ComboBox(location, subItems)
.
Finally, you can use alt class
to produce immutable classes with only one “case”, like this:
public alt class Rectangle {
alt this(int X, int Y, int Width, int Height);
}
Note: as of this writing, alt class
does not support non-public constructors.
Okay, I think we covered everything! I hope you enjoy.
Tuples
Enhanced C# supports tuples, e.g.
var pair = (12, "twelve");
This comes out as
var pair = Tuple.Create(12, "twelve");
Note: Tuple
is a standard class of the .NET framework.
You can also deconstruct tuples, like this:
(var num, var str) = pair;
Output:
var num = pair.Item1;
var str = pair.Item2;
Since ADTs have properties named Item1
, Item2
, etc., you can deconstruct them the same way:
(var x, var y, var w, var h) = new Rectangle(10, 10, 600, 400);
Output:
var tmp_0 = new Rectangle(10, 10, 600, 400);
var x = tmp_0.Item1;
var y = tmp_0.Item2;
var w = tmp_0.Item3;
var h = tmp_0.Item4;
However, EC# does not help you write tuple types. For example, to return (5,"five")
from a function, the return type is Tuple<int, string>
- there is no shortcut.
More Pattern Matching
With tuples
match
, too, supports tuples and other things that have Item1
, Item2
, etc.:
match (rect) {
case ($x, $y, $w, $h):
Console.WriteLine("("+x+","+y+","+w+","+h+")");
}
But this is no different from the tuple deconstruction shown above. It bypasses type checking entirely; the code is simply
do {
var x = rect.Item1;
var y = rect.Item2;
var w = rect.Item3;
var h = rect.Item4;
Console.WriteLine("(" + x + "," + y + "," + w + "," + h + ")");
break;
} while(false);
This means if rect
itself has the wrong type (e.g. it’s a tuple of 3 rather than 4) you’ll get an error from the C# compiler. Remember, you have to use is
to check the data type:
match (rect) {
case is Rectangle($x, $y, $w, $h):
Console.WriteLine("("+x+","+y+","+w+","+h+")");
}
Often, though, it’s useful to use tuple syntax once you know what the type is. For example, after using is Button
, we know we have a Button
widget, we know the first component is a Rectangle
so we can deconstruct it with tuple syntax, like this:
void DrawIfButton(Graphics g, Widget widget) {
match(widget) {
case is Button(($x, $y, $width, $height), $text):
// TODO: draw the button
}
}
Simple matching
What else can you do with pattern matching? For one thing, you can do equality testing and range testing, like this:
static void FavoriteNumberGame()
{
Console.Write("What's your favorite number? ");
match(int.Parse(Console.ReadLine())) {
case 7, 777: Console.WriteLine("You lucky bastard!");
case 5, 10: Console.WriteLine("I have that many fingers too!");
case 0, 1: Console.WriteLine("What? Nobody picks that!");
case 2, 3: Console.WriteLine("Yeah, I guess you deal with those a lot.");
case 12: Console.WriteLine("I prefer a baker's dozen.");
case 666, 13: Console.WriteLine("Isn't that bad luck though?");
case 1..<10: Console.WriteLine("Kind of boring, don't you think?");
case 11, 13, 17, 19, 23, 29: Console.WriteLine("A prime choice.");
case 10...99: Console.WriteLine("I have to admit... it has two digits.");
case _...-1: Console.WriteLine("Oh, don't be so negative.");
default: Console.WriteLine("What are you, high? Like that number?");
}
}
This example illustrates several things.
Single evaluation: as you would expect, match
makes a temporary variable so that Console.ReadLine()
is only called once.
Priority order: Earlier cases are tested before lower cases, so 5
matches case 5, 10
and not case 1..<10
.
Equality testing: You can use not just literals like case 5
but expressions like case (x + y):
. In the output, case 5
becomes if (5.Equals(matchExpr))
. Why is this particular equality test used? Consider the alternatives:
if (matchExpr == 5)
works in fewer cases. Specifically, considermatch((object)5) {case 5:}
. In this case,5.Equals((object)5)
returns true, but(object)5 == 5
is a compile-time error. ThusEquals
is more flexible.if (object.Equals(matchExpr, 5))
would involve boxing and dynamic downcasting.if (matchExpr.Equals(5))
would causeNullReferenceException
in casematchExpr
is null.
The form only changes if you write case null
; this becomes if (matchExpr == null)
. Note: other than the literal null
, which is allowed, it is your responsibility to ensure that the cases themselves do not evaluate to null. For reasons of performance and epistemology, when you use non-literal expressions like case X
, match
does not check whether X
itself is null (in fact, it cannot tell whether X
has a nullable type).
Default: Like switch
, match
can have a default:
, but it must come last. It’s equivalent to case _:
.
Multiple patterns per case
: Enhanced C# allows case
to have multiple separate cases separated by commas, such as 1..10, 20, 30
. Unfortunately, when translating your code to plain C#, it is often impossible for two separate patterns to lead into the same handler, and therefore the output will duplicate the handler. For example, the first case
above is translated as
if (7.Equals(tmp_0)) {
Console.WriteLine("You lucky bastard!");
break;
}
if (777.Equals(tmp_0)) {
Console.WriteLine("You lucky bastard!");
break;
}
In this particular example it is possible to avoid duplicating the Console.WriteLine
statement, but in most nontrivial patterns it is not (at least not without analysis capabilities LeMP doesn’t have), so match
doesn’t even try. Therefore, if possible, avoid writing large code blocks inside a case
with multiple patterns.
Range operators: Enhanced C# defines binary and unary operators named ..<
and ...
, as well as a binary in
operator that is intended to test whether a value is contained in a collection.
- ..<
is the exclusive range operator: it means you want the number on the right side to be excluded from the range. case 1..<10
is translated to something like if (tmp_0.IsInRangeExcludeHi(1, 10))
. IsInRangeExcludeHi
should be an extension method, either one you define yourself or one of the extension methods in Loyc.Essentials.dll. This operator has two names, in fact: you can write 1..10
, as used in Rust, or 1..<10
, as used in Swift. The lexer treats them as the same operator (named ..
).
- ...
is the inclusive range operator: it means you want the number on the right side to be included in the range. For example, case 10...99
is translated to something like if (tmp_0.IsInRange(10, 99))
. (This operator’s name is also three dots in Rust and Swift.)
You can use underscores to express an open-ended range, e.g. case _..-1
. ..<
and ...
also exist as unary operators, so you can write ...-1
instead. However, there is no corresponding suffix operator: you must write 100..._
, not 100...
. All of these are translated to the appropriate binary operator (e.g. _...9
becomes matchExpr <= 9
).
A fancy example
Here’s a much more complicated example that shows most of the features of match
:
match (obj) {
case is Shape(ShapeType.Circle, $size, Location: $p is Point<int>($x, $y) && x > y):
Circle(size, x, y);
}
When I first wrote this article I tried explaining what this does, but the explanation was so long I decided it would be better just to show the output code:
if (obj is Shape) {
Shape tmp_0 = (Shape) obj;
if (ShapeType.Circle.Equals(tmp_0.Item1)) {
var size = tmp_0.Item2;
var tmp_1 = tmp_0.Location;
if (tmp_1 is Point<int>) {
Point<int> p = (Point<int>) tmp_1;
var x = p.Item1;
var y = p.Item2;
if (x > y) {
Circle(size, x, y);
break;
}
}
}
}
You can see several more features in action here:
Unary and binary “is” operators: is Type
is a new operator added to Enhanced C# for the specific purpose of supporting pattern matching. It means “check if the match_expression is Type
and if so, downcast to Type
and make a temporary variable to hold the result”. The binary version of is
allows a few different things on the left-hand side:
$newVar is Type
creates a new variableType newVar
to hold the downcasted valueref var is Type
sets an existing variable calledvar
to the downcasted valuelow..hi is Type
holds the downcasted the value in a temporary variable, then checks if it’s in the specified rangeotherExpr is Type
(whereotherExpr
matches none of the other patterns above) holds the downcasted the value in a temporary variable, then checks ifotherExpr
is equal to it.
Subpatterns in parentheses: After the is
part of the pattern, you can write “inner patterns” or “subpatterns” in parentheses. For example, in case A(B(C), D)
, A
has subpatterns B(C)
and D
, and D
is a subpattern of B
. Each subpattern is treated the same way as the outermost pattern, except that subpatterns can specify a property name (e.g. Location:
) and the outer pattern cannot.
Positional and named properties: In this example, the first two components of Shape
are treated as “positional” properties while the third component is a “named” property (its name is Location
). A named property consists of an identifier followed by colon (:
) such as Location:
. If you don’t provide a name, match
uses a numbered property instead (Item1
, Item2
, etc.). So in this example the Shape.Item1
property is matched against the subpattern ShapeType.Circle
, and Shape.Item2
is matched against $size
.
Please note that you can only name simple properties, not nested properties, methods or indexer properties. For example, you might be tempted to write case is Foo(Bar(): 777)
to find out if the Foo.Bar()
method returns 777
, but this is not allowed because it is a syntax error. However, you can write case $foo is Foo && foo.Bar() == 777
instead.
Variable binding: Use the $
operator to create a new variable and assign it to the value of part of an object. In this case the new size
variable is assigned to the Shape.Item2
property of obj
, the new p
variable is assigned to Shape.Location
, and so forth.
Extra conditions: You can use the &&
operator on the main pattern or subpatterns to add extra conditions to a pattern, e.g. given
case is Size(Width: $w, Height: $h && h > 100) && w > h:
DoSomethingWith(w, h);
The output is something like
if (obj is Size) {
Size tmp_2 = (Size) obj;
var w = tmp_2.Width;
var h = tmp_2.Height;
if (h > 100 && w > h) {
DoSomethingWith(w, h);
break;
}
}
Rough left-to-right evaluation: Patterns are evaluated roughly left-to-right, except that if you’re using a binary is
condition such as $x is Type
, the type test on the right-hand side (obviously) runs before the test or binding on the left-hand side.
Cases using the “in” operator
Earlier you saw that you could write case lo..hi
to find out if a value is within a range. If you want to combine a range test with a variable binding, an equality test, or subpattern matching, you can use the in
operator. Here are some examples:
match(value) {
// Is value a double between 0 and 1 ?
case $newVar is double in 0.0...1.0:
ZeroToOne(newVar);
// This one is tricky! It requires that `coefficient.Equals(value) && value in 0...1`
case coefficient in 0.0...1.0:
ZeroToOne(newVar);
// Due to the precedence rules of EC#, if you combine `in` with
// subpatterns, the subpatterns must come before `in`.
case _ is Point(X: $x, Y: $y) in polygon:
CollisionDetected(x, y);
// However, when you add conditions with `&&`, they still come last.
case is Size(Width: $w, Height: $h) in acceptableSizes && w > h:
SizeIsOK(w, h);
}
Assigning to an existing variable with ref
You can use ref variable
instead of $variable
to assign a value to an existing variable rather than creating a new variable. For consistency with the matchCode
macro introduced in the previous article, $(ref variable)
is also accepted.
Rationales
That reminds me: why do you think the syntax for creating a new variable in case
is $x
instead of, say, var x
? Partially it’s because var x
would, in general, not be permitted by the parser, but it’s also because the matchCode
macro also uses the $x
syntax, and var x
usually isn’t even possible in the context of matchCode
.
It’s also fair to ask why you have to write case $x is Foo
instead of simply case x is Foo
. In fact, initially I did support the latter syntax (because Rust works similarly), but soon afterward I decided to drop support. There are three reasons:
$x
is easier to spot, so people reading the code can more easily see the point wherex
is created.$x
is consistent with the syntax ofmatchCode
.- If you write
777 is Foo
orFoo.Bar is Foo
, the left-hand side is interpreted as an equality test. Thusx is Foo
would have been a special case, and it would arguably be surprising ifx is Foo
andx.y is Foo
did fundamentally different things.
Standalone ranges and in
operator
The ..<
, ...
, and in
operators are not limited to match
. You can use them in ordinary expressions, like this:
if (!(index in 0..list.Count))
throw new ArgumentOutOfRangeException("index");
As before, the x in lo..<hi
pattern translates to x.IsInRangeExcludeHi(lo, hi)
while the x in lo...hi
pattern translates to x.IsInRange(lo, hi)
. You can also use in
by itself, or use the range operators by themselves:
var range = 0..list.Count;
if (!(index in range))
throw new ArgumentOutOfRangeException("index");
This is translated to
var range = Range.ExcludeHi(0, list.Count);
if (!range.Contains(index))
throw new ArgumentOutOfRangeException("index");
Loyc.Essentials.dll contains all of the methods shown here; IsInRangeExcludeHi
is an extension method in class Loyc.Range
. While Range.ExcludeHi
is in the Loyc
namespace, it returns a variable of type Loyc.Collections.NumRange<Num,Math>
where Num
is a numeric type such as int
, and Math
is a helper type that allows NumRange
to perform arithmetic on that numeric type (it is needed since .NET does not define math interfaces for built-in types.) It is worth noting that NumRange
implements IReadOnlyList<Num>
, so you can use it in foreach
loops and LINQ expressions.
Since the expression x in range
just calls range.Contains(x)
, it is compatible with standard collection types.
Wrapping up
I think that’s everything. I hope these features make you more productive. Enjoy!
To learn more or download LeMP, visit the LeMP home page.