The C# null-coalescing operator

17 May 2021

Recently I have had cause to analyse the mathematical properties of C#’s null-coalescing operator (??). If you’re wondering what kind of wacky and fun-filled life I lead, then read on …

The null-coalescing operator takes in two arguments, and returns the first one unless it’s null, in which case it returns the second one. That is, if you have two variables a and b of type T, then the statement T x = a ?? b; is equivalent to the following:

T x;
if (a != null)
{
    x = a;
}
else
{
    x = b;
}
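The following is a minimal sketch, assuming a .NET 6 or later console project with top-level statements. ViaOperator, ViaIfElse and Fallback are hypothetical helpers made up for illustration. The sketch checks that the two forms agree on every combination of null and non-null inputs. It also shows that the right-hand operand of ?? is only evaluated when the left-hand operand is null.

```csharp
using System;

// Hypothetical helpers: the same logic written with ?? and with if/else.
string? ViaOperator(string? a, string? b) => a ?? b;

string? ViaIfElse(string? a, string? b)
{
    string? x;
    if (a != null)
    {
        x = a;
    }
    else
    {
        x = b;
    }
    return x;
}

// Both forms agree on all four null/non-null combinations.
foreach (var a in new string?[] { null, "a" })
{
    foreach (var b in new string?[] { null, "b" })
    {
        if (ViaOperator(a, b) != ViaIfElse(a, b))
        {
            throw new Exception("The two forms disagree");
        }
    }
}

// The right-hand operand is only evaluated when the left-hand one is null.
int fallbackCalls = 0;
string Fallback()
{
    fallbackCalls++;
    return "fallback";
}

string? present = "present";
string? missing = null;
string r1 = present ?? Fallback(); // Fallback() is not called
string r2 = missing ?? Fallback(); // Fallback() is called
Console.WriteLine($"{r1}, {r2}, Fallback called {fallbackCalls} time(s)");
// prints "present, fallback, Fallback called 1 time(s)"
```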

Using this operator can be a really succinct way of expressing concepts involving nulls in code.

For example, recently I’ve been working on some code which parses XML into more structured types. The XML can take many different (valid) forms, and so there are dozens of different types that could be returned. Currently, the code looks something like this:

XElement fileContents = // Import from file ...

return GetFormat1OrNull(fileContents) ??
    GetFormat2OrNull(fileContents) ??
    GetFormat3OrNull(fileContents) ??
    GetFormat4OrNull(fileContents) ??
    GetFormat5OrNull(fileContents);
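To make the chain concrete, here is a minimal runnable sketch. The five parsers are hypothetical stand-ins: each recognises a format purely by the root element’s name and returns a string rather than a structured type. The real GetFormatNOrNull methods are not shown in this post.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical parsers: each recognises one format (here, simply by the root
// element's name) and returns null for anything else.
string? GetFormat1OrNull(XElement e) => e.Name.LocalName == "format1" ? "Format 1" : null;
string? GetFormat2OrNull(XElement e) => e.Name.LocalName == "format2" ? "Format 2" : null;
string? GetFormat3OrNull(XElement e) => e.Name.LocalName == "format3" ? "Format 3" : null;
string? GetFormat4OrNull(XElement e) => e.Name.LocalName == "format4" ? "Format 4" : null;
string? GetFormat5OrNull(XElement e) => e.Name.LocalName == "format5" ? "Format 5" : null;

// The chain returns the result of the first parser that does not return null.
string? Parse(XElement fileContents) =>
    GetFormat1OrNull(fileContents) ??
    GetFormat2OrNull(fileContents) ??
    GetFormat3OrNull(fileContents) ??
    GetFormat4OrNull(fileContents) ??
    GetFormat5OrNull(fileContents);

Console.WriteLine(Parse(XElement.Parse("<format4 />")));             // Format 4
Console.WriteLine(Parse(XElement.Parse("<unknown />")) ?? "no match"); // no match
```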

In my case, the definition of what constitutes each format is a little vague, and I’d like to refactor the code to group the formats differently. To do that safely, I needed to verify two mathematical properties of the ?? operator.

Maths

In mathematics, a binary operator is a function which takes two inputs (hence “binary”) and returns one output. These are all over the place in coding; examples in C# include:

  • +, -, * and /, which input two numbers and output a number
  • ==, which inputs two values of any type and outputs a boolean
  • <, <=, > and >=, which input two numbers and output a boolean
  • (yes, you guessed it) ??, which inputs two values of the same type and outputs a value of that type.

A binary operator is called associative if it doesn’t matter where you put the brackets when you apply the operator twice. For example, (a + b) + c and a + (b + c) are always the same; this makes + associative. This is helpful when writing things down because it means you can just leave out the brackets: a + b + c. [Note: this only makes sense when the operator inputs and outputs the same type.]

A binary operator is called commutative if it doesn’t matter in which order you write the inputs. For example, a + b and b + a are always the same; this makes + commutative.

You can easily find examples of operators with and without these properties; e.g. - is neither associative nor commutative (try it out!).
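Trying it out for + and - on one set of integers might look like this sketch, where the sample values 10, 4 and 3 are arbitrary. A single counterexample is enough to show that - lacks both properties, whereas the + checks only illustrate the properties rather than prove them.

```csharp
using System;

int a = 10, b = 4, c = 3;

// + is associative and commutative.
bool plusAssociative = (a + b) + c == a + (b + c);  // 17 == 17
bool plusCommutative = a + b == b + a;              // 14 == 14

// - is neither: the bracketing and the order both change the answer.
bool minusAssociative = (a - b) - c == a - (b - c); // 3 vs 9
bool minusCommutative = a - b == b - a;             // 6 vs -6

Console.WriteLine($"+ associative: {plusAssociative}, commutative: {plusCommutative}");
Console.WriteLine($"- associative: {minusAssociative}, commutative: {minusCommutative}");
// prints "+ associative: True, commutative: True"
// then   "- associative: False, commutative: False"
```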

Application to my code

It turns out that ?? is associative (proof at the end of this post).

This is really helpful, because it means that I can change the way that the different formats are grouped together by clever use of brackets. Once I’ve done that I can extract methods for groupings of formats that make sense together.
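A minimal sketch of such a regrouping follows, using hypothetical stand-in parsers. Format(n, input) plays the role of GetFormatNOrNull and returns a non-null result whenever input contains the digit n, so an input like "24" deliberately matches two formats. GetFormat2To4OrNull is a made-up name for the extracted method. Because only the brackets move, not the order, the results agree even when several parsers match.

```csharp
using System;

// Hypothetical stand-in for GetFormatNOrNull: "matches" if the input contains
// the digit n. Inputs such as "24" match more than one format on purpose.
string? Format(int n, string input) =>
    input.Contains(n.ToString()) ? $"Format {n}" : null;

// The original flat chain.
string? Original(string input) =>
    Format(1, input) ?? Format(2, input) ?? Format(3, input) ??
    Format(4, input) ?? Format(5, input);

// Hypothetical extracted method for formats 2 to 4 (same order, new brackets).
string? GetFormat2To4OrNull(string input) =>
    Format(2, input) ?? Format(3, input) ?? Format(4, input);

string? Refactored(string input) =>
    Format(1, input) ?? GetFormat2To4OrNull(input) ?? Format(5, input);

foreach (var input in new[] { "", "1", "24", "35", "135", "5" })
{
    bool same = Original(input) == Refactored(input);
    Console.WriteLine($"\"{input}\": same result = {same}");
}
```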

Unfortunately, it turns out that ?? is not commutative, because the order matters whenever neither argument is null: a ?? b returns a, while b ?? a returns b (again, see the end of this post for a proof).

However, in my case the different formats are mutually exclusive; that is, the incoming XML can match at most one of the valid formats, so at most one of the GetFormatNOrNull calls returns a non-null value. This means that I can treat ?? as commutative here, because the one case where the order matters (two non-null arguments) never arises.

This means that if I wanted to group formats 2 and 4, I could first reorder 3 and 4 (by commutativity) and then group 2 and 4 (by associativity).
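The two-step move can be sketched with the same kind of hypothetical stand-in parsers; GetFormat2Or4OrNull is a made-up name for the extracted method. Inputs that match a single format give the same result before and after. An input that matches two formats ("34") shows why the mutual-exclusivity assumption is needed.

```csharp
using System;

// Hypothetical stand-in parser: returns "Format n" if the input contains the
// digit n, otherwise null.
string? Format(int n, string input) =>
    input.Contains(n.ToString()) ? $"Format {n}" : null;

string? Original(string input) =>
    Format(1, input) ?? Format(2, input) ?? Format(3, input) ??
    Format(4, input) ?? Format(5, input);

// Formats 3 and 4 swapped (commutativity), then 2 and 4 grouped (associativity).
string? GetFormat2Or4OrNull(string input) => Format(2, input) ?? Format(4, input);

string? Regrouped(string input) =>
    Format(1, input) ?? GetFormat2Or4OrNull(input) ?? Format(3, input) ?? Format(5, input);

string Show(string? s) => s ?? "null";

// "34" matches two formats, so it breaks the mutual-exclusivity assumption.
foreach (var input in new[] { "1", "2", "3", "4", "5", "none", "34" })
{
    Console.WriteLine($"{input}: {Show(Original(input))} vs {Show(Regrouped(input))}");
}
// The last line printed is "34: Format 3 vs Format 4".
```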

Conclusion

This is just one of many ways in which the mathematical theory behind programming helps you to refactor. Understanding the maths helps you to tell whether a change you’ve made is a genuine refactor or whether it might break the code.

Breaking up a refactor into lots of small steps, each of which can be proven not to break anything, is a very useful way of improving the quality of your code. It’s also very satisfying!

Appendix: proofs

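See below a proof that ?? is associative. Here A, B and C stand for non-null values, so the eight rows cover every combination of null and non-null inputs.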

A | B | C | (A ?? B) ?? C | A ?? (B ?? C)
--- | --- | --- | --- | ---
null | null | null | (null ?? null) ?? null = null ?? null = null | null ?? (null ?? null) = null ?? null = null
null | null | C | (null ?? null) ?? C = null ?? C = C | null ?? (null ?? C) = null ?? C = C
null | B | null | (null ?? B) ?? null = B ?? null = B | null ?? (B ?? null) = null ?? B = B
null | B | C | (null ?? B) ?? C = B ?? C = B | null ?? (B ?? C) = null ?? B = B
A | null | null | (A ?? null) ?? null = A ?? null = A | A ?? (null ?? null) = A ?? null = A
A | null | C | (A ?? null) ?? C = A ?? C = A | A ?? (null ?? C) = A ?? C = A
A | B | null | (A ?? B) ?? null = A ?? null = A | A ?? (B ?? null) = A ?? B = A
A | B | C | (A ?? B) ?? C = A ?? C = A | A ?? (B ?? C) = A ?? B = A
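The eight rows above can also be checked mechanically. This sketch uses the arbitrary sample strings "A", "B" and "C" as the non-null values and prints each row. As an aside, C# itself groups an unbracketed chain from the right, so a ?? b ?? c means a ?? (b ?? c); associativity is what makes that choice invisible.

```csharp
using System;

string Show(string? s) => s ?? "null";

// "A", "B" and "C" are arbitrary non-null sample values; each input is also
// tried as null, giving the eight rows of the table.
int rowsChecked = 0;
foreach (var a in new string?[] { null, "A" })
foreach (var b in new string?[] { null, "B" })
foreach (var c in new string?[] { null, "C" })
{
    string? left = (a ?? b) ?? c;
    string? right = a ?? (b ?? c);
    Console.WriteLine($"{Show(a)} {Show(b)} {Show(c)}: {Show(left)} | {Show(right)}");
    if (left != right)
    {
        throw new Exception("(A ?? B) ?? C and A ?? (B ?? C) differ");
    }
    rowsChecked++;
}
Console.WriteLine($"All {rowsChecked} rows agree"); // All 8 rows agree
```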

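See below a proof that ?? is commutative except when neither value is null. As before, A and B stand for non-null values; in the last row the two results differ whenever A and B are different.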

A | B | A ?? B | B ?? A
--- | --- | --- | ---
null | null | null | null
null | B | B | B
A | null | A | A
A | B | A | B
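A mechanical check of these four rows, again a sketch using the arbitrary, distinct sample strings "A" and "B", finds exactly one disagreement: the row where neither input is null.

```csharp
using System;

string Show(string? s) => s ?? "null";

// "A" and "B" are arbitrary, distinct non-null sample values.
int agreeing = 0;
foreach (var a in new string?[] { null, "A" })
foreach (var b in new string?[] { null, "B" })
{
    string? ab = a ?? b;
    string? ba = b ?? a;
    string verdict = ab == ba ? "equal" : "differ";
    if (ab == ba)
    {
        agreeing++;
    }
    Console.WriteLine($"{Show(a)} {Show(b)}: {Show(ab)} | {Show(ba)} ({verdict})");
}
Console.WriteLine($"{agreeing} of 4 rows agree"); // 3 of 4 rows agree
```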