Looking Inside C# Closures

If you're like me, you understand new language features better when you see the code they generate for you.

Closures in C# are no different. Quite a bit goes on under the covers in a C# closure, and looking at the code the C# 3.0 compiler generates can really help you understand what's happening. With a little help from Reflector, we can learn a lot.

I started with this rather simple C# 3.0 program:

using System;
using System.Collections.Generic;

class Program
{
    static void Main(string[] args)
    {
        int counter = 0;
        IEnumerable<int> values = Utilities.Generate(20, () => counter++);

        Console.WriteLine("Current Counter: {0}", counter);
        foreach (int num in values)
            Console.WriteLine(num);

        Console.WriteLine("Current Counter: {0}", counter);

        foreach (int num in values)
            Console.WriteLine(num);

        Console.WriteLine("Current Counter: {0}", counter);
    }
}

public static class Utilities
{
    public static IEnumerable<T> Generate<T>(int num, Func<T> generator)
    {
        int index = 0;
        while (index++ < num)
            yield return generator();
    }
}

The output for this program is pretty simple, but shows us a few things about closures and deferred execution:

Current Counter: 0
0
1
2
...
17
18
19
Current Counter: 20
20
21
...
38
39
Current Counter: 40


There are a few points to see here. First, notice that counter is still 0 after defining the sequence. That's because iterator methods such as Generate use deferred execution: no values are generated until some calling code enumerates the sequence. You can see this by looking at the value of counter after the first and second enumerations. counter is 20 after enumerating the sequence once, and 40 after enumerating it again. Also notice that the sequence produces different values each time you enumerate it. Charlie Calvert has covered this concept very well.
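
To see exactly when the iterator body runs, here is a small sketch. The DeferredDemo class, its Log list, and the Numbers iterator are hypothetical, written only for this illustration. It records a log entry whenever the iterator body executes, so you can see that creating the sequence runs none of the body, and that each MoveNext call made by foreach runs it only up to the next yield return.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical demo class: records when the iterator body actually executes.
public static class DeferredDemo
{
    public static readonly List<string> Log = new List<string>();

    // An iterator method, like Generate in the article.
    public static IEnumerable<int> Numbers(int count)
    {
        Log.Add("iterator started");
        for (int i = 0; i < count; i++)
        {
            Log.Add("yielding " + i);
            yield return i;
        }
        Log.Add("iterator finished");
    }

    public static List<string> Run()
    {
        Log.Clear();
        IEnumerable<int> values = Numbers(2);   // no iterator code runs yet
        Log.Add("sequence created");
        foreach (int n in values)               // each MoveNext resumes the body
            Log.Add("consumed " + n);
        return new List<string>(Log);
    }
}

class Program
{
    static void Main()
    {
        foreach (string entry in DeferredDemo.Run())
            Console.WriteLine(entry);
    }
}
```

Run returns the entries in this order: sequence created, iterator started, yielding 0, consumed 0, yielding 1, consumed 1, iterator finished.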

Now, let's look at how it works inside. A closure is a data structure that holds an expression and an environment containing the variable bindings necessary to evaluate the expression. OK, that's a mouthful; sometimes it's easier to understand in code. So let's power up Reflector and see what the compiler wrote. Reflector has a great option that lets you specify which version of .NET it should disassemble into. For this post, I chose to have Reflector generate .NET 1.1 code. That greatly increased the volume of the code and reduced its readability, and the C# compiler won't even compile the disassembled code, partly because compiler-generated names contain characters such as < and > that are legal in IL but not in C# identifiers. But it does show exactly what C# is doing for you with all these new features. Therefore, for the rest of this article, I took the disassembled code and reworked it so that it is valid C#: strange compiler-generated variable names have been replaced with legal names, and constructs that won't compile have been rewritten.
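
In code, the definition means that the captured variable itself, not a copy of its value, lives in the environment. Here is a minimal sketch using hypothetical names (ClosureSharing, MakePair): two lambdas created in the same scope capture the same local, so an increment made through one is visible through the other, while each call to MakePair gets a fresh environment. With the Microsoft compiler, both lambdas become methods of the same generated closure class.

```csharp
using System;

// Hypothetical example: two lambdas closing over the same local variable.
public static class ClosureSharing
{
    public static Tuple<Func<int>, Func<int>> MakePair()
    {
        int counter = 0;                       // lives in the closure environment
        Func<int> increment = () => ++counter; // writes the captured variable
        Func<int> read = () => counter;        // reads the same captured variable
        return Tuple.Create(increment, read);
    }
}

class Program
{
    static void Main()
    {
        Tuple<Func<int>, Func<int>> pair = ClosureSharing.MakePair();
        pair.Item1();
        pair.Item1();
        Console.WriteLine(pair.Item2());                      // 2: one shared binding
        Console.WriteLine(ClosureSharing.MakePair().Item2()); // 0: a new environment
    }
}
```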

I'll leap to the conclusion right now: in most cases, C# 3.0 creates classes to handle the state for closures, continuations (iterator methods), and other new C# 3.0 features. There's no magic, just a lot of generated code.

Creating an Enumerator class

Let's begin with the Generate method. Generate creates the sequence using a yield return statement (yield is a contextual keyword). For an iterator method like this, the compiler generates a nested enumerator class that handles all the work to create and use the sequence:

using System;
using System.Collections;
using System.Collections.Generic;
using System.Threading;

public static class Utilities
{
    // Methods
    public static IEnumerable<T> Generate<T>(int num, Func<T> generator)
    {
        // -2 marks an enumerable that has not handed out an enumerator yet
        GenerateEnumerator<T> enumerator = new GenerateEnumerator<T>(-2);
        enumerator.currentNumber = num;
        enumerator.generatorFunc = generator;
        return enumerator;
    }

    // Nested Types
    private sealed class GenerateEnumerator<T> :
        IEnumerable<T>, IEnumerable, IEnumerator<T>, IEnumerator, IDisposable
    {
        // Fields
        private int state;
        private T current;
        public Func<T> generatorFunc;   // copy of the 'generator' argument
        public int currentNumber;       // copy of the 'num' argument
        private int initialThreadId;
        public int index;               // the method's local variable, hoisted to a field
        public Func<T> generator;       // working copy used by MoveNext
        public int num;                 // working copy used by MoveNext

        // Methods
        public GenerateEnumerator(int initialState)
        {
            this.state = initialState;
            this.initialThreadId = Thread.CurrentThread.ManagedThreadId;
        }

        public bool MoveNext()
        {
            switch (this.state)
            {
                case 0:   // first call: run the method body from the top
                    this.state = -1;
                    this.index = 0;
                    while (this.index++ < this.num)
                    {
                        this.current = this.generator();
                        this.state = 1;   // remember that we stopped at the yield return
                        return true;
                    }
                    break;

                case 1:   // later calls: resume the loop after the yield return
                    while (this.index++ < this.num)
                    {
                        this.current = this.generator();
                        this.state = 1;
                        return true;
                    }
                    break;
            }
            return false;
        }

        IEnumerator<T> IEnumerable<T>.GetEnumerator()
        {
            Utilities.GenerateEnumerator<T> enumerator;
            // Reuse this object for the first request on the creating thread;
            // otherwise hand out a fresh enumerator
            if ((Thread.CurrentThread.ManagedThreadId == this.initialThreadId) &&
                (this.state == -2))
            {
                this.state = 0;
                enumerator = this;
            }
            else
                enumerator = new Utilities.GenerateEnumerator<T>(0);
            enumerator.num = this.currentNumber;
            enumerator.generator = this.generatorFunc;
            return enumerator;
        }

        IEnumerator IEnumerable.GetEnumerator()
        {
            return ((IEnumerable<T>)this).GetEnumerator();
        }

        void IEnumerator.Reset()
        {
            throw new NotSupportedException();
        }

        void IDisposable.Dispose()
        {
        }

        T IEnumerator<T>.Current
        {
            get { return this.current; }
        }

        object IEnumerator.Current
        {
            get { return this.current; }
        }
    }
}

All the interesting additions are in the GenerateEnumerator class. This class implements IEnumerable<T> and IEnumerator<T>, along with their nongeneric counterparts and IDisposable. It holds the state for the current location in the sequence: the state field records where MoveNext should resume, and the method's local variable index has been hoisted into a field. Generate itself just creates an instance of the nested class and stores its arguments, and all the IEnumerator methods are handled by the nested class. It's quite a bit of typing, but it's nothing new or magical.
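
To see how much of that class is optimization, here is a hand-written sketch of the same behavior. It is not what the compiler emits: it uses two separate classes (the hypothetical GenerateSequence and GenerateCursor) instead of one class with the -2 state and thread-id check that lets the compiler reuse a single object for the first enumeration. The enumerable stores only the arguments, and every GetEnumerator call returns a new cursor that keeps the loop index between MoveNext calls.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical hand-written equivalent of Generate<T>, without the compiler's
// state-number and thread-id optimizations.
public static class HandWritten
{
    public static IEnumerable<T> Generate<T>(int num, Func<T> generator)
    {
        return new GenerateSequence<T>(num, generator);
    }

    // The enumerable only stores the arguments; each GetEnumerator call
    // returns a fresh cursor.
    private sealed class GenerateSequence<T> : IEnumerable<T>
    {
        private readonly int num;
        private readonly Func<T> generator;

        public GenerateSequence(int num, Func<T> generator)
        {
            this.num = num;
            this.generator = generator;
        }

        public IEnumerator<T> GetEnumerator()
        {
            return new GenerateCursor<T>(num, generator);
        }

        IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
    }

    // The enumerator holds the loop state (index) between MoveNext calls.
    private sealed class GenerateCursor<T> : IEnumerator<T>
    {
        private readonly int num;
        private readonly Func<T> generator;
        private int index;
        private T current;

        public GenerateCursor(int num, Func<T> generator)
        {
            this.num = num;
            this.generator = generator;
        }

        public bool MoveNext()
        {
            if (index++ < num)
            {
                current = generator();   // call the delegate only on demand
                return true;
            }
            return false;
        }

        public T Current { get { return current; } }
        object IEnumerator.Current { get { return current; } }
        public void Reset() { throw new NotSupportedException(); }
        public void Dispose() { }
    }
}

class Program
{
    static void Main()
    {
        int counter = 0;
        IEnumerable<int> values = HandWritten.Generate(3, () => counter++);
        foreach (int n in values) Console.Write(n + " ");   // 0 1 2
        foreach (int n in values) Console.Write(n + " ");   // 3 4 5
        Console.WriteLine();
    }
}
```

Enumerating values twice prints 0 1 2 and then 3 4 5, matching the article's program: the cursor is new each time, but the captured counter is not.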

Closures are also classes and objects

The same technique is used for the lambda expression that Main passes to Generate:

using System;
using System.Collections.Generic;

class Program
{
    private sealed class GeneratedClosure
    {
        // Fields
        public int counter;

        // Methods
        public int GeneratedMethod1()
        {
            return this.counter++;
        }
    }

    private static void Main(string[] args)
    {
        GeneratedClosure closureObject = new GeneratedClosure();
        closureObject.counter = 0;

        IEnumerable<int> values = Utilities.Generate<int>(20,
            new Func<int>(closureObject.GeneratedMethod1));

        Console.WriteLine("Current Counter: {0}", closureObject.counter);

        using (IEnumerator<int> generatedEnumerator = values.GetEnumerator())
        {
            while (generatedEnumerator.MoveNext())
            {
                int num = generatedEnumerator.Current;
                Console.WriteLine(num);
            }
        }
        Console.WriteLine("Current Counter: {0}", closureObject.counter);

        foreach (int num in values)
        {
            Console.WriteLine(num);
        }
        Console.WriteLine("Current Counter: {0}", closureObject.counter);
    }
}


The compiler created the GeneratedClosure class to contain the bound variables in the environment that the expression needs. This environment needs only one field, counter, so it's a simple class. Note that the field is public, and that the closure type also contains the method that will be bound to the delegate (GeneratedMethod1). GeneratedClosure holds both the environment and the code that uses it: all execution of the expressions in the closure takes place in the context of this nested class.

You can see what I mean by looking at the reworked, disassembled version of Main(). Instead of a simple int local variable, the compiler creates an instance of GeneratedClosure. Then, it initializes the bound variable (closureObject.counter).

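The lambda expression has been replaced by a delegate bound to the instance method closureObject.GeneratedMethod1. Because the delegate's target is closureObject, every call through the delegate reads and increments the same counter field, so the delegate is always evaluated in the context of the closure environment.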
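
You can watch the same mechanism at run time through the delegate's Target property. This sketch assumes the current Microsoft C# compiler (Roslyn), which stores a captured local in a public field that keeps the local's name; the generated class and field names are implementation details, not something the language specification promises. ClosurePeek and Peek are hypothetical names.

```csharp
using System;
using System.Reflection;

// Hypothetical runtime peek at a closure. Assumes the Roslyn compiler, which
// keeps a captured local as a public field named after the local.
public static class ClosurePeek
{
    public static int[] Peek()
    {
        int counter = 0;
        Func<int> next = () => counter++;
        next();
        next();

        // Target is the instance of the compiler-generated closure class.
        object environment = next.Target;
        FieldInfo field = environment.GetType().GetField("counter");
        int seenThroughField = (int)field.GetValue(environment);   // 2

        field.SetValue(environment, 100);   // write through the generated field...
        // ...and both the "local" and the lambda see the new value.
        return new int[] { seenThroughField, counter, next() };
    }
}

class Program
{
    static void Main()
    {
        int[] r = ClosurePeek.Peek();
        Console.WriteLine("{0}, {1}, {2}", r[0], r[1], r[2]);   // 2, 100, 100
    }
}
```

Writing 100 through reflection changes what the local counter reads, because after compilation the local and the closure field are the same storage.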

There are a few extra bits of C# behavior to see here. Even though Main enumerates the sequence more than once, the compiler creates only one instance of the closure, and that environment is reused each time. That's how counter ends up being 40 rather than 20, and it's why the second enumeration produces the numbers 20 through 39. Notice also that every time Main prints counter, it reads the field inside the closure (closureObject.counter). That's how changes in the closure environment are visible (and modifiable) from the outer scope.
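
Because the closure captures the variable rather than its value, an assignment made in the outer scope after the lambda is created is also seen by the lambda. Here is a minimal sketch, reusing the article's Generate method inside a hypothetical OuterScopeDemo class: Run sets counter to 100 before enumerating, so the sequence starts at 100, and afterwards the outer counter reflects the three increments.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical demo; Generate is copied from the article's Utilities class.
public static class OuterScopeDemo
{
    public static IEnumerable<T> Generate<T>(int num, Func<T> generator)
    {
        int index = 0;
        while (index++ < num)
            yield return generator();
    }

    public static List<int> Run()
    {
        int counter = 0;
        IEnumerable<int> values = Generate(3, () => counter++);
        counter = 100;                           // changed after the lambda was created
        List<int> seen = new List<int>(values);  // enumerates now: 100, 101, 102
        seen.Add(counter);                       // 103: the lambda's increments show up here
        return seen;
    }
}

class Program
{
    static void Main()
    {
        foreach (int n in OuterScopeDemo.Run())
            Console.WriteLine(n);
    }
}
```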

Finally, I don't know why the first foreach loop is rendered completely differently from the second. If anyone knows, I'd be interested.
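
For what it's worth, the two forms should behave the same. The C# language specification expands a foreach over an IEnumerable<T> into a GetEnumerator call, a MoveNext/Current loop, and a Dispose call in a finally block, which is what the using statement in the first loop provides; the difference is probably a quirk of how Reflector decompiled the IL rather than different generated code. The following sketch (the ForeachExpansion class, its Tracked iterator, and the Disposals counter are hypothetical) writes both forms side by side and exits each loop early, so the iterator's finally block runs only if the enumerator is disposed.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical comparison of foreach and its specification-style expansion.
public static class ForeachExpansion
{
    public static int Disposals;

    // When the loop exits early, this finally block runs only through Dispose.
    public static IEnumerable<int> Tracked(int count)
    {
        try
        {
            for (int i = 0; i < count; i++)
                yield return i;
        }
        finally
        {
            Disposals++;
        }
    }

    public static List<int> WithForeach(IEnumerable<int> source, int take)
    {
        List<int> result = new List<int>();
        foreach (int num in source)
        {
            if (result.Count == take) break;
            result.Add(num);
        }
        return result;
    }

    // Roughly what the compiler emits for the foreach loop above.
    public static List<int> Expanded(IEnumerable<int> source, int take)
    {
        List<int> result = new List<int>();
        IEnumerator<int> e = source.GetEnumerator();
        try
        {
            while (e.MoveNext())
            {
                int num = e.Current;
                if (result.Count == take) break;
                result.Add(num);
            }
        }
        finally
        {
            if (e != null) e.Dispose();
        }
        return result;
    }
}

class Program
{
    static void Main()
    {
        List<int> a = ForeachExpansion.WithForeach(ForeachExpansion.Tracked(5), 2);
        List<int> b = ForeachExpansion.Expanded(ForeachExpansion.Tracked(5), 2);
        Console.WriteLine(string.Join(",", a) + " | " + string.Join(",", b)); // 0,1 | 0,1
        Console.WriteLine(ForeachExpansion.Disposals);   // 2: both loops disposed the iterator
    }
}
```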

I hope this little side trip inside closures has been useful. The bottom line (at least to me) is this: the compiler creates an environment using lots of familiar constructs. Looking inside the generated code is only necessary to help understand the new features, but peeking under the covers removes the mystery. In most daily work, it's better to use the new syntax, let the compiler do the work, and get stuff done. But the next time someone mentions "closures" and debates whether or not they are useful, you'll know what a closure is, why it's a useful construct, and how the C# compiler puts one together for you.
