With power comes responsibility: Spot the bug #2 Solution, yield return and partial execution

A couple of days ago I posted a piece of code that made some trouble under certain circumstances. At a first glance it seemed odd, but looking deeper into it, it behaved absolutely correctly.

We used a base class for a Query-Object called Finder (implementing IFinder). A finder should be instantiated once per use, and hence never be used twice. That makes it easier to create complicated query-logic using member variables. Just to protect the deriving classes, we wanted to make sure that the implementation is never called twice.

I slightly simplified the code since the last post.

public abstract class Finder
{
    private bool _used;
    public IEnumerable<int> GetIds()
    {
        if (_used)
            throw new InvalidOperationException("Please do not use twice!");

        _used = true;

        return getIds();
    }

    protected abstract IEnumerable<int> getIds();
}

public class YieldingFinder : Finder
{
    private bool _executed;
    protected override IEnumerable<int> getIds()
    {
        if (_executed)
            throw new ApplicationException("booom!");
        _executed = true;
        yield return 1;
    }
}

Line 6 and 7 will make sure, that YieldingFinder.getIds() is called only once. But when using yield, the C# compiler factors the code inside getIds() into a state machine implementing IEnumerable<int>. Everytime you call getIds(), it will return a instance of this Enumerable.

Generated Source code

When you then instantiate an Enumerator using IEnumerable.GetEnumerator() you’ll get a fresh state machine which moves forward through it’s states on every call to MoveNext(). It holds a reference to the parent class YieldingFinder, which is used for member variable access.

In other words, it creates a backdoor to the protected method. The base class lost control.

public YieldingFinder <>4__this;
private bool MoveNext()
{
    switch (this.<>1__state)
    {
        case 0:
            this.<>1__state = -1;
            if (this.<>4__this._executed)
            {
                throw new ApplicationException("booom!");
            }
            this.<>4__this._executed = true;
            this.<>2__current = 1;
            this.<>1__state = 1;
            return true;
         case 1:
            this.<>1__state = -1;
            break;
    }
    return false;
}

[DebuggerHidden]
void IEnumerator.Reset()
{
    throw new NotSupportedException();
}

Full source code here.

The Bug

In our concrete example you can simply provoke the ApplicationException to be thrown, by retrieving the enumerator once and then enumerate it twice.

var ids = new YieldingFinder().GetIds();
ids.ToArray();
ids.ToArray();

In other words: On the same instance of YieldingFinder the check if it is used twice (Finder.GetIds()) runs only once, but the code checking if the custom code YieldingFinder.getIds() still run twice.

Tests-first

These are the test cases I expect to pass:

  • Second call to GetIds() should fail
  • Second call to GetEnumerator() should fail
  • Happily the compiler-generated enumerator is single-use already. It throws a NotSupportedException on Reset.
  • I also test what happens when I enumerate twice, even though if you can’t call GetEnumerator() twice, the test will succeed, too.
[TestClass]
public class FunnyCode
{
    [TestMethod]
    public void GetIds_FailsSecondTime()
    {
        var finder = new YieldingFinder();
        finder.GetIds();
        finder.Invoking(_ => _.GetIds())
            .ShouldThrow<InvalidOperationException>();
    }

    [TestMethod]
    public void GetEnumerator_FailsSecondTime()
    {
        var ids = new YieldingFinder().GetIds();
        ids.GetEnumerator();
        ids.Invoking(_ => _.GetEnumerator())
            .ShouldThrow<InvalidOperationException>();
    }

    [TestMethod]
    public void GetIds_EnumeratingTwice_FailsSecondTime()
    {
        var ids = new YieldingFinder().GetIds();
        ids.GetEnumerator();
        ids.Invoking(_ => _.GetEnumerator())
            .ShouldThrow<InvalidOperationException>();
    }

    [TestMethod]
    public void GetEnumerator_EnumerateTwice_ResetNotSupported()
    {
        var ids = new YieldingFinder().GetIds();
        var enumerator = ids.GetEnumerator();
        enumerator.MoveNext();

        enumerator.Invoking(_ => _.Reset())
            .ShouldThrow<NotSupportedException>();
    }
}

Take 1

My first take was to wrap the enumeration in another yield:

public IEnumerable<int> GetIds()
{
    if (_used)
        throw new InvalidOperationException("Please do not use twice!");
     _used = true;
     foreach (var item in getIds())
        yield return item;
}

But then line 3 to 6 will not run until MoveNext() is called the first time. It closes the back door, but you can then call GetIds() and GetEnumerator() as often as you want.

Hence, the tests GetIds_FailsSecondTime and GetEnumerator_FailsSecondTime fail.

Take 2

Now, if you want to prevent GetIds() from beeing called twice, the method has to be split.

private bool _used, _enumerated;
public IEnumerable<int> GetIds()
{
    if (_used)
        throw new InvalidOperationException("Please do not use twice!");

    _used = true;

    return enumerateOnce();
}

private IEnumerable<int> enumerateOnce()
{
    if (_enumerated)
        throw new InvalidOperationException("Please do not enumerate twice!");

    _enumerated = true;
    foreach (var item in getIds())
        yield return item;
}

Now, line 4-9 run when GetIds() is called, and 14-17 run whenever MoveNext() is called on the second Enumerator. This solution is acceptable, but it is still later than it could be. GetEnumerator_FailsSecondTime still fails.

Take 3, Final

The best solution is to wrap the enumerator in a class, that intercepts the call to GetEnumerator(). This class could be abstracted into a SingleUseEnumerator, if needed in other classes, too.

It passes all tests.

Here is the code for the complete base class:

public abstract class Finder
{
    private bool _used;
    public IEnumerable<int> GetIds()
    {
        if (_used)
            throw new InvalidOperationException("Please do not use twice!");

        _used = true;

        return new EnumerateOnce(getIds());
    }

    private class EnumerateOnce : IEnumerable<int>
    {
        private bool _enumerated;
        private IEnumerable<int> _baseEnumerator;

        public EnumerateOnce(IEnumerable<int> baseEnumerator)
        {
            _baseEnumerator = baseEnumerator;
        }

        public IEnumerator<int> GetEnumerator()
        {
            if (_enumerated)
                throw new InvalidOperationException("Please do not enumerate twice!");

            _enumerated = true;

            return _baseEnumerator.GetEnumerator();
        }

        IEnumerator IEnumerable.GetEnumerator()
        {
            return GetEnumerator();
        }
    }

    protected abstract IEnumerable<int> getIds();
}

As said, with power comes responsibility. It will be fun to see what people (including me) manage to do with async and await 🙂

Advertisement

Spot the bug #2 (Code)

Yesterday we stumbled some odd behavior. Totally logical and well defined, but still odd.

I reduced the classes to it’s essentials. Finder is a base class among other things assuring it’s implementor can rely on being called only once per instance.

The particular implementation uses member variables to detect circles in a tree.

Can you spot the problem? Can you tell me how to solve it?

(Update 12.01.2011: While anonymizing I had introduced another bug. Fixed now.)

public abstract class Finder
{
    private bool _used;
    public IEnumerable<int> GetIds()
    {
        if (_used)
            throw new NotSupportedException("Please do not use twice!");

        _used = true;

        return getIds();
    }

    protected abstract IEnumerable<int> getIds();
}

public class FlatHierarchyIdsFinder : Finder
{
    List<int> _visited = new List<int>();
    protected override IEnumerable<int> getIds()
    {
        if (_visited.Contains(1))
            throw new ApplicationException("Cycle detected!");

        _visited.Add(1);
        yield return 1;
    }
}

It might help you to read this post: Behind the scenes of the C# yield-keyword

Behind the scenes of the C# yield keyword

After reading the great article about the code-saving yield keyword “Give way to the yield keyword” by Shay Friedman I thought it could be interesting to know how the yield keyword works behind the scenes.

…it doesn’t really end the method’s execution. yield return pauses the method execution and the next time you call it (for the next enumeration value), the method will continue to execute from the last yield return call. It sounds a bit confusing I think… (ShayF)

By using yield return within a method that returns IEnumerable or IEnumerator the language feature is activated.

Note: IEnumerable is kind of a stateless factory for Enumerators. IEnumerable.GetEnumerator() is thread safe and can be called multiple times, while the returned stateful Enumerator is just a helper for enumerating contained values once. By contract IEnumerator offers a Reset() method, but many implementations just throw a NotSupportedException.

Lets create an enumerator method that yields some Fibonacci nubmers.

public class YieldingClass
{
    public IEnumerable<int> GetFibonachiSequence()
    {
        yield return 1;
        yield return 2;
        yield return 3;
        yield return 5;
    }
}

Note: Yield is not a feature of the .Net runtime. It is just a C# language feature which gets compiled into simple IL code by the C# compiler.

The compiler now generates a inner class with following signature (Reflector + some renaming):

[CompilerGenerated]
private sealed class YieldingEnumerator : 
   IEnumerable<object>, IEnumerator<object>
{
    // Fields
    private int state;
    private int current;
    public YieldingClass owner;
    private int initialThreadId;

    // Methods
    [DebuggerHidden]
    public YieldingEnumerator(int state);
    private bool MoveNext();
    [DebuggerHidden]
    IEnumerator<int> IEnumerable<int>.GetEnumerator();
    [DebuggerHidden]
    IEnumerator IEnumerable.GetEnumerator();
    [DebuggerHidden]
    void IEnumerator.Reset();
    void IDisposable.Dispose();

    // Properties
    object IEnumerator<object>.Current 
    { [DebuggerHidden] get; }

    object IEnumerator.Current 
    { [DebuggerHidden] get; }
}

The original method GetFibonachiSequence() only returns a new instance of the YieldingEnumerator, passing the initial state –2 as well as itself as the owner.

Each enumerator holds a state indicating:

  • -2: Initialized as Enumerable. (Not yet an Enumerator)
  • -1: Closed
  • 0: Initialized as Enumerator. 
    If a new Enumerator is requested on the same instance, GetEnumerator() returns another new instance of YieldingEnumerator.
  •  1-n: Index of the yield return in the original GetFibonachiSequence()method. In case of nested enumerators or other more complex scenarios one yield return consumes more than one index.

The content of GetFibonachiSequence() is translated into YieldingEnumerator.MoveNext().

In our very simple scenario the code looks like this:

bool MoveNext()
{
    switch (state)
    {
        case 0:
            state = -1;
            current = 1;
            state = 1;
            return true;

        case 1:
            state = -1;
            current = 2;
            state = 2;
            return true;

        case 2:
            state = -1;
            current = 3;
            state = 3;
            return true;

        case 3:
            state = -1;
            current = 5;
            state = 4;
            return true;

        case 4:
            state = -1;
            break;
    }
    return false;
}

Quite easy, isn’t it?

So far we easily could have created the classes and methods used to enable the yield keyword ourselves, too.

But in more complex scenarios Microsoft does some tricks, which won’t compile as C# – at least not how Reflector translates the resulting IL code.

Lets have a look at some code with a nested enumeration…

foreach(int i in new int[] {1, 2, 3, 5, 8})
{
    yield return i;
}

This compiles into:

private bool MoveNext()
{
    try
    {
        switch (state)
        {
            case 0:
                state = -1;
                state = 1;
                this.values = new int[] { 1, 2, 3, 5, 8 };
                this.currentPositionInValues = 0;
                while (this.currentPositionInValues < this.values.Length)
                {
                    current_i = this.values[this.currentPositionInValues];
                    current = current_i;
                    state = 2;
                    return true;
                Label_007F:
                    state = 1;
                    this.currentPositionInValues++;
                }
                this.Finally2();
                break;

            case 2:
                goto Label_007F;
        }
        return false;
    }
    fault
    {
        this.System.IDisposable.Dispose();
    }
}
[/sourcecode]

Now the states 1 and 2 are used to indicate whether the enumerator actually is at some point (2), or wether it is trying to retrieve the next value (1).

Two things would not compile:

  • goto Label_007F is used to jump back into the iteration over int[] values. The C# goto statement is not able to jump into another statements context. But in IL this is totally valid as a while statement in MSIL is nothing but some gotos either.
  • The fault is proper MSIL, but not supported in C#. Basically it acts as a finally which just is executed in case of an error.

Attention: As in anonymous delegates, parameters as well as local and instance variables are passed to the YieldingEnumerator only once. Read this great post on this: Variable Scoping in Anonymous Delegates in C#

Thanks for your attention!

kick it on DotNetKicks.com