ProcGen Fun 4: Recursive sampling

[List of posts | playground | source code for this post]

In a previous post we implemented a function which transforms a single-valued distribution into a distribution returning multiple values, as follows.

public static IDistribution<IEnumerable<T>> Repeat<T>(
    this IDistribution<T> dist, int count) =>
    new RepeatDistribution<T>(dist, count);

private class RepeatDistribution<T>(IDistribution<T> dist, int count) :
    IDistribution<IEnumerable<T>>
{
    public IEnumerable<T> Sample<TRng>(TRng rng) where TRng : notnull, IRng
    {
        for (int i = 0; i < count; i++)
        {
            yield return dist.Sample(rng);
        }
    }
}

At the time I should have added argument validation, so let's do that quickly:

public static IDistribution<IEnumerable<T>> Repeat<T>(
    this IDistribution<T> dist, int count)
{
    if (count < 0)
    {
        throw new ArgumentOutOfRangeException(
            nameof(count), count, "Count must be non-negative");
    }

    return new RepeatDistribution<T>(dist, count);
}

When I first wrote this function, it irked me a little that we had to introduce a new class to implement it. Granted, the extra class is private, and so can't be seen outside by anyone using the Repeat method, but it would be nice to try and implement this using a more functional approach.

Well, today I had a brainwave - use recursion! Here's my refactored version, in all its glory:

public static IDistribution<IEnumerable<T>> Repeat<T>(
    this IDistribution<T> dist, int count)
{
    if (count < 0)
    {
        throw new ArgumentOutOfRangeException(
            nameof(count), count, "Count must be non-negative");
    }

    if (count == 0)
    {
        return Singleton.New(Enumerable.Empty<T>());
    }

    return
        from values in dist.Repeat(count - 1)
        from value in dist
        select values.Append(value);
}

We start with the base case: if the caller has asked for a count of zero, then they will always get an empty sequence. For this we can use a singleton distribution.

In the general case, we can use the same function to generate all but the last value, and then sample from the distribution once and append it to the result. Using query syntax for this works quite well I think, as it's clear that we're building up the resultant distribution from simpler ones.

Let's just test it out quickly, to make sure we haven't broken anything. I'll open the "visualise distributions" screen from post 1 as that uses the Repeat method to generate a large number of samples.

Screenshot of a
StackOverflowException

Oh dear! By implementing this using recursion instead of a loop, we've made it so that sampling from the distribution uses up one frame on the stack per sample we need. I'm asking for 100,000 samples here, which blows way past the limit of around 7000 that C# allows.

So why didn't this work? Isn't this how you would implement an algorithm like this in functional languages like Haskell or F#? The key is that if a function is tail recursive (the final statement in a function contains a call to itself) then it's possible for the recursion to be replaced with a loop. In many languages the compiler does this for you to avoid exceptions like this; C# happens not to be one of them!

So, alas, my grand idea of implementing this using recursion has failed. Never mind; often in programming you have to compromise on your lofty theoretical ideals in order to make something that actually works. Let's revert these changes and move on. Next time I'll fulfil my promise of more mazes!