Thursday, February 24, 2011

Example of C# lazy, functional programming: SplitUp()

I seem to be on a bit of a roll here regarding extension methods. They are by no means a silver bullet, but a LINQ-like lazy method on a generic sequence is a perfect fit for one. I think this method nicely illustrates how you can write your own functional methods that are used just like the LINQ methods in the .NET Framework, and that share some of their characteristics.

This SplitUp() extension method takes a sequence and splits it up into subsequences that each have a maximum length. For instance, you can split a sequence (list, collection, array, etc.) of 64 integers into an enumerable sequence of List<int> instances of lengths 10, 10, 10, 10, 10, 10 and 4 by calling SplitUp(10) on it.

Here is the source:

namespace peSHIr.Utilities
{
 using System;
 using System.Collections.Generic;

 /// <summary>Utility code for working with sequences</summary>
 public static class SequenceUtility
 {
  /// <summary>Split up sequence of items</summary>
  /// <typeparam name="T">Item type</typeparam>
  /// <param name="input">Input sequence</param>
  /// <param name="n">Maximum number of items per sublist</param>
  /// <returns>Sequence of lists with a maximum
  /// of <paramref name="n"/> items</returns>
  /// <remarks>Might need a suppression of code analysis rule
  /// CA1006 because of the nested generic type in the method
  /// signature.</remarks>
  public static IEnumerable<IEnumerable<T>>
   SplitUp<T>(this IEnumerable<T> input, int n)
  {
   // Non-lazy error checking
   if (input == null) throw new ArgumentNullException("input");
   if (n < 1) throw new ArgumentOutOfRangeException("n", n, "Must be at least 1");
   return SplitUpLazy(input, n);
  }

  private static IEnumerable<IEnumerable<T>>
   SplitUpLazy<T>(IEnumerable<T> input, int n)
  {
   // Lazy yield based implementation
   var list = new List<T>();
   foreach (T item in input)
   {
    list.Add(item);
    if (list.Count == n)
    {
     yield return list;
     list = new List<T>();
    }
   }
   if (list.Count > 0) yield return list;
  }
 }
}

As you can see, the SplitUp function behaves like the built-in LINQ functions because its implementation is split up (pun intended...). The public variant just does argument checking, so you get an ArgumentNullException or ArgumentOutOfRangeException immediately when you call the method improperly. The private implementation uses yield statements to do the actual splitting of the input sequence into lists of at most n elements, and it only runs once the result is enumerated.
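To see why the split matters, here is a minimal sketch that compares the two approaches. The class and method names (ValidationTimingDemo, NaiveSplitUp, EagerSplitUp, ThrowsOnCall) are made up for this illustration and are not part of the listing above. NaiveSplitUp puts the check inside the iterator itself, so nothing runs until enumeration starts; EagerSplitUp checks first and then delegates, like the public SplitUp does.

```csharp
using System;
using System.Collections.Generic;

public static class ValidationTimingDemo
{
    // Hypothetical single-method variant: because the body contains yield,
    // the whole method (including the check) is deferred until the first MoveNext().
    public static IEnumerable<List<T>> NaiveSplitUp<T>(IEnumerable<T> input, int n)
    {
        if (n < 1) throw new ArgumentOutOfRangeException("n");
        var list = new List<T>();
        foreach (T item in input)
        {
            list.Add(item);
            if (list.Count == n)
            {
                yield return list;
                list = new List<T>();
            }
        }
        if (list.Count > 0) yield return list;
    }

    // Split variant in the style of the post: this is a normal method,
    // so the check runs as soon as it is called.
    public static IEnumerable<List<T>> EagerSplitUp<T>(IEnumerable<T> input, int n)
    {
        if (n < 1) throw new ArgumentOutOfRangeException("n");
        return NaiveSplitUp(input, n); // the check inside never fires from here
    }

    // Returns true when merely calling the method (without enumerating) throws.
    public static bool ThrowsOnCall(Func<IEnumerable<List<int>>> call)
    {
        try { call(); return false; }
        catch (ArgumentOutOfRangeException) { return true; }
    }

    public static void Main()
    {
        var numbers = new[] { 1, 2, 3 };
        Console.WriteLine(ThrowsOnCall(() => NaiveSplitUp(numbers, 0))); // False
        Console.WriteLine(ThrowsOnCall(() => EagerSplitUp(numbers, 0))); // True
    }
}
```

With the naive variant the exception would only surface later, in whatever foreach happens to enumerate the result, which can be far away from the call that passed the bad argument.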

This mirrors how the LINQ methods themselves are implemented, as shown in the very informative Edulinq blog series by Jon Skeet, the so-called superuser of stackoverflow.com.

I hope you find this extra illustration of this technique informative, or at least find the method itself useful. Personally I have used it for splitting up sequences of input records from a file into batches for processing by a web service that had a maximum request size. I would love to hear what you have used it for, so all comments are welcome.
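The batching use case could look roughly like the sketch below. It carries a condensed copy of SplitUp() so that it compiles on its own. SendBatch is a hypothetical stand-in for the web service call (here it just counts the records), and BatchingDemo and SendAll are likewise names invented for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BatchingDemo
{
    // Condensed copy of the SplitUp() method from the listing above
    public static IEnumerable<IEnumerable<T>> SplitUp<T>(this IEnumerable<T> input, int n)
    {
        if (input == null) throw new ArgumentNullException("input");
        if (n < 1) throw new ArgumentOutOfRangeException("n");
        return SplitUpLazy(input, n);
    }

    private static IEnumerable<IEnumerable<T>> SplitUpLazy<T>(IEnumerable<T> input, int n)
    {
        var list = new List<T>();
        foreach (T item in input)
        {
            list.Add(item);
            if (list.Count == n)
            {
                yield return list;
                list = new List<T>();
            }
        }
        if (list.Count > 0) yield return list;
    }

    // Hypothetical stand-in for a web service request with a maximum size;
    // it only reports how many records it was given.
    public static int SendBatch(IEnumerable<int> batch)
    {
        return batch.Count();
    }

    // Sends all records in batches of at most maxBatchSize, returning the batch sizes.
    public static List<int> SendAll(IEnumerable<int> records, int maxBatchSize)
    {
        var sizes = new List<int>();
        foreach (var batch in records.SplitUp(maxBatchSize))
        {
            sizes.Add(SendBatch(batch));
        }
        return sizes;
    }

    public static void Main()
    {
        // 64 records in batches of at most 10
        var sizes = SendAll(Enumerable.Range(1, 64), 10);
        Console.WriteLine(string.Join(", ", sizes)); // 10, 10, 10, 10, 10, 10, 4
    }
}
```

Because the splitting is lazy, each batch is only built when the foreach asks for it, so the input never has to be fully loaded before the first request goes out.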

(Added later: For those of you who like to have working example code to play with for code nuggets like this, please check out my next blog post.)
