Comma Quibbling

Comma Quibbling

Rate This

[UPDATE: Holy goodness. Apparently this was a more popular pasttime than I anticipated. There's like a hundred solutions in there. Who knew there were that many ways to stick commas in a string? It will take me some time to go through them all, so don't be surprised if it's a couple of weeks until I get them all sorted out.]

Comma The point of Monday’s post about comma-separated lists was not so much about the actual problem; it’s a rather trivial problem. Rather, I wanted to make two points. First, stating the actual problem rather than a much harder and more general version of the problem is likely to get you a realistic solution to your actual problem much faster. And second, reworking the statement of the problem into an equivalent but structurally different statement is a great way to see solutions that you might have otherwise missed.

But whenever I make a post illustrating such points with a specific example, lots of people pipe up with their ideas for how to solve the specific example. Which is awesome; I encourage this behaviour.

So in that spirit, here’s a slightly harder version of the string concatenation problem, just for the fun of it. Write me a function that takes a non-null IEnumerable<string> and returns a string with the following characteristics:

(1) If the sequence is empty then the resulting string is "{}".
(2) If the sequence is a single item "ABC" then the resulting string is "{ABC}".
(3) If the sequence is the two item sequence "ABC", "DEF" then the resulting string is "{ABC and DEF}".
(4) If the sequence has more than two items, say, "ABC", "DEF", "G", "H" then the resulting string is "{ABC, DEF, G and H}". (Note: no Oxford comma!)

I think you get the idea. You can post your solution in the comments or use the link on the blog page to email your solution to me.

The strings in the sequence can be assumed to be non-null but can otherwise be any string value, including empty strings or strings containing commas, braces and "and".

There’s no size limit on the sequence; it could be tiny, it could be thousands of strings. But it will be finite.

All you get are the methods of IEnumerable<string>; if you want to make that thing into a list or an array, you’re going to need to do that explicitly rather than casting it and hoping for the best.

I am particularly interested in solutions which make the semantics of the code very clear to the code maintainer.

Of course, C# is most interesting to me, but if there are neat ways to express this in other languages, I’d love to see them too.

If there are any particularly amusing or interesting implementations I’ll dissect them on the blog in a future episode, probably in a week or so. I’m not going to have time to do a detailed analysis of every one.

And… go!

  • I am a fan of the recursive solution, as it fits closely with how the problem is stated.

    My first attempt was very similar to Andreas Kromann's except I used ToArray to remove the need to traverse the IEnumerable each time with count and an offset to remove the need for slow array splicing.

    This solution uses a lot of string concatenation, which for large lists would cause performance problems (the problem states that the sequences could be large). My second attempt was very similar but passed in a StringBuilder. While this was more efficient it was less clear.

    My third attempt was to try and find a happy middle ground between the two. I ended up using recursion to generate a new IEnumerable which I could then consume and use a StringBuilder to combine.

    using System;

    using System.Text;

    using System.Linq;

    using System.Collections.Generic;

    public class CommaQuibble

    {

       private static IEnumerable<string> ToList(string[] strings, int start)

       {

    int length = strings.Length - start;

    if (length == 0)

    {

       yield return string.Empty;

    }

    else if (length == 1)

    {

       yield return strings[start];

    }

    else if (length == 2)

    {

       yield return strings[start] + " and " + strings[start + 1];

    }

    else

    {

       yield return strings[start] + ", ";

       /* Concatinate the result of the recursive call to the end of this IEnumerable. */

       foreach(string s in ToList(strings, start + 1))

       {

    yield return s;

       }

    }

       }

       public static string ToList(IEnumerable<string> strings)

       {

    StringBuilder stringBuilder = new StringBuilder("{");

    foreach(string s in ToList(strings.ToArray(), 0))

    {

       stringBuilder.Append(s);

    }

    stringBuilder.Append("}");

    return stringBuilder.ToString();

       }

       public static void Main()

       {

    List<string> list = new List<string>() {"one", "two", "three"};

    Console.WriteLine(ToList(list));

       }

    }

  • I dont't like code that has to undo what proviously done before. But my standard solution for the original problem is to always append comma-value and than returning the final string from the second char on.

    For this problem I found a nice solution with very few variables, and no backtracking:

    string Stringize1(IEnumerable<string> list) {

     StringBuilder sb = new StringBuilder("{");

     IEnumerator<string> enumerator = list.GetEnumerator();

     if(enumerator.MoveNext()) {

       sb.Append(enumerator.Current);

       if(enumerator.MoveNext()) {

         string current = enumerator.Current;

         while(enumerator.MoveNext()) {

    sb.AppendFormat(", {0}",current);

    current = enumerator.Current;

    }

         sb.AppendFormat(" and {0}",current);

         }

       }

     return sb.Append("}").ToString();

     }

    The inner declaration of the string variable and the while-loop can be also expressed this way:

         string current;

         for(current=enumerator.Current; enumerator.MoveNext(); current=enumerator.Current)

    sb.AppendFormat(", {0}",current);

    but this is a matter of taste... I prefer the former, because I need to explicitly define the string var outside the loop, because I need to use it after the loop termination!

  • Olivier Leclant's answer is by far the best.

  • Here's the smallest, efficient solution.  Stringbuilder, and track the position of the last comma, for conversion to " and ".

    static string PrettyPrint(IEnumerable<string> strings)

    {

               bool firstString = true;

               int lastCommaPos = -1; /* no comma, yet */

               var sb = new StringBuilder();

               sb.Append("{");

               foreach (var s in strings)

               {

                   if (!firstString)

                   {

                       lastCommaPos = sb.Length;

                       sb.Append(", ");

                   }

                   sb.Append(s);

                   firstString = false;                              

               }

               /* if we have a final comma, turn it into " and " */

               if (lastCommaPos != -1)

                   sb.Replace(", ", " and ", lastCommaPos, 2);

               sb.Append("}");

               return sb.ToString();

    }

  • I like Jafar Husain's strategy and have attempted a port to Ruby. There are pattern matching implementations for Ruby, but nothing in the core language, so here a case statement has to suffice.

    module Enumerable

     def bracketed_english_join

       s = "{"

       triple {

               |one, two, three|

               case

                 when two == nil

                         s << ""

                 when one == nil && three == nil

                         s << two

                 when one == nil

                         s << two

                 when three == nil

                         s << " and " << two

                 else  s << ", " << two

               end

       }

       s << "}"

     end

     def triple

       prepre = pre = current = nil;

       yield [nil, nil, nil]

       each do |item|

         prepre = pre;

         pre = current;

         current = item;

         yield [prepre, pre, current]

       end

       yield [pre, current, nil]

     end

    end

    if __FILE__ == $0

    require 'test/unit'

    class BracketedEnglishJoinTestCase < Test::Unit::TestCase

      def test_empty_returns_empty_string

        assert_equal('{}', [].bracketed_english_join)

      end

      def test_single_returns_item_only

        assert_equal(

          '{ABC}',

          ['ABC'].bracketed_english_join

        )

      end

      def test_dual_returns_and_separated

        assert_equal(

          '{ABC and DEF}',

          ['ABC', 'DEF'].bracketed_english_join

        )

      end

      def test_many_returns_comma_then_and_separated

        assert_equal(

          '{ABC, DEF, G and H}',

          ['ABC', 'DEF', 'G', 'H'].bracketed_english_join

        )

      end

    end

    end

  • @rbirkby: Could you point out a usage of StringBuilder in this thread which is inappropriate?

    Yes, the compiler will use String.Concat - but in this situation you really do want to use StringBuilder.

  • Just to clarify my last comment - it's fair to say that if you're already using string.Join (as rbirkby's solution does) then using StringBuilder wouldn't help much. However, the solutions which only build the result up as they go without *any* duplicate strings being created (beyond StringBuilder buffer doubling) are going to be better off using StringBuilder than string concatenation.

  • Not so efficient since I cannot assume the IEnumerable be a specific  containar and need to count elementes, but to me simple enough.

         public string StringCommas(IEnumerable<string> collection)

           {

               // Count Elements

               int numberOfCollection = 0;

               foreach (string item in collection)

               {

                   numberOfCollection++;    

               }

               int numOfLeftSeparator = numberOfCollection - 1;

               string result = "";

               foreach (string item in collection)

               {

                   result += item;

                   if (numOfLeftSeparator == 1)

                   {

                       result += " and ";

                   }

                   else if (numOfLeftSeparator > 1)

                   {

                       result += " ,";

                   }

                   numOfLeftSeparator--;

               }

               return "{" + result + "}";

           }

  • I have extract position detection algorythm from my previous post to extension method, which seems to be very usable in similar scenarious.

           public static string Join2(IEnumerable<string> strings)

           {

               var delimiter = new Dictionary<ItemPosition, string>

               {

                   {ItemPosition.First, ""},

                   {ItemPosition.Single, ""},

                   {ItemPosition.Default, ", "},

                   {ItemPosition.Last, " and "},

               };

               return strings

                   .GetPositions()

                   .Aggregate(

                       new StringBuilder("{"),

                       (sb, item)=> sb

                           .Append(delimiter[item.Position])

                           .Append(item.Value),

                       sb => sb

                           .Append("}")

                           .ToString()

                   );

           }

           [Flags]

           public enum ItemPosition

           {

               Default = 0,

               First = 1,

               Last = 2,

               Single = First | Last,

           }

           public class PositionedItem<T>

           {

               private ItemPosition m_position;

               private T m_value;

               public PositionedItem(ItemPosition position, T value)

               {

                   m_position = position;

                   m_value = value;

               }

               public ItemPosition Position { get { return m_position; } }

               public T Value { get { return m_value; } }

           }

           public static IEnumerable<PositionedItem<T>> GetPositions<T>(this IEnumerable<T> items)

           {

               T current = default(T);

               bool thereAreItems = false;

               ItemPosition position = ItemPosition.First;

               foreach (var item in items)

               {

                   if (thereAreItems)

                   {

                       yield return new PositionedItem<T>(position, current);

                       position = ItemPosition.Default;

                   }

                   current = item;

                   thereAreItems = true;

               }

               if (!thereAreItems)

               {

                   yield break;

               }

               position |= ItemPosition.Last;

               yield return new PositionedItem<T>(position, current);

           }

    (It seems my post is not appears in a couple of hours, so I repost it with little changes.)

  • It seems to me that everyone using the Count() extension method on IEnumerable multiple times would do well to just translate the IEnumerable<string> to a List<string>, as each invocation of Enumerable.Count() does a complete iteration. The List implementation keeps that count internally, making it a single operation to retrieve it. You'll take a single hit converting it to a List<>, as opposed to multiple hits calling Count() multiple times.

  • Sorry. A little bit better removng an else :)

           public string StringCommas(IEnumerable<string> collection)

           {

               // Count Elements

               int numberOfCollection = 0;

               foreach (string item in collection)

               {

                   numberOfCollection++;    

               }

               int numOfLeftSeparator = numberOfCollection - 1;

               string result = "";

               foreach (string item in collection)

               {

                   result += item;

                   if (numOfLeftSeparator == 1)

                   {

                       result += " and ";

                   }

                   if (numOfLeftSeparator > 1)

                   {

                       result += " ,";

                   }

                   numOfLeftSeparator--;

               }

               return "{" + result + "}";

  • This is my two cents :-)

    using System;
    using System.Collections.Generic;
    namespace lippert_strings
    {
      class MainClass
      {
        public static void Main(string[] args)
        {
          Console.WriteLine("{}" == SmartJoin(new string[0]));
          Console.WriteLine("{ABC}" == SmartJoin(new string[]{"ABC"}));
          Console.WriteLine("{ABC and DEF}" == SmartJoin(new string[]{"ABC", "DEF"}));
          Console.WriteLine("{ABC, DEF, G and H}" == SmartJoin(new string[]{"ABC", "DEF", "G", "H"}));
          Console.WriteLine("{ABC, \"DE, F\", G and H}" == SmartJoin(new string[]{"ABC", "DE, F", "G", "H"}));
          Console.WriteLine("{\"A and BC\", DEF, G and H}" == SmartJoin(new string[]{"A and BC", "DEF", "G", "H"}));
          Console.ReadKey();
        }

        private static string EscapeString(string source)
        {
          if(source.IndexOf(' ') != -1 || 
            source.IndexOf(',') != -1 ||
            source.IndexOf('{') != -1 ||
            source.IndexOf('}') != -1)
            return "\"" + source + "\"";
          return source;
        }

        public static string SmartJoin(IEnumerable<string> source)
        {
          string lastValue = null;
          List<string> firstValues = new List<string>();
          foreach(var current in source)
          {
            if(lastValue != null)
              firstValues.Add(lastValue);
            lastValue = EscapeString(current);
          }
          var firstValuesStr = "";
          if(firstValues.Count > 0)
          {
            firstValuesStr = string.Join(", ", firstValues.ToArray()) + " and ";
          }
          var result = string.Format("{{{0}{1}}}", firstValuesStr, lastValue);
          return result;
        }
      }
    }

  • Sam the Count() extension does a sneak peak at the type and the standard collection implementations will result in a call to Count property rather than enumeration

  • Fun problem!  Elegance is in the eye of the beholder, but this one's at least a little different from those posted by others.  Single pass, no back-patching of the output, and fairly readable:

       static string InsertCommas(IEnumerable<string> strings)
       {
           StringBuilder builder = new StringBuilder();
           bool first = true;
           builder.Append('{');
           strings.Aggregate(
               (string)null, //init prev to null
               (prev, current) =>
               {
                   if (prev != null)
                   {
                       if (!first)
                           builder.Append(", ");
                       first = false;
                       builder.Append(prev);
                   }
                   return current;
               },
               (last) =>
               {
                   if (last != null)
                   {
                       if (!first)
                           builder.Append(" and ");
                       builder.Append(last);
                   }
                   return string.Empty;
               });
           builder.Append('}');
           return builder.ToString();
       }

  • I didn't add StringBuilder, to keep the posting short, but here is mine. I used regex instead of complicated logic "remembering" the last type, etc.

    using System;
    using System.Text.RegularExpressions;
    namespace CommaQuibbling
    {
       class Program
       {
           static void Main(string[] args)
           {
               // Output:
               // {}
               // {ABC}
               // {ABC and DEF}
               // {ABC, DEF, G and H}
               Console.WriteLine(joinWords(""));
               Console.WriteLine(joinWords("ABC"));
               Console.WriteLine(joinWords("ABC", "DEF"));
               Console.WriteLine(joinWords("ABC", "DEF", "G", "H"));
               Console.ReadLine();
           }
           private static string joinWords(params string[] words)
           {
               // Add the commas and brackets
               string returnString = "{" + string.Join(", ", words) + "}";
               // Add the "and", and remove the last comma
               string pattern = @"^(?<First>.*), (?<Last>.*)$";
               returnString = Regex.Replace(returnString, pattern,
                   match => match.Groups["First"].Value + " and " + match.Groups["Last"].Value);
               return returnString;
           }
       }
    }

Page 8 of 19 (277 items) «678910»