Comma Quibbling

[UPDATE: Holy goodness. Apparently this was a more popular pastime than I anticipated. There's like a hundred solutions in there. Who knew there were that many ways to stick commas in a string? It will take me some time to go through them all, so don't be surprised if it's a couple of weeks until I get them all sorted out.]

The point of Monday’s post about comma-separated lists was not so much about the actual problem; it’s a rather trivial problem. Rather, I wanted to make two points. First, stating the actual problem rather than a much harder and more general version of the problem is likely to get you a realistic solution to your actual problem much faster. And second, reworking the statement of the problem into an equivalent but structurally different statement is a great way to see solutions that you might have otherwise missed.

But whenever I make a post illustrating such points with a specific example, lots of people pipe up with their ideas for how to solve the specific example. Which is awesome; I encourage this behaviour.

So in that spirit, here’s a slightly harder version of the string concatenation problem, just for the fun of it. Write me a function that takes a non-null IEnumerable<string> and returns a string with the following characteristics:

(1) If the sequence is empty then the resulting string is "{}".
(2) If the sequence is a single item "ABC" then the resulting string is "{ABC}".
(3) If the sequence is the two item sequence "ABC", "DEF" then the resulting string is "{ABC and DEF}".
(4) If the sequence has more than two items, say, "ABC", "DEF", "G", "H" then the resulting string is "{ABC, DEF, G and H}". (Note: no Oxford comma!)

I think you get the idea. You can post your solution in the comments or use the link on the blog page to email your solution to me.

The strings in the sequence can be assumed to be non-null but can otherwise be any string value, including empty strings or strings containing commas, braces and "and".

There’s no size limit on the sequence; it could be tiny, it could be thousands of strings. But it will be finite.

All you get are the methods of IEnumerable<string>; if you want to make that thing into a list or an array, you’re going to need to do that explicitly rather than casting it and hoping for the best.
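For anyone who wants something to check a solution against, the rules and constraints above can be written down as a small table of cases. This is only a sketch under stated assumptions: the names `CommaQuibblingCases` and `Failures` are made up for illustration, the expected outputs for the edge cases are one reading of rules (1) to (4), and the tuple syntax needs C# 7, which is newer than this post.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the original challenge.
static class CommaQuibblingCases
{
    // Inputs paired with the output the four rules appear to require.
    // The last four cases exercise the "any string value" clause.
    public static readonly (string[] Input, string Expected)[] Cases =
    {
        (new string[0], "{}"),
        (new[] { "ABC" }, "{ABC}"),
        (new[] { "ABC", "DEF" }, "{ABC and DEF}"),
        (new[] { "ABC", "DEF", "G", "H" }, "{ABC, DEF, G and H}"),
        (new[] { "", "" }, "{ and }"),
        (new[] { "a, b", "c" }, "{a, b and c}"),
        (new[] { "{", "}" }, "{{ and }}"),
        (new[] { "and", "and", "and" }, "{and, and and and}"),
    };

    // Returns a description of every case the candidate gets wrong.
    public static List<string> Failures(Func<IEnumerable<string>, string> candidate)
    {
        var failures = new List<string>();
        foreach (var (input, expected) in Cases)
        {
            // Select(s => s) wraps the array so the candidate cannot cast it back to string[].
            string actual = candidate(input.Select(s => s));
            if (actual != expected)
                failures.Add($"expected \"{expected}\" but got \"{actual}\"");
        }
        return failures;
    }
}
```

Calling `CommaQuibblingCases.Failures(MyJoin)` with a candidate method returns an empty list when every case passes.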

I am particularly interested in solutions which make the semantics of the code very clear to the code maintainer.

Of course, C# is most interesting to me, but if there are neat ways to express this in other languages, I’d love to see them too.

If there are any particularly amusing or interesting implementations I’ll dissect them on the blog in a future episode, probably in a week or so. I’m not going to have time to do a detailed analysis of every one.

And… go!

  • Haskell. It practically reads like the problem statement! I guess it does assume you are using arrays. I don't know how to do enums yet, I'll try later maybe.

    inner :: [String] -> String

    inner [] = ""

    inner [a] = a

    inner [a,b] = a ++ " and " ++ b

    inner (a:rest) = a ++ ", " ++ inner rest

    formatString :: [String] -> String

    formatString a = "{" ++ inner a ++ "}"

  • My solution:

    static string Join(IEnumerable<string> words)

    {

       StringBuilder buffer = new StringBuilder();

       buffer.Append("{");

       bool isFirst = true;

       int counter = 0;

       int count = words.Count();

       foreach (string word in words)

       {

           if (isFirst)

               buffer.Append(word);

           else if (counter == count - 1)

               buffer.Append(" and " + word);

           else

               buffer.Append(", " + word);

           isFirst = false;

           counter++;

       }

       buffer.Append("}");

       return buffer.ToString();

    }

    static void Main(string[] args)

    {

       Console.WriteLine(Join(new string[] { }));

       Console.WriteLine(Join(new string[] { "ds" }));

       Console.WriteLine(Join(new string[] { "ds", "sdf" }));

       Console.WriteLine(Join(new string[] { "ds", "sdf", "sdfs" }));

       Console.WriteLine(Join(new string[] { "ds", "sdf", "sdfs", "rty" }));

    }

  • My solution in Java:

    public static void main(String[] args) {

    ArrayList<String> list = new ArrayList<String>();

    list.add("ABC");

    list.add("DEF");

    list.add("GHI");

    list.add("JKL");

    Iterator<String> it = list.iterator();

    System.out.println("{" + print(it,false) + "}");

    }

    private static String print(Iterator<String> iter, boolean nonFirst) {

    if(iter.hasNext()){

    String curr1 = iter.next();

    // Last item: prefix " and " unless it is also the first item.
    if(!iter.hasNext()) return (nonFirst?" and ":"") + curr1;

    // Earlier items: prefix ", " unless this is the first item.
    else return (nonFirst?", ":"") + curr1 + print(iter, true);

    }

    return "";

    }

  • Here is a perl solution (with tests):

    #!/net/bin/perl

    use strict;

    use warnings;

    use Test::More 'tests' => 4;

    is(

       concat(),

       '{}',

    );

    is(

       concat('ABC'),

       '{ABC}',

    );

    is(

       concat('ABC', 'DEF'),

       '{ABC and DEF}',

    );

    is(

       concat('ABC', 'DEF', 'G', 'H'),

       '{ABC, DEF, G and H}',

    );

    exit;

    #################

    sub concat

    {

       my @parts = @_;

       if ( not @parts )

       {

           return '{}';

       }

       if ( scalar @parts < 2 )

       {

           return '{' . $parts[0] . '}';

       }

       my $last = pop @parts;

       return '{' . join( ', ', @parts ) . ' and ' . $last .'}';

    }

    I wrote this before reading the comments and was pleasantly surprised to see that Olivier Leclant had used the same technique.

    A different version that simply joins with comma and then substitutes the last comma:

    sub concat

    {

       my $string = join ', ', @_;

       # assuming that parts are just capital ascii letters

       $string =~ s{ , \s ([A-Z]+) \z }{ and $1}smx;

       return '{' . $string . '}';

    }

    But note the massive assumption of what the data is.  

  • One thing I'm seeing a lot of from the imperative side of the room is "where are we in the string?" ifs in a single loop. While I am primarily an imperative programmer, I think this is less elegant than it could be. Consider the following python example:

    def default_head_format(part):

        return str(part)

    def default_body_format(part):

        return ", %s" % str(part)

    def default_tail_format(part):

        return " and %s" % str(part)

    def english_list(sequence,

                    head_format=default_head_format,

                    body_format=default_body_format,

                    tail_format=default_tail_format):

        """

        Converts a sequence (like a list) into an english-language string.

        (1) If the sequence is empty then the resulting string is "".

        (2) If the sequence is a single item "ABC" then the resulting string is "ABC".

        (3) If the sequence is the two item sequence "ABC", "DEF" then the resulting string is "ABC and DEF".

        (4) If the sequence has more than two items, say, "ABC", "DEF", "G", "H" then the resulting string is "ABC, DEF, G and H". (Note: no Oxford comma!)

        """

        # Split up sequence into three subsequences:

        #  - the sequence containing the first item (head)

        #  - the sequence containing the last item (tail)

        #  - the sequence containing all other items (body)

        #

        # Relies on the fact that [][0:1] is [], not an error.

        head = sequence[0:1]

        body = sequence[1:-1]

        tail = sequence[-1:0]

        return "%s%s%s" % (

            "".join(map(head_format, head)),

            "".join(map(body_format, body)),

            "".join(map(tail_format, tail))

        )

    Look, ma, no branches! All the conditional logic is handled by breaking up the input list into three (possibly-empty) lists. The "".join construct is idiomatic python and is (at the time of this writing) much faster than looping and concatenating strings.

    A sample test run:

    >>> import fabstring

    >>> fabstring.english_list([])

    ''

    >>> fabstring.english_list([1])

    '1'

    >>> fabstring.english_list([1,2])

    '1 and 2'

    >>> fabstring.english_list([1,2,3])

    '1, 2 and 3'

    >>> fabstring.english_list([1,2,3,4])

    '1, 2, 3 and 4'

  • If you want to run the above perl the #! line should actually be #!/usr/bin/perl.  I forgot that the terminal I had open was to a box where we do not use the system perl.

  • Serves me right for not checking that I'd *saved* my code before the last test: there's a bug in that, around the fact that the slice [-1:0] is always empty.

    def english_list(sequence,

                    head_format=default_head_format,

                    body_format=default_body_format,

                    tail_format=default_tail_format):

        """

        Converts a sequence (like a list) into an english-language string.

        (1) If the sequence is empty then the resulting string is "".

        (2) If the sequence is a single item "ABC" then the resulting string is "ABC".

        (3) If the sequence is the two item sequence "ABC", "DEF" then the resulting string is "ABC and DEF".

        (4) If the sequence has more than two items, say, "ABC", "DEF", "G", "H" then the resulting string is "ABC, DEF, G and H". (Note: no Oxford comma!)

        """

        # If we get a singleton, or an empty list, handle it properly.

        if len(sequence) <= 1:

            return "".join(map(head_format, sequence))

        # Split up sequence into three subsequences:

        #  - the sequence containing the first item (head)

        #  - the sequence containing the last item (tail)

        #  - the sequence containing all other items (body)

        #

        # Relies on the fact that [][0:1] is [], not an error.

        head = sequence[:1]

        body = sequence[1:-1]

        tail = sequence[-1:]

        return "%s%s%s" % (

            "".join(map(head_format, head)),

            "".join(map(body_format, body)),

            "".join(map(tail_format, tail))

        )

    is correct.

  • private string StringAte(IEnumerable<string> strings)

       {

           StringBuilder list = new StringBuilder("{");

           int lastComma = -1;

           string comma = string.Empty;

           foreach (string s in strings)

           {

               list.AppendFormat("{0}{1}", comma, s);

               if (comma.Length < 1)

               {

                   comma = ", ";

               }

               else

               {

                   lastComma = list.Length - s.Length - comma.Length;

               }

           }

           if (lastComma > 0)

           {

               list.Replace(comma, " and ", lastComma, comma.Length);

           }

           list.Append("}");

           return list.ToString();

       }

  •    class Program

       {

           static void Main(string[] args)

           {

               Console.WriteLine(AppendWords(null));

               Console.WriteLine(AppendWords(string.Empty));

               Console.WriteLine(AppendWords("ABC"));

               Console.WriteLine(AppendWords("ABC", "DEF"));

               Console.WriteLine(AppendWords("ABC", "DEF", "GHI"));

               Console.ReadKey();

           }

           static string AppendWords(params string[] words)

           {

               if (words == null || words.Length == 0)

               {

                   return "{}";

               }

               StringBuilder builder = new StringBuilder();

               int counter = 0;

               int wordCountToAppend = 0;

               builder.Append("{");

               do

               {

                   builder.Append(words[counter]);

                   counter++;

                   wordCountToAppend = words.Length - counter;

                   if (counter >= 1

                       && wordCountToAppend >= 2)

                   {

                       builder.Append(", ");

                   }

                   else if (wordCountToAppend == 1)

                   {

                       builder.Append(" and ");

                   }

               } while (counter < words.Length);

               builder.Append("}");

               return builder.ToString();

           }

       }

  • My first idea was a solution a lot like Jon Skeet's.  But I ended up with something inspired by my old Perl methods, where arrays acted like stacks, and the differences between nulls and undefineds and empty string were often sort of hand-waved away.

    Restating the problem (parts stolen from Jon):

    1) We always start with "{" and end with "}" (stolen from Jon, of course)

    2) If there's more than one item, join all but the tail item with commas, and append " and " + the tail item. (covers Eric's Cases 3 and 4)

    3) Otherwise, return the zero or one item in the list.  (covers Eric's Cases 1 and 2)

    with some reorganizing of code, taking advantage of the fact that an array join on an array with one item does what we want it to do here:

    static string Joiner(IEnumerable<string> strings)

    {

       var stack = new Stack<string>(strings);

       string last = String.Empty, rest = String.Empty;

       if (stack.Count > 0)

       {

           last = stack.Pop();

           if (stack.Count > 0)

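               // Stack<T>.ToArray() returns items newest-first, so reverse (via System.Linq) to restore input order.

               rest = String.Join(", ", stack.Reverse().ToArray()) + " and ";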

       }

       return '{' + rest + last + '}';

    }

  •    private string BuildString(IEnumerable<string> list)

       {

         StringBuilder sb = new StringBuilder("{");

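         // Join with ", " so the separators match the required output.
         string values = string.Join(", ", list.ToArray());

         // Caution: this also matches a ", " that is part of the final item itself.
         int pos = values.LastIndexOf(", ");

         if(pos > -1)

         {

           values = values.Remove(pos, 2);

           values = values.Insert(pos, " and ");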

         }

         sb.Append(values);

         sb.Append("}");

         return sb.ToString();

       }

  • So my solution was to just join the strings with a comma, and then find the last comma and replace it with an " and ". This satisfies the requirements as stated in the original question.
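As a hedged illustration of where join-then-replace can go wrong: the challenge allows items that themselves contain commas, and a search over the joined text cannot tell those from separators. The sketch below contrasts the two approaches. `LastCommaDemo`, `ReplaceLastComma` and `SplitThenJoin` are names invented here for illustration, not code from any commenter.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class; both methods copy the sequence explicitly with ToArray().
static class LastCommaDemo
{
    // Join everything, then swap the final ", " for " and ".
    // The search runs over the joined text, so it cannot distinguish
    // a separator from a comma that belongs to an item.
    public static string ReplaceLastComma(IEnumerable<string> items)
    {
        string joined = string.Join(", ", items.ToArray());
        int pos = joined.LastIndexOf(", ", StringComparison.Ordinal);
        if (pos >= 0)
            joined = joined.Remove(pos, 2).Insert(pos, " and ");
        return "{" + joined + "}";
    }

    // Set the last item aside before joining, so no searching is needed.
    public static string SplitThenJoin(IEnumerable<string> items)
    {
        string[] all = items.ToArray();
        if (all.Length < 2)
            return "{" + string.Concat(all) + "}";
        string leading = string.Join(", ", all, 0, all.Length - 1);
        return "{" + leading + " and " + all[all.Length - 1] + "}";
    }
}
```

For the input "A", "B", "C, D", `ReplaceLastComma` returns "{A, B, C and D}", while `SplitThenJoin` returns "{A, B and C, D}", which is what rule (4) asks for. On inputs without embedded commas the two agree.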

  • using System.Collections.Generic;

    using System.Linq;

    namespace CommaQuibbling

    {

       internal class Translator

       {

           public string Translate(IEnumerable<string> items)

           {

               return "{" + Join(items) + "}";

           }

           private static string Join(IEnumerable<string> items)

           {

               var leadingItems = LeadingItemsFrom(items);

               var lastItem = LastItemFrom(items);

               return JoinLeading(leadingItems) + lastItem;

           }

           private static IEnumerable<string> LeadingItemsFrom(IEnumerable<string> items)

           {

               return items.Reverse().Skip(1).Reverse();

           }

           private static string LastItemFrom(IEnumerable<string> items)

           {

               return items.Reverse().FirstOrDefault();

           }

           private static string JoinLeading(IEnumerable<string> items)

           {

               if (items.Any() == false) return "";

               return string.Join(", ", items.ToArray()) + " and ";

           }

       }

    }

  • OK, writing from Australia, which is why I'm so behind everyone else. I haven't looked at other people's solutions, and I'm sure I'm just replicating everyone else's code, but here goes:

    /// <summary>

    /// Formats an enumeration of strings using commas as

    /// dictated by English grammar rules.

    /// </summary>

    /// <example>

    /// {} -&gt; "{}"

    /// {"ABC"} -&gt; "{ABC}"

    /// {"ABC", "DEF"} -&gt; "{ABC and DEF}"

    /// {"ABC", "DEF", "G", "H"} -&gt; "{ABC, DEF, G and H}"

    /// </example>

    /// <param name="strings">Enumeration of strings to join</param>

    /// <returns>A comma separated list of the strings with the last

    /// two elements separated by 'and'</returns>

    public string EnglishJoin(IEnumerable<string> strings) {

    Queue<string> stringQueue = new Queue<string>();

    StringBuilder sb = new StringBuilder();

    sb.Append("{");

    foreach (string s in strings) {

    stringQueue.Enqueue(s);

    if (stringQueue.Count > 2) {

    sb.Append(stringQueue.Dequeue());

    sb.Append(", ");

    }

    }

    // Last two strings are separated by ' and '

    if (stringQueue.Count > 0) {

    sb.Append(stringQueue.Dequeue());

    if (stringQueue.Count > 0) {

    sb.Append(" and ");

    sb.Append(stringQueue.Dequeue());

    }

    }

    sb.Append("}");

    return sb.ToString();

    }

  • Why not do it backward? Read the enumerable into a stack and build the output keeping an index:

    index == 0 : result = item

    index == 1 : result = item & " and " & result

    index > 1 : result = item & ", " & result

    Actually there is no point in lazy enumeration, as a complete result requires the whole range to be enumerated and a partial result doesn't seem to be worth much.

    However, the stack in this solution requires memory allocation of (size of the enumerable * size of an object reference). To avoid this one can apply a peeking mechanism to find out when one is at index last - 2.

    something like

    result.Append("{");

    if (e.MoveNext())

    {

       result.Append(e.Current);

       if (e.MoveNext()) {

           last = e.Current;

           while (e.MoveNext())

           {

               result.Append(", ");

               result.Append(last);

               last = e.Current;

           }

           result.Append(" and ");

           result.Append(last);

       }

    }

    result.Append("}");
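The backward, stack-based build described at the top of this comment can be sketched as follows. This is a minimal illustration rather than a tuned implementation: the names `BackwardJoin` and `Format` are invented here, the surrounding braces are added to match the challenge, and the repeated string concatenation is kept for clarity even though it copies the result on every step.

```csharp
using System.Collections.Generic;

// Hypothetical class name; implements the three index rules listed above.
static class BackwardJoin
{
    // Builds the result from the last item toward the first.
    public static string Format(IEnumerable<string> items)
    {
        // Copying into a stack means Pop() hands back the last item first.
        var stack = new Stack<string>(items);
        string result = "";
        int index = 0; // 0 for the last item, 1 for the one before it, and so on
        while (stack.Count > 0)
        {
            string item = stack.Pop();
            if (index == 0)
                result = item;                     // last item stands alone
            else if (index == 1)
                result = item + " and " + result;  // second to last gets " and "
            else
                result = item + ", " + result;     // everything earlier gets ", "
            index++;
        }
        return "{" + result + "}";
    }
}
```

Because the separator depends only on the distance from the end, no item's text is ever searched, so embedded commas or the word "and" cause no trouble.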
