Comma Quibbling

Comma Quibbling

Rate This

[UPDATE: Holy goodness. Apparently this was a more popular pasttime than I anticipated. There's like a hundred solutions in there. Who knew there were that many ways to stick commas in a string? It will take me some time to go through them all, so don't be surprised if it's a couple of weeks until I get them all sorted out.]

The point of Monday’s post about comma-separated lists was not so much about the actual problem; it’s a rather trivial problem. Rather, I wanted to make two points. First, stating the actual problem rather than a much harder and more general version of the problem is likely to get you a realistic solution to your actual problem much faster. And second, reworking the statement of the problem into an equivalent but structurally different statement is a great way to see solutions that you might have otherwise missed.

But whenever I make a post illustrating such points with a specific example, lots of people pipe up with their ideas for how to solve the specific example. Which is awesome; I encourage this behaviour.

So in that spirit, here’s a slightly harder version of the string concatenation problem, just for the fun of it. Write me a function that takes a non-null IEnumerable<string> and returns a string with the following characteristics:

(1) If the sequence is empty then the resulting string is "{}".
(2) If the sequence is a single item "ABC" then the resulting string is "{ABC}".
(3) If the sequence is the two item sequence "ABC", "DEF" then the resulting string is "{ABC and DEF}".
(4) If the sequence has more than two items, say, "ABC", "DEF", "G", "H" then the resulting string is "{ABC, DEF, G and H}". (Note: no Oxford comma!)

I think you get the idea. You can post your solution in the comments or use the link on the blog page to email your solution to me.

The strings in the sequence can be assumed to be non-null but can otherwise be any string value, including empty strings or strings containing commas, braces and "and".

There’s no size limit on the sequence; it could be tiny, it could be thousands of strings. But it will be finite.

All you get are the methods of IEnumerable<string>; if you want to make that thing into a list or an array, you’re going to need to do that explicitly rather than casting it and hoping for the best.

I am particularly interested in solutions which make the semantics of the code very clear to the code maintainer.

Of course, C# is most interesting to me, but if there are neat ways to express this in other languages, I’d love to see them too.

If there are any particularly amusing or interesting implementations I’ll dissect them on the blog in a future episode, probably in a week or so. I’m not going to have time to do a detailed analysis of every one.

And… go!

• Great solutions. But If I have to pick one for readability alone (ignoring efficiency), I will pick Oliver's solution. His code is shortest and crystal clear in what it does.

• -- Haskell one pass solution

--

-- Problem restatement (skipping empty and one word case)

-- *) first word is alone

-- *) last word has " and " prefix

-- *) every other word has ", " prefix

--

-- Tail recursion optimization should be possible and stack space shouldn't be a problem

quibbler :: [String] -> String

quibbler words = "{" ++ (quibHelper words True) ++ "}"

where

quibHelper :: [String] -> Bool -> String -- where the Bool == True iff the function is called for the first time

quibHelper [] True = ""

quibHelper [] False = undefined -- kind of like assert

quibHelper [onlyWord] True = onlyWord

quibHelper [lastWord] False = " and " ++ lastWord

quibHelper (x:xs) True = x ++ (quibHelper xs False)

quibHelper (x:xs) False = ", " ++ x ++ (quibHelper xs False)

-- my Haskell-fu is weak, I'm sure one can do better than that

• Just for laughs, a Mathematica solution using pattern matching. Used like Foo[{"ABC", "DEF", "G", "H"}]

Foo[strings_] := StringJoin["{", strings, "}"]

Foo[{most__, last_}] := StringJoin["{", Riffle[{most}, ", "], " and ", last, "}"]

• Wish I had time to read all the submissions.  Some really interesting ones, particular the use of a queue (wish I'd thought of that).  Anyhow, here's another compact explicit one-pass .NET 2.0-compat entry with a slight twist in that it retains and relies on the last item's position in the string for post-enum patch-up.  Seems clear to me. :)

static string LippertJoin(IEnumerable<string> items) {

StringBuilder sblist = new StringBuilder();

int lastItemPos = -1;

foreach (string item in items) {

lastItemPos = sblist.Length;

sblist.Append(item + ", ");

}

string list = sblist.ToString(0, ((lastItemPos == -1) ? 0 : sblist.Length - 2));

if (lastItemPos > 0)

list = list.Substring(0, lastItemPos - 2) + " and " + list.Substring(lastItemPos);

return "{" + list + "}";

}

• A Javascript solution (assuming an Array input) so Eric won't forget his roots:

function makeList(array)

{

var a = array.concat(), last = a.pop() || "";

return "".concat(

"{", a.length? a.join(", ")+" and " : "", last, "}"

);

}

Sure it could be made longer and clearer, but where's the fun in that?

• Actually, the last two code lines should have been:

quibHelper (firstWord:words) True = firstWord ++ (quibHelper words False)

quibHelper (middleWord:words) False = ", " ++ middleWord ++ (quibHelper words False)

(indeed, better named variables convey intention better)

• I have two. Once is pretty readable, but does not scale to huge lists. The other uses straight enumeration and doesn't take a local copy.

[Test]

public void TestConvertToList()

{

var input1 = new string[] { };

var input2 = new string[] { "ABC" };

var input3 = new string[] { "ABC", "DEF" };

var input4 = new string[] { "ABC", "DEF", "G", "H" };

var expectedOutput1 = "{}";

var expectedOutput2 = "{ABC}";

var expectedOutput3 = "{ABC and DEF}";

var expectedOutput4 = "{ABC, DEF, G and H}";

Assert.AreEqual(expectedOutput1, ConvertToBracedEnglishSentence_EfficientForLargeLists(input1));

Assert.AreEqual(expectedOutput2, ConvertToBracedEnglishSentence_EfficientForLargeLists(input2));

Assert.AreEqual(expectedOutput3, ConvertToBracedEnglishSentence_EfficientForLargeLists(input3));

Assert.AreEqual(expectedOutput4, ConvertToBracedEnglishSentence_EfficientForLargeLists(input4));

}

/// <summary>

/// This method assumes that the input collection is not huge.

/// If it is huge, creating a local copy of the collection

/// will cost time and memory.

/// This method is very readable.

/// </summary>

/// <param name="inputCollection"></param>

/// <returns></returns>

{

// Convert to fixed list so that we know the count

List<string> items = new List<string>(inputCollection);

StringBuilder sentence = new StringBuilder();

for (int i = 0; i < items.Count; i++)

{

// All items except the last two

if (i < items.Count - 2)

{

sentence.AppendFormat("{0}, ", items[i]);

}

// The second to last item

else if (i == items.Count - 2)

{

sentence.AppendFormat("{0} and ", items[i]);

}

// The last item

else

{

sentence.Append(items[i]);

}

}

// Add the braces around the result

return string.Format("{{{0}}}", sentence.ToString());

}

/// <summary>

/// This method does not require a local copy of the input

/// collection. That should make it faster and less

/// memory hungry for large input lists.

/// It is a but less readable, but still fairly clear:

///   For each item in the inputCollection,

///   if it's not the first item, append ", " (and remember where we put it), then

///   append the item to the sentence.

///   After all items have been added, add the closing brace "}"

///   If we inserted a comma, replace the last one with " and ".

///   Then return the sentence surrounded by braces.

/// </summary>

/// <param name="inputCollection"></param>

/// <returns></returns>

private static string ConvertToBracedEnglishSentence_EfficientForLargeLists(IEnumerable<string> inputCollection)

{

int indexOfLastCommaInsert = -1;

bool firstItem = true;

StringBuilder sentence = new StringBuilder();

sentence.Append("{");

foreach (string item in inputCollection)

{

if (!firstItem)

{

indexOfLastCommaInsert = sentence.Length;

sentence.Append(", ");

}

firstItem = false;

sentence.Append(item);

}

sentence.Append("}");

if (indexOfLastCommaInsert >= 0)

{

sentence.Replace(", ", " and ", indexOfLastCommaInsert, 2);

}

return sentence.ToString();

}

• Thanks for great posts, Eric!

As I saw this question, I came up with all kinds of efficient solutions, but variants of them have already been posted here.

So I tried to think of the most unusual solution in C# that no one else would think of.

My solution is inefficient and cryptic, it's presented only as a mind game, and of course I would never write this kind of code in my projects. Nevertheless - it is very cool (it's recursive!). See if you can understand why and how it works:

private static string JoinWords(IEnumerable<string> e)

{

IEnumerator<string> en = e.GetEnumerator();

en.Reset();

string dummy;

return "{" + (en.MoveNext() ? JoinRecursive(en, out dummy) : string.Empty) + "}";

}

private static string JoinRecursive(IEnumerator<string> en, out string sep)

{

string result = en.Current;

if (en.MoveNext())

{

string rest = JoinRecursive(en, out sep);

result += sep + rest;

sep = ", ";

}

else

{

sep = " and ";

}

return result;

}

• My serious attempt looked much like Olivier's, except with a "l.Take(l.Count() - 1)" in place of GetRange( ), which is probably inferior.  Then I started looking for the tersest possible solution.  I haven't been able to top this:

public string NOxfordComma_Silly(IEnumerable<string> l)

{

var i = 0;

return "{" + l.Reverse().Aggregate("", (a, b) => b + new[] {"", " and ", ", "}[Math.Min(i++, 2)] + a) + "}";

}

Obviously pretty crummy from an efficiency and readability standpoint, but it's always fun to find perverse misuses for the LINQ extension methods. :)

format :: [String] -> String

format l = "{"++(sep l)++"}" where

sep [] = ""

sep [a] = a

sep [a,b] = a++" and "++b

sep (a:l) = a++", "++(sep l)

• Some interesting solutions here, but nothing worthy of enterprise production code.  What's the deal... I don't even see any factories or XML!

Admittedly I don't have a lot of time, but here's my first pass at a real enterprise solution.  I'll add an XML-serving web service later. Should work with VS2008/.NET 3.5.

using System;

using System.Collections.Generic;

using System.Linq;

using System.Text;

namespace Scratch

{

class Program

{

static void Main(string[] args)

{

string[] input = { "ABC", "DEF", "G", "H" };

Console.WriteLine(EnumerableStringFormatter.FormatInput(input));

}

}

class EnumerableStringFormatter

{

public static string FormatInput(IEnumerable<string> input)

{

var words = new List<string>();

var separators = new List<string>();

var pairs = new List<WordFactory.WordSeparatorPair>();

// get words

foreach (var word in WordFactory.GetWord(input))

// get separators

foreach (var separator in WordFactory.GetSeparator(input))

// combine them

for (int i = 0; i < words.Count; i++)

// convert to a string

StringBuilder sb = new StringBuilder("{");

foreach (var pair in pairs)

sb.Append(String.Format("{0}{1}", pair.Separator, pair.Word));

return sb.Append("}").ToString();

}

class WordFactory

{

public static IEnumerable<string> GetWord(IEnumerable<string> input)

{

foreach (var s in input)

yield return s;

}

public static IEnumerable<string> GetSeparator(IEnumerable<string> input)

{

yield return "";

var a = input.Skip(1).TakeWhile(s => s != input.Last());

foreach (var s in a)

yield return ", ";

yield return " and ";

}

public class WordSeparatorPair

{

private string _word;

private string _separator;

public string Word { get { return _word; } }

public string Separator { get { return _separator; } }

public WordSeparatorPair(string word, string separator)

{

_word = word;

_separator = separator;

}

}

}

}

}

PS: I'm really not as familiar with LINQ as I'd like to be, so there are probably better^Wworse ways to do this.

• PPS: For what it's worth, here was my first blind attempt at solving the problem.  As with others, I usually prefer a simple and straightforward solution compared to a clever shorter one.  (However, I do enjoy reading sneaky clever code -- just not at work).

static string FormatEnumerableStrings(IEnumerable<string> input) {

var sb = new StringBuilder("{");

var strings = input.ToList();

int count = strings.Count;

if (count > 0) {

sb.Append(strings[0]);

for (int i = 1; i < count - 1; i++) {

sb.Append(", ");

sb.Append(strings[i]);

}

if (count > 1) {

sb.Append(" and ");

sb.Append(strings[count - 1]);

}

}

return sb.Append("}").ToString();

}

• I went for as terse as possible with Linq. I don't preserve the order, but I don't think that was a requirement. It is also fairly effecient as it only revisits the first couple items.

static string FormatStrings(IEnumerable<string> strings)

{

return string.Format("{{{0}}}", string.Concat(

strings.Skip(2).Select((x) => x + ", ").Concat(

strings.Skip(1).Take(1).Select((y) => y + " and ")).Concat(

strings.Take(1).Select((z) => z)).ToArray()));

}//method

• I realized that given the rules the order probably does matter, in which case this version, which is much less efficient but almost equally terse, works.

static string FormatStrings(IEnumerable<string> input)

{

var reordered = input.Reverse().Take(2).Concat(input.Reverse().Skip(2).Reverse());

return string.Format("{{{0}}}", string.Concat(

reordered.Skip(2).Select((x) => x + ", ").Concat(

reordered.Skip(1).Take(1).Select((y) => y + " and ")).Concat(

reordered.Take(1).Select((z) => z)).ToArray()));

}//method

• Minor correction for Hristo's F# pattern matching solution, which breaks for lists of odd length > 1:

let format (words : list<string>) =

let rec makeList (words : list<string>) =

match words with

| [] -> ""

| first :: [] -> first

| first :: second :: [] -> first + " and " + second

| first :: rest -> first + ", " + (makeList rest)

"{" + (makeList words) + "}"

Page 4 of 19 (277 items) «23456»