In the previous post, I’ve come up with this interview question:
In a given .NET string, assume there are line breaks in standard \r\n form (basically Environment.NewLine). Write a method that inserts a space between two consecutive line breaks to separate any two line breaks from each other.
In a given .NET string, assume there are line breaks in standard \r\n form (basically Environment.NewLine).
Write a method that inserts a space between two consecutive line breaks to separate any two line breaks from each other.
Update: Rik Hemsley has posted an absolutely fantastic summary in form of a Visual Studio solution with all the answers from the comments, unit-tests and benchmarks:
http://code.google.com/p/kirill-question/
Jordan Terrell’s RegExp-based solution seems to be the fastest one.
Thanks Rik! This is very helpful!
First I’ll explain how I came up with the problem. I was writing some code that generated HTML, specifically, some text inside the <PRE> tag, interspersed with <BR /> tags. I’m not an expert in HTML, that’s why it came as a surprise to me that if you have multiple consecutive <BR /> elements, the browser will only render one line break (well, at least the browser I checked it on - IE8). My code generated the <BR /> tags from Environment.NewLine substrings in the source text. So, to preserve empty lines in the resulting HTML, I had to intersperse the line breaks with at least one space – then IE would keep the empty lines when rendering <PRE> contents. I didn’t keep direct line breaks because some HTML viewers that I care about don’t render them correctly.
Anyway, I’ve gotten a lot of great responses that contain a lot of food for thought. Mostly I was amazed how the responses actually reflected my own thinking as I was coming up with a solution. My ultimate solution (which was also correctly suggested by ram) was this:
public static string IntersperseLineBreaks(string source) { string result = source.Replace("\n\r", "\n \r"); return result; }
A temporary local variable here is for your debugging convenience until my team comes up with a way to inspect the return value of a method in the debugger.
Of course, banko correctly noted that the problem is stated ambiguosly, since Environment.NewLine is “\r\n” on Windows, and “\n” on Unix. Besides, as Jordan Terrell mentions, the behavior is unclear on source texts that contain wild mixes like “\r\n\r” and “\r\n\n”. Jordan also suggests that I include unit-tests to specify correct solutions.
Well, I apologize for a problem statement that is not specified well enough (I simply didn’t have time to be that thorough!). I didn’t do it on purpose, but now when I think about it, most real-world requirements are ambiguous and not well specified. It is always interesting to see how a candidate deals with ambiguity and unclear requirements. So be prepared to expect unclear problem statements in your interviews and deal with them gracefully.
There are several good strategies to impress the interviewer in this case, the simplest being to ask clarifying questions in the places where you think the requirements are unclear. Let them know that you are trying to explicitly understand and model the requirements in your head and are filling in the gaps.
Another approach is to explicitly identify and parameterize the variability in the requirements and give a generic solution that works in all cases. Finally, if the problem statement is unclear, you can totally assert, assume or demand requirements that make your life easier. For instance, I remember I said “let’s assume that C# has tuples” during my interview at Microsoft. The interviewer laughed and said OK. I produced a nice solution that used tuples, and then he asked how would I change it to not use tuples. But that’s a totally different question ;)
If I were an interviewer (and I’ve never been one yet) and I would happen to ask this question, I guess I would look at several things in answers:
So I guess it’s impossible to declare a single winner because different solutions win when different assessment criteria are used. In terms of readability, getting things done and reusing as much as possible, I like ram's solution (and mine :P), even if it won’t work in certain scenarios (I’m OK with making assumptions about the source text). In terms of algorithm implementation, other solutions are certainly much better, those that use a StringBuilder or build the resulting array char-by-char. Finally, in terms of expressiveness I really liked some declarative solutions (using LINQ), even if they’re less effective (e.g. use string concatenation).
One solution deserves some special explanation since it nicely demonstrates the core of the problem. The Replace method continues matching substrings after the end of the replacement, so if your replacement patterns overlap, it won’t replace the second one:
To overcome this problem, some people suggested the following:
while (str.Contains("\r\n\r\n")) str = str.Replace("\r\n\r\n", "\r\n \r\n");
Note that this loop will never run more than twice (this is where I goofed myself). GSEJ corrected me, pointing out that only two passes are necessary – first pass leaves a maximum of two consecutive line breaks, second pass eliminates them all.
However we notice that in the replacement pattern (“\r\n\r\n”, “\r\n \r\n”), the start and the end are the same (hence the first and last chars overlap with previous and next occurrences of the pattern). The pattern is redundant, so we can make it shorter and only replace the crux of the pattern: Replace(“\n\r”, “\n \r”):
This pattern is non-redundant and doesn’t overlap with the neighbors, hence no second pass required.
Well, anyway, thanks everyone for your answers! It was truly interesting for me and hopefully educational and entertaining for some of you.
you can speed up Jordan's by precompiling the RegEx and caching it in a static:
static readonly Regex _re = new Regex (@"(\r\n)(?=\r\n)", RegexOptions.Compiled);
public override string InsertSpaceBetweenCrLfs(string input)
{
return _re.Replace (input, "$1 ");
}
however, this is nearly twice as fast (on my box):
public override string InsertSpaceBetweenCrLfs (string input)
// check for null
if (input == null)
throw new ArgumentNullException ("input");
int cch = input.Length;
// simple boundary condition, helps later
if (cch < 4)
return input;
// pre-allocate space
var sb = new StringBuilder (cch * 3 / 2 + 1);
// handle boundary condtion later, instead of in loop
cch -= 3;
int ichStart = 0; // the start of the current span
int ichEnd = 0; // the end of the current span
while (ichEnd < cch)
int ch = input [ichEnd++];
// check for double newlines, won't overflow
if (ch == '\r' && input [ichEnd] == '\n' && input [ichEnd + 1] == '\r' && input [ichEnd + 2] == '\n')
// copy the previous span
sb.Append (input, ichStart, ichEnd - ichStart);
// add a space
sb.Append (' ');
// start the new span
ichStart = ++ichEnd;
// copy the remaining characters
sb.Append (input, ichStart, cch - ichStart + 3);
return sb.ToString ();
Kirill, apologies for spelling your name incorrectly initially. I hate it when people do that to me!
I've put the code here: http://code.google.com/p/kirill-question/
PiersH, your solution isn't as fast as the the regular expression version, on my test data, which is simply 2MB of Lorem Ipsum. I chose this data as it seemed likely it would be similar to real world scenarios.
I've also added the caching to Jordan's version and, as expected, it speeds up somewhat. In normal code, I generally don't bother caching regular expressions, as they tend not to be (de-)allocated in a tight loop, but my artificial benchmark does this, so we get to see the difference caching can make.
@Rik, i noticed a little copy/paste bug here:
AddBenchmark("PiersH2", () => new PiersH().InsertSpaceBetweenCrLfs(text), benchmark);
once fixed, i get these results:
Name Milliseconds
Ddkk 17.51445000
JaredParsons 1828.32110000
JordanTerrell 19.67060196
JordanTerrellCached 17.04021525
KooKiz2 20.55781633
PiersH 49.96797619
PiersH2 9.71894175
Robert2 20.53869184
Rik: no problem at all! I really appreciate you summarizing the solutions and giving us a way to benchmark and compare them. Some really interesting results there!
PiersH2: oh wow, it's awesome that you were able to bring it down to 9.7
PiersH: I fixed my code to call your updated solution, but I don't see it running that fast. Do you have some updated code?
It also fails the unit tests.
Example: "\r\n\r\n\r\n"
Expected: "\r\n \r\n \r\n"
But was: "\r \r \r\n"
Ddkk 14.25188310
fholm 59.60957647
JaredParsons 1222.55470000
JordanTerrell 13.11744545
JordanTerrellCached 10.87201828
KooKiz2 16.34176129
PiersH 49.39638571
PiersH2 20.95233333
Robert2 16.28622097
Warturtle 47.26178636
Cool.
By the way, what tool do you use for making the image?
It looks great.
Thanks Benjamin :) I use PowerPoint (2010) SmartArt
I get similar timings to Piers. (In Release build (x64))