I recently came across a bug in FileStream’s internal buffer where the behavior is non-intuitive, so I thought I would discuss it.
FileStream uses one internal buffer for both reading and writing (scary, but memory efficient). Unfortunately, this means you need to understand how that buffer behaves to avoid some issues, especially with non-seekable streams, which cannot seek back to recover buffered data.
Note: Unless you specify a particular buffer size in the constructor, FileStream uses a default internal buffer size of 4K.
1) When you read ‘n’ bytes (where n < bufferSize), FileStream attempts to read ‘bufferSize’ bytes into its internal buffer. If the read succeeds, it returns only the ‘n’ bytes you asked for and caches the rest in its internal buffer. At this stage there are (bufferSize-n) bytes of unread data left in the internal buffer. This bulk reading is desirable for most streams for performance reasons.
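To make the read-ahead concrete, here is a minimal sketch in Python. It is a toy model, not FileStream's actual code: the class name `ReadAheadStream` and its fields are hypothetical, and an in-memory `io.BytesIO` stands in for the file.

```python
import io

class ReadAheadStream:
    """Toy model (hypothetical) of a stream that fills a whole buffer on a miss."""

    def __init__(self, raw, buffer_size=4096):
        self.raw = raw                  # underlying byte source
        self.buffer_size = buffer_size  # 4096 mirrors the 4K default mentioned above
        self.buf = b""                  # bytes fetched but not yet handed to the caller

    def read(self, n):
        if not self.buf:
            # Buffer is empty: fetch a whole buffer's worth, not just n bytes.
            self.buf = self.raw.read(self.buffer_size)
        # Hand out at most n bytes and keep the rest cached.
        out, self.buf = self.buf[:n], self.buf[n:]
        return out

source = io.BytesIO(bytes(10000))
stream = ReadAheadStream(source, buffer_size=4096)
data = stream.read(100)        # the caller asks for 100 bytes
leftover = len(stream.buf)     # 4096 - 100 bytes remain cached
consumed = source.tell()       # the source has already advanced by 4096
print(len(data), leftover, consumed)
```

The point to notice is that the underlying source has moved a full buffer ahead even though the caller has only seen 100 bytes.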
Of course, the obvious question is: what happens if you switch to writing at this stage?
Since FileStream uses the same buffer for both reading and writing, it first has to get rid of the remainder of the read buffer somehow.
For seekable streams this is not a big deal, as FileStream can seek back to the appropriate position. For non-seekable streams, however, this internal buffering is less than desirable, because you would lose the remaining data in the buffer. This was a bug in Everett (.NET Framework 1.1) that has been fixed in Whidbey (.NET Framework 2.0) Beta 2. The fix was simply to avoid buffering data for non-seekable streams.
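The difference between the two cases can be sketched with a small Python model. This is only an illustration under stated assumptions: `SharedBufferStream`, `prepare_for_write` and the `seekable` flag are hypothetical names, the buffer is 8 bytes to keep the numbers readable, and the actual write is left out so that only the effect on unread data is visible.

```python
import io

class SharedBufferStream:
    """Toy model (hypothetical): one buffer shared by reads and writes."""

    def __init__(self, raw, seekable, buffer_size=8):
        self.raw = raw
        self.seekable = seekable        # assumption: a flag we set by hand
        self.buffer_size = buffer_size
        self.read_buf = b""

    def read(self, n):
        if not self.read_buf:
            self.read_buf = self.raw.read(self.buffer_size)
        out, self.read_buf = self.read_buf[:n], self.read_buf[n:]
        return out

    def prepare_for_write(self):
        """Empty the shared buffer so it could hold pending write data."""
        unread = len(self.read_buf)
        if unread and self.seekable:
            # Seekable: rewind past the bytes that were prefetched but unread.
            self.raw.seek(-unread, io.SEEK_CUR)
        # Either way the prefetched bytes leave the buffer.
        self.read_buf = b""

def demo(seekable):
    s = SharedBufferStream(io.BytesIO(b"ABCDEFGHIJKLMNOP"), seekable)
    first = s.read(2)          # buffer now holds the 6 unread bytes "CDEFGH"
    s.prepare_for_write()      # the switch to writing happens here
    after = s.read(2)          # what the caller sees next
    return first, after

seekable_result = demo(seekable=True)       # (b"AB", b"CD"): nothing lost
non_seekable_result = demo(seekable=False)  # (b"AB", b"IJ"): "CDEFGH" is gone
print(seekable_result, non_seekable_result)
```

In the non-seekable case the six prefetched bytes simply vanish, which is the data loss described above. Not buffering at all for non-seekable streams, as the fix does, removes the prefetched bytes from the picture.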
Let us look at how the above bug affects pipes for a minute (probably one of the more interesting cases). Most pipes are one-way, or uni-directional, meaning you either write or read using the opened handle, so you would never run into this bug. However, there are some scenarios where you could have bi-directional, or two-way, pipes (e.g., a duplex NamedPipe), and there you could very easily get into a situation where you would lose data.
When FileStream asks for ‘bufferSize’ bytes (and let us assume only ‘n’ bytes are available), the call is not necessarily going to block. Remember that the OS can choose to return anywhere from 0 to ‘bufferSize’ bytes. I am not saying that it always returns early in practice instead of blocking for further data. Still, if you are having unexplained issues with pipes, this bug could be one of the causes.
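A short read of this kind is easy to observe outside of .NET. The following sketch uses an anonymous OS pipe from Python (not a .NET named pipe, so treat it only as an analogy): five bytes are available, a 4 KB read is requested, and the call returns what is there instead of waiting for the buffer to fill.

```python
import os

read_fd, write_fd = os.pipe()
os.write(write_fd, b"hello")        # only 5 bytes are available in the pipe
chunk = os.read(read_fd, 4096)      # ask for a full 4 KB buffer's worth
os.close(read_fd)
os.close(write_fd)
print(len(chunk))                   # fewer bytes than requested
```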
In the meantime, let us see how you can work around this issue.
2) When you read ‘n’ bytes (where n > bufferSize), FileStream is smart enough to bypass the internal buffer and instead attempts to read ‘n’ bytes directly into your buffer. So you could set ‘bufferSize’ to a value lower than your desired read size and avoid running into the above bug.
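The workaround can be sketched with the same kind of toy model in Python. Again, this is an assumption-laden illustration rather than FileStream's real logic: `BypassingStream` is a hypothetical name, and the bypass condition simply mirrors the "n > bufferSize" rule stated above.

```python
import io

class BypassingStream:
    """Toy model (hypothetical): large reads skip the internal buffer."""

    def __init__(self, raw, buffer_size):
        self.raw = raw
        self.buffer_size = buffer_size
        self.buf = b""

    def read(self, n):
        if not self.buf and n > self.buffer_size:
            # Request is larger than the buffer: read straight from the
            # source, so nothing extra is prefetched or cached.
            return self.raw.read(n)
        if not self.buf:
            self.buf = self.raw.read(self.buffer_size)
        out, self.buf = self.buf[:n], self.buf[n:]
        return out

# Small internal buffer (4) with a 16-byte read: the buffer is bypassed.
small_src = io.BytesIO(bytes(range(64)))
small = BypassingStream(small_src, buffer_size=4)
small_data = small.read(16)
small_cached = len(small.buf)      # 0 bytes left behind in the buffer
small_pos = small_src.tell()       # source advanced by exactly 16

# Large internal buffer (32) with the same read: 16 bytes stay cached.
large_src = io.BytesIO(bytes(range(64)))
large = BypassingStream(large_src, buffer_size=32)
large_data = large.read(16)
large_cached = len(large.buf)      # 16 prefetched bytes at risk

print(small_cached, small_pos, large_cached)
```

With the smaller buffer there is never any unread data sitting in the buffer, so a later switch to writing has nothing to throw away.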