Download Spellcheck from the VS Gallery. Get the source on GitHub.
As I wrote in my series on Markdown Mode, one of the features I've missed from vim and from many IDEs is spell check, both in normal code comments (from when I used Eclipse) and in writing plaintext or other mostly-text formats (in vim).
Roman, the editor QA lead, worked on a project back at the beginning of Beta 2 that does just this, but the extension was temporarily lost/delayed due to how busy we've been ever since. A week or so back, Chris Granger (a PM on our team) posted the extension source on the VSX samples page. I grabbed the code almost immediately, but noticed a few performance issues. Since I'm running VS under VMware Fusion on a 2GHz Core 2 Duo MacBook, things that may be a little bit slow on beefier machines can be very slow on this one. Also, Roman's primary use case was testing out the extension with code comments, whereas my primary use (in conjunction with my Markdown extension) is spell checking entire files.
Roman was nice enough to let me work on this extension, and also nice enough to let me post it to the VS Gallery, so many thanks to him.
This extension offers spell checking in the same way as many other applications: misspelled words show up with a red underline after a short delay (so that every partial word you type doesn't appear as a spelling mistake). To match my other extensions and my experience writing this type of behavior, I've settled on my normal "500ms of continuous idle", meaning the spell check results update 500ms after the last character you type, and not while you are typing. Of course, if you pause in the middle of a word and that partial word isn't found in the dictionary, you'll get an underline. Fixing the word (and then waiting 500ms) will make the squiggle disappear.
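The "500ms of continuous idle" rule is a restartable one-shot timer. Here is a minimal sketch of that idea, not the extension's actual code: `IdleTrigger` and `Poke` are hypothetical names, and the sketch assumes the callback is safe to run on a timer thread.

```csharp
using System;
using System.Threading;

// Hypothetical helper: invokes a callback once the caller has been idle
// for the given delay. Every call to Poke() restarts the countdown.
public sealed class IdleTrigger : IDisposable
{
    private readonly Timer _timer;
    private readonly int _delayMs;

    public IdleTrigger(int delayMs, Action onIdle)
    {
        _delayMs = delayMs;
        // Created disarmed; the first Poke() arms it.
        _timer = new Timer(_ => onIdle(), null, Timeout.Infinite, Timeout.Infinite);
    }

    // Call this on every buffer change (for example, each typed character).
    public void Poke()
    {
        // Re-arming a one-shot timer replaces the previously scheduled tick,
        // so the callback only runs after a full quiet period.
        _timer.Change(_delayMs, Timeout.Infinite);
    }

    public void Dispose()
    {
        _timer.Dispose();
    }
}
```

With `new IdleTrigger(500, Recheck)`, where `Recheck` stands for whatever kicks off the spell check, a burst of keystrokes results in a single call once typing stops.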
The extension is basically split out into four parts: a component that knows what pieces of text need checking (exposed as an ITagger<NaturalTextTag>), the spell checker (an ITagger<IMisspellingTag>), the squiggles (an ITagger<SquiggleTag>, which will be an ITagger<IErrorTag> when the product ships), and the smart tags (an ITagger<SmartTag>).
The natural text tagger understands what pieces of a file are "natural" text. It's limited to two different implementations: one for "code" files (files with the code ContentType), and one for "plaintext" files. The code implementation consumes classification information in the file to find all comments and strings, and the plaintext implementation just returns a tag for the entire file. Here's CommentTextTagger.cs, since it is the only interesting one.
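To give a feel for the code-file case without reproducing CommentTextTagger.cs itself, here is a minimal sketch of the filtering idea. The types are stand-ins rather than editor types, and matching on the words "comment" and "string" in a classification name is an assumption made for illustration.

```csharp
using System;
using System.Collections.Generic;

// Stand-in for a classified region of the buffer (not an editor type).
public sealed class ClassifiedSpan
{
    public ClassifiedSpan(int start, int length, string classification)
    {
        Start = start;
        Length = length;
        Classification = classification;
    }

    public int Start { get; private set; }
    public int Length { get; private set; }
    public string Classification { get; private set; }
}

public static class NaturalTextFinder
{
    // Keep only the spans whose classification names a comment or a string;
    // everything else (keywords, identifiers, ...) is not "natural" text.
    public static IEnumerable<ClassifiedSpan> FindNaturalText(IEnumerable<ClassifiedSpan> classified)
    {
        foreach (ClassifiedSpan span in classified)
        {
            string name = span.Classification.ToLowerInvariant();
            if (name.Contains("comment") || name.Contains("string"))
                yield return span;
        }
    }
}
```

The plaintext case needs no filtering at all: it would return a single span covering the whole file.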
This component is intended to be an extensibility point for other components to use as well, though we don't provide a good framework for that. A few people have thought about it, but nobody decided on a best practice (use an MSI to place the definition assembly in a known place?).
If it were extensible, I'd add a tagger to the Markdown Mode extension to produce NaturalTextTags over everything in the file except for URLs and HTML tag content, since those tend to show up as misspellings, and I don't really care that "aspx" isn't found in the dictionary. As it is, I just made the markdown content type use plaintext as one of its base types, which gets the "check the entire file" behavior.
This tagger is consumed by the spell checker via an ITagAggregator<NaturalTextTag>, so the spelling tagger doesn't know anything beyond that about what to spell check or what types of files it is operating on. This sounds more complicated than it is, but organizing your extension like this (different tagger implementations that can consume each other and have specialized purposes and knowledge) turns out to be a fairly effective way of keeping everything segmented cleanly.
The spell checker is, by far, the most complicated portion of the extension. Here's the source for SpellingTagger.cs.
Without getting too much into the details, the general way the tagger works is:
NormalizedSnapshotSpanCollection
TagsChangedEvent
The truly ugly part is step #3. Since the .NET Framework doesn't have a good way to do spell checking, and I didn't want to mess with the licensing and trouble of finding some other spell checking library to use, I use a WPF TextBox to do the spell checking.
Take a deep breath, it'll help.
It is truly horrendous to have to do this, and the result is a) immensely costly, b) thoroughly ugly, and c) incompatible with the ThreadPool, because TextBox can only be used on an STA thread (and thread pool threads are MTA).
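For readers who haven't seen the trick, here is a rough sketch of spell checking through a WPF TextBox on a dedicated STA thread. It is not SpellingTagger.cs: `Misspelling` and `TextBoxSpellSketch` are hypothetical names, running the thread at below-normal priority and blocking on `Join` are choices made for this sketch, and it needs the WPF assemblies on Windows. Which words get flagged depends on the dictionaries installed on the machine.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Windows.Controls;
using System.Windows.Documents;

// Hypothetical result type for this sketch.
public sealed class Misspelling
{
    public int Start;
    public int Length;
    public List<string> Suggestions;
}

public static class TextBoxSpellSketch
{
    // Must run on an STA thread, because it creates a WPF TextBox.
    private static List<Misspelling> Check(string text)
    {
        var results = new List<Misspelling>();
        var box = new TextBox();
        box.SpellCheck.IsEnabled = true;
        box.Text = text;

        int index = 0;
        while (index < box.Text.Length)
        {
            // Returns -1 once there are no more errors after 'index'.
            index = box.GetNextSpellingErrorCharacterIndex(index, LogicalDirection.Forward);
            if (index < 0) break;

            int start = box.GetSpellingErrorStart(index);
            int length = box.GetSpellingErrorLength(index);
            SpellingError error = box.GetSpellingError(index);

            var suggestions = new List<string>();
            if (error != null) suggestions.AddRange(error.Suggestions);
            results.Add(new Misspelling { Start = start, Length = length, Suggestions = suggestions });

            // Step past this error; Math.Max guards against a zero-length result.
            index = start + Math.Max(length, 1);
        }
        return results;
    }

    public static List<Misspelling> CheckOnStaThread(string text)
    {
        List<Misspelling> results = null;
        var worker = new Thread(() => results = Check(text));
        worker.SetApartmentState(ApartmentState.STA);   // thread pool threads are MTA
        worker.Priority = ThreadPriority.BelowNormal;    // keep typing responsive
        worker.IsBackground = true;
        worker.Start();
        worker.Join();   // blocking is only for the sketch; a tagger would raise an event instead
        return results;
    }
}
```

A real tagger would keep one such thread alive and feed it work, rather than paying for a new thread and a new TextBox per request.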
Spell checking a document of this size, by sticking the entire document directly in the TextBox, takes about 15-20 seconds on my laptop, pegging the CPU at 100% the whole time.
To work around this, the tagger does a few things:
BelowNormal
Bleh.
Anyways, this tagger produces IMisspellingTag, which contains just a list of suggestions (as strings).
The squiggle tagger is incredibly simple (SquiggleTagger.cs): it just returns the results of calling into its ITagAggregator<IMisspellingTag>, and forwards the TagsChanged events from that aggregator to its own TagsChanged event.
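The forwarding pattern is small enough to sketch without the editor assemblies. Everything below is a stand-in: `ITagSource<T>`, `MisspellingInfo`, `Squiggle`, and `ListSource` are hypothetical simplifications, whereas the real tagging interfaces work in terms of snapshot spans.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for a tag source; much simpler than the real tagging interfaces.
public interface ITagSource<T>
{
    IEnumerable<T> GetTags();
    event EventHandler TagsChanged;
}

public sealed class MisspellingInfo { public int Start; public int Length; }
public sealed class Squiggle { public int Start; public int Length; }

// Re-exposes misspellings as squiggles and relays change notifications.
public sealed class SquiggleForwarder : ITagSource<Squiggle>
{
    private readonly ITagSource<MisspellingInfo> _misspellings;

    public SquiggleForwarder(ITagSource<MisspellingInfo> misspellings)
    {
        _misspellings = misspellings;
        // Forward the upstream event to this object's own subscribers.
        _misspellings.TagsChanged += (sender, e) =>
        {
            EventHandler handler = TagsChanged;
            if (handler != null) handler(this, e);
        };
    }

    public event EventHandler TagsChanged;

    public IEnumerable<Squiggle> GetTags()
    {
        // One squiggle per misspelling, covering the same range.
        return _misspellings.GetTags()
            .Select(m => new Squiggle { Start = m.Start, Length = m.Length });
    }
}

// Trivial upstream source, included so the sketch can be exercised.
public sealed class ListSource : ITagSource<MisspellingInfo>
{
    private readonly List<MisspellingInfo> _items = new List<MisspellingInfo>();

    public event EventHandler TagsChanged;

    public IEnumerable<MisspellingInfo> GetTags() { return _items; }

    public void Add(MisspellingInfo item)
    {
        _items.Add(item);
        EventHandler handler = TagsChanged;
        if (handler != null) handler(this, EventArgs.Empty);
    }
}
```

The forwarder holds no state of its own, which is why this layer stays so small.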
The smart tagger (SpellSmartTagger.cs) is almost as simple as the squiggle tagger, though there is a bit of extra work around creating ISmartTagAction implementations. For the purposes of this extension, there are two: one type of action for suggested spellings (with one action returned per suggestion for each smart tag), and one type of action for "Ignore All" (always exactly one of these per smart tag session).
If the user selects a suggestion, the smart tag just replaces the SnapshotSpan that it got from the ITagAggregator<IMisspellingTag> with the suggestion string. If the user selects "Ignore All", the smart tag action calls into a service (unmentioned until now) whose sole purpose is to maintain a list of ignored words and write/load them to/from disk for persistence. This service is consumed only by the spell checker (so it can know what to ignore) and by this component (so it can add new words to the ignore list).
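A service like that can be very small. The sketch below is a guess at its shape, not the extension's implementation: the `IgnoredWords` name, the one-word-per-line file format, and the case-insensitive matching are all assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical ignore-list service: an in-memory set backed by a text file.
public sealed class IgnoredWords
{
    private readonly string _path;
    private readonly object _gate = new object();
    // Case-insensitive matching is an assumption of this sketch.
    private readonly HashSet<string> _words =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase);

    public IgnoredWords(string path)
    {
        _path = path;
        if (File.Exists(path))
        {
            // One word per line; blank lines are skipped.
            foreach (string line in File.ReadAllLines(path))
            {
                string word = line.Trim();
                if (word.Length > 0) _words.Add(word);
            }
        }
    }

    // Used by the spell checker to decide whether to skip a word.
    public bool ShouldIgnore(string word)
    {
        lock (_gate) return _words.Contains(word);
    }

    // Used by the "Ignore All" action; persists only when the set changes.
    public void Ignore(string word)
    {
        lock (_gate)
        {
            if (_words.Add(word))
                File.WriteAllLines(_path, _words);
        }
    }
}
```

The lock is there because the spell checker runs off the UI thread while the smart tag action runs on it.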
There's a bug in Beta 2 that shows up if you create a custom tag type (your own object that inherits from ITag); this sample does this for the NaturalTextTag and IMisspellingTag. Because the export for an ITaggerProvider requires a TagType attribute, which stores the actual type of the tag (e.g. [TagType(typeof(NaturalTextTag))]), the MEF cache gets a bit confused and angry when it tries to load that metadata if the assembly the type comes from hasn't been loaded yet.
You can work around this in Beta 2 by adding a pkgdef file that basically forces your module to load at startup; see SpellChecker.pkgdef (also, that GUID is just a random GUID; it doesn't match up with any other value).
I believe this is fixed post Beta 2, so it won't be a worry for much longer.
The final result is acceptable, and I've already been very happy with it while writing the last few blog articles. The performance is still pretty crappy overall, and the mechanism of using a TextBox sets my teeth on edge, but it doesn't really detract from the value overall. It did get more useful over time, as I added more and more words to the ignored list.
Also, one of the more recent "features" was to make the spell checker skip words in CamelCase (either PascalCase or mixedCase), since these words generally refer to types. I also tried out a version that spell checked the individual "words" in CamelCase word-combinations, but the correction options for that got a bit complicated. Michael, another developer on the editor team, had some good ideas for what the behavior should be, but I chickened out at the thought of getting that behavior exactly right when I really don't find myself needing it.
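One simple way to detect such words is to look for an uppercase letter directly after a lowercase one. This is a sketch of that heuristic, not necessarily the rule the extension uses, and `LooksLikeIdentifier` is a hypothetical name.

```csharp
using System;

public static class CamelCase
{
    // True when an uppercase letter appears directly after a lowercase
    // letter, as in "PascalCase" or "mixedCase".
    public static bool LooksLikeIdentifier(string word)
    {
        for (int i = 1; i < word.Length; i++)
        {
            if (char.IsUpper(word[i]) && char.IsLower(word[i - 1]))
                return true;
        }
        return false;
    }
}
```

Ordinary capitalized words such as "Hello" and all-caps words such as "HTML" are not matched, so they would still be spell checked. Names that begin with an acronym, like "XMLReader", also slip through this particular rule.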
I find that organizing the code into taggers like this fits well with my mental model of how to piece these together (it probably should, since I wrote the original implementation of tagging, though it's gotten love from quite a few people since then). It feels somewhat UNIX-y to me: separate pieces of logic understand how to do only one thing and communicate with other tools by a well-defined (and usually very simple) channel. I'm sure that could be used to describe a lot of things, but I tend to associate that with UNIX tools and communicating via plain text over pipes.
So, if you get a chance, please try this out! Performance isn't expected to be great, though the truth is that I don't notice it that much anymore, besides the initial and still somewhat painful parse.