The time has come to summarize the features added in RichEdit 8, which shipped with Windows 8 and Office 2013. Since so much was added, I wrote a number of blog posts over the last twelve months about the larger RichEdit 8 features. The present post lists those features and then describes some smaller features included in RichEdit 8. Two large features, the Text Object Model Version 2 (TOM 2) and the Windows RT TOM, don’t have separate posts since they’re described in detail in MSDN. In spite of these other posts, this post is bigger than usual. Features added in previous versions of RichEdit are described in RichEdit Versions 1.0 through 3.0, RichEdit versions, and RichEdit Versions Update to 7.0.
RichEdit Spell Checking
TOM Version 2
TOM Table Interface
Windows RT TOM
Windows 8 RichEdit
Windows Phone 8 RichEdit
Windows RT Font Binding
Character Flags Class
DWrite Font Fallback
Font Style, Weight, Stretch
Default Preferred Font Table
BCP-47 Language Tag Support
More List Types
RTL Math Prototype
Generic LineServices Math Callbacks
Office 2013 has undergone a substantial shift to a relatively new display facility, Direct2D, and a new text facility, DirectWrite. These are the display facilities that are used on Windows Phone 8, the new Windows RT slates, and optionally on Windows 7 & 8. For further info, see this post.
This post describes a couple of performance improvements: 1) a more efficient display tree, and 2) a faster rich-text formatting mechanism.
This post describes how RichEdit 8 was enhanced to access the Windows 8 spell-checking and autocorrection components directly.
This post describes how the RichEdit selection and grippers work on Windows 8 touch devices.
This post describes an implementation of Microsoft UI Automation (UIA) that exposes most objects in a RichEdit instance. This is done via the UIA Text Pattern and includes basic character and paragraph formatting, images, OLE objects, math zones, tables, hyperlinks, and the text inside or associated with these objects.
This post describes how RichEdit 8 uses the Windows Imaging Component to provide image support for JPEG, PNG, and GIF images.
Over the years, the basic edit control has grown in size to accommodate greatly increased functionality. So now, even a plain-text, single-line control is pretty large. Clients can benefit from “flyweight RichEdit controls” which are stored in RichEdit stories (ITextStory, part of TOM 2) and share the properties of the parent ITextServices. RichEdit uses scratch stories internally for math build up/down, to convert MathML into the internal math representation, and to copy rich text when the text in the original copy selection gets changed. And there’s the main flyweight story that’s used by default. So all clients benefit from flyweight controls even those that don’t use the controls explicitly via the ITextStory interface. For more info, see this post.
Emoji characters posed special challenges due in part to the Unicode unification of 107 Emoji characters with existing characters in the BMP and in part to the 11 keycap Emoji for #, 0, …, 9, which use the U+20E3 keycap combining mark. The original plan was to use the Segoe UI Symbol font for all Emoji, but this font choice is ambiguous for the unified Emoji. RichEdit 8 uses Segoe UI Symbol for all Emoji except the double exclamation mark (U+203C: ‼), which uses the current font if it has this character.
If an ambiguous Emoji character is followed by one of the BMP “Emoji” variation selectors U+FE0E and U+FE0F, RichEdit treats it as an Emoji character. U+FE0E specifies that the character should be rendered using a standard Emoji-capable font, e.g., Segoe UI Symbol, whereas U+FE0F implies that special Emoji rendering should be used. This special rendering is specified, in principle, in a higher-order protocol, but RichEdit 8 doesn’t have such a protocol. For more info, see this post.
Variation selector sequences posed challenges in both the user interface and in font selection. Such sequences consist of a base character, either in the BMP or a surrogate pair, followed by a variation selector, which can also be in the BMP (U+FE00..U+FE0F), or a surrogate pair (U+E0100..U+E01EF). The keyboard arrow, Delete and Backspace keys need to treat a VS sequence as a single character. Font binding is tricky, since currently only special fonts have support for VS sequences. Initially we tried font binding the U+E0100..U+E01EF variation selectors to a Japanese font, since the only usage at the time was in Japan. But this was changed to use the font of the base character, since it’s likely that China will define some VS sequences as well. It’s important that the variation selector is in the same character format run as its base character. See also the Emoji entry above, which mentions how VS sequences can help denote how to render emoji characters.
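Here’s a minimal sketch of the “treat a VS sequence as a single character” rule for the right-arrow key. It is not the RichEdit code: the function names are hypothetical, the text is assumed to be a UTF-16 string, and only the variation selector ranges quoted above are recognized.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

inline bool IsHighSurrogate(char16_t ch) { return ch >= 0xD800 && ch <= 0xDBFF; }
inline bool IsLowSurrogate(char16_t ch)  { return ch >= 0xDC00 && ch <= 0xDFFF; }

// Length in UTF-16 code units (1 or 2) of the code point starting at text[i].
inline std::size_t CodePointLength(const std::u16string& text, std::size_t i) {
    if (IsHighSurrogate(text[i]) && i + 1 < text.size() && IsLowSurrogate(text[i + 1]))
        return 2;
    return 1;
}

// Decodes the code point starting at text[i].
inline char32_t CodePointAt(const std::u16string& text, std::size_t i) {
    if (CodePointLength(text, i) == 2) {
        return 0x10000 + ((static_cast<char32_t>(text[i]) - 0xD800) << 10)
                       + (static_cast<char32_t>(text[i + 1]) - 0xDC00);
    }
    return text[i];
}

// True for U+FE00..U+FE0F (BMP) and U+E0100..U+E01EF (surrogate pairs).
inline bool IsVariationSelector(char32_t cp) {
    return (cp >= 0xFE00 && cp <= 0xFE0F) || (cp >= 0xE0100 && cp <= 0xE01EF);
}

// Hypothetical helper: caret position after one right-arrow press at index i.
// The base character and a following variation selector are stepped over
// together, so the caret never lands between them.
std::size_t NextCaretStop(const std::u16string& text, std::size_t i) {
    if (i >= text.size())
        return text.size();
    std::size_t next = i + CodePointLength(text, i);
    if (next < text.size() && IsVariationSelector(CodePointAt(text, next)))
        next += CodePointLength(text, next);
    return next;
}
```

A left-arrow, Delete, or Backspace handler would apply the same test in the corresponding direction. Keeping the selector in the same character format run as its base is a separate concern that this sketch doesn’t address.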
The Text Object Model (TOM) Version 2 adds the interfaces ITextDocument2, ITextSelection2, ITextRange2, ITextFont2, ITextPara2, ITextStoryRanges2, ITextStrings, ITextStory, and ITextRow. The complete TOM model is defined by tom.idl, which includes tom1.idl. The interfaces are all documented in MSDN.
This post describes the ITextRow table interface which allows you to insert tables, examine tables and to perform table manipulations, such as inserting, deleting and resizing table columns. Along with the ITextRange Move methods, the ITextRow methods give complete control over RichEdit’s nested table facility.
The Windows RT Text Object Model gives the Windows RT RichEditBox a TOM-like object model. The Windows RT TOM is a subset of the full TOM2 interfaces. It has the following interfaces, all in the Windows.UI.Text namespace: ITextDocument, ITextSelection, ITextRange, ITextCharacterFormat, ITextParagraphFormat, and ITextConstantsStatics. The first five of these interfaces delegate to the TOM2 ITextDocument2, ITextSelection2, ITextRange2, ITextFont2, and ITextPara2, respectively. The large TOM2 enum of values is broken into a set of enums, each oriented towards a particular feature. The Windows.UI.Text.idl file defines the interfaces and enumerations.
For the new immersive environment on tablets and on the Windows Phone 8, not only are GDI and Uniscribe absent, so are the functions handled by the venerable user.dll. That program library includes the Windows functions SendMessage, MessageBox, CreateWindow, etc. A version of RichEdit 8 has been created for the immersive environment. All instances are windowless and use D2D/DWrite for measuring and rendering. The client can still send RichEdit messages via the ITextServices::TxSendMessage() method. The advantage of dropping the traditional user.dll is the relative simplicity of the model. But doing so omits significant functionality, at least in the initial version. At the same time, the touch functionality is dramatically improved on Windows 8 and in the immersive environment. The immersive version of RichEdit 8 is used by the Windows Store OneNote.
The Windows 8 RichEdit is mostly a subset of the Office RichEdit 8. The features that are included are documented in MSDN. Here’s a list of omitted features:
The Windows Phone 8 RichEdit is based on the standard RichEdit 8 code base, rather than on the earlier WinCE version. A combination of makefile and conditional compilation instructions control various differences.
The default preferred font table was modified to correspond to the fonts on the phone. Related to this is the need to have a table of fonts to use when a file specifies a font on the phone, but not on the desktop. There’s also a last minute font fallback for East Asian scripts. Many changes from the phone teams have been back ported into the main RichEdit code base.
For Windows RT, a special font callback interface, IProvideFontInfo, is defined that is used to replace RichEdit’s built-in font binding with Windows RT’s font binding. A major reason for this replacement is to support Windows RT composite fonts. The IProvideFontInfo interface is obtained by calling ITextHost::QueryInterface for an IProvideFontInfo. It includes the GetRunFontFaceId() method, which returns a font ID given the current font, the font weight, stretch and style, the lcid, a pointer to input characters and character count to be used with the returned font, the current font ID, and an out parameter runCount that gives the character count that the returned font covers. Two known problems exist with this feature: 1) it doesn’t stamp the characters with a CharRep, and 2) the Windows RT font binder doesn’t understand mathematical text. The CharRep is important for BiDi and for font fallback. Hopefully these problems will be addressed in a future release. IProvideFontInfo should not be used in math zones, since font binding in math zones is quite tricky.
RichEdit font binding uses a character repertoire (CharRep) facility. Character repertoires are often the same as Unicode scripts, but they include other sets such as symbols and emoji. In previous versions of RichEdit, the character-repertoire flags and indices along with the functions that manipulated them were scattered around in several files. Furthermore the variables used had no more space for new character repertoires, such as emoji. Accordingly we needed to generalize the facility.
To this end, we collected the character flags functionality and associated defines into the CCharFlags class, which hides many details from calling code. We used it to add support for 15 new character repertoires bringing RichEdit up to date with the scripts that Windows 8 supports. The scripts added are: Symbol, Emoji, Glagolitic, Lisu, Vai, N’ko, Osmanya, PhagsPa, Gothic, Deseret, Tifinagh, Old Italic, Old Turkic, Bopomofo, and Cyrillic Ext B. More character repertoires can be added easily and, in fact, a number have been added in Windows 8.1.
If you specify the charset in creating a font, GDI will ensure that you get a font that handles that charset. Admittedly charsets cover only a subset of the world’s languages (no Indic, Syriac, etc.), but they do cover many important languages, notably Chinese, Japanese, and Korean (CJK). It’s really desirable to choose a font for Chinese characters that suits the user: Simplified Chinese, Traditional Chinese, or Japanese. Another trick is if a character is an end-user-defined character (EUDC) in the Unicode Private Use Area, GDI will ensure that you see a glyph by searching through possible EUDC fonts. These characters are not defined in the Unicode Standard, so you can’t use them reliably for text interchange. But they are popular in CJK locales and a given machine may have fonts with the glyphs that the user wants.
DWrite doesn’t offer such automatic font fallback. Accordingly to handle font fallback better on the DWrite code path, we pass down the current CharRep. This gives access to a default font that is likely to have the character glyphs when the current font does not. Code to handle EUDC for DWrite is included as well.
Windows 8 generalized its font attributes to have style, weight, and stretch. Actually GDI’s LOGFONT has always had font weight, but it hasn’t always been consistent about grouping font files that differ only by weight into a font family. For example, Windows 8 considers Arial Black to be the heaviest weight member of the Arial family rather than an independent font. The only change needed to handle font weight was to expose it in the RTF format with the \fweightN control word. Font style includes upright, italic, and oblique. GDI has always had upright and italic, and used oblique when italic is requested and no corresponding italic font is available. To handle explicit requests for oblique, we added the RTF control word \oblique and an attribute CEM_OBLIQUE. Font stretch didn’t have a representation in RichEdit’s character formatting or in RTF, so we added the RTF control word \fstretchN.
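To make the three control words concrete, here’s a rough sketch of how a writer might emit them. Only \fweightN, \fstretchN, and \oblique come from the description above. The struct, the function name, the numeric ranges (a DWrite-like weight with 400 as regular, and a stretch with 5 as normal), and the choice to omit default values are all assumptions for illustration, not RichEdit’s actual behavior.

```cpp
#include <cassert>
#include <string>

enum class FontStyle { Upright, Italic, Oblique };

// Hypothetical container for the three generalized font attributes.
struct FontAttributes {
    int weight;       // assumed DWrite-like scale; 400 taken to mean regular
    int stretch;      // assumed 1..9 scale; 5 taken to mean normal
    FontStyle style;
};

// Builds the RTF control words for the non-default attributes.
// Omitting defaults is an assumption of this sketch.
std::string FontAttributesToRtf(const FontAttributes& a) {
    std::string rtf;
    if (a.weight != 400)
        rtf += "\\fweight" + std::to_string(a.weight);
    if (a.stretch != 5)
        rtf += "\\fstretch" + std::to_string(a.stretch);
    if (a.style == FontStyle::Italic)
        rtf += "\\i";           // standard RTF italic control word
    else if (a.style == FontStyle::Oblique)
        rtf += "\\oblique";
    return rtf;
}
```

For example, a heavy oblique run with normal stretch would yield \fweight900\oblique under these assumptions. A real writer would also emit a delimiter (such as a space) after the last control word before any text.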
In RichEdit 7 and earlier versions, the default preferred font table is created at run time using a set of calls. The table is indexed by the charrep. In RichEdit 8, most entries are given in a convenient, explicit table. This change facilitated updating the entries to the Windows 8 preferences and creating a modified table for use on Windows Phone 8. There are two kinds of entry: user-interface (UI) and document. Plain-text instances use the UI entries and rich-text instances use the document entries unless the client has sent an EM_SETLANGOPTIONS message with the IMF_UIFONTS option.
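An explicit table of this kind might look something like the following sketch. The CharRep values, the struct, the function, and especially the font names are placeholders invented for illustration. The only part taken from the description above is the selection rule: plain-text instances, and instances with the IMF_UIFONTS option set, get the UI entry, and other rich-text instances get the document entry.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical subset of character repertoires; Count must stay last.
enum class CharRep : std::size_t { Ansi, Arabic, Japanese, Count };

struct PreferredFontEntry {
    const char* uiFont;        // used for user-interface text
    const char* documentFont;  // used for document text
};

// Placeholder font names, indexed by CharRep.
constexpr PreferredFontEntry kPreferredFonts[] = {
    { "UiFontAnsi",     "DocFontAnsi"     },  // CharRep::Ansi
    { "UiFontArabic",   "DocFontArabic"   },  // CharRep::Arabic
    { "UiFontJapanese", "DocFontJapanese" },  // CharRep::Japanese
};

static_assert(sizeof(kPreferredFonts) / sizeof(kPreferredFonts[0]) ==
                  static_cast<std::size_t>(CharRep::Count),
              "one table entry per character repertoire");

// Picks the UI or document entry. rep must be a real repertoire, not Count.
// uiFontsOption models the client having set IMF_UIFONTS.
std::string PreferredFont(CharRep rep, bool plainText, bool uiFontsOption) {
    const PreferredFontEntry& entry = kPreferredFonts[static_cast<std::size_t>(rep)];
    return (plainText || uiFontsOption) ? entry.uiFont : entry.documentFont;
}
```

Having the entries in one static table is what makes it easy to swap in a different set, such as a phone-specific one, at build time.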
RichEdit’s character formatting includes an LCID, which is being deprecated in favor of the BCP-47 language tags. In particular, Windows RT uses BCP-47 language tags as does the Windows RT Windows.UI.Text.ITextCharacterFormat LanguageTag property. We didn’t want to add a new method to ITextFont2, so we implemented the functionality in classic TOM by adding the flag tomLanguageTag for the ITextRange2::GetText2() and SetText2() methods. The approach uses the OS LCIDToLocaleName and LocaleNameToLCID functions. We also implemented a facility for converting BCP-47 strings that LocaleNameToLCID doesn't recognize into LCIDs for internal consumption. This facility doesn’t handle arbitrary BCP-47 tags, but it handles those used by Windows 8 that don’t have LCIDs (see following feature).
Support was added for seven new Windows 8 keyboards that don’t have LCIDs. This involves decoding the string returned by GetKeyboardLayoutName() for Myanmar, New Tai Lue, Tai Le, Ogham, Lisu, N’Ko, and PhagsPa and assigning internal LCID values for use with the RichEdit keyboard code.
Display and file support for 17 list types that Office Art supported, but earlier RichEdit versions did not, was exposed for general RichEdit use.
Due to the large number of changes needed in RichEdit 8.0, we had very little time to improve the math editing and display facility. The main achievement was to support math on the D2D/DWrite code path. In addition, we added two hot keys related to alt+=, namely, ctrl+alt+= (build down) and ctrl+alt+shift+= (build up). We also support the traditional Word subscript and superscript hot keys (ctrl+= and ctrl+shift+=) in math zones. The equation-array equation numbering option was re-enabled and works with OneNote. Inline math function breaking was implemented. Two prototypes were developed: RTL math and math autocomplete. The latter offers a drop-down menu of possible completions for math autocorrect entries. This uses the built-in math autocorrect facility. For a shipping product, we’d want to use the external autocorrect facility.
Except in some Arabic locales, mathematical text is written “left to right” (LTR). For example, in the expression a + b, the plus is displayed to the right of the a and the b is displayed to the right of the plus. But in some Arabic locales, mathematical text is written right to left (RTL). For example, instead of a + b, one would see b + a, although the letters would be Arabic, not Latin. RichEdit 8.0 has a prototype of RTL math, although this prototype is disabled, pending further testing and the availability of a released RTL upgrade to the Cambria Math font.
To understand RTL math, first consider what an LTR math zone is. This is what Word 2007 and the Office 2010 applications implement. It has RTL text whenever Arabic or standard Hebrew characters appear adjacent to one another. But all operators and other “neutral” characters are considered to be “strong LTR”, that is, they are displayed to the right of the character that precedes them. This can be quite different from a display that obeys the Unicode Bidirectional Algorithm (UBA). A sequence of digits is always displayed LTR, regardless of the character that precedes it even outside math zones and according to the UBA. Inside LTR math zones a sequence of digits is displayed to the right of the character that precedes it even if that character is Arabic. According to the UBA, a number following an Arabic character is displayed to the left of the Arabic character in both LTR and RTL paragraphs. Inside embedded normal text in a math zone, the usual rules for BiDi text are followed. Note that except for such text, the math-zone BiDi rules are much simpler than those of the UBA, which gets quite tricky in complicated scenarios.
In math RTL locales, all operators and most other “neutral” characters are considered to be “strong RTL”, that is, they are displayed to the left of the character that precedes them. In addition square roots are mirrored, so that the surd symbol √ is flipped relative to the vertical axis. Similarly integral signs are mirrored, although the circular arrows in contour integrals are not mirrored, since they pertain to the 2D complex plane, not the 2D text plane.
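The ordering rule can be illustrated with a toy model. This is a deliberately simplified sketch, not the prototype’s code. It assumes a math zone made of single ASCII characters, treats each run of digits as one item that keeps its internal left-to-right order, and ignores glyph mirroring, runs of Arabic or Hebrew letters, and embedded normal text. All names are hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

enum class MathDirection { LeftToRight, RightToLeft };

// Splits the logical (backing-store) text into items: each digit run is one
// item, and every other character is an item by itself.
std::vector<std::string> SplitMathItems(const std::string& logical) {
    std::vector<std::string> items;
    for (std::size_t i = 0; i < logical.size();) {
        std::size_t j = i + 1;
        if (std::isdigit(static_cast<unsigned char>(logical[i]))) {
            while (j < logical.size() &&
                   std::isdigit(static_cast<unsigned char>(logical[j])))
                ++j;
        }
        items.push_back(logical.substr(i, j - i));
        i = j;
    }
    return items;
}

// LTR zone: each item is placed to the right of its predecessor.
// RTL zone: each item is placed to the left of its predecessor.
std::string VisualOrder(const std::string& logical, MathDirection dir) {
    const std::vector<std::string> items = SplitMathItems(logical);
    std::string visual;
    if (dir == MathDirection::LeftToRight) {
        for (const std::string& item : items)
            visual += item;
    } else {
        for (auto it = items.rbegin(); it != items.rend(); ++it)
            visual += *it;
    }
    return visual;
}
```

With this model, the logical text a+12 is displayed as a+12 in an LTR math zone and as 12+a in an RTL math zone: the operator and operands are reversed, but the digits of the number are not.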
This prototype was carried out, in part, as a trial implementation for MathML 3.0, which includes attributes for RTL math zones. If the released Cambria Math is upgraded to support the glyphs for RTL math zones, we might be able to release the RTL-math functionality in a shipping version of RichEdit.
A request has often been made to have a generic implementation for the LineServices callbacks. To start down such a path and to make it easier for implementers to support LineServices math, we factored the 118 LineServices math callbacks out in a backing-store independent fashion so that any client can use them. Specifically, the callback code relies on eight routines, namely GetMathRunParameters(), GetMathFont(), SetMathFont(), GetDefaultMathFont(), GetMathDocProperties(), GetDefaultMathFont(), FDWrite(), and FIdealLayout(). These routines provide math parameters so that RichEdit-specific structures are not referenced directly. This allows the callback code to be used by clients other than RichEdit, e.g., Internet Explorer (hint, hint!).
RichEdit autolink detection is controlled by the EM_AUTOURLDETECT message. If the lparam argument is NULL, the default scheme name list is used, which on Windows 8 is defined by the scheme string “callto:file:ftp:gopher:http:https:mailto:news:notes:nntp:onenote:outlook:prospero:tel:telnet:wais:webcal:”.
Alternatively in RichEdit 8, lparam can point to a null-terminated string consisting of one or more colon-terminated scheme names that supersede the default scheme name list. For example, the string could be "news:http:ftp:telnet:". The scheme name syntax is defined in the Uniform Resource Identifiers (URI): Generic Syntax document on The Internet Engineering Task Force (IETF) website. Specifically, a scheme name must start with an ASCII alphabetic and can be followed by a mixture of ASCII alphabetics, digits, and the three punctuation characters: ".", "+", and "-". The string type can be either char* or WCHAR*; RichEdit detects the string type automatically.
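A client might want to sanity-check a custom scheme list before handing it to the control. The following is a minimal sketch of such a check based on the syntax rules just described. The function names are hypothetical, it handles only the char* form, and it isn’t a description of how RichEdit itself parses the string.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

inline bool IsAsciiAlpha(char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}
inline bool IsAsciiDigit(char c) { return c >= '0' && c <= '9'; }

// A scheme name starts with an ASCII letter and continues with ASCII
// letters, digits, '.', '+', or '-'.
bool IsValidSchemeName(const std::string& name) {
    if (name.empty() || !IsAsciiAlpha(name[0]))
        return false;
    for (char c : name) {
        if (!IsAsciiAlpha(c) && !IsAsciiDigit(c) && c != '.' && c != '+' && c != '-')
            return false;
    }
    return true;
}

// Splits a list such as "news:http:ftp:telnet:" into scheme names.
// Returns false if the list is empty, a name is invalid, or the last name
// isn't colon-terminated; schemes may be partially filled in that case.
bool ParseSchemeList(const std::string& list, std::vector<std::string>& schemes) {
    schemes.clear();
    std::size_t start = 0;
    while (start < list.size()) {
        const std::size_t colon = list.find(':', start);
        if (colon == std::string::npos)
            return false;
        const std::string name = list.substr(start, colon - start);
        if (!IsValidSchemeName(name))
            return false;
        schemes.push_back(name);
        start = colon + 1;
    }
    return !schemes.empty();
}
```

On Windows, a list that passes this check would then be supplied as the lparam of EM_AUTOURLDETECT, with autolink detection enabled through wparam.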
It’s more secure if passwords are encrypted while in memory. Then if they’re swapped out to disk, what’s on disk is indecipherable. To implement this, RichEdit 8 encrypts a windowless password control’s backing store. Accordingly any 16-bit value may appear in the backing store. This is a tricky operation, since many parts of the RichEdit code associate Unicode semantics with the character codes in the backing store. To prevent use of such semantics, we restricted passwords to be plain text and added conditions to isolate the remaining semantic associations. Since this is a sensitive subject, further details remain classified :-)
RichEdit has a smart ellipsis mode (EM_SETELLIPSISMODE, EM_GETELLIPSISMODE, etc.). The algorithm for choosing an optimal breaking point is implemented in LineServices.
One of the design goals is to maximize battery life. Accordingly the Office version uses coalescable timers (see SetCoalescableTimer function), which minimize the number of times a computer has to be awoken. This feature was implemented too late to get into the Windows 8 version of RichEdit (msftedit.dll).