em dash, en dash, dash, dash, dash...


Some people have noticed that you can paste examples out of Word documents directly into a PowerShell session. Given all of the typographic tricks that Word does, this is actually much harder than it sounds. Here’s what we do. There's a piece of code in the interpreter that takes each of the possible characters and maps it to the canonical representation for that character. So an em dash ([char] 0x2014) or an en dash ([char] 0x2013) becomes a simple dash ([char] 0x2d). There are also predicate functions that return true if the character is a single quote, a double quote, or a dash. The code is (approximately):

 

public const char enDash = (char)0x2013;
public const char emDash = (char)0x2014;
public const char horizontalBar = (char)0x2015;

// left single quotation mark
public const char quoteSingleLeft = (char)0x2018;
// right single quotation mark
public const char quoteSingleRight = (char)0x2019;
// single low-9 quotation mark
public const char quoteSingleBase = (char)0x201a;
// single high-reversed-9 quotation mark
public const char quoteReversed = (char)0x201b;
// left double quotation mark
public const char quoteDoubleLeft = (char)0x201c;
// right double quotation mark
public const char quoteDoubleRight = (char)0x201d;
// low double left quote used in German.
public const char quoteLowDoubleLeft = (char)0x201E;

public static bool IsDash(char c)
{
    return (c == enDash || c == emDash || c == horizontalBar ||
        c == '-');
}

public static bool IsSingleQuote(char c)
{
    return (c == quoteSingleLeft || c == quoteSingleRight ||
        c == quoteSingleBase || c == quoteReversed || c == '\'');
}

public static bool IsDoubleQuote(char c)
{
    return (c == '"' || c == quoteDoubleLeft ||
        c == quoteDoubleRight || c == quoteLowDoubleLeft);
}

public static bool IsQuote(char c)
{
    return (IsSingleQuote(c) || IsDoubleQuote(c));
}
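
The predicates above only classify a character. As a rough illustration of the mapping step, here is a minimal sketch of how each typographic variant could be folded to its plain ASCII form. The class name DashQuoteNormalizer and the methods ToCanonical and Normalize are hypothetical names chosen for this example; this is not the actual interpreter code, and it assumes every character should simply be replaced, which may not match what the real tokenizer does in every context.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only; not the PowerShell implementation.
public static class DashQuoteNormalizer
{
    // Maps a typographic dash or quote to its plain ASCII equivalent.
    // Any other character is returned unchanged.
    public static char ToCanonical(char c)
    {
        switch (c)
        {
            case '\u2013': // en dash
            case '\u2014': // em dash
            case '\u2015': // horizontal bar
                return '-';

            case '\u2018': // left single quotation mark
            case '\u2019': // right single quotation mark
            case '\u201a': // single low-9 quotation mark
            case '\u201b': // single high-reversed-9 quotation mark
                return '\'';

            case '\u201c': // left double quotation mark
            case '\u201d': // right double quotation mark
            case '\u201e': // double low-9 quotation mark
                return '"';

            default:
                return c;
        }
    }

    // Applies ToCanonical to every character of the input text.
    public static string Normalize(string text)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));

        var sb = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            sb.Append(ToCanonical(c));
        }
        return sb.ToString();
    }
}
```

With this sketch, a pasted line such as dir –Path “C:\Temp” (en dash and curly quotes) would come back from Normalize as dir -Path "C:\Temp". A switch keeps the mapping in one place, so covering another character means adding one more case label.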

 

Of course, it’s not just Word that we want to support. We want to provide reasonable support for arbitrary applications (within the limitations of the console host for now), so if anyone sees anything we missed, please let me know.

 

Now, for the trivia folks in the audience who want to know what an en is, from Encarta:

em dash (plural em dash·es)
noun 
Definition:
long dash: in printing, a dash that is one em long
 
en dash (plural en dash·es)
noun 
Definition:
dash one en long: in printing, a dash that is one en in length
 
en [ en ] (plural ens)
noun 
Definition:
measure of printing width: a measure of printing width, half that of an em

em [ em ] (plural ems)
noun 
Definition:
1. variable measure of type: a unit of measurement of print size, equal to the point size of the typeface being used
2. printing: same as pica
 
[Late 18th century. Representing pronunciation of m because the letter is about this width]


-bruce

 

Bruce Payette

PowerShell Technical Lead

 

PSMDTAG:FAQ: Can I cut-n-paste examples from WORD documents?
PSMDTAG:PARSER: (em dash, en dash, dash) handling

 

  • How about non breaking spaces (0x00A0)? That one got me one time when I copied text from a web page into Visual Studio.