Normalization as obfuscation in C#

Sorting it all Out
Michael Kaplan's random stuff of dubious value



Take a look at the following code and let me know what you think of it (compiled with Whidbey Beta 2; note the preview of the exciting new StringInfo methods for dealing with text elements!):

namespace àáâãäå {
    using System;
    using System.Text;
    using System.Globalization;

    class àáâãäå
    {
        [STAThread]
        static void Main(string[] args) {
            àáâãäå(); àáâãäå(); àáâãäå();
            àáâãäå(); àáâãäå(); àáâãäå(); àáâãäå();
        }
        static void àáâãäå(string àáâãäå) {
            StringBuilder àáâãäå = new StringBuilder();
            StringInfo àáâãäå =  new StringInfo(àáâãäå);

            àáâãäå.Append(àáâãäå.Normalize(NormalizationForm.FormC));
            àáâãäå.Append(": ");

            for(int àáâãäå=0; àáâãäå < àáâãäå.LengthInTextElements; àáâãäå++) {
                string àáâãäå = àáâãäå.SubstringByTextElements(àáâãäå, 1);
                if(àáâãäå.IsNormalized(NormalizationForm.FormC)) {
                    àáâãäå.Append("C");
                } else if(àáâãäå.IsNormalized(NormalizationForm.FormD)) {
                    àáâãäå.Append("D");
                } else {
                    àáâãäå.Append("_");
                }
            }
            Console.WriteLine(àáâãäå.ToString());
            return;
        }
        static void àáâãäå() {
            àáâãäå.àáâãäå("àáâãäå");
        }
        static void àáâãäå() {
            àáâãäå.àáâãäå("àáâãäå");
        }
        static void àáâãäå() {
            àáâãäå.àáâãäå("àáâãäå");
        }
        static void àáâãäå() {
            àáâãäå.àáâãäå("àáâãäå");
        }
        static void àáâãäå() {
            àáâãäå.àáâãäå("àáâãäå");
        }
        static void àáâãäå() {
            àáâãäå.àáâãäå("àáâãäå");
        }
        static void àáâãäå() {
            àáâãäå.àáâãäå("àáâãäå");
        }
    }
}

It compiles, even though the namespace, the class name, every procedure (other than Main), and every variable look like the same string.

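The wonders of various combinations of the string "àáâãäå": each identifier spells it with a different mix of precomposed (Form C) and decomposed (Form D) letters, so the compiler sees distinct names even though they render identically.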
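For anyone who wants to see the underlying effect in isolation, here is a minimal sketch. It is not part of the program above, and the names NormalizationDemo, Composed, Decomposed and LooksSameButDiffers are made up for illustration. It only shows that a precomposed "à" and a decomposed "à" are different sequences of code units that become equal once both are normalized to Form C:

```csharp
using System;
using System.Text;

static class NormalizationDemo
{
    // "à" as one precomposed code point (Form C): U+00E0.
    public const string Composed = "\u00E0";

    // "à" as a base letter plus a combining grave accent (Form D): U+0061 U+0300.
    public const string Decomposed = "a\u0300";

    // True when the two strings differ code unit by code unit
    // but match once both are normalized to Form C.
    public static bool LooksSameButDiffers(string left, string right)
    {
        bool sameCodeUnits = string.Equals(left, right, StringComparison.Ordinal);
        bool sameAfterNfc = string.Equals(
            left.Normalize(NormalizationForm.FormC),
            right.Normalize(NormalizationForm.FormC),
            StringComparison.Ordinal);
        return !sameCodeUnits && sameAfterNfc;
    }

    static void Main()
    {
        Console.WriteLine(Composed.Length);    // 1
        Console.WriteLine(Decomposed.Length);  // 2
        Console.WriteLine(LooksSameButDiffers(Composed, Decomposed)); // True
    }
}
```

The ordinal comparison is deliberate here: it compares raw code units, which is the view of the text that matters when two names render alike but are stored differently.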

(Interestingly, due to all the exciting work of someone from the VS team, the cursor moves over the text elements as letters and thus it was an interesting challenge getting this written!)

Do you think it makes the code less readable? :-)

Compile it on the command line:

c:\temp>csc àáâãäå.cs

and then run it to see the output:

c:\temp>àáâãäå
àáâaäå: CCCCCC
àáâaäå: DDDDDD
àáâaäå: DCCCCC
àáâaäå: DDCCCC
àáâaäå: DDDCCC
àáâaäå: DDDDCC
àáâaäå: DDDDDC

Interesting, no? :-)
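
The string literals in the listing cannot be told apart on the rendered page, so the following is only a guess at how inputs like these could be built, not the original source. MixedFormDemo, MixForms and Classify are hypothetical names. MixForms puts the first few text elements into Form D and leaves the rest in Form C; Classify repeats the per-element C/D/_ test from the program above:

```csharp
using System;
using System.Globalization;
using System.Text;

static class MixedFormDemo
{
    // Returns the text with its first decomposedCount text elements
    // in Form D and the remaining ones in Form C.
    public static string MixForms(string text, int decomposedCount)
    {
        StringInfo info = new StringInfo(text.Normalize(NormalizationForm.FormC));
        StringBuilder result = new StringBuilder();
        for (int i = 0; i < info.LengthInTextElements; i++)
        {
            string element = info.SubstringByTextElements(i, 1);
            result.Append(i < decomposedCount
                ? element.Normalize(NormalizationForm.FormD)
                : element);
        }
        return result.ToString();
    }

    // One flag per text element: "C" if it is in Form C,
    // otherwise "D" if it is in Form D, otherwise "_".
    public static string Classify(string text)
    {
        StringInfo info = new StringInfo(text);
        StringBuilder flags = new StringBuilder();
        for (int i = 0; i < info.LengthInTextElements; i++)
        {
            string element = info.SubstringByTextElements(i, 1);
            if (element.IsNormalized(NormalizationForm.FormC)) flags.Append('C');
            else if (element.IsNormalized(NormalizationForm.FormD)) flags.Append('D');
            else flags.Append('_');
        }
        return flags.ToString();
    }

    static void Main()
    {
        // "àáâãäå" written with precomposed code points U+00E0 through U+00E5.
        const string word = "\u00E0\u00E1\u00E2\u00E3\u00E4\u00E5";
        for (int n = 0; n <= 6; n++)
        {
            Console.WriteLine(Classify(MixForms(word, n)));
        }
    }
}
```

This sketch prints the flags in the order CCCCCC, DCCCCC, DDCCCC, and so on up to DDDDDD, so the all-D line comes last rather than second as in the output above.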

 

This post brought to you by "à", "á", "â", "ã", "ä", and "å" (U+00e0, U+00e1, U+00e2, U+00e3, U+00e4, and U+00e5, a.k.a. LATIN SMALL LETTER A WITH GRAVE, LATIN SMALL LETTER A WITH ACUTE,  LATIN SMALL LETTER A WITH CIRCUMFLEX, LATIN SMALL LETTER A WITH TILDE, LATIN SMALL LETTER A WITH DIAERESIS, and LATIN SMALL LETTER A WITH RING ABOVE)
Well who did you think would be willing to sponsor this rubbish? :-)

Blog - Comment List
  • It should be a compiler error if identifiers are the same after being normalized to a fixed type. Just to stop the sort of confusing bug where it LOOKS like you're using the correct identifier, but in reality it's a different one. This wouldn't happen when everyone uses the same editor for the code, but if some developers choose to use a different editor which has a different normalization mode to the other developer's editor...
  • Well, I will not disagree with you, but this is not an error in 1.0, 1.1, or 2.0 of .NET as shipped by Microsoft (if you have an application that does not call Whidbey-specific methods), so technically that would be a breaking change.

    Though maybe FxCop can make it a warning. :-)
  • Somehow, it made me think of a line in a quite funny Swedish poem. (It's dialectal and a semi-phonetic representation.)

    Å i åa ä e ö.

    Why use anything but vowels if you don't have to?

    ("Och i ån är en ö." is a "normalized" version of that string. From there on, I leave it as an exercise to the reader.)

    (Did you happen to be inspired by the IDN phishing post on Slashdot et al today? I even saw comments linking back to your blog where you talked about it in general terms.)
  • Nah, I had this one planned a week ago (that's when I originally wrote the code!). Though I would like to stay under *that* radar....
  • I think firefox is rendering the text incorrectly. When I first looked at this page in firefox I thought "what do you mean, they all look the same - no they don't". Then I switched to IE and had a look, and they all *do* look the same.

    To compare:
    IE: http://www.codeka.com/tmp/norm/ie.png
    Firefox: http://www.codeka.com/tmp/norm/firefox.png

    Notice how firefox doesn't seem to be composing the characters properly? I wonder why that is... perhaps if I could be bothered, I might submit a bug report (unless there already is one).

    I'm always torn between IE and firefox... I switch to one for a while, but then I'll switch back, and then later switch again...
  • Yikes, poor support for Normalization Form D combining marks seems like a pretty bad bug to me!

    Though even IE will mess it up for some fonts -- using Uniscribe can help but using fonts with good info helps even more....

    Probably worth reporting the bug, though. Better international support is always worthwhile (I pointed out many Extension B bugs in Mozilla back in the day).
  • How about using the romanian letter ă or Ă
    (I hope that IE will get this right :-)
  • Filed bug report: https://bugzilla.mozilla.org/show_bug.cgi?id=281483

    If you'll note, I tried it on Mozilla when I got home, and it's even worse there (well, arguably so - at least Mozilla was consistent) because it didn't combine any of the Form D characters!
  • > Interesting, no? :-)

    NO ;]
  • Adi -- worked in IE no problem. :-) I actually just chose the first bunch I saw; I could have included more (I probably would have to if I were building an obfuscator for a big application, as even with a factorial number of possibilities it would be a huge name map!).

    Cool on the bug, Dean -- maybe they will even fix it (shame they do not use Uniscribe on Win32, but I guess I understand why they choose not to "sully" themselves that way). At least it works in IE!

    bg -- sorry you didn't like it, but I gave up a long time ago trying to please everybody....
  • Apparently Uniscribe is in the works. My bug got attached as a dependency to https://bugzilla.mozilla.org/show_bug.cgi?id=218887 "Use Uniscribe APIs in GFX:Win and layout/editor" so it looks like that's the plan. Mind you, that one's over a year old, and not much progress in a while... so I'm not holding my breath for a quick fix.
  • Last night I got a piece of email from someone who had seen my talk for the .NET Developer's Association...
  • I used it in a very confusing and obfuscated way in Normalization as obfuscation in C#. And then yesterday...
  • Ok, we start with the following code -- just paste it into Notepad, and save in UTF-8 or Unicode encoding...