A group blog from members of the VB team
Back in seventh grade, I had a pretty heavy crush on a girl named Melissa. She was a good friend of mine, and so I had no problems talking to her. However, being painfully shy back then, I was far too nervous to ever ask her to “go with” me. (For those not steeped in American traditions, to “go with” a girl or boy in junior high school meant that you were publicly asserting that the two of you were somehow romantically involved, whatever that actually meant when you were only twelve years old. It was always unclear to me why the term “go with” was used, since, being seventh graders, you never actually went anywhere. By the time you had access to a car and could actually go somewhere together, you were “dating,” not “going with.” In practice, what “going with” really meant was that the girl felt obliged to call the boy on the phone on a daily basis, and if the boy didn’t muster up the courage to ask the girl out onto the floor in front of all of her friends at the next school dance, she would cry and break up with him.)
Anyway, I was able to reconcile my cowardice by writing notes to her expressing my affections. Well, in code, anyway – I was somewhat insensitive as well as shy. I used a simple 1:1 cipher where I mapped invented hieroglyphs to letters of the alphabet (essentially identical to writing a love letter using the WingDings font, though we didn’t have PCs back then). This allowed me to feel good that I had done *something* to tell her how I felt about her, without actually putting me in emotional danger. You see, I never gave her the solution key!
This worked fairly well for my ego for a couple of weeks, until I learned from a friend that she had deciphered at least half of my invented alphabet, using letter frequency to figure them out (it turned out that she liked puzzles of this sort). Mortified, I immediately stopped sending her any new notes, and so, alas, she never learned the depths of my feelings for her.
That junior high memory came back to me on Christmas Day, as I was reading through a book that my wife (apparently, I did overcome my shyness at some point) had given me as a gift. It’s a fascinating book entitled “The Six Unsolved Ciphers” by Richard Belfield. In it, Mr. Belfield walks the readers through such confounding puzzles as the Kryptos puzzle at the new CIA headquarters and the Shepherd’s monument. The book really struck a chord with me. As a kid, I was motivated to create all sorts of codes, alphabets, and languages, but in recent years, while I’ve certainly used cryptology – heck, we all do on a daily basis, whether we realize it or not – I hadn’t really done much with it other than to leverage functionality that was already there. However, in reading Mr. Belfield’s description of the Vigenère encoding scheme, I realized that I had to try that one out, and I started working out the code. Ultimately, I realized that this would make a fun blog topic (and for all I know, this has already been blogged somewhere, but I’ll press on nonetheless and see where we come out).
Before I jump into the coding, I’d better briefly describe the Vigenère method. The method (and I’m borrowing heavily from Mr. Belfield here) involves both a key and an alphabet, and the encrypted letters result from a mathematical relation leveraging both of those. The key can be any word – let’s say “MICROSOFT” – and the alphabet would be something like the letters A-Z – they might be in a different order known only to the sender and the receiver, but let’s assume for the purpose of this example that they are in normal English alphabetical order. The first thing I need to do is to create a grid with the alphabet offset by one in each row (or column), so that each row starts one letter later than the row above it and wraps around at the end.
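A grid like the one described above can be generated rather than typed. The following is only a minimal sketch, assuming the normal A-Z ordering; the module name TabulaRectaSketch and the helper GridRow are hypothetical names used for illustration and are not part of the VBEncrypt project.

```vb
Imports System

Module TabulaRectaSketch
    ' Hypothetical helper: returns row number rowIndex of the grid,
    ' which is the alphabet rotated left by rowIndex places.
    Function GridRow(ByVal alphabet As String, ByVal rowIndex As Integer) As String
        Dim shift As Integer = rowIndex Mod alphabet.Length
        Return alphabet.Substring(shift) & alphabet.Substring(0, shift)
    End Function

    Sub Main()
        ' Assumption: the normal English alphabet, as in the example in the text.
        Dim alphabet As String = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        For row As Integer = 0 To alphabet.Length - 1
            ' Label each row with its first letter, which is also the key letter that selects it.
            Console.WriteLine(alphabet(row) & " | " & GridRow(alphabet, row))
        Next
    End Sub
End Module
```

In this grid the M row reads MNOPQRSTUVWXYZABCDEFGHIJKL, and the letter under column E (the fifth position) is Q, which matches the first step of the worked example.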
So, let’s say I want to encode the phrase “ERIC KNOX SHOULD POST MORE BLOGS”. The first letter in the keyword (MICROSOFT) is M, the first letter in the sentence is E, so I find the M row and the E column and note where they intersect – this gives us the letter Q. The next letter in the keyword is I and the next letter in the sentence is R, and that yields Z. We ignore spaces, and when we run out of keyword letters, we just start again at the beginning of the keyword. This ultimately yields the encoded phrase “QZKTYFCCLTWWCRHCXMYWTVPDCLL”.
To decode the phrase, I work backwards – I find the Q in the M row, and ascend the column to find the E at the top, then find the Z in the I row and ascend to find the R at the top, and repeat this until I get back to “ERICKNOXSHOULDPOSTMOREBLOGS”. It’s pretty simple, though tedious work for messages of any meaningful length when doing it by hand. According to Mr. Belfield, this sort of code apparently was all but unbreakable for hundreds of years – nowadays, people have to change keywords at random times, throw in random junk in the text, or alter the alphabet ordering, to make it tougher to decode.
In coding up the Vigenère method, I knew that I’d want to make provision for code which used a non-standard alphabet (as does the Kryptos cipher that Mr. Belfield writes about) – for example, to make decoding trickier, I might make an alphabet that leverages an additional code word such as my last name:
All 26 letters are still there; I’ve just pushed a few up to the front. The advantage is that not knowing the alphabet ordering makes it tougher for others to crack my code.
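A keyed alphabet of this kind can be built mechanically: write the distinct letters of the code word first, then the rest of the alphabet in its usual order. The sketch below assumes an uppercase base alphabet and uses the word “KRYPTOS” purely as a stand-in keyword; BuildKeyedAlphabet is a hypothetical helper name, not something from the project.

```vb
Imports System
Imports System.Text

Module KeyedAlphabetSketch
    ' Hypothetical helper: starts with the distinct letters of keyword, then appends
    ' the remaining letters of baseAlphabet in their usual order.
    ' Assumption: baseAlphabet is already uppercase.
    Function BuildKeyedAlphabet(ByVal keyword As String, ByVal baseAlphabet As String) As String
        Dim result As New StringBuilder(baseAlphabet.Length)
        For Each c As Char In keyword.ToUpper() & baseAlphabet
            ' Keep a letter only the first time it is seen, and only if it is in the base alphabet.
            If baseAlphabet.IndexOf(c) <> -1 AndAlso result.ToString().IndexOf(c) = -1 Then
                result.Append(c)
            End If
        Next
        Return result.ToString()
    End Function

    Sub Main()
        ' "KRYPTOS" is an illustrative keyword only.
        Console.WriteLine(BuildKeyedAlphabet("KRYPTOS", "ABCDEFGHIJKLMNOPQRSTUVWXYZ"))
        ' Prints KRYPTOSABCDEFGHIJLMNQUVWXZ
    End Sub
End Module
```

Because every letter is kept exactly once, the result always has the same length as the base alphabet and contains no duplicates.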
The other thing I immediately decided was that creating a 2-dimensional array to hold the offsets of the alphabet would be incredibly inelegant as well as wasteful of memory. Since the ordering of each row is deterministic, being offset by one each time, a simple index into the alphabet should suffice, wrapping around to the front of the alphabet when needed. (Think of the alphabet as being a circular list if that helps visualize what I mean).
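To make the circular-list idea concrete, here is a small sketch of looking up a grid cell with nothing but index arithmetic. GridCell is a hypothetical name, and the normal A-Z alphabet is assumed.

```vb
Imports System

Module CircularIndexSketch
    ' Hypothetical helper: returns the letter at (rowOffset, colOffset) of the
    ' conceptual grid without ever building the grid.
    Function GridCell(ByVal alphabet As String, ByVal rowOffset As Integer, _
                      ByVal colOffset As Integer) As Char
        ' Mod wraps the sum back to the front of the alphabet when it runs past the end.
        Return alphabet((rowOffset + colOffset) Mod alphabet.Length)
    End Function

    Sub Main()
        Dim alphabet As String = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        ' Row M, column E: 12 + 4 = 16, no wrap-around needed.
        Console.WriteLine(GridCell(alphabet, alphabet.IndexOf("M"c), alphabet.IndexOf("E"c)))  ' Q
        ' Row S, column N: 18 + 13 = 31, which wraps around to index 5.
        Console.WriteLine(GridCell(alphabet, alphabet.IndexOf("S"c), alphabet.IndexOf("N"c)))  ' F
    End Sub
End Module
```

The only storage needed is the alphabet string itself, so the 26 x 26 table never has to exist in memory.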
I’ll start out the project as usual: I’ll create a Visual Basic Windows Forms application (“VBEncrypt”). Now, on the form, I’ll drag out labels and edit boxes for the Key, the Alphabet, the Clear Text, and the Encoded Text (the latter two boxes set to both “Multiline=True” and “ScrollBars=Vertical” in the properties). I’ll also add some checkboxes to handle niceties – “Case-Insensitive” to allow the user to ignore casing, and “Group Letters When Encoding” (with an accompanying edit box) to cause spaces to be inserted every n characters when emitting the encrypted characters – i.e., “QZKT YFCC LTWW CRHC XMYW TVPD CLL” vs “QZKTYFCCLTWWCRHCXMYWTVPDCLL”. Finally, I’ll add a couple of buttons to do the actual encoding or decoding. My final layout is in a JPG in the ZIP file attached to this post.
The idea is that the user specifies the Key and the Alphabet (note that I’ll default to a normal alphabet in the interest of helping the user if they don’t want to type all 26 characters every time). To encode, the user pastes text into the left pane and presses the “-->” button to encode it. To decode, the user pastes encrypted text in the right pane and presses the “<--“ arrow.
Before we get to the guts of the code, I’m going to do a little housekeeping first. I’m going to need to do validation on the Key, Alphabet, and the “Letters per group.” For the Key, there are three issues – it must be non-empty, it must have no spaces (these are forbidden, since spaces are equivalent to NULLs and are used to simply make outputs more readable), and it must not use any characters not in the Alphabet. The first two cases are easy to code:
Private Function VerifyKey() As Boolean
    Dim Key As String = Me.edtKey.Text
    If String.IsNullOrEmpty(Key) Then
        MsgBox("The Key contains no characters.")
        Return False
    End If
    If Key.IndexOf(" ") <> -1 Then
        MsgBox("The key contains one or more space characters.")
        Return False
    End If
    Return True
End Function
The third case (checking the Key against the Alphabet) is more complicated and, since I expect it to be rare, I’ll defer checking for it until I’m actually doing the encoding/decoding so as not to penalize the typical case performance-wise. (Note, by the way, that I am hard-coding my MsgBox strings – this is just for blog clarity, and normally I’d put the strings into resources so that they could be localized into non-English languages.)
For the Alphabet verification, it’s nearly identical – I need the alphabet to be non-empty and to not contain spaces. However, I also want to make sure that the Alphabet contains no duplicate characters, because that will make encoding or decoding problematic. A common way to check for duplicate characters in a string is to sort it and then walk through it looking for adjacent repeats. I’m lazy, though, and so I’m going to leverage the LINQ “Distinct()” extension method, which works on a String because a String is an IEnumerable(Of Char) – no sorting is required on my part, and it returns an IEnumerable containing the unique characters in the string. If the count of that IEnumerable isn’t identical to the length of the original string, then I know there’s a problem, and all it’ll cost me is just an IEnumerable object that I otherwise won’t use:
Private Function VerifyAlphabet() As Boolean
    Dim Alphabet As String = Me.edtAlphabet.Text
    If String.IsNullOrEmpty(Alphabet) Then
        MsgBox("The Alphabet contains no characters.")
        Return False
    End If
    If Alphabet.IndexOf(" ") <> -1 Then
        MsgBox("The alphabet contains a space character.")
        Return False
    End If
    Dim distinctLetters As IEnumerable(Of Char) = Alphabet.Distinct()
    If distinctLetters.Count <> Alphabet.Length Then
        MsgBox("The alphabet contains duplicates.")
        Return False
    End If
    Return True
End Function
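One case the duplicate check does not obviously cover is an alphabet such as “ABCabc” when “Case-Insensitive” is ticked: the letters are distinct as typed, but not after everything is pushed to uppercase. The following is a sketch of a case-aware variant, assuming it would be called with the state of that checkbox; HasDuplicateLetters is a hypothetical helper and not part of the original project.

```vb
Imports System
Imports System.Linq

Module DistinctSketch
    ' Hypothetical helper: reports whether alphabet repeats a letter.
    ' When ignoreCase is True, "a" and "A" count as the same letter.
    Function HasDuplicateLetters(ByVal alphabet As String, ByVal ignoreCase As Boolean) As Boolean
        Dim letters As String = If(ignoreCase, alphabet.ToUpper(), alphabet)
        ' Distinct() yields each character once; a shorter count means something repeated.
        Return letters.Distinct().Count() <> letters.Length
    End Function

    Sub Main()
        Console.WriteLine(HasDuplicateLetters("ABCabc", False))  ' False
        Console.WriteLine(HasDuplicateLetters("ABCabc", True))   ' True
    End Sub
End Module
```

Whether this matters depends on how the alphabet is typed in practice; with the default uppercase A-Z alphabet both variants give the same answer.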
For the GroupSize box, I want to enforce having only non-negative integers. I’ll do this in the control’s Validating event handler. At the top of the editor, I’ll select the edtGroupSize control in the left dropdown, then in the right dropdown I’ll choose “Validating.” I’ll then fill in the resulting code thusly:
Private Sub edtGroupSize_Validating(ByVal sender As Object, _
        ByVal e As System.ComponentModel.CancelEventArgs) Handles edtGroupSize.Validating
    If Not IsNumeric(edtGroupSize.Text) OrElse CInt(edtGroupSize.Text) < 0 _
            OrElse edtGroupSize.Text.IndexOf(".") <> -1 Then
        ' Cancel the event and select the text to be corrected by the user.
        e.Cancel = True
        edtGroupSize.SelectAll()
        MsgBox("Only non-negative whole numbers are allowed in this field.")
    End If
End Sub
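IsNumeric is fairly permissive about what counts as a number, which is why the handler above also has to rule out a decimal point by hand. As an alternative sketch (not what the project does), Integer.TryParse with NumberStyles.None accepts only plain digits, so signs, decimal points, and stray text are all rejected in one call. IsNonNegativeWholeNumber is a hypothetical helper name.

```vb
Imports System
Imports System.Globalization

Module GroupSizeSketch
    ' Hypothetical helper: True only when candidate consists solely of digits
    ' and fits in an Integer. NumberStyles.None disallows signs, whitespace,
    ' thousands separators, and decimal points.
    Function IsNonNegativeWholeNumber(ByVal candidate As String) As Boolean
        Dim parsed As Integer
        Return Integer.TryParse(candidate, NumberStyles.None, _
                                CultureInfo.InvariantCulture, parsed)
    End Function

    Sub Main()
        Console.WriteLine(IsNonNegativeWholeNumber("4"))    ' True
        Console.WriteLine(IsNonNegativeWholeNumber("-1"))   ' False
        Console.WriteLine(IsNonNegativeWholeNumber("2.5"))  ' False
        Console.WriteLine(IsNonNegativeWholeNumber("abc"))  ' False
    End Sub
End Module
```

In the Validating handler, a False result would take the same path as before: set e.Cancel and prompt the user.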
One last detail before we get to the meat of the code: I need to link the availability of the GroupSize box to the state of the Group checkbox:
Private Sub Form1_Load(ByVal sender As System.Object, ByVal e _
        As System.EventArgs) Handles MyBase.Load
    edtGroupSize.Enabled = (chkGroup.CheckState = CheckState.Checked)
End Sub
Private Sub chkGroup_CheckedChanged(ByVal sender As System.Object, _
        ByVal e As System.EventArgs) Handles chkGroup.CheckedChanged
    edtGroupSize.Enabled = (chkGroup.CheckState = CheckState.Checked)
End Sub
We’ll now dig into the encoding. By double-clicking on the Encode button, I’ll generate the boilerplate header:
Private Sub btnEncode_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles btnEncode.Click
Next, I’ll call the validation methods I set up in the previous section:
If VerifyKey() AndAlso VerifyAlphabet() AndAlso Not String.IsNullOrEmpty(Me.edtClearText.Text) Then
I’m going to pull out all of my string values from the form a priori, since I know I’m going to be in a loop and I detest dereferencing controls or string lengths more than necessary. Note that, when I pull in the text to encode (ClearText), I’m also stripping out any spaces from it using the Replace() method:
Dim Alphabet As String = Me.edtAlphabet.Text
Dim Key As String = Me.edtKey.Text
Dim ClearText As String = Me.edtClearText.Text.Replace(" ", "")
Dim ClearTextLength As Integer = ClearText.Length
Dim AlphabetLength As Integer = Alphabet.Length
Dim KeyLength As Integer = Key.Length
The user might have selected “Case Insensitive,” and if that’s the case, I’ll upper-case everything in all of the strings to make the comparisons work:
' Push everything to uppercase if we are being case-insensitive:
If chkCaseInsensitive.CheckState = CheckState.Checked Then
    Alphabet = Alphabet.ToUpper
    Key = Key.ToUpper
    ClearText = ClearText.ToUpper
End If
Now, I need a place for the encrypted code to go. I’ll use a StringBuilder for this, because (as I’ve mentioned in earlier posts) it is a lot less wasteful than concatenating strings. In fact, in this case, I already know the minimum size of the text buffer I’ll need – the encoded text will be the same size as the clear text, plus any grouping spaces, which the StringBuilder will grow to accommodate:
Dim EncodedText As New StringBuilder(ClearTextLength)
As I mentioned above, I’m going to have to cycle through the Key, so I’ll need an index into it:
Dim KeyIndex As Integer = 0
And I’ll proactively determine if the user wants me to add spaces between groups of letters:
Dim GroupSize As Integer = -1
If Me.chkGroup.CheckState = CheckState.Checked Then
    GroupSize = CInt(edtGroupSize.Text)
End If
Now, we come to the place where the work gets done. I will step through each letter of the ClearText and mathematically determine its corresponding encoded value. The first thing to do is to determine the “row.” I’ll get the current letter pointed to in the Key, and figure out where that letter lives in the Alphabet. In a normal alphabet, I could simply calculate its offset from the letter “A”, but I mentioned earlier I want to support arbitrary alphabets and so I use “IndexOf()” instead. (If IndexOf() returns a -1, then I know that the Key uses a letter not in the Alphabet, and will abort the operation with an appropriate message.) The code is:
For i As Integer = 0 To ClearTextLength - 1
    Dim RowOffset As Integer = Alphabet.IndexOf(Key(KeyIndex))
    If RowOffset = -1 Then
        MsgBox("Key contains character '" & Key(KeyIndex) & "' not in the alphabet.")
        Return
    End If
Knowing the row offset, I don’t need a 2-D array – I’ll just pretend that the Alphabet starts RowOffset characters later than it does, wrapping around if necessary – but I’ll get to that later.
Now, let’s track down the column. I’ll get that from the current ClearText letter itself, locating it within the Alphabet via IndexOf(). Again, I can report & abort if the text to encode contains characters that are not in the Alphabet:
    Dim ColOffset As Integer = Alphabet.IndexOf(ClearText(i))
    If ColOffset = -1 Then
        MsgBox("ClearText contains character '" & ClearText(i) & _
            "' not in the alphabet.")
        Return
    End If
Now, the encoding is easy. Moving along rows or along columns just changes the letter by one index on each step. To select the “row” is identical to just advancing through the Alphabet by RowOffset characters and pretending that the Alphabet starts there. I then add ColOffset to that value in order to account for the horizontal displacement, following the same logic. Because the sum might “wrap around” back to the front of the Alphabet, I need to perform a modulus on the sum before using it as an index to Alphabet, so I won’t reference an illegal index:
Dim EncodedChar As Char = Alphabet((RowOffset + ColOffset) Mod AlphabetLength)
It’s just that simple! Now I’ll throw the encoded letter into the buffer, add a space if requested (again using a modulus to see if we’ve just finished a group of n letters), and move to the next letter in the Key (again, modulus will help me wrap around to the beginning if needed):
    EncodedText.Append(EncodedChar)
    ' Add spaces if requested:
    If GroupSize > 0 Then
        If i Mod GroupSize = GroupSize - 1 Then
            EncodedText.Append(" ")
        End If
    End If
    ' On to the next character in the key...
    KeyIndex = (KeyIndex + 1) Mod KeyLength
Next
After the final loop, I’ll update the text in the right-hand pane, and we’re done:
Me.edtEncodedText.Text = EncodedText.ToString
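Pulling the pieces of the walkthrough together, here is a UI-free sketch of the encoding loop that can be run as a console program. It assumes the input is already in the right case, and it returns Nothing instead of showing a message box when a character is missing from the alphabet; the function name Encode and its parameters are hypothetical rather than taken from the project.

```vb
Imports System
Imports System.Text

Module VigenereEncodeSketch
    ' Hypothetical helper: encodes clearText with key over alphabet.
    ' Returns Nothing if the key or the text uses a character not in the alphabet.
    ' A groupSize of zero or less means "no grouping".
    Function Encode(ByVal clearText As String, ByVal key As String, _
                    ByVal alphabet As String, ByVal groupSize As Integer) As String
        Dim letters As String = clearText.Replace(" ", "")
        Dim encoded As New StringBuilder(letters.Length)
        Dim keyIndex As Integer = 0
        For i As Integer = 0 To letters.Length - 1
            Dim rowOffset As Integer = alphabet.IndexOf(key(keyIndex))
            Dim colOffset As Integer = alphabet.IndexOf(letters(i))
            If rowOffset = -1 OrElse colOffset = -1 Then Return Nothing
            encoded.Append(alphabet((rowOffset + colOffset) Mod alphabet.Length))
            ' Insert a separator after every full group, but not after the final letter.
            If groupSize > 0 AndAlso i Mod groupSize = groupSize - 1 _
                    AndAlso i < letters.Length - 1 Then
                encoded.Append(" ")
            End If
            keyIndex = (keyIndex + 1) Mod key.Length
        Next
        Return encoded.ToString()
    End Function

    Sub Main()
        Dim alphabet As String = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        Console.WriteLine(Encode("ERIC KNOX SHOULD POST MORE BLOGS", "MICROSOFT", alphabet, 4))
        ' Prints QZKT YFCC LTWW CRHC XMYW TVPD CLL
    End Sub
End Module
```

Keeping the cipher in a plain function like this also makes it easy to check against the hand-worked example before wiring it to the form.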
Decoding is very similar to encoding, except that I start from the EncodedText instead of the ClearText and, once I’ve found the “row” offset from the current letter in the Key, I subtract it from the encoded letter’s position in the Alphabet instead of adding the two offsets together. That difference can be negative, and VB’s Mod operator returns a negative result when its left operand is negative, so I add the length of the alphabet before taking the modulus in order to account for wrap-around in the leftward direction. The result looks like this:
Private Sub btnDecode_Click(ByVal sender As System.Object, ByVal e _
As System.EventArgs) Handles btnDecode.Click
If VerifyKey() AndAlso VerifyAlphabet() AndAlso _
Not String.IsNullOrEmpty(Me.edtEncodedText.Text) Then
Dim EncodedText As String = Me.edtEncodedText.Text.Replace(" ", "")
Dim EncodedTextLength As Integer = EncodedText.Length
Dim KeyLength As Integer = Key.Length
EncodedText = EncodedText.ToUpper
Dim ClearText As New StringBuilder(EncodedTextLength)
For i As Integer = 0 To EncodedTextLength - 1
Dim ColOffset As Integer = Alphabet.IndexOf(EncodedText(i))
If ColOffset = -1 Then
    MsgBox("EncodedText contains character '" & EncodedText(i) _
        & "' not in the alphabet.")
    Return
End If
Dim DecodedChar As Char = Alphabet((AlphabetLength + (ColOffset - RowOffset)) _
    Mod AlphabetLength)
Me.edtClearText.Text = ClearText.ToString
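For reference, the decoding loop can also be sketched as a standalone function. As before, this is an illustration under assumptions rather than the project's code: Decode is a hypothetical name, the input is assumed to be in the right case, and Nothing is returned in place of a message box.

```vb
Imports System
Imports System.Text

Module VigenereDecodeSketch
    ' Hypothetical helper: reverses the encoding for the same key and alphabet.
    ' Returns Nothing if the key or the text uses a character not in the alphabet.
    Function Decode(ByVal encodedText As String, ByVal key As String, _
                    ByVal alphabet As String) As String
        Dim letters As String = encodedText.Replace(" ", "")
        Dim decoded As New StringBuilder(letters.Length)
        Dim keyIndex As Integer = 0
        For i As Integer = 0 To letters.Length - 1
            Dim rowOffset As Integer = alphabet.IndexOf(key(keyIndex))
            Dim colOffset As Integer = alphabet.IndexOf(letters(i))
            If rowOffset = -1 OrElse colOffset = -1 Then Return Nothing
            ' Adding alphabet.Length keeps the left operand of Mod non-negative.
            decoded.Append(alphabet((alphabet.Length + colOffset - rowOffset) Mod alphabet.Length))
            keyIndex = (keyIndex + 1) Mod key.Length
        Next
        Return decoded.ToString()
    End Function

    Sub Main()
        Dim alphabet As String = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        Console.WriteLine(Decode("QZKT YFCC LTWW CRHC XMYW TVPD CLL", "MICROSOFT", alphabet))
        ' Prints ERICKNOXSHOULDPOSTMOREBLOGS
        ' Why the extra alphabet.Length is needed: Mod keeps the sign of its left operand.
        Console.WriteLine((-3) Mod 26)        ' -3
        Console.WriteLine((26 + (-3)) Mod 26) ' 23
    End Sub
End Module
```

Note that the word breaks of the original sentence are not recovered, since spaces were stripped before encoding.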
And that’s it! The final application is attached to this post.
This was code I particularly enjoyed writing – I like the simple elegance of a good mathematical model. For me, though, the most poignant part of this whole exercise was remembering that APTNW WBOIA DLMJV RDATM QSOVU YJWLP ZJALN APZJL DNXRO VFUVA RQYOY ZFMAR CLDBZ XJJYY PBZQO IUWRN AOZHP NYIZW MSWMU YJWNO MQATD YRKUY RINRZ NDADP MMGIV BUPRQ.
‘Til next time,
Great story, I enjoyed reading :)
Hey thanks for leaving the source code! It was hard to understand what to do when there wasn't exact directions. Thanks!
[General comment to all: Folks should note, BTW, that I usually remove comments from my code when I insert it into the blog, since I'm discussing the code right there -- this saves space in the blog. However, if you grab the zip file, you'll find that the code is well-commented and therefore hopefully helpful when the blog isn't right in front of you.].
Hey, just wanted to say that i have an idea! What i did was i added a cool feature to this vbencrypt program! what i did was make it so you can SAVE the secret message! i just click the save button i made, and then you choose where to save it and what name it is, the usual. it saves the text in the big text box to the right, like i said, the encoded text, to the file, but the file saved isn't a text format! its format is .vbcrypt. windows calls it a VBCRYPT file, and when you double click it, it doesnt know what to open it with! that can make your encoded message even more secret because people wont know what to open it with! but, if you opened it with notepad, it would display the encoded text. also i put an open file button on there, where you can open VBCRYPT files. when it does this, i defined a StreamReader and used it to write the encoded text to the encoded text box! then you just have to enter the key, decode it, and you have your message :)
And another thing i found out. for this there is no extra editting needed. just go to the alphabet, and somewhere put in an underscore. _ is an underscore, just to clarify. now, when you type your messages, you can type underscores and use them as spaces! like this: Hello_i'm_using_underscores_as_spaces
this can help with rather long messages, it just makes it quicker instead of having to separate the words.
and finally, an extra encoding technique! Ok, so open up that program VBEncrypt. now, you enter the key as usual, and enter your message, and encode it. it comes up as the usual odd jumble of letters. now, in case someone somehow got a hold of your key, you can ALSO encode it again! do this by copying the already encoded text, clear the large left text box, and paste the encoded text there. then, change the key to something different. click encode, and now you have a double encoded message =) you can do this as many times as you want, just do the same thing you did with the same keys and everything, but backwards, and you get your message. Cya!
P.S. Would you believe me if i said i was 13? =S
Your article is so sweet. You have a natural sense of humor and easy to follow coding capabilities. Let's hear more from you Matt.
@Jerry: Thanks for the feedback! 13, really? It's really encouraging to see folks your age getting into coding (and now you can write encoded messages to your "significant other" as well... hopefully, you'll be braver than I was :-).
On the first point you bring up -- absolutely, those are great ideas for extensions to my program and it's really cool when people take an idea and run with it like this.
For the second point: yep, it's definitely more readable with the underscore. The only thing that argues against using the underscore method (and the reason that most people avoid doing something like it) is that, the more characters you end up with, the easier it is to crack the code, simply because there's more data to work with. Of course, short messages are nearly uncrackable and so underscore is great for readability in those cases.
On the third point: well, you have to be careful about thinking that the message is doubly-encoded. There still exists a mathematical relationship between your text, the keyword, and the original alphabet. Since the keyword is constant in each calculation, and each iteration of the text has a relationship to the previous iteration, mathematically speaking you're essentially just working with a different alphabet and some offsets -- from the perspective of someone trying to crack your message, it's probably no more difficult for them to succeed if you do this. I haven't run the actual math, but I think essentially it's equivalent to just starting out with a different alphabet. If you changed the keyword at some point, it would be harder to decrypt, and of course if both alphabet and codeword changed, then you could really start to think of it as doubly-encrypted since presumably there'd be no relationship between either of those.
@Waleed: Thanks for the kind words! It's harder to find time to write these days, but I do enjoy it and have a couple of ideas that I hope to get to later this month.
Thanks for replying! It's nice to see that you actually reply to your feedback, its nice! Also, I'm just curious, you are on the vb team...correct? Well, what do they do? I assume work a lot with vb, seeing how good you are at it...but could you elaborate on it? Thanks!
Actually, I'm not on the VB team anymore, although I work with them still from an operational point of view. I was on the team (or component groups that later merged into it) for about thirteen years, but left it just over a year ago to take a new job in operations -- getting Visual Studio out the door to market. So, as to what the VB team is doing now -- well, I'm not sure! You'd have to ask Lisa... :-)
Oh, i see...oh, and if you need any ideas for blogging if you'd like, i have one...a vb database :). i'm currently working with vb right now, trying to figure out how to make one, since there are no web tutorials i can find...well, it's just an idea.
I have one here: http://blogs.msdn.com/vbteam/archive/2008/03/02/keeping-track-of-books-using-vb-matt-gertz.aspx -- and people who commented on that one also noted some other books/blogs on the subject. Good luck!
Enjoyed your article. Bought the book, and will download your code. One is taught these days not to believe in coincidences, but I was thinking about doing a project similar to yours and just happened to come across your article.
As someone who still uses VB, the fact that your code is a VB project will be a bonus.
Thanks! (Richard Belfield owes me a commission now. :-) VB's a great language for programming all sorts of things. I used to do all of my personal coding in C++, but I switched to using VB once I joined that team and frankly feel I have not missed any power over the machine.
Hey Matt, I forgot to ask ya this back in january, d'you have an msn or something like that we can chat in? (Yahoo is fine too, msn is compatible with it)
At this time, I don't have a presence on MSN, Live, Yahoo, or Facebook. One of my friends/co-workers has been trying to talk me into setting one up, but unfortunately, I'm hampered by two problems:
(1) I grew up in the Midwest *and* am over 40, and thus I belong to a demographic which historically has tried hard to keep a low and humble profile;
(2) I work crazy long hours and thus try as much as possible to devote whatever remains to my family (as opposed to being online).
Consequently (for better or worse) I've always viewed such sites with trepidation. There's a certain irony in this, admittedly, since I'm in an industry which actively promotes such sites. I've actually been looking at some of the sites to see if I can convince myself to set something up, and if that happens, I will let folks on this blog know how they can get in touch with me. (Recently, I've been dragged into the world of IM chatting internally at Microsoft, so maybe there's hope for me... :-) )