One of the questions that comes up often - usually after somebody comes across one of the C# decompilers, such as RemoteSoft's Salamander or Lutz Roeder's Reflector - is “how do I keep somebody from reverse-engineering my assemblies and stealing my code?”.
While reverse engineering of code has been around for a long time, the combination of IL and rich metadata in systems such as Java and .NET makes it easier. Reverse engineering optimized x86 code is harder, but it's still quite feasible, as many companies that used copy protection back in the 80s found out.
One way to think about securing your code is to consider how you secure your house or business. There are some possessions that are not protected, some that are secured with simple locks, some are put in safes, and some warrant a vault with a 24-hour guard. I think it depends upon how valuable a specific piece of software is, and that obviously is something you'll have to decide for yourself.
As for protection, there are a number of different schemes that customers have told me about.
I should also point out that there are some products that claim to use encryption-based approaches to prevent reverse-engineering, but I don't have any credentials to evaluate such schemes, so I won't venture an opinion.
If you know of any other schemes, please add them in the comments.
I've been tracking topics I wanted to cover using various methods (Excel spreadsheets, Post-It (tm) notes, “post-it compatible” notes, writing on my hand, and graffiti in my office).
It's been hard to keep all of those straight, so I'd like to centralize them in a blog post. If you have a question that you'd like me to answer, please add a comment to this post with your question.
I'll put a link on the top-level web page of my blog, so that you can easily get to it. I'll try to put some of my own ideas there as well.
The answer I came up with is at the bottom. But first, a brief digression.
There were several responses to my regex puzzle. They can be grouped into:
#1 was the kind of response I expected. My original idea was to highlight a regex technique that made this a lot easier and more robust than the code I had seen suggested.
#2 is interesting. Clearly, if you can find a good library - and it's not more effort to prove that it is good than to create your own (remembering that you always underestimate how much effort it is to do it yourself) - you should use it. But that really wasn't the point of my post - my question was “how would you do it using regex?”.
Which brings us to #3. While I agree with Raymond that there are cases where regex is more trouble than it's worth - something like brace matching comes to mind - I'm not sure that I agree in this case. You can't use an XML approach because HTML isn't required to be well-formed, which means you're either using a library or writing custom code. I'm not convinced that custom code is going to be more robust than a well-written regex without a fair bit of testing, and I do know that it would be just as easy (perhaps easier) to write custom code that isn't robust as it would be to write a regex that isn't robust.
On to our solution. Note that I'm not claiming that this is a robust and tested solution - I'm more interested in showing off a regex technique. If you want to use it for real, be sure to test it well.
Conventional regex systems would require us to enumerate every tag that we want to replace. In that direction lies madness, as it's pretty likely you won't get it right. The example I saw didn't even replace “<script>”. But .NET regex (and current Perl syntax, IIRC...) allows you to use zero-width assertions and specify what you don't want to match.
The first step is to create something that matches an XML tag. The simplest version is:
which works great if there are no embedded “>” characters inside the tag. To be able to handle a quoted attribute such as
we'll need to modify the regex to handle that case specifically. Here's the regex to do it:
(        # group
[^"]+?   # One or more non-" chars. Matches tag with no quotes. non-greedy
|        # or
         # match something like <fred a="<5>">
.+?      # Everything up to ", non-greedy
"        # literal "
.*?      # zero or more characters after quote, non-greedy
"        # literal "
.*?      # zero or more characters after quote, non-greedy
)
Now that we have that, we have to tell it what tags not to match. We can do that with a negative lookahead:
The key to the lookaheads/lookbehinds is that they don't consume any characters. So, this says “It's okay to match at this point unless the string is one of "br", "/br", "p", or "/p"” (yes, you'd need to use a case-insensitive match to cover both upper and lowercase versions).
Lookahead is a great feature to have if you're trying to do more than one thing in a regex. Here's the full regex.
<                 # opening < of the tag
(?!br|/br|p|/p)   # negative lookahead. Match will fail if any of these are present
(                 # group
[^"]+?            # One or more non-" chars. Matches tag with no quotes. non-greedy
|                 # or
                  # match something like <fred a="<5>">
.+?               # Everything up to ", non-greedy
"
.*?               # zero or more characters after quote, non-greedy
"
.*?               # zero or more characters after quote, non-greedy
)
>                 # close of tag
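As a rough sketch of how the full regex might be used from C#, here is one way to wrap it. The class name TagStripper and the method name Strip are made up for illustration, and the pattern comments avoid quote characters so the verbatim string stays simple. It relies on RegexOptions.IgnorePatternWhitespace so the commented layout can be kept, and RegexOptions.IgnoreCase to cover upper and lowercase tags. Like the regex itself, it is not a tested, robust solution.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper; the names are assumptions, not part of any library.
public static class TagStripper
{
    // The pattern from the post. In a verbatim string, "" is one literal quote.
    private static readonly Regex NonAllowedTag = new Regex(@"
        <                   # opening < of the tag
        (?!br|/br|p|/p)     # negative lookahead: fail if one of these follows
        (                   # group
          [^""]+?           # tag with no quotes, non-greedy
          |                 # or
          .+?               # everything up to the first quote, non-greedy
          ""                # literal quote
          .*?               # attribute value, non-greedy
          ""                # literal quote
          .*?               # rest of the tag, non-greedy
        )
        >                   # close of tag",
        RegexOptions.IgnorePatternWhitespace | RegexOptions.IgnoreCase);

    // Removes every tag the pattern matches and leaves the rest of the text alone.
    public static string Strip(string html)
    {
        return NonAllowedTag.Replace(html, "");
    }
}
```

Calling TagStripper.Strip on "<b>bold</b><br>line" leaves "bold<br>line". One thing to watch for if you use it for real: because the lookahead only checks the characters right after the "<", it also protects any tag that merely starts with "p" or "br", such as <pre>.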
If you value your productivity.
I got to level 30
Partial classes can be pretty cool because of the opportunities they give for separating machine- or designer-generated code from user code. They do present a few issues, however.
One that we've been talking about recently is a scenario where you have some machine generated code that can function by itself, but sometimes your user wants to extend it. And, to extend it, the user needs to have some code run as part of the constructor.
That presents a problem. If we want the user to be able to use the class by itself, it needs to have a constructor in the generated code (assuming, of course, that the generated code needs a user-written constructor, which is true the majority of the time), but that prevents the user from writing a constructor themselves.
We talked about this from a design perspective, and explored some ways in which you could write “partial methods” - for want of a better term - in your partial classes. It is possible, but in all the schemes we came up with, you end up with weird ordering constraints (“I want the user constructor code to be inserted after the first 2 lines in the constructor but before the remaining 10 lines...”)
So, basically, those schemes were all untenable.
How can you do this? Well, the best way is to provide some designer support to add a call to a user method at the appropriate place.
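Here is a minimal sketch of what that could look like. The class name OrderForm, the field Title, and the method name UserInitialize are all hypothetical, and the two parts are shown in one listing only for brevity. The point is that the generated constructor contains a single call, placed by the designer at the agreed-upon spot, and the user's part of the class supplies the method.

```csharp
using System;

// Generated part (hypothetical designer output).
public partial class OrderForm
{
    public string Title;

    public OrderForm()
    {
        Title = "Order";     // generated initialization
        UserInitialize();    // call the designer inserted at the chosen point
    }
}

// User part, normally in a separate file.
public partial class OrderForm
{
    // User code that needs to run as part of construction.
    private void UserInitialize()
    {
        Title += " (customized)";
    }
}
```

With this arrangement, the ordering question is answered once, by where the designer puts the call, and the user never has to write a constructor.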
Shaykat and I have been having a discussion about how we organize information that we present to customers. Specifically, we're trying to figure out how to present “What's new/what's changed” information. There are two options we've been considering.
The first option considers the language attribute to be the high-order bit. It would look something like:
What's new in C#
* C# Language
* C# IDE
* Debugger
* XML
* Data
* ASP.NET
* Windows Forms
* Smart Client
* CLR
* BCL

What's new in VB
* VB Language
* VB IDE
* Debugger
* XML
* Data
* ASP.NET
* Windows Forms
* Smart Client
* CLR
* BCL
The second option is to consider language one of the organizational attributes, but not the only attribute. It would look something like:
What's new in C#
* C# Language
* C# IDE

What's new in VB
* VB Language
* VB IDE

Application Types
* What's new in ASP.NET
* What's new in Windows Forms
* What's new in Smart Client applications

Technologies
* What's new in the CLR
* What's new in the BCL
* What's new in Data
* What's new in XML

Which one of these organizations do you prefer? Please add a comment with “Language” if you prefer the first one, and “Attribute” if you prefer the second one. Or, if you think there's another organization that's better, feel free to explain.
Why can't we have .Add and .Remove as well as += and -= operations? After all, it's just a collection. I was rather surprised that this method wasn't actually available for adding event handlers.
One of our design goals is not to have more than one way of doing something (in contrast to Perl's TMTOWTDI). If there are two different ways of doing something, developers need to learn both ways and understand that they are equivalent. That makes their job more complex without any real benefit.
Another area that this shows up is on delegates. Since C# has a special syntax for invoking delegates, we don't let you call Invoke() directly.
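For reference, here is a small sketch of the one way that is provided. The Alarm class, its Rang event, and the AlarmDemo helper are invented for this example. Handlers are attached with += and detached with -=, and the delegate is invoked with the call syntax rather than through Invoke().

```csharp
using System;

// Hypothetical event source used only for illustration.
public class Alarm
{
    public event EventHandler Rang;

    public void Ring()
    {
        EventHandler handler = Rang;          // copy so the null check and call agree
        if (handler != null)
        {
            handler(this, EventArgs.Empty);   // delegate invocation syntax
        }
    }
}

public static class AlarmDemo
{
    // Returns how many times the handler ran.
    public static int CountRings()
    {
        int count = 0;
        EventHandler onRang = delegate { count++; };

        Alarm alarm = new Alarm();
        alarm.Rang += onRang;   // subscribe
        alarm.Ring();           // handler runs once
        alarm.Rang -= onRang;   // unsubscribe
        alarm.Ring();           // no subscribers, nothing happens
        return count;
    }
}
```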
I recently picked up an iRiver iHP-120 MP3 player.
I'd been finding it hard to work out at the club without music. I thought about buying a Rio S50 (like my wife's), but my collection is ripped at a pretty high bit rate, so you can't fit that much music on 512MB of memory. So, a hard disk player it was going to be.
I spent a lot of time reading and researching. The contenders were the iPod, the Rio Karma, and the iRiver, all in their 20GB variations. It finally came down to having FM support, as the club has TVs that rebroadcast their audio on FM, and I sometimes like to watch TV rather than listen to music. So, I decided on the iRiver. If I had gone with the iPod, I think I would have spent the extra money to get this one.
After doing a bunch of price shopping, I couldn't find any great deals on the iRiver - at least, any great deals from companies with good reputations. Then I decided to check at newegg, and it turned out they carried the iRiver at a good price.
A quick plug for newegg. I've bought components to build two PCs, two MP3 players, and other assorted PC components, and both their price and service have been wonderful. No complaints at all.
Back to the iRiver. The player is roughly iPod sized, but a bit thicker. It has a nice LCD screen and a 5-way joystick on the front (left/right, up/down, and click), and play, stop, and a/b/repeat buttons on the side. It also has a record button and a built-in microphone.
In addition to the main player, there's an LCD remote that has all the functions of the main player, so you can stuff the player in your pocket. You can use the player with or without the remote.
Setup was a matter of plugging in the power, plugging in the USB cable, opening the drive in Windows, and dragging the music over. That took overnight on my home system (USB 1.0), but on my work system, I copied 18G in a little over 30 minutes, which rates pretty darn fast in my book. That got me up and running. It's nice that you don't have to have software to copy files, but if you stop there, the player is file based - go to a folder, choose the first song, and it plays through. If you want to play based on artist, album, or genre, you need to install the manager software, reboot, and then right-click on the drive and choose “update song database“ (or something like that). Easy to do, but the manual gives you few clues, and the online FAQ isn't much help either. The player also claims to support Winamp playlists, but I haven't tried it yet.
Navigation is a bit of a chore. I'm not sure whether it's the sheer number of functions they're trying to map to a limited number of keys, or whether they could use a better metaphor (I suspect a little of both), but it's hard to do the right thing in the UI. I'm getting the hang of it now, but it's not just a 'pick up and use immediately' device.
Overall, I'll give it an 8. A bit pricey, but good sound, good battery life (they claim 14 hours), nice to have the remote.
I bought a Western Digital 120 Gig external drive to use as a backup drive for my home system, and I'd like to use something nicer than Windows backup to do the backup. Here's what I'm looking for:
Bonus if it only stores a single version of files that are shared across the systems.
A customer wrote me today to ask, “Why can't I do arithmetic on byte types?” For example, if you write:
byte a = 0x01;
byte b = ~a;
The compiler will complain on the second line that you can't convert an int to a byte. This happens because there are no operations defined for any types smaller than 4-byte types. The question is, “Why?”
Back in the old days, when a 5 MB disk drive was the size of a washing machine and memory was measured in KB (i.e., when I went to college), memory and disk storage were very precious, and you did your best to conserve them. But times have changed. If you look at most programs these days, you'll find that they do the vast majority of their operations on 4-byte operands. And, in fact, x86 processors are very good at these operations.
But the price of being fast at those operations is that many processors are slower at accessing smaller-than-4-byte operands, so if you use them you will have a program that's slower to execute.
Given that, the C# and CLR designers decided to only define arithmetic operations on 4-byte and 8-byte operand sizes. It does mean that you have to have some casts when using smaller types, but it's more indicative of what's going on at the chip level.
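To make the cast concrete, here is a small sketch (the ByteMath class and its method names are made up for illustration). The byte operands are widened to int, the operation is done at int size, and the explicit cast narrows the result back to a byte. The unchecked keyword is there so the narrowing truncates instead of throwing if the code is compiled with overflow checking turned on.

```csharp
using System;

// Hypothetical helper showing the casts needed for byte-sized arithmetic.
public static class ByteMath
{
    // ~ is applied to the int value of a; the cast keeps only the low 8 bits.
    public static byte Complement(byte a)
    {
        return unchecked((byte)~a);
    }

    // a + b is an int addition; the cast truncates the sum back to a byte.
    public static byte Add(byte a, byte b)
    {
        return unchecked((byte)(a + b));
    }
}
```

So the example from the question becomes byte b = (byte)~a; and ByteMath.Complement(0x01) gives 0xFE.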
So, that's what's going on here.
I was reading our newsgroups, and I came across a post where the user wanted to filter out all tags from html text except for <br>, </br>, <p>, and </p>.
What is the shortest .NET regex to do that?
I usually ignore the spam that I get (and I get a bunch), but I read one of them a bit more carefully:
We hereby inform you that your computer was scanned under the IP 126.96.36.199 . The contents of your computer were confiscated as an evidence, and you will be indicated.
Umm... I think that the author was probably looking for “indicted”.
(Note: Thanks to Keith who noted that I had mispelled “indicted” originally, without calling me a moreon)
One of the things that the C# team does occasionally is hold an appweek, where we take a week out of our schedules and devote it solely to using our product to build apps. For the last week or so, there's been a QA appweek that the PMs have been participating in, and we're organized into 8 or so teams.
The reason I called it “appweak” is that I've only been able to spend, at best, 50% of my time on appweek, which means my contribution has been fairly modest. Our team has been using the profiling APIs to do some program visualization (function stack, exception behavior, memory behavior, etc.). Other teams have their own projects, with at least one team doing a game (a Berzerk clone, IIRC).
Why do I mention this? Well, to explain a bit how we test our products, and to tell you that I think you're going to be pretty happy with Whidbey - it has some nice advances that make writing code a fair bit easier.
[Update] Jack asked how we decide what kinds of apps we write, and whether we target specific feature areas.
The apps are nearly always chosen by the teams. Sometimes we give them guidance on what kind of app they write, but my experience is that it's better to just let the teams guide themselves. Our goal is to use the product the way we expect our customers to use it, which for us meant using source code control, the new build system, and training ourselves to use refactoring.
(in case you're wondering, right now I'm cherry-picking a few comments people have asked me to write on. I'll get to them in order soon...)
Jonathan Crossland asked
.. your thoughts on enforcing patterns at the compiler level. As an example: - excluding the public scope from field declarations, making them private by default and only private. - declaring a field as public appears as a compiler warning (level x) or an abstract example - putting accessors (constructor like) on the object level, so that we can code against someone setting an instance to another as in myobj = yourobj (yourobj fires a get, myobj fires a set)
We haven't talked about this at length in the design meetings I've been at, but there is a big gap in my attendance.
Given that we already have an extensible tool like FxCop available, I don't think it makes sense for each compiler to do work that would duplicate the FxCop features. I'm fairly sure that both of your first examples can be done in FxCop now.
The abstract one could either be done by post-compile IL analysis, or it could be done by a compiler. Doing so in the compiler would be difficult to do in a general way, and our compiler architecture doesn't really lend itself to that sort of thing.
Philosophically, it would be nice to have some way to leverage the knowledge that the compiler has about the code, but we don't currently have any plans in that area.
Jonathan, can you comment on why you want to be able to detect reference assignments? I'd like to understand the scenario better.
I had the pleasure to meet Ron Jeffries last year when he was on campus, and I've been reading his column in XP magazine.
Ron has released his “Adventures in C#” columns in book form. I haven't read it, but Ron has a great way of explaining things, and I'm sure it's great.
which means I won't be around, though I may blog late next week.
OOF is a weird Microsoft TLA that means “out of office”. Yes, we know that it should probably be “OOOf”, but that's not the way it got expressed in our early tools. I guess that means it's really a TLpA (Three-letter-pseudo-acronym).
My wife and daughter and I are going up to Stevens Pass to night ski tonight. Nearby Snoqualmie Pass was the first ski area in the country to provide night skiing, way back in 1945, and probably hosts more night skiers than anywhere else, as it's only 45 minutes from Seattle.
We'll stay up in the mountains, ski at Stevens again on Sunday, and then head up to ski Mt. Baker, where Jake Burton started snowboarding, and which holds the single-season snowfall record for the United States: an incredible 1140 inches (yes, nearly 100 feet) over the 1998-1999 season (ski areas had to *close* to dig out their lifts that year). The previous record was 1122 inches at Paradise Ranger Station on Mt. Rainier. I've never skied there, and I'm hoping for some of the “pow pow” that Baker is famous for.
We'll ski Baker Tuesday and Wednesday, then come back, relax for a few days, and head up to Stevens again next weekend.
[Update: Dare comments that “OOF” means “Out of Facility”. I've heard that before, but I don't buy it.
First of all, I've been at Microsoft for 9+ years, and have never heard anybody use “out of facility” (though to be fair, “OOF” is used far more than “out of office”). Second, if you do some Google searches, you'll find that "out of facility" gets about 900 hits, and "out of office" gets 324K hits.]
[Update: The skiing was okay, but not great, mostly due to the weather. On Monday, they had gotten 5-6" of new snow, but there was a 1/2" frozen crust on top. At the top of the first run, I skied down a little, sideslipped to the side of the run, hit the new stuff, and promptly fell down. Samantha skied next to me, and fell down. Kim skied around a "slow" sign, hit the new stuff and fell down.
Lest you think this behavior was limited to us, the first 5 people I saw did the identical thing. The new snow was fairly unskiable in the morning, and since there aren't a ton of trees where we skied, it was really hard to tell where the new snow started.
The second day there was about 7" of new, but it had warmed up, so it was 7" of slop. It was kind of skiable if it wasn't tracked out and you were patient, but without powder skis (I am not so equipped), it was a lot of work. A better day for boarders than skiers. Oh, and it was mixed rain and snow even up high, which meant we got really wet in the morning, even with our Gore-Tex.
Overall review: Baker is a good place for advanced skiers. It's not great for intermediates, as there aren't that many runs, and they don't spend a lot of money on grooming. And if you show up midweek, you better not be a beginner because you can't get to the beginner terrain without skiing the intermediate stuff. Lifts are all fixed-grip, which does mean you spend more time on the lifts. If you're hard-core, there is some good out of bounds skiing (bring your avalanche beacon). Note that we didn't ski some of the other lifts because a) some of them weren't open and b) Samantha isn't quite up to a “have to ski an ungroomed black diamond to get down” yet.
We stayed in Glacier, in an authentic log cabin. It had 'lectricity and runnin' water, but it didn't really have any amenities. So, we sat around, ate, read, and watched the fire in the wood stove. Relaxing, and cheap for ski accommodations.
This has been the workout vacation. Worked out and skied Friday, skied Sunday, worked out Monday, skied Tues/Wed, rode 28 miles today, ski tomorrow and Sunday. Nice and relaxing.]
Rick said in a comment,
You said: “Philosophically, it would be nice to have some way to leverage the knowledge that the compiler has about the code, but we don't currently have any plans in that area.”
I've often thought that it's pretty difficult for tools to do meaningful things with most source code due to the complexities of parsing. Any chance we'll see the front-end and back-end (and maybe even each phase) of the C# compiler be accessible independently?
I'd like to clarify a bit what I said.
We don't currently have any plans in this area, but we certainly do recognize the utility of providing this sort of access for customers, and would like to be able to do it in the future. And, to answer one of Rick's other questions, Intellisense and refactoring do share the compiler code, but the interface between them is complex and not terribly pretty, and therefore not something we'd like to expose.
I was web browsing this morning, and I clicked on a video link on a website, and IE popped up a little dialog that said:
You have clicked on a video link. Internet Explorer can play this in a separate window, which will allow you to keep browsing the web while the link plays.
Do you want to play this video in Internet Explorer?
I thought it was UI design 101 that when you told somebody about an option and then gave them a choice, the choice should not be phrased as THE EXACT OPPOSITE to the option you just told them about. Because, unless you read the question very closely, you will end up choosing the wrong thing. And, of course, you had the “remember my choice” option checked, so now you're stuck with the setting.
I had to read that dialog three times before I realized that I needed to click 'No', but I know that I've done it wrong before.
[Note to readers. I'm actually a fan of asparagus, but you have to admit it's at least close to an anti-cookie]
Last summer, Bruce Eckel and Bill Venners came to Seattle to talk to the C# team, which led to the Anders interviews that have been featured on Artima. During a break between talking to more important people, I got the chance to talk to Bruce and Bill for 20 or 30 minutes, and this interview is the result.
One of the cool things about my job is that I sometimes get to meet “important people”. I'd always really enjoyed Bruce Eckel's C++ books (I learned C++ from some of his early C++ titles), and it was a distinct pleasure to be able to meet him and express my appreciation.
Feel free to comment on the interview in this entry, or on the comment page for the interview. Special thanks if you find a place where I've said something stupid, so I can correct it. (No, I didn't use the word “Indite”...)
[Update: In a comment, Tom asks how Java requiring a JVM on the other end is any different than .NET requiring .NET on the other end.
My point - which didn't really come through very well - was that presuming what you have on the other side is <X> is a constraint. The way around this is to use Web Services, which are platform agnostic. In that sort of world, the ability to ship code around the network doesn't really gain you anything, because you can't do it on all platforms.]
Up and down. Two words that we generally think of as opposites, but in actual use they are quite different.
This became apparent yesterday, when I was in a meeting and the organizers said, “Let's wait a few minutes for others to turn up”, and I realized that “turn up” is not the opposite of “turn down”. The latter means “to reject or refuse”, and the former refers to the biennial flowering plant Brassica rapa var. rapifera.
We came up with a number of other examples of this phenomenon. Here's the full list
I was reading the manual for my iRiver a few days ago. There are lots of details and acronyms in the digital music world, and it's sometimes hard to figure things out - especially if you aren't very technical. For example, the player will display an icon telling you what format of file you're listening to, but not everybody knows what an OGG format is.
The folks at iRiver have taken that into consideration, and they've included this handy graphic:
Now that's helpful.
I was having a discussion with Joe a few days ago, and he asked me what he should do when people make comments that ask questions.
My current approach is to look at the question and try to decide whether I should do a new posting on the topic, or whether I should just add a comment, but I realized that I don't know whether most people go back to the comments to look for answers or not.
If you have an opinion on what you think works well, feel free to comment about comments in the comments section of this post on commenting on comments.
If you walk into some offices at Microsoft, you'll find these strange 2" black cubes on some people's desks. These cubes show up when you complete a patent application and it is filed. It's a nice reward for a process that takes a fair bit of work on your part, and there is also a monetary award.
Sometime later (and by later, I mean *years* later), the patent application may be granted, and if it is, you get a nice little plaque with a summary of your patent on it. It also has a diagram that details how a computer system works, which is boilerplate that goes into all of our patents (and those of other computer companies, I suspect). So, there's a processing unit with a system bus, a hard disk drive interface, etc. It's not enough to just say “this patent is related to a computer”, you apparently have to describe a computer.
So, I've been awarded patent # 6,xxx,xxx for “Method and Apparatus xxxx xxxx x x xx xxxx x xxxxx“.
I thought about giving you the real number, but I've decided not to, because you probably don't want to read patents: one of the idiosyncrasies of patent law is that damages are trebled if you knowingly violate a patent, so it's better to be ignorant.
And no, I'm not going to talk about the advisability of software patents in general. I do have an opinion on that, but I'd rather not debate that here.