I have been watching the SQL Server Security forum for several years now, and there is one question that comes up about once a month under different titles. It invariably begins with a request for guidance on how to secure access to a database, which sounds like a reasonable security inquiry, but after a while it becomes clear that the real subject is copy protection for some application's database. All right, but why does it matter? Well, it matters because these are very different topics. (Note that I am using copy protection as an umbrella term - these days the fashionable term is DRM, which stands for digital rights management and covers more than the simple operation of copying. But I'm trying to keep things simple.)

Let's see what security and copy protection are about. Security is about controlling access to data - a person is either allowed to access the data or not. Security may restrict the way in which access can be gained, but it cannot control what the person will do with the information once he has access to it - that person may give it away, sell it, or store it for his personal use. This is where copy protection comes in and attempts to establish some control over the use of the data. The goal of copy protection is to restrict what the user can do with some information once he already has it. To be sure, the goal of copy protection may pertain to security, but the problem is that there are no viable techniques for achieving it - the techniques used so far are simply not capable of providing the guarantees that one associates with security techniques.

But why is copy protection a hard problem? The difficulty of achieving copy protection should become immediately apparent with the following simple example: let's say the data I want to protect is a confidential telephone number. There is no way one can possibly copy-protect that, because no matter what access restrictions one sets around the record storing the phone number, the people who have access to it can do whatever they want with it. There is no technology that we can use to enforce the restrictions we would like placed on that telephone number. Some solutions may come to mind, but they are either extreme, or expensive, or both: permanent surveillance of the persons who came into contact with the phone number (Big Brother method), restricting the contact that those persons have with any unauthorized person (imprisonment technique), or inflicting capital punishment on those persons (medieval approach) - so, basically, there are no solutions that we can implement in IT. What remains is recourse to the legal system - we can launch an investigation into the leak of the telephone number and we can hope to catch the culprit and have him sentenced - this may not help that much after the fact, but it might be a strong enough deterrent.

At this point, someone may argue that the example I gave above is simplistic - that we may not care about the loss of a single phone number but about the loss of 1 billion phone numbers that cannot possibly be memorized by a person, or that the information that we want to protect is hard to memorize - maybe it is an image or an audio file. Well, it doesn't matter. What that simplistic example shows is that there is a hole that cannot be covered - the fact that the user gets access to the information whose use we are supposed to restrict. Having more information to protect may make the user's task of copying it more difficult if he only has access to a pen and a bit of paper, but when the user has a computer in his hands - a tool designed to relieve man from tedious activities such as computing or memorizing lots of stuff - then there is really no additional difficulty raised.

But what about all those copy protection schemes that are in use today? How do they work - or do they work? They are there, indeed, but what all of them do is obfuscate the data with a secret algorithm hard-coded in software; that software knows how to provide the data to the user in a limited set of scenarios and refuses to do so in any other. This sounds hopeful until you realize that there is no way to really protect the algorithm from a curious user: the computer has to execute it, and whatever the computer executes, the user can read. It's his machine after all, and he only needs to invest some time (a few months) in reading a book that teaches the assembly language of his computer - he just needs to be literate and fairly intelligent. This problem of being able to discover the obfuscation algorithm may not seem a high risk to some, but the situation is even worse: you only need one dedicated person to figure the algorithm out and to write a program that reverses the obfuscation, and then he can distribute that software for the use of all the other people who want unrestricted access to the data - and this time they only need to know how to download a piece of software from the Internet.
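To make this concrete, here is a minimal sketch in Python of the kind of scheme I am describing. Everything in it is made up for illustration: the key, the function names, and the choice of a repeating XOR as the "secret algorithm" are my assumptions, not a description of any real product. Real schemes are more elaborate, but the structure is the same: the secret ships inside the program.

```python
from itertools import cycle

# Hypothetical secret baked into the shipped program. Anyone who can read
# the binary (or, here, the source) can read this value as well.
EMBEDDED_KEY = b"not-really-secret"


def obfuscate(data: bytes, key: bytes = EMBEDDED_KEY) -> bytes:
    """XOR each byte with the repeating key; applying it twice restores the input."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


def licensed_player(blob: bytes, user_is_licensed: bool) -> bytes:
    """The vendor's program: it reveals the data only in the allowed scenario."""
    if not user_is_licensed:
        raise PermissionError("not licensed")
    return obfuscate(blob)


def cracked_tool(blob: bytes) -> bytes:
    """What one dedicated person writes after reading the vendor's code:
    the same transformation with the license check left out."""
    return obfuscate(blob, EMBEDDED_KEY)


protected = obfuscate(b"confidential record")
print(cracked_tool(protected))  # b'confidential record'
```

Note that cracked_tool needs nothing that the vendor did not already hand to the user. Once it exists, it can be copied as easily as the data it unlocks.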

If after reading this one still has a shadow of a doubt, it must be related to the use of encryption. Encryption is supposed to be the panacea for any security problem: you want data protection - encrypt it; you want double protection - encrypt it twice. The truth is that encryption will only help if properly used, and in the case of copy protection its use degenerates into obfuscation of the kind I have already discussed in this post. Some copy protection schemes use encryption to some extent, but securing the encryption key is where designers have to rely on obfuscation, and that is what the attackers will go after. In most cases, though, the implementation of the encryption algorithm or of the entire application is flawed in such a way that the data can be accessed without even worrying about getting the encryption keys.

The difference between security and copy protection can be likened to the difference between encryption and obfuscation. With encryption (and other security schemes), the data has some strong guarantee of being secure even if the algorithm is made public, as long as the encryption key is kept secret; in fact, all successful commercial algorithms have had their details made available to researchers before and after being adopted. With obfuscation, the data is safe as long as the algorithm is kept secret, but as soon as the algorithm is discovered, the entire scheme fails - and there is no way to keep the algorithm secret against a smart attacker who has access to the code that executes the algorithm and to the machine on which the algorithm executes.

Another way of illustrating the difference is via the following generic scenarios. In a security scenario, Alice wants to grant Bob access to some data and does not want Charles to have access to that data. In a copy protection scenario, Alice needs to grant Bob access to some data but wants to prevent him from communicating that data to Charles. Same characters, but very different story.
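The two scenarios can be sketched in a few lines of Python. The access list, the stored item, and the function names below are hypothetical and exist only for this illustration. The point is where the system's control ends: it can refuse Charles, but once it has handed the data to Bob it has no say in what happens next.

```python
# Hypothetical access-control list maintained by Alice.
ACL = {"report": {"alice", "bob"}}
STORE = {"report": "Q3 numbers"}


def read(user: str, item: str) -> str:
    """Security: the system decides who may read the item."""
    if user not in ACL.get(item, set()):
        raise PermissionError(f"{user} may not read {item}")
    return STORE[item]


def charles_asks_system() -> str:
    """Security scenario: Charles is not on the list, so this raises."""
    return read("charles", "report")


def bob_forwards_to_charles() -> str:
    """Copy protection scenario: Bob's read is legitimate, and the value he
    returns is an ordinary copy that the system can no longer track."""
    copy_for_charles = read("bob", "report")
    return copy_for_charles
```

The first function is stopped by an access check. The second function never does anything that an access check could object to.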

Final comparison - copy protection is about preventing the dissemination of some information, while security is about preventing access to it in the first place.

By this time, it should be clear that copy protection schemes do not offer any security, because they cannot provide any reasonable guarantees around their policy enforcement. But are copy protection schemes really useless? From a security point of view, my answer is yes. From a practical point of view, my answer is that a copy protection scheme may serve some purpose if the product using it has limited use and its users are either not interested in breaking the scheme, or not capable of breaking it, or cannot afford to hire someone who can. There is some inherent irony here: the more successful a product becomes, the less effective its copy protection scheme will be, as it will attract more attention.

The best bet is to work around the need for a copy protection scheme. One way to do this is to make the software application reliant on an online service that requires some sort of validation from its company's servers. For example, an online RPG requires the use of game servers, and even if the credentials are shared, only one person can play at any time. Another way would be to have the software run only on dedicated devices, whose use is limited - see video game consoles - there is still the risk of copying the media, but if the media used is expensive to produce (game cartridges), then this risk is alleviated somewhat.
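The online service approach can be sketched as follows. This is a toy model under my own assumptions: the GameServer class, its methods, and the one-session-per-account rule are hypothetical, and passwords are kept in plain text only to keep the example short. What it shows is that the valuable part stays on the company's servers, so sharing credentials does not multiply the number of people who can use the product at the same time.

```python
import secrets


class GameServer:
    """Hypothetical server that allows one live session per account."""

    def __init__(self, accounts: dict[str, str]) -> None:
        self._accounts = accounts          # account name -> password (toy storage)
        self._active: dict[str, str] = {}  # account name -> current session token

    def login(self, account: str, password: str) -> str:
        stored = self._accounts.get(account)
        if stored is None or not secrets.compare_digest(
            stored.encode(), password.encode()
        ):
            raise PermissionError("bad credentials")
        token = secrets.token_hex(16)
        self._active[account] = token  # a new login replaces the previous session
        return token

    def play(self, account: str, token: str) -> bool:
        """Only the holder of the most recent session token may play."""
        return self._active.get(account) == token


server = GameServer({"bob": "hunter2"})
bob_token = server.login("bob", "hunter2")
charles_token = server.login("bob", "hunter2")  # Charles uses Bob's shared password
print(server.play("bob", bob_token))      # False: Bob's session was replaced
print(server.play("bob", charles_token))  # True
```

Nothing here stops Bob from giving his password away. It only ensures that doing so costs him his own seat, which is an enforcement the server can actually carry out.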