For those who read my previous posts, the question in the title may be startling. I want to reassure you from the start: this post is not about encryption being a useless technique; it is about encryption not being a solution for certain problems, and definitely not a general solution for every problem. Also, this post is not product-specific - I am discussing the technique of encryption in general, not the way encryption is implemented in one particular product. The limitations I will discuss are properties of current standard encryption techniques, not limitations of specific product implementations.
Over the last year, I have seen too many attempts to apply encryption to problems where encryption does not really solve anything. So, in this post, I would like to explain why encryption is not a magical band-aid that only needs to be applied to data to have it secured. To understand this, we have to understand a few basic facts about encryption and about what it achieves. There are not many such facts, but they are extremely important.
No part of this post requires any deep cryptographic knowledge - the only thing you need to know about encryption is that it transforms data into a form that hides its meaning, in a way that cannot be reversed unless a piece of information referred to as the decryption key is known. The encryption operation requires the use of another piece of information referred to as the encryption key. There are encryption algorithms where the encryption key and the decryption key are the same - these are called symmetric algorithms. There are also algorithms in which these are different - these are called asymmetric algorithms.
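To make the symmetric/asymmetric distinction concrete, here is a minimal sketch in Python. Everything in it is a toy chosen for illustration: `toy_xor` is a hypothetical hash-based stream cipher I made up for this post, and the asymmetric half is textbook RSA with tiny primes. Neither is suitable for protecting real data (see Fact 4 below); the only point is to show one key being used in both directions versus two different keys.

```python
import hashlib


def toy_xor(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR the data with a SHA-256-based keystream.

    Illustration only; this is not a vetted algorithm.
    """
    out = bytearray()
    for block_no, offset in enumerate(range(0, len(data), 32)):
        pad = hashlib.sha256(key + block_no.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


# Symmetric: the same key encrypts and decrypts.
shared_key = b"hypothetical shared key"
message = b"meet me at noon"
ciphertext = toy_xor(shared_key, message)
recovered = toy_xor(shared_key, ciphertext)

# Asymmetric: textbook RSA with tiny primes (far too small for real use).
p, q = 61, 53
n = p * q                    # modulus, part of both keys
phi = (p - 1) * (q - 1)
e = 17                       # encryption exponent (can be made public)
d = pow(e, -1, phi)          # decryption exponent (must stay secret); Python 3.8+


def rsa_encrypt(m: int) -> int:
    """Encrypt a small integer with the encryption key (e, n)."""
    return pow(m, e, n)


def rsa_decrypt(c: int) -> int:
    """Decrypt with the decryption key (d, n)."""
    return pow(c, d, n)
```

In the symmetric half, whoever holds `shared_key` can both encrypt and decrypt. In the asymmetric half, knowing `e` and `n` is enough to encrypt, but decrypting requires `d`.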
Fact 1: Encryption does not eliminate the need to protect some data
A small detail, but most important, really. If we want to protect some data, encryption does not fully help - it just reduces the amount of data we need to protect. Instead of having to protect a large amount of data, we will just have to protect a small piece - the decryption key. This changes the difficulty of a problem: for example, if you want to keep secret 100 pages of information, one choice is to memorize them and then burn them, but few people can memorize so much information, so a more practical solution using encryption would be to encrypt those pages using a password and then to commit that password to memory. The bottom line is that you still have some information to protect, which, if discovered, would compromise your encrypted data.
So, encrypted data is only as secure as you can secure the key that decrypts it.
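The 100-pages example can be sketched as follows. This is an illustration under assumptions, not a recommended design: the password, the iteration count, and the `toy_xor` cipher are all hypothetical choices of mine, and the cipher in particular is a toy. The one real building block is `hashlib.pbkdf2_hmac` from the Python standard library, which turns a memorized password into a key.

```python
import hashlib
import os


def derive_key(password: str, salt: bytes) -> bytes:
    """Stretch a memorized password into a 32-byte key (PBKDF2-HMAC-SHA256)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


def toy_xor(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher (illustration only): XOR with a hash-based keystream."""
    out = bytearray()
    for block_no, offset in enumerate(range(0, len(data), 32)):
        pad = hashlib.sha256(key + block_no.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


# The large secret we do not want to memorize.
pages = b"one hundred pages of secrets. " * 1000

# The salt is not secret; it is stored next to the ciphertext.
salt = os.urandom(16)
ciphertext = toy_xor(derive_key("correct horse battery staple", salt), pages)

# Only the password has to be protected now.
right = toy_xor(derive_key("correct horse battery staple", salt), ciphertext)
wrong = toy_xor(derive_key("a wrong guess", salt), ciphertext)
```

The secret shrank from 30,000 bytes to one password, but it did not disappear: anyone who learns the password recovers `pages` exactly as `right` does.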
But how can we secure the decryption key? Encrypting it with another key only results in the same problem. So, eventually, we will need to protect a key with one or more of the following methods used to control access to data:
(1) - something that you know (for example: a password or a PIN)
(2) - something that you have (for example: a magnetic card)
(3) - something that only you are supposed to be able to produce (biometrics, for example: fingerprint, iris scan, voice recognition)
Debit cards, for example, are a combination of (1) and (2). (3) is tricky, because, for example, the verification method for voice recognition should be able to distinguish between the real voice and a recording. Credit cards, when the signature is verified or when they have a photograph attached, attempt to do a combination of (2) and (3).
Fact 2: Encrypted data is unusable until it is decrypted
This is pretty obvious, right? But this is often ignored and it makes a lot of difference in some scenarios I will discuss later here. Encryption is designed to hide the meaning of the data, which means we cannot make much sense out of it in encrypted form, so it is no longer as usable as it used to be when it was unprotected. If encryption did not achieve this, it would not be very useful for protecting the data.
Fact 3: Decryption cannot happen without having access to the decryption key
Again, a rather obvious fact. I am almost embarrassed to have to mention it, as it is what we are relying on when we use encryption: we want to make sure the process cannot be reversed without knowing the decryption key. But there is a useful takeoff from this obvious fact, which is overlooked more often than not.
Fact 4: Designing an encryption algorithm is not easy
Any decent cryptography book will tell us right away that designing a good encryption algorithm is not something anyone can do. Unless you have devoted years of your life with successful results to the science of cryptography, you should use standard algorithms validated by the scientific community, instead of designing and implementing your own.
Fact 5: There is nothing magic about encryption
If you can read this post to the end and it makes sense to you, then you will probably agree about this fact. If not, then I am not sure what I can suggest other than to post your questions in the comment section.
What are some key learnings from these facts?
Takeoff 1: Decryption key protection is essential for safeguarding the encrypted data
This derives from Fact 1. It means that if we cannot secure the decryption key, there was not much point in encrypting the data at all. Whoever can get both the key and the encrypted data can get access to the decrypted data.
Takeoff 2: A system that does any meaningful querying of encrypted data needs to have access to the decryption key
A system cannot use the encrypted data without decrypting it first (as per Fact 2) and it cannot do that without access to the decryption key (as per Fact 3), hence access to the key is required.
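Here is a small sketch of what this takeoff means for a query. The "table", the key, and the `toy_xor` cipher are hypothetical and exist only for illustration (the cipher is a toy that even reuses one keystream for every row). What matters is the shape of `search`: it cannot filter anything until it has decrypted each row, so it has to be handed the key.

```python
import hashlib


def toy_xor(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher (illustration only): XOR with a hash-based keystream."""
    out = bytearray()
    for block_no, offset in enumerate(range(0, len(data), 32)):
        pad = hashlib.sha256(key + block_no.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


def search(key: bytes, encrypted_rows: list, needle: str) -> list:
    """Return the decrypted rows containing `needle`.

    Every row is decrypted first; without the right key nothing matches.
    """
    hits = []
    for blob in encrypted_rows:
        plain = toy_xor(key, blob).decode("utf-8", errors="replace")
        if needle in plain:
            hits.append(plain)
    return hits


key = b"hypothetical database key"
rows = ["alice,accounting", "bob,engineering", "carol,accounting"]
encrypted_rows = [toy_xor(key, row.encode("utf-8")) for row in rows]

with_key = search(key, encrypted_rows, "accounting")
without_key = search(b"some other key", encrypted_rows, "accounting")
```

With the right key the search returns the two accounting rows; with any other key it finds nothing, because the rows never turn back into meaningful text.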
So, encryption will help against someone that can only get access to the encrypted data. This is a valid use scenario for encryption. For example, it will protect against someone that steals an encrypted database, if the decryption key was not stolen as well. But let us look at a few other database related scenarios that come up pretty often and for which encryption is of no benefit:
Scenario 1: DRM
DRM is mainly used in relation to media protection, such as music files, but in this case, I use this acronym for the following kind of database scenario:
"I would like to package my database application in a form that would allow a customer to use it, but without him ever being able to access the actual data stored in it. I think encrypting the database should help".
Unfortunately, encryption will not help. The reason it will not help is that the customer still needs to be able to use the database, so, according to Fact 2 and Takeoff 2, the decryption key needs to be present somewhere on the customer's machine. At this point, the protection breaks, because there is no place on a machine where software can store some information in such a way that the machine owner cannot get to it. Once the key is found, the database can be decrypted and the protection is broken.
Encryption in this case would act as obfuscation. This is the case with the old SQL Server CREATE PROCEDURE ... WITH ENCRYPTION feature. This feature has been hammered hard because it attempted to do something impossible. Some people suggested improvements related to using a better encryption algorithm. This was not the problem - this feature uses RC4 and no one broke it by breaking the encryption algorithm - attackers just figured out where the system gets the key from. Any other solution for this problem would just change the algorithm a bit, but the issues would remain the same.
To be more specific, the difference between obfuscation and encryption is the following: with obfuscation, the strength lies in the algorithm itself, and disclosing it would allow someone to easily break it; with encryption, the strength lies in the decryption key, not in the algorithm - even knowing the algorithm would not be sufficient to break the encryption. Using encryption and obfuscating the decryption key makes the whole scheme equivalent to obfuscation.
Obfuscation is not completely useless. It can serve a purpose if the cost of breaking it is higher than the benefit of breaking it. However, a successful commercial obfuscation method will most certainly be broken and the chance will increase with the degree of commercial success of the method, because the more data it protects, the more benefit can be derived from breaking it. It really is a matter of economics, more than security.
Scenario 2: Separation of duties from machine administrator
"I think encrypting the data in the database will prevent a machine administrator from being able to read it".
Unfortunately, encryption will not help with this scenario either. The problem is similar to the one in the previous scenario: the decryption key has to be available to the database system, which needs it to be able to process the data, but this also makes it available to the machine administrator, who has unlimited access to the system.
Unfortunately, PCs were not designed with separation of duties in mind. The PC acronym stands for personal computer - this machine was originally meant to be a personal device; it is only due to the amazing advancements in hardware over the last two decades that this originally humble machine became capable of handling increasingly complicated tasks, which require a different level of security. The result is that the PC does not permit information to be processed by software without the administrator of the machine being able to see it. Trying to protect information from the eyes of the machine administrator is similar to attempting to hide your valuables from your landlord - you can sue him after he steals them, but you cannot prevent him from doing so.
To be clear, it would be extremely expensive to build a machine and the software around it that could provide an absolute guarantee about separation of duties. And using the system would probably be fairly inconvenient. So, it should not come as a surprise that a PC does not provide such a guarantee.
Scenario 3: Protections against a hacker attack on an online database
"If an attacker hacks into my database, he'll get my sensitive information, but if I encrypt it, it will be useless to him".
Well, depending on how the hacker gets in and how the data is used, this may or may not be true. In most practical situations, unfortunately, I expect that this will not hold true.
To take a recent data theft as an example, an attacker was able to use a SQL injection attack to steal SSNs from a database system. The investigation of this attack appeared to deplore the fact that this information was stored in the clear. But would it have really helped if it had been encrypted? The database system was probably using that information and was serving it in decrypted form to the authorized users that requested it; if the injection was made under an account that was authorized to see the data in decrypted form, the attacker would have been able to read that data as easily as if it were not encrypted.
Too many "if"s? Maybe, but what they mean is that there is no guarantee that the encrypted data is safe from an attacker that gets access to the live system. Especially if encryption was done "transparently", for usability purposes, then there would be no barrier against an attacker other than the regular authorization system - such encryption would not protect at all against a SQL injection attack.
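A minimal sketch of the "transparent" case, using Python's built-in `sqlite3` module. The table, the key, the fake SSNs, and the `lookup` function are all hypothetical, `toy_xor` is a toy cipher for illustration only, and this is not a reconstruction of the incident mentioned above. It assumes an application that decrypts the column on behalf of whoever is running the query and that builds its SQL by string concatenation.

```python
import hashlib
import sqlite3

KEY = b"hypothetical server-side key"


def toy_xor(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher (illustration only): XOR with a hash-based keystream."""
    out = bytearray()
    for block_no, offset in enumerate(range(0, len(data), 32)):
        pad = hashlib.sha256(key + block_no.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, ssn BLOB)")
for name, ssn in [("alice", "000-00-0001"), ("bob", "000-00-0002")]:
    # The SSN column is stored encrypted.
    conn.execute("INSERT INTO people VALUES (?, ?)",
                 (name, toy_xor(KEY, ssn.encode("utf-8"))))


def lookup(name: str) -> list:
    """Deliberately vulnerable: the query is built by string concatenation,
    and results are decrypted transparently for the caller."""
    query = "SELECT name, ssn FROM people WHERE name = '" + name + "'"
    return [(n, toy_xor(KEY, blob).decode("utf-8"))
            for n, blob in conn.execute(query)]


honest = lookup("alice")
injected = lookup("x' OR '1'='1")
```

The injected input turns the `WHERE` clause into one that matches every row, and the application then decrypts every SSN for the attacker, just as it would for a legitimate user. The encryption on disk never comes into play.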
To conclude this rather long article, encryption is not a "silver bullet" of security. Having data encrypted does not necessarily mean that it also becomes automatically secured. Unfortunately, encrypted data still has to be decrypted before being used, and at that moment it becomes vulnerable. We can make some strong security guarantees based on the use of encryption in some scenarios, but we cannot use encryption in an arbitrary scenario and claim security just for doing that.
If you want to learn more about encryption than what I covered here (I could not cover very much), then I recommend the books written by Bruce Schneier. He has been writing for years about the dangers of using cryptography (or other various procedures) without understanding security and, while I wrote this post in my own words, I cannot shake the feeling that I am repeating some of the ideas that Bruce keeps mentioning in his books - maybe for the same reason he keeps mentioning them, because some security mistakes are constantly being repeated too.