Why should I encrypt my database?

Breaking down the advantages and disadvantages of common approaches to protecting your users' sensitive data against breaches, from plaintext defaults to end-to-end encryption

Approaches to protecting user data

Plaintext (authentication only)

Most end-user data stored in a database or in files is stored completely as-is, with the only protection coming from controlling access to the database or filesystem. If an attacker gains access to the system, that single line of defense has been defeated and any sensitive user data stored on the system can be leaked. This is the lowest-cost approach, but it is vulnerable to a wide range of attacks and is strongly recommended against by the Open Web Application Security Project [1].

This is the default approach taken by the major standalone database providers (MySQL, PostgreSQL, MongoDB, SQL Server, etc.) since it requires no additional configuration, but it is also the riskiest. All it takes is a single weak point to expose all of your users’ data to an attacker, including:

  • Software vulnerability or misconfiguration (including in any of your project’s dependencies)
  • Incorrect access control policy
  • Weak or reused employee password
  • Social engineering attack
  • Disgruntled employee

If you’re unsure what database encryption method you’re currently using, it’s almost certainly this one.

At-rest encryption

Current industry-standard practice is to encrypt data at rest, for instance using a Key Management Service (KMS) provided by Amazon Web Services or Google Cloud Platform. The entire database is encrypted under a single master key, so the data cannot be read by anyone who accesses the storage without also holding that key. This is a definite improvement over storing data in plaintext, but both application code and system administrators need the key for normal daily tasks - as a result, the key is often kept effectively in plaintext, behind the same access controls as the data itself. An attacker who compromises your system can therefore usually obtain the master key and, with it, all of the sensitive user data it protects.
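
To make this concrete, here is a minimal sketch of the envelope-encryption pattern a KMS enables, assuming AWS KMS via boto3 and the Python `cryptography` package; the key alias, record layout, and function names are illustrative placeholders rather than a prescribed setup.

```python
# Hypothetical envelope-encryption sketch: a KMS-managed master key wraps a
# per-record data key; the key alias and record layout are placeholders.
import os

import boto3
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

kms = boto3.client("kms")

def encrypt_record(plaintext: bytes) -> dict:
    # Ask KMS for a fresh data key; the master key never leaves KMS, but the
    # plaintext data key is handed to this process so it can encrypt.
    resp = kms.generate_data_key(KeyId="alias/app-master-key", KeySpec="AES_256")
    nonce = os.urandom(12)
    ciphertext = AESGCM(resp["Plaintext"]).encrypt(nonce, plaintext, None)
    # Store the wrapped (KMS-encrypted) data key alongside the ciphertext.
    return {"ciphertext": ciphertext, "nonce": nonce, "wrapped_key": resp["CiphertextBlob"]}

def decrypt_record(record: dict) -> bytes:
    # Any process or person permitted to call kms.decrypt can recover the data.
    data_key = kms.decrypt(CiphertextBlob=record["wrapped_key"])["Plaintext"]
    return AESGCM(data_key).decrypt(record["nonce"], record["ciphertext"], None)
```

Notice that anything allowed to call the KMS decrypt API can read every record, which is why a compromised host or an over-broad access policy reduces this scheme back toward plaintext-level protection.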

Hence, data under an at-rest encryption scheme is effectively still stored in plaintext with only authentication, since the key must remain accessible for normal day-to-day operations. This leaves a significant attack surface for a full-scale data breach, and there are many instances of large breaches happening despite the use of at-rest encryption, including:

  • A 2019 Capital One data breach that cost the company a $190 million settlement, along with other associated remediation and reputation costs [2]
    • This attack was possible because Capital One had configured an unrelated portion of their infrastructure with permissions to access the at-rest encryption key, showing how even a slight misconfiguration by professionals reduces at-rest encryption back to plaintext-level security
  • A 2021 Azure vulnerability revealed by security researchers that enabled unrestricted access to thousands of companies’ databases (including many Fortune 500 companies) [3]
    • This attack is perhaps even scarier than the Capital One breach above: even companies that did everything correctly had their configuration changed out from under them by their cloud provider (which manages the keys), resulting in their keys being exposed

On the whole, at-rest encryption is an improvement over plaintext, but has some major shortcomings since the keys are (a) accessible at all times to systems and administrators and (b) often managed by third parties that may change configuration out from under you.

End-to-end encryption

An end-to-end encrypted system encrypts each user’s data with their own unique key (typically derived from their password), such that the company cannot access that data at all unless the user provides their key (or password). From a privacy standpoint, this is the strongest way to protect your users’ data, because only the user controls access to their own data. Major breaches of large numbers of users’ records (and the associated costs and liability) are effectively no longer feasible - even if an attacker compromises your system, your database contains only data encrypted with keys you don’t hold.

Stricter subsets of end-to-end encryption can also be used to enable zero-knowledge architectures, where your systems are never given the key and thus can never hold the user’s sensitive data in readable form. An example of this is client-side encryption, where the user’s encrypted data is sent to their device and decrypted locally (instead of the user sending you the key so you can decrypt it on your system). This method is extremely popular for higher-security applications such as cryptocurrency wallets, secure cloud storage, and encrypted messaging, as a breach of your servers no longer exposes usable data.
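
As a rough illustration, here is a minimal sketch of client-side, per-user encryption with a password-derived key, again assuming the Python `cryptography` package; the KDF parameters and field names are illustrative, not a recommendation.

```python
# Hypothetical client-side encryption sketch: the key is derived from the
# user's password on their own device and never sent to the server.
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM
from cryptography.hazmat.primitives.kdf.scrypt import Scrypt

def derive_user_key(password: str, salt: bytes) -> bytes:
    # Memory-hard KDF so offline brute-forcing of the password is expensive.
    return Scrypt(salt=salt, length=32, n=2**15, r=8, p=1).derive(password.encode())

def encrypt_for_user(password: str, plaintext: bytes) -> dict:
    salt, nonce = os.urandom(16), os.urandom(12)
    key = derive_user_key(password, salt)
    ciphertext = AESGCM(key).encrypt(nonce, plaintext, None)
    # Only the salt, nonce, and ciphertext ever leave the device.
    return {"salt": salt, "nonce": nonce, "ciphertext": ciphertext}

def decrypt_for_user(password: str, blob: dict) -> bytes:
    key = derive_user_key(password, blob["salt"])
    return AESGCM(key).decrypt(blob["nonce"], blob["ciphertext"], None)
```

The server stores nothing but salts, nonces, and ciphertext, so a breach yields nothing usable - which is also exactly why a forgotten password means the data is gone.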

So why don’t more people use end-to-end encryption?

Users lose their passwords. This isn’t a big deal with plaintext or at-rest-encrypted systems, since the company has access to the data and can simply reset the password and restore access. With end-to-end encryption, however, the user’s key is the only way to access the data; if the user loses it, the data is irretrievably lost.

For these reasons, many end-to-end encryption implementations ask users to print or store a recovery code, memorize a seed phrase, or answer a set of security questions that can serve as a secondary key to their account. But what are the chances a paper code is still around, potentially years later, when the user forgets their password? How many users keep their codes on their computer or phone, risking losing them if the device is lost or damaged? The reality is that users regularly lose access to these backup methods, and in turn risk losing assets like cryptocurrency or valuable files stored encrypted in the cloud.
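
For illustration, here is a minimal sketch of that recovery pattern, assuming the same Python `cryptography` package as above: one random data key encrypts the user’s data, and it is wrapped separately under a password-derived key and a recovery-code-derived key, so either secret can unlock it. All names and parameters here are hypothetical.

```python
# Hypothetical recovery-code sketch: the data key is wrapped under two
# independently derived keys; losing both secrets loses the data for good.
import os
import secrets

from cryptography.hazmat.primitives.ciphers.aead import AESGCM
from cryptography.hazmat.primitives.kdf.scrypt import Scrypt

def _kek(secret: str, salt: bytes) -> bytes:
    # Key-encryption key derived from a human-held secret.
    return Scrypt(salt=salt, length=32, n=2**15, r=8, p=1).derive(secret.encode())

def provision_account(password: str) -> tuple[str, dict]:
    data_key = AESGCM.generate_key(bit_length=256)
    recovery_code = secrets.token_hex(16)  # shown to the user exactly once
    wrapped = {}
    for name, secret in (("password", password), ("recovery", recovery_code)):
        salt, nonce = os.urandom(16), os.urandom(12)
        wrapped[name] = {
            "salt": salt,
            "nonce": nonce,
            "wrapped_key": AESGCM(_kek(secret, salt)).encrypt(nonce, data_key, None),
        }
    return recovery_code, wrapped

def unwrap_data_key(secret: str, entry: dict) -> bytes:
    # Works with either the password or the recovery code and its matching entry.
    key = _kek(secret, entry["salt"])
    return AESGCM(key).decrypt(entry["nonce"], entry["wrapped_key"], None)
```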

Moreover, these backup methods represent a large security risk - if an attacker gets their hands on the user’s recovery code or manages to guess their security questions (the answers to most of which are public record or easily guessed, leading many to recommend abandoning them entirely [4]), they get full access to that user’s data. All of these methods amount to asking a user to remember or store a second password (one that they use extremely rarely), and they come with the same risks.

Standard end-to-end encryption also requires significant work to implement correctly, making it the more expensive option and often putting it out of reach for small companies and solo developers.

Where Bunkyr comes in

None of the approaches above is ideal - each has its strengths and weaknesses, with end-to-end encryption at one end offering the best security and plaintext at the other offering the lowest implementation effort and user-experience cost. But what if there were a way to get the best of both worlds - a system with the security of end-to-end encryption that also preserves user- and developer-friendliness?

Bunkyr solves the problem of users losing their backup methods by securely generating an encryption key from something the user already uses regularly, such as their Google or Facebook account. If a user ever forgets their primary key, Bunkyr can provide you with a backup key that unlocks their encrypted data and lets them set a new password without risking data loss. Bunkyr also provides SDKs and other engineering support to make implementing end-to-end encryption as frictionless as possible.

If you’re interested in learning more about how our system works (and how we protect the keys we generate), please take a look at our more detailed “how it works” page. You can also reach out directly to our engineering team for any questions about our product or to get started with the integration process.

Bunkyr gives you the tools you need to stop worrying about data breaches and get back to building your product.