Encryption in Varasto
Summary: configurable security¶
You can configure the "security dial" of Varasto between maximum convenience and maximum security - even on a per-directory-tree basis.
(The data on disk is always encrypted with strong encryption, but how the data encryption keys are accessible is configurable for convenience.)
Most casual users don't need to understand these security mechanisms at all. Varasto configures easy-to-use defaults that stay out of your way, and the security will still be comparable to full-disk encryption solutions.
If you are interested in improving the security beyond baseline, read on!
Glossary¶
Tip
Locating these terms in the diagram below will make understanding easier!
Term | Meaning |
---|---|
DEK (Data Encryption Key) | Encryption key used to encrypt actual files |
KEK (Key Encryption Key) | Encryption key used to encrypt other encryption keys - DEKs in our case |
HSM (Hardware Security Module) | Varasto can optionally use an HSM to securely store KEKs in a way that the KEK cannot be stolen |
Key envelope | DEK is encrypted/wrapped inside an envelope in a way that only the KEK can decrypt the DEK. I.e. KEK (and by extension, optionally the HSM) controls access to data. |
You can read more about the above concepts in the Google KMS documentation. (Google has nice docs on this, but Varasto is not Google-specific.)
Overview¶
Each collection in Varasto has a different DEK, so compromise of one collection does not compromise other collections. The DEKs are stored in Varasto's database.
If an attacker steals your encrypted files and also steals (or otherwise gains access to) Varasto's database, she could take all the DEKs and decrypt all your data.
To protect against this, we encrypt each DEK with a KEK inside a "key envelope" - one envelope for each of your KEKs. This way a given DEK can be decrypted with any one of your KEKs.
This way even if we have millions of collections with millions of DEKs, we can have all of them protected with about two "root" KEKs that can be stored in a high-security place with auditing and/or physical key press to grant access to just one collection at a time.
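The envelope idea can be sketched in Go as follows. This is a minimal illustration, not Varasto's actual code: the function names, the 2048-bit RSA key size and the choice of SHA-256 as the OAEP hash are assumptions made for the example. It wraps one DEK for two KEKs and shows that either KEK alone recovers it.

```go
package main

import (
	"bytes"
	"crypto/rand"
	"crypto/rsa"
	"crypto/sha256"
	"fmt"
)

// wrapDEK seals the DEK in one envelope per KEK public key (RSA-OAEP).
// The OAEP hash (SHA-256) is an assumption made for this sketch.
func wrapDEK(dek []byte, keks []*rsa.PublicKey) ([][]byte, error) {
	envelopes := make([][]byte, 0, len(keks))
	for _, kek := range keks {
		env, err := rsa.EncryptOAEP(sha256.New(), rand.Reader, kek, dek, nil)
		if err != nil {
			return nil, err
		}
		envelopes = append(envelopes, env)
	}
	return envelopes, nil
}

// unwrapDEK opens a single envelope with the matching KEK private key.
func unwrapDEK(envelope []byte, kek *rsa.PrivateKey) ([]byte, error) {
	return rsa.DecryptOAEP(sha256.New(), rand.Reader, kek, envelope, nil)
}

// envelopeRoundTrip wraps a fresh DEK for two hypothetical KEKs and reports
// whether each KEK on its own recovers the same DEK.
func envelopeRoundTrip() (bool, error) {
	primary, err := rsa.GenerateKey(rand.Reader, 2048)
	if err != nil {
		return false, err
	}
	backup, err := rsa.GenerateKey(rand.Reader, 2048)
	if err != nil {
		return false, err
	}
	dek := make([]byte, 32) // 256-bit DEK
	if _, err := rand.Read(dek); err != nil {
		return false, err
	}
	envelopes, err := wrapDEK(dek, []*rsa.PublicKey{&primary.PublicKey, &backup.PublicKey})
	if err != nil {
		return false, err
	}
	fromPrimary, err := unwrapDEK(envelopes[0], primary)
	if err != nil {
		return false, err
	}
	fromBackup, err := unwrapDEK(envelopes[1], backup)
	if err != nil {
		return false, err
	}
	return bytes.Equal(fromPrimary, dek) && bytes.Equal(fromBackup, dek), nil
}

func main() {
	ok, err := envelopeRoundTrip()
	if err != nil {
		panic(err)
	}
	fmt.Println("either KEK recovers the DEK:", ok)
}
```

Only the public halves of the KEKs are needed to create envelopes, which is why the private halves can live somewhere more restricted.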
HSM or not?¶
For casual users we can keep the private portion of the KEK stored inside Varasto so you can fiddle with your files without being asked to do anything.
For more advanced users the private portion of KEKs can be stored in HSMs so that any data in Varasto can only be read by asking the HSM to decrypt the DEK.
You can also mix-and-match security levels by having in-Varasto KEK for your less private data and HSM-backed KEKs for your more private data.
Data security | KEK in HSM | Bulletproof auditing | Touch-to-decrypt | Convenience |
---|---|---|---|---|
Low | ☐ | ☐ | ☐ | High |
Medium | ☑️ | ☑️ | ☐ | Medium |
High | ☑️ | ☑️ | ☑️ | Low |
Bulletproof auditing = if your HSM is on a dedicated server with almost nothing but the HSM and the auditing running, it'd be pretty hard for an attacker to bypass the auditing.
You can even have touch-to-decrypt on your employees' machines: a YubiKey or similar generates a signature meaning "I approve requesting the remote-stored KEK to grant me access to these files", which is relayed to the HSM service to authorize the DEK decryption.
Why do I need at least two KEKs?¶
Why¶
Remember, a KEK unlocks all your DEKs. If you only have one KEK and lose it:
- => you lose the DEKs (since they're encrypted with the lost KEK)
- => you lose access to your files (since accessing the data requires the DEK)
HSMs can break or be stolen, so you can lose a KEK. Therefore we recommend that you always have at least one backup KEK so you won't lose your files.
You can unlock a given DEK with any one of your KEKs.
Backup KEKs¶
The backup KEK can be:
- another HSM if you need high availability or want zero (or minimal) downtime if your primary HSM fails
- or if you don't mind some downtime when an HSM fails, you can store your backup key offline on paper, on a flash drive etc.
What about backing up DEKs?¶
You don't need to worry about backing up DEKs, because they're stored in Varasto's database and that DB is covered by Varasto's DB backup mechanism.
Algorithms used¶
DEK¶
Your files are encrypted with a 256-bit DEK using AES in CTR mode. The IV is derived from the plaintext content hash, and it is never reused because Varasto's CAS nature means each unique content blob is stored only once.
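As a rough sketch of that scheme in Go (the function names are hypothetical, and this is not Varasto's actual implementation), the IV is taken as the first 128 bits of sha256(plaintext) and the blob is encrypted with AES-256 in CTR mode:

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/sha256"
	"fmt"
)

// encryptBlob encrypts plaintext with AES-256-CTR. The IV is the first
// 128 bits of sha256(plaintext); the full hash is returned alongside.
func encryptBlob(dek, plaintext []byte) ([]byte, [32]byte, error) {
	contentHash := sha256.Sum256(plaintext)
	block, err := aes.NewCipher(dek) // a 32-byte key selects AES-256
	if err != nil {
		return nil, contentHash, err
	}
	ciphertext := make([]byte, len(plaintext))
	cipher.NewCTR(block, contentHash[:aes.BlockSize]).XORKeyStream(ciphertext, plaintext)
	return ciphertext, contentHash, nil
}

// decryptBlob reverses encryptBlob. CTR mode uses the same keystream
// operation in both directions.
func decryptBlob(dek, iv, ciphertext []byte) ([]byte, error) {
	block, err := aes.NewCipher(dek)
	if err != nil {
		return nil, err
	}
	plaintext := make([]byte, len(ciphertext))
	cipher.NewCTR(block, iv).XORKeyStream(plaintext, ciphertext)
	return plaintext, nil
}

func main() {
	dek := make([]byte, 32) // all-zero demo key; real DEKs must be random
	plaintext := []byte("picture of a kitten")

	ciphertext, contentHash, err := encryptBlob(dek, plaintext)
	if err != nil {
		panic(err)
	}
	fmt.Printf("plaintext: %d bytes, ciphertext: %d bytes\n", len(plaintext), len(ciphertext))

	decrypted, err := decryptBlob(dek, contentHash[:aes.BlockSize], ciphertext)
	if err != nil {
		panic(err)
	}
	fmt.Printf("decrypted: %s\n", decrypted)
}
```

Because CTR is a stream mode, the ciphertext is exactly as long as the plaintext.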
KEK¶
KEKs use public key crypto (RSA-OAEP) to asymmetrically wrap ("key envelope") the DEKs. This means that if you store the private portion of the KEK outside of Varasto (in an HSM, for example), Varasto itself can't even access the files that you store.
EC support is being researched but might not be feasible.
Limitations of Varasto's crypto design¶
Since the SHA-256 of the plaintext is stored in the database, some knowledge is leaked: if someone already has the file that you have stored, they can compute the hash and see that you have the same hash, i.e. know that you have the same file.
This means Varasto is not a great fit if you store data that other people also have and you want to be able to deny to those people that you have the same data.
Please note that this only applies if those people have access to your Varasto database.
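To make the leak concrete, here is a small Go sketch of the check such a person could run. The `hasFile` helper and the map standing in for a copy of the database's plaintext hashes are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// hasFile reports whether sha256(candidate) appears among the plaintext
// hashes, e.g. hashes read from a copy of the database.
func hasFile(knownHashes map[string]bool, candidate []byte) bool {
	sum := sha256.Sum256(candidate)
	return knownHashes[hex.EncodeToString(sum[:])]
}

func main() {
	// Hypothetical database content: the SHA-256 of the three bytes "abc".
	db := map[string]bool{
		"ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad": true,
	}
	fmt.Println(hasFile(db, []byte("abc"))) // the observer owns the same file
	fmt.Println(hasFile(db, []byte("xyz"))) // a file that is not stored
}
```

No encryption key is involved in this check, which is why it only requires database access.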
Can I trust Varasto's encryption?¶
Summary
We've provided an OpenSSL command to decrypt a file that Varasto has encrypted. If you trust OpenSSL and the parameters we give it (and our rationales), you can trust Varasto's implementation.
Plaintext file¶
I stored a picture of a kitten, plaintext.jpg, in my Varasto installation.
The file is under 4 MB, so it got stored as a single blob. The blob's (and in this case, the whole file's) sha256(plaintext) is b2b0d7f8c66c11ae1355a7f240ec6fe421a71f77fa3022c032783a39bdfb14cb.
Encryption key¶
The collection the blob was stored in got assigned the DEK fc5832fa18c5d2534c8a387b90e83ced6dc987dfb0a98bd3dceef0f36cc3e697 (256-bit AES key).
Was the key generated in a safe manner?
You can audit the calling code here.
(crypto/rand.Read is safe for cryptographic use.)
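For reference, generating a 256-bit key with crypto/rand.Read typically looks like the sketch below. The `newDEK` name is hypothetical and this is not a copy of Varasto's calling code.

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// newDEK returns 32 bytes (256 bits) read from the operating system's
// cryptographically secure random number generator.
func newDEK() ([]byte, error) {
	dek := make([]byte, 32)
	if _, err := rand.Read(dek); err != nil {
		return nil, err
	}
	return dek, nil
}

func main() {
	dek, err := newDEK()
	if err != nil {
		panic(err)
	}
	fmt.Printf("%x\n", dek) // 64 hex characters, different on every run
}
```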
On-disk encrypted file¶
I copied the on-disk encrypted blob here as ciphertext.bin.
Where did I find the encrypted file?
My instance had it stored in /mnt/varastotest/m/ao/dfu66dg8qs4qlkvp41r3fsggqe7rnv8o25g1if0t3jffr2j5g.
The path is the lowercased base32 encoding ("extended hex" alphabet) of the sha256 hash.
Why base32? It takes less disk space than hex encoding, but we can't use base64 because it contains mixed-case letters and unfortunately we have to support Windows (whose filesystems are largely case-insensitive).
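The mapping from hash to path can be reproduced with Go's standard library. In this sketch the `blobPath` name is hypothetical, and the directory split (one character, two characters, the rest) is inferred from the example path above rather than taken from Varasto's source.

```go
package main

import (
	"encoding/base32"
	"encoding/hex"
	"fmt"
	"strings"
)

// blobPath converts a hex-encoded SHA-256 into a relative blob path:
// lowercased, unpadded base32 ("extended hex" alphabet), split into
// directories after the first and third characters.
func blobPath(sha256Hex string) (string, error) {
	sum, err := hex.DecodeString(sha256Hex)
	if err != nil {
		return "", err
	}
	if len(sum) != 32 {
		return "", fmt.Errorf("expected 32 hash bytes, got %d", len(sum))
	}
	enc := base32.HexEncoding.WithPadding(base32.NoPadding).EncodeToString(sum)
	enc = strings.ToLower(enc)
	return enc[:1] + "/" + enc[1:3] + "/" + enc[3:], nil
}

func main() {
	path, err := blobPath("b2b0d7f8c66c11ae1355a7f240ec6fe421a71f77fa3022c032783a39bdfb14cb")
	if err != nil {
		panic(err)
	}
	fmt.Println(path) // m/ao/dfu66dg8qs4qlkvp41r3fsggqe7rnv8o25g1if0t3jffr2j5g
}
```

The 32-byte hash becomes 52 base32 characters, compared with 64 characters in hex.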
OpenSSL command for decrypting a file encrypted by Varasto¶
Detail | Value |
---|---|
Cipher | AES-256 |
Mode of operation | CTR |
DEK | fc5832fa18c5d2534c8a387b90e83ced6dc987dfb0a98bd3dceef0f36cc3e697 |
IV | b2b0d7f8c66c11ae1355a7f240ec6fe4 (first half of sha256(plaintext) , 128 bits) |
Is CTR a safe mode?
See a good video on encryption modes.
CTR can be unsafe if you don't authenticate the data. That's why there's the more modern GCM mode, but we chose not to use it because GCM uses more disk space (CTR gives us a nice 1:1 ratio between the lengths of plaintext and ciphertext) and we already authenticate the data through our CAS design, so ciphertext tampering wouldn't go unnoticed.
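The way a content hash can stand in for an authentication tag is sketched below in Go. The function names are hypothetical and the code is an illustration rather than Varasto's read path: after decrypting, the plaintext is hashed and compared with the hash the blob is addressed by, so a flipped ciphertext bit surfaces as a mismatch.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/sha256"
	"errors"
	"fmt"
)

var errHashMismatch = errors.New("decrypted content does not match its content hash")

// decryptAndVerify decrypts an AES-256-CTR blob and accepts the result only
// if sha256(plaintext) equals the hash the blob is stored under.
func decryptAndVerify(dek []byte, contentHash [32]byte, ciphertext []byte) ([]byte, error) {
	block, err := aes.NewCipher(dek)
	if err != nil {
		return nil, err
	}
	plaintext := make([]byte, len(ciphertext))
	cipher.NewCTR(block, contentHash[:aes.BlockSize]).XORKeyStream(plaintext, ciphertext)
	if sha256.Sum256(plaintext) != contentHash {
		return nil, errHashMismatch
	}
	return plaintext, nil
}

// encryptForDemo produces a ciphertext and content hash to test against.
func encryptForDemo(dek, plaintext []byte) ([]byte, [32]byte) {
	block, err := aes.NewCipher(dek)
	if err != nil {
		panic(err)
	}
	contentHash := sha256.Sum256(plaintext)
	ciphertext := make([]byte, len(plaintext))
	cipher.NewCTR(block, contentHash[:aes.BlockSize]).XORKeyStream(ciphertext, plaintext)
	return ciphertext, contentHash
}

func main() {
	dek := make([]byte, 32) // all-zero demo key; real DEKs must be random
	ciphertext, contentHash := encryptForDemo(dek, []byte("kitten"))

	_, err := decryptAndVerify(dek, contentHash, ciphertext)
	fmt.Println("untouched blob error:", err)

	ciphertext[0] ^= 0x01 // simulate tampering with the on-disk blob
	_, err = decryptAndVerify(dek, contentHash, ciphertext)
	fmt.Println("tampered blob error:", err)
}
```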
Is the IV safe?
A secure IV can be public knowledge but must never be reused (so usually a random source is used). Since we use sha256(plaintext) as the IV, it's guaranteed not to be reused: due to our CAS nature we only store each unique hash (and by extension, IV) once.
See Crypto StackExchange on Is it safe to use file's hash as IV?.
Also see Is Convergent Encryption really secure?:
If it's implemented properly, it is as secure as any other form of encryption in preventing those who don't know the data from obtaining it from the encrypted data. However, it does have one fundamental limitation that, so far as we know, is inherent in the technology -- Anyone who has the same file you have can potentially prove that you have that file.
The above limitation is already present by design in every CAS-based system.
The IV is 128 bits because that's AES's block size and what the CTR mode requires.
Plugging in the values from the above table, we get this openssl command:
$ openssl enc -aes-256-ctr -d \
-iv b2b0d7f8c66c11ae1355a7f240ec6fe4 \
-K fc5832fa18c5d2534c8a387b90e83ced6dc987dfb0a98bd3dceef0f36cc3e697 \
-in ciphertext.bin \
-out plaintext-decrypted-from-varasto.jpg
Does the decrypted file match the original plaintext?
$ sha256sum plaintext.jpg plaintext-decrypted-from-varasto.jpg
b2b0d7f8c66c11ae1355a7f240ec6fe421a71f77fa3022c032783a39bdfb14cb plaintext.jpg
b2b0d7f8c66c11ae1355a7f240ec6fe421a71f77fa3022c032783a39bdfb14cb plaintext-decrypted-from-varasto.jpg
They match. You can download the ciphertext.bin and plaintext.jpg from this article to test it yourself.
Recap¶
Since the original plaintext and OpenSSL's decryption results match, you can trust Varasto's encryption implementation if you:
- Trust AES-256-CTR as a good cipher and mode
- Trust OpenSSL's implementation of AES-256-CTR
- Trust Varasto's encryption key generation
- Trust Varasto's IV selection
We have explained these choices and pointed to places and tools you can use to audit them yourself.
What if my HSM gets stolen?¶
Since the HSM grants access via the KEK to all the DEKs, an attacker who steals the HSM (and your encrypted data) could decrypt all your files. We recommend using an HSM that only grants access to the KEK after a PIN is entered. This way if an attacker steals the HSM she can't use the KEK without first unlocking the HSM with the PIN.
Remember to safeguard your backup KEKs as well as your primary KEK - your security is only as strong as your weakest link.
How does integrity verification work with encrypted content?¶
Read first
Read on what a CAS is.
Integrity verification in a generic CAS system¶
With a generic CAS, one can verify integrity by checking that the file content matches the hash it was stored under - i.e. our integrity verifier could use the SHA-256 hashes.
Nice and simple.
Things get a little more complicated if we want encryption, deduplication and for file scrubbing to be possible without access to the encryption keys.
Integrity verification in an encrypted, deduplicated CAS system¶
We have two different perspectives for integrity verification, with their minimal requirements:
Requirement | User accesses a file | File scrubbing |
---|---|---|
Detect drive I/O errors | ☑️ | ☑️ |
Detect bit rot | ☑️ | ☑️ |
Detect tampering | ☑️ | ☐ |
(cryptographically secure hash needed) | ☑️ | ☐ |
Given our use cases and requirements, here's a rough list of our options:
Encryption | Deduplication [1] | Scrubber works w/o encryption keys | CAS address | Scrubber checks |
---|---|---|---|---|
☐ | ☑️ | n/a | plaintext | (CAS address) |
☑️ | ☑️ | ☐ | plaintext | (CAS address) |
☑️ | ☐ | ☑️ | ciphertext | (CAS address) |
☑️ | ☑️ | ☑️ | plaintext | CRC32(ciphertext) |
Since we want to tick all the boxes (encryption, deduplication and scrubbing without encryption keys), we need a separate hash of the ciphertext for the scrubber's use.
Why CRC32? Because it's much cheaper than SHA256 (which would've been more consistent, though) and it doesn't have to be a cryptographic hash capable of detecting tampering, because that will be verified when an actual user accesses the file.
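A scrubber check along these lines could be as small as the following Go sketch. The `scrubBlob` name is hypothetical, and the IEEE polynomial is an assumption, since the text does not say which CRC32 variant is used.

```go
package main

import (
	"fmt"
	"hash/crc32"
)

// scrubBlob reports whether the on-disk ciphertext still matches the CRC32
// recorded when the blob was written. No encryption key is needed.
// The IEEE polynomial is an assumption made for this sketch.
func scrubBlob(ciphertext []byte, storedCRC uint32) bool {
	return crc32.ChecksumIEEE(ciphertext) == storedCRC
}

func main() {
	blob := []byte("123456789") // stand-in for an encrypted blob
	stored := crc32.ChecksumIEEE(blob)

	fmt.Println("healthy blob passes:", scrubBlob(blob, stored))

	blob[3] ^= 0x10 // simulate bit rot on disk
	fmt.Println("rotted blob passes:", scrubBlob(blob, stored))
}
```

This catches accidental corruption such as I/O errors and bit rot. Deliberate tampering is left to the SHA-256 check performed when a user reads the file.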
[1] If our CAS used the ciphertext hash as the address, we'd lose deduplication because quality ciphertext is always indistinguishable from randomness.