Transparent Data Encryption
To protect user data, Cloudberry Database supports Transparent Data Encryption (TDE).
TDE is a technology used to encrypt database data files:
- "Data" refers to the data in the database.
- Files are stored as ciphertext on disk and processed as plaintext in memory. Because TDE protects data at rest (static data), it is also known as static data encryption.
- "Transparent" means users do not need to change how they work. TDE handles encryption and decryption automatically, without user or application intervention.
Introduction to encryption algorithms
Basic concepts
- DEK (Data Encryption Key): The key used to encrypt data, generated by the database and stored in memory.
- DEK plaintext: The DEK in unencrypted form. It exists only in memory and is never stored persistently.
- Master key: The key used to encrypt the DEK.
- DEK ciphertext: The DEK encrypted with the master key, stored persistently.
Key management module
The key management module is the core component of TDE, implementing a two-tier key structure: master key and DEK. The master key is used to encrypt the DEK and is stored outside the database; the DEK is used to encrypt database data and is stored in the database in ciphertext.
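The two-tier structure can be illustrated with the openssl command-line tool. The sketch below is not how Cloudberry Database wraps or stores its keys internally: the function names (wrap_dek, unwrap_dek, demo_two_tier), the use of AES-256-CTR for wrapping, and the file names are all assumptions made for illustration. It generates a master key and a DEK, persists only the DEK ciphertext, then unwraps the DEK to decrypt some data.

```shell
# Illustrative only: a two-tier key sketch with the openssl CLI.
# Plain CTR has no integrity protection; production key wrapping
# normally uses an authenticated scheme.

# wrap_dek MASTER_KEY_HEX IV_HEX DEK_HEX OUT_FILE
# Encrypts the DEK with the master key; OUT_FILE holds the DEK ciphertext.
wrap_dek() {
  printf '%s' "$3" | openssl enc -aes-256-ctr -K "$1" -iv "$2" -out "$4"
}

# unwrap_dek MASTER_KEY_HEX IV_HEX IN_FILE
# Prints the DEK plaintext (hex); a real system would keep it in memory only.
unwrap_dek() {
  openssl enc -d -aes-256-ctr -K "$1" -iv "$2" -in "$3"
}

demo_two_tier() {
  demo_dir=$(mktemp -d) || return 1
  master_key=$(openssl rand -hex 32)   # stands in for a key kept outside the database
  dek=$(openssl rand -hex 32)          # stands in for the database-generated DEK
  wrap_iv=$(openssl rand -hex 16)
  data_iv=$(openssl rand -hex 16)

  # Persist only the DEK ciphertext.
  wrap_dek "$master_key" "$wrap_iv" "$dek" "$demo_dir/dek.wrapped" || return 1

  # Encrypt some data with the DEK.
  printf 'hello tde' |
    openssl enc -aes-256-ctr -K "$dek" -iv "$data_iv" -out "$demo_dir/data.enc" || return 1

  # Later: unwrap the DEK with the master key, then decrypt the data.
  dek_again=$(unwrap_dek "$master_key" "$wrap_iv" "$demo_dir/dek.wrapped") || return 1
  openssl enc -d -aes-256-ctr -K "$dek_again" -iv "$data_iv" -in "$demo_dir/data.enc"
  rm -rf "$demo_dir"
}
```

Calling demo_two_tier prints the recovered plaintext. Without the master key, the stored dek.wrapped file is not enough to decrypt data.enc, which is the point of keeping the master key outside the database.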
Algorithm classification
Encryption algorithms are divided into the following types:
- Symmetric encryption: The same key is used for both encryption and decryption.
- Asymmetric encryption: A public key is used for encryption and a private key for decryption, which suits one-to-many and many-to-one encryption needs.
Among symmetric algorithms, block encryption algorithms are the mainstream choice and generally perform better than stream encryption and asymmetric encryption. Cloudberry Database supports two block encryption algorithms: AES and SM4.
AES encryption algorithm
AES is an internationally standardized block encryption algorithm that supports 128-, 192-, and 256-bit keys. Common encryption modes include:
- ECB: Electronic Codebook mode
- CBC: Cipher Block Chaining mode
- CFB: Cipher Feedback mode
- OFB: Output Feedback mode
- CTR: Counter mode
More ISO/IEC encryption algorithms
Other ISO/IEC-standardized algorithms include:
- ISO/IEC 14888-3/AMD1 (i.e., SM2): Asymmetric encryption based on elliptic-curve cryptography (ECC); generally faster than RSA at a comparable security level.
- ISO/IEC 10118-3:2018 (i.e., SM3): Message digest (hash) algorithm, comparable in purpose to MD5; outputs a 256-bit digest.
- ISO/IEC 18033-3:2010/AMD1:2021 (i.e., SM4): Symmetric block encryption algorithm, originally used in wireless LAN standards; uses a 128-bit key and a 128-bit block length.
User instructions
Before using the TDE feature, ensure the following conditions are met:
- Install OpenSSL: OpenSSL must be installed on the Cloudberry Database nodes. Most Linux distributions ship with OpenSSL pre-installed.
- Cloudberry Database version: Use Cloudberry Database v1.6.0 or later; TDE support was introduced in v1.6.0.
When deploying Cloudberry Database, you can enable the TDE feature through settings, making all subsequent data encryption operations completely transparent to users. To enable TDE during database initialization, use the gpinitsystem command with the -T parameter. Cloudberry Database supports two encryption algorithms: AES and SM4. Here are examples of enabling TDE:
- Using the AES256 encryption algorithm:
  gpinitsystem -c gpinitsystem_config -T AES256
- Using the SM4 encryption algorithm:
  gpinitsystem -c gpinitsystem_config -T SM4
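Because TDE is chosen at initialization time, it can help to validate the algorithm name before running the command. The helper below is a hypothetical sketch: tde_init_command is not part of Cloudberry Database, and it assumes that the only -T values are the two shown above (AES256 and SM4). It prints the command line rather than executing it.

```shell
# tde_init_command ALGORITHM [CONFIG_FILE]
# Prints the gpinitsystem command line for the given algorithm without running it.
# Assumption: only the two -T values shown in this document are accepted.
tde_init_command() {
  tde_algo=$1
  tde_config=${2:-gpinitsystem_config}
  case "$tde_algo" in
    AES256|SM4) ;;
    *)
      echo "unsupported TDE algorithm: $tde_algo (expected AES256 or SM4)" >&2
      return 1
      ;;
  esac
  # OpenSSL is a stated prerequisite; warn early if the CLI is not on PATH.
  command -v openssl >/dev/null 2>&1 ||
    echo "warning: openssl not found on PATH" >&2
  printf 'gpinitsystem -c %s -T %s\n' "$tde_config" "$tde_algo"
}
```

For example, tde_init_command AES256 prints the first command above, and tde_init_command DES returns a non-zero status with an error message on stderr.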
Verify TDE effectiveness
The transparent data encryption feature is invisible to users, meaning that enabling or disabling this feature does not affect the user experience during read and write operations. However, to verify the effectiveness of encryption, you can simulate a key file loss scenario and ensure that the database cannot start without the key file by following these steps.
The key file is located on the Coordinator node. To locate the key file, first find the data directory of the Coordinator node. For example:
COORDINATOR_DATA_DIRECTORY=/home/gpadmin/work/data0/master/gpseg-1
Then, find the key files:
$ pwd
/home/gpadmin/work/data0/master/gpseg-1
$ ls -l pg_cryptokeys/live/
total 8
-rw------- 1 gpadmin gpadmin 48 Apr 12 10:26 relation.wkey
-rw------- 1 gpadmin gpadmin 48 Apr 12 10:26 wal.wkey
The relation.wkey file is the key used to encrypt data files, while the wal.wkey file is used to encrypt WAL logs. Currently, only relation.wkey is active; the WAL logs are not yet encrypted.
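A quick presence check of the two key files can be scripted. This is a minimal sketch: check_tde_keys is a hypothetical helper, and it assumes only the directory layout shown above (pg_cryptokeys/live under the Coordinator data directory).

```shell
# check_tde_keys DATA_DIR
# Returns 0 if both key files exist and are non-empty under
# DATA_DIR/pg_cryptokeys/live; otherwise reports each problem and returns 1.
check_tde_keys() {
  key_dir="$1/pg_cryptokeys/live"
  rc=0
  for key_file in relation.wkey wal.wkey; do
    if [ ! -s "$key_dir/$key_file" ]; then
      echo "missing or empty: $key_dir/$key_file"
      rc=1
    fi
  done
  return "$rc"
}
```

For example, check_tde_keys "$COORDINATOR_DATA_DIRECTORY" prints nothing and returns 0 when both files are in place.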
Verification process
- Create a table and insert data.
  - Create an append-only (AO) table and insert data:
    postgres=# create table ao2 (id int) with(appendonly=true);
    postgres=# insert into ao2 select generate_series(1,10);
  - Ensure the data has been successfully inserted.
- Stop the database.
  gpstop -a
- Simulate key file loss.
  - Switch to the directory where the key files are stored:
    cd /home/gpadmin/work/data0/master/gpseg-1/pg_cryptokeys/
  - Move the key files to another directory (to simulate key file loss):
    mv live backup
- Attempt to start the database.
  - Start the database using the gpstart command:
    gpstart -a
    The database will fail to start because of the missing key files. You will see an error in the database logs on the Coordinator node, similar to the following:
    FATAL: cluster has no data encryption keys
    This confirms that the database cannot start without the key files, ensuring data security.
- Restore the key files by moving the previously backed-up key files back to the original directory:
  mv backup live
- Restart the database and verify the data.
  - Start the database again using the gpstart command:
    gpstart -a
  - Once the database has successfully started, query the ao2 table to verify the data:
    postgres=# select * from ao2 order by id;
     id
    ----
      1
      2
      3
      4
      5
      6
      7
      8
      9
     10
    (10 rows)
By following these steps, you can verify the effectiveness of the transparent data encryption feature, ensuring that the database cannot start without the key files, thus securing the data at rest.
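The stop, hide, start, and restore sequence above can be scripted for a test cluster. The following is a sketch under stated assumptions: hide_keys, restore_keys, and verify_tde are hypothetical helper names, and the script assumes that gpstart -a exits with a non-zero status when the Coordinator cannot start. Do not run it against a production cluster.

```shell
# Hypothetical helpers for a TEST cluster only. KEY_ROOT is the
# pg_cryptokeys directory under the Coordinator data directory.

# hide_keys KEY_ROOT: rename live to backup to simulate key file loss.
hide_keys() {
  [ -d "$1/live" ] && [ ! -e "$1/backup" ] || return 1
  mv "$1/live" "$1/backup"
}

# restore_keys KEY_ROOT: undo hide_keys.
restore_keys() {
  [ -d "$1/backup" ] && [ ! -e "$1/live" ] || return 1
  mv "$1/backup" "$1/live"
}

# verify_tde COORDINATOR_DATA_DIR
# Assumption: gpstart -a exits non-zero when the Coordinator cannot start.
verify_tde() {
  key_root="$1/pg_cryptokeys"
  gpstop -a || return 1
  hide_keys "$key_root" || return 1
  if gpstart -a; then
    echo "UNEXPECTED: database started without key files" >&2
    gpstop -a
    restore_keys "$key_root"
    return 1
  fi
  echo "OK: startup failed without key files"
  restore_keys "$key_root" || return 1
  gpstart -a
}
```

A run would look like verify_tde "$COORDINATOR_DATA_DIRECTORY", followed by the select on ao2 shown in the last step to confirm the data is intact.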