Perform Full Backup and Restore
Apache Cloudberry supports backing up and restoring the full database in parallel. Parallel operations scale regardless of the number of segments in your system, because segment hosts each write their data to local disk storage at the same time.
gpbackup and gprestore are Apache Cloudberry command-line utilities that create and restore backup sets for Apache Cloudberry. By default, gpbackup stores only the object metadata files and DDL files for a backup in the Apache Cloudberry coordinator data directory. Apache Cloudberry segments use the COPY ... ON SEGMENT command to store their data for backed-up tables in compressed CSV data files, located in each segment's backups directory.
The backup metadata files contain all of the information that gprestore needs to restore a full backup set in parallel. Each gpbackup task uses a single transaction in Apache Cloudberry. During this transaction, metadata is backed up on the coordinator host, and data for each table on each segment host is written to CSV backup files using COPY ... ON SEGMENT commands in parallel. The backup process acquires an ACCESS SHARE lock on each table that is backed up.
Install the gpbackup and gprestore utilities
Before installing the gpbackup and gprestore utilities, make sure that Go (v1.11 or later) is installed and that you have set the Go PATH environment variable.
- Pull the apache/cloudberry-gpbackup GitHub repository to the target machine:
  go install github.com/apache/cloudberry-gpbackup@latest
  The repository is placed in $GOPATH/pkg/mod/github.com/apache/cloudberry-gpbackup.
- Enter the apache/cloudberry-gpbackup directory. Then, build and install the source code:
  cd <$GOPATH/pkg/mod/github.com/apache/cloudberry-gpbackup>
  make depend
  make build
  You might encounter the "fatal: Not a git repository (or any of the parent directories): .git" message after running make depend. Ignore this message, because it does not affect the build.
  The build target puts the gpbackup and gprestore binaries in $HOME/go/bin. This operation also tries to copy gpbackup_helper to the Apache Cloudberry segments (by retrieving hostnames from gp_segment_configuration).
- Check whether the build is successful by checking whether your $HOME/go/bin directory contains gpbackup, gprestore, and gpbackup_helper:
  ls $HOME/go/bin
- Validate whether the installation is successful:
  gpbackup --version
  gprestore --version
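The check on $HOME/go/bin can also be scripted. The following is a minimal sketch, not part of the gpbackup tooling: the function name check_backup_binaries is hypothetical, and the only assumption is that the three binaries named above should exist and be executable in the given directory.

```shell
# check_backup_binaries [DIR]
# Reports, for each expected binary, whether it exists and is executable in DIR.
# DIR defaults to $HOME/go/bin. Returns 0 if all three are present, 1 otherwise.
# (Hypothetical helper for illustration; not shipped with gpbackup.)
check_backup_binaries() {
  local bin_dir="${1:-$HOME/go/bin}"
  local missing=0
  local name
  for name in gpbackup gprestore gpbackup_helper; do
    if [ -x "$bin_dir/$name" ]; then
      echo "found:   $bin_dir/$name"
    else
      echo "missing: $bin_dir/$name"
      missing=1
    fi
  done
  return "$missing"
}
```

Calling check_backup_binaries with no argument inspects $HOME/go/bin. A non-zero return status means at least one binary is missing, so the function can gate later steps in an install script.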
Back up the full database
To perform a complete backup of a database, as well as Apache Cloudberry system metadata, use the command:
gpbackup --dbname <database_name>
For example:
$ gpbackup --dbname test_04
20240108:17:17:18 gpbackup:gpadmin:cbdb-coordinator:001945-[INFO]:-gpbackup version = 1.2.7-beta1+dev.7
20240108:17:17:18 gpbackup:gpadmin:cbdb-coordinator:001945-[INFO]:-Greenplum Database Version = oudberry Database 1.0.0 build 5551471267
20240108:17:17:18 gpbackup:gpadmin:cbdb-coordinator:001945-[INFO]:-Starting backup of database test_04
20240108:17:17:18 gpbackup:gpadmin:cbdb-coordinator:001945-[INFO]:-Backup Timestamp = 20240108171718
20240108:17:17:18 gpbackup:gpadmin:cbdb-coordinator:001945-[INFO]:-Backup Database = test_04
20240108:17:17:18 gpbackup:gpadmin:cbdb-coordinator:001945-[INFO]:-Gathering table state information
20240108:17:17:18 gpbackup:gpadmin:cbdb-coordinator:001945-[INFO]:-Acquiring ACCESS SHARE locks on tables
20240108:17:17:18 gpbackup:gpadmin:cbdb-coordinator:001945-[INFO]:-Gathering additional table metadata
20240108:17:17:18 gpbackup:gpadmin:cbdb-coordinator:001945-[INFO]:-Getting storage information
20240108:17:17:18 gpbackup:gpadmin:cbdb-coordinator:001945-[WARNING]:-No tables in backup set contain data. Performing metadata-only backup instead.
20240108:17:17:18 gpbackup:gpadmin:cbdb-coordinator:001945-[INFO]:-Metadata will be written to /data0/coordinator/gpseg-1/backups/20240108/20240108171718/gpbackup_20240108171718_metadata.sql
20240108:17:17:18 gpbackup:gpadmin:cbdb-coordinator:001945-[INFO]:-Writing global database metadata
20240108:17:17:18 gpbackup:gpadmin:cbdb-coordinator:001945-[INFO]:-Global database metadata backup complete
20240108:17:17:18 gpbackup:gpadmin:cbdb-coordinator:001945-[INFO]:-Writing pre-data metadata
20240108:17:17:18 gpbackup:gpadmin:cbdb-coordinator:001945-[INFO]:-Pre-data metadata metadata backup complete
20240108:17:17:18 gpbackup:gpadmin:cbdb-coordinator:001945-[INFO]:-Writing post-data metadata
20240108:17:17:18 gpbackup:gpadmin:cbdb-coordinator:001945-[INFO]:-Post-data metadata backup complete
20240108:17:17:19 gpbackup:gpadmin:cbdb-coordinator:001945-[INFO]:-Found neither /usr/local/cloudberry-1.0.0/bin/gp_email_contacts.yaml nor /home/gpadmin//gp_email_contacts.yaml
20240108:17:17:19 gpbackup:gpadmin:cbdb-coordinator:001945-[INFO]:-Email containing gpbackup report /data0/coordinator/gpseg-1/backups/20240108/20240108171718/gpbackup_20240108171718_report will not be sent
20240108:17:17:19 gpbackup:gpadmin:cbdb-coordinator:001945-[INFO]:-Backup completed successfully
The above command creates a file that contains global and database-specific metadata on the Apache Cloudberry coordinator host in the default directory, $COORDINATOR_DATA_DIRECTORY/backups/<YYYYMMDD>/<YYYYMMDDHHMMSS>/. For example:
ls $COORDINATOR_DATA_DIRECTORY/backups/20240108/20240108171718
gpbackup_20240108171718_config.yaml gpbackup_20240108171718_report
gpbackup_20240108171718_metadata.sql gpbackup_20240108171718_toc.yaml
By default, each segment stores each table's data for the backup in a separate compressed CSV file in <seg_dir>/backups/<YYYYMMDD>/<YYYYMMDDHHMMSS>/. For example:
ls /data1/primary/gpseg1/backups/20240108/20240108171718/
gpbackup_0_20240108171718_17166.gz gpbackup_0_20240108171718_26303.gz
gpbackup_0_20240108171718_21816.gz
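Because both default locations follow the same backups/<YYYYMMDD>/<YYYYMMDDHHMMSS>/ pattern, the directory for a given backup can be derived from its timestamp alone. The sketch below illustrates this; the function name backup_subdir is hypothetical and the function only builds a path string, it does not check that the directory exists.

```shell
# backup_subdir TIMESTAMP
# Prints the relative directory backups/<YYYYMMDD>/<YYYYMMDDHHMMSS> for a
# 14-digit backup timestamp. Returns 1 if TIMESTAMP is not exactly 14 digits.
# (Hypothetical helper for illustration.)
backup_subdir() {
  local ts="$1"
  case "$ts" in
    ''|*[!0-9]*)
      echo "invalid timestamp: $ts" >&2
      return 1
      ;;
  esac
  if [ "${#ts}" -ne 14 ]; then
    echo "invalid timestamp: $ts" >&2
    return 1
  fi
  # The first 8 digits are the YYYYMMDD date component.
  local day
  day=$(printf '%s' "$ts" | cut -c1-8)
  printf 'backups/%s/%s\n' "$day" "$ts"
}
```

For example, ls "$COORDINATOR_DATA_DIRECTORY/$(backup_subdir 20240108171718)" lists the coordinator files for the backup shown above, and the same relative path applies under each segment data directory.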
To consolidate all backup files into a single directory, include the --backup-dir option. Note that you need to specify an absolute path with this option:
$ gpbackup --dbname test_04 --backup-dir /home/gpadmin/backups
20240108:17:34:10 gpbackup:gpadmin:cbdb-coordinator:003348-[INFO]:-gpbackup version = 1.2.7-beta1+dev.7
20240108:17:34:10 gpbackup:gpadmin:cbdb-coordinator:003348-[INFO]:-Greenplum Database Version = oudberry Database 1.0.0 build 5551471267
...
20240108:17:34:12 gpbackup:gpadmin:cbdb-coordinator:003348-[INFO]:-Backup completed successfully
$ find /home/gpadmin/backups/ -type f
/home/gpadmin/backups/gpseg0/backups/20240108/20240108173410/gpbackup_0_20240108173410_16593.gz
/home/gpadmin/backups/gpseg-1/backups/20240108/20240108173410/gpbackup_20240108173410_config.yaml
/home/gpadmin/backups/gpseg-1/backups/20240108/20240108173410/gpbackup_20240108173410_report
/home/gpadmin/backups/gpseg-1/backups/20240108/20240108173410/gpbackup_20240108173410_toc.yaml
/home/gpadmin/backups/gpseg-1/backups/20240108/20240108173410/gpbackup_20240108173410_metadata.sql
/home/gpadmin/backups/gpseg1/backups/20240108/20240108173410/gpbackup_1_20240108173410_16593.gz
When performing a backup operation, you can use the --single-data-file option in situations where the additional overhead of multiple files might be prohibitive, for example, if you use a third-party storage solution such as Data Domain with backups.
Backing up a materialized view does not back up the materialized view data. Only the materialized view definition is backed up.
Restore the full database
To use gprestore to restore from a backup set, you must use the --timestamp option to specify the exact timestamp value (YYYYMMDDHHMMSS) to restore. Include the --create-db option if the database does not exist in the cluster. For example:
$ dropdb test_04
$ gprestore --timestamp 20240108171718 --create-db
20240108:17:42:26 gprestore:gpadmin:cbdb-coordinator:004115-[INFO]:-Restore Key = 20240108171718
20240108:17:42:26 gprestore:gpadmin:cbdb-coordinator:004115-[INFO]:-gpbackup version = 1.2.7-beta1+dev.7
20240108:17:42:26 gprestore:gpadmin:cbdb-coordinator:004115-[INFO]:-gprestore version = 1.2.7-beta1+dev.7
20240108:17:42:26 gprestore:gpadmin:cbdb-coordinator:004115-[INFO]:-Greenplum Database Version = oudberry Database 1.0.0 build 5551471267
20240108:17:42:26 gprestore:gpadmin:cbdb-coordinator:004115-[INFO]:-Creating database
20240108:17:42:26 gprestore:gpadmin:cbdb-coordinator:004115-[INFO]:-Database creation complete for: test_04
20240108:17:42:26 gprestore:gpadmin:cbdb-coordinator:004115-[INFO]:-Restoring pre-data metadata
Pre-data objects restored: 3 / 3 [=================================] 100.00% 0s
20240108:17:42:26 gprestore:gpadmin:cbdb-coordinator:004115-[INFO]:-Pre-data metadata restore complete
20240108:17:42:26 gprestore:gpadmin:cbdb-coordinator:004115-[INFO]:-Restoring post-data metadata
20240108:17:42:26 gprestore:gpadmin:cbdb-coordinator:004115-[INFO]:-Post-data metadata restore complete
20240108:17:42:26 gprestore:gpadmin:cbdb-coordinator:004115-[INFO]:-Found neither /usr/local/cloudberry-1.0.0/bin/gp_email_contacts.yaml nor /home/gpadmin//gp_email_contacts.yaml
20240108:17:42:26 gprestore:gpadmin:cbdb-coordinator:004115-[INFO]:-Email containing gprestore report /data0/coordinator/gpseg-1/backups/20240108/20240108171718/gprestore_20240108171718_20240108174226_report will not be sent
20240108:17:42:26 gprestore:gpadmin:cbdb-coordinator:004115-[INFO]:-Restore completed successfully
If you specified a custom --backup-dir to consolidate the backup files, include the same --backup-dir option when using gprestore to locate the backup files:
$ dropdb test_04
$ gprestore --backup-dir /home/gpadmin/backups/ --timestamp 20240109102646 --create-db
20240109:10:33:17 gprestore:gpadmin:cbdb-coordinator:017112-[INFO]:-Restore Key = 20240109102646
...
20240109:10:33:17 gprestore:gpadmin:cbdb-coordinator:017112-[INFO]:-Restore completed successfully
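Since gprestore needs the exact timestamp, it can help to look up the most recent one from the backup directory rather than copying it by hand. The following sketch assumes the <YYYYMMDD>/<YYYYMMDDHHMMSS>/ layout described earlier; the function name latest_backup_timestamp is hypothetical, and the function only inspects directory names, not whether the backup completed successfully.

```shell
# latest_backup_timestamp BACKUPS_DIR
# BACKUPS_DIR is the directory that holds the <YYYYMMDD> subdirectories.
# Prints the newest 14-digit timestamp directory name, or nothing if none exist.
# (Hypothetical helper for illustration.)
latest_backup_timestamp() {
  local backups_dir="$1"
  local d
  for d in "$backups_dir"/*/*/; do
    [ -d "$d" ] || continue
    basename "$d"
  done | grep -E '^[0-9]{14}$' | sort | tail -n 1
}
```

Timestamps have a fixed width, so a plain lexical sort is also a chronological sort. With the default location, the argument would be "$COORDINATOR_DATA_DIRECTORY/backups"; with the --backup-dir used above, the coordinator files sit under /home/gpadmin/backups/gpseg-1/backups. The result can then be passed to gprestore --timestamp.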
gprestore does not attempt to restore global metadata for the Apache Cloudberry system by default. If this is required, include the --with-globals argument.
By default, gprestore uses 1 connection to restore table data and metadata. If you have a large backup set, you can improve performance of the restore by increasing the number of parallel connections with the --jobs option. For example:
$ gprestore --backup-dir /home/gpadmin/backups/ --timestamp 20240109102646 --create-db --jobs 4
Test the number of parallel connections with your backup set to determine the ideal number for fast recovery.
You cannot perform a parallel restore operation with gprestore if the backup combines table backups into a single file per segment with the gpbackup option --single-data-file.
Restoring a materialized view does not restore materialized view data. Only the materialized view definition is restored. To populate the materialized view with data, use REFRESH MATERIALIZED VIEW. When you refresh the materialized view, the tables that are referenced by the materialized view definition must be available. The gprestore log file lists the materialized views that have been restored and the REFRESH MATERIALIZED VIEW commands that are used to populate the materialized views with data.
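If you already know which materialized views need repopulating, the refresh statements can be generated from a list of names. This is a minimal sketch: the function name refresh_statements and the view name in the usage note are hypothetical, and the names are inserted as given, so they must already be valid, schema-qualified identifiers.

```shell
# refresh_statements VIEW...
# Prints one REFRESH MATERIALIZED VIEW statement per argument.
# Names are used verbatim; no quoting or validation is applied.
# (Hypothetical helper for illustration.)
refresh_statements() {
  local view
  for view in "$@"; do
    printf 'REFRESH MATERIALIZED VIEW %s;\n' "$view"
  done
}
```

One possible use is to pipe the output into psql after the restore finishes, for example refresh_statements schema1.sales_summary | psql -d test_04, where schema1.sales_summary stands in for a real view name.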
Filter the contents of a backup or restore
Filter by schema
gpbackup backs up all schemas and tables in the specified database, unless you exclude or include individual schema or table objects with schema level or table level filter options.
The schema level options are the --include-schema, --include-schema-file, --exclude-schema, and --exclude-schema-file command-line options to gpbackup. For example, if the test_04 database includes only 2 schemas, schema1 and schema2, both of the following commands back up only the schema1 schema:
$ gpbackup --dbname test_04 --include-schema schema1
$ gpbackup --dbname test_04 --exclude-schema schema2
You can specify multiple --include-schema options or multiple --exclude-schema options in a gpbackup command. For example:
$ gpbackup --dbname test_04 --include-schema schema1 --include-schema schema2
If you have a large number of schemas, you can list the schemas in a text file and specify the file with the --include-schema-file or --exclude-schema-file options in a gpbackup command. Each line in the file must define a single schema, and the file cannot contain trailing lines. For example, this command uses a file in the gpadmin home directory to include a set of schemas.
$ gpbackup --dbname test_04 --include-schema-file /home/gpadmin/backup-schemas.txt --backup-dir /home/gpadmin/backups
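A schema file of this kind can be generated and sanity-checked before it is handed to gpbackup. The sketch below is illustrative only: both function names are hypothetical, and the check assumes that "no trailing lines" means the file must not contain empty or whitespace-only lines.

```shell
# write_schema_file FILE SCHEMA...
# Writes one schema name per line to FILE, with no blank lines in between.
# (Hypothetical helper for illustration.)
write_schema_file() {
  local file="$1"
  shift
  printf '%s\n' "$@" > "$file"
}

# schema_file_ok FILE
# Returns 0 if FILE is non-empty and has no empty or whitespace-only lines.
# Assumption: this is what the "no trailing lines" requirement amounts to.
schema_file_ok() {
  local file="$1"
  [ -s "$file" ] || return 1
  ! grep -q -E '^[[:space:]]*$' "$file"
}
```

For example, write_schema_file /home/gpadmin/backup-schemas.txt schema1 schema2 produces a two-line file, and schema_file_ok on that path confirms that it contains no stray blank lines before you run the gpbackup command above.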