Skip to main content
[101-6] Lesson 6: Backup and Restore Operations
Learn how to backup and restore your data in the Apache Cloudberry.
info

The Apache Cloudberry does not include the utility gpbackup by default. It's maintained separately. Please follow the README to install gpbackup before using it.

The parallel dump utility gpbackup backs up the Apache Cloudberry master instance and each active segment instance at the same time.

By default, gpbackup creates dump files in the backups subdirectory.

Several dump files are created for the master, containing database information such as DDL statements, the Apache Cloudberry system catalog tables, and metadata files. gpbackup creates dump files for each segment.

You can perform full or incremental backups. To restore a database to its state when an incremental backup was made, it will restore the previous full backup and all subsequent incremental backups.

Backing up a Apache Cloudberry

Each file created for a backup begins with a 14-digit timestamp key that identifies the backup set the file belongs to.

gpbackup can be run directly in a terminal on the master host, or you can add it to crontab on the master host to schedule regular backups.

The parallel restore utility gprestore takes the timestamp key generated by gpbackup, validates the backup set, and restores the database objects and data into a distributed database in parallel. Parallel restore operations require a complete backup set created by gpbackup, a full backup and any required incremental backups.

Restoring a Apache Cloudberry backup

The gpbackup utility provides flexibility and verification options for use with the automated backup files produced by gpbackup or with backup files moved from the Apache Cloudberry array to an alternate location.

Exercises

These exercises will walk through how to create a full backup of your database and then restore a table.

  1. To run a full backup, use gpbackup --dbname database --backup-dir /path/for/backup. This will backup the entire database to the directory given.

    [gpadmin@mdw tmp]$ gpbackup --dbname tutorial --backup-dir /tmp/
    20230727:10:28:19 gpbackup:gpadmin:mdw:020061-[INFO]:-gpbackup version = 1.2.7-beta1+dev.7
    20230727:10:28:19 gpbackup:gpadmin:mdw:020061-[INFO]:-Greenplum Database Version = oudberry Database 1.0.0 build dev
    20230727:10:28:19 gpbackup:gpadmin:mdw:020061-[INFO]:-Starting backup of database tutorial
    20230727:10:28:19 gpbackup:gpadmin:mdw:020061-[INFO]:-Backup Timestamp = 20230727102819
    20230727:10:28:19 gpbackup:gpadmin:mdw:020061-[INFO]:-Backup Database = tutorial
    20230727:10:28:19 gpbackup:gpadmin:mdw:020061-[INFO]:-Gathering table state information
    20230727:10:28:19 gpbackup:gpadmin:mdw:020061-[INFO]:-Acquiring ACCESS SHARE locks on tables
    Locks acquired: 28 / 28 [==============================================================] 100.00% 0s
    20230727:10:28:19 gpbackup:gpadmin:mdw:020061-[INFO]:-Gathering additional table metadata
    20230727:10:28:20 gpbackup:gpadmin:mdw:020061-[INFO]:-Getting storage information
    20230727:10:28:20 gpbackup:gpadmin:mdw:020061-[INFO]:-Metadata will be written to /tmp/gpseg-1/backups/20230727/20230727102819/gpbackup_20230727102819_metadata.sql
    20230727:10:28:20 gpbackup:gpadmin:mdw:020061-[INFO]:-Writing global database metadata
    20230727:10:28:20 gpbackup:gpadmin:mdw:020061-[INFO]:-Global database metadata backup complete
    20230727:10:28:20 gpbackup:gpadmin:mdw:020061-[INFO]:-Writing pre-data metadata
    20230727:10:28:20 gpbackup:gpadmin:mdw:020061-[INFO]:-Pre-data metadata metadata backup complete
    20230727:10:28:20 gpbackup:gpadmin:mdw:020061-[INFO]:-Writing post-data metadata
    20230727:10:28:20 gpbackup:gpadmin:mdw:020061-[INFO]:-Post-data metadata backup complete
    20230727:10:28:20 gpbackup:gpadmin:mdw:020061-[INFO]:-Writing data to file
    Tables backed up: 11 / 11 [============================================================] 100.00% 8s
    20230727:10:28:28 gpbackup:gpadmin:mdw:020061-[INFO]:-Data backup complete
    20230727:10:28:28 gpbackup:gpadmin:mdw:020061-[INFO]:-Skipped data backup of 3 external/foreign table(s).
    20230727:10:28:28 gpbackup:gpadmin:mdw:020061-[INFO]:-See /home/gpadmin/gpAdminLogs/gpbackup_20230727.log for a complete list of skipped tables.
    20230727:10:28:29 gpbackup:gpadmin:mdw:020061-[INFO]:-Found neither /usr/local/cloudberry-db/bin/gp_email_contacts.yaml nor /home/gpadmin/gp_email_contacts.yaml
    20230727:10:28:29 gpbackup:gpadmin:mdw:020061-[INFO]:-Email containing gpbackup report /tmp/gpseg-1/backups/20230727/20230727102819/gpbackup_20230727102819_report will not be sent
    20230727:10:28:29 gpbackup:gpadmin:mdw:020061-[INFO]:-Backup completed successfully

    This runs a full backup of the database created during the previous exercises.

  2. To view the backups:

    $ ls -al /tmp/

    drwxrwxr-x 3 gpadmin gpadmin 21 Jul 27 10:28 gpseg-1
    drwxrwxr-x 3 gpadmin gpadmin 21 Jul 27 10:28 gpseg1
    drwxrwxr-x 3 gpadmin gpadmin 21 Jul 27 10:28 gpseg0
  3. Now, that we have a full backup let's remove data from a table to simulate a failure.

    psql (14.4, server 14.4)
    Type "help" for help.
    tutorial=# select count(*) from faa.otp_r;
    count
    ---------
    1024552
    (1 row)

    This should return 1024552 rows in the table. Let's truncate the table and then check the row count:

    tutorial=# truncate table faa.otp_r;
    TRUNCATE TABLE
    tutorial=# select count(*) from faa.otp_r;
    count
    -------
    0
    (1 row)

    tutorial=#

    The report should now show 0 rows in the table.

  4. Let's restore the data that was lost. First, exit from the psql shell by typing \q then issue the gprestore command(14 digital timestamp information could be got from previous gpbackup output, also you may record it down once you have done any backup before.):

    gprestore --include-table faa.otp_r --data-only   --backup-dir /tmp/ --timestamp 20230727102819
    20230727:10:38:29 gprestore:gpadmin:mdw:020373-[INFO]:-Restore Key = 20230727102819
    20230727:10:38:30 gprestore:gpadmin:mdw:020373-[INFO]:-gpbackup version = 1.2.7-beta1+dev.7
    20230727:10:38:30 gprestore:gpadmin:mdw:020373-[INFO]:-gprestore version = 1.2.7-beta1+dev.7
    20230727:10:38:30 gprestore:gpadmin:mdw:020373-[INFO]:-Greenplum Database Version = oudberry Database 1.0.0 build dev
    20230727:10:38:30 gprestore:gpadmin:mdw:020373-[INFO]:-Restoring sequence values
    20230727:10:38:30 gprestore:gpadmin:mdw:020373-[INFO]:-Sequence values restore complete
    Tables restored: 1 / 1 [===============================================================] 100.00% 5s
    20230727:10:38:35 gprestore:gpadmin:mdw:020373-[INFO]:-Data restore complete
    20230727:10:38:35 gprestore:gpadmin:mdw:020373-[INFO]:-Found neither /usr/local/cloudberry-db/bin/gp_email_contacts.yaml nor /home/gpadmin/gp_email_contacts.yaml
    20230727:10:38:35 gprestore:gpadmin:mdw:020373-[INFO]:-Email containing gprestore report /tmp/gpseg-1/backups/20230727/20230727102819/gprestore_20230727102819_20230727103829_report will not be sent
    20230727:10:38:35 gprestore:gpadmin:mdw:020373-[INFO]:-Restore completed successfully
  5. Finally, verify the row count

    [gpadmin@mdw tmp]$ psql -U gpadmin tutorial
    psql (14.4, server 14.4)
    Type "help" for help.
    tutorial=# select count(*) from faa.otp_r;
    count
    ---------
    1024552
    (1 row)

The table should show 1024552 rows again as it was prior to the truncate call.