Query Performance in Cloudberry Database
Cloudberry Database dynamically eliminates irrelevant partitions in a table and optimally allocates memory for different operators in a query.These enhancements scan less data for a query, accelerate query processing, and support more concurrency.
-
Dynamic partition elimination
In Cloudberry Database, values available only when a query runs are used to dynamically prune partitions, which improves query processing speed. Enable or deactivate dynamic partition elimination by setting the server configuration parameter
gp_dynamic_partition_pruning
toON
orOFF
; it isON
by default. -
Memory optimizations
Cloudberry Database allocates memory optimally for different operators in a query and frees and re-allocates memory during the stages of processing a query.
Cloudberry Database uses GPORCA by default. GPORCA extends the planning and optimization capabilities of the Postgres optimizer.
📄️ Update Statistics
The most important prerequisite for good query performance is to begin with accurate statistics for the tables. Updating statistics with the ANALYZE statement enables the query planner to generate optimal query plans. When a table is analyzed, information about the data is stored in the system catalog tables. If the stored information is out of date, the planner can generate inefficient plans.
📄️ Create Unique Index on AO Table
Starting from v1.5.0, you can create a unique index on an Append-Optimized (AO) or Append-Optimized Column Store (AOCS) table in Cloudberry Database. With a unique index, Cloudberry Database checks the unique constraint when data is inserted into the AO table to ensure the uniqueness of the data. At the same time, the database optimizes specific queries with the optimizer to improve the query performance. However, this also brings some overhead for maintaining a unique index, especially when inserting data.
📄️ Use Auto Materialized Views for Query Optimization
Since v1.5.0, Cloudberry Database supports automatically using materialized views to process some or all queries (called AQUMV) during the query planning phase. This feature is suitable for queries on large tables and can greatly reduce query processing time. AQUMV uses incremental materialized views (IMVs) because IMVs usually keep the latest data when related tables have write operations.
📄️ Incremental Materialized View
This document introduces the usage scenarios of the incremental materialized view in Cloudberry Database, how to use it, the restrictions, and the things to note.
📄️ Create AO/AOCO Tables and Refresh Materialized Views in Parallel
Since v1.5.0, Cloudberry Database supports creating append-optimized (AO) tables and append-optimized column-oriented (AOCO) tables in parallel by using the CREATE TABLE AS statement, and supports refreshing materialized views in parallel based on the AO or AOCO tables. Parallel processing accelerates table creation and materialized view refresh.
📄️ Parallel Queries Execution
This document introduces the usage scenarios, usage examples, restrictions, and common issues of parallel query execution in Cloudberry Database. When Cloudberry Database executes a query, multiple CPU cores are used to process a single query, thereby improving query performance. The database dynamically adjusts the number of computing nodes (including the SeqScan operator) according to the data volume change.
📄️ Use Aggregation Pushdown to Speed Up Query Execution
Aggregation pushdown is an optimization technique that moves the aggregation operation closer to the data source. Cloudberry Database supports pushing down aggregation operations, which means that the aggregation operator is processed before the join operator.
📄️ Use IndexScan on AO Tables
Cloudberry Database supports IndexScan when querying on App-Optimized (AO) tables to improve query efficiency in certain scenarios, such as the following query:
📄️ Use RuntimeFilter to Optimize Join Queries
When performing join queries on large tables, the SQL optimizer of Cloudberry Database often uses the HashJoin operator. HashJoin builds a hash table based on the join key to match the join key, which might cause a memory access bottleneck and disk bottleneck. RuntimeFilter is an optimization technique that generates filters in real-time during HashJoin operations, which can pre-filter data before executing HashJoin to speed up HashJoin operations. In some scenarios, the RuntimeFilter optimization can double execution efficiency.
📄️ Use Column-Level Compression
Apache Cloudberry supports column-level compression, which reduces storage space by compressing specific columns. In some cases, it can also improve query performance, especially when processing large-scale data.
📄️ Resource Groups
You can use resource groups to manage and protect the resource allocation of CPU, memory, concurrent transaction limits, and disk I/O in Apache Cloudberry. Once you define a resource group, you assign the group to one or more Apache Cloudberry roles, or to an external component such as PL/Container, in order to control the resources used by them.