ANALYZE
Collects statistics about a database.
Synopsis
ANALYZE [VERBOSE] [SKIP_LOCKED] [<table> [ (<column> [, ...] ) ]]
ANALYZE [VERBOSE] [SKIP_LOCKED] {<root_partition_table_name>|<leaf_partition_table_name>} [ (<column> [, ...] )]
ANALYZE [VERBOSE] [SKIP_LOCKED] ROOTPARTITION {ALL | <root_partition_table_name> [ (<column> [, ...] )]}
Description
ANALYZE
collects statistics about the contents of tables in the database, and stores the results in the system table pg_statistic. Subsequently, Apache Cloudberry uses these statistics to help determine the most efficient execution plans for queries. For information about the table statistics that are collected, see Notes.
With no parameter, ANALYZE
collects statistics for every table in the current database. You can specify a table name to collect statistics for a single table. You can specify a set of column names in a specific table, in which case the statistics only for those columns from that table are collected.
ANALYZE
does not collect statistics on external tables.
For partitioned tables, ANALYZE
collects additional statistics, HyperLogLog (HLL) statistics, on the leaf partitions. HLL statistics are used are used to derive number of distinct values (NDV) for queries against partitioned tables.
- When aggregating NDV estimates across multiple leaf partitions, HLL statistics generate a more accurate NDV estimates than the standard table statistics.
- When updating HLL statistics,
ANALYZE
operations are required only on leaf partitions that have changed. For example,ANALYZE
is required if the leaf partition data has changed, or if the leaf partition has been exchanged with another table. For more information about updating partitioned table statistics, see Notes.
Important If you intend to run queries on partitioned tables with GPORCA enabled (the default), then you must collect statistics on the root partition of the partitioned table with the
ANALYZE
orANALYZE ROOTPARTITION
command. For information about collecting statistics on partitioned tables and when theROOTPARTITION
keyword is required, see Notes. For information about GPORCA, see Overview of GPORCA in the Apache Cloudberry Administrator Guide.
Note You can also use the Apache Cloudberry utility
analyzedb
to update table statistics. Theanalyzedb
utility can update statistics for multiple tables concurrently. The utility can also check table statistics and update statistics only if the statistics are not current or do not exist. For information about the utility, see the Apache Cloudberry Utility Guide.
Parameters
{ root_partition_table_name | leaf_partition_table_name } [ (column [, ...] ) ]
Collect statistics for partitioned tables including HLL statistics. HLL statistics are collected only on leaf partitions.
ANALYZE root_partition_table_name
, collects statistics on all leaf partitions and the root partition.
ANALYZE leaf_partition_table_name
, collects statistics on the leaf partition.
By default, if you specify a leaf partition, and all other leaf partitions have statistics, ANALYZE
updates the root partition statistics. If not all leaf partitions have statistics, ANALYZE
logs information about the leaf partitions that do not have statistics. For information about when root partition statistics are collected, see Notes.
ROOTPARTITION [ALL]
Collect statistics only on the root partition of partitioned tables based on the data in the partitioned table. If possible, ANALYZE
uses leaf partition statistics to generate root partition statistics. Otherwise, ANALYZE
collects the statistics by sampling leaf partition data. Statistics are not collected on the leaf partitions, the data is only sampled. ALL
statistics are not collected.
For information about when the ROOTPARTITION
keyword is required, see Notes.
When you specify ROOTPARTITION
, you must specify either ALL
or the name of a partitioned table.
If you specify ALL
with ROOTPARTITION
, Apache Cloudberry collects statistics for the root partition of all partitioned tables in the database. If there are no partitioned tables in the database, a message stating that there are no partitioned tables is returned. For tables that are not partitioned tables, statistics are not collected.
If you specify a table name with ROOTPARTITION
and the table is not a partitioned table, no statistics are collected for the table and a warning message is returned.
The ROOTPARTITION
clause is not valid with VACUUM ANALYZE
. The command VACUUM ANALYZE ROOTPARTITION
returns an error.
If all the leaf partitions have statistics, performing ANALYZE ROOTPARTITION
to generate root partition statistics should be quick (a few seconds depending on the number of partitions and table columns). If some of the leaf partitions do not have statistics, then all the table data is sampled to generate root partition statistics. Sampling table data takes longer and results in lower quality root partition statistics.
For the partitioned table sales_curr_yr, this example command collects statistics only on the root partition of the partitioned table.
ANALYZE ROOTPARTITION sales_curr_yr;
This example ANALYZE
command collects statistics on the root partition of all the partitioned tables in the database.
ANALYZE ROOTPARTITION ALL;
VERBOSE
Enables display of progress messages. When specified, ANALYZE
emits this information
- The table that is being processed.
- The query that is run to generate the sample table.
- The column for which statistics is being computed.
- The queries that are issued to collect the different statistics for a single column.
- The statistics that are collected.
SKIP_LOCKED
Specifies that ANALYZE
should not wait for any conflicting locks to be released when beginning work on a relation: if it cannot lock a relation immediately without waiting, it skips the relation. Note that even with this option, ANALYZE
may still block when opening the relation's indexes or when acquiring sample rows from partitions, table inheritance children, and some types of foreign tables. Also, while ANALYZE
ordinarily processes all partitions of specified partitioned tables, this option will cause ANALYZE
to skip all partitions if there is a conflicting lock on the partitioned table.
table
The name (possibly schema-qualified) of a specific table to analyze. If omitted, all regular tables (but not foreign tables) in the current database are analyzed.
column
The name of a specific column to analyze. Defaults to all columns.
Notes
Foreign tables are analyzed only when explicitly selected. Not all foreign data wrappers support ANALYZE
. If the table's wrapper does not support ANALYZE
, the command prints a warning and does nothing.
It is a good idea to run ANALYZE
periodically, or just after making major changes in the contents of a table. Accurate statistics helps Apache Cloudberry choose the most appropriate query plan, and thereby improve the speed of query processing. A common strategy for read-mostly databases is to run VACUUM and ANALYZE
once a day during a low-usage time of day. (This will not be sufficient if there is heavy update activity.) You can check for tables with missing statistics using the gp_stats_missing
view, which is in the gp_toolkit
schema:
SELECT * from gp_toolkit.gp_stats_missing;