Version: Next

Manage Spill Files Generated by Queries

Apache Cloudberry creates spill files, also known as workfiles, on disk if it does not have sufficient memory to run an SQL query in memory.

The maximum number of spill files for a given query is governed by the gp_workfile_limit_files_per_query server configuration parameter setting. The default value of 100000 spill files is sufficient for the majority of queries.

If a query creates more than the configured number of spill files, Apache Cloudberry returns this error:

ERROR: number of workfiles per query limit exceeded

Apache Cloudberry might generate a large number of spill files when:

Data skew is present in the queried data.
The amount of memory allocated for the query is too low. You control the maximum amount of memory that can be used by a query with the Apache Cloudberry server configuration parameters max_statement_mem and statement_mem, or through resource group or resource queue configuration.

You might be able to run the query successfully by changing the query, changing the data distribution, or changing the system memory configuration. The gp_toolkit.gp_workfile_* views display spill file usage information. You can use this information to troubleshoot and tune queries. The gp_workfile_** views are described in Check Query Disk Spill Space Usage.

Additional documentation resource:

Using Resource Groups describes memory and spill considerations when resource group-based resource management is active.