The Apache Cloudberry (Incubating) community is pleased to announce that the Cloudberry main branch has upgraded its PostgreSQL kernel from 14.4 to 16.9.
Apache Cloudberry users can now leverage PostgreSQL 16 features—including the MERGE command, SQL/JSON constructors, parallel VACUUM, and a rewritten statistics subsystem—directly in their MPP workloads. This upgrade spans two major PostgreSQL upstream releases (PostgreSQL 15 and 16) and incorporates over 5,730 upstream commits. It marks Cloudberry’s second cross-version kernel upgrade since the project was derived from Greenplum Database and began its independent evolution (the first being the leap from PostgreSQL 12.x — the kernel carried by Greenplum Database 7.0 — to 14.4). It is also the first major kernel upgrade driven by a broad group of community contributors since Cloudberry joined the Apache Incubator.
Why this upgrade matters
The PostgreSQL database kernel is the foundation Cloudberry is built on. Tracking the PostgreSQL upstream release cadence is both an engineering necessity and a responsibility to our ecosystem.
- PostgreSQL 14.x is approaching end of life. Following the PostgreSQL community’s release lifecycle, PostgreSQL 14 is scheduled to reach end of support in November 2026. Completing the major-version upgrade in advance avoids a future where Cloudberry users have to choose between security patches and stability.
- Faster access to PostgreSQL innovation. Features like
pg_stat_io, Memoize, and LZ4 WAL compression—originally developed by the upstream community—now flow into Cloudberry without years of backporting effort. Staying aligned with PostgreSQL upstream lets new features, planner improvements, and ecosystem tooling reach Cloudberry users naturally.
By staying aligned with a modern PostgreSQL releases, Cloudberry ensures users run their MPP workloads on an actively maintained PostgreSQL kernel — one that delivers a decade of upstream improvements without the delays and risks of backporting.
What this upgrade brings
PostgreSQL 15 and 16 together accumulated a large body of improvements. To make this extensive list of updates easy to navigate, we have grouped the key changes into four primary categories for users and administrators, followed by developer-specific internal changes:
1. Modern SQL Capabilities & Query Standards
Upgrading to PostgreSQL 16 brings full support for the SQL-standard MERGE command, robust SQL/JSON constructors, and advanced database collations to Cloudberry:
MERGEStatement: Full support for the SQL-standardMERGEcommand, allowingINSERT / UPDATE / DELETEto be expressed in a single statement. In Cloudberry’s distributed environment, this significantly simplifies ETL/ELT pipelines, replacing complex client-side workarounds and reducing coordinator-segment round-trips. Accompanying fixes cover partitioned tables, cross-partition triggers, RLS,EvalPlanQual, self-modified tuples, andpg_stat_statementsintegration.- SQL/JSON Enhancements: New
IS JSONpredicate and standard JSON constructor functions (json_array,json_object,json_scalar,json_serialize), plus auto-generated query jumbling viagen_node_support.pland stability fixes such as thejson_array()subquery double-transformation issue. - Built-in ICU Support: ICU support is now built by default, bringing superior multilingual sorting and localization out-of-the-box.
initdbusesuloc_getDefault(), andCREATE DATABASE LOCALEapplies across all collation providers. - SQL Types & Functions:
- Numeric scale can now be negative or greater than precision.
- New
ANY_VALUEaggregate. unnest(multirange)andrange_aggwith multirange inputs.pg_size_pretty/pg_size_bytesextended to petabyte scale.ALTER TABLE SET ACCESS METHODfor switching table storage backends.- New
SYSTEM_USERfunction and anumeric_mul()overflow fix.
2. Performance & Resource Optimization
Tracking upstream PostgreSQL releases introduces critical performance, query planning, and storage enhancements designed to scale with your MPP workloads:
- Parallel
VACUUM& Autovacuum:- Parallel VACUUM (PostgreSQL 16, with the new
vacuumparallel.c) helps reclaim space faster, allowing segment nodes to execute vacuuming operations in parallel to reduce disk I/O bottlenecks and locking overhead. ON COMMIT DELETE ROWSno longer errors out afterrelhassubclass=f.- VACUUM end-of-truncation phase uses
WaitLatch()instead ofpg_usleep(). - Numeric overflow fix in parallel numeric aggregates.
ANALYZEpreservesrelhasindexfor partitioned tables.
- Parallel VACUUM (PostgreSQL 16, with the new
- Optimizer & Executor Improvements:
- Hash-table acceleration for
NOT IN (values). - Datum sorts for single-column sorts.
- Result Cache renamed to Memoize, with continued performance tuning.
- Parallel Hash Full Join to speed up complex distributed queries.
- Lower planning cost for deeply nested views, and fewer pallocs when building partition bounds.
- Outer join optimization enhancements and symmetric hash join improvements (PostgreSQL 16).
- Hash-table acceleration for
- WAL & Storage Subsystem:
- LZ4 compression for full-page writes (FPW) to reduce WAL volume and speed up replication under write-heavy workloads.
- Custom WAL resource managers so extensions can register their own WAL record types.
- Direct I/O on macOS, and skipping WAL recycling/preallocation during archive recovery.
- Buffer manager refactored for faster relation extension, with new helpers like
smgrzeroextend(),FileZero(), andFileFallocate(). - WAL/Recovery code split into
xlogrecovery.candxlogprefetcher.c, plus a new backup sink architecture (PostgreSQL 15). - To guarantee data integrity, the
HeapBitmapScanskip_fetchoptimization, which had correctness concerns, has been removed.
- Critical Correctness Planner/Executor Fixes:
- Recognition of multiple hashable
ScalarArrayOpExprs in the planner. - Consistent whole-row Var construction between parsing and planning.
- WindowAgg evaluation crash.
WITH RECURSIVE UNIONassertion failure.ORDER BY/DISTINCTaggregates withFILTER.- Missing expression processing on prune steps in
setrefs.c. - Extreme skew detection in Parallel Hash Join.
- Recognition of multiple hashable
3. Data Sync & Ecosystem Connectivity
This upgrade enhances the reliability and performance of logical replication, backup/restore workflows, and client-side protocols:
- Major Logical Replication Improvements:
- Built-in support for two-phase commit (2PC) on logical replication, which can be enabled directly through the replication protocol. This is vital for maintaining transactional consistency across distributed segments in an MPP layout.
- Parallel apply workers (PostgreSQL 16), row/column-level filtering and schema-level publications (PostgreSQL 15) significantly improve replication throughput and flexibility.
- New predefined role
pg_create_subscriptionfor finer-grained permission management. - A series of correctness and robustness fixes, including a critical data-loss issue, slot invalidation, snapshot handling, and memory leaks.
- Backup, Restore, and Upgrade Tooling:
pg_dump: generic compression API, with zstd and LZ4 frame-only formats.pg_basebackup: extended compression options, server-side compression (including-Fp), client-side LZ4 decompression, and parallel zstd compression.pg_upgrade: fixes for an unintentional'NULL'string literal and cross-versionXMLSERIALIZE(NO INDENT)handling;pg_dumpallcorrectly handles dangling OIDs inpg_auth_members.
- Client Protocol & libpq:
- New
PQsendFlushRequestand a fix for the pipeline-mode state machine. - SASL code refactored into a generic interface.
- Escape function fixes for invalid encoding data, including security-relevant changes.
- Length handling fixes in
PQescapeLiteral()/PQescapeIdentifier(). - Build-time check that libpq does not call
exit()orabort().
- New
- Operator & psql Experience:
\watchhonorsPSQL_WATCH_PAGER.- New
\drgcommand for inspecting role grants. \copy fromnow sends data in larger chunks.pg_waldump --rmgraccepts multiple specifications.
4. Enterprise Observability, Security & Stability
Upgrading the core kernel to PostgreSQL 16 brings a modern, shared-memory statistics subsystem, security guardrails, and key robustness fixes:
- Observability & the pgstat Rewrite:
- Shared-Memory pgstat Rewrite (PostgreSQL 15): The dedicated stats collector process is replaced with shared memory stats. This eliminates a potential performance bottleneck under high concurrent workloads. The monolithic
pgstat.cwas split into 14 files underutils/activity/, making the database kernel cleaner and easier to maintain. - New
pg_stat_ioView (PostgreSQL 16): Detailed I/O statistics including shared buffer hits, I/O times, and writebacks. pg_stat_statementsUpgrades: JIT counters, temp file block I/O timing, normalized utility query strings,MERGEsupport, and a 32-bit integer overflow fix.- Other improvements: SP-GiST index scans counted in pgstat, enhanced
pg_stat_reset_single_table_counters, and new test helperspg_stat_force_next_flush()andpg_stat_have_stats().
- Shared-Memory pgstat Rewrite (PostgreSQL 15): The dedicated stats collector process is replaced with shared memory stats. This eliminates a potential performance bottleneck under high concurrent workloads. The monolithic
- Security and Privilege Model:
CREATEROLEprivilege restrictions: A substantial improvement to the role/privilege model that closes long-standing privilege escalation paths.- New GUCs:
createrole_self_grant,reserved_connections. - New predefined roles:
pg_maintain,pg_create_subscription. - Non-superuser predefined roles for vacuum / analyze workflows.
PUBLIC CREATEis revoked from the public schema, which is now owned bypg_database_owner.- Security Invoker Views (
SECURITY INVOKER). - Fixes around
session_authorizationinteractions and ascram_SaltedPassword()integer overflow.
- Partitioned Table Improvements:
- Self-referencing foreign keys on partitioned tables.
- Correct handling of detach with top-level FKs.
relhassubclassreset on ATTACH; consistent trigger renames; firing state preserved on clone.- Partitionwise join / grouping is disabled when collations do not match, preventing incorrect results.
- Critical Stability and Data Correctness Fixes:
- Fixed data corruption on relation truncation failure.
- Page header is no longer corrupted when an update lands on a non-
LP_NORMALTID; the operation now fails cleanly. - Unsafe
BufferDescriptorsaccess. - GB18030 SIGSEGV from out-of-bounds read.
- Snowball stemmer null pointer dereference after OOM.
- Rare standby assertion failure on primary restart.
- Catcache invalidation during list entry construction.
5. Under the Hood: Updates for Kernel Contributors
If you are a contributor working on the Cloudberry database engine, this upgrade introduces several codebase updates, cleaner abstraction layers, and automated testing tools:
- Pervasive Storage Refactor:
RelFileNode➔RelFileLocatorrename. This touches hundreds of files, aligning Cloudberry's internal naming conventions with modern PostgreSQL. - Configuration Engine Decoupling: GUC system split:
guc.c➔guc.c+guc_tables.c+guc_funcs.c. - Header Optimization:
varatt.hseparated frompostgres.h. - Modernized Testing Cluster: The test suite transitioned from
PostgresNode.pmto the newPostgreSQL::Test::Clusterframework, improving testing speed and reliability. - Parser Automation:
gen_node_support.plauto-generatescopyfuncs/equalfuncs/outfuncs/readfuncsand query jumbling code, with a NodeTag ABI stability check.
What this means for Cloudberry users
- A modern PostgreSQL foundation. Cloudberry users can now benefit directly from features accumulated across PostgreSQL 15 and 16, including MERGE, SQL/JSON, two-phase logical replication, Memoize, Parallel Hash Full Join, and
pg_stat_io. - Better observability and operability. The shared-memory pgstat rewrite, the new I/O view, and
pg_stat_statementsenhancements give operators and DBAs a richer toolbox for performance analysis. - A stronger security posture.
CREATEROLEtightening, new predefined roles, and security invoker views align Cloudberry’s privilege model more closely with modern production practices. - Smoother ecosystem integration. Staying close to PostgreSQL upstream means Cloudberry can interoperate more naturally with PostgreSQL extensions, tools, and clients.
What comes next
Landing this work on the main branch is a milestone, not the finish line.
- Try it out. Some issues uncovered during the upgrade (tracked on the community project board) are not yet fully resolved; the community is actively working on them and aims to address them as quickly as possible. We warmly invite users and developers to exercise the new kernel in test environments and share feedback.
- Continued optimization. Expect ongoing work on MPP-side integration with the new kernel, performance regression fixes, and further tuning of existing capabilities on top of PostgreSQL 16.
- Community collaboration. Issues, pull requests, and real-world usage reports all help Cloudberry move forward. If you would like to contribute, the door is open.
Acknowledgments
This kernel upgrade was a collaborative effort across the community. Special thanks to the contributors who drove the merge:
@chenjinbao1989, @lss602726449, @reshke, @gfphoenix78, @yjhjstz, @tuhaihe
And to everyone who participated in code reviews, testing, and issue reporting throughout the year-long effort.
Join us
Apache Cloudberry is a community-driven open-source project, and we welcome contributors of all backgrounds—whether you are a database veteran or just getting started with MPP systems.
- Website: https://cloudberry.apache.org
- Repository: https://github.com/apache/cloudberry
- Discussions: https://github.com/apache/cloudberry/discussions
- Issue tracker: https://github.com/apache/cloudberry/issues
- Mailing list: https://lists.apache.org/list.html?dev@cloudberry.apache.org
- Slack workspace: https://join.slack.com/t/asf-cloudberry/shared_invite/zt-3um34r7hf-Sh~6jG6hVxlQJo1tbhK2sw
- Discord server: Join our Discord server