Version: 2.x

Write a Foreign Data Wrapper

This guide outlines how to write a new foreign data wrapper (FDW) for Apache Cloudberry. A foreign data wrapper is a library that consists of a set of functions that the Apache Cloudberry server calls to access external data sources. The FDW is responsible for fetching data from remote data stores and returning it to the Apache Cloudberry executor.

FDWs enable Apache Cloudberry to treat external data sources as if they were regular database tables, allowing you to query external data using standard SQL.

Before you begin

Requirements

When developing with the Apache Cloudberry foreign data wrapper API:

Your development system must have the same hardware and software architecture as your Apache Cloudberry hosts
Code must be written in a compiled language such as C, using the version-1 interface
Symbol names in your object files must not conflict with each other or with symbols defined in the Apache Cloudberry server
You should be familiar with the foreign table concepts described in Accessing External Data with Foreign Tables

Header files

The Apache Cloudberry header files for FDW development are located in $GPHOME/include/postgresql/server/:

foreign/fdwapi.h - FDW API structures and callback function signatures
foreign/foreign.h - Foreign data wrapper helper structs and functions
catalog/pg_foreign_table.h - Foreign table definition
catalog/pg_foreign_server.h - Foreign server definition

FDW architecture

An FDW consists of two main components:

A handler function that returns a struct containing function pointers to all the callback functions needed by Apache Cloudberry to interact with the external data source.
An optional validator function that validates configuration options provided in CREATE and ALTER commands for the foreign data wrapper, servers, user mappings, and foreign tables.

Handler function

The handler function is SQL-invokable and returns a struct containing pointers to callback functions:

CREATE FUNCTION my_fdw_handler()
  RETURNS fdw_handler
  AS 'MODULE_PATHNAME'
LANGUAGE C STRICT;

Validator function

The validator function validates options for FDW objects:

CREATE FUNCTION my_fdw_validator(text[], oid)
  RETURNS void
  AS 'MODULE_PATHNAME'
LANGUAGE C STRICT;

Callback functions

The FDW API defines callback functions that Apache Cloudberry invokes when scanning and updating foreign tables. The handler function returns a FdwRoutine struct containing pointers to these functions.

These functions are required for all FDWs:

GetForeignRelSize

void GetForeignRelSize(PlannerInfo *root,
                       RelOptInfo *baserel,
                       Oid foreigntableid)

Obtain relation size estimates for a foreign table. Called at the beginning of planning.

GetForeignPaths

void GetForeignPaths(PlannerInfo *root,
                     RelOptInfo *baserel,
                     Oid foreigntableid)

Create possible access paths for a scan on a foreign table. Must call create_foreignscan_path() for Apache Cloudberry compatibility.

GetForeignPlan

ForeignScan *GetForeignPlan(PlannerInfo *root,
                            RelOptInfo *baserel,
                            Oid foreigntableid,
                            ForeignPath *best_path,
                            List *tlist,
                            List *scan_clauses)

Create a ForeignScan plan node from the selected foreign access path.

BeginForeignScan

void BeginForeignScan(ForeignScanState *node,
                      int eflags)

Begin executing a foreign scan. Called during executor startup.

IterateForeignScan

TupleTableSlot *IterateForeignScan(ForeignScanState *node)

Fetch one row from the foreign source. Return NULL when no more rows are available.

ReScanForeignScan

void ReScanForeignScan(ForeignScanState *node)

Restart the scan from the beginning.

EndForeignScan

void EndForeignScan(ForeignScanState *node)

End the scan and release resources.

If your FDW supports write operations, implement these functions:

ExecForeignInsert

TupleTableSlot *ExecForeignInsert(EState *estate,
                                  ResultRelInfo *rinfo,
                                  TupleTableSlot *slot,
                                  TupleTableSlot *planSlot)

Insert a single tuple into the foreign table.

ExecForeignUpdate

TupleTableSlot *ExecForeignUpdate(EState *estate,
                                  ResultRelInfo *rinfo,
                                  TupleTableSlot *slot,
                                  TupleTableSlot *planSlot)

Update a single tuple in the foreign table.

ExecForeignDelete

TupleTableSlot *ExecForeignDelete(EState *estate,
                                  ResultRelInfo *rinfo,
                                  TupleTableSlot *slot,
                                  TupleTableSlot *planSlot)

Delete a single tuple from the foreign table.

Apache Cloudberry considerations

The mpp_execute option

Apache Cloudberry supports parallel execution through the mpp_execute option. Your FDW should handle this option to determine where to request or send data:

ForeignTable *table = GetForeignTable(foreigntableid);
if (table->exec_location == FTEXECLOCATION_ALL_SEGMENTS)
{
    // Execute on all segments in parallel
}
else if (table->exec_location == FTEXECLOCATION_ANY)
{
    // Execute on coordinator or any one segment
}
else if (table->exec_location == FTEXECLOCATION_COORDINATOR)
{
    // Execute on coordinator only (default)
}

Segment identification

For parallel execution (mpp_execute 'all segments'), each segment must determine which portion of the data is its responsibility:

int segmentNumber = GpIdentity.segindex;
int totalNumberOfSegments = getgpsegmentCount();

Parallel write operations

Apache Cloudberry supports parallel write operations only when mpp_execute is set to 'all segments'. For other settings, write operations are initiated through the coordinator.

Example FDW implementation

Here's a basic skeleton for an FDW:

#include "postgres.h"
#include "foreign/fdwapi.h"
#include "foreign/foreign.h"

PG_MODULE_MAGIC;

// Function declarations
extern Datum my_fdw_handler(PG_FUNCTION_ARGS);
extern Datum my_fdw_validator(PG_FUNCTION_ARGS);

// FDW callback functions
static void myGetForeignRelSize(PlannerInfo *root,
                                RelOptInfo *baserel,
                                Oid foreigntableid);
static void myGetForeignPaths(PlannerInfo *root,
                              RelOptInfo *baserel,
                              Oid foreigntableid);
// ... other callback functions

PG_FUNCTION_INFO_V1(my_fdw_handler);
Datum
my_fdw_handler(PG_FUNCTION_ARGS)
{
    FdwRoutine *fdwroutine = makeNode(FdwRoutine);

    // Required scan functions
    fdwroutine->GetForeignRelSize = myGetForeignRelSize;
    fdwroutine->GetForeignPaths = myGetForeignPaths;
    fdwroutine->GetForeignPlan = myGetForeignPlan;
    fdwroutine->BeginForeignScan = myBeginForeignScan;
    fdwroutine->IterateForeignScan = myIterateForeignScan;
    fdwroutine->ReScanForeignScan = myReScanForeignScan;
    fdwroutine->EndForeignScan = myEndForeignScan;

    // Optional write functions
    fdwroutine->ExecForeignInsert = myExecForeignInsert;
    fdwroutine->ExecForeignUpdate = myExecForeignUpdate;
    fdwroutine->ExecForeignDelete = myExecForeignDelete;

    PG_RETURN_POINTER(fdwroutine);
}

PG_FUNCTION_INFO_V1(my_fdw_validator);
Datum
my_fdw_validator(PG_FUNCTION_ARGS)
{
    List *options_list = untransformRelOptions(PG_GETARG_DATUM(0));
    Oid catalog = PG_GETARG_OID(1);
    
    // Validate options based on catalog type
    // Implementation depends on your FDW's specific options
    
    PG_RETURN_VOID();
}

Building and packaging

Using PGXS

Create a Makefile using the PostgreSQL build extension infrastructure:

MODULE_big = my_fdw
OBJS = my_fdw.o

PG_CONFIG = pg_config
PGXS := $(shell $(PG_CONFIG) --pgxs)

PG_CPPFLAGS = -I$(shell $(PG_CONFIG) --includedir)
SHLIB_LINK = -L$(shell $(PG_CONFIG) --libdir)
include $(PGXS)

Creating an extension

Create SQL script file (my_fdw--1.0.sql):

CREATE FUNCTION my_fdw_handler()
  RETURNS fdw_handler
  AS 'MODULE_PATHNAME'
LANGUAGE C STRICT;

CREATE FUNCTION my_fdw_validator(text[], oid)
  RETURNS void
  AS 'MODULE_PATHNAME'
LANGUAGE C STRICT;

CREATE FOREIGN DATA WRAPPER my_fdw
  HANDLER my_fdw_handler
  VALIDATOR my_fdw_validator;

Create control file (my_fdw.control):

# my_fdw extension
comment = 'My custom foreign data wrapper'
default_version = '1.0'
module_pathname = '$libdir/my_fdw'
relocatable = true

Update Makefile for extension:

EXTENSION = my_fdw
DATA = my_fdw--1.0.sql

Deployment

For Apache Cloudberry clusters:

Install shared library to $GPHOME/lib/postgresql/ on all hosts.
Install extension files to $GPHOME/share/postgresql/extension/ on all hosts.
Set permissions so gpadmin user can access all files.
Install on all hosts - coordinator and all segment hosts.

Testing your FDW

Basic functionality test

-- Create extension
CREATE EXTENSION my_fdw;

-- Create foreign server
CREATE SERVER my_server
    FOREIGN DATA WRAPPER my_fdw
    OPTIONS (host 'remote-host', port '5432');

-- Create user mapping
CREATE USER MAPPING FOR current_user
    SERVER my_server
    OPTIONS (user 'remote_user', password 'remote_password');

-- Create foreign table
CREATE FOREIGN TABLE my_foreign_table (
    id integer,
    name text
) SERVER my_server
OPTIONS (table_name 'remote_table');

-- Test query
SELECT * FROM my_foreign_table LIMIT 5;

Parallel execution test

-- Test parallel execution
CREATE FOREIGN TABLE parallel_test (
    id integer,
    data text
) SERVER my_server
OPTIONS (
    table_name 'large_table',
    mpp_execute 'all segments'
);

SELECT count(*) FROM parallel_test;

Best practices

Implement comprehensive error handling throughout your FDW code, ensuring that all failure modes are properly reported to users with clear, actionable error messages.
Use PostgreSQL memory contexts correctly to prevent memory leaks and ensure proper cleanup when queries are cancelled or encounter errors.
Design your FDW to handle bulk data operations efficiently and take advantage of Apache Cloudberry's parallel processing capabilities where appropriate.
Validate all user inputs and configuration parameters, implement secure authentication mechanisms, and follow security best practices for external data access.
Create thorough documentation that explains all configuration options, connection parameters, and usage examples for administrators and developers.
Develop comprehensive test suites that cover various data types, edge cases, error conditions, and performance scenarios across different external data sources.

Debugging

Enable verbose logging

Set in postgresql.conf:

log_min_messages = DEBUG1

Use elog for debugging

elog(DEBUG1, "FDW: Processing %d rows", row_count);
elog(WARNING, "FDW: Connection failed, retrying...");

Common issues

Segmentation faults frequently occur due to improper memory management, such as accessing freed memory or buffer overruns in C code.
Symbol name conflicts can arise when multiple extensions define functions with the same names; ensure all function names are unique and properly namespaced.
Parallel execution issues are often difficult to debug; start testing with single-segment configurations before enabling multi-segment parallel processing.
Performance bottlenecks may appear in data conversion or network operations; use profiling tools to identify and optimize critical code paths.

Write a Foreign Data Wrapper

Before you begin​

Requirements​

Header files​

FDW architecture​

Handler function​

Validator function​

Callback functions​

Required scan-related callbacks​

GetForeignRelSize​

GetForeignPaths​

GetForeignPlan​

BeginForeignScan​

IterateForeignScan​

ReScanForeignScan​

EndForeignScan​

Optional update-related callbacks​

ExecForeignInsert​

ExecForeignUpdate​

ExecForeignDelete​

Apache Cloudberry considerations​

The mpp_execute option​

Segment identification​

Parallel write operations​

Example FDW implementation​

Building and packaging​

Using PGXS​

Creating an extension​

Deployment​

Testing your FDW​

Basic functionality test​

Parallel execution test​

Best practices​

Debugging​

Enable verbose logging​

Use elog for debugging​

Common issues​

Learn more​