Version: 2.x

Data Loading Overview

Apache Cloudberry loads external data mainly by exposing it as external tables (or foreign tables) through a loading tool. It then reads from these tables, or writes to them, to move the data.

Loading process

The general process of loading external data into Apache Cloudberry is as follows:

1. Assess the data loading scenario (such as data source location, data type, and data volume) and select an appropriate loading tool.
2. Set up and enable the loading tool.
3. Create an external table. In the CREATE EXTERNAL TABLE statement, specify information such as the protocol of the loading tool, the data source address, and the data format.
4. Once the external table is created, query its data directly with a SELECT statement, or import its data into a regular table with INSERT INTO ... SELECT.
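
The last two steps above can be sketched as follows. This is a minimal illustration, not a reference setup: it assumes a `gpfdist` server is already running, and the host `etlhost-1`, port `8081`, file pattern, table names, and columns are all hypothetical.

```sql
-- Assumption: gpfdist is already serving pipe-delimited text files
-- from the hypothetical host etlhost-1 on port 8081.

-- Regular target table inside the database (hypothetical name and columns).
CREATE TABLE expenses (
    name     text,
    spent_on date,
    amount   float4,
    category text
) DISTRIBUTED BY (name);

-- External table: names the protocol (gpfdist), the data source address,
-- and the data format.
CREATE EXTERNAL TABLE ext_expenses (
    name     text,
    spent_on date,
    amount   float4,
    category text
)
LOCATION ('gpfdist://etlhost-1:8081/*.txt')
FORMAT 'TEXT' (DELIMITER '|');

-- Query the external data in place.
SELECT count(*) FROM ext_expenses;

-- Import the external data into the regular table.
INSERT INTO expenses SELECT * FROM ext_expenses;
```

The external table stores no data itself. Each query against it reads the source files again, so importing with INSERT INTO ... SELECT is the usual choice when the data will be queried repeatedly.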

Loading methods and scenarios

Apache Cloudberry offers multiple data loading methods. Choose one based on where your data comes from and what format it is in.

Loading method	Data source	Data format	Parallel loading
`copy`	Local file system • Coordinator node host (for a single file) • Segment node host (for multiple files)	• TXT • CSV • Binary	No
`file://` protocol	Local file system (local segment host, accessible only by superuser)	• TXT • CSV	Yes
`gpfdist`	Local host files or files accessible via internal network	• TXT • CSV • Any delimited text format supported by the `FORMAT` clause • XML and JSON (require conversion to text format via YAML configuration file)	Yes
Batch loading using `gpload` (with `gpfdist` as the underlying worker)	Local host files or files accessible via internal network	• TXT • CSV • Any delimited text format supported by the `FORMAT` clause • XML and JSON (require conversion to text format via YAML configuration file)	Yes
External web tables	Data pulled from network services or from any source accessible from the command line	• TXT • CSV	Yes
Kafka FDW	Streaming data from Apache Kafka	• JSON • CSV	No
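
To make the contrast between a non-parallel and a parallel method concrete, here is a hedged sketch of two rows from the table: `COPY` from a single file on the coordinator host, and an external web table backed by a URL. The file path, URL, table names, and columns are hypothetical, and `COPY` from a server-side file typically requires elevated privileges.

```sql
-- Hypothetical target table, created here so the sketch is self-contained.
CREATE TABLE IF NOT EXISTS expenses (
    name     text,
    spent_on date,
    amount   float4,
    category text
) DISTRIBUTED BY (name);

-- COPY: reads one CSV file from the coordinator host (not parallel).
-- The path /data/expenses.csv is an assumption.
COPY expenses FROM '/data/expenses.csv' WITH (FORMAT csv, HEADER true);

-- External web table: pulls CSV data from a network service.
-- The URL is an assumption.
CREATE EXTERNAL WEB TABLE ext_web_expenses (
    name     text,
    spent_on date,
    amount   float4,
    category text
)
LOCATION ('http://intranet.example.com/expenses/2024.csv')
FORMAT 'CSV' (HEADER);

-- Import the web data into the regular table.
INSERT INTO expenses SELECT * FROM ext_web_expenses;
```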

Learn more

🗃️ Load Data from Local Files

📄️ Load Data from Web Services

In Apache Cloudberry, you can create external web tables to load data from web services or from any source accessible from the command line. The supported data formats are TEXT and CSV.

📄️ Load Data from Kafka Using Kafka FDW

Kafka Foreign Data Wrapper (FDW) allows Apache Cloudberry to connect directly to Apache Kafka and to read and process Kafka data as external tables. This integration is intended to make real-time processing of Kafka data more efficient, flexible, and reliable.