Lesson 3: Create Tables
After creating and preparing a database in Lesson 2: Create and Prepare a Database, you can start to create tables in the database.
To introduce Apache Cloudberry, we use a public data set, the Airline On-Time Statistics and Delay Causes data set, published by the United States Department of Transportation at http://www.transtats.bts.gov/. The On-Time Performance dataset records flights by date, airline, originating airport, destination airport, and many other flight details. The data is available for flights since 1987. The exercises in this guide use data for about a million flights in 2009 and 2010. You are encouraged to review the SQL scripts in the GitHub 000-cbdb-sandbox/configs/faa.tar.gz
directory as you work through this introduction. You can run most of the exercises by entering the commands yourself or by executing a script in the faa
directory.
Create tables using a SQL file in psql
In Apache Cloudberry, you can use the CREATE TABLE
SQL statement to create a table.
In the following steps, you will be guided to run a SQL file create_dim_tables.sql
that contains the CREATE TABLE
statements needed to create faa
databases.
-
Log into Apache Cloudberry in Docker as
gpadmin
. Then enter thefaa
directory, in which the SQL filecreate_dim_tables.sql
is located.[gpadmin@cdw tmp]$ cd /tmp
[gpadmin@cdw tmp]$ tar xzf faa.tar.gz
[gpadmin@cdw tmp]$ cd faa -
Take a look at the
create_dim_tables.sql
file.[gpadmin@cdw faa]$ more create_dim_tables.sql
The
create_dim_tables.sql
file contains the followingCREATE TABLE
statements:DROP TABLE IF EXISTS faa.d_airports; -- Drops the table if it exists to avoid name conflict.
CREATE TABLE faa.d_airports ( -- Creates the table.
AirportID INTEGER,
Name TEXT,
City TEXT,
Country TEXT,
airport_code TEXT,
ICOA_code TEXT,
Latitude FLOAT8,
Longitude FLOAT8,
Altitude FLOAT8,
TimeZoneOffset FLOAT,
DST_Flag TEXT,
TZ TEXT
)
DISTRIBUTED BY (airport_code); -- Specifies the distribution column airport_code.
DROP TABLE IF EXISTS faa.d_wac;
CREATE TABLE faa.d_wac (wac SMALLINT, area_desc TEXT)
DISTRIBUTED BY (wac);
DROP TABLE IF EXISTS faa.d_airlines;
CREATE TABLE faa.d_airlines (airlineid INTEGER, airline_desc TEXT)
DISTRIBUTED BY (airlineid);
DROP TABLE IF EXISTS faa.d_cancellation_codes;
CREATE TABLE faa.d_cancellation_codes (cancel_code TEXT, cancel_desc TEXT)
DISTRIBUTED BY (cancel_code);
DROP TABLE IF EXISTS faa.d_delay_groups;
CREATE TABLE faa.d_delay_groups (delay_group_code TEXT, delay_group_desc TEXT)
DISTRIBUTED BY (delay_group_code);
DROP TABLE IF EXISTS faa.d_distance_groups;
CREATE TABLE faa.d_distance_groups (distance_group_code TEXT, distance_group_desc TEXT)
DISTRIBUTED BY (distance_group_code); -
Connect to the
tutorial
database aslily
using thepsql
. You will run the SQL file aslily
.[gpadmin@cdw faa]$ psql -U lily tutorial
Password for user lily: # changemepsql (14.4, server 14.4)
Type "help" for help.
tutorial=> -
Run the
create_dim_tables.sql
file using thepsql
command\i
.tutorial=> \i create_dim_tables.sql
The following messages are displayed:
psql:create_dim_tables.sql:1: NOTICE: table "d_airports" does not exist, skipping
DROP TABLE
CREATE TABLE
psql:create_dim_tables.sql:18: NOTICE: table "d_wac" does not exist, skipping
DROP TABLE
CREATE TABLE
psql:create_dim_tables.sql:22: NOTICE: table "d_airlines" does not exist, skipping
DROP TABLE
CREATE TABLE
psql:create_dim_tables.sql:26: NOTICE: table "d_cancellation_codes" does not exist, skipping
DROP TABLE
CREATE TABLE
psql:create_dim_tables.sql:30: NOTICE: table "d_delay_groups" does not exist, skipping
DROP TABLE
CREATE TABLE
psql:create_dim_tables.sql:34: NOTICE: table "d_distance_groups" does not exist, skipping
DROP TABLE
CREATE TABLE -
Use the
\dt
command to display all tables in thefaa
schema. You can see that 6 tables have been created.tutorial=> \dt
List of relations
Schema | Name | Type | Owner | Storage
--------+----------------------+-------+-------+---------
faa | d_airlines | table | lily | heap
faa | d_airports | table | lily | heap
faa | d_cancellation_codes | table | lily | heap
faa | d_delay_groups | table | lily | heap
faa | d_distance_groups | table | lily | heap
faa | d_wac | table | lily | heap
(6 rows)