Version: 4.x

Sample Data Catalog

VeloDB Cloud provides two sample data generator Catalogs, TPCDS and TPCH, which generate scalable benchmark datasets. You can use this data for performance testing, functional verification, or learning SQL.

Overview

| Catalog | Description | Applicable Scenarios |
| --- | --- | --- |
| TPCDS | TPC-DS benchmark data, simulating retail decision support scenarios, containing 24 tables. | Complex analytical queries, data warehouse performance testing. |
| TPCH | TPC-H benchmark data, simulating business decision scenarios, containing 8 tables. | OLAP query performance testing, introductory learning. |

These Catalogs support dynamic data generation. You can write the generated data to VeloDB internal tables, Iceberg tables, or Hive tables for testing.

Create Catalog

Step 1: Enter Creation Page

  1. Log in to the VeloDB Cloud console.
  2. In the left navigation bar, click Catalogs.
  3. Click the Add External Catalog button.
  4. Under the Sample Data category, select TPCDS or TPCH.

Step 2: Configure Catalog

TPCDS Configuration

| Field | Required | Description |
| --- | --- | --- |
| Catalog Name | | Unique name of the Catalog. |
| Comment | | Optional description information. |
| Splits Count | | Concurrency per node. Default is 32. |

TPCH Configuration

| Field | Required | Description |
| --- | --- | --- |
| Catalog Name | | Unique name of the Catalog. |
| Comment | | Optional description information. |
| Splits Per Node | | Concurrency per node. Default is 32. |

Step 3: Confirm Creation

Click the Confirm button to complete creation.
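
Once the console reports success, the new Catalog can also be checked from a SQL client. The following is a minimal sketch, assuming the Catalog was named tpch_catalog (the name used in the examples on this page) and that the Apache Doris-style catalog statements are available in your environment:

```sql
-- List all Catalogs visible to the current user; the new one should appear here.
SHOW CATALOGS;

-- Show the definition of the new Catalog.
-- tpch_catalog is an assumed name; replace it with the name you entered in Step 2.
SHOW CREATE CATALOG tpch_catalog;
```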

Use Catalog

View Available Data

-- View databases (datasets of different scales)
SHOW DATABASES FROM tpcds_catalog;
-- Result example: sf1, sf10, sf100, sf1000 ...

-- View tables
SHOW TABLES FROM tpcds_catalog.sf1;
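
To avoid repeating the full catalog.database.table path, you can switch into a Catalog and inspect a table before querying it. This is a sketch only: it assumes a Catalog named tpch_catalog, Apache Doris-style SWITCH and DESC statements, and that the built-in catalog is called internal.

```sql
-- Make tpch_catalog the current Catalog for this session (assumed name).
SWITCH tpch_catalog;

-- Select the scale factor 1 dataset.
USE sf1;

-- Inspect column names and types before writing queries against the table.
DESC customer;

-- Return to the built-in catalog (assumed to be named "internal").
SWITCH internal;
```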

sf in the database name stands for Scale Factor:

  • sf1: approximately 1 GB of data
  • sf10: approximately 10 GB of data
  • sf100: approximately 100 GB of data
  • sf1000: approximately 1 TB of data
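
One way to see the effect of the scale factor is to compare row counts for the same table in two databases. The sketch below assumes a Catalog named tpch_catalog that exposes both sf1 and sf10. The TPC-H specification sizes the customer table at 150,000 rows per scale factor unit, so the sf10 count should be about ten times the sf1 count.

```sql
-- Compare the size of the same table at two scale factors.
-- Data is generated when the query runs, so larger scale factors take longer.
SELECT 'sf1' AS scale, COUNT(*) AS customer_rows
FROM tpch_catalog.sf1.customer
UNION ALL
SELECT 'sf10' AS scale, COUNT(*) AS customer_rows
FROM tpch_catalog.sf10.customer;
```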

Query Sample Data

-- Query TPCH data
SELECT * FROM tpch_catalog.sf1.customer LIMIT 10;

-- Query TPCDS data
SELECT * FROM tpcds_catalog.sf1.store_sales LIMIT 10;
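
Beyond simple row previews, the sample data is meant for analytical queries. The following is a simplified sketch in the spirit of TPC-H Query 1 (a pricing summary over lineitem). It assumes a Catalog named tpch_catalog and the standard TPC-H column names with the l_ prefix. If your Catalog exposes different column names, check them with DESC and adjust the query.

```sql
-- Pricing summary grouped by return flag and line status.
-- Column names (l_returnflag, l_quantity, ...) follow the TPC-H specification
-- and are an assumption about how this Catalog names its columns.
SELECT
    l_returnflag,
    l_linestatus,
    SUM(l_quantity)      AS sum_qty,
    SUM(l_extendedprice) AS sum_base_price,
    AVG(l_discount)      AS avg_disc,
    COUNT(*)             AS count_order
FROM tpch_catalog.sf1.lineitem
WHERE l_shipdate <= '1998-09-02'
GROUP BY l_returnflag, l_linestatus
ORDER BY l_returnflag, l_linestatus;
```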

Write Data to VeloDB Tables

-- Create VeloDB table and import TPCH data
CREATE TABLE my_db.customer AS
SELECT * FROM tpch_catalog.sf1.customer;

-- Or use INSERT INTO (the target table must already exist)
INSERT INTO my_db.lineitem
SELECT * FROM tpch_catalog.sf10.lineitem;
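
Because INSERT INTO needs an existing target, a typical flow creates the table first, loads it, and then verifies the result. The sketch below uses the small region table and assumes the session is in the built-in catalog. The database name my_db, the column names, the column types, and the key and bucket settings are illustrative assumptions, not values prescribed by VeloDB.

```sql
-- Create the target database and table (names and layout are illustrative).
CREATE DATABASE IF NOT EXISTS my_db;

CREATE TABLE IF NOT EXISTS my_db.region (
    r_regionkey BIGINT,
    r_name      VARCHAR(25),
    r_comment   VARCHAR(152)
)
DUPLICATE KEY(r_regionkey)
DISTRIBUTED BY HASH(r_regionkey) BUCKETS 1;

-- Load the generated rows; columns are matched by position.
INSERT INTO my_db.region
SELECT * FROM tpch_catalog.sf1.region;

-- Verify the load. The TPC-H specification defines region as a fixed
-- 5-row table, so this count should be 5 after a single load.
SELECT COUNT(*) AS region_rows FROM my_db.region;
```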

TPCH Table Structure

| Table Name | Description |
| --- | --- |
| customer | Customer information |
| lineitem | Order details |
| nation | Nation |
| orders | Orders |
| part | Parts |
| partsupp | Part suppliers |
| region | Region |
| supplier | Suppliers |
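
These tables are related through key columns, so they are usually queried with joins. As a sketch, the query below counts customers per region by joining customer, nation, and region. It assumes a Catalog named tpch_catalog and the standard TPC-H column names (c_nationkey, n_nationkey, n_regionkey, r_regionkey, r_name). Confirm the actual names with DESC before running it.

```sql
-- Count customers per region: customer -> nation -> region.
-- Column names follow the TPC-H specification (an assumption for this Catalog).
SELECT
    r.r_name AS region,
    COUNT(*) AS customers
FROM tpch_catalog.sf1.customer AS c
JOIN tpch_catalog.sf1.nation   AS n ON c.c_nationkey = n.n_nationkey
JOIN tpch_catalog.sf1.region   AS r ON n.n_regionkey = r.r_regionkey
GROUP BY r.r_name
ORDER BY customers DESC;
```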

TPCDS Table Structure

TPCDS contains 24 tables (7 fact tables and 17 dimension tables) that simulate retail scenarios:

| Category | Table Name |
| --- | --- |
| Fact Tables | store_sales, store_returns, catalog_sales, catalog_returns, web_sales, web_returns, inventory |
| Dimension Tables | customer, customer_address, customer_demographics, date_dim, time_dim, item, store, catalog_page, web_page, web_site, warehouse, promotion, household_demographics, income_band, ship_mode, reason, call_center |
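
The fact and dimension split lends itself to star-schema queries, where a fact table is joined to the dimensions that describe it. The following sketch totals store sales by year and item category. It assumes a Catalog named tpcds_catalog and the standard TPC-DS column names (ss_sold_date_sk, ss_item_sk, ss_ext_sales_price, d_date_sk, d_year, i_item_sk, i_category).

```sql
-- Star join: store_sales (fact) with date_dim and item (dimensions).
-- Column names follow the TPC-DS specification (an assumption for this Catalog).
SELECT
    d.d_year,
    i.i_category,
    SUM(ss.ss_ext_sales_price) AS total_sales
FROM tpcds_catalog.sf1.store_sales AS ss
JOIN tpcds_catalog.sf1.date_dim    AS d ON ss.ss_sold_date_sk = d.d_date_sk
JOIN tpcds_catalog.sf1.item        AS i ON ss.ss_item_sk = i.i_item_sk
GROUP BY d.d_year, i.i_category
ORDER BY d.d_year, total_sales DESC
LIMIT 20;
```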