Version: 4.x

Import Data via Console

This guide walks you through importing data from object storage (AWS S3, Google Cloud Storage, Azure Blob Storage) into VeloDB using the Console visual interface.

Prerequisites

Before starting, ensure you have:

  • Object storage bucket with data files
  • Access credentials (Access Key ID and Secret Access Key)
  • VeloDB Cloud account with an active cluster (see Quick Start)

Try it with Sample Data

Use these credentials to try the S3 import with our sample dataset:

| Field | Value |
| --- | --- |
| AK | AKIA3AUKURBS74337SNB |
| SK | ygbR1HGNvMZDTo4DNUWJx0mblpMTF+QpBCCBfxFF |
| Object Storage Path | https://velodb-import-data-us-east-1.s3.us-east-1.amazonaws.com/ssb-flat-sf1/*.parquet |

This sample dataset contains SSB (Star Schema Benchmark) data, a widely used benchmark for analytical databases. It includes roughly 6 million rows of denormalized sales data with 40 columns covering orders, customers, suppliers, and products.

SSB Dataset Schema (40 columns)

| Column Group | Columns |
| --- | --- |
| Order | lo_orderkey, lo_linenumber, lo_orderdate, lo_commitdate, lo_orderpriority, lo_shippriority, lo_shipmode, lo_year, lo_month, lo_weeknum |
| Metrics | lo_quantity, lo_extendedprice, lo_discount, lo_revenue, lo_supplycost, lo_tax |
| Date | d_datekey, d_dayofweek, d_month, d_yearmonth |
| Customer | c_custkey, c_name, c_nation, c_region, c_city, c_mktsegment |
| Supplier | s_suppkey, s_name, s_nation, s_region, s_city |
| Product | p_partkey, p_name, p_brand, p_category, p_mfgr, p_color, p_type, p_size, p_container |
:::tip
This is read-only sample data. You can use it to follow along with the tutorial steps below.
:::
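Before running the full import, you can peek at the sample files directly from the SQL Editor. This is a sketch that assumes your VeloDB cluster supports the Doris-compatible `S3()` table-valued function; substitute the sample AK/SK from the table above.

```sql
-- Preview a few rows of the sample Parquet files without importing them.
-- Assumes the Doris-compatible S3() table function is available.
SELECT lo_orderkey, lo_revenue, c_name
FROM S3(
    "uri" = "https://velodb-import-data-us-east-1.s3.us-east-1.amazonaws.com/ssb-flat-sf1/*.parquet",
    "format" = "parquet",
    "s3.access_key" = "<your-AK>",
    "s3.secret_key" = "<your-SK>"
)
LIMIT 5;
```

If the query returns rows, your credentials and path are correct, which rules out the most common import failures up front.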

Step 1: Connection

Navigate to Data > Import in the VeloDB Console sidebar, then click Create new and select Object Storage S3.


Configuration

| Field | Description |
| --- | --- |
| Task Name | A unique name for this import task (e.g., sales_data, user_logs) |
| Comment | (optional) Description of the import task |
| Authentication | Select Access Key authentication |
| AK | Your Access Key ID (e.g., AKIAIOSFODNN7EXAMPLE) |
| SK | Your Secret Access Key |
| Object Storage Path | URL to your data (see format below) |

Object Storage Path Format

```
https://<bucket-name>.s3.<region>.amazonaws.com/<path>/<filename>
```

Example path formats (replace with your actual bucket and file paths):

  • Single file: `https://my-bucket.s3.us-west-1.amazonaws.com/data/orders.csv`
  • Multiple files with wildcard: `https://my-bucket.s3.us-west-1.amazonaws.com/data/*.csv`
  • Parquet files: `https://my-bucket.s3.us-west-1.amazonaws.com/warehouse/*.parquet`
:::warning
For best performance, keep the object storage bucket in the same region as your VeloDB cluster.
:::

Click Next to proceed.

Step 2: Incoming Data

Configure how VeloDB should parse your data files.


File Configuration

| Field | Description |
| --- | --- |
| File Type | Select your file format: CSV, Parquet, ORC, or JSON |
| File Compression | Auto-detect or specify: GZ, BZ2, LZ4, LZO, DEFLATE, ZSTD, ZLIB |
| Specify Delimiter | Column separator (`,` for CSV, `\t` for TSV) |
| Enclose | Quote character for text fields (usually leave empty) |
| Escape | Escape character (usually leave empty) |
| Trim Double Quotes | Whether to trim quotes from values |
| File Size | Set a size limit or leave as Unlimited |

Loading Configuration

| Field | Description |
| --- | --- |
| Strict Mode | ON = reject rows with errors; OFF = skip bad rows |
:::info
For standard CSV files, the default settings usually work. Leave Enclose and Escape empty to avoid parsing errors.
:::

Click Next to proceed.

Step 3: Configure Table

Preview your data and configure the destination table.


Data Preview

The console displays a preview of your data with:

  • Auto-detected column names (c1, c2, c3, ... or from CSV header row)
  • Sample data rows
  • Inferred column types

Load Data To

| Option | Description |
| --- | --- |
| New Table | Create a new table with an auto-generated schema |
| Existing Table | Load into an existing VeloDB table |

Table Configuration

| Field | Description |
| --- | --- |
| Database | Choose the target database from the dropdown |
| Table Name | Name for the new table (e.g., orders, user_events, products) |

Step 4: Advanced Settings

Configure table model and distribution settings.


Table Models

| Model | Use Case | Example |
| --- | --- | --- |
| DUPLICATE | Raw data, ad-hoc queries; retains all rows as written | Event logs, clickstream, raw transactions |
| UNIQUE | Data with updates; keeps only the latest row per key | User profiles, product catalog, dimension tables |
| AGGREGATE | Pre-aggregated metrics; auto-aggregates by key columns | Sales summaries, time-series metrics, counters |
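To make the model choice concrete, here is a sketch of what the console generates for a UNIQUE-model table, written as Doris-compatible DDL (the table and column names are illustrative, not from this guide):

```sql
-- Illustrative only: a UNIQUE-model table keeps one row per user_id,
-- so re-importing a profile replaces the previous version instead of
-- appending a duplicate (as a DUPLICATE-model table would).
CREATE TABLE user_profiles (
    user_id    BIGINT,
    name       VARCHAR(64),
    city       VARCHAR(32),
    updated_at DATETIME
)
UNIQUE KEY (user_id)
DISTRIBUTED BY HASH (user_id) BUCKETS AUTO;
```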

Distribution Settings

| Field | Description |
| --- | --- |
| Sorting Key | Column(s) for data ordering; choose columns frequently used in WHERE clauses or JOINs |
| Partition | Enable partitioning by date/time or another dimension to improve query performance |
| Bucket Key | Column for hash distribution across nodes (use tenant_id or user_id to improve query concurrency) |
| Bucket Number | Number of data buckets (AUTO is recommended for most cases) |
| Properties | Additional table properties (usually leave empty) |
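The partition and bucket settings above map onto clauses in the generated DDL. As a sketch, again assuming Doris-compatible syntax with illustrative names:

```sql
-- Illustrative only: a DUPLICATE-model table partitioned by date and
-- hash-distributed on tenant_id, with an AUTO bucket count.
CREATE TABLE events (
    event_date DATE,
    tenant_id  BIGINT,
    payload    VARCHAR(1024)
)
DUPLICATE KEY (event_date, tenant_id)
PARTITION BY RANGE (event_date) (
    PARTITION p202401 VALUES LESS THAN ("2024-02-01")
)
DISTRIBUTED BY HASH (tenant_id) BUCKETS AUTO;
```

Partitioning lets queries that filter on `event_date` prune whole partitions, while hashing on `tenant_id` spreads a tenant's reads across a single bucket's replicas for better concurrency.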

Click Next to proceed, then Submit to start the import.

Step 5: Monitor Import

After submitting, you'll see your import task in the Import list.
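Besides the console list, you can check the job state from the SQL Editor. This assumes your cluster exposes the Doris-compatible `SHOW LOAD` statement; replace `your_database` with your target database:

```sql
-- Most recent import jobs; State progresses from PENDING/LOADING to
-- FINISHED, or to CANCELLED with an error message when the load fails.
SHOW LOAD FROM your_database ORDER BY CreateTime DESC LIMIT 5;
```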


Verify Import

After the import completes, verify your data in the SQL Editor:

```sql
-- Check row count
SELECT COUNT(*) FROM your_database.your_table;

-- Preview data
SELECT * FROM your_database.your_table LIMIT 10;

-- Check table schema
DESC your_database.your_table;
```

Sample Queries for SSB Data

If you imported the sample SSB dataset, try these analytical queries:

```sql
-- Total revenue by year
SELECT
    lo_year,
    SUM(lo_revenue) AS total_revenue
FROM ssb_flat
GROUP BY lo_year
ORDER BY lo_year;

-- Top 10 customers by revenue
SELECT
    c_name,
    c_nation,
    SUM(lo_revenue) AS total_revenue
FROM ssb_flat
GROUP BY c_name, c_nation
ORDER BY total_revenue DESC
LIMIT 10;

-- Revenue by region and year
SELECT
    c_region,
    lo_year,
    SUM(lo_revenue) AS revenue,
    COUNT(*) AS order_count
FROM ssb_flat
GROUP BY c_region, lo_year
ORDER BY c_region, lo_year;

-- Product category performance
SELECT
    p_category,
    p_brand,
    SUM(lo_revenue) AS revenue,
    AVG(lo_discount) AS avg_discount
FROM ssb_flat
GROUP BY p_category, p_brand
ORDER BY revenue DESC
LIMIT 20;
```

Troubleshooting

| Issue | Solution |
| --- | --- |
| "Can not found files" | Check the object storage path format and trailing slash |
| "Access Denied" | Verify AK/SK credentials and IAM permissions |
| Connection timeout | Ensure the bucket is in the same region as VeloDB |
| Parsing errors | Leave Enclose and Escape fields empty |
| Wrong column types | Use Existing Table with a predefined schema |