VeloDB Cloud
User Guide
Data Ingestion
Data Source
Alibaba Cloud OSS

Alibaba Cloud OSS

Doris provides two ways to load files from Alibaba Cloud OSS:

  • Use S3 Load to load Alibaba Cloud OSS files into Doris, which is an asynchronous load method.
  • Use TVF to load Alibaba Cloud OSS files into Doris, which is a synchronous load method.

load with S3 Load

Use S3 Load to import files on object storage. For detailed steps, please refer to the Broker Load Manual

Step 1: Prepare the data

Create a CSV file s3load_example.csv The file is stored on Alibaba Cloud OSS and its content is as follows:

1,Emily,25
2,Benjamin,35
3,Olivia,28
4,Alexander,60
5,Ava,17
6,William,69
7,Sophia,32
8,James,64
9,Emma,37
10,Liam,64

Step 2: Create a table in Doris

CREATE TABLE test_s3load(
    user_id            BIGINT       NOT NULL COMMENT "user id",
    name               VARCHAR(20)           COMMENT "name",
    age                INT                   COMMENT "age"
)
DUPLICATE KEY(user_id)
DISTRIBUTED BY HASH(user_id) BUCKETS 10;

Step 3: Load data using S3 Load

:::caution Caution The endpoint URLs for public and internal access are different. If the server and OSS are in the same region, it is recommended to use the internal endpoint link.

  • internal endpoint: oss-cn-hangzhou-internal.aliyuncs.com
  • public endpoint: oss-cn-hangzhou.aliyuncs.com :::
LOAD LABEL s3_load_2022_04_01
(
    DATA INFILE("s3://your_bucket_name/s3load_example.csv")
    INTO TABLE test_s3load
    COLUMNS TERMINATED BY ","
    FORMAT AS "CSV"
    (user_id, name, age)
)
WITH S3
(
    "provider" = "OSS",
    "s3.endpoint" = "oss-cn-hangzhou.aliyuncs.com",  
    "s3.region" = "oss-cn-hangzhou",
    "s3.access_key" = "<your-ak>",
    "s3.secret_key" = "<your-sk>"
)
PROPERTIES
(
    "timeout" = "3600"
);

Step 4: Check the imported data

SELECT * FROM test_s3load;

Results:

mysql> select * from test_s3load;
+---------+-----------+------+
| user_id | name      | age  |
+---------+-----------+------+
|       5 | Ava       |   17 |
|      10 | Liam      |   64 |
|       7 | Sophia    |   32 |
|       9 | Emma      |   37 |
|       1 | Emily     |   25 |
|       4 | Alexander |   60 |
|       2 | Benjamin  |   35 |
|       3 | Olivia    |   28 |
|       6 | William   |   69 |
|       8 | James     |   64 |
+---------+-----------+------+
10 rows in set (0.04 sec)

Load with TVF

Step 1: Prepare the data

Create a CSV file s3load_example.csv The file is stored on Alibaba Cloud OSS and its content is as follows:

1,Emily,25
2,Benjamin,35
3,Olivia,28
4,Alexander,60
5,Ava,17
6,William,69
7,Sophia,32
8,James,64
9,Emma,37
10,Liam,64

Step 2: Create a table in Doris

CREATE TABLE test_s3load(
    user_id            BIGINT       NOT NULL COMMENT "user id",
    name               VARCHAR(20)           COMMENT "name",
    age                INT                   COMMENT "age"
)
DUPLICATE KEY(user_id)
DISTRIBUTED BY HASH(user_id) BUCKETS 10;

Step 3: Load data using TVF

:::caution Caution The endpoint URLs for public and internal access are different. If the server and OSS are in the same region, it is recommended to use the internal endpoint link.

  • internal endpoint: oss-cn-hangzhou-internal.aliyuncs.com
  • public endpoint: oss-cn-hangzhou.aliyuncs.com :::
INSERT INTO test_s3load
SELECT * FROM S3
(
    "uri" = "s3://your_bucket_name/s3load_example.csv",
    "format" = "csv",
    "provider" = "OSS",
    "s3.endpoint" = "oss-cn-hangzhou.aliyuncs.com",
    "s3.region" = "oss-cn-hangzhou",
    "s3.access_key" = "<your-ak>",
    "s3.secret_key" = "<your-sk>",
    "column_separator" = ",",
    "csv_schema" = "user_id:int;name:string;age:int"
);

Step 4: Check the imported data

SELECT * FROM test_s3load;

Results:

mysql> select * from test_s3load;
+---------+-----------+------+
| user_id | name      | age  |
+---------+-----------+------+
|       5 | Ava       |   17 |
|      10 | Liam      |   64 |
|       7 | Sophia    |   32 |
|       9 | Emma      |   37 |
|       1 | Emily     |   25 |
|       4 | Alexander |   60 |
|       2 | Benjamin  |   35 |
|       3 | Olivia    |   28 |
|       6 | William   |   69 |
|       8 | James     |   64 |
+---------+-----------+------+
10 rows in set (0.04 sec)