メインコンテンツまでスキップ
バージョン: 4.x

Star Schema Benchmark

Star Schema Benchmark(SSB) は、データウェアハウスシナリオにおける軽量なパフォーマンステストセットです。SSBは TPC-H に基づく簡略化されたスタースキーマデータを提供し、主にスタースキーマ下での複数TableJOINクエリのパフォーマンスをテストするために使用されます。さらに、業界では通常SSBをワイドTableモデル(SSB flatと呼ばれる)に平坦化して、クエリエンジンのパフォーマンスをテストします。

この文書では主に、SSB 1000Gテストセットにおけるtestドキュメント:name="test.html";charset=UTF-8;length=128のパフォーマンスを紹介します。

Apache Dorisバージョン2.0.15.1に基づいて、SSB標準テストデータセットで13のクエリをテストしました。

1. ハードウェア環境

Hardware構成 Instructions
Number of Machines4 Aliyun Virtual Machine (1FE,3BEs)
CPUIntel Xeon (Ice Lake) Platinum 8369B 32C
Memory128G
DiskEnterprise SSD (PL0)

2. ソフトウェア環境

  • Doris Deployed 3BEs and 1FE
  • Kernel Version: Linux version 5.15.0-101-generic
  • OS version: Ubuntu 20.04 LTS (Focal Fossa)
  • Doris software version: Apache Doris 2.0.15.1
  • JDK: openjdk version "1.8.0_352-352"

3. テストデータ量

SSB Table NameRowsAnnotation
lineorder5,999,989,709商品注文詳細
customer30,000,000顧客情報
part2,000,000部品情報
supplier2,000,000サプライヤー情報
dates2,556日付
lineorder_flat5,999,989,709データ平坦化後のワイドTable

4. SSB Flatテスト結果

ここではApache Doris 2.0.15.1を使用して比較テストを行います。テストでは、Query Time(ms)を主要なパフォーマンス指標として使用します。テスト結果は以下の通りです:

QueryDoris 2.0.15.1 (ms)
q1.180
q1.210
q1.3110
q2.11680
q2.21210
q2.31060
q3.12010
q3.21560
q3.3600
q3.410
q4.12380
q4.2190
q4.3120
Total11020

5. 標準SSBテスト結果

ここではApache Doris 2.0.15.1を使用して比較テストを行います。テストでは、Query Time(ms)を主要なパフォーマンス指標として使用します。テスト結果は以下の通りです:

QueryDoris 2.0.15.1 (ms)
q1.1330
q1.280
q1.380
q2.11780
q2.21970
q2.31510
q3.14000
q3.21720
q3.31510
q3.4160
q4.14010
q4.2840
q4.3400
Total19390

6. 環境準備

まず公式ドキュメントを参照してApache Dorisをインストールおよびデプロイし、正常に動作するDorisクラスター(最低1FE 1BEを含み、1FE 3BEsを推奨)を取得してください。

7. データ準備

7.1 SSBデータ生成ツールのダウンロードとインストール

以下のスクリプトを実行して、ssb-tools ツールをダウンロードしコンパイルします。

sh bin/build-ssb-dbgen.sh

インストールが正常に完了すると、dbgenバイナリがssb-dbgen/ディレクトリ下に生成されます。

7.2 SSB Test Set の生成

以下のスクリプトを実行してSSBデータセットを生成します:

sh bin/gen-ssb-data.sh -s 1000

Note 1: sh gen-ssb-data.sh -hでスクリプトのヘルプを確認してください。

Note 2: データはssb-data/ディレクトリの下に.tblサフィックス付きで生成されます。総ファイルサイズは約600GBで、生成には数分から1時間程度かかる場合があります。

Note 3: デフォルトでは100Gの標準テストデータセットが生成されます。

7.3 Tableの作成

7.3.1 doris-cluster.confファイルの準備

スクリプトをインポートする前に、doris-cluster.confファイルにFEのIPポートやその他の情報を記述する必要があります。

このファイルは${DORIS_HOME}/tools/ssb-tools/conf/の下にあります。

ファイルの内容には、FEのIP、HTTPポート、ユーザー名、パスワード、およびインポートするデータのDB名が含まれます:

# Any of FE host
export FE_HOST='127.0.0.1'
# http_port in fe.conf
export FE_HTTP_PORT=8030
# query_port in fe.conf
export FE_QUERY_PORT=9030
# Doris username
export USER='root'
# Doris password
export PASSWORD=''
# The database where SSB tables located
export DB='ssb'

7.3.2 以下のスクリプトを実行してSSBTableを生成・作成する:

sh bin/create-ssb-tables.sh -s 1000

または、create-ssb-tables.sqlcreate-ssb-flat-table.sql のTable作成文をコピーして、MySQLクライアントで実行してください。

7.4 データのインポート

以下のコマンドを使用して、SSBテストセットの全データインポートとSSB FLATワイドTableデータの合成を完了し、Tableにインポートします。

 sh bin/load-ssb-data.sh

7.5 インポートされたデータの確認

select count(*) from part;
select count(*) from customer;
select count(*) from supplier;
select count(*) from dates;
select count(*) from lineorder;
select count(*) from lineorder_flat;

7.6 Query Test

7.6.1 SQLのSSB FLATテスト

--Q1.1
SELECT SUM(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue
FROM lineorder_flat
WHERE
LO_ORDERDATE >= 19930101
AND LO_ORDERDATE <= 19931231
AND LO_DISCOUNT BETWEEN 1 AND 3
AND LO_QUANTITY < 25;

--Q1.2
SELECT SUM(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue
FROM lineorder_flat
WHERE
LO_ORDERDATE >= 19940101
AND LO_ORDERDATE <= 19940131
AND LO_DISCOUNT BETWEEN 4 AND 6
AND LO_QUANTITY BETWEEN 26 AND 35;

--Q1.3
SELECT SUM(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue
FROM lineorder_flat
WHERE
weekofyear(LO_ORDERDATE) = 6
AND LO_ORDERDATE >= 19940101
AND LO_ORDERDATE <= 19941231
AND LO_DISCOUNT BETWEEN 5 AND 7
AND LO_QUANTITY BETWEEN 26 AND 35;

--Q2.1
SELECT
SUM(LO_REVENUE), (LO_ORDERDATE DIV 10000) AS YEAR,
P_BRAND
FROM lineorder_flat
WHERE P_CATEGORY = 'MFGR#12' AND S_REGION = 'AMERICA'
GROUP BY YEAR, P_BRAND
ORDER BY YEAR, P_BRAND;

--Q2.2
SELECT
SUM(LO_REVENUE), (LO_ORDERDATE DIV 10000) AS YEAR,
P_BRAND
FROM lineorder_flat
WHERE
P_BRAND >= 'MFGR#2221'
AND P_BRAND <= 'MFGR#2228'
AND S_REGION = 'ASIA'
GROUP BY YEAR, P_BRAND
ORDER BY YEAR, P_BRAND;

--Q2.3
SELECT
SUM(LO_REVENUE), (LO_ORDERDATE DIV 10000) AS YEAR,
P_BRAND
FROM lineorder_flat
WHERE
P_BRAND = 'MFGR#2239'
AND S_REGION = 'EUROPE'
GROUP BY YEAR, P_BRAND
ORDER BY YEAR, P_BRAND;

--Q3.1
SELECT
C_NATION,
S_NATION, (LO_ORDERDATE DIV 10000) AS YEAR,
SUM(LO_REVENUE) AS revenue
FROM lineorder_flat
WHERE
C_REGION = 'ASIA'
AND S_REGION = 'ASIA'
AND LO_ORDERDATE >= 19920101
AND LO_ORDERDATE <= 19971231
GROUP BY C_NATION, S_NATION, YEAR
ORDER BY YEAR ASC, revenue DESC;

--Q3.2
SELECT
C_CITY,
S_CITY, (LO_ORDERDATE DIV 10000) AS YEAR,
SUM(LO_REVENUE) AS revenue
FROM lineorder_flat
WHERE
C_NATION = 'UNITED STATES'
AND S_NATION = 'UNITED STATES'
AND LO_ORDERDATE >= 19920101
AND LO_ORDERDATE <= 19971231
GROUP BY C_CITY, S_CITY, YEAR
ORDER BY YEAR ASC, revenue DESC;

--Q3.3
SELECT
C_CITY,
S_CITY, (LO_ORDERDATE DIV 10000) AS YEAR,
SUM(LO_REVENUE) AS revenue
FROM lineorder_flat
WHERE
C_CITY IN ('UNITED KI1', 'UNITED KI5')
AND S_CITY IN ('UNITED KI1', 'UNITED KI5')
AND LO_ORDERDATE >= 19920101
AND LO_ORDERDATE <= 19971231
GROUP BY C_CITY, S_CITY, YEAR
ORDER BY YEAR ASC, revenue DESC;

--Q3.4
SELECT
C_CITY,
S_CITY, (LO_ORDERDATE DIV 10000) AS YEAR,
SUM(LO_REVENUE) AS revenue
FROM lineorder_flat
WHERE
C_CITY IN ('UNITED KI1', 'UNITED KI5')
AND S_CITY IN ('UNITED KI1', 'UNITED KI5')
AND LO_ORDERDATE >= 19971201
AND LO_ORDERDATE <= 19971231
GROUP BY C_CITY, S_CITY, YEAR
ORDER BY YEAR ASC, revenue DESC;

--Q4.1
SELECT (LO_ORDERDATE DIV 10000) AS YEAR,
C_NATION,
SUM(LO_REVENUE - LO_SUPPLYCOST) AS profit
FROM lineorder_flat
WHERE
C_REGION = 'AMERICA'
AND S_REGION = 'AMERICA'
AND P_MFGR IN ('MFGR#1', 'MFGR#2')
GROUP BY YEAR, C_NATION
ORDER BY YEAR ASC, C_NATION ASC;

--Q4.2
SELECT (LO_ORDERDATE DIV 10000) AS YEAR,
S_NATION,
P_CATEGORY,
SUM(LO_REVENUE - LO_SUPPLYCOST) AS profit
FROM lineorder_flat
WHERE
C_REGION = 'AMERICA'
AND S_REGION = 'AMERICA'
AND LO_ORDERDATE >= 19970101
AND LO_ORDERDATE <= 19981231
AND P_MFGR IN ('MFGR#1', 'MFGR#2')
GROUP BY YEAR, S_NATION, P_CATEGORY
ORDER BY
YEAR ASC,
S_NATION ASC,
P_CATEGORY ASC;

--Q4.3
SELECT (LO_ORDERDATE DIV 10000) AS YEAR,
S_CITY,
P_BRAND,
SUM(LO_REVENUE - LO_SUPPLYCOST) AS profit
FROM lineorder_flat
WHERE
S_NATION = 'UNITED STATES'
AND LO_ORDERDATE >= 19970101
AND LO_ORDERDATE <= 19981231
AND P_CATEGORY = 'MFGR#14'
GROUP BY YEAR, S_CITY, P_BRAND
ORDER BY YEAR ASC, S_CITY ASC, P_BRAND ASC;

7.6.2 SQL用SSB標準テスト

--Q1.1
SELECT SUM(lo_extendedprice * lo_discount) AS REVENUE
FROM lineorder, dates
WHERE
lo_orderdate = d_datekey
AND d_year = 1993
AND lo_discount BETWEEN 1 AND 3
AND lo_quantity < 25;

--Q1.2
SELECT SUM(lo_extendedprice * lo_discount) AS REVENUE
FROM lineorder, dates
WHERE
lo_orderdate = d_datekey
AND d_yearmonth = 'Jan1994'
AND lo_discount BETWEEN 4 AND 6
AND lo_quantity BETWEEN 26 AND 35;

--Q1.3
SELECT
SUM(lo_extendedprice * lo_discount) AS REVENUE
FROM lineorder, dates
WHERE
lo_orderdate = d_datekey
AND d_weeknuminyear = 6
AND d_year = 1994
AND lo_discount BETWEEN 5 AND 7
AND lo_quantity BETWEEN 26 AND 35;

--Q2.1
SELECT SUM(lo_revenue), d_year, p_brand
FROM lineorder, dates, part, supplier
WHERE
lo_orderdate = d_datekey
AND lo_partkey = p_partkey
AND lo_suppkey = s_suppkey
AND p_category = 'MFGR#12'
AND s_region = 'AMERICA'
GROUP BY d_year, p_brand
ORDER BY p_brand;

--Q2.2
SELECT SUM(lo_revenue), d_year, p_brand
FROM lineorder, dates, part, supplier
WHERE
lo_orderdate = d_datekey
AND lo_partkey = p_partkey
AND lo_suppkey = s_suppkey
AND p_brand BETWEEN 'MFGR#2221' AND 'MFGR#2228'
AND s_region = 'ASIA'
GROUP BY d_year, p_brand
ORDER BY d_year, p_brand;

--Q2.3
SELECT SUM(lo_revenue), d_year, p_brand
FROM lineorder, dates, part, supplier
WHERE
lo_orderdate = d_datekey
AND lo_partkey = p_partkey
AND lo_suppkey = s_suppkey
AND p_brand = 'MFGR#2239'
AND s_region = 'EUROPE'
GROUP BY d_year, p_brand
ORDER BY d_year, p_brand;

--Q3.1
SELECT
c_nation,
s_nation,
d_year,
SUM(lo_revenue) AS REVENUE
FROM customer, lineorder, supplier, dates
WHERE
lo_custkey = c_custkey
AND lo_suppkey = s_suppkey
AND lo_orderdate = d_datekey
AND c_region = 'ASIA'
AND s_region = 'ASIA'
AND d_year >= 1992
AND d_year <= 1997
GROUP BY c_nation, s_nation, d_year
ORDER BY d_year ASC, REVENUE DESC;

--Q3.2
SELECT
c_city,
s_city,
d_year,
SUM(lo_revenue) AS REVENUE
FROM customer, lineorder, supplier, dates
WHERE
lo_custkey = c_custkey
AND lo_suppkey = s_suppkey
AND lo_orderdate = d_datekey
AND c_nation = 'UNITED STATES'
AND s_nation = 'UNITED STATES'
AND d_year >= 1992
AND d_year <= 1997
GROUP BY c_city, s_city, d_year
ORDER BY d_year ASC, REVENUE DESC;

--Q3.3
SELECT
c_city,
s_city,
d_year,
SUM(lo_revenue) AS REVENUE
FROM customer, lineorder, supplier, dates
WHERE
lo_custkey = c_custkey
AND lo_suppkey = s_suppkey
AND lo_orderdate = d_datekey
AND (
c_city = 'UNITED KI1'
OR c_city = 'UNITED KI5'
)
AND (
s_city = 'UNITED KI1'
OR s_city = 'UNITED KI5'
)
AND d_year >= 1992
AND d_year <= 1997
GROUP BY c_city, s_city, d_year
ORDER BY d_year ASC, REVENUE DESC;

--Q3.4
SELECT
c_city,
s_city,
d_year,
SUM(lo_revenue) AS REVENUE
FROM customer, lineorder, supplier, dates
WHERE
lo_custkey = c_custkey
AND lo_suppkey = s_suppkey
AND lo_orderdate = d_datekey
AND (
c_city = 'UNITED KI1'
OR c_city = 'UNITED KI5'
)
AND (
s_city = 'UNITED KI1'
OR s_city = 'UNITED KI5'
)
AND d_yearmonth = 'Dec1997'
GROUP BY c_city, s_city, d_year
ORDER BY d_year ASC, REVENUE DESC;

--Q4.1
SELECT
d_year,
c_nation,
SUM(lo_revenue - lo_supplycost) AS PROFIT
FROM dates, customer, supplier, part, lineorder
WHERE
lo_custkey = c_custkey
AND lo_suppkey = s_suppkey
AND lo_partkey = p_partkey
AND lo_orderdate = d_datekey
AND c_region = 'AMERICA'
AND s_region = 'AMERICA'
AND (
p_mfgr = 'MFGR#1'
OR p_mfgr = 'MFGR#2'
)
GROUP BY d_year, c_nation
ORDER BY d_year, c_nation;

--Q4.2
SELECT
d_year,
s_nation,
p_category,
SUM(lo_revenue - lo_supplycost) AS PROFIT
FROM dates, customer, supplier, part, lineorder
WHERE
lo_custkey = c_custkey
AND lo_suppkey = s_suppkey
AND lo_partkey = p_partkey
AND lo_orderdate = d_datekey
AND c_region = 'AMERICA'
AND s_region = 'AMERICA'
AND (
d_year = 1997
OR d_year = 1998
)
AND (
p_mfgr = 'MFGR#1'
OR p_mfgr = 'MFGR#2'
)
GROUP BY d_year, s_nation, p_category
ORDER BY d_year, s_nation, p_category;

--Q4.3
SELECT
d_year,
s_city,
p_brand,
SUM(lo_revenue - lo_supplycost) AS PROFIT
FROM dates, customer, supplier, part, lineorder
WHERE
lo_custkey = c_custkey
AND lo_suppkey = s_suppkey
AND lo_partkey = p_partkey
AND lo_orderdate = d_datekey
AND s_nation = 'UNITED STATES'
AND (
d_year = 1997
OR d_year = 1998
)
AND p_category = 'MFGR#14'
GROUP BY d_year, s_city, p_brand
ORDER BY d_year, s_city, p_brand;