メインコンテンツまでスキップ
バージョン: 4.x

TPC-H Benchmark

TPC-H は決定支援ベンチマーク(Decision Support Benchmark)で、ビジネス指向の特別なクエリと同時データ変更のセットで構成されています。クエリされ、データベースに格納されるデータには、幅広い業界での関連性があります。このベンチマークは、大量のデータを検査し、高度に複雑なクエリを実行し、重要なビジネス質問に答える決定支援システムを実証します。TPC-H で報告されるパフォーマンス指標は TPC-H 複合クエリパフォーマンス指標毎時(QphH@Size)と呼ばれ、システムのクエリ処理能力の複数の側面を反映しています。これらの側面には、クエリ実行時に選択されたデータベースサイズ、単一ストリームでクエリが送信された際のクエリ処理能力、および多くの同時ユーザーによってクエリが送信された際のクエリスループットが含まれます。

この文書では主に、TPC-H 1000G テストセットにおける Doris のパフォーマンスについて紹介します。

TPC-H 標準テストデータセットの22のクエリについて、Apache Doris 3.0.3-rc03(Compute-Storage Coupled Mode)と Apache Doris 2.1.7-rc03 バージョンに基づく比較テストを実施しました。バージョン3.xの統合ストレージ・コンピューティングモードのパフォーマンスは、バージョン2.1.xをベースにしています。

TPCH_1000G

1. ハードウェア環境

ハードウェア設定説明
マシン台数4台 Aliyun Virtual Machine (1FE,3BEs)
CPUIntel Xeon (Ice Lake) Platinum 8369B 32C
メモリ128G
ディスクEnterprise SSD (PL0)

2. ソフトウェア環境

  • Doris 3BE と 1FE をデプロイ
  • カーネルバージョン: Linux version 5.15.0-101-generic
  • OS バージョン: Ubuntu 20.04 LTS (Focal Fossa)
  • Doris ソフトウェアバージョン: Apache Doris 3.0.3-rc03 (Compute-Storage Coupled Mode), Apache Doris 2.1.7-rc03
  • JDK: openjdk version "17.0.2"

3. テストデータ量

テスト全体のシミュレーションによって生成された TPC-H 1000G データをそれぞれ Apache Doris 3.0.3-rc03(Compute-Storage Coupled Mode)と Apache Doris 2.1.7-rc03 にインポートしてテストを行いました。以下はTableの関連説明とデータ量です。

TPC-H Table名行数説明
REGION5地域
NATION25国家
SUPPLIER10,000,000供給者
PART200,000,000部品
PARTSUPP800,000,000部品供給
CUSTOMER150,000,000顧客
ORDERS1,500,000,000注文
LINEITEM5,999,989,709注文詳細

4. テスト SQL

TPC-H 22 テストクエリ文: TPCH-Query-SQL

5. テスト結果

ここでは Apache Doris 3.0.3-rc03(Compute-Storage Coupled Mode)と Apache Doris 2.1.7-rc03 で比較テストを実施しました。テストでは Query Time(ms) を主要なパフォーマンス指標として使用しました。テスト結果は以下の通りです:

QueryApache Doris 3.0.3-rc03 Compute-Storage Coupled Mode (ms)Apache Doris 2.1.7-rc03 (ms)
q011339011880
q02250280
q0348003890
q0428702570
q0567806630
q06200170
q0723502420
q0843003730
q091516015910
q1050007880
q11650560
q12530500
q1394409540
q14570590
q1512101170
q161070910
q1720001920
q181795017700
q1925502370
q20490560
q2178309150
q2228901130
Total102280101460

6. 環境準備

公式ドキュメントを参照して Doris をインストールおよびデプロイし、正常に動作する Doris クラスタ(最低 1 FE 1 BE、1 FE 3 BE を推奨)を取得してください。

7. データ準備

7.1 TPC-H データ生成ツールのダウンロードとインストール

以下のスクリプトを実行して tpch-tools ツールをダウンロードおよびコンパイルしてください。

sh bin/build-tpch-dbgen.sh

インストールが正常に完了すると、dbgenバイナリがTPC-H_Tools_v3.0.0/ディレクトリ下に生成されます。

7.2 TPC-H Test Setの生成

以下のスクリプトを実行してTPC-Hデータセットを生成します:

sh bin/gen-tpch-data.sh -s 1000

注意 1: sh gen-tpch-data.sh -h でスクリプトのヘルプを確認してください。

注意 2: データは tpch-data/ ディレクトリ下に .tbl サフィックス付きで生成されます。総ファイルサイズは約1000GBで、生成に数分から1時間程度かかる場合があります。

注意 3: デフォルトでは100Gの標準テストデータセットが生成されます。

7.3 Table作成

7.3.1 doris-cluster.conf ファイルの準備

スクリプトをインポートする前に、doris-cluster.conf ファイルにFEのipポートやその他の情報を記述する必要があります。

ファイルは ${DORIS_HOME}/tools/tpch-tools/conf/ に配置されています。

ファイルの内容には、FEのip、HTTPポート、ユーザー名、パスワード、およびインポート対象データのDB名が含まれます:

# Any of FE host
export FE_HOST='127.0.0.1'
# http_port in fe.conf
export FE_HTTP_PORT=8030
# query_port in fe.conf
export FE_QUERY_PORT=9030
# Doris username
export USER='root'
# Doris password
export PASSWORD=''
# The database where TPC-H tables located
export DB='tpch'

TPC-HTableを生成・作成するために以下のスクリプトを実行する

sh bin/create-tpch-tables.sh -s 1000

または、create-tpch-tables.sqlのTable作成文をコピーして、Dorisで実行してください。

7.4 データのインポート

以下のコマンドでデータインポートを実行してください:

sh bin/load-tpch-data.sh

7.5 インポートしたデータの確認

以下のSQL文を実行して、インポートしたデータが上記のデータと一致していることを確認してください。

select count(*)  from  lineitem;
select count(*) from orders;
select count(*) from partsupp;
select count(*) from part;
select count(*) from customer;
select count(*) from supplier;
select count(*) from nation;
select count(*) from region;
select count(*) from revenue0;

7.6 Query Test

7.6.1 Executing Query Scripts

上記のテストSQLを実行するか、以下のコマンドを実行してください

sh bin/run-tpch-queries.sh -s 1000

7.6.2 単一SQL実行

以下はテストで使用されるSQL文です。最新のSQLはコードベースから取得することもできます。

--Q1
select
l_returnflag,
l_linestatus,
sum(l_quantity) as sum_qty,
sum(l_extendedprice) as sum_base_price,
sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
avg(l_quantity) as avg_qty,
avg(l_extendedprice) as avg_price,
avg(l_discount) as avg_disc,
count(*) as count_order
from
lineitem
where
l_shipdate <= date '1998-12-01' - interval '90' day
group by
l_returnflag,
l_linestatus
order by
l_returnflag,
l_linestatus;

--Q2
select
s_acctbal,
s_name,
n_name,
p_partkey,
p_mfgr,
s_address,
s_phone,
s_comment
from
part,
supplier,
partsupp,
nation,
region
where
p_partkey = ps_partkey
and s_suppkey = ps_suppkey
and p_size = 15
and p_type like '%BRASS'
and s_nationkey = n_nationkey
and n_regionkey = r_regionkey
and r_name = 'EUROPE'
and ps_supplycost = (
select
min(ps_supplycost)
from
partsupp,
supplier,
nation,
region
where
p_partkey = ps_partkey
and s_suppkey = ps_suppkey
and s_nationkey = n_nationkey
and n_regionkey = r_regionkey
and r_name = 'EUROPE'
)
order by
s_acctbal desc,
n_name,
s_name,
p_partkey
limit 100;

--Q3
select
l_orderkey,
sum(l_extendedprice * (1 - l_discount)) as revenue,
o_orderdate,
o_shippriority
from
customer,
orders,
lineitem
where
c_mktsegment = 'BUILDING'
and c_custkey = o_custkey
and l_orderkey = o_orderkey
and o_orderdate < date '1995-03-15'
and l_shipdate > date '1995-03-15'
group by
l_orderkey,
o_orderdate,
o_shippriority
order by
revenue desc,
o_orderdate
limit 10;

--Q4
select
o_orderpriority,
count(*) as order_count
from
orders
where
o_orderdate >= date '1993-07-01'
and o_orderdate < date '1993-07-01' + interval '3' month
and exists (
select
*
from
lineitem
where
l_orderkey = o_orderkey
and l_commitdate < l_receiptdate
)
group by
o_orderpriority
order by
o_orderpriority;

--Q5
select
n_name,
sum(l_extendedprice * (1 - l_discount)) as revenue
from
customer,
orders,
lineitem,
supplier,
nation,
region
where
c_custkey = o_custkey
and l_orderkey = o_orderkey
and l_suppkey = s_suppkey
and c_nationkey = s_nationkey
and s_nationkey = n_nationkey
and n_regionkey = r_regionkey
and r_name = 'ASIA'
and o_orderdate >= date '1994-01-01'
and o_orderdate < date '1994-01-01' + interval '1' year
group by
n_name
order by
revenue desc;

--Q6
select
sum(l_extendedprice * l_discount) as revenue
from
lineitem
where
l_shipdate >= date '1994-01-01'
and l_shipdate < date '1994-01-01' + interval '1' year
and l_discount between .06 - 0.01 and .06 + 0.01
and l_quantity < 24;

--Q7
select
supp_nation,
cust_nation,
l_year,
sum(volume) as revenue
from
(
select
n1.n_name as supp_nation,
n2.n_name as cust_nation,
extract(year from l_shipdate) as l_year,
l_extendedprice * (1 - l_discount) as volume
from
supplier,
lineitem,
orders,
customer,
nation n1,
nation n2
where
s_suppkey = l_suppkey
and o_orderkey = l_orderkey
and c_custkey = o_custkey
and s_nationkey = n1.n_nationkey
and c_nationkey = n2.n_nationkey
and (
(n1.n_name = 'FRANCE' and n2.n_name = 'GERMANY')
or (n1.n_name = 'GERMANY' and n2.n_name = 'FRANCE')
)
and l_shipdate between date '1995-01-01' and date '1996-12-31'
) as shipping
group by
supp_nation,
cust_nation,
l_year
order by
supp_nation,
cust_nation,
l_year;

--Q8

select
o_year,
sum(case
when nation = 'BRAZIL' then volume
else 0
end) / sum(volume) as mkt_share
from
(
select
extract(year from o_orderdate) as o_year,
l_extendedprice * (1 - l_discount) as volume,
n2.n_name as nation
from
part,
supplier,
lineitem,
orders,
customer,
nation n1,
nation n2,
region
where
p_partkey = l_partkey
and s_suppkey = l_suppkey
and l_orderkey = o_orderkey
and o_custkey = c_custkey
and c_nationkey = n1.n_nationkey
and n1.n_regionkey = r_regionkey
and r_name = 'AMERICA'
and s_nationkey = n2.n_nationkey
and o_orderdate between date '1995-01-01' and date '1996-12-31'
and p_type = 'ECONOMY ANODIZED STEEL'
) as all_nations
group by
o_year
order by
o_year;

--Q9
select
nation,
o_year,
sum(amount) as sum_profit
from
(
select
n_name as nation,
extract(year from o_orderdate) as o_year,
l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount
from
part,
supplier,
lineitem,
partsupp,
orders,
nation
where
s_suppkey = l_suppkey
and ps_suppkey = l_suppkey
and ps_partkey = l_partkey
and p_partkey = l_partkey
and o_orderkey = l_orderkey
and s_nationkey = n_nationkey
and p_name like '%green%'
) as profit
group by
nation,
o_year
order by
nation,
o_year desc;

--Q10
select
c_custkey,
c_name,
sum(l_extendedprice * (1 - l_discount)) as revenue,
c_acctbal,
n_name,
c_address,
c_phone,
c_comment
from
customer,
orders,
lineitem,
nation
where
c_custkey = o_custkey
and l_orderkey = o_orderkey
and o_orderdate >= date '1993-10-01'
and o_orderdate < date '1993-10-01' + interval '3' month
and l_returnflag = 'R'
and c_nationkey = n_nationkey
group by
c_custkey,
c_name,
c_acctbal,
c_phone,
n_name,
c_address,
c_comment
order by
revenue desc
limit 20;


--Q11
select
ps_partkey,
sum(ps_supplycost * ps_availqty) as value
from
partsupp,
supplier,
nation
where
ps_suppkey = s_suppkey
and s_nationkey = n_nationkey
and n_name = 'GERMANY'
group by
ps_partkey having
sum(ps_supplycost * ps_availqty) > (
select
sum(ps_supplycost * ps_availqty) * 0.000002
from
partsupp,
supplier,
nation
where
ps_suppkey = s_suppkey
and s_nationkey = n_nationkey
and n_name = 'GERMANY'
)
order by
value desc;

--Q12
select
l_shipmode,
sum(case
when o_orderpriority = '1-URGENT'
or o_orderpriority = '2-HIGH'
then 1
else 0
end) as high_line_count,
sum(case
when o_orderpriority <> '1-URGENT'
and o_orderpriority <> '2-HIGH'
then 1
else 0
end) as low_line_count
from
orders,
lineitem
where
o_orderkey = l_orderkey
and l_shipmode in ('MAIL', 'SHIP')
and l_commitdate < l_receiptdate
and l_shipdate < l_commitdate
and l_receiptdate >= date '1994-01-01'
and l_receiptdate < date '1994-01-01' + interval '1' year
group by
l_shipmode
order by
l_shipmode;

--Q13
select
c_count,
count(*) as custdist
from
(
select
c_custkey,
count(o_orderkey) as c_count
from
customer left outer join orders on
c_custkey = o_custkey
and o_comment not like '%special%requests%'
group by
c_custkey
) as c_orders
group by
c_count
order by
custdist desc,
c_count desc;

--Q14
select
100.00 * sum(case
when p_type like 'PROMO%'
then l_extendedprice * (1 - l_discount)
else 0
end) / sum(l_extendedprice * (1 - l_discount)) as promo_revenue
from
lineitem,
part
where
l_partkey = p_partkey
and l_shipdate >= date '1995-09-01'
and l_shipdate < date '1995-09-01' + interval '1' month;

--Q15
select
s_suppkey,
s_name,
s_address,
s_phone,
total_revenue
from
supplier,
revenue0
where
s_suppkey = supplier_no
and total_revenue = (
select
max(total_revenue)
from
revenue0
)
order by
s_suppkey;

--Q16
select
p_brand,
p_type,
p_size,
count(distinct ps_suppkey) as supplier_cnt
from
partsupp,
part
where
p_partkey = ps_partkey
and p_brand <> 'Brand#45'
and p_type not like 'MEDIUM POLISHED%'
and p_size in (49, 14, 23, 45, 19, 3, 36, 9)
and ps_suppkey not in (
select
s_suppkey
from
supplier
where
s_comment like '%Customer%Complaints%'
)
group by
p_brand,
p_type,
p_size
order by
supplier_cnt desc,
p_brand,
p_type,
p_size;

--Q17
select
sum(l_extendedprice) / 7.0 as avg_yearly
from
lineitem,
part
where
p_partkey = l_partkey
and p_brand = 'Brand#23'
and p_container = 'MED BOX'
and l_quantity < (
select
0.2 * avg(l_quantity)
from
lineitem
where
l_partkey = p_partkey
);

--Q18
select
c_name,
c_custkey,
o_orderkey,
o_orderdate,
o_totalprice,
sum(l_quantity)
from
customer,
orders,
lineitem
where
o_orderkey in (
select
l_orderkey
from
lineitem
group by
l_orderkey having
sum(l_quantity) > 300
)
and c_custkey = o_custkey
and o_orderkey = l_orderkey
group by
c_name,
c_custkey,
o_orderkey,
o_orderdate,
o_totalprice
order by
o_totalprice desc,
o_orderdate
limit 100;


--Q19
select
sum(l_extendedprice* (1 - l_discount)) as revenue
from
lineitem,
part
where
(
p_partkey = l_partkey
and p_brand = 'Brand#12'
and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
and l_quantity >= 1 and l_quantity <= 1 + 10
and p_size between 1 and 5
and l_shipmode in ('AIR', 'AIR REG')
and l_shipinstruct = 'DELIVER IN PERSON'
)
or
(
p_partkey = l_partkey
and p_brand = 'Brand#23'
and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
and l_quantity >= 10 and l_quantity <= 10 + 10
and p_size between 1 and 10
and l_shipmode in ('AIR', 'AIR REG')
and l_shipinstruct = 'DELIVER IN PERSON'
)
or
(
p_partkey = l_partkey
and p_brand = 'Brand#34'
and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
and l_quantity >= 20 and l_quantity <= 20 + 10
and p_size between 1 and 15
and l_shipmode in ('AIR', 'AIR REG')
and l_shipinstruct = 'DELIVER IN PERSON'
);

--Q20
select
s_name,
s_address
from
supplier,
nation
where
s_suppkey in (
select
ps_suppkey
from
partsupp
where
ps_partkey in (
select
p_partkey
from
part
where
p_name like 'forest%'
)
and ps_availqty > (
select
0.5 * sum(l_quantity)
from
lineitem
where
l_partkey = ps_partkey
and l_suppkey = ps_suppkey
and l_shipdate >= date '1994-01-01'
and l_shipdate < date '1994-01-01' + interval '1' year
)
)
and s_nationkey = n_nationkey
and n_name = 'CANADA'
order by
s_name;

--Q21
select
s_name,
count(*) as numwait
from
supplier,
lineitem l1,
orders,
nation
where
s_suppkey = l1.l_suppkey
and o_orderkey = l1.l_orderkey
and o_orderstatus = 'F'
and l1.l_receiptdate > l1.l_commitdate
and exists (
select
*
from
lineitem l2
where
l2.l_orderkey = l1.l_orderkey
and l2.l_suppkey <> l1.l_suppkey
)
and not exists (
select
*
from
lineitem l3
where
l3.l_orderkey = l1.l_orderkey
and l3.l_suppkey <> l1.l_suppkey
and l3.l_receiptdate > l3.l_commitdate
)
and s_nationkey = n_nationkey
and n_name = 'SAUDI ARABIA'
group by
s_name
order by
numwait desc,
s_name
limit 100;

--Q22
select
cntrycode,
count(*) as numcust,
sum(c_acctbal) as totacctbal
from
(
select
substring(c_phone, 1, 2) as cntrycode,
c_acctbal
from
customer
where
substring(c_phone, 1, 2) in
('13', '31', '23', '29', '30', '18', '17')
and c_acctbal > (
select
avg(c_acctbal)
from
customer
where
c_acctbal > 0.00
and substring(c_phone, 1, 2) in
('13', '31', '23', '29', '30', '18', '17')
)
and not exists (
select
*
from
orders
where
o_custkey = c_custkey
)
) as custsale
group by
cntrycode
order by
cntrycode;