Hive Catalog
By connecting to a Hive Metastore, or to a metadata service compatible with the Hive Metastore, Doris can automatically obtain Hive database and table information and run queries against the data.
Besides Hive itself, many other systems use the Hive Metastore to store metadata. Therefore, through the Hive Catalog you can access not only Hive tables, but also other table formats that use the Hive Metastore for metadata storage, such as Iceberg and Hudi.
Use Cases
| Scenario | Description |
|---|---|
| Query acceleration | Use Doris's distributed compute engine to query Hive data directly and accelerate queries. |
| Data integration | Read Hive data and write it to Doris internal tables, or perform Zero-ETL operations using the Doris compute engine. |
| Data write-back | Process data from any Doris-supported source and write the results back to Hive tables. |
Configuring the Catalog
Syntax
CREATE CATALOG [IF NOT EXISTS] catalog_name PROPERTIES (
'type'='hms', -- required
'hive.metastore.type' = '<hive_metastore_type>', -- optional
'hive.version' = '<hive_version>', -- optional
'fs.defaultFS' = '<fs_defaultfs>', -- optional
{MetaStoreProperties},
{StorageProperties},
{HiveProperties},
{CommonProperties}
);
- `<hive_metastore_type>`: Specifies the type of Hive Metastore.
  - `hms`: a standard Hive Metastore service.
  - `glue`: access the AWS Glue metadata service through its Hive Metastore-compatible interface.
  - `dlf`: access the Alibaba Cloud DLF metadata service through its Hive Metastore-compatible interface.
- `<fs_defaultfs>`: Required when writing data from Doris to tables in this Hive Catalog. Example:
  `'fs.defaultFS' = 'hdfs://namenode:port'`
- `{MetaStoreProperties}`: Connection and authentication information for the Metastore metadata service. See the "Supported Metadata Services" section for details.
- `{StorageProperties}`: Connection and authentication information for the storage system. See the "Supported Storage Systems" section for details.
- `{HiveProperties}`: Properties related to the Hive Catalog.
  - `get_schema_from_table`: Defaults to `false`. By default, Doris obtains table schema information from the Hive Metastore. In some cases this can hit compatibility problems, such as the error `Storage schema reading not supported`. Setting this parameter to `true` makes Doris fetch the table schema directly from the table object instead; note that this approach ignores column default-value information. Supported since versions 2.1.10 and 3.0.6.
  - `hive.recursive_directories`: Whether to recurse into subdirectories when listing partition directories. Supported since version 3.0.2. Before version 4.0 this defaulted to `false`; in later versions it defaults to `true`. Some Hive tables have partition paths that do not match the partition information in the table schema; to pick up data files in subdirectories, set this parameter to `true`.
  - `hive.ignore_absent_partitions`: Whether to ignore partitions that do not exist. Defaults to `true`. When set to `false`, a query reports an error on encountering a nonexistent partition. Supported since version 3.0.2.
- `{CommonProperties}`: Common attributes. See the "Common Properties" section of the Catalog overview.
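Putting these pieces together, a minimal sketch of a catalog definition that fills in the placeholder sections (the catalog name, Metastore URI, and HDFS address below are illustrative assumptions, not values from this document):

```sql
CREATE CATALOG IF NOT EXISTS hive_sketch_catalog PROPERTIES (
    'type' = 'hms',                                     -- required
    'hive.metastore.uris' = 'thrift://127.0.0.1:9083',  -- MetaStoreProperties
    'fs.defaultFS' = 'hdfs://namenode:8020',            -- needed only for writes
    'hive.recursive_directories' = 'true',              -- HiveProperties
    'hive.ignore_absent_partitions' = 'true'
);
```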
Supported Hive Versions
Hive 1.x, 2.x, 3.x, and 4.x are supported.
Hive transactional tables are supported in Hive 3.x and later. See the "Querying Hive Transactional Tables" section for details.
Supported Metadata Services
Note: The supported service types and parameters differ slightly across Doris versions. See the [Examples] section.
Supported Storage Systems
- HDFS
- AWS S3
- Google Cloud Storage
- Azure Blob
- Alibaba Cloud OSS
- Tencent Cloud COS
- Huawei Cloud OBS
- MinIO
To create Hive tables and write data through Doris, you must explicitly add the
fs.defaultFS property to the catalog properties. If the catalog is created for querying only, this parameter can be omitted. The supported service types and parameters differ slightly across Doris versions; see the [Examples] section.
Supported Data Formats
Column Type Mapping
| Hive Type | Doris Type | Comment |
|---|---|---|
| boolean | boolean | |
| tinyint | tinyint | |
| smallint | smallint | |
| int | int | |
| bigint | bigint | |
| date | date | |
| timestamp | datetime(6) | Mapped to datetime with precision 6 |
| float | float | |
| double | double | |
| decimal(P, S) | decimal(P, S) | Defaults to decimal(9, 0) if no precision is specified |
| char(N) | char(N) | |
| varchar(N) | varchar(N) | |
| string | string | |
| binary | string/varbinary | Controlled by the catalog property enable.mapping.varbinary (supported since 4.0.2). Defaults to false, mapping to string; when true, maps to the varbinary type. |
| array | array | |
| map | map | |
| struct | struct | |
| other | unsupported | |
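For example, to map Hive `binary` columns to `varbinary` rather than `string` (4.0.2+), set the property when creating the catalog; the catalog name and Metastore address here are illustrative:

```sql
CREATE CATALOG hive_varbinary_ctl PROPERTIES (
    'type' = 'hms',
    'hive.metastore.uris' = 'thrift://127.0.0.1:9083',
    'enable.mapping.varbinary' = 'true'  -- binary -> varbinary instead of string
);
```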
Examples
Hive Metastore
Version 3.1+
- HDFS
- ViewFs
- S3
- OSS
- COS
- OBS
- GCS
- Azure Blob
- MinIO
Access HMS and HDFS services without Kerberos authentication
CREATE CATALOG hive_hms_hdfs_test_catalog PROPERTIES (
'type' = 'hms',
'hive.metastore.uris' = 'thrift://127.0.0.1:9383',
'fs.defaultFS' = 'hdfs://127.0.0.1:8520',
'hadoop.username' = 'doris'
);
Access HMS and HDFS services with Kerberos authentication enabled
CREATE CATALOG hive_hms_hdfs_kerberos_test_catalog PROPERTIES (
'type' = 'hms',
'hive.metastore.uris' = 'thrift://127.0.0.1:9583',
'hive.metastore.client.principal' = 'hive/presto-master.docker.cluster@LABS.TERADATA.COM',
'hive.metastore.client.keytab' = '/keytabs/hive-presto-master.keytab',
'hive.metastore.service.principal' = 'hive/hadoop-master@LABS.TERADATA.COM',
'hive.metastore.sasl.enabled' = 'true',
'hive.metastore.authentication.type' = 'kerberos',
'fs.defaultFS' = 'hdfs://127.0.0.1:8520',
'hadoop.security.auth_to_local' = 'RULE:[2:\$1@\$0](.*@LABS.TERADATA.COM)s/@.*//
RULE:[2:\$1@\$0](.*@OTHERLABS.TERADATA.COM)s/@.*//
RULE:[2:\$1@\$0](.*@OTHERREALM.COM)s/@.*//
DEFAULT',
'hadoop.security.authentication' = 'kerberos',
'hadoop.kerberos.principal' = 'hive/presto-master.docker.cluster@LABS.TERADATA.COM',
'hadoop.kerberos.keytab' = '/keytabs/hive-presto-master.keytab'
);
CREATE CATALOG hive_viewfs PROPERTIES (
'type' = 'hms',
'hive.metastore.uris' = 'thrift://172.0.0.1:9083',
'fs.defaultFS' = 'viewfs://your-cluster',
'fs.viewfs.mounttable.your-cluster.link./ns1' = 'hdfs://your-nameservice/',
'fs.viewfs.mounttable.your-cluster.homedir' = '/ns1',
'dfs.nameservices' = 'your-nameservice',
'dfs.ha.namenodes.your-nameservice' = 'nn1,nn2',
'dfs.namenode.rpc-address.your-nameservice.nn1' = '172.21.0.2:8088',
'dfs.namenode.rpc-address.your-nameservice.nn2' = '172.21.0.3:8088',
'dfs.client.failover.proxy.provider.your-nameservice' = 'org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider'
);
Access HMS without Kerberos authentication
CREATE CATALOG hive_hms_s3_test_catalog PROPERTIES (
'type' = 'hms',
'hive.metastore.uris' = 'thrift://127.0.0.1:9383',
's3.region' = 'ap-east-1',
's3.access_key' = '<ak>',
's3.secret_key' = '<sk>'
);
Obtain S3 access credentials using an IAM Assumed Role (3.1.2+)
CREATE CATALOG hive_hms_on_s3_iamrole PROPERTIES (
'type' = 'hms',
'hive.metastore.uris' = 'thrift://127.0.0.1:9383',
's3.region' = 'us-east-1',
's3.role_arn' = 'arn:aws:iam::543815668950:role/role'
);
Access HMS without Kerberos authentication
CREATE CATALOG hive_hms_on_oss_catalog PROPERTIES (
'type' = 'hms',
'hive.metastore.uris' = 'thrift://127.0.0.1:9383',
'oss.region' = 'cn-beijing',
'oss.endpoint' = 'oss-cn-beijing.aliyuncs.com',
'oss.access_key' = '<ak>',
'oss.secret_key' = '<sk>'
);
Access HMS with Kerberos authentication enabled
CREATE CATALOG hive_hms_on_oss_kerberos_old_catalog PROPERTIES (
'type' = 'hms',
'hive.metastore.uris' = 'thrift://127.0.0.1:9583',
'hive.metastore.client.principal' = 'hive/presto-master.docker.cluster@LABS.TERADATA.COM',
'hive.metastore.client.keytab' = '/keytabs/hive-presto-master.keytab',
'hive.metastore.service.principal' = 'hive/hadoop-master@LABS.TERADATA.COM',
'hive.metastore.sasl.enabled' = 'true',
'hive.metastore.authentication.type' = 'kerberos',
'hadoop.security.auth_to_local' = 'RULE:[2:\$1@\$0](.*@LABS.TERADATA.COM)s/@.*//
RULE:[2:\$1@\$0](.*@OTHERLABS.TERADATA.COM)s/@.*//
RULE:[2:\$1@\$0](.*@OTHERREALM.COM)s/@.*//
DEFAULT',
'oss.region' = 'cn-beijing',
'oss.endpoint' = 'oss-cn-beijing.aliyuncs.com',
'oss.access_key' = '<ak>',
'oss.secret_key' = '<sk>'
);
Access HMS without Kerberos authentication
CREATE CATALOG hive_hms_cos_test_region_catalog PROPERTIES (
'type' = 'hms',
'hive.metastore.uris' = 'thrift://127.0.0.1:9383',
'cos.region' = 'ap-beijing',
'cos.endpoint' = 'cos.ap-beijing.myqcloud.com',
'cos.access_key' = '<ak>',
'cos.secret_key' = '<sk>'
);
Access HMS with Kerberos authentication enabled
CREATE CATALOG hive_hms_on_cos_kerberos_new_catalog PROPERTIES (
'type' = 'hms',
'hive.metastore.uris' = 'thrift://127.0.0.1:9583',
'hive.metastore.client.principal' = 'hive/presto-master.docker.cluster@LABS.TERADATA.COM',
'hive.metastore.client.keytab' = '/keytabs/hive-presto-master.keytab',
'hive.metastore.service.principal' = 'hive/hadoop-master@LABS.TERADATA.COM',
'hive.metastore.sasl.enabled' = 'true',
'hive.metastore.authentication.type' = 'kerberos',
'hadoop.security.auth_to_local' = 'RULE:[2:\$1@\$0](.*@LABS.TERADATA.COM)s/@.*//
RULE:[2:\$1@\$0](.*@OTHERLABS.TERADATA.COM)s/@.*//
RULE:[2:\$1@\$0](.*@OTHERREALM.COM)s/@.*//
DEFAULT',
'cos.region' = 'ap-beijing',
'cos.endpoint' = 'cos.ap-beijing.myqcloud.com',
'cos.access_key' = '<ak>',
'cos.secret_key' = '<sk>'
);
Access HMS without Kerberos authentication
CREATE CATALOG test_hive_on_hms_obs_catalog PROPERTIES (
'type' = 'hms',
'hive.metastore.uris' = 'thrift://127.0.0.1:9383',
'obs.region' = 'cn-north-4',
'obs.endpoint' = 'obs.cn-north-4.myhuaweicloud.com',
'obs.access_key' = '<ak>',
'obs.secret_key' = '<sk>'
);
Access HMS with Kerberos authentication enabled
CREATE CATALOG hive_hms_on_obs_kerberos_new_catalog PROPERTIES (
'type' = 'hms',
'hive.metastore.uris' = 'thrift://127.0.0.1:9583',
'hive.metastore.service.principal' = 'hive/hadoop-master@LABS.TERADATA.COM',
'hive.metastore.sasl.enabled' = 'true',
'hive.metastore.client.principal' = 'hive/presto-master.docker.cluster@LABS.TERADATA.COM',
'hive.metastore.client.keytab' = '/keytabs/hive-presto-master.keytab',
'hive.metastore.authentication.type' = 'kerberos',
'hadoop.security.auth_to_local' = 'RULE:[2:\$1@\$0](.*@LABS.TERADATA.COM)s/@.*//
RULE:[2:\$1@\$0](.*@OTHERLABS.TERADATA.COM)s/@.*//
RULE:[2:\$1@\$0](.*@OTHERREALM.COM)s/@.*//
DEFAULT',
'obs.region' = 'cn-north-4',
'obs.endpoint' = 'obs.cn-north-4.myhuaweicloud.com',
'obs.access_key' = '<ak>',
'obs.secret_key' = '<sk>'
);
Access HMS without Kerberos authentication
CREATE CATALOG `hive_hms_on_gcs_new_catalog` PROPERTIES (
'type' = 'hms',
'hive.metastore.uris' = 'thrift://127.0.0.1:9383',
'gs.access_key' = '<ak>',
'gs.secret_key' = '<sk>',
'fs.gcs.support'='true'
);
Supported since 3.1.3
CREATE CATALOG `hive_hms_on_azure_blob_new_catalog` PROPERTIES (
'type' = 'hms',
'hive.metastore.uris' = 'thrift://127.0.0.1:9383',
'azure.account_name' = '<account_name>',
'azure.account_key' = '<account_key>',
'azure.endpoint' = 'https://<account_name>.blob.core.windows.net',
'fs.azure.support'='true'
);
CREATE CATALOG test_hive_on_hms_minio_catalog PROPERTIES (
'type' = 'hms',
'hive.metastore.uris' = 'thrift://127.0.0.1:9383',
'fs.minio.support' = 'true',
'minio.use_path_style' = 'true',
'minio.endpoint' = 'http://127.0.0.1:19001',
'minio.access_key' = '<ak>',
'minio.secret_key' = '<sk>'
);
Version 2.1 & 3.0
- HDFS
- S3
- OSS
- COS
- OBS
- GCS
- MinIO
Access HMS without Kerberos authentication
CREATE CATALOG hive_hms_hdfs_test_catalog PROPERTIES (
'type' = 'hms',
'hive.metastore.uris' = 'thrift://127.0.0.1:9383',
'hadoop.username' = 'doris',
'fs.defaultFS' = 'hdfs://127.0.0.1:8320'
);
Access HMS with Kerberos authentication enabled
CREATE CATALOG test_two_hive_kerberos PROPERTIES (
'type' = 'hms',
'hive.metastore.uris' = 'thrift://127.0.0.1:9583',
'hive.metastore.sasl.enabled' = 'true',
'hive.metastore.kerberos.principal' = 'hive/hadoop-master@LABS.TERADATA.COM',
'fs.defaultFS' = 'hdfs://127.0.0.1:8520',
'hadoop.kerberos.min.seconds.before.relogin' = '5',
'hadoop.security.authentication' = 'kerberos',
'hadoop.kerberos.principal' = 'hive/presto-master.docker.cluster@LABS.TERADATA.COM',
'hadoop.kerberos.keytab' = '/keytabs/hive-presto-master.keytab',
'hadoop.security.auth_to_local' = 'RULE:[2:\$1@\$0](.*@LABS.TERADATA.COM)s/@.*//
RULE:[2:\$1@\$0](.*@OTHERLABS.TERADATA.COM)s/@.*//
RULE:[2:\$1@\$0](.*@OTHERREALM.COM)s/@.*//
DEFAULT'
);
CREATE CATALOG hms_on_s3_catalog PROPERTIES (
'type' = 'hms',
'hive.metastore.uris' = 'thrift://127.0.0.1:9383',
's3.region' = 'ap-east-1',
's3.endpoint' = 's3.ap-east-1.amazonaws.com',
's3.access_key' = '<ak>',
's3.secret_key' = '<sk>'
);
CREATE CATALOG hive_hms_on_oss_catalog PROPERTIES (
'type' = 'hms',
'hive.metastore.uris' = 'thrift://127.0.0.1:9383',
'oss.region' = 'cn-beijing',
'oss.endpoint' = 'oss-cn-beijing.aliyuncs.com',
'oss.access_key' = '<ak>',
'oss.secret_key' = '<sk>'
);
CREATE CATALOG hive_hms_cos_test_region_catalog PROPERTIES (
'type' = 'hms',
'hive.metastore.uris' = 'thrift://127.0.0.1:9383',
'cos.region' = 'ap-beijing',
'cos.endpoint' = 'cos.ap-beijing.myqcloud.com',
'cos.access_key' = '<ak>',
'cos.secret_key' = '<sk>'
);
CREATE CATALOG test_hive_on_hms_obs_catalog PROPERTIES (
'type' = 'hms',
'hive.metastore.uris' = 'thrift://127.0.0.1:9383',
'obs.region' = 'cn-north-4',
'obs.endpoint' = 'obs.cn-north-4.myhuaweicloud.com',
'obs.access_key' = '<ak>',
'obs.secret_key' = '<sk>'
);
CREATE CATALOG hive_hms_on_gcs_s3 PROPERTIES (
'type'='hms',
'hive.metastore.uris' = 'thrift://127.0.0.1:9383',
'warehouse' = 's3://bucket/gcs/paimon_warehouse',
'gs.access_key' = '<ak>',
'gs.secret_key' = '<sk>',
'gs.endpoint' = 'storage.googleapis.com'
);
CREATE CATALOG test_hive_on_hms_minio_catalog PROPERTIES (
'type' = 'hms',
'hive.metastore.uris' = 'thrift://127.0.0.1:9383',
's3.endpoint' = 'http://127.0.0.1:19000',
's3.access_key' = '<ak>',
's3.secret_key' = '<sk>',
's3.path.style.access' = 'true'
);
AWS Glue
Version 3.1+
- S3
AWS Glue and the S3 storage service share the same authentication credentials.
CREATE CATALOG hive_glue_on_s3_catalog PROPERTIES (
'type' = 'hms',
'hive.metastore.type' = 'glue',
'glue.region' = 'ap-east-1',
'glue.endpoint' = 'https://glue.ap-east-1.amazonaws.com',
'glue.access_key' = '<ak>',
'glue.secret_key' = '<sk>'
);
If the Glue service credentials differ from the S3 credentials, you can specify the S3 credentials separately as follows:
CREATE CATALOG hive_glue_on_s3_catalog PROPERTIES (
'type' = 'hms',
'hive.metastore.type' = 'glue',
'glue.region' = 'ap-east-1',
'glue.endpoint' = 'https://glue.ap-east-1.amazonaws.com',
'glue.access_key' = '<ak>',
'glue.secret_key' = '<sk>',
's3.region' = 'ap-east-1',
's3.endpoint' = 'https://s3.ap-east-1.amazonaws.com/',
's3.access_key' = '<ak>',
's3.secret_key' = '<sk>'
);
Obtain S3 access credentials using an IAM Assumed Role (3.1.2+)
CREATE CATALOG `glue_hive_iamrole` PROPERTIES (
'type' = 'hms',
'hive.metastore.type' = 'glue',
'glue.region' = 'us-east-1',
'glue.endpoint' = 'https://glue.us-east-1.amazonaws.com',
'glue.role_arn' = '<role_arn>'
);
Version 2.1 & 3.0
- S3
AWS Glue and S3 storage services share the same authentication credentials.
In non-EC2 environments, you need to use [aws configure](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html) to configure Credentials information and generate a credentials file in the ~/.aws directory.
Alternatively, you can explicitly specify the credentials provider factory class when creating the catalog:
`"aws.catalog.credentials.provider.factory.class" = "com.amazonaws.glue.catalog.credentials.ConfigurationAWSCredentialsProviderFactory"`
```sql
create catalog hive_glue PROPERTIES(
'type' = 'hms',
'hive.metastore.type' = 'glue',
'glue.endpoint' = 'https://glue.ap-northeast-1.amazonaws.com',
'glue.region' = 'ap-northeast-1',
'glue.access_key' = '<ak>',
'glue.secret_key' = '<sk>'
);
```
Aliyun DLF
Version 3.1+
- DLF 1.0
CREATE CATALOG hive_dlf_oss_test_catalog PROPERTIES (
'type' = 'hms',
'hive.metastore.type' = 'dlf',
'dlf.uid' = '203225413946383283',
'dlf.catalog_id' = 'p2_regression_case',
'dlf.endpoint' = 'dlf.cn-beijing.aliyuncs.com',
'dlf.region' = 'cn-beijing',
'dlf.access_key' = '<ak>',
'dlf.secret_key' = '<sk>'
);
Version 2.1 & 3.0
- DLF 1.0
CREATE CATALOG hive_dlf_oss_test_catalog PROPERTIES (
'type' = 'hms',
'hive.metastore.type' = 'dlf',
'dlf.uid' = '203225413946383283',
'dlf.catalog.id' = 'p2_regression_case',
'dlf.endpoint' = 'dlf.cn-beijing.aliyuncs.com',
'dlf.region' = 'cn-beijing',
'dlf.access_key' = '<ak>',
'dlf.secret_key' = '<sk>'
);
Query Operations
Basic Query
After configuring the catalog, you can query table data within the catalog as follows:
-- 1. Switch to the catalog, use a database, and run a query
SWITCH hive_ctl;
USE hive_db;
SELECT * FROM hive_tbl LIMIT 10;
-- 2. Use the Hive database directly
USE hive_ctl.hive_db;
SELECT * FROM hive_tbl LIMIT 10;
-- 3. Query using a fully qualified name
SELECT * FROM hive_ctl.hive_db.hive_tbl LIMIT 10;
Querying Hive Partitions
You can query Hive partition information using the following two methods:
* Using the `SHOW PARTITIONS` Statement

  `SHOW PARTITIONS FROM [catalog.][db.]hive_table` lists all partitions and their values for the specified Hive table.
SHOW PARTITIONS FROM hive_table;
+-----------------------------+
| Partition                   |
+-----------------------------+
| pt1=2024-10-10/pt2=beijing  |
| pt1=2024-10-10/pt2=shanghai |
| pt1=2024-10-11/pt2=beijing  |
| pt1=2024-10-11/pt2=shanghai |
| pt1=2024-10-12/pt2=nanjing  |
+-----------------------------+
* Using the `table$partitions` Metadata Table
Starting from versions 2.1.7 and 3.0.3, you can query Hive partition information through the `table$partitions` metadata table. This table is essentially relational, with each partition column represented as a column, allowing it to be used in any SELECT statement.
```sql
SELECT * FROM hive_table$partitions;
+------------+-------------+
| pt1 | pt2 |
+------------+-------------+
| 2024-10-10 | beijing |
| 2024-10-10 | shanghai |
| 2024-10-12 | nanjing |
| 2024-10-11 | beijing |
| 2024-10-11 | shanghai |
+------------+-------------+
```
Querying Hive Transactional Tables
Hive Transactional tables support ACID semantics. For more details, see Hive Transactions.
- Support for Hive Transactional Tables

| Table Type | Operations Supported in Hive | Hive Table Properties | Supported Hive Versions |
|---|---|---|---|
| Full-ACID Transactional Table | Insert, Update, Delete | `'transactional'='true'` | 4.x, 3.x, 2.x (2.x requires running a Major Compaction in Hive before reading) |
| Insert-Only Transactional Table | Insert only | `'transactional'='true', 'transactional_properties'='insert_only'` | 4.x, 3.x, 2.x (specify `hive.version` when creating the catalog) |

- Current Limitations

  Original Files scenarios are not supported. When a table is converted to a transactional table, new data files use the Hive transactional table schema, but existing data files are not converted. These files are referred to as Original Files.
Querying Hive Views
You can query Hive Views, but there are some limitations:
- The Hive View definition (HiveQL) must be compatible with SQL statements supported by Doris; otherwise, a parsing error occurs.
- Some functions supported by HiveQL may share names with Doris functions but behave differently, which can cause discrepancies between results from Hive and Doris. If you encounter such issues, please report them to the community.
Write Operations
Data can be written to Hive tables using the INSERT statement. This is supported for Hive tables created by Doris or existing Hive tables with compatible formats.
For partitioned tables, data will automatically be written to the corresponding partition or a new partition will be created based on the data. Currently, specifying a partition for writing is not supported.
INSERT INTO
The INSERT operation appends data to the target table. Specifying a partition for writing is currently not supported.
INSERT INTO hive_tbl VALUES (val1, val2, val3, val4);
INSERT INTO hive_ctl.hive_db.hive_tbl SELECT col1, col2 FROM internal.db1.tbl1;
INSERT INTO hive_tbl(col1, col2) VALUES (val1, val2);
INSERT INTO hive_tbl(col1, col2, partition_col1, partition_col2) VALUES (1, 2, "beijing", "2023-12-12");
INSERT OVERWRITE
INSERT OVERWRITE completely replaces the existing data in the table with new data. Specifying a partition for writing is currently not supported.
INSERT OVERWRITE TABLE hive_tbl VALUES (val1, val2, val3, val4);
INSERT OVERWRITE TABLE hive_ctl.hive_db.hive_tbl(col1, col2) SELECT col1, col2 FROM internal.db1.tbl1;
The semantics of INSERT OVERWRITE are consistent with Hive, with the following behaviors:

- If the target table is partitioned and the source table is empty, the operation has no effect; the target table remains unchanged.
- If the target table is non-partitioned and the source table is empty, the target table is cleared.
- Since specifying a partition for writing is not supported, INSERT OVERWRITE automatically determines the affected partitions in the target table from the source data. If the target table is partitioned, only the affected partitions are overwritten; unaffected partitions remain unchanged.
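As an illustration of the partition-overwrite behavior (a sketch; the `sales` table and source data are hypothetical), overwriting with source rows that touch only one partition rewrites just that partition:

```sql
-- Suppose sales is partitioned by pt and currently holds pt='2024-01-01'
-- and pt='2024-01-02'. The source query below produces only pt='2024-01-01',
-- so only that partition is overwritten; pt='2024-01-02' is left unchanged.
INSERT OVERWRITE TABLE hive_ctl.hive_db.sales
SELECT id, amount, '2024-01-01' AS pt FROM internal.db1.new_sales;
```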
CTAS
You can create a Hive table and insert data using the CTAS (CREATE TABLE AS SELECT) statement:
CREATE TABLE hive_ctas ENGINE=hive AS SELECT * FROM other_table;
CTAS supports specifying file formats, partitioning methods, and more, as shown below:
CREATE TABLE hive_ctas ENGINE=hive
PARTITION BY LIST (pt1, pt2) ()
AS SELECT col1, pt1, pt2 FROM part_ctas_src WHERE col1 > 0;
CREATE TABLE hive_ctl.hive_db.hive_ctas (col1, col2, pt1) ENGINE=hive
PARTITION BY LIST (pt1) ()
PROPERTIES (
"file_format"="parquet",
"compression"="zstd"
)
AS SELECT col1, pt1 AS col2, pt2 AS pt1 FROM test_ctas.part_ctas_src WHERE col1 > 0;
Related Parameters
- Session variables

| Parameter name | Default value | Description | Since version |
|---|---|---|---|
| hive_parquet_use_column_names | true | When reading Parquet files of a Hive table, Doris by default locates each column in the Parquet file by the column name defined in the Hive table. When this variable is false, Doris reads data from the Parquet file by the column order in the Hive table, regardless of column names, similar to the `parquet.column.index.access` setting in Hive. This parameter applies only to top-level column names and has no effect inside a Struct. | 2.1.6+, 3.0.3+ |
| hive_orc_use_column_names | true | Similar to hive_parquet_use_column_names, but for Hive tables in ORC format. Similar to the `orc.force.positional.evolution` setting in Hive. | 2.1.6+, 3.0.3+ |
- BE Configuration

| Parameter name | Default value | Description |
|---|---|---|
| hive_sink_max_file_size | 1GB | Maximum data file size. When the written data exceeds this limit, the current file is closed and a new file is created for further writing. |
| table_sink_partition_write_max_partition_nums_per_writer | 128 | Maximum number of partitions each instance can write to on a BE node. |
| table_sink_non_partition_write_scaling_data_processed_threshold | 25MB | Data volume threshold for starting scaling-write on non-partitioned tables. A new writer (instance) is used for every additional `table_sink_non_partition_write_scaling_data_processed_threshold` of data. This mechanism adjusts the number of writers (instances) based on data volume to improve concurrent write throughput, while saving resources and minimizing the number of files for small data volumes. |
| table_sink_partition_write_min_data_processed_rebalance_threshold | 25MB | Minimum data volume threshold to trigger rebalancing for partitioned tables. Rebalancing is triggered when `current accumulated data volume - data volume at the last rebalancing (or initial accumulation) >= table_sink_partition_write_min_data_processed_rebalance_threshold`. If the final file sizes vary significantly, lower this threshold to improve balance; a lower threshold, however, increases rebalancing cost and may affect performance. |
| table_sink_partition_write_min_partition_data_processed_rebalance_threshold | 15MB | Minimum partition data volume threshold to trigger rebalancing. Rebalancing occurs when `current partition data volume >= threshold * number of tasks already allocated to the partition`. If the final file sizes vary significantly, lower this threshold to improve balance; a lower threshold, however, increases rebalancing cost and may affect performance. |
Database and Table Management
Users can create and drop databases and tables in the Hive Metastore through Doris. Note that Doris only calls the Hive Metastore API for these operations; it does not store or persist any Hive metadata itself.
Creating and Dropping Databases
You can switch to the target catalog using the SWITCH statement and then execute the CREATE DATABASE statement:
SWITCH hive_ctl;
CREATE DATABASE [IF NOT EXISTS] hive_db;
You can also create a database using a fully qualified name or specify a location, such as:
CREATE DATABASE [IF NOT EXISTS] hive_ctl.hive_db;
CREATE DATABASE [IF NOT EXISTS] hive_ctl.hive_db
PROPERTIES ('location'='hdfs://172.21.16.47:4007/path/to/db/');
You can view the location information of the Database using the SHOW CREATE DATABASE command:
mysql> SHOW CREATE DATABASE hive_db;
+----------+---------------------------------------------------------------------------------------------+
| Database | Create Database |
+----------+---------------------------------------------------------------------------------------------+
| hive_db | CREATE DATABASE hive_db LOCATION 'hdfs://172.21.16.47:4007/usr/hive/warehouse/hive_db.db' |
+----------+---------------------------------------------------------------------------------------------+
To drop a database:
DROP DATABASE [IF EXISTS] hive_ctl.hive_db;
For a Hive Database, you must first delete all tables under that Database before you can delete the Database itself; otherwise, an error will occur. This operation will also delete the corresponding Database in Hive.
Creating and Dropping Tables
- Creating Tables

  Doris supports creating both partitioned and non-partitioned tables in Hive.
-- Create a non-partitioned Hive table
CREATE TABLE unpartitioned_table (
col1 BOOLEAN COMMENT 'col1',
col2 INT COMMENT 'col2',
col3 BIGINT COMMENT 'col3',
col4 CHAR(10) COMMENT 'col4',
col5 FLOAT COMMENT 'col5',
col6 DOUBLE COMMENT 'col6',
col7 DECIMAL(9,4) COMMENT 'col7',
col8 VARCHAR(11) COMMENT 'col8',
col9 STRING COMMENT 'col9'
) ENGINE=hive
PROPERTIES (
'file_format'='parquet'
);
-- Create a partitioned Hive table
-- Partition columns must be included in the table's column definition list
CREATE TABLE partition_table (
col1 BOOLEAN COMMENT 'col1',
col2 INT COMMENT 'col2',
col3 BIGINT COMMENT 'col3',
col4 DECIMAL(2,1) COMMENT 'col4',
pt1 VARCHAR COMMENT 'pt1',
pt2 VARCHAR COMMENT 'pt2'
) ENGINE=hive
PARTITION BY LIST (pt1, pt2) ()
PROPERTIES (
'file_format'='orc',
'compression'='zlib'
);
-- Create a Text-format table (supported since 2.1.7 & 3.0.3)
CREATE TABLE text_table (
id INT,
name STRING
) PROPERTIES (
'file_format'='text',
'compression'='gzip',
'field.delim'='\t',
'line.delim'='\n',
'collection.delim'=';',
'mapkey.delim'=':',
'serialization.null.format'='\N',
'escape.delim'='\'
);
After creating a table, you can view the Hive table creation statement using the `SHOW CREATE TABLE` command.
Note that unlike Hive's table creation syntax, when creating a partitioned table in Doris, partition columns must be included in the table schema. Additionally, partition columns must be placed at the end of the schema and maintain the same order.
:::tip
For Hive clusters where ACID transaction features are enabled by default, tables created by Doris will have the `transactional` property set to `true`. Since Doris only supports certain features of Hive transactional tables, this may lead to issues where Doris cannot read the Hive tables it creates. To avoid this, explicitly set `"transactional" = "false"` in the table properties to create non-transactional Hive tables:
```sql
CREATE TABLE non_acid_table(
`col1` BOOLEAN COMMENT 'col1',
`col2` INT COMMENT 'col2',
`col3` BIGINT COMMENT 'col3'
) ENGINE=hive
PROPERTIES (
'transactional'='false'
);
```
:::
- Dropping Tables

  You can delete a Hive table using the `DROP TABLE` statement. When a table is deleted, all of its data, including partition data, is removed as well.
- Column Type Mapping

  Refer to the [Column Type Mapping] section for details. Note the following restrictions:

  - Columns must use the default nullable type; `NOT NULL` is not supported.
  - Hive 3.0 supports setting default values. To set default values, explicitly add `"hive.version" = "3.0.0"` to the catalog properties.
  - If an inserted value is incompatible with the column type (for example, inserting `'abc'` into a numeric column), the value is converted to `null`.
- Partitioning

  Hive partition types correspond to List partitions in Doris. Therefore, when creating a Hive partitioned table in Doris, use the List partition syntax, but there is no need to explicitly enumerate each partition: Doris automatically creates the corresponding Hive partitions based on the data values during insertion. Both single-column and multi-column partitioned tables are supported.
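For instance, inserting into the partitioned table from the earlier example (the values are illustrative) creates the matching Hive partitions on the fly:

```sql
-- Each distinct (pt1, pt2) value pair becomes a Hive partition on write
INSERT INTO partition_table VALUES
    (true, 1, 100, 1.5, 'beijing', '2024-10-10'),
    (false, 2, 200, 2.5, 'shanghai', '2024-10-10');
```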
- File Formats

  - ORC (default)

  - Parquet

    Note that when the DATETIME type is written to a Parquet file, the physical type used is INT96 rather than INT64, for compatibility with Hive versions prior to 4.0.

  - Text (supported since versions 2.1.7 and 3.0.3)

    The Text format supports the following table properties:

    - `field.delim`: Column delimiter. Default is `\1`.
    - `line.delim`: Line delimiter. Default is `\n`.
    - `collection.delim`: Delimiter for elements in complex types. Default is `\2`.
    - `mapkey.delim`: Delimiter for map key-value pairs. Default is `\3`.
    - `serialization.null.format`: Format for storing `NULL` values. Default is `\N`.
    - `escape.delim`: Escape character. Default is `\`.
- Compression Formats

  - Parquet: snappy (default), zstd, plain (no compression)
  - ORC: snappy, zlib (default), zstd, plain (no compression)
  - Text: gzip, deflate, bzip2, zstd, lz4, lzo, snappy, plain (default, no compression)

- Storage Medium

  - HDFS
  - Object Storage
Subscribing to Hive Metastore Events
By having the FE nodes periodically read Notification Events from the HMS, Doris can detect real-time changes in Hive table metadata, improving metadata timeliness. Currently, the following events are supported:
| Event | Action and Corresponding Behavior |
|---|---|
| CREATE DATABASE | Creates a database in the corresponding data directory. |
| DROP DATABASE | Deletes a database in the corresponding data directory. |
| ALTER DATABASE | Mainly affects changes to database properties, comments, and default storage locations. These changes do not affect Doris's ability to query external data directories, so this event is currently ignored. |
| CREATE TABLE | Creates a table in the corresponding database. |
| DROP TABLE | Deletes a table in the corresponding database and invalidates the table cache. |
| ALTER TABLE | If renamed, deletes the old table and creates a new one with the new name; otherwise, invalidates the table cache. |
| ADD PARTITION | Adds a partition to the cached partition list of the corresponding table. |
| DROP PARTITION | Removes a partition from the cached partition list and invalidates the partition cache. |
| ALTER PARTITION | If renamed, deletes the old partition and creates a new one with the new name; otherwise, invalidates the partition cache. |
- When a data import changes files, partitioned tables trigger an `ALTER PARTITION` event, while non-partitioned tables trigger an `ALTER TABLE` event.
- If you bypass HMS and manipulate the file system directly, HMS does not generate the corresponding events, and Doris cannot detect the metadata changes.
The following parameters in fe.conf are related to this feature:

- `enable_hms_events_incremental_sync`: Enables automatic incremental metadata synchronization. Disabled by default.
- `hms_events_polling_interval_ms`: Interval for reading events; default is 10000 milliseconds.
- `hms_events_batch_size_per_rpc`: Maximum number of events to read per RPC; default is 500.
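As a reference, a fe.conf fragment that enables the subscription, with the defaults written out explicitly (the values shown are the documented defaults), could look like:

```
enable_hms_events_incremental_sync = true
hms_events_polling_interval_ms = 10000
hms_events_batch_size_per_rpc = 500
```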
To use this feature (excluding Huawei Cloud MRS), you need to modify the hive-site.xml of HMS and restart both HMS and HiveServer2.
<property>
<name>hive.metastore.event.db.notification.api.auth</name>
<value>false</value>
</property>
<property>
<name>hive.metastore.dml.events</name>
<value>true</value>
</property>
<property>
<name>hive.metastore.transactional.event.listeners</name>
<value>org.apache.hive.hcatalog.listener.DbNotificationListener</value>
</property>
For Huawei Cloud MRS, you need to modify the hivemetastore-site.xml and restart both HMS and HiveServer2.
<property>
<name>metastore.transactional.event.listeners</name>
<value>org.apache.hive.hcatalog.listener.DbNotificationListener</value>
</property>
Appendix
Transaction Mechanism
Write operations to Hive are placed in a separate transaction. Before the transaction is committed, the data is not visible externally; only after the transaction is committed do the related table operations become visible to others.
Transactions ensure the atomicity of operations: all operations within a transaction either succeed together or fail together.
Transactions cannot fully guarantee isolation of operations, but they strive to minimize the inconsistency window by separating file system operations from Hive Metastore metadata operations.
For example, in a transaction that requires modifying multiple partitions of a Hive table, if the task is divided into two batches, the first batch might be visible externally before the second batch is completed. This means the first batch of partitions can be read, but the second batch cannot.
If any exceptions occur during the transaction commit process, the transaction will be rolled back completely, including modifications to HDFS files and Hive Metastore metadata, without requiring any additional user intervention.
Concurrent Write Mechanism

Apache Doris currently supports concurrent writing through multiple INSERT statements. However, users need to ensure that concurrent writes do not result in potential conflicts.

Since regular non-transactional Hive tables lack a complete transaction mechanism, the Apache Doris transaction mechanism aims to minimize the inconsistency window but cannot guarantee true ACID properties. Therefore, concurrent writes to Hive tables in Apache Doris may lead to data consistency issues.
- Concurrent `INSERT` operations

  `INSERT` operations append data and do not conflict when executed concurrently, producing the expected results.

- Concurrent `INSERT OVERWRITE` operations

  Concurrent `INSERT OVERWRITE` operations on the same table or partition may lead to data loss or corruption, resulting in unpredictable outcomes.

  Common solutions include:

  - For partitioned tables, write data to different partitions. Concurrent operations on different partitions do not conflict.
  - For non-partitioned tables, use `INSERT` instead of `INSERT OVERWRITE` to avoid conflicts.
  - For operations that may conflict, ensure on the business side that only one write operation occurs at a time.
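As a sketch of the partition-based approach, the two statements below (catalog, table, and column names are hypothetical) overwrite different partitions and can therefore run concurrently without conflicting:

```sql
-- Session 1: rewrites only the 2024-01-01 partition
INSERT OVERWRITE TABLE hive_catalog.example_db.table2
SELECT * FROM source_tbl WHERE part_col = '2024-01-01';

-- Session 2: rewrites only the 2024-01-02 partition
INSERT OVERWRITE TABLE hive_catalog.example_db.table2
SELECT * FROM source_tbl WHERE part_col = '2024-01-02';
```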
HDFS File Operations

For Hive table data on HDFS, data is typically first written to a temporary directory and then finalized using file system operations such as rename. Below is a detailed explanation of the specific HDFS file operations for each type of data operation.

The temporary directory format is: `/tmp/.doris_staging/<username>/<uuid>`

The format of the written data file names is: `<query-id>_<uuid>-<index>.<compress-type>.<file-type>`
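To make the naming scheme concrete, here is a small Python sketch that parses a data file name of this form. The regex is a best-effort reading of the documented pattern (the compression suffix is treated as optional, since the ORC example below carries none); the real naming logic lives inside Doris and may differ in edge cases.

```python
import re

# <query-id>_<uuid>-<index>[.<compress-type>].<file-type>
# The query id itself contains a '-'; '_' separates it from the UUID.
FILE_NAME_RE = re.compile(
    r"^(?P<query_id>[0-9a-f]+-[0-9a-f]+)"   # e.g. b35fdbcea3a4e39-86d1f36987ef1492
    r"_(?P<uuid>[0-9a-f\-]{36})"            # e.g. 7e3985bf-9de9-4fc7-b84e-adf11aa08756
    r"-(?P<index>\d+)"                      # writer index, e.g. 0
    r"(?:\.(?P<compress>[a-z0-9]+))?"       # optional compression suffix, e.g. snappy
    r"\.(?P<file_type>[a-z]+)$"             # file format, e.g. orc, parquet
)

def parse_data_file_name(name: str) -> dict:
    """Split a Doris-written Hive data file name into its documented parts."""
    m = FILE_NAME_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognized data file name: {name}")
    return m.groupdict()

info = parse_data_file_name(
    "b35fdbcea3a4e39-86d1f36987ef1492_7e3985bf-9de9-4fc7-b84e-adf11aa08756-0.orc"
)
print(info["query_id"])   # b35fdbcea3a4e39-86d1f36987ef1492
print(info["file_type"])  # orc
```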
Here are examples of file operations under various scenarios:
- Non-Partitioned Table

  - Append (Add Data)

    - Target table directory: `hdfs://ns/usr/hive/warehouse/example.db/table1`
    - Temporary file: `hdfs://ns/tmp/.doris_staging/root/f02247cb662846038baae272af5eeb05/b35fdbcea3a4e39-86d1f36987ef1492_7e3985bf-9de9-4fc7-b84e-adf11aa08756-0.orc`

    During the commit phase, all temporary files are moved to the target table directory.

  - Overwrite (Replace Data)

    - Target table directory: `hdfs://ns/usr/hive/warehouse/example.db/table1`
    - Temporary file: `hdfs://ns/tmp/.doris_staging/root/f02247cb662846038baae272af5eeb05/b35fdbcea3a4e39-86d1f36987ef1492_7e3985bf-9de9-4fc7-b84e-adf11aa08756-0.orc`

    Commit phase steps:

    1. Rename the target table directory to a temporary directory: `hdfs://ns/usr/hive/warehouse/example.db/_temp_b35fdbcea3a4e39-86d1f36987ef1492_table1`
    2. Rename the temporary directory to the target table directory.
    3. Delete the temporary target table directory.
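The overwrite commit above is essentially a rename-based swap. The following Python sketch mimics the three steps on a local filesystem; the directory names and the `txn_id` parameter are illustrative only, and Doris itself performs the equivalent steps with HDFS renames rather than `pathlib` calls.

```python
import shutil
import tempfile
from pathlib import Path

def overwrite_commit(target: Path, staging: Path, txn_id: str) -> None:
    """Three-step overwrite commit: set aside, promote, clean up."""
    backup = target.parent / f"_temp_{txn_id}_{target.name}"
    target.rename(backup)    # 1. rename the target directory to a temporary directory
    staging.rename(target)   # 2. rename the staging directory to the target directory
    shutil.rmtree(backup)    # 3. delete the temporary (old) target directory

# Demo on a throwaway directory tree.
root = Path(tempfile.mkdtemp())
old = root / "table1"; old.mkdir(); (old / "old.orc").write_text("old")
new = root / "staging"; new.mkdir(); (new / "new.orc").write_text("new")
overwrite_commit(old, new, "txn123")
print(sorted(p.name for p in (root / "table1").iterdir()))  # ['new.orc']
```

Swapping whole directories this way keeps the window during which the table is missing or mixed as small as possible, which matches the isolation caveat described in the transaction mechanism above.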
- Partitioned Table

  - Add (Write to a New Partition)

    - Target table directory: `hdfs://ns/usr/hive/warehouse/example.db/table2/part_col=2024-01-01`
    - Temporary file: `hdfs://ns/tmp/.doris_staging/root/a7eac7505d7a42fdb06cb9ef1ea3e912/par1=a/d678a74d232345e0-b659e2fb58e86ffd_549ad677-ee75-4fa1-b8a6-3e821e1dae61-0.orc`

    During the commit phase, the temporary directory is renamed to the target partition directory.

  - Append (Add Data to an Existing Partition)

    - Target table directory: `hdfs://ns/usr/hive/warehouse/example.db/table2/part_col=2024-01-01`
    - Temporary file: `hdfs://ns/tmp/.doris_staging/root/a7eac7505d7a42fdb06cb9ef1ea3e912/par1=a/d678a74d232345e0-b659e2fb58e86ffd_549ad677-ee75-4fa1-b8a6-3e821e1dae61-0.orc`

    During the commit phase, files from the temporary directory are moved to the target partition directory.

  - Overwrite (Replace an Existing Partition)

    - Target table directory: `hdfs://ns/usr/hive/warehouse/example.db/table2/part_col=2024-01-01`
    - Temporary file: `hdfs://ns/tmp/.doris_staging/root/a7eac7505d7a42fdb06cb9ef1ea3e912/par1=a/d678a74d232345e0-b659e2fb58e86ffd_549ad677-ee75-4fa1-b8a6-3e821e1dae61-0.orc`

    Commit phase steps:

    1. Rename the target partition directory to a temporary partition directory: `hdfs://ns/usr/hive/warehouse/example.db/table2/_temp_d678a74d232345e0-b659e2fb58e86ffd_part_col=2024-01-01`
    2. Rename the temporary partition directory to the target partition directory.
    3. Delete the temporary target partition directory.
Changelog
| Doris Version | Feature Support |
|---|---|
| 2.1.6 | Support for writing back to Hive tables |
| 3.0.4 | Support for Hive tables in JsonSerDe format. Support for transactional tables in Hive4. |