メインコンテンツまでスキップ
バージョン: 26.x

LINEAR_HISTOGRAM

説明

LINEAR_HISTOGRAM関数はデータの分布を記述するために使用されます。この関数は「等幅」バケット戦略を使用し、値のサイズに応じてデータをバケットに分割します。

構文

LINEAR_HISTOGRAM(<expr>, DOUBLE <interval>[, DOUBLE <offset>])

パラメータ

パラメータ説明
intervalバケットの幅。サポートされる型: TinyInt, SmallInt, Int, BigInt, LargeInt, Float, Double, Decimal。
offsetオプション。デフォルトは0、範囲は [0, interval)、型Double をサポート。

戻り値

JSON型の計算された値を返します。

-- setup
create table histogram_test(
a int
) distributed by hash(a) buckets 1
properties ("replication_num"="1");
insert into histogram_test values
(0), (1), (2), (3), (4), (5), (6), (7), (8), (9), (10), (11), (null);
select linear_histogram(a, 2) from histogram_test;
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| linear_histogram(a, 2) |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| {"num_buckets":6,"buckets":[{"lower":0.0,"upper":2.0,"count":2,"acc_count":2},{"lower":2.0,"upper":4.0,"count":2,"acc_count":4},{"lower":4.0,"upper":6.0,"count":2,"acc_count":6},{"lower":6.0,"upper":8.0,"count":2,"acc_count":8},{"lower":8.0,"upper":10.0,"count":2,"acc_count":10},{"lower":10.0,"upper":12.0,"count":2,"acc_count":12}]} |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
select linear_histogram(a, 2, 1) from histogram_test;
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| linear_histogram(a, 2, 1) |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| {"num_buckets":7,"buckets":[{"lower":-1.0,"upper":1.0,"count":1,"acc_count":1},{"lower":1.0,"upper":3.0,"count":2,"acc_count":3},{"lower":3.0,"upper":5.0,"count":2,"acc_count":5},{"lower":5.0,"upper":7.0,"count":2,"acc_count":7},{"lower":7.0,"upper":9.0,"count":2,"acc_count":9},{"lower":9.0,"upper":11.0,"count":2,"acc_count":11},{"lower":11.0,"upper":13.0,"count":1,"acc_count":12}]} |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
select linear_histogram(a, 2, 1) from histogram_test where a is null;
+--------------------------------+
| linear_histogram(a, 2, 1) |
+--------------------------------+
| {"num_buckets":0,"buckets":[]} |

フィールドの説明:

  • num_buckets:バケットの数。
  • buckets:ヒストグラム内のバケット。
    • lower:下限(含む)。
    • upper:上限(含まない)。
    • count:バケット内の要素数。
    • acc_count:このバケットまでの累積カウント。