Skip to main content
VeloDB Cloud 4.x·Apache Doris 3.x·"Since X.Y" tags refer to Doris versionsversion mapping →

VARIANT

VARIANT

Overview

The VARIANT type stores semi-structured JSON data. It can contain different primitive types (integers, strings, booleans, etc.), one-dimensional arrays, and nested objects. On write, Doris infers the structure and type of sub-paths based on JSON paths and performs Subcolumnization on frequent paths, exposing them as independent columnar subcolumns for both flexibility and performance.

Why choose VARIANT

VARIANT is a good fit when fields change over time but queries still depend on a small set of hot paths.

  • Hot paths participate in Subcolumnization, so they benefit from columnar performance, file pruning, and vectorized execution.
  • Key paths can use path-level indexes, full-text search, and still benefit from Doris sparse-index pruning.
  • In Doris 3.1 and later, wide-column optimizations keep automatic Subcolumnization practical at 10k-scale subcolumns. Note that DOC mode, the recommended storage layout for 10k-scale wide columns, is available starting from Doris 4.x; for 3.x, see the Workload Guide for tuning guidance.
Before you configure a workload

If this is your first VARIANT workload on Doris 3.x, start with VARIANT Workload Guide. This page is the reference for syntax, type rules, indexes, limits, and configuration.

Using VARIANT

Create table syntax

Declare a VARIANT column when creating a table:

CREATE TABLE IF NOT EXISTS ${table_name} (
k BIGINT,
v VARIANT
)
PROPERTIES("replication_num" = "1");

Constrain certain paths with a Schema Template (see “Extended types”):

This feature has been supported since version 3.1.0.

CREATE TABLE IF NOT EXISTS ${table_name} (
k BIGINT,
v VARIANT <
'id' : INT, -- restrict path id to INT
'message*' : STRING, -- restrict message* prefix to STRING
'tags*' : ARRAY<TEXT> -- restrict tags* prefix to ARRAY<TEXT>
>
)
PROPERTIES("replication_num" = "1");

Query syntax

-- Access nested fields (returns VARIANT; explicit or implicit CAST is required for aggregation/comparison)
SELECT v['properties']['title'] FROM ${table_name};

-- CAST to a concrete type before aggregation
SELECT CAST(v['properties']['title'] AS STRING) AS title
FROM ${table_name}
GROUP BY title;

-- Query arrays
SELECT *
FROM ${table_name}
WHERE ARRAY_CONTAINS(CAST(v['tags'] AS ARRAY<TEXT>), 'Doris');

Primitive types

VARIANT infers subcolumn types automatically. Supported types include:

Supported types
TinyInt
NULL (equivalent to JSON null)
BigInt (64 bit)
Double
String (Text)
Jsonb
Variant (nested object)
Array<T> (one-dimensional only)

Simple INSERT example:

INSERT INTO vartab VALUES
(1, 'null'),
(2, NULL),
(3, 'true'),
(4, '-17'),
(5, '123.12'),
(6, '1.912'),
(7, '"A quote"'),
(8, '[-1, 12, false]'),
(9, '{ "x": "abc", "y": false, "z": 10 }'),
(10, '"2021-01-01"');

Tip: Non-standard JSON types such as date/time will be stored as strings unless a Schema Template is provided. For better computation efficiency, consider extracting them to static columns or declaring their types via a Schema Template.

Extended types (Schema Template)

This feature has been supported since version 3.1.0.

Besides primitive types, VARIANT supports the following extended types via Schema Template:

  • Number (extended)
    • Decimal: Decimal32 / Decimal64 / Decimal128 / Decimal256
    • LargeInt
  • Datetime
  • Date
  • IPV4 / IPV6
  • Boolean
  • ARRAY<T> (T can be any of the above, one-dimensional only)

Note: Predefined Schema can only be specified at table creation. ALTER is currently not supported (future versions may support adding new subcolumn definitions, but changing an existing subcolumn type is not supported).

Example:

CREATE TABLE test_var_schema (
id BIGINT NOT NULL,
v1 VARIANT<
'large_int_val': LARGEINT,
'string_val': STRING,
'decimal_val': DECIMAL(38, 9),
'datetime_val': DATETIME,
'ip_val': IPV4
> NULL
)
PROPERTIES ("replication_num" = "1");

INSERT INTO test_var_schema VALUES (1, '{
"large_int_val" : "123222222222222222222222",
"string_val" : "Hello World",
"decimal_val" : 1.11111111,
"datetime_val" : "2025-05-16 11:11:11",
"ip_val" : "127.0.0.1"
}');

SELECT variant_type(v1) FROM test_var_schema;

+----------------------------------------------------------------------------------------------------------------------------+
| variant_type(v1) |
+----------------------------------------------------------------------------------------------------------------------------+
| {"datetime_val":"datetimev2","decimal_val":"decimal128i","ip_val":"ipv4","large_int_val":"largeint","string_val":"string"} |
+----------------------------------------------------------------------------------------------------------------------------+

{"date": 2020-01-01} and {"ip": 127.0.0.1} are invalid JSON texts; the correct format is {"date": "2020-01-01"} and {"ip": "127.0.0.1"}.

Once a Schema Template is specified, if a JSON value conflicts with the declared type and cannot be converted, it will be stored as NULL. For example:

INSERT INTO test_var_schema VALUES (1, '{
"decimal_val" : "1.11111111",
"ip_val" : "127.xxxxxx.xxxx",
"large_int_val" : "aaabbccc"
}');

-- Only decimal_val remains
SELECT * FROM test_var_schema;

+------+-----------------------------+
| id | v1 |
+------+-----------------------------+
| 1 | {"decimal_val":1.111111110} |
+------+-----------------------------+

Schema only guides the persisted storage type. During query execution, the effective type depends on actual data at runtime:

-- At runtime v['a'] may still be STRING
SELECT variant_type(CAST('{"a" : "12345"}' AS VARIANT<'a' : INT>)['a']);

Wildcard matching and order:

CREATE TABLE test_var_schema (
id BIGINT NOT NULL,
v1 VARIANT<
'enumString*' : STRING,
'enum*' : ARRAY<TEXT>,
'ip*' : IPV6
> NULL
)
PROPERTIES ("replication_num" = "1");

-- If enumString1 matches both patterns, the first matching pattern in definition order (STRING) is used

If a column name contains * and you want to match it by its literal name (not as a prefix wildcard), use:

v1 VARIANT<
MATCH_NAME 'enumString*' : STRING
> NULL

Matched subpaths participate in Subcolumnization by default and are exposed as columns. If too many paths match and generate excessive columns, consider enabling variant_enable_typed_paths_to_sparse (see “Configuration”).

Type conflicts and promotion rules

When incompatible types appear on the same path (e.g., the same field shows up as both integer and string), the type is promoted to JSONB to avoid information loss:

{"a" : 12345678}
{"a" : "HelloWorld"}
-- a will be promoted to JSONB

Promotion rules:

Source typeCurrent typeFinal type
TinyIntBigIntBigInt
TinyIntDoubleDouble
TinyIntStringJSONB
TinyIntArrayJSONB
BigIntDoubleJSONB
BigIntStringJSONB
BigIntArrayJSONB
DoubleStringJSONB
DoubleArrayJSONB
Array<Double>Array<String>Array<Jsonb>

If you need strict types (for stable indexing and storage), declare them via Schema Template.

Variant indexes

Choosing indexes

VARIANT supports BloomFilter and Inverted Index on subpaths.

  • High-cardinality equality/IN filters: prefer BloomFilter (sparser index, better write performance).
  • Tokenization/phrase/range search: use Inverted Index and set proper parser/analyzer properties.
...  
PROPERTIES("replication_num" = "1", "bloom_filter_columns" = "v");

-- Use BloomFilter for equality/IN filters
SELECT * FROM tbl WHERE v['id'] = 12345678;
SELECT * FROM tbl WHERE v['id'] IN (1, 2, 3);

Once an inverted index is created on a VARIANT column, all subpaths inherit the same index properties (e.g., parser):

CREATE TABLE IF NOT EXISTS tbl (
k BIGINT,
v VARIANT,
INDEX idx_v(v) USING INVERTED PROPERTIES("parser" = "english")
);

-- All subpaths inherit the english parser
SELECT * FROM tbl WHERE v['id_1'] MATCH 'Doris';
SELECT * FROM tbl WHERE v['id_2'] MATCH 'Apache';

Index by subpath

This feature has been supported since version 3.1.0.

In 3.1.x/4.0 and later, you can specify index properties for certain VARIANT subpaths, and even configure both tokenized and non-tokenized inverted indexes for the same path. Path-specific indexes require the path type to be declared via Schema Template.

-- Common properties: field_pattern (target path), analyzer, parser, support_phrase, etc.
CREATE TABLE IF NOT EXISTS tbl (
k BIGINT,
v VARIANT<'content' : STRING>,
INDEX idx_tokenized(v) USING INVERTED PROPERTIES("parser" = "english", "field_pattern" = "content"),
INDEX idx_v(v) USING INVERTED PROPERTIES("field_pattern" = "content")
);

-- v.content has both tokenized and non-tokenized inverted indexes
SELECT * FROM tbl WHERE v['content'] MATCH 'Doris';
SELECT * FROM tbl WHERE v['content'] = 'Doris';

Wildcard path indexing:

CREATE TABLE IF NOT EXISTS tbl (
k BIGINT,
v VARIANT<'pattern_*' : STRING>,
INDEX idx_tokenized(v) USING INVERTED PROPERTIES("parser" = "english", "field_pattern" = "pattern_*"),
INDEX idx_v(v) USING INVERTED -- global non-tokenized inverted index
);

SELECT * FROM tbl WHERE v['pattern_1'] MATCH 'Doris';
SELECT * FROM tbl WHERE v['pattern_1'] = 'Doris';

Note: 2.1.7+ supports only InvertedIndex V2 properties (fewer files, lower write IOPS; suitable for disaggregated storage/compute). 2.1.8+ removes offline Build Index.

When indexes don’t work

  1. Type changes cause index loss: if a subpath changes to an incompatible type (e.g., INT → JSONB), the index is lost. Fix by pinning types and indexes via Schema Template.
  2. Query type mismatch:
    -- v['id'] is actually STRING; using INT equality causes index not to be used
    SELECT * FROM tbl WHERE v['id'] = 123456;
  3. Misconfigured index: indexes apply to subpaths, not the entire VARIANT column.
    -- VARIANT itself cannot be indexed as a whole
    SELECT * FROM tbl WHERE v MATCH 'Doris';

    -- If whole-JSON search is needed, store a duplicate STRING column and index it
    CREATE TABLE IF NOT EXISTS tbl (
    k BIGINT,
    v VARIANT,
    v_str STRING,
    INDEX idx_v_str(v_str) USING INVERTED PROPERTIES("parser" = "english")
    );
    SELECT * FROM tbl WHERE v_str MATCH 'Doris';

Variant Nested Group

Variant Nested Group is a 4.1.x storage and search capability for array<object> values inside VARIANT, including top-level arrays of objects. When enabled, Doris organizes nested object subpaths by array element, which is useful for logs, traces, event details, and other records that contain arrays of objects.

The key query semantics are same-element matching. If a row contains multiple items elements, msg:hello AND title:news matches only when both predicates are true within the same array element, avoiding accidental matches across different elements.

Enable Nested Group

For new tables, explicitly enable Nested Group on the VARIANT column that needs it:

CREATE TABLE variant_nested_group_tbl (
id INT,
data VARIANT<PROPERTIES("variant_enable_nested_group" = "true")>,
INDEX idx_data (data) USING INVERTED PROPERTIES("parser" = "english")
) ENGINE=OLAP
DUPLICATE KEY(id)
DISTRIBUTED BY HASH(id) BUCKETS 1
PROPERTIES (
"replication_num" = "1"
);

You can also set the session default before creating a table:

SET default_variant_enable_nested_group = true;

If both the column property and the session variable are set, the column property takes precedence. This property can only be specified during table creation and cannot be changed with ALTER.

Write and read nested arrays

Nested Group supports array paths inside objects and top-level arrays:

INSERT INTO variant_nested_group_tbl VALUES
(1, '{"items":[{"msg":"hello","title":"news"},{"msg":"foo","title":"bar"}]}'),
(2, '{"items":[{"msg":"hello"},{"title":"news"}]}'),
(3, '[{"msg":"hello","title":"news"}]'),
(4, NULL);

-- Read an array path inside an object
SELECT id, CAST(data['items'] AS STRING)
FROM variant_nested_group_tbl
ORDER BY id;

-- A top-level array is still read through the VARIANT column itself
SELECT id, CAST(data AS STRING)
FROM variant_nested_group_tbl
WHERE id = 3;

After creating an inverted index on the VARIANT column, use NESTED(path, predicate) in the SEARCH DSL to combine predicates within the same nested array element:

-- Same-element match under object field items. This matches only row 1.
SELECT id
FROM variant_nested_group_tbl
WHERE SEARCH('NESTED(data.items, msg:hello AND title:news)')
ORDER BY id;

-- Same-element match on a top-level array. This matches only row 3.
SELECT id
FROM variant_nested_group_tbl
WHERE SEARCH('NESTED(data, msg:hello AND title:news)')
ORDER BY id;

-- Single-field match
SELECT id
FROM variant_nested_group_tbl
WHERE SEARCH('NESTED(data.items, msg:hello)')
ORDER BY id;

Fields inside NESTED must be relative to the nested path. For example, use NESTED(data.items, msg:hello), not NESTED(data.items, data.items.msg:hello). Inner predicates support regular SEARCH DSL terms, phrases, ANY/ALL, and AND/OR/NOT, but nested NESTED() calls are not supported.

Complete top-level array example

If the VARIANT column itself is a top-level array of objects, use the column name directly as the NESTED path. In the example below, row 2 has msg=hello and title=news in different array elements, so the same-element AND query does not match row 2:

CREATE TABLE variant_top_array_tbl (
id INT,
data VARIANT<PROPERTIES("variant_enable_nested_group" = "true")>,
INDEX idx_data (data) USING INVERTED PROPERTIES("parser" = "english")
) ENGINE=OLAP
DUPLICATE KEY(id)
DISTRIBUTED BY HASH(id) BUCKETS 1
PROPERTIES (
"replication_num" = "1"
);

INSERT INTO variant_top_array_tbl VALUES
(1, '[{"msg":"hello","title":"news"},{"msg":"foo","title":"bar"}]'),
(2, '[{"msg":"hello"},{"title":"news"}]'),
(3, '[{"msg":"hello","title":"sports"}]');

-- Read the top-level array
SELECT id, CAST(data AS STRING)
FROM variant_top_array_tbl
ORDER BY id;

-- Same-element match for msg=hello and title=news. This matches only id=1.
SELECT id
FROM variant_top_array_tbl
WHERE SEARCH('NESTED(data, msg:hello AND title:news)')
ORDER BY id;

-- Single-field match for msg=hello. This matches id=1, 2, and 3.
SELECT id
FROM variant_top_array_tbl
WHERE SEARCH('NESTED(data, msg:hello)')
ORDER BY id;

Expand arrays with explode

To expand array elements into rows for projection or filtering, use LATERAL VIEW explode. In a Nested Group table, non-top-level and top-level arrays differ only in the expression passed to explode:

  • Non-top-level array: pass the array path, such as data['items']
  • Top-level array: pass the VARIANT column itself, such as data
-- Non-top-level array: expand elements under object field items
SELECT
id,
CAST(e['msg'] AS STRING) AS msg,
CAST(e['title'] AS STRING) AS title
FROM variant_nested_group_tbl
LATERAL VIEW explode(data['items']) tmp AS e
WHERE CAST(e['msg'] AS STRING) = 'hello'
ORDER BY id, title;

-- Top-level array: expand the VARIANT column itself
SELECT
id,
CAST(e['msg'] AS STRING) AS msg,
CAST(e['title'] AS STRING) AS title
FROM variant_top_array_tbl
LATERAL VIEW explode(data) tmp AS e
WHERE CAST(e['msg'] AS STRING) = 'hello'
ORDER BY id, title;

explode returns each array element as a row, and each element can still be accessed through VARIANT subpaths. Use SEARCH('NESTED(...)') for indexed same-element matching; use LATERAL VIEW explode(...) when you need to flatten elements into rows before further computation or filtering.

Compatibility and limits

  • variant_enable_nested_group = true is mutually exclusive with variant_enable_doc_mode = true.
  • variant_enable_nested_group = true is mutually exclusive with variant_max_subcolumns_count > 0; after enabling Nested Group, the column no longer uses sparse-column properties.
  • The legacy variant_enable_flatten_nested switch is deprecated and is not the same feature as variant_enable_nested_group. For 4.1.x new tables, use variant_enable_nested_group.
  • The BE config variant_nested_group_max_depth controls the maximum nested-array depth tracked by Nested Group. The default is 10; deeper paths are stored as JSONB.
  • Keep the data shape stable for the same path. If the same path is written as both a scalar and an array<object>, conflict handling is triggered, and the nested array structure takes precedence by default.

INSERT and load

INSERT INTO VALUES

CREATE TABLE IF NOT EXISTS variant_tbl (
k BIGINT,
v VARIANT
) PROPERTIES("replication_num" = "1");

INSERT INTO variant_tbl VALUES (1, '{"a" : 123}');

select * from variant_tbl;
+------+-----------+
| k | v |
+------+-----------+
| 1 | {"a":123} |
+------+-----------+

-- v['a'] is a VARIANT
select v['a'] from variant_tbl;
+--------+
| v['a'] |
+--------+
| 123 |
+--------+

-- Accessing a non-existent key returns NULL
select v['a']['no_such_key'] from variant_tbl;;
+-----------------------+
| v['a']['no_such_key'] |
+-----------------------+
| NULL |
+-----------------------+

Load (Stream Load)

# Line-delimited JSON (one JSON record per line)
curl --location-trusted -u root: -T gh_2022-11-07-3.json \
-H "read_json_by_line:true" -H "format:json" \
http://127.0.0.1:8030/api/test_variant/github_events/_stream_load

See also: https://doris.apache.org/docs/dev/data-operate/import/complex-types/variant

After loading, verify with SELECT count(*) or sample with SELECT * ... LIMIT 1. For high-throughput ingestion, prefer RANDOM bucketing and enable Group Commit.

Output

The JSON text returned when reading a VARIANT column is not byte-for-byte identical to the JSON text that was written in: inside a JSON object, keys are emitted in sorted (lexicographic) order regardless of the order they appeared in the input JSON.

INSERT INTO variant_tbl VALUES
(2, '{ "b": 2, "a": 1, "c": { "y": 20, "x": 10 } }');

SELECT v FROM variant_tbl WHERE k = 2;
+-----------------------------------+
| v |
+-----------------------------------+
| {"a":1,"b":2,"c":{"x":10,"y":20}} |
+-----------------------------------+

Sorting applies at every level — top-level keys become a, b, c, and the nested object's keys become x, y.

Supported operations and CAST rules

  • VARIANT cannot be compared/operated directly with other types; comparisons between two VARIANTs are not supported either.
  • For comparison, filtering, aggregation, and ordering, CAST subpaths to concrete types (explicitly or implicitly).
-- Explicit CAST
SELECT CAST(v['arr'] AS ARRAY<TEXT>) FROM tbl;
SELECT * FROM tbl WHERE CAST(v['decimal'] AS DECIMAL(27, 9)) = 1.111111111;
SELECT * FROM tbl WHERE CAST(v['date'] AS DATE) = '2021-01-02';

-- Implicit CAST
SELECT * FROM tbl WHERE v['bool'];
SELECT * FROM tbl WHERE v['str'] MATCH 'Doris';
  • VARIANT itself cannot be used directly in ORDER BY, GROUP BY, as a JOIN KEY, or as an aggregate argument; CAST subpaths instead.
  • Strings can be implicitly converted to VARIANT.
VARIANTCastableCoercible
ARRAY
BOOLEAN
DATE/DATETIME
FLOAT
IPV4/IPV6
DECIMAL
MAP
TIMESTAMP
VARCHAR
JSON

Limitations

  • variant_enable_nested_group = true is mutually exclusive with variant_enable_doc_mode = true and variant_max_subcolumns_count > 0. It can only be specified at table creation.
  • Fields inside NESTED() must be relative to the nested path, and nested NESTED() calls are not supported.
  • variant_max_subcolumns_count (3.1+): default 0 (no limit). In production, set to 2048 (tablet level) to control the number of paths that go through Subcolumnization. Above the threshold, low-frequency/sparse paths are moved to a shared data structure; reading from it may be slower (see “Configuration”).
  • If a path type is specified via Schema Template (3.1+), that path will be forced into Subcolumnization; when variant_enable_typed_paths_to_sparse = true, it also counts toward the threshold and may be moved to the shared structure.
  • JSON key length ≤ 255.
  • Cannot be a primary key or sort key.
  • Cannot be nested within other types (e.g., Array<Variant>, Struct<Variant>).
  • Reading the entire VARIANT column scans all subpaths. If a column has many subpaths, consider storing the original JSON string in an extra STRING/JSONB column for whole-object searches like LIKE:
CREATE TABLE example_table (
id INT,
data_variant VARIANT
);
SELECT * FROM example_table WHERE data_variant LIKE '%doris%';

-- Better: keep the original JSON string for whole-object matching
CREATE TABLE example_table (
id INT,
data_string STRING,
data_variant VARIANT
);
SELECT * FROM example_table WHERE data_string LIKE '%doris%';

Configuration

Since version 3.1.0, all column-level VARIANT properties below are supported. They can only be specified at table creation and cannot be modified using ALTER.

CREATE TABLE example_table (
id INT,
data_variant VARIANT<
'path_1' : INT,
'path_2' : STRING,
properties(
'variant_max_subcolumns_count' = '2048',
'variant_enable_typed_paths_to_sparse' = 'true'
)
>
);
PropertyDescription
`variant_max_subcolumns_count`Max number of paths that can go through Subcolumnization. Above the threshold, new paths may be stored in a shared data structure. Default: 2048 (Recommended). 0 means no limit; do not exceed 10000.
`variant_enable_typed_paths_to_sparse`By default, typed paths always participate in Subcolumnization (and do not count against `variant_max_subcolumns_count`). When set to `true`, typed paths also count toward the threshold and may be moved to the shared structure.
CREATE TABLE example_nested_group_table (
id INT,
data_variant VARIANT<
properties(
'variant_enable_nested_group' = 'true'
)
>
);
Nested Group propertyDescription
`variant_enable_nested_group`Enables Variant Nested Group. When set to `true`, `array<object>` values inside `VARIANT`, including top-level arrays of objects, are stored by nested element and can be queried with `SEARCH('NESTED(...)')` for same-element matching. The default follows the session variable `default_variant_enable_nested_group`, whose current default is `false`. This property can only be specified at table creation.
`default_variant_enable_nested_group`Session variable that controls whether newly created `VARIANT` columns enable Nested Group by default when the column property is not specified.
`variant_nested_group_max_depth`BE configuration that controls the maximum nested-array depth tracked by Nested Group. The default is 10.

Behavior at limits and tuning suggestions (items 1-3 apply to 3.1+ only):

  1. After exceeding the threshold, new paths are written into the shared structure; Rowset merges may also recycle some paths into the shared structure.
  2. The system prefers to keep paths with higher non-null ratios and higher access frequencies in Subcolumnization.
  3. Close to 10,000 paths in Subcolumnization requires strong hardware (≥128G RAM, ≥32C per node recommended).
  4. Ingestion tuning: increase client batch_size appropriately, or use Group Commit (increase group_commit_interval_ms/group_commit_data_bytes as needed).
  5. If partition pruning is not needed, consider RANDOM bucketing and enabling single-tablet loading to reduce compaction write amplification.
  6. BE tuning knobs: max_cumu_compaction_threads (≥8), vertical_compaction_num_columns_per_group=500 (improves vertical compaction but increases memory), segment_cache_memory_percentage=20 (improves metadata cache efficiency).
  7. Watch Compaction Score; if it keeps rising, compaction is lagging—reduce ingestion pressure.
  8. Avoid large SELECT * on VARIANT; prefer specific projections like SELECT v['path'].

Note: If you see Stream Load error [DATA_QUALITY_ERROR]Reached max column size limit 2048 (only on 2.1.x and 3.0.x), it means the merged tablet schema reached its column limit. You may increase variant_max_merged_tablet_schema_size (not recommended beyond 4096; requires strong hardware).

Inspect number of columns and types

Solution 1 is supported starting from version 3.1.0. For earlier versions, it is recommended to use DESC.

Approach 1: use variant_type to inspect per-row schema (more precise, higher cost):

SELECT variant_type(v) FROM variant_tbl;

Approach 2: extended DESC to show subpaths extracted through Subcolumnization:

SET describe_extend_variant_column = true;
DESC variant_tbl;
DESCRIBE ${table_name} PARTITION ($partition_name);

Use both: Approach 1 is precise; Approach 2 is efficient.

Compared with JSON type

  • Storage: JSON is stored as JSONB (row-oriented). VARIANT uses Subcolumnization on write (higher compression, smaller size).
  • Query: JSON requires parsing. VARIANT scans columns directly and is usually much faster.

ClickBench (43 queries):

  • Storage: VARIANT saves ~65% vs JSON.
  • Query: VARIANT is 8x+ faster than JSON, close to predefined static columns.

Storage space

TypeSize
Predefined columns12.618 GB
VARIANT12.718 GB
JSON35.711 GB

~65% space savings

RunPredefinedVARIANTJSON
First (cold)233.79s248.66sMost timed out
Second (hot)86.02s94.82s789.24s
Third (hot)83.03s92.29s743.69s

FAQ

  1. Are null in VARIANT and SQL NULL different?
    • No. They are equivalent.
  2. Why doesn’t my query/index work?
    • Check whether you CAST paths to the correct types; whether the type was promoted to JSONB due to conflicts; or whether you mistakenly expect an index on the whole VARIANT instead of on subpaths.
  3. What is the difference between Variant Nested Group and the old variant_enable_flatten_nested switch?
    • variant_enable_flatten_nested is a legacy experimental switch and is not recommended for 4.1.x new tables. variant_enable_nested_group is the new storage and search capability for arrays of nested objects, and it supports same-element predicate matching with SEARCH('NESTED(...)').