IPV6
IPV6
description
IPv6 type, stored in UInt128 format in 16 bytes, used to represent IPv6 addresses. The range of values is ['::', 'ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff'].
Inputs that exceed the value range or have invalid format will return NULL
example
Create table example:
CREATE TABLE ipv6_test (
`id` int,
`ip_v6` ipv6
) ENGINE=OLAP
DISTRIBUTED BY HASH(`id`) BUCKETS 4
PROPERTIES (
"replication_allocation" = "tag.location.default: 1"
);
Insert data example:
insert into ipv6_test values(1, '::');
insert into ipv6_test values(2, '2001:16a0:2:200a::2');
insert into ipv6_test values(3, 'ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff');
insert into ipv6_test values(4, 'ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffffg'); // invalid data
Select data example:
mysql> select * from ipv6_test order by id;
+------+-----------------------------------------+
| id | ip_v6 |
+------+-----------------------------------------+
| 1 | :: |
| 2 | 2001:16a0:2:200a::2 |
| 3 | ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff |
| 4 | NULL |
+------+-----------------------------------------+
Best Practice: store mixed IPv4/IPv6 in a single IPV6 column with an inverted index
A single IPV6 column can store both IPv4 and IPv6 addresses: write IPv4 in its IPv4-mapped IPv6 form (::ffff:x.y.z.w) — for example 192.168.1.1 is stored as ::ffff:192.168.1.1 — while native IPv6 (such as 2001:16a0:2:200a::2) is written as-is. This lets a mixed-IP field be managed in one column, and an inverted index on that column accelerates IP filtering, supporting equality (=), IN, and CIDR functions such as is_ip_address_in_range().
Schema
CREATE TABLE access_log (
`id` BIGINT NOT NULL,
`request_time` DATETIME NOT NULL,
`client_ip` IPV6 NULL COMMENT 'Stored uniformly as IPv6; IPv4 stored in ::ffff: form',
INDEX idx_client_ip (`client_ip`) USING INVERTED
) ENGINE=OLAP
DUPLICATE KEY(`id`, `request_time`)
DISTRIBUTED BY HASH(`id`) BUCKETS 4
PROPERTIES (
"replication_allocation" = "tag.location.default: 1"
);
Normalizing on load
Regardless of the load method, the key is to use the same expression in the column mapping to convert the raw IP string into IPv6 (below, ip_str is the source field and client_ip is the table's IPV6 column):
client_ip = COALESCE(
ipv4_to_ipv6(to_ipv4_or_null(ip_str)),
to_ipv6_or_null(ip_str)
)
- A plain IPv4 string is first parsed to
IPV4byto_ipv4_or_null(), then converted to::ffff:x.y.z.wbyipv4_to_ipv6(); - If it is not a valid IPv4,
COALESCEfalls back toto_ipv6_or_null()and parses it as native IPv6 text; - If both fail,
NULLis written. To reject invalid data instead, replace_or_nullwith the strictto_ipv4/to_ipv6and handle it via the load error threshold.
Routine Load
CREATE ROUTINE LOAD example_db.access_log_job ON access_log
COLUMNS TERMINATED BY ",",
COLUMNS(
id, request_time, ip_str,
client_ip = COALESCE(
ipv4_to_ipv6(to_ipv4_or_null(ip_str)),
to_ipv6_or_null(ip_str)
)
)
PROPERTIES ("desired_concurrent_number" = "3", "max_error_number" = "1000")
FROM KAFKA ("kafka_broker_list" = "broker:9092", "kafka_topic" = "ip_logs");
Stream Load
Apply the same conversion in the columns header:
curl --location-trusted -u user:passwd \
-H "format: csv" \
-H "column_separator: ," \
-H "columns: id, request_time, ip_str, client_ip=COALESCE(ipv4_to_ipv6(to_ipv4_or_null(ip_str)), to_ipv6_or_null(ip_str))" \
-T data.csv \
"http://<fe_host>:8030/api/example_db/access_log/_stream_load"
INSERT INTO (migrating from an old VARCHAR table)
Convert the raw IP string to IPv6 in the SELECT (ip_str is the VARCHAR column of the old table):
INSERT INTO access_log (id, request_time, client_ip)
SELECT
id, request_time,
COALESCE(
ipv4_to_ipv6(to_ipv4_or_null(ip_str)),
to_ipv6_or_null(ip_str)
)
FROM access_log_varchar;
Rewriting queries (important)
Once everything is stored as mapped IPv6, IPv4 query conditions must also be rewritten into mapped IPv6 form; otherwise a type / address-family mismatch causes the index to be skipped:
-
IPv6 CIDR: write it directly.
is_ip_address_in_range(client_ip, '2001:4860:4801::/48') -
IPv4 CIDR → mapped IPv6 CIDR: add the
::ffff:prefix to the address and add +96 to the prefix length (the first 96 bits of a mapped address are a fixed prefix).IPv4 CIDR mapped IPv6 CIDR 10.42.0.0/16::ffff:10.42.0.0/112192.178.4.0/24::ffff:192.178.4.0/1203.219.120.76/32::ffff:3.219.120.76/128is_ip_address_in_range(client_ip, '::ffff:10.42.0.0/112') -
Exact IPv4 match: compare against the mapped value too.
client_ip = ipv4_to_ipv6(to_ipv4('3.219.120.76'))
-- or
client_ip = to_ipv6('::ffff:3.219.120.76') -
Allowlist (keep only rows in the range): use
is_ip_address_in_range(...)directly as the filter.WHERE is_ip_address_in_range(client_ip, '::ffff:10.42.0.0/112') -
Blocklist (exclude rows in the range): use
NOT is_ip_address_in_range(...), and avoid theis_ip_address_in_range(...) = 0form (which is harder to push down to the index).WHERE NOT is_ip_address_in_range(client_ip, '::ffff:10.42.0.0/112')
Full example (a stat query that excludes several ranges). Because it contains CIDR range conditions, lower the threshold first to ensure the index is used:
SET inverted_index_skip_threshold = 0;
SELECT client_ip, COUNT(*) AS cnt
FROM access_log
WHERE request_time BETWEEN '2026-06-01 00:00:00' AND '2026-06-02 00:00:00'
AND client_ip NOT IN (to_ipv6('::ffff:3.219.120.76')) -- exact IPv4 exclusion -> mapped
AND NOT is_ip_address_in_range(client_ip, '::ffff:10.42.0.0/112') -- IPv4 /16 -> /112
AND NOT is_ip_address_in_range(client_ip, '2001:4860:4801::/48') -- IPv6 range
GROUP BY client_ip;
Conditions for hitting the index
- The column must be a native
IPV4/IPV6type with aUSING INVERTEDinverted index built on it; - The CIDR argument of
is_ip_address_in_range()must be a constant string; - An
IPV6column only handles IPv6 CIDRs (IPv4 ranges must first be converted to::ffff:mapped form), and anIPV4column only handles IPv4 CIDRs; a type / CIDR-family mismatch skips the index and falls back to expression filtering; - The session variable
inverted_index_skip_thresholddefaults to50: when more than 50% of the rows in a segment match the index, the index is skipped and data is read directly (bypass).
inverted_index_skip_thresholdFor a CIDR range query, is_ip_address_in_range() is internally split into two BKD queries, >= range start and <= range end. Each side alone often matches more than 50% of the rows (even when the final intersection is narrow), triggering bypass and skipping the index. Such queries therefore need SET inverted_index_skip_threshold = 0; (or a smaller value) in the session to hit the index reliably. Equality / IN are high-selectivity point lookups and are unaffected.
keywords
IPV6