CREATE-ASYNC-MATERIALIZED-VIEW
Name
CREATE ASYNC MATERIALIZED VIEW
Description
This statement is used to create an asynchronous materialized view.
syntax
CREATE MATERIALIZED VIEW (IF NOT EXISTS)? mvName=multipartIdentifier
(LEFT_PAREN cols=simpleColumnDefs RIGHT_PAREN)? buildMode?
(REFRESH refreshMethod? refreshTrigger?)?
((DUPLICATE)? KEY keys=identifierList)?
(COMMENT STRING_LITERAL)?
(PARTITION BY LEFT_PAREN mvPartition RIGHT_PAREN)?
(DISTRIBUTED BY (HASH hashKeys=identifierList | RANDOM) (BUCKETS (INTEGER_VALUE | AUTO))?)?
propertyClause?
AS query
illustrate
simpleColumnDefs
Used to define the materialized view column information, if not defined, it will be automatically derived
simpleColumnDefs
: cols+=simpleColumnDef (COMMA cols+=simpleColumnDef)*
;
simpleColumnDef
: colName=identifier (COMMENT comment=STRING_LITERAL)?
;
For example, define two columns aa and bb, where the annotation for aa is "name"
CREATE MATERIALIZED VIEW mv1
(aa comment "name",bb)
buildMode
Used to define whether the materialized view is refreshed immediately after creation, default to IMMEDIATE
IMMEDIATE:Refresh Now
DEFERRED:Delay refresh
buildMode
: BUILD (IMMEDIATE | DEFERRED)
;
For example, specifying the materialized view to refresh immediately
CREATE MATERIALIZED VIEW mv1
BUILD IMMEDIATE
refreshMethod
Used to define the refresh method for materialized views, default to AUTO
COMPLETE:Full refresh
AUTO:Try to refresh incrementally as much as possible. If incremental refresh is not possible, refresh in full
The SQL definition and partition fields of the materialized view need to meet the following conditions for partition incremental updates:
- At least one of the base tables used by the materialized view is a partitioned table.
- The base tables used by the materialized view must use list or range partitioning strategies.
- The
partition by
clause in the SQL definition of the materialized view can only have one partitioning column. - The partitioning column specified in the
partition by
clause of the materialized view's SQL must come after theSELECT
statement. - If the materialized view's SQL includes a
group by
clause, the columns used for partitioning must come after thegroup by
clause. - If the materialized view's SQL includes window functions, the columns used for partitioning must come after the
partition by
clause. - Data changes should occur on partitioned tables. If data changes occur on non-partitioned tables, the materialized view needs to be fully rebuilt.
- If a field from the null-generating side of a join is used as a partitioning column for the materialized view, it cannot be incrementally updated by partition. For example, for a LEFT OUTER JOIN, the partitioning column must be on the left side, not the right.
refreshMethod
: COMPLETE | AUTO
;
For example, specifying full refresh of materialized views
CREATE MATERIALIZED VIEW mv1
REFRESH COMPLETE
refreshTrigger
Trigger method for refreshing data in materialized views, default to MANUAL
MANUAL:Manual refresh
SCHEDULE:Timed refresh
COMMIT: Trigger-based refresh. When the base table data changes, a task to refresh the materialized view is automatically generated.
refreshTrigger
: ON MANUAL
| ON SCHEDULE refreshSchedule
| ON COMMIT
;
refreshSchedule
: EVERY INTEGER_VALUE mvRefreshUnit (STARTS STRING_LITERAL)?
;
mvRefreshUnit
: MINUTE | HOUR | DAY | WEEK
;
For example: executed every 2 hours, starting from 21:07:09 on December 13, 2023
CREATE MATERIALIZED VIEW mv1
REFRESH ON SCHEDULE EVERY 2 HOUR STARTS "2023-12-13 21:07:09"
key
The materialized view is the DUPLICATE KEY model, therefore the specified columns are arranged in sequence
identifierList
: LEFT_PAREN identifierSeq RIGHT_PAREN
;
identifierSeq
: ident+=errorCapturingIdentifier (COMMA ident+=errorCapturingIdentifier)*
;
For example, specifying k1 and k2 as sorting sequences
CREATE MATERIALIZED VIEW mv1
KEY(k1,k2)
partition
There are two types of partitioning methods for materialized views. If no partitioning is specified, there will be a default single partition. If a partitioning field is specified, the system will automatically deduce the source base table of that field and synchronize all partitions of the base table (currently supporting OlapTable
and hive
). (Limitation: If the base table is an OlapTable
, it can only have one partition field)
For example, if the base table is a range partition with a partition field of create_time
and partitioning by day, and partition by(ct) as select create_time as ct from t1
is specified when creating a materialized view,
then the materialized view will also be a range partition with a partition field of 'ct' and partitioning by day
Materialized views can also reduce the number of partitions by using partition roll-up. Currently, the partition roll-up function supports date_trunc
, and the roll-up units supported are year
, month
, and day
.
The selection of partition fields and the definition of materialized views must meet the conditions for partition incremental updates for the materialized view to be created successfully; otherwise, an error "Unable to find a suitable base table for partitioning" will occur.
mvPartition
: partitionKey = identifier
| partitionExpr = functionCallExpression
;
For example, if the base table is partitioned by day, the materialized view is also partitioned by day.
partition by (`k2`)
For example, if the base table is partitioned by day, the materialized view is partitioned by month.
partition by (date_trunc(`k2`,'month'))
property
The materialized view can specify both the properties of the table and the properties unique to the materialized view.
The properties unique to materialized views include:
grace_period
: When performing query rewrites, there is a maximum allowed delay time (measured in seconds) for the data of the materialized view. If there is a discrepancy between the data of partition A and the base table, and the last refresh time of partition A of the materialized view was 1, while the current system time is 2, then this partition will not undergo transparent rewriting. However, if the grace_period is greater than or equal to 1, this partition will be used for transparent rewriting.
excluded_trigger_tables
: Table names ignored during data refresh, separated by commas. For example, table1, table2
refresh_partition_num
: The number of partitions refreshed by a single insert statement is set to 1 by default. When refreshing a materialized view, the system first calculates the list of partitions to be refreshed and then splits it into multiple insert statements that are executed in sequence according to this configuration. If any insert statement fails, the entire task will stop executing. The materialized view ensures the transactionality of individual insert statements, meaning that failed insert statements will not affect partitions that have already been successfully refreshed.
workload_group
: The name of the workload_group used by the materialized view when performing refresh tasks. This is used to limit the resources used for refreshing data in the materialized view, in order to avoid affecting the operation of other business processes. For details on how to create and use workload_group, refer to WORKLOAD-GROUP
query
: Create a query statement for the materialized view, and the result is the data in the materialized view
enable_nondeterministic_function
: Whether the SQL definition of the materialized view allows containing nondeterministic
functions, such as current_date(), now(), random(), etc. If true, they are allowed; otherwise, they are not allowed.
By default, they are not allowed.
Example
-
Create a materialized view mv1 that refreshes immediately and then once a week, with the data source being the hive catalog
CREATE MATERIALIZED VIEW mv1 BUILD IMMEDIATE REFRESH COMPLETE ON SCHEDULE EVERY 1 WEEK DISTRIBUTED BY RANDOM BUCKETS 2 PROPERTIES ( "replication_num" = "1" ) AS SELECT * FROM hive_catalog.db1.user;
-
Create a materialized view with multiple table joins
CREATE MATERIALIZED VIEW mv1 BUILD IMMEDIATE REFRESH COMPLETE ON SCHEDULE EVERY 1 WEEK DISTRIBUTED BY RANDOM BUCKETS 2 PROPERTIES ( "replication_num" = "1" ) AS select user.k1,user.k3,com.k4 from user join com on user.k1=com.k1;
Keywords
CREATE, ASYNC, MATERIALIZED, VIEW