VeloDB Enterprise
Release Notes

Enterprise Core

Enterprise Core 2.1.x

Enterprise Core 2.1.7

November 12, 2024

Behavior changes

  • The following global variables will be forcibly set to the following default values:
    • enable_nereids_dml: true
    • enable_nereids_dml_with_pipeline: true
    • enable_nereids_planner: true
    • enable_fallback_to_original_planner: true
    • enable_pipeline_x_engine: true
  • New columns have been added to the audit log.

New features

  • Async Materialized View
    • Asynchronous materialized views have a new property, use_for_rewrite, that controls whether a view participates in transparent rewriting
  • Query Execution
    • The list of changed session variables is now output in the Profile
    • Support for trim_in, ltrim_in, and rtrim_in functions has been added
    • Support for several URL functions (top_level_domain, first_significant_subdomain, cut_to_first_significant_subdomain) has been added
    • The bit_set function has been added
    • The count_substrings function has been added
    • The translate and url_encode functions have been added
    • The normal_cdf, to_iso8601, and from_iso8601_date functions have been added
  • Storage Management
    • The information_schema.table_options and table_properties system tables have been added, supporting the querying of attributes set during table creation
    • Support for bitmap_empty as a default value has been implemented
    • A new session variable require_sequence_in_insert has been introduced to control whether a sequence column must be provided when performing INSERT INTO SELECT writes to a unique key table
  • Others
    • Flame graphs can now be generated on the BE WebUI page

Improvements

  • Lakehouse
    • Support for writing data to Hive text format tables. For more information, please refer to the docs.
    • Access MaxCompute data using MaxCompute Open Storage API. For more information, please refer to the docs.
    • Support for Paimon DLF Catalog. For more information, please refer to the docs.
    • Added table$partitions syntax to directly query Hive partition information. For more information, please refer to the docs.
    • Support for reading Parquet files in brotli compression format
    • Support for reading DECIMAL 256 types in Parquet files
    • Support for reading Hive tables in OpenCsvSerde format
  • Async Materialized View
    • Refined the granularity of lock holding during the build process for asynchronous materialized views
  • Query optimizer
    • Improved the accuracy of statistic information collection and usage in extreme cases to enhance planning stability.
    • Runtime filters can now be generated in more scenarios to improve query performance.
    • Enhanced constant folding capabilities for numerical, date, and string functions to boost query performance.
    • Optimized the column pruning algorithm to enhance query performance.
  • Query Execution
    • Supported parallel preparation to reduce the time consumed by short queries.
    • Corrected the names of some counters in the profile to match the audit logs.
    • Added new local shuffle rules to speed up certain queries.
  • Storage Management
    • The SHOW PARTITIONS command now supports displaying the commit version.
    • Added checks for unreasonable partition expressions when creating tables.
    • Optimized the scheduling logic when encountering EOF in Routine Load.
    • Made Routine Load aware of schema changes.
    • Improved the timeout logic for Routine Load tasks.
  • Others
    • Allowed closing the built-in service port of BRPC via BE configuration.
    • Fixed issues with missing fields and duplicate records in audit logs.

Bug fixes

  • Lakehouse
    • Fixed the inconsistency in the behavior of INSERT OVERWRITE with Hive.
    • Cleaned up temporarily created folders to address the issue of too many empty folders on HDFS.
    • Resolved memory leaks in FE caused by using the JDBC Catalog in some cases.
    • Resolved memory leaks in BE caused by using the JDBC Catalog in some cases.
    • Fixed errors in reading Snappy compressed formats in certain scenarios.
    • Addressed potential FileSystem leaks on the FE side in certain scenarios.
    • Resolved issues where using EXPLAIN VERBOSE to view external table execution plans could cause null pointer exceptions in some cases.
    • Fixed the inability to read tables in Paimon parquet format.
    • Addressed performance issues introduced by compatibility changes in the JDBC Oracle Catalog.
    • Disabled predicate pushdown after implicit conversion to resolve incorrect query results in some cases with JDBC Catalog.
    • Fixed issues with case-sensitive access to table names in the External Catalog.
  • Async Materialized View
    • Fixed the issue where user-specified start times were not effective.
    • Resolved the issue of nested materialized views not refreshing.
    • Fixed the issue where materialized views might not refresh after the base table was deleted and recreated.
    • Addressed issues where partition compensation rewrites could lead to incorrect results.
    • Fixed potential errors in rewrite results when sql_select_limit was set.
  • Semi-Structured Data Management
    • Fixed the issue of index file handle leaks.
    • Addressed inaccuracies in the count() function of inverted indexes in special cases.
    • Fixed exceptions with variant when light schema change was not enabled.
    • Resolved memory leaks when variant returns arrays.
  • Query optimizer
    • Corrected potential errors in nullable calculations for filter conditions during external table queries, which could lead to execution exceptions.
    • Fixed potential errors in optimizing range comparison expressions.
  • Query Execution
    • Fixed the issue where the match_regexp function could not correctly handle empty strings.
    • Resolved issues where the scanner thread pool could become stuck in high-concurrency scenarios.
    • Fixed errors in the results of the date_floor function.
    • Addressed incorrect cancel messages in some scenarios.
    • Fixed issues with excessive warning logs printed by arrow flight.
    • Resolved issues where runtime filters failed to send in some scenarios.
    • Fixed problems where some system table queries could not end normally or became stuck.
    • Addressed incorrect results from window functions.
    • Fixed issues where the encrypt and decrypt functions caused BE core dumps.
    • Resolved errors in the results of the conv function.
  • Storage Management
    • Fixed import failures when Memtable migration was used in multi-replica scenarios with machine crashes.
    • Addressed inaccurate memory statistics during the Memtable flush phase during imports.
    • Fixed fault tolerance issues with Memtable migration in multi-replica scenarios.
    • Resolved inaccurate bvar statistics with Memtable migration.
    • Fixed inaccurate progress reporting for S3 loads.
  • Permissions
    • Fixed permission issues related to show columns, show sync, and show data from db.table.
  • Others
    • Fixed the issue where the audit log plugin for version 2.0 could not be used in version 2.1.

Enterprise Core 2.1.7-rc01

September 13, 2024

New features

  • Storage Management: Added the information_schema.table_options and table_properties system tables to support querying some attributes set during table creation; Introduced support for bitmap_empty as a default value.

Improvements

  • Query Execution: Enhanced parallel prepare support to reduce latency for short queries.
  • Storage Management: The Show Partitions command now supports displaying the commit version; Added validation for unreasonable partition expressions during table creation.

Bug fixes

  • Lakehouse: Fixed the inconsistency in insert overwrite behavior with Hive; Added additional checks when creating external DLF tables to prevent errors during queries; Cleaned up temporarily created folders to address the issue of too many empty folders on HDFS.
  • Async Materialized View: Resolved the issue where the user-specified start time did not take effect; Fixed the issue of nested materialized views not refreshing.
  • Query Execution: Addressed the issue where the match_regexp function could not properly handle empty strings; Solved the issue of scanner thread pool getting stuck in high concurrency scenarios.
  • Storage Management: Fixed the issue of import failures during Memtable migration in multi-replica scenarios when a machine goes down; Addressed the issue of inaccurate memory statistics during the Memtable flush phase during imports.
  • Permissions: Fixed permission issues related to show columns, show sync, and show data from db.table.

Enterprise Core 2.1.6

September 13, 2024

Behavior changes

  • Removed the delete_if_exists option from create repository.
  • Added the enable_prepared_stmt_audit_log session variable to control whether JDBC prepared statements record audit logs, with the default being no recording.
  • Implemented fd limit and memory constraints for segment cache.
  • When the FE configuration item sys_log_mode is set to BRIEF, file location information is added to the logs.
  • Changed the default value of the session variable max_allowed_packet to 16MB.
  • When a single request contains multiple statements, semicolons must be used to separate them.
  • Added support for statements to begin with a semicolon.
  • Aligned type formatting with MySQL in statements such as show create table.
  • When planning with the new optimizer times out, the query no longer falls back to the old optimizer, which prevents the old optimizer from spending even longer on planning.

New features

  • Lakehouse: Supported writeback for Iceberg tables. SQL interception rules now support external tables. Added the system table file_cache_statistics to view BE data cache metrics.
  • Async Materialized View: Supported transparent rewriting during inserts. Supported transparent rewriting when variant types exist in queries.
  • Semi-Structured Data Management: Supported casting ARRAY and MAP to JSON type. Supported the json_keys function. Supported specifying the JSON path $. when importing JSON. ARRAY, MAP, STRUCT types now support replace_if_not_null. ARRAY, MAP, STRUCT types now support adjusting column order. Added the multi_match function to match keywords across multiple fields, with support for inverted index acceleration.
  • Query Optimizer: Filled in the original database name, table name, column name, and alias for returned columns in the MySQL protocol. Supported the aggregation function group_concat with both order by and distinct simultaneously. SQL cache now supports reusing cached results for queries with different comments. In partition pruning, supported including date_trunc and date functions in filter conditions. Allowed using the database name where the table resides as a qualifier prefix for table aliases. Supported hint-style comments.
  • Others: Added the system table table_properties for viewing table properties. Introduced deadlock and slow lock detection in FE.

Improvements

  • Lakehouse: Reimplemented the external table metadata caching mechanism. Added the session variable keep_carriage_return with a default value of false. By default, reading Hive Text format tables treats both \r\n and \n as newline characters. Optimized memory statistics for Parquet/ORC file read/write operations. Supported pushing down IN/NOT IN predicates for Paimon tables. Enhanced the optimizer to support Time Travel syntax for Hudi tables. Optimized Kerberos authentication-related processes. Enabled reading Hive tables after renaming column operations. Optimized the reading performance of partition columns for external tables. Improved the data shard merging strategy during external table query planning to avoid performance degradation caused by a large number of small shards. Added attributes such as location to SHOW CREATE DATABASE/TABLE. Supported complex types in MaxCompute Catalog. Optimized the file cache loading strategy by using asynchronous loading to avoid long BE startup times. Improved the file cache eviction strategy, such as evicting locks held for extended periods.
  • Async Materialized View: Supported hourly, weekly, and quarterly partition roll-up construction. For materialized views based on Hive external tables, the metadata cache is now updated before refresh to ensure the latest data is obtained during each refresh. Improved the performance of transparent rewrite planning in storage-compute decoupled mode by batch fetching metadata. Enhanced the performance of transparent rewrite planning by prohibiting duplicate enumerations. Improved the performance of transparent rewrite for refreshing materialized views based on Hive external table partitions.
  • Semi-Structured Data Management: Optimized memory allocation for TOPN queries to improve performance. Enhanced the performance of string processing in inverted indexes. Optimized the performance of inverted indexes in MOW tables. Supported specifying the row-store page_size during table creation to control compression effectiveness.
  • Query Optimizer: Adjusted the row count estimation algorithm for mark joins, resulting in more accurate cardinality estimates for mark joins. Optimized the cost estimation algorithm for semi/anti joins, enabling more accurate selection of semi/anti join orders. Adjusted the filter estimation algorithm for cases where some columns have no statistical information, leading to more accurate cardinality estimates. Modified the instance calculation logic for set operation operators to prevent insufficient parallelism in extreme cases. Adjusted the usage strategy of bucket shuffle, achieving better performance when data is not sufficiently shuffled. Enabled early filtering of window function data, supporting multiple window functions in a single projection. When a NullLiteral exists in a filter condition, it can now be folded into false, further converted to an EmptySet to reduce unnecessary data scanning and computation. Expanded the scope of predicate derivation, reducing data scanning in queries with specific patterns. Supported partial short-circuit evaluation logic in partition pruning to improve partition pruning performance, achieving over 100% improvement in specific scenarios. Enabled the computation of arbitrary scalar functions within user variables. Maintained error messages consistent with MySQL when alias conflicts exist in queries.
  • Query Execution: Adapted AggState for compatibility from 2.1 to 3.x and fixed coredump issues. Refactored the strategy selection for local shuffle when no joins are involved. Modified the scanner for internal table queries to an asynchronous approach to prevent blocking during internal table queries. Optimized the block merge process when building hash tables in Join operators. Reduced the lock holding time for MultiCast operations. Optimized gRPC's keepAliveTime and added a connection monitoring mechanism, reducing the probability of query failures due to RPC errors during query execution. Cleaned up all dirty pages in jemalloc when memory limits are exceeded. Improved the performance of aes_encrypt/decrypt functions when handling constant types. Optimized the performance of json_extract functions when processing constant data. Optimized the performance of ParseURL functions when processing constant data.
  • Backup & Recovery / CCR: Restore now supports deleting redundant tablets and partition options. Checks storage connectivity when creating a repository. Enables binlog to support DROP TABLE, allowing CCR to incrementally synchronize DROP TABLE operations.
  • Compaction: Fixes the issue where high-priority compaction tasks were not subject to task concurrency control limits. Automatically reduces compaction memory consumption based on data characteristics. Fixes an issue where the sequential data optimization strategy could lead to incorrect data in aggregate tables or MOR UNIQUE tables. Optimizes the rowset selection strategy for compaction during replica replenishment to avoid triggering -235 errors.
  • MOW (Merge-On-Write): Optimizes slow column updates caused by concurrent column updates and compactions. Fixes an issue where segcompaction during bulk data imports could lead to incorrect MOW data. Fixes data loss in column updates that may occur after BE restarts.
  • Storage Management: Adds FE configuration to control whether queries under hot-cold tiering prefer local data replicas. Optimizes expired BE report messages to include newly created tablets. Optimizes replica scheduling priority strategy to prioritize replicas with missing data. Prevents tablets with unfinished ALTER jobs from being balanced. Enables modifying the number of buckets for tables with list partitioning. Prefers querying from online disk services. Improves error messages for materialized view base tables that do not support deletion during synchronization. Improves error messages for single columns exceeding 4GB. Fixes an issue where aborted transactions were omitted when plan errors occurred during INSERT statements. Fixes exceptions during SSL connection closure. Fixes an issue where table locks were not held when aborting transactions using labels. Fixes an issue where gson pretty-printing caused oversized images. Fixes an issue where the new optimizer did not check for bucket values of 0 in CREATE TABLE statements. Fixes errors when Chinese column names are included in DELETE condition predicates. Fixes frequent tablet balancing issues in partition balancing mode. Fixes an issue where partition storage policy attributes were lost. Fixes incorrect statistics when importing multiple tables within a transaction. Fixes errors when deleting random bucket tables. Fixes issues where FE fails to start due to non-existent UDFs. Fixes inconsistencies in the last failed version between FE master and slave. Fixes an issue where related tablets may still be in schema change state when schema change jobs are canceled. Fixes errors when modifying type and column order in a single statement schema change (SC).
  • Data Loading: Improves error messages for -238 errors during imports. Allows importing to other partitions while restoring a partition. Optimizes the strategy for FE to select BEs during group commit. Avoids printing stack traces for some common streamload error messages. Improves handling of issues where offline BEs may affect import errors.
  • Permissions: Optimizes access performance after enabling the Ranger authentication plugin. Optimizes permission strategies for Refresh Catalog/Database/Table operations, allowing users to perform these operations with only SHOW permissions.
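
The keep_carriage_return note in the Lakehouse item above can be illustrated with a small sketch. This is not Doris code: the function name split_text_lines is hypothetical, and the behavior when the variable is true (the \r stays in the last field) is an assumption inferred from the variable's name.

```python
from typing import List


def split_text_lines(data: str, keep_carriage_return: bool = False) -> List[str]:
    """Split raw text-format data into lines.

    With keep_carriage_return=False (the documented default), both "\r\n"
    and "\n" act as line terminators. With True, only "\n" terminates a
    line and any "\r" is kept as data (assumed behavior).
    """
    lines = data.split("\n")
    # A trailing newline produces one empty trailing element; drop it.
    if lines and lines[-1] == "":
        lines.pop()
    if not keep_carriage_return:
        # Strip a single trailing "\r" so "\r\n" behaves like "\n".
        lines = [line[:-1] if line.endswith("\r") else line for line in lines]
    return lines
```

With the default, a file written on Windows ("a,1\r\nb,2\n") yields the same rows as one written with plain "\n" terminators.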

Bug fixes

  • Lakehouse: Fixes the issue where switching catalogs may result in an error of not finding the database. Addresses exceptions caused by attempting to read non-existent data on S3. Resolves the issue where specifying an abnormal path during export operations may lead to incorrect export locations. Fixes the timezone issue for time columns in Paimon tables. Temporarily disables the Parquet PageIndex feature to avoid certain erroneous behaviors. Corrects the selection of Backend nodes in the blacklist during external table queries. Resolves errors caused by missing subcolumns in Parquet Struct column types. Addresses several issues with predicate pushdown in JDBC Catalog. Fixes issues where some historical Parquet formats led to incorrect query results.
  • Async Materialized View: Fixes the inability to use SHOW CREATE MATERIALIZED VIEW on follower FEs. Unifies the object type of asynchronous materialized views in metadata as tables to enable proper display in data tools. Resolves the issue where nested asynchronous materialized views always perform full refreshes. Fixes the issue where canceled tasks may show as running after restarting FEs. Addresses incorrect use of contexts, which may lead to unexpected failures of materialized view refresh tasks. Resolves issues that may cause varchar type write failures due to unreasonable lengths when creating asynchronous materialized views based on external tables. Fixes the potential invalidation of asynchronous materialized views based on external tables after FE restarts or catalog rebuilds. Prohibits the use of partition rollup for materialized views with list partitions to prevent the generation of incorrect data.
  • Semi-Structured Data Management: Removes support for prepared statements in the old optimizer. Fixes issues with JSON escape character handling. Resolves issues with duplicate processing of JSON fields. Fixes issues with some ARRAY and MAP functions. Resolves complex combinations of inverted index queries and LIKE queries.
  • Query Optimizer: Fixed the potential partition pruning error issue when the 'OR' condition exists in partition filter conditions. Fixed the potential partition pruning error issue when complex expressions are involved. Fixed the issue where nullable in agg_state subtypes might be planned incorrectly, leading to execution errors. Fixed the issue where nullable in set operation operators might be planned incorrectly, leading to execution errors. Fixed the incorrect execution priority issue of intersect operator. Fixed the NPE issue that may occur when the maximum valid date literal exists in the query. Fixed the occasional planning error that results in an illegal slot error during execution.
  • Query Execution: Fixed the issue where the pipeline execution engine gets stuck in multiple scenarios, causing queries not to end. Fixed the coredump issue caused by null and non-null columns in set difference calculations. Fixed the incorrect result issue of the width_bucket function. Fixed the query error issue when a single row of data is large and the result set is also large (exceeding 2GB). Fixed the incorrect result issue of stddev with DecimalV2 type. Fixed the coredump issue caused by the MULTI_MATCH_ANY function. Fixed the issue where insert overwrite auto partition causes transaction rollback.
  • Backup & Recovery / CCR: Fixed the issue where the data version after backup and recovery may be incorrect, leading to unreadability. Fixed the issue of using restore version across versions. Fixed the issue where the job is not canceled when backup fails. Fixed the NPE issue in ccr during the upgrade from 2.1.4 to 2.1.5, causing the FE to fail to start. Fixed the issue where views and materialized views cannot be used after restoration.
  • Storage Management: Fixed possible memory leaks in routine load when loading multiple tables from a single stream. Fixed the issue where delimiters and escape characters in routine load were not effective. Fixed incorrect show routine load results when the routine load task name contained uppercase letters. Fixed the issue where the offset cache was not reset when changing the routineload topic. Fixed the potential exception triggered by show routineload under concurrent scenarios. Fixed the issue where routine load might import data repeatedly.
  • Data Exporting: Fixed the issue where enabling the delete_existing_files property during export operations might result in duplicate deletion of exported data.
  • Permissions: Fixed the incorrect requirement of ALTER TABLE permission when creating a materialized view. Fixed the issue where the db was explicitly displayed as empty when showing routine load. Fixed the incorrect requirement of CREATE permission on the original table when using CREATE TABLE LIKE. Fixed the issue where grant operations did not check if the object existed.

Enterprise Core 2.1.5

July 25, 2024

Behavior changes

  • The default connection pool size for the JDBC Catalog has been increased from 10 to 30 to prevent connection exhaustion in high-concurrency scenarios.
  • The system's reserved memory (low water mark) has been adjusted to min(6.4GB, MemTotal * 5%) to mitigate BE OOM issues.
  • When processing multiple statements in a single request, only the last statement's result is returned if the CLIENT_MULTI_STATEMENTS flag is not set.
  • Direct modifications to data in asynchronous materialized views are no longer permitted.
  • A session variable use_max_length_of_varchar_in_ctas has been added to control the behavior of varchar and char type length generation during CTAS (Create Table As Select). The default value is true. When set to false, the derived varchar length is used instead of the maximum length.
  • Statistics collection now defaults to enabling the functionality of estimating the number of rows in Hive tables based on file size.
  • Transparent rewrite for asynchronous materialized views is now enabled by default.
  • Transparent rewrite utilizes partitioned materialized views. If partitions fail, the base tables are unioned with the materialized view to ensure data correctness.
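
The reserved-memory formula min(6.4GB, MemTotal * 5%) from the list above can be worked through for a few machine sizes. The sketch below is illustrative only: the function name is hypothetical, and it assumes "GB" means 1024^3 bytes, which the note does not state.

```python
GIB = 1024 ** 3  # assumption: the note's "GB" is treated as 2**30 bytes


def reserved_memory_bytes(mem_total_bytes: int) -> int:
    """Return min(6.4 GB, 5% of total memory), in bytes."""
    cap = int(6.4 * GIB)
    five_percent = mem_total_bytes * 5 // 100
    return min(cap, five_percent)
```

Under these assumptions the 5% term applies to machines below 128 GB of total memory, and the 6.4 GB cap applies above that, so a 64 GB node reserves about 3.2 GB while a 256 GB node reserves 6.4 GB.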

New features

  • Lakehouse: The session variable read_csv_empty_line_as_null controls whether empty lines are ignored when reading CSV format files. By default, empty lines are ignored. When set to true, empty lines are read as rows in which all columns are null. Compatibility with Presto's complex type output format can be enabled by setting serde_dialect="presto".
  • Multi-Table Materialized View: Supports non-deterministic functions in materialized view building. Supports atomically replacing the definitions of asynchronous materialized views. The creation statement of a materialized view can be viewed via SHOW CREATE MATERIALIZED VIEW. Supports transparent rewrites for multi-dimensional aggregation and non-aggregate queries. Supports DISTINCT aggregations with key columns and partitioning for roll-ups. Supports rolling up partitions of partitioned materialized views using date_trunc. Partitioned table-valued functions (TVFs) are supported.
  • Semi-Structured Data Management: Tables using the VARIANT type now support partial column updates. PreparedStatement support is now enabled by default. The VARIANT type can be exported to CSV format. explode_json_object function transposes JSON Object rows into columns. The ES Catalog now maps ES NESTED or OBJECT types to the Doris JSON type. By default, support_phrase is enabled for inverted indexes with specified analyzers to improve the performance of match_phrase series queries.
  • Query Optimizer: Support for explaining DELETE FROM statements. Support for hint form of constant expression parameters.
  • Memory Management: Added an HTTP API to clear the cache.
  • Permissions: Support for authorization of resources within Table-Valued Functions (TVFs).
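
The effect of read_csv_empty_line_as_null described in the Lakehouse item above can be sketched as follows. This is a toy model rather than the actual reader: the function name, the fixed comma separator, and the num_columns parameter are assumptions made for illustration.

```python
from typing import List, Optional

Row = List[Optional[str]]


def read_csv_lines(lines: List[str], num_columns: int,
                   empty_line_as_null: bool = False) -> List[Row]:
    """Parse already-split CSV lines, handling empty lines as described.

    empty_line_as_null=False: empty lines are skipped (the default).
    empty_line_as_null=True: each empty line becomes a row of all nulls.
    """
    rows: List[Row] = []
    for line in lines:
        if line == "":
            if empty_line_as_null:
                rows.append([None] * num_columns)
            continue
        rows.append(line.split(","))
    return rows
```

For the input lines "a,b", "" and "c,d", the default returns two rows, while enabling the flag returns three rows with an all-null row in the middle.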

Improvements

  • Lakehouse: Upgraded Paimon to version 0.8.1. Fixed a ClassNotFoundException for org.apache.commons.lang.StringUtils when querying Paimon tables. Added support for Tencent Cloud LakeFS. Optimized the timeout duration when fetching file lists for external table queries, configurable via the session variable fetch_splits_max_wait_time_ms. Improved default connection logic for SQLServer JDBC Catalog. Added serde properties to the show create table statements for Hive tables. Changed the default cache time for Hive table lists on the FE from 1 day to 4 hours. Data export (Export/Outfile) now supports specifying compression formats for Parquet and ORC. When creating a table using CTAS+TVF, partition columns in the TVF are automatically mapped to Varchar(65533) instead of String, allowing them to be used as partition columns for internal tables. Optimized the number of metadata accesses for Hive write operations. ES Catalog now supports mapping nested/object types to Doris's Json type. Improved error messages when connecting to Oracle using older versions of the ojdbc driver. When Hudi tables return an empty set during Incremental Read, Doris now also returns an empty set instead of an error. Fixed an issue where inner-outer table join queries could lead to FE timeouts in some cases. Fixed an issue with FE metadata replay errors during upgrades from older versions to newer versions when the Hive metastore event listener is enabled.
  • Multi-Table Materialized View: Automate key column selection for asynchronous materialized views. Support date_trunc in materialized view partition definitions. Enable transparent rewrites across nested materialized view aggregations. Asynchronous materialized views remain available when schema changes do not affect the correctness of their data. Improve planning speed for transparent rewrites. When calculating the availability of asynchronous materialized views, the current refresh status is no longer taken into account.
  • Semi-Structured Data Management: Optimize DESC performance for viewing VARIANT sub-columns through sampling. Support for special JSON data with empty keys in the JSON type.
  • Inverted Index: Reduce latency by minimizing the invocation of inverted index exists to avoid delays in accessing object storage. Optimize the overhead of the inverted index query process. Prevent inverted indices in materialized views.
  • Query Optimizer: When both sides of a comparison expression are literals, the string literal will attempt to convert to the type of the other side. Refactored the sub-path pushdown functionality for the variant type, now better supporting complex pushdown scenarios. Optimized the logic for calculating the cost of materialized views, enabling more accurate selection of lower-cost materialized views. Improved the SQL cache planning speed when using user variables in SQL. Optimized the row estimation logic for NOT NULL expressions, resulting in better performance when NOT NULL is present in queries. Optimized the null rejection derivation logic for LIKE expressions. Improved error messages when querying a specific partition fails, making it clearer which table is causing the issue.
  • Query Execution: Improved the performance of the bitmap_union operator up to 3 times in certain scenarios. Enhanced the reading performance of Arrow Flight in ARM environments. Optimized the execution performance of the explode, explode_map, and explode_json functions.
  • Data Loading: Support setting max_filter_ratio for INSERT INTO ... FROM TABLE VALUE FUNCTION

Bug fixes

  • Various issues have been fixed in areas such as lakehouse, multi-table materialized view, semi-structured data analysis, inverted index, query optimizer, query execution and storage management.

Enterprise Core 2.1.4

June 27, 2024

  • Query optimizer supports the FE Flame Graph tool, simultaneous use of SELECT DISTINCT with aggregate functions, rewriting single-table queries without GROUP BY, and high-concurrency point queries.
  • Lakehouse integration supports Paimon's native reader to handle Deletion Vectors, using Resources in Table-Valued Functions (TVF), and achieving data masking through the Ranger plugin.
  • Asynchronous materialized view construction now supports partition roll-up, trigger-based updates, specifying store_row_column and Storage Medium, and transparent rewriting supports single-table asynchronous materialized views and AGG_STATE type aggregate roll-up.
  • Other feature enhancements include the addition of the replace_empty function, support for the show storage policy using statement, and JVM metrics on the BE side.
  • Several optimizations have been made, including improving the accuracy of memory estimation consumed by the Segment Cache and supporting the creation of inverted indexes for Chinese column names.
  • Various issues have been fixed in areas such as the query optimizer, query execution, materialized views, and semi-structured data analysis.

Enterprise Core 2.1.3

May 17, 2024

  • Support INSERT INTO Hive tables in Hive Catalog.
  • Add the show views statement to query views.
  • Workload groups support binding to specific BE hosts.
  • Broker Load supports compressed JSON format.
  • The truncate function can use a column as the scale argument.
  • Add new functions uuid_to_int and int_to_uuid.
  • Support creating an mtmv based on another mtmv.
  • Support transparent rewrite using nested materialized views.
  • Add BypassWorkloadGroup to bypass the query queue.
  • Add function strcmp.
  • Support the HLL functions hll_from_base64 and hll_to_base64.

Enterprise Core 2.1.2

April 18, 2024

  • Add the processlist table to the information_schema database; users can query this table to see active connections.
  • Add a new table-valued function, LOCAL, to allow access to file systems such as shared storage.
  • Set the default value of the data_consistence property of EXPORT to partition to make export more stable during load.
  • Some MySQL connectors (e.g., .NET MySQL.Data) rely on a variable's column type to establish a connection.
  • Add the rollup table name to the profile to help diagnose materialized view selection problems.
  • Add a connection test for DB2 databases so that users can check connectivity when creating a DB2 Catalog.
  • Add a DNS cache for FQDN to accelerate connections among BEs in Kubernetes environments.
  • Refresh external tables' row counts asynchronously to make query plans more stable.

Enterprise Core 2.1.1

April 8, 2024

  • Change float type output format to improve float type serialization performance.
  • Fix issues during rolling upgrade from 2.0.x to 2.1.x, including backend node core problems and JDBC Catalog query errors.
  • Enable proxy protocol to support IP transparency. Using this protocol, IP transparency for load balancing can be achieved, so that after load balancing, Doris can still obtain the client's real IP and implement permission control such as whitelisting.
  • Add a new system table, backend_active_tasks, to monitor real-time query statistics on every BE.
  • Add inverted index support for CCR.
  • Support Arrow serialization for the Variant type.
  • Fixed 20 bugs, including occasional core issues in the BE during the restore process.
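
The proxy protocol item above depends on the load balancer prepending a small header that carries the original client address. The following is a minimal sketch of how such a header can be read, assuming the text-based version 1 format of the PROXY protocol; it is not the Doris implementation, and the names parse_proxy_v1 and ProxyInfo as well as the sample addresses are hypothetical.

```python
from typing import NamedTuple, Optional


class ProxyInfo(NamedTuple):
    protocol: str      # "TCP4" or "TCP6"
    client_ip: str     # real client address seen by the load balancer
    dest_ip: str       # address the client originally connected to
    client_port: int
    dest_port: int


def parse_proxy_v1(header: bytes) -> Optional[ProxyInfo]:
    """Parse one PROXY protocol v1 line; return None for 'PROXY UNKNOWN'."""
    if not header.endswith(b"\r\n"):
        raise ValueError("header must end with CRLF")
    parts = header[:-2].decode("ascii").split(" ")
    if parts[0] != "PROXY":
        raise ValueError("not a PROXY v1 header")
    if len(parts) >= 2 and parts[1] == "UNKNOWN":
        return None  # the proxy could not determine the client address
    if len(parts) != 6 or parts[1] not in ("TCP4", "TCP6"):
        raise ValueError("malformed PROXY v1 header")
    _, proto, src_ip, dst_ip, src_port, dst_port = parts
    return ProxyInfo(proto, src_ip, dst_ip, int(src_port), int(dst_port))


# Illustrative header; the addresses and ports are made up.
info = parse_proxy_v1(b"PROXY TCP4 203.0.113.7 10.0.0.5 56324 9030\r\n")
print(info.client_ip)
```

With the client address recovered this way, a server can apply whitelist checks to the real client IP rather than to the load balancer's address.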

Enterprise Core 2.1.0

March 18, 2024

  • The performance of complex SQL queries has improved further, with a gain of over 100% on the TPC-DS 1TB test dataset, positioning query performance at the forefront of the industry.
  • Performance in data lake analytics scenarios has improved and is 4 to 6 times that of Trino and Spark. Additionally, compatibility with multiple SQL dialects has been introduced, enabling seamless migration from existing systems to Apache Doris.
  • For data science and other forms of large-scale data reading scenarios, a high-speed reading interface based on Arrow Flight has been introduced, resulting in a 100-fold improvement in data transfer efficiency.
  • In semi-structured data analysis scenarios, new Variant and IP data types have been introduced, along with enhancements to a series of analytical functions, making storage and analysis of complex semi-structured data more convenient.
  • The introduction of asynchronous materialized views based on multiple tables has improved query performance. This includes support for transparent rewriting acceleration, automatic refreshing, external-to-internal table materialized views, and direct querying of materialized views. Leveraging these capabilities, materialized views can also be used for data warehouse tiered modeling, job scheduling, and data processing.
  • In terms of data storage, capabilities such as auto-increment columns, automatic partitioning, MemTable forwarding, and server-side batching have been introduced to improve the efficiency of real-time data writing at scale.
  • Further improvements have been made in workload management, enhancing the isolation capability of Workload Group resource groups and adding the ability to view SQL resource usage at runtime, thereby enhancing stability in multi-load scenarios.

Enterprise Core 2.0.x

Enterprise Core 2.0.14

August 8, 2024

  • Adds a REST interface to retrieve the most recent query profile: curl http://user:password@127.0.0.1:8030/api/profile/text .
  • Optimizes the primary key point query performance for MOW tables with sequence columns.
  • Enhances the performance of inverted index queries with many conditions.
  • Automatically enables the support_phrase option when creating a tokenized inverted index to accelerate match_phrase queries.
  • Supports simplified SQL hints, for example: SELECT /*+ query_timeout(3000) */ * FROM t.
  • Automatically retries reading from object storage when encountering a 429 error to improve stability.
  • LEFT SEMI / ANTI JOIN terminates subsequent matching execution upon matching a qualifying data row to enhance performance.
  • Prevents a core dump when illegal data is returned in MySQL results.
  • Unifies the output of type names in lowercase to maintain compatibility with MySQL and be more friendly to BI tools.
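
The REST interface for the most recent query profile, shown above with curl, can also be called from a script. The following is a minimal sketch using only the Python standard library; the endpoint path and port come from the curl example, while the function names and the use of an HTTP Basic Authorization header (the equivalent of the user:password part of the curl URL) are assumptions made for illustration.

```python
import base64
import urllib.request


def build_profile_request(host: str, port: int, user: str, password: str) -> urllib.request.Request:
    """Build a GET request for the latest query profile with Basic auth."""
    url = f"http://{host}:{port}/api/profile/text"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def fetch_latest_profile(host: str, port: int, user: str, password: str,
                         timeout: float = 10.0) -> str:
    """Send the request and return the profile as text."""
    req = build_profile_request(host, port, user, password)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


# Building the request does not contact the server.
req = build_profile_request("127.0.0.1", 8030, "user", "password")
print(req.full_url)
```

Calling fetch_latest_profile("127.0.0.1", 8030, "user", "password") against a running FE would return the profile text.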

Enterprise Core 2.0.13

July 23, 2024

  • SQL input is treated as multiple statements only when the CLIENT_MULTI_STATEMENTS setting is enabled on the client side, enhancing compatibility with MySQL.
  • A new BE configuration allow_zero_date has been added, allowing dates with all zeros. When set to false, 0000-00-00 is parsed as NULL, and when set to true, it is parsed as 0000-01-01. The default value is false to maintain consistency with previous behavior.
  • LogicalWindow and LogicalPartitionTopN support multi-field predicate pushdown to improve performance.
  • The ES Catalog now maps ES nested or object types to Doris JSON types.
  • Queries with LIMIT end reading data earlier to reduce resource consumption and improve performance.
  • Special JSON data with empty keys is now supported.
  • Stability and usability of routine load have been improved, including load balancing, automatic recovery, exception handling, and more user-friendly error messages.
  • The disk selection strategy used by BE load balancing has been improved, along with its speed.
  • Stability and usability of the JDBC catalog have been improved, including encryption, thread pool connection count configuration, and more user-friendly error messages.
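
The allow_zero_date behavior described above can be summarized in a few lines. The following is a simplified model of the documented rule only, not the BE parsing code; the function name parse_zero_date is hypothetical, and dates other than 0000-00-00 are passed through unchanged because they are outside the scope of this sketch.

```python
from typing import Optional


def parse_zero_date(text: str, allow_zero_date: bool = False) -> Optional[str]:
    """Model how an all-zero date is parsed under the allow_zero_date setting."""
    if text != "0000-00-00":
        return text  # ordinary dates are not affected by this setting
    # false (the default): all-zero date becomes NULL; true: it becomes 0000-01-01
    return "0000-01-01" if allow_zero_date else None


print(parse_zero_date("0000-00-00"))                        # None
print(parse_zero_date("0000-00-00", allow_zero_date=True))  # 0000-01-01
```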

Enterprise Core 2.0.12

July 1, 2024

  • No longer set the default table comment to the table type. Instead, set it to be empty by default, for example, change COMMENT 'OLAP' to COMMENT ''. This new behavior is more friendly for BI software that relies on table comments.
  • Change the type of the @@autocommit variable from BOOLEAN to BIGINT to prevent errors from certain MySQL clients (such as .NET MySQL.Data).
  • Remove the disable_nested_complex_type parameter and allow the creation of nested ARRAY, MAP, and STRUCT types by default.
  • The HMS catalog supports the SHOW CREATE DATABASE command.
  • Add more inverted index metrics to the query profile.
  • Cross-Cluster Replication (CCR) supports inverted indices.

Enterprise Core 2.0.11

June 20, 2024

  • Add Trino JDBC Catalog type mappings for JSON and TIME.
  • The FE now exits when it fails to transfer to (non)master, to prevent an unknown state and excessive logging.
  • Write to the audit log when dropping table stats.
  • Ignore min/max column stats if a table is only partially analyzed, to avoid inefficient query plans.
  • Support the minus operation for sets, such as set1 - set2.
  • Improve the performance of LIKE and REGEXP clauses with concat(col, pattern_str), e.g. col1 LIKE concat('%', col2, '%').
  • Add query options for short circuit queries for upgrade compatibility.
  • Since the inverted index is now mature and stable, it can replace the old BITMAP INDEX. Therefore, any newly created BITMAP INDEX will automatically switch to an INVERTED INDEX, while existing BITMAP INDEX will remain unchanged. Users can disable this automatic switch by setting the FE configuration enable_create_bitmap_index_as_inverted_index to false.

Enterprise Core 2.0.10

May 20, 2024

  • Introduce the read_only and super_read_only variables for compatibility with MySQL's read-only modes.
  • A disk path is no longer added to the broken list when the check status is not IO_ERROR. This ensures that only disks with actual I/O errors are marked as broken.
  • When performing a Create Table As Select (CTAS) operation from an external table, convert varchar columns to the string type.
  • Support mapping the Paimon column type "Row" to the Doris type "Struct".
  • Tolerate a small amount of skew when choosing a disk for tablet creation.
  • Write an edit log entry for set replica drop to avoid a confusing status on follower FEs.

Enterprise Core 2.0.9

April 24, 2024

  • Allow predicates to appear on both key and value mv columns
  • Enable mv with bitmap_union(bitmap_from_array())
  • Introduce a configuration to force the replication allocation for all OLAP tables within the cluster
  • Add timezone support for date literals in the new optimizer Nereids
  • Enable the use of "slop" in full-text search's match_phrase to specify word distances
  • Display index ID in SHOW PROC INDEXES
  • Add a second argument to first_value/last_value to ignore NULL values
  • Allow the use of 0 as the offset parameter in the lead/lag function
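
The two window-function items above can be illustrated with a small model. The following sketch mimics the described semantics over a single ordered partition in plain Python; it is not the Doris implementation, the function names simply mirror the SQL functions, and treating the second argument of first_value as a boolean flag is an assumption.

```python
from typing import List, Optional


def first_value(values: List[Optional[int]], ignore_null: bool = False) -> Optional[int]:
    """Return the first value, optionally skipping NULLs (None)."""
    for v in values:
        if v is not None or not ignore_null:
            return v
    return None  # empty input, or only NULLs while ignoring NULLs


def lag(values: List[Optional[int]], offset: int,
        default: Optional[int] = None) -> List[Optional[int]]:
    """For each row, return the value `offset` rows earlier, or `default`."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return [values[i - offset] if i - offset >= 0 else default
            for i in range(len(values))]


print(first_value([None, 3, 5]))                    # None
print(first_value([None, 3, 5], ignore_null=True))  # 3
print(lag([1, 2, 3], 0))                            # [1, 2, 3]
```

With an offset of 0, lag (and likewise lead) simply returns the current row's value.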

Enterprise Core 2.0.8

April 12, 2024

  • Make Inverted Index work with TopN opt in Nereids
  • Limit the max string length to 1024 while collecting column stats to control BE memory usage
  • Close the JDBC Catalog when the JDBC client is not empty
  • Accept all Iceberg databases without checking the database name format
  • Refresh external table's rowcount async to avoid cache miss and unstable query plan
  • Simplify the isSplitable method of hive external table to avoid too many hadoop metrics

Enterprise Core 2.0.7

March 26, 2024

  • Support treating a missing slot as a null alias when converting an outer join to an anti join, to speed up queries.
  • Add DEFAULT_ENCRYPTION column in information_schema table and add processlist table for better compatibility for BI tools.
  • Automatically test connectivity by default when creating a JDBC Catalog.
  • Enhance auto resume to keep routine load stable.
  • Use lowercase by default for Chinese tokenizer in inverted index.
  • Add an error message when the repeat function exceeds its default maximum value.
  • Skip hidden files and directories in Hive tables.
  • Reduce file meta cache size and disable cache for some cases to avoid OOM.
  • Reduce the JVM heap memory consumed by profiles of BrokerLoadJob.
  • Remove sort which is under table sink to speed up query like INSERT INTO t1 SELECT * FROM t2 ORDER BY k.

Enterprise Core 2.0.6

March 14, 2024

  • Support matching a function with an alias in materialized views.
  • Add a command to drop a tablet replica safely on backend.
  • Add row count cache for external table.
  • Support analyze rollup to gather statistics for optimizer.
  • Improve tablet schema cache memory usage by serializing protobuf in a deterministic way.
  • Improve show column stats performance.
  • Support row count estimation for Iceberg and Paimon.
  • Support reading the SQL Server timestamp type in the JDBC Catalog.

Enterprise Core 2.0.3

December 8, 2023

  • Supports automatic statistics collection, which aids the optimizer in understanding data distribution characteristics. This enables the selection of more optimal plans, significantly improving query efficiency. Starting from version 2.0.3, this feature is officially supported and is enabled by default throughout the day.
  • Data Lake adds support for Java UDF, JDBC, and Hudi MOR, and more systems now support complex data types.
  • Cross-Cluster Replication (CCR) now supports features like throttling and table truncation enhancements.
  • Additional built-in functions, such as SHA and JSON functions, have been added.
  • Over 20 performance improvements including inverted index, case when, predicate pushdown, etc.
  • Enhanced distributed replica management, including skipping deleted partitions, improvements to colocate groups, and fixes for balancing failures under continuous writes and for the inability to balance cold-hot tiered tables.
  • Stability improvements in complex data types, inverted index, materialized views, import and compaction, Data Lake compatibility, SQL planning, and more.

Enterprise Core 2.0.2

October 8, 2023

  • Improved usability, including optimized priority network matching logic and support for role-based authorization at the row level.
  • Enhanced statistics data collection in the new optimizer, Nereids, including the elimination of file caching during the execution of analysis tasks and support for basic JDBC external table statistics collection.
  • Performance optimization and enhancement in inverted index queries, including the addition of BKD indices for improved efficiency and optimization of multi-word conjunction queries.
  • Improved support for multiple types of data sources in the multi-catalog feature for data lakes, including JDBC, HDFS, Hive, MySQL, MaxCompute, and more.
  • Optimization of array functions, with array_union now supporting multiple parameters.

Enterprise Core 2.0.1

September 16, 2023

  • Improved the functionality and stability of complex data types such as arrays and maps, including nested complex types in internal tables and in external tables with ORC/Parquet formats.
  • Enhanced performance in inverted index queries, covering tokenization, numerical processing, predicate pushdown, and more.
  • Improved query performance, including enhancements in bitmap operations, LIKE queries, scans, and aggregate functions.
  • Refined and stabilized Cross-Cluster Replication (CCR) functionality.
  • Accelerated and improved accuracy in the collection of statistics by the query optimizer, resulting in enhanced automatic query optimization.
  • Enhanced functionality and performance in the multi-catalog feature for data lakes, including performance optimizations for Iceberg and support for complex data types.

Enterprise Core 1.2.x

Enterprise Core 1.2.8

September 5, 2023

  • Fixed several decimal-related issues.
  • Resolved the problem where "show tables" couldn't display tables for which the user had select permissions.
  • Addressed issues related to replica scheduling.
  • Fixed several query planning problems.
  • Addressed an issue of file handle leakage in BE processes under certain circumstances.
  • Fixed a problem with table creation timing out in certain scenarios.
  • Resolved errors when reading ORC format files.
  • Fixed an issue where closing the FileSystem in Broker caused read errors.
  • Optimized the logic for calculating replica sizes in Auto Bucket.
  • Fixed a NullPointerException issue in Spark Load under certain circumstances.