VeloDB Enterprise
Release Notes
Enterprise Core

Enterprise Core 2.1.x

Enterprise Core 2.1.5

July 25, 2024

Behavior changes

  • The default connection pool size for the JDBC Catalog has been increased from 10 to 30 to prevent connection exhaustion in high-concurrency scenarios.
  • The system's reserved memory (low water mark) has been adjusted to min(6.4GB, MemTotal * 5%) to mitigate BE OOM issues.
  • When processing multiple statements in a single request, only the last statement's result is returned if the CLIENT_MULTI_STATEMENTS flag is not set.
  • Direct modifications to data in asynchronous materialized views are no longer permitted.
  • A session variable, use_max_length_of_varchar_in_ctas, has been added to control how VARCHAR and CHAR column lengths are generated during CTAS (CREATE TABLE AS SELECT). The default value is true, which uses the maximum length. When it is set to false, the derived VARCHAR length is used instead.
  • Statistics collection now estimates the number of rows in Hive tables from file size by default.
  • Transparent rewrite for asynchronous materialized views is now enabled by default.
  • Transparent rewrite can use partitioned materialized views. If some partitions of a materialized view are invalid, the base tables are unioned with the materialized view to ensure data correctness.
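The reserved-memory rule, min(6.4GB, MemTotal * 5%), can be checked with a small calculation. The sketch below assumes "6.4GB" means 6.4 GiB (6.4 × 1024³ bytes) and uses integer arithmetic; the function name reserved_memory_bytes is hypothetical and is not a Doris API.

```python
GIB = 1024 ** 3

# Assumption: "6.4GB" in the release note is read as 6.4 GiB.
RESERVED_CAP_BYTES = 64 * GIB // 10


def reserved_memory_bytes(mem_total_bytes: int) -> int:
    """Hypothetical helper: low water mark = min(6.4 GiB, 5% of MemTotal)."""
    five_percent = mem_total_bytes * 5 // 100
    return min(RESERVED_CAP_BYTES, five_percent)


if __name__ == "__main__":
    for total_gib in (16, 64, 128, 256):
        reserved = reserved_memory_bytes(total_gib * GIB)
        print(f"MemTotal={total_gib} GiB -> reserved={reserved / GIB:.2f} GiB")
```

Under this reading, the two terms are equal at 128 GiB of RAM: smaller hosts reserve 5% of their memory, and larger hosts reserve the fixed 6.4 GiB cap.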

New features

  • Lakehouse: The session variable read_csv_empty_line_as_null controls whether empty lines are ignored when reading CSV files. By default, empty lines are ignored; when the variable is set to true, each empty line is read as a row in which all columns are NULL. Compatibility with Presto's complex type output format can be enabled by setting serde_dialect="presto".
  • Multi-Table Materialized View: Supports non-deterministic functions when building materialized views. Supports atomically replacing the definition of an asynchronous materialized view. Materialized view creation statements can be viewed with SHOW CREATE MATERIALIZED VIEW. Supports transparent rewriting for multi-dimensional aggregation and non-aggregate queries. Supports DISTINCT aggregations with key columns and partitioning for roll-ups. Supports partitioned materialized views that roll up partitions using date_trunc. Supports partitioned table-valued functions (TVFs).
  • Semi-Structured Data Management: Tables using the VARIANT type now support partial column updates. PreparedStatement support is now enabled by default. The VARIANT type can be exported to CSV format. The explode_json_object function transposes JSON object rows into columns. The ES Catalog now maps ES NESTED and OBJECT types to the Doris JSON type. By default, support_phrase is enabled for inverted indexes with specified analyzers to improve the performance of match_phrase series queries.
  • Query Optimizer: Supports EXPLAIN for DELETE FROM statements. Supports constant expression parameters in hint form.
  • Memory Management: Added an HTTP API to clear the cache.
  • Permissions: Supports authorization of resources within table-valued functions (TVFs).
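The read_csv_empty_line_as_null behavior can be modeled in a few lines. This is an illustrative sketch only: read_csv_rows is a hypothetical function, NULL is represented as Python None, and for simplicity it splits input on line breaks, so quoted fields that contain embedded newlines are not handled.

```python
import csv
from typing import List, Optional

Row = List[Optional[str]]


def read_csv_rows(text: str, num_columns: int, empty_line_as_null: bool = False) -> List[Row]:
    """Hypothetical model of read_csv_empty_line_as_null.

    - False (the default): empty lines are skipped.
    - True: each empty line becomes a row whose columns are all NULL (None).
    """
    rows: List[Row] = []
    for line in text.splitlines():
        if line == "":
            if empty_line_as_null:
                rows.append([None] * num_columns)
            continue
        # Parse one non-empty line with the standard csv module.
        rows.append(next(csv.reader([line])))
    return rows


data = "1,a\n\n2,b\n"
default_rows = read_csv_rows(data, 2)
null_rows = read_csv_rows(data, 2, empty_line_as_null=True)
```

With the default setting, the blank line in data disappears; with the variable enabled, it yields an extra row of two NULLs between the two data rows.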

Improvements

  • Lakehouse: Upgraded Paimon to version 0.8.1. Fixed a ClassNotFoundException for org.apache.commons.lang.StringUtils when querying Paimon tables. Added support for Tencent Cloud LakeFS. Optimized the timeout for fetching file lists in external table queries; the timeout is configurable via the session variable fetch_splits_max_wait_time_ms. Improved the default connection logic for the SQLServer JDBC Catalog. Added serde properties to the SHOW CREATE TABLE output for Hive tables. Changed the default FE cache time for Hive table lists from 1 day to 4 hours. Data export (Export/Outfile) now supports specifying compression formats for Parquet and ORC. When creating a table using CTAS+TVF, partition columns in the TVF are automatically mapped to Varchar(65533) instead of String, so they can be used as partition columns for internal tables. Reduced the number of metadata accesses for Hive write operations. The ES Catalog now supports mapping nested/object types to the Doris JSON type. Improved error messages when connecting to Oracle with older versions of the ojdbc driver. When a Hudi table returns an empty set during an incremental read, Doris now also returns an empty set instead of an error. Fixed an issue where join queries between internal and external tables could cause FE timeouts in some cases. Fixed FE metadata replay errors when upgrading from older versions to newer versions with the Hive metastore event listener enabled.
  • Multi-Table Materialized View: Key columns are now selected automatically for asynchronous materialized views. Added support for date_trunc in materialized view partition definitions. Enabled transparent rewriting across nested materialized view aggregations. Asynchronous materialized views remain available when schema changes do not affect the correctness of their data. Improved planning speed for transparent rewriting. The current refresh status is no longer taken into account when calculating the availability of asynchronous materialized views.
  • Semi-Structured Data Management: Optimized DESC performance when viewing VARIANT sub-columns by using sampling. Added support in the JSON type for special JSON data with empty keys.
  • Inverted Index: Reduced latency by minimizing calls that check whether an inverted index exists, avoiding delays when accessing object storage. Reduced the overhead of the inverted index query process. Inverted indexes are no longer allowed on materialized views.
  • Query Optimizer: When both sides of a comparison expression are literals, the string literal is converted, where possible, to the type of the other side. Refactored sub-path pushdown for the VARIANT type to better support complex pushdown scenarios. Optimized the cost calculation for materialized views, enabling more accurate selection of lower-cost materialized views. Improved SQL cache planning speed when user variables are used in SQL. Optimized row estimation for NOT NULL expressions, improving performance for queries that contain NOT NULL. Optimized null-rejection derivation for LIKE expressions. Improved the error message when querying a specific partition fails, making it clearer which table is causing the issue.
  • Query Execution: Improved the performance of the bitmap_union operator by up to 3x in certain scenarios. Improved Arrow Flight read performance in ARM environments. Optimized the execution performance of the explode, explode_map, and explode_json functions.
  • Data Loading: Added support for setting max_filter_ratio for INSERT INTO ... FROM TABLE VALUE FUNCTION.
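A sketch of how a max_filter_ratio threshold is typically evaluated. It assumes the check that Doris documents for its other load methods: the load fails when filtered (error) rows divided by all processed rows exceeds the ratio, and the default of 0 tolerates no filtered rows. The function load_succeeds is hypothetical and does not reproduce Doris internals.

```python
def load_succeeds(loaded_rows: int, filtered_rows: int, max_filter_ratio: float = 0.0) -> bool:
    """Hypothetical check: succeed if filtered / (loaded + filtered) <= max_filter_ratio."""
    if not 0.0 <= max_filter_ratio <= 1.0:
        raise ValueError("max_filter_ratio must be between 0 and 1")
    total = loaded_rows + filtered_rows
    if total == 0:
        # Nothing was processed, so nothing was filtered.
        return True
    return filtered_rows / total <= max_filter_ratio
```

For example, 5 bad rows out of 100 pass with a ratio of 0.1, while 20 out of 100 fail.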

Bug fixes

  • Various issues have been fixed in areas such as lakehouse, multi-table materialized view, semi-structured data analysis, inverted index, query optimizer, query execution and storage management.

Enterprise Core 2.1.4

June 27, 2024

  • Query optimizer supports the FE Flame Graph tool, simultaneous use of SELECT DISTINCT with aggregate functions, rewriting single-table queries without GROUP BY, and high-concurrency point queries.
  • Lakehouse integration supports Paimon's native reader to handle Deletion Vectors, using Resources in Table-Valued Functions (TVF), and achieving data masking through the Ranger plugin.
  • Asynchronous materialized view construction now supports partition roll-up, trigger-based updates, and specifying store_row_column and storage medium. Transparent rewriting now supports single-table asynchronous materialized views and aggregate roll-up of the AGG_STATE type.
  • Other feature enhancements include the new replace_empty function, support for the SHOW STORAGE POLICY USING statement, and JVM metrics on the BE side.
  • Several optimizations have been made, including improving the accuracy of memory estimation consumed by the Segment Cache and supporting the creation of inverted indexes for Chinese column names.
  • Various issues have been fixed in areas such as the query optimizer, query execution, materialized views, and semi-structured data analysis.

Enterprise Core 2.1.3

May 17, 2024

  • Added support for INSERT INTO Hive tables in a Hive Catalog.
  • Added the SHOW VIEWS statement to list views.
  • Workload groups can now be bound to specific BE hosts.
  • Broker Load now supports compressed JSON format.
  • The truncate function can now take a column as its scale argument.
  • Added the new functions uuid_to_int and int_to_uuid.
  • Added support for creating an asynchronous materialized view (MTMV) based on other MTMVs.
  • Added support for transparent rewriting using nested materialized views.
  • Added BypassWorkloadGroup, which lets queries bypass the query queue.
  • Added the strcmp function.
  • Added the HLL functions hll_from_base64 and hll_to_base64.
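Allowing a column as the scale argument of truncate means each row can be truncated to a different number of decimal places. The sketch below illustrates that per-row behavior using Python's decimal module; the helpers truncate and truncate_column are hypothetical, and truncation toward zero is assumed to match the SQL function for non-negative scales.

```python
from decimal import ROUND_DOWN, Decimal
from typing import Iterable, List, Tuple


def truncate(value: Decimal, scale: int) -> Decimal:
    """Truncate toward zero, keeping `scale` digits after the decimal point."""
    quantum = Decimal(1).scaleb(-scale)  # e.g. scale=2 -> Decimal('0.01')
    return value.quantize(quantum, rounding=ROUND_DOWN)


def truncate_column(rows: Iterable[Tuple[str, int]]) -> List[Decimal]:
    """Apply truncate row by row, reading the scale from each row (hypothetical)."""
    return [truncate(Decimal(value), scale) for value, scale in rows]


result = truncate_column([("3.14159", 2), ("3.14159", 0), ("-2.789", 1)])
```

Here result holds 3.14, 3, and -2.7: the same input value yields different outputs because the scale comes from each row rather than from a single constant.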

Enterprise Core 2.1.2

April 18, 2024

  • Added the processlist table to the information_schema database; users can query this table to view active connections.
  • Added a new table-valued function, LOCAL, to allow access to file systems such as shared storage.
  • Changed the default value of the EXPORT property data_consistence to partition, making exports more stable while data is being loaded.
  • Adjusted the column types returned for variables, because some MySQL connectors (e.g., .NET MySQL.Data) rely on them to establish a connection.
  • Added the rollup table name to the query profile to help diagnose materialized view selection problems.
  • Added a connection test for DB2 so that users can verify the connection when creating a DB2 Catalog.
  • Added a DNS cache for FQDNs to speed up connections between BEs in Kubernetes environments.
  • External table row counts are now refreshed asynchronously to make query plans more stable.

Enterprise Core 2.1.1

April 8, 2024

  • Changed the output format of the FLOAT type to improve its serialization performance.
  • Fixed issues during rolling upgrades from 2.0.x to 2.1.x, including BE core dumps and JDBC Catalog query errors.
  • Added support for the proxy protocol to enable IP transparency: behind a load balancer, Doris can still obtain the client's real IP address and enforce permission controls such as whitelisting.
  • Added a new system table, backend_active_tasks, to monitor real-time query statistics on every BE.
  • Added inverted index support for CCR.
  • Added Arrow serialization support for the VARIANT type.
  • Fixed 20 bugs, including occasional BE core dumps during the restore process.

Enterprise Core 2.1.0

March 18, 2024

  • Complex SQL query performance has been further improved, with a gain of more than 100% on the TPC-DS 1TB test dataset, positioning query performance at the forefront of the industry.
  • Data lake analytics performance has improved, running 4-6 times faster than Trino and Spark. Additionally, compatibility with multiple SQL dialects has been introduced, enabling seamless migration from existing systems to Apache Doris.
  • For data science and other forms of large-scale data reading scenarios, a high-speed reading interface based on Arrow Flight has been introduced, resulting in a 100-fold improvement in data transfer efficiency.
  • In semi-structured data analysis scenarios, new Variant and IP data types have been introduced, along with enhancements to a series of analytical functions, making storage and analysis of complex semi-structured data more convenient.
  • The introduction of asynchronous materialized views based on multiple tables has improved query performance. This includes support for transparent rewriting acceleration, automatic refreshing, external-to-internal table materialized views, and direct querying of materialized views. Leveraging these capabilities, materialized views can also be used for data warehouse tiered modeling, job scheduling, and data processing.
  • In terms of data storage, capabilities such as auto-increment columns, automatic partitioning, MemTable forwarding, and server-side batching have been introduced to improve the efficiency of real-time data writing at scale.
  • Further improvements have been made in workload management, enhancing the isolation capability of Workload Group resource groups and adding the ability to view SQL resource usage at runtime, thereby enhancing stability in multi-load scenarios.

Enterprise Core 2.0.x

Enterprise Core 2.0.12

July 1, 2024

  • The default table comment is no longer set to the table type; it is now empty by default (for example, COMMENT 'OLAP' becomes COMMENT ''). This is friendlier to BI software that relies on table comments.
  • Change the type of the @@autocommit variable from BOOLEAN to BIGINT to prevent errors from certain MySQL clients (such as .NET MySQL.Data).
  • Removed the disable_nested_complex_type parameter; nested ARRAY, MAP, and STRUCT types can now be created by default.
  • The HMS catalog supports the SHOW CREATE DATABASE command.
  • Added more inverted index metrics to the query profile.
  • Cross-Cluster Replication (CCR) supports inverted indices.

Enterprise Core 2.0.11

June 20, 2024

  • Added JSON and TIME type mappings for the Trino JDBC Catalog.
  • The FE now exits when it fails to transition to master or non-master, preventing an unknown state and excessive logging.
  • An audit log entry is now written when dropping a statistics table.
  • Min/max column statistics are ignored if a table is only partially analyzed, avoiding inefficient query plans.
  • Added support for the minus operation on sets, such as set1 - set2.
  • Improved the performance of LIKE and REGEXP clauses that use concat(col, pattern_str), e.g., col1 LIKE concat('%', col2, '%').
  • Added query options for short-circuit queries to maintain upgrade compatibility.
  • Since the inverted index is now mature and stable, it can replace the old BITMAP INDEX. Therefore, any newly created BITMAP INDEX automatically switches to an INVERTED INDEX, while existing BITMAP INDEXes remain unchanged. Users can disable this automatic switch by setting the FE configuration enable_create_bitmap_index_as_inverted_index to false.
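In a predicate such as col1 LIKE concat('%', col2, '%'), the pattern is built from column values, so it can change from row to row. The sketch below only illustrates those per-row semantics; it is not how Doris implements the optimization. like_to_regex and like are hypothetical helpers. The sketch assumes case-sensitive matching and ignores LIKE escape sequences, and it caches compiled patterns with functools.lru_cache as a design choice of the sketch, so repeated pattern values are compiled only once.

```python
import re
from functools import lru_cache


@lru_cache(maxsize=1024)
def like_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a SQL LIKE pattern ('%' = any run, '_' = one char) to a regex.

    Simplification: escape sequences such as '\\%' are not handled.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def like(value: str, pattern: str) -> bool:
    return like_to_regex(pattern).fullmatch(value) is not None


# Each row supplies its own pattern: col1 LIKE concat('%', col2, '%').
rows = [("apache doris", "doris"), ("velodb", "spark")]
matches = [like(col1, "%" + col2 + "%") for col1, col2 in rows]
```

For the two sample rows, matches is [True, False].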

Enterprise Core 2.0.10

May 20, 2024

  • Added the read_only and super_read_only variables for compatibility with MySQL's read-only modes.
  • A disk path is no longer added to the broken list unless its check status is IO_ERROR, so that only disks with actual I/O errors are marked as broken.
  • When performing a CREATE TABLE AS SELECT (CTAS) operation from an external table, VARCHAR columns are converted to the STRING type.
  • Added support for mapping the Paimon column type "Row" to the Doris type "Struct".
  • Disk selection when creating tablets now tolerates a small amount of skew.
  • An edit log entry is now written when setting a replica to drop, avoiding inconsistent status on follower FEs.

Enterprise Core 2.0.9

April 24, 2024

  • Predicates are now allowed on both key and value columns of materialized views.
  • Materialized views can now use bitmap_union(bitmap_from_array()).
  • Added a configuration that forces the replication allocation for all OLAP tables in the cluster.
  • Added time zone support for date literals in the new optimizer, Nereids.
  • The full-text search function match_phrase now accepts "slop" to specify the allowed distance between words.
  • SHOW PROC INDEXES now displays the index ID.
  • first_value/last_value now accept an optional second argument to ignore NULL values.
  • lead/lag now accept 0 as the offset parameter.
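The window-function changes above can be illustrated with plain Python lists. In this sketch, NULL is represented as None and a "frame" is the already-ordered list of values visible to one row. The functions and the parameter name ignore_null are hypothetical stand-ins for the SQL behavior, not Doris code.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def first_value(frame: Sequence[Optional[T]], ignore_null: bool = False) -> Optional[T]:
    """First value in the frame; with ignore_null=True, leading NULLs are skipped."""
    for v in frame:
        if v is not None or not ignore_null:
            return v
    return None


def last_value(frame: Sequence[Optional[T]], ignore_null: bool = False) -> Optional[T]:
    """Last value in the frame; with ignore_null=True, trailing NULLs are skipped."""
    return first_value(list(reversed(frame)), ignore_null)


def lag(column: Sequence[Optional[T]], offset: int = 1,
        default: Optional[T] = None) -> List[Optional[T]]:
    """Value `offset` rows before each row; offset 0 returns the current row."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return [column[i - offset] if i >= offset else default for i in range(len(column))]
```

With offset 0, lag simply returns each row's own value, which is why accepting 0 is a harmless generalization rather than an error case.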

Enterprise Core 2.0.8

April 12, 2024

  • Inverted indexes now work with the TopN optimization in Nereids.
  • Limited the maximum string length to 1024 when collecting column statistics, to control BE memory usage.
  • The JDBC Catalog now closes the JDBC client only when the client is not empty.
  • All Iceberg databases are now accepted, without checking the database name format.
  • External table row counts are now refreshed asynchronously to avoid cache misses and unstable query plans.
  • Simplified the isSplitable method for Hive external tables to avoid generating too many Hadoop metrics.

Enterprise Core 2.0.7

March 26, 2024

  • When converting an outer join to an anti join, missing slots can now be aliased as NULL, speeding up queries.
  • Added a DEFAULT_ENCRYPTION column to the information_schema tables and added a processlist table, for better compatibility with BI tools.
  • Automatically test connectivity by default when creating a JDBC Catalog.
  • Enhanced auto-resume to keep Routine Load stable.
  • The Chinese tokenizer for inverted indexes now uses lowercase by default.
  • Added an error message when the repeat function exceeds its default maximum value.
  • Hidden files and directories in Hive tables are now skipped.
  • Reduced the file metadata cache size and disabled the cache in some cases to avoid OOM.
  • Reduced JVM heap memory consumed by BrokerLoadJob profiles.
  • Removed sorts beneath a table sink to speed up queries such as INSERT INTO t1 SELECT * FROM t2 ORDER BY k.

Enterprise Core 2.0.6

March 14, 2024

  • Materialized views can now match functions that have aliases.
  • Added a command to safely drop a tablet replica on a BE.
  • Added a row count cache for external tables.
  • Added support for analyzing rollups to gather statistics for the optimizer.
  • Reduced tablet schema cache memory by serializing protobuf deterministically.
  • Improved SHOW COLUMN STATS performance.
  • Added support for estimating row counts for Iceberg and Paimon tables.
  • Added support for reading the SQL Server timestamp type in the JDBC Catalog.

Enterprise Core 2.0.3

December 8, 2023

  • Supports automatic statistics collection, which aids the optimizer in understanding data distribution characteristics. This enables the selection of more optimal plans, significantly improving query efficiency. Starting from version 2.0.3, this feature is officially supported and is enabled by default throughout the day.
  • Data lake analytics extends complex data type support to more systems, including Java UDF, JDBC, and Hudi MOR.
  • Cross-Cluster Replication (CCR) now supports throttling and table truncation, among other enhancements.
  • Additional built-in functions, such as SHA and JSON functions, have been added.
  • Over 20 performance improvements, covering inverted indexes, CASE WHEN, predicate pushdown, and more.
  • Enhanced distributed replica management, including skipping deleted partitions and fixes for colocate groups, balancing failures under continuous writes, and the inability to balance hot-cold tiered tables.
  • Stability improvements in complex data types, inverted index, materialized views, import and compaction, Data Lake compatibility, SQL planning, and more.

Enterprise Core 2.0.2

October 8, 2023

  • Improved usability, including optimized priority network matching logic and support for role-based authorization at the row level.
  • Enhanced statistics data collection in the new optimizer, Nereids, including the elimination of file caching during the execution of analysis tasks and support for basic JDBC external table statistics collection.
  • Performance optimization and enhancement in inverted index queries, including the addition of BKD indices for improved efficiency and optimization of multi-word conjunction queries.
  • Improved support for multiple types of data sources in the multi-catalog feature for data lakes, including JDBC, HDFS, Hive, MySQL, MaxCompute, and more.
  • Optimization of array functions, with array_union now supporting multiple parameters.

Enterprise Core 2.0.1

September 10, 2023

  • Improved the functionality and stability of complex data types such as arrays and maps, including nested complex types in internal tables and nested types in external tables in ORC/Parquet format.
  • Enhanced performance in inverted index queries, covering tokenization, numerical processing, predicate pushdown, and more.
  • Improved query performance, including enhancements in bitmap operations, LIKE queries, scans, and aggregate functions.
  • Refined and stabilized Cross-Cluster Replication (CCR) functionality.
  • Made statistics collection by the query optimizer faster and more accurate, improving automatic query optimization.
  • Enhanced functionality and performance in the multi-catalog feature for data lakes, including performance optimizations for Iceberg and support for complex data types.

Enterprise Core 1.2.x

Enterprise Core 1.2.8

September 5, 2023

  • Fixed several decimal-related issues.
  • Resolved the problem where "show tables" couldn't display tables for which the user had select permissions.
  • Addressed issues related to replica scheduling.
  • Fixed several query planning problems.
  • Addressed an issue of file handle leakage in BE processes under certain circumstances.
  • Fixed a problem with table creation timing out in certain scenarios.
  • Resolved errors when reading ORC format files.
  • Fixed an issue where closing the FileSystem in Broker caused read errors.
  • Optimized the logic for calculating replica sizes in Auto Bucket.
  • Fixed a NullPointerException issue in Spark Load under certain circumstances.