Doris OpenTelemetry Integration

Introduction

OpenTelemetry, also known as OTel, is a vendor-neutral, open-source observability framework for instrumenting, generating, collecting, and exporting telemetry data such as traces, metrics, and logs. It defines a set of standards and protocols that are widely adopted by the observability community and vendors, and it is becoming the de facto standard in the field.

A major goal of OpenTelemetry is to enable easy instrumentation of your applications and systems, regardless of the programming language, infrastructure, and runtime environments used. The backend (storage) and the frontend (visualization) of telemetry data are intentionally left to other tools. Doris, as a storage backend integrated with OpenTelemetry, provides high-performance, low-cost, unified observability data storage and analysis capabilities. The overall architecture is as follows:

[Figure: Doris OpenTelemetry Integration architecture]

Installation

Download the OpenTelemetry Collector Contrib installation package from the official OpenTelemetry Releases page, for example:
https://github.com/open-telemetry/opentelemetry-collector-releases/releases/download/v0.132.2/otelcol-contrib_0.132.2_linux_amd64.tar.gz

Extract the package to obtain the otelcol-contrib executable file.

Parameter Configuration

The core configuration of the OpenTelemetry Collector Doris Exporter is as follows:

| Configuration | Description |
| --- | --- |
| endpoint | Doris FE HTTP address in the format host:port, e.g., "127.0.0.1:8030" (the examples below include the http:// scheme) |
| mysql_endpoint | Doris FE MySQL address in the format host:port, e.g., "127.0.0.1:9030" |
| username | Doris username with write permissions for the corresponding database tables |
| password | Password for the Doris user |
| database | Target Doris database name |
| table.logs | Doris table name for logs data (default: otel_logs) |
| table.traces | Doris table name for traces data (default: otel_traces) |
| table.metrics | Doris table name for metrics data (default: otel_metrics) |
| create_schema | Whether to automatically create Doris database tables (default: true) |
| history_days | Retention days for historical data in auto-created tables (default: 0, meaning permanent retention) |
| create_history_days | Initial partition days for auto-created tables (default: 0, meaning no partitions created) |
| label_prefix | Doris Stream Load label prefix. The final label is {label_prefix}_{db}_{table}_{yyyymmdd_hhmmss}_{uuid} (default: open_telemetry) |
| headers | Headers for Doris Stream Load in YAML map format |
| log_progress_interval | Interval (in seconds) for logging throughput (default: 10, set to 0 to disable) |

For more configuration options, refer to:
https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/exporter/dorisexporter

Usage Example

TEXT Log Collection Example

This example demonstrates TEXT log collection using Doris FE logs.

1. Data

The FE log is typically located at fe/log/fe.log under the Doris installation directory. It is a standard Java program log whose entries include a timestamp, log level, thread name, code location, and log content. In addition to normal logs, there are exception logs with stack traces. Because a stack trace spans multiple lines, the collection and storage process must combine the main log line and its stack trace into a single log entry.

Example logs:

2024-07-08 21:18:01,432 INFO (Statistics Job Appender|61) [StatisticsJobAppender.runAfterCatalogReady():70] Stats table not available, skip  
2024-07-08 21:18:53,710 WARN (STATS_FETCH-0|208) [StmtExecutor.executeInternalQuery():3332] Failed to run internal SQL: OriginStatement{originStmt='SELECT * FROM __internal_schema.column_statistics WHERE part_id is NULL  ORDER BY update_time DESC LIMIT 500000', idx=0}  
org.apache.doris.common.UserException: errCode = 2, detailMessage = tablet 10031 has no queryable replicas. err: replica 10032's backend 10008 does not exist or not alive  
        at org.apache.doris.planner.OlapScanNode.addScanRangeLocations(OlapScanNode.java:931) ~[doris-fe.jar:1.2-SNAPSHOT]  
        at org.apache.doris.planner.OlapScanNode.computeTabletInfo(OlapScanNode.java:1197) ~[doris-fe.jar:1.2-SNAPSHOT]  
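The merging rule described above can be illustrated with a short Python sketch. It is only an illustration of the idea, not the Collector's implementation: it assumes that every new entry starts with a timestamp such as 2024-07-08 21:18:01,432 and that any other line belongs to the entry before it. The function name merge_multiline is hypothetical.

```python
import re

# Assumption: a line that starts with "YYYY-MM-DD HH:MM:SS,mmm" opens a new log entry.
LINE_START = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}")


def merge_multiline(lines):
    """Group raw lines into entries; lines without a leading timestamp
    (for example stack-trace lines) are appended to the previous entry."""
    entries = []
    for raw in lines:
        line = raw.rstrip()  # drop trailing whitespace, keep leading indentation
        if not line:
            continue
        if LINE_START.match(line) or not entries:
            entries.append(line)
        else:
            entries[-1] += "\n" + line
    return entries


sample = [
    "2024-07-08 21:18:01,432 INFO (Statistics Job Appender|61) Stats table not available, skip",
    "2024-07-08 21:18:53,710 WARN (STATS_FETCH-0|208) Failed to run internal SQL",
    "org.apache.doris.common.UserException: errCode = 2, detailMessage = tablet 10031 has no queryable replicas.",
    "        at org.apache.doris.planner.OlapScanNode.addScanRangeLocations(OlapScanNode.java:931)",
]

entries = merge_multiline(sample)
for entry in entries:
    print(repr(entry))
```

With the shortened sample above, the WARN line, the exception line, and the stack frame end up in one entry, while the INFO line remains an entry of its own.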

2. OpenTelemetry Configuration

The log collection configuration file (e.g., opentelemetry_java_log.yml) consists of three main parts, corresponding to the ETL process:

  1. Receivers – Read raw data.
  2. Processors – Transform data.
  3. Exporters – Output data.

Example configuration (opentelemetry_java_log.yml):

# 1. Receivers – Read raw data  
# The `filelog` receiver reads log files from the local filesystem.  
# The `multiline` configuration merges stack traces with the main log by matching timestamps.  
receivers:  
  filelog:  
    include:  
      - /path/to/fe.log  
    start_at: beginning  
    multiline:  
      line_start_pattern: '^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}'  # Match timestamp as new log start  
    operators:  
      # Parse logs  
      - type: regex_parser  
        regex: '^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) (?P<severity>INFO|WARN|ERROR) (?P<message>.*)'  
        timestamp:  
          parse_from: attributes.time  
          layout: '%Y-%m-%d %H:%M:%S,%f'  
        severity:  
          parse_from: attributes.severity  
          trace: TRACE  
          debug: DEBUG  
          info: INFO  
          warn: WARN  
          error: ERROR  
          fatal: FATAL  
 
# 2. Processors – Transform data  
# The `batch` processor groups logs before sending.  
processors:  
  batch:  
    send_batch_size: 100000  # Number of logs per batch (recommended batch size: 100MB–1GB)  
    timeout: 10s  
 
# 3. Exporters – Output data  
# The `doris` exporter sends data to Doris using the Stream Load HTTP interface (JSON format by default).  
exporters:  
  doris:  
    endpoint: http://localhost:8030  # FE HTTP address  
    mysql_endpoint: localhost:9030   # FE MySQL address  
    database: doris_db_name  
    username: doris_username  
    password: doris_password  
    table:  
      logs: otel_logs  
    create_schema: true  # Automatically create schema (if false, tables must be created manually)  
    history_days: 10  
    create_history_days: 10  
    timezone: Asia/Shanghai  
    timeout: 60s  # HTTP Stream Load client timeout  
    log_response: true  
    sending_queue:  
      enabled: true  
      num_consumers: 20  
      queue_size: 1000  
    retry_on_failure:  
      enabled: true  
      initial_interval: 5s  
      max_interval: 30s  
    headers:  
      load_to_single_tablet: "true"  
 
service:  
  pipelines:  
    logs:  
      receivers: [filelog]  
      processors: [batch]  
      exporters: [doris]  

3. Running OpenTelemetry

./otelcol-contrib --config config/opentelemetry_java_log.yml  
 
# When `log_response` is true, logs will display Stream Load request/response details:  
2025-08-18T00:33:22.543+0800	info	dorisexporter@v0.132.0/exporter_logs.go:181	log response:  
{  
    "TxnId": 52,  
    "Label": "open_telemetry_otel_otel_logs_20250818003321_498bb8ec-040c-4982-9eb4-452b15129782",  
    "Comment": "",  
    "TwoPhaseCommit": "false",  
    "Status": "Success",  
    "Message": "OK",  
    "NumberTotalRows": 50355,  
    "NumberLoadedRows": 50355,  
    "NumberFilteredRows": 0,  
    "NumberUnselectedRows": 0,  
    "LoadBytes": 31130235,  
    "LoadTimeMs": 680,  
    "BeginTxnTimeMs": 0,  
    "StreamLoadPutTimeMs": 3,  
    "ReadDataTimeMs": 106,  
    "WriteDataTimeMs": 653,  
    "ReceiveDataTimeMs": 11,  
    "CommitAndPublishTimeMs": 23  
}  
 
# Progress logs (every 10s) show throughput:  
2025-08-18T00:05:00.017+0800	info	dorisexporter@v0.132.0/progress_reporter.go:63	[LOG] total 11 MB 18978 ROWS, total speed 0 MB/s 632 R/s, last 10 seconds speed 1 MB/s 1897 R/s  
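When reviewing many of these responses, it can help to reduce each one to a few numbers. The Python sketch below is one possible way to do that, using only the fields shown in the response above. The function name summarize_stream_load is hypothetical, and treating only "Success" as a successful status is an assumption made for this sketch.

```python
import json


def summarize_stream_load(response_text):
    """Summarize one Stream Load response body (JSON text)."""
    resp = json.loads(response_text)
    seconds = resp["LoadTimeMs"] / 1000.0
    return {
        # Assumption: only "Success" counts as a successful load here.
        "ok": resp["Status"] == "Success",
        "all_rows_loaded": resp["NumberLoadedRows"] == resp["NumberTotalRows"],
        "filtered_rows": resp["NumberFilteredRows"],
        "rows_per_s": resp["NumberLoadedRows"] / seconds if seconds > 0 else None,
        "mb_per_s": resp["LoadBytes"] / 1e6 / seconds if seconds > 0 else None,
    }


# Fields copied from the sample response above.
sample = """
{
    "Status": "Success",
    "Message": "OK",
    "NumberTotalRows": 50355,
    "NumberLoadedRows": 50355,
    "NumberFilteredRows": 0,
    "LoadBytes": 31130235,
    "LoadTimeMs": 680
}
"""

summary = summarize_stream_load(sample)
print(summary)
```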

JSON Log Collection Example

This example demonstrates JSON log collection using GitHub Events Archive data.

1. Data

GitHub Events Archive contains archived user activity events in JSON format, downloadable from https://www.gharchive.org/. For example, to download the archive for 15:00 on January 1, 2024:

wget https://data.gharchive.org/2024-01-01-15.json.gz
gunzip 2024-01-01-15.json.gz  # The configuration below reads the uncompressed 2024-01-01-15.json

Below is a sample entry (formatted for readability; actual data is one JSON object per line):

{
  "id": "37066529221",
  "type": "PushEvent",
  "actor": {
    "id": 46139131,
    "login": "Bard89",
    "display_login": "Bard89",
    "gravatar_id": "",
    "url": "https://api.github.com/users/Bard89",
    "avatar_url": "https://avatars.githubusercontent.com/u/46139131?"
  },
  "repo": {
    "id": 780125623,
    "name": "Bard89/talk-to-me",
    "url": "https://api.github.com/repos/Bard89/talk-to-me"
  },
  "payload": {
    "repository_id": 780125623,
    "push_id": 17799451992,
    "size": 1,
    "distinct_size": 1,
    "ref": "refs/heads/add_mvcs",
    "head": "f03baa2de66f88f5f1754ce3fa30972667f87e81",
    "before": "85e6544ede4ae3f132fe2f5f1ce0ce35a3169d21"
  },
  "public": true,
  "created_at": "2024-04-01T23:00:00Z"
}

2. OpenTelemetry Configuration

The key difference from the TEXT log configuration is the json_parser operator in the filelog receiver, which parses each line as JSON. Extracted fields are used for subsequent processing.
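As a rough illustration of what parsing each line as JSON and reading the timestamp means, here is a small Python sketch. It is not the Collector's implementation. The function name parse_event is hypothetical, and interpreting the trailing Z as UTC is an assumption made explicit in the code.

```python
import json
from datetime import datetime, timezone


def parse_event(line):
    """Parse one JSON line and return (event dict, event timestamp)."""
    event = json.loads(line)
    # Assumption: created_at looks like 2024-04-01T23:00:00Z and the trailing Z means UTC.
    created = datetime.strptime(event["created_at"], "%Y-%m-%dT%H:%M:%SZ")
    return event, created.replace(tzinfo=timezone.utc)


# A shortened version of the sample entry above, on a single line as in the real file.
line = (
    '{"id": "37066529221", "type": "PushEvent", '
    '"repo": {"id": 780125623, "name": "Bard89/talk-to-me"}, '
    '"public": true, "created_at": "2024-04-01T23:00:00Z"}'
)

event, created_at = parse_event(line)
print(event["type"], event["repo"]["name"], created_at.isoformat())
```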

Example configuration (opentelemetry_json_log.yml):

receivers:
  filelog:
    include:
      - /path/to/2024-01-01-15.json
    start_at: beginning
    operators:
      - type: json_parser
        timestamp:
          parse_from: attributes.created_at
          layout: '%Y-%m-%dT%H:%M:%SZ'
 
processors:
  batch:
    send_batch_size: 100000  # Logs per batch (recommended: 100MB–1GB total size)
    timeout: 10s
 
exporters:
  doris:
    endpoint: http://localhost:8030  # FE HTTP address
    mysql_endpoint: localhost:9030   # FE MySQL address
    database: doris_db_name
    username: doris_username
    password: doris_password
    table:
      logs: otel_logs
    create_schema: true  # Auto-create schema (if false, manually create tables)
    history_days: 10
    create_history_days: 10
    timezone: Asia/Shanghai
    timeout: 60s  # HTTP Stream Load client timeout
    log_response: true
    sending_queue:
      enabled: true
      num_consumers: 20
      queue_size: 1000
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_interval: 30s
    headers:
      load_to_single_tablet: "true"
 
service:
  pipelines:
    logs:
      receivers: [filelog]
      processors: [batch]
      exporters: [doris]

3. Running OpenTelemetry

./otelcol-contrib --config config/opentelemetry_json_log.yml

Trace Collection Example

1. OpenTelemetry Configuration

Create the configuration file otel_trace.yml as follows:

receivers:
  otlp:  # OTLP protocol to receive data from OpenTelemetry Java Agent
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
 
processors:
  batch:
    send_batch_size: 100000  # Number of traces per batch (recommended batch size: 100MB–1GB)
    timeout: 10s
 
exporters:
  doris:
    endpoint: http://localhost:8030  # FE HTTP address
    database: doris_db_name
    username: doris_username
    password: doris_password
    table:
      traces: doris_table_name
    create_schema: true  # Auto-create schema (if false, tables must be created manually)
    mysql_endpoint: localhost:9030  # FE MySQL address
    history_days: 10
    create_history_days: 10
    timezone: Asia/Shanghai
    timeout: 60s  # HTTP Stream Load client timeout
    log_response: true
    sending_queue:
      enabled: true
      num_consumers: 20
      queue_size: 1000
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_interval: 30s
    headers:
      load_to_single_tablet: "true"
 
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [doris]

2. Running OpenTelemetry

./otelcol-contrib --config otel_trace.yml
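Once the Collector is running, one optional way to check the pipeline without an instrumented application is to post a single hand-built span to the OTLP/HTTP receiver. The Python sketch below builds a minimal OTLP JSON payload. The function names, the service and scope names, and the URL http://localhost:4318/v1/traces (derived from the http endpoint configured above) are assumptions for illustration only.

```python
import json
import os
import time
import urllib.request


def build_trace_payload(service_name, span_name):
    """Build a minimal OTLP/JSON trace payload containing one server span."""
    now = time.time_ns()
    span = {
        "traceId": os.urandom(16).hex(),  # 16 random bytes as 32 hex characters
        "spanId": os.urandom(8).hex(),    # 8 random bytes as 16 hex characters
        "name": span_name,
        "kind": 2,                        # SPAN_KIND_SERVER
        "startTimeUnixNano": str(now - 1_000_000),
        "endTimeUnixNano": str(now),
    }
    resource = {
        "attributes": [
            {"key": "service.name", "value": {"stringValue": service_name}}
        ]
    }
    scope_spans = [{"scope": {"name": "manual-smoke-test"}, "spans": [span]}]
    return {"resourceSpans": [{"resource": resource, "scopeSpans": scope_spans}]}


def send(payload, url="http://localhost:4318/v1/traces"):
    """POST the payload to the Collector's OTLP/HTTP receiver and return the HTTP status."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


payload = build_trace_payload("smoke-test", "hello")
print(json.dumps(payload, indent=2))
```

Calling send(payload) while the Collector is running is expected to return an HTTP success status, after which the span should appear in the traces table once the batch processor flushes.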

3. Application Integration with OpenTelemetry SDK

This section demonstrates integration using a Spring Boot example application (from the official demo), which returns "Hello World!" for the "/" endpoint.

  1. Download the OpenTelemetry Java Agent.

  2. Before starting the application, set these environment variables (no code modification needed):

export JAVA_TOOL_OPTIONS="${JAVA_TOOL_OPTIONS} -javaagent:/your/path/to/opentelemetry-javaagent.jar"  # Path to OpenTelemetry Java Agent
export OTEL_JAVAAGENT_LOGGING="none"  # Disable OTEL logs to avoid interference with application logs
export OTEL_SERVICE_NAME="myproject"
export OTEL_TRACES_EXPORTER="otlp"  # Use OTLP protocol to send trace data
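export OTEL_EXPORTER_OTLP_PROTOCOL="grpc"  # 4317 is the Collector's gRPC port; Java Agent 2.x defaults to http/protobuf (port 4318)
export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4317"  # OpenTelemetry Collector address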
 
java -jar myproject-0.0.1-SNAPSHOT.jar