
Doris Cluster Monitoring

Doris Manager integrates Prometheus, Grafana, and AlertManager, allowing you to view and manage cluster monitoring directly within Manager.

View Cluster Monitoring

Doris Manager provides a rich set of predefined monitoring metrics to help you understand the real-time operational status of your cluster.


Descriptions of the monitoring metrics are as follows:

| Category | Metric Name | Metric Description |
| --- | --- | --- |
| Cluster Overview | FE Node | Total number of FE nodes in the cluster |
| Cluster Overview | FE Not Alive | Number of offline FE nodes in the cluster |
| Cluster Overview | Used Capacity | Used space of BEs in the cluster |
| Cluster Overview | BE Node | Total number of BE nodes in the cluster |
| Cluster Overview | BE Not Alive | Number of offline BE nodes in the cluster |
| Cluster Overview | Total Capacity | Total available storage space of BEs in the cluster |
| Cluster Overview | FE JVM Heap Use Rate | JVM heap usage rate of FEs in the cluster |
| Cluster Overview | BE Compaction Score | Compaction score of each BE |
| Cluster Overview | Load Rows Rate | Data import status within a unit of time |
| Cluster Overview | QPS | QPS status of different FEs |
| Cluster Overview | 99th Latency | 99th percentile query latency of different FEs |
| Host Monitor | CPU Used Rate | CPU usage rate of the node |
| Host Monitor | Mem Usage | Memory usage size of the node |
| Host Monitor | Mem Used Rate | Memory usage rate of the node |
| Host Monitor | I/O Util | Disk I/O utilization within a unit of time |
| Host Monitor | Disk Used Rate | Percentage of disk space used |
| Host Monitor | Disk Write Throughput | Disk write throughput |
| Host Monitor | Disk Read Throughput | Disk read throughput |
| Host Monitor | Network Outbound Traffic | Outbound traffic of the gateway |
| Host Monitor | Network Inbound Traffic | Inbound traffic of the gateway |
| Query Statistic | RPS | Requests per second for different FEs within a unit of time |
| Query Statistic | QPS | QPS of different FEs |
| Query Statistic | 99th Latency | 99th percentile query latency |
| Query Statistic | Query Percentile | Query latency (at different percentiles) |
| Query Statistic | Query Error [1m] | Query failure rate within 1 minute |
| Query Statistic | Connections | Number of connections for each FE |
| Jobs | Broker Load Job | Status distribution of Broker load tasks |
| Jobs | Insert Load Job | Status distribution of Insert tasks |
| Jobs | Routine Load Job | Status distribution of Routine load tasks |
| Jobs | Spark Load Job | Status distribution of Spark load tasks |
| Jobs | Broker Load Tendency | Broker load task status trend |
| Jobs | Insert Load Tendency | Insert task status trend |
| Jobs | Routine Load Tendency | Routine load task status trend |
| Jobs | Spark Load Tendency | Spark load task status trend |
| Jobs | SC Job | Number of running schema change tasks |
| Jobs | Report Queue Size | Report Queue Size of the master node |
| Jobs | Rollup Job | Number of running rollup tasks |
| Transactions | Txn Begin/Success on FE | Total number of transactions initiated and successful transactions on FE |
| Transactions | Txn Failed/Reject on FE | Failed and rejected rates of BE transactions within a unit of time |
| Transactions | Publish Task on BE | Total number of publish tasks on BE |
| Transactions | Txn Status on FE | Number of transactions in different states |
| Transactions | Txn Load Bytes/Rows rate | Rows and size of data imported within a unit of time |
| FE | Max Replayed Journal ID | Journal ID of FE |
| FE | Edit Log Size | Edit log size of FE |
| FE | Image Write | Number of image writes on FE |
| FE | Image Push | Number of image pushes on FE |
| FE | Image Counter | Number of image writes and pushes on FE |
| FE | Image Clean | Success and failure status of FE image cleanup |
| FE | Edit log Clean | Success and failure status of FE edit log cleanup |
| FE | BDBJE Write | 99th percentile write latency of BDBJE |
| FE | BDBJE Read | Reads of BDBJE within a unit of time |
| FE | JVM Heap | JVM heap usage of FE |
| FE | Scheduling Tablets | Number of tablets to be scheduled during data balancing or recovery |
| FE | JVM Old GC | Old GC |
| FE | JVM Young GC | Young GC |
| FE | JVM Old | JVM old size |
| FE | JVM Young | JVM young size |
| FE | FE Collect Compaction Score | Compaction score of each BE collected by FE |
| FE | JVM Non Heap | JVM non-heap usage of FE |
| FE | JVM Threads | Number of JVM threads |
| BE | Disk Usage | Disk space usage rate of BE |
| BE | BE FD Count | FD usage on BE |
| BE | BE Thread Num | Thread distribution on BE |
| BE | Tablet Meta Read | Metadata read status of BE within a unit of time |
| BE | Tablet Meta Write | Metadata write status of BE within a unit of time |
| BE | Tablet Distribution | Tablet distribution on BE |
| BE | BE Compaction Base | Rate of base compaction tasks performed by BE within a unit of time |
| BE | BE Compaction Cumulate | Rate of cumulative compaction tasks performed by BE within a unit of time |
| BE | BE Push Bytes | Size of push_request_write data on BE within a unit of time |
| BE | BE Push Rows | Number of rows for push_request_write on BE within a unit of time |
| BE | BE Scan Bytes | Size of scanned data by BE within a unit of time |
| BE | BE Scan Rows | Number of scanned rows by BE within a unit of time |
| BE Tasks | Finish Task Report | Total number of tasks completed on each BE |
| BE Tasks | Push Task | Number of successfully executed push tasks on each BE |
| BE Tasks | Push Task Cost Time | Time cost of executing push tasks on each BE |
| BE Tasks | Delete | Total number of delete tasks executed on BE |
| BE Tasks | Base Compaction | Total number of base_compaction tasks executed on BE |
| BE Tasks | Cumulative Compaction | Total number of cumulative_compaction tasks executed on BE |
| BE Tasks | Clone | Total number of clone tasks executed on BE |
| BE Tasks | Create Rollup | Total number of create_rollup tasks executed on BE |
| BE Tasks | Schema Change | Total number of schema_change tasks executed on BE |
| BE Tasks | Create Tablet | Total number of create_tablet tasks executed on BE |
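
Several of the metrics above ("99th Latency", "Query Percentile", "BDBJE Write") are percentile values. As a rough illustration of what a 99th percentile means, the sketch below applies the nearest-rank method to a list of latency samples. This is an assumption made for illustration only: this page does not state how Manager or Prometheus aggregates these metrics, and the function name and sample data are hypothetical.

```python
import math


def nearest_rank_percentile(samples, pct):
    """Return the pct-th percentile of samples using the nearest-rank method.

    Illustrative only; not necessarily how the dashboards compute percentiles.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(samples)
    # Nearest rank: the smallest 1-based rank covering pct percent of the samples.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


if __name__ == "__main__":
    # Hypothetical query latencies in milliseconds: 1, 2, ..., 100.
    latencies_ms = list(range(1, 101))
    print(nearest_rank_percentile(latencies_ms, 99))
```

Read this way, a 99th percentile latency of N ms means that roughly 99% of the sampled queries completed in N ms or less, which is why this metric surfaces slow outliers that an average would hide.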

Create New Monitoring Dashboard

There are two monitoring dashboards in Manager:

  • Doris Dashboard Overview: A predefined Doris monitoring dashboard that provides basic Doris and host monitoring items. It cannot be modified.

  • Default Custom Doris Dashboard Overview: A user-defined monitoring dashboard that can be modified.

To add custom monitoring, modify the Default Custom Doris Dashboard Overview dashboard by adding your own panels to it:

  1. Select the "Default Custom Doris Dashboard Overview" Dashboard

    In the top-left corner of the monitoring page, select the "Default Custom Doris Dashboard Overview" dashboard.

  2. Duplicate a Panel

    Create a new panel by duplicating an existing one. You can drag and drop the new panel into any module.

  3. Edit the Duplicated Panel

    Edit the duplicated panel. Refer to the edit panel documentation for the rules.

Manage Cluster Monitoring

Enable/Disable Cluster Monitoring

In the user configuration, select "Service Configuration" to enable or disable monitoring and alerting services.


Enable/Disable Monitoring Authentication

Starting from Doris Manager v24.0.3, authentication for monitoring components is enabled by default. You can set accounts and passwords for Prometheus, AlertManager, and Grafana separately. In the webserver/conf/manager.conf file, you can modify the following configurations:

| Configuration | Type | Description |
| --- | --- | --- |
| MONITOR_AUTH_ENABLE | BOOLEAN | Enable or disable monitoring authentication, default is TRUE. |
| GRAFANA_USER | STRING | Grafana username, currently only supports the 'admin' user. |
| GRAFANA_PASS | STRING | Grafana password. If not configured separately, a random password will be set. |
| PROMETHEUS_USER | STRING | Prometheus username, defaults to the 'admin' user. |
| PROMETHEUS_PASS | STRING | Prometheus password. If not configured separately, a random password will be set. |
| ALERTMANAGER_USER | STRING | AlertManager username, defaults to 'admin'. |
| ALERTMANAGER_PASS | STRING | AlertManager password. If not configured separately, a random password will be set. |
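
As a minimal sketch of how these settings might be consumed by a script, the example below reads the Prometheus credentials from configuration text and builds an HTTP Basic authentication header. It rests on two assumptions not confirmed on this page: that manager.conf uses plain KEY=VALUE lines, and that the monitoring components accept HTTP Basic authentication. The function names and the example password are hypothetical.

```python
import base64


def parse_conf(text):
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments.

    The KEY=VALUE layout is an assumption about manager.conf.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip("'\"")
    return settings


def basic_auth_header(user, password):
    """Build a standard HTTP Basic Authorization header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


def prometheus_auth(settings):
    """Return request headers for Prometheus, or {} if authentication is disabled."""
    # Per the table above, MONITOR_AUTH_ENABLE defaults to TRUE.
    if settings.get("MONITOR_AUTH_ENABLE", "TRUE").upper() != "TRUE":
        return {}
    user = settings.get("PROMETHEUS_USER", "admin")  # documented default user
    password = settings.get("PROMETHEUS_PASS")
    if password is None:
        # Without an explicit value a random password is set, so it cannot be derived here.
        raise ValueError("PROMETHEUS_PASS is not configured")
    return basic_auth_header(user, password)


# Hypothetical excerpt of webserver/conf/manager.conf (values are placeholders).
EXAMPLE_CONF = """
MONITOR_AUTH_ENABLE=TRUE
PROMETHEUS_USER=admin
PROMETHEUS_PASS=example-secret
"""

if __name__ == "__main__":
    print(prometheus_auth(parse_conf(EXAMPLE_CONF)))
```

Setting the passwords explicitly, rather than relying on the random defaults, is what makes this kind of scripted access possible, since a randomly generated password is not known in advance.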