Skip to main content
VeloDB Cloud 26.x·Apache Doris 4.x (≤ 4.0 supported)·"Since X.Y" tags refer to Doris versionsversion mapping →

User Guide

This page is for VeloDB users. Your administrator has already deployed the VeloDB MCP Service for you; you only need to know:

  1. How to connect the service to an AI agent (Claude Code / Cursor / Codex)
  2. How to query data through an AI conversation

For defining and managing the semantic model (administrators), see Semantic Model.

Connect an AI Agent (Claude Code / Cursor / Codex)

Before you start, get the following:

  • MCP Connection URL: open the Connection page → Connection Methods → copy the MCP Connection URL.

  • VeloDB username and password: used for authentication (format username:password, e.g. admin:admin)

  • Workspace name: the workspace that holds the semantic model (default example)

Connect Claude Code

Step 1: Add the MCP Server

claude mcp add --transport http velodb https://<warehouse-id>.<region>.aws.velodb.io/mcp \
--header "Authorization: Bearer admin:admin"

Step 2: Verify the connection

In Claude Code, type:

Check service health

The AI automatically calls check_service_health and returns the VeloDB connection status and the status of every workspace. If you see "velodb": "connected", the connection succeeded.

Step 3: Start querying

List all available metrics

The AI follows a three-step flow: list_metricslist_dimensions_for_metricquery_metric.

Connect Cursor / VS Code

Create a .cursor/mcp.json in your project root, or the equivalent MCP configuration for VS Code:

{
"mcpServers": {
"velodb": {
"url": "https://<warehouse-id>.<region>.aws.velodb.io/mcp",
"transport": "streamable-http",
"headers": {
"Authorization": "Bearer admin:admin"
}
}
}
}

Available Tools (10 total)

Startup & health check (call these first when a conversation begins)

ToolParametersPurpose
get_query_guideCall this first. Returns the complete workflow guide: when to use the semantic layer vs. raw SQL, the order of tool calls, and query syntax. Skipping it makes the agent take detours.
check_service_healthdetail (optional)Returns VeloDB connectivity plus the status of each workspace (healthy / no_models / not_ready) and its metric count. Call it at the start of every conversation.

Semantic queries (preferred — generates correct SQL automatically)

ToolParametersPurpose
list_metricsworkspace (required), page_size, page_tokenLists all available metrics (name + description). The agent matches a metric to the user's intent.
list_dimensions_for_metricworkspace (required), metric_name (required)Returns the metric's available group_by dimensions (name + type + description), so group_by does not fail.
query_metricworkspace (required), metrics (required), group_by, where, order_by, limit, having, database, max_rowsThe core query tool. Compiles and executes a semantic query. Anything involving counts, sums, ratios, ranking, or trends must go through here. Supports HAVING to filter aggregated results. database sets the target database; max_rows hard-caps the returned rows (0 = default 10000).
reload_semantic_layerworkspace (required)Manually triggers a semantic-layer reload. Use it when an administrator has changed a YAML model and committed it and you need it to take effect immediately; otherwise the layer polls every 60s.

Low-level metadata & raw SQL (use only in the two cases below)

Two raw-SQL cases:

  • Case A — semantic layer unavailable: check_service_health reports that no workspace is healthy (all are no_models / not_ready). All queries must then go through list_databaseslist_tablesdescribe_tableexecute_query.

  • Case B — no matching metric: the semantic layer is healthy, but no metric in list_metrics matches the user's intent. Fall back to the raw-SQL path in the same way.

ToolParametersPurpose
list_databasespage_size, page_tokenLists all databases. Useful while exploring, but not a substitute for list_metrics.
list_tablesdatabase (required), like, page_size, page_tokenLists table names in a database. Returns table names only, no column information. Use describe_table to see columns.
describe_tabledatabase (required), table (required), detail_levelShows the physical table structure (column names, types). detail_level: names/summary/full.
execute_querysql (required), database, max_rowsRaw-SQL fallback. Read-only only (SELECT/SHOW/DESCRIBE/EXPLAIN). If the semantic layer has a matching metric, prefer query_metric.

Standard flow for every conversation:

  1. get_query_guide() — get the full guide (so the agent knows which tool to use when)

  2. check_service_health() — confirm VeloDB connectivity and see the available workspaces

  3. Routing decision:

If the user asks "how many", "calculate", "trend", or "ratio" → use the semantic layer (4-5-6)

  1. If the user asks "show the table structure" or "what databases are there" → use the metadata tools

  2. Semantic layer unavailable (no workspace is healthy) → skip 4-6 and go straight to 7

  3. list_metrics(workspace) — list metrics and match the user's intent

  4. list_dimensions_for_metric(workspace, metric_name) — get the available dimensions

  5. query_metric(workspace, metrics, group_by, ...) — run the query

  6. Fallback: if step 4 finds no matching metric, use list_databaseslist_tablesdescribe_tableexecute_query for the raw-SQL path

Hard rules:

  • Semantic layer healthy + a matching metric → never bypass query_metric to write SQL directly with execute_query.

  • Semantic layer healthy + no matching metric → first tell the user "the semantic layer does not cover this query", then use execute_query, and attach a warning to the result.

Conversation Example

User: What was the total order amount by channel last month?

AI reasoning:
1. get_query_guide → got the guide ✓
2. check_service_health → workspace "example" healthy, 5 metrics ✓
3. Involves "total amount" and "channel" → use the semantic layer
4. list_metrics(example) → total_amount (total order amount) ✓
5. list_dimensions_for_metric(example, total_amount) → channel ✓
6. query_metric(example, metrics=[total_amount], group_by=[channel],
where="order_date >= '2026-04-30'")

AI reply:
channel total_amount
WEB 1,096.00
APP 2,396.00
MINI 298.00

Query Capabilities

Once connected, you ask questions in plain language in Claude Code, and the AI calls the matching MCP tool automatically.

Metadata Discovery

What you wantJust sayTool called
See the databases"List all databases"list_databases
See the tables in a database"List the tables in the dw database"list_tables
See a table's structure"Show the columns of the orders table"describe_table

SQL Queries

What you wantJust sayTool called
Query data"Show the top 10 orders"execute_query
Aggregate analysis"Count orders by status"execute_query

execute_query allows read-only SQL only (SELECT / SHOW / DESCRIBE / EXPLAIN).

Semantic-Layer Queries (requires an administrator-configured semantic model)

If your VeloDB cluster has a semantic layer configured, the agent completes a query in three steps:

  1. list_metrics — list all available metrics (name + description)

  2. list_dimensions_for_metric — see the metric's available dimensions (name + type + description)

  3. query_metric — run the semantic query (supports having to filter aggregated results)

What you wantJust sayTool called
See the metrics"List all metrics"list_metrics
Check a revenue trend"Show revenue for the past 7 days, grouped by day"query_metric
See available dimensions"Which dimensions can total_amount be analyzed by?"list_dimensions_for_metric
Check service status"Check service health"check_service_health

Every semantic tool requires the workspace parameter ("example" is available by default).

Query Examples

Example 1: Explore the data structure

You: List all databases
AI: [calls list_databases]
dw

You: What tables are in the dw database?
AI: [calls list_tables]
orders, users, products, dim_date

You: Show the structure of the orders table
AI: [calls describe_table]
Column Type Comment
order_id BIGINT Order ID
user_id BIGINT User ID
amount DECIMAL Amount
channel VARCHAR Channel
status VARCHAR Status
order_date DATE Order date

Example 2: Query data

You: Show the order count and total amount by status for the past week
AI: [calls execute_query]
status count total_amount
completed 1234 567890.00
pending 567 89012.00
cancelled 89 12345.00

Example 3: Semantic metrics

You: List all available metrics
AI: [calls list_metrics]
total_amount Total order amount
order_count Order count
avg_amount Average order value
unique_users Distinct ordering users
user_count User count

You: Show total order amount by channel for the past 7 days
AI: [calls list_dimensions_for_metric] → channel
[calls query_metric]
channel total_amount
Online 12345.00
Offline 15678.00
...

FAQ

Q: The connection fails with a 401

Make sure the token format is correct: username:password (separated by an ASCII colon).

Q: A semantic query returns "workspace not found"

Make sure the workspace name is correct; "example" is available by default. Sign in to the WebUI to see the available workspaces.

Q: Can it perform writes?

No. execute_query allows read-only SQL only (SELECT / SHOW / DESCRIBE / EXPLAIN).

Q: Queries are slow

For large data volumes, ask the AI to add LIMIT or narrow the time range.

Q: Which metrics can the semantic layer use?

Ask the AI to "list all metrics" (it calls list_metrics), or sign in to the WebUI to see the metrics configured for each workspace.

Next Steps

  • Semantic Model — Administrators: define entities, dimensions, measures, and metrics in YAML.