VeloDB Cloud 26.x·Apache Doris 4.x (≤ 4.0 supported)·"Since X.Y" tags refer to Doris versionsversion mapping →

User Guide

This page is for VeloDB users. Your administrator has already deployed the VeloDB MCP Service for you; you only need to know:

How to connect the service to an AI agent (Claude Code / Cursor / Codex)
How to query data through an AI conversation

For defining and managing the semantic model (administrators), see Semantic Model.

Connect an AI Agent (Claude Code / Cursor / Codex)

Before you start, get the following:

MCP Connection URL: open the Connection page → Connection Methods → copy the MCP Connection URL.
VeloDB username and password: used for authentication (format username:password, e.g. admin:admin)
Workspace name: the workspace that holds the semantic model (default example)

Connect Claude Code

Step 1: Add the MCP Server

claude mcp add --transport http velodb https://<warehouse-id>.<region>.aws.velodb.io/mcp \
  --header "Authorization: Bearer admin:admin"

Step 2: Verify the connection

In Claude Code, type:

Check service health

The AI automatically calls check_service_health and returns the VeloDB connection status and the status of every workspace. If you see "velodb": "connected", the connection succeeded.

Step 3: Start querying

List all available metrics

The AI follows a three-step flow: list_metrics → list_dimensions_for_metric → query_metric.

Connect Cursor / VS Code

Create a .cursor/mcp.json in your project root, or the equivalent MCP configuration for VS Code:

{
  "mcpServers": {
    "velodb": {
      "url": "https://<warehouse-id>.<region>.aws.velodb.io/mcp",
      "transport": "streamable-http",
      "headers": {
        "Authorization": "Bearer admin:admin"
      }
    }
  }
}

Available Tools (10 total)

Startup & health check (call these first when a conversation begins)

Tool	Parameters	Purpose
`get_query_guide`	—	Call this first. Returns the complete workflow guide: when to use the semantic layer vs. raw SQL, the order of tool calls, and query syntax. Skipping it makes the agent take detours.
`check_service_health`	detail (optional)	Returns VeloDB connectivity plus the status of each workspace (healthy / no_models / not_ready) and its metric count. Call it at the start of every conversation.

Semantic queries (preferred — generates correct SQL automatically)

Tool	Parameters	Purpose
`list_metrics`	workspace (required), page_size, page_token	Lists all available metrics (name + description). The agent matches a metric to the user's intent.
`list_dimensions_for_metric`	workspace (required), metric_name (required)	Returns the metric's available group_by dimensions (name + type + description), so group_by does not fail.
`query_metric`	workspace (required), metrics (required), group_by, where, order_by, limit, having, database, max_rows	The core query tool. Compiles and executes a semantic query. Anything involving counts, sums, ratios, ranking, or trends must go through here. Supports HAVING to filter aggregated results. `database` sets the target database; `max_rows` hard-caps the returned rows (0 = default 10000).
`reload_semantic_layer`	workspace (required)	Manually triggers a semantic-layer reload. Use it when an administrator has changed a YAML model and committed it and you need it to take effect immediately; otherwise the layer polls every 60s.

Low-level metadata & raw SQL (use only in the two cases below)

Two raw-SQL cases:

Case A — semantic layer unavailable: check_service_health reports that no workspace is healthy (all are no_models / not_ready). All queries must then go through list_databases → list_tables → describe_table → execute_query.
Case B — no matching metric: the semantic layer is healthy, but no metric in list_metrics matches the user's intent. Fall back to the raw-SQL path in the same way.

Tool	Parameters	Purpose
`list_databases`	page_size, page_token	Lists all databases. Useful while exploring, but not a substitute for `list_metrics`.
`list_tables`	database (required), like, page_size, page_token	Lists table names in a database. Returns table names only, no column information. Use `describe_table` to see columns.
`describe_table`	database (required), table (required), detail_level	Shows the physical table structure (column names, types). detail_level: names/summary/full.
`execute_query`	sql (required), database, max_rows	Raw-SQL fallback. Read-only only (SELECT/SHOW/DESCRIBE/EXPLAIN). If the semantic layer has a matching metric, prefer `query_metric`.

Recommended Agent Workflow

Standard flow for every conversation:

get_query_guide() — get the full guide (so the agent knows which tool to use when)
check_service_health() — confirm VeloDB connectivity and see the available workspaces
Routing decision:

If the user asks "how many", "calculate", "trend", or "ratio" → use the semantic layer (4-5-6)

If the user asks "show the table structure" or "what databases are there" → use the metadata tools
Semantic layer unavailable (no workspace is healthy) → skip 4-6 and go straight to 7
list_metrics(workspace) — list metrics and match the user's intent
list_dimensions_for_metric(workspace, metric_name) — get the available dimensions
query_metric(workspace, metrics, group_by, ...) — run the query
Fallback: if step 4 finds no matching metric, use list_databases → list_tables → describe_table → execute_query for the raw-SQL path

Hard rules:

Semantic layer healthy + a matching metric → never bypass query_metric to write SQL directly with execute_query.
Semantic layer healthy + no matching metric → first tell the user "the semantic layer does not cover this query", then use execute_query, and attach a warning to the result.

Conversation Example

User: What was the total order amount by channel last month?

AI reasoning:
  1. get_query_guide → got the guide ✓
  2. check_service_health → workspace "example" healthy, 5 metrics ✓
  3. Involves "total amount" and "channel" → use the semantic layer
  4. list_metrics(example) → total_amount (total order amount) ✓
  5. list_dimensions_for_metric(example, total_amount) → channel ✓
  6. query_metric(example, metrics=[total_amount], group_by=[channel],
                  where="order_date >= '2026-04-30'")

AI reply:
  channel  total_amount
  WEB      1,096.00
  APP      2,396.00
  MINI       298.00

Query Capabilities

Once connected, you ask questions in plain language in Claude Code, and the AI calls the matching MCP tool automatically.

Metadata Discovery

What you want	Just say	Tool called
See the databases	"List all databases"	`list_databases`
See the tables in a database	"List the tables in the dw database"	`list_tables`
See a table's structure	"Show the columns of the orders table"	`describe_table`

SQL Queries

What you want	Just say	Tool called
Query data	"Show the top 10 orders"	`execute_query`
Aggregate analysis	"Count orders by status"	`execute_query`

execute_query allows read-only SQL only (SELECT / SHOW / DESCRIBE / EXPLAIN).

Semantic-Layer Queries (requires an administrator-configured semantic model)

If your VeloDB cluster has a semantic layer configured, the agent completes a query in three steps:

list_metrics — list all available metrics (name + description)
list_dimensions_for_metric — see the metric's available dimensions (name + type + description)
query_metric — run the semantic query (supports having to filter aggregated results)

What you want	Just say	Tool called
See the metrics	"List all metrics"	`list_metrics`
Check a revenue trend	"Show revenue for the past 7 days, grouped by day"	`query_metric`
See available dimensions	"Which dimensions can total_amount be analyzed by?"	`list_dimensions_for_metric`
Check service status	"Check service health"	`check_service_health`

Every semantic tool requires the workspace parameter ("example" is available by default).

Query Examples

Example 1: Explore the data structure

You: List all databases
AI: [calls list_databases]
    dw

You: What tables are in the dw database?
AI: [calls list_tables]
    orders, users, products, dim_date

You: Show the structure of the orders table
AI: [calls describe_table]
    Column      Type      Comment
    order_id    BIGINT    Order ID
    user_id     BIGINT    User ID
    amount      DECIMAL   Amount
    channel     VARCHAR   Channel
    status      VARCHAR   Status
    order_date  DATE      Order date

Example 2: Query data

You: Show the order count and total amount by status for the past week
AI: [calls execute_query]
    status      count   total_amount
    completed   1234    567890.00
    pending     567     89012.00
    cancelled   89      12345.00

Example 3: Semantic metrics

You: List all available metrics
AI: [calls list_metrics]
    total_amount      Total order amount
    order_count       Order count
    avg_amount        Average order value
    unique_users      Distinct ordering users
    user_count        User count

You: Show total order amount by channel for the past 7 days
AI: [calls list_dimensions_for_metric] → channel
    [calls query_metric]
    channel   total_amount
    Online    12345.00
    Offline   15678.00
    ...

FAQ

Q: The connection fails with a 401

Make sure the token format is correct: username:password (separated by an ASCII colon).

Q: A semantic query returns "workspace not found"

Make sure the workspace name is correct; "example" is available by default. Sign in to the WebUI to see the available workspaces.

Q: Can it perform writes?

No. execute_query allows read-only SQL only (SELECT / SHOW / DESCRIBE / EXPLAIN).

Q: Queries are slow

For large data volumes, ask the AI to add LIMIT or narrow the time range.

Q: Which metrics can the semantic layer use?

Ask the AI to "list all metrics" (it calls list_metrics), or sign in to the WebUI to see the metrics configured for each workspace.

Next Steps

Semantic Model — Administrators: define entities, dimensions, measures, and metrics in YAML.

Connect an AI Agent (Claude Code / Cursor / Codex)​

Connect Claude Code​

Connect Cursor / VS Code​

Available Tools (10 total)​

Recommended Agent Workflow​

Conversation Example​

Query Capabilities​

Metadata Discovery​

SQL Queries​

Semantic-Layer Queries (requires an administrator-configured semantic model)​

Query Examples​

Example 1: Explore the data structure​

Example 2: Query data​

Example 3: Semantic metrics​

FAQ​

Q: The connection fails with a 401​

Q: A semantic query returns "workspace not found"​

Q: Can it perform writes?​

Q: Queries are slow​

Q: Which metrics can the semantic layer use?​

Next Steps​