VeloDB Cloud 26.x·Apache Doris 4.x (≤ 4.0 supported)·"Since X.Y" tags refer to Doris versionsversion mapping →

Semantic Model

The VeloDB MCP Service (based on the Apache Doris MCP Server) lets an AI agent query data in VeloDB Cloud directly through the MCP protocol. It offers two query paths:

Semantic layer: define business metrics in YAML ahead of time, and the agent queries them in natural language while the system generates correct SQL (recommended).
Raw SQL: fall back to read-only SQL queries when no metric matches.

This page covers authoring and managing the semantic model; for connecting and querying as an end user, see the User Guide.

What Is a Semantic Model?

A semantic model is a business description of a database table. You tell the system:

What the table's primary key is (an entity)
Which fields can be used for grouping (dimensions)
Which fields need aggregation (measures)

Once defined, users can run natural-language-level queries ("show me the total order amount per month"), and the system generates the correct SQL automatically.

In one sentence

YAML is the spec and MetricFlow is the translator: it turns "total order amount per month" into SELECT DATE_TRUNC('month', order_date), SUM(amount) FROM orders GROUP BY 1.

Quick Start

Step 1: Write a YAML file

Each YAML file describes one table. Put it under the models/ directory:

# models/orders.yaml
---
semantic_model:
  name: orders                         # model name
  db_table: dw.orders                    # the corresponding VeloDB table
  defaults:
    agg_time_dimension: order_date    # default time dimension

  entities:
    - name: order_id                     # primary key
      type: primary
      expr: order_id

  dimensions:
    - name: order_date                   # time dimension
      type: time
      type_params:
        time_granularity: day
    - name: channel                      # categorical dimension
      type: categorical

  measures:
    - name: total_amount                 # total order amount
      agg: sum
      expr: amount
    - name: order_count                  # order count
      agg: count
      expr: order_id

Step 2: Define the time configuration

You need a project.yaml that specifies the calendar table (used for period-over-period and year-over-year time metrics):

# models/project.yaml
---
time_config:
  calendar:
    - table: dw.dim_date
      column: date_id
      grain: day

Step 3: Validate & commit

Click Validate to check that the model is correct
Once it passes, click Commit to make the model take effect
Use list_metrics to see available metrics, and query_metric to query data

Validation checks automatically

Whether the table exists, whether column names are correct, and whether entities and dimensions are complete. If something is wrong, it returns the specific error and its location.

Semantic Model Structure

A complete semantic_model contains the following fields:

Field	Type	Required	Description
`name`	string	✅	Globally unique model name. Starts with a lowercase letter; may contain digits and underscores
`db_table`	string	✅	The VeloDB physical table, in `database.table` format, e.g. `dw.orders`
`defaults`	object	✅	Default configuration. Must currently contain `agg_time_dimension`
`entities`	list	✅	The table's entity definitions. At least one `type: primary` main entity
`dimensions`	list	✅	Dimensions for grouping and filtering
`measures`	list	Recommended	Aggregation definitions. Once defined, they automatically become queryable metrics
`description`	string	Optional	Text description of the model
`label`	string	Optional	Display name
`primary_entity`	string	Conditional	Required when `entities` has no `type: primary` entity

Naming rules

Every name (model, entity, dimension, measure) must:

Start with a lowercase letter
Contain only lowercase letters, digits, and underscores
Not contain two consecutive underscores __
Be at least 2 characters

✅ order_id, total_amount, user_count

❌ OrderID, order__id, a

Entities — the Table's Identity

Entities define uniqueness and relationships between rows. Every table must have one main entity.

Field	Type	Required	Description
`name`	string	✅	Entity name, unique within this model
`type`	enum	✅	Entity type (see the table below)
`expr`	string	Recommended	The corresponding database column. Can be omitted (defaults to `name`); SQL expressions are also supported
`description`	string	Optional	Text description
`label`	string	Optional	Display name

Entity Types

Type	Meaning	When to use
`primary`	Primary key. Unique per row, covers all records	The table's ID column. Each table must have exactly one primary entity
`foreign`	Foreign key. May have duplicates and nulls	A column that joins to another table, such as `customer_id` or `product_id`
`unique`	Unique key. Unique per row, may not cover all records	E.g. an email or ID number
`natural`	Natural key. A real-world unique identifier	E.g. a product barcode or employee number

Example: entity definitions for the orders table

entities:
  - name: order_id               # primary key: the unique ID of each order
    type: primary
    expr: order_id

  - name: customer               # foreign key: joins to the users table
    type: foreign
    expr: user_id

  - name: order_ref              # foreign key + SQL expression
    type: foreign
    expr: substring(trace_id FROM 1 FOR 8)

Note

If the table has no type: primary entity, use primary_entity: entity_name at the top level of the model.

Dimensions — How Data Is Grouped

Dimensions define how data is grouped and filtered. There are two types: time dimensions and categorical dimensions.

Field	Type	Required	Description
`name`	string	✅	Dimension name
`type`	enum	✅	`time` or `categorical`
`type_params`	object	Required for time	Time-granularity configuration (see below)
`expr`	string	Recommended	The corresponding column or a SQL expression
`is_partition`	bool	Optional	Whether it is a partition column. Default `false`
`description`	string	Optional	Text description
`label`	string	Optional	Display name

Time Granularity (`time_granularity`)

Granularity	Meaning	Example
`day`	By day	2025-01-15
`week`	By week	2025-W03
`month`	By month	2025-01
`quarter`	By quarter	2025-Q1
`year`	By year	2025
`hour`	By hour	2025-01-15 14:00
`minute`	By minute	2025-01-15 14:30

Example

dimensions:
  - name: order_date               # order date (by day)
    type: time
    type_params:
      time_granularity: day
    expr: order_date

  - name: order_month              # order month (by month)
    type: time
    type_params:
      time_granularity: month
    expr: order_date               # same column, different granularity

  - name: channel                   # channel (categorical)
    type: categorical
    expr: channel

  - name: status_label             # with a SQL expression
    type: categorical
    expr: concat(status, '_', channel)

Measures — Aggregation

Measures define aggregations over data columns. Each measure automatically generates a queryable metric of the same name.

Field	Type	Required	Description
`name`	string	✅	Measure name (also becomes the metric name)
`agg`	enum	✅	Aggregation type (see the table below)
`expr`	string	Recommended	The column or SQL expression to aggregate
`description`	string	Optional	Metric description
`label`	string	Optional	Display name
`create_metric`	bool	Optional	Set to `false` to not auto-generate a metric. Default `true`
`agg_time_dimension`	string	Optional	Overrides the model's default time dimension

Aggregation Types

Type	Meaning	Typical use
`sum`	Sum	Total amount, total count
`count`	Count	Order count, user count
`count_distinct`	Distinct count	Distinct users, active devices
`average`	Average	Average amount, average duration
`min`	Minimum	Lowest price, earliest time
`max`	Maximum	Highest price, latest time
`median`	Median	Median order amount
`percentile`	Percentile	P99 latency, P95 amount (requires `agg_params.percentile`)
`sum_boolean`	Boolean sum	Conversion count, pass count

Example

measures:
  - name: total_amount            # total order amount
    description: "Sum of all order amounts"
    agg: sum
    expr: amount
    label: "Total Amount"

  - name: order_count             # order count
    agg: count
    expr: order_id

  - name: unique_customers        # distinct customers
    description: "Number of distinct customers who placed an order"
    agg: count_distinct
    expr: user_id

  - name: p99_amount             # P99 order amount
    agg: percentile
    expr: amount
    agg_params:
      percentile: 0.99

  - name: internal_counter       # not exposed as a metric
    agg: sum
    expr: raw_value
    create_metric: false

Full Example

Here is a complete semantic-model definition for an e-commerce scenario — the orders table:

# models/orders.yaml — orders fact table
---
semantic_model:
  name: orders
  description: "E-commerce order fact table; each row is one order"
  db_table: dw.orders
  defaults:
    agg_time_dimension: order_date

  # ── Entities ──
  entities:
    - name: order_id
      description: "Order primary key"
      type: primary
      expr: order_id

    - name: customer
      description: "Associated user"
      type: foreign
      expr: user_id

    - name: product
      description: "Associated product"
      type: foreign
      expr: product_id

  # ── Dimensions ──
  dimensions:
    - name: order_date
      description: "Order date (by day)"
      type: time
      type_params:
        time_granularity: day
      expr: order_date

    - name: order_month
      description: "Order month"
      type: time
      type_params:
        time_granularity: month
      expr: order_date

    - name: channel
      description: "Order channel"
      type: categorical
      expr: channel

    - name: status
      description: "Order status"
      type: categorical
      expr: status

  # ── Measures ──
  measures:
    - name: total_amount
      description: "Total order amount"
      label: "Total Amount"
      agg: sum
      expr: amount

    - name: order_count
      description: "Total order count"
      label: "Order Count"
      agg: count
      expr: order_id

    - name: unique_customers
      description: "Distinct customers who placed an order"
      label: "Distinct Customers"
      agg: count_distinct
      expr: user_id

    - name: avg_amount
      description: "Average order amount"
      label: "Average Order Value"
      agg: average
      expr: amount

Companion: the users and products tables

# models/users.yaml — users dimension table
---
semantic_model:
  name: users
  description: "User dimension table"
  db_table: dw.users
  defaults:
    agg_time_dimension: register_date

  entities:
    - name: user_id
      type: primary
      expr: user_id

  dimensions:
    - name: register_date
      type: time
      type_params:
        time_granularity: day
    - name: city
      type: categorical
    - name: level
      type: categorical

  measures:
    - name: user_count
      agg: count
      expr: user_id

# models/products.yaml — products dimension table (pure dimension table, no measures, so defaults is omitted)
---
semantic_model:
  name: products
  description: "Product dimension table"
  db_table: dw.products

  entities:
    - name: product
      type: primary
      expr: product_id

  dimensions:
    - name: product_name
      type: categorical
      expr: name
    - name: category
      type: categorical
      expr: category
    - name: brand
      type: categorical
      expr: brand

Advanced Metric Definitions

Beyond the simple metrics auto-generated from measures, you can define advanced metrics with a metric: document. Advanced metrics combine existing measures or metrics to implement more complex logic. Four types are supported:

Type	Meaning	Typical scenario
`ratio`	Ratio metric: numerator ÷ denominator	Conversion rate, profit margin, share
`derived`	Derived metric: an expression over existing metrics	Period-over-period growth, year-over-year change, weighted calculations
`cumulative`	Cumulative metric: accumulates over a time window	Sales over the last 7 days, monthly cumulative registrations
`conversion`	Conversion metric: conversion analysis between two events	Order conversion rate, registration conversion rate

Ratio Metrics

Compute the ratio of two metrics, e.g. orders per user = order count / user count.

# models/orders_per_user.yaml
---
metric:
  name: orders_per_user
  description: "Orders per user: order count / user count"
  type: ratio
  type_params:
    numerator: order_count       # numerator: references an existing metric
    denominator: user_count      # denominator: references an existing metric

Both the numerator and the denominator must be already-defined metric names (either simple metrics or other advanced metrics).

Derived Metrics

Computed from one or more existing metrics via an expression. Most often used for period-over-period and year-over-year calculations.

# models/revenue_growth.yaml — period-over-period growth
---
metric:
  name: revenue_growth
  description: "Revenue period-over-period growth rate"
  type: derived
  type_params:
    expr: (current_revenue - prev_revenue) / prev_revenue
    metrics:
      - name: total_amount         # current-period revenue
        alias: current_revenue
      - name: total_amount         # prior-period revenue (same metric, with an offset)
        alias: prev_revenue
        offset_window: 1 month     # shift back by one time window

Parameter	Description
`expr`	The calculation expression; reference each input metric by its `alias`
`metrics`	The list of input metrics
`name`	The referenced metric name
`alias`	The alias used in `expr`
`offset_window`	Time offset, e.g. `1 month`, `7 days`, `1 year`
`offset_to_grain`	The granularity to offset to, e.g. `month`, `year`

Cumulative Metrics

Accumulate a metric over a time window, e.g. "sales over the last 7 days".

# models/weekly_sales.yaml
---
metric:
  name: weekly_sales
  description: "Sales over the last 7 days"
  type: cumulative
  type_params:
    measure:
      name: total_amount
    window: 7 days          # time window: the past 7 days

Parameter	Description
`measure`	The referenced measure name (from a `semantic_model`'s measures)
`window`	Time-window format `number granularity`, e.g. `28 days`, `4 weeks`, `3 months`
`grain_to_date`	Optional. Accumulate to a given grain, e.g. `month` (month-to-date), `year` (year-to-date)

Conversion Metrics

Measure the rate at which users convert from one event (base) to another (conversion). Commonly used to analyze user funnels.

# models/order_conversion.yaml
---
metric:
  name: register_to_order_conversion
  description: "Registration-to-order conversion rate"
  type: conversion
  type_params:
    conversion_type_params:
      base_measure:                 # base event (registration)
        name: user_count
      conversion_measure:          # conversion event (order)
        name: order_count
      entity: user               # join entity: the dimension to compute conversion by
      calculation: conversion_rate  # calculation method

Parameter	Description
`base_measure`	The base event's measure name
`conversion_measure`	The conversion event's measure name
`entity`	The join entity to compute conversion by (usually user or session)
`calculation`	`conversion_rate` (the rate) or `conversions` (absolute count)
`window`	Optional. Conversion window, e.g. `7 days`

Common Scenarios

Scenario 1: The same column as dimensions at different granularities

A single date column can define day, week, and month dimensions at once:

dimensions:
  - name: order_date
    type: time
    type_params:
      time_granularity: day
    expr: order_date

  - name: order_week
    type: time
    type_params:
      time_granularity: week
    expr: order_date        # same column!

  - name: order_month
    type: time
    type_params:
      time_granularity: month
    expr: order_date        # same column!

Scenario 2: Using SQL expressions

When column names are not intuitive, or you need computed fields, use a SQL expression:

dimensions:
  - name: user_label
    type: categorical
    expr: concat(level, '_', city)  # concatenated field

entities:
  - name: user_short_id
    type: foreign
    expr: substring(trace_id FROM 1 FOR 8)  # substring

measures:
  - name: net_amount
    agg: sum
    expr: coalesce(amount, 0) - coalesce(discount, 0)  # computed field

Supported

Standard SQL functions such as substring, concat, coalesce, and cast. These expressions are recognized during physical validation and skip column-name checks.

Scenario 3: Partitioned tables

If the table has a partition column, mark it with is_partition: true:

dimensions:
  - name: ds
    type: time
    type_params:
      time_granularity: day
    is_partition: true              # mark as the partition column
    expr: ds

Scenario 4: Hiding internal measures

Some measures are only intermediate calculations and should not be exposed to end users; set create_metric: false:

measures:
  - name: total_amount            # ✅ public metric
    agg: sum
    expr: amount

  - name: _raw_count              # ❌ not exposed
    agg: count
    expr: order_id
    create_metric: false

Advanced Features

Filters

You can add SQL filter conditions in a measure definition or a metric definition. Filters are applied before aggregation.

Where filters can go

The filter: field of a measures entry — limits the data range of a single measure
The filter: field of a metric: — limits the data range of the whole metric
A measure's input_measures[].filter: — limits a measure referenced by an advanced metric

# Example 1: measure-level filter — only the total amount of "completed" orders
measures:
  - name: completed_amount
    description: "Sum of completed order amounts"
    agg: sum
    expr: amount
    filter: {{ render_dimension_template('status') }} = 'completed'

# Example 2: metric-level filter
---
metric:
  name: premium_user_orders
  description: "Order count for premium users"
  type: simple
  type_params:
    measure:
      name: order_count
      filter: {{ render_dimension_template('user_level') }} = 'premium'

Filter syntax

In YAML, reference a dimension with {{ Dimension('qualified_name') }} or {{ render_dimension_template('dimension_name') }}
Reference an entity with {{ Entity('entity_name') }} or {{ render_entity_template('entity_name') }}
Follow it with a normal SQL condition, e.g. = 'value', IN ('a', 'b')
The where parameter of query_metric accepts raw SQL directly (e.g. "channel = 'APP'"); the compiler converts it to MetricFlow template syntax automatically

Saved Queries

Save a frequently used combination of metrics + grouping + filters as a query template that users can call directly.

# models/weekly_report.yaml
---
saved_query:
  name: weekly_revenue_report
  description: "Weekly revenue report: total order amount and order count grouped by channel"
  label: "Weekly Revenue Report"
  query_params:
    metrics:
      - total_amount
      - order_count
    group_by:
      - order_id__order_week      # group by week
      - order_id__channel         # group by channel
    order_by:
      - "-order_id__order_week"  # descending by week
    limit: 52

Field	Description
`metrics`	The list of metric names to query
`group_by`	Grouping dimensions, in `entity_name__dimension_name` format (joined by a double underscore)
`order_by`	Sorting; a `-` prefix means descending
`where`	Filter condition (same syntax as filters)
`limit`	Maximum number of rows

Dimension reference format

entity_name__dimension_name (double underscore). For example, order_id__order_date is the order_date dimension of the orders table.

Non-additive Measures & Slowly Changing Dimensions (SCD Type II)

Some measures cannot simply be summed (such as inventory or account balance) and need a snapshot value along a specific dimension.

# Non-additive measure: inventory (month-end snapshot)
measures:
  - name: monthly_inventory
    description: "Month-end inventory"
    agg: sum
    expr: inventory_count
    non_additive_dimension:
      name: snapshot_date       # the non-additive dimension
      window_choice: max        # take the maximum within the time window
      window_groupings:
        - product                # snapshot grouped by product

SCD Type II (slowly changing dimensions): when a dimension table has validity time ranges, mark the start and end dimensions:

# Mark SCD Type II in the dimension table
dimensions:
  - name: valid_from
    description: "Validity start time"
    type: time
    type_params:
      time_granularity: day
      validity_params:
        is_start: true              # mark as the start time

  - name: valid_to
    description: "Validity end time"
    type: time
    type_params:
      time_granularity: day
      validity_params:
        is_end: true                # mark as the end time

Null Filling & Timeline Alignment

# Replace NULL with 0 (useful so count-type metrics show 0 on dates with no data)
metric:
  name: daily_orders
  type: simple
  type_params:
    measure:
      name: order_count
    fill_nulls_with: 0         # show 0 on dates with no data
    join_to_timespine: true   # align to the timeline (fill in missing dates)

Parameter	Description
`fill_nulls_with`	Replace NULL in the aggregated result with a given value (usually 0)
`join_to_timespine`	Join the metric result with the timeline table so every day/month/year has a row (missing dates filled with NULL or 0)

Native Table Reference Format

Besides the db_table shorthand, MetricFlow's native node_relation format is also supported, including three-part catalog references:

# Two-part: database.table
node_relation:
  schema_name: dw
  alias: orders

# Three-part: catalog.database.table
node_relation:
  database: catalog
  schema_name: dw
  alias: orders

# db_table also supports the three-part form
db_table: catalog.dw.orders

Aggregation Mode for Cumulative Metrics

Cumulative metrics support three period_agg modes that control aggregation within the time window:

# last: take the value of the last day in the window (default behavior)
metric:
  name: end_of_week_inventory
  type: cumulative
  type_params:
    cumulative_type_params:
      measure:
        name: inventory_count
      window: 7 days
      period_agg: last       # take the last day's value

# average: daily average within the window
      period_agg: average    # 7-day average

# first: the value of the first day in the window
      period_agg: first      # take the first day's value

`period_agg`	Description
`last`	The last day's value in the window (default)
`average`	The daily average within the window
`first`	The first day's value in the window

Constant Properties for Conversion Metrics

In a conversion metric, use constant_properties to specify properties that must stay constant between the two events:

# Analyze conversion by "traffic source", requiring the source to be the same across both events
metric:
  name: register_to_order_by_source
  type: conversion
  type_params:
    conversion_type_params:
      base_measure:
        name: user_count
      conversion_measure:
        name: order_count
      entity: user
      constant_properties:
        - base_property: order_id__channel    # the base event's property
          conversion_property: order_id__channel # the conversion event's property (must match)

Metric Time Granularity & Offset Grain

Metric-level time_granularity: you can set a time granularity directly in the metric definition (overriding the query-time default):

metric:
  name: monthly_revenue
  type: simple
  time_granularity: month     # this metric defaults to monthly aggregation
  type_params:
    measure:
      name: total_amount

offset_to_grain: in a derived metric, align the offset to a given grain (rather than the default day level):

metric:
  name: yoy_growth
  type: derived
  type_params:
    expr: (current - prev) / prev
    metrics:
      - name: total_amount
        alias: current
      - name: total_amount
        alias: prev
        offset_window: 1 year
        offset_to_grain: month    # offset to month grain (rather than day)

Entity Roles

The same entity (such as user_id) may play multiple roles in a table. For example, user_id in the orders table is both the "buyer" and the "referrer". Use role to distinguish them:

entities:
  - name: user
    type: foreign
    expr: buyer_id
    role: buyer              # this user entity's role is "buyer"

  - name: user
    type: foreign
    expr: referrer_id
    role: referrer           # this user entity's role is "referrer"

When role is not specified, the default role equals the entity name.

Best Practices

One file per table. Name files table_name.yaml for a clear, maintainable structure.

✅ orders.yaml, users.yaml, products.yaml
Define entities first, then dimensions, then measures. Entities are the skeleton of the semantic model, dimensions provide grouping, and measures are the query targets. Writing them in this order helps avoid omissions.
Write a description for every measure. End users see the metric name and description. A good description lets them understand a metric without reading the YAML.
The same time column can define multiple granularity dimensions. For example, the order_date column can have day, week, month, quarter, and year granularities at once.
Use meaningful business names for foreign-key entities. The entity name customer is easier to understand than user_id — it represents the concept of "customer", not just "the user_id column".
Validate before you commit. After each YAML change, click Validate. The system checks table existence, column-name correctness, and naming rules. Commit only after it passes.
Measure names must be unique within a model. Measure names can repeat across models, but that causes metric overrides — so it is best to keep them globally unique.

Common Errors & Troubleshooting

Error message	Cause	Fix
`Table xxx does not exist`	The table that `db_table` points to does not exist	Check the table-name spelling; confirm both database and table names are correct
`measure references missing column: dw.orders.xxx`	A column referenced by a measure does not exist in the table	Check that `expr` matches the actual column name (mind the case)
`entity references missing column`	A column referenced by an entity's `expr` does not exist	Same as above; if it is a SQL expression (e.g. `substring(...)`), confirm the function and column names
`Duplicate measure 'xxx' defined in 2 models`	Two models define a measure with the same name	Rename one of the measures to keep it globally unique
`Duplicate semantic_model name`	Two files use the same model name	Rename one of the models
`Did not find exactly one project configuration`	Missing `project.yaml`	Create a `project.yaml` that contains `time_config`
`'xxx' does not match '^(?!.*__)...$'`	The name violates the naming rules	Start with a lowercase letter, remove double underscores, and make it at least 2 characters
`No staging changes to validate`	There are no changes to validate	Upload or edit a YAML file first, then click Validate

Confirm the table exists in VeloDB: DESCRIBE dw.orders
Confirm column names match exactly, including case
Check that the YAML indentation is correct (use spaces, not tabs)
Confirm every name follows the lowercase-letter-first rule

Managing the Semantic Model in the VeloDB MCP Service

Workspace

A workspace is an isolated container for semantic models. Each workspace has its own:

Model files (YAML definitions)
Metric list (list_metrics only sees this workspace's metrics)
Query-engine instance (the MetricFlow compiler)
VeloDB connection pool

Workspace A's metrics are completely invisible to workspace B. This suits splitting by team, project, or environment.

Naming rule

A workspace name must start with a letter and contain only letters, digits, and underscores. For example, marketing or finance_v2.

The Three Workspace States

Call check_service_health to see each workspace's runtime state. A workspace is always in one of three states:

State	Icon	Meaning	Trigger
healthy	🟢	Running normally. The semantic model loaded successfully and metrics are queryable.	YAML files are committed, bootstrap parsing succeeded, and the MetricFlow engine is ready.
no_models	⚪	Empty workspace. No YAML files uploaded yet.	A newly created workspace, or all files were deleted.
not_ready	🔴	Load failed. Files exist but cannot compile into a semantic manifest.	YAML syntax error, missing table, missing project.yaml, MetricFlow validation failure, etc. Check the `validate` error message.

  ┌──────────────┐    upload YAML   ┌──────────────┐    commit OK      ┌──────────────┐
  │  no_models   │ ──────────────→ │  not_ready   │ ───────────────→ │   healthy    │
  │  (empty)     │                 │  (to fix)    │                   │  (running)   │
  └──────────────┘                 └──────────────┘                   └──────────────┘
                                         ↑                                  │
                                         │      upload a bad YAML            │
                                         └──────────────────────────────────┘

Version tracking

After each reload, the system records the version number, metric count, and whether loading succeeded. Use the metric_count returned by check_service_health to confirm the number of currently loaded metrics.

Two-Layer Storage: Active Store and Staging Store

Each workspace has two storage layers internally:

Storage layer	Table	Purpose
Active Store (live)	`active_store_{workspace}`	Committed and loaded models. The query engine uses this version. Files are read-only.
Staging Store (pending)	`staging_store_{workspace}`	Pending changes uploaded or edited by users. Add/modify/delete allowed. Promoted to Active after validation passes.

   Staging Store               Active Store
  ┌─────────────┐   commit    ┌─────────────┐
  │ edit / add  │ ─────────→ │  query use  │
  │ validate ✓  │            │  read-only  │
  │ discard  ✗  │            │             │
  └─────────────┘            └─────────────┘
       ↑                          ↑
  user edits                query engine

Update Flow

Updating a semantic model follows an edit → validate → commit → auto-reload flow:

Step	Action	Description
1	Upload/edit YAML	Files enter the Staging Store (without affecting running queries)
2	Validate	Required! The system checks table existence, column correctness, YAML syntax, MetricFlow semantic rules, naming rules, and cross-model duplicate names
3	Commit	Staging → Active. The system triggers an engine reload automatically (no manual restart)
4	⟳ Auto-reload	A background thread parses the new models, compiles the JSON manifest, and updates tool routing. Usually completes in 2-5 seconds

Validate before commit

A Staging that has not passed Validate is rejected on Commit, returning "Staging must be validated before commit".

Permission Model

Only admin users can manage semantic models:

Action	admin	Regular user
View semantic models	✅	✅
Query metrics (`query_metric`)	✅	✅
List metrics (`list_metrics`)	✅	✅
Upload/edit/delete YAML	✅	❌
Validate / Commit / Discard	✅	❌
Create/delete workspaces	✅	❌
Run SQL (`execute_query`)	✅	✅ (restricted)

Default Example Workspace

On first startup, the system automatically creates the example workspace with complete seed data:

Content	Description
`dw.orders`	Orders table (12 mock rows): order_id, user_id, product_id, amount, channel, status, order_date
`dw.users`	Users table (5 mock rows): user_id, name, city, level, register_date
`dw.products`	Products table (5 mock rows): product_id, name, category, brand, price
`dw.dim_date`	Date dimension (calendar) table, used for timeline alignment and cumulative calculations
Semantic-model YAML	orders.yaml, users.yaml, products.yaml + project.yaml (auto-injected into active_store_example)

Example metrics include total_amount (total order amount), order_count (order count), avg_amount (average order value), and unique_users (distinct users) from the orders model, and user_count (user count) from the users model.

Set seed_example: false in the configuration to disable automatic seeding.

WebUI Management Interface

WebUI (Browser Interface)

Visit http://<server-address>:<port>/mcp/web and sign in to use the graphical management interface:

Page	URL	Function
Sign in	`/mcp/web/login`	Sign in with a VeloDB account (admin:admin is the default administrator)
Home	`/mcp/web`	Workspace list; switch, create, or delete workspaces
Model management	`/mcp/web/models?workspace=xxx`	View the Active and Staging file lists; edit, upload, delete
File editor	`/mcp/web/{filename}?workspace=xxx`	Edit YAML files online with syntax highlighting

Key action buttons (admin only):

Button	Action
Validate	Validate all Staging changes (physical + semantic validation + duplicate detection)
Commit	Promote validated Staging to Active and trigger an engine reload
Discard	Discard all Staging changes and revert to the Active version
Reload	Manually trigger an engine reload (usually automatic after Commit)
+ New	Create a new YAML file

Next Steps

User Guide — Connect the VeloDB MCP Service to an AI agent and query data in natural language.

What Is a Semantic Model?​

Quick Start​

Step 1: Write a YAML file​

Step 2: Define the time configuration​

Step 3: Validate & commit​

Semantic Model Structure​

Entities — the Table's Identity​

Entity Types​

Example: entity definitions for the orders table​

Dimensions — How Data Is Grouped​

Time Granularity (time_granularity)​

Example​

Measures — Aggregation​

Aggregation Types​

Example​

Full Example​

Companion: the users and products tables​

Advanced Metric Definitions​

Ratio Metrics​

Derived Metrics​

Cumulative Metrics​

Conversion Metrics​

Common Scenarios​

Scenario 1: The same column as dimensions at different granularities​

Scenario 2: Using SQL expressions​

Scenario 3: Partitioned tables​

Scenario 4: Hiding internal measures​

Advanced Features​

Filters​

Saved Queries​

Non-additive Measures & Slowly Changing Dimensions (SCD Type II)​

Null Filling & Timeline Alignment​

Native Table Reference Format​

Aggregation Mode for Cumulative Metrics​

Constant Properties for Conversion Metrics​

Metric Time Granularity & Offset Grain​

Entity Roles​

Best Practices​

Common Errors & Troubleshooting​

Managing the Semantic Model in the VeloDB MCP Service​

Workspace​

The Three Workspace States​

Two-Layer Storage: Active Store and Staging Store​

Update Flow​

Permission Model​

Default Example Workspace​

WebUI Management Interface​

WebUI (Browser Interface)​

Next Steps​

What Is a Semantic Model?

Quick Start

Step 1: Write a YAML file

Step 2: Define the time configuration

Step 3: Validate & commit

Semantic Model Structure

Entities — the Table's Identity

Entity Types

Example: entity definitions for the orders table

Dimensions — How Data Is Grouped

Time Granularity (`time_granularity`)

Example

Measures — Aggregation

Aggregation Types

Example

Full Example

Companion: the users and products tables

Advanced Metric Definitions

Ratio Metrics

Derived Metrics

Cumulative Metrics

Conversion Metrics

Common Scenarios

Scenario 1: The same column as dimensions at different granularities

Scenario 2: Using SQL expressions

Scenario 3: Partitioned tables

Scenario 4: Hiding internal measures

Advanced Features

Filters

Saved Queries

Non-additive Measures & Slowly Changing Dimensions (SCD Type II)

Null Filling & Timeline Alignment

Native Table Reference Format

Aggregation Mode for Cumulative Metrics

Constant Properties for Conversion Metrics

Metric Time Granularity & Offset Grain

Entity Roles

Best Practices

Common Errors & Troubleshooting

Managing the Semantic Model in the VeloDB MCP Service

Workspace

The Three Workspace States

Two-Layer Storage: Active Store and Staging Store

Update Flow

Permission Model

Default Example Workspace

WebUI Management Interface

WebUI (Browser Interface)

Next Steps