Auto Scaling
Auto Scaling automatically adjusts the number of CPU cores in a compute cluster based on workload demand. It is useful for workloads with changing traffic patterns, especially when peak times or capacity needs are hard to predict.
When Auto Scaling is enabled, VeloDB Cloud monitors CPU and memory usage, calculates a recommended cluster size, and scales the cluster within the minimum and maximum CPU limits that you configure. This helps keep enough capacity during traffic spikes and reduces unused resources during quiet periods.
Key Use Cases:
- Workloads with unpredictable or fast-changing traffic.
- Systems that need automatic responses to CPU or memory pressure.
- Teams that want flexible cost limits instead of fixed scaling schedules.
Enabling Auto Scaling
You can enable Auto Scaling and configure CPU limits when you create or provision a compute cluster.
| Parameter | Definition | Best Practices |
|---|---|---|
| Min CPU Cores | The minimum number of CPU cores that the cluster can scale down to. | Set this to the capacity required for your baseline traffic, typically 4–16 cores. |
| Max CPU Cores | The maximum number of CPU cores that the cluster can scale up to. | Set this to the largest capacity and cost limit that you want to allow for peak traffic. |
If a recommendation falls outside these limits, VeloDB Cloud adjusts it to the nearest configured limit. The cluster will never scale below Min CPU Cores or above Max CPU Cores.
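The clamping rule above can be sketched in a few lines of Python. This is only an illustration, not VeloDB Cloud's implementation; the function and parameter names are hypothetical.

```python
def clamp_recommendation(recommended_cores: int, min_cores: int, max_cores: int) -> int:
    """Clamp a recommended size to the configured [min_cores, max_cores] range.

    All names here are hypothetical; they only illustrate the rule that a
    recommendation outside the limits is moved to the nearest limit.
    """
    if min_cores > max_cores:
        raise ValueError("min_cores must not exceed max_cores")
    return max(min_cores, min(recommended_cores, max_cores))


print(clamp_recommendation(4, 8, 64))    # 8: raised to Min CPU Cores
print(clamp_recommendation(128, 8, 64))  # 64: capped at Max CPU Cores
print(clamp_recommendation(32, 8, 64))   # 32: already within the limits
```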
Eligibility Criteria
Auto Scaling is applied only when all of the following conditions are met:
| Dimension | Requirement |
|---|---|
| Cluster Type | Compute Cluster |
| Billing Model | Post-paid (Pay-as-you-go) |
| Cluster Status | Running |
| Configuration | Auto Scaling enabled |
Note: Subscription (pre-paid) clusters and suspended or stopped clusters are excluded from Auto Scaling evaluations.
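As a rough sketch, the eligibility table can be read as a single boolean check in which every condition must hold. The `Cluster` class, its field names, and the string values below are assumptions made for illustration; they are not part of any VeloDB Cloud API.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    # Hypothetical fields that mirror the rows of the eligibility table.
    cluster_type: str           # e.g. "compute"
    billing_model: str          # e.g. "postpaid" or "subscription"
    status: str                 # e.g. "running", "suspended", "stopped"
    auto_scaling_enabled: bool


def is_eligible(cluster: Cluster) -> bool:
    """Return True only when all four eligibility conditions are met."""
    return (
        cluster.cluster_type == "compute"
        and cluster.billing_model == "postpaid"
        and cluster.status == "running"
        and cluster.auto_scaling_enabled
    )


print(is_eligible(Cluster("compute", "postpaid", "running", True)))      # True
print(is_eligible(Cluster("compute", "subscription", "running", True)))  # False
print(is_eligible(Cluster("compute", "postpaid", "suspended", True)))    # False
```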
How Auto Scaling Works
VeloDB Cloud runs a background recommendation engine that checks resource usage at regular intervals. The engine calculates a target cluster size, maps it to the nearest supported compute tier, and scales the cluster when usage crosses a threshold.
Dual-Window Analysis
To balance fast response with stability, VeloDB Cloud evaluates metrics across two time windows:
- Short Window (3-hour default): Captures sudden traffic changes and allows the cluster to scale down quickly after traffic drops.
- Long Window (30-hour default): Captures larger daily peaks so the cluster can scale up to a suitable tier without stepping through many smaller tiers.
Each window generates its own recommendation: scale up, scale down, or keep the current size. The engine then combines the recommendations into one scaling decision.
Target Tracking & Watermarks
The engine compares resource usage with low and high watermarks. Usage above the high watermark triggers scale-up. Usage below the low watermark triggers scale-down. Usage between the two watermarks keeps the cluster at its current size.
| Resource Dimension | Low Watermark | High Watermark | Target Utilization |
|---|---|---|---|
| CPU | 37.5% | 75% | ~53% |
| Memory | 40% | 80% | ~57% |
Target utilization is the geometric mean of the low and high watermarks. This helps map the same workload back to the same cluster tier, which prevents repeated upward drift and avoids under-provisioning.
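The watermark comparison and the geometric-mean target can be checked with a short Python sketch. The watermark values come from the table above; the function names and the returned labels are hypothetical, and the handling of usage exactly equal to a watermark (treated here as "hold") is an assumption.

```python
import math

# Low and high watermarks from the table above, as fractions.
WATERMARKS = {"cpu": (0.375, 0.75), "memory": (0.40, 0.80)}


def target_utilization(low: float, high: float) -> float:
    """Geometric mean of the low and high watermarks."""
    return math.sqrt(low * high)


def decide(resource: str, utilization: float) -> str:
    """Classify usage against the watermarks (labels are hypothetical)."""
    low, high = WATERMARKS[resource]
    if utilization > high:
        return "scale_up"
    if utilization < low:
        return "scale_down"
    return "hold"


print(f"{target_utilization(*WATERMARKS['cpu']):.1%}")     # 53.0%
print(f"{target_utilization(*WATERMARKS['memory']):.1%}")  # 56.6%
print(decide("cpu", 0.875))    # scale_up
print(decide("cpu", 0.3125))   # scale_down
print(decide("memory", 0.50))  # hold
```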
Independent Resource Evaluation
CPU and memory are evaluated independently. VeloDB Cloud selects the larger target size calculated from the two metrics. As a result, a memory-heavy workload will not scale down just because CPU usage is low, and a CPU-heavy workload can still scale up even when memory usage is stable.
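A minimal sketch of the "larger of the two targets" rule follows. It assumes that memory capacity grows in proportion to CPU cores, so that memory utilization can be converted into a core count in the same way as CPU utilization; the source does not state this, and the function names are hypothetical. The sketch only shows how the two targets are combined and leaves out the watermark check.

```python
import math

CPU_TARGET = 0.53     # target utilization from the watermark table
MEMORY_TARGET = 0.57  # target utilization from the watermark table


def target_cores(current_cores: int, utilization: float, target: float) -> int:
    """Cores needed so that the observed load sits at the target utilization."""
    return math.ceil(current_cores * utilization / target)


def combined_target(current_cores: int, cpu_util: float, mem_util: float) -> int:
    """Evaluate CPU and memory independently and keep the larger target."""
    cpu_cores = target_cores(current_cores, cpu_util, CPU_TARGET)
    mem_cores = target_cores(current_cores, mem_util, MEMORY_TARGET)
    return max(cpu_cores, mem_cores)


# Memory-heavy workload: low CPU usage does not pull the target down.
print(combined_target(16, cpu_util=0.20, mem_util=0.85))   # 24
# CPU-heavy workload: stable memory usage does not block scale-up.
print(combined_target(16, cpu_util=0.875, mem_util=0.50))  # 27
```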
Scaling Triggers: Examples
Example 1 (Scale Up): A 16-core cluster reaches a CPU load peak of 14 cores during the short window.
- Utilization = $14 / 16 = 87.5\%$ (above the 75% high watermark).
- Target Calculation (at the ~53% CPU target utilization) = $\lceil 14 / 0.53 \rceil = 27$ cores.
- The engine maps 27 cores to the closest valid infrastructure tier: 32 cores.
Example 2 (Scale Down): The same 16-core cluster has a sustained peak of only 5 cores during the short window.
- Utilization = $5 / 16 \approx 31\%$ (below the 37.5% low watermark).
- Target Calculation (at the ~53% CPU target utilization) = $\lceil 5 / 0.53 \rceil = 10$ cores.
- The engine maps 10 cores to the closest valid infrastructure tier: 8 cores.
Supported Compute Tiers
Clusters scale across predefined CPU tiers instead of arbitrary core counts. The valid tiers are:
4, 8, 16, 32, 48, 64, 80, 96, 128, 160, 192, ... (Tiers above 96 cores scale in increments of 32)
The calculated target is rounded to the closest valid tier. If a recommendation is exactly between two tiers, VeloDB Cloud chooses the larger tier. The Min CPU Cores and Max CPU Cores values that you configure are also aligned to these tiers.
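The tier list and the rounding rule can be sketched as follows. The sketch assumes that the "..." continues in steps of 32 up to the 2048-core platform limit mentioned in the FAQ; the function names are hypothetical.

```python
def valid_tiers(limit=2048):
    """Build the tier list: 4, 8, 16, then steps of 16 up to 96, then steps of 32.

    Continuing in steps of 32 up to `limit` is an assumption based on the
    listed tiers and the platform limit given in the FAQ.
    """
    tiers = [4, 8, 16]
    tiers += list(range(32, 97, 16))          # 32, 48, 64, 80, 96
    tiers += list(range(128, limit + 1, 32))  # 128, 160, 192, ...
    return tiers


def nearest_tier(target, tiers):
    """Pick the closest tier; an exact tie goes to the larger tier."""
    return min(tiers, key=lambda tier: (abs(tier - target), -tier))


tiers = valid_tiers()
print(nearest_tier(27, tiers))  # 32 (Example 1 above)
print(nearest_tier(10, tiers))  # 8  (Example 2 above)
print(nearest_tier(12, tiers))  # 16 (tie between 8 and 16 goes to the larger tier)
```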
Flapping & Jitter Protection
To prevent "flapping", which means frequent back-and-forth scaling caused by unstable traffic, the engine uses the following safety controls.
Cooldown Periods
After a successful scaling event, the engine waits for a 15-minute cooldown period before making another scaling change. Newly provisioned or restarted clusters also have a 15-minute warm-up window before they become eligible for Auto Scaling.
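The cooldown and warm-up rules amount to a simple time comparison, sketched below. The function and parameter names are hypothetical, and treating a cluster as eligible again at exactly 15 minutes is an assumption.

```python
from datetime import datetime, timedelta

COOLDOWN = timedelta(minutes=15)  # also used here as the warm-up window


def in_cooldown(now, last_scaling_event, last_start):
    """Return True if a scaling event or a (re)start happened under 15 minutes ago.

    Either timestamp may be None when no such event has been recorded.
    """
    recent_events = [t for t in (last_scaling_event, last_start) if t is not None]
    return any(now - t < COOLDOWN for t in recent_events)


now = datetime(2024, 1, 1, 12, 0)
print(in_cooldown(now, now - timedelta(minutes=10), None))                    # True
print(in_cooldown(now, now - timedelta(minutes=20), now - timedelta(hours=2)))  # False
print(in_cooldown(now, None, now - timedelta(minutes=5)))                     # True
```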
Window Conflict Resolution
If the short window and long window recommend different scaling directions, the engine checks the trend in the short window:
- If the short-window trend is going up, traffic is increasing. The engine prioritizes the long window and scales up.
- If the short-window trend is flat or going down, traffic is easing. The engine prioritizes the short window and scales down.
This helps the cluster respond to incoming traffic spikes while avoiding unnecessary scaling during small traffic changes.
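One way to read the conflict rule is as a small decision function. The sketch below takes the short-window trend as an input, because the source does not describe how the trend is computed. Applying the same priority when one window recommends holding the current size is an assumption, and all names and labels are hypothetical.

```python
def resolve_conflict(short_rec: str, long_rec: str, short_trend: str) -> str:
    """Combine the two window recommendations into one decision.

    short_rec and long_rec are "scale_up", "scale_down", or "hold";
    short_trend is "up", "flat", or "down".
    """
    if short_rec == long_rec:
        return short_rec  # no conflict to resolve
    if short_trend == "up":
        return long_rec   # traffic is increasing: prioritize the long window
    return short_rec      # traffic is flat or easing: prioritize the short window


print(resolve_conflict("scale_down", "scale_up", "up"))    # scale_up
print(resolve_conflict("scale_down", "scale_up", "down"))  # scale_down
print(resolve_conflict("scale_down", "scale_up", "flat"))  # scale_down
print(resolve_conflict("scale_up", "scale_up", "down"))    # scale_up
```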
Monitoring Scaling History
All Auto Scaling actions are recorded in Activity Logs. Each entry includes the timestamp, scaling direction, and the cluster size before and after the change.
- Path: Console Left Navigation -> Organization -> Activity Logs
After changing your Auto Scaling limits, review these logs to check whether the scaling frequency and cluster sizes match your expectations.
FAQ
Q: I enabled Auto Scaling, but my cluster size remains unchanged. Why?
This is typically caused by one of the following factors:
- Cooldown Active: The cluster was scaled, created, or restarted less than 15 minutes ago.
- Not Enough Metrics: The short window does not have enough metrics, which is common for brand-new clusters.
- Stable Utilization: Current metrics are between the low and high watermarks (CPU: 37.5%–75%, Memory: 40%–80%), so no scaling is triggered.
- Already at the Target Tier: The calculated target matches the current cluster size.
- Not Eligible: The cluster does not meet the basic requirements, for example, it is not in the Running state.
Q: Can I disable the 15-minute cooldown to accelerate scale-up times?
No. The cooldown is a system guardrail that prevents repeated scaling loops and resource instability. If you expect a large, scheduled traffic spike, such as a product launch or flash sale, increase Min CPU Cores in advance and lower it after the event.
Q: Will a scale-down event shrink my cluster to zero and terminate workloads?
No. The cluster will never scale below the configured Min CPU Cores. VeloDB Cloud also enforces a minimum of 4 cores to keep the cluster available.
Q: Is there a risk of runaway scaling causing cost overruns?
No. The engine always respects your Max CPU Cores limit. VeloDB Cloud also enforces a platform limit of 2048 cores per cluster.
Best Practices
- Align Min CPU Cores with baseline traffic: If this value is too low, sudden spikes may require several scaling events before the cluster reaches the right size. If it is too high, you may lose savings during quiet periods.
- Provide enough headroom in Max CPU Cores: Set the maximum to about 1.5–2 times the number of cores needed for your expected peak traffic so the cluster can handle sudden surges.
- Use Scheduled Scaling for predictable cycles: If your workload has clear daily peak periods, scheduled scaling can be easier to manage than dynamic Auto Scaling.
- Pre-warm the cluster before major events: Before important launches, campaigns, or migrations, increase Min CPU Cores in advance and restore it after the event.
- Review Activity Logs regularly: Use scaling history to fine-tune your minimum and maximum CPU limits.