# Cloud Cost Anomaly Detection

Anomaly detection catches unexpected spend spikes before they compound into a surprise bill. Xplorr monitors every connected cloud account automatically — no configuration needed to get started.
## How it works

Xplorr calculates a 7-day rolling baseline of your daily spend per service, per account. Each new day’s actual cost is compared against that baseline. If the actual cost exceeds the baseline by more than 50%, it’s flagged as an anomaly.
The detection runs once per day, after the previous day’s cost data is available from your cloud provider (usually by 6–8 AM UTC).
What gets monitored:
- Per-service daily spend (e.g., EC2, S3, RDS — each tracked independently)
- Per-account total daily spend
- Cross-account total daily spend
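The baseline comparison above can be sketched in a few lines of Python. This is a minimal illustration of the documented rule (7-day rolling mean, 50% threshold), not Xplorr's actual implementation; the function name and signature are invented for this example.

```python
from statistics import mean

def detect_anomaly(daily_costs, threshold=0.50):
    """Flag the most recent day if it exceeds the 7-day rolling baseline.

    daily_costs: daily spend for one service/account, oldest first;
    the last entry is the day being evaluated.
    """
    *history, today = daily_costs
    baseline = mean(history[-7:])  # 7-day rolling baseline
    if baseline == 0:
        # Zero-spend baseline: any new cost counts as an anomaly.
        return today > 0
    return today > baseline * (1 + threshold)

# A flat $100/day week followed by a 65% jump is flagged:
week = [100, 100, 100, 100, 100, 100, 100]
print(detect_anomaly(week + [165]))  # True: 65% above baseline
print(detect_anomaly(week + [140]))  # False: only 40% above
```

The same check runs independently for each service, each account, and the cross-account total.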
## Severity levels

| Severity | Spike range | Example |
|---|---|---|
| Low | 50–100% above baseline | $100/day baseline, actual $165 |
| Medium | 100–200% above baseline | $100/day baseline, actual $250 |
| High | 200–500% above baseline | $100/day baseline, actual $420 |
| Critical | 500%+ above baseline | $100/day baseline, actual $750 |
Severity determines notification urgency. Critical anomalies trigger immediate alerts; low-severity ones are batched into the daily digest.
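The severity bands can be expressed as a simple mapping. This sketch follows the table above (plus the zero-baseline rule described in the FAQ); the function name is illustrative, not part of Xplorr's API.

```python
def severity(baseline, actual):
    """Map a spend spike to a severity band by percentage above baseline."""
    if baseline == 0:
        # Zero spend for the past 7 days, then any cost: always critical.
        return "critical" if actual > 0 else None
    pct = (actual - baseline) / baseline * 100
    if pct < 50:
        return None        # below the detection threshold, not an anomaly
    if pct < 100:
        return "low"
    if pct < 200:
        return "medium"
    if pct < 500:
        return "high"
    return "critical"

# The table's examples map as expected against a $100/day baseline:
for actual, expected in [(165, "low"), (250, "medium"), (420, "high"), (750, "critical")]:
    assert severity(100, actual) == expected
```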
## Notification channels

Anomaly alerts go to two places:
- **Email** — sent to all org members who have anomaly notifications enabled in their preferences
- **Slack** — posted to your configured Slack channel with an AI-generated explanation of the likely cause
Each alert includes:
- The service and account affected
- Yesterday’s cost vs the 7-day baseline
- Percentage increase
- AI-suggested root cause (based on recent resource changes, deployments, or usage patterns)
## Real-world example

A team running a staging environment saw this anomaly:
```
High anomaly — EC2 in us-east-1
Baseline: $42/day | Actual: $185/day | Spike: +340%
Likely cause: 14 c5.2xlarge instances launched at 2:17 AM — matches a
deployment pipeline run that did not terminate old instances.
```
The root cause was a stuck blue/green deployment. The old instance group wasn’t drained because the health check endpoint was returning 200 on both old and new task sets. Without anomaly detection, this would have run for the rest of the month — roughly $4,300 in wasted spend.
## Managing anomalies

Open the Anomalies page from the left sidebar in the console. Each anomaly has a status:
- **Open** — newly detected, unreviewed
- **Acknowledged** — someone on your team has seen it and is investigating
- **Resolved** — the underlying issue has been fixed
To update an anomaly’s status, click on it and choose Acknowledge or Resolve. Resolved anomalies move to the archive after 7 days.
## Custom alert rules vs automatic detection

Automatic detection works out of the box for all services and accounts. But you can also create custom rules for tighter control:
1. Go to **Settings → Alert Rules** and click **New Rule**.
2. Pick the scope:
   - Specific account
   - Specific service (e.g., only RDS)
   - Specific tag (e.g., `env:production`)
3. Set the threshold. Options:
   - **Percentage above baseline** — override the default 50% (e.g., set 20% for production databases)
   - **Absolute dollar amount** — trigger when daily spend exceeds a fixed number (e.g., $500/day)
4. Choose the notification channel and severity override.
5. Click **Save Rule**.
Custom rules are evaluated in addition to automatic detection. If both trigger on the same day, only the custom rule’s alert is sent (to avoid duplicate notifications).
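The deduplication rule can be sketched as follows. The `AlertRule` shape and function names here are hypothetical, invented for illustration — they are not Xplorr's actual schema — but the logic matches the behavior described above: when a custom rule and automatic detection both trigger, only the custom rule's alert is sent.

```python
from dataclasses import dataclass

@dataclass
class AlertRule:
    # Hypothetical custom-rule shape; field names are illustrative only.
    scope: str            # e.g. an account, a service, or a tag
    pct_threshold: float  # percentage above baseline, overriding the 50% default

def alerts_for(baseline, actual, custom_rules):
    """Return the alerts that fire; custom rules suppress the automatic one."""
    pct = (actual - baseline) / baseline * 100
    custom_hits = [r for r in custom_rules if pct > r.pct_threshold]
    if custom_hits:
        return custom_hits                    # custom alerts only, no duplicates
    return ["automatic"] if pct > 50 else []  # default 50% detection

rds_rule = AlertRule(scope="service:RDS", pct_threshold=20)
# A 30% spike fires the tighter custom rule, while the 50% default stays silent:
print(alerts_for(100, 130, [rds_rule]))
```

With no custom rules configured, `alerts_for(100, 160, [])` falls through to the automatic 50% check.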
## Common mistakes

- **Ignoring low-severity anomalies.** A 60% spike on a $20/day service is only $12 extra. But if it persists for 30 days, that’s $360 — and often a symptom of a misconfiguration that affects other services too.
- **Not connecting all accounts.** Anomaly detection only works on connected accounts. If your dev account isn’t connected, you won’t catch cost spikes there.
- **Treating anomalies as false positives without checking.** Month-end batch jobs or auto-scaling events can cause legitimate spikes. Acknowledge them so your team knows they’ve been reviewed, but always verify first.
## FAQ

**Can I disable anomaly detection for a specific service?** Not directly, but you can create a custom rule with a very high threshold (e.g., 10000%), which effectively silences alerts for that service.

**How quickly will I get notified?** Anomaly detection runs daily. For real-time spend monitoring, combine anomaly detection with budget alerts set to a tight threshold.

**Does it work for Azure and GCP too?** Yes. The same 7-day rolling baseline applies to all connected providers.

**What if my baseline is $0?** If a service had zero spend for the past 7 days and suddenly incurs cost, it’s flagged as a critical anomaly regardless of the dollar amount.
## Key takeaways

- Anomaly detection is automatic — it starts working as soon as you connect a cloud account.
- Four severity levels help you prioritize response: focus on critical and high first.
- Combine automatic detection with custom rules for production-critical services where even small spikes matter.
- Always acknowledge or resolve anomalies to keep the signal clean for your team.