# Cloud Cost Anomaly Detection

Anomaly detection catches unexpected spend spikes before they compound into a surprise bill. Xplorr monitors every connected cloud account automatically — no configuration needed to get started.
## How it works

Xplorr calculates a 7-day rolling baseline of your daily spend per service, per account. Each new day’s actual cost is compared against that baseline. If the actual cost exceeds the baseline by more than 50%, it’s flagged as an anomaly.
The detection runs once per day, after the previous day’s cost data is available from your cloud provider (usually by 6–8 AM UTC).
What gets monitored:
- Per-service daily spend (e.g., EC2, S3, RDS — each tracked independently)
- Per-account total daily spend
- Cross-account total daily spend
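The baseline comparison above can be sketched in a few lines of Python. This is a minimal illustration of the documented rule (7-day rolling mean, 50% threshold), not Xplorr's actual implementation; the function name and signature are invented for this example.

```python
from statistics import mean

def detect_anomaly(daily_costs, threshold=0.50):
    """Flag the most recent day if it exceeds the 7-day rolling baseline.

    daily_costs: daily spend for one service/account, oldest first;
    the last entry is the day being evaluated.
    """
    *history, today = daily_costs
    baseline = mean(history[-7:])  # 7-day rolling baseline
    if baseline == 0:
        # Zero-spend baseline: any new cost counts as an anomaly.
        return today > 0
    return today > baseline * (1 + threshold)

# A flat $100/day week followed by a 65% jump is flagged:
week = [100, 100, 100, 100, 100, 100, 100]
print(detect_anomaly(week + [165]))  # True: 65% above baseline
print(detect_anomaly(week + [140]))  # False: only 40% above
```

The same check runs independently for each service, each account, and the cross-account total.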
## Severity levels

| Severity | Spike range | Example |
|---|---|---|
| Low | 50–100% above baseline | $100/day baseline, actual $165 |
| Medium | 100–200% above baseline | $100/day baseline, actual $250 |
| High | 200–500% above baseline | $100/day baseline, actual $420 |
| Critical | 500%+ above baseline | $100/day baseline, actual $750 |
Severity determines notification urgency. Critical anomalies trigger immediate alerts; low-severity ones are batched into the daily digest.
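The severity bands can be expressed as a simple mapping. This sketch follows the table above (plus the zero-baseline rule described in the FAQ); the function name is illustrative, not part of Xplorr's API.

```python
def severity(baseline, actual):
    """Map a spend spike to a severity band by percentage above baseline."""
    if baseline == 0:
        # Zero spend for the past 7 days, then any cost: always critical.
        return "critical" if actual > 0 else None
    pct = (actual - baseline) / baseline * 100
    if pct < 50:
        return None        # below the detection threshold, not an anomaly
    if pct < 100:
        return "low"
    if pct < 200:
        return "medium"
    if pct < 500:
        return "high"
    return "critical"

# The table's examples map as expected against a $100/day baseline:
for actual, expected in [(165, "low"), (250, "medium"), (420, "high"), (750, "critical")]:
    assert severity(100, actual) == expected
```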
## Notification channels

Anomaly alerts go to two places:
- **Email** — sent to all org members who have anomaly notifications enabled in their preferences
- **Slack** — posted to your configured Slack channel with an AI-generated explanation of the likely cause
Each alert includes:
- The service and account affected
- Yesterday’s cost vs the 7-day baseline
- Percentage increase
- AI-suggested root cause (based on recent resource changes, deployments, or usage patterns)
## Real-world example

A team running a staging environment saw this anomaly:
```
High anomaly — EC2 in us-east-1
Baseline: $42/day | Actual: $185/day | Spike: +340%
Likely cause: 14 c5.2xlarge instances launched at 2:17 AM — matches a
deployment pipeline run that did not terminate old instances.
```
The root cause was a stuck blue/green deployment. The old instance group wasn’t drained because the health check endpoint was returning 200 on both old and new task sets. Without anomaly detection, this would have run for the rest of the month — roughly $4,300 in wasted spend.
## Managing anomalies

Open the Anomalies page from the left sidebar in the console. Each anomaly has a status:
- **Open** — newly detected, unreviewed
- **Acknowledged** — someone on your team has seen it and is investigating
- **Resolved** — the underlying issue has been fixed
To update an anomaly’s status, click on it and choose Acknowledge or Resolve. Resolved anomalies move to the archive after 7 days.
## Custom alert rules vs automatic detection

Automatic detection works out of the box for all services and accounts. But you can also create custom rules for tighter control:
1. Go to **Settings → Alert Rules** and click **New Rule**.
2. Pick the scope:
   - Specific account
   - Specific service (e.g., only RDS)
   - Specific tag (e.g., `env:production`)
3. Set the threshold. Options:
   - **Percentage above baseline** — override the default 50% (e.g., set 20% for production databases)
   - **Absolute dollar amount** — trigger when daily spend exceeds a fixed number (e.g., $500/day)
4. Choose the notification channel and severity override.
5. Click **Save Rule**.
Custom rules are evaluated in addition to automatic detection. If both trigger on the same day, only the custom rule’s alert is sent (to avoid duplicate notifications).
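The deduplication rule can be sketched as follows. The `AlertRule` shape and function names here are hypothetical, invented for illustration — they are not Xplorr's actual schema — but the logic matches the behavior described above: when a custom rule and automatic detection both trigger, only the custom rule's alert is sent.

```python
from dataclasses import dataclass

@dataclass
class AlertRule:
    # Hypothetical custom-rule shape; field names are illustrative only.
    scope: str            # e.g. an account, a service, or a tag
    pct_threshold: float  # percentage above baseline, overriding the 50% default

def alerts_for(baseline, actual, custom_rules):
    """Return the alerts that fire; custom rules suppress the automatic one."""
    pct = (actual - baseline) / baseline * 100
    custom_hits = [r for r in custom_rules if pct > r.pct_threshold]
    if custom_hits:
        return custom_hits                    # custom alerts only, no duplicates
    return ["automatic"] if pct > 50 else []  # default 50% detection

rds_rule = AlertRule(scope="service:RDS", pct_threshold=20)
# A 30% spike fires the tighter custom rule, while the 50% default stays silent:
print(alerts_for(100, 130, [rds_rule]))
```

With no custom rules configured, `alerts_for(100, 160, [])` falls through to the automatic 50% check.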
## Common mistakes

- **Ignoring low-severity anomalies.** A 60% spike on a $20/day service is only $12 extra. But if it persists for 30 days, that’s $360 — and often a symptom of a misconfiguration that affects other services too.
- **Not connecting all accounts.** Anomaly detection only works on connected accounts. If your dev account isn’t connected, you won’t catch cost spikes there.
- **Treating anomalies as false positives without checking.** Month-end batch jobs or auto-scaling events can cause legitimate spikes. Acknowledge them so your team knows they’ve been reviewed, but always verify first.
## FAQ

**Can I disable anomaly detection for a specific service?** Not directly, but you can create a custom rule with a very high threshold (e.g., 10000%), which effectively silences alerts for that service.

**How quickly will I get notified?** Anomaly detection runs daily. For real-time spend monitoring, combine anomaly detection with budget alerts set to a tight threshold.

**Does it work for Azure and GCP too?** Yes. The same 7-day rolling baseline applies to all connected providers.

**What if my baseline is $0?** If a service had zero spend for the past 7 days and suddenly incurs cost, it’s flagged as a critical anomaly regardless of the dollar amount.
## Key takeaways

- Anomaly detection is automatic — it starts working as soon as you connect a cloud account.
- Four severity levels help you prioritize response: focus on critical and high first.
- Combine automatic detection with custom rules for production-critical services where even small spikes matter.
- Always acknowledge or resolve anomalies to keep the signal clean for your team.