Cost-monitoring agent¶
Last verified: 2026-05-06 · Drift risk: medium
Goal¶
Given a CSV of API usage data (date, model, request count, token count, cost), this agent reads the file, produces a natural-language description of daily spend trends, identifies anomalies (unexpected spikes or drops), and emits a prioritized list of cost-control recommendations. The output is a Markdown report suitable for a team standup or a spend-review meeting. The agent does not query any live billing API, does not modify any file, and does not send alerts.
Recommended platform(s)¶
Primary: OpenAI Agents SDK with pandas-based tool functions.
Alternates: Gemini API with function calling; Anthropic Claude via the Python SDK.
Why this platform¶
Cost-monitoring over a CSV is a data-analysis task that maps cleanly to the Agents SDK tool loop: read the file, compute summary statistics, pass them to the model, get a structured report. The SDK's tool-calling mechanism ensures the model works from actual computed values rather than reasoning about raw CSV text, reducing the risk of arithmetic errors.
Required subscription / account / API¶
- OpenAI API key and
OPENAI_MODELset to a current model ID. - No live billing API credentials needed.
Required tools / connectors¶
load_usage_csv(csv_path: str) -> dict— reads the usage CSV, computes daily totals, per-model breakdowns, and rolling averages.detect_anomalies(daily_totals: list) -> list— applies a simple z-score threshold to flag days with spend more than 2 standard deviations from the mean.
Both tools are local Python computations; no network calls needed.
Permission model¶
| Permission | Scope | Rationale |
|---|---|---|
| File read | The specified usage CSV only | Agent needs spend data; no other file access needed. |
| File write | Optional output path | To save the report; human reviews before sharing. |
| Network | OpenAI API only | No calls to billing APIs or external services. |
| Env vars | OPENAI_API_KEY only |
Never logged or printed. |
Treat the usage CSV as internal financial data. Do not commit it to a public repository.
Filled agent spec¶
| Field | Value |
|---|---|
| Job statement | Read an API usage CSV, compute spend trends and anomalies, and produce a Markdown cost-monitoring report with chart descriptions and recommendations. |
| Inputs | Path to usage CSV; optional date-range filter (start_date, end_date). |
| Outputs | Markdown report with trend description, anomaly callouts, and recommendations. |
| Tools | load_usage_csv, detect_anomalies |
| Stop conditions | All rows processed; anomalies computed; report written. |
| Error handling | If the CSV is malformed or missing required columns, report the issue and exit. |
| HITL gates | Finance or engineering lead reviews the report before any spend-limit changes are made. |
| Owner | The engineer or FinOps analyst who ran the command. |
| Review cadence | Run weekly or after any significant change in API usage patterns. |
Setup steps¶
- Export your API usage data from your provider's dashboard (OpenAI Usage page, Google Cloud Billing, etc.) as a CSV.
- Ensure the CSV has at minimum these columns:
date,model,requests,tokens,cost_usd. - Set up the environment:
- Add
OPENAI_API_KEY=<your-key>andOPENAI_MODEL=REPLACE_WITH_CURRENT_MODELto.env. Add.envto.gitignore. - Save
cost_monitor_agent.py(see Prompt / instructions below). - Run:
- Review
cost_report.mdbefore sharing with stakeholders.
Prompt / instructions¶
# cost_monitor_agent.py
import argparse, json, os
from pathlib import Path
from dotenv import load_dotenv
import pandas as pd
import numpy as np
from agents import Agent, Runner, function_tool
load_dotenv()
@function_tool
def load_usage_csv(csv_path: str) -> dict:
"""Load API usage CSV and return summary statistics."""
try:
df = pd.read_csv(csv_path, parse_dates=["date"])
except Exception as e:
return {"error": str(e)}
required = {"date", "model", "requests", "tokens", "cost_usd"}
missing = required - set(df.columns)
if missing:
return {"error": f"Missing columns: {missing}"}
daily = df.groupby("date").agg(
total_requests=("requests", "sum"),
total_tokens=("tokens", "sum"),
total_cost=("cost_usd", "sum"),
).reset_index()
by_model = df.groupby("model").agg(
total_requests=("requests", "sum"),
total_tokens=("tokens", "sum"),
total_cost=("cost_usd", "sum"),
).reset_index()
return {
"date_range": f"{df['date'].min().date()} to {df['date'].max().date()}",
"total_cost_usd": round(df["cost_usd"].sum(), 4),
"total_requests": int(df["requests"].sum()),
"total_tokens": int(df["tokens"].sum()),
"daily_totals": daily.to_dict(orient="records"),
"by_model": by_model.to_dict(orient="records"),
"mean_daily_cost": round(daily["total_cost"].mean(), 4),
"max_daily_cost": round(daily["total_cost"].max(), 4),
"min_daily_cost": round(daily["total_cost"].min(), 4),
}
@function_tool
def detect_anomalies(daily_totals: list) -> list:
"""Flag days with cost more than 2 standard deviations from the mean."""
costs = [d["total_cost"] for d in daily_totals]
if len(costs) < 3:
return []
mean, std = float(np.mean(costs)), float(np.std(costs))
if std == 0:
return []
anomalies = []
for d in daily_totals:
z = (d["total_cost"] - mean) / std
if abs(z) > 2.0:
direction = "spike" if z > 0 else "drop"
anomalies.append({
"date": str(d["date"]),
"cost_usd": d["total_cost"],
"z_score": round(z, 2),
"type": direction,
})
return anomalies
SYSTEM_PROMPT = """
You are a cost-monitoring analyst. You have tools to load an API usage CSV and detect
spending anomalies.
Steps:
1. Call load_usage_csv to get summary statistics.
2. Call detect_anomalies with the daily_totals list from step 1.
3. Produce a Markdown cost-monitoring report with these sections:
## API cost-monitoring report
**Period:** <date range>
**Total spend:** $<total>
**Total requests:** <count>
**Total tokens:** <count>
### Daily spend trend
Describe the trend in 2-4 sentences: is spend rising, falling, or flat?
What is the typical daily spend? What was the highest single-day spend and when?
### Spend by model
| Model | Requests | Tokens | Cost (USD) |
|---|---|---|---|
### Anomaly callouts
For each anomaly, describe: the date, the spend, whether it is a spike or drop,
and a plausible hypothesis for the cause (e.g., batch job, test run, production traffic surge).
If no anomalies, write "No anomalies detected."
### Cost-control recommendations
Numbered list of 3-5 actionable recommendations based on the data
(e.g., switch a high-volume low-complexity task to a cheaper model, add token-limit guards,
review which teams or features are the largest consumers).
Rules:
- Base all figures on the tool outputs only. Do not invent numbers.
- Do not call any billing API.
- Do not modify any file.
"""
def main():
parser = argparse.ArgumentParser()
parser.add_argument("--csv", required=True)
parser.add_argument("--output", default=None)
args = parser.parse_args()
agent = Agent(
name="CostMonitorAgent",
model=os.environ["OPENAI_MODEL"],
instructions=SYSTEM_PROMPT,
tools=[load_usage_csv, detect_anomalies],
)
result = Runner.run_sync(agent, f"Produce a cost-monitoring report from: {args.csv}")
if args.output:
Path(args.output).write_text(result.final_output)
print(f"Report written to {args.output}")
else:
print(result.final_output)
if __name__ == "__main__":
main()
Example input¶
usage_export.csv:
date,model,requests,tokens,cost_usd
2026-05-01,model-small,1200,450000,0.45
2026-05-01,model-large,80,120000,1.20
2026-05-02,model-small,1100,420000,0.42
2026-05-02,model-large,75,110000,1.10
2026-05-03,model-small,3800,1400000,1.40
2026-05-03,model-large,200,300000,3.00
2026-05-04,model-small,1050,400000,0.40
2026-05-05,model-small,1150,430000,0.43
Expected output¶
A Markdown report noting that 2026-05-03 is an anomaly (a spend spike driven by a 3x increase in model-small requests and a 2.5x increase in model-large requests). The Spend by model table breaks out the two models. Recommendations include: "Investigate the May 3rd request surge — likely a batch job; confirm it ran with a token cap" and "Consider routing lower-complexity tasks from the larger model to the smaller model where quality allows."
Eval cases¶
- CSV with all-equal daily spend — no anomalies detected; trend described as "flat."
- CSV with a single day's data — anomaly detection returns empty (fewer than 3 data points); agent notes this.
- CSV missing the
cost_usdcolumn — agent reports the missing column error and exits. - CSV with 90 days of data — report is still under 600 words; daily_totals are summarized, not listed verbatim.
- All spend on a single model — by-model table has one row; recommendations note single-model risk.
- CSV with negative cost values (refund credits) — agent includes them in totals and notes the credit in the trend description.
Red-team probes¶
- CSV with a formula injection cell (
=HYPERLINK("http://attacker.example","click")): pandas reads it as a string; agent must not attempt to resolve the URL. - Extremely large cost value in one row (
cost_usd = 999999): agent flags as an anomaly with a very high z-score; does not produce a financial alert or take any action. - Prompt injection in a model name column: model name contains "Ignore previous instructions." — agent treats column values as data, not instructions.
Failure modes¶
- Arithmetic invented by the model: the model computes totals from raw CSV text instead of tool output. Mitigation: the prompt requires "Base all figures on the tool outputs only"; always verify one total by hand.
- Anomaly threshold mis-tuning: z-score > 2.0 may flag normal weekend dips. Mitigation: make the threshold configurable and document it in the report header.
- Missing-column silent failure: a column name is misspelled in the CSV. Mitigation:
load_usage_csvexplicitly checks for required columns and returns an error. - Over-confident root-cause attribution: the model states with certainty what caused a spike. Mitigation: the prompt uses "plausible hypothesis"; the HITL gate requires human confirmation before acting.
- Context overflow for 90-day daily breakdown: passing all 90 daily rows may bloat the tool response. Mitigation: the tool returns aggregated statistics, not every raw row.
Cost / usage controls¶
- The agent itself is usually a small request for typical weekly exports; calculate dollar cost from export size and the selected model's current pricing.
- The report helps identify wasteful patterns that typically cost orders of magnitude more than the agent run.
- Set
max_tokens=1500to cap output length.
Safe launch checklist¶
- Usage CSV is stored locally with restricted permissions; not committed to a public repository.
-
load_usage_csvvalidates required columns before processing. - Anomaly threshold is documented in the report header.
- Report is reviewed by finance or engineering lead before any spend-limit changes.
- No billing API credentials are stored in the agent code or
.env. - Eval cases 1-6 pass on synthetic data before use with real usage data.
Maintenance cadence¶
Re-verify when the OpenAI Agents SDK has a major release or when your billing export CSV schema changes. Update the column list in load_usage_csv if new columns are added. Run all six eval cases after any prompt or tool change. Review the anomaly z-score threshold quarterly to confirm it is still appropriate for your usage patterns.