AgentFluent

Local-first agent analytics with behavior-to-improvement diagnostics for Claude Code and the Agent SDK

Overview

AgentFluent is an open-source, local-first analytics tool for AI agents built on Claude Code and the Claude Agent SDK. AI agents are in production at 57% of organizations, and quality is the top barrier to deployment. When an agent misbehaves (wrong tool choice, retry loops, hallucinated outputs), developers iterate on prompts and tool definitions, but it is often hard to tell which of them actually needs to change.

Existing observability platforms show what happened: traces, latency, token counts. AgentFluent tells you why the agent misbehaved and what in its configuration to change. It reads your local session JSONL, extracts agent invocations and tool patterns, scores each agent’s configuration against a best-practice rubric, and correlates observed behavior back to a specific fix — a prompt gap, a missing tool constraint, or a stale model selection. No cloud services, no API keys, no data leaves your machine.
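As a rough illustration of the first step described above (reading session JSONL and extracting tool patterns), the sketch below tallies tool names across JSONL lines. The record shape is an assumption made for this example: each line is taken to be a JSON object with a message.content list whose tool-call blocks carry "type": "tool_use" and a "name". The function name count_tool_calls is hypothetical and is not AgentFluent's actual parser.

```python
import json
from collections import Counter
from typing import Iterable


def count_tool_calls(lines: Iterable[str]) -> Counter:
    """Tally tool names across JSONL records (assumed record shape, see lead-in)."""
    counts: Counter = Counter()
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            record = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip truncated or partially written lines
        if not isinstance(record, dict):
            continue
        message = record.get("message")
        content = message.get("content") if isinstance(message, dict) else None
        if not isinstance(content, list):
            continue  # plain-text messages carry no tool-call blocks
        for block in content:
            if isinstance(block, dict) and block.get("type") == "tool_use":
                counts[block.get("name", "unknown")] += 1
    return counts


# Hypothetical sample lines standing in for a session file.
sample = [
    '{"type": "assistant", "message": {"content": [{"type": "tool_use", "name": "Bash"}]}}',
    '{"type": "assistant", "message": {"content": [{"type": "tool_use", "name": "Bash"}, {"type": "tool_use", "name": "Read"}]}}',
    '{"type": "user", "message": {"content": "plain text"}}',
    "not json",
]
print(count_tool_calls(sample).most_common())
```

Malformed lines are skipped rather than raised on, since a session file may still be mid-write when it is read.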

AgentFluent grew out of CodeFluent research that identified the agent-quality gap. Where CodeFluent coaches the human to interact with Claude Code better, AgentFluent scores the agent’s own config, because in programmatic agents the prompt and tool setup are the agent.

The Three Axes

Every recommendation lands on one of three axes, so you can prioritize by what matters right now. The three often trade off — saving cost can hurt quality, chasing speed can hurt cost — and AgentFluent surfaces the trade-off rather than collapsing it to a single score.

Axis | What it tracks | Example finding
[cost] | tokens, cache efficiency, model fit, offload candidates | This agent uses Opus where Sonnet would do
[speed] | duration, retry density, tool-call churn, stuck patterns | This agent retries Bash 5× before giving up
[quality] | user mid-flight corrections, file rework, reviewer-caught rate | This agent ships work that gets immediately rewritten

Commands

analyze

Produces token, cost, and behavior metrics for a project — a per-model cost breakdown, an Agent Invocations table, and behavior diagnostics across metadata, trace, and aggregate layers. A Top-N priority-fixes summary ranks findings by a composite priority_score, and an Offload Candidates section proposes moving repeating tool-use clusters onto cheaper-tier models.

config-check

Walks ~/.claude/agents/*.md and ./.claude/agents/*.md, parses each agent's frontmatter and body, and scores against a 4-dimension rubric — description trigger quality, tool access, model selection, and prompt completeness — with ranked recommendations per agent.
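To make the parsing step concrete, here is a minimal, standard-library-only sketch of splitting an agent definition into frontmatter and body. It is a simplification: it handles only flat "key: value" lines, whereas the stack listed later on this page uses PyYAML's safe_load for frontmatter. The function names and the presence check are hypothetical, and the check is far cruder than the 4-dimension rubric described above.

```python
def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split a '---'-delimited frontmatter block from the Markdown body."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    for end in range(1, len(lines)):
        if lines[end].strip() == "---":
            break
    else:
        return {}, text  # no closing delimiter: treat everything as body
    meta: dict[str, str] = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep and key.strip():
            meta[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).strip()
    return meta, body


def missing_fields(meta: dict[str, str], required=("description", "model")) -> list[str]:
    """Hypothetical check: report required keys that are absent or empty."""
    return [name for name in required if not meta.get(name)]


# Hypothetical agent definition used only for illustration.
agent_md = """---
name: reviewer
description: Reviews diffs for style issues
---
You are a code reviewer.
"""
meta, body = split_frontmatter(agent_md)
print(meta["name"], missing_fields(meta))
```

A real scorer would weigh the quality of each field instead of only checking that it is present.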

diff

Compares two analyze --json snapshots and surfaces new, resolved, and persisting recommendations plus token / cost / invocation deltas. The --fail-on option makes diff exit with code 3 on new findings, so it slots into a PR check the same way a test runner does.
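The new / resolved / persisting split amounts to set arithmetic over finding identifiers. The sketch below shows that logic under stated assumptions: the identifier strings and the function names diff_findings and exit_code are hypothetical, and only the exit code value of 3 comes from the description above.

```python
def diff_findings(before: set[str], after: set[str]) -> dict[str, list[str]]:
    """Classify findings by comparing two snapshots' identifier sets."""
    return {
        "new": sorted(after - before),        # present only in the later snapshot
        "resolved": sorted(before - after),   # present only in the earlier snapshot
        "persisting": sorted(before & after), # present in both
    }


def exit_code(result: dict[str, list[str]], fail_on_new: bool) -> int:
    """Return 3 when gating is enabled and new findings exist, else 0."""
    return 3 if fail_on_new and result["new"] else 0


# Hypothetical finding identifiers for two snapshots.
before = {"retry-loop:Bash", "missing-model:reviewer"}
after = {"missing-model:reviewer", "unused-mcp:github"}

result = diff_findings(before, after)
print(result)
print("exit code:", exit_code(result, fail_on_new=True))
```

In a CI wrapper, the returned code would typically be passed to sys.exit so the job fails when new findings appear.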

report

Renders an analyze --json snapshot as a Markdown document — the same Summary / Token / Diagnostics / Offload sections — in a form you can paste into a PR comment, attach as a CI artifact, or commit alongside a prompt change as a review trail.

list

Discovers every Claude Code / Agent SDK project under ~/.claude/projects/, with session counts, total size, and last-modified timestamps. Pass --project to drill into one project's individual session files.

What Sets It Apart

The agent observability space is crowded, and several tools capture what agents do. None of them uses locally persisted session data to diagnose why an agent misbehaves or what to change.

  • The config is the agent. In interactive sessions the human course-corrects mid-flight; in programmatic agents the prompt and tool setup are the agent, and a flaw compounds at scale. AgentFluent scores description, allowed_tools / disallowedTools, model, and prompt on every agent definition, and audits MCP server configuration (configured-but-unused, observed-but-missing) against real tool usage.
  • Behavior-to-improvement, not just traces. When an agent retries Bash 40% of the time, AgentFluent tells you which prompt clause is missing, not just that the retry happened. Every diagnostic maps to a specific config surface and includes a pointer to the file to edit.
  • JSON envelope as a contract. A stable {version, command, data} schema lets you build PR gates, trend dashboards, and regression detectors on top without tracking AgentFluent’s internal refactors.
  • CLI-native and local by default. agentfluent analyze --format json | jq ... fits terminal, CI/CD, and PR-check workflows. No outbound network calls unless you explicitly opt in via --git (local git) or --github (GitHub-API quality signals).
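A downstream consumer of the {version, command, data} envelope might validate it before use. The sketch below is a minimal example of that idea. The sample payload, the version value "1", the contents of data, and the unwrap helper are hypothetical; only the three top-level keys come from the description above.

```python
import json

REQUIRED_KEYS = {"version", "command", "data"}


def unwrap(raw: str, expected_command: str) -> dict:
    """Validate the envelope's top-level keys and return its data payload."""
    envelope = json.loads(raw)
    if not isinstance(envelope, dict):
        raise ValueError("envelope must be a JSON object")
    missing = REQUIRED_KEYS - envelope.keys()
    if missing:
        raise ValueError(f"envelope is missing keys: {sorted(missing)}")
    if envelope["command"] != expected_command:
        raise ValueError(
            f"expected output of {expected_command!r}, got {envelope['command']!r}"
        )
    return envelope["data"]


# Hypothetical payload; real field values and data contents may differ.
sample_output = '{"version": "1", "command": "analyze", "data": {"findings": []}}'
data = unwrap(sample_output, expected_command="analyze")
print(data)
```

Checking the command field up front keeps a PR gate from silently consuming the output of the wrong subcommand.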

Technology Stack

  • Language: Python 3.12+
  • CLI: Typer + Rich for terminal formatting
  • Data Models: Pydantic v2 across module boundaries
  • Config Parsing: PyYAML (safe_load only) for agent frontmatter
  • Optional: scikit-learn for delegation clustering (agentfluent[clustering])
  • Testing: pytest + pytest-cov (1600+ tests), mypy strict mode
  • Tooling: ruff for linting/formatting, uv for packaging
  • CI/CD: GitHub Actions — automated testing, type checking, and PyPI publishing

Supported Platforms

Platform | CLI
Linux | Yes
macOS | Yes
Windows | Yes

Pure-Python package; path handling resolves ~/.claude/ on every platform. Requires Python 3.12 or newer.

Install

# Preferred — isolated tool install via uv
uv tool install agentfluent

# Fallback — pip into a venv of your choice
pip install agentfluent

# Zero-install one-shot
uvx agentfluent list
