## Setup

1. **Create a Langfuse account.** Sign up at [cloud.langfuse.com](https://cloud.langfuse.com) or self-host.
2. **Create a project and get API keys.** In the Langfuse dashboard, create a project and copy the Public Key and Secret Key.
## Environment Variables

| Variable | Required | Default | Description |
|---|---|---|---|
| `LANGFUSE_ENABLED` | No | `false` | Toggle tracing on/off (`true`, `1`, or `yes`) |
| `LANGFUSE_PUBLIC_KEY` | When enabled | — | Langfuse project public key |
| `LANGFUSE_SECRET_KEY` | When enabled | — | Langfuse project secret key |
| `LANGFUSE_HOST` | No | `https://cloud.langfuse.com` | Langfuse server URL (for self-hosted) |
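Putting the variables above together, a typical `.env` might look like the following. The key values are placeholders; use the Public Key and Secret Key copied from your own Langfuse project.

```shell
# Enable Langfuse tracing (accepted truthy values: true, 1, yes)
LANGFUSE_ENABLED=true

# Keys from your Langfuse project settings (placeholders shown)
LANGFUSE_PUBLIC_KEY=pk-lf-...
LANGFUSE_SECRET_KEY=sk-lf-...

# Optional: only needed for self-hosted instances;
# defaults to https://cloud.langfuse.com when unset
# LANGFUSE_HOST=https://langfuse.internal.example.com
```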
## What Gets Traced

Every LLM call through the LangGraph supervisor and worker agents is traced, including:

- Input/output messages — full conversation context sent to the model
- Tool calls — which tools were invoked, with arguments and results
- Token usage — input/output/total tokens per call
- Latency — time per LLM call and per tool execution
- Session grouping — all calls within a conversation thread are grouped by session ID
## Session Tracking

Traces are automatically grouped by Octo's conversation thread ID. Each session in Langfuse corresponds to one Octo chat session, making it easy to analyze full conversation flows.

## Analyzing Traces

Common things to look for in the Langfuse dashboard:

### Token waste
Check the input tokens per call. If a simple question triggers 200K+ input tokens, context is bloated. Look at the input messages to identify large tool results or redundant memory.
### Excessive tool calls
Count the number of LLM calls per user message. More than 5-6 calls for a simple question suggests the agent is over-investigating. The supervisor’s efficiency instructions should prevent this.
### Cost per interaction
Use Langfuse’s cost tracking to identify expensive patterns. Agent handoffs, deep research, and large context windows are the main cost drivers.
### Latency breakdown
Identify slow tool calls or LLM responses. Bedrock cross-region inference can add latency — check if calls are timing out.
## Disabling

Set `LANGFUSE_ENABLED=false` in `.env` (or remove it entirely). No code changes needed — Octo skips the Langfuse callback when disabled.
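The accepted truthy values (`true`, `1`, `yes`) can be checked with a small helper. This is a minimal sketch of how such a toggle is commonly parsed, not Octo's actual implementation; the function name `langfuse_enabled` is illustrative.

```python
import os


def langfuse_enabled() -> bool:
    """Return True only for the documented truthy values: true, 1, or yes.

    Any other value (including an unset variable) disables tracing,
    matching the documented default of false.
    """
    value = os.getenv("LANGFUSE_ENABLED", "false")
    return value.strip().lower() in {"true", "1", "yes"}


os.environ["LANGFUSE_ENABLED"] = "yes"
print(langfuse_enabled())  # True

os.environ["LANGFUSE_ENABLED"] = "false"
print(langfuse_enabled())  # False
```

Note that unrecognized values fall through to disabled rather than raising, so a typo in `.env` fails safe (tracing off) instead of crashing startup.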
Langfuse is a core dependency (`langfuse>=2.0` in `pyproject.toml`), so no extra install step is required.

## Self-Hosting

For teams that need data privacy, Langfuse supports self-hosting. Point `LANGFUSE_HOST` to your instance:
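For example, in `.env` (the hostname below is a placeholder for your own deployment):

```shell
# Route traces to a self-hosted Langfuse instance instead of Langfuse Cloud
LANGFUSE_HOST=https://langfuse.internal.example.com
```

The public and secret keys must then come from a project created on that instance, not from Langfuse Cloud.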

