Learn how to debug your AI agent using message audit tools and logs to identify and fix issues quickly.
## What is Message Audit?
Message Audit is your debugging console for AI agents. It shows:
- Every conversation with your agent
- Full message history for each thread
- Debug information: tokens, reasoning, timing, errors
- Trace IDs to follow requests through the system
- Model responses and tool calls
Think of it as your agent's black-box recorder: when something goes wrong, you start here.

## Accessing Message Audit

1. Log in to console.flutch.ai
2. Navigate to Agents → select your agent
3. Open the Message Audit tab

URL: `https://console.flutch.ai/agents/{agentId}/message-audit`
## Understanding the Interface

### Conversation List View

The main view shows all conversations:

```text
┌─────────────────────────────────────────────────┐
│ Thread ID        User     Time      Messages    │
├─────────────────────────────────────────────────┤
│ thread-abc123    john     2m ago    5           │
│ thread-def456    sarah    5m ago    12          │
│ thread-ghi789    mike     10m ago   3           │
└─────────────────────────────────────────────────┘
```
Columns:
- Thread ID: Unique conversation identifier
- User: User who started the conversation (if authenticated)
- Time: When the last message was sent
- Messages: Total message count in this thread
Filters:
- By date range
- By user ID
- By error status (show only failed)
- By thread ID (search)
### Conversation Details View

Click any thread to see the full conversation:

```text
┌─────────────────────────────────────────────────┐
│ Thread: thread-abc123                           │
├─────────────────────────────────────────────────┤
│ 👤 User: What are your pricing plans?           │
│ ⏱️ Sent: 2025-01-20 14:30:15                    │
├─────────────────────────────────────────────────┤
│ 🤖 Agent: We offer three pricing plans:         │
│    - Basic: $9/month                            │
│    - Pro: $49/month                             │
│    - Enterprise: Custom pricing                 │
│                                                 │
│ ⏱️ Generated: 2025-01-20 14:30:17               │
│ 🔍 Trace ID: trace-xyz789                       │
│ 💰 Tokens: 245 (prompt: 120, completion: 125)   │
│ ⚡ Duration: 1.8s                                │
└─────────────────────────────────────────────────┘
```
Debug Information Per Message:
- Trace ID: Unique identifier for this request
- Token usage: Input and output tokens
- Duration: Time to generate response
- Model used: Which LLM model processed this
- Temperature: Model temperature setting
- Error details: If the message failed, the reason why
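If you pull this data out programmatically (for example, to build your own dashboards), it can help to model the record explicitly. A minimal sketch; the field names here are illustrative, not Flutch's actual schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class MessageDebugInfo:
    """Illustrative shape of the per-message debug data (not Flutch's actual schema)."""
    trace_id: str                 # e.g. "trace-xyz789"
    prompt_tokens: int            # input tokens sent to the model
    completion_tokens: int        # output tokens generated
    duration_s: float             # time to generate the response
    model: str                    # e.g. "gpt-4"
    temperature: float            # model temperature setting
    error: Optional[str] = None   # set only when the message failed
```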
## Debugging Common Issues

### Issue 1: Agent Not Responding

Symptoms:
- User sends a message but gets no response
- The user message appears in the audit, but the agent reply is missing
- Error indicator on the conversation

How to debug:
1. Open the conversation in Message Audit
2. Check for an error message on the agent response
3. Look at the trace ID and error details
Common causes:
```text
❌ "API key invalid or expired"
   → Fix: Update API key in agent settings

❌ "Model rate limit exceeded"
   → Fix: Wait or upgrade plan with LLM provider

❌ "Timeout after 30 seconds"
   → Fix: Optimize system prompt, reduce context length

❌ "Tool execution failed: [tool_name]"
   → Fix: Check tool configuration, ensure service is accessible
```
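For the transient failures above (rate limits, timeouts), a retry with exponential backoff in your own client or tool code is often enough. A minimal sketch; `call_model` is a placeholder for whatever call is failing:

```python
import time

def call_with_backoff(call_model, max_retries=4):
    """Retry a flaky call with exponential backoff: waits 1s, 2s, 4s, ..."""
    for attempt in range(max_retries):
        try:
            return call_model()
        except Exception as exc:  # narrow this to your client's rate-limit/timeout errors
            if attempt == max_retries - 1:
                raise  # out of retries; let the error surface in Message Audit
            wait = 2 ** attempt
            print(f"Transient error ({exc}); retrying in {wait}s")
            time.sleep(wait)
```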
### Issue 2: Wrong Responses
Symptoms:
- Agent gives incorrect information
- Agent doesn't follow system prompt
- Agent ignores context or tools
How to debug:

1. Check the system prompt being used:
   - Settings → System Prompt
   - Verify it's what you expect

2. Check model settings:
   - Temperature too high? (> 0.9 = creative but unpredictable)
   - Wrong model? (gpt-3.5 vs gpt-4)

3. Check conversation context:
   - Is previous context being sent correctly?
   - Are tool results being passed to the model?

4. Look at reasoning chains (if available):
   - What did the model "think" before responding?
   - Did it consider using a tool but decide not to?
Example debug session:

```text
User asked:      "What's the status of order #12345?"
Agent responded: "I don't have access to order information."

Debug steps:
1. Check Message Audit → see trace ID: trace-abc123
2. Check tool calls → no tool was called
3. Check system prompt → missing instruction to use the order lookup tool
4. Fix: update the system prompt to mention the order tool
5. Test again → now works correctly
```
### Issue 3: Slow Responses
Symptoms:
- Users complain about wait time
- Message Audit shows high duration (> 10s)
How to debug:

1. Check the duration in Message Audit for slow messages
2. Identify the bottleneck:

If model generation is slow (5-10s+):
- Large context (too many previous messages)
- Complex system prompt
- Using a slow model (gpt-4 vs gpt-3.5-turbo)

If tool execution is slow:
- External API timeout
- Database query taking too long
- Network issues
Solutions:
- Reduce context window (limit message history; see the sketch after this list)
- Simplify system prompt
- Switch to faster model for simple queries
- Cache expensive tool calls
- Optimize external service calls
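As a concrete example of reducing the context window, here's a minimal sketch of trimming message history before invoking the model; it assumes OpenAI-style message dicts with a `role` key:

```python
MAX_MESSAGES = 10  # tune to your agent's needs

def trim_history(messages: list) -> list:
    """Keep the system prompt (if any) plus only the most recent messages."""
    system = [m for m in messages if m.get("role") == "system"]
    rest = [m for m in messages if m.get("role") != "system"]
    return system + rest[-MAX_MESSAGES:]
```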
### Issue 4: Token Usage Too High
Symptoms:
- Bills are higher than expected
- Message Audit shows high token counts per message
How to debug:

1. Open Message Audit
2. Sort by token usage (highest first)
3. Identify patterns:

```text
Message with 5000 tokens (cost: $0.05):
- Prompt tokens: 4500
- Completion tokens: 500

Why so high?
→ Full conversation history sent (100 messages)
→ Large system prompt (1000 tokens)
→ Tool descriptions (500 tokens each, 5 tools)
```
Solutions:
- Limit conversation history (e.g., last 10 messages only)
- Shorten system prompt
- Remove unused tools from configuration
- Use smaller model for simple queries
- Implement token-based conversation summarization
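To find out where prompt tokens are going before a request is even sent, you can count them locally. A sketch using the `tiktoken` library, assuming an OpenAI-style model:

```python
import tiktoken

def count_tokens(text: str, model: str = "gpt-4") -> int:
    """Count tokens the way an OpenAI-style model tokenizes text."""
    enc = tiktoken.encoding_for_model(model)
    return len(enc.encode(text))

# Measure each part of the prompt separately to spot the bloat
parts = {
    "system_prompt": "You are a helpful support agent...",          # your real prompt here
    "tool_descriptions": "search_docs: Searches documentation...",  # one entry per tool
}
for name, text in parts.items():
    print(f"{name}: {count_tokens(text)} tokens")
```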
### Issue 5: Agent Stopped Working After Update
Symptoms:
- Agent worked before, now broken
- All conversations failing
- Error messages in Message Audit
How to debug:

1. Check recent deployments:

   ```bash
   flutch info <agent-id>
   ```

2. Look at the deployment history:
   - What version is currently active?
   - When was it deployed?
   - What changed?

3. Check Message Audit for the first failing message:
   - Compare with the last successful message
   - What's different?

4. Roll back if needed:

   ```bash
   flutch rollback <agent-id> --to-version 1.0.0
   ```

5. Fix the issue in code, then redeploy:

   ```bash
   flutch deploy
   ```
## Using CLI Logs

For real-time debugging, use the CLI logs:

### Stream Live Logs

```bash
# Follow logs in real-time
flutch logs <agent-id> --follow

# Output:
# [2025-01-20 14:30:15] [INFO] New message received: thread-abc123
# [2025-01-20 14:30:16] [DEBUG] Invoking model: gpt-4
# [2025-01-20 14:30:17] [INFO] Response generated (245 tokens)
# [2025-01-20 14:30:17] [INFO] Message sent to user
```
### View Recent Logs

```bash
# Last 100 lines
flutch logs <agent-id> --lines 100

# Last 24 hours
flutch logs <agent-id> --since 24h

# Only errors
flutch logs <agent-id> --level error
```
### Search Logs

```bash
# Find a specific trace ID
flutch logs <agent-id> --grep "trace-xyz789"

# Find errors related to a tool
flutch logs <agent-id> --grep "weather_tool" --level error
```
### Save Logs for Analysis

```bash
# Save to file
flutch logs <agent-id> --lines 1000 > debug.log

# Share with team
grep ERROR debug.log
```
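Once saved, the file can also be sliced programmatically. A small sketch that counts lines per log level, assuming the `[LEVEL]` format shown in the examples above:

```python
import re
from collections import Counter

# Matches the level field in lines like:
# [2025-01-20 14:30:15] [INFO] New message received: thread-abc123
LEVEL_RE = re.compile(r"\[(DEBUG|INFO|WARN|ERROR)\]")

counts = Counter()
with open("debug.log") as f:
    for line in f:
        match = LEVEL_RE.search(line)
        if match:
            counts[match.group(1)] += 1

print(counts.most_common())  # e.g. [('INFO', 812), ('DEBUG', 150), ('ERROR', 3)]
```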
## Advanced Debugging Techniques

### Debug with Trace IDs
Every message has a trace ID. Use it to follow a request through the entire system:
1. A user reports an issue: "My message at 2:30 PM didn't work"
2. Find the message in Message Audit around that time
3. Copy the trace ID: `trace-xyz789`
4. Search the backend logs:

   ```bash
   flutch logs <agent-id> --grep "trace-xyz789"
   ```

5. See the full request lifecycle:

   ```text
   [14:30:15] Request received: trace-xyz789
   [14:30:16] Model invoked: gpt-4, temperature: 0.7
   [14:30:16] Tool called: search_docs, query: "pricing"
   [14:30:16] Tool result: 3 documents found
   [14:30:17] Response generated: 245 tokens
   [14:30:17] Response sent to user
   ```
### Compare Working vs Broken
When something breaks:

1. Find the last working conversation in Message Audit
2. Find the first broken conversation
3. Compare side-by-side:
   - System prompt (same?)
   - Model settings (same?)
   - Tools enabled (same?)
   - Input format (same?)
   - Error messages (what's new?)
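If you can export both configurations (for example as JSON), a field-by-field diff pinpoints what changed. A minimal sketch; the config keys shown are illustrative:

```python
def diff_configs(working: dict, broken: dict) -> dict:
    """Return {key: (working_value, broken_value)} for every field that differs."""
    keys = set(working) | set(broken)
    return {
        k: (working.get(k), broken.get(k))
        for k in keys
        if working.get(k) != broken.get(k)
    }

# Example with illustrative config snapshots
working = {"model": "gpt-4", "temperature": 0.7, "tools": ["search_docs"]}
broken = {"model": "gpt-4", "temperature": 1.2, "tools": []}
print(diff_configs(working, broken))
# {'temperature': (0.7, 1.2), 'tools': (['search_docs'], [])}
```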
### Test Locally with Same Input

Reproduce the issue locally by running your agent in development mode and sending the same message that failed in production.
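For example, if your agent is a LangGraph-style graph (like the node examples later in this guide), replaying the failing input can look like the sketch below; `my_agent.graph` is a hypothetical import path, so substitute your own:

```python
from langchain_core.messages import HumanMessage

from my_agent.graph import graph  # hypothetical: your own compiled graph

# Replay the exact message that failed in production, in the same thread
result = graph.invoke(
    {"messages": [HumanMessage(content="What's the status of order #12345?")]},
    config={"configurable": {"thread_id": "thread-abc123"}},
)
print(result["messages"][-1].content)
```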
### Check External Services

If you use tools that call external APIs:

- Verify API keys are valid
- Check service status pages
- Test the API directly:

  ```bash
  curl -X GET "https://api.external-service.com/status" \
    -H "Authorization: Bearer YOUR_API_KEY"
  ```

- Check rate limits
- Verify network connectivity
## Performance Monitoring

### Token Usage Dashboard
Message Audit shows token statistics:
Per conversation:
- Total tokens used
- Cost estimate
- Average tokens per message
Per time period:
- Daily token usage
- Cost trends
- Most expensive conversations
### Response Time Trends

Track response times over time:

```text
Average response time:
- Last hour: 1.8s ✅
- Last 24h:  2.1s ✅
- Last 7d:   2.5s ⚠️ (trending up)
```
If trending up:
- Check if context window is growing
- Verify external tool performance
- Consider model optimization
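If you export (timestamp, duration) pairs from Message Audit, you can reproduce these windowed averages yourself. A minimal sketch:

```python
from datetime import datetime, timedelta

def average_duration(samples, window, now):
    """Mean response time (seconds) over samples within the given time window."""
    recent = [duration for ts, duration in samples if now - ts <= window]
    return sum(recent) / len(recent) if recent else 0.0

now = datetime.now()
samples = [
    (now - timedelta(minutes=5), 1.8),  # (timestamp, duration in seconds)
    (now - timedelta(hours=3), 2.4),
]
print(average_duration(samples, timedelta(hours=1), now))   # last hour
print(average_duration(samples, timedelta(hours=24), now))  # last 24h
```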
## Debugging Checklist
When something goes wrong, follow this checklist:
- Check Message Audit for error messages
- Look at trace ID and full request details
- Verify API keys are valid
- Check model settings (correct model, temperature)
- Review system prompt for issues
- Verify tools are configured correctly
- Test external services manually
- Check token usage for context bloat
- Compare with last working version
- Search CLI logs for trace ID
- Test locally with same input
- Check recent deployments
## Common Error Codes

| Error Code | Meaning | Solution |
|---|---|---|
| `AUTH_FAILED` | Invalid API key | Update key in settings |
| `RATE_LIMIT` | Too many requests | Wait or upgrade plan |
| `TIMEOUT` | Response took > 30s | Optimize prompt or context |
| `TOOL_ERROR` | Tool execution failed | Check tool configuration |
| `INVALID_INPUT` | Malformed message | Validate input format |
| `MODEL_ERROR` | LLM service issue | Check provider status |
| `CONTEXT_TOO_LARGE` | Too many tokens | Reduce context window |
## Best Practices

### 1. Use Verbose Logging Locally
When developing locally, enable verbose logging in your agent code to see:
- Every tool call
- Model reasoning (if available)
- State changes
- Timing for each operation
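In Python, a minimal way to turn this on during local development is to set the root log level to DEBUG:

```python
import logging

# DEBUG surfaces tool calls, state changes, and timing logs that INFO hides
logging.basicConfig(
    level=logging.DEBUG,
    format="%(asctime)s [%(levelname)s] %(name)s: %(message)s",
)
```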
### 2. Add Custom Logging

In your graph nodes:

Python:

```python
import logging

logger = logging.getLogger(__name__)

async def my_node(state, config):
    logger.info(f"Processing message: {state['messages'][-1].content}")
    # ... node logic
    logger.debug(f"Generated response: {response}")
    return {"messages": [response]}
```

TypeScript:

```typescript
import { Logger } from '@nestjs/common';

export class MyNode {
  private readonly logger = new Logger(MyNode.name);

  async execute(state: State, config: Config) {
    this.logger.log(`Processing message: ${state.messages[state.messages.length - 1].content}`);
    // ... node logic
    this.logger.debug(`Generated response: ${response}`);
    return { messages: [response] };
  }
}
```
These logs appear in the CLI output when using `flutch logs`.
### 3. Use Structured Logging

Log important data in a structured format:

```python
logger.info("Tool executed", extra={
    "tool_name": "search_docs",
    "query": query,
    "results_count": len(results),
    "duration_ms": duration,
})
```

This makes it easier to search and analyze logs.
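If you want those `extra` fields emitted as machine-readable output, one option is a small stdlib-only JSON formatter; a sketch:

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Emit each record as one JSON line, including fields passed via `extra=`."""

    # Attribute names every LogRecord has by default, so we can spot the extras
    RESERVED = set(logging.LogRecord("", 0, "", 0, "", (), None).__dict__)

    def format(self, record):
        payload = {"level": record.levelname, "message": record.getMessage()}
        payload.update(
            {k: v for k, v in record.__dict__.items() if k not in self.RESERVED}
        )
        return json.dumps(payload)

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logging.getLogger(__name__).addHandler(handler)
```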
### 4. Monitor Key Metrics
Set up alerts for:
- Error rate > 5%
- Average response time > 10s
- Token usage spike (> 2x normal)
- Rate limit errors
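A hedged sketch of checking those thresholds against numbers you pull from Message Audit; the metric names here are illustrative:

```python
# Thresholds from the list above; metric names are illustrative
THRESHOLDS = {
    "error_rate": 0.05,           # > 5%
    "avg_response_time_s": 10.0,  # > 10s
}

def check_alerts(metrics: dict, baseline_tokens: float) -> list:
    """Return a human-readable alert for every threshold that was exceeded."""
    alerts = []
    for key, limit in THRESHOLDS.items():
        if metrics.get(key, 0) > limit:
            alerts.append(f"{key} = {metrics[key]} exceeds {limit}")
    if metrics.get("tokens_per_message", 0) > 2 * baseline_tokens:
        alerts.append("token usage spike (> 2x normal)")
    return alerts

print(check_alerts(
    {"error_rate": 0.08, "avg_response_time_s": 3.2, "tokens_per_message": 900},
    baseline_tokens=400,
))
```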
### 5. Document Known Issues

Keep a runbook of common issues:

```markdown
## Issue: Agent forgets context

**Symptoms:** Agent doesn't remember previous messages
**Cause:** State not being passed correctly between nodes
**Fix:** Verify the state definition includes messages as `Annotated[list, add_messages]`
**How to test:** Send 3 messages in the same thread, verify the 3rd references the 1st
```
## Troubleshooting Tips

### "I don't see my conversation in Message Audit"
- Check if you're looking at the right agent
- Verify time filter (might be filtered out)
- Try searching by thread ID
- Check whether the conversation was actually created (vs. the page just being loaded)

### "Trace ID not found in logs"

- Logs might be delayed (wait 1-2 minutes)
- Check if the log level is set high enough
- Verify you're searching the right agent
- Try `--follow` mode to see live logs

### "Can't reproduce issue locally"

- Using a different model version?
- Different API keys? (dev vs prod)
- Different system prompt? (hardcoded vs from the UI)

### "Too many logs, can't find issue"

- Use `--grep` to filter: `flutch logs --grep "ERROR"`
- Search for the trace ID: `flutch logs --grep "trace-xyz789"`
- Filter by time: `flutch logs --since 10m`
- Save to a file and use your text editor's search
## Next Steps
- Fix Issues: Configure Agent to update settings
- Prevent Regressions: Acceptance Testing to catch bugs early
- Monitor Performance: Measures & Analytics for ongoing monitoring
Remember: Every bug is an opportunity to add a test case to your acceptance test suite!
## Screenshots Needed
TODO: Add screenshots for:
- Message Audit list view
- Conversation details modal
- Debug information panel
- Error message example
- CLI logs output