Debugging: Message Audit & Logs

Learn how to debug your AI agent using the Message Audit tool and CLI logs to identify and fix issues quickly.

What is Message Audit?

Message Audit is your debugging console for AI agents. It shows:

  • Every conversation with your agent
  • Full message history for each thread
  • Debug information: tokens, reasoning, timing, errors
  • Trace IDs to follow requests through the system
  • Model responses and tool calls

Think of it as your agent's black box recorder: when something goes wrong, this is where you start.

Accessing Message Audit

  1. Log in to console.flutch.ai
  2. Navigate to Agents → Select your agent
  3. Go to Message Audit tab

URL: https://console.flutch.ai/agents/{agentId}/message-audit

Understanding the Interface

Conversation List View

The main view shows all conversations:

bash
┌─────────────────────────────────────────────────┐
│ Thread ID              User    Time    Messages │
├─────────────────────────────────────────────────┤
│ thread-abc123         john    2m ago    5       │
│ thread-def456         sarah   5m ago    12      │
│ thread-ghi789         mike    10m ago   3       │
└─────────────────────────────────────────────────┘

Columns:

  • Thread ID: Unique conversation identifier
  • User: User who started the conversation (if authenticated)
  • Time: When the last message was sent
  • Messages: Total message count in this thread

Filters:

  • By date range
  • By user ID
  • By error status (show only failed)
  • By thread ID (search)

Conversation Details View

Click any thread to see the full conversation:

bash
┌─────────────────────────────────────────────────┐
│ Thread: thread-abc123                           │
├─────────────────────────────────────────────────┤
│ 👤 User: What are your pricing plans?          │
│    ⏱️  Sent: 2025-01-20 14:30:15               │
├─────────────────────────────────────────────────┤
│ 🤖 Agent: We offer three pricing plans:        │
│    - Basic: $9/month                            │
│    - Pro: $49/month                             │
│    - Enterprise: Custom pricing                 │
│                                                 │
│    ⏱️  Generated: 2025-01-20 14:30:17          │
│    🔍 Trace ID: trace-xyz789                   │
│    💰 Tokens: 245 (prompt: 120, completion: 125)│
│    ⚡ Duration: 1.8s                            │
└─────────────────────────────────────────────────┘

Debug Information Per Message:

  • Trace ID: Unique identifier for this request
  • Token usage: Input and output tokens
  • Duration: Time to generate response
  • Model used: Which LLM model processed this
  • Temperature: Model temperature setting
  • Error details: If message failed, why?

Debugging Common Issues

Issue 1: Agent Not Responding

Symptoms:

  • User sends message, no response
  • The user message appears in the audit, but the agent's response is missing
  • Error indicator on conversation

How to debug:

  1. Open conversation in Message Audit
  2. Check for error message on agent response
  3. Look at trace ID and error details

Common causes:

bash
"API key invalid or expired"
→ Fix: Update API key in agent settings

"Model rate limit exceeded"
→ Fix: Wait or upgrade plan with LLM provider

"Timeout after 30 seconds"
→ Fix: Optimize system prompt, reduce context length

"Tool execution failed: [tool_name]"
→ Fix: Check tool configuration, ensure service is accessible

Issue 2: Wrong Responses

Symptoms:

  • Agent gives incorrect information
  • Agent doesn't follow system prompt
  • Agent ignores context or tools

How to debug:

  1. Check the system prompt being used:

    • Settings → System Prompt
    • Verify it's what you expect
  2. Check model settings:

    • Temperature too high? (> 0.9 = creative but unpredictable)
    • Wrong model? (gpt-3.5 vs gpt-4)
  3. Check conversation context:

    • Is previous context being sent correctly?
    • Are tool results being passed to model?
  4. Look at reasoning chains (if available):

    • What did the model "think" before responding?
    • Did it consider using a tool but decided not to?

Example debug session:

bash
User asked: "What's the status of order #12345?"

Agent responded: "I don't have access to order information."

Debug steps:
1. Check Message Audit → See trace ID: trace-abc123
2. Check tool calls → No tool was called
3. Check system prompt → Missing instruction to use order lookup tool
4. Fix: Update system prompt to mention order tool
5. Test again → Now works correctly

Issue 3: Slow Responses

Symptoms:

  • Users complain about wait time
  • Message Audit shows high duration (> 10s)

How to debug:

  1. Check duration in Message Audit for slow messages
  2. Identify bottleneck:

If model generation is slow (5-10s+):

  • Large context (too many previous messages)
  • Complex system prompt
  • Using a slower model (e.g., gpt-4 instead of gpt-3.5-turbo)

If tool execution is slow:

  • External API timeout
  • Database query taking too long
  • Network issues

Solutions:

  • Reduce the context window (limit message history)
  • Simplify the system prompt
  • Switch to a faster model for simple queries
  • Cache expensive tool calls (see the sketch after this list)
  • Optimize external service calls
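
If an expensive tool call is the bottleneck and its results change rarely, caching it avoids repeating the slow work on every message. Below is a minimal sketch of an in-memory cache with a time-to-live; fetch_order_status is a hypothetical stand-in for your own slow external call, so adapt the names and TTL to your tools.

python
import time

# Minimal in-memory cache with a time-to-live (TTL), keyed by order ID.
# fetch_order_status is a hypothetical stand-in for a slow external API call.
_cache: dict[str, tuple[float, dict]] = {}
CACHE_TTL_SECONDS = 60


def fetch_order_status(order_id: str) -> dict:
    # Stand-in for the expensive call (e.g. an HTTP request to your order system).
    time.sleep(2)
    return {"order_id": order_id, "status": "shipped"}


def cached_order_status(order_id: str) -> dict:
    now = time.time()
    hit = _cache.get(order_id)
    if hit and now - hit[0] < CACHE_TTL_SECONDS:
        return hit[1]  # fresh cached result, skip the slow call
    result = fetch_order_status(order_id)
    _cache[order_id] = (now, result)
    return result

The first lookup still pays the full cost; repeated lookups within the TTL return immediately, which shows up directly as lower tool-execution time in Message Audit.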

Issue 4: Token Usage Too High

Symptoms:

  • Bills are higher than expected
  • Message Audit shows high token counts per message

How to debug:

  1. Open Message Audit
  2. Sort by token usage (highest first)
  3. Identify patterns:
bash
Message with 5000 tokens (cost: $0.05):
- Prompt tokens: 4500
- Completion tokens: 500

Why so high?
→ Full conversation history sent (100 messages)
→ Large system prompt (1000 tokens)
→ Tool descriptions (500 tokens each, 5 tools)

Solutions:

  • Limit conversation history (e.g., last 10 messages only; see the sketch after this list)
  • Shorten the system prompt
  • Remove unused tools from the configuration
  • Use a smaller model for simple queries
  • Implement token-based conversation summarization
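
One way to implement the first and last solutions is to trim (or summarize) the oldest messages before each model call once a rough token budget is exceeded. The sketch below uses a crude four-characters-per-token estimate; it is an illustration, not built-in platform behavior, and exact counts depend on your model's tokenizer.

python
# Rough trimming of conversation history to stay under a token budget.
# The 4-characters-per-token ratio is only an approximation; use your model's
# tokenizer (e.g. tiktoken for OpenAI models) when you need exact counts.
MAX_PROMPT_TOKENS = 2000


def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)


def trim_history(messages: list[dict]) -> list[dict]:
    """Drop the oldest messages until the estimated total fits the budget."""
    trimmed = list(messages)
    while len(trimmed) > 1 and sum(
        estimate_tokens(m["content"]) for m in trimmed
    ) > MAX_PROMPT_TOKENS:
        trimmed.pop(0)  # drop the oldest message first
    return trimmed

Instead of dropping the oldest messages outright, you can replace them with a short model-generated summary to keep long-range context at a fraction of the token cost.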

Issue 5: Agent Stopped Working After Update

Symptoms:

  • Agent worked before, now broken
  • All conversations failing
  • Error messages in Message Audit

How to debug:

  1. Check recent deployments:

    bash
    flutch info <agent-id>
  2. Look at deployment history:

    • What version is currently active?
    • When was it deployed?
    • What changed?
  3. Check Message Audit for first failing message:

    • Compare with last successful message
    • What's different?
  4. Rollback if needed:

    bash
    flutch rollback <agent-id> --to-version 1.0.0
  5. Fix issue in code, redeploy:

    bash
    flutch deploy

Using CLI Logs

For real-time debugging, use CLI logs:

Stream Live Logs

bash
# Follow logs in real-time
flutch logs <agent-id> --follow

# Output:
# [2025-01-20 14:30:15] [INFO] New message received: thread-abc123
# [2025-01-20 14:30:16] [DEBUG] Invoking model: gpt-4
# [2025-01-20 14:30:17] [INFO] Response generated (245 tokens)
# [2025-01-20 14:30:17] [INFO] Message sent to user

View Recent Logs

bash
# Last 100 lines
flutch logs <agent-id> --lines 100

# Last 24 hours
flutch logs <agent-id> --since 24h

# Only errors
flutch logs <agent-id> --level error

Search Logs

bash
# Find specific trace ID
flutch logs <agent-id> --grep "trace-xyz789"

# Find errors related to a tool
flutch logs <agent-id> --grep "weather_tool" --level error

Save Logs for Analysis

bash
# Save to file
flutch logs <agent-id> --lines 1000 > debug.log

# Share with team
cat debug.log | grep ERROR

Advanced Debugging Techniques

Debug with Trace IDs

Every message has a trace ID. Use it to follow a request through the entire system:

  1. User reports issue: "My message at 2:30 PM didn't work"
  2. Find message in Message Audit around that time
  3. Copy trace ID: trace-xyz789
  4. Search backend logs:
    bash
    flutch logs <agent-id> --grep "trace-xyz789"
  5. See full request lifecycle:
    [14:30:15] Request received: trace-xyz789
    [14:30:16] Model invoked: gpt-4, temperature: 0.7
    [14:30:16] Tool called: search_docs, query: "pricing"
    [14:30:16] Tool result: 3 documents found
    [14:30:17] Response generated: 245 tokens
    [14:30:17] Response sent to user
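
If you have already saved logs to a file (see "Save Logs for Analysis" above), you can pull out a single request's lifecycle offline. A minimal sketch, assuming the trace ID appears verbatim on every log line that belongs to the request, as in the excerpt above:

python
import sys


def extract_trace(path: str, trace_id: str) -> list[str]:
    """Return every line of a saved log file that mentions the given trace ID."""
    with open(path, encoding="utf-8") as fh:
        return [line.rstrip() for line in fh if trace_id in line]


if __name__ == "__main__":
    # Usage: python extract_trace.py trace-xyz789
    for line in extract_trace("debug.log", sys.argv[1]):
        print(line)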
    

Compare Working vs Broken

When something breaks:

  1. Find last working conversation in Message Audit
  2. Find first broken conversation
  3. Compare side-by-side:
    • System prompt (same?)
    • Model settings (same?)
    • Tools enabled (same?)
    • Input format (same?)
    • Error messages (what's new?)

Test Locally with Same Input

Reproduce the issue locally by running your agent in development mode and sending the exact message that failed in production, as sketched below.
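
For example, if your agent is a LangGraph-based Python agent (as in the code samples later in this guide), a minimal reproduction script can invoke your compiled graph directly with the failing message. The module path my_agent.graph and the graph object name are assumptions; substitute your own.

python
# Minimal local reproduction sketch, assuming a LangGraph-based Python agent.
# Replace the import with the module where your compiled graph actually lives.
from langchain_core.messages import HumanMessage

from my_agent.graph import graph  # hypothetical module and object name

failing_input = "What's the status of order #12345?"  # copy verbatim from Message Audit

result = graph.invoke(
    {"messages": [HumanMessage(content=failing_input)]},
    config={"configurable": {"thread_id": "local-debug-1"}},
)

print(result["messages"][-1].content)  # compare with the production response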

Check External Services

If using tools that call external APIs:

  1. Verify API keys are valid
  2. Check service status pages
  3. Test API directly:
    bash
    curl -X GET "https://api.external-service.com/status" \
      -H "Authorization: Bearer YOUR_API_KEY"
  4. Check rate limits
  5. Verify network connectivity

Performance Monitoring

Token Usage Dashboard

Message Audit shows token statistics:

Per conversation:

  • Total tokens used
  • Cost estimate
  • Average tokens per message

Per time period:

  • Daily token usage
  • Cost trends
  • Most expensive conversations

Track response times over time:

bash
Average response time:
- Last hour: 1.8s ✅
- Last 24h: 2.1s ✅
- Last 7d: 2.5s ⚠️ (trending up)

If trending up:

  • Check if context window is growing
  • Verify external tool performance
  • Consider model optimization

Debugging Checklist

When something goes wrong, follow this checklist:

  • Check Message Audit for error messages
  • Look at trace ID and full request details
  • Verify API keys are valid
  • Check model settings (correct model, temperature)
  • Review system prompt for issues
  • Verify tools are configured correctly
  • Test external services manually
  • Check token usage for context bloat
  • Compare with last working version
  • Search CLI logs for trace ID
  • Test locally with same input
  • Check recent deployments

Common Error Codes

| Error Code | Meaning | Solution |
|---|---|---|
| AUTH_FAILED | Invalid API key | Update key in settings |
| RATE_LIMIT | Too many requests | Wait or upgrade plan |
| TIMEOUT | Response took > 30s | Optimize prompt or context |
| TOOL_ERROR | Tool execution failed | Check tool configuration |
| INVALID_INPUT | Malformed message | Validate input format |
| MODEL_ERROR | LLM service issue | Check provider status |
| CONTEXT_TOO_LARGE | Too many tokens | Reduce context window |

Best Practices

1. Use Verbose Logging Locally

When developing locally, enable verbose logging in your agent code to see:

  • Every tool call
  • Model reasoning (if available)
  • State changes
  • Timing for each operation
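
How you enable this depends on your framework, but in a Python agent the simplest approach is to raise the root logger to debug level for local runs only. A minimal sketch:

python
import logging

# Turn on DEBUG-level output for local development runs only.
# Keep the level at INFO or higher in production to reduce noise.
logging.basicConfig(
    level=logging.DEBUG,
    format="%(asctime)s [%(levelname)s] %(name)s: %(message)s",
)

logging.getLogger(__name__).debug("Verbose logging enabled")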

2. Add Custom Logging

In your graph nodes:

Python:

python
import logging

logger = logging.getLogger(__name__)

async def my_node(state, config):
    logger.info(f"Processing message: {state['messages'][-1].content}")
    # ... node logic
    logger.debug(f"Generated response: {response}")
    return {"messages": [response]}

TypeScript:

typescript
import {Logger} from '@nestjs/common';

export class MyNode {
    private readonly logger = new Logger(MyNode.name);

    async execute(state: State, config: Config) {
        this.logger.log(`Processing message: ${state.messages[state.messages.length - 1].content}`);
        // ... node logic
        this.logger.debug(`Generated response: ${response}`);
        return {messages: [response]};
    }
}

These logs appear in CLI output when using flutch logs.

3. Use Structured Logging

Log important data in structured format:

python
logger.info("Tool executed", extra={
    "tool_name": "search_docs",
    "query": query,
    "results_count": len(results),
    "duration_ms": duration
})

This makes logs easier to search and analyze. Note that fields passed via `extra` are attached to the log record but only appear in the output if your formatter or handler includes them (for example, a JSON formatter).

4. Monitor Key Metrics

Set up alerts for:

  • Error rate > 5%
  • Average response time > 10s
  • Token usage spike (> 2x normal)
  • Rate limit errors
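
If your monitoring stack does not provide these alerts out of the box, you can approximate the error-rate check with a small script over exported message records. The record shape below (status and duration_s keys) is an illustrative assumption, not a documented export format.

python
# Illustrative error-rate check over a list of message records.
# The record shape (status / duration_s keys) is an assumption, not a documented format.
ERROR_RATE_THRESHOLD = 0.05  # alert above 5%


def error_rate(records: list[dict]) -> float:
    if not records:
        return 0.0
    failed = sum(1 for r in records if r.get("status") == "error")
    return failed / len(records)


records = [
    {"status": "ok", "duration_s": 1.8},
    {"status": "error", "duration_s": 0.4},
    {"status": "ok", "duration_s": 2.1},
]

rate = error_rate(records)
if rate > ERROR_RATE_THRESHOLD:
    print(f"ALERT: error rate {rate:.1%} exceeds {ERROR_RATE_THRESHOLD:.0%}")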

5. Document Known Issues

Keep a runbook of common issues:

markdown
## Issue: Agent forgets context

**Symptoms:** Agent doesn't remember previous messages

**Cause:** State not being passed correctly between nodes

**Fix:** Verify state definition includes messages as `Annotated[list, add_messages]`

**How to test:** Send 3 messages in same thread, verify 3rd references 1st
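
For reference, here is a minimal version of the state definition mentioned in the fix above, assuming a LangGraph-based Python agent:

python
from typing import Annotated

from typing_extensions import TypedDict

from langgraph.graph.message import add_messages


class State(TypedDict):
    # add_messages appends new messages to the existing list instead of
    # replacing it, so earlier turns survive between node executions.
    messages: Annotated[list, add_messages]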

Troubleshooting Tips

"I don't see my conversation in Message Audit"

  • Check if you're looking at the right agent
  • Verify time filter (might be filtered out)
  • Try searching by thread ID
  • Check if conversation was actually created (vs just page loaded)

"Trace ID not found in logs"

  • Logs might be delayed (wait 1-2 minutes)
  • Check that the log level is verbose enough (debug messages won't appear if the level is set to error)
  • Verify you're searching the right agent
  • Try --follow mode to see live logs

"Can't reproduce issue locally"

  • Using different model version?
  • Different API keys? (dev vs prod)
  • Different system prompt? (hardcoded vs from UI)

"Too many logs, can't find issue"

  • Use --grep to filter: flutch logs --grep "ERROR"
  • Search for trace ID: flutch logs --grep "trace-xyz789"
  • Filter by time: flutch logs --since 10m
  • Save to file and use text editor search

Next Steps


Remember: Every bug is an opportunity to add a test case to your acceptance test suite!

Screenshots Needed

TODO: Add screenshots for:

  • Message Audit list view
  • Conversation details modal
  • Debug information panel
  • Error message example
  • CLI logs output