Evals

Trace logging and the /evals dashboard for monitoring Discord bot interactions.

Note: The evals system is built and ready. We need community members actively interacting with Slop and the wiki-base before we'll have meaningful data to evaluate. Once we have real usage, this becomes the primary feedback loop for improving bot quality.

Trace Logging

Every Discord bot interaction logs a full trace to the chats table in Turso.

What Gets Logged

| Field | Description |
|---|---|
| Thread ID | Discord thread where the interaction happened |
| User message | The original user prompt |
| Bot response | Full text of Slop's reply |
| Tool calls | Each tool call: name, arguments, result summary, duration (ms) |
| Retrieval method | agentic, smalltalk, or latest_node_lookup |
| Model | LLM model used (e.g. claude-sonnet-4-6) |
| Latency | Total request time in milliseconds |
| Member ID | If the user has /joined, their member node ID |
| Discord context | Channel, guild, thread metadata |

Tool Call Traces

Each tool call in the agentic loop is captured individually:

{
  "tool": "slop_search_nodes",
  "args": { "query": "Terminal-Bench", "limit": 5 },
  "result_preview": "Found 3 nodes...",
  "duration_ms": 142
}

This gives full visibility into what the LLM searched for, what it found, and how long each step took.
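As a rough illustration of how a trace entry of this shape could be produced, the sketch below wraps a tool function, times it, and builds a dictionary with the same four keys. The helper `traced_call`, the 80-character preview limit, and the stand-in tool `fake_search_nodes` are assumptions made for the example, not the bot's actual implementation.

```python
import time
from typing import Any, Callable


def traced_call(
    tool: str,
    fn: Callable[..., Any],
    args: dict,
    preview_chars: int = 80,  # assumed preview limit, not from the docs
) -> tuple[Any, dict]:
    """Run fn(**args) and return (result, trace entry)."""
    start = time.perf_counter()
    result = fn(**args)
    duration_ms = round((time.perf_counter() - start) * 1000)
    trace = {
        "tool": tool,
        "args": args,
        "result_preview": str(result)[:preview_chars],
        "duration_ms": duration_ms,
    }
    return result, trace


# Hypothetical stand-in for a real tool such as slop_search_nodes.
def fake_search_nodes(query: str, limit: int) -> str:
    return f"Found 3 nodes for {query!r} (limit {limit})"


result, trace = traced_call(
    "slop_search_nodes",
    fake_search_nodes,
    {"query": "Terminal-Bench", "limit": 5},
)
```

Timing with `time.perf_counter()` rather than wall-clock time avoids skew from system clock adjustments during a call.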


Evals Dashboard

The /evals page in the web app provides a visual interface for reviewing bot interaction traces.

Features

  • Chronological feed of all bot interactions
  • Expandable traces showing the full tool-calling chain for each interaction
  • Tool call details: arguments, results, and timing for each call
  • Filter by retrieval method: see only agentic, smalltalk, or kickoff interactions
  • Response quality review: compare what the bot said against what was in the wiki-base

Access

Navigate to /evals in the web app. The app must be running; the page is not available in readonly mode.


Trace Metadata

The full metadata JSON stored per interaction:

{
  "discord_user_id": "123456789",
  "discord_username": "swyx",
  "discord_channel_id": "987654321",
  "discord_message_id": "111222333",
  "retrieval_method": "mcp:search_content",
  "context_node_ids": [42, 87, 153],
  "tool_calls": [
    {
      "tool": "slop_search_content",
      "args": { "query": "transformer architecture", "limit": 6 },
      "result": { "results_count": 5 },
      "duration_ms": 230
    }
  ],
  "member_id": 55,
  "model": "anthropic/claude-sonnet-4-6",
  "is_slash_command": false,
  "is_kickoff": false,
  "response_length": 1847,
  "latency_ms": 2300
}

Tool call results are summarized to keep row sizes reasonable: large results are reduced to counts like { "results_count": 5 }.
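One way such summarization could work is sketched below: results whose JSON encoding is small are stored as-is, and larger ones are reduced to a count. The function name `summarize_result`, the 200-character threshold, the `results` key, and the `truncated_chars` fallback are all assumptions for illustration; the actual rules are not described here.

```python
import json
from typing import Any

MAX_RESULT_CHARS = 200  # hypothetical size threshold


def summarize_result(result: Any, max_chars: int = MAX_RESULT_CHARS) -> Any:
    """Return small results unchanged; reduce large ones to counts."""
    encoded = json.dumps(result, default=str)
    if len(encoded) <= max_chars:
        return result
    # Assumed shape: a dict carrying a "results" list.
    if isinstance(result, dict) and isinstance(result.get("results"), list):
        return {"results_count": len(result["results"])}
    if isinstance(result, list):
        return {"results_count": len(result)}
    # Fallback for other large values (hypothetical key).
    return {"truncated_chars": len(encoded)}


big = {"results": [{"id": i, "text": "x" * 100} for i in range(5)]}
summary = summarize_result(big)  # {"results_count": 5}
```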

Querying Traces via SQL

-- Recent Discord traces
SELECT id, user_message, created_at,
       json_extract(metadata, '$.discord_username') as username,
       json_extract(metadata, '$.latency_ms') as latency,
       json_extract(metadata, '$.retrieval_method') as method
FROM chats
WHERE chat_type = 'discord'
ORDER BY created_at DESC
LIMIT 20;

-- Traces with tool errors
SELECT id, user_message, json_extract(metadata, '$.tool_calls') as tools
FROM chats
WHERE chat_type = 'discord'
  AND json_extract(metadata, '$.tool_calls') LIKE '%"error"%';

-- Average latency by retrieval method
SELECT json_extract(metadata, '$.retrieval_method') as method,
       COUNT(*) as count,
       AVG(json_extract(metadata, '$.latency_ms')) as avg_latency_ms
FROM chats
WHERE chat_type = 'discord'
GROUP BY method;
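Because Turso is SQLite-compatible, queries like these can be tried locally against an in-memory SQLite database before running them on real data. The sketch below assumes a minimal `chats` schema (only the columns the queries above use) and made-up sample rows; the real table likely has more columns.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed minimal schema, covering only the columns used in the queries.
conn.execute("""
    CREATE TABLE chats (
        id INTEGER PRIMARY KEY,
        user_message TEXT,
        chat_type TEXT,
        metadata TEXT,
        created_at TEXT DEFAULT CURRENT_TIMESTAMP
    )
""")

# Made-up sample traces for illustration only.
rows = [
    ("What is Terminal-Bench?", {"retrieval_method": "agentic", "latency_ms": 2000}),
    ("Explain transformers", {"retrieval_method": "agentic", "latency_ms": 3000}),
    ("hi", {"retrieval_method": "smalltalk", "latency_ms": 400}),
]
conn.executemany(
    "INSERT INTO chats (user_message, chat_type, metadata) VALUES (?, 'discord', ?)",
    [(msg, json.dumps(meta)) for msg, meta in rows],
)

# Average latency by retrieval method, as in the last query above.
avg_latency = dict(conn.execute("""
    SELECT json_extract(metadata, '$.retrieval_method') AS method,
           AVG(json_extract(metadata, '$.latency_ms')) AS avg_latency_ms
    FROM chats
    WHERE chat_type = 'discord'
    GROUP BY method
""").fetchall())
```

This relies on SQLite's JSON functions being available in the local build, which is the case for recent Python distributions.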