Metrics & Evaluation
RubricHQ evaluates every call using three types of metrics, each with different cost and capability trade-offs.
Evaluation Pipeline
When a call is analyzed, metrics execute in this order:

1. Audio metrics
2. LLM metrics
3. Code-as-Judge metrics

Code-as-Judge metrics execute last so they can reference results from audio and LLM metrics.
Code-as-Judge
Code-as-Judge lets you write Python code that evaluates calls programmatically. Your code runs in a secure sandbox with no imports, filesystem, or network access — just pure Python logic against the call data.
How It Works
- You write Python code in the metric editor
- Your code receives a `context` dict with all call data
- You set `metric["result"]` and `metric["explanation"]`
- Optionally, you set `structured_output` for classification
- The code runs in under 5 seconds per metric
Output Variables
Your code must set these:

- `metric["result"]`: the metric's value (a boolean, a number, or a 1-5 rating)
- `metric["explanation"]`: a short, human-readable explanation of the result

Optionally, set `structured_output` for classification:
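Taken together, a metric's outputs might look like this minimal sketch (the category value in `structured_output` is illustrative, not a fixed schema):

```python
# metric is the dict the platform reads after your code runs.
metric = {}

# Required: the metric value and a human-readable explanation.
metric["result"] = True
metric["explanation"] = "Agent greeted the caller in the first turn"

# Optional: a structured label for classification-style metrics.
structured_output = {"category": "greeting_present"}
```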
Available Context
Your code accesses call data via the context dict. Press / in the code editor to browse all available attributes.
Transcript roles vary by source. Common values: `"assistant"`, `"AI Assistant"`, `"user"`, `"User"`, `"bot"`.
Use case-insensitive matching: `t.get("role", "").lower() in ["assistant", "ai assistant", "bot", "agent"]`
Examples
Silence Detection (Boolean)
Check if any silence period exceeds 5 seconds:
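A possible sketch of this check. The `duration_ms` field (milliseconds) is documented; the assumption here is that `context["silence"]` is a list of period dicts, and the sample values are made up:

```python
# Hypothetical sample data; in production, context is supplied by the platform.
context = {
    "silence": [
        {"duration_ms": 1200},
        {"duration_ms": 6100},  # this period exceeds 5 seconds
    ],
}
metric = {}

# duration_ms is in milliseconds, so 5 seconds = 5000 ms.
periods = context.get("silence", [])
long_periods = [p for p in periods if p.get("duration_ms", 0) > 5000]

metric["result"] = len(long_periods) > 0
metric["explanation"] = (
    "Found " + str(len(long_periods)) + " silence period(s) over 5 seconds"
)
```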
Average Latency Check (Boolean)
Fail if average response latency exceeds thresholds:
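A sketch of a latency check. The `avg_ms` key is documented; the nesting under `context["latency"]` and the 2000 ms threshold are assumptions for illustration:

```python
# Hypothetical sample data; avg_ms is the documented average-latency field.
context = {"latency": {"avg_ms": 1800}}
metric = {}

threshold_ms = 2000  # illustrative threshold for acceptable average latency
avg = context.get("latency", {}).get("avg_ms", 0)

metric["result"] = avg <= threshold_ms
metric["explanation"] = (
    "Average latency " + str(avg) + " ms (threshold " + str(threshold_ms) + " ms)"
)
```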
Agent Greeting Check (Boolean)
Verify the agent greets the caller:
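One way to sketch this, using the documented case-insensitive role matching. The `"content"` key on transcript turns and the greeting phrase list are assumptions:

```python
# Hypothetical sample transcript; the "content" key name is an assumption.
context = {
    "transcript": [
        {"role": "AI Assistant", "content": "Hello! Thanks for calling Acme."},
        {"role": "User", "content": "Hi, I have a billing question."},
    ],
}
metric = {}

AGENT_ROLES = ["assistant", "ai assistant", "bot", "agent"]
GREETINGS = ["hello", "hi ", "good morning", "good afternoon", "thanks for calling"]

# Inspect only the agent's first turn.
greeted = False
for t in context.get("transcript", []):
    if t.get("role", "").lower() in AGENT_ROLES:
        text = t.get("content", "").lower()
        greeted = any(g in text for g in GREETINGS)
        break

metric["result"] = greeted
if greeted:
    metric["explanation"] = "Agent greeted the caller in their first turn"
else:
    metric["explanation"] = "No greeting found in the agent's first turn"
```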
Speech Speed Assessment (Rating 1-5)
Rate the agent’s speech speed:
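A sketch that maps words-per-minute onto a 1-5 rating. The `agent_wpm` attribute name and the band edges are assumptions; browse the editor's `/` menu for the real audio attributes:

```python
# Hypothetical sample; the agent_wpm field and wpm bands are assumptions.
context = {"audio": {"agent_wpm": 158}}
metric = {}

wpm = context.get("audio", {}).get("agent_wpm", 0)

# Map words-per-minute to a 1-5 rating; band edges are illustrative.
if 130 <= wpm <= 170:
    rating = 5  # comfortable conversational pace
elif 110 <= wpm < 130 or 170 < wpm <= 190:
    rating = 4  # slightly slow or slightly fast
elif 90 <= wpm < 110 or 190 < wpm <= 210:
    rating = 3
elif 70 <= wpm < 90 or 210 < wpm <= 230:
    rating = 2
else:
    rating = 1  # far outside a natural speaking range

metric["result"] = rating
metric["explanation"] = str(wpm) + " wpm rated " + str(rating) + "/5"
```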
Interruption Count (Numeric)
Count total interruptions and classify severity:
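A sketch of a numeric metric with a classification attached. The shape of `context["interruptions"]` and the severity bands are assumptions:

```python
# Hypothetical sample; the interruptions list shape is an assumption.
context = {"interruptions": [{"at_ms": 12000}, {"at_ms": 45000}, {"at_ms": 61000}]}
metric = {}

count = len(context.get("interruptions", []))
metric["result"] = count
metric["explanation"] = str(count) + " interruption(s) detected"

# Classify severity for downstream filtering; bands are illustrative.
if count == 0:
    severity = "none"
elif count <= 2:
    severity = "low"
elif count <= 5:
    severity = "medium"
else:
    severity = "high"
structured_output = {"severity": severity}
```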
Voice Tone Quality (Rating 1-5)
Check voice clarity and tone scores:
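A sketch assuming clarity and tone arrive as 0-1 scores; both field names (`clarity_score`, `tone_score`) and the range are assumptions to check against the editor's attribute browser:

```python
# Hypothetical sample; field names and the 0-1 score range are assumptions.
context = {"audio": {"clarity_score": 0.82, "tone_score": 0.74}}
metric = {}

clarity = context.get("audio", {}).get("clarity_score", 0)
tone = context.get("audio", {}).get("tone_score", 0)
avg = (clarity + tone) / 2

# Map the averaged 0-1 score onto a 1-5 rating.
rating = max(1, min(5, int(avg * 5) + 1))

metric["result"] = rating
metric["explanation"] = (
    "Clarity " + str(clarity) + ", tone " + str(tone) + ", rated " + str(rating) + "/5"
)
```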
Call Duration Check (Boolean)
Ensure call duration is within acceptable range:
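A sketch with illustrative bounds; the top-level `duration_ms` field name is an assumption:

```python
# Hypothetical sample; the duration_ms field name is an assumption.
context = {"duration_ms": 245000}  # 4 minutes 5 seconds
metric = {}

min_ms = 30 * 1000       # illustrative lower bound: 30 seconds
max_ms = 15 * 60 * 1000  # illustrative upper bound: 15 minutes
duration = context.get("duration_ms", 0)

metric["result"] = min_ms <= duration <= max_ms
metric["explanation"] = "Call lasted " + str(duration // 1000) + " seconds"
```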
Conversation Turn Count (Numeric)
Count conversation turns and flag abnormal lengths:
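A sketch that counts transcript entries as turns; the abnormal-length thresholds are illustrative:

```python
# Hypothetical sample transcript; thresholds for "abnormal" are illustrative.
context = {
    "transcript": [
        {"role": "assistant", "content": "Hello!"},
        {"role": "user", "content": "Hi."},
        {"role": "assistant", "content": "How can I help?"},
    ],
}
metric = {}

turns = len(context.get("transcript", []))
metric["result"] = turns

if turns < 4:
    flag = "too_short"
elif turns > 100:
    flag = "too_long"
else:
    flag = "normal"

structured_output = {"length": flag}
metric["explanation"] = str(turns) + " turn(s), classified as " + flag
```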
Call End Reason Check (Boolean)
Verify the call ended normally:
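A sketch of an allow-list check; the `ended_reason` field name and its values are assumptions (end-reason strings vary by telephony provider):

```python
# Hypothetical sample; the ended_reason field and its values are assumptions.
context = {"ended_reason": "customer-ended-call"}
metric = {}

NORMAL_REASONS = ["customer-ended-call", "assistant-ended-call", "completed"]
reason = context.get("ended_reason", "").lower()

metric["result"] = reason in NORMAL_REASONS
metric["explanation"] = "Call ended with reason: " + (reason or "unknown")
```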
Agent Identifies Themselves (Boolean)
Check if the agent introduces themselves in the first few turns:
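A sketch that scans the first few turns for self-introduction phrases; the phrase list, the six-turn window, and the `"content"` key are all illustrative assumptions:

```python
# Hypothetical sample; phrase list and turn window are illustrative.
context = {
    "transcript": [
        {"role": "assistant", "content": "Hi, this is Dana from Acme support."},
        {"role": "user", "content": "Hi Dana."},
    ],
}
metric = {}

AGENT_ROLES = ["assistant", "ai assistant", "bot", "agent"]
INTRO_PHRASES = ["my name is", "this is", "you're speaking with", "i am "]

identified = False
for t in context.get("transcript", [])[:6]:  # only the first few turns
    if t.get("role", "").lower() in AGENT_ROLES:
        text = t.get("content", "").lower()
        if any(p in text for p in INTRO_PHRASES):
            identified = True
            break

metric["result"] = identified
if identified:
    metric["explanation"] = "Agent introduced themselves early in the call"
else:
    metric["explanation"] = "No self-introduction found in the first few turns"
```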
Metadata Presence Check (Boolean)
Verify that call metadata is attached:
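A sketch that checks for required metadata keys; the key names here are illustrative, not a fixed schema:

```python
# Hypothetical sample; the required keys are illustrative.
context = {"metadata": {"campaign_id": "fall-2024", "agent_version": "v3"}}
metric = {}

required_keys = ["campaign_id", "agent_version"]
md = context.get("metadata") or {}
missing = [k for k in required_keys if k not in md]

metric["result"] = len(missing) == 0
if missing:
    metric["explanation"] = "Missing metadata keys: " + ", ".join(missing)
else:
    metric["explanation"] = "All required metadata present"
```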
Cross-Metric Check (Using Other Metric Results)
Check if multiple metrics passed together:
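A sketch combining two earlier results. `context["metrics_results"]` is documented; the per-entry shape (a dict with a `"result"` key) and the metric names are assumptions:

```python
# Hypothetical sample; the per-entry shape of metrics_results is an assumption.
context = {
    "metrics_results": {
        "Agent Greeting Check": {"result": True},
        "Call End Reason Check": {"result": True},
    },
}
metric = {}

results = context.get("metrics_results", {})
greeted = results.get("Agent Greeting Check", {}).get("result", False)
ended_ok = results.get("Call End Reason Check", {}).get("result", False)

metric["result"] = greeted and ended_ok
metric["explanation"] = (
    "Greeting passed: " + str(greeted) + ", clean ending: " + str(ended_ok)
)
```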
Testing Your Code
Use the Test with Call button in the code editor to run your code against a real call before saving. The test:
- Loads a call from your selected agent (with all computed metrics)
- Executes your code in the same sandbox used in production
- Shows the result, explanation, structured output, and execution time
- Highlights errors if your code has bugs
Tips
- Use `context.get("key", default)` for safe access when data may not be available
- Transcript roles vary by source; always use case-insensitive matching against multiple role names
- Silence uses `duration_ms` (milliseconds), not seconds
- Latency uses `avg_ms` for the average
- Avoid f-strings with quotes inside; use string concatenation instead (sandbox limitation)
- `next()`, `iter()`, and `reversed()` are available alongside standard builtins
- Keep code simple and fast: there is a 5-second timeout per metric
- Code metrics have access to all other computed metric results via `context["metrics_results"]`