Text Testing
Text testing runs RubricHQ’s AI-simulated caller against your agent over text instead of voice. It’s ~10× faster and far cheaper than voice testing — ideal for workflow validation, regression suites, and CI.
Text testing uses a single format: your agent exposes a WebSocket server speaking the X-RUBRIC protocol. Configure it under Agent → Channels → Text Channel. RubricHQ is the client — it connects out to your agent; your agent never connects to RubricHQ. When the conversation ends, RubricHQ records the transcript and evaluates your metrics automatically.
The X-RUBRIC WebSocket protocol gives you a secret handshake, scenario/run identification headers, end-of-conversation signaling, and function-call visibility in the transcript.
Custom WebSocket
What you set in RubricHQ (Agent → Channels → Text Channel)
These are values you enter once in the RubricHQ dashboard for the agent:
What RubricHQ sends to your server on connect
RubricHQ includes all of these HTTP headers on every connection (the X-… row is the only conditional one — it appears only when the scenario has matching metadata). You must validate X-RUBRIC-SECRET; the others identify the agent / scenario / run so you can route and correlate.
Messages — RubricHQ → your agent
Standard user turn:
End signal (RubricHQ decided the conversation is over):
Messages — your agent → RubricHQ
Standard agent reply:
Reply with metadata (merged into the run, last-write-wins; not a turn on its own):
Metadata-only frame (recorded, not treated as a turn):
Function call + result (recorded in the transcript as function_call / function_call_result, not treated as a turn — keep streaming until you send the actual reply):
End the conversation (either form):
…or simply close the socket — RubricHQ treats an unexpected close as an agent-initiated end.
Lifecycle & timeouts
- Either side may end via
{"type": "end_call"}, or your server can close the socket. - Per-turn idle timeout — if your agent doesn’t reply within Per-turn timeout, the run ends
failed(ended_reason: agent_timeout) with the partial transcript saved. - Max session duration — the whole conversation is capped; exceeding it ends the run
failed(max_duration). - A failed connection ends the run
connection_failed; a mid-conversation drop ends itconnection_lost.
A quick conversation (WebSocket)
A full session looks like this — each line is one frame; the label shows who sent it:
A run with a function call looks the same, but the agent streams the call/result frames before its spoken reply (they’re recorded in the transcript but don’t count as a turn):
Local development: expose your local server with a tunnel (e.g. ngrok) and use the wss:// URL in the WebSocket URL field.
Your reply comes from your agent
Your agent’s reply is produced by your real agent — an LLM, a Pipecat pipeline, a LangChain chain, your own model and tools. The WebSocket handler is only a thin bridge:
receive {"content": user_text} → your agent (LLM + your context / tools / state) → send {"content": reply}
So making an existing agent “WebSocket-ready” is just adding that bridge around your real brain. If your agent is built on Pipecat, you already have an LLM context and pipeline — feed the incoming text into your context, run the pipeline (text in → text out), and send the assistant’s text back. A full Pipecat reference ships in the voice-pipecat project, custom_ws_pipecat_agent.py: a real OpenAILLMContext + OpenAILLMService pipeline bridged to this protocol (vs. custom_ws_agent.py, the minimal hand-rolled version).
Example: a Pipecat agent
If your agent is a Pipecat pipeline, the bridge is small — push each incoming content into your LLM context, run the pipeline, and send the assistant’s text back:
That’s the heart of the full reference custom_ws_pipecat_agent.py, which additionally routes on X-RUBRIC-AGENT-ID and parses inline <dtmf/> tags. Any other framework (LangChain, your own model) follows the same shape: content in → your agent → content out.
DTMF (keypad input)
When a scenario enables the send_dtmf tool, the simulated caller can “press keypad digits” during a text conversation. Rather than a separate frame, it sends the digits as an inline tag embedded in a normal text message:
Your agent should parse <dtmf digits=\"...\"/> out of the message content, treat the matched digits as the caller’s keypad entry, and strip the tag before handling the text. Valid characters are 0-9, *, and #. A simple regex works:
The full message (tag included) is recorded in the transcript, so DTMF entries are visible in the run results.
Testing multiple agents on one endpoint
In RubricHQ you create one agent per bot you want to test. If those bots share a single endpoint, your server still needs to know which bot a given request/connection is for. RubricHQ tells you in two ways — pick whichever fits:
Option A — Read the X-RUBRIC-AGENT-ID header. Every WebSocket connection carries X-RUBRIC-AGENT-ID, the RubricHQ agent id. Map each agent id to your bot once (the id is stable per agent), then dispatch on the header:
Option B — Put the bot in the URL. Give each RubricHQ agent a distinct path or query as its WebSocket URL — works on the same host and port:
RubricHQ connects to exactly the URL configured for that agent, so your router reads the path/query and dispatches.
You can also attach your own identifier: any scenario metadata field whose name starts with X- is forwarded verbatim as a header, so e.g. a scenario field X-Bot-Id = billing arrives as the X-Bot-Id header. (Use this for per-scenario routing; use X-RUBRIC-AGENT-ID or the URL for per-agent routing.)
Validate X-RUBRIC-SECRET first, then route on X-RUBRIC-AGENT-ID (or the URL). The other headers (X-RUBRIC-SCENARIO-ID, X-RUBRIC-RUN-ID, X-RUBRIC-BATCH-ID) identify the specific scenario / run / batch within that agent.
Test your endpoint before going live
You don’t need to launch a full RubricHQ run to validate your endpoint. Three levels of dev testing:
1. Probe it yourself (no RubricHQ needed). This small script mimics exactly what RubricHQ sends — it connects with the X-RUBRIC-* headers, sends a few user turns, and prints every frame your agent returns. Point it at your wss:// URL (local or staging):
2. Dry-run one scenario in the app. Use Live Simulations to run a single scenario against your endpoint and watch the transcript and metrics live, instead of launching a full batch — ideal for the first real round-trip.
3. Local development. Expose your local server with a tunnel (e.g. ngrok http 8080) so RubricHQ — or the probe above — can reach wss://… while you iterate.
Once the probe and a single Live Simulation pass, you’re ready to run full batches in production.
Metrics on text runs
- Transcript / LLM metrics (compliance, resolution, custom LLM judges, etc.) run on text exactly as on voice.
- Audio metrics (latency, interruptions, WPM, voice clarity…) are automatically skipped for text — there’s no audio.
- When creating a custom metric, Applies to lets you choose Text and Voice conversations (default) or Text Conversation only. A text-only metric won’t run on voice; anything that runs on voice also runs on text.
Results
After a run completes you’ll see the transcript (including any function-call frames), the metric evaluations, and the end reason if the run failed (e.g. Agent did not respond). Text runs show no recording section.