Text Testing | RubricHQ Docs

Text testing runs RubricHQ’s AI-simulated caller against your agent over text instead of voice. It’s ~10× faster and far cheaper than voice testing — ideal for workflow validation, regression suites, and CI.

Text testing uses a single format: your agent exposes a WebSocket server speaking the X-RUBRIC protocol. Configure it under Agent → Channels → Text Channel. RubricHQ is the client — it connects out to your agent; your agent never connects to RubricHQ. When the conversation ends, RubricHQ records the transcript and evaluates your metrics automatically.

The X-RUBRIC WebSocket protocol gives you shared-secret authentication on connect, scenario/run identification headers, end-of-conversation signaling, and function-call visibility in the transcript.

Custom WebSocket

What you set in RubricHQ (Agent → Channels → Text Channel)

These are values you enter once in the RubricHQ dashboard for the agent:

Field	Description
WebSocket URL	`wss://your-agent.com/ws` (must start with `wss://` in production)
Secret	A shared secret you choose. Store the same value in your server. RubricHQ sends it on every connection as the `X-RUBRIC-SECRET` header so your server can verify the caller is really RubricHQ.
Per-turn timeout (s)	How long RubricHQ waits for a reply to each message before failing the run. Default `120`.
Max session duration (s)	Hard cap on the whole conversation. Default `600`.

What RubricHQ sends to your server on connect

RubricHQ sends the X-RUBRIC-* headers below on every connection. The X-… row is the only conditional one: it appears only when the scenario has matching metadata. You must validate X-RUBRIC-SECRET; the others identify the agent, scenario, and run so you can route and correlate.

Header	Always sent	What it’s for
`X-RUBRIC-SECRET`	Yes	Authenticate the connection — compare to your stored secret; reject with WS close `1008` if it doesn’t match.
`X-RUBRIC-AGENT-ID`	Yes	Which RubricHQ agent this conversation is for. Use it to route when one endpoint serves multiple bots.
`X-RUBRIC-SCENARIO-ID`	Yes	The scenario being tested.
`X-RUBRIC-RUN-ID`	Yes	This test run’s id.
`X-RUBRIC-BATCH-ID`	Yes	The batch id (groups runs).
`X-…`	When set	Any scenario metadata field whose name starts with `X-`, forwarded verbatim — your own custom routing/context.
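
The header checks in the table above can be sketched as two small helpers. This is a minimal sketch, not RubricHQ's reference code: `RubricContext`, `authenticate`, and `read_context` are hypothetical names, and the helpers assume the headers arrive as a plain string mapping (HTTP header names are case-insensitive, so keys are lower-cased first). The constant-time comparison via `hmac.compare_digest` is a general hardening choice, not something the protocol requires.

```python
import hmac
from dataclasses import dataclass, field
from typing import Dict, Mapping


@dataclass
class RubricContext:
    """Identification headers for one connection (hypothetical container)."""
    agent_id: str
    scenario_id: str
    run_id: str
    batch_id: str
    custom: Dict[str, str] = field(default_factory=dict)  # other X-... headers


def _lowered(headers: Mapping[str, str]) -> Dict[str, str]:
    # Header names are case-insensitive, so normalise before looking anything up.
    return {k.lower(): v for k, v in headers.items()}


def authenticate(headers: Mapping[str, str], expected_secret: str) -> bool:
    """Compare X-RUBRIC-SECRET to the stored secret in constant time."""
    if not expected_secret:
        return False  # refuse to run with an unset secret
    received = _lowered(headers).get("x-rubric-secret", "")
    return hmac.compare_digest(received.encode(), expected_secret.encode())


def read_context(headers: Mapping[str, str]) -> RubricContext:
    """Collect the X-RUBRIC-* ids plus any other X-... headers."""
    h = _lowered(headers)
    # Note: a proxy in front of your server may add its own X-... headers
    # (e.g. X-Forwarded-For); filter those out if that matters to you.
    custom = {k: v for k, v in h.items()
              if k.startswith("x-") and not k.startswith("x-rubric-")}
    return RubricContext(
        agent_id=h.get("x-rubric-agent-id", ""),
        scenario_id=h.get("x-rubric-scenario-id", ""),
        run_id=h.get("x-rubric-run-id", ""),
        batch_id=h.get("x-rubric-batch-id", ""),
        custom=custom,
    )
```

When `authenticate` returns `False`, the handler would close the socket with code `1008` as described in the table.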

Messages — RubricHQ → your agent

Standard user turn:

{ "content": "I'd like to set up a payment plan" }

End signal (RubricHQ decided the conversation is over):

{ "content": "Thanks, goodbye!", "type": "end_call" }
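
Both incoming frame shapes can be handled by one small parser. The sketch below assumes every frame from RubricHQ is a JSON object with an optional `content` string and an optional `type`; `UserTurn` and `parse_user_frame` are hypothetical names introduced for illustration.

```python
import json
from typing import NamedTuple


class UserTurn(NamedTuple):
    content: str   # the simulated caller's text (may be empty)
    is_end: bool   # True when the frame carries "type": "end_call"


def parse_user_frame(raw: str) -> UserTurn:
    """Decode one RubricHQ -> agent frame into its text and end flag."""
    msg = json.loads(raw)
    return UserTurn(
        content=msg.get("content", ""),
        is_end=msg.get("type") == "end_call",
    )
```

An end frame may still carry text (as in the example above), so the handler can log or record `content` before it stops.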

Messages — your agent → RubricHQ

Standard agent reply:

{ "content": "Sure — what monthly amount works for you?" }

Reply with metadata (merged into the run, last-write-wins; not a turn on its own):

{ "content": "One moment…", "metadata": { "customer_id": "cust_123" } }

Metadata-only frame (recorded, not treated as a turn):

{ "metadata": { "internal_call_id": "abc-123" } }

Function call + result (recorded in the transcript as function_call / function_call_result, not treated as a turn — keep streaming until you send the actual reply):

{ "role": "Function Call", "data": { "id": "call_1", "name": "lookup_account", "arguments": "{\"id\":\"123\"}" } }
{ "role": "Function Call Result", "data": { "id": "call_1", "result": "{\"balance\":420.00}" } }

End the conversation (either form):

{ "content": "Goodbye!", "type": "end_call" }

…or simply close the socket — RubricHQ treats an unexpected close as an agent-initiated end.
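
The agent-side frames listed above can be produced by a handful of builder functions, which keeps the JSON shapes in one place. This is a sketch under the assumption that the shapes are exactly those shown in the examples; the function names are hypothetical. Note that `arguments` and `result` are themselves JSON-encoded strings inside the frame, matching the examples.

```python
import json
from typing import Any, Optional


def reply(content: str, metadata: Optional[dict] = None) -> str:
    """Standard agent reply, optionally carrying run metadata."""
    frame = {"content": content}
    if metadata:
        frame["metadata"] = metadata
    return json.dumps(frame)


def metadata_only(metadata: dict) -> str:
    """Metadata-only frame: recorded, not treated as a turn."""
    return json.dumps({"metadata": metadata})


def function_call(call_id: str, name: str, arguments: dict) -> str:
    """Function-call frame; arguments travel as a JSON-encoded string."""
    data = {"id": call_id, "name": name, "arguments": json.dumps(arguments)}
    return json.dumps({"role": "Function Call", "data": data})


def function_call_result(call_id: str, result: Any) -> str:
    """Function-result frame; the result is JSON-encoded as a string too."""
    data = {"id": call_id, "result": json.dumps(result)}
    return json.dumps({"role": "Function Call Result", "data": data})


def end_call(content: str) -> str:
    """Final agent message that also ends the conversation."""
    return json.dumps({"content": content, "type": "end_call"})
```

A handler would then write, for example, `await ws.send(function_call("call_1", "lookup_account", {"id": "123"}))` before sending the actual `reply(...)`.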

Lifecycle & timeouts

Either side may end via {"type": "end_call"}, or your server can close the socket.
Per-turn idle timeout — if your agent doesn’t reply within Per-turn timeout, the run ends failed (ended_reason: agent_timeout) with the partial transcript saved.
Max session duration — the whole conversation is capped; exceeding it ends the run failed (max_duration).
A failed connection ends the run connection_failed; a mid-conversation drop ends it connection_lost.
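
Because a missed per-turn timeout fails the whole run, one option is to cap reply generation on the agent side and answer with a holding message when the cap is hit. The sketch below assumes that approach; `reply_within`, the budget value, and the fallback text are all hypothetical, and the budget should sit comfortably below the Per-turn timeout configured in RubricHQ.

```python
import asyncio
from typing import Awaitable


async def reply_within(work: Awaitable[str], budget_s: float, fallback: str) -> str:
    """Return the agent's reply, or `fallback` if it takes longer than budget_s."""
    try:
        return await asyncio.wait_for(work, timeout=budget_s)
    except asyncio.TimeoutError:
        # The slow task is cancelled by wait_for; answer with the fallback instead.
        return fallback
```

Whether to use a fallback at all is a judgment call: in a test run you may prefer that a slow backend surfaces as agent_timeout rather than being masked by a canned reply.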

A quick conversation (WebSocket)

A full session looks like this — each line is one frame; the label shows who sent it:

   RubricHQ connects  (header  X-RUBRIC-SECRET: shh)
your agent →  {"content": "Hi, thanks for contacting Acme. How can I help?"}
RubricHQ   →  {"content": "I'd like to set up a payment plan"}
your agent →  {"content": "Sure — what monthly amount works for you?"}
RubricHQ   →  {"content": "About $150 a month"}
your agent →  {"content": "Got it. Can you tell me a little more?"}
RubricHQ   →  {"content": "Actually that's all, thanks!", "type": "end_call"}
   conversation ends → RubricHQ records the transcript and runs your metrics

A run with a function call looks the same, but the agent streams the call/result frames before its actual reply (they’re recorded in the transcript but don’t count as a turn):

RubricHQ   →  {"content": "What's my balance?"}
your agent →  {"role": "Function Call",        "data": {"id": "c1", "name": "lookup_account", "arguments": "{}"}}
your agent →  {"role": "Function Call Result", "data": {"id": "c1", "result": "{\"balance\":420.00}"}}
your agent →  {"content": "Your balance is $420.00."}
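
To make the turn-counting rules concrete, the following sketch classifies a sequence of agent frames the way the text above describes: function-call frames are recorded as events, metadata is merged last-write-wins, and only frames with `content` count as turns. It illustrates the documented semantics only; it is not RubricHQ's implementation, and `summarize` is a hypothetical name.

```python
import json
from typing import Dict, Iterable, List, Tuple


def summarize(frames: Iterable[str]) -> Tuple[List[str], List[Tuple[str, str]], Dict]:
    """Split agent frames into turns, function events, and merged metadata."""
    turns: List[str] = []
    events: List[Tuple[str, str]] = []
    metadata: Dict = {}
    for raw in frames:
        msg = json.loads(raw)
        metadata.update(msg.get("metadata", {}))  # later values overwrite earlier ones
        role = msg.get("role")
        if role in ("Function Call", "Function Call Result"):
            events.append((role, msg["data"]["id"]))  # recorded, not a turn
        elif "content" in msg:
            turns.append(msg["content"])              # the only thing that is a turn
    return turns, events, metadata
```

Running it over the four-frame exchange above would yield one turn and two function events.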

Local development: expose your local server with a tunnel (e.g. ngrok) and use the wss:// URL in the WebSocket URL field.

Your reply comes from your agent

Your agent’s reply is produced by your real agent — an LLM, a Pipecat pipeline, a LangChain chain, your own model and tools. The WebSocket handler is only a thin bridge:

receive {"content": user_text} → your agent (LLM + your context / tools / state) → send {"content": reply}

So making an existing agent “WebSocket-ready” is just adding that bridge around your real brain. If your agent is built on Pipecat, you already have an LLM context and pipeline — feed the incoming text into your context, run the pipeline (text in → text out), and send the assistant’s text back. A full Pipecat reference ships in the voice-pipecat project, custom_ws_pipecat_agent.py: a real OpenAILLMContext + OpenAILLMService pipeline bridged to this protocol (vs. custom_ws_agent.py, the minimal hand-rolled version).
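
Stripped of any framework, the bridge described above might look like the sketch below. It is not the shipped custom_ws_agent.py; `bridge` and `generate_reply` are hypothetical names, and `ws` is assumed to be any connection object that is async-iterable and has `send` and `close` coroutines (the `websockets` library's server connection fits that shape, with the headers available as `ws.request.headers`).

```python
import json
from typing import Awaitable, Callable, Mapping


async def bridge(
    ws,
    headers: Mapping[str, str],
    secret: str,
    generate_reply: Callable[[str], Awaitable[str]],
) -> None:
    """Thin protocol bridge: content in -> your agent -> content out."""
    # Authenticate first; 1008 is the WebSocket "policy violation" close code.
    if headers.get("X-RUBRIC-SECRET") != secret:
        await ws.close(1008, "Unauthorized")
        return

    async for raw in ws:
        msg = json.loads(raw)
        if msg.get("type") == "end_call":
            break  # RubricHQ ended the conversation
        # generate_reply is your real agent (LLM, pipeline, chain, ...).
        answer = await generate_reply(msg.get("content", ""))
        await ws.send(json.dumps({"content": answer}))
```

Keeping the agent behind a single `generate_reply` callable means the same bridge can wrap different backends without touching the protocol code.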

Example: a Pipecat agent

If your agent is a Pipecat pipeline, the bridge is small — push each incoming content into your LLM context, run the pipeline, and send the assistant’s text back:

# pip install websockets pipecat-ai openai
import asyncio, json, os, websockets
from pipecat.frames.frames import LLMTextFrame, LLMFullResponseEndFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext, OpenAILLMContextFrame
from pipecat.processors.frame_processor import FrameProcessor
from pipecat.services.openai.llm import OpenAILLMService

SECRET = os.environ["AGENT_SECRET"]
SYSTEM = "You are a helpful billing agent for Acme Corp. Be brief and professional."

class ResponseCapture(FrameProcessor):
    """Collect the assistant's streamed text; resolve one future per turn."""
    def __init__(self):
        super().__init__()
        self._buf = ""
        self._fut = None

    def expect(self):
        # Called once per user turn, from inside the running event loop.
        self._buf = ""
        self._fut = asyncio.get_running_loop().create_future()
        return self._fut

    async def process_frame(self, frame, direction):
        await super().process_frame(frame, direction)
        if isinstance(frame, LLMTextFrame):
            self._buf += frame.text or ""
        elif isinstance(frame, LLMFullResponseEndFrame) and self._fut and not self._fut.done():
            self._fut.set_result(self._buf.strip())
        await self.push_frame(frame, direction)

async def handler(ws):
    # Reject connections that do not carry the shared secret.
    if ws.request.headers.get("X-RUBRIC-SECRET") != SECRET:
        await ws.close(1008, "Unauthorized")
        return

    # Your existing Pipecat "brain": context + LLM + aggregator.
    context = OpenAILLMContext(messages=[{"role": "system", "content": SYSTEM}])
    llm = OpenAILLMService(api_key=os.environ["OPENAI_API_KEY"], model="gpt-4o-mini")
    agg = llm.create_context_aggregator(context)        # appends assistant replies to context
    capture = ResponseCapture()
    task = PipelineTask(Pipeline([llm, capture, agg.assistant()]))
    runner_task = asyncio.create_task(PipelineRunner(handle_sigint=False).run(task))

    async def ask(text):                                 # one user turn -> assistant reply
        context.add_message({"role": "user", "content": text})
        fut = capture.expect()
        await task.queue_frame(OpenAILLMContextFrame(context))
        return await asyncio.wait_for(fut, timeout=30)

    try:
        await ws.send(json.dumps({"content": "Hi, thanks for contacting Acme. How can I help?"}))
        async for raw in ws:
            msg = json.loads(raw)
            if msg.get("type") == "end_call":
                break
            await ws.send(json.dumps({"content": await ask(msg.get("content", ""))}))
    finally:
        runner_task.cancel()

async def main():
    async with websockets.serve(handler, "0.0.0.0", 8080):
        await asyncio.Future()                           # run until interrupted

asyncio.run(main())

That’s the heart of the full reference custom_ws_pipecat_agent.py, which additionally routes on X-RUBRIC-AGENT-ID and parses inline <dtmf/> tags. Any other framework (LangChain, your own model) follows the same shape: content in → your agent → content out.

DTMF (keypad input)

When a scenario enables the send_dtmf tool, the simulated caller can “press keypad digits” during a text conversation. Rather than a separate frame, it sends the digits as an inline tag embedded in a normal text message:

{ "content": "Sure, entering it now. <dtmf digits=\"1234#\"/>" }

Your agent should parse <dtmf digits="..."/> out of the message content, treat the matched digits as the caller’s keypad entry, and strip the tag before handling the text. Valid characters are 0-9, *, and #. A simple regex works:

import re
m = re.search(r'<dtmf\s+digits="([^"]*)"\s*/>', content)
digits = m.group(1) if m else None

The full message (tag included) is recorded in the transcript, so DTMF entries are visible in the run results.
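
A slightly fuller version of the parsing step, covering the stripping and the character check mentioned above, might look like this. It is a sketch: `extract_dtmf` is a hypothetical helper, and handling more than one tag per message is an assumption, since the text above only shows a single tag.

```python
import re
from typing import List, Tuple

DTMF_TAG = re.compile(r'<dtmf\s+digits="([^"]*)"\s*/>')
VALID_DIGITS = re.compile(r"[0-9*#]+")  # valid keypad characters: 0-9, *, #


def extract_dtmf(content: str) -> Tuple[str, List[str]]:
    """Return (text with tags removed, list of valid digit strings in order)."""
    digits = [d for d in DTMF_TAG.findall(content) if VALID_DIGITS.fullmatch(d)]
    text = DTMF_TAG.sub("", content)
    # Collapse the whitespace left behind where a tag used to be.
    text = re.sub(r"\s{2,}", " ", text).strip()
    return text, digits
```

Entries with characters outside the valid set are dropped from the digit list here, but their tags are still stripped from the text; rejecting the whole message instead would be an equally reasonable choice.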

Testing multiple agents on one endpoint

In RubricHQ you create one agent per bot you want to test. If those bots share a single endpoint, your server still needs to know which bot a given request/connection is for. RubricHQ tells you in two ways — pick whichever fits:

Option A — Read the X-RUBRIC-AGENT-ID header. Every WebSocket connection carries X-RUBRIC-AGENT-ID, the RubricHQ agent id. Map each agent id to your bot once (the id is stable per agent), then dispatch on the header:

agent_id = headers.get("X-RUBRIC-AGENT-ID")   # e.g. "7"
bot = MY_BOTS[agent_id]                         # your mapping

Option B — Put the bot in the URL. Give each RubricHQ agent a distinct path or query as its WebSocket URL — works on the same host and port:

wss://host/ws/billing      (or  wss://host/ws?bot=billing)

RubricHQ connects to exactly the URL configured for that agent, so your router reads the path/query and dispatches.

You can also attach your own identifier: any scenario metadata field whose name starts with X- is forwarded verbatim as a header, so e.g. a scenario field X-Bot-Id = billing arrives as the X-Bot-Id header. (Use this for per-scenario routing; use X-RUBRIC-AGENT-ID or the URL for per-agent routing.)

Validate X-RUBRIC-SECRET first, then route on X-RUBRIC-AGENT-ID (or the URL). The other headers (X-RUBRIC-SCENARIO-ID, X-RUBRIC-RUN-ID, X-RUBRIC-BATCH-ID) identify the specific scenario / run / batch within that agent.
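
Options A and B can live in one dispatch function that tries the URL first and falls back to the header. The sketch below assumes the path and query conventions from the example URLs above (`/ws/<bot>` or `/ws?bot=<bot>`); `MY_BOTS`, its contents, and `pick_bot` are hypothetical, and the secret is assumed to have been validated before this runs.

```python
from typing import Mapping, Optional
from urllib.parse import parse_qs, urlsplit

# Hypothetical mapping from RubricHQ agent id to your own bot name.
MY_BOTS = {"7": "billing", "8": "support"}


def pick_bot(path: str, headers: Mapping[str, str]) -> Optional[str]:
    """Resolve the target bot from the request path/query, else from the header."""
    parts = urlsplit(path)

    # Option B, query form: /ws?bot=billing
    query = parse_qs(parts.query)
    if "bot" in query:
        return query["bot"][0]

    # Option B, path form: /ws/billing
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) >= 2 and segments[0] == "ws":
        return segments[1]

    # Option A: map the stable RubricHQ agent id to a bot.
    return MY_BOTS.get(headers.get("X-RUBRIC-AGENT-ID", ""))
```

A `None` result means the connection could not be routed; closing the socket at that point is probably safer than guessing a default bot.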

Test your endpoint before going live

You don’t need to launch a full RubricHQ run to validate your endpoint. Three levels of dev testing:

1. Probe it yourself (no RubricHQ needed). This small script mimics exactly what RubricHQ sends — it connects with the X-RUBRIC-* headers, sends a few user turns, and prints every frame your agent returns. Point it at your wss:// URL (local or staging):

# rubric_ws_probe.py   (pip install websockets)
#   python rubric_ws_probe.py wss://localhost:8080/ws  your-secret
#   (use ws:// instead if your local server runs without TLS)
import asyncio, json, sys, websockets

async def main(url, secret):
    headers = {
        "X-RUBRIC-SECRET": secret,
        "X-RUBRIC-AGENT-ID": "probe",
        "X-RUBRIC-SCENARIO-ID": "1",
        "X-RUBRIC-RUN-ID": "1",
        "X-RUBRIC-BATCH-ID": "1",
    }
    async with websockets.connect(url, additional_headers=headers) as ws:
        print("connected:", url)
        for turn in ["Hi, I'd like to check my balance", "My account is 12345", "No, that's all, thanks"]:
            await ws.send(json.dumps({"content": turn}))
            print("→", turn)
            while True:                         # your agent may send fn-call / metadata before its reply
                msg = json.loads(await asyncio.wait_for(ws.recv(), timeout=30))
                print("←  ", msg)
                if msg.get("type") == "end_call":
                    return
                if "content" in msg:
                    break
        await ws.send(json.dumps({"content": "", "type": "end_call"}))
        print("→ end_call")

asyncio.run(main(sys.argv[1], sys.argv[2] if len(sys.argv) > 2 else ""))

2. Dry-run one scenario in the app. Use Live Simulations to run a single scenario against your endpoint and watch the transcript and metrics live, instead of launching a full batch — ideal for the first real round-trip.

3. Local development. Expose your local server with a tunnel (e.g. ngrok http 8080) so RubricHQ — or the probe above — can reach wss://… while you iterate.

Once the probe and a single Live Simulation pass, you’re ready to run full batches in production.

Metrics on text runs

Transcript / LLM metrics (compliance, resolution, custom LLM judges, etc.) run on text exactly as on voice.
Audio metrics (latency, interruptions, WPM, voice clarity…) are automatically skipped for text — there’s no audio.
When creating a custom metric, Applies to lets you choose Text and Voice conversations (default) or Text Conversation only. A text-only metric won’t run on voice; anything that runs on voice also runs on text.

Results

After a run completes you’ll see the transcript (including any function-call frames), the metric evaluations, and the end reason if the run failed (e.g. Agent did not respond). Text runs show no recording section.