GitHub Actions CI/CD

Run your RubricHQ agent test suites automatically on every push or pull request and block deploys when agents regress. The integration works via a published GitHub Action (RubricHQ/agent-test-action) or through the raw REST API — use the API path for GitLab CI, Jenkins, CircleCI, or any other CI platform.

Prerequisites

An agent with at least one scenario in RubricHQ.
A RubricHQ API key — Settings → API Keys → Create key.
In your GitHub repo:
- Secret RUBRICHQ_API_KEY — your API key (Settings → Secrets and variables → Actions → New repository secret).
- Variable RUBRICHQ_AGENT_ID — the numeric ID of the agent to test (Settings → Secrets and variables → Actions → Variables tab → New repository variable).

Using the GitHub Action

Quick-start workflow

Add this file to .github/workflows/agent-tests.yml in your repository:

name: Agent Tests
on:
  push:
    branches: [main]
  workflow_dispatch:

jobs:
  agent-tests:
    runs-on: ubuntu-latest
    steps:
      - name: Run RubricHQ agent tests
        uses: RubricHQ/agent-test-action@v1
        with:
          api_key: ${{ secrets.RUBRICHQ_API_KEY }}
          agent_id: ${{ vars.RUBRICHQ_AGENT_ID }}
          scenario_ids: "12,15,22"

The step exits 0 when the verdict is passed and 1 when it is failed or the timeout is reached, so the job fails whenever your agents regress or the run does not finish in time.

Inputs

Input	Required	Default	Description
`api_key`	yes	—	RubricHQ API key. Use a secret: `${{ secrets.RUBRICHQ_API_KEY }}`.
`agent_id`	yes	—	Numeric ID of the agent to test.
`scenario_ids`	yes	—	Comma-separated list of scenario IDs to run (e.g. `12,15,22`).
`tags`	no	—	Optional. Tags to also include, on top of `scenario_ids` (union).
`frequency`	no	`1`	How many times to run each scenario. Accepts `1`–`5`.
`success_threshold`	no	`100`	Minimum pass-rate (0–100) required for the verdict to be `passed`.
`timeout`	no	`3600`	Seconds to wait for the run to complete before failing the step.
`poll_interval`	no	`15`	Seconds between status-poll requests.
`api_url`	no	`https://api.rubrichq.io`	Override for self-hosted or staging deployments.
`channel`	no	`phone` if the agent has a phone number, otherwise `web`	How each scenario is run: `web`, `phone`, or `text`. See "Testing over web, phone, or text".

scenario_ids is required. tags is optional — when provided, matching scenarios are added on top (union).

Outputs

Output	Description
`test_run_id`	Numeric ID of the test run created.
`verdict`	`passed` or `failed`.
`pass_rate`	Percentage of runs that passed (e.g. `80.0`).
`report_url`	Link to the full run report in the RubricHQ dashboard.

Testing over web, phone, or text

The channel input controls how each scenario is run:

web — a browser/WebSocket voice call.
phone — a real phone call placed over Twilio (the agent must have a phone number configured).
text — a text-only conversation.

When channel is omitted, it defaults to phone if the agent has a phone number, otherwise web.
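The defaulting rule above can be written out as a small Python sketch. This is only a client-side illustration of the documented behavior, not the service's implementation; the function name `resolve_channel` and the `agent_has_phone_number` flag are hypothetical.

```python
VALID_CHANNELS = ("web", "phone", "text")


def resolve_channel(channel, agent_has_phone_number):
    """Return the channel a run would use under the documented default.

    `channel` is the value passed to the Action or API, or None when omitted.
    """
    if channel is None:
        # Documented default: phone when the agent has a number, otherwise web.
        return "phone" if agent_has_phone_number else "web"
    if channel not in VALID_CHANNELS:
        raise ValueError(f"channel must be one of {VALID_CHANNELS}, got {channel!r}")
    return channel
```

For example, `resolve_channel(None, agent_has_phone_number=False)` yields `"web"`, while an explicit `"text"` is returned unchanged.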

Run a single channel by setting channel on the step:

      - uses: RubricHQ/agent-test-action@v1
        with:
          api_key: ${{ secrets.RUBRICHQ_API_KEY }}
          agent_id: ${{ vars.RUBRICHQ_AGENT_ID }}
          scenario_ids: "12,15,22"
          channel: web        # or: phone, text

To exercise both web and phone on every push, use two jobs. Each reports its own verdict, and a failure in either one blocks any deploy job that lists both in needs:

jobs:
  web-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: RubricHQ/agent-test-action@v1
        with:
          api_key: ${{ secrets.RUBRICHQ_API_KEY }}
          agent_id: ${{ vars.RUBRICHQ_AGENT_ID }}
          scenario_ids: "12,15,22"
          channel: web

  phone-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: RubricHQ/agent-test-action@v1
        with:
          api_key: ${{ secrets.RUBRICHQ_API_KEY }}
          agent_id: ${{ vars.RUBRICHQ_AGENT_ID }}
          scenario_ids: "12,15,22"
          channel: phone

The two jobs run in parallel. To stop a flaky phone job from blocking the deploy while you stabilize it, add continue-on-error: true to the phone job’s step — it still reports a verdict but won’t fail the workflow.

Gating deploys

Add a deploy job that only runs after agent-tests passes:

jobs:
  agent-tests:
    runs-on: ubuntu-latest
    steps:
      - name: Run RubricHQ agent tests
        uses: RubricHQ/agent-test-action@v1
        with:
          api_key: ${{ secrets.RUBRICHQ_API_KEY }}
          agent_id: ${{ vars.RUBRICHQ_AGENT_ID }}
          scenario_ids: "12,15,22"

  deploy:
    needs: agent-tests
    runs-on: ubuntu-latest
    steps:
      - name: Deploy
        run: ./scripts/deploy.sh

If agent-tests fails, deploy is skipped automatically.

Staging → production pipeline

Use two sequential jobs with different secrets to promote only builds that pass in staging first:

jobs:
  test-staging:
    runs-on: ubuntu-latest
    steps:
      - name: Run agent tests against staging
        uses: RubricHQ/agent-test-action@v1
        with:
          api_key: ${{ secrets.RUBRICHQ_API_KEY_STAGING }}
          agent_id: ${{ vars.RUBRICHQ_AGENT_ID_STAGING }}
          scenario_ids: "12,15,22"
          success_threshold: 90

  test-production:
    needs: test-staging
    runs-on: ubuntu-latest
    steps:
      - name: Run agent tests against production
        uses: RubricHQ/agent-test-action@v1
        with:
          api_key: ${{ secrets.RUBRICHQ_API_KEY_PROD }}
          agent_id: ${{ vars.RUBRICHQ_AGENT_ID_PROD }}
          scenario_ids: "12,15,22"
          success_threshold: 100

Using the API directly

For GitLab CI, Jenkins, CircleCI, or any CI system that can run shell commands, call the REST API directly.

Trigger a test run

curl -sS -X POST https://api.rubrichq.io/api/public/v1/test_runs \
  -H "Authorization: Bearer $RUBRICHQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"agent_id": 1, "scenario_ids": [12, 15, 22], "success_threshold": 90}'

The API returns 202 Accepted immediately — the run is queued, not yet complete:

{
  "test_run_id": 42,
  "status": "pending",
  "run_count": 5,
  "scenario_count": 5,
  "frequency": 1,
  "success_threshold": 90,
  "status_url": "https://api.rubrichq.io/api/public/v1/test_runs/42",
  "report_url": "https://app.rubrichq.io/batch-run/42"
}

Request fields

Field	Type	Required	Description
`agent_id`	integer	yes	The agent to test.
`scenario_ids`	int array or comma-separated string	yes	Scenarios to run.
`tags`	string array or comma-separated string	no	Optional. Also include scenarios matching these tags (union).
`frequency`	integer	no	Runs per scenario (`1`–`5`, default `1`).
`success_threshold`	integer	no	Minimum pass-rate to pass (`0`–`100`, default `100`).
`testing_mode`	string	no	`voice` or `text` (default `voice`).
`channel`	string	no	`phone`, `web`, or `text`. Defaults to `phone` when the agent has a phone number, otherwise `web`.
`name`	string	no	Human-readable label for the run (shows in the dashboard).
`ci_metadata`	object	no	Arbitrary JSON stored with the run and echoed back in status responses. Put commit SHA, branch name, and CI run URL here for traceability.

scenario_ids is required. tags is optional and does not narrow the selection: scenarios matching the tags are added to the scenario_ids list (union).
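Where a script is easier to maintain than a curl one-liner, the trigger call can be sketched in Python with only the standard library. The endpoint, headers, and field names are taken from this page; the helper names `build_trigger_request` and `trigger_test_run` are illustrative and not part of any RubricHQ SDK.

```python
import json
import urllib.request

API_URL = "https://api.rubrichq.io"  # documented default; override for self-hosted


def build_trigger_request(api_key, agent_id, scenario_ids, api_url=API_URL, **optional):
    """Build (but do not send) the POST request that queues a test run.

    `optional` may carry any field from the request table, e.g.
    success_threshold, frequency, channel, tags, name, ci_metadata.
    """
    payload = {"agent_id": agent_id, "scenario_ids": list(scenario_ids), **optional}
    return urllib.request.Request(
        f"{api_url}/api/public/v1/test_runs",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def trigger_test_run(request, timeout=30):
    """Send the request and return the parsed JSON body of the 202 response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

A CI script would typically read the key from the `RUBRICHQ_API_KEY` environment variable, call `trigger_test_run(build_trigger_request(key, 1, [12, 15, 22], success_threshold=90))`, and keep the returned `status_url` for polling. Building the request separately from sending it keeps the payload easy to inspect without touching the network.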

Poll for the verdict

Voice scenarios take several minutes each. Poll the status_url until verdict is no longer pending:

TEST_RUN_ID=42

# Poll every 15 seconds until the verdict is final.
while true; do
  verdict=$(curl -sS "https://api.rubrichq.io/api/public/v1/test_runs/$TEST_RUN_ID" \
    -H "Authorization: Bearer $RUBRICHQ_API_KEY" | python3 -c 'import json,sys; print(json.load(sys.stdin)["verdict"])')
  [ "$verdict" != "pending" ] && break
  sleep 15
done

# Fail the CI job unless the run passed.
[ "$verdict" = "passed" ] || exit 1
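The shell loop above polls forever if a run never leaves pending. A bounded variant can be sketched in Python as follows; the defaults mirror the Action's `timeout` (3600) and `poll_interval` (15) inputs, while the function names and the injectable `fetch`, `sleep`, and `clock` parameters are assumptions made here so the loop can be exercised without network access.

```python
import json
import time
import urllib.request


def fetch_status(status_url, api_key, timeout=30):
    """GET the status document for a test run."""
    request = urllib.request.Request(
        status_url, headers={"Authorization": f"Bearer {api_key}"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def wait_for_verdict(fetch, timeout=3600, poll_interval=15,
                     sleep=time.sleep, clock=time.monotonic):
    """Call `fetch()` until the verdict is no longer "pending".

    Returns the final status document, or raises TimeoutError once
    `timeout` seconds have elapsed without a final verdict.
    """
    deadline = clock() + timeout
    while True:
        status = fetch()
        if status["verdict"] != "pending":
            return status
        if clock() >= deadline:
            raise TimeoutError("test run still pending after timeout")
        sleep(poll_interval)
```

In a CI script this might be called as `status = wait_for_verdict(lambda: fetch_status(status_url, api_key))`, followed by exiting non-zero unless `status["verdict"] == "passed"`.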

Status response

{
  "id": 42,
  "status": "completed",
  "verdict": "failed",
  "pass_rate": 80.0,
  "success_threshold": 90,
  "runs": {
    "total": 5,
    "completed": 5,
    "running": 0,
    "pending": 0,
    "failed": 0,
    "passed": 4
  },
  "failed_runs": [
    {
      "run_id": 207,
      "scenario_name": "Angry refund caller",
      "status": "completed",
      "reason": "critical metric failed: Greeting Check"
    }
  ],
  "ci_metadata": { "sha": "abc123", "branch": "main" },
  "report_url": "https://app.rubrichq.io/batch-run/42",
  "created_at": "2026-06-11T10:00:00Z",
  "updated_at": "2026-06-11T10:12:31Z"
}

How the verdict works

A run passes when all of its critical metrics pass (runs with no critical metrics always pass on that dimension). Calls that error out count as failed runs.

The pass rate is passed_runs / total_runs × 100. The verdict is passed when pass_rate >= success_threshold, and failed otherwise.

The verdict stays "pending" until every call in the batch has completed and all metric evaluations for those calls have finished. Poll until the verdict is no longer "pending" — don’t rely on status == "completed" alone, because metric evaluation runs asynchronously after call completion.
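As a worked check of the rule above, the sample status response (4 of 5 runs passed, threshold 90) can be reproduced with a few lines of Python. The function names are illustrative, and how the service rounds the pass rate is not stated here, so this sketch does no rounding.

```python
def pass_rate(passed_runs, total_runs):
    """Percentage of runs that passed: passed_runs / total_runs x 100."""
    if total_runs <= 0:
        raise ValueError("total_runs must be positive")
    # Multiply before dividing so whole-number percentages stay exact.
    return 100 * passed_runs / total_runs


def verdict(passed_runs, total_runs, success_threshold=100):
    """Return "passed" when pass_rate >= success_threshold, else "failed"."""
    if pass_rate(passed_runs, total_runs) >= success_threshold:
        return "passed"
    return "failed"
```

With the sample numbers, `pass_rate(4, 5)` is `80.0`, which is below the threshold of 90, so `verdict(4, 5, success_threshold=90)` returns `"failed"`, matching the status response shown earlier.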

Troubleshooting

Symptom	Likely cause	Fix
Step times out before run completes	Large suite with long voice scenarios (each can take several minutes).	Raise the `timeout` input (Action) or extend your CI job’s timeout. A 20-scenario suite at `frequency: 2` can easily run for 40+ minutes.
`"No scenarios matched tags"` error	Tags are exact-match strings — case and whitespace matter.	Check the scenario tags in the RubricHQ app under Scenarios and make sure they match exactly what you’re passing.
HTTP `402 Payment Required`	The workspace has run out of credits.	Top up credits in Settings → Billing, then re-run.
HTTP `422 Unprocessable Entity` listing specific scenario IDs	Those scenario IDs are not attached to the agent, or the scenarios are archived.	Verify the scenario IDs under Agent → Scenarios; archived scenarios must be restored before they can be run.

Use ci_metadata to attach {"sha": "$COMMIT_SHA", "branch": "$BRANCH", "ci_run": "$CI_RUN_URL"} to every test run. It’s stored on the run and echoed back in status responses, so you can trace any dashboard report back to the exact commit that triggered it.
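On GitHub Actions, the values for that object can be read from the runner's default environment variables (`GITHUB_SHA`, `GITHUB_REF_NAME`, `GITHUB_SERVER_URL`, `GITHUB_REPOSITORY`, `GITHUB_RUN_ID`). The sketch below assembles them into the `sha` / `branch` / `ci_run` shape suggested above; the helper name `github_ci_metadata` is hypothetical, and other CI systems would need their own variable names.

```python
import os


def github_ci_metadata(env=None):
    """Collect commit, branch, and run URL from GitHub Actions' default env vars."""
    env = os.environ if env is None else env
    server = env.get("GITHUB_SERVER_URL", "https://github.com")
    repo = env.get("GITHUB_REPOSITORY")
    run_id = env.get("GITHUB_RUN_ID")
    return {
        "sha": env.get("GITHUB_SHA"),
        "branch": env.get("GITHUB_REF_NAME"),
        # Only build the run URL when both parts are available.
        "ci_run": f"{server}/{repo}/actions/runs/{run_id}" if repo and run_id else None,
    }
```

The resulting dictionary can be passed as the `ci_metadata` field of the trigger request body.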