GitHub Actions CI/CD
Run your RubricHQ agent test suites automatically on every push or pull request and block deploys when agents regress. The integration works via a published GitHub Action (RubricHQ/agent-test-action) or through the raw REST API — use the API path for GitLab CI, Jenkins, CircleCI, or any other CI platform.
Prerequisites
- An agent with at least one scenario in RubricHQ.
- A RubricHQ API key — Settings → API Keys → Create key.
- In your GitHub repo:
  - Secret RUBRICHQ_API_KEY: your API key (Settings → Secrets and variables → Actions → New repository secret).
  - Variable RUBRICHQ_AGENT_ID: the numeric ID of the agent to test (Settings → Secrets and variables → Actions → Variables tab → New repository variable).
Using the GitHub Action
Quick-start workflow
Add this file to .github/workflows/agent-tests.yml in your repository:
The step exits 0 when the verdict is passed and 1 when it is failed or the timeout is reached — so the job fails exactly when your agents regress.
Inputs
scenario_ids is required. tags is optional — when provided, matching scenarios are added on top (union).
Outputs
Testing over web, phone, or text
The channel input controls how each scenario is run:
- web: a browser/WebSocket voice call.
- phone: a real phone call placed over Twilio (the agent must have a phone number configured).
- text: a text-only conversation.
When channel is omitted, it defaults to phone if the agent has a phone number, otherwise web.
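The defaulting rule can be sketched in a few lines of Python. This is only an illustration of the behavior described above, not RubricHQ's implementation: the function name `resolve_channel` and the `agent_has_phone_number` flag are hypothetical, and the local check that rejects `phone` for an agent without a number is an assumption about how you might validate a pipeline before triggering a run.

```python
from typing import Optional

# The three channel values named in this section.
VALID_CHANNELS = {"web", "phone", "text"}


def resolve_channel(channel: Optional[str], agent_has_phone_number: bool) -> str:
    """Return the channel a run would use under the documented default.

    Hypothetical helper: mirrors the rule "phone if the agent has a
    phone number, otherwise web" when no channel is given.
    """
    if channel is None:
        return "phone" if agent_has_phone_number else "web"
    if channel not in VALID_CHANNELS:
        raise ValueError(f"unknown channel: {channel!r}")
    # Assumption: fail early locally, since phone runs need a configured number.
    if channel == "phone" and not agent_has_phone_number:
        raise ValueError("channel 'phone' requires an agent with a phone number")
    return channel
```

Setting the channel explicitly in CI avoids a silent switch from web to phone if someone later adds a phone number to the agent.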
Run a single channel by setting channel on the step:
To exercise both web and phone on every push, use two jobs — each reports its own verdict, and either failing blocks the deploy:
The two jobs run in parallel. To stop a flaky phone job from blocking the deploy while you stabilize it, add continue-on-error: true to the phone job’s step — it still reports a verdict but won’t fail the workflow.
Gating deploys
Add a deploy job that only runs after agent-tests passes:
If agent-tests fails, deploy is skipped automatically.
Staging → production pipeline
Use two sequential jobs with different secrets to promote only builds that pass in staging first:
Using the API directly
For GitLab CI, Jenkins, CircleCI, or any CI system that can run shell commands, call the REST API directly.
Trigger a test run
The API returns 202 Accepted immediately — the run is queued, not yet complete:
Request fields
scenario_ids is required; tags is an optional additional filter.
Poll for the verdict
Voice scenarios take several minutes each. Poll the status_url until verdict is no longer pending:
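For CI systems without the published Action, a polling loop along these lines is one way to wait for the verdict. It is a minimal Python sketch under stated assumptions: the `Authorization: Bearer` header, the 30-minute timeout, and the 15-second interval are guesses rather than documented values, and the helper names are hypothetical. Only the `verdict` field and its `pending`, `passed`, and `failed` values come from this page.

```python
import json
import time
import urllib.request
from typing import Callable, Dict


def fetch_status(status_url: str, api_key: str) -> Dict:
    """GET the status document from status_url.

    Assumption: the API accepts the key as a Bearer token.
    """
    req = urllib.request.Request(
        status_url, headers={"Authorization": f"Bearer {api_key}"}
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def wait_for_verdict(
    get_status: Callable[[], Dict],
    timeout_s: float = 1800.0,   # assumed budget; voice runs take minutes each
    interval_s: float = 15.0,    # assumed polling interval
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll until verdict leaves "pending"; return "timeout" if the budget runs out."""
    deadline = clock() + timeout_s
    while True:
        # Check the verdict field, not status: metric evaluation may still be running.
        verdict = get_status().get("verdict", "pending")
        if verdict != "pending":
            return verdict
        if clock() >= deadline:
            return "timeout"
        sleep(interval_s)


def exit_code(verdict: str) -> int:
    """Mirror the Action's convention: 0 for passed, 1 for failed or timeout."""
    return 0 if verdict == "passed" else 1
```

In a CI script this could be wired up as `sys.exit(exit_code(wait_for_verdict(lambda: fetch_status(url, key))))`, where `url` is the `status_url` returned by the trigger call and `key` is read from the CI system's secret store.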
Status response
How the verdict works
A run passes when all of its critical metrics pass (runs with no critical metrics always pass on that dimension). Calls that error out count as failed runs.
The pass rate is passed_runs / total_runs × 100. The verdict is passed when pass_rate >= success_threshold, and failed otherwise.
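The two rules above can be worked through in Python. This is a sketch of the arithmetic only, not the service's code: the function names are hypothetical, `success_threshold` is assumed to be a percentage on the same 0 to 100 scale as the pass rate, and raising on an empty batch is an assumption, since this page does not say how a run with zero calls is scored.

```python
from typing import Dict, List


def run_passed(critical_metric_results: List[bool], call_errored: bool = False) -> bool:
    """A run passes when every critical metric passes.

    An errored call counts as a failed run. An empty list (no critical
    metrics) passes, because all([]) is True.
    """
    if call_errored:
        return False
    return all(critical_metric_results)


def compute_verdict(run_results: List[bool], success_threshold: float) -> Dict:
    """Apply pass_rate = passed_runs / total_runs * 100 and compare to the threshold."""
    if not run_results:
        # Assumption: an empty batch is treated as an error in this sketch.
        raise ValueError("cannot compute a pass rate for zero runs")
    passed_runs = sum(run_results)
    total_runs = len(run_results)
    pass_rate = passed_runs / total_runs * 100
    verdict = "passed" if pass_rate >= success_threshold else "failed"
    return {
        "passed_runs": passed_runs,
        "total_runs": total_runs,
        "pass_rate": pass_rate,
        "verdict": verdict,
    }


runs = [
    run_passed([True, True]),                # all critical metrics pass
    run_passed([]),                          # no critical metrics: passes
    run_passed([True, False]),               # one critical metric fails
    run_passed([True], call_errored=True),   # call errored: failed run
]
result = compute_verdict(runs, success_threshold=50)
```

Here two of four runs pass, so the pass rate is 50.0 and the verdict is passed at a threshold of 50. The same batch fails at a threshold of 80.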
The verdict stays "pending" until every call in the batch has completed and all metric evaluations for those calls have finished. Poll until the verdict is no longer "pending" — don’t rely on status == "completed" alone, because metric evaluation runs asynchronously after call completion.
Troubleshooting
Use ci_metadata to attach {"sha": "$COMMIT_SHA", "branch": "$BRANCH", "ci_run": "$CI_RUN_URL"} to every test run. It’s stored on the run and echoed back in status responses, so you can trace any dashboard report back to the exact commit that triggered it.
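On GitHub Actions, the three values can be read from the runner's default environment variables. The sketch below assumes the standard `GITHUB_SHA`, `GITHUB_REF_NAME`, `GITHUB_SERVER_URL`, `GITHUB_REPOSITORY`, and `GITHUB_RUN_ID` variables. The helper name `build_ci_metadata` is hypothetical, and other CI systems expose the same information under different names.

```python
import os
from typing import Dict, Mapping


def build_ci_metadata(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Build the ci_metadata object from GitHub Actions default variables.

    Hypothetical helper: missing variables become empty strings rather
    than raising, so the script also runs outside CI.
    """
    server = env.get("GITHUB_SERVER_URL", "https://github.com")
    repo = env.get("GITHUB_REPOSITORY", "")
    run_id = env.get("GITHUB_RUN_ID", "")
    # Link back to the workflow run page when both parts are available.
    ci_run = f"{server}/{repo}/actions/runs/{run_id}" if repo and run_id else ""
    return {
        "sha": env.get("GITHUB_SHA", ""),
        "branch": env.get("GITHUB_REF_NAME", ""),
        "ci_run": ci_run,
    }
```

The returned dictionary can be placed under the `ci_metadata` key of the trigger request body, next to `scenario_ids`.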