# Metrics API

Manage your evaluation metrics programmatically. Supports standard (audio), LLM-as-Judge, and Code-as-Judge metrics.

## List Metrics

```
GET /api/v1/metrics?client_id={workspace_id}&page=1
```

Optional filters:

| Parameter | Description |
| --- | --- |
| agent_id | Filter by agent |
| search | Search by metric name |
| page | Page number |

Response:

```json
{
  "standard_metrics": [
    {
      "id": 212,
      "name": "Voice Tone & Clarity",
      "standard_metric_key": "voice_tone_clarity",
      "result_type": "rating",
      "metrics_type": "standard",
      "is_custom": false
    }
  ],
  "data": [
    {
      "id": 261,
      "name": "Silence More Than 5 Seconds",
      "metrics_type": "code_as_judge",
      "result_type": "boolean",
      "is_custom": true,
      "scope": "global"
    }
  ],
  "global_metrics": [],
  "pagination": {
    "current_page": 1,
    "total_pages": 1,
    "total_count": 6
  }
}
```
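
The optional filters can be assembled client-side before calling List Metrics. A minimal sketch in Python, assuming a placeholder host (`api.example.com` stands in for your deployment's base URL):

```python
from urllib.parse import urlencode

BASE_URL = "https://api.example.com"  # placeholder host, substitute your own

def list_metrics_url(workspace_id, agent_id=None, search=None, page=1):
    """Build a List Metrics URL, including only the filters that are set."""
    params = {"client_id": workspace_id, "page": page}
    if agent_id is not None:
        params["agent_id"] = agent_id
    if search:
        params["search"] = search
    return f"{BASE_URL}/api/v1/metrics?{urlencode(params)}"

print(list_metrics_url(42, search="silence"))
```

Omitting unset filters keeps the query string clean rather than sending empty parameters the server has to ignore.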

## Create Metric

```
POST /api/v1/metrics?client_id={workspace_id}
```

### LLM-as-Judge Metric

```json
{
  "metric": {
    "name": "Authentication Compliance",
    "metrics_type": "custom_evaluation",
    "result_type": "boolean",
    "llm_instructions": "You are an AI quality assurance analyst. Evaluate whether the agent properly verified the caller's identity before accessing account information...",
    "mark_as_global": false,
    "agent_ids": [210],
    "scenario_ids": [1, 2, 3],
    "evaluation_trigger": {
      "type": "always"
    },
    "structured_output": [
      {
        "condition": "Identity verified before account access",
        "set_metrics_name": "auth_status",
        "as_metrics_expected_value": "pass",
        "outcome_classification": "meets_expectations"
      },
      {
        "condition": "Account accessed without verification",
        "set_metrics_name": "auth_status",
        "as_metrics_expected_value": "fail",
        "outcome_classification": "requires_attention"
      }
    ]
  }
}
```
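
If you build these payloads from code, a couple of small helpers keep the structured_output entries consistent. A sketch (the helper names are ours, not part of the API; the field names mirror the example above):

```python
def structured_outcome(condition, name, expected, classification):
    """Build one structured_output entry using the field names from the example."""
    return {
        "condition": condition,
        "set_metrics_name": name,
        "as_metrics_expected_value": expected,
        "outcome_classification": classification,
    }

def llm_metric_payload(name, instructions, agent_ids, outcomes,
                       result_type="boolean", is_global=False):
    """Assemble a create-metric request body for an LLM-as-Judge metric."""
    return {
        "metric": {
            "name": name,
            "metrics_type": "custom_evaluation",
            "result_type": result_type,
            "llm_instructions": instructions,
            "mark_as_global": is_global,
            "agent_ids": agent_ids,
            "evaluation_trigger": {"type": "always"},
            "structured_output": outcomes,
        }
    }
```

Keeping both outcome entries under one `set_metrics_name` (as in the example) lets the pass and fail conditions write to the same metric field.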

### Code-as-Judge Metric

```json
{
  "metric": {
    "name": "Silence More Than 5 Seconds",
    "metrics_type": "code_as_judge",
    "result_type": "boolean",
    "code_snippet": "silence = context[\"silence\"]\nsilences = silence.get(\"silences\", [])\nthreshold_ms = 5000\nlong_silences = [s for s in silences if s.get(\"duration_ms\", 0) > threshold_ms]\n\nif len(long_silences) == 0:\n metric[\"result\"] = True\n metric[\"explanation\"] = \"No silence periods exceeded 5 seconds\"\nelse:\n metric[\"result\"] = False\n metric[\"explanation\"] = str(len(long_silences)) + \" silence(s) exceeded 5s\"",
    "mark_as_global": true,
    "agent_ids": [210]
  }
}
```
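
Because a code_snippet runs against a `context` dict and writes into `metric`, you can dry-run the same logic locally before uploading it. The function below mirrors the snippet above over a hand-built sample context (the context shape is taken from this example; other context fields are not shown here):

```python
def evaluate_silence(context, threshold_ms=5000):
    """Mirror of the code_snippet above: flag calls with any silence > threshold."""
    metric = {}
    silences = context["silence"].get("silences", [])
    long_silences = [s for s in silences if s.get("duration_ms", 0) > threshold_ms]
    if not long_silences:
        metric["result"] = True
        metric["explanation"] = "No silence periods exceeded 5 seconds"
    else:
        metric["result"] = False
        metric["explanation"] = f"{len(long_silences)} silence(s) exceeded 5s"
    return metric

# Hand-built sample: one short pause, one 7.4 s silence.
sample = {"silence": {"silences": [{"duration_ms": 1200}, {"duration_ms": 7400}]}}
print(evaluate_silence(sample))
```

Testing the logic this way catches KeyErrors and off-by-one threshold bugs before the snippet ever touches a real call.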

## Metric Types

| metrics_type | Description |
| --- | --- |
| custom_evaluation | LLM evaluates the transcript against your instructions |
| specific_text_/_phrase_check | Check if a specific phrase appears in the transcript |
| permitted_options_check | Check if the agent mentioned options from a defined list |
| code_as_judge | Your Python code evaluates the call data |

## Result Types

| result_type | Values | Use case |
| --- | --- | --- |
| boolean | true / false | Pass/fail checks |
| rating | 1-5 | Quality assessments |
| enum | Custom values | Categorical outcomes |
| numeric | Any number | Counts, scores, ratios |
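
If you record metric values client-side before submitting them, a small check against the declared result_type catches type mismatches early. A sketch; the rules are inferred from the table above:

```python
def valid_result(result_type, value):
    """Client-side sanity check of a value against its declared result_type."""
    if result_type == "boolean":
        return isinstance(value, bool)
    if result_type == "rating":
        # Integers 1-5 only; bool is excluded even though it subclasses int.
        return isinstance(value, int) and not isinstance(value, bool) and 1 <= value <= 5
    if result_type == "numeric":
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if result_type == "enum":
        return isinstance(value, str)
    return False
```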

## Get Metric

```
GET /api/v1/metrics/{id}?client_id={workspace_id}
```

Returns the metric together with its associated scenario_ids and agent_ids.

## Update Metric

```
PUT /api/v1/metrics/{id}?client_id={workspace_id}
```

Same body format as create. Note: metrics_type cannot be changed after creation (e.g., cannot switch from LLM-as-Judge to Code-as-Judge).

## Delete Metric

```
DELETE /api/v1/metrics/{id}?client_id={workspace_id}
```

Returns 204 No Content. Standard (system-defined) metrics cannot be deleted.

## Generate Metrics with AI

```
POST /api/v1/metrics/generate?client_id={workspace_id}
```

```json
{
  "agent_id": 210,
  "number_of_metrics": 8,
  "additional_context": "Focus on compliance and authentication"
}
```

Returns an array of generated metric definitions. The system checks existing metrics to avoid duplicates.

## Bulk Create Metrics

```
POST /api/v1/metrics/bulk_create?client_id={workspace_id}
```

```json
{
  "agent_id": 210,
  "metrics": [
    {
      "name": "Greeting Check",
      "metrics_type": "code_as_judge",
      "result_type": "boolean",
      "code_snippet": "..."
    },
    {
      "name": "Tone Assessment",
      "metrics_type": "custom_evaluation",
      "result_type": "rating",
      "llm_instructions": "..."
    }
  ]
}
```

Duplicate metric names are automatically skipped.
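
The server skips duplicates for you, but filtering client-side first makes the skipped set visible before the request goes out. A sketch, assuming name comparison ignores case and surrounding whitespace (the docs only say duplicate names are skipped):

```python
def dedupe_new_metrics(existing_names, candidates):
    """Split candidate metrics into (kept, skipped) by name collision.

    Assumes case-insensitive, whitespace-trimmed name matching.
    """
    existing = {n.strip().lower() for n in existing_names}
    kept, skipped = [], []
    for m in candidates:
        bucket = skipped if m["name"].strip().lower() in existing else kept
        bucket.append(m)
    return kept, skipped
```

Logging the `skipped` list before calling bulk_create tells you exactly which definitions the server would have silently dropped.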

## Validate Code Metric

```
POST /api/v1/metrics/validate_code?client_id={workspace_id}
```

```json
{
  "code": "metric[\"result\"] = True\nmetric[\"explanation\"] = \"test\""
}
```

Response:

```json
{
  "valid": true,
  "errors": []
}
```

## Test Code Metric Against a Call

```
POST /api/v1/metrics/test_code?client_id={workspace_id}
```

```json
{
  "code": "silence = context[\"silence\"]\nmetric[\"result\"] = silence.get(\"count\", 0) < 10\nmetric[\"explanation\"] = \"Silence count: \" + str(silence.get(\"count\", 0))",
  "call_observation_id": 82
}
```

Response:

```json
{
  "result": true,
  "explanation": "Silence count: 4",
  "metric_result": "pass",
  "structured_output": {
    "set_metrics_name": "silence_compliance",
    "as_metrics_expected_value": "pass",
    "outcome_classification": "meets_expectations"
  },
  "execution_time_ms": 0.2,
  "error": null,
  "call": {
    "id": 82,
    "identifier": "obs-sample-001",
    "duration": 120
  }
}
```
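
When scripting against test_code, surface the `error` field before trusting `result`. A minimal sketch over the response shape shown above:

```python
def unpack_test_result(resp):
    """Raise if the metric code failed; otherwise return (result, explanation)."""
    if resp.get("error"):
        raise RuntimeError(f"metric code failed: {resp['error']}")
    return resp["result"], resp["explanation"]
```

A non-null `error` means the snippet itself crashed, which is a different failure from the metric legitimately evaluating to false.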

## Toggle Standard Metrics

Enable or disable standard audio metrics per agent:

```
PUT /api/v1/agents/{agent_id}/standard_metrics?client_id={workspace_id}
```

```json
{
  "disabled_metric_keys": ["voice_change_detection", "transcription_accuracy"]
}
```

## Toggle Custom Metrics

Enable or disable custom LLM/code metrics per agent:

```
PUT /api/v1/agents/{agent_id}/llm_metrics?client_id={workspace_id}
```

```json
{
  "disabled_llm_metric_ids": [261, 265]
}
```
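
Both toggle endpoints take the full disabled list, so flipping a single metric is a read-modify-write: fetch the current list, toggle the id, send the whole list back. A sketch of the toggle step (assuming the PUT replaces the entire list, as the examples suggest):

```python
def toggle_metric(disabled_ids, metric_id):
    """Flip one metric's enabled state; return the new disabled-id list."""
    ids = set(disabled_ids)
    # symmetric difference: removes the id if present, adds it if absent
    ids.symmetric_difference_update({metric_id})
    return sorted(ids)
```

Sending the modified list unchanged except for the toggled id avoids accidentally re-enabling metrics someone else disabled.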