# Metrics API

Manage your evaluation metrics programmatically. Supports standard (audio), LLM-as-Judge, and Code-as-Judge metrics.

## List Metrics

```
GET /api/v1/metrics?client_id={workspace_id}&page=1
```

Optional filters:

| Parameter | Description |
| --- | --- |
| agent_id | Filter by agent |
| search | Search by metric name |
| page | Page number |

Response:

```json
{
  "standard_metrics": [
    {
      "id": 212,
      "name": "Voice Tone & Clarity",
      "standard_metric_key": "voice_tone_clarity",
      "result_type": "rating",
      "metrics_type": "standard",
      "is_custom": false
    }
  ],
  "data": [
    {
      "id": 261,
      "name": "Silence More Than 5 Seconds",
      "metrics_type": "code_as_judge",
      "result_type": "boolean",
      "is_custom": true,
      "scope": "global"
    }
  ],
  "global_metrics": [],
  "pagination": {
    "current_page": 1,
    "total_pages": 1,
    "total_count": 6
  }
}
```
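
The optional filters can be assembled client-side before calling List Metrics. A minimal sketch in Python, assuming a placeholder host (`api.example.com` stands in for your deployment's base URL):

```python
from urllib.parse import urlencode

BASE_URL = "https://api.example.com"  # placeholder host, substitute your own

def list_metrics_url(workspace_id, agent_id=None, search=None, page=1):
    """Build a List Metrics URL, including only the filters that are set."""
    params = {"client_id": workspace_id, "page": page}
    if agent_id is not None:
        params["agent_id"] = agent_id
    if search:
        params["search"] = search
    return f"{BASE_URL}/api/v1/metrics?{urlencode(params)}"

print(list_metrics_url(42, search="silence"))
```

Omitting unset filters keeps the query string clean rather than sending empty parameters the server has to ignore.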

## Create Metric

```
POST /api/v1/metrics?client_id={workspace_id}
```

### LLM-as-Judge Metric

```json
{
  "metric": {
    "name": "Authentication Compliance",
    "metrics_type": "custom_evaluation",
    "result_type": "boolean",
    "llm_instructions": "You are an AI quality assurance analyst. Evaluate whether the agent properly verified the caller's identity before accessing account information...",
    "mark_as_global": false,
    "agent_ids": [210],
    "scenario_ids": [1, 2, 3],
    "evaluation_trigger": {
      "type": "always"
    },
    "structured_output": [
      {
        "condition": "Identity verified before account access",
        "set_metrics_name": "auth_status",
        "as_metrics_expected_value": "pass",
        "outcome_classification": "meets_expectations"
      },
      {
        "condition": "Account accessed without verification",
        "set_metrics_name": "auth_status",
        "as_metrics_expected_value": "fail",
        "outcome_classification": "requires_attention"
      }
    ]
  }
}
```
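
If you build these payloads from code, a couple of small helpers keep the structured_output entries consistent. A sketch (the helper names are ours, not part of the API; the field names mirror the example above):

```python
def structured_outcome(condition, name, expected, classification):
    """Build one structured_output entry using the field names from the example."""
    return {
        "condition": condition,
        "set_metrics_name": name,
        "as_metrics_expected_value": expected,
        "outcome_classification": classification,
    }

def llm_metric_payload(name, instructions, agent_ids, outcomes,
                       result_type="boolean", is_global=False):
    """Assemble a create-metric request body for an LLM-as-Judge metric."""
    return {
        "metric": {
            "name": name,
            "metrics_type": "custom_evaluation",
            "result_type": result_type,
            "llm_instructions": instructions,
            "mark_as_global": is_global,
            "agent_ids": agent_ids,
            "evaluation_trigger": {"type": "always"},
            "structured_output": outcomes,
        }
    }
```

Keeping both outcome entries under one `set_metrics_name` (as in the example) lets the pass and fail conditions write to the same metric field.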

### Code-as-Judge Metric

```json
{
  "metric": {
    "name": "Silence More Than 5 Seconds",
    "metrics_type": "code_as_judge",
    "result_type": "boolean",
    "code_snippet": "silence = context[\"silence\"]\nsilences = silence.get(\"silences\", [])\nthreshold_ms = 5000\nlong_silences = [s for s in silences if s.get(\"duration_ms\", 0) > threshold_ms]\n\nif len(long_silences) == 0:\n metric[\"result\"] = True\n metric[\"explanation\"] = \"No silence periods exceeded 5 seconds\"\nelse:\n metric[\"result\"] = False\n metric[\"explanation\"] = str(len(long_silences)) + \" silence(s) exceeded 5s\"",
    "mark_as_global": true,
    "agent_ids": [210]
  }
}
```
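
Because a code_snippet runs against a `context` dict and writes into `metric`, you can dry-run the same logic locally before uploading it. The function below mirrors the snippet above over a hand-built sample context (the context shape is taken from this example; other context fields are not shown here):

```python
def evaluate_silence(context, threshold_ms=5000):
    """Mirror of the code_snippet above: flag calls with any silence > threshold."""
    metric = {}
    silences = context["silence"].get("silences", [])
    long_silences = [s for s in silences if s.get("duration_ms", 0) > threshold_ms]
    if not long_silences:
        metric["result"] = True
        metric["explanation"] = "No silence periods exceeded 5 seconds"
    else:
        metric["result"] = False
        metric["explanation"] = f"{len(long_silences)} silence(s) exceeded 5s"
    return metric

# Hand-built sample: one short pause, one 7.4 s silence.
sample = {"silence": {"silences": [{"duration_ms": 1200}, {"duration_ms": 7400}]}}
print(evaluate_silence(sample))
```

Testing the logic this way catches KeyErrors and off-by-one threshold bugs before the snippet ever touches a real call.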

## Metric Types

| metrics_type | Description |
| --- | --- |
| custom_evaluation | LLM evaluates the transcript against your instructions |
| specific_text_/_phrase_check | Check if a specific phrase appears in the transcript |
| permitted_options_check | Check if the agent mentioned options from a defined list |
| code_as_judge | Your Python code evaluates the call data |

## Result Types

| result_type | Values | Use case |
| --- | --- | --- |
| boolean | true / false | Pass/fail checks |
| rating | 1-5 | Quality assessments |
| enum | Custom values | Categorical outcomes |
| numeric | Any number | Counts, scores, ratios |
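
If you record metric values client-side before submitting them, a small check against the declared result_type catches type mismatches early. A sketch; the rules are inferred from the table above:

```python
def valid_result(result_type, value):
    """Client-side sanity check of a value against its declared result_type."""
    if result_type == "boolean":
        return isinstance(value, bool)
    if result_type == "rating":
        # Integers 1-5 only; bool is excluded even though it subclasses int.
        return isinstance(value, int) and not isinstance(value, bool) and 1 <= value <= 5
    if result_type == "numeric":
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if result_type == "enum":
        return isinstance(value, str)
    return False
```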

## Get Metric

```
GET /api/v1/metrics/{id}?client_id={workspace_id}
```

Returns the metric together with its associated scenario_ids and agent_ids.

## Update Metric

```
PUT /api/v1/metrics/{id}?client_id={workspace_id}
```

Same body format as create. Note: metrics_type cannot be changed after creation (e.g., cannot switch from LLM-as-Judge to Code-as-Judge).

## Delete Metric

```
DELETE /api/v1/metrics/{id}?client_id={workspace_id}
```

Returns 204 No Content. Standard (system-defined) metrics cannot be deleted.

## Generate Metrics with AI

```
POST /api/v1/metrics/generate?client_id={workspace_id}
```

```json
{
  "agent_id": 210,
  "number_of_metrics": 8,
  "additional_context": "Focus on compliance and authentication"
}
```

Returns an array of generated metric definitions. The system checks existing metrics to avoid duplicates.

## Bulk Create Metrics

```
POST /api/v1/metrics/bulk_create?client_id={workspace_id}
```

```json
{
  "agent_id": 210,
  "metrics": [
    {
      "name": "Greeting Check",
      "metrics_type": "code_as_judge",
      "result_type": "boolean",
      "code_snippet": "..."
    },
    {
      "name": "Tone Assessment",
      "metrics_type": "custom_evaluation",
      "result_type": "rating",
      "llm_instructions": "..."
    }
  ]
}
```

Duplicate metric names are automatically skipped.
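
The server skips duplicates for you, but filtering client-side first makes the skipped set visible before the request goes out. A sketch, assuming name comparison ignores case and surrounding whitespace (the docs only say duplicate names are skipped):

```python
def dedupe_new_metrics(existing_names, candidates):
    """Split candidate metrics into (kept, skipped) by name collision.

    Assumes case-insensitive, whitespace-trimmed name matching.
    """
    existing = {n.strip().lower() for n in existing_names}
    kept, skipped = [], []
    for m in candidates:
        bucket = skipped if m["name"].strip().lower() in existing else kept
        bucket.append(m)
    return kept, skipped
```

Logging the `skipped` list before calling bulk_create tells you exactly which definitions the server would have silently dropped.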

## Validate Code Metric

```
POST /api/v1/metrics/validate_code?client_id={workspace_id}
```

```json
{
  "code": "metric[\"result\"] = True\nmetric[\"explanation\"] = \"test\""
}
```

Response:

```json
{
  "valid": true,
  "errors": []
}
```

## Test Code Metric Against a Call

```
POST /api/v1/metrics/test_code?client_id={workspace_id}
```

```json
{
  "code": "silence = context[\"silence\"]\nmetric[\"result\"] = silence.get(\"count\", 0) < 10\nmetric[\"explanation\"] = \"Silence count: \" + str(silence.get(\"count\", 0))",
  "call_observation_id": 82
}
```

Response:

```json
{
  "result": true,
  "explanation": "Silence count: 4",
  "metric_result": "pass",
  "structured_output": {
    "set_metrics_name": "silence_compliance",
    "as_metrics_expected_value": "pass",
    "outcome_classification": "meets_expectations"
  },
  "execution_time_ms": 0.2,
  "error": null,
  "call": {
    "id": 82,
    "identifier": "obs-sample-001",
    "duration": 120
  }
}
```
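
When scripting against test_code, surface the `error` field before trusting `result`. A minimal sketch over the response shape shown above:

```python
def unpack_test_result(resp):
    """Raise if the metric code failed; otherwise return (result, explanation)."""
    if resp.get("error"):
        raise RuntimeError(f"metric code failed: {resp['error']}")
    return resp["result"], resp["explanation"]
```

A non-null `error` means the snippet itself crashed, which is a different failure from the metric legitimately evaluating to false.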

## Toggle Standard Metrics

Enable or disable standard audio metrics per agent:

```
PUT /api/v1/agents/{agent_id}/standard_metrics?client_id={workspace_id}
```

```json
{
  "disabled_metric_keys": ["voice_change_detection", "transcription_accuracy"]
}
```

## Toggle Custom Metrics

Enable or disable custom LLM/code metrics per agent:

```
PUT /api/v1/agents/{agent_id}/llm_metrics?client_id={workspace_id}
```

```json
{
  "disabled_llm_metric_ids": [261, 265]
}
```
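
Both toggle endpoints take the full disabled list, so flipping a single metric is a read-modify-write: fetch the current list, toggle the id, send the whole list back. A sketch of the toggle step (assuming the PUT replaces the entire list, as the examples suggest):

```python
def toggle_metric(disabled_ids, metric_id):
    """Flip one metric's enabled state; return the new disabled-id list."""
    ids = set(disabled_ids)
    # symmetric difference: removes the id if present, adds it if absent
    ids.symmetric_difference_update({metric_id})
    return sorted(ids)
```

Sending the modified list unchanged except for the toggled id avoids accidentally re-enabling metrics someone else disabled.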