
Equistamp 0.0.1

3rd Party AI Evaluation Service: Setting & Protecting the Global Standard of AI Safety


Endpoints


GET /auth

Description

Get the current user.

Use the fields parameter if you only want specific fields. This can also be used to get a long-lived API token, e.g.:

import requests

res = requests.put(
    'https://equistamp.net/auth',
    json={'email': '<your email address>', 'password': '<your password>'}
)
if res.status_code == 403:
    raise ValueError(f'Invalid email or password: {res.json()}')

session_token = res.json()['session_token']

res = requests.get(
    'https://equistamp.net/auth',
    headers={'Session-Token': session_token},
    params={'fields': 'api_token'}
)

if res.status_code != 200:
    raise ValueError(res.json())

api_token = res.json()['api_token']

Input parameters

Parameter | In | Type | Nullable | Description
fields | query | string | No | Specific fields to be returned in the response, separated by commas - if this is used, only the specified fields will be returned

Responses

{
    "id": "f801655d-5f3c-492c-b815-86105e52d772",
    "email_address": "mr.blobby@some.domain",
    "user_name": "mr_blobby",
    "full_name": "Mr Blobby, esq.",
    "user_image": "https://equistamp.com/avatars/123123123123.png",
    "bio": "Hello, my name is Inigo Montoya. You Killed my Father. Prepare to die",
    "display_options": {
        "bio": true,
        "email_address": true,
        "user_image": false
    },
    "join_date": "2022-04-13",
    "subscription_level": "pro",
    "alerts": [
        {
            "id": "acfd47de-772f-4fc2-bced-77e9aae9e369",
            "name": "They are coming!!",
            "description": "string",
            "public": true,
            "last_trigger_date": "2022-04-13T15:42:05.901Z",
            "trigger_cooldown": "string",
            "owner_id": "959b0298-bf7c-4912-9037-f86a4107448a",
            "triggers": [
                "8297647c-b499-4cb3-bf85-94dcbf150d12"
            ],
            "subscriptions": [
                "76f90940-48c7-4610-a900-f400cc7167eb"
            ]
        }
    ]
}
⚠️ This example has been generated automatically from the schema and it is not accurate. Refer to the schema for more information.

Schema of the response body
{
    "type": "object",
    "properties": {
        "id": {
            "type": "string",
            "format": "uuid"
        },
        "email_address": {
            "type": "string",
            "description": "The email address of this user. User for logging in, so must be unique.",
            "format": "email",
            "example": "mr.blobby@some.domain"
        },
        "user_name": {
            "type": "string",
            "description": "The user name. Used for logging in and as a unique, human readable identifier of this user",
            "example": "mr_blobby"
        },
        "full_name": {
            "type": "string",
            "description": "The presentable name of this user. This can be any string",
            "nullable": true,
            "example": "Mr Blobby, esq."
        },
        "user_image": {
            "type": "string",
            "description": "The user avatar, as bytes when uploading, and its URL when fetching",
            "nullable": true,
            "example": "https://equistamp.com/avatars/123123123123.png"
        },
        "bio": {
            "type": "string",
            "description": "A description of this user. Will be rendered as markdown on the website",
            "nullable": true,
            "example": "Hello, my name is Inigo Montoya. You Killed my Father. Prepare to die"
        },
        "display_options": {
            "description": "A mapping of <displayable field> to true/false, which controls what will be displayed to other users. No option which is not explicitly enabled will be shown to anyone else than you or system admins. To illustrate, the attached example will only allow the user's bio and email address to be returned when other users call this endpoint, and all other fields will not be returned.",
            "type": "object",
            "additonalProperties": "boolean",
            "example": {
                "bio": true,
                "email_address": true,
                "user_image": false
            }
        },
        "join_date": {
            "type": "string",
            "format": "date"
        },
        "subscription_level": {
            "type": "string",
            "description": "The current subscription level of this user",
            "enum": [
                "admin",
                "free",
                "enterprise",
                "pro"
            ],
            "example": "pro"
        },
        "alerts": {
            "type": "array",
            "items": {
                "$ref": "#/components/schemas/ShallowAlert"
            }
        }
    }
}

Refer to the common response description: Unauthorized.

Refer to the common response description: Unauthenticated.

Refer to the common response description: NotFound.

Refer to the common response description: Error.


PUT /auth

Log in the provided user, or send an email with a login link.

Description

This endpoint handles logging in, both when valid credentials are provided, and when the user needs to reset their password. This happens depending on the provided JSON body:

  1. If login credentials are provided, then try to log the user in - if this fails, a 401 will be returned
  2. If reset_email is provided, assume that the user has forgotten their password. If this email can be found in the system, then send them an email with a log in link. Either way, this will always return a 200, to avoid leaking email addresses.

Log in credentials are a user identifier and a password. The following are supported:

  • username - this is the user name of the user (not the display name)
  • email - the email of the user
  • login - this will accept either the email or username

The result of logging in is a JSON object with a Session-Token. This should be provided as the Session-Token header on subsequent calls to the API to authenticate the user. The token will expire after a week of inactivity, but otherwise will be refreshed while using the system.
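The flow above can be sketched in Python; the helper names here are illustrative, not part of the API:

```python
import time

import requests


def login(identifier, password):
    """Log in via PUT /auth. The `login` field accepts either an email
    address or a username, per the description above."""
    res = requests.put('https://equistamp.net/auth',
                       json={'login': identifier, 'password': password})
    if res.status_code == 401:
        raise ValueError(f'Invalid credentials: {res.text}')
    data = res.json()
    return data['session_token'], data['token_expiration']


def token_expired(token_expiration, now=None):
    """`token_expiration` is a POSIX timestamp (see the response schema)."""
    return (time.time() if now is None else now) >= token_expiration
```

Subsequent calls then pass the token as the `Session-Token` header, as in the GET /auth example above. A forgotten password goes through the same endpoint: a `PUT /auth` with a `reset_email` body always returns a 200, whether or not the address exists.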

Request body

{
    "username": "mr_blobby",
    "email": "mr_blobby@bla.com",
    "login": "mr_blobby@bla.com",
    "password": "hunter2",
    "reset_email": "bla@bla.com"
}
⚠️ This example has been generated automatically from the schema and it is not accurate. Refer to the schema for more information.

Schema of the request body
{
    "type": "object",
    "properties": {
        "username": {
            "type": "string",
            "example": "mr_blobby"
        },
        "email": {
            "type": "string",
            "example": "mr_blobby@bla.com"
        },
        "login": {
            "type": "string",
            "example": "mr_blobby@bla.com"
        },
        "password": {
            "type": "string",
            "format": "password",
            "example": "hunter2"
        },
        "reset_email": {
            "type": "string",
            "format": "email",
            "example": "bla@bla.com",
            "description": "Used when resetting a password. A login link will be sent to this email, but only if can be found in the system. When missing, this will fail silently, i.e. a 200 will be returned"
        }
    }
}

Responses

Schema of the response body
{
    "oneOf": [
        {
            "type": "object",
            "description": "Returned when the user successfully logs in",
            "properties": {
                "session_token": {
                    "type": "string",
                    "format": "uuid",
                    "description": "The session token of the logged in user. This should be sent as the \"Session-Token\" header on all subsequent calls. "
                },
                "token_expiration": {
                    "type": "number",
                    "format": "int32",
                    "description": "The POSIX timestamp when this token will expire. Generally in a weeks time."
                }
            }
        },
        {
            "type": "string",
            "description": "This is returned in the case of a password reset."
        }
    ]
}

Refer to the common response description: Unauthorized.

Refer to the common response description: Error.


POST /alert

Create a new alert.

Description

This will create a new alert.
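For illustration, a minimal creation call might look like the following sketch (the `Api-Token` header matches the /dsltest example below; the helper names and field values are ours):

```python
import requests


def alert_payload(name, public=False, description=None,
                  trigger_cooldown=None, triggers=(), subscriptions=()):
    """Build a POST /alert body, omitting optional fields left unset."""
    body = {'name': name, 'public': public,
            'triggers': list(triggers), 'subscriptions': list(subscriptions)}
    if description is not None:
        body['description'] = description
    if trigger_cooldown is not None:
        body['trigger_cooldown'] = trigger_cooldown
    return body


def create_alert(api_token, **fields):
    res = requests.post('https://equistamp.net/alert',
                        headers={'Api-Token': api_token},
                        json=alert_payload(**fields))
    res.raise_for_status()
    return res.json()  # the full alert, including its new `id`
```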

Request body

{
    "name": "They are coming!!",
    "description": "string",
    "public": true,
    "trigger_cooldown": "string",
    "triggers": [
        "eeb0632b-1935-4498-b1c1-bc3e0664e234"
    ],
    "subscriptions": [
        "efa1e33e-28ba-4ea5-a8cf-824625443d3e"
    ]
}
⚠️ This example has been generated automatically from the schema and it is not accurate. Refer to the schema for more information.

Schema of the request body
{
    "type": "object",
    "properties": {
        "name": {
            "type": "string",
            "description": "The name of the alert, displayed in the list of alerts",
            "example": "They are coming!!"
        },
        "description": {
            "type": "string",
            "nullable": true
        },
        "public": {
            "type": "boolean"
        },
        "trigger_cooldown": {
            "type": "string",
            "description": "How often the trigger can fire",
            "nullable": true
        },
        "triggers": {
            "type": "array",
            "items": {
                "type": "string",
                "format": "uuid"
            }
        },
        "subscriptions": {
            "type": "array",
            "items": {
                "type": "string",
                "format": "uuid"
            }
        }
    }
}

Responses

{
    "id": "b6b01bfa-3e24-4610-ba4f-6b286a05d0b2",
    "name": "They are coming!!",
    "description": "string",
    "public": true,
    "last_trigger_date": "2022-04-13T15:42:05.901Z",
    "trigger_cooldown": "string",
    "owner_id": "922dd638-11ac-4a3d-8191-7e183aa239da",
    "triggers": [
        {
            "id": "3cdfd9dd-8ee4-4cd7-b745-85bef97634e6",
            "type": "string",
            "invert": true,
            "metric": "string",
            "threshold": 10.12,
            "models": null,
            "evaluations": null,
            "alert_id": "1b9aeafd-2cfc-477e-8935-7b2e379d261d"
        }
    ],
    "subscriptions": [
        {
            "confirmed": true,
            "method": "string",
            "destination": "string"
        }
    ]
}
⚠️ This example has been generated automatically from the schema and it is not accurate. Refer to the schema for more information.

Schema of the response body
{
    "type": "object",
    "properties": {
        "id": {
            "type": "string",
            "format": "uuid"
        },
        "name": {
            "type": "string",
            "description": "The name of the alert, displayed in the list of alerts",
            "example": "They are coming!!"
        },
        "description": {
            "type": "string",
            "nullable": true
        },
        "public": {
            "type": "boolean"
        },
        "last_trigger_date": {
            "type": "string",
            "format": "date-time",
            "nullable": true
        },
        "trigger_cooldown": {
            "type": "string",
            "description": "How often the trigger can fire",
            "nullable": true
        },
        "owner_id": {
            "type": "string",
            "format": "uuid"
        },
        "triggers": {
            "type": "array",
            "items": {
                "$ref": "#/components/schemas/ShallowTrigger"
            }
        },
        "subscriptions": {
            "type": "array",
            "items": {
                "$ref": "#/components/schemas/ShallowSubscriberAlert"
            }
        }
    }
}

Refer to the common response description: Unauthorized.

Refer to the common response description: Unauthenticated.

Refer to the common response description: Error.


GET /alert

Input parameters

Parameter | In | Type | Nullable | Description
endCreationDate | query | string | Yes | Filter out all alerts that were created after this date
endPredictedTriggerDate | query | string | Yes | Filter out all alerts that are expected to trigger after this date
evaluations | query | array | Yes | A list of evaluation ids. Only alerts pertaining to these evaluations will be returned
id | query | string | Yes | Will return the item with this id, or die trying. When this parameter is provided, only a single item will be returned
maxThreshold | query | number | Yes | Filter out all alerts that have a higher threshold than provided
minThreshold | query | number | Yes | Filter out all alerts that have a lower threshold than provided
models | query | array | Yes | A list of model ids. Only alerts pertaining to these models will be returned
order_by | query | string | Yes | Sort the returned results in ascending order
owner_id | query | string | Yes | Return all alerts belonging to the given owner. If `me` is provided, all alerts of the caller will be returned
startCreationDate | query | string | Yes | Filter out all alerts that were created before this date
startPredictedTriggerDate | query | string | Yes | Filter out all alerts that are expected to trigger before this date
subscriber_id | query | string | Yes | Return all alerts subscribed to by the given owner. If `me` is provided, subscribed alerts of the caller will be returned. The caller must be allowed to filter by subscriber_id - it's not something everyone can do
triggerCooldown | query | string | Yes | Filter by how often the alert can be triggered

Responses

Schema of the response body
{
    "oneOf": [
        {
            "$ref": "#/components/schemas/Alert"
        },
        {
            "type": "object",
            "properties": {
                "items": {
                    "description": "An array of all the items that were found, but capped at most at `per_page`",
                    "type": "array",
                    "items": {
                        "$ref": "#/components/schemas/Alert"
                    }
                },
                "count": {
                    "description": "The total number of items found",
                    "type": "number",
                    "format": "int32"
                },
                "per_page": {
                    "description": "The number of items returned per page",
                    "type": "number",
                    "format": "int32"
                },
                "page": {
                    "description": "The number of available pages",
                    "type": "number",
                    "format": "int32"
                }
            }
        }
    ]
}
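Note the `oneOf`: passing `id` yields a bare Alert, while filter queries yield the paginated envelope. A small sketch for handling both shapes (the helper names are ours):

```python
import math


def unwrap_alerts(response_json):
    """Normalize a GET /alert response to a list of alerts, whether it is
    a single Alert (an `id` lookup) or a paginated envelope."""
    if 'items' in response_json:
        return response_json['items']
    return [response_json]


def total_pages(count, per_page):
    """Pages implied by the envelope's `count` and `per_page` fields."""
    return math.ceil(count / per_page) if per_page else 0
```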

Refer to the common response description: NotFound.

Refer to the common response description: Error.


PUT /alert

Request body

{
    "name": "They are coming!!",
    "description": "string",
    "public": true,
    "trigger_cooldown": "string",
    "triggers": [
        "022153ca-3866-4196-8e57-88bac2e73275"
    ],
    "subscriptions": [
        "04807a58-388a-43d5-af71-826e23ffee52"
    ]
}
⚠️ This example has been generated automatically from the schema and it is not accurate. Refer to the schema for more information.

Schema of the request body
{
    "type": "object",
    "properties": {
        "name": {
            "type": "string",
            "description": "The name of the alert, displayed in the list of alerts",
            "example": "They are coming!!"
        },
        "description": {
            "type": "string",
            "nullable": true
        },
        "public": {
            "type": "boolean"
        },
        "trigger_cooldown": {
            "type": "string",
            "description": "How often the trigger can fire",
            "nullable": true
        },
        "triggers": {
            "type": "array",
            "items": {
                "type": "string",
                "format": "uuid"
            }
        },
        "subscriptions": {
            "type": "array",
            "items": {
                "type": "string",
                "format": "uuid"
            }
        }
    }
}

Responses

"Alert updated"
⚠️ This example has been generated automatically from the schema and it is not accurate. Refer to the schema for more information.

Schema of the response body
{
    "type": "string",
    "enum": [
        "Alert updated"
    ]
}

Refer to the common response description: Unauthorized.

Refer to the common response description: Unauthenticated.

Refer to the common response description: NotFound.

Refer to the common response description: Error.


POST /dsltest

Check whether DSL code fragments are correct.

Description

This endpoint will execute a provided DSL fragment and return the result. It will be run with test data, but you can use it to call your models or whatever. Queries that take too long will be terminated.

DSL Phases

There are four places where the DSL is used:

  • Constructing prompts
  • Sending requests to models
  • Parsing the responses that models return
  • Grading the parsed responses

These four steps happen sequentially for each task. This endpoint only checks one phase, which you must specify. That being said, there's nothing stopping you from chaining all four, e.g.:

import requests

API_KEY = "<your api key goes here>"

def run_code(code, stage, overrides=None):
    headers = {'Api-Token': API_KEY}
    res = requests.post(
        'https://equistamp.net/dsltest',
        headers=headers,
        json={'code': code, 'stage': stage, 'context': overrides or {}},
    )
    if res.status_code != 200:
        raise ValueError(f'bad request: {res.text}')
    return res.json()['result']

prompt = run_code('(str "Do something with this task: " task)', 'prompt')
response = run_code('(POST "https://your.model/endpoint" {:json {"prompt" prompt}})', 'request', {"prompt": prompt})
parsed_response = run_code('(get-in response ["path" "to" "response"])', 'response', {"response": response})
grader_result = run_code('parsed-response', 'grader', {"response": response, "parsed-response": parsed_response})

print(grader_result)

Context

When starting a request, a context is created with useful constants:

Base constants

  • task - the text of the task to be completed
  • endpoint_type - the type of endpoint - possible values are: aws, together.ai, conversational, google_cloud, azure, text-generation, anthropic, fill-mask, zero-shot-classification, custom, open_ai, text2text-generation, mistral
  • cache - An atom containing a cache that can be used to store data between requests. Acts as a map, so items can be accessed via (get @cache <key>) and set via (swap! cache assoc <key> <val>).

Task specific context

Multiple choice tasks

In the case of multiple choice tasks, the following are also available:

  • num_choices - the number of available choices
  • letter-choices - the letters corresponding to the available choices
  • correct - the letters of all correct answers - only available to the Grader

Boolean tasks

Boolean tasks (i.e. true/false) will add the following to the grader's context:

  • correct - whether the current task is true or false

Free response tasks

Free response tasks are tasks that expect arbitrary text. These kinds of tasks don't really have "correct" answers that can be saved, so much as phrases that are similar to what is expected. For example, "What is a group of whales called?" could be answered with "A pod", "Pod", "it's a pod" or other such combinations, all of which are correct. You could also accept "a family", which is sort of correct, in that some species are very matrilineal, but others form more casual pods. There is also "school", which in general applies to fish, but is sometimes also used for whales. On the other hand, "a gander" or "a murder" are flat out incorrect, as those apply to birds.

To help manage this, we support positive-examples, a list of strings that are close to the kind of response you're expecting, and negative-examples, a list of strings that are opposite in meaning to what you expect.

The default grader uses cosine similarity to check responses. It compares the model's response against all positive and negative examples, with similarities normalized to [0, 1]. For negative examples the complement of the similarity is used, since the idea is that they are opposite in meaning (as opposed to just maximally dissimilar). The maximum of these values is then returned as the correctness score for that task.
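A rough Python sketch of this scoring scheme (the shift from [-1, 1] to [0, 1] is our assumed normalization, and the toy `embedder` stands in for the one provided in the grader's context):

```python
import math


def cosine(a, b):
    """Plain cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


def grade(embedder, response, positive_examples, negative_examples):
    """Max over positive similarities and complements of negative ones."""
    def sim(text):
        # Assumed normalization of cosine similarity to [0, 1]
        return (cosine(embedder(response), embedder(text)) + 1) / 2
    scores = [sim(p) for p in positive_examples]
    scores += [1 - sim(n) for n in negative_examples]
    return max(scores)
```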

The following will be added to the grader's context:

  • positive-examples - a list of strings that should be similar to the model's response
  • negative-examples - a list of strings that should be opposite to the model's response
  • embedder - a one argument function that receives a string and returns an embedding vector

JSON tasks

JSON tasks expect the model to answer with correct JSON according to a schema. The schema will be added to the context.

  • schema - the expected schema of the resulting JSON object

Stage context

Each subsequent stage (request, response, grader) has the values produced by the previous stages added to its context:

Request
  • prompt - the prompt to be sent to the model
Response
  • response - the result of the Request DSL call
Grader
  • parsed-response - the result of the Response call

Request body

{
    "code": "(get-in response [:json \"value\"])",
    "stage": "response"
}
⚠️ This example has been generated automatically from the schema and it is not accurate. Refer to the schema for more information.

Schema of the request body
{
    "type": "object",
    "properties": {
        "code": {
            "description": "The DSL code to be evaluated",
            "type": "string",
            "example": "(get-in response [:json \"value\"])"
        },
        "stage": {
            "description": "The kind of DSL code to be tested",
            "example": "response",
            "type": "string",
            "enum": [
                "system_prompt",
                "prompt",
                "request",
                "response",
                "grader"
            ]
        },
        "context": {
            "description": "Additional items to be added to the execution context",
            "type": "object",
            "properties": {
                "task-type": {
                    "description": "The type of task to be used. Must be one of \"FRQ\", \"MCQ\", \"bool\", \"json\"",
                    "example": "MCQ"
                },
                "response": {
                    "description": "The response used when testing 'response' DSL code. If not provided, a dummy value will be used",
                    "example": {
                        "json": {
                            "value": "bla bla"
                        }
                    }
                },
                "parsed-response": {
                    "description": "The parsed_response used when testing 'grader' DSL code. If not provided, a dummy value will be used",
                    "example": "bla bla"
                }
            },
            "additionalProperties": true
        }
    }
}

Responses

{
    "result": null
}
⚠️ This example has been generated automatically from the schema and it is not accurate. Refer to the schema for more information.

Schema of the response body
{
    "type": "object",
    "properties": {
        "result": {
            "description": "This will be whatever the code returned"
        }
    }
}

Refer to the common response description: Unauthorized.

Refer to the common response description: Unauthenticated.

Refer to the common response description: Error.


POST /evaluation

Create a new evaluation.

Description

Adding tasks to new evaluations

There are three ways to add tasks to evaluations:

  1. directly during creation by providing a CSV with tasks via the csv_url and columns_mapping parameters
  2. by sending a tasks CSV to the /evaluationbuilderhandler endpoint
  3. by uploading tasks directly via the /task endpoint

The first option is recommended, as it will automatically call the /evaluationbuilderhandler endpoint for you, once the evaluation is created.
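A sketch of the recommended CSV route (payload keys follow the request schema below; the helper names are ours):

```python
import requests


def evaluation_payload(name, csv_url, columns_mapping,
                       default_task_type='MCQ', public=False):
    """Build a POST /evaluation body using the CSV route."""
    return {'name': name, 'public': public, 'csv_url': csv_url,
            'default_task_type': default_task_type,
            'columns_mapping': columns_mapping}


def create_evaluation(api_token, **fields):
    res = requests.post('https://equistamp.net/evaluation',
                        headers={'Api-Token': api_token},
                        json=evaluation_payload(**fields))
    res.raise_for_status()
    return res.json()
```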

Request body

{
    "name": "My lovely evaluation",
    "public": true,
    "public_usable": false,
    "reports_visible": false,
    "description": "# This is an evaluation, see more at [this link](http://some.link)",
    "task_types": "MCQ",
    "modalities": "text",
    "min_questions_to_complete": 321,
    "tags": [
        "f7b6acf2-f8ea-45dc-a47f-9fcf8af5eb79"
    ],
    "csv_url": "https://example.com",
    "default_task_type": "MCQ",
    "columns_mapping": {
        "Question col": {
            "columnType": "question"
        },
        "Paraphrase of question": {
            "columnType": "paraphrase",
            "paraphraseOf": "Question col"
        }
    },
    "references": {
        "bla": {
            "schema": {
                "properties": {
                    "name": {
                        "type": "string"
                    }
                }
            },
            "name": "My wonderful schema",
            "description": "Some description here"
        },
        "other-name_with.interpunction123": {
            "schema": {
                "properties": {
                    "name": {
                        "type": "string"
                    }
                }
            }
        }
    },
    "prompt": "(str \"Please answer this question: \" task)",
    "grader": {
        "MCQ": "(= parsedResponse correct)",
        "default": "false"
    }
}
⚠️ This example has been generated automatically from the schema and it is not accurate. Refer to the schema for more information.

Schema of the request body
{
    "type": "object",
    "properties": {
        "name": {
            "type": "string",
            "example": "My lovely evaluation"
        },
        "public": {
            "type": "boolean",
            "description": "Whether this evaluation should be publicly visible. If true, anyone can view its details or evaluate models with it"
        },
        "public_usable": {
            "type": "boolean",
            "description": "Whether this evaluation can be ran by anyone. To avoid tasks being leaked, you might want to have the results shown, but have control over what it can be run on.",
            "example": false
        },
        "reports_visible": {
            "type": "boolean",
            "description": "Whether anyone can pay to see reports for this evaluation.",
            "example": false
        },
        "description": {
            "type": "string",
            "description": "The description of this evaluation, as displayed on the site. Markdown can be used for formatting",
            "nullable": true,
            "example": "# This is an evaluation, see more at [this link](http://some.link)"
        },
        "task_types": {
            "type": "array",
            "items": {
                "type": "string"
            },
            "description": "The types of tasks supported by this evaluation",
            "enum": [
                "FRQ",
                "bool",
                "json",
                "MCQ"
            ],
            "example": "MCQ"
        },
        "modalities": {
            "type": "array",
            "items": {
                "type": "string"
            },
            "description": "The available modalities of this evaluation",
            "enum": [
                "text"
            ],
            "example": "text"
        },
        "min_questions_to_complete": {
            "type": "integer",
            "format": "int64",
            "description": "The default number of tasks to run before an evaluation session is deemed finished.\nA given evaluation session may process more tasks, as starting a new evaluation session for an evaluation/model pair which is already running will just add more tasks to the current session, rather than starting a new one.",
            "nullable": true,
            "example": 321
        },
        "tags": {
            "type": "array",
            "items": {
                "type": "string",
                "format": "uuid"
            }
        },
        "csv_url": {
            "description": "The URL of a CSV file containing the tasks of the new evaluation",
            "example": "https://example.com",
            "type": "string"
        },
        "default_task_type": {
            "description": "The default type of tasks - can be overrode on a per row basis. Will use \"MCQ\" if not set",
            "example": "MCQ",
            "nullable": true,
            "type": "string",
            "enum": [
                "FRQ",
                "bool",
                "json",
                "MCQ"
            ]
        },
        "columns_mapping": {
            "description": "A mapping that specifies which CSV columns contain which types of data. See the [Evaluation Builder](#post-evaluationbuilderhandler) endpoint for details",
            "type": "object",
            "example": {
                "Question col": {
                    "columnType": "question"
                },
                "Paraphrase of question": {
                    "columnType": "paraphrase",
                    "paraphraseOf": "Question col"
                }
            },
            "additionalProperties": {
                "$ref": "#/components/schemas/ColumnMapping"
            }
        },
        "references": {
            "description": "A mapping of keys to schemas. The keys can contain ASCII alphanumeric characters, \"-\", \"_\" and \".\".",
            "type": "object",
            "additionalProperties": {
                "type": "object",
                "properties": {
                    "schema": {
                        "type": "object",
                        "description": "The JSON schema to be used"
                    },
                    "name": {
                        "type": "string",
                        "description": "An optional name for this schema - this will only be used for displaying, the actual matching is done by comparing the keys of the `references` object."
                    },
                    "description": {
                        "type": "string",
                        "description": "An optional description for this schema"
                    },
                    "type": {
                        "type": "string",
                        "enum": [
                            "json"
                        ],
                        "description": "The type of schema. If not provided, will be assumed to be JSON",
                        "example": "json"
                    }
                },
                "required": [
                    "schema"
                ]
            },
            "example": {
                "bla": {
                    "schema": {
                        "properties": {
                            "name": {
                                "type": "string"
                            }
                        }
                    },
                    "name": "My wonderful schema",
                    "description": "Some description here"
                },
                "other-name_with.interpunction123": {
                    "schema": {
                        "properties": {
                            "name": {
                                "type": "string"
                            }
                        }
                    }
                }
            }
        },
        "prompt": {
            "description": "DSL code defining how to create prompts. See the [DSL page](/docs/dsl/) for more info.",
            "example": "(str \"Please answer this question: \" task)"
        },
        "grader": {
            "description": "DSL code specifying how to grade LLM responses. This can be empty, in which case the default grader will be used. You can specify a grader that will be used for all types of tasks, or per task type graders. If you provide both a default grader and one for a specific task type, the specific one takes precedence.",
            "oneOf": [
                {
                    "type": "string",
                    "description": "DSL code that should be used for all response",
                    "example": "(= parsedResponse \"ok\")"
                },
                {
                    "type": "object",
                    "description": "Per task type DSL code. Use the \"default\" key to specify the code that should be used for tasks types that aren't specified - otherwise the system default grader will be used.",
                    "properties": {
                        "default": {
                            "type": "string",
                            "description": "The default grader to be used for task types that aren't specified.",
                            "example": "(if (= parsedResponse correct) 1 0)"
                        },
                        "FRQ": {
                            "type": "string",
                            "description": "The DSL code to be used to grade FRQ tasks. If this is empty, the default grader will be used"
                        },
                        "bool": {
                            "type": "string",
                            "description": "The DSL code to be used to grade bool tasks. If this is empty, the default grader will be used"
                        },
                        "json": {
                            "type": "string",
                            "description": "The DSL code to be used to grade json tasks. If this is empty, the default grader will be used"
                        },
                        "MCQ": {
                            "type": "string",
                            "description": "The DSL code to be used to grade MCQ tasks. If this is empty, the default grader will be used"
                        }
                    },
                    "example": {
                        "MCQ": "(= parsedResponse correct)",
                        "default": "false"
                    }
                }
            ],
            "example": {
                "MCQ": "(= parsedResponse correct)",
                "default": "false"
            }
        }
    }
}

Responses

{
    "id": "9f65948b-0839-4704-94c1-a74682d43594",
    "name": "My lovely evaluation",
    "public": true,
    "public_usable": false,
    "reports_visible": false,
    "quality": 0.89,
    "num_tasks": 2000,
    "description": "# This is an evaluation, see more at [this link](http://some.link)",
    "last_updated": "2022-04-13T15:42:05.901Z",
    "task_types": "MCQ",
    "modalities": "text",
    "min_questions_to_complete": 321,
    "owner": {
        "id": "2120c29a-ed02-4065-bbba-c5ada79d7c47",
        "email_address": "mr.blobby@some.domain",
        "user_name": "mr_blobby",
        "full_name": "Mr Blobby, esq.",
        "user_image": "https://equistamp.com/avatars/123123123123.png",
        "bio": "Hello, my name is Inigo Montoya. You Killed my Father. Prepare to die",
        "display_options": {
            "bio": true,
            "email_address": true,
            "user_image": false
        },
        "join_date": "2022-04-13",
        "subscription_level": "pro",
        "alerts": [
            "88103840-7fe3-41a2-b492-230df4dac99d"
        ]
    },
    "tags": [
        {
            "id": "53cbd07d-fa52-4dc8-bfd1-10c3588d2174",
            "name": "string"
        }
    ]
}
⚠️ This example has been generated automatically from the schema and may not be accurate. Refer to the schema for more information.

Schema of the response body
{
    "type": "object",
    "properties": {
        "id": {
            "type": "string",
            "format": "uuid"
        },
        "name": {
            "type": "string",
            "example": "My lovely evaluation"
        },
        "public": {
            "type": "boolean",
            "description": "Whether this evaluation should be publicly visible. If true, anyone can view its details or evaluate models with it"
        },
        "public_usable": {
            "type": "boolean",
            "description": "Whether this evaluation can be ran by anyone. To avoid tasks being leaked, you might want to have the results shown, but have control over what it can be run on.",
            "example": false
        },
        "reports_visible": {
            "type": "boolean",
            "description": "Whether anyone can pay to see reports for this evaluation.",
            "example": false
        },
        "quality": {
            "type": "number",
            "format": "double",
            "description": "The quality of this evaluation, i.e. how much it can be trusted, from 0 to 1.",
            "example": 0.89
        },
        "num_tasks": {
            "type": "integer",
            "format": "int64",
            "description": "The total number of tasks defined for this evaluation. Includes redacted tasks.",
            "example": 2000
        },
        "description": {
            "type": "string",
            "description": "The description of this evaluation, as displayed on the site. Markdown can be used for formatting",
            "nullable": true,
            "example": "# This is an evaluation, see more at [this link](http://some.link)"
        },
        "last_updated": {
            "type": "string",
            "format": "date-time"
        },
        "task_types": {
            "type": "array",
            "items": {
                "type": "string"
            },
            "description": "The types of tasks supported by this evaluation",
            "enum": [
                "FRQ",
                "bool",
                "json",
                "MCQ"
            ],
            "example": "MCQ"
        },
        "modalities": {
            "type": "array",
            "items": {
                "type": "string"
            },
            "description": "The available modalities of this evaluation",
            "enum": [
                "text"
            ],
            "example": "text"
        },
        "min_questions_to_complete": {
            "type": "integer",
            "format": "int64",
            "description": "The default number of tasks to run before an evaluation session is deemed finished.\nA given evaluation session may process more tasks, as starting a new evaluation session for an evaluation/model pair which is already running will just add more tasks to the current session, rather than starting a new one.",
            "nullable": true,
            "example": 321
        },
        "owner": {
            "$ref": "#/components/schemas/ShallowUser"
        },
        "tags": {
            "type": "array",
            "items": {
                "$ref": "#/components/schemas/ShallowTag"
            }
        }
    }
}

Refer to the common response description: Unauthorized.

Refer to the common response description: Unauthenticated.

Refer to the common response description: Error.


GET /evaluation

Input parameters

Parameter In Type Default Nullable Description
id query string Yes Will return the item with this id, or an error if no such item exists. When this parameter is provided, only a single item will be returned
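For instance, a single evaluation can be fetched by id with Python's requests library. This is a sketch following the authentication pattern from the GET /auth example above; the session token handling is an assumption carried over from that example:

```python
import requests

def get_evaluation(session_token, evaluation_id):
    """Fetch a single evaluation by id. When `id` is provided the
    endpoint returns one object instead of a paginated listing."""
    res = requests.get(
        "https://equistamp.net/evaluation",
        headers={"Session-Token": session_token},
        params={"id": evaluation_id},
    )
    if res.status_code != 200:
        raise ValueError(res.json())
    return res.json()
```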

Responses

Schema of the response body
{
    "oneOf": [
        {
            "$ref": "#/components/schemas/Evaluation"
        },
        {
            "type": "object",
            "properties": {
                "items": {
                    "description": "An array of all the items that were found, but capped at most at `per_page`",
                    "type": "array",
                    "items": {
                        "$ref": "#/components/schemas/Evaluation"
                    }
                },
                "count": {
                    "description": "The total number of items found",
                    "type": "number",
                    "format": "int32"
                },
                "per_page": {
                    "description": "The number of items returned per page",
                    "type": "number",
                    "format": "int32"
                },
                "page": {
                    "description": "The number of available pages",
                    "type": "number",
                    "format": "int32"
                }
            }
        }
    ]
}

Refer to the common response description: NotFound.

Refer to the common response description: Error.


PUT /evaluation

Request body

{
    "name": "My lovely evaluation",
    "public": true,
    "public_usable": false,
    "reports_visible": false,
    "description": "# This is an evaluation, see more at [this link](http://some.link)",
    "task_types": "MCQ",
    "modalities": "text",
    "min_questions_to_complete": 321,
    "tags": [
        "341337c9-bc8c-4d87-bbf2-7d440f7c124f"
    ]
}
⚠️ This example has been generated automatically from the schema and may not be accurate. Refer to the schema for more information.

Schema of the request body
{
    "type": "object",
    "properties": {
        "name": {
            "type": "string",
            "example": "My lovely evaluation"
        },
        "public": {
            "type": "boolean",
            "description": "Whether this evaluation should be publicly visible. If true, anyone can view its details or evaluate models with it"
        },
        "public_usable": {
            "type": "boolean",
            "description": "Whether this evaluation can be ran by anyone. To avoid tasks being leaked, you might want to have the results shown, but have control over what it can be run on.",
            "example": false
        },
        "reports_visible": {
            "type": "boolean",
            "description": "Whether anyone can pay to see reports for this evaluation.",
            "example": false
        },
        "description": {
            "type": "string",
            "description": "The description of this evaluation, as displayed on the site. Markdown can be used for formatting",
            "nullable": true,
            "example": "# This is an evaluation, see more at [this link](http://some.link)"
        },
        "task_types": {
            "type": "array",
            "items": {
                "type": "string"
            },
            "description": "The types of tasks supported by this evaluation",
            "enum": [
                "FRQ",
                "bool",
                "json",
                "MCQ"
            ],
            "example": "MCQ"
        },
        "modalities": {
            "type": "array",
            "items": {
                "type": "string"
            },
            "description": "The available modalities of this evaluation",
            "enum": [
                "text"
            ],
            "example": "text"
        },
        "min_questions_to_complete": {
            "type": "integer",
            "format": "int64",
            "description": "The default number of tasks to run before an evaluation session is deemed finished.\nA given evaluation session may process more tasks, as starting a new evaluation session for an evaluation/model pair which is already running will just add more tasks to the current session, rather than starting a new one.",
            "nullable": true,
            "example": 321
        },
        "tags": {
            "type": "array",
            "items": {
                "type": "string",
                "format": "uuid"
            }
        }
    }
}
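The update itself can be sketched with requests. Note the assumptions here: the Session-Token header follows the GET /auth example above, and the target evaluation is assumed to be selected with an `id` query parameter as on GET /evaluation, since the body schema does not include one:

```python
import requests

def update_evaluation(session_token, evaluation_id, **fields):
    """PUT the given fields (e.g. name, public, task_types) to
    /evaluation. Returns the confirmation string on success."""
    res = requests.put(
        "https://equistamp.net/evaluation",
        headers={"Session-Token": session_token},
        params={"id": evaluation_id},
        json=fields,
    )
    if res.status_code != 200:
        raise ValueError(res.json())
    return res.json()
```

For example, `update_evaluation(token, eval_id, public=True, reports_visible=False)` would toggle just those two flags.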

Responses

"Evaluation updated"
⚠️ This example has been generated automatically from the schema and may not be accurate. Refer to the schema for more information.

Schema of the response body
{
    "type": "string",
    "enum": [
        "Evaluation updated"
    ]
}

Refer to the common response description: Unauthorized.

Refer to the common response description: Unauthenticated.

Refer to the common response description: NotFound.

Refer to the common response description: Error.


POST /evaluationbuilderhandler

Import tasks from a CSV file.

Description

This endpoint will fetch a CSV file and create a task from each row (excluding the first row, which is used as a header). If dry_run is true, this will only check for errors and not save anything to the database.

Number of questions to complete

Each evaluation run will use a subsample of all available tasks. You can set this number by providing a value for min_questions_to_complete. If you don't set it manually, it will be set on the basis of the number of tasks in your file, so as to reach a 95% confidence level. In practice this number tends to be larger than needed - the scores of most evaluation runs don't change much after around 200 tasks.

Task type

Unless specified otherwise, it's assumed that all tasks are Multiple Choice Questions. This can be changed by

  1. setting default_task_type, which will change the default to whatever you provide
  2. providing a type column, which can be used to set the task types for specific rows - any row where the type column is not empty will use that value as its type; otherwise the default type is used

Columns mapping

For the CSV import to work correctly, you must provide a way to map columns to task fields. This is done by providing a mapping of <column name> to a column definition object. The available fields in the definition object are:

  • columnType - this specifies what this column should be used as. Must always be provided
  • paraphraseOf - used by paraphrase columns to point to what they're paraphrasing. All texts can have paraphrases. When a field has paraphrases defined, these will always be used when sending texts to models, or when displaying them on the frontend. Only you and system administrators will have access to the non-paraphrase texts.
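Putting the pieces together, a minimal dry-run import payload might be assembled as below. This is a sketch built from the example request body; the ids and URLs are placeholders, and the column names mirror the example rather than any real CSV:

```python
def build_import_payload(evaluation_id, csv_url, dry_run=True):
    """Assemble a POST body for /evaluationbuilderhandler. Swap in
    your own CSV column headers as the columns_mapping keys."""
    return {
        "evaluation_id": evaluation_id,
        "csv_url": csv_url,
        "dry_run": dry_run,  # validate only; nothing is saved while True
        "default_task_type": "MCQ",
        "columns_mapping": {
            "Question col": {"columnType": "question"},
            "Paraphrase of question": {
                "columnType": "paraphrase",
                "paraphraseOf": "Question col",
            },
        },
    }
```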

Request body

{
    "public_usable": false,
    "reports_visible": false,
    "min_questions_to_complete": 321,
    "tags": [
        "80578ad7-0506-4c5e-a2e6-586523676152"
    ],
    "evaluation_id": "64a578cc-05b8-4749-a2eb-ff63f34d78fd",
    "dry_run": true,
    "csv_url": "https://example.com",
    "default_task_type": "MCQ",
    "columns_mapping": {
        "Question col": {
            "columnType": "question"
        },
        "Paraphrase of question": {
            "columnType": "paraphrase",
            "paraphraseOf": "Question col"
        }
    },
    "references": {
        "bla": {
            "schema": {
                "properties": {
                    "name": {
                        "type": "string"
                    }
                }
            },
            "name": "My wonderful schema",
            "description": "Some description here"
        },
        "other-name_with.interpunction123": {
            "schema": {
                "properties": {
                    "name": {
                        "type": "string"
                    }
                }
            }
        }
    },
    "prompt": "(str \"Please answer this question: \" task)",
    "grader": {
        "MCQ": "(= parsedResponse correct)",
        "default": "false"
    }
}
⚠️ This example has been generated automatically from the schema and may not be accurate. Refer to the schema for more information.

Schema of the request body
{
    "type": "object",
    "properties": {
        "public_usable": {
            "type": "boolean",
            "description": "Whether this evaluation can be ran by anyone. To avoid tasks being leaked, you might want to have the results shown, but have control over what it can be run on.",
            "example": false
        },
        "reports_visible": {
            "type": "boolean",
            "description": "Whether anyone can pay to see reports for this evaluation.",
            "example": false
        },
        "min_questions_to_complete": {
            "type": "integer",
            "format": "int64",
            "description": "The default number of tasks to run before an evaluation session is deemed finished.\nA given evaluation session may process more tasks, as starting a new evaluation session for an evaluation/model pair which is already running will just add more tasks to the current session, rather than starting a new one.",
            "nullable": true,
            "example": 321
        },
        "tags": {
            "type": "array",
            "items": {
                "type": "string",
                "format": "uuid"
            }
        },
        "evaluation_id": {
            "description": "The id of the evaluation to add tasks to",
            "type": "string",
            "format": "uuid"
        },
        "dry_run": {
            "description": "If true, this call will only check for errors and not actually import anything",
            "type": "boolean"
        },
        "csv_url": {
            "description": "The URL of a CSV file containing the tasks of the new evaluation",
            "example": "https://example.com",
            "type": "string"
        },
        "default_task_type": {
            "description": "The default type of tasks - can be overrode on a per row basis. Will use \"MCQ\" if not set",
            "example": "MCQ",
            "nullable": true,
            "type": "string",
            "enum": [
                "FRQ",
                "bool",
                "json",
                "MCQ"
            ]
        },
        "columns_mapping": {
            "description": "A mapping that specifies which CSV columns contain which types of data. See the [Evaluation Builder](#post-evaluationbuilderhandler) endpoint for details",
            "type": "object",
            "example": {
                "Question col": {
                    "columnType": "question"
                },
                "Paraphrase of question": {
                    "columnType": "paraphrase",
                    "paraphraseOf": "Question col"
                }
            },
            "additionalProperties": {
                "$ref": "#/components/schemas/ColumnMapping"
            }
        },
        "references": {
            "description": "A mapping of keys to schemas. The keys can contain ASCII alphanumeric characters, \"-\", \"_\" and \".\".",
            "type": "object",
            "additionalProperties": {
                "type": "object",
                "properties": {
                    "schema": {
                        "type": "object",
                        "description": "The JSON schema to be used"
                    },
                    "name": {
                        "type": "string",
                        "description": "An optional name for this schema - this will only be used for displaying, the actual matching is done by comparing the keys of the `references` object."
                    },
                    "description": {
                        "type": "string",
                        "description": "An optional description for this schema"
                    },
                    "type": {
                        "type": "string",
                        "enum": [
                            "json"
                        ],
                        "description": "The type of schema. If not provided, will be assumed to be JSON",
                        "example": "json"
                    }
                },
                "required": [
                    "schema"
                ]
            },
            "example": {
                "bla": {
                    "schema": {
                        "properties": {
                            "name": {
                                "type": "string"
                            }
                        }
                    },
                    "name": "My wonderful schema",
                    "description": "Some description here"
                },
                "other-name_with.interpunction123": {
                    "schema": {
                        "properties": {
                            "name": {
                                "type": "string"
                            }
                        }
                    }
                }
            }
        },
        "prompt": {
            "description": "DSL code defining how to create prompts. See the [DSL page](/docs/dsl/) for more info.",
            "example": "(str \"Please answer this question: \" task)"
        },
        "grader": {
            "description": "DSL code specifying how to grade LLM responses. This can be empty, in which case the default grader will be used. You can specify a grader that will be used for all types of tasks, or per task type graders. If you provide both a default grader and one for a specific task type, the specific one takes precedence.",
            "oneOf": [
                {
                    "type": "string",
                    "description": "DSL code that should be used for all response",
                    "example": "(= parsedResponse \"ok\")"
                },
                {
                    "type": "object",
                    "description": "Per task type DSL code. Use the \"default\" key to specify the code that should be used for tasks types that aren't specified - otherwise the system default grader will be used.",
                    "properties": {
                        "default": {
                            "type": "string",
                            "description": "The default grader to be used for task types that aren't specified.",
                            "example": "(if (= parsedResponse correct) 1 0)"
                        },
                        "FRQ": {
                            "type": "string",
                            "description": "The DSL code to be used to grade FRQ tasks. If this is empty, the default grader will be used"
                        },
                        "bool": {
                            "type": "string",
                            "description": "The DSL code to be used to grade bool tasks. If this is empty, the default grader will be used"
                        },
                        "json": {
                            "type": "string",
                            "description": "The DSL code to be used to grade json tasks. If this is empty, the default grader will be used"
                        },
                        "MCQ": {
                            "type": "string",
                            "description": "The DSL code to be used to grade MCQ tasks. If this is empty, the default grader will be used"
                        }
                    },
                    "example": {
                        "MCQ": "(= parsedResponse correct)",
                        "default": "false"
                    }
                }
            ],
            "example": {
                "MCQ": "(= parsedResponse correct)",
                "default": "false"
            }
        }
    }
}

Responses

{
    "id": "6f7c068b-17be-42a1-913c-1e3c349af033",
    "name": "My lovely evaluation",
    "public": true,
    "public_usable": false,
    "reports_visible": false,
    "quality": 0.89,
    "num_tasks": 2000,
    "description": "# This is an evaluation, see more at [this link](http://some.link)",
    "last_updated": "2022-04-13T15:42:05.901Z",
    "task_types": "MCQ",
    "modalities": "text",
    "min_questions_to_complete": 321,
    "owner": {
        "id": "f059ec20-0e0f-4c5c-81d3-4a6e3aa64ed4",
        "email_address": "mr.blobby@some.domain",
        "user_name": "mr_blobby",
        "full_name": "Mr Blobby, esq.",
        "user_image": "https://equistamp.com/avatars/123123123123.png",
        "bio": "Hello, my name is Inigo Montoya. You Killed my Father. Prepare to die",
        "display_options": {
            "bio": true,
            "email_address": true,
            "user_image": false
        },
        "join_date": "2022-04-13",
        "subscription_level": "pro",
        "alerts": [
            "6c6028a9-85b2-4f11-b83e-53683cd48d9b"
        ]
    },
    "tags": [
        {
            "id": "d71d88f6-3afe-41a9-b263-94d1f38e81d7",
            "name": "string"
        }
    ]
}
⚠️ This example has been generated automatically from the schema and may not be accurate. Refer to the schema for more information.

Schema of the response body
{
    "type": "object",
    "properties": {
        "id": {
            "type": "string",
            "format": "uuid"
        },
        "name": {
            "type": "string",
            "example": "My lovely evaluation"
        },
        "public": {
            "type": "boolean",
            "description": "Whether this evaluation should be publicly visible. If true, anyone can view its details or evaluate models with it"
        },
        "public_usable": {
            "type": "boolean",
            "description": "Whether this evaluation can be ran by anyone. To avoid tasks being leaked, you might want to have the results shown, but have control over what it can be run on.",
            "example": false
        },
        "reports_visible": {
            "type": "boolean",
            "description": "Whether anyone can pay to see reports for this evaluation.",
            "example": false
        },
        "quality": {
            "type": "number",
            "format": "double",
            "description": "The quality of this evaluation, i.e. how much it can be trusted, from 0 to 1.",
            "example": 0.89
        },
        "num_tasks": {
            "type": "integer",
            "format": "int64",
            "description": "The total number of tasks defined for this evaluation. Includes redacted tasks.",
            "example": 2000
        },
        "description": {
            "type": "string",
            "description": "The description of this evaluation, as displayed on the site. Markdown can be used for formatting",
            "nullable": true,
            "example": "# This is an evaluation, see more at [this link](http://some.link)"
        },
        "last_updated": {
            "type": "string",
            "format": "date-time"
        },
        "task_types": {
            "type": "array",
            "items": {
                "type": "string"
            },
            "description": "The types of tasks supported by this evaluation",
            "enum": [
                "FRQ",
                "bool",
                "json",
                "MCQ"
            ],
            "example": "MCQ"
        },
        "modalities": {
            "type": "array",
            "items": {
                "type": "string"
            },
            "description": "The available modalities of this evaluation",
            "enum": [
                "text"
            ],
            "example": "text"
        },
        "min_questions_to_complete": {
            "type": "integer",
            "format": "int64",
            "description": "The default number of tasks to run before an evaluation session is deemed finished.\nA given evaluation session may process more tasks, as starting a new evaluation session for an evaluation/model pair which is already running will just add more tasks to the current session, rather than starting a new one.",
            "nullable": true,
            "example": 321
        },
        "owner": {
            "$ref": "#/components/schemas/ShallowUser"
        },
        "tags": {
            "type": "array",
            "items": {
                "$ref": "#/components/schemas/ShallowTag"
            }
        }
    }
}

Refer to the common response description: Unauthorized.

Refer to the common response description: Unauthenticated.

Refer to the common response description: Error.


GET /evaluationbuilderhandler

Check whether a CSV file contains valid tasks

Description

This endpoint will fetch a CSV file from the provided URL and validate each row to make sure that it can be processed. Rows with errors or warnings will be returned with appropriate messages, to help debug problems. When the CSV is processed (after sending an appropriate POST request to this endpoint), rows that have errors will be skipped.

Column mapping

To check whether all the rows are correct, you must provide a way to work out which columns correspond to which fields in the resulting tasks. In the case of GET requests, they should be provided as follows. Check out our sample tasks file for examples:

Basic mappings

  • question - this is the only required parameter. This should specify the name of the column containing the main text to be sent to models
  • type - this specifies where to check for per-row task type overrides. By default it's assumed that tasks are multiple choice questions, unless default_task_type is set in the POST request. If you want most tasks to be one type but a few to be of a different type (e.g. true-false questions), you can override them using this column.
  • redacted - this specifies where to check whether a task should be hidden by default. By default it's assumed that all tasks should be used when testing models, but sometimes a given task may be incorrect or of poor quality. One way around this would be to delete any problematic rows before uploading, but that can be a lot of work. To make things easier, tasks can be uploaded as redacted, which means that they won't be sent to models. Any row with a non-empty value in the redacted column will be saved as redacted

Paraphrases

All texts can have paraphrases. When a field has paraphrases defined, these will always be used when sending texts to models, or when displaying them on the frontend. Only you and system administrators will have access to the non-paraphrase texts. Paraphrases are declared as paraphrase.<paraphrase column>=<paraphrased column>. So e.g. paraphrase.question%20paraphrase=Question will declare that the "question paraphrase" column is a paraphrase of the "Question" column.

Boolean question mappings

Boolean questions have only two possible answers - True or False. You can have one column which provides this value. Any row where the answer column equals 1, or a case-insensitive true or yes, will be deemed a question where the correct answer is True. Any other value is False.

  • bool_correct - any row whose value is 1 or a case-insensitive true or yes (so e.g. TrUe, TRue or true) will be deemed a true statement. Anything else is false.
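The matching rule above can be sketched as follows. This is an illustration of the documented behaviour, not Equistamp's actual implementation:

```python
def is_true_statement(cell: str) -> bool:
    """A bool_correct cell counts as True when it is "1" or a
    case-insensitive "true" or "yes"; anything else is False."""
    return cell.lower() in {"1", "true", "yes"}
```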

Free response question mappings

Free response questions are questions where the model can answer with any text. An example of this kind of question would be "fill in the blank". You can provide both correct and incorrect texts - free response questions are graded on the basis of similarity. Two identical texts should have a similarity of 1, and texts with opposite meanings will have a similarity of 0. You can specify expected answers either as texts which should be similar, or as texts which are opposite, in which case the similarity will be calculated as 1 - <similarity score>. Each row must have at least one correct or incorrect value provided.

  • frq_correct - a comma separated list of URL encoded column names, e.g. 'Correct%201,Correct%20%3D%20this'
  • frq_incorrect - a comma separated list of URL encoded column names, e.g. 'This%20is%20wrong,Bad%21%21'

Multiple choice question mappings

In the case of multiple choice questions, you must provide at least one correct answer and at least one incorrect answer. You can add more if you want, but only the first 10 correct answers and the first 20 incorrect answers will be used. These column definitions should be provided via:

  • mcq_correct - a comma separated list of URL encoded column names, e.g. 'Correct%201,Correct%20%3D%20this'
  • mcq_incorrect - a comma separated list of URL encoded column names, e.g. 'This%20is%20wrong,Bad%21%21'

Json question mappings

Tasks which expect valid JSON responses have the following column types, both of which are optional:

  • schema - a JSON schema specifying the structure of the expected JSON. If this is provided, all responses must conform to this schema. If not provided, then the schema will be assumed to be any valid JSON. The schema can be provided via a reference (see below).
  • expected - an expected JSON object. The JSON returned by the model must have the same values as the expected object

Example column mappings

Assuming you have a CSV file with the following columns:

  • Task type - contains the type of tasks
  • Timestamp - date of last edit - not needed here, so should be ignored
  • `` - an empty column
  • Task question to answer - the text to which models should respond
  • Question paraphrase - an alternative way of phrasing the question
  • Correct answer - the expected answer
  • Alternative correct answer - another answer that will also be accepted as correct
  • Bad response example - an incorrect answer to be provided as an option in the multiple choice question
  • Wrong answer - another incorrect answer to be provided as an option in the multiple choice question

Then you would send a GET request with type=Task%20type&question=Task%20question%20to%20answer&paraphrase.Question%20paraphrase=Task%20question%20to%20answer&mcq_correct=Correct%20answer,Alternative%20correct%20answer&mcq_incorrect=Bad%20response%20example,Wrong%20answer
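The query above can be assembled programmatically. Note that the column names are encoded individually while the commas inside mcq_correct and mcq_incorrect remain literal separators, so the query string is built by hand rather than with a generic encoder:

```python
from urllib.parse import quote

# Sketch of the column mapping above. Column names are URL encoded
# individually; the commas between them stay as literal separators.
def column_list(names):
    return ",".join(quote(n) for n in names)

query = "&".join([
    "type=" + quote("Task type"),
    "question=" + quote("Task question to answer"),
    quote("paraphrase.Question paraphrase") + "=" + quote("Task question to answer"),
    "mcq_correct=" + column_list(["Correct answer", "Alternative correct answer"]),
    "mcq_incorrect=" + column_list(["Bad response example", "Wrong answer"]),
])
```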

References

In the case of schemas, it would be annoying to repeat a massive JSON object in each row. To make this simpler, you can provide a set of references: any schema column whose value is a reference key will use the schema object stored under that reference. Reference names can contain English letters (upper and lowercase), digits, and "-", "_", and ".". References can also have names and descriptions for easier management; both are optional and do not in any way affect how the references are matched to rows. References should be provided as reference.<value>.<reference name> GET parameters, where <value> is one of "schema", "name" or "description". An example would be a GET request with: type=Task&question=Question&json_schema=Schema&reference.name.ref1&reference.schema.ref1=%7B%22asd%22%3A+%22asd%22%7D, which would set ref1 to be {"asd": "asd"} on all rows that have ref1 as their schema.
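Producing those reference parameters by hand is error prone, so a sketch of doing it programmatically may help. The display name value here is hypothetical, and `quote` encodes the space as %20 where the example above uses + (both are valid encodings):

```python
import json
from urllib.parse import quote

# Hedged sketch of registering a schema reference ("ref1") via GET
# parameters, mirroring the example above. The schema value is illustrative.
schema = {"asd": "asd"}
ref_params = {
    "reference.name.ref1": "ref1",  # optional, purely for management
    "reference.schema.ref1": json.dumps(schema),
}
query = "&".join(quote(k) + "=" + quote(v) for k, v in ref_params.items())
```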

Input parameters

Parameter In Type Default Nullable Description
csv_url path No The URL of a CSV file containing the tasks of the new evaluation
only_header path No When set, will just return the headers of the CSV file
question path No The columns in the CSV file containing the questions
redacted path No The column in the CSV file marking tasks as redacted
type path No The column in the CSV file containing the per row task type

Responses

{
    "errors": [
        {
            "task_num": 3,
            "errors": [
                {
                    "message": "This row couldn't be parsed",
                    "level": "warning",
                    "type": "validation"
                }
            ],
            "warnings": [
                "This row is suspicious"
            ]
        }
    ],
    "num_tasks": 123,
    "min_questions_to_complete": 42
}
⚠️ This example has been generated automatically from the schema and it is not accurate. Refer to the schema for more information.
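Walking a response of this shape can be sketched as follows; this flattens the per-row errors and warnings into tuples for easy reporting (an illustrative helper, not part of the API):

```python
# Sketch of walking the validation response shown above: flatten the per-row
# errors and warnings into (task_num, level, message) tuples.
def summarise(result):
    problems = []
    for row in result.get("errors", []):
        for err in row.get("errors", []):
            problems.append((row["task_num"], err["level"], err["message"]))
        for warning in row.get("warnings", []):
            problems.append((row["task_num"], "warning", warning))
    return problems
```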

Schema of the response body
{
    "type": "object",
    "properties": {
        "errors": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "task_num": {
                        "description": "The index of the row that has these errors",
                        "type": "number",
                        "format": "int64",
                        "example": 3
                    },
                    "errors": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": {
                                "message": {
                                    "type": "string",
                                    "example": "This row couldn't be parsed"
                                },
                                "level": {
                                    "type": "string",
                                    "enum": [
                                        "warning",
                                        "error"
                                    ]
                                },
                                "type": {
                                    "type": "string",
                                    "example": "validation"
                                }
                            }
                        }
                    },
                    "warnings": {
                        "type": "array",
                        "items": {
                            "type": "string",
                            "example": "This row is suspicious"
                        }
                    }
                }
            }
        },
        "num_tasks": {
            "description": "The number of rows with tasks found, including rows with errors",
            "type": "number",
            "format": "int64",
            "example": 123
        },
        "min_questions_to_complete": {
            "description": "The minimum number of tasks per evaluation session. If this wasn't provided in the query parameters, it will be calculated based on the number of tasks found",
            "type": "number",
            "format": "int64",
            "example": 42
        }
    }
}

Refer to the common response description: ValidationError.

Refer to the common response description: NotFound.

Refer to the common response description: Error.


POST /evaluationmodeljobshandler

Request body

{
    "job_name": "string",
    "minutes_between_evaluations": 10.12,
    "job_description": "string",
    "start_date": "2022-04-13T15:42:05.901Z"
}
⚠️ This example has been generated automatically from the schema and it is not accurate. Refer to the schema for more information.

Schema of the request body
{
    "type": "object",
    "properties": {
        "job_name": {
            "type": "string"
        },
        "minutes_between_evaluations": {
            "type": "number",
            "format": "int64"
        },
        "job_description": {
            "type": "string"
        },
        "start_date": {
            "type": "string",
            "format": "date-time",
            "nullable": true
        }
    }
}
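A hedged sketch of calling this endpoint, using only the fields from the request schema above. The token and all field values are placeholders; the Api-Token header follows the authentication description elsewhere in these docs:

```python
import requests

# Placeholder payload matching the request schema above.
payload = {
    "job_name": "nightly-regression",
    "minutes_between_evaluations": 1440,  # once a day
    "job_description": "Re-run the benchmark every 24 hours",
    "start_date": "2024-01-01T00:00:00Z",
}

def create_job(api_token, payload):
    res = requests.post(
        "https://equistamp.net/evaluationmodeljobshandler",
        headers={"Api-Token": api_token},
        json=payload,
    )
    res.raise_for_status()
    return res.json()  # the response includes the generated id and creation_date
```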

Responses

{
    "job_name": "string",
    "minutes_between_evaluations": 10.12,
    "job_body": null,
    "job_description": "string",
    "job_schedule_arn": "string",
    "start_date": "2022-04-13T15:42:05.901Z",
    "owner_id": "c3821565-8ad3-48d3-be6b-6785eec6de4d",
    "model_id": "6aba704c-b89d-40af-9c68-9dde86479c65",
    "evaluation_id": "8402290a-eb86-44be-a7b7-bfa35072c30f",
    "id": "c959d296-96ea-4fc8-8b9c-7a66d53d436e",
    "creation_date": "2022-04-13T15:42:05.901Z"
}
⚠️ This example has been generated automatically from the schema and it is not accurate. Refer to the schema for more information.

Schema of the response body
{
    "type": "object",
    "properties": {
        "job_name": {
            "type": "string"
        },
        "minutes_between_evaluations": {
            "type": "number",
            "format": "int64"
        },
        "job_body": {},
        "job_description": {
            "type": "string"
        },
        "job_schedule_arn": {
            "type": "string"
        },
        "start_date": {
            "type": "string",
            "format": "date-time",
            "nullable": true
        },
        "owner_id": {
            "type": "string",
            "format": "uuid"
        },
        "model_id": {
            "type": "string",
            "format": "uuid"
        },
        "evaluation_id": {
            "type": "string",
            "format": "uuid"
        },
        "id": {
            "type": "string",
            "format": "uuid"
        },
        "creation_date": {
            "type": "string",
            "format": "date-time"
        }
    }
}

Refer to the common response description: Unauthorized.

Refer to the common response description: Unauthenticated.

Refer to the common response description: Error.


GET /evaluationmodeljobshandler

Input parameters

Parameter In Type Default Nullable Description
id query string Yes Will return the item with this id, or a NotFound error if no such item exists. When this parameter is provided, only a single item will be returned

Responses

Schema of the response body
{
    "oneOf": [
        {
            "$ref": "#/components/schemas/EvaluationModelJobs"
        },
        {
            "type": "object",
            "properties": {
                "items": {
                    "description": "An array of all the items that were found, but capped at most at `per_page`",
                    "type": "array",
                    "items": {
                        "$ref": "#/components/schemas/EvaluationModelJobs"
                    }
                },
                "count": {
                    "description": "The total number of items found",
                    "type": "number",
                    "format": "int32"
                },
                "per_page": {
                    "description": "The number of items returned per page",
                    "type": "number",
                    "format": "int32"
                },
                "page": {
                    "description": "The number of available pages",
                    "type": "number",
                    "format": "int32"
                }
            }
        }
    ]
}
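Because the response is a oneOf, callers need to handle both shapes: a bare EvaluationModelJobs object when id was given, or the paginated wrapper otherwise. An illustrative normaliser:

```python
# Sketch of handling the two response shapes documented above: a single
# object when `id` was passed, otherwise a paginated wrapper with
# `items`, `count`, `per_page` and `page`.
def extract_jobs(body):
    if isinstance(body, dict) and "items" in body:
        return body["items"]
    return [body]
```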

Refer to the common response description: Unauthorized.

Refer to the common response description: Unauthenticated.

Refer to the common response description: NotFound.

Refer to the common response description: Error.


PUT /evaluationmodeljobshandler

Request body

{
    "job_name": "string",
    "minutes_between_evaluations": 10.12,
    "job_description": "string",
    "start_date": "2022-04-13T15:42:05.901Z"
}
⚠️ This example has been generated automatically from the schema and it is not accurate. Refer to the schema for more information.

Schema of the request body
{
    "type": "object",
    "properties": {
        "job_name": {
            "type": "string"
        },
        "minutes_between_evaluations": {
            "type": "number",
            "format": "int64"
        },
        "job_description": {
            "type": "string"
        },
        "start_date": {
            "type": "string",
            "format": "date-time",
            "nullable": true
        }
    }
}

Responses

"EvaluationModelJobs updated"
⚠️ This example has been generated automatically from the schema and it is not accurate. Refer to the schema for more information.

Schema of the response body
{
    "type": "string",
    "enum": [
        "EvaluationModelJobs updated"
    ]
}

Refer to the common response description: Unauthorized.

Refer to the common response description: Unauthenticated.

Refer to the common response description: NotFound.

Refer to the common response description: Error.


POST /evaluationsession

Run an evaluation on a model, or take the test as a human.

Description
Human tests

Humans can test themselves on evaluations to check how hard they are. This should be done via the "Test yourself" button on evaluation pages. A random subsample of around 20 tasks will be returned, and once all of them have been completed, a summary will be shown of how well the tester did compared to other humans and AI models. Human tests can only be taken by the actual caller, as determined by Session-Token or Api-Token. Providing a different user via evaluatee_id won't do anything.

Each human test is idempotent, so until it has been completed, calling this endpoint for a given evaluation will return the same 20 tasks. This can be overridden with the restart parameter - when that is true, a new evaluation session will be started.

Human tests are free.

AI model evaluation

Calling this endpoint with a model id in the evaluatee_id field and is_human_being_evaluated = false will start a new evaluation session for the provided evaluation_id. This requires payment, which will automatically be subtracted from your credits. If you don't have enough credits, a 402 error will be returned, with a link to your user profile, where you can purchase more credits.

By default there will be only one evaluation session per evaluation/model pair at a time. Calling this endpoint for a running evaluation session will append tasks to the current session rather than creating a new one. You can force a new evaluation session by setting restart = true.
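The model-evaluation flow above can be sketched as follows. The token and ids are placeholders, and the error handling mirrors the documented 402 behaviour:

```python
import requests

# Build the minimal request body for a model evaluation session, as
# described above. All other body fields are optional.
def session_payload(model_id, evaluation_id, restart=False):
    return {
        "is_human_being_evaluated": False,
        "evaluatee_id": model_id,
        "evaluation_id": evaluation_id,
        "restart": restart,
    }

def start_evaluation(api_token, model_id, evaluation_id, restart=False):
    res = requests.post(
        "https://equistamp.net/evaluationsession",
        headers={"Api-Token": api_token},
        json=session_payload(model_id, evaluation_id, restart),
    )
    if res.status_code == 402:
        # Not enough credits; the body links to your profile to buy more.
        raise RuntimeError(res.json())
    res.raise_for_status()
    return res.json()
```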

Request body

{
    "origin": "user",
    "is_human_being_evaluated": true,
    "min_verbosity": 10.12,
    "max_verbosity": 10.12,
    "avg_verbosity": 10.12,
    "median_verbosity": 10.12,
    "evaluatee_id": "1ec67c40-fa5d-4a4d-867c-ac1cf75d4ec4",
    "evaluation_id": "9cc68041-01fe-474e-a623-59f37c7074aa",
    "notify": [
        {
            "method": "email",
            "destination": "mr.blobby@acme.com"
        }
    ],
    "restart": false,
    "system_prompt": "(str \"Please answer this: \" task)",
    "prompt": {
        "MCQ": "(str \"I have a multiple choice question for you to answer: \" task)",
        "default": "(str \"Answer this, please: \" task)"
    },
    "request": {
        "MCQ": "(bedrock-call \"your-access-key\" \"your-secret-key\" \"Jurassic\" task-text)",
        "default": "false"
    },
    "response": {
        "MCQ": "(= parsedResponse correct)",
        "default": "false"
    },
    "grader": {
        "MCQ": "(= parsedResponse correct)",
        "default": "false"
    }
}
⚠️ This example has been generated automatically from the schema and it is not accurate. Refer to the schema for more information.

Schema of the request body
{
    "type": "object",
    "properties": {
        "origin": {
            "type": "string",
            "description": "The source of this evaluation session, i.e. what triggered it",
            "example": "user",
            "enum": [
                "alert",
                "user",
                "job",
                "model"
            ]
        },
        "is_human_being_evaluated": {
            "type": "boolean",
            "description": "Whether this evaluation session is a human test. When false will start an automatic test for the provided model and evaluation.",
            "example": true
        },
        "min_verbosity": {
            "type": "number",
            "format": "double",
            "nullable": true
        },
        "max_verbosity": {
            "type": "number",
            "format": "double",
            "nullable": true
        },
        "avg_verbosity": {
            "type": "number",
            "format": "double",
            "nullable": true
        },
        "median_verbosity": {
            "type": "number",
            "format": "double",
            "nullable": true
        },
        "evaluatee_id": {
            "type": "string",
            "format": "uuid",
            "description": "In the case of human tests, the id of the user taking the test. In the case of testing models, the id of the model to be tested"
        },
        "evaluation_id": {
            "type": "string",
            "format": "uuid",
            "description": "The id of the evaluation to be run"
        },
        "notify": {
            "description": "How to notify that the evaluation session has finished. There can be up to 20 notification methods provided. If no methods are provided, an email will be sent to the user that triggered the session.",
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "method": {
                        "type": "string",
                        "enum": [
                            "email",
                            "webhook",
                            "sms",
                            "call"
                        ],
                        "description": "The notification method",
                        "example": "email"
                    },
                    "destination": {
                        "type": "string",
                        "description": "Where to send a notification",
                        "example": "mr.blobby@acme.com"
                    }
                }
            }
        },
        "restart": {
            "description": "Will force a new evaluation session if true - by default, calling this endpoint for an evaluation/model session that is already running will add more tasks to the running session rather than creating a new one",
            "example": false,
            "type": "boolean"
        },
        "system_prompt": {
            "description": "DSL code specifying how to construct model system prompts. This can be empty.",
            "type": "string",
            "example": "(str \"Please answer this: \" task)"
        },
        "prompt": {
            "description": "DSL code specifying how to construct model prompts. This can be empty, in which case the prompt code of the evaluation will be used. You can specify a `prompt` that will be used for all types of tasks, or per task type `prompt`s. If you provide both a default `prompt` and one for a specific task type, the specific one takes precedence. This can only be used if you're an admin of the selected evaluation - otherwise an error will be returned.",
            "oneOf": [
                {
                    "type": "string",
                    "description": "DSL code that should be used for all prompts",
                    "example": "(str \"Please answer this: \" task)"
                },
                {
                    "type": "object",
                    "description": "Per task type DSL code. Use the \"default\" key to specify the code that should be used for tasks types that aren't specified - otherwise the system default prompt will be used.",
                    "properties": {
                        "default": {
                            "type": "string",
                            "description": "The default `prompt` to be used for task types that aren't specified.",
                            "example": "(str \"Answer this, please: \" task)"
                        },
                        "FRQ": {
                            "type": "string",
                            "description": "The DSL code to be used to create prompts for FRQ tasks. If this is empty, the default `prompt` will be used"
                        },
                        "bool": {
                            "type": "string",
                            "description": "The DSL code to be used to create prompts for bool tasks. If this is empty, the default `prompt` will be used"
                        },
                        "json": {
                            "type": "string",
                            "description": "The DSL code to be used to create prompts for json tasks. If this is empty, the default `prompt` will be used"
                        },
                        "MCQ": {
                            "type": "string",
                            "description": "The DSL code to be used to create prompts for MCQ tasks. If this is empty, the default `prompt` will be used"
                        }
                    },
                    "example": {
                        "MCQ": "(str \"I have a multiple choice question for you to answer: \" task)",
                        "default": "(str \"Answer this, please: \" task)"
                    }
                }
            ],
            "example": {
                "MCQ": "(str \"I have a multiple choice question for you to answer: \" task)",
                "default": "(str \"Answer this, please: \" task)"
            }
        },
        "request": {
            "description": "DSL code specifying how to send tasks to the model. This can be empty, in which case the request code of the model will be used. You can specify a `request` that will be used for all types of tasks, or per task type `request`s. If you provide both a default `request` and one for a specific task type, the specific one takes precedence. This can only be used if you're an admin of the selected model - otherwise an error will be returned.",
            "oneOf": [
                {
                    "type": "string",
                    "description": "DSL code that should be used for all requests",
                    "example": "(POST \"http://my.model.endpoint\" {:json {\"task\" task}})"
                },
                {
                    "type": "object",
                    "description": "Per task type DSL code. Use the \"default\" key to specify the code that should be used for tasks types that aren't specified - otherwise the system default request code will be used.",
                    "properties": {
                        "default": {
                            "type": "string",
                            "description": "The default `request` to be used for task types that aren't specified.",
                            "example": "(openai-call \"your_key\" \"gpt-4\" task)"
                        },
                        "FRQ": {
                            "type": "string",
                            "description": "The DSL code to be used to send requests for FRQ tasks. If this is empty, the default `request` will be used"
                        },
                        "bool": {
                            "type": "string",
                            "description": "The DSL code to be used to send requests for bool tasks. If this is empty, the default `request` will be used"
                        },
                        "json": {
                            "type": "string",
                            "description": "The DSL code to be used to send requests for json tasks. If this is empty, the default `request` will be used"
                        },
                        "MCQ": {
                            "type": "string",
                            "description": "The DSL code to be used to send requests for MCQ tasks. If this is empty, the default `request` will be used"
                        }
                    },
                    "example": {
                        "MCQ": "(openai-call \"sk-your-secret-key\" \"gpt-4-turbo\" task-text)",
                        "default": "(anthropic-call \"sk-your-secret-key\" \"claude\" task)"
                    }
                }
            ],
            "example": {
                "MCQ": "(bedrock-call \"your-access-key\" \"your-secret-key\" \"Jurassic\" task-text)",
                "default": "false"
            }
        },
        "response": {
            "description": "DSL code specifying how to parse LLM responses. This can be empty, in which case the response code of the model will be used. You can specify a `response` parser that will be used for all types of tasks, or per task type parsers. If you provide both a default parser and one for a specific task type, the specific one takes precedence. This can only be used if you're an admin of the selected model - otherwise an error will be returned.",
            "oneOf": [
                {
                    "type": "string",
                    "description": "DSL code that should be used for all responses",
                    "example": "(get-in response [\"json\" \"resp\"])"
                },
                {
                    "type": "object",
                    "description": "Per task type DSL code. Use the \"default\" key to specify the code that should be used for tasks types that aren't specified - otherwise the model's default parser will be used.",
                    "properties": {
                        "default": {
                            "type": "string",
                            "description": "The default parser to be used for task types that aren't specified.",
                            "example": "response"
                        },
                        "FRQ": {
                            "type": "string",
                            "description": "The DSL code to be used to parse FRQ task responses. If this is empty, the default parser will be used"
                        },
                        "bool": {
                            "type": "string",
                            "description": "The DSL code to be used to parse bool task responses. If this is empty, the default parser will be used"
                        },
                        "json": {
                            "type": "string",
                            "description": "The DSL code to be used to parse json task responses. If this is empty, the default parser will be used"
                        },
                        "MCQ": {
                            "type": "string",
                            "description": "The DSL code to be used to parse MCQ task responses. If this is empty, the default parser will be used"
                        }
                    },
                    "example": {
                        "MCQ": "(= parsedResponse correct)",
                        "default": "false"
                    }
                }
            ],
            "example": {
                "MCQ": "(= parsedResponse correct)",
                "default": "false"
            }
        },
        "grader": {
            "description": "DSL code specifying how to grade LLM responses. This can be empty, in which case the grader of the evaluation will be used. You can specify a grader that will be used for all types of tasks, or per task type graders. If you provide both a default grader and one for a specific task type, the specific one takes precedence. This can only be used if you're an admin of the selected evaluation - otherwise an error will be returned.",
            "oneOf": [
                {
                    "type": "string",
                    "description": "DSL code that should be used for all responses",
                    "example": "(= parsedResponse \"ok\")"
                },
                {
                    "type": "object",
                    "description": "Per task type DSL code. Use the \"default\" key to specify the code that should be used for tasks types that aren't specified - otherwise the grader of the evaluation will be used.",
                    "properties": {
                        "default": {
                            "type": "string",
                            "description": "The default grader to be used for task types that aren't specified.",
                            "example": "(if (= parsedResponse correct) 1 0)"
                        },
                        "FRQ": {
                            "type": "string",
                            "description": "The DSL code to be used to grade FRQ tasks. If this is empty, the default grader will be used"
                        },
                        "bool": {
                            "type": "string",
                            "description": "The DSL code to be used to grade bool tasks. If this is empty, the default grader will be used"
                        },
                        "json": {
                            "type": "string",
                            "description": "The DSL code to be used to grade json tasks. If this is empty, the default grader will be used"
                        },
                        "MCQ": {
                            "type": "string",
                            "description": "The DSL code to be used to grade MCQ tasks. If this is empty, the default grader will be used"
                        }
                    },
                    "example": {
                        "MCQ": "(= parsedResponse correct)",
                        "default": "false"
                    }
                }
            ],
            "example": {
                "MCQ": "(= parsedResponse correct)",
                "default": "false"
            }
        }
    }
}

Responses

{
    "id": "4b5d04c5-46c8-4361-aec2-6943db45be82",
    "datetime_started": "2022-04-13T15:42:05.901Z",
    "datetime_completed": "2022-04-13T15:42:05.901Z",
    "origin": "user",
    "completed": true,
    "failed": true,
    "is_human_being_evaluated": true,
    "num_questions_answered": 10.12,
    "num_answered_correctly": 10.12,
    "num_tasks_to_complete": 10.12,
    "num_endpoint_failures": 10.12,
    "num_endpoint_calls": 10.12,
    "num_characters_sent_to_endpoint": 10.12,
    "num_characters_received_from_endpoint": 10.12,
    "median_seconds_per_task": 10.12,
    "mean_seconds_per_task": 10.12,
    "std_seconds_per_task": 10.12,
    "distribution_of_seconds_per_task": null,
    "min_seconds_per_task": 10.12,
    "max_seconds_per_task": 10.12,
    "median_characters_per_task": 10.12,
    "mean_characters_per_task": 10.12,
    "std_characters_per_task": 10.12,
    "distribution_of_characters_per_task": null,
    "min_characters_per_task": 10.12,
    "max_characters_per_task": 10.12,
    "min_verbosity": 10.12,
    "max_verbosity": 10.12,
    "avg_verbosity": 10.12,
    "median_verbosity": 10.12,
    "evaluatee_id": "cc475c38-985c-4b3d-9e3a-766b4945166e",
    "evaluation_id": "761e4f79-385d-47a8-bd39-ab5b3ffb78ed"
}
⚠️ This example has been generated automatically from the schema and it is not accurate. Refer to the schema for more information.

Schema of the response body
{
    "type": "object",
    "properties": {
        "id": {
            "type": "string",
            "format": "uuid"
        },
        "datetime_started": {
            "type": "string",
            "format": "date-time"
        },
        "datetime_completed": {
            "type": "string",
            "format": "date-time",
            "nullable": true
        },
        "origin": {
            "type": "string",
            "description": "The source of this evaluation session, i.e. what triggered it",
            "example": "user",
            "enum": [
                "alert",
                "user",
                "job",
                "model"
            ]
        },
        "completed": {
            "type": "boolean"
        },
        "failed": {
            "type": "boolean"
        },
        "is_human_being_evaluated": {
            "type": "boolean",
            "description": "Whether this evaluation session is a human test. When false will start an automatic test for the provided model and evaluation.",
            "example": true
        },
        "num_questions_answered": {
            "type": "number",
            "format": "int64"
        },
        "num_answered_correctly": {
            "type": "number",
            "format": "int64"
        },
        "num_tasks_to_complete": {
            "type": "number",
            "format": "int64"
        },
        "num_endpoint_failures": {
            "type": "number",
            "format": "int64"
        },
        "num_endpoint_calls": {
            "type": "number",
            "format": "int64"
        },
        "num_characters_sent_to_endpoint": {
            "type": "number",
            "format": "int64"
        },
        "num_characters_received_from_endpoint": {
            "type": "number",
            "format": "int64"
        },
        "median_seconds_per_task": {
            "type": "number",
            "format": "double",
            "nullable": true
        },
        "mean_seconds_per_task": {
            "type": "number",
            "format": "double",
            "nullable": true
        },
        "std_seconds_per_task": {
            "type": "number",
            "format": "double",
            "nullable": true
        },
        "distribution_of_seconds_per_task": {
            "nullable": true
        },
        "min_seconds_per_task": {
            "type": "number",
            "format": "double",
            "nullable": true
        },
        "max_seconds_per_task": {
            "type": "number",
            "format": "double",
            "nullable": true
        },
        "median_characters_per_task": {
            "type": "number",
            "format": "double",
            "nullable": true
        },
        "mean_characters_per_task": {
            "type": "number",
            "format": "double",
            "nullable": true
        },
        "std_characters_per_task": {
            "type": "number",
            "format": "double",
            "nullable": true
        },
        "distribution_of_characters_per_task": {
            "nullable": true
        },
        "min_characters_per_task": {
            "type": "number",
            "format": "double",
            "nullable": true
        },
        "max_characters_per_task": {
            "type": "number",
            "format": "double",
            "nullable": true
        },
        "min_verbosity": {
            "type": "number",
            "format": "double",
            "nullable": true
        },
        "max_verbosity": {
            "type": "number",
            "format": "double",
            "nullable": true
        },
        "avg_verbosity": {
            "type": "number",
            "format": "double",
            "nullable": true
        },
        "median_verbosity": {
            "type": "number",
            "format": "double",
            "nullable": true
        },
        "evaluatee_id": {
            "type": "string",
            "format": "uuid",
            "description": "In the case of human tests, the id of the user taking the test. In the case of testing models, the id of the model to be tested"
        },
        "evaluation_id": {
            "type": "string",
            "format": "uuid",
            "description": "The id of the evaluation to be run"
        }
    }
}

Refer to the common response description: Unauthorized.

Refer to the common response description: PaymentRequired.

Refer to the common response description: Unauthenticated.

Refer to the common response description: Error.


GET /evaluationsession

Get evaluation sessions.

Description

If the id parameter is provided, this endpoint will return the matching evaluation session if possible. For human tests, you can only use this endpoint to get your own results. For AI model runs, you can use this endpoint to get any evaluation session where either the model or the evaluation is public, or where you're an administrator of it.

Input parameters

Parameter In Type Default Nullable Description
id query string Yes Will return the item with this id, or a NotFound error if no such item exists. When this parameter is provided, only a single item will be returned
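
Following the pattern from the `/auth` example above, fetching a single session by id might look like the sketch below. The session token and id values are placeholders to be substituted with your own:

```python
import requests

BASE_URL = "https://equistamp.net"


def get_evaluation_session(session_token, session_id):
    """Fetch a single evaluation session by its id."""
    res = requests.get(
        f"{BASE_URL}/evaluationsession",
        headers={"Session-Token": session_token},
        params={"id": session_id},
    )
    if res.status_code != 200:
        # NotFound, Unauthorized etc. come back as JSON error bodies
        raise ValueError(res.json())
    return res.json()
```

On success this returns the EvaluationSession object described by the schema below.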

Responses

Schema of the response body
{
    "oneOf": [
        {
            "$ref": "#/components/schemas/EvaluationSession"
        },
        {
            "type": "object",
            "properties": {
                "items": {
                    "description": "An array of all the items that were found, but capped at most at `per_page`",
                    "type": "array",
                    "items": {
                        "$ref": "#/components/schemas/EvaluationSession"
                    }
                },
                "count": {
                    "description": "The total number of items found",
                    "type": "number",
                    "format": "int32"
                },
                "per_page": {
                    "description": "The number of items returned per page",
                    "type": "number",
                    "format": "int32"
                },
                "page": {
                    "description": "The number of available pages",
                    "type": "number",
                    "format": "int32"
                }
            }
        }
    ]
}

Refer to the common response description: NotFound.

Refer to the common response description: Error.


POST /model

Request body

{
    "name": "my model",
    "description": "# This is a model, see more at [this link](http://some.link)",
    "publisher": "Models R Us",
    "architecture": "RNN",
    "picture": "http://some.example/pic",
    "num_parameters": 30000000,
    "modalities": "text",
    "public": true,
    "public_usable": false,
    "check_availability": true,
    "endpoint_type": "open_ai",
    "setup_code": "(POST \"http://start.my.model\")",
    "teardown_code": "(POST \"http://start.my.model\")",
    "task_holding_queue_url": "string",
    "task_execution_queue_url": "string",
    "task_execution_dlq_url": "string",
    "lambda_arn": "string",
    "cost_per_input_character_usd": 2e-05,
    "cost_per_output_character_usd": 0.0005,
    "cost_per_instance_hour_usd": 4.99,
    "max_characters_per_minute": 400,
    "max_request_per_minute": 30,
    "max_context_window_characters": 4096,
    "request_code": "(openai-call \"sk-your-secret-key\" \"gtp-4-turbo\" task-text)",
    "response_code": "(get-in response [\"json\" \"response\"])"
}
⚠️ This example has been generated automatically from the schema and it is not accurate. Refer to the schema for more information.

Schema of the request body
{
    "type": "object",
    "properties": {
        "name": {
            "type": "string",
            "example": "my model"
        },
        "description": {
            "type": "string",
            "description": "The description of this model, as displayed on the site. Markdown can be used for formatting",
            "nullable": true,
            "example": "# This is a model, see more at [this link](http://some.link)"
        },
        "publisher": {
            "type": "string",
            "description": "The entity that created this model",
            "nullable": true,
            "example": "Models R Us"
        },
        "architecture": {
            "type": "string",
            "description": "The architecture of this model",
            "nullable": true,
            "example": "RNN"
        },
        "picture": {
            "type": "string",
            "description": "An url to an image representing this model",
            "nullable": true,
            "example": "http://some.example/pic"
        },
        "num_parameters": {
            "type": "integer",
            "format": "int64",
            "description": "The number of parameters of the model",
            "nullable": true,
            "example": 30000000
        },
        "modalities": {
            "type": "array",
            "items": {
                "type": "string"
            },
            "description": "The modalities accepted by this model",
            "enum": [
                "text"
            ],
            "example": "text"
        },
        "public": {
            "type": "boolean",
            "description": "Whether this evaluation should be publicly visible. If true, anyone can view its details."
        },
        "public_usable": {
            "type": "boolean",
            "description": "Whether this model can be tested by anyone. LLMs can cost a lot to run, and these costs are on whoever added the model. This setting is here to add an extra protection against people running up large compute costs on this model. When not set, this is `false`.",
            "example": false
        },
        "check_availability": {
            "type": "boolean",
            "description": "Whether the availability of this model should be checked. When true, we will ping the endpoint every ",
            "nullable": true
        },
        "endpoint_type": {
            "type": "string",
            "description": "The type of endpoint being called. We have dedicated handlers for many of the most popular AI model providers",
            "enum": [
                "aws",
                "together.ai",
                "conversational",
                "google_cloud",
                "azure",
                "text-generation",
                "anthropic",
                "fill-mask",
                "zero-shot-classification",
                "custom",
                "open_ai",
                "text2text-generation",
                "mistral"
            ],
            "example": "open_ai"
        },
        "setup_code": {
            "type": "string",
            "description": "An optional piece of DSL code to be called if the model isn't running. This is useful when your model needs time to spin up - you can defined a call to start it here, which will be called once the model is first used.",
            "nullable": true,
            "example": "(POST \"http://start.my.model\")"
        },
        "teardown_code": {
            "type": "string",
            "description": "An optional piece of DSL code to be run after the model has finished all evaluation sessions. This is useful e.g. when your model is living on an AWS server, where you pay for uptime. You can defined a call to kill the instance, which will be called after no more evaluation sessions are running.",
            "nullable": true,
            "example": "(POST \"http://start.my.model\")"
        },
        "task_holding_queue_url": {
            "type": "string",
            "nullable": true
        },
        "task_execution_queue_url": {
            "type": "string",
            "nullable": true
        },
        "task_execution_dlq_url": {
            "type": "string",
            "nullable": true
        },
        "lambda_arn": {
            "type": "string",
            "nullable": true
        },
        "cost_per_input_character_usd": {
            "type": "number",
            "format": "double",
            "description": "The cost of a single input character in USD. We assume that a single token is 4 characters.",
            "example": 2e-05
        },
        "cost_per_output_character_usd": {
            "type": "number",
            "format": "double",
            "description": "The cost of a single output character in USD. We assume that a single token is 4 characters.",
            "example": 0.0005
        },
        "cost_per_instance_hour_usd": {
            "type": "number",
            "format": "double",
            "description": "The cost of running the model for an hour, in USD. This doesn't include input/output tokens - it's purely the server uptime. This is useful e.g. with HuggingFace inference endpoints, where they charge for server time, not for tokens throughput.",
            "example": 4.99
        },
        "max_characters_per_minute": {
            "type": "integer",
            "format": "int64",
            "description": "The maximum allowed number of characters per minute. We assume that one token is 4 characters. This must be at least 1.",
            "example": 400
        },
        "max_request_per_minute": {
            "type": "integer",
            "format": "int64",
            "description": "The maximum allowed number of requess per minute. This must be at least 1.",
            "example": 30
        },
        "max_context_window_characters": {
            "type": "integer",
            "format": "int64",
            "description": "The maximum number of characters allowed in the context window of this model. We assume that 1 token is 4 characters",
            "nullable": true,
            "example": 4096
        },
        "request_code": {
            "description": "DSL code defining how to send requests to the model. See the [DSL page](/docs/dsl/) for more info.",
            "example": "(openai-call \"sk-your-secret-key\" \"gtp-4-turbo\" task-text)"
        },
        "response_code": {
            "description": "DSL code defining how to parse responses from the model. See the [DSL page](/docs/dsl/) for more info.",
            "example": "(get-in response [\"json\" \"response\"])"
        }
    }
}
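
As a sketch, registering a model could look like the snippet below. The request schema above doesn't mark which fields are required, so treat this minimal payload as an assumption; the key and model names are placeholders:

```python
import requests

BASE_URL = "https://equistamp.net"

# Minimal payload: just a name plus the endpoint configuration.
# All other fields in the request schema are optional or nullable.
payload = {
    "name": "my model",
    "endpoint_type": "open_ai",
    "public": False,
    "public_usable": False,
    "max_request_per_minute": 30,
    "max_characters_per_minute": 400,
    "request_code": '(openai-call "sk-your-secret-key" "gpt-4-turbo" task-text)',
    "response_code": '(get-in response ["json" "response"])',
}


def create_model(session_token):
    """POST the payload and return the new model's id on success."""
    res = requests.post(
        f"{BASE_URL}/model",
        headers={"Session-Token": session_token},
        json=payload,
    )
    if res.status_code != 200:
        raise ValueError(res.json())
    return res.json()["id"]
```

The response body (see below) echoes the model back with server-populated fields such as `id`, `owner_id` and `quality`.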

Responses

{
    "id": "c56d2964-8364-4b8a-b6a6-517ec796fa31",
    "name": "my model",
    "description": "# This is a model, see more at [this link](http://some.link)",
    "owner_id": "4b716715-3cca-411f-b3b7-1dd767965a83",
    "publisher": "Models R Us",
    "architecture": "RNN",
    "picture": "http://some.example/pic",
    "num_parameters": 30000000,
    "modalities": "text",
    "public": true,
    "public_usable": false,
    "check_availability": true,
    "quality": 0.89,
    "endpoint_type": "open_ai",
    "cost_per_input_character_usd": 2e-05,
    "cost_per_output_character_usd": 0.0005,
    "cost_per_instance_hour_usd": 4.99,
    "max_characters_per_minute": 400,
    "max_request_per_minute": 30,
    "max_context_window_characters": 4096,
    "elo_score": 10.12,
    "score": 10.12,
    "availability": 10.12,
    "top_example_id": "f6b9676f-3954-4a35-aa54-b8695e4189ee",
    "worst_example_id": "cdb87dda-fc9b-4bf8-9419-fc5614d88356",
    "owner": {
        "id": "7fd75322-29d1-4e27-9e3b-499cee8cdadc",
        "email_address": "mr.blobby@some.domain",
        "user_name": "mr_blobby",
        "full_name": "Mr Blobby, esq.",
        "user_image": "https://equistamp.com/avatars/123123123123.png",
        "bio": "Hello, my name is Inigo Montoya. You Killed my Father. Prepare to die",
        "display_options": {
            "bio": true,
            "email_address": true,
            "user_image": false
        },
        "join_date": "2022-04-13",
        "subscription_level": "pro",
        "alerts": [
            "a685d9e4-d46b-4f61-9358-9c5c53d2efb5"
        ]
    },
    "top_example": {
        "id": "627596cc-ea70-40cd-b187-56dc733caf0a",
        "task_type": "string",
        "is_task_live": true,
        "modalities": [
            "string"
        ],
        "redacted": true,
        "num_possible_answers": 10.12,
        "evaluation_task_number": 10.12,
        "median_human_completion_seconds": 10.12,
        "median_ai_completion_seconds": 10.12,
        "num_times_human_evaluated": 10.12,
        "num_times_ai_evaluated": 10.12,
        "num_times_humans_answered_correctly": 10.12,
        "num_times_ai_answered_correctly": 10.12,
        "evaluation_id": "df4341dc-0d57-4043-8d26-6277b0fd47de",
        "owner_id": "301c3b95-f8fb-4022-86a0-6ba2273cedee",
        "tags": [
            "ccdc9aec-dfd2-4d8d-a953-fee364f64b4d"
        ]
    },
    "worst_example": null,
    "best_evaluation_session": {
        "id": "e538629a-7b80-48af-9564-c8d07477ab55",
        "datetime_started": "2022-04-13T15:42:05.901Z",
        "datetime_completed": "2022-04-13T15:42:05.901Z",
        "origin": "user",
        "completed": true,
        "failed": true,
        "is_human_being_evaluated": true,
        "num_questions_answered": 10.12,
        "num_answered_correctly": 10.12,
        "num_tasks_to_complete": 10.12,
        "num_endpoint_failures": 10.12,
        "num_endpoint_calls": 10.12,
        "num_characters_sent_to_endpoint": 10.12,
        "num_characters_received_from_endpoint": 10.12,
        "median_seconds_per_task": 10.12,
        "mean_seconds_per_task": 10.12,
        "std_seconds_per_task": 10.12,
        "distribution_of_seconds_per_task": null,
        "min_seconds_per_task": 10.12,
        "max_seconds_per_task": 10.12,
        "median_characters_per_task": 10.12,
        "mean_characters_per_task": 10.12,
        "std_characters_per_task": 10.12,
        "distribution_of_characters_per_task": null,
        "min_characters_per_task": 10.12,
        "max_characters_per_task": 10.12,
        "min_verbosity": 10.12,
        "max_verbosity": 10.12,
        "avg_verbosity": 10.12,
        "median_verbosity": 10.12,
        "evaluatee_id": "b9fc398f-5caf-4d85-bca3-9ee6b5b965d3",
        "evaluation_id": "29b9e5c3-e9e3-4c97-96f0-fd74d191a1d8"
    },
    "worst_evaluation_session": null
}
⚠️ This example has been generated automatically from the schema and it is not accurate. Refer to the schema for more information.

Schema of the response body
{
    "type": "object",
    "properties": {
        "id": {
            "type": "string",
            "format": "uuid"
        },
        "name": {
            "type": "string",
            "example": "my model"
        },
        "description": {
            "type": "string",
            "description": "The description of this model, as displayed on the site. Markdown can be used for formatting",
            "nullable": true,
            "example": "# This is a model, see more at [this link](http://some.link)"
        },
        "owner_id": {
            "type": "string",
            "format": "uuid"
        },
        "publisher": {
            "type": "string",
            "description": "The entity that created this model",
            "nullable": true,
            "example": "Models R Us"
        },
        "architecture": {
            "type": "string",
            "description": "The architecture of this model",
            "nullable": true,
            "example": "RNN"
        },
        "picture": {
            "type": "string",
            "description": "An url to an image representing this model",
            "nullable": true,
            "example": "http://some.example/pic"
        },
        "num_parameters": {
            "type": "integer",
            "format": "int64",
            "description": "The number of parameters of the model",
            "nullable": true,
            "example": 30000000
        },
        "modalities": {
            "type": "array",
            "items": {
                "type": "string"
            },
            "description": "The modalities accepted by this model",
            "enum": [
                "text"
            ],
            "example": "text"
        },
        "public": {
            "type": "boolean",
            "description": "Whether this evaluation should be publicly visible. If true, anyone can view its details."
        },
        "public_usable": {
            "type": "boolean",
            "description": "Whether this model can be tested by anyone. LLMs can cost a lot to run, and these costs are on whoever added the model. This setting is here to add an extra protection against people running up large compute costs on this model. When not set, this is `false`.",
            "example": false
        },
        "check_availability": {
            "type": "boolean",
            "description": "Whether the availability of this model should be checked. When true, we will ping the endpoint every ",
            "nullable": true
        },
        "quality": {
            "type": "number",
            "format": "double",
            "description": "The quality of this model, i.e. how much it's worth using, from 0 to 1. This is very subjective, and mainly used to decide whether it should be used by default e.g. on the frontpage.",
            "example": 0.89
        },
        "endpoint_type": {
            "type": "string",
            "description": "The type of endpoint being called. We have dedicated handlers for many of the most popular AI model providers",
            "enum": [
                "aws",
                "together.ai",
                "conversational",
                "google_cloud",
                "azure",
                "text-generation",
                "anthropic",
                "fill-mask",
                "zero-shot-classification",
                "custom",
                "open_ai",
                "text2text-generation",
                "mistral"
            ],
            "example": "open_ai"
        },
        "cost_per_input_character_usd": {
            "type": "number",
            "format": "double",
            "description": "The cost of a single input character in USD. We assume that a single token is 4 characters.",
            "example": 2e-05
        },
        "cost_per_output_character_usd": {
            "type": "number",
            "format": "double",
            "description": "The cost of a single output character in USD. We assume that a single token is 4 characters.",
            "example": 0.0005
        },
        "cost_per_instance_hour_usd": {
            "type": "number",
            "format": "double",
            "description": "The cost of running the model for an hour, in USD. This doesn't include input/output tokens - it's purely the server uptime. This is useful e.g. with HuggingFace inference endpoints, where they charge for server time, not for tokens throughput.",
            "example": 4.99
        },
        "max_characters_per_minute": {
            "type": "integer",
            "format": "int64",
            "description": "The maximum allowed number of characters per minute. We assume that one token is 4 characters. This must be at least 1.",
            "example": 400
        },
        "max_request_per_minute": {
            "type": "integer",
            "format": "int64",
            "description": "The maximum allowed number of requess per minute. This must be at least 1.",
            "example": 30
        },
        "max_context_window_characters": {
            "type": "integer",
            "format": "int64",
            "description": "The maximum number of characters allowed in the context window of this model. We assume that 1 token is 4 characters",
            "nullable": true,
            "example": 4096
        },
        "elo_score": {
            "type": "number",
            "format": "double",
            "description": "The ELO score, according to LLMSys",
            "nullable": true
        },
        "score": {
            "type": "number",
            "format": "double",
            "nullable": true
        },
        "availability": {
            "type": "number",
            "format": "double",
            "nullable": true
        },
        "top_example_id": {
            "type": "string",
            "format": "uuid",
            "nullable": true
        },
        "worst_example_id": {
            "type": "string",
            "format": "uuid",
            "nullable": true
        },
        "owner": {
            "$ref": "#/components/schemas/ShallowUser"
        },
        "top_example": {
            "$ref": "#/components/schemas/ShallowTask"
        },
        "worst_example": {
            "$ref": "#/components/schemas/ShallowTask"
        },
        "best_evaluation_session": {
            "$ref": "#/components/schemas/ShallowEvaluationSession"
        },
        "worst_evaluation_session": {
            "$ref": "#/components/schemas/ShallowEvaluationSession"
        }
    }
}

Refer to the common response description: Unauthorized.

Refer to the common response description: Unauthenticated.

Refer to the common response description: Error.


GET /model

Input parameters

Parameter In Type Default Nullable Description
id query string Yes Will return the item with this id, or a NotFound error if no such item exists. When this parameter is provided, only a single item will be returned
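
When `id` is omitted, the response is the paginated form shown in the schema below. A sketch of paging through all models follows; note that only the `id` parameter is documented here, so the `page` query parameter name is an assumption:

```python
import requests

BASE_URL = "https://equistamp.net"


def list_all_models():
    """Collect models across pages, using `count` from the paginated
    response to decide when to stop. The `page` parameter name is an
    assumption - only `id` is documented for this endpoint."""
    models, page = [], 0
    while True:
        res = requests.get(f"{BASE_URL}/model", params={"page": page})
        if res.status_code != 200:
            raise ValueError(res.json())
        body = res.json()
        models.extend(body["items"])
        if len(models) >= body["count"] or not body["items"]:
            return models
        page += 1
```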

Responses

Schema of the response body
{
    "oneOf": [
        {
            "$ref": "#/components/schemas/Model"
        },
        {
            "type": "object",
            "properties": {
                "items": {
                    "description": "An array of all the items that were found, but capped at most at `per_page`",
                    "type": "array",
                    "items": {
                        "$ref": "#/components/schemas/Model"
                    }
                },
                "count": {
                    "description": "The total number of items found",
                    "type": "number",
                    "format": "int32"
                },
                "per_page": {
                    "description": "The number of items returned per page",
                    "type": "number",
                    "format": "int32"
                },
                "page": {
                    "description": "The number of available pages",
                    "type": "number",
                    "format": "int32"
                }
            }
        }
    ]
}

Refer to the common response description: NotFound.

Refer to the common response description: Error.


PUT /model

Request body

{
    "name": "my model",
    "description": "# This is a model, see more at [this link](http://some.link)",
    "publisher": "Models R Us",
    "architecture": "RNN",
    "picture": "http://some.example/pic",
    "num_parameters": 30000000,
    "modalities": "text",
    "public": true,
    "public_usable": false,
    "check_availability": true,
    "endpoint_type": "open_ai",
    "setup_code": "(POST \"http://start.my.model\")",
    "teardown_code": "(POST \"http://start.my.model\")",
    "task_holding_queue_url": "string",
    "task_execution_queue_url": "string",
    "task_execution_dlq_url": "string",
    "lambda_arn": "string",
    "cost_per_input_character_usd": 2e-05,
    "cost_per_output_character_usd": 0.0005,
    "cost_per_instance_hour_usd": 4.99,
    "max_characters_per_minute": 400,
    "max_request_per_minute": 30,
    "max_context_window_characters": 4096
}
⚠️ This example has been generated automatically from the schema and it is not accurate. Refer to the schema for more information.

Schema of the request body
{
    "type": "object",
    "properties": {
        "name": {
            "type": "string",
            "example": "my model"
        },
        "description": {
            "type": "string",
            "description": "The description of this model, as displayed on the site. Markdown can be used for formatting",
            "nullable": true,
            "example": "# This is a model, see more at [this link](http://some.link)"
        },
        "publisher": {
            "type": "string",
            "description": "The entity that created this model",
            "nullable": true,
            "example": "Models R Us"
        },
        "architecture": {
            "type": "string",
            "description": "The architecture of this model",
            "nullable": true,
            "example": "RNN"
        },
        "picture": {
            "type": "string",
            "description": "An url to an image representing this model",
            "nullable": true,
            "example": "http://some.example/pic"
        },
        "num_parameters": {
            "type": "integer",
            "format": "int64",
            "description": "The number of parameters of the model",
            "nullable": true,
            "example": 30000000
        },
        "modalities": {
            "type": "array",
            "items": {
                "type": "string"
            },
            "description": "The modalities accepted by this model",
            "enum": [
                "text"
            ],
            "example": "text"
        },
        "public": {
            "type": "boolean",
            "description": "Whether this evaluation should be publicly visible. If true, anyone can view its details."
        },
        "public_usable": {
            "type": "boolean",
            "description": "Whether this model can be tested by anyone. LLMs can cost a lot to run, and these costs are on whoever added the model. This setting is here to add an extra protection against people running up large compute costs on this model. When not set, this is `false`.",
            "example": false
        },
        "check_availability": {
            "type": "boolean",
            "description": "Whether the availability of this model should be checked. When true, we will ping the endpoint every ",
            "nullable": true
        },
        "endpoint_type": {
            "type": "string",
            "description": "The type of endpoint being called. We have dedicated handlers for many of the most popular AI model providers",
            "enum": [
                "aws",
                "together.ai",
                "conversational",
                "google_cloud",
                "azure",
                "text-generation",
                "anthropic",
                "fill-mask",
                "zero-shot-classification",
                "custom",
                "open_ai",
                "text2text-generation",
                "mistral"
            ],
            "example": "open_ai"
        },
        "setup_code": {
            "type": "string",
            "description": "An optional piece of DSL code to be called if the model isn't running. This is useful when your model needs time to spin up - you can defined a call to start it here, which will be called once the model is first used.",
            "nullable": true,
            "example": "(POST \"http://start.my.model\")"
        },
        "teardown_code": {
            "type": "string",
            "description": "An optional piece of DSL code to be run after the model has finished all evaluation sessions. This is useful e.g. when your model is living on an AWS server, where you pay for uptime. You can defined a call to kill the instance, which will be called after no more evaluation sessions are running.",
            "nullable": true,
            "example": "(POST \"http://start.my.model\")"
        },
        "task_holding_queue_url": {
            "type": "string",
            "nullable": true
        },
        "task_execution_queue_url": {
            "type": "string",
            "nullable": true
        },
        "task_execution_dlq_url": {
            "type": "string",
            "nullable": true
        },
        "lambda_arn": {
            "type": "string",
            "nullable": true
        },
        "cost_per_input_character_usd": {
            "type": "number",
            "format": "double",
            "description": "The cost of a single input character in USD. We assume that a single token is 4 characters.",
            "example": 2e-05
        },
        "cost_per_output_character_usd": {
            "type": "number",
            "format": "double",
            "description": "The cost of a single output character in USD. We assume that a single token is 4 characters.",
            "example": 0.0005
        },
        "cost_per_instance_hour_usd": {
            "type": "number",
            "format": "double",
            "description": "The cost of running the model for an hour, in USD. This doesn't include input/output tokens - it's purely the server uptime. This is useful e.g. with HuggingFace inference endpoints, where they charge for server time, not for tokens throughput.",
            "example": 4.99
        },
        "max_characters_per_minute": {
            "type": "integer",
            "format": "int64",
            "description": "The maximum allowed number of characters per minute. We assume that one token is 4 characters. This must be at least 1.",
            "example": 400
        },
        "max_request_per_minute": {
            "type": "integer",
            "format": "int64",
            "description": "The maximum allowed number of requess per minute. This must be at least 1.",
            "example": 30
        },
        "max_context_window_characters": {
            "type": "integer",
            "format": "int64",
            "description": "The maximum number of characters allowed in the context window of this model. We assume that 1 token is 4 characters",
            "nullable": true,
            "example": 4096
        }
    }
}
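
A sketch of an update call follows. The request schema above doesn't show how the target model is identified, so passing its id as a query parameter is an assumption; the token and id are placeholders:

```python
import requests

BASE_URL = "https://equistamp.net"


def update_model(session_token, model_id, **changes):
    """PUT changed fields for an existing model. Identifying the model
    via an `id` query parameter is an assumption - the request schema
    doesn't document how the target model is selected."""
    res = requests.put(
        f"{BASE_URL}/model",
        headers={"Session-Token": session_token},
        params={"id": model_id},
        json=changes,
    )
    if res.status_code != 200:
        raise ValueError(res.json())
    # Per the response schema, the body is the string "Model updated"
    return res.json()
```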

Responses

"Model updated"
⚠️ This example has been generated automatically from the schema and it is not accurate. Refer to the schema for more information.

Schema of the response body
{
    "type": "string",
    "enum": [
        "Model updated"
    ]
}

Refer to the common response description: Unauthorized.

Refer to the common response description: Unauthenticated.

Refer to the common response description: NotFound.

Refer to the common response description: Error.


POST /modelsconnecter

Request body

{
    "evaluation_id": "b66b4389-4843-436c-919a-cc2bbde4c8ae",
    "evaluatee_id": "ca7047ce-a47e-4784-875b-ffb281131aea",
    "cadence": "string",
    "price": 10.12,
    "connections": [
        {
            "evaluation_id": "e29c81ce-92cb-4191-a98c-51d55b0527df",
            "evaluatee_id": "ec79c154-4a57-4e5b-b56d-30f72ab01efb",
            "cadence": "once",
            "price": 123,
            "name": "my wonderful model evaluation"
        }
    ]
}
⚠️ This example has been generated automatically from the schema and it is not accurate. Refer to the schema for more information.

Schema of the request body
{
    "type": "object",
    "properties": {
        "evaluation_id": {
            "type": "string",
            "format": "uuid"
        },
        "evaluatee_id": {
            "type": "string",
            "format": "uuid"
        },
        "cadence": {
            "type": "string",
            "nullable": true
        },
        "price": {
            "type": "number",
            "format": "int64"
        },
        "connections": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "evaluation_id": {
                        "type": "string",
                        "format": "uuid",
                        "description": "The id of the evaluation to be run"
                    },
                    "evaluatee_id": {
                        "type": "string",
                        "format": "uuid",
                        "description": "The id of the model to be evaluated"
                    },
                    "cadence": {
                        "type": "string",
                        "enum": [
                            "daily",
                            "quarterly",
                            "once",
                            "every 2 weeks",
                            "weekly",
                            "monthly"
                        ],
                        "example": "once",
                        "description": "How often this evaluation should be run on this model"
                    },
                    "price": {
                        "type": "number",
                        "format": "int64",
                        "min": 100,
                        "example": 123,
                        "description": "The price to run a single evaluation on this model. This is the price you expect to pay in cents - if the actual cost will be larger - e.g. if the evaluation has more tasks added, or the model has its pricing updated - then an error will be raised, so you don't get hit with hidden costs"
                    },
                    "name": {
                        "type": "string",
                        "description": "A string identifier for this connection - used for displaying line items in Stripe",
                        "example": "my wonderful model evaluation"
                    }
                }
            }
        }
    }
}
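
A small sketch of building and validating one entry of the `connections` array client-side, under the constraints in the schema (price in cents, minimum 100). The helper name is hypothetical:

```python
# Values taken from the schema above; the builder itself is illustrative.
CADENCES = {"daily", "quarterly", "once", "every 2 weeks", "weekly", "monthly"}
MIN_PRICE_CENTS = 100

def build_connection(evaluation_id: str, evaluatee_id: str,
                     cadence: str = "once", price_cents: int = MIN_PRICE_CENTS,
                     name: str = "") -> dict:
    """Build one entry of the `connections` array, validating locally first."""
    if cadence not in CADENCES:
        raise ValueError(f"unknown cadence: {cadence!r}")
    if price_cents < MIN_PRICE_CENTS:
        raise ValueError("price is in cents and must be at least 100")
    return {
        "evaluation_id": evaluation_id,
        "evaluatee_id": evaluatee_id,
        "cadence": cadence,
        "price": price_cents,
        "name": name,
    }
```

Validating locally mirrors the server's behaviour of rejecting under-priced connections, so errors surface before any request is sent.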

Responses

{
    "id": "9578841c-8225-4226-9825-191e2388178c",
    "evaluation_id": "3b299532-6d74-47c8-bb0c-f048040d364a",
    "evaluatee_id": "9ea81133-9c6f-4c65-b4bc-1ace62a3e561",
    "cadence": "string",
    "price": 10.12,
    "model": {
        "id": "276d9d54-f514-4509-8690-57ae98627c69",
        "name": "my model",
        "description": "# This is a model, see more at [this link](http://some.link)",
        "owner_id": "636ac7d7-7dd4-4ee2-9a69-b8e2bf5f3332",
        "publisher": "Models R Us",
        "architecture": "RNN",
        "picture": "http://some.example/pic",
        "num_parameters": 30000000,
        "modalities": "text",
        "public": true,
        "public_usable": false,
        "check_availability": true,
        "quality": 0.89,
        "endpoint_type": "open_ai",
        "cost_per_input_character_usd": 2e-05,
        "cost_per_output_character_usd": 0.0005,
        "cost_per_instance_hour_usd": 4.99,
        "max_characters_per_minute": 400,
        "max_request_per_minute": 30,
        "max_context_window_characters": 4096,
        "elo_score": 10.12,
        "score": 10.12,
        "availability": 10.12,
        "top_example_id": "3432730e-756a-4d5e-81dd-c9166b97330c",
        "worst_example_id": "000690ea-8a19-4627-a4c0-c1734778e452",
        "owner": "23a1154b-0f04-412f-afb0-04a1b7768f8b",
        "top_example": "c4867b79-f39b-4276-805e-1dbd4c406e82",
        "worst_example": "a3ea7580-07a4-4e66-9d08-250cb4636d14",
        "best_evaluation_session": "671cbe01-5308-4a75-8c60-c23c115551f5",
        "worst_evaluation_session": "8f6dc477-af40-465e-9e9d-6fdb620d810a"
    },
    "evaluation": {
        "id": "667911fc-7bd8-4c6d-94eb-29e8e179fd6b",
        "name": "My lovely evaluation",
        "public": true,
        "public_usable": false,
        "reports_visible": false,
        "quality": 0.89,
        "num_tasks": 2000,
        "description": "# This is an evaluation, see more at [this link](http://some.link)",
        "last_updated": "2022-04-13T15:42:05.901Z",
        "task_types": "MCQ",
        "modalities": "text",
        "min_questions_to_complete": 321,
        "owner": "ce16ae57-2a1a-488b-9b57-15ee5e68abd5",
        "tags": [
            "a893caa0-f12d-44ee-a3b9-c6bfe2bef0e5"
        ]
    }
}
⚠️ This example has been generated automatically from the schema and it is not accurate. Refer to the schema for more information.

Schema of the response body
{
    "type": "object",
    "properties": {
        "id": {
            "type": "string",
            "format": "uuid"
        },
        "evaluation_id": {
            "type": "string",
            "format": "uuid"
        },
        "evaluatee_id": {
            "type": "string",
            "format": "uuid"
        },
        "cadence": {
            "type": "string",
            "nullable": true
        },
        "price": {
            "type": "number",
            "format": "int64"
        },
        "model": {
            "$ref": "#/components/schemas/ShallowModel"
        },
        "evaluation": {
            "$ref": "#/components/schemas/ShallowEvaluation"
        }
    }
}

Refer to the common response description: Unauthorized.

Refer to the common response description: Unauthenticated.

Refer to the common response description: Error.


GET /modelsconnecter

Input parameters

Parameter In Type Default Nullable Description
id query string Yes Will return the item with this id, or a NotFound error if it doesn't exist. When this parameter is provided, only a single item is returned

Responses

Schema of the response body
{
    "oneOf": [
        {
            "$ref": "#/components/schemas/EvaluationEvaluatee"
        },
        {
            "type": "object",
            "properties": {
                "items": {
                    "description": "An array of all the items that were found, but capped at most at `per_page`",
                    "type": "array",
                    "items": {
                        "$ref": "#/components/schemas/EvaluationEvaluatee"
                    }
                },
                "count": {
                    "description": "The total number of items found",
                    "type": "number",
                    "format": "int32"
                },
                "per_page": {
                    "description": "The number of items returned per page",
                    "type": "number",
                    "format": "int32"
                },
                "page": {
                    "description": "The number of available pages",
                    "type": "number",
                    "format": "int32"
                }
            }
        }
    ]
}
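
The paginated shape above can be walked with a small iterator. This is a sketch only: the page-selection mechanism is not documented here, so `fetch_page` is left as a caller-supplied function that returns one page of results in the shape of the second `oneOf` branch.

```python
from typing import Callable, Iterator

def iter_all_items(fetch_page: Callable[[int], dict]) -> Iterator[dict]:
    """Yield every item from a paginated response shaped like the schema above.

    `fetch_page(page)` must return a dict with `items` and `count` keys.
    """
    page = 0
    seen = 0
    while True:
        data = fetch_page(page)
        items = data.get("items", [])
        if not items:
            return
        yield from items
        seen += len(items)
        if seen >= data.get("count", 0):
            return  # all `count` items have been seen
        page += 1
```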

Refer to the common response description: Unauthorized.

Refer to the common response description: Unauthenticated.

Refer to the common response description: NotFound.

Refer to the common response description: Error.


POST /queryexternalmodelhandler

Run a task on a model.

Description

This endpoint can be called either as part of an evaluation session, or on its own.

If evaluation_session_id is provided, it will run the task as part of that evaluation session. Each evaluation session has a set number of tasks to evaluate, so if you call this endpoint for a finished evaluation session, you will get an error.

If no evaluation_session_id is provided, the model will be called with the provided task. This is a paid operation, and will subtract the appropriate amount of credits from your account, or raise a 402 if you don't have enough.

You can override the default request and response code of models you administer, and the prompt and grader code of evaluations you administer.
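
The override precedence described below (a task-type-specific entry beats the "default" entry, which beats the stored code) can be sketched as a small resolver; the function name is illustrative, not part of the API:

```python
def resolve_dsl(override, task_type: str, stored_code: str) -> str:
    """Pick the DSL snippet to use for a task, following the precedence rules above."""
    if override is None:
        return stored_code   # no override: use the model's/evaluation's own code
    if isinstance(override, str):
        return override      # a single string applies to every task type
    # Per-task-type dict: the specific key wins, then "default", then the stored
    # code. Empty strings fall through, matching "If this is empty, the default
    # will be used" in the schema descriptions.
    return override.get(task_type) or override.get("default") or stored_code
```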

Request body

{
    "response_time_in_seconds": 10.12,
    "task_id": "5a783b42-8b3b-46dc-bc16-697df56bbe2f",
    "evaluation_session_id": "b95bdcd4-cf93-4f6f-8cdf-a1fd7db57e30",
    "model_id": "c98b0ca2-ff29-4dbe-bdeb-9fd4d0b2fdf0",
    "system_prompt": "(str \"Please answer this: \" task)",
    "prompt": {
        "MCQ": "(str \"I have a multiple choice question for you to answer: \" task)",
        "default": "(str \"Answer this, please: \" task)"
    },
    "request": {
        "MCQ": "(bedrock-call \"your-access-key\" \"your-secret-key\" \"Jurassic\" task-text)",
        "default": "false"
    },
    "response": {
        "MCQ": "(= parsedResponse correct)",
        "default": "false"
    },
    "grader": {
        "MCQ": "(= parsedResponse correct)",
        "default": "false"
    }
}
⚠️ This example has been generated automatically from the schema and it is not accurate. Refer to the schema for more information.

Schema of the request body
{
    "type": "object",
    "properties": {
        "response_time_in_seconds": {
            "type": "number",
            "format": "double",
            "nullable": true
        },
        "task_id": {
            "description": "The id of the task to be run on the model",
            "type": "string",
            "format": "uuid"
        },
        "evaluation_session_id": {
            "description": "The id of the evaluation session that is being checked",
            "type": "string",
            "format": "uuid"
        },
        "model_id": {
            "description": "The id of the model that is being evaluation",
            "type": "string",
            "format": "uuid"
        },
        "system_prompt": {
            "description": "DSL code specifying how to construct model system prompts. This can be empty.",
            "type": "string",
            "example": "(str \"Please answer this: \" task)"
        },
        "prompt": {
            "description": "DSL code specifying how to construct model prompts. This can be empty, in which case the prompt code of the evaluation will be used. You can specify a `prompt` that will be used for all types of tasks, or per task type `prompt`s. If you provide both a default `prompt` and one for a specific task type, the specific one takes precedence. This can only be used if you're an admin of the selected evaluation - otherwise an error will be returned.",
            "oneOf": [
                {
                    "type": "string",
                    "description": "DSL code that should be used for all prompts",
                    "example": "(str \"Please answer this: \" task)"
                },
                {
                    "type": "object",
                    "description": "Per task type DSL code. Use the \"default\" key to specify the code that should be used for tasks types that aren't specified - otherwise the system default prompt will be used.",
                    "properties": {
                        "default": {
                            "type": "string",
                            "description": "The default `prompt` to be used for task types that aren't specified.",
                            "example": "(str \"Answer this, please: \" task)"
                        },
                        "FRQ": {
                            "type": "string",
                            "description": "The DSL code to be used to create prompts for FRQ tasks. If this is empty, the default `prompt` will be used"
                        },
                        "bool": {
                            "type": "string",
                            "description": "The DSL code to be used to create prompts for bool tasks. If this is empty, the default `prompt` will be used"
                        },
                        "json": {
                            "type": "string",
                            "description": "The DSL code to be used to create prompts for json tasks. If this is empty, the default `prompt` will be used"
                        },
                        "MCQ": {
                            "type": "string",
                            "description": "The DSL code to be used to create prompts for MCQ tasks. If this is empty, the default `prompt` will be used"
                        }
                    },
                    "example": {
                        "MCQ": "(str \"I have a multiple choice question for you to answer: \" task)",
                        "default": "(str \"Answer this, please: \" task)"
                    }
                }
            ],
            "example": {
                "MCQ": "(str \"I have a multiple choice question for you to answer: \" task)",
                "default": "(str \"Answer this, please: \" task)"
            }
        },
        "request": {
            "description": "DSL code specifying how to send tasks to the model. This can be empty, in which case the request code of the model will be used. You can specify a `request` that will be used for all types of tasks, or per task type `request`s. If you provide both a default `request` and one for a specific task type, the specific one takes precedence. This can only be used if you're an admin of the selected model - otherwise an error will be returned.",
            "oneOf": [
                {
                    "type": "string",
                    "description": "DSL code that should be used for all requests",
                    "example": "(POST \"http://my.model.endpoint\" {:json {\"task\" task}})"
                },
                {
                    "type": "object",
                    "description": "Per task type DSL code. Use the \"default\" key to specify the code that should be used for tasks types that aren't specified - otherwise the system default request code will be used.",
                    "properties": {
                        "default": {
                            "type": "string",
                            "description": "The default `request` to be used for task types that aren't specified.",
                            "example": "(openai-call \"your_key\" \"gtp-4\" task)"
                        },
                        "FRQ": {
                            "type": "string",
                            "description": "The DSL code to be used to send requests for FRQ tasks. If this is empty, the default `request` will be used"
                        },
                        "bool": {
                            "type": "string",
                            "description": "The DSL code to be used to send requests for bool tasks. If this is empty, the default `request` will be used"
                        },
                        "json": {
                            "type": "string",
                            "description": "The DSL code to be used to send requests for json tasks. If this is empty, the default `request` will be used"
                        },
                        "MCQ": {
                            "type": "string",
                            "description": "The DSL code to be used to send requests for MCQ tasks. If this is empty, the default `request` will be used"
                        }
                    },
                    "example": {
                        "MCQ": "(openai-call \"sk-your-secret-key\" \"gtp-4-turbo\" task-text)",
                        "default": "(anthropic-call \"sk-your-secret-key\" \"claude\" task)"
                    }
                }
            ],
            "example": {
                "MCQ": "(bedrock-call \"your-access-key\" \"your-secret-key\" \"Jurassic\" task-text)",
                "default": "false"
            }
        },
        "response": {
            "description": "DSL code specifying how to parse LLM responses. This can be empty, in which case the response code of the model will be used. You can specify a `response` parser that will be used for all types of tasks, or per task type parsers. If you provide both a default parser and one for a specific task type, the specific one takes precedence. This can only be used if you're an admin of the selected model - otherwise an error will be returned.",
            "oneOf": [
                {
                    "type": "string",
                    "description": "DSL code that should be used for all responses",
                    "example": "(get-in response [\"json\" \"resp\"])"
                },
                {
                    "type": "object",
                    "description": "Per task type DSL code. Use the \"default\" key to specify the code that should be used for tasks types that aren't specified - otherwise the model's default parser will be used.",
                    "properties": {
                        "default": {
                            "type": "string",
                            "description": "The default parser to be used for task types that aren't specified.",
                            "example": "response"
                        },
                        "FRQ": {
                            "type": "string",
                            "description": "The DSL code to be used to parse FRQ task responses. If this is empty, the default parser will be used"
                        },
                        "bool": {
                            "type": "string",
                            "description": "The DSL code to be used to parse bool task responses. If this is empty, the default parser will be used"
                        },
                        "json": {
                            "type": "string",
                            "description": "The DSL code to be used to parse json task responses. If this is empty, the default parser will be used"
                        },
                        "MCQ": {
                            "type": "string",
                            "description": "The DSL code to be used to parse MCQ task responses. If this is empty, the default parser will be used"
                        }
                    },
                    "example": {
                        "MCQ": "(= parsedResponse correct)",
                        "default": "false"
                    }
                }
            ],
            "example": {
                "MCQ": "(= parsedResponse correct)",
                "default": "false"
            }
        },
        "grader": {
            "description": "DSL code specifying how to grade LLM responses. This can be empty, in which case the grader of the evaluation will be used. You can specify a grader that will be used for all types of tasks, or per task type graders. If you provide both a default grader and one for a specific task type, the specific one takes precedence. This can only be used if you're an admin of the selected evaluation - otherwise an error will be returned.",
            "oneOf": [
                {
                    "type": "string",
                    "description": "DSL code that should be used for all response",
                    "example": "(= parsedResponse \"ok\")"
                },
                {
                    "type": "object",
                    "description": "Per task type DSL code. Use the \"default\" key to specify the code that should be used for tasks types that aren't specified - otherwise the grader of the evaluation will be used.",
                    "properties": {
                        "default": {
                            "type": "string",
                            "description": "The default grader to be used for task types that aren't specified.",
                            "example": "(if (= parsedResponse correct) 1 0)"
                        },
                        "FRQ": {
                            "type": "string",
                            "description": "The DSL code to be used to grade FRQ tasks. If this is empty, the default grader will be used"
                        },
                        "bool": {
                            "type": "string",
                            "description": "The DSL code to be used to grade bool tasks. If this is empty, the default grader will be used"
                        },
                        "json": {
                            "type": "string",
                            "description": "The DSL code to be used to grade json tasks. If this is empty, the default grader will be used"
                        },
                        "MCQ": {
                            "type": "string",
                            "description": "The DSL code to be used to grade MCQ tasks. If this is empty, the default grader will be used"
                        }
                    },
                    "example": {
                        "MCQ": "(= parsedResponse correct)",
                        "default": "false"
                    }
                }
            ],
            "example": {
                "MCQ": "(= parsedResponse correct)",
                "default": "false"
            }
        }
    }
}

Responses

{
    "id": "dac5244b-b2c4-4384-b090-4c177921c2d3",
    "raw_task_text": "string",
    "raw_response_text": "string",
    "parsed_response_text": "string",
    "response_time_in_seconds": 10.12,
    "correctness": 10.12,
    "task_id": "0cc718da-9881-4a6a-9ce0-15e71163c608",
    "evaluatee_id": "3deebcde-982e-4364-8d8d-1fec093c2b48",
    "chosen_answer_id": "8a9ef395-5ff6-469f-9202-b19040c3e63c",
    "evaluation_session_id": "6cf3e2d3-1f84-4c93-a424-a8376b6cafc9",
    "creation_date": "2022-04-13T15:42:05.901Z"
}
⚠️ This example has been generated automatically from the schema and it is not accurate. Refer to the schema for more information.

Schema of the response body
{
    "type": "object",
    "properties": {
        "id": {
            "type": "string",
            "format": "uuid"
        },
        "raw_task_text": {
            "type": "string",
            "nullable": true
        },
        "raw_response_text": {
            "type": "string",
            "nullable": true
        },
        "parsed_response_text": {
            "type": "string",
            "nullable": true
        },
        "response_time_in_seconds": {
            "type": "number",
            "format": "double",
            "nullable": true
        },
        "correctness": {
            "type": "number",
            "format": "double",
            "nullable": true
        },
        "task_id": {
            "type": "string",
            "format": "uuid"
        },
        "evaluatee_id": {
            "type": "string",
            "format": "uuid"
        },
        "chosen_answer_id": {
            "type": "string",
            "format": "uuid",
            "nullable": true
        },
        "evaluation_session_id": {
            "type": "string",
            "format": "uuid"
        },
        "creation_date": {
            "type": "string",
            "format": "date-time"
        }
    }
}

Refer to the common response description: Unauthorized.

Refer to the common response description: Unauthenticated.

Refer to the common response description: Error.


POST /response

Request body

{
    "response_time_in_seconds": 10.12,
    "task_id": "68e12e78-da68-450a-a84a-828c4455128a",
    "evaluation_session_id": "a7ba235b-1f57-4f7c-9e5e-83a684ab1904",
    "task_type": "MCQ",
    "question": "What time is it?",
    "answer_text": "Half past nine",
    "answer_id": "23aa466b-55cd-466f-b2f4-3ea1611c9a2b"
}
⚠️ This example has been generated automatically from the schema and it is not accurate. Refer to the schema for more information.

Schema of the request body
{
    "type": "object",
    "properties": {
        "response_time_in_seconds": {
            "type": "number",
            "format": "double",
            "nullable": true
        },
        "task_id": {
            "type": "string",
            "format": "uuid"
        },
        "evaluation_session_id": {
            "type": "string",
            "format": "uuid"
        },
        "task_type": {
            "description": "The type of tasks for which this is a response",
            "example": "MCQ",
            "type": "string",
            "enum": [
                "FRQ",
                "bool",
                "json",
                "MCQ"
            ]
        },
        "question": {
            "type": "string",
            "description": "The text of the question for which this is a response",
            "example": "What time is it?"
        },
        "answer_text": {
            "type": "string",
            "description": "The text returned from the model",
            "example": "Half past nine"
        },
        "answer_id": {
            "type": "string",
            "format": "uuid",
            "nullable": true,
            "description": "The id of the selected answer, in the case of multiple choice questions"
        }
    }
}

Responses

{
    "id": "43663d9c-bf7d-4469-897d-891d415ab2b9",
    "raw_task_text": "string",
    "raw_response_text": "string",
    "parsed_response_text": "string",
    "response_time_in_seconds": 10.12,
    "correctness": 10.12,
    "task_id": "7e43ce8d-44f3-4a0c-a589-f222a6d4032b",
    "evaluatee_id": "b0e2fc7d-7312-451e-8382-b7cec800f013",
    "chosen_answer_id": "a01104af-ba21-409d-93e9-d9fcda9c6454",
    "evaluation_session_id": "b22c2e05-27e2-4df1-a80b-c385009ff118",
    "creation_date": "2022-04-13T15:42:05.901Z"
}
⚠️ This example has been generated automatically from the schema and it is not accurate. Refer to the schema for more information.

Schema of the response body
{
    "type": "object",
    "properties": {
        "id": {
            "type": "string",
            "format": "uuid"
        },
        "raw_task_text": {
            "type": "string",
            "nullable": true
        },
        "raw_response_text": {
            "type": "string",
            "nullable": true
        },
        "parsed_response_text": {
            "type": "string",
            "nullable": true
        },
        "response_time_in_seconds": {
            "type": "number",
            "format": "double",
            "nullable": true
        },
        "correctness": {
            "type": "number",
            "format": "double",
            "nullable": true
        },
        "task_id": {
            "type": "string",
            "format": "uuid"
        },
        "evaluatee_id": {
            "type": "string",
            "format": "uuid"
        },
        "chosen_answer_id": {
            "type": "string",
            "format": "uuid",
            "nullable": true
        },
        "evaluation_session_id": {
            "type": "string",
            "format": "uuid"
        },
        "creation_date": {
            "type": "string",
            "format": "date-time"
        }
    }
}

Refer to the common response description: Unauthorized.

Refer to the common response description: Unauthenticated.

Refer to the common response description: Error.


GET /response

Input parameters

Parameter In Type Default Nullable Description
id query string Yes Will return the item with this id, or a NotFound error if it doesn't exist. When this parameter is provided, only a single item is returned

Responses

Schema of the response body
{
    "oneOf": [
        {
            "$ref": "#/components/schemas/Response"
        },
        {
            "type": "object",
            "properties": {
                "items": {
                    "description": "An array of all the items that were found, but capped at most at `per_page`",
                    "type": "array",
                    "items": {
                        "$ref": "#/components/schemas/Response"
                    }
                },
                "count": {
                    "description": "The total number of items found",
                    "type": "number",
                    "format": "int32"
                },
                "per_page": {
                    "description": "The number of items returned per page",
                    "type": "number",
                    "format": "int32"
                },
                "page": {
                    "description": "The number of available pages",
                    "type": "number",
                    "format": "int32"
                }
            }
        }
    ]
}

Refer to the common response description: NotFound.

Refer to the common response description: Error.


PUT /response

Request body

{
    "response_time_in_seconds": 10.12,
    "task_id": "62a47814-8a55-4f97-833e-eae15d380fad",
    "evaluation_session_id": "f95abe3c-bb53-4a74-80df-7dd8d7c4215b"
}
⚠️ This example has been generated automatically from the schema and it is not accurate. Refer to the schema for more information.

Schema of the request body
{
    "type": "object",
    "properties": {
        "response_time_in_seconds": {
            "type": "number",
            "format": "double",
            "nullable": true
        },
        "task_id": {
            "type": "string",
            "format": "uuid"
        },
        "evaluation_session_id": {
            "type": "string",
            "format": "uuid"
        }
    }
}

Responses

"Response updated"
⚠️ This example has been generated automatically from the schema and it is not accurate. Refer to the schema for more information.

Schema of the response body
{
    "type": "string",
    "enum": [
        "Response updated"
    ]
}

Refer to the common response description: Unauthorized.

Refer to the common response description: Unauthenticated.

Refer to the common response description: NotFound.

Refer to the common response description: Error.


GET /scores

Input parameters

Parameter In Type Default Nullable Description
id query string Yes Will return the item with this id, or a NotFound error if it doesn't exist. When this parameter is provided, only a single item is returned

Responses

Schema of the response body
{
    "oneOf": [
        {
            "$ref": "#/components/schemas/CurrentScores"
        },
        {
            "type": "object",
            "properties": {
                "items": {
                    "description": "An array of all the items that were found, but capped at most at `per_page`",
                    "type": "array",
                    "items": {
                        "$ref": "#/components/schemas/CurrentScores"
                    }
                },
                "count": {
                    "description": "The total number of items found",
                    "type": "number",
                    "format": "int32"
                },
                "per_page": {
                    "description": "The number of items returned per page",
                    "type": "number",
                    "format": "int32"
                },
                "page": {
                    "description": "The number of available pages",
                    "type": "number",
                    "format": "int32"
                }
            }
        }
    ]
}

Refer to the common response description: NotFound.

Refer to the common response description: Error.


POST /schema

Request body

{
    "key": "my-schema",
    "name": "My schema",
    "description": "This is a description. Nice, innit?",
    "type": "json",
    "schema": "{\"$schema\": \"http://json-schema.org/draft-07/schema#\", \"title\": \"JSON parser\", \"type\": \"object\", \"properties\": {\"name\": {\"type\": \"string\"}}}",
    "evaluation_id": "6c4eb8b6-501e-4bc4-b7a4-75451e359d55"
}
⚠️ This example has been generated automatically from the schema and may not be accurate. Refer to the schema for more information.

Schema of the request body
{
    "type": "object",
    "properties": {
        "key": {
            "description": "The key of this schema, as used in csv file upload references. Reference keys can contain English letters (upper and lowercase), digits and \"-\", \"_\", and \".\"",
            "type": "string",
            "example": "my-schema"
        },
        "name": {
            "description": "The name of this schema, used only for display purposes.",
            "type": "string",
            "example": "My schema"
        },
        "description": {
            "description": "The name of this schema, used only for display purposes.",
            "type": "string",
            "example": "This is a description. Nice, innit?"
        },
        "type": {
            "description": "The type of the new schema",
            "example": "json",
            "type": "string",
            "enum": [
                "json"
            ]
        },
        "schema": {
            "description": "A schema to validate answers against.",
            "example": "{\"$schema\": \"http://json-schema.org/draft-07/schema#\", \"title\": \"JSON parser\", \"type\": \"object\", \"properties\": {\"name\": {\"type\": \"string\"}}}",
            "type": "object"
        },
        "evaluation_id": {
            "description": "The id of the evaluation that this schema is for",
            "type": "string",
            "format": "uuid"
        }
    }
}
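Following the `/auth` example above, a schema can be created by POSTing the documented fields with a session token. This is a sketch: the `Session-Token` header and the 200-on-success check mirror the earlier example and are assumptions, not part of this endpoint's spec.

```python
import requests

API = "https://equistamp.net"

def create_schema(session_token, payload):
    """POST a new answer schema; returns the created record on success."""
    res = requests.post(
        f"{API}/schema",
        headers={"Session-Token": session_token},
        json=payload,
    )
    if res.status_code != 200:  # assumed success status
        raise ValueError(res.json())
    return res.json()

payload = {
    "key": "my-schema",
    "name": "My schema",
    "description": "Validates model answers as JSON objects",
    "type": "json",
    # The schema itself is passed as a serialized JSON Schema document:
    "schema": (
        '{"$schema": "http://json-schema.org/draft-07/schema#", '
        '"type": "object", "properties": {"name": {"type": "string"}}}'
    ),
    "evaluation_id": "6c4eb8b6-501e-4bc4-b7a4-75451e359d55",
}
# created = create_schema(session_token, payload)
```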

Responses

{
    "key": "My-lovely-schema",
    "name": "My lovely schema",
    "description": "This will be used to check stuff",
    "evaluation_id": "6b11e6f8-5df8-4310-9417-c69740aba967",
    "id": "27d6d87e-f9e8-47bb-a14b-526b786e814b"
}
⚠️ This example has been generated automatically from the schema and may not be accurate. Refer to the schema for more information.

Schema of the response body
{
    "type": "object",
    "properties": {
        "key": {
            "type": "string",
            "description": "The identifier used in csv files for this schema",
            "nullable": true,
            "example": "My-lovely-schema"
        },
        "name": {
            "type": "string",
            "description": "An optional name describing this schema",
            "nullable": true,
            "example": "My lovely schema"
        },
        "description": {
            "type": "string",
            "description": "An optional description of this schema",
            "nullable": true,
            "example": "This will be used to check stuff"
        },
        "evaluation_id": {
            "type": "string",
            "format": "uuid"
        },
        "id": {
            "type": "string",
            "format": "uuid"
        }
    }
}

Refer to the common response description: Unauthorized.

Refer to the common response description: Unauthenticated.

Refer to the common response description: Error.


GET /schema

Input parameters

Parameter In Type Default Nullable Description
id query string Yes Returns the item with this id. When this parameter is provided, only a single item will be returned

Responses

Schema of the response body
{
    "oneOf": [
        {
            "$ref": "#/components/schemas/SchemaHistory"
        },
        {
            "type": "object",
            "properties": {
                "items": {
                    "description": "An array of all the items that were found, but capped at most at `per_page`",
                    "type": "array",
                    "items": {
                        "$ref": "#/components/schemas/SchemaHistory"
                    }
                },
                "count": {
                    "description": "The total number of items found",
                    "type": "number",
                    "format": "int32"
                },
                "per_page": {
                    "description": "The number of items returned per page",
                    "type": "number",
                    "format": "int32"
                },
                "page": {
                    "description": "The number of available pages",
                    "type": "number",
                    "format": "int32"
                }
            }
        }
    ]
}

Refer to the common response description: NotFound.

Refer to the common response description: Error.


PUT /schema

Request body

{
    "key": "My-lovely-schema",
    "name": "My lovely schema",
    "description": "This will be used to check stuff"
}
⚠️ This example has been generated automatically from the schema and may not be accurate. Refer to the schema for more information.

Schema of the request body
{
    "type": "object",
    "properties": {
        "key": {
            "type": "string",
            "description": "The identifier used in csv files for this schema",
            "nullable": true,
            "example": "My-lovely-schema"
        },
        "name": {
            "type": "string",
            "description": "An optional name describing this schema",
            "nullable": true,
            "example": "My lovely schema"
        },
        "description": {
            "type": "string",
            "description": "An optional description of this schema",
            "nullable": true,
            "example": "This will be used to check stuff"
        }
    }
}
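An update sketch follows. Note that this endpoint's documentation does not show how the target record is addressed; the `id` query parameter used here is purely an assumption.

```python
import requests

API = "https://equistamp.net"

def update_schema(session_token, schema_id, changes):
    """PUT partial updates to a schema.

    NOTE: selecting the record via an `id` query parameter is an
    assumption -- the addressing mechanism is not documented above.
    """
    res = requests.put(
        f"{API}/schema",
        headers={"Session-Token": session_token},
        params={"id": schema_id},
        json=changes,
    )
    if res.status_code != 200:  # assumed success status
        raise ValueError(res.json())
    return res.json()  # the documented body is "SchemaHistory updated"

changes = {
    "name": "My lovely schema",
    "description": "This will be used to check stuff",
}
# update_schema(session_token, "27d6d87e-f9e8-47bb-a14b-526b786e814b", changes)
```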

Responses

"SchemaHistory updated"
⚠️ This example has been generated automatically from the schema and may not be accurate. Refer to the schema for more information.

Schema of the response body
{
    "type": "string",
    "enum": [
        "SchemaHistory updated"
    ]
}

Refer to the common response description: Unauthorized.

Refer to the common response description: Unauthenticated.

Refer to the common response description: NotFound.

Refer to the common response description: Error.


POST /subscription

Request body

{
    "confirmed": true,
    "type": "alert",
    "item": "bb7ca0bd-0a81-45e1-94e4-7129324bd6af",
    "method": "email",
    "destination": "(GET \"http://example.com\")"
}
⚠️ This example has been generated automatically from the schema and may not be accurate. Refer to the schema for more information.

Schema of the request body
{
    "type": "object",
    "properties": {
        "confirmed": {
            "type": "boolean",
            "nullable": true
        },
        "type": {
            "description": "The type of object to subscribe to",
            "example": "alert",
            "type": "string",
            "enum": [
                "alert",
                "evaluation_session"
            ]
        },
        "item": {
            "description": "The id of the item to subscribe to",
            "type": "string",
            "format": "uuid"
        },
        "method": {
            "description": "The method used to notify",
            "type": "string",
            "example": "email",
            "enum": [
                "email",
                "webhook",
                "sms",
                "call"
            ]
        },
        "destination": {
            "description": "The destination to which messages should be sent. In the case of email methods this must be a valid email. For text messages and calls a valid phone number. In the case of webhooks, this should be a DSL network call.",
            "type": "string",
            "example": "(GET \"http://example.com\")"
        }
    }
}
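For example, to be notified by email when an alert fires (a sketch; the session-token auth follows the `/auth` example, and the field values are the ones documented above):

```python
import requests

API = "https://equistamp.net"

def subscribe(session_token, payload):
    """Subscribe to an alert or evaluation session."""
    res = requests.post(
        f"{API}/subscription",
        headers={"Session-Token": session_token},
        json=payload,
    )
    if res.status_code != 200:  # assumed success status
        raise ValueError(res.json())
    return res.json()

payload = {
    "type": "alert",                                 # or "evaluation_session"
    "item": "bb7ca0bd-0a81-45e1-94e4-7129324bd6af",  # id of the alert
    "method": "email",                               # email | webhook | sms | call
    "destination": "mr.blobby@some.domain",          # must match the method
}
# subscription = subscribe(session_token, payload)
```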

Responses

{
    "confirmed": true,
    "method": "string",
    "destination": "string"
}
⚠️ This example has been generated automatically from the schema and may not be accurate. Refer to the schema for more information.

Schema of the response body
{
    "type": "object",
    "properties": {
        "confirmed": {
            "type": "boolean",
            "nullable": true
        },
        "method": {
            "type": "string"
        },
        "destination": {
            "type": "string"
        }
    }
}

Refer to the common response description: Unauthorized.

Refer to the common response description: Unauthenticated.

Refer to the common response description: Error.


GET /subscription

Input parameters

Parameter In Type Default Nullable Description
id query string Yes Returns the item with this id. When this parameter is provided, only a single item will be returned
item query string Yes The id of the item that was subscribed to
item_type query string Yes The type of subscriptions to look for.

Responses

Schema of the response body
{
    "oneOf": [
        {
            "$ref": "#/components/schemas/Subscriber"
        },
        {
            "type": "object",
            "properties": {
                "items": {
                    "description": "An array of all the items that were found, but capped at most at `per_page`",
                    "type": "array",
                    "items": {
                        "$ref": "#/components/schemas/Subscriber"
                    }
                },
                "count": {
                    "description": "The total number of items found",
                    "type": "number",
                    "format": "int32"
                },
                "per_page": {
                    "description": "The number of items returned per page",
                    "type": "number",
                    "format": "int32"
                },
                "page": {
                    "description": "The number of available pages",
                    "type": "number",
                    "format": "int32"
                }
            }
        }
    ]
}

Refer to the common response description: NotFound.

Refer to the common response description: Error.


PUT /subscription

Request body

{
    "confirmed": true
}
⚠️ This example has been generated automatically from the schema and may not be accurate. Refer to the schema for more information.

Schema of the request body
{
    "type": "object",
    "properties": {
        "confirmed": {
            "type": "boolean",
            "nullable": true
        }
    }
}

Responses

"Subscriber updated"
⚠️ This example has been generated automatically from the schema and may not be accurate. Refer to the schema for more information.

Schema of the response body
{
    "type": "string",
    "enum": [
        "Subscriber updated"
    ]
}

Refer to the common response description: Unauthorized.

Refer to the common response description: Unauthenticated.

Refer to the common response description: NotFound.

Refer to the common response description: Error.


POST /tag

Request body

{
    "name": "string"
}
⚠️ This example has been generated automatically from the schema and may not be accurate. Refer to the schema for more information.

Schema of the request body
{
    "type": "object",
    "properties": {
        "name": {
            "type": "string"
        }
    }
}

Responses

{
    "id": "108bff04-4420-43b8-b185-7c1f3d355f65",
    "name": "string"
}
⚠️ This example has been generated automatically from the schema and may not be accurate. Refer to the schema for more information.

Schema of the response body
{
    "type": "object",
    "properties": {
        "id": {
            "type": "string",
            "format": "uuid"
        },
        "name": {
            "type": "string"
        }
    }
}

Refer to the common response description: Unauthorized.

Refer to the common response description: Unauthenticated.

Refer to the common response description: Error.


GET /tag

Input parameters

Parameter In Type Default Nullable Description
id query string Yes Returns the item with this id. When this parameter is provided, only a single item will be returned

Responses

Schema of the response body
{
    "oneOf": [
        {
            "$ref": "#/components/schemas/Tag"
        },
        {
            "type": "object",
            "properties": {
                "items": {
                    "description": "An array of all the items that were found, but capped at most at `per_page`",
                    "type": "array",
                    "items": {
                        "$ref": "#/components/schemas/Tag"
                    }
                },
                "count": {
                    "description": "The total number of items found",
                    "type": "number",
                    "format": "int32"
                },
                "per_page": {
                    "description": "The number of items returned per page",
                    "type": "number",
                    "format": "int32"
                },
                "page": {
                    "description": "The number of available pages",
                    "type": "number",
                    "format": "int32"
                }
            }
        }
    ]
}

Refer to the common response description: NotFound.

Refer to the common response description: Error.


PUT /tag

Request body

{
    "name": "string"
}
⚠️ This example has been generated automatically from the schema and may not be accurate. Refer to the schema for more information.

Schema of the request body
{
    "type": "object",
    "properties": {
        "name": {
            "type": "string"
        }
    }
}

Responses

"Tag updated"
⚠️ This example has been generated automatically from the schema and may not be accurate. Refer to the schema for more information.

Schema of the response body
{
    "type": "string",
    "enum": [
        "Tag updated"
    ]
}

Refer to the common response description: Unauthorized.

Refer to the common response description: Unauthenticated.

Refer to the common response description: NotFound.

Refer to the common response description: Error.


POST /task

Request body

{
    "task_type": "string",
    "is_task_live": true,
    "modalities": [
        "string"
    ],
    "redacted": true,
    "tags": [
        "372ffb70-8cb1-4381-a390-583ef609b89d"
    ],
    "type": "MCQ",
    "questions": [
        {
            "text": "What time is it?",
            "paraphrases": []
        }
    ],
    "answers": [
        {
            "text": "half past one",
            "paraphrases": [
                "1:30 PM",
                "13:30"
            ],
            "correct": false
        },
        {
            "text": "Time is an illusion",
            "correct": false
        },
        {
            "text": "Now",
            "correct": true
        }
    ],
    "correct": true,
    "schema": "{\"$schema\": \"http://json-schema.org/draft-07/schema#\", \"title\": \"JSON parser\", \"type\": \"object\", \"properties\": {\"name\": {\"type\": \"string\"}}}",
    "evaluation_id": "8a067fa5-3527-48c2-85fa-fb27e0dc6c8b"
}
⚠️ This example has been generated automatically from the schema and may not be accurate. Refer to the schema for more information.

Schema of the request body
{
    "type": "object",
    "properties": {
        "task_type": {
            "type": "string"
        },
        "is_task_live": {
            "type": "boolean",
            "nullable": true
        },
        "modalities": {
            "type": "array",
            "items": {
                "type": "string"
            }
        },
        "redacted": {
            "type": "boolean"
        },
        "tags": {
            "type": "array",
            "items": {
                "type": "string",
                "format": "uuid"
            }
        },
        "type": {
            "description": "The type of the new task",
            "example": "MCQ",
            "type": "string",
            "enum": [
                "FRQ",
                "bool",
                "json",
                "MCQ"
            ]
        },
        "questions": {
            "description": "The task questions - i.e. what the models should answer",
            "example": [
                {
                    "text": "What time is it?",
                    "paraphrases": []
                }
            ],
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "text": {
                        "type": "string",
                        "example": "what time is it?"
                    },
                    "paraphrases": {
                        "type": "array",
                        "items": {
                            "type": "string",
                            "example": "can you tell me the time?"
                        }
                    }
                }
            }
        },
        "answers": {
            "description": "A list of possible answers to be sent to models with the question",
            "type": "array",
            "items": {
                "$ref": "#/components/schemas/MCQAnswer"
            },
            "example": [
                {
                    "text": "half past one",
                    "paraphrases": [
                        "1:30 PM",
                        "13:30"
                    ],
                    "correct": false
                },
                {
                    "text": "Time is an illusion",
                    "correct": false
                },
                {
                    "text": "Now",
                    "correct": true
                }
            ]
        },
        "correct": {
            "description": "Whether this task is correct. This is used in boolean tasks",
            "type": "boolean"
        },
        "schema": {
            "description": "A schema to validate answers against. This is used in JSON tasks",
            "example": "{\"$schema\": \"http://json-schema.org/draft-07/schema#\", \"title\": \"JSON parser\", \"type\": \"object\", \"properties\": {\"name\": {\"type\": \"string\"}}}",
            "type": "string"
        },
        "evaluation_id": {
            "description": "The id of the evaluation that this task is for",
            "type": "string",
            "format": "uuid"
        }
    }
}
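Putting the fields together, an MCQ task with one question and three candidate answers could be created like this (a sketch; the session-token auth and the 200-on-success check follow the `/auth` example and are assumptions):

```python
import requests

API = "https://equistamp.net"

def create_task(session_token, payload):
    """POST a new task to an evaluation."""
    res = requests.post(
        f"{API}/task",
        headers={"Session-Token": session_token},
        json=payload,
    )
    if res.status_code != 200:  # assumed success status
        raise ValueError(res.json())
    return res.json()

payload = {
    "type": "MCQ",  # one of FRQ | bool | json | MCQ
    "evaluation_id": "8a067fa5-3527-48c2-85fa-fb27e0dc6c8b",
    "questions": [
        {"text": "What time is it?",
         "paraphrases": ["Can you tell me the time?"]},
    ],
    "answers": [
        {"text": "half past one",
         "paraphrases": ["1:30 PM", "13:30"], "correct": False},
        {"text": "Time is an illusion", "correct": False},
        {"text": "Now", "correct": True},
    ],
}
# task = create_task(session_token, payload)
```

Boolean tasks use the `correct` field instead of `answers`, and JSON tasks pass a serialized JSON Schema in `schema`, as documented above.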

Responses

{
    "id": "2388e7ce-b3c1-4e7c-9243-eda914667d0d",
    "task_type": "string",
    "is_task_live": true,
    "modalities": [
        "string"
    ],
    "redacted": true,
    "num_possible_answers": 10.12,
    "evaluation_task_number": 10.12,
    "median_human_completion_seconds": 10.12,
    "median_ai_completion_seconds": 10.12,
    "num_times_human_evaluated": 10.12,
    "num_times_ai_evaluated": 10.12,
    "num_times_humans_answered_correctly": 10.12,
    "num_times_ai_answered_correctly": 10.12,
    "evaluation_id": "16e15daa-b411-4243-ac5b-b03043036c93",
    "owner_id": "ad55f409-28d0-47db-9903-5bf9c4fad6b1",
    "tags": [
        {
            "id": "c1e73682-4408-4e6c-8daa-3c1b799d3fe4",
            "name": "string"
        }
    ]
}
⚠️ This example has been generated automatically from the schema and may not be accurate. Refer to the schema for more information.

Schema of the response body
{
    "type": "object",
    "properties": {
        "id": {
            "type": "string",
            "format": "uuid"
        },
        "task_type": {
            "type": "string"
        },
        "is_task_live": {
            "type": "boolean",
            "nullable": true
        },
        "modalities": {
            "type": "array",
            "items": {
                "type": "string"
            }
        },
        "redacted": {
            "type": "boolean"
        },
        "num_possible_answers": {
            "type": "number",
            "format": "int64"
        },
        "evaluation_task_number": {
            "type": "number",
            "format": "int64"
        },
        "median_human_completion_seconds": {
            "type": "number",
            "format": "double",
            "nullable": true
        },
        "median_ai_completion_seconds": {
            "type": "number",
            "format": "double",
            "nullable": true
        },
        "num_times_human_evaluated": {
            "type": "number",
            "format": "int64"
        },
        "num_times_ai_evaluated": {
            "type": "number",
            "format": "int64"
        },
        "num_times_humans_answered_correctly": {
            "type": "number",
            "format": "int64"
        },
        "num_times_ai_answered_correctly": {
            "type": "number",
            "format": "int64"
        },
        "evaluation_id": {
            "type": "string",
            "format": "uuid"
        },
        "owner_id": {
            "type": "string",
            "format": "uuid"
        },
        "tags": {
            "type": "array",
            "items": {
                "$ref": "#/components/schemas/ShallowTag"
            }
        }
    }
}

Refer to the common response description: Unauthorized.

Refer to the common response description: Unauthenticated.

Refer to the common response description: Error.


GET /task

Input parameters

Parameter In Type Default Nullable Description
id query string Yes Returns the item with this id. When this parameter is provided, only a single item will be returned

Responses

Schema of the response body
{
    "oneOf": [
        {
            "$ref": "#/components/schemas/Task"
        },
        {
            "type": "object",
            "properties": {
                "items": {
                    "description": "An array of all the items that were found, but capped at most at `per_page`",
                    "type": "array",
                    "items": {
                        "$ref": "#/components/schemas/Task"
                    }
                },
                "count": {
                    "description": "The total number of items found",
                    "type": "number",
                    "format": "int32"
                },
                "per_page": {
                    "description": "The number of items returned per page",
                    "type": "number",
                    "format": "int32"
                },
                "page": {
                    "description": "The number of available pages",
                    "type": "number",
                    "format": "int32"
                }
            }
        }
    ]
}

Refer to the common response description: NotFound.

Refer to the common response description: Error.


PUT /task

Request body

{
    "task_type": "string",
    "is_task_live": true,
    "modalities": [
        "string"
    ],
    "redacted": true,
    "tags": [
        "758c7803-ab3d-4ea5-8ce7-dce6a195deba"
    ]
}
⚠️ This example has been generated automatically from the schema and may not be accurate. Refer to the schema for more information.

Schema of the request body
{
    "type": "object",
    "properties": {
        "task_type": {
            "type": "string"
        },
        "is_task_live": {
            "type": "boolean",
            "nullable": true
        },
        "modalities": {
            "type": "array",
            "items": {
                "type": "string"
            }
        },
        "redacted": {
            "type": "boolean"
        },
        "tags": {
            "type": "array",
            "items": {
                "type": "string",
                "format": "uuid"
            }
        }
    }
}

Responses

"Task updated"
⚠️ This example has been generated automatically from the schema and may not be accurate. Refer to the schema for more information.

Schema of the response body
{
    "type": "string",
    "enum": [
        "Task updated"
    ]
}

Refer to the common response description: Unauthorized.

Refer to the common response description: Unauthenticated.

Refer to the common response description: NotFound.

Refer to the common response description: Error.


POST /user

Request body

{
    "email_address": "mr.blobby@some.domain",
    "user_name": "mr_blobby",
    "full_name": "Mr Blobby, esq.",
    "user_image": "https://equistamp.com/avatars/123123123123.png",
    "bio": "Hello, my name is Inigo Montoya. You Killed my Father. Prepare to die",
    "display_options": {
        "bio": true,
        "email_address": true,
        "user_image": false
    }
}
⚠️ This example has been generated automatically from the schema and may not be accurate. Refer to the schema for more information.

Schema of the request body
{
    "type": "object",
    "properties": {
        "email_address": {
            "type": "string",
            "description": "The email address of this user. User for logging in, so must be unique.",
            "format": "email",
            "example": "mr.blobby@some.domain"
        },
        "user_name": {
            "type": "string",
            "description": "The user name. Used for logging in and as a unique, human readable identifier of this user",
            "example": "mr_blobby"
        },
        "full_name": {
            "type": "string",
            "description": "The presentable name of this user. This can be any string",
            "nullable": true,
            "example": "Mr Blobby, esq."
        },
        "user_image": {
            "type": "string",
            "description": "The user avatar, as bytes when uploading, and its URL when fetching",
            "nullable": true,
            "example": "https://equistamp.com/avatars/123123123123.png"
        },
        "bio": {
            "type": "string",
            "description": "A description of this user. Will be rendered as markdown on the website",
            "nullable": true,
            "example": "Hello, my name is Inigo Montoya. You Killed my Father. Prepare to die"
        },
        "display_options": {
            "description": "A mapping of <displayable field> to true/false, which controls what will be displayed to other users. No option which is not explicitly enabled will be shown to anyone else than you or system admins. To illustrate, the attached example will only allow the user's bio and email address to be returned when other users call this endpoint, and all other fields will not be returned.",
            "type": "object",
            "additonalProperties": "boolean",
            "example": {
                "bio": true,
                "email_address": true,
                "user_image": false
            }
        }
    }
}
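A registration sketch. Sending the request without a session token is an assumption, inferred from the fact that this endpoint lists no Unauthorized or Unauthenticated responses:

```python
import requests

API = "https://equistamp.net"

def register_user(payload):
    """POST a new user account.

    No Session-Token header is sent here -- an assumption based on the
    absence of auth-related error responses for this endpoint.
    """
    res = requests.post(f"{API}/user", json=payload)
    if res.status_code != 200:  # assumed success status
        raise ValueError(res.json())
    return res.json()

payload = {
    "email_address": "mr.blobby@some.domain",  # must be unique
    "user_name": "mr_blobby",                  # unique, human-readable id
    "full_name": "Mr Blobby, esq.",
    "bio": "Hello, my name is Inigo Montoya.",
    # Only explicitly enabled fields are visible to other users:
    "display_options": {"bio": True, "email_address": True, "user_image": False},
}
# user = register_user(payload)
```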

Responses

{
    "id": "8947fced-c2bc-4df7-a716-e75528489c62",
    "email_address": "mr.blobby@some.domain",
    "user_name": "mr_blobby",
    "full_name": "Mr Blobby, esq.",
    "user_image": "https://equistamp.com/avatars/123123123123.png",
    "bio": "Hello, my name is Inigo Montoya. You Killed my Father. Prepare to die",
    "display_options": {
        "bio": true,
        "email_address": true,
        "user_image": false
    },
    "join_date": "2022-04-13",
    "subscription_level": "pro",
    "alerts": [
        {
            "id": "8c421c10-9d50-454c-a21c-34fcf870cdcf",
            "name": "They are coming!!",
            "description": "string",
            "public": true,
            "last_trigger_date": "2022-04-13T15:42:05.901Z",
            "trigger_cooldown": "string",
            "owner_id": "06a6d247-1966-42cd-a93c-8f9c568035e1",
            "triggers": [
                "b469c7a7-d655-4587-932d-17f04587339e"
            ],
            "subscriptions": [
                "fbc2f8a6-785f-4a6e-9bba-4f3638392094"
            ]
        }
    ]
}
⚠️ This example has been generated automatically from the schema and may not be accurate. Refer to the schema for more information.

Schema of the response body
{
    "type": "object",
    "properties": {
        "id": {
            "type": "string",
            "format": "uuid"
        },
        "email_address": {
            "type": "string",
            "description": "The email address of this user. User for logging in, so must be unique.",
            "format": "email",
            "example": "mr.blobby@some.domain"
        },
        "user_name": {
            "type": "string",
            "description": "The user name. Used for logging in and as a unique, human readable identifier of this user",
            "example": "mr_blobby"
        },
        "full_name": {
            "type": "string",
            "description": "The presentable name of this user. This can be any string",
            "nullable": true,
            "example": "Mr Blobby, esq."
        },
        "user_image": {
            "type": "string",
            "description": "The user avatar, as bytes when uploading, and its URL when fetching",
            "nullable": true,
            "example": "https://equistamp.com/avatars/123123123123.png"
        },
        "bio": {
            "type": "string",
            "description": "A description of this user. Will be rendered as markdown on the website",
            "nullable": true,
            "example": "Hello, my name is Inigo Montoya. You Killed my Father. Prepare to die"
        },
        "display_options": {
            "description": "A mapping of <displayable field> to true/false, which controls what will be displayed to other users. No option which is not explicitly enabled will be shown to anyone else than you or system admins. To illustrate, the attached example will only allow the user's bio and email address to be returned when other users call this endpoint, and all other fields will not be returned.",
            "type": "object",
            "additonalProperties": "boolean",
            "example": {
                "bio": true,
                "email_address": true,
                "user_image": false
            }
        },
        "join_date": {
            "type": "string",
            "format": "date"
        },
        "subscription_level": {
            "type": "string",
            "description": "The current subscription level of this user",
            "enum": [
                "admin",
                "free",
                "enterprise",
                "pro"
            ],
            "example": "pro"
        },
        "alerts": {
            "type": "array",
            "items": {
                "$ref": "#/components/schemas/ShallowAlert"
            }
        }
    }
}

Refer to the common response description: Error.


GET /user

Input parameters

Parameter In Type Default Nullable Description
id query string Yes Returns the item with this id. When this parameter is provided, only a single item will be returned

Responses

Schema of the response body
{
    "oneOf": [
        {
            "$ref": "#/components/schemas/User"
        },
        {
            "type": "object",
            "properties": {
                "items": {
                    "description": "An array of all the items that were found, but capped at most at `per_page`",
                    "type": "array",
                    "items": {
                        "$ref": "#/components/schemas/User"
                    }
                },
                "count": {
                    "description": "The total number of items found",
                    "type": "number",
                    "format": "int32"
                },
                "per_page": {
                    "description": "The number of items returned per page",
                    "type": "number",
                    "format": "int32"
                },
                "page": {
                    "description": "The number of available pages",
                    "type": "number",
                    "format": "int32"
                }
            }
        }
    ]
}

Refer to the common response description: NotFound.

Refer to the common response description: Error.


PUT /user

Request body

{
    "email_address": "mr.blobby@some.domain",
    "user_name": "mr_blobby",
    "full_name": "Mr Blobby, esq.",
    "user_image": "https://equistamp.com/avatars/123123123123.png",
    "bio": "Hello, my name is Inigo Montoya. You Killed my Father. Prepare to die",
    "display_options": {
        "bio": true,
        "email_address": true,
        "user_image": false
    }
}
⚠️ This example has been generated automatically from the schema and it is not accurate. Refer to the schema for more information.

Schema of the request body
{
    "type": "object",
    "properties": {
        "email_address": {
            "type": "string",
            "description": "The email address of this user. Used for logging in, so must be unique.",
            "format": "email",
            "example": "mr.blobby@some.domain"
        },
        "user_name": {
            "type": "string",
            "description": "The user name. Used for logging in and as a unique, human readable identifier of this user",
            "example": "mr_blobby"
        },
        "full_name": {
            "type": "string",
            "description": "The presentable name of this user. This can be any string",
            "nullable": true,
            "example": "Mr Blobby, esq."
        },
        "user_image": {
            "type": "string",
            "description": "The user avatar, as bytes when uploading, and its URL when fetching",
            "nullable": true,
            "example": "https://equistamp.com/avatars/123123123123.png"
        },
        "bio": {
            "type": "string",
            "description": "A description of this user. Will be rendered as markdown on the website",
            "nullable": true,
            "example": "Hello, my name is Inigo Montoya. You Killed my Father. Prepare to die"
        },
        "display_options": {
            "description": "A mapping of <displayable field> to true/false, which controls what will be displayed to other users. No option which is not explicitly enabled will be shown to anyone other than you or system admins. To illustrate, the attached example will only allow the user's bio and email address to be returned when other users call this endpoint; all other fields will not be returned.",
            "type": "object",
            "additionalProperties": "boolean",
            "example": {
                "bio": true,
                "email_address": true,
                "user_image": false
            }
        }
    }
}

Responses

"User updated"
⚠️ This example has been generated automatically from the schema and it is not accurate. Refer to the schema for more information.

Schema of the response body
{
    "type": "string",
    "enum": [
        "User updated"
    ]
}

Refer to the common response description: Unauthorized.

Refer to the common response description: Unauthenticated.

Refer to the common response description: NotFound.

Refer to the common response description: Error.
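A minimal sketch of updating the current user with this endpoint, assuming the same Session-Token authentication as in the GET /auth example (the helper name is illustrative):

```python
import requests

def update_user(session_token, **fields):
    """Update the current user's profile fields.

    Only the fields passed in are sent, e.g.
    update_user(token, bio='Hello!', display_options={'bio': True}).
    Returns "User updated" on success.
    """
    res = requests.put(
        'https://equistamp.net/user',
        headers={'Session-Token': session_token},
        json=fields,
    )
    if res.status_code != 200:
        raise ValueError(res.json())
    return res.json()
```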


Schemas

Alert

Name Type Description
description string | null
id string(uuid)
last_trigger_date string(date-time) | null
name string The name of the alert, displayed in the list of alerts
owner_id string(uuid)
public boolean
subscriptions Array<ShallowSubscriberAlert>
trigger_cooldown string | null How often the trigger can fire
triggers Array<ShallowTrigger>

ColumnMapping

Name Type Description
columnType string
paraphraseOf string | null

CurrentScores

Evaluation

Name Type Description
description string | null The description of this evaluation, as displayed on the site. Markdown can be used for formatting
id string(uuid)
last_updated string(date-time)
min_questions_to_complete integer(int64) | null The default number of tasks to run before an evaluation session is deemed finished. A given evaluation session may process more tasks, as starting a new evaluation session for an evaluation/model pair which is already running will just add more tasks to the current session, rather than starting a new one.
modalities Array<string> The available modalities of this evaluation
name string
num_tasks integer(int64) The total number of tasks defined for this evaluation. Includes redacted tasks.
owner ShallowUser
public boolean Whether this evaluation should be publicly visible. If true, anyone can view its details or evaluate models with it
public_usable boolean Whether this evaluation can be run by anyone. To avoid tasks being leaked, you might want to have the results shown, but have control over what it can be run on.
quality number(double) The quality of this evaluation, i.e. how much it can be trusted, from 0 to 1.
reports_visible boolean Whether anyone can pay to see reports for this evaluation.
tags Array<ShallowTag>
task_types Array<string> The types of tasks supported by this evaluation

EvaluationEvaluatee

Name Type Description
cadence string | null
evaluatee_id string(uuid)
evaluation ShallowEvaluation
evaluation_id string(uuid)
id string(uuid)
model ShallowModel
price number(int64)

EvaluationModelJobs

Name Type Description
creation_date string(date-time)
evaluation_id string(uuid)
id string(uuid)
job_body
job_description string
job_name string
job_schedule_arn string
minutes_between_evaluations number(int64)
model_id string(uuid)
owner_id string(uuid)
start_date string(date-time) | null

EvaluationSession

Name Type Description
avg_verbosity number(double) | null
completed boolean
datetime_completed string(date-time) | null
datetime_started string(date-time)
distribution_of_characters_per_task
distribution_of_seconds_per_task
evaluatee_id string(uuid) In the case of human tests, the id of the user taking the test. In the case of testing models, the id of the model to be tested
evaluation_id string(uuid) The id of the evaluation to be run
failed boolean
id string(uuid)
is_human_being_evaluated boolean Whether this evaluation session is a human test. When false will start an automatic test for the provided model and evaluation.
max_characters_per_task number(double) | null
max_seconds_per_task number(double) | null
max_verbosity number(double) | null
mean_characters_per_task number(double) | null
mean_seconds_per_task number(double) | null
median_characters_per_task number(double) | null
median_seconds_per_task number(double) | null
median_verbosity number(double) | null
min_characters_per_task number(double) | null
min_seconds_per_task number(double) | null
min_verbosity number(double) | null
num_answered_correctly number(int64)
num_characters_received_from_endpoint number(int64)
num_characters_sent_to_endpoint number(int64)
num_endpoint_calls number(int64)
num_endpoint_failures number(int64)
num_questions_answered number(int64)
num_tasks_to_complete number(int64)
origin string The source of this evaluation session, i.e. what triggered it
std_characters_per_task number(double) | null
std_seconds_per_task number(double) | null

MCQAnswer

Name Type Description
correct boolean
paraphrases Array<string> A list of paraphrases of this answer - if provided, will always be used rather than the actual answer text
text string The text of the answer, as will be displayed to the models. If paraphrases are provided, this will never be shown to anyone other than you

Model

Name Type Description
architecture string | null The architecture of this model
availability number(double) | null
best_evaluation_session ShallowEvaluationSession
check_availability boolean | null Whether the availability of this model should be checked. When true, we will ping the endpoint every
cost_per_input_character_usd number(double) The cost of a single input character in USD. We assume that a single token is 4 characters.
cost_per_instance_hour_usd number(double) The cost of running the model for an hour, in USD. This doesn't include input/output tokens - it's purely the server uptime. This is useful e.g. with HuggingFace inference endpoints, where they charge for server time, not for tokens throughput.
cost_per_output_character_usd number(double) The cost of a single output character in USD. We assume that a single token is 4 characters.
description string | null The description of this model, as displayed on the site. Markdown can be used for formatting
elo_score number(double) | null The ELO score, according to LLMSys
endpoint_type string The type of endpoint being called. We have dedicated handlers for many of the most popular AI model providers
id string(uuid)
max_characters_per_minute integer(int64) The maximum allowed number of characters per minute. We assume that one token is 4 characters. This must be at least 1.
max_context_window_characters integer(int64) | null The maximum number of characters allowed in the context window of this model. We assume that 1 token is 4 characters
max_request_per_minute integer(int64) The maximum allowed number of requests per minute. This must be at least 1.
modalities Array<string> The modalities accepted by this model
name string
num_parameters integer(int64) | null The number of parameters of the model
owner ShallowUser
owner_id string(uuid)
picture string | null A URL to an image representing this model
public boolean Whether this model should be publicly visible. If true, anyone can view its details.
public_usable boolean Whether this model can be tested by anyone. LLMs can cost a lot to run, and these costs are on whoever added the model. This setting is here to add an extra protection against people running up large compute costs on this model. When not set, this is `false`.
publisher string | null The entity that created this model
quality number(double) The quality of this model, i.e. how much it's worth using, from 0 to 1. This is very subjective, and mainly used to decide whether it should be used by default e.g. on the frontpage.
score number(double) | null
top_example ShallowTask
top_example_id string(uuid) | null
worst_evaluation_session ShallowEvaluationSession
worst_example ShallowTask
worst_example_id string(uuid) | null

Response

Name Type Description
chosen_answer_id string(uuid) | null
correctness number(double) | null
creation_date string(date-time)
evaluatee_id string(uuid)
evaluation_session_id string(uuid)
id string(uuid)
parsed_response_text string | null
raw_response_text string | null
raw_task_text string | null
response_time_in_seconds number(double) | null
task_id string(uuid)

SchemaHistory

Name Type Description
description string | null An optional description of this schema
evaluation_id string(uuid)
id string(uuid)
key string | null The identifier used in csv files for this schema
name string | null An optional name describing this schema

ShallowAlert

Name Type Description
description string | null
id string(uuid)
last_trigger_date string(date-time) | null
name string The name of the alert, displayed in the list of alerts
owner_id string(uuid)
public boolean
subscriptions Array<string(uuid)>
trigger_cooldown string | null How often the trigger can fire
triggers Array<string(uuid)>

ShallowCurrentScores

ShallowEvaluation

Name Type Description
description string | null The description of this evaluation, as displayed on the site. Markdown can be used for formatting
id string(uuid)
last_updated string(date-time)
min_questions_to_complete integer(int64) | null The default number of tasks to run before an evaluation session is deemed finished. A given evaluation session may process more tasks, as starting a new evaluation session for an evaluation/model pair which is already running will just add more tasks to the current session, rather than starting a new one.
modalities Array<string> The available modalities of this evaluation
name string
num_tasks integer(int64) The total number of tasks defined for this evaluation. Includes redacted tasks.
owner string(uuid)
public boolean Whether this evaluation should be publicly visible. If true, anyone can view its details or evaluate models with it
public_usable boolean Whether this evaluation can be run by anyone. To avoid tasks being leaked, you might want to have the results shown, but have control over what it can be run on.
quality number(double) The quality of this evaluation, i.e. how much it can be trusted, from 0 to 1.
reports_visible boolean Whether anyone can pay to see reports for this evaluation.
tags Array<string(uuid)>
task_types Array<string> The types of tasks supported by this evaluation

ShallowEvaluationEvaluatee

Name Type Description
cadence string | null
evaluatee_id string(uuid)
evaluation string(uuid)
evaluation_id string(uuid)
id string(uuid)
model string(uuid)
price number(int64)

ShallowEvaluationModelJobs

Name Type Description
creation_date string(date-time)
evaluation_id string(uuid)
id string(uuid)
job_body
job_description string
job_name string
job_schedule_arn string
minutes_between_evaluations number(int64)
model_id string(uuid)
owner_id string(uuid)
start_date string(date-time) | null

ShallowEvaluationSession

Name Type Description
avg_verbosity number(double) | null
completed boolean
datetime_completed string(date-time) | null
datetime_started string(date-time)
distribution_of_characters_per_task
distribution_of_seconds_per_task
evaluatee_id string(uuid) In the case of human tests, the id of the user taking the test. In the case of testing models, the id of the model to be tested
evaluation_id string(uuid) The id of the evaluation to be run
failed boolean
id string(uuid)
is_human_being_evaluated boolean Whether this evaluation session is a human test. When false will start an automatic test for the provided model and evaluation.
max_characters_per_task number(double) | null
max_seconds_per_task number(double) | null
max_verbosity number(double) | null
mean_characters_per_task number(double) | null
mean_seconds_per_task number(double) | null
median_characters_per_task number(double) | null
median_seconds_per_task number(double) | null
median_verbosity number(double) | null
min_characters_per_task number(double) | null
min_seconds_per_task number(double) | null
min_verbosity number(double) | null
num_answered_correctly number(int64)
num_characters_received_from_endpoint number(int64)
num_characters_sent_to_endpoint number(int64)
num_endpoint_calls number(int64)
num_endpoint_failures number(int64)
num_questions_answered number(int64)
num_tasks_to_complete number(int64)
origin string The source of this evaluation session, i.e. what triggered it
std_characters_per_task number(double) | null
std_seconds_per_task number(double) | null

ShallowModel

Name Type Description
architecture string | null The architecture of this model
availability number(double) | null
best_evaluation_session string(uuid)
check_availability boolean | null Whether the availability of this model should be checked. When true, we will ping the endpoint every
cost_per_input_character_usd number(double) The cost of a single input character in USD. We assume that a single token is 4 characters.
cost_per_instance_hour_usd number(double) The cost of running the model for an hour, in USD. This doesn't include input/output tokens - it's purely the server uptime. This is useful e.g. with HuggingFace inference endpoints, where they charge for server time, not for tokens throughput.
cost_per_output_character_usd number(double) The cost of a single output character in USD. We assume that a single token is 4 characters.
description string | null The description of this model, as displayed on the site. Markdown can be used for formatting
elo_score number(double) | null The ELO score, according to LLMSys
endpoint_type string The type of endpoint being called. We have dedicated handlers for many of the most popular AI model providers
id string(uuid)
max_characters_per_minute integer(int64) The maximum allowed number of characters per minute. We assume that one token is 4 characters. This must be at least 1.
max_context_window_characters integer(int64) | null The maximum number of characters allowed in the context window of this model. We assume that 1 token is 4 characters
max_request_per_minute integer(int64) The maximum allowed number of requests per minute. This must be at least 1.
modalities Array<string> The modalities accepted by this model
name string
num_parameters integer(int64) | null The number of parameters of the model
owner string(uuid)
owner_id string(uuid)
picture string | null A URL to an image representing this model
public boolean Whether this model should be publicly visible. If true, anyone can view its details.
public_usable boolean Whether this model can be tested by anyone. LLMs can cost a lot to run, and these costs are on whoever added the model. This setting is here to add an extra protection against people running up large compute costs on this model. When not set, this is `false`.
publisher string | null The entity that created this model
quality number(double) The quality of this model, i.e. how much it's worth using, from 0 to 1. This is very subjective, and mainly used to decide whether it should be used by default e.g. on the frontpage.
score number(double) | null
top_example string(uuid)
top_example_id string(uuid) | null
worst_evaluation_session string(uuid)
worst_example string(uuid)
worst_example_id string(uuid) | null

ShallowResponse

Name Type Description
chosen_answer_id string(uuid) | null
correctness number(double) | null
creation_date string(date-time)
evaluatee_id string(uuid)
evaluation_session_id string(uuid)
id string(uuid)
parsed_response_text string | null
raw_response_text string | null
raw_task_text string | null
response_time_in_seconds number(double) | null
task_id string(uuid)

ShallowSchemaHistory

Name Type Description
description string | null An optional description of this schema
evaluation_id string(uuid)
id string(uuid)
key string | null The identifier used in csv files for this schema
name string | null An optional name describing this schema

ShallowSubscriber

Name Type Description
confirmed boolean | null
destination string
method string

ShallowSubscriberAlert

Name Type Description
confirmed boolean | null
destination string
method string

ShallowTag

Name Type Description
id string(uuid)
name string

ShallowTask

Name Type Description
evaluation_id string(uuid)
evaluation_task_number number(int64)
id string(uuid)
is_task_live boolean | null
median_ai_completion_seconds number(double) | null
median_human_completion_seconds number(double) | null
modalities Array<string>
num_possible_answers number(int64)
num_times_ai_answered_correctly number(int64)
num_times_ai_evaluated number(int64)
num_times_human_evaluated number(int64)
num_times_humans_answered_correctly number(int64)
owner_id string(uuid)
redacted boolean
tags Array<string(uuid)>
task_type string

ShallowTrigger

Name Type Description
alert_id string(uuid)
evaluations
id string(uuid)
invert boolean
metric string | null
models
threshold number(double) | null
type string

ShallowUser

Name Type Description
alerts Array<string(uuid)>
bio string | null A description of this user. Will be rendered as markdown on the website
display_options Example: {'bio': True, 'email_address': True, 'user_image': False} A mapping of <displayable field> to true/false, which controls what will be displayed to other users. No option which is not explicitly enabled will be shown to anyone other than you or system admins. To illustrate, the attached example will only allow the user's bio and email address to be returned when other users call this endpoint; all other fields will not be returned.
email_address string(email) The email address of this user. Used for logging in, so must be unique.
full_name string | null The presentable name of this user. This can be any string
id string(uuid)
join_date string(date)
subscription_level string The current subscription level of this user
user_image string | null The user avatar, as bytes when uploading, and its URL when fetching
user_name string The user name. Used for logging in and as a unique, human readable identifier of this user

Subscriber

Name Type Description
confirmed boolean | null
destination string
method string

SubscriberAlert

Name Type Description
confirmed boolean | null
destination string
method string

Tag

Name Type Description
id string(uuid)
name string

Task

Name Type Description
evaluation_id string(uuid)
evaluation_task_number number(int64)
id string(uuid)
is_task_live boolean | null
median_ai_completion_seconds number(double) | null
median_human_completion_seconds number(double) | null
modalities Array<string>
num_possible_answers number(int64)
num_times_ai_answered_correctly number(int64)
num_times_ai_evaluated number(int64)
num_times_human_evaluated number(int64)
num_times_humans_answered_correctly number(int64)
owner_id string(uuid)
redacted boolean
tags Array<ShallowTag>
task_type string

Trigger

Name Type Description
alert_id string(uuid)
evaluations
id string(uuid)
invert boolean
metric string | null
models
threshold number(double) | null
type string

User

Name Type Description
alerts Array<ShallowAlert>
bio string | null A description of this user. Will be rendered as markdown on the website
display_options Example: {'bio': True, 'email_address': True, 'user_image': False} A mapping of <displayable field> to true/false, which controls what will be displayed to other users. No option which is not explicitly enabled will be shown to anyone other than you or system admins. To illustrate, the attached example will only allow the user's bio and email address to be returned when other users call this endpoint; all other fields will not be returned.
email_address string(email) The email address of this user. Used for logging in, so must be unique.
full_name string | null The presentable name of this user. This can be any string
id string(uuid)
join_date string(date)
subscription_level string The current subscription level of this user
user_image string | null The user avatar, as bytes when uploading, and its URL when fetching
user_name string The user name. Used for logging in and as a unique, human readable identifier of this user

Common responses

This section describes common responses that are reused across operations.

Unauthenticated

A valid API token is needed to access this endpoint

"string"
⚠️ This example has been generated automatically from the schema and it is not accurate. Refer to the schema for more information.

Schema of the response body
{
    "description": "An error message describing what happened",
    "type": "string"
}

PaymentRequired

The user has insufficient credits to process this request

"string"
⚠️ This example has been generated automatically from the schema and it is not accurate. Refer to the schema for more information.

Schema of the response body
{
    "description": "An error message describing what happened",
    "type": "string"
}

Unauthorized

The provided API token does not have the appropriate permissions to fulfill this request

"string"
⚠️ This example has been generated automatically from the schema and it is not accurate. Refer to the schema for more information.

Schema of the response body
{
    "description": "An error message describing what happened",
    "type": "string"
}

NotFound

Could not find this item

"string"
⚠️ This example has been generated automatically from the schema and it is not accurate. Refer to the schema for more information.

Schema of the response body
{
    "description": "An error message describing what happened",
    "type": "string"
}

ValidationError

The request has bad data

"string"
⚠️ This example has been generated automatically from the schema and it is not accurate. Refer to the schema for more information.

Schema of the response body
{
    "description": "An error message describing what happened",
    "type": "string"
}

Error

A server error

"string"
⚠️ This example has been generated automatically from the schema and it is not accurate. Refer to the schema for more information.

Schema of the response body
{
    "description": "An error message describing what happened",
    "type": "string"
}
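All of the common responses above share the same body shape, a plain error string, so clients can handle them uniformly. The status-code-to-name mapping below follows the conventional HTTP meanings of these responses and is an assumption, not taken from this specification:

```python
# Map HTTP status codes to the common responses described above. The
# code-to-name pairing is assumed from conventional HTTP semantics.
COMMON_RESPONSES = {
    401: 'Unauthenticated',
    402: 'PaymentRequired',
    403: 'Unauthorized',
    404: 'NotFound',
    422: 'ValidationError',
    500: 'Error',
}

def describe_error(status_code, body):
    """Return a human-readable description of a common error response."""
    name = COMMON_RESPONSES.get(status_code, 'Unknown')
    return f'{name} ({status_code}): {body}'
```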

Common parameters

This section describes common parameters that are reused across operations.

apiToken

Name In Type Default Nullable Description
Api-Token header string No
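Endpoints that accept this parameter expect the token in an Api-Token request header, in place of the Session-Token header used elsewhere in these examples. A sketch, assuming a long-lived token obtained via GET /auth with fields=api_token (the helper name is illustrative):

```python
import requests

def api_get(path, api_token, **params):
    """Call a GET endpoint using a long-lived API token."""
    return requests.get(
        f'https://equistamp.net{path}',
        headers={'Api-Token': api_token},
        params=params,
    )
```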