Equistamp 0.0.1
3rd Party AI Evaluation Service Setting & Protecting the Global Standard of AI Safety
Endpoints
GET /auth
Description
Get the current user.
Use the fields parameter if you only want specific fields. This can also be used to get a long lived API token,
e.g.:
import requests

# Log in with an email and password to obtain a session token
res = requests.put(
    'https://equistamp.net/auth',
    json={'email': '<your email address>', 'password': '<your password>'}
)
if res.status_code != 200:
    raise ValueError(f'Invalid email or password: {res.text}')
session_token = res.json()['session_token']

# Use the session token to request only the long lived API token
res = requests.get(
    'https://equistamp.net/auth',
    headers={'Session-Token': session_token},
    params={'fields': 'api_token'}
)
if res.status_code != 200:
    raise ValueError(res.text)
api_token = res.json()['api_token']
Input parameters
| Parameter | In | Type | Default | Nullable | Description |
|---|---|---|---|---|---|
| fields | path | | | No | Specific fields to be returned in the response, separated by commas. If this is used, only the specified fields will be returned |
Responses
{
"id": "f801655d-5f3c-492c-b815-86105e52d772",
"email_address": "mr.blobby@some.domain",
"user_name": "mr_blobby",
"full_name": "Mr Blobby, esq.",
"user_image": "https://equistamp.com/avatars/123123123123.png",
"bio": "Hello, my name is Inigo Montoya. You Killed my Father. Prepare to die",
"display_options": {
"bio": true,
"email_address": true,
"user_image": false
},
"join_date": "2022-04-13",
"subscription_level": "pro",
"alerts": [
{
"id": "acfd47de-772f-4fc2-bced-77e9aae9e369",
"name": "They are coming!!",
"description": "string",
"public": true,
"last_trigger_date": "2022-04-13T15:42:05.901Z",
"trigger_cooldown": "string",
"owner_id": "959b0298-bf7c-4912-9037-f86a4107448a",
"triggers": [
"8297647c-b499-4cb3-bf85-94dcbf150d12"
],
"subscriptions": [
"76f90940-48c7-4610-a900-f400cc7167eb"
]
}
]
}
Schema of the response body
{
"type": "object",
"properties": {
"id": {
"type": "string",
"format": "uuid"
},
"email_address": {
"type": "string",
"description": "The email address of this user. Used for logging in, so must be unique.",
"format": "email",
"example": "mr.blobby@some.domain"
},
"user_name": {
"type": "string",
"description": "The user name. Used for logging in and as a unique, human readable identifier of this user",
"example": "mr_blobby"
},
"full_name": {
"type": "string",
"description": "The presentable name of this user. This can be any string",
"nullable": true,
"example": "Mr Blobby, esq."
},
"user_image": {
"type": "string",
"description": "The user avatar, as bytes when uploading, and its URL when fetching",
"nullable": true,
"example": "https://equistamp.com/avatars/123123123123.png"
},
"bio": {
"type": "string",
"description": "A description of this user. Will be rendered as markdown on the website",
"nullable": true,
"example": "Hello, my name is Inigo Montoya. You Killed my Father. Prepare to die"
},
"display_options": {
"description": "A mapping of <displayable field> to true/false, which controls what will be displayed to other users. Options that are not explicitly enabled will not be shown to anyone other than you or system admins. To illustrate, the attached example will only allow the user's bio and email address to be returned when other users call this endpoint, and all other fields will not be returned.",
"type": "object",
"additionalProperties": {"type": "boolean"},
"example": {
"bio": true,
"email_address": true,
"user_image": false
}
},
"join_date": {
"type": "string",
"format": "date"
},
"subscription_level": {
"type": "string",
"description": "The current subscription level of this user",
"enum": [
"admin",
"free",
"enterprise",
"pro"
],
"example": "pro"
},
"alerts": {
"type": "array",
"items": {
"$ref": "#/components/schemas/ShallowAlert"
}
}
}
}
Refer to the common response description: Unauthorized.
Refer to the common response description: Unauthenticated.
Refer to the common response description: NotFound.
Refer to the common response description: Error.
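The `display_options` rule described in the schema above can be pictured as a simple filter. The following is only a client-side sketch of that rule: the helper name `visible_to_others` is hypothetical, and the server's actual logic may differ (for instance in how it treats fields such as `id`).

```python
def visible_to_others(user: dict, display_options: dict) -> dict:
    """Keep only the fields explicitly enabled in display_options.

    Hypothetical helper: mirrors the documented rule that nothing is shown
    to other users unless its option is explicitly set to true.
    """
    return {
        field: value
        for field, value in user.items()
        if display_options.get(field) is True
    }


user = {
    "email_address": "mr.blobby@some.domain",
    "user_image": "https://equistamp.com/avatars/123123123123.png",
    "bio": "Hello, my name is Inigo Montoya.",
    "full_name": "Mr Blobby, esq.",
}
options = {"bio": True, "email_address": True, "user_image": False}

# Only the bio and email address survive; user_image is disabled and
# full_name has no option at all, so both are dropped.
public_view = visible_to_others(user, options)
```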
PUT /auth
Log in the provided user, or send an email with a login link.
Description
This endpoint handles logging in, both when valid credentials are provided, and when the user needs to reset their password. This happens depending on the provided JSON body:
- If login credentials are provided, then try to log the user in - if this fails, a 401 will be returned
- If `reset_email` is provided, assume that the user has forgotten their password. If this email can be found in the system, then send them an email with a login link. Either way, this will always return a 200, to avoid leaking email addresses.
Log in credentials are a user identifier and a password. The following are supported:
- `username` - this is the user name of the user (not the display name)
- `email` - the email of the user
- `login` - this will accept either the email or username
The result of logging in is a JSON object containing a `session_token`. This should be provided as the `Session-Token` header on subsequent calls to the API to authenticate the user. The token will expire after a week of inactivity, but is otherwise refreshed while the system is in use.
Request body
{
"username": "mr_blobby",
"email": "mr_blobby@bla.com",
"login": "mr_blobby@bla.com",
"password": "hunter2",
"reset_email": "bla@bla.com"
}
Schema of the request body
{
"type": "object",
"properties": {
"username": {
"type": "string",
"example": "mr_blobby"
},
"email": {
"type": "string",
"example": "mr_blobby@bla.com"
},
"login": {
"type": "string",
"example": "mr_blobby@bla.com"
},
"password": {
"type": "string",
"format": "password",
"example": "hunter2"
},
"reset_email": {
"type": "string",
"format": "email",
"example": "bla@bla.com",
"description": "Used when resetting a password. A login link will be sent to this email, but only if it can be found in the system. When missing, this will fail silently, i.e. a 200 will be returned"
}
}
}
Responses
Schema of the response body
{
"oneOf": [
{
"type": "object",
"description": "Returned when the user successfully logs in",
"properties": {
"session_token": {
"type": "string",
"format": "uuid",
"description": "The session token of the logged in user. This should be sent as the \"Session-Token\" header on all subsequent calls. "
},
"token_expiration": {
"type": "number",
"format": "int32",
"description": "The POSIX timestamp when this token will expire. Generally in a week's time."
}
}
},
{
"type": "string",
"description": "This is returned in the case of a password reset."
}
]
}
Refer to the common response description: Unauthorized.
Refer to the common response description: Error.
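As a rough sketch of the two request shapes described above, the helpers below build a login body, build a reset body, and check a `token_expiration` value. All three function names are hypothetical. Requiring exactly one identifier and treating the timestamp as seconds since the epoch are assumptions, not documented behaviour.

```python
import time
from typing import Optional


def build_login_body(password: str, *, username: Optional[str] = None,
                     email: Optional[str] = None,
                     login: Optional[str] = None) -> dict:
    """Build a PUT /auth body from one identifier plus a password.

    Assumption: exactly one of username, email or login is sent.
    """
    identifiers = {"username": username, "email": email, "login": login}
    provided = {key: value for key, value in identifiers.items() if value is not None}
    if len(provided) != 1:
        raise ValueError("provide exactly one of username, email or login")
    return {**provided, "password": password}


def build_reset_body(reset_email: str) -> dict:
    """Build a PUT /auth body that asks for a login link to be emailed."""
    return {"reset_email": reset_email}


def is_expired(token_expiration: float, now: Optional[float] = None) -> bool:
    """Return True once the POSIX timestamp (assumed to be in seconds) has passed."""
    current = time.time() if now is None else now
    return current >= token_expiration
```

A client could call `is_expired(res.json()['token_expiration'])` before each request and log in again when it returns true.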
POST /alert
Create a new alert.
Description
This will create a new alert.
Request body
{
"name": "They are coming!!",
"description": "string",
"public": true,
"trigger_cooldown": "string",
"triggers": [
"eeb0632b-1935-4498-b1c1-bc3e0664e234"
],
"subscriptions": [
"efa1e33e-28ba-4ea5-a8cf-824625443d3e"
]
}
Schema of the request body
{
"type": "object",
"properties": {
"name": {
"type": "string",
"description": "The name of the alert, displayed in the list of alerts",
"example": "They are coming!!"
},
"description": {
"type": "string",
"nullable": true
},
"public": {
"type": "boolean"
},
"trigger_cooldown": {
"type": "string",
"description": "How often the trigger can fire",
"nullable": true
},
"triggers": {
"type": "array",
"items": {
"type": "string",
"format": "uuid"
}
},
"subscriptions": {
"type": "array",
"items": {
"type": "string",
"format": "uuid"
}
}
}
}
Responses
{
"id": "b6b01bfa-3e24-4610-ba4f-6b286a05d0b2",
"name": "They are coming!!",
"description": "string",
"public": true,
"last_trigger_date": "2022-04-13T15:42:05.901Z",
"trigger_cooldown": "string",
"owner_id": "922dd638-11ac-4a3d-8191-7e183aa239da",
"triggers": [
{
"id": "3cdfd9dd-8ee4-4cd7-b745-85bef97634e6",
"type": "string",
"invert": true,
"metric": "string",
"threshold": 10.12,
"models": null,
"evaluations": null,
"alert_id": "1b9aeafd-2cfc-477e-8935-7b2e379d261d"
}
],
"subscriptions": [
{
"confirmed": true,
"method": "string",
"destination": "string"
}
]
}
Schema of the response body
{
"type": "object",
"properties": {
"id": {
"type": "string",
"format": "uuid"
},
"name": {
"type": "string",
"description": "The name of the alert, displayed in the list of alerts",
"example": "They are coming!!"
},
"description": {
"type": "string",
"nullable": true
},
"public": {
"type": "boolean"
},
"last_trigger_date": {
"type": "string",
"format": "date-time",
"nullable": true
},
"trigger_cooldown": {
"type": "string",
"description": "How often the trigger can fire",
"nullable": true
},
"owner_id": {
"type": "string",
"format": "uuid"
},
"triggers": {
"type": "array",
"items": {
"$ref": "#/components/schemas/ShallowTrigger"
}
},
"subscriptions": {
"type": "array",
"items": {
"$ref": "#/components/schemas/ShallowSubscriberAlert"
}
}
}
}
Refer to the common response description: Unauthorized.
Refer to the common response description: Unauthenticated.
Refer to the common response description: Error.
GET /alert
Input parameters
| Parameter | In | Type | Default | Nullable | Description |
|---|---|---|---|---|---|
| endCreationDate | query | string | | Yes | Filter out all alerts that were created after this date |
| endPredictedTriggerDate | query | string | | Yes | Filter out all alerts that are expected to trigger after this date |
| evaluations | query | array | | Yes | A list of evaluation ids. Only alerts pertaining to these evaluations will be returned |
| id | query | string | | Yes | Will return the item with this id, or die trying. When this parameter is provided, then only a single item will be returned |
| maxThreshold | query | number | | Yes | Filter out all alerts that have a higher threshold than provided |
| minThreshold | query | number | | Yes | Filter out all alerts that have a lower threshold than provided |
| models | query | array | | Yes | A list of model ids. Only alerts pertaining to these models will be returned |
| order_by | query | string | | Yes | Sort the returned results ascendingly |
| owner_id | query | string | | Yes | Return all alerts belonging to the given owner. If `me` is provided, then all alerts of the caller will be returned |
| startCreationDate | query | string | | Yes | Filter out all alerts that were created before this date |
| startPredictedTriggerDate | query | string | | Yes | Filter out all alerts that are expected to trigger before this date |
| subscriber_id | query | string | | Yes | Return all alerts subscribed to by the given owner. If `me` is provided, then subscribed alerts of the caller will be returned. This endpoint requires the caller to be allowed to filter by subscriber_id - it's not something everyone can do |
| triggerCooldown | query | string | | Yes | Filter by how often the alert can be triggered |
Responses
Schema of the response body
{
"oneOf": [
{
"$ref": "#/components/schemas/Alert"
},
{
"type": "object",
"properties": {
"items": {
"description": "An array of all the items that were found, but capped at most at `per_page`",
"type": "array",
"items": {
"$ref": "#/components/schemas/Alert"
}
},
"count": {
"description": "The total number of items found",
"type": "number",
"format": "int32"
},
"per_page": {
"description": "The number of items returned per page",
"type": "number",
"format": "int32"
},
"page": {
"description": "The number of available pages",
"type": "number",
"format": "int32"
}
}
}
]
}
Refer to the common response description: NotFound.
Refer to the common response description: Error.
PUT /alert
Request body
{
"name": "They are coming!!",
"description": "string",
"public": true,
"trigger_cooldown": "string",
"triggers": [
"022153ca-3866-4196-8e57-88bac2e73275"
],
"subscriptions": [
"04807a58-388a-43d5-af71-826e23ffee52"
]
}
Schema of the request body
{
"type": "object",
"properties": {
"name": {
"type": "string",
"description": "The name of the alert, displayed in the list of alerts",
"example": "They are coming!!"
},
"description": {
"type": "string",
"nullable": true
},
"public": {
"type": "boolean"
},
"trigger_cooldown": {
"type": "string",
"description": "How often the trigger can fire",
"nullable": true
},
"triggers": {
"type": "array",
"items": {
"type": "string",
"format": "uuid"
}
},
"subscriptions": {
"type": "array",
"items": {
"type": "string",
"format": "uuid"
}
}
}
}
Responses
"Alert updated"
Schema of the response body
{
"type": "string",
"enum": [
"Alert updated"
]
}
Refer to the common response description: Unauthorized.
Refer to the common response description: Unauthenticated.
Refer to the common response description: NotFound.
Refer to the common response description: Error.
POST /dsltest
Check whether DSL code fragments are correct.
Description
This endpoint will execute a provided DSL fragment and return the result. It will be run with test data, but you can still use it to call your models. Queries that take too long will be terminated.
DSL Phases
There are four places where the DSL is used:
- Constructing prompts
- Sending requests to models
- Parsing the responses that models return
- Grading the parsed responses
These four steps happen sequentially for each task. This endpoint only checks one phase, which you must specify. That being said, there's nothing stopping you from chaining all four, e.g.:
import requests

API_KEY = "<your api key goes here>"

def run_code(code, stage, overrides=None):
    """Run one DSL fragment for the given stage and return its result."""
    headers = {'Api-Token': API_KEY}
    body = {"code": code, "stage": stage, "context": overrides or {}}
    res = requests.post('https://equistamp.net/dsltest', headers=headers, json=body)
    if res.status_code != 200:
        raise ValueError(f'bad request: {res.text}')
    # The response body has the form {"result": <whatever the code returned>}
    return res.json()['result']

# Each stage receives the output of the previous stages through the context
prompt = run_code('(str "Do something with this task: " task)', 'prompt')
response = run_code('(POST "https://your.model/endpoint" {:json {"prompt" prompt}})', 'request', {"prompt": prompt})
parsed_response = run_code('(get-in response ["path" "to" "response"])', 'response', {"response": response})
grader_result = run_code('parsed-response', 'grader', {"response": response, "parsed-response": parsed_response})
print(grader_result)
Context
When starting a request, a context is created with useful constants:
Base constants
- `task` - the text of the task to be completed
- `endpoint_type` - the type of endpoint - possible values are: aws, together.ai, conversational, google_cloud, azure, text-generation, anthropic, fill-mask, zero-shot-classification, custom, open_ai, text2text-generation, mistral
- `cache` - An atom containing a cache that can be used to store data between requests. Acts as a map, so items can be accessed via `(get @cache <key>)` and set via `(swap! cache assoc <key> <val>)`.
Task specific context
Multiple choice tasks
In the case of multiple choice tasks, the following are also available:
- `num_choices` - the number of available choices
- `letter-choices` - the letters corresponding to the available choices
- `correct` - the letters of all correct answers - only available to the Grader
Boolean tasks
Boolean tasks (i.e. true/false) will add the following to the grader's context:
- `correct` - whether the current task is `true` or `false`
Free response tasks
Free response tasks are tasks that expect arbitrary text. These kinds of tasks don't really have "correct" answers that can be saved, so much as phrases that are similar to what is expected. For example, "What is a group of whales called?" could be answered with "A pod", "Pod", "it's a pod" or other such combinations, all of which are correct. You could also accept "a family", which is sort of correct, in that some species are very matrilineal, but others form more casual pods. There is also "school", which in general applies to fish, but is sometimes also used for whales. On the other hand, "a gander" or "a murder" are flat out incorrect, as those apply to birds. To help manage this, we support `positive-examples`, which is a list of strings that are close to the kind of response you're expecting, and `negative-examples`, which is a list of strings that are opposite in meaning to what you expect.
The default grader uses cosine similarities to check responses. It compares the model's response against all positive and negative examples, with each similarity normalized to the range [0, 1]. For negative examples the complement of the similarity is used, as the idea there is to have something that is opposite in meaning (as opposed to just maximally dissimilar). The maximum of these values is then returned and used as the correctness score for the given task.
The following will be added to the grader's context:
- `positive-examples` - a list of strings that should be similar to the model's response
- `negative-examples` - a list of strings that should be opposite to the model's response
- `embedder` - a one argument function that receives a string and returns an embedding vector
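The default grading procedure described above can be approximated in a few lines. This is only a sketch of the described behaviour, not the service's actual grader. The function names are hypothetical, the mapping of cosine similarity from [-1, 1] to [0, 1] via `(s + 1) / 2` is an assumed normalization, and the toy embedder stands in for the real `embedder` function.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def to_unit_interval(similarity):
    """Map a cosine similarity from [-1, 1] to [0, 1] (assumed normalization)."""
    return (similarity + 1) / 2


def grade(response, positive_examples, negative_examples, embedder):
    """Score a response as the best match over positive and negated negative examples."""
    response_vec = embedder(response)
    scores = [
        to_unit_interval(cosine(response_vec, embedder(example)))
        for example in positive_examples
    ]
    # For negative examples the complement is used: being opposite in
    # meaning to a negative example counts in the response's favour.
    scores += [
        1 - to_unit_interval(cosine(response_vec, embedder(example)))
        for example in negative_examples
    ]
    return max(scores) if scores else 0.0


# Toy two-dimensional "embeddings", purely for illustration
toy_vectors = {"pod": [1.0, 0.0], "a pod": [1.0, 0.0], "a murder": [-1.0, 0.0]}
score = grade("pod", ["a pod"], ["a murder"], toy_vectors.__getitem__)
```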
JSON tasks
JSON tasks expect the model to answer with correct JSON according to a schema. The schema will be added to the context.
- `schema` - the expected schema of the resulting JSON object
Stage context
Each subsequent stage (request, response, grader) will have values added in the previous stages:
Request
- `prompt` - the prompt to be sent to the model
Response
- `response` - the result of the Request DSL call
Grader
- `parsed-response` - the result of the Response call
Request body
{
"code": "(get-in response [:json \"value\"])",
"stage": "response"
}
Schema of the request body
{
"type": "object",
"properties": {
"code": {
"description": "The DSL code to be evaluated",
"type": "string",
"example": "(get-in response [:json \"value\"])"
},
"stage": {
"description": "The kind of DSL code to be tested",
"example": "response",
"type": "string",
"enum": [
"system_prompt",
"prompt",
"request",
"response",
"grader"
]
},
"context": {
"description": "Additional items to be added to the execution context",
"type": "object",
"properties": {
"task-type": {
"description": "The type of task to be used. Must be one of \"FRQ\", \"MCQ\", \"bool\", \"json\"",
"example": "MCQ"
},
"response": {
"description": "The response used when testing 'response' DSL code. If not provided, a dummy value will be used",
"example": {
"json": {
"value": "bla bla"
}
}
},
"parsed-response": {
"description": "The parsed_response used when testing 'grader' DSL code. If not provided, a dummy value will be used",
"example": "bla bla"
}
},
"additionalProperties": true
}
}
}
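A small sketch of assembling a request body that matches this schema, checking the stage locally before anything is sent. The helper name `build_dsltest_body` is hypothetical. The list of stages is copied from the `enum` above.

```python
VALID_STAGES = {"system_prompt", "prompt", "request", "response", "grader"}


def build_dsltest_body(code, stage, context=None):
    """Assemble a POST /dsltest body, rejecting unknown stages early."""
    if stage not in VALID_STAGES:
        raise ValueError(f"unknown stage: {stage!r}")
    body = {"code": code, "stage": stage}
    if context:
        # Extra context entries, e.g. a canned response for the 'response' stage
        body["context"] = context
    return body


body = build_dsltest_body(
    '(get-in response [:json "value"])',
    "response",
    {"task-type": "MCQ", "response": {"json": {"value": "bla bla"}}},
)
```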
Responses
{
"result": null
}
Schema of the response body
{
"type": "object",
"properties": {
"result": {
"description": "This will be whatever the code returned"
}
}
}
Refer to the common response description: Unauthorized.
Refer to the common response description: Unauthenticated.
Refer to the common response description: Error.
POST /evaluation
Create a new evaluation.
Description
Adding tasks to new evaluations
There are three ways to add tasks to evaluations:
- directly during creation, by providing a CSV with tasks via the `csv_url` and `columns_mapping` parameters
- by sending a tasks CSV to the /evaluationbuilderhandler endpoint
- by uploading tasks directly via the /task endpoint
The first option is recommended, as it will automatically call the /evaluationbuilderhandler endpoint for you, once the evaluation is created.
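For the recommended first option, a request body could be assembled along these lines. This is a sketch under assumptions: `build_evaluation_body` is a hypothetical helper, and the local check that every `paraphraseOf` names another mapped column is a guess at a sensible precaution rather than documented server behaviour.

```python
def build_evaluation_body(name, csv_url, columns_mapping, **extra):
    """Assemble a POST /evaluation body that imports tasks from a CSV."""
    known_columns = set(columns_mapping)
    for column, definition in columns_mapping.items():
        if "columnType" not in definition:
            raise ValueError(f"column {column!r} is missing columnType")
        target = definition.get("paraphraseOf")
        # Assumed check: a paraphrase should point at another mapped column
        if target is not None and target not in known_columns:
            raise ValueError(f"column {column!r} paraphrases unknown column {target!r}")
    return {"name": name, "csv_url": csv_url, "columns_mapping": columns_mapping, **extra}


evaluation_body = build_evaluation_body(
    "My lovely evaluation",
    "https://example.com",
    {
        "Question col": {"columnType": "question"},
        "Paraphrase of question": {"columnType": "paraphrase", "paraphraseOf": "Question col"},
    },
    default_task_type="MCQ",
)
```

The resulting dictionary can be passed as the `json` argument of `requests.post`, together with an `Api-Token` header.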
Request body
{
"name": "My lovely evaluation",
"public": true,
"public_usable": false,
"reports_visible": false,
"description": "# This is an evaluation, see more at [this link](http://some.link)",
"task_types": "MCQ",
"modalities": "text",
"min_questions_to_complete": 321,
"tags": [
"f7b6acf2-f8ea-45dc-a47f-9fcf8af5eb79"
],
"csv_url": "https://example.com",
"default_task_type": "MCQ",
"columns_mapping": {
"Question col": {
"columnType": "question"
},
"Paraphrase of question": {
"columnType": "paraphrase",
"paraphraseOf": "Question col"
}
},
"references": {
"bla": {
"schema": {
"properties": {
"name": {
"type": "string"
}
}
},
"name": "My wonderful schema",
"description": "Some description here"
},
"other-name_with.interpunction123": {
"schema": {
"properties": {
"name": {
"type": "string"
}
}
}
}
},
"prompt": "(str \"Please answer this question: \" task)",
"grader": {
"MCQ": "(= parsedResponse correct)",
"default": "false"
}
}
Schema of the request body
{
"type": "object",
"properties": {
"name": {
"type": "string",
"example": "My lovely evaluation"
},
"public": {
"type": "boolean",
"description": "Whether this evaluation should be publicly visible. If true, anyone can view its details or evaluate models with it"
},
"public_usable": {
"type": "boolean",
"description": "Whether this evaluation can be run by anyone. To avoid tasks being leaked, you might want to have the results shown, but have control over what it can be run on.",
"example": false
},
"reports_visible": {
"type": "boolean",
"description": "Whether anyone can pay to see reports for this evaluation.",
"example": false
},
"description": {
"type": "string",
"description": "The description of this evaluation, as displayed on the site. Markdown can be used for formatting",
"nullable": true,
"example": "# This is an evaluation, see more at [this link](http://some.link)"
},
"task_types": {
"type": "array",
"items": {
"type": "string"
},
"description": "The types of tasks supported by this evaluation",
"enum": [
"FRQ",
"bool",
"json",
"MCQ"
],
"example": "MCQ"
},
"modalities": {
"type": "array",
"items": {
"type": "string"
},
"description": "The available modalities of this evaluation",
"enum": [
"text"
],
"example": "text"
},
"min_questions_to_complete": {
"type": "integer",
"format": "int64",
"description": "The default number of tasks to run before an evaluation session is deemed finished.\nA given evaluation session may process more tasks, as starting a new evaluation session for an evaluation/model pair which is already running will just add more tasks to the current session, rather than starting a new one.",
"nullable": true,
"example": 321
},
"tags": {
"type": "array",
"items": {
"type": "string",
"format": "uuid"
}
},
"csv_url": {
"description": "The URL of a CSV file containing the tasks of the new evaluation",
"example": "https://example.com",
"type": "string"
},
"default_task_type": {
"description": "The default type of tasks - can be overridden on a per row basis. Will use \"MCQ\" if not set",
"example": "MCQ",
"nullable": true,
"type": "string",
"enum": [
"FRQ",
"bool",
"json",
"MCQ"
]
},
"columns_mapping": {
"description": "A mapping that specifies which CSV columns contain which types of data. See the [Evaluation Builder](#post-evaluationbuilderhandler) endpoint for details",
"type": "object",
"example": {
"Question col": {
"columnType": "question"
},
"Paraphrase of question": {
"columnType": "paraphrase",
"paraphraseOf": "Question col"
}
},
"additionalProperties": {
"$ref": "#/components/schemas/ColumnMapping"
}
},
"references": {
"description": "A mapping of keys to schemas. The keys can contain ASCII alphanumeric characters, \"-\", \"_\" and \".\".",
"type": "object",
"additionalProperties": {
"type": "object",
"properties": {
"schema": {
"type": "object",
"description": "The JSON schema to be used"
},
"name": {
"type": "string",
"description": "An optional name for this schema - this will only be used for displaying, the actual matching is done by comparing the keys of the `references` object."
},
"description": {
"type": "string",
"description": "An optional description for this schema"
},
"type": {
"type": "string",
"enum": [
"json"
],
"description": "The type of schema. If not provided, will be assumed to be JSON",
"example": "json"
}
},
"required": [
"schema"
]
},
"example": {
"bla": {
"schema": {
"properties": {
"name": {
"type": "string"
}
}
},
"name": "My wonderful schema",
"description": "Some description here"
},
"other-name_with.interpunction123": {
"schema": {
"properties": {
"name": {
"type": "string"
}
}
}
}
}
},
"prompt": {
"description": "DSL code defining how to create prompts. See the [DSL page](/docs/dsl/) for more info.",
"example": "(str \"Please answer this question: \" task)"
},
"grader": {
"description": "DSL code specifying how to grade LLM responses. This can be empty, in which case the default grader will be used. You can specify a grader that will be used for all types of tasks, or per task type graders. If you provide both a default grader and one for a specific task type, the specific one takes precedence.",
"oneOf": [
{
"type": "string",
"description": "DSL code that should be used for all responses",
"example": "(= parsedResponse \"ok\")"
},
{
"type": "object",
"description": "Per task type DSL code. Use the \"default\" key to specify the code that should be used for task types that aren't specified - otherwise the system default grader will be used.",
"properties": {
"default": {
"type": "string",
"description": "The default grader to be used for task types that aren't specified.",
"example": "(if (= parsedResponse correct) 1 0)"
},
"FRQ": {
"type": "string",
"description": "The DSL code to be used to grade FRQ tasks. If this is empty, the default grader will be used"
},
"bool": {
"type": "string",
"description": "The DSL code to be used to grade bool tasks. If this is empty, the default grader will be used"
},
"json": {
"type": "string",
"description": "The DSL code to be used to grade json tasks. If this is empty, the default grader will be used"
},
"MCQ": {
"type": "string",
"description": "The DSL code to be used to grade MCQ tasks. If this is empty, the default grader will be used"
}
},
"example": {
"MCQ": "(= parsedResponse correct)",
"default": "false"
}
}
],
"example": {
"MCQ": "(= parsedResponse correct)",
"default": "false"
}
}
}
}
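The precedence rules in the `grader` description can be sketched as a lookup. This is an interpretation rather than the service's code: `resolve_grader` is a hypothetical name, `None` stands in for the system default grader, and treating an empty per-type entry as falling back to the `default` key first is an assumption.

```python
SYSTEM_DEFAULT = None  # placeholder for the service's built-in grader


def resolve_grader(grader, task_type):
    """Pick the DSL grader code that would apply to a given task type."""
    if not grader:
        # Nothing provided: the system default grader is used
        return SYSTEM_DEFAULT
    if isinstance(grader, str):
        # A single string applies to every task type
        return grader
    # A specific entry wins over "default", which wins over the system default
    return grader.get(task_type) or grader.get("default") or SYSTEM_DEFAULT


graders = {"MCQ": "(= parsedResponse correct)", "default": "false"}
```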
Responses
{
"id": "9f65948b-0839-4704-94c1-a74682d43594",
"name": "My lovely evaluation",
"public": true,
"public_usable": false,
"reports_visible": false,
"quality": 0.89,
"num_tasks": 2000,
"description": "# This is an evaluation, see more at [this link](http://some.link)",
"last_updated": "2022-04-13T15:42:05.901Z",
"task_types": "MCQ",
"modalities": "text",
"min_questions_to_complete": 321,
"owner": {
"id": "2120c29a-ed02-4065-bbba-c5ada79d7c47",
"email_address": "mr.blobby@some.domain",
"user_name": "mr_blobby",
"full_name": "Mr Blobby, esq.",
"user_image": "https://equistamp.com/avatars/123123123123.png",
"bio": "Hello, my name is Inigo Montoya. You Killed my Father. Prepare to die",
"display_options": {
"bio": true,
"email_address": true,
"user_image": false
},
"join_date": "2022-04-13",
"subscription_level": "pro",
"alerts": [
"88103840-7fe3-41a2-b492-230df4dac99d"
]
},
"tags": [
{
"id": "53cbd07d-fa52-4dc8-bfd1-10c3588d2174",
"name": "string"
}
]
}
Schema of the response body
{
"type": "object",
"properties": {
"id": {
"type": "string",
"format": "uuid"
},
"name": {
"type": "string",
"example": "My lovely evaluation"
},
"public": {
"type": "boolean",
"description": "Whether this evaluation should be publicly visible. If true, anyone can view its details or evaluate models with it"
},
"public_usable": {
"type": "boolean",
"description": "Whether this evaluation can be run by anyone. To avoid tasks being leaked, you might want to have the results shown, but have control over what it can be run on.",
"example": false
},
"reports_visible": {
"type": "boolean",
"description": "Whether anyone can pay to see reports for this evaluation.",
"example": false
},
"quality": {
"type": "number",
"format": "double",
"description": "The quality of this evaluation, i.e. how much it can be trusted, from 0 to 1.",
"example": 0.89
},
"num_tasks": {
"type": "integer",
"format": "int64",
"description": "The total number of tasks defined for this evaluation. Includes redacted tasks.",
"example": 2000
},
"description": {
"type": "string",
"description": "The description of this evaluation, as displayed on the site. Markdown can be used for formatting",
"nullable": true,
"example": "# This is an evaluation, see more at [this link](http://some.link)"
},
"last_updated": {
"type": "string",
"format": "date-time"
},
"task_types": {
"type": "array",
"items": {
"type": "string"
},
"description": "The types of tasks supported by this evaluation",
"enum": [
"FRQ",
"bool",
"json",
"MCQ"
],
"example": "MCQ"
},
"modalities": {
"type": "array",
"items": {
"type": "string"
},
"description": "The available modalities of this evaluation",
"enum": [
"text"
],
"example": "text"
},
"min_questions_to_complete": {
"type": "integer",
"format": "int64",
"description": "The default number of tasks to run before an evaluation session is deemed finished.\nA given evaluation session may process more tasks, as starting a new evaluation session for an evaluation/model pair which is already running will just add more tasks to the current session, rather than starting a new one.",
"nullable": true,
"example": 321
},
"owner": {
"$ref": "#/components/schemas/ShallowUser"
},
"tags": {
"type": "array",
"items": {
"$ref": "#/components/schemas/ShallowTag"
}
}
}
}
Refer to the common response description: Unauthorized.
Refer to the common response description: Unauthenticated.
Refer to the common response description: Error.
GET /evaluation
Input parameters
| Parameter | In | Type | Default | Nullable | Description |
|---|---|---|---|---|---|
| id | query | string | | Yes | Will return the item with this id, or die trying. When this parameter is provided, then only a single item will be returned |
Responses
Schema of the response body
{
"oneOf": [
{
"$ref": "#/components/schemas/Evaluation"
},
{
"type": "object",
"properties": {
"items": {
"description": "An array of all the items that were found, but capped at most at `per_page`",
"type": "array",
"items": {
"$ref": "#/components/schemas/Evaluation"
}
},
"count": {
"description": "The total number of items found",
"type": "number",
"format": "int32"
},
"per_page": {
"description": "The number of items returned per page",
"type": "number",
"format": "int32"
},
"page": {
"description": "The number of available pages",
"type": "number",
"format": "int32"
}
}
}
]
}
Refer to the common response description: NotFound.
Refer to the common response description: Error.
PUT /evaluation
Request body
{
"name": "My lovely evaluation",
"public": true,
"public_usable": false,
"reports_visible": false,
"description": "# This is an evaluation, see more at [this link](http://some.link)",
"task_types": "MCQ",
"modalities": "text",
"min_questions_to_complete": 321,
"tags": [
"341337c9-bc8c-4d87-bbf2-7d440f7c124f"
]
}
Schema of the request body
{
"type": "object",
"properties": {
"name": {
"type": "string",
"example": "My lovely evaluation"
},
"public": {
"type": "boolean",
"description": "Whether this evaluation should be publicly visible. If true, anyone can view its details or evaluate models with it"
},
"public_usable": {
"type": "boolean",
"description": "Whether this evaluation can be run by anyone. To avoid tasks being leaked, you might want to have the results shown, but have control over what it can be run on.",
"example": false
},
"reports_visible": {
"type": "boolean",
"description": "Whether anyone can pay to see reports for this evaluation.",
"example": false
},
"description": {
"type": "string",
"description": "The description of this evaluation, as displayed on the site. Markdown can be used for formatting",
"nullable": true,
"example": "# This is an evaluation, see more at [this link](http://some.link)"
},
"task_types": {
"type": "array",
"items": {
"type": "string"
},
"description": "The types of tasks supported by this evaluation",
"enum": [
"FRQ",
"bool",
"json",
"MCQ"
],
"example": "MCQ"
},
"modalities": {
"type": "array",
"items": {
"type": "string"
},
"description": "The available modalities of this evaluation",
"enum": [
"text"
],
"example": "text"
},
"min_questions_to_complete": {
"type": "integer",
"format": "int64",
"description": "The default number of tasks to run before an evaluation session is deemed finished.\nA given evaluation session may process more tasks, as starting a new evaluation session for an evaluation/model pair which is already running will just add more tasks to the current session, rather than starting a new one.",
"nullable": true,
"example": 321
},
"tags": {
"type": "array",
"items": {
"type": "string",
"format": "uuid"
}
}
}
}
Responses
"Evaluation updated"
Schema of the response body
{
"type": "string",
"enum": [
"Evaluation updated"
]
}
Refer to the common response description: Unauthorized.
Refer to the common response description: Unauthenticated.
Refer to the common response description: NotFound.
Refer to the common response description: Error.
POST /evaluationbuilderhandler
Import tasks from a CSV file.
Description
This endpoint will fetch a CSV file and create a task from each row (without the first one, which is used as a header). If dry_run
is true, then this will only check for errors and not save anything to the database.
Number of questions to complete
Each evaluation run will use a subsample of all available tasks. You can set this number by providing a value for min_questions_to_complete.
If you don't set this manually, it will be set on the basis of the number of tasks in your file, in such a way as to
have a 95% confidence level. In practice this number tends to be larger than needed - the scores of most evaluation
runs don't change much after around 200 tasks.
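The formula used to derive this default isn't given here. As a rough illustration only, the sketch below assumes Cochran's sample size formula with a finite population correction, a 95% confidence level (z = 1.96), a 5% margin of error and maximum variance (p = 0.5). The function name and all of these parameters are assumptions, so the default calculated by the service may differ.

```python
import math


def estimate_min_questions(num_tasks, z=1.96, margin=0.05, p=0.5):
    """Hypothetical estimate of how many of `num_tasks` tasks to sample.

    Assumes Cochran's formula with a finite population correction; this is
    not necessarily the calculation the service performs.
    """
    if num_tasks <= 0:
        return 0
    # Sample size needed for an infinitely large pool of tasks
    n0 = (z ** 2) * p * (1 - p) / (margin ** 2)
    # Shrink it to account for the pool being finite
    n = n0 / (1 + (n0 - 1) / num_tasks)
    # Never ask for more tasks than are available
    return min(num_tasks, math.ceil(n))


print(estimate_min_questions(2000))
```

If the estimate is larger than you need, pass your own min_questions_to_complete instead.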
Task type
Unless specified otherwise, it's assumed that all tasks are Multiple Choice Questions. This can be changed by
- setting default_task_type, which will change the default to whatever you provide
- providing a type column, which can be used to set the task types for specific rows - any rows where the type column is not empty will use that value as the type, otherwise the default type will be used
Columns mapping
For the CSV import to work correctly, you must provide a way to map columns to task fields. This is done by
providing a mapping of <column name> to a column definition object. The available fields in the definition
object are:
- columnType - this specifies what this column should be used as. Must always be provided
- paraphraseOf - used by paraphrase columns to point to what they're paraphrasing. All texts can have paraphrases. When a field has paraphrases defined, these will always be used when sending texts to models, or displaying them on the frontend. Only you and system administrators will have access to the non paraphrase texts.
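As a minimal sketch, a request body with a columns mapping could be assembled as below. The helper name is hypothetical, and the check that paraphraseOf must point at another mapped column is an assumption about what the server expects, not documented behaviour.

```python
import json


def build_import_payload(evaluation_id, csv_url, columns_mapping, dry_run=True):
    """Assemble a request body for POST /evaluationbuilderhandler (hypothetical helper)."""
    for column, definition in columns_mapping.items():
        # columnType must always be provided
        if "columnType" not in definition:
            raise ValueError(f"Column {column!r} is missing columnType")
        # Assumption: a paraphrase has to point at a column that is also mapped
        target = definition.get("paraphraseOf")
        if target is not None and target not in columns_mapping:
            raise ValueError(f"Column {column!r} paraphrases unknown column {target!r}")
    return {
        "evaluation_id": evaluation_id,
        "csv_url": csv_url,
        "dry_run": dry_run,
        "columns_mapping": columns_mapping,
    }


payload = build_import_payload(
    "64a578cc-05b8-4749-a2eb-ff63f34d78fd",
    "https://example.com",
    {
        "Question col": {"columnType": "question"},
        "Paraphrase of question": {
            "columnType": "paraphrase",
            "paraphraseOf": "Question col",
        },
    },
)
print(json.dumps(payload, indent=2))
```

The resulting dictionary can then be sent as the JSON body of the POST request, along with a Session-Token or Api-Token header. Leaving dry_run as true first lets you check for errors before anything is saved.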
Request body
{
"public_usable": false,
"reports_visible": false,
"min_questions_to_complete": 321,
"tags": [
"80578ad7-0506-4c5e-a2e6-586523676152"
],
"evaluation_id": "64a578cc-05b8-4749-a2eb-ff63f34d78fd",
"dry_run": true,
"csv_url": "https://example.com",
"default_task_type": "MCQ",
"columns_mapping": {
"Question col": {
"columnType": "question"
},
"Paraphrase of question": {
"columnType": "paraphrase",
"paraphraseOf": "Question col"
}
},
"references": {
"bla": {
"schema": {
"properties": {
"name": {
"type": "string"
}
}
},
"name": "My wonderful schema",
"description": "Some description here"
},
"other-name_with.interpunction123": {
"schema": {
"properties": {
"name": {
"type": "string"
}
}
}
}
},
"prompt": "(str \"Please answer this question: \" task)",
"grader": {
"MCQ": "(= parsedResponse correct)",
"default": "false"
}
}
Schema of the request body
{
"type": "object",
"properties": {
"public_usable": {
"type": "boolean",
"description": "Whether this evaluation can be run by anyone. To avoid tasks being leaked, you might want to have the results shown, but have control over what it can be run on.",
"example": false
},
"reports_visible": {
"type": "boolean",
"description": "Whether anyone can pay to see reports for this evaluation.",
"example": false
},
"min_questions_to_complete": {
"type": "integer",
"format": "int64",
"description": "The default number of tasks to run before an evaluation session is deemed finished.\nA given evaluation session may process more tasks, as starting a new evaluation session for an evaluation/model pair which is already running will just add more tasks to the current session, rather than starting a new one.",
"nullable": true,
"example": 321
},
"tags": {
"type": "array",
"items": {
"type": "string",
"format": "uuid"
}
},
"evaluation_id": {
"description": "The id of the evaluation to add tasks to",
"type": "string",
"format": "uuid"
},
"dry_run": {
"description": "If true, this call will only check for errors and not actually import anything",
"type": "boolean"
},
"csv_url": {
"description": "The URL of a CSV file containing the tasks of the new evaluation",
"example": "https://example.com",
"type": "string"
},
"default_task_type": {
"description": "The default type of tasks - can be overridden on a per row basis. Will use \"MCQ\" if not set",
"example": "MCQ",
"nullable": true,
"type": "string",
"enum": [
"FRQ",
"bool",
"json",
"MCQ"
]
},
"columns_mapping": {
"description": "A mapping that specifies which CSV columns contain which types of data. See the [Evaluation Builder](#post-evaluationbuilderhandler) endpoint for details",
"type": "object",
"example": {
"Question col": {
"columnType": "question"
},
"Paraphrase of question": {
"columnType": "paraphrase",
"paraphraseOf": "Question col"
}
},
"additionalProperties": {
"$ref": "#/components/schemas/ColumnMapping"
}
},
"references": {
"description": "A mapping of keys to schemas. The keys can contain ASCII alphanumeric characters, \"-\", \"_\" and \".\".",
"type": "object",
"additionalProperties": {
"type": "object",
"properties": {
"schema": {
"type": "object",
"description": "The JSON schema to be used"
},
"name": {
"type": "string",
"description": "An optional name for this schema - this will only be used for displaying, the actual matching is done by comparing the keys of the `references` object."
},
"description": {
"type": "string",
"description": "An optional description for this schema"
},
"type": {
"type": "string",
"enum": [
"json"
],
"description": "The type of schema. If not provided, will be assumed to be JSON",
"example": "json"
}
},
"required": [
"schema"
]
},
"example": {
"bla": {
"schema": {
"properties": {
"name": {
"type": "string"
}
}
},
"name": "My wonderful schema",
"description": "Some description here"
},
"other-name_with.interpunction123": {
"schema": {
"properties": {
"name": {
"type": "string"
}
}
}
}
}
},
"prompt": {
"description": "DSL code defining how to create prompts. See the [DSL page](/docs/dsl/) for more info.",
"example": "(str \"Please answer this question: \" task)"
},
"grader": {
"description": "DSL code specifying how to grade LLM responses. This can be empty, in which case the default grader will be used. You can specify a grader that will be used for all types of tasks, or per task type graders. If you provide both a default grader and one for a specific task type, the specific one takes precedence.",
"oneOf": [
{
"type": "string",
"description": "DSL code that should be used for all responses",
"example": "(= parsedResponse \"ok\")"
},
{
"type": "object",
"description": "Per task type DSL code. Use the \"default\" key to specify the code that should be used for task types that aren't specified - otherwise the system default grader will be used.",
"properties": {
"default": {
"type": "string",
"description": "The default grader to be used for task types that aren't specified.",
"example": "(if (= parsedResponse correct) 1 0)"
},
"FRQ": {
"type": "string",
"description": "The DSL code to be used to grade FRQ tasks. If this is empty, the default grader will be used"
},
"bool": {
"type": "string",
"description": "The DSL code to be used to grade bool tasks. If this is empty, the default grader will be used"
},
"json": {
"type": "string",
"description": "The DSL code to be used to grade json tasks. If this is empty, the default grader will be used"
},
"MCQ": {
"type": "string",
"description": "The DSL code to be used to grade MCQ tasks. If this is empty, the default grader will be used"
}
},
"example": {
"MCQ": "(= parsedResponse correct)",
"default": "false"
}
}
],
"example": {
"MCQ": "(= parsedResponse correct)",
"default": "false"
}
}
}
}
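The precedence rules described for grader can be sketched as follows. This only illustrates the documented order (task specific code first, then the "default" key, then the system default); the function name is hypothetical and it is not the service's actual implementation.

```python
def select_grader(grader, task_type):
    """Return the DSL code that applies to `task_type`.

    Returns None when nothing matches, meaning the system default grader
    would be used.
    """
    if not grader:
        # Nothing provided at all
        return None
    if isinstance(grader, str):
        # A single string is used for all task types
        return grader
    # The task specific grader wins; an empty or missing one falls back to "default"
    return grader.get(task_type) or grader.get("default")


graders = {"MCQ": "(= parsedResponse correct)", "default": "false"}
print(select_grader(graders, "MCQ"))
print(select_grader(graders, "FRQ"))
```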
Responses
{
"id": "6f7c068b-17be-42a1-913c-1e3c349af033",
"name": "My lovely evaluation",
"public": true,
"public_usable": false,
"reports_visible": false,
"quality": 0.89,
"num_tasks": 2000,
"description": "# This is an evaluation, see more at [this link](http://some.link)",
"last_updated": "2022-04-13T15:42:05.901Z",
"task_types": "MCQ",
"modalities": "text",
"min_questions_to_complete": 321,
"owner": {
"id": "f059ec20-0e0f-4c5c-81d3-4a6e3aa64ed4",
"email_address": "mr.blobby@some.domain",
"user_name": "mr_blobby",
"full_name": "Mr Blobby, esq.",
"user_image": "https://equistamp.com/avatars/123123123123.png",
"bio": "Hello, my name is Inigo Montoya. You Killed my Father. Prepare to die",
"display_options": {
"bio": true,
"email_address": true,
"user_image": false
},
"join_date": "2022-04-13",
"subscription_level": "pro",
"alerts": [
"6c6028a9-85b2-4f11-b83e-53683cd48d9b"
]
},
"tags": [
{
"id": "d71d88f6-3afe-41a9-b263-94d1f38e81d7",
"name": "string"
}
]
}
Schema of the response body
{
"type": "object",
"properties": {
"id": {
"type": "string",
"format": "uuid"
},
"name": {
"type": "string",
"example": "My lovely evaluation"
},
"public": {
"type": "boolean",
"description": "Whether this evaluation should be publicly visible. If true, anyone can view its details or evaluate models with it"
},
"public_usable": {
"type": "boolean",
"description": "Whether this evaluation can be run by anyone. To avoid tasks being leaked, you might want to have the results shown, but have control over what it can be run on.",
"example": false
},
"reports_visible": {
"type": "boolean",
"description": "Whether anyone can pay to see reports for this evaluation.",
"example": false
},
"quality": {
"type": "number",
"format": "double",
"description": "The quality of this evaluation, i.e. how much it can be trusted, from 0 to 1.",
"example": 0.89
},
"num_tasks": {
"type": "integer",
"format": "int64",
"description": "The total number of tasks defined for this evaluation. Includes redacted tasks.",
"example": 2000
},
"description": {
"type": "string",
"description": "The description of this evaluation, as displayed on the site. Markdown can be used for formatting",
"nullable": true,
"example": "# This is an evaluation, see more at [this link](http://some.link)"
},
"last_updated": {
"type": "string",
"format": "date-time"
},
"task_types": {
"type": "array",
"items": {
"type": "string"
},
"description": "The types of tasks supported by this evaluation",
"enum": [
"FRQ",
"bool",
"json",
"MCQ"
],
"example": "MCQ"
},
"modalities": {
"type": "array",
"items": {
"type": "string"
},
"description": "The available modalities of this evaluation",
"enum": [
"text"
],
"example": "text"
},
"min_questions_to_complete": {
"type": "integer",
"format": "int64",
"description": "The default number of tasks to run before an evaluation session is deemed finished.\nA given evaluation session may process more tasks, as starting a new evaluation session for an evaluation/model pair which is already running will just add more tasks to the current session, rather than starting a new one.",
"nullable": true,
"example": 321
},
"owner": {
"$ref": "#/components/schemas/ShallowUser"
},
"tags": {
"type": "array",
"items": {
"$ref": "#/components/schemas/ShallowTag"
}
}
}
}
Refer to the common response description: Unauthorized.
Refer to the common response description: Unauthenticated.
Refer to the common response description: Error.
GET /evaluationbuilderhandler
Check whether a CSV file contains valid tasks
Description
This endpoint will fetch a CSV file from the provided URL and validate each row to make sure that it can be processed. Rows with errors or warnings will be returned with appropriate messages, to help debug problems. When the CSV is processed (after sending an appropriate POST request to this endpoint), rows that have errors will be skipped.
Column mapping
To check whether all the rows are correct, you must provide a way to work out which columns correspond to which fields in the resulting tasks. In the case of GET requests, they should be provided as follows. Check out our sample tasks file for examples:
Basic mappings
- question - this is the only required parameter. This should specify the name of the column containing the main text to be sent to models
- type - this specifies where to check for per row task type overrides. By default it's assumed that tasks are multiple choice questions, unless default_task_type is set in the POST request. But if you want most tasks to be one type, but have a couple that are of a different type (e.g. true-false questions), then you can do so by using this column.
- redacted - this specifies where to check whether a task should be hidden by default. By default it's assumed that all tasks should be used when testing models, but sometimes a given task may be incorrect, or maybe not the best quality. One way around this would be to delete any problematic rows before uploading, but that can be a lot of work. To make things easier, tasks can be uploaded as redacted, which means that they won't be sent to models. Any rows with a redacted column defined, which have non empty values, will be saved as redacted
Paraphrases
All texts can have paraphrases. When a field has paraphrases defined, these will always be used when sending
texts to models, or displaying them on the frontend. Only you and system administrators will have access to the
non paraphrase texts. Paraphrases are declared as paraphrase.<paraphrase column>=<paraphrased column>. So e.g.
paraphrase.question%20paraphrase=Question will declare that the "question paraphrase" column is a paraphrase
of the "Question" column.
Boolean question mappings
Boolean questions have only two possible answers - True or False. You can have one column which provides this
value. Any row where the answer column equals 1 or true (case insensitive) will be deemed to be a question
where the correct answer is True. Any other value is False.
- bool_correct - any rows which are 1 or case insensitive true or yes (so e.g. TrUe, TRue or true) will be deemed to be true statements. Anything else is false.
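To preview how the cells of a bool_correct column would be read, the rule can be sketched like this. It follows the 1 / true / yes list given for bool_correct; stripping the surrounding whitespace is an assumption, and the function name is hypothetical.

```python
TRUE_VALUES = {"1", "true", "yes"}


def is_true_answer(cell):
    """Check whether a bool_correct cell marks the statement as true."""
    # Assumption: surrounding whitespace is ignored; the comparison is case insensitive
    return cell.strip().lower() in TRUE_VALUES


for value in ["TrUe", "1", "YES", "0", "", "no"]:
    print(repr(value), is_true_answer(value))
```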
Free response question mappings
Free response questions are questions where the model can answer with any text. An example of this kind of question
would be "fill in the blank". You can provide both correct and incorrect texts - free response questions are checked
on the basis of similarity. Two identical texts should have a similarity of 1 and texts with opposite meanings will
have a similarity of 0. You can specify expected answers either as things which should be similar, or texts which are
opposite, in which case the similarity will be calculated as 1 - <similarity score>. Each
row must have at least one correct or incorrect value provided.
- frq_correct - a comma separated list of URL encoded column names, e.g. 'Correct%201,Correct%20%3D%20this'
- frq_incorrect - a comma separated list of URL encoded column names, e.g. 'This%20is%20wrong,Bad%21%21'
Multiple response question mappings
In the case of multiple response questions, you must provide at least one correct answer, and at least one incorrect answer. You can add more if you want, but we will only use the first 10 correct answers, and the first 20 incorrect answers. These column definitions should be provided via:
- mcq_correct - a comma separated list of URL encoded column names, e.g. 'Correct%201,Correct%20%3D%20this'
- mcq_incorrect - a comma separated list of URL encoded column names, e.g. 'This%20is%20wrong,Bad%21%21'
JSON question mappings
Tasks which expect valid JSON responses have the following column types, both of which are optional:
- schema - a JSON schema specifying the structure of the expected JSON. If this is provided, all responses must conform to this schema. If not provided, then the schema will be assumed to be any valid JSON. The schema can be provided via a reference (see below).
- expected - an expected JSON object. The JSON returned by the model must have the same values as the expected object
Example column mappings
Assuming you have a CSV file with the following columns:
- Task type - contains the type of tasks
- Timestamp - date of last edit - not needed here, so should be ignored
- `` - an empty column
- Task question to answer - the text to which models should respond
- Question paraphrase - an alternative way of phrasing the question
- Correct answer - the expected answer
- Alternative correct answer - another answer that will also be accepted as correct
- Bad response example - an incorrect answer to be provided as an option in the multiple choice question
- Wrong answer - another incorrect answer to be provided as an option in the multiple choice question
Then you would have to send a GET request with type=Task%20type&question=Task%20question%20to%20answer&paraphrase.Question%20paraphrase=Task%20question%20to%20answer&mcq_correct=Correct%20answer,Alternative%20correct%20answer&mcq_incorrect=Bad%20response%20example,Wrong%20answer
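Encoding these parameters by hand is error prone, so here is a minimal sketch of building the query string with Python's standard library. The function names are hypothetical; only the parameter names (type, question, paraphrase.<column>, mcq_correct, mcq_incorrect) come from the description above.

```python
from urllib.parse import quote


def encode_columns(names):
    """URL encode each column name, then join the names with plain commas."""
    return ",".join(quote(name, safe="") for name in names)


def build_mapping_query(question, task_type=None, paraphrases=None,
                        mcq_correct=(), mcq_incorrect=()):
    """Build the column mapping part of a GET /evaluationbuilderhandler query."""
    parts = []
    if task_type:
        parts.append("type=" + quote(task_type, safe=""))
    parts.append("question=" + quote(question, safe=""))
    # Each paraphrase is declared as paraphrase.<paraphrase column>=<paraphrased column>
    for paraphrase_col, source_col in (paraphrases or {}).items():
        parts.append(
            "paraphrase." + quote(paraphrase_col, safe="") + "=" + quote(source_col, safe="")
        )
    if mcq_correct:
        parts.append("mcq_correct=" + encode_columns(mcq_correct))
    if mcq_incorrect:
        parts.append("mcq_incorrect=" + encode_columns(mcq_incorrect))
    return "&".join(parts)


query = build_mapping_query(
    question="Task question to answer",
    task_type="Task type",
    paraphrases={"Question paraphrase": "Task question to answer"},
    mcq_correct=["Correct answer", "Alternative correct answer"],
    mcq_incorrect=["Bad response example", "Wrong answer"],
)
print(query)
```

Columns that aren't mentioned, like Timestamp or the empty column, are simply left out of the mapping. The csv_url parameter would be added to the same query string.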
References
In the case of schemas, it would be annoying to provide a massive JSON object in each row. To make this simpler, you can
provide a set of references. Any schema column with a value that is a reference key will use the schema object that is
stored as that reference. Reference names can contain English letters (upper and lowercase), digits and "-", "_", and ".".
References can also have names and descriptions, for easier management. Both of these are optional, and do not in any
way affect how the references are matched to rows.
References should be provided as reference.<value>.<reference name> GET parameters, where <value> is one of "schema",
"name" or "description". An example would be a GET request with:
type=Task&question=Question&json_schema=Schema&reference.name.ref1&reference.schema.ref1=%7B%22asd%22%3A+%22asd%22%7D,
which would set ref1 to be {"asd": "asd"} on all rows that have ref1 as their schema.
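A minimal sketch of generating the reference.<value>.<reference name> parameters is shown below. The helper name and the shape of its input are assumptions, modelled on the references object of the POST request; schemas are serialized as JSON before encoding, which matches the encoded value in the example above.

```python
import json
from urllib.parse import urlencode


def build_reference_params(references):
    """Turn a mapping of reference keys to definitions into GET parameters."""
    params = {}
    for key, definition in references.items():
        # The schema itself is sent as a JSON string
        params[f"reference.schema.{key}"] = json.dumps(definition["schema"])
        # Names and descriptions are optional and only used for display
        for field in ("name", "description"):
            if field in definition:
                params[f"reference.{field}.{key}"] = definition[field]
    return urlencode(params)


reference_query = build_reference_params({"ref1": {"schema": {"asd": "asd"}}})
print(reference_query)
```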
Input parameters
| Parameter | In | Type | Default | Nullable | Description |
|---|---|---|---|---|---|
| csv_url | path | | | No | The URL of a CSV file containing the tasks of the new evaluation |
| only_header | path | | | No | When set, will just return the headers of the CSV file |
| question | path | | | No | The columns in the CSV file containing the questions |
| redacted | path | | | No | The column in the CSV file marking tasks as redacted |
| type | path | | | No | The column in the CSV file containing the per row task type |
Responses
{
"errors": [
{
"task_num": 3,
"errors": [
{
"message": "This row couldn't be parsed",
"level": "warning",
"type": "validation"
}
],
"warnings": [
"This row is suspicious"
]
}
],
"num_tasks": 123,
"min_questions_to_complete": 42
}
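A response like the one above can be summarised with a few lines of Python. This is only a sketch: the function name is hypothetical, and treating issues with level "error" as the ones that cause a row to be skipped is an assumption based on the level enum in the schema.

```python
def split_problem_rows(report):
    """Split a validation report into rows with errors and rows with only warnings."""
    failed, warned = [], []
    for row in report.get("errors", []):
        levels = {issue.get("level") for issue in row.get("errors", [])}
        if "error" in levels:
            # Assumption: these are the rows that would be skipped on import
            failed.append(row["task_num"])
        elif levels or row.get("warnings"):
            warned.append(row["task_num"])
    return failed, warned


report = {
    "errors": [
        {
            "task_num": 3,
            "errors": [
                {
                    "message": "This row couldn't be parsed",
                    "level": "warning",
                    "type": "validation",
                }
            ],
            "warnings": ["This row is suspicious"],
        }
    ],
    "num_tasks": 123,
    "min_questions_to_complete": 42,
}
failed_rows, warned_rows = split_problem_rows(report)
print(failed_rows, warned_rows)
```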
Schema of the response body
{
"type": "object",
"properties": {
"errors": {
"type": "array",
"items": {
"type": "object",
"properties": {
"task_num": {
"description": "The index of the row that has these errors",
"type": "number",
"format": "int64",
"example": 3
},
"errors": {
"type": "array",
"items": {
"type": "object",
"properties": {
"message": {
"type": "string",
"example": "This row couldn't be parsed"
},
"level": {
"type": "string",
"enum": [
"warning",
"error"
]
},
"type": {
"type": "string",
"example": "validation"
}
}
}
},
"warnings": {
"type": "array",
"items": {
"type": "string",
"example": "This row is suspicious"
}
}
}
}
},
"num_tasks": {
"description": "The number of rows with tasks found, including rows with errors",
"type": "number",
"format": "int64",
"example": 123
},
"min_questions_to_complete": {
"description": "The minimum number of tasks per evaluation session. If this wasn't provided in the query parameters, it will be calculated based on the number of tasks found",
"type": "number",
"format": "int64",
"example": 42
}
}
}
Refer to the common response description: ValidationError.
Refer to the common response description: NotFound.
Refer to the common response description: Error.
POST /evaluationmodeljobshandler
Request body
{
"job_name": "string",
"minutes_between_evaluations": 10.12,
"job_description": "string",
"start_date": "2022-04-13T15:42:05.901Z"
}
Schema of the request body
{
"type": "object",
"properties": {
"job_name": {
"type": "string"
},
"minutes_between_evaluations": {
"type": "number",
"format": "int64"
},
"job_description": {
"type": "string"
},
"start_date": {
"type": "string",
"format": "date-time",
"nullable": true
}
}
}
Responses
{
"job_name": "string",
"minutes_between_evaluations": 10.12,
"job_body": null,
"job_description": "string",
"job_schedule_arn": "string",
"start_date": "2022-04-13T15:42:05.901Z",
"owner_id": "c3821565-8ad3-48d3-be6b-6785eec6de4d",
"model_id": "6aba704c-b89d-40af-9c68-9dde86479c65",
"evaluation_id": "8402290a-eb86-44be-a7b7-bfa35072c30f",
"id": "c959d296-96ea-4fc8-8b9c-7a66d53d436e",
"creation_date": "2022-04-13T15:42:05.901Z"
}
Schema of the response body
{
"type": "object",
"properties": {
"job_name": {
"type": "string"
},
"minutes_between_evaluations": {
"type": "number",
"format": "int64"
},
"job_body": {},
"job_description": {
"type": "string"
},
"job_schedule_arn": {
"type": "string"
},
"start_date": {
"type": "string",
"format": "date-time",
"nullable": true
},
"owner_id": {
"type": "string",
"format": "uuid"
},
"model_id": {
"type": "string",
"format": "uuid"
},
"evaluation_id": {
"type": "string",
"format": "uuid"
},
"id": {
"type": "string",
"format": "uuid"
},
"creation_date": {
"type": "string",
"format": "date-time"
}
}
}
Refer to the common response description: Unauthorized.
Refer to the common response description: Unauthenticated.
Refer to the common response description: Error.
GET /evaluationmodeljobshandler
Input parameters
| Parameter | In | Type | Default | Nullable | Description |
|---|---|---|---|---|---|
| id | query | string | | Yes | Will return the item with this id, or die trying. When this parameter is provided, then only a single item will be returned |
Responses
Schema of the response body
{
"oneOf": [
{
"$ref": "#/components/schemas/EvaluationModelJobs"
},
{
"type": "object",
"properties": {
"items": {
"description": "An array of all the items that were found, but capped at most at `per_page`",
"type": "array",
"items": {
"$ref": "#/components/schemas/EvaluationModelJobs"
}
},
"count": {
"description": "The total number of items found",
"type": "number",
"format": "int32"
},
"per_page": {
"description": "The number of items returned per page",
"type": "number",
"format": "int32"
},
"page": {
"description": "The number of available pages",
"type": "number",
"format": "int32"
}
}
}
]
}
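Because the response is either a single item or a page of items, client code may want to normalise both shapes into a list. The sketch below is one way to do that; the function name is hypothetical, and it assumes that a single job never has an "items" field of its own.

```python
def extract_jobs(body):
    """Return the jobs in a GET /evaluationmodeljobshandler response as a list."""
    if isinstance(body, dict) and isinstance(body.get("items"), list):
        # Paginated response: the jobs are in "items", capped at per_page
        return body["items"]
    # Single item response, e.g. when the id parameter was provided
    return [body]


page = {"items": [{"job_name": "nightly"}], "count": 1, "per_page": 10, "page": 1}
print(extract_jobs(page))
print(extract_jobs({"job_name": "nightly"}))
```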
Refer to the common response description: Unauthorized.
Refer to the common response description: Unauthenticated.
Refer to the common response description: NotFound.
Refer to the common response description: Error.
PUT /evaluationmodeljobshandler
Request body
{
"job_name": "string",
"minutes_between_evaluations": 10.12,
"job_description": "string",
"start_date": "2022-04-13T15:42:05.901Z"
}
Schema of the request body
{
"type": "object",
"properties": {
"job_name": {
"type": "string"
},
"minutes_between_evaluations": {
"type": "number",
"format": "int64"
},
"job_description": {
"type": "string"
},
"start_date": {
"type": "string",
"format": "date-time",
"nullable": true
}
}
}
Responses
"EvaluationModelJobs updated"
Schema of the response body
{
"type": "string",
"enum": [
"EvaluationModelJobs updated"
]
}
Refer to the common response description: Unauthorized.
Refer to the common response description: Unauthenticated.
Refer to the common response description: NotFound.
Refer to the common response description: Error.
POST /evaluationsession
Run an evaluation on a model, or take the test as a human.
Description
Human tests
Humans can test themselves on evaluations to check how hard they are. This should be done
via the "Test yourself" button on evaluation pages. A random subsample of some 20 tasks
will be returned, and once all of them have been completed, a summary is shown of how well the user
did compared to other humans and AI models. Human tests can only be taken by the actual caller,
as determined by Session-Token or Api-Token. Providing a different user via evaluatee_id won't
do anything.
Each human test is idempotent, so until it has been completed, calling this endpoint for
a given evaluation will return the same 20 tasks. This can be overridden with the restart
parameter - when that is true, then a new evaluation session will be started.
Human tests are free.
AI model evaluation
Calling this endpoint with a model id in the evaluatee_id field and is_human_being_evaluated = false
will start a new evaluation session for the provided evaluation_id. This requires payment, which
will automatically be subtracted from your credits. If you don't have enough credits, a
402 error will be returned, with a link to your user profile, where you can purchase more credits.
By default there will be only one evaluation session per evaluation/model pair at a time.
Calling this endpoint for a running evaluation session will append tasks to the current session
rather than creating a new one. You can force a new evaluation session by setting restart = true.
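As a minimal sketch, the request body for a model evaluation and the handling of the 402 case could look like this. Both function names are hypothetical, and the exact contents of the error body are not specified here, so the body is passed through untouched.

```python
def build_model_session_payload(evaluation_id, model_id, restart=False):
    """Assemble a request body for evaluating a model via POST /evaluationsession."""
    return {
        "evaluation_id": evaluation_id,
        "evaluatee_id": model_id,
        # False means this is an automatic test of a model, not a human test
        "is_human_being_evaluated": False,
        # True forces a new session instead of appending tasks to a running one
        "restart": restart,
    }


def check_session_response(status_code, body):
    """Raise a descriptive error for failed calls, otherwise return the body."""
    if status_code == 402:
        # Not enough credits; the body should link to the user profile
        raise RuntimeError(f"Not enough credits: {body}")
    if status_code >= 400:
        raise RuntimeError(f"Request failed with status {status_code}: {body}")
    return body


session_payload = build_model_session_payload(
    "9cc68041-01fe-474e-a623-59f37c7074aa",
    "1ec67c40-fa5d-4a4d-867c-ac1cf75d4ec4",
)
print(session_payload)
```

The payload would be sent as the JSON body of the POST request, with a Session-Token or Api-Token header identifying the caller.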
Request body
{
"origin": "user",
"is_human_being_evaluated": true,
"min_verbosity": 10.12,
"max_verbosity": 10.12,
"avg_verbosity": 10.12,
"median_verbosity": 10.12,
"evaluatee_id": "1ec67c40-fa5d-4a4d-867c-ac1cf75d4ec4",
"evaluation_id": "9cc68041-01fe-474e-a623-59f37c7074aa",
"notify": [
{
"method": "email",
"destination": "mr.blobby@acme.com"
}
],
"restart": false,
"system_prompt": "(str \"Please answer this: \" task)",
"prompt": {
"MCQ": "(str \"I have a multiple choice question for you to answer: \" task)",
"default": "(str \"Answer this, please: \" task)"
},
"request": {
"MCQ": "(bedrock-call \"your-access-key\" \"your-secret-key\" \"Jurassic\" task-text)",
"default": "false"
},
"response": {
"MCQ": "(= parsedResponse correct)",
"default": "false"
},
"grader": {
"MCQ": "(= parsedResponse correct)",
"default": "false"
}
}
Schema of the request body
{
"type": "object",
"properties": {
"origin": {
"type": "string",
"description": "The source of this evaluation session, i.e. what triggered it",
"example": "user",
"enum": [
"alert",
"user",
"job",
"model"
]
},
"is_human_being_evaluated": {
"type": "boolean",
"description": "Whether this evaluation session is a human test. When false will start an automatic test for the provided model and evaluation.",
"example": true
},
"min_verbosity": {
"type": "number",
"format": "double",
"nullable": true
},
"max_verbosity": {
"type": "number",
"format": "double",
"nullable": true
},
"avg_verbosity": {
"type": "number",
"format": "double",
"nullable": true
},
"median_verbosity": {
"type": "number",
"format": "double",
"nullable": true
},
"evaluatee_id": {
"type": "string",
"format": "uuid",
"description": "In the case of human tests, the id of the user taking the test. In the case of testing models, the id of the model to be tested"
},
"evaluation_id": {
"type": "string",
"format": "uuid",
"description": "The id of the evaluation to be run"
},
"notify": {
"description": "How to notify that the evaluation session has finished. There can be up to 20 notification methods provided. If no methods are provided, an email will be sent to the user that triggered it.",
"type": "array",
"items": {
"type": "object",
"properties": {
"method": {
"type": "string",
"enum": [
"email",
"webhook",
"sms",
"call"
],
"description": "The notification method",
"example": "email"
},
"destination": {
"type": "string",
"description": "Where to send a notification",
"example": "mr.blobby@acme.com"
}
}
}
},
"restart": {
"description": "Will force a new evaluation session if true - by default, calling this endpoint for an evaluation/model pair that is already running will add more tasks to the running session, rather than creating a new one",
"example": false,
"type": "boolean"
},
"system_prompt": {
"description": "DSL code specifying how to construct model system prompts. This can be empty.",
"type": "string",
"example": "(str \"Please answer this: \" task)"
},
"prompt": {
"description": "DSL code specifying how to construct model prompts. This can be empty, in which case the prompt code of the evaluation will be used. You can specify a `prompt` that will be used for all types of tasks, or per task type `prompt`s. If you provide both a default `prompt` and one for a specific task type, the specific one takes precedence. This can only be used if you're an admin of the selected evaluation - otherwise an error will be returned.",
"oneOf": [
{
"type": "string",
"description": "DSL code that should be used for all prompts",
"example": "(str \"Please answer this: \" task)"
},
{
"type": "object",
"description": "Per task type DSL code. Use the \"default\" key to specify the code that should be used for task types that aren't specified - otherwise the system default prompt will be used.",
"properties": {
"default": {
"type": "string",
"description": "The default `prompt` to be used for task types that aren't specified.",
"example": "(str \"Answer this, please: \" task)"
},
"FRQ": {
"type": "string",
"description": "The DSL code to be used to create prompts for FRQ tasks. If this is empty, the default `prompt` will be used"
},
"bool": {
"type": "string",
"description": "The DSL code to be used to create prompts for bool tasks. If this is empty, the default `prompt` will be used"
},
"json": {
"type": "string",
"description": "The DSL code to be used to create prompts for json tasks. If this is empty, the default `prompt` will be used"
},
"MCQ": {
"type": "string",
"description": "The DSL code to be used to create prompts for MCQ tasks. If this is empty, the default `prompt` will be used"
}
},
"example": {
"MCQ": "(str \"I have a multiple choice question for you to answer: \" task)",
"default": "(str \"Answer this, please: \" task)"
}
}
],
"example": {
"MCQ": "(str \"I have a multiple choice question for you to answer: \" task)",
"default": "(str \"Answer this, please: \" task)"
}
},
"request": {
"description": "DSL code specifying how to send tasks to the model. This can be empty, in which case the request code of the model will be used. You can specify a `request` that will be used for all types of tasks, or per task type `request`s. If you provide both a default `request` and one for a specific task type, the specific one takes precedence. This can only be used if you're an admin of the selected model - otherwise an error will be returned.",
"oneOf": [
{
"type": "string",
"description": "DSL code that should be used for all requests",
"example": "(POST \"http://my.model.endpoint\" {:json {\"task\" task}})"
},
{
"type": "object",
"description": "Per task type DSL code. Use the \"default\" key to specify the code that should be used for task types that aren't specified - otherwise the system default request code will be used.",
"properties": {
"default": {
"type": "string",
"description": "The default `request` to be used for task types that aren't specified.",
"example": "(openai-call \"your_key\" \"gpt-4\" task)"
},
"FRQ": {
"type": "string",
"description": "The DSL code to be used to send requests for FRQ tasks. If this is empty, the default `request` will be used"
},
"bool": {
"type": "string",
"description": "The DSL code to be used to send requests for bool tasks. If this is empty, the default `request` will be used"
},
"json": {
"type": "string",
"description": "The DSL code to be used to send requests for json tasks. If this is empty, the default `request` will be used"
},
"MCQ": {
"type": "string",
"description": "The DSL code to be used to send requests for MCQ tasks. If this is empty, the default `request` will be used"
}
},
"example": {
"MCQ": "(openai-call \"sk-your-secret-key\" \"gpt-4-turbo\" task-text)",
"default": "(anthropic-call \"sk-your-secret-key\" \"claude\" task)"
}
}
],
"example": {
"MCQ": "(bedrock-call \"your-access-key\" \"your-secret-key\" \"Jurassic\" task-text)",
"default": "false"
}
},
"response": {
"description": "DSL code specifying how to parse LLM responses. This can be empty, in which case the response code of the model will be used. You can specify a `response` parser that will be used for all types of tasks, or per task type parsers. If you provide both a default parser and one for a specific task type, the specific one takes precedence. This can only be used if you're an admin of the selected model - otherwise an error will be returned.",
"oneOf": [
{
"type": "string",
"description": "DSL code that should be used for all responses",
"example": "(get-in response [\"json\" \"resp\"])"
},
{
"type": "object",
"description": "Per task type DSL code. Use the \"default\" key to specify the code that should be used for tasks types that aren't specified - otherwise the model's default parser will be used.",
"properties": {
"default": {
"type": "string",
"description": "The default parser to be used for task types that aren't specified.",
"example": "response"
},
"FRQ": {
"type": "string",
"description": "The DSL code to be used to parse FRQ task responses. If this is empty, the default parser will be used"
},
"bool": {
"type": "string",
"description": "The DSL code to be used to parse bool task responses. If this is empty, the default parser will be used"
},
"json": {
"type": "string",
"description": "The DSL code to be used to parse json task responses. If this is empty, the default parser will be used"
},
"MCQ": {
"type": "string",
"description": "The DSL code to be used to parse MCQ task responses. If this is empty, the default parser will be used"
}
},
"example": {
"MCQ": "(= parsedResponse correct)",
"default": "false"
}
}
],
"example": {
"MCQ": "(= parsedResponse correct)",
"default": "false"
}
},
"grader": {
"description": "DSL code specifying how to grade LLM responses. This can be empty, in which case the grader of the evaluation will be used. You can specify a grader that will be used for all types of tasks, or per task type graders. If you provide both a default grader and one for a specific task type, the specific one takes precedence. This can only be used if you're an admin of the selected evaluation - otherwise an error will be returned.",
"oneOf": [
{
"type": "string",
"description": "DSL code that should be used for all response",
"example": "(= parsedResponse \"ok\")"
},
{
"type": "object",
"description": "Per task type DSL code. Use the \"default\" key to specify the code that should be used for tasks types that aren't specified - otherwise the grader of the evaluation will be used.",
"properties": {
"default": {
"type": "string",
"description": "The default grader to be used for task types that aren't specified.",
"example": "(if (= parsedResponse correct) 1 0)"
},
"FRQ": {
"type": "string",
"description": "The DSL code to be used to grade FRQ tasks. If this is empty, the default grader will be used"
},
"bool": {
"type": "string",
"description": "The DSL code to be used to grade bool tasks. If this is empty, the default grader will be used"
},
"json": {
"type": "string",
"description": "The DSL code to be used to grade json tasks. If this is empty, the default grader will be used"
},
"MCQ": {
"type": "string",
"description": "The DSL code to be used to grade MCQ tasks. If this is empty, the default grader will be used"
}
},
"example": {
"MCQ": "(= parsedResponse correct)",
"default": "false"
}
}
],
"example": {
"MCQ": "(= parsedResponse correct)",
"default": "false"
}
}
}
}
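The precedence rules described for `request`, `response` and `grader` can be sketched as a small lookup. This is only a client-side illustration of the documented behaviour, not the service's implementation; `resolve_code` and its `fallback` argument are hypothetical names, with `fallback` standing in for the code already configured on the model or evaluation.

```python
from typing import Mapping, Optional, Union

CodeSpec = Union[str, Mapping[str, str], None]


def resolve_code(spec: CodeSpec, task_type: str, fallback: Optional[str] = None) -> Optional[str]:
    """Pick the DSL code for a task type, following the documented precedence.

    `fallback` stands in for the model's (or evaluation's) own code, which is
    used when nothing is specified here. All names are illustrative.
    """
    if not spec:
        # Empty or missing: use the model's / evaluation's own code.
        return fallback
    if isinstance(spec, str):
        # A single string applies to every task type.
        return spec
    # Per task type code wins over the "default" key; empty strings are skipped.
    return spec.get(task_type) or spec.get("default") or fallback


grader = {"MCQ": "(= parsedResponse correct)", "default": "false"}
print(resolve_code(grader, "MCQ"))  # the MCQ-specific grader
print(resolve_code(grader, "FRQ"))  # no FRQ entry, so the "default" key is used
```

Resolving the code locally like this can help check which snippet a given task type would end up with before sending the request body.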
Responses
{
"id": "4b5d04c5-46c8-4361-aec2-6943db45be82",
"datetime_started": "2022-04-13T15:42:05.901Z",
"datetime_completed": "2022-04-13T15:42:05.901Z",
"origin": "user",
"completed": true,
"failed": true,
"is_human_being_evaluated": true,
"num_questions_answered": 10.12,
"num_answered_correctly": 10.12,
"num_tasks_to_complete": 10.12,
"num_endpoint_failures": 10.12,
"num_endpoint_calls": 10.12,
"num_characters_sent_to_endpoint": 10.12,
"num_characters_received_from_endpoint": 10.12,
"median_seconds_per_task": 10.12,
"mean_seconds_per_task": 10.12,
"std_seconds_per_task": 10.12,
"distribution_of_seconds_per_task": null,
"min_seconds_per_task": 10.12,
"max_seconds_per_task": 10.12,
"median_characters_per_task": 10.12,
"mean_characters_per_task": 10.12,
"std_characters_per_task": 10.12,
"distribution_of_characters_per_task": null,
"min_characters_per_task": 10.12,
"max_characters_per_task": 10.12,
"min_verbosity": 10.12,
"max_verbosity": 10.12,
"avg_verbosity": 10.12,
"median_verbosity": 10.12,
"evaluatee_id": "cc475c38-985c-4b3d-9e3a-766b4945166e",
"evaluation_id": "761e4f79-385d-47a8-bd39-ab5b3ffb78ed"
}
Schema of the response body
{
"type": "object",
"properties": {
"id": {
"type": "string",
"format": "uuid"
},
"datetime_started": {
"type": "string",
"format": "date-time"
},
"datetime_completed": {
"type": "string",
"format": "date-time",
"nullable": true
},
"origin": {
"type": "string",
"description": "The source of this evaluation session, i.e. what triggered it",
"example": "user",
"enum": [
"alert",
"user",
"job",
"model"
]
},
"completed": {
"type": "boolean"
},
"failed": {
"type": "boolean"
},
"is_human_being_evaluated": {
"type": "boolean",
"description": "Whether this evaluation session is a human test. When false will start an automatic test for the provided model and evaluation.",
"example": true
},
"num_questions_answered": {
"type": "number",
"format": "int64"
},
"num_answered_correctly": {
"type": "number",
"format": "int64"
},
"num_tasks_to_complete": {
"type": "number",
"format": "int64"
},
"num_endpoint_failures": {
"type": "number",
"format": "int64"
},
"num_endpoint_calls": {
"type": "number",
"format": "int64"
},
"num_characters_sent_to_endpoint": {
"type": "number",
"format": "int64"
},
"num_characters_received_from_endpoint": {
"type": "number",
"format": "int64"
},
"median_seconds_per_task": {
"type": "number",
"format": "double",
"nullable": true
},
"mean_seconds_per_task": {
"type": "number",
"format": "double",
"nullable": true
},
"std_seconds_per_task": {
"type": "number",
"format": "double",
"nullable": true
},
"distribution_of_seconds_per_task": {
"nullable": true
},
"min_seconds_per_task": {
"type": "number",
"format": "double",
"nullable": true
},
"max_seconds_per_task": {
"type": "number",
"format": "double",
"nullable": true
},
"median_characters_per_task": {
"type": "number",
"format": "double",
"nullable": true
},
"mean_characters_per_task": {
"type": "number",
"format": "double",
"nullable": true
},
"std_characters_per_task": {
"type": "number",
"format": "double",
"nullable": true
},
"distribution_of_characters_per_task": {
"nullable": true
},
"min_characters_per_task": {
"type": "number",
"format": "double",
"nullable": true
},
"max_characters_per_task": {
"type": "number",
"format": "double",
"nullable": true
},
"min_verbosity": {
"type": "number",
"format": "double",
"nullable": true
},
"max_verbosity": {
"type": "number",
"format": "double",
"nullable": true
},
"avg_verbosity": {
"type": "number",
"format": "double",
"nullable": true
},
"median_verbosity": {
"type": "number",
"format": "double",
"nullable": true
},
"evaluatee_id": {
"type": "string",
"format": "uuid",
"description": "In the case of human tests, the id of the user taking the test. In the case of testing models, the id of the model to be tested"
},
"evaluation_id": {
"type": "string",
"format": "uuid",
"description": "The id of the evaluation to be run"
}
}
}
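As a rough illustration of how the count fields in this response can be combined, the sketch below derives a few ratios from a session object. The definitions of `accuracy`, `progress` and `finished` are assumptions made for this example and are not defined by the API; `summarize_session` is a hypothetical helper.

```python
from typing import Any, Mapping, Optional


def summarize_session(session: Mapping[str, Any]) -> dict:
    """Derive a few ratios from the count fields of an evaluation session."""
    answered = session.get("num_questions_answered") or 0
    correct = session.get("num_answered_correctly") or 0
    calls = session.get("num_endpoint_calls") or 0
    failures = session.get("num_endpoint_failures") or 0
    total = session.get("num_tasks_to_complete") or 0

    def ratio(part: float, whole: float) -> Optional[float]:
        # Avoid dividing by zero for sessions that have not started yet.
        return part / whole if whole else None

    return {
        "accuracy": ratio(correct, answered),
        "progress": ratio(answered, total),
        "endpoint_failure_rate": ratio(failures, calls),
        # `datetime_completed` is nullable, so rely on the boolean flags instead.
        "finished": bool(session.get("completed")) and not session.get("failed"),
    }


example = {
    "completed": True, "failed": False,
    "num_questions_answered": 40, "num_answered_correctly": 30,
    "num_tasks_to_complete": 50,
    "num_endpoint_calls": 44, "num_endpoint_failures": 4,
}
print(summarize_session(example))
```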
Refer to the common response description: Unauthorized.
Refer to the common response description: PaymentRequired.
Refer to the common response description: Unauthenticated.
Refer to the common response description: Error.
GET /evaluationsession
Get evaluation sessions.
Description
If the id parameter is provided, this endpoint will return the appropriate evaluation session if possible.
In the case of human tests, you can only use this endpoint to get your own results.
In the case of AI model runs, you can use this endpoint to get any evaluations of models where
either the model/evaluation is public, or you're an administrator of it.
Input parameters
| Parameter | In | Type | Default | Nullable | Description |
|---|---|---|---|---|---|
| id | query | string | | Yes | Will return the item with this id, or die trying. When this parameter is provided, then only a single item will be returned |
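A minimal sketch of building this request with the standard library follows. The base URL and the `Session-Token` header are taken from the `/auth` example; that this endpoint accepts the same header is an assumption, and `evaluation_session_request` is a hypothetical helper name.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://equistamp.net"  # assumed, based on the /auth example


def evaluation_session_request(session_token: str, session_id: Optional[str] = None) -> Request:
    """Build (but do not send) a GET /evaluationsession request."""
    url = f"{BASE_URL}/evaluationsession"
    if session_id is not None:
        # With `id`, only a single evaluation session is returned.
        url += "?" + urlencode({"id": session_id})
    return Request(url, headers={"Session-Token": session_token}, method="GET")


req = evaluation_session_request(
    "<your session token>", "4b5d04c5-46c8-4361-aec2-6943db45be82"
)
print(req.get_method(), req.full_url)
# To send it: urllib.request.urlopen(req), then json.load() the response body.
```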
Responses
Schema of the response body
{
"oneOf": [
{
"$ref": "#/components/schemas/EvaluationSession"
},
{
"type": "object",
"properties": {
"items": {
"description": "An array of all the items that were found, but capped at most at `per_page`",
"type": "array",
"items": {
"$ref": "#/components/schemas/EvaluationSession"
}
},
"count": {
"description": "The total number of items found",
"type": "number",
"format": "int32"
},
"per_page": {
"description": "The number of items returned per page",
"type": "number",
"format": "int32"
},
"page": {
"description": "The number of available pages",
"type": "number",
"format": "int32"
}
}
}
]
}
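Because the response body is either a single object or a paginated wrapper, client code may want to normalise both shapes. The helper below is a hypothetical sketch of that idea; `as_items` is not part of the API.

```python
from typing import Any, List, Mapping


def as_items(body: Mapping[str, Any]) -> List[Mapping[str, Any]]:
    """Return a list of items whichever response shape was received."""
    if isinstance(body.get("items"), list):
        # Paginated shape: {"items": [...], "count": ..., "per_page": ..., "page": ...}
        return list(body["items"])
    # Single-item shape, returned when the `id` parameter was provided.
    return [body]


single = {"id": "4b5d04c5-46c8-4361-aec2-6943db45be82", "completed": True}
paged = {"items": [single], "count": 1, "per_page": 50, "page": 1}
print(as_items(single) == as_items(paged))
```

The `per_page` and `page` values in `paged` are made-up numbers used only to show the shape.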
Refer to the common response description: NotFound.
Refer to the common response description: Error.
POST /model
Request body
{
"name": "my model",
"description": "# This is a model, see more at [this link](http://some.link)",
"publisher": "Models R Us",
"architecture": "RNN",
"picture": "http://some.example/pic",
"num_parameters": 30000000,
"modalities": "text",
"public": true,
"public_usable": false,
"check_availability": true,
"endpoint_type": "open_ai",
"setup_code": "(POST \"http://start.my.model\")",
"teardown_code": "(POST \"http://start.my.model\")",
"task_holding_queue_url": "string",
"task_execution_queue_url": "string",
"task_execution_dlq_url": "string",
"lambda_arn": "string",
"cost_per_input_character_usd": 2e-05,
"cost_per_output_character_usd": 0.0005,
"cost_per_instance_hour_usd": 4.99,
"max_characters_per_minute": 400,
"max_request_per_minute": 30,
"max_context_window_characters": 4096,
"request_code": "(openai-call \"sk-your-secret-key\" \"gtp-4-turbo\" task-text)",
"response_code": "(get-in response [\"json\" \"response\"])"
}
Schema of the request body
{
"type": "object",
"properties": {
"name": {
"type": "string",
"example": "my model"
},
"description": {
"type": "string",
"description": "The description of this model, as displayed on the site. Markdown can be used for formatting",
"nullable": true,
"example": "# This is a model, see more at [this link](http://some.link)"
},
"publisher": {
"type": "string",
"description": "The entity that created this model",
"nullable": true,
"example": "Models R Us"
},
"architecture": {
"type": "string",
"description": "The architecture of this model",
"nullable": true,
"example": "RNN"
},
"picture": {
"type": "string",
"description": "An url to an image representing this model",
"nullable": true,
"example": "http://some.example/pic"
},
"num_parameters": {
"type": "integer",
"format": "int64",
"description": "The number of parameters of the model",
"nullable": true,
"example": 30000000
},
"modalities": {
"type": "array",
"items": {
"type": "string"
},
"description": "The modalities accepted by this model",
"enum": [
"text"
],
"example": "text"
},
"public": {
"type": "boolean",
"description": "Whether this evaluation should be publicly visible. If true, anyone can view its details."
},
"public_usable": {
"type": "boolean",
"description": "Whether this model can be tested by anyone. LLMs can cost a lot to run, and these costs are on whoever added the model. This setting is here to add an extra protection against people running up large compute costs on this model. When not set, this is `false`.",
"example": false
},
"check_availability": {
"type": "boolean",
"description": "Whether the availability of this model should be checked. When true, we will ping the endpoint every ",
"nullable": true
},
"endpoint_type": {
"type": "string",
"description": "The type of endpoint being called. We have dedicated handlers for many of the most popular AI model providers",
"enum": [
"aws",
"together.ai",
"conversational",
"google_cloud",
"azure",
"text-generation",
"anthropic",
"fill-mask",
"zero-shot-classification",
"custom",
"open_ai",
"text2text-generation",
"mistral"
],
"example": "open_ai"
},
"setup_code": {
"type": "string",
"description": "An optional piece of DSL code to be called if the model isn't running. This is useful when your model needs time to spin up - you can defined a call to start it here, which will be called once the model is first used.",
"nullable": true,
"example": "(POST \"http://start.my.model\")"
},
"teardown_code": {
"type": "string",
"description": "An optional piece of DSL code to be run after the model has finished all evaluation sessions. This is useful e.g. when your model is living on an AWS server, where you pay for uptime. You can defined a call to kill the instance, which will be called after no more evaluation sessions are running.",
"nullable": true,
"example": "(POST \"http://start.my.model\")"
},
"task_holding_queue_url": {
"type": "string",
"nullable": true
},
"task_execution_queue_url": {
"type": "string",
"nullable": true
},
"task_execution_dlq_url": {
"type": "string",
"nullable": true
},
"lambda_arn": {
"type": "string",
"nullable": true
},
"cost_per_input_character_usd": {
"type": "number",
"format": "double",
"description": "The cost of a single input character in USD. We assume that a single token is 4 characters.",
"example": 2e-05
},
"cost_per_output_character_usd": {
"type": "number",
"format": "double",
"description": "The cost of a single output character in USD. We assume that a single token is 4 characters.",
"example": 0.0005
},
"cost_per_instance_hour_usd": {
"type": "number",
"format": "double",
"description": "The cost of running the model for an hour, in USD. This doesn't include input/output tokens - it's purely the server uptime. This is useful e.g. with HuggingFace inference endpoints, where they charge for server time, not for tokens throughput.",
"example": 4.99
},
"max_characters_per_minute": {
"type": "integer",
"format": "int64",
"description": "The maximum allowed number of characters per minute. We assume that one token is 4 characters. This must be at least 1.",
"example": 400
},
"max_request_per_minute": {
"type": "integer",
"format": "int64",
"description": "The maximum allowed number of requess per minute. This must be at least 1.",
"example": 30
},
"max_context_window_characters": {
"type": "integer",
"format": "int64",
"description": "The maximum number of characters allowed in the context window of this model. We assume that 1 token is 4 characters",
"nullable": true,
"example": 4096
},
"request_code": {
"description": "DSL code defining how to send requests to the model. See the [DSL page](/docs/dsl/) for more info.",
"example": "(openai-call \"sk-your-secret-key\" \"gtp-4-turbo\" task-text)"
},
"response_code": {
"description": "DSL code defining how to parse responses from the model. See the [DSL page](/docs/dsl/) for more info.",
"example": "(get-in response [\"json\" \"response\"])"
}
}
}
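The character-based fields assume that one token is 4 characters, so provider prices and limits quoted in tokens need converting before they go into the request body. The sketch below shows that arithmetic plus a check of the documented constraints (the `endpoint_type` enum and the "at least 1" limits). The helper names and the $0.08 per 1000 tokens price are hypothetical, chosen only for illustration.

```python
CHARS_PER_TOKEN = 4  # the conversion factor assumed by the schema descriptions

ENDPOINT_TYPES = {
    "aws", "together.ai", "conversational", "google_cloud", "azure",
    "text-generation", "anthropic", "fill-mask", "zero-shot-classification",
    "custom", "open_ai", "text2text-generation", "mistral",
}


def usd_per_character(usd_per_1k_tokens: float) -> float:
    """Convert a per-1000-token price into a per-character price."""
    return usd_per_1k_tokens / 1000 / CHARS_PER_TOKEN


def tokens_to_characters(tokens: int) -> int:
    """Convert a token count (e.g. a context window) into characters."""
    return tokens * CHARS_PER_TOKEN


def check_model_payload(payload: dict) -> list:
    """Return a list of problems found in a POST /model body (empty if none)."""
    problems = []
    endpoint_type = payload.get("endpoint_type")
    if endpoint_type is not None and endpoint_type not in ENDPOINT_TYPES:
        problems.append(f"unknown endpoint_type: {endpoint_type!r}")
    for field in ("max_characters_per_minute", "max_request_per_minute"):
        value = payload.get(field)
        if value is not None and value < 1:
            problems.append(f"{field} must be at least 1")
    return problems


payload = {
    "name": "my model",
    "endpoint_type": "open_ai",
    # Hypothetical provider price of $0.08 per 1000 input tokens.
    "cost_per_input_character_usd": usd_per_character(0.08),
    "max_context_window_characters": tokens_to_characters(1024),
    "max_characters_per_minute": 400,
    "max_request_per_minute": 30,
}
print(check_model_payload(payload))
```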
Responses
{
"id": "c56d2964-8364-4b8a-b6a6-517ec796fa31",
"name": "my model",
"description": "# This is a model, see more at [this link](http://some.link)",
"owner_id": "4b716715-3cca-411f-b3b7-1dd767965a83",
"publisher": "Models R Us",
"architecture": "RNN",
"picture": "http://some.example/pic",
"num_parameters": 30000000,
"modalities": "text",
"public": true,
"public_usable": false,
"check_availability": true,
"quality": 0.89,
"endpoint_type": "open_ai",
"cost_per_input_character_usd": 2e-05,
"cost_per_output_character_usd": 0.0005,
"cost_per_instance_hour_usd": 4.99,
"max_characters_per_minute": 400,
"max_request_per_minute": 30,
"max_context_window_characters": 4096,
"elo_score": 10.12,
"score": 10.12,
"availability": 10.12,
"top_example_id": "f6b9676f-3954-4a35-aa54-b8695e4189ee",
"worst_example_id": "cdb87dda-fc9b-4bf8-9419-fc5614d88356",
"owner": {
"id": "7fd75322-29d1-4e27-9e3b-499cee8cdadc",
"email_address": "mr.blobby@some.domain",
"user_name": "mr_blobby",
"full_name": "Mr Blobby, esq.",
"user_image": "https://equistamp.com/avatars/123123123123.png",
"bio": "Hello, my name is Inigo Montoya. You Killed my Father. Prepare to die",
"display_options": {
"bio": true,
"email_address": true,
"user_image": false
},
"join_date": "2022-04-13",
"subscription_level": "pro",
"alerts": [
"a685d9e4-d46b-4f61-9358-9c5c53d2efb5"
]
},
"top_example": {
"id": "627596cc-ea70-40cd-b187-56dc733caf0a",
"task_type": "string",
"is_task_live": true,
"modalities": [
"string"
],
"redacted": true,
"num_possible_answers": 10.12,
"evaluation_task_number": 10.12,
"median_human_completion_seconds": 10.12,
"median_ai_completion_seconds": 10.12,
"num_times_human_evaluated": 10.12,
"num_times_ai_evaluated": 10.12,
"num_times_humans_answered_correctly": 10.12,
"num_times_ai_answered_correctly": 10.12,
"evaluation_id": "df4341dc-0d57-4043-8d26-6277b0fd47de",
"owner_id": "301c3b95-f8fb-4022-86a0-6ba2273cedee",
"tags": [
"ccdc9aec-dfd2-4d8d-a953-fee364f64b4d"
]
},
"worst_example": null,
"best_evaluation_session": {
"id": "e538629a-7b80-48af-9564-c8d07477ab55",
"datetime_started": "2022-04-13T15:42:05.901Z",
"datetime_completed": "2022-04-13T15:42:05.901Z",
"origin": "user",
"completed": true,
"failed": true,
"is_human_being_evaluated": true,
"num_questions_answered": 10.12,
"num_answered_correctly": 10.12,
"num_tasks_to_complete": 10.12,
"num_endpoint_failures": 10.12,
"num_endpoint_calls": 10.12,
"num_characters_sent_to_endpoint": 10.12,
"num_characters_received_from_endpoint": 10.12,
"median_seconds_per_task": 10.12,
"mean_seconds_per_task": 10.12,
"std_seconds_per_task": 10.12,
"distribution_of_seconds_per_task": null,
"min_seconds_per_task": 10.12,
"max_seconds_per_task": 10.12,
"median_characters_per_task": 10.12,
"mean_characters_per_task": 10.12,
"std_characters_per_task": 10.12,
"distribution_of_characters_per_task": null,
"min_characters_per_task": 10.12,
"max_characters_per_task": 10.12,
"min_verbosity": 10.12,
"max_verbosity": 10.12,
"avg_verbosity": 10.12,
"median_verbosity": 10.12,
"evaluatee_id": "b9fc398f-5caf-4d85-bca3-9ee6b5b965d3",
"evaluation_id": "29b9e5c3-e9e3-4c97-96f0-fd74d191a1d8"
},
"worst_evaluation_session": null
}
Schema of the response body
{
"type": "object",
"properties": {
"id": {
"type": "string",
"format": "uuid"
},
"name": {
"type": "string",
"example": "my model"
},
"description": {
"type": "string",
"description": "The description of this model, as displayed on the site. Markdown can be used for formatting",
"nullable": true,
"example": "# This is a model, see more at [this link](http://some.link)"
},
"owner_id": {
"type": "string",
"format": "uuid"
},
"publisher": {
"type": "string",
"description": "The entity that created this model",
"nullable": true,
"example": "Models R Us"
},
"architecture": {
"type": "string",
"description": "The architecture of this model",
"nullable": true,
"example": "RNN"
},
"picture": {
"type": "string",
"description": "An url to an image representing this model",
"nullable": true,
"example": "http://some.example/pic"
},
"num_parameters": {
"type": "integer",
"format": "int64",
"description": "The number of parameters of the model",
"nullable": true,
"example": 30000000
},
"modalities": {
"type": "array",
"items": {
"type": "string"
},
"description": "The modalities accepted by this model",
"enum": [
"text"
],
"example": "text"
},
"public": {
"type": "boolean",
"description": "Whether this evaluation should be publicly visible. If true, anyone can view its details."
},
"public_usable": {
"type": "boolean",
"description": "Whether this model can be tested by anyone. LLMs can cost a lot to run, and these costs are on whoever added the model. This setting is here to add an extra protection against people running up large compute costs on this model. When not set, this is `false`.",
"example": false
},
"check_availability": {
"type": "boolean",
"description": "Whether the availability of this model should be checked. When true, we will ping the endpoint every ",
"nullable": true
},
"quality": {
"type": "number",
"format": "double",
"description": "The quality of this model, i.e. how much it's worth using, from 0 to 1. This is very subjective, and mainly used to decide whether it should be used by default e.g. on the frontpage.",
"example": 0.89
},
"endpoint_type": {
"type": "string",
"description": "The type of endpoint being called. We have dedicated handlers for many of the most popular AI model providers",
"enum": [
"aws",
"together.ai",
"conversational",
"google_cloud",
"azure",
"text-generation",
"anthropic",
"fill-mask",
"zero-shot-classification",
"custom",
"open_ai",
"text2text-generation",
"mistral"
],
"example": "open_ai"
},
"cost_per_input_character_usd": {
"type": "number",
"format": "double",
"description": "The cost of a single input character in USD. We assume that a single token is 4 characters.",
"example": 2e-05
},
"cost_per_output_character_usd": {
"type": "number",
"format": "double",
"description": "The cost of a single output character in USD. We assume that a single token is 4 characters.",
"example": 0.0005
},
"cost_per_instance_hour_usd": {
"type": "number",
"format": "double",
"description": "The cost of running the model for an hour, in USD. This doesn't include input/output tokens - it's purely the server uptime. This is useful e.g. with HuggingFace inference endpoints, where they charge for server time, not for tokens throughput.",
"example": 4.99
},
"max_characters_per_minute": {
"type": "integer",
"format": "int64",
"description": "The maximum allowed number of characters per minute. We assume that one token is 4 characters. This must be at least 1.",
"example": 400
},
"max_request_per_minute": {
"type": "integer",
"format": "int64",
"description": "The maximum allowed number of requess per minute. This must be at least 1.",
"example": 30
},
"max_context_window_characters": {
"type": "integer",
"format": "int64",
"description": "The maximum number of characters allowed in the context window of this model. We assume that 1 token is 4 characters",
"nullable": true,
"example": 4096
},
"elo_score": {
"type": "number",
"format": "double",
"description": "The ELO score, according to LLMSys",
"nullable": true
},
"score": {
"type": "number",
"format": "double",
"nullable": true
},
"availability": {
"type": "number",
"format": "double",
"nullable": true
},
"top_example_id": {
"type": "string",
"format": "uuid",
"nullable": true
},
"worst_example_id": {
"type": "string",
"format": "uuid",
"nullable": true
},
"owner": {
"$ref": "#/components/schemas/ShallowUser"
},
"top_example": {
"$ref": "#/components/schemas/ShallowTask"
},
"worst_example": {
"$ref": "#/components/schemas/ShallowTask"
},
"best_evaluation_session": {
"$ref": "#/components/schemas/ShallowEvaluationSession"
},
"worst_evaluation_session": {
"$ref": "#/components/schemas/ShallowEvaluationSession"
}
}
}
Refer to the common response description: Unauthorized.
Refer to the common response description: Unauthenticated.
Refer to the common response description: Error.
GET /model
Input parameters
| Parameter | In | Type | Default | Nullable | Description |
|---|---|---|---|---|---|
| id | query | string | | Yes | Will return the item with this id, or die trying. When this parameter is provided, then only a single item will be returned |
Responses
Schema of the response body
{
"oneOf": [
{
"$ref": "#/components/schemas/Model"
},
{
"type": "object",
"properties": {
"items": {
"description": "An array of all the items that were found, but capped at most at `per_page`",
"type": "array",
"items": {
"$ref": "#/components/schemas/Model"
}
},
"count": {
"description": "The total number of items found",
"type": "number",
"format": "int32"
},
"per_page": {
"description": "The number of items returned per page",
"type": "number",
"format": "int32"
},
"page": {
"description": "The number of available pages",
"type": "number",
"format": "int32"
}
}
}
]
}
Refer to the common response description: NotFound.
Refer to the common response description: Error.
PUT /model
Request body
{
"name": "my model",
"description": "# This is a model, see more at [this link](http://some.link)",
"publisher": "Models R Us",
"architecture": "RNN",
"picture": "http://some.example/pic",
"num_parameters": 30000000,
"modalities": "text",
"public": true,
"public_usable": false,
"check_availability": true,
"endpoint_type": "open_ai",
"setup_code": "(POST \"http://start.my.model\")",
"teardown_code": "(POST \"http://start.my.model\")",
"task_holding_queue_url": "string",
"task_execution_queue_url": "string",
"task_execution_dlq_url": "string",
"lambda_arn": "string",
"cost_per_input_character_usd": 2e-05,
"cost_per_output_character_usd": 0.0005,
"cost_per_instance_hour_usd": 4.99,
"max_characters_per_minute": 400,
"max_request_per_minute": 30,
"max_context_window_characters": 4096
}
Schema of the request body
{
"type": "object",
"properties": {
"name": {
"type": "string",
"example": "my model"
},
"description": {
"type": "string",
"description": "The description of this model, as displayed on the site. Markdown can be used for formatting",
"nullable": true,
"example": "# This is a model, see more at [this link](http://some.link)"
},
"publisher": {
"type": "string",
"description": "The entity that created this model",
"nullable": true,
"example": "Models R Us"
},
"architecture": {
"type": "string",
"description": "The architecture of this model",
"nullable": true,
"example": "RNN"
},
"picture": {
"type": "string",
"description": "An url to an image representing this model",
"nullable": true,
"example": "http://some.example/pic"
},
"num_parameters": {
"type": "integer",
"format": "int64",
"description": "The number of parameters of the model",
"nullable": true,
"example": 30000000
},
"modalities": {
"type": "array",
"items": {
"type": "string"
},
"description": "The modalities accepted by this model",
"enum": [
"text"
],
"example": "text"
},
"public": {
"type": "boolean",
"description": "Whether this evaluation should be publicly visible. If true, anyone can view its details."
},
"public_usable": {
"type": "boolean",
"description": "Whether this model can be tested by anyone. LLMs can cost a lot to run, and these costs are on whoever added the model. This setting is here to add an extra protection against people running up large compute costs on this model. When not set, this is `false`.",
"example": false
},
"check_availability": {
"type": "boolean",
"description": "Whether the availability of this model should be checked. When true, we will ping the endpoint every ",
"nullable": true
},
"endpoint_type": {
"type": "string",
"description": "The type of endpoint being called. We have dedicated handlers for many of the most popular AI model providers",
"enum": [
"aws",
"together.ai",
"conversational",
"google_cloud",
"azure",
"text-generation",
"anthropic",
"fill-mask",
"zero-shot-classification",
"custom",
"open_ai",
"text2text-generation",
"mistral"
],
"example": "open_ai"
},
"setup_code": {
"type": "string",
"description": "An optional piece of DSL code to be called if the model isn't running. This is useful when your model needs time to spin up - you can defined a call to start it here, which will be called once the model is first used.",
"nullable": true,
"example": "(POST \"http://start.my.model\")"
},
"teardown_code": {
"type": "string",
"description": "An optional piece of DSL code to be run after the model has finished all evaluation sessions. This is useful e.g. when your model is living on an AWS server, where you pay for uptime. You can defined a call to kill the instance, which will be called after no more evaluation sessions are running.",
"nullable": true,
"example": "(POST \"http://start.my.model\")"
},
"task_holding_queue_url": {
"type": "string",
"nullable": true
},
"task_execution_queue_url": {
"type": "string",
"nullable": true
},
"task_execution_dlq_url": {
"type": "string",
"nullable": true
},
"lambda_arn": {
"type": "string",
"nullable": true
},
"cost_per_input_character_usd": {
"type": "number",
"format": "double",
"description": "The cost of a single input character in USD. We assume that a single token is 4 characters.",
"example": 2e-05
},
"cost_per_output_character_usd": {
"type": "number",
"format": "double",
"description": "The cost of a single output character in USD. We assume that a single token is 4 characters.",
"example": 0.0005
},
"cost_per_instance_hour_usd": {
"type": "number",
"format": "double",
"description": "The cost of running the model for an hour, in USD. This doesn't include input/output tokens - it's purely the server uptime. This is useful e.g. with HuggingFace inference endpoints, where they charge for server time, not for tokens throughput.",
"example": 4.99
},
"max_characters_per_minute": {
"type": "integer",
"format": "int64",
"description": "The maximum allowed number of characters per minute. We assume that one token is 4 characters. This must be at least 1.",
"example": 400
},
"max_request_per_minute": {
"type": "integer",
"format": "int64",
"description": "The maximum allowed number of requess per minute. This must be at least 1.",
"example": 30
},
"max_context_window_characters": {
"type": "integer",
"format": "int64",
"description": "The maximum number of characters allowed in the context window of this model. We assume that 1 token is 4 characters",
"nullable": true,
"example": 4096
}
}
}
Responses
"Model updated"
Schema of the response body
{
"type": "string",
"enum": [
"Model updated"
]
}
Refer to the common response description: Unauthorized.
Refer to the common response description: Unauthenticated.
Refer to the common response description: NotFound.
Refer to the common response description: Error.
POST /modelsconnecter
Request body
{
"evaluation_id": "b66b4389-4843-436c-919a-cc2bbde4c8ae",
"evaluatee_id": "ca7047ce-a47e-4784-875b-ffb281131aea",
"cadence": "string",
"price": 10.12,
"connections": [
{
"evaluation_id": "e29c81ce-92cb-4191-a98c-51d55b0527df",
"evaluatee_id": "ec79c154-4a57-4e5b-b56d-30f72ab01efb",
"cadence": "once",
"price": 123,
"name": "my wonderful model evaluation"
}
]
}
Schema of the request body
{
"type": "object",
"properties": {
"evaluation_id": {
"type": "string",
"format": "uuid"
},
"evaluatee_id": {
"type": "string",
"format": "uuid"
},
"cadence": {
"type": "string",
"nullable": true
},
"price": {
"type": "number",
"format": "int64"
},
"connections": {
"type": "array",
"items": {
"type": "object",
"properties": {
"evaluation_id": {
"type": "string",
"format": "uuid",
"description": "The id of the evaluation to be run"
},
"evaluatee_id": {
"type": "string",
"format": "uuid",
"description": "The id of the model to be evaluated"
},
"cadence": {
"type": "string",
"enum": [
"daily",
"quarterly",
"once",
"every 2 weeks",
"weekly",
"monthly"
],
"example": "once",
"description": "How often this evaluation should be run on this model"
},
"price": {
"type": "number",
"format": "int64",
"min": 100,
"example": 123,
"description": "The price to run a single evaluation on this model. This is the price you expect to pay in cents - if the actual cost will be larger - e.g. if the evaluation has more tasks added, or the model has its pricing updated - then an error will be raised, so you don't get hit with hidden costs"
},
"name": {
"type": "string",
"description": "A string identifier for this connection - used for displaying line items in Stripe",
"example": "my wonderful model evaluation"
}
}
}
}
}
}
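A minimal sketch of assembling the `connections` array, checking the documented `cadence` values and the minimum price before sending. Reading the schema's `"min": 100` as a floor of 100 cents is an assumption, and `make_connection` is a hypothetical helper rather than part of any client library.

```python
import json

CADENCES = {"daily", "quarterly", "once", "every 2 weeks", "weekly", "monthly"}
MIN_PRICE_CENTS = 100  # assumed meaning of the schema's "min": 100


def make_connection(evaluation_id, evaluatee_id, price_cents, cadence="once", name=None):
    """Build one entry of the `connections` array, validating it first."""
    if cadence not in CADENCES:
        raise ValueError(f"cadence must be one of {sorted(CADENCES)}")
    if price_cents < MIN_PRICE_CENTS:
        raise ValueError(f"price must be at least {MIN_PRICE_CENTS} cents")
    connection = {
        "evaluation_id": evaluation_id,
        "evaluatee_id": evaluatee_id,
        "cadence": cadence,
        "price": price_cents,
    }
    if name:
        # Only used for displaying line items in Stripe.
        connection["name"] = name
    return connection


body = {"connections": [make_connection(
    "e29c81ce-92cb-4191-a98c-51d55b0527df",
    "ec79c154-4a57-4e5b-b56d-30f72ab01efb",
    price_cents=123,
    name="my wonderful model evaluation",
)]}
print(json.dumps(body, indent=2))
```

Stating the expected price explicitly matters here because, as described above, the request fails if the actual cost turns out to be higher.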
Responses
{
"id": "9578841c-8225-4226-9825-191e2388178c",
"evaluation_id": "3b299532-6d74-47c8-bb0c-f048040d364a",
"evaluatee_id": "9ea81133-9c6f-4c65-b4bc-1ace62a3e561",
"cadence": "string",
"price": 10.12,
"model": {
"id": "276d9d54-f514-4509-8690-57ae98627c69",
"name": "my model",
"description": "# This is a model, see more at [this link](http://some.link)",
"owner_id": "636ac7d7-7dd4-4ee2-9a69-b8e2bf5f3332",
"publisher": "Models R Us",
"architecture": "RNN",
"picture": "http://some.example/pic",
"num_parameters": 30000000,
"modalities": "text",
"public": true,
"public_usable": false,
"check_availability": true,
"quality": 0.89,
"endpoint_type": "open_ai",
"cost_per_input_character_usd": 2e-05,
"cost_per_output_character_usd": 0.0005,
"cost_per_instance_hour_usd": 4.99,
"max_characters_per_minute": 400,
"max_request_per_minute": 30,
"max_context_window_characters": 4096,
"elo_score": 10.12,
"score": 10.12,
"availability": 10.12,
"top_example_id": "3432730e-756a-4d5e-81dd-c9166b97330c",
"worst_example_id": "000690ea-8a19-4627-a4c0-c1734778e452",
"owner": "23a1154b-0f04-412f-afb0-04a1b7768f8b",
"top_example": "c4867b79-f39b-4276-805e-1dbd4c406e82",
"worst_example": "a3ea7580-07a4-4e66-9d08-250cb4636d14",
"best_evaluation_session": "671cbe01-5308-4a75-8c60-c23c115551f5",
"worst_evaluation_session": "8f6dc477-af40-465e-9e9d-6fdb620d810a"
},
"evaluation": {
"id": "667911fc-7bd8-4c6d-94eb-29e8e179fd6b",
"name": "My lovely evaluation",
"public": true,
"public_usable": false,
"reports_visible": false,
"quality": 0.89,
"num_tasks": 2000,
"description": "# This is an evaluation, see more at [this link](http://some.link)",
"last_updated": "2022-04-13T15:42:05.901Z",
"task_types": "MCQ",
"modalities": "text",
"min_questions_to_complete": 321,
"owner": "ce16ae57-2a1a-488b-9b57-15ee5e68abd5",
"tags": [
"a893caa0-f12d-44ee-a3b9-c6bfe2bef0e5"
]
}
}
Schema of the response body
{
"type": "object",
"properties": {
"id": {
"type": "string",
"format": "uuid"
},
"evaluation_id": {
"type": "string",
"format": "uuid"
},
"evaluatee_id": {
"type": "string",
"format": "uuid"
},
"cadence": {
"type": "string",
"nullable": true
},
"price": {
"type": "number",
"format": "int64"
},
"model": {
"$ref": "#/components/schemas/ShallowModel"
},
"evaluation": {
"$ref": "#/components/schemas/ShallowEvaluation"
}
}
}
Refer to the common response description: Unauthorized.
Refer to the common response description: Unauthenticated.
Refer to the common response description: Error.
GET /modelsconnecter
Input parameters
| Parameter | In | Type | Default | Nullable | Description |
|---|---|---|---|---|---|
| id | query | string | | Yes | Will return the item with this id, or die trying. When this parameter is provided, then only a single item will be returned |
Responses
Schema of the response body
{
"oneOf": [
{
"$ref": "#/components/schemas/EvaluationEvaluatee"
},
{
"type": "object",
"properties": {
"items": {
"description": "An array of all the items that were found, but capped at most at `per_page`",
"type": "array",
"items": {
"$ref": "#/components/schemas/EvaluationEvaluatee"
}
},
"count": {
"description": "The total number of items found",
"type": "number",
"format": "int32"
},
"per_page": {
"description": "The number of items returned per page",
"type": "number",
"format": "int32"
},
"page": {
"description": "The number of available pages",
"type": "number",
"format": "int32"
}
}
}
]
}
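Because the response is either a single item or a paginated wrapper, client code usually has to normalise the two shapes. The sketch below is one way to do that; `extract_items` is a hypothetical helper, and it assumes that a single item never has an `items` key of its own.

```python
def extract_items(body: dict) -> list:
    """Normalise a GET response body into a list of items.

    Assumes, per the `oneOf` schema above, that a paginated body is an
    object with an `items` array, and that anything else is a single item.
    """
    if isinstance(body.get("items"), list):
        return body["items"]
    return [body]
```

The same helper should apply to the other GET endpoints in this document that share this response shape.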
Refer to the common response description: Unauthorized.
Refer to the common response description: Unauthenticated.
Refer to the common response description: NotFound.
Refer to the common response description: Error.
POST /queryexternalmodelhandler
Run a task on a model.
Description
This endpoint can be called either as part of an evaluation session, or on its own.
If evaluation_session_id is provided, it will run the task as part of that evaluation session. Each
evaluation session has a set number of tasks to evaluate, so if you call this endpoint for a finished
evaluation session, you will get an error.
If no evaluation_session_id is provided, the model will be called with the provided task. This is a
paid operation: it will subtract the appropriate number of credits from your account, or return a 402
if you don't have enough.
You can override the default request and response code for models of which you are an admin, and the prompt and grader code of evaluations of which you are an admin.
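A minimal sketch of preparing such a call is shown here. It uses only the standard library so that it runs without extra dependencies. The host and the `Session-Token` header are taken from the /auth example; that this endpoint accepts the same header, and that `model_id` and `task_id` are sufficient for a standalone call, are assumptions. `build_task_request` is a hypothetical helper name.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "https://equistamp.net"  # host taken from the /auth example


def build_task_request(session_token: str, model_id: str, task_id: str,
                       evaluation_session_id: Optional[str] = None) -> urllib.request.Request:
    """Prepare (but do not send) a POST to /queryexternalmodelhandler."""
    payload = {"model_id": model_id, "task_id": task_id}
    # Only include the session id when running as part of an evaluation session.
    if evaluation_session_id is not None:
        payload["evaluation_session_id"] = evaluation_session_id
    return urllib.request.Request(
        f"{BASE_URL}/queryexternalmodelhandler",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Session-Token": session_token, "Content-Type": "application/json"},
        method="POST",
    )
```

Sending the prepared request with `urllib.request.urlopen` raises `urllib.error.HTTPError` for error statuses, so a caller can inspect the exception's `code` attribute for 402 to detect insufficient credits.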
Request body
{
"response_time_in_seconds": 10.12,
"task_id": "5a783b42-8b3b-46dc-bc16-697df56bbe2f",
"evaluation_session_id": "b95bdcd4-cf93-4f6f-8cdf-a1fd7db57e30",
"model_id": "c98b0ca2-ff29-4dbe-bdeb-9fd4d0b2fdf0",
"system_prompt": "(str \"Please answer this: \" task)",
"prompt": {
"MCQ": "(str \"I have a multiple choice question for you to answer: \" task)",
"default": "(str \"Answer this, please: \" task)"
},
"request": {
"MCQ": "(bedrock-call \"your-access-key\" \"your-secret-key\" \"Jurassic\" task-text)",
"default": "false"
},
"response": {
"MCQ": "(= parsedResponse correct)",
"default": "false"
},
"grader": {
"MCQ": "(= parsedResponse correct)",
"default": "false"
}
}
Schema of the request body
{
"type": "object",
"properties": {
"response_time_in_seconds": {
"type": "number",
"format": "double",
"nullable": true
},
"task_id": {
"description": "The id of the task to be run on the model",
"type": "string",
"format": "uuid"
},
"evaluation_session_id": {
"description": "The id of the evaluation session that is being checked",
"type": "string",
"format": "uuid"
},
"model_id": {
"description": "The id of the model that is being evaluation",
"type": "string",
"format": "uuid"
},
"system_prompt": {
"description": "DSL code specifying how to construct model system prompts. This can be empty.",
"type": "string",
"example": "(str \"Please answer this: \" task)"
},
"prompt": {
"description": "DSL code specifying how to construct model prompts. This can be empty, in which case the prompt code of the evaluation will be used. You can specify a `prompt` that will be used for all types of tasks, or per task type `prompt`s. If you provide both a default `prompt` and one for a specific task type, the specific one takes precedence. This can only be used if you're an admin of the selected evaluation - otherwise an error will be returned.",
"oneOf": [
{
"type": "string",
"description": "DSL code that should be used for all prompts",
"example": "(str \"Please answer this: \" task)"
},
{
"type": "object",
"description": "Per task type DSL code. Use the \"default\" key to specify the code that should be used for tasks types that aren't specified - otherwise the system default prompt will be used.",
"properties": {
"default": {
"type": "string",
"description": "The default `prompt` to be used for task types that aren't specified.",
"example": "(str \"Answer this, please: \" task)"
},
"FRQ": {
"type": "string",
"description": "The DSL code to be used to create prompts for FRQ tasks. If this is empty, the default `prompt` will be used"
},
"bool": {
"type": "string",
"description": "The DSL code to be used to create prompts for bool tasks. If this is empty, the default `prompt` will be used"
},
"json": {
"type": "string",
"description": "The DSL code to be used to create prompts for json tasks. If this is empty, the default `prompt` will be used"
},
"MCQ": {
"type": "string",
"description": "The DSL code to be used to create prompts for MCQ tasks. If this is empty, the default `prompt` will be used"
}
},
"example": {
"MCQ": "(str \"I have a multiple choice question for you to answer: \" task)",
"default": "(str \"Answer this, please: \" task)"
}
}
],
"example": {
"MCQ": "(str \"I have a multiple choice question for you to answer: \" task)",
"default": "(str \"Answer this, please: \" task)"
}
},
"request": {
"description": "DSL code specifying how to send tasks to the model. This can be empty, in which case the request code of the model will be used. You can specify a `request` that will be used for all types of tasks, or per task type `request`s. If you provide both a default `request` and one for a specific task type, the specific one takes precedence. This can only be used if you're an admin of the selected model - otherwise an error will be returned.",
"oneOf": [
{
"type": "string",
"description": "DSL code that should be used for all requests",
"example": "(POST \"http://my.model.endpoint\" {:json {\"task\" task}})"
},
{
"type": "object",
"description": "Per task type DSL code. Use the \"default\" key to specify the code that should be used for tasks types that aren't specified - otherwise the system default request code will be used.",
"properties": {
"default": {
"type": "string",
"description": "The default `request` to be used for task types that aren't specified.",
"example": "(openai-call \"your_key\" \"gtp-4\" task)"
},
"FRQ": {
"type": "string",
"description": "The DSL code to be used to send requests for FRQ tasks. If this is empty, the default `request` will be used"
},
"bool": {
"type": "string",
"description": "The DSL code to be used to send requests for bool tasks. If this is empty, the default `request` will be used"
},
"json": {
"type": "string",
"description": "The DSL code to be used to send requests for json tasks. If this is empty, the default `request` will be used"
},
"MCQ": {
"type": "string",
"description": "The DSL code to be used to send requests for MCQ tasks. If this is empty, the default `request` will be used"
}
},
"example": {
"MCQ": "(openai-call \"sk-your-secret-key\" \"gtp-4-turbo\" task-text)",
"default": "(anthropic-call \"sk-your-secret-key\" \"claude\" task)"
}
}
],
"example": {
"MCQ": "(bedrock-call \"your-access-key\" \"your-secret-key\" \"Jurassic\" task-text)",
"default": "false"
}
},
"response": {
"description": "DSL code specifying how to parse LLM responses. This can be empty, in which case the response code of the model will be used. You can specify a `response` parser that will be used for all types of tasks, or per task type parsers. If you provide both a default parser and one for a specific task type, the specific one takes precedence. This can only be used if you're an admin of the selected model - otherwise an error will be returned.",
"oneOf": [
{
"type": "string",
"description": "DSL code that should be used for all responses",
"example": "(get-in response [\"json\" \"resp\"])"
},
{
"type": "object",
"description": "Per task type DSL code. Use the \"default\" key to specify the code that should be used for tasks types that aren't specified - otherwise the model's default parser will be used.",
"properties": {
"default": {
"type": "string",
"description": "The default parser to be used for task types that aren't specified.",
"example": "response"
},
"FRQ": {
"type": "string",
"description": "The DSL code to be used to parse FRQ task responses. If this is empty, the default parser will be used"
},
"bool": {
"type": "string",
"description": "The DSL code to be used to parse bool task responses. If this is empty, the default parser will be used"
},
"json": {
"type": "string",
"description": "The DSL code to be used to parse json task responses. If this is empty, the default parser will be used"
},
"MCQ": {
"type": "string",
"description": "The DSL code to be used to parse MCQ task responses. If this is empty, the default parser will be used"
}
},
"example": {
"MCQ": "(= parsedResponse correct)",
"default": "false"
}
}
],
"example": {
"MCQ": "(= parsedResponse correct)",
"default": "false"
}
},
"grader": {
"description": "DSL code specifying how to grade LLM responses. This can be empty, in which case the grader of the evaluation will be used. You can specify a grader that will be used for all types of tasks, or per task type graders. If you provide both a default grader and one for a specific task type, the specific one takes precedence. This can only be used if you're an admin of the selected evaluation - otherwise an error will be returned.",
"oneOf": [
{
"type": "string",
"description": "DSL code that should be used for all response",
"example": "(= parsedResponse \"ok\")"
},
{
"type": "object",
"description": "Per task type DSL code. Use the \"default\" key to specify the code that should be used for tasks types that aren't specified - otherwise the grader of the evaluation will be used.",
"properties": {
"default": {
"type": "string",
"description": "The default grader to be used for task types that aren't specified.",
"example": "(if (= parsedResponse correct) 1 0)"
},
"FRQ": {
"type": "string",
"description": "The DSL code to be used to grade FRQ tasks. If this is empty, the default grader will be used"
},
"bool": {
"type": "string",
"description": "The DSL code to be used to grade bool tasks. If this is empty, the default grader will be used"
},
"json": {
"type": "string",
"description": "The DSL code to be used to grade json tasks. If this is empty, the default grader will be used"
},
"MCQ": {
"type": "string",
"description": "The DSL code to be used to grade MCQ tasks. If this is empty, the default grader will be used"
}
},
"example": {
"MCQ": "(= parsedResponse correct)",
"default": "false"
}
}
],
"example": {
"MCQ": "(= parsedResponse correct)",
"default": "false"
}
}
}
}
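The precedence rules described for `prompt`, `request`, `response` and `grader` can be sketched as a small lookup. This is only an illustration of the documented behaviour, not the service's implementation. `resolve_dsl` is a hypothetical name, and `fallback` stands in for whatever code is stored on the model or evaluation (or the system default).

```python
from typing import Optional, Union

DslCode = Union[str, dict, None]


def resolve_dsl(code: DslCode, task_type: str, fallback: Optional[str] = None) -> Optional[str]:
    """Pick the DSL snippet that would apply to `task_type`.

    Mirrors the precedence described above: a task-type specific entry wins,
    then the "default" key, then `fallback`. Empty strings count as
    "not provided".
    """
    if isinstance(code, str):
        # A plain string applies to every task type.
        return code or fallback
    if isinstance(code, dict):
        return code.get(task_type) or code.get("default") or fallback
    return fallback
```

With the example `prompt` object from the request body, an MCQ task would resolve to the `MCQ` entry, while an FRQ task would fall through to the `default` entry.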
Responses
{
"id": "dac5244b-b2c4-4384-b090-4c177921c2d3",
"raw_task_text": "string",
"raw_response_text": "string",
"parsed_response_text": "string",
"response_time_in_seconds": 10.12,
"correctness": 10.12,
"task_id": "0cc718da-9881-4a6a-9ce0-15e71163c608",
"evaluatee_id": "3deebcde-982e-4364-8d8d-1fec093c2b48",
"chosen_answer_id": "8a9ef395-5ff6-469f-9202-b19040c3e63c",
"evaluation_session_id": "6cf3e2d3-1f84-4c93-a424-a8376b6cafc9",
"creation_date": "2022-04-13T15:42:05.901Z"
}
Schema of the response body
{
"type": "object",
"properties": {
"id": {
"type": "string",
"format": "uuid"
},
"raw_task_text": {
"type": "string",
"nullable": true
},
"raw_response_text": {
"type": "string",
"nullable": true
},
"parsed_response_text": {
"type": "string",
"nullable": true
},
"response_time_in_seconds": {
"type": "number",
"format": "double",
"nullable": true
},
"correctness": {
"type": "number",
"format": "double",
"nullable": true
},
"task_id": {
"type": "string",
"format": "uuid"
},
"evaluatee_id": {
"type": "string",
"format": "uuid"
},
"chosen_answer_id": {
"type": "string",
"format": "uuid",
"nullable": true
},
"evaluation_session_id": {
"type": "string",
"format": "uuid"
},
"creation_date": {
"type": "string",
"format": "date-time"
}
}
}
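Several fields in this response are nullable, and `creation_date` arrives as a string, so it can help to parse the body into a typed object. The sketch below covers a subset of the fields. `TaskResult` and `parse_task_result` are hypothetical names, and the timestamp format is assumed to always match the example (fractional seconds followed by `Z`).

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class TaskResult:
    id: str
    task_id: str
    creation_date: datetime
    correctness: Optional[float]           # nullable in the schema
    parsed_response_text: Optional[str]    # nullable in the schema


def parse_task_result(body: dict) -> TaskResult:
    """Convert a response body (already decoded from JSON) into a TaskResult."""
    # Assumes timestamps look like "2022-04-13T15:42:05.901Z".
    created = datetime.strptime(body["creation_date"], "%Y-%m-%dT%H:%M:%S.%fZ")
    return TaskResult(
        id=body["id"],
        task_id=body["task_id"],
        creation_date=created.replace(tzinfo=timezone.utc),
        correctness=body.get("correctness"),
        parsed_response_text=body.get("parsed_response_text"),
    )
```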
Refer to the common response description: Unauthorized.
Refer to the common response description: Unauthenticated.
Refer to the common response description: Error.
POST /response
Request body
{
"response_time_in_seconds": 10.12,
"task_id": "68e12e78-da68-450a-a84a-828c4455128a",
"evaluation_session_id": "a7ba235b-1f57-4f7c-9e5e-83a684ab1904",
"task_type": "MCQ",
"question": "What time is it?",
"answer_text": "Half past nine",
"answer_id": "23aa466b-55cd-466f-b2f4-3ea1611c9a2b"
}
Schema of the request body
{
"type": "object",
"properties": {
"response_time_in_seconds": {
"type": "number",
"format": "double",
"nullable": true
},
"task_id": {
"type": "string",
"format": "uuid"
},
"evaluation_session_id": {
"type": "string",
"format": "uuid"
},
"task_type": {
"description": "The type of tasks for which this is a response",
"example": "MCQ",
"type": "string",
"enum": [
"FRQ",
"bool",
"json",
"MCQ"
]
},
"question": {
"type": "string",
"description": "The text of the question for which this is a response",
"example": "What time is it?"
},
"answer_text": {
"type": "string",
"description": "The text returned from the model",
"example": "Half past nine"
},
"answer_id": {
"type": "string",
"format": "uuid",
"nullable": true,
"description": "The id of the selected answer, in the case of multiple choice questions"
}
}
}
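A request body for this endpoint could be assembled as in the sketch below. `build_response_payload` is a hypothetical helper. The task types come from the `task_type` enum above, while rejecting an `answer_id` for non-MCQ tasks is an assumption drawn from the field's description rather than documented server behaviour.

```python
from typing import Optional

TASK_TYPES = {"FRQ", "bool", "json", "MCQ"}  # from the `task_type` enum


def build_response_payload(task_id: str, evaluation_session_id: str, task_type: str,
                           question: str, answer_text: str,
                           answer_id: Optional[str] = None,
                           response_time_in_seconds: Optional[float] = None) -> dict:
    """Build the JSON body for POST /response."""
    if task_type not in TASK_TYPES:
        raise ValueError(f"unknown task type: {task_type!r}")
    # Assumption: answer ids only make sense for multiple choice questions.
    if answer_id is not None and task_type != "MCQ":
        raise ValueError("answer_id only applies to MCQ tasks")
    return {
        "task_id": task_id,
        "evaluation_session_id": evaluation_session_id,
        "task_type": task_type,
        "question": question,
        "answer_text": answer_text,
        "answer_id": answer_id,
        "response_time_in_seconds": response_time_in_seconds,
    }
```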
Responses
{
"id": "43663d9c-bf7d-4469-897d-891d415ab2b9",
"raw_task_text": "string",
"raw_response_text": "string",
"parsed_response_text": "string",
"response_time_in_seconds": 10.12,
"correctness": 10.12,
"task_id": "7e43ce8d-44f3-4a0c-a589-f222a6d4032b",
"evaluatee_id": "b0e2fc7d-7312-451e-8382-b7cec800f013",
"chosen_answer_id": "a01104af-ba21-409d-93e9-d9fcda9c6454",
"evaluation_session_id": "b22c2e05-27e2-4df1-a80b-c385009ff118",
"creation_date": "2022-04-13T15:42:05.901Z"
}
Schema of the response body
{
"type": "object",
"properties": {
"id": {
"type": "string",
"format": "uuid"
},
"raw_task_text": {
"type": "string",
"nullable": true
},
"raw_response_text": {
"type": "string",
"nullable": true
},
"parsed_response_text": {
"type": "string",
"nullable": true
},
"response_time_in_seconds": {
"type": "number",
"format": "double",
"nullable": true
},
"correctness": {
"type": "number",
"format": "double",
"nullable": true
},
"task_id": {
"type": "string",
"format": "uuid"
},
"evaluatee_id": {
"type": "string",
"format": "uuid"
},
"chosen_answer_id": {
"type": "string",
"format": "uuid",
"nullable": true
},
"evaluation_session_id": {
"type": "string",
"format": "uuid"
},
"creation_date": {
"type": "string",
"format": "date-time"
}
}
}
Refer to the common response description: Unauthorized.
Refer to the common response description: Unauthenticated.
Refer to the common response description: Error.
GET /response
Input parameters
| Parameter | In | Type | Default | Nullable | Description |
|---|---|---|---|---|---|
| id | query | string | | Yes | Will return the item with this id, or die trying. When this parameter is provided, then only a single item will be returned |
Responses
Schema of the response body
{
"oneOf": [
{
"$ref": "#/components/schemas/Response"
},
{
"type": "object",
"properties": {
"items": {
"description": "An array of all the items that were found, but capped at most at `per_page`",
"type": "array",
"items": {
"$ref": "#/components/schemas/Response"
}
},
"count": {
"description": "The total number of items found",
"type": "number",
"format": "int32"
},
"per_page": {
"description": "The number of items returned per page",
"type": "number",
"format": "int32"
},
"page": {
"description": "The number of available pages",
"type": "number",
"format": "int32"
}
}
}
]
}
Refer to the common response description: NotFound.
Refer to the common response description: Error.
PUT /response
Request body
{
"response_time_in_seconds": 10.12,
"task_id": "62a47814-8a55-4f97-833e-eae15d380fad",
"evaluation_session_id": "f95abe3c-bb53-4a74-80df-7dd8d7c4215b"
}
Schema of the request body
{
"type": "object",
"properties": {
"response_time_in_seconds": {
"type": "number",
"format": "double",
"nullable": true
},
"task_id": {
"type": "string",
"format": "uuid"
},
"evaluation_session_id": {
"type": "string",
"format": "uuid"
}
}
}
Responses
"Response updated"
Schema of the response body
{
"type": "string",
"enum": [
"Response updated"
]
}
Refer to the common response description: Unauthorized.
Refer to the common response description: Unauthenticated.
Refer to the common response description: NotFound.
Refer to the common response description: Error.
GET /scores
Input parameters
| Parameter | In | Type | Default | Nullable | Description |
|---|---|---|---|---|---|
| id | query | string | | Yes | Will return the item with this id, or die trying. When this parameter is provided, then only a single item will be returned |
Responses
Schema of the response body
{
"oneOf": [
{
"$ref": "#/components/schemas/CurrentScores"
},
{
"type": "object",
"properties": {
"items": {
"description": "An array of all the items that were found, but capped at most at `per_page`",
"type": "array",
"items": {
"$ref": "#/components/schemas/CurrentScores"
}
},
"count": {
"description": "The total number of items found",
"type": "number",
"format": "int32"
},
"per_page": {
"description": "The number of items returned per page",
"type": "number",
"format": "int32"
},
"page": {
"description": "The number of available pages",
"type": "number",
"format": "int32"
}
}
}
]
}
Refer to the common response description: NotFound.
Refer to the common response description: Error.
POST /schema
Request body
{
"key": "my-schema",
"name": "My schema",
"description": "This is a description. Nice, innit?",
"type": "json",
"schema": "{\"$schema\": \"http://json-schema.org/draft-07/schema#\", \"title\": \"JSON parser\", \"type\": \"object\", \"properties\": {\"name\": {\"type\": \"string\"}}}",
"evaluation_id": "6c4eb8b6-501e-4bc4-b7a4-75451e359d55"
}
Schema of the request body
{
"type": "object",
"properties": {
"key": {
"description": "The key of this schema, as used in csv file upload references. Reference keys can contain English letters (upper and lowercase), digits and \"-\", \"_\", and \".\"",
"type": "string",
"example": "my-schema"
},
"name": {
"description": "The name of this schema, used only for display purposes.",
"type": "string",
"example": "My schema"
},
"description": {
"description": "The name of this schema, used only for display purposes.",
"type": "string",
"example": "This is a description. Nice, innit?"
},
"type": {
"description": "The type of the new schema",
"example": "json",
"type": "string",
"enum": [
"json"
]
},
"schema": {
"description": "A schema to validate answers against.",
"example": "{\"$schema\": \"http://json-schema.org/draft-07/schema#\", \"title\": \"JSON parser\", \"type\": \"object\", \"properties\": {\"name\": {\"type\": \"string\"}}}",
"type": "object"
},
"evaluation_id": {
"description": "The id of the evaluation that this schema is for",
"type": "string",
"format": "uuid"
}
}
}
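The key rule above (English letters, digits, `-`, `_` and `.`) is easy to check before uploading. The following sketch does so and builds a request body. `build_schema_payload` is a hypothetical helper. Note that the example body passes `schema` as a JSON string while the schema lists it as an object; the sketch follows the example, which is an assumption about what the server accepts.

```python
import json
import re

# Letters, digits, "-", "_" and ".", as described for `key` above.
KEY_PATTERN = re.compile(r"[A-Za-z0-9._-]+")


def build_schema_payload(key: str, name: str, description: str,
                         json_schema: dict, evaluation_id: str) -> dict:
    """Build the JSON body for POST /schema."""
    if not KEY_PATTERN.fullmatch(key):
        raise ValueError(f"invalid schema key: {key!r}")
    return {
        "key": key,
        "name": name,
        "description": description,
        "type": "json",  # the only value in the `type` enum
        # Serialised to a string, as in the example request body (assumption).
        "schema": json.dumps(json_schema),
        "evaluation_id": evaluation_id,
    }
```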
Responses
{
"key": "My-lovely-schema",
"name": "My lovely schema",
"description": "This will be used to check stuff",
"evaluation_id": "6b11e6f8-5df8-4310-9417-c69740aba967",
"id": "27d6d87e-f9e8-47bb-a14b-526b786e814b"
}
Schema of the response body
{
"type": "object",
"properties": {
"key": {
"type": "string",
"description": "The identifier used in csv files for this schema",
"nullable": true,
"example": "My-lovely-schema"
},
"name": {
"type": "string",
"description": "An optional name describing this schema",
"nullable": true,
"example": "My lovely schema"
},
"description": {
"type": "string",
"description": "An optional description of this schema",
"nullable": true,
"example": "This will be used to check stuff"
},
"evaluation_id": {
"type": "string",
"format": "uuid"
},
"id": {
"type": "string",
"format": "uuid"
}
}
}
Refer to the common response description: Unauthorized.
Refer to the common response description: Unauthenticated.
Refer to the common response description: Error.
GET /schema
Input parameters
| Parameter | In | Type | Default | Nullable | Description |
|---|---|---|---|---|---|
| id | query | string | | Yes | Will return the item with this id, or die trying. When this parameter is provided, then only a single item will be returned |
Responses
Schema of the response body
{
"oneOf": [
{
"$ref": "#/components/schemas/SchemaHistory"
},
{
"type": "object",
"properties": {
"items": {
"description": "An array of all the items that were found, but capped at most at `per_page`",
"type": "array",
"items": {
"$ref": "#/components/schemas/SchemaHistory"
}
},
"count": {
"description": "The total number of items found",
"type": "number",
"format": "int32"
},
"per_page": {
"description": "The number of items returned per page",
"type": "number",
"format": "int32"
},
"page": {
"description": "The number of available pages",
"type": "number",
"format": "int32"
}
}
}
]
}
Refer to the common response description: NotFound.
Refer to the common response description: Error.
PUT /schema
Request body
{
"key": "My-lovely-schema",
"name": "My lovely schema",
"description": "This will be used to check stuff"
}
Schema of the request body
{
"type": "object",
"properties": {
"key": {
"type": "string",
"description": "The identifier used in csv files for this schema",
"nullable": true,
"example": "My-lovely-schema"
},
"name": {
"type": "string",
"description": "An optional name describing this schema",
"nullable": true,
"example": "My lovely schema"
},
"description": {
"type": "string",
"description": "An optional description of this schema",
"nullable": true,
"example": "This will be used to check stuff"
}
}
}
Responses
"SchemaHistory updated"
Schema of the response body
{
"type": "string",
"enum": [
"SchemaHistory updated"
]
}
Refer to the common response description: Unauthorized.
Refer to the common response description: Unauthenticated.
Refer to the common response description: NotFound.
Refer to the common response description: Error.
POST /subscription
Request body
{
"confirmed": true,
"type": "alert",
"item": "bb7ca0bd-0a81-45e1-94e4-7129324bd6af",
"method": "email",
"destination": "(GET \"http://example.com\")"
}
Schema of the request body
{
"type": "object",
"properties": {
"confirmed": {
"type": "boolean",
"nullable": true
},
"type": {
"description": "The type of object to subscribe to",
"example": "alert",
"type": "string",
"enum": [
"alert",
"evaluation_session"
]
},
"item": {
"description": "The id of the item to subscribe to",
"type": "string",
"format": "uuid"
},
"method": {
"description": "The method used to notify",
"type": "string",
"example": "email",
"enum": [
"email",
"webhook",
"sms",
"call"
]
},
"destination": {
"description": "The destination to which messages should be sent. In the case of email methods this must be a valid email. For text messages and calls a valid phone number. In the case of webhooks, this should be a DSL network call.",
"type": "string",
"example": "(GET \"http://example.com\")"
}
}
}
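Since the expected shape of `destination` depends on `method`, a client may want a quick sanity check before subscribing. The sketch below is deliberately loose and is not how the service validates input. `destination_looks_valid` and both regular expressions are illustrative assumptions, and the webhook check only looks for the parenthesised form used by the DSL examples in this document.

```python
import re

# Deliberately permissive patterns; real validation happens on the server.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")
PHONE_RE = re.compile(r"\+?[0-9][0-9 ()-]{5,}")


def destination_looks_valid(method: str, destination: str) -> bool:
    """Rough client-side check that `destination` fits the chosen `method`."""
    if method == "email":
        return EMAIL_RE.fullmatch(destination) is not None
    if method in ("sms", "call"):
        return PHONE_RE.fullmatch(destination) is not None
    if method == "webhook":
        # DSL network calls in this document look like (GET "http://example.com").
        return destination.startswith("(") and destination.endswith(")")
    raise ValueError(f"unknown method: {method!r}")
```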
Responses
{
"confirmed": true,
"method": "string",
"destination": "string"
}
Schema of the response body
{
"type": "object",
"properties": {
"confirmed": {
"type": "boolean",
"nullable": true
},
"method": {
"type": "string"
},
"destination": {
"type": "string"
}
}
}
Refer to the common response description: Unauthorized.
Refer to the common response description: Unauthenticated.
Refer to the common response description: Error.
GET /subscription
Input parameters
| Parameter | In | Type | Default | Nullable | Description |
|---|---|---|---|---|---|
| id | query | string | | Yes | Will return the item with this id, or die trying. When this parameter is provided, then only a single item will be returned |
| item | query | string | | Yes | The id of the item that was subscribed to |
| item_type | query | string | | Yes | The type of subscriptions to look for. |
Responses
Schema of the response body
{
"oneOf": [
{
"$ref": "#/components/schemas/Subscriber"
},
{
"type": "object",
"properties": {
"items": {
"description": "An array of all the items that were found, but capped at most at `per_page`",
"type": "array",
"items": {
"$ref": "#/components/schemas/Subscriber"
}
},
"count": {
"description": "The total number of items found",
"type": "number",
"format": "int32"
},
"per_page": {
"description": "The number of items returned per page",
"type": "number",
"format": "int32"
},
"page": {
"description": "The number of available pages",
"type": "number",
"format": "int32"
}
}
}
]
}
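The three query parameters above can be combined into a request URL as sketched here. `subscription_query_url` is a hypothetical helper, the host is taken from the /auth example, and the assumption that `item_type` accepts the same values as the POST body's `type` field (`alert`, `evaluation_session`) is not confirmed by this document.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://equistamp.net"  # host taken from the /auth example


def subscription_query_url(subscription_id: Optional[str] = None,
                           item: Optional[str] = None,
                           item_type: Optional[str] = None) -> str:
    """Build the URL for GET /subscription from the optional filters."""
    params = {"id": subscription_id, "item": item, "item_type": item_type}
    # Drop parameters that were not supplied, since all three are nullable.
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{BASE_URL}/subscription" + (f"?{query}" if query else "")
```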
Refer to the common response description: NotFound.
Refer to the common response description: Error.
PUT /subscription
Request body
{
"confirmed": true
}
Schema of the request body
{
"type": "object",
"properties": {
"confirmed": {
"type": "boolean",
"nullable": true
}
}
}
Responses
"Subscriber updated"
Schema of the response body
{
"type": "string",
"enum": [
"Subscriber updated"
]
}
Refer to the common response description: Unauthorized.
Refer to the common response description: Unauthenticated.
Refer to the common response description: NotFound.
Refer to the common response description: Error.
POST /tag
Request body
{
"name": "string"
}
Schema of the request body
{
"type": "object",
"properties": {
"name": {
"type": "string"
}
}
}
Responses
{
"id": "108bff04-4420-43b8-b185-7c1f3d355f65",
"name": "string"
}
Schema of the response body
{
"type": "object",
"properties": {
"id": {
"type": "string",
"format": "uuid"
},
"name": {
"type": "string"
}
}
}
Refer to the common response description: Unauthorized.
Refer to the common response description: Unauthenticated.
Refer to the common response description: Error.
GET /tag
Input parameters
| Parameter | In | Type | Default | Nullable | Description |
|---|---|---|---|---|---|
| id | query | string | | Yes | Will return the item with this id, or die trying. When this parameter is provided, then only a single item will be returned |
Responses
Schema of the response body
{
"oneOf": [
{
"$ref": "#/components/schemas/Tag"
},
{
"type": "object",
"properties": {
"items": {
"description": "An array of all the items that were found, but capped at most at `per_page`",
"type": "array",
"items": {
"$ref": "#/components/schemas/Tag"
}
},
"count": {
"description": "The total number of items found",
"type": "number",
"format": "int32"
},
"per_page": {
"description": "The number of items returned per page",
"type": "number",
"format": "int32"
},
"page": {
"description": "The number of available pages",
"type": "number",
"format": "int32"
}
}
}
]
}
Refer to the common response description: NotFound.
Refer to the common response description: Error.
PUT /tag
Request body
{
"name": "string"
}
Schema of the request body
{
"type": "object",
"properties": {
"name": {
"type": "string"
}
}
}
Responses
"Tag updated"
Schema of the response body
{
"type": "string",
"enum": [
"Tag updated"
]
}
Refer to the common response description: Unauthorized.
Refer to the common response description: Unauthenticated.
Refer to the common response description: NotFound.
Refer to the common response description: Error.
POST /task
Request body
{
"task_type": "string",
"is_task_live": true,
"modalities": [
"string"
],
"redacted": true,
"tags": [
"372ffb70-8cb1-4381-a390-583ef609b89d"
],
"type": "MCQ",
"questions": [
{
"text": "What time is it?",
"paraphrases": []
}
],
"answers": [
{
"text": "half past one",
"paraphrases": [
"1:30 PM",
"13:30"
],
"correct": false
},
{
"text": "Time is an illusion",
"correct": false
},
{
"text": "Now",
"correct": true
}
],
"correct": true,
"schema": "{\"$schema\": \"http://json-schema.org/draft-07/schema#\", \"title\": \"JSON parser\", \"type\": \"object\", \"properties\": {\"name\": {\"type\": \"string\"}}}",
"evaluation_id": "8a067fa5-3527-48c2-85fa-fb27e0dc6c8b"
}
Schema of the request body
{
"type": "object",
"properties": {
"task_type": {
"type": "string"
},
"is_task_live": {
"type": "boolean",
"nullable": true
},
"modalities": {
"type": "array",
"items": {
"type": "string"
}
},
"redacted": {
"type": "boolean"
},
"tags": {
"type": "array",
"items": {
"type": "string",
"format": "uuid"
}
},
"type": {
"description": "The type of the new task",
"example": "MCQ",
"type": "string",
"enum": [
"FRQ",
"bool",
"json",
"MCQ"
]
},
"questions": {
"description": "The task questions - i.e. what the models should answer",
"example": [
{
"text": "What time is it?",
"paraphrases": []
}
],
"type": "array",
"items": {
"type": "object",
"properties": {
"text": {
"type": "string",
"example": "what time is it?"
},
"paraphrases": {
"type": "array",
"items": {
"type": "string",
"example": "can you tell me the time?"
}
}
}
}
},
"answers": {
"description": "A list of possible answers to be sent to models with the question",
"type": "array",
"items": {
"$ref": "#/components/schemas/MCQAnswer"
},
"example": [
{
"text": "half past one",
"paraphrases": [
"1:30 PM",
"13:30"
],
"correct": false
},
{
"text": "Time is an illusion",
"correct": false
},
{
"text": "Now",
"correct": true
}
]
},
"correct": {
"description": "Whether this task is correct. This is used in boolean tasks",
"type": "boolean"
},
"schema": {
"description": "A schema to validate answers against. This is used in JSON tasks",
"example": "{\"$schema\": \"http://json-schema.org/draft-07/schema#\", \"title\": \"JSON parser\", \"type\": \"object\", \"properties\": {\"name\": {\"type\": \"string\"}}}",
"type": "string"
},
"evaluation_id": {
"description": "The id of the evaluation that this task is for",
"type": "string",
"format": "uuid"
}
}
}
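To make the shape of an MCQ task concrete, the sketch below assembles a request body from a question and a list of answers. `build_mcq_task` is a hypothetical helper; requiring at least one correct answer is an assumption, since this document does not say how the service treats an MCQ task without one.

```python
def build_mcq_task(evaluation_id: str, question: str, answers: list,
                   paraphrases: tuple = ()) -> dict:
    """Build the JSON body for POST /task for a multiple choice question.

    `answers` is a list of (text, correct) pairs.
    """
    # Assumption: an MCQ task without any correct answer is a mistake.
    if not any(correct for _, correct in answers):
        raise ValueError("an MCQ task needs at least one correct answer")
    return {
        "type": "MCQ",
        "evaluation_id": evaluation_id,
        "questions": [{"text": question, "paraphrases": list(paraphrases)}],
        "answers": [{"text": text, "correct": bool(correct)} for text, correct in answers],
    }
```

For instance, calling it with the question "What time is it?" and the three answers from the example body yields a payload with one question and three answer objects, of which only "Now" is marked correct.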
Responses
{
"id": "2388e7ce-b3c1-4e7c-9243-eda914667d0d",
"task_type": "string",
"is_task_live": true,
"modalities": [
"string"
],
"redacted": true,
"num_possible_answers": 10.12,
"evaluation_task_number": 10.12,
"median_human_completion_seconds": 10.12,
"median_ai_completion_seconds": 10.12,
"num_times_human_evaluated": 10.12,
"num_times_ai_evaluated": 10.12,
"num_times_humans_answered_correctly": 10.12,
"num_times_ai_answered_correctly": 10.12,
"evaluation_id": "16e15daa-b411-4243-ac5b-b03043036c93",
"owner_id": "ad55f409-28d0-47db-9903-5bf9c4fad6b1",
"tags": [
{
"id": "c1e73682-4408-4e6c-8daa-3c1b799d3fe4",
"name": "string"
}
]
}
Schema of the response body
{
"type": "object",
"properties": {
"id": {
"type": "string",
"format": "uuid"
},
"task_type": {
"type": "string"
},
"is_task_live": {
"type": "boolean",
"nullable": true
},
"modalities": {
"type": "array",
"items": {
"type": "string"
}
},
"redacted": {
"type": "boolean"
},
"num_possible_answers": {
"type": "number",
"format": "int64"
},
"evaluation_task_number": {
"type": "number",
"format": "int64"
},
"median_human_completion_seconds": {
"type": "number",
"format": "double",
"nullable": true
},
"median_ai_completion_seconds": {
"type": "number",
"format": "double",
"nullable": true
},
"num_times_human_evaluated": {
"type": "number",
"format": "int64"
},
"num_times_ai_evaluated": {
"type": "number",
"format": "int64"
},
"num_times_humans_answered_correctly": {
"type": "number",
"format": "int64"
},
"num_times_ai_answered_correctly": {
"type": "number",
"format": "int64"
},
"evaluation_id": {
"type": "string",
"format": "uuid"
},
"owner_id": {
"type": "string",
"format": "uuid"
},
"tags": {
"type": "array",
"items": {
"$ref": "#/components/schemas/ShallowTag"
}
}
}
}
Refer to the common response description: Unauthorized.
Refer to the common response description: Unauthenticated.
Refer to the common response description: Error.
GET /task
Input parameters
| Parameter | In | Type | Default | Nullable | Description |
|---|---|---|---|---|---|
| id | query | string | | Yes | Will return the item with this id, or die trying. When this parameter is provided, then only a single item will be returned |
Responses
Schema of the response body
{
"oneOf": [
{
"$ref": "#/components/schemas/Task"
},
{
"type": "object",
"properties": {
"items": {
"description": "An array of all the items that were found, but capped at most at `per_page`",
"type": "array",
"items": {
"$ref": "#/components/schemas/Task"
}
},
"count": {
"description": "The total number of items found",
"type": "number",
"format": "int32"
},
"per_page": {
"description": "The number of items returned per page",
"type": "number",
"format": "int32"
},
"page": {
"description": "The number of available pages",
"type": "number",
"format": "int32"
}
}
}
]
}
Refer to the common response description: NotFound.
Refer to the common response description: Error.
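Because the response body is a `oneOf`, a client has to handle both a single Task object and the paginated wrapper. A small sketch of one way to normalize the two shapes; `tasks_from_response` is a hypothetical helper, and the sample values are placeholders:

```python
def tasks_from_response(body: dict) -> list:
    """Return the tasks in a GET /task response body as a list."""
    # The paginated variant wraps the tasks in an `items` array
    if isinstance(body.get('items'), list):
        return body['items']
    # Otherwise the body is a single Task object
    return [body]


single = {'id': '2388e7ce-b3c1-4e7c-9243-eda914667d0d', 'task_type': 'string'}
page = {'items': [single], 'count': 1, 'per_page': 10, 'page': 1}
```

With `requests`, this would typically be applied to `res.json()` after checking the status code.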
PUT /task
Request body
{
"task_type": "string",
"is_task_live": true,
"modalities": [
"string"
],
"redacted": true,
"tags": [
"758c7803-ab3d-4ea5-8ce7-dce6a195deba"
]
}
Schema of the request body
{
"type": "object",
"properties": {
"task_type": {
"type": "string"
},
"is_task_live": {
"type": "boolean",
"nullable": true
},
"modalities": {
"type": "array",
"items": {
"type": "string"
}
},
"redacted": {
"type": "boolean"
},
"tags": {
"type": "array",
"items": {
"type": "string",
"format": "uuid"
}
}
}
}
Responses
"Task updated"
Schema of the response body
{
"type": "string",
"enum": [
"Task updated"
]
}
Refer to the common response description: Unauthorized.
Refer to the common response description: Unauthenticated.
Refer to the common response description: NotFound.
Refer to the common response description: Error.
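The request schema for PUT /task lists only five updatable fields. A hedged sketch of a client-side check that builds the body and rejects anything else; `build_task_update` is a hypothetical helper, and the schema shown here does not say how the task to update is identified, so that part is left out:

```python
import uuid

# Field names taken from the PUT /task request schema
UPDATABLE_TASK_FIELDS = {'task_type', 'is_task_live', 'modalities', 'redacted', 'tags'}


def build_task_update(**fields) -> dict:
    """Build a PUT /task request body, rejecting fields the schema does not list."""
    unknown = set(fields) - UPDATABLE_TASK_FIELDS
    if unknown:
        raise ValueError(f'Fields not in the request schema: {sorted(unknown)}')
    # Tags are sent as UUID strings; uuid.UUID raises ValueError on malformed ids
    for tag_id in fields.get('tags') or []:
        uuid.UUID(tag_id)
    return fields


body = build_task_update(redacted=True, tags=['758c7803-ab3d-4ea5-8ce7-dce6a195deba'])
```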
POST /user
Request body
{
"email_address": "mr.blobby@some.domain",
"user_name": "mr_blobby",
"full_name": "Mr Blobby, esq.",
"user_image": "https://equistamp.com/avatars/123123123123.png",
"bio": "Hello, my name is Inigo Montoya. You Killed my Father. Prepare to die",
"display_options": {
"bio": true,
"email_address": true,
"user_image": false
}
}
Schema of the request body
{
"type": "object",
"properties": {
"email_address": {
"type": "string",
"description": "The email address of this user. Used for logging in, so must be unique.",
"format": "email",
"example": "mr.blobby@some.domain"
},
"user_name": {
"type": "string",
"description": "The user name. Used for logging in and as a unique, human readable identifier of this user",
"example": "mr_blobby"
},
"full_name": {
"type": "string",
"description": "The presentable name of this user. This can be any string",
"nullable": true,
"example": "Mr Blobby, esq."
},
"user_image": {
"type": "string",
"description": "The user avatar, as bytes when uploading, and its URL when fetching",
"nullable": true,
"example": "https://equistamp.com/avatars/123123123123.png"
},
"bio": {
"type": "string",
"description": "A description of this user. Will be rendered as markdown on the website",
"nullable": true,
"example": "Hello, my name is Inigo Montoya. You Killed my Father. Prepare to die"
},
"display_options": {
"description": "A mapping of <displayable field> to true/false, which controls what is displayed to other users. Options that are not explicitly enabled are shown only to you and system admins. For example, the attached example allows only the user's bio and email address to be returned when other users call this endpoint; no other fields are returned.",
"type": "object",
"additionalProperties": {"type": "boolean"},
"example": {
"bio": true,
"email_address": true,
"user_image": false
}
}
}
}
Responses
{
"id": "8947fced-c2bc-4df7-a716-e75528489c62",
"email_address": "mr.blobby@some.domain",
"user_name": "mr_blobby",
"full_name": "Mr Blobby, esq.",
"user_image": "https://equistamp.com/avatars/123123123123.png",
"bio": "Hello, my name is Inigo Montoya. You Killed my Father. Prepare to die",
"display_options": {
"bio": true,
"email_address": true,
"user_image": false
},
"join_date": "2022-04-13",
"subscription_level": "pro",
"alerts": [
{
"id": "8c421c10-9d50-454c-a21c-34fcf870cdcf",
"name": "They are coming!!",
"description": "string",
"public": true,
"last_trigger_date": "2022-04-13T15:42:05.901Z",
"trigger_cooldown": "string",
"owner_id": "06a6d247-1966-42cd-a93c-8f9c568035e1",
"triggers": [
"b469c7a7-d655-4587-932d-17f04587339e"
],
"subscriptions": [
"fbc2f8a6-785f-4a6e-9bba-4f3638392094"
]
}
]
}
Schema of the response body
{
"type": "object",
"properties": {
"id": {
"type": "string",
"format": "uuid"
},
"email_address": {
"type": "string",
"description": "The email address of this user. Used for logging in, so must be unique.",
"format": "email",
"example": "mr.blobby@some.domain"
},
"user_name": {
"type": "string",
"description": "The user name. Used for logging in and as a unique, human readable identifier of this user",
"example": "mr_blobby"
},
"full_name": {
"type": "string",
"description": "The presentable name of this user. This can be any string",
"nullable": true,
"example": "Mr Blobby, esq."
},
"user_image": {
"type": "string",
"description": "The user avatar, as bytes when uploading, and its URL when fetching",
"nullable": true,
"example": "https://equistamp.com/avatars/123123123123.png"
},
"bio": {
"type": "string",
"description": "A description of this user. Will be rendered as markdown on the website",
"nullable": true,
"example": "Hello, my name is Inigo Montoya. You Killed my Father. Prepare to die"
},
"display_options": {
"description": "A mapping of <displayable field> to true/false, which controls what is displayed to other users. Options that are not explicitly enabled are shown only to you and system admins. For example, the attached example allows only the user's bio and email address to be returned when other users call this endpoint; no other fields are returned.",
"type": "object",
"additionalProperties": {"type": "boolean"},
"example": {
"bio": true,
"email_address": true,
"user_image": false
}
},
"join_date": {
"type": "string",
"format": "date"
},
"subscription_level": {
"type": "string",
"description": "The current subscription level of this user",
"enum": [
"admin",
"free",
"enterprise",
"pro"
],
"example": "pro"
},
"alerts": {
"type": "array",
"items": {
"$ref": "#/components/schemas/ShallowAlert"
}
}
}
}
Refer to the common response description: Error.
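The `display_options` rule (only explicitly enabled fields are shown to other users) can be illustrated client-side. This is a sketch of the rule as described, not the server's implementation; `visible_to_others` is a hypothetical name, and the real endpoint may always return some fields, such as `id`:

```python
def visible_to_others(user: dict) -> dict:
    """Return only the fields of `user` that display_options explicitly enables."""
    options = user.get('display_options') or {}
    # Only fields mapped to True are exposed; missing or False entries stay hidden
    return {
        field: user[field]
        for field, shown in options.items()
        if shown is True and field in user
    }


user = {
    'email_address': 'mr.blobby@some.domain',
    'user_name': 'mr_blobby',
    'bio': 'Hello, my name is Inigo Montoya. You Killed my Father. Prepare to die',
    'user_image': 'https://equistamp.com/avatars/123123123123.png',
    'display_options': {'bio': True, 'email_address': True, 'user_image': False},
}
public_view = visible_to_others(user)
```

For the documented example, `public_view` holds only `bio` and `email_address`.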
GET /user
Input parameters
| Parameter | In | Type | Default | Nullable | Description |
|---|---|---|---|---|---|
| id | query | string | | Yes | Will return the item with this id, or die trying. When this parameter is provided, then only a single item will be returned |
Responses
Schema of the response body
{
"oneOf": [
{
"$ref": "#/components/schemas/User"
},
{
"type": "object",
"properties": {
"items": {
"description": "An array of all the items that were found, but capped at most at `per_page`",
"type": "array",
"items": {
"$ref": "#/components/schemas/User"
}
},
"count": {
"description": "The total number of items found",
"type": "number",
"format": "int32"
},
"per_page": {
"description": "The number of items returned per page",
"type": "number",
"format": "int32"
},
"page": {
"description": "The number of available pages",
"type": "number",
"format": "int32"
}
}
}
]
}
Refer to the common response description: NotFound.
Refer to the common response description: Error.
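The paginated variant reports `count` and `per_page`, from which the number of pages follows by ceiling division. A minimal sketch of that arithmetic; the helper names are hypothetical, and the query parameter used to request a given page is not shown in this section, so it is not assumed here:

```python
import math


def total_pages(count: int, per_page: int) -> int:
    """Number of pages needed to hold `count` items at `per_page` items per page."""
    if per_page < 1:
        raise ValueError('per_page must be at least 1')
    return math.ceil(count / per_page)


def has_more(items_seen: int, count: int) -> bool:
    """True while fewer items have been collected than the server reported in `count`."""
    return items_seen < count
```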
PUT /user
Request body
{
"email_address": "mr.blobby@some.domain",
"user_name": "mr_blobby",
"full_name": "Mr Blobby, esq.",
"user_image": "https://equistamp.com/avatars/123123123123.png",
"bio": "Hello, my name is Inigo Montoya. You Killed my Father. Prepare to die",
"display_options": {
"bio": true,
"email_address": true,
"user_image": false
}
}
Schema of the request body
{
"type": "object",
"properties": {
"email_address": {
"type": "string",
"description": "The email address of this user. Used for logging in, so must be unique.",
"format": "email",
"example": "mr.blobby@some.domain"
},
"user_name": {
"type": "string",
"description": "The user name. Used for logging in and as a unique, human readable identifier of this user",
"example": "mr_blobby"
},
"full_name": {
"type": "string",
"description": "The presentable name of this user. This can be any string",
"nullable": true,
"example": "Mr Blobby, esq."
},
"user_image": {
"type": "string",
"description": "The user avatar, as bytes when uploading, and its URL when fetching",
"nullable": true,
"example": "https://equistamp.com/avatars/123123123123.png"
},
"bio": {
"type": "string",
"description": "A description of this user. Will be rendered as markdown on the website",
"nullable": true,
"example": "Hello, my name is Inigo Montoya. You Killed my Father. Prepare to die"
},
"display_options": {
"description": "A mapping of <displayable field> to true/false, which controls what is displayed to other users. Options that are not explicitly enabled are shown only to you and system admins. For example, the attached example allows only the user's bio and email address to be returned when other users call this endpoint; no other fields are returned.",
"type": "object",
"additionalProperties": {"type": "boolean"},
"example": {
"bio": true,
"email_address": true,
"user_image": false
}
}
}
}
Responses
"User updated"
Schema of the response body
{
"type": "string",
"enum": [
"User updated"
]
}
Refer to the common response description: Unauthorized.
Refer to the common response description: Unauthenticated.
Refer to the common response description: NotFound.
Refer to the common response description: Error.
Schemas
Alert
| Name | Type | Description |
|---|---|---|
| description | string \| null | |
| id | string(uuid) | |
| last_trigger_date | string(date-time) \| null | |
| name | string | The name of the alert, displayed in the list of alerts |
| owner_id | string(uuid) | |
| public | boolean | |
| subscriptions | Array<ShallowSubscriberAlert> | |
| trigger_cooldown | string \| null | How often the trigger can fire |
| triggers | Array<ShallowTrigger> | |
ColumnMapping
| Name | Type | Description |
|---|---|---|
| columnType | string | |
| paraphraseOf | string \| null | |
CurrentScores
Evaluation
| Name | Type | Description |
|---|---|---|
| description | string \| null | The description of this evaluation, as displayed on the site. Markdown can be used for formatting |
| id | string(uuid) | |
| last_updated | string(date-time) | |
| min_questions_to_complete | integer(int64) \| null | The default number of tasks to run before an evaluation session is deemed finished. A given evaluation session may process more tasks, as starting a new evaluation session for an evaluation/model pair which is already running will just add more tasks to the current session, rather than starting a new one. |
| modalities | Array<string> | The available modalities of this evaluation |
| name | string | |
| num_tasks | integer(int64) | The total number of tasks defined for this evaluation. Includes redacted tasks. |
| owner | ShallowUser | |
| public | boolean | Whether this evaluation should be publicly visible. If true, anyone can view its details or evaluate models with it |
| public_usable | boolean | Whether this evaluation can be run by anyone. To avoid tasks being leaked, you might want to have the results shown, but have control over what it can be run on. |
| quality | number(double) | The quality of this evaluation, i.e. how much it can be trusted, from 0 to 1. |
| reports_visible | boolean | Whether anyone can pay to see reports for this evaluation. |
| tags | Array<ShallowTag> | |
| task_types | Array<string> | The types of tasks supported by this evaluation |
EvaluationEvaluatee
| Name | Type | Description |
|---|---|---|
| cadence | string \| null | |
| evaluatee_id | string(uuid) | |
| evaluation | ShallowEvaluation | |
| evaluation_id | string(uuid) | |
| id | string(uuid) | |
| model | ShallowModel | |
| price | number(int64) | |
EvaluationModelJobs
| Name | Type | Description |
|---|---|---|
| creation_date | string(date-time) | |
| evaluation_id | string(uuid) | |
| id | string(uuid) | |
| job_body | | |
| job_description | string | |
| job_name | string | |
| job_schedule_arn | string | |
| minutes_between_evaluations | number(int64) | |
| model_id | string(uuid) | |
| owner_id | string(uuid) | |
| start_date | string(date-time) \| null | |
EvaluationSession
| Name | Type | Description |
|---|---|---|
| avg_verbosity | number(double) \| null | |
| completed | boolean | |
| datetime_completed | string(date-time) \| null | |
| datetime_started | string(date-time) | |
| distribution_of_characters_per_task | | |
| distribution_of_seconds_per_task | | |
| evaluatee_id | string(uuid) | In the case of human tests, the id of the user taking the test. In the case of testing models, the id of the model to be tested |
| evaluation_id | string(uuid) | The id of the evaluation to be run |
| failed | boolean | |
| id | string(uuid) | |
| is_human_being_evaluated | boolean | Whether this evaluation session is a human test. When false will start an automatic test for the provided model and evaluation. |
| max_characters_per_task | number(double) \| null | |
| max_seconds_per_task | number(double) \| null | |
| max_verbosity | number(double) \| null | |
| mean_characters_per_task | number(double) \| null | |
| mean_seconds_per_task | number(double) \| null | |
| median_characters_per_task | number(double) \| null | |
| median_seconds_per_task | number(double) \| null | |
| median_verbosity | number(double) \| null | |
| min_characters_per_task | number(double) \| null | |
| min_seconds_per_task | number(double) \| null | |
| min_verbosity | number(double) \| null | |
| num_answered_correctly | number(int64) | |
| num_characters_received_from_endpoint | number(int64) | |
| num_characters_sent_to_endpoint | number(int64) | |
| num_endpoint_calls | number(int64) | |
| num_endpoint_failures | number(int64) | |
| num_questions_answered | number(int64) | |
| num_tasks_to_complete | number(int64) | |
| origin | string | The source of this evaluation session, i.e. what triggered it |
| std_characters_per_task | number(double) \| null | |
| std_seconds_per_task | number(double) \| null | |
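
The counters in an EvaluationSession can be combined into simple ratios. The schema does not define these derived metrics, so the sketch below is an assumption about how a client might summarize a session; the function names and sample numbers are hypothetical:

```python
from typing import Optional


def session_accuracy(session: dict) -> Optional[float]:
    """Fraction of answered questions that were correct, or None if nothing was answered."""
    answered = session.get('num_questions_answered') or 0
    if answered == 0:
        return None
    return session['num_answered_correctly'] / answered


def endpoint_failure_rate(session: dict) -> Optional[float]:
    """Fraction of endpoint calls that failed, or None if no calls were made."""
    calls = session.get('num_endpoint_calls') or 0
    if calls == 0:
        return None
    return session['num_endpoint_failures'] / calls


session = {
    'num_questions_answered': 40,
    'num_answered_correctly': 30,
    'num_endpoint_calls': 50,
    'num_endpoint_failures': 5,
}
```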
MCQAnswer
| Name | Type | Description |
|---|---|---|
| correct | boolean | |
| paraphrases | Array<string> | A list of paraphrases of this answer - if provided, will always be used rather than the actual answer text |
| text | string | The text of the answer, as will be displayed to the models. If paraphrases are provided, this will never be shown to anyone other than you |
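
The relationship between `text` and `paraphrases` can be sketched as follows. The schema only says that paraphrases, when provided, are used instead of the answer text; how one paraphrase is chosen is not documented, so the random choice below is an assumption, and `text_shown_to_model` is a hypothetical name:

```python
import random
from typing import Optional


def text_shown_to_model(answer: dict, rng: Optional[random.Random] = None) -> str:
    """Pick the text a model would see for one MCQAnswer."""
    rng = rng or random.Random()
    paraphrases = answer.get('paraphrases') or []
    # Paraphrases, when present, replace the original text entirely
    if paraphrases:
        return rng.choice(paraphrases)  # random selection is an assumption
    return answer['text']


with_paraphrases = {'text': 'half past one', 'paraphrases': ['1:30 PM', '13:30'], 'correct': False}
plain = {'text': 'Now', 'correct': True}
```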
Model
| Name | Type | Description |
|---|---|---|
| architecture | string \| null | The architecture of this model |
| availability | number(double) \| null | |
| best_evaluation_session | ShallowEvaluationSession | |
| check_availability | boolean \| null | Whether the availability of this model should be checked. When true, we will ping the endpoint every |
| cost_per_input_character_usd | number(double) | The cost of a single input character in USD. We assume that a single token is 4 characters. |
| cost_per_instance_hour_usd | number(double) | The cost of running the model for an hour, in USD. This doesn't include input/output tokens - it's purely the server uptime. This is useful e.g. with HuggingFace inference endpoints, where they charge for server time, not for token throughput. |
| cost_per_output_character_usd | number(double) | The cost of a single output character in USD. We assume that a single token is 4 characters. |
| description | string \| null | The description of this model, as displayed on the site. Markdown can be used for formatting |
| elo_score | number(double) \| null | The ELO score, according to LLMSys |
| endpoint_type | string | The type of endpoint being called. We have dedicated handlers for many of the most popular AI model providers |
| id | string(uuid) | |
| max_characters_per_minute | integer(int64) | The maximum allowed number of characters per minute. We assume that one token is 4 characters. This must be at least 1. |
| max_context_window_characters | integer(int64) \| null | The maximum number of characters allowed in the context window of this model. We assume that 1 token is 4 characters |
| max_request_per_minute | integer(int64) | The maximum allowed number of requests per minute. This must be at least 1. |
| modalities | Array<string> | The modalities accepted by this model |
| name | string | |
| num_parameters | integer(int64) \| null | The number of parameters of the model |
| owner | ShallowUser | |
| owner_id | string(uuid) | |
| picture | string \| null | A URL to an image representing this model |
| public | boolean | Whether this model should be publicly visible. If true, anyone can view its details. |
| public_usable | boolean | Whether this model can be tested by anyone. LLMs can cost a lot to run, and these costs are on whoever added the model. This setting is here to add an extra protection against people running up large compute costs on this model. When not set, this is `false`. |
| publisher | string \| null | The entity that created this model |
| quality | number(double) | The quality of this model, i.e. how much it's worth using, from 0 to 1. This is very subjective, and mainly used to decide whether it should be used by default e.g. on the frontpage. |
| score | number(double) \| null | |
| top_example | ShallowTask | |
| top_example_id | string(uuid) \| null | |
| worst_evaluation_session | ShallowEvaluationSession | |
| worst_example | ShallowTask | |
| worst_example_id | string(uuid) \| null | |
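
Providers usually quote prices per token, while the Model schema stores them per character under the stated assumption of 4 characters per token. A sketch of the conversion and of a per-call cost estimate; the function names and the prices used are hypothetical:

```python
CHARACTERS_PER_TOKEN = 4  # the conversion factor stated in the field descriptions


def per_character_cost(usd_per_million_tokens: float) -> float:
    """Convert a per-million-token price into a per-character price."""
    return usd_per_million_tokens / 1_000_000 / CHARACTERS_PER_TOKEN


def estimate_call_cost(prompt: str, completion: str, input_cost: float, output_cost: float) -> float:
    """Estimate the USD cost of one call from its character counts."""
    return len(prompt) * input_cost + len(completion) * output_cost


# Hypothetical prices: $2 per million input tokens, $8 per million output tokens
input_cost = per_character_cost(2.0)
output_cost = per_character_cost(8.0)
cost = estimate_call_cost('a' * 4000, 'b' * 400, input_cost, output_cost)
```

Here `input_cost` and `output_cost` are the values that would go into `cost_per_input_character_usd` and `cost_per_output_character_usd`.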
Response
| Name | Type | Description |
|---|---|---|
| chosen_answer_id | string(uuid) \| null | |
| correctness | number(double) \| null | |
| creation_date | string(date-time) | |
| evaluatee_id | string(uuid) | |
| evaluation_session_id | string(uuid) | |
| id | string(uuid) | |
| parsed_response_text | string \| null | |
| raw_response_text | string \| null | |
| raw_task_text | string \| null | |
| response_time_in_seconds | number(double) \| null | |
| task_id | string(uuid) | |
SchemaHistory
| Name | Type | Description |
|---|---|---|
| description | string \| null | An optional description of this schema |
| evaluation_id | string(uuid) | |
| id | string(uuid) | |
| key | string \| null | The identifier used in csv files for this schema |
| name | string \| null | An optional name describing this schema |
ShallowAlert
| Name | Type | Description |
|---|---|---|
| description | string \| null | |
| id | string(uuid) | |
| last_trigger_date | string(date-time) \| null | |
| name | string | The name of the alert, displayed in the list of alerts |
| owner_id | string(uuid) | |
| public | boolean | |
| subscriptions | Array<string(uuid)> | |
| trigger_cooldown | string \| null | How often the trigger can fire |
| triggers | Array<string(uuid)> | |
ShallowCurrentScores
ShallowEvaluation
| Name | Type | Description |
|---|---|---|
| description | string \| null | The description of this evaluation, as displayed on the site. Markdown can be used for formatting |
| id | string(uuid) | |
| last_updated | string(date-time) | |
| min_questions_to_complete | integer(int64) \| null | The default number of tasks to run before an evaluation session is deemed finished. A given evaluation session may process more tasks, as starting a new evaluation session for an evaluation/model pair which is already running will just add more tasks to the current session, rather than starting a new one. |
| modalities | Array<string> | The available modalities of this evaluation |
| name | string | |
| num_tasks | integer(int64) | The total number of tasks defined for this evaluation. Includes redacted tasks. |
| owner | string(uuid) | |
| public | boolean | Whether this evaluation should be publicly visible. If true, anyone can view its details or evaluate models with it |
| public_usable | boolean | Whether this evaluation can be run by anyone. To avoid tasks being leaked, you might want to have the results shown, but have control over what it can be run on. |
| quality | number(double) | The quality of this evaluation, i.e. how much it can be trusted, from 0 to 1. |
| reports_visible | boolean | Whether anyone can pay to see reports for this evaluation. |
| tags | Array<string(uuid)> | |
| task_types | Array<string> | The types of tasks supported by this evaluation |
ShallowEvaluationEvaluatee
| Name | Type | Description |
|---|---|---|
| cadence | string \| null | |
| evaluatee_id | string(uuid) | |
| evaluation | string(uuid) | |
| evaluation_id | string(uuid) | |
| id | string(uuid) | |
| model | string(uuid) | |
| price | number(int64) | |
ShallowEvaluationModelJobs
| Name | Type | Description |
|---|---|---|
| creation_date | string(date-time) | |
| evaluation_id | string(uuid) | |
| id | string(uuid) | |
| job_body | | |
| job_description | string | |
| job_name | string | |
| job_schedule_arn | string | |
| minutes_between_evaluations | number(int64) | |
| model_id | string(uuid) | |
| owner_id | string(uuid) | |
| start_date | string(date-time) \| null | |
ShallowEvaluationSession
| Name | Type | Description |
|---|---|---|
| avg_verbosity | number(double) \| null | |
| completed | boolean | |
| datetime_completed | string(date-time) \| null | |
| datetime_started | string(date-time) | |
| distribution_of_characters_per_task | | |
| distribution_of_seconds_per_task | | |
| evaluatee_id | string(uuid) | In the case of human tests, the id of the user taking the test. In the case of testing models, the id of the model to be tested |
| evaluation_id | string(uuid) | The id of the evaluation to be run |
| failed | boolean | |
| id | string(uuid) | |
| is_human_being_evaluated | boolean | Whether this evaluation session is a human test. When false will start an automatic test for the provided model and evaluation. |
| max_characters_per_task | number(double) \| null | |
| max_seconds_per_task | number(double) \| null | |
| max_verbosity | number(double) \| null | |
| mean_characters_per_task | number(double) \| null | |
| mean_seconds_per_task | number(double) \| null | |
| median_characters_per_task | number(double) \| null | |
| median_seconds_per_task | number(double) \| null | |
| median_verbosity | number(double) \| null | |
| min_characters_per_task | number(double) \| null | |
| min_seconds_per_task | number(double) \| null | |
| min_verbosity | number(double) \| null | |
| num_answered_correctly | number(int64) | |
| num_characters_received_from_endpoint | number(int64) | |
| num_characters_sent_to_endpoint | number(int64) | |
| num_endpoint_calls | number(int64) | |
| num_endpoint_failures | number(int64) | |
| num_questions_answered | number(int64) | |
| num_tasks_to_complete | number(int64) | |
| origin | string | The source of this evaluation session, i.e. what triggered it |
| std_characters_per_task | number(double) \| null | |
| std_seconds_per_task | number(double) \| null | |
ShallowModel
| Name | Type | Description |
|---|---|---|
| architecture | string \| null | The architecture of this model |
| availability | number(double) \| null | |
| best_evaluation_session | string(uuid) | |
| check_availability | boolean \| null | Whether the availability of this model should be checked. When true, we will ping the endpoint every |
| cost_per_input_character_usd | number(double) | The cost of a single input character in USD. We assume that a single token is 4 characters. |
| cost_per_instance_hour_usd | number(double) | The cost of running the model for an hour, in USD. This doesn't include input/output tokens - it's purely the server uptime. This is useful e.g. with HuggingFace inference endpoints, where they charge for server time, not for token throughput. |
| cost_per_output_character_usd | number(double) | The cost of a single output character in USD. We assume that a single token is 4 characters. |
| description | string \| null | The description of this model, as displayed on the site. Markdown can be used for formatting |
| elo_score | number(double) \| null | The ELO score, according to LLMSys |
| endpoint_type | string | The type of endpoint being called. We have dedicated handlers for many of the most popular AI model providers |
| id | string(uuid) | |
| max_characters_per_minute | integer(int64) | The maximum allowed number of characters per minute. We assume that one token is 4 characters. This must be at least 1. |
| max_context_window_characters | integer(int64) \| null | The maximum number of characters allowed in the context window of this model. We assume that 1 token is 4 characters |
| max_request_per_minute | integer(int64) | The maximum allowed number of requests per minute. This must be at least 1. |
| modalities | Array<string> | The modalities accepted by this model |
| name | string | |
| num_parameters | integer(int64) \| null | The number of parameters of the model |
| owner | string(uuid) | |
| owner_id | string(uuid) | |
| picture | string \| null | A URL to an image representing this model |
| public | boolean | Whether this model should be publicly visible. If true, anyone can view its details. |
| public_usable | boolean | Whether this model can be tested by anyone. LLMs can cost a lot to run, and these costs are on whoever added the model. This setting is here to add an extra protection against people running up large compute costs on this model. When not set, this is `false`. |
| publisher | string \| null | The entity that created this model |
| quality | number(double) | The quality of this model, i.e. how much it's worth using, from 0 to 1. This is very subjective, and mainly used to decide whether it should be used by default e.g. on the frontpage. |
| score | number(double) \| null | |
| top_example | string(uuid) | |
| top_example_id | string(uuid) \| null | |
| worst_evaluation_session | string(uuid) | |
| worst_example | string(uuid) | |
| worst_example_id | string(uuid) \| null | |
ShallowResponse
| Name | Type | Description |
|---|---|---|
| chosen_answer_id | string(uuid) \| null | |
| correctness | number(double) \| null | |
| creation_date | string(date-time) | |
| evaluatee_id | string(uuid) | |
| evaluation_session_id | string(uuid) | |
| id | string(uuid) | |
| parsed_response_text | string \| null | |
| raw_response_text | string \| null | |
| raw_task_text | string \| null | |
| response_time_in_seconds | number(double) \| null | |
| task_id | string(uuid) | |
ShallowSchemaHistory
| Name | Type | Description |
|---|---|---|
| description | string \| null | An optional description of this schema |
| evaluation_id | string(uuid) | |
| id | string(uuid) | |
| key | string \| null | The identifier used in csv files for this schema |
| name | string \| null | An optional name describing this schema |
ShallowSubscriber
| Name | Type | Description |
|---|---|---|
| confirmed | boolean \| null | |
| destination | string | |
| method | string | |
ShallowSubscriberAlert
| Name | Type | Description |
|---|---|---|
| confirmed | boolean \| null | |
| destination | string | |
| method | string | |
ShallowTag
| Name | Type | Description |
|---|---|---|
| id | string(uuid) | |
| name | string | |
ShallowTask
| Name | Type | Description |
|---|---|---|
| evaluation_id | string(uuid) | |
| evaluation_task_number | number(int64) | |
| id | string(uuid) | |
| is_task_live | boolean \| null | |
| median_ai_completion_seconds | number(double) \| null | |
| median_human_completion_seconds | number(double) \| null | |
| modalities | Array<string> | |
| num_possible_answers | number(int64) | |
| num_times_ai_answered_correctly | number(int64) | |
| num_times_ai_evaluated | number(int64) | |
| num_times_human_evaluated | number(int64) | |
| num_times_humans_answered_correctly | number(int64) | |
| owner_id | string(uuid) | |
| redacted | boolean | |
| tags | Array<string(uuid)> | |
| task_type | string | |
ShallowTrigger
| Name | Type | Description |
|---|---|---|
| alert_id | string(uuid) | |
| evaluations | | |
| id | string(uuid) | |
| invert | boolean | |
| metric | string \| null | |
| models | | |
| threshold | number(double) \| null | |
| type | string | |
ShallowUser
| Name | Type | Description |
|---|---|---|
| alerts | Array<string(uuid)> | |
| bio | string \| null | A description of this user. Will be rendered as markdown on the website |
| display_options | object | A mapping of \<displayable field\> to true/false, which controls what is displayed to other users. Options that are not explicitly enabled are shown only to you and system admins. Example: {"bio": true, "email_address": true, "user_image": false} |
| email_address | string(email) | The email address of this user. Used for logging in, so must be unique. |
| full_name | string \| null | The presentable name of this user. This can be any string |
| id | string(uuid) | |
| join_date | string(date) | |
| subscription_level | string | The current subscription level of this user |
| user_image | string \| null | The user avatar, as bytes when uploading, and its URL when fetching |
| user_name | string | The user name. Used for logging in and as a unique, human readable identifier of this user |
Subscriber
| Name | Type | Description |
|---|---|---|
| confirmed | boolean \| null | |
| destination | string | |
| method | string | |
SubscriberAlert
| Name | Type | Description |
|---|---|---|
| confirmed | boolean \| null | |
| destination | string | |
| method | string | |
Tag
| Name | Type | Description |
|---|---|---|
| id | string(uuid) | |
| name | string | |
Task
| Name | Type | Description |
|---|---|---|
| evaluation_id | string(uuid) | |
| evaluation_task_number | number(int64) | |
| id | string(uuid) | |
| is_task_live | boolean \| null | |
| median_ai_completion_seconds | number(double) \| null | |
| median_human_completion_seconds | number(double) \| null | |
| modalities | Array<string> | |
| num_possible_answers | number(int64) | |
| num_times_ai_answered_correctly | number(int64) | |
| num_times_ai_evaluated | number(int64) | |
| num_times_human_evaluated | number(int64) | |
| num_times_humans_answered_correctly | number(int64) | |
| owner_id | string(uuid) | |
| redacted | boolean | |
| tags | Array<ShallowTag> | |
| task_type | string | |
Trigger
| Name | Type | Description |
|---|---|---|
| alert_id | string(uuid) | |
| evaluations | | |
| id | string(uuid) | |
| invert | boolean | |
| metric | string \| null | |
| models | | |
| threshold | number(double) \| null | |
| type | string | |
User
| Name | Type | Description |
|---|---|---|
| alerts | Array<ShallowAlert> | |
| bio | string \| null | A description of this user. Will be rendered as markdown on the website |
| display_options | object | A mapping of \<displayable field\> to true/false, which controls what is displayed to other users. Options that are not explicitly enabled are shown only to you and system admins. Example: {"bio": true, "email_address": true, "user_image": false} |
| email_address | string(email) | The email address of this user. Used for logging in, so must be unique. |
| full_name | string \| null | The presentable name of this user. This can be any string |
| id | string(uuid) | |
| join_date | string(date) | |
| subscription_level | string | The current subscription level of this user |
| user_image | string \| null | The user avatar, as bytes when uploading, and its URL when fetching |
| user_name | string | The user name. Used for logging in and as a unique, human readable identifier of this user |
Common responses
This section describes common responses that are reused across operations.
Unauthenticated
A valid API token is needed to access this endpoint
"string"
Schema of the response body
{
"description": "An error message describing what happened",
"type": "string"
}
PaymentRequired
The user has insufficient credits to process this request
"string"
Schema of the response body
{
"description": "An error message describing what happened",
"type": "string"
}
Unauthorized
The provided API token does not have the appropriate permissions to fulfill this request
"string"
Schema of the response body
{
"description": "An error message describing what happened",
"type": "string"
}
NotFound
Could not find this item
"string"
Schema of the response body
{
"description": "An error message describing what happened",
"type": "string"
}
ValidationError
The request has bad data
"string"
Schema of the response body
{
"description": "An error message describing what happened",
"type": "string"
}
Error
A server error
"string"
Schema of the response body
{
"description": "An error message describing what happened",
"type": "string"
}
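Every common response body is a JSON string carrying an error message, so a client can handle them uniformly. The sketch below treats any non-2xx status alike, since the status codes tied to each common response are not listed in this section; `ApiError` and `raise_for_error` are hypothetical names:

```python
import json


class ApiError(Exception):
    """Raised for any non-2xx response; carries the status code and server message."""

    def __init__(self, status_code: int, message: str):
        super().__init__(f'{status_code}: {message}')
        self.status_code = status_code
        self.message = message


def raise_for_error(status_code: int, body_text: str) -> None:
    """Raise ApiError when the status code signals a failure."""
    if 200 <= status_code < 300:
        return
    try:
        # Common responses are documented as a JSON string
        message = json.loads(body_text)
    except json.JSONDecodeError:
        # Fall back to the raw body if it is not valid JSON
        message = body_text
    raise ApiError(status_code, str(message))
```

With `requests`, this would be called as `raise_for_error(res.status_code, res.text)` before reading `res.json()`.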
Common parameters
This section describes common parameters that are reused across operations.
apiToken
| Name | In | Type | Default | Nullable | Description |
|---|---|---|---|---|---|
| Api-Token | header | string | | No | |
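
The `Api-Token` parameter is sent as a request header. A minimal sketch of building that header; `api_headers` is a hypothetical helper and `EQUISTAMP_API_TOKEN` is a hypothetical environment variable name chosen for illustration:

```python
import os


def api_headers(api_token: str) -> dict:
    """Build the request headers carrying the API token."""
    if not api_token:
        raise ValueError('An API token is required')
    return {'Api-Token': api_token}


# Fall back to a placeholder when the (hypothetical) variable is unset or empty
token = os.environ.get('EQUISTAMP_API_TOKEN') or '<your api token>'
headers = api_headers(token)
```

The resulting dict can be passed as `headers=` to `requests.get`, `requests.put`, or `requests.post`.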