Evaluation Sessions

An evaluation session is a run of a batch of tests from an evaluation against a model, which results in a score for that specific evaluation session. You can run an evaluation session for any evaluation you can see, and for any model that is publicly runnable or of which you are an admin.

If you start an evaluation session for an evaluation-model pair that already has an evaluation session running, extra tasks are added to the running evaluation session rather than a new one being created. If you really want a new evaluation session, you can force one with the restart API parameter.
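As a rough sketch of the forced restart mentioned above, the request body could be built as follows. The parameter name restart comes from the text above; that it is a boolean sent in the JSON body next to the other fields is an assumption, and build_session_payload is a hypothetical helper, so check the API docs for the exact form.

```python
def build_session_payload(evaluation_id, evaluatee_id, restart=False):
    """Build the JSON body for a POST to /evaluationsession (hypothetical helper)."""
    payload = {
        'evaluation_id': evaluation_id,
        'evaluatee_id': evaluatee_id,
        'is_human_being_evaluated': False,
        'origin': 'user',
    }
    if restart:
        # Assumption: `restart` is a boolean field in the JSON body
        payload['restart'] = True
    return payload


payload = build_session_payload(
    'deadbeef-0000-0000-0000-000000000000',
    '12345678-0000-0000-0000-000000000000',
    restart=True,
)
```

Leaving restart out keeps the default behaviour described above, where new tasks are added to the session that is already running.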

Evaluation sessions are a paid service. The price depends on the evaluation and the model: it is the cost of running the tasks on the model, plus a markup to cover our costs. Expensive models, and evaluations with many or large tasks, result in more expensive evaluation sessions.

There are two ways to trigger evaluation session runs - through the website, or via the API.

Purchase Evaluation Runs via the website

Find an evaluation you're interested in from the list of available evaluations
Click "Learn more"
Click "Connect Models"
Select which models you wish to evaluate
Choose how often you want this evaluation to run
Click "Connect models"
Enter your preferred payment option to pay for the runs

The payment step may be skipped: if you have enough credits to cover the costs of your selected evaluation sessions, we will simply subtract them. If your credits cover only some of your evaluation sessions, we will subtract credits where possible and ask you to pay for the rest. Recurring evaluation sessions always have to be paid for.
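The credit rules above can be illustrated with a small sketch. This is not the actual billing code: it assumes that sessions are considered in the order given, that a session is either covered in full by credits or paid for in full, and that costs and credits are whole numbers in the same unit. The name split_payment is hypothetical.

```python
def split_payment(session_costs, credits):
    """Return (credits_used, amount_to_pay) for a list of (cost, is_recurring) pairs."""
    credits_used = 0
    amount_to_pay = 0
    for cost, is_recurring in session_costs:
        remaining = credits - credits_used
        if not is_recurring and cost <= remaining:
            # Enough credits left: cover this session with credits
            credits_used += cost
        else:
            # Recurring sessions, and sessions the credits cannot cover, are paid for
            amount_to_pay += cost
    return credits_used, amount_to_pay


# Two one-off sessions and one recurring session, with 60 credits available
print(split_payment([(30, False), (50, False), (40, True)], credits=60))
# prints (30, 90)
```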

Once the payment goes through, your evaluation runs will start. Depending on the model, an evaluation session can take quite a while to finish: slower models, or models with stringent rate limits, will take longer. The connections page (where you chose the models) shows which models currently have running evaluation sessions, and also shows any scheduled runs in case you want to cancel them. You can also cancel scheduled evaluation sessions directly via Stripe from your user profile page.

Start evaluation sessions via the API

You can trigger evaluation sessions by sending a POST request to the /evaluationsession endpoint. When you call this endpoint, we will check whether you have enough credits to run your chosen evaluation session and if so, will deduct the cost from your available credits. If not, an error with the 402 HTTP status code will be returned.

For example, to run an evaluation session for an evaluation with the id deadbeef-0000-0000-0000-000000000000 and a model with the id 12345678-0000-0000-0000-000000000000, you could start the evaluation session with the following Python code (see the API docs on how to get the api_token):

import requests
api_token = 'your token here'

res = requests.post(
    'https://equistamp.net/evaluationsession',
    headers={'Api-Token': api_token},
    json={
        'evaluation_id': 'deadbeef-0000-0000-0000-000000000000',
        'evaluatee_id': '12345678-0000-0000-0000-000000000000',
        'is_human_being_evaluated': False,
        'origin': 'user',
    }
)

if res.status_code == 400:
    # General errors
    raise ValueError(res.json())
if res.status_code == 401:
    # No Api-Token provided
    raise ValueError(res.json())
if res.status_code == 402:
    # Not enough credits
    raise ValueError(res.json())
if res.status_code == 403:
    # You're not allowed to run evaluation sessions on this model / evaluation
    raise ValueError(res.json())
if res.status_code == 404:
    # No such evaluation / model found
    raise ValueError(res.json())

print(f'Started evaluation session. Check https://equistamp.com/evaluation-sessions/{res.json()["id"]} for details')