4.3 Configure Evaluation Model
Recall that in the last section, the evaluation script identified an evaluator_model that will serve as the judge AI for this assessment.
1. Specifying Evaluator Model
In this workshop, we are reusing the same model for both chat_completion and evaluation roles, but you can choose to separate the two by:
- Deploying a new model to the same Azure AI Project
- Updating the EVALUATION_MODEL environment variable to the new model
- Restarting the evaluation script
HOMEWORK: Try deploying a gpt-4 model for evaluations. How do results differ?
(Code listing: src/api/evaluate.py, showing how the evaluator model is specified.)
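As a rough illustration of what that configuration might look like, here is a minimal sketch assuming the script builds the judge-model settings from environment variables. Only EVALUATION_MODEL is named in this workshop; the endpoint and API-version variable names below are assumptions for illustration.

```python
# Minimal sketch (not the workshop's exact code): read the judge-model settings
# from environment variables. Only EVALUATION_MODEL is named in this guide;
# the other variable names are assumptions.
import os

evaluator_model = {
    "azure_deployment": os.environ.get("EVALUATION_MODEL", ""),
    "azure_endpoint": os.environ.get("AZURE_OPENAI_ENDPOINT", ""),       # assumed name
    "api_version": os.environ.get("AZURE_OPENAI_API_VERSION", ""),       # assumed name
}
```

Switching the evaluator to a separate deployment then only requires changing the EVALUATION_MODEL value and restarting the script, as described in the list above.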
2. Configuring Evaluator Model
Let's take a look at the core evaluate function that executes the workflow. This function runs an assessment once for each record (in data) and for each evaluator (in evaluators). That adds up to many calls to the identified evaluation model, so the deployment needs enough token capacity for the run to complete efficiently.
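As a rough mental model (not the actual implementation in src/api/evaluate.py), the nested loop below shows why the call count grows so quickly: every evaluator runs once per record, so the judge model is invoked N × M times for N records and M evaluators. The evaluator callables here are hypothetical placeholders.

```python
# Hedged sketch of the nested loop described above; the evaluator callables
# are hypothetical placeholders, not the workshop's actual evaluator objects.
def evaluate(data: list[dict], evaluators: dict) -> list[dict]:
    results = []
    for record in data:                                # one pass per dataset record
        row = dict(record)
        for name, evaluator in evaluators.items():     # one pass per evaluator
            # Each call here invokes the judge (evaluation) model once.
            row[name] = evaluator(**record)
        results.append(row)
    return results
```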
Note: The current script uses a single evaluator (for Groundedness). Adding more evaluators will increase the number of calls made to the default model, so configure your quota accordingly.
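For example, a second evaluator could be wired in roughly like the sketch below. This uses built-in evaluators from the azure-ai-evaluation package purely for illustration; the workshop script may define its own evaluators, and the model_config values are placeholders to fill in from your environment.

```python
# Illustrative only: the workshop's evaluate.py may use its own evaluator
# implementations rather than the azure-ai-evaluation built-ins shown here.
from azure.ai.evaluation import GroundednessEvaluator, RelevanceEvaluator

model_config = {
    "azure_endpoint": "<your-azure-openai-endpoint>",      # placeholder values
    "azure_deployment": "<your-evaluation-deployment>",
    "api_key": "<your-api-key>",
}

evaluators = {
    "groundedness": GroundednessEvaluator(model_config),
    # Adding a second evaluator doubles the number of judge-model calls:
    "relevance": RelevanceEvaluator(model_config),
}
```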
Update the model quota in Azure AI Foundry if you run into rate-limit issues during execution
Take these steps to view and update your model quota.
- Visit your Azure AI project page in Azure AI Foundry
- Click "Models + Endpoints" and select the evaluation model
- Click "Edit" and increase the Tokens per minute rate limit (e.g., to 30)
- Click "Save and close"
(Screenshot: the quota update dialog in Azure AI Foundry.)