Skip to content

4.4 Run Evaluation ScriptΒΆ

To run the evaluation script from the Visual Studio Code terminal: - You need to be authenticated on Azure - You need to have the azure-ai-evaluation package installed

You should have done the first step at the start of this workshop, and the dev container already has the package installed for the second.

1. Run the ScriptΒΆ

Run the evaluation script using the following command

1
python evaluate.py

2. Observe the OutputΒΆ

  • Note that there are 13 response records corresponding to the 13 test inputs in our evaluation dataset. These reflect the first stage of evaluation flow where the target AI (chat app) is creating a responses file based on the test inputs given.
  • Note that you can view the traces in local device at the URL provided. This will launch a local version of the trace viewer.
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
Starting prompt flow service...
Start prompt flow service on 127.0.0.1:23333, version: 1.16.2.
You can stop the prompt flow service with the following command:'pf service stop'.

You can view the traces in local from http://127.0.0.1:23333/v1.0/ui/traces/?#run=main_evaluate_chat_with_products_rxna_3r9_20241216_163719_733780
[2024-12-16 16:37:42 +0000][promptflow._sdk._orchestrator.run_submitter][INFO] - Submitting run main_evaluate_chat_with_products_rxna_3r9_20241216_163719_733780, log path: /home/vscode/.promptflow/.runs/main_evaluate_chat_with_products_rxna_3r9_20241216_163719_733780/logs.txt
πŸ’¬ Response: {'content': 'Could you please specify which camping table you are referring to? There are multiple options available, and I can provide information on them.', 'role': 'assistant'}
πŸ’¬ Response: {'content': "Could you specify what aspect of care you're asking about? Are you looking for cleaning instructions, storage tips, or something else for the TrailWalker Hiking Shoes?", 'role': 'assistant'}
πŸ’¬ Response: {'content': 'Sorry, I only can answer queries related to outdoor/camping gear and clothing. So, how can I help?', 'role': 'assistant'}
πŸ’¬ Response: {'content': "Could you please specify which tents you are comparing, or do you want information about a specific tent's waterproof features?", 'role': 'assistant'}
πŸ’¬ Response: {'content': 'Sorry, I only can answer queries related to outdoor/camping gear and clothing. So, how can I help?', 'role': 'assistant'}
πŸ’¬ Response: {'content': 'Sorry, I only can answer queries related to outdoor/camping gear and clothing. So, how can I help?', 'role': 'assistant'}
πŸ’¬ Response: {'content': 'The TrailBlaze Hiking Pants are crafted from high-quality nylon fabric.', 'role': 'assistant'}
πŸ’¬ Response: {'content': 'The TrailMaster X4 Tent comes with an included carry bag, which makes transporting the tent easy and convenient. You can simply pack the tent into the carry bag and carry it as needed for your camping adventure. If you have any more specific questions about the tent or its features, feel free to ask!', 'role': 'assistant'}
πŸ’¬ Response: {'content': 'Sorry, I only can answer queries related to outdoor/camping gear and clothing. So, how can I help?', 'role': 'assistant'}
πŸ’¬ Response: {'content': 'Sorry, I only can answer queries related to outdoor/camping gear and clothing. So, how can I help?', 'role': 'assistant'}
πŸ’¬ Response: {'content': 'Sorry, I only can answer queries related to outdoor/camping gear and clothing. So, how can I help?', 'role': 'assistant'}
πŸ’¬ Response: {'content': 'Sorry, I only can answer queries related to outdoor/camping gear and clothing. So, how can I help?', 'role': 'assistant'}
πŸ’¬ Response: {'content': 'Sorry, I only can answer queries related to outdoor/camping gear and clothing. So, how can I help?', 'role': 'assistant'}
...
...
...

3. Rate Limit ErrorsΒΆ

Occasionally you might see an error message that looks like this. This relates to the tokens per minute rate limits on your evaluation model deployment. The script is designed to handle them and continue running - reconfiguring the model can help.

1
2
[2024-12-16 16:44:07 +0000][promptflow.core._prompty_utils][ERROR] - Exception occurs: RateLimitError: Error code: 429 - {'error': {'code': '429', 'message': 'Requests to the ChatCompletions_Create Operation under Azure OpenAI API version 2024-06-01 have exceeded token rate limit of your current AIServices S0 pricing tier. Please retry after 60 seconds. Please contact Azure support service if you would like to further increase the default rate limit.'}}
[2024-12-16 16:44:07 +0000][promptflow.core._prompty_utils][WARNING] - RateLimitError #2, Retry-After=60, Back off 60.0 seconds for retry.

4. Final ResultΒΆ

Once the evaluation workflow completes, you should see a run summary that looks like this. Note that the evaluation results are saved locally, and also published to the Azure AI Foundry portal.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
======= Run Summary =======

Run name: "azure_ai_evaluation_evaluators_common_base_eval_asyncevaluatorbase_rvrjml8t_20241216_164205_696721"
Run status: "Completed"
Start time: "2024-12-16 16:42:05.695977+00:00"
Duration: "0:03:06.490785"
Output path: "/home/vscode/.promptflow/.runs/azure_ai_evaluation_evaluators_common_base_eval_asyncevaluatorbase_rvrjml8t_20241216_164205_696721"

======= Combined Run Summary (Per Evaluator) =======

{
    "groundedness": {
        "status": "Completed",
        "duration": "0:03:06.490785",
        "completed_lines": 13,
        "failed_lines": 0,
        "log_path": "/home/vscode/.promptflow/.runs/azure_ai_evaluation_evaluators_common_base_eval_asyncevaluatorbase_rvrjml8t_20241216_164205_696721"
    }
}

====================================================

Evaluation results saved to "/workspaces/learns-azure-ai-foundry/src/api/myevalresults.json".

'-----Summarized Metrics-----'
{'groundedness.gpt_groundedness': 1.4615384615384615,
'groundedness.groundedness': 1.4615384615384615}
'-----Tabular Result-----'
                                    outputs.response  ... line_number
0   Could you please specify which tents you are c...  ...           0
1   Could you please specify which camping table y...  ...           1
2   Sorry, I only can answer queries related to ou...  ...           2
3   Could you specify what aspect of care you're a...  ...           3
4   Sorry, I only can answer queries related to ou...  ...           4
5   The TrailMaster X4 Tent comes with an included...  ...           5
6   Sorry, I only can answer queries related to ou...  ...           6
7   The TrailBlaze Hiking Pants are crafted from h...  ...           7
8   Sorry, I only can answer queries related to ou...  ...           8
9   Sorry, I only can answer queries related to ou...  ...           9
10  Sorry, I only can answer queries related to ou...  ...          10
11  Sorry, I only can answer queries related to ou...  ...          11
12  Sorry, I only can answer queries related to ou...  ...          12

[13 rows x 8 columns]
('View evaluation results in AI Studio: '
'https://ai.azure.com/build/evaluation/XXXXXXX?wsid=/subscriptions/XXXXXXXX/resourceGroups/ninarasi-ragchat-rg/providers/Microsoft.MachineLearningServices/workspaces/ninarasi-ragchat-v1')
'''

# ----------------------------------------------