4.4 Understand Eval Workflow¶
The evaluation flow takes 7-9 minutes to complete. Let's use the time to explore the code and understand the underlying workflow in more detail
ACTIVATE WORD WRAP: Many of these .jsonl
files will contain large text strings per line. Press Alt-Z (or Cmd-Z on Mac) to toggle word wrap. This will make the data in these .jsonl
files easier to read within the limited screen view.
1. Explore: Create Response¶
-
Open the file
src/api/evaluators/data.jsonl
- This file contains the suite of test questions, each associated with a specific customer.
- Sample question: "what is the waterproof rating of the tent I bought?"
-
Take another look at
src/api/evaluate-chat-flow.ipynb
- Look at Cell 3, beginning
def create_response_data(df):
- For each question in the file, the
get_response
function (from our chat application) is invoked to generate the response and associated context - The {question, context, response} triples are then written to the
results.jsonl
file.
- Look at Cell 3, beginning
2. Explore: Evaluate Response¶
- Take another look at
src/api/evaluate-chat-flow.ipynb
- Look a cell 4, beginning
def evaluate():
- Observe: It loads the results file from the previous step
- Observe: For each result in file, it extracts the "triple"
- Observe: For each triple, it executes the 4 evaluator Promptys
- Observe: It writes the scores to an
result_evaluated.jsonl
file
- Look a cell 4, beginning
You can ignore the eval_results.json
file that is also created here. That file concatenates all the line results into a single JSON file for use in other tasks.
3. Explore: Create Summary¶
-
When notebook execution completes, look in the
src/api/evaluators
folder:- You see: Chat Responses in
result.jsonl
- You see: Evaluated Results in
result_evaluated.jsonl
(scores at end of each line) - You see: Evaluation Summary computed from
eval_results.jsonl
(complete data.)
- You see: Chat Responses in
-
Scroll to the bottom of the notebook to view the results cell:
- Click the
View as scrollable element
link to redisplay output - Scroll to the bottom of redisplayed cell to view scores table
- You should see something like the table below - we reformatted it manually for clarity.
- Click the
4. Understand: Eval Results¶
The figure shows you what that tabulated data looks like in the notebook results. Ignore the formatting for now, and let's look at what this tells us:
- You see 12 rows of data - corresponding to 12 test inputs (in
data.jsonl
) - You see 4 metrics from custom evaluators -
groundedness
,fluency
,coherence
,relevance
- Each metric records a score - between
1
and5
Let's try to put the scores in context of the responses we see. Try these exercises:
-
Pick a row above that has a
groundedness
of 5.- View the related row in the
result_evaluated.jsonl
file - Observe related answer and context in file
-
Ask: was the answer grounded in the context?
Want to see
groundedness=5
example from a previous run? Click to expand this.{"question": "tell me about your hiking jackets", "context": [{"id": "17", "title": "RainGuard Hiking Jacket", "content": "Introducing the MountainStyle RainGuard Hiking Jacket - the ultimate solution for weatherproof comfort during your outdoor undertakings! Designed with waterproof, breathable fabric, this jacket promises an outdoor experience that's as dry as it is comfortable. The rugged construction assures durability, while the adjustable hood provides a customizable fit against wind and rain. Featuring multiple pockets for safe, convenient storage and adjustable cuffs and hem, you can tailor the jacket to suit your needs on-the-go. And, don't worry about overheating during intense activities - it's equipped with ventilation zippers for increased airflow. Reflective details ensure visibility even during low-light conditions, making it perfect for evening treks. With its lightweight, packable design, carrying it inside your backpack requires minimal effort. With options for men and women, the RainGuard Hiking Jacket is perfect for hiking, camping, trekking and countless other outdoor adventures. Don't let the weather stand in your way - embrace the outdoors with MountainStyle RainGuard Hiking Jacket!", "url": "/products/rainguard-hiking-jacket"}, {"id": "3", "title": "Summit Breeze Jacket", "content": "Discover the joy of hiking with MountainStyle's Summit Breeze Jacket. This lightweight jacket is your perfect companion for outdoor adventures. Sporting a trail-ready, windproof design and a water-resistant fabric, it's ready to withstand any weather. The breathable polyester material and adjustable cuffs keep you comfortable, whether you're ascending a mountain or strolling through a park. And its sleek black color adds style to function. The jacket features a full-zip front closure, adjustable hood, and secure zippered pockets. Experience the comfort of its inner lining and the convenience of its packable design. Crafted for night trekkers too, the jacket has reflective accents for enhanced visibility. Rugged yet chic, the Summit Breeze Jacket is more than a hiking essential, it's the gear that inspires you to reach new heights. Choose adventure, choose the Summit Breeze Jacket.", "url": "/products/summit-breeze-jacket"}], "answer": "Sure, Sarah Lee! \ud83d\ude0a We have two hiking jackets that you might be interested in:\n\n1. RainGuard Hiking Jacket: This jacket is designed with waterproof, breathable fabric to keep you dry and comfortable during your outdoor adventures. It has adjustable cuffs and hem, multiple pockets for storage, and ventilation zippers for increased airflow. The reflective details ensure visibility even in low-light conditions. It's lightweight and packable, making it easy to carry in your backpack. Perfect for hiking, camping, trekking, and more! \ud83c\udf27\ufe0f\ud83c\udfd5\ufe0f\n\n2. Summit Breeze Jacket: This lightweight jacket", "groundedness": "5", "fluency": "4", "coherence": "4", "relevance": "5"}
- View the related row in the
-
Pick a row that has a
groundedness
of 1.- View the related row in the
result_evaluated.jsonl
file - Observe related answer and context in file
-
Ask: was THIS answer grounded in the context?
Want to see
groundedness=1
example from a previous run? Click to expand this.{"question": "I would like to return the tent I bought. It is used but I still want to return it since the roof leaks.", "context": [{"id": "1", "title": "TrailMaster X4 Tent", "content": "Unveiling the TrailMaster X4 Tent from OutdoorLiving, your home away from home for your next camping adventure. Crafted from durable polyester, this tent boasts a spacious interior perfect for four occupants. It ensures your dryness under drizzly skies thanks to its water-resistant construction, and the accompanying rainfly adds an extra layer of weather protection. It offers refreshing airflow and bug defence, courtesy of its mesh panels. Accessibility is not an issue with its multiple doors and interior pockets that keep small items tidy. Reflective guy lines grant better visibility at night, and the freestanding design simplifies setup and relocation. With the included carry bag, transporting this convenient abode becomes a breeze. Be it an overnight getaway or a week-long nature escapade, the TrailMaster X4 Tent provides comfort, convenience, and concord with the great outdoors. Comes with a two-year limited warranty to ensure customer satisfaction.", "url": "/products/trailmaster-x4-tent"}, {"id": "15", "title": "SkyView 2-Person Tent", "content": "Introducing the OutdoorLiving SkyView 2-Person Tent, a perfect companion for your camping and hiking adventures. This tent offers a spacious interior that houses two people comfortably, with room to spare. Crafted from durable waterproof materials to shield you from the elements, it is the fortress you need in the wild. Setup is a breeze thanks to its intuitive design and color-coded poles, while two large doors allow for easy access. Stay organized with interior pockets, and store additional gear in its two vestibules. The tent also features mesh panels for effective ventilation, and it comes with a rainfly for extra weather protection. Light enough for on-the-go adventurers, it packs compactly into a carrying bag for seamless transportation. Reflective guy lines ensure visibility at night for added safety, and the tent stands freely for versatile placement. Experience the reliability of double-stitched seams that guarantee increased durability, and rest easy under the stars with OutdoorLiving's SkyView 2-Person Tent. It's not just a tent; it's your home away from home.", "url": "/products/skyview-2-person-tent"}], "answer": "Hi Melissa! \ud83d\ude0a I'm sorry to hear that the TrailMaster X4 Tent you bought has a leaking roof. As a Gold member, you are eligible for returns. To initiate the return process, please contact our customer service team at [customer service number]. They will assist you further and provide you with the necessary instructions. \ud83d\uded2\ud83c\udf27\ufe0f", "groundedness": "1", "fluency": "4", "coherence": "4", "relevance": "3"}
- View the related row in the
In the provided examples, we can see that the first response in the visualized results (row 0
) had a groundedness of 5, while the third row from the bottom (row 9
) had a groundedness of 1.
- In the first case, the answers provided matched the data context (tent names).
- In the second case, the answers mention real tents from the context but the question did not actually specify the name of the tent - so response was not grounded in truth.
Explore the data in more detail on your own. Try to build your intuition for how scores are computed, and how that assessment reflects in the quality of your application.
CONGRATULATIONS. You just looked under the hood of an AI-Assisted evaluation workflow.