Bulk Evaluation

Before you head into production, use our tools to automate AI Copilot tests at scale!

Why do you need bulk tests?

When building your AI Copilot, you will often need to tweak its settings to make sure the responses match your expected answers and are grounded and verifiable. Checking this one question at a time is slow, so how can you do it at scale?

There are several components to test:

  • testing prompts

  • ensuring that synthetic data retrieval works

  • checking the suitability of the language model and its advanced settings

  • evaluating whether the final AI Copilot produces the Golden Answers
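For the last item, one simple way to score answers against your Golden Answers in bulk is a token-overlap F1 score, the kind used in extractive question-answering benchmarks. This is a minimal sketch under our own assumptions, not the metric Gooey uses; `tokenize` and `token_f1` are hypothetical names, and the example strings are toy data.

```python
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def token_f1(answer: str, golden: str) -> float:
    """Token-overlap F1 between a Copilot answer and its Golden Answer (0.0 to 1.0)."""
    answer_tokens = tokenize(answer)
    golden_tokens = tokenize(golden)
    if not answer_tokens or not golden_tokens:
        # Two empty strings count as a match; a single empty string scores zero.
        return float(answer_tokens == golden_tokens)
    # Counter intersection counts each shared token min(count_in_answer, count_in_golden) times.
    overlap = sum((Counter(answer_tokens) & Counter(golden_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(answer_tokens)
    recall = overlap / len(golden_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy example: 4 of 5 answer tokens and 4 of 6 golden tokens overlap.
    print(round(token_f1("It opens at 9 am", "The office opens at 9 am"), 3))  # prints 0.727
```

Token overlap rewards shared wording rather than shared meaning, so treat it as a rough first pass, not a verdict on whether an answer is grounded.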

Features of the Bulk Runner

  1. Run several iterations of your workflows at scale

  2. Choose any of the API Response Outputs to populate your test

  3. Get output in CSV for further data analysis
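Conceptually, a bulk run takes every workflow, runs it on every input row, and collects the results into a CSV. The sketch below mimics that shape locally under our own assumptions: `bulk_run` and `make_fake_copilot` are hypothetical, the fake Copilots stand in for real workflow calls, and this is not Gooey's API.

```python
import csv
import io
from typing import Callable, Dict, List

# Hypothetical type: a workflow is any function mapping one input string to one output string.
Workflow = Callable[[str], str]


def bulk_run(rows: List[Dict[str, str]], input_column: str,
             workflows: Dict[str, Workflow]) -> str:
    """Run every workflow on every input row and return the results as CSV text.

    The original input columns are kept, and each workflow's output goes into
    its own column named after the workflow.
    """
    if not rows:
        raise ValueError("bulk_run needs at least one input row")
    buffer = io.StringIO()
    fieldnames = list(rows[0].keys()) + list(workflows.keys())
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        result = dict(row)  # copy so the caller's rows are not mutated
        for name, workflow in workflows.items():
            result[name] = workflow(row[input_column])
        writer.writerow(result)
    return buffer.getvalue()


def make_fake_copilot(label: str) -> Workflow:
    """Hypothetical stand-in for calling a deployed Copilot workflow."""
    return lambda question: f"[{label}] answer to: {question}"


if __name__ == "__main__":
    rows = [{"question": "Placeholder question 1"}, {"question": "Placeholder question 2"}]
    workflows = {
        "very_dense": make_fake_copilot("very dense"),
        "balanced": make_fake_copilot("balanced"),
        "very_sparse": make_fake_copilot("very sparse"),
    }
    print(bulk_run(rows, "question", workflows))
```

Keeping workflows as plain callables lets you test the harness itself without any network calls; swapping a fake for a function that calls your deployed workflow is the only change a real run would need.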

Step 1 - Add your workflow links

Add your workflow links. In this example, we ran the same AI Copilot with 3 different Dense Embedding Weightages (from Very Dense to Very Sparse).
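To build intuition for what the weightage changes, the sketch below assumes a simple hybrid retriever that ranks document chunks by a weighted sum of a dense (embedding) score and a sparse (keyword) score, where a weight of 1.0 is fully dense and 0.0 is fully sparse. This linear blend is an illustrative assumption; Gooey's retriever may combine the two signals differently, and `hybrid_rank` and the toy scores are hypothetical.

```python
from typing import Dict, List, Tuple


def hybrid_rank(scores: Dict[str, Tuple[float, float]], dense_weight: float) -> List[str]:
    """Rank chunk IDs by dense_weight * dense + (1 - dense_weight) * sparse.

    `scores` maps a chunk ID to its (dense_score, sparse_score); both are assumed
    to be normalized to the same 0-1 range so the blend is meaningful.
    """
    if not 0.0 <= dense_weight <= 1.0:
        raise ValueError("dense_weight must be between 0 and 1")

    def blended(chunk_id: str) -> float:
        dense, sparse = scores[chunk_id]
        return dense_weight * dense + (1 - dense_weight) * sparse

    return sorted(scores, key=blended, reverse=True)


if __name__ == "__main__":
    # Toy scores: chunk "a" matches semantically, chunk "b" matches keywords.
    toy_scores = {"a": (0.9, 0.2), "b": (0.3, 0.95), "c": (0.5, 0.5)}
    for weight in (1.0, 0.5, 0.0):  # very dense, balanced, very sparse
        print(weight, hybrid_rank(toy_scores, weight))
```

Because the top-ranked chunks change with the weight, each variant can feed different context to the language model, which is why running the same questions through all 3 workflows is a useful comparison.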

Step 2 - Add your CSV/Google Sheets

Add a CSV file or Google Sheet whose columns hold the inputs for the run. In this example, we made a CSV of 5 questions that each of the 3 workflows will answer.
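If you would rather generate the input file in code than in a spreadsheet, here is a minimal sketch using Python's standard `csv` module. The `build_input_csv` helper, the `question` column name, and the placeholder questions are hypothetical; substitute the questions you actually want to test and save the printed text as a `.csv` file.

```python
import csv
import io
from typing import List


def build_input_csv(questions: List[str], column: str = "question") -> str:
    """Return CSV text with a header row and one question per row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([column])  # header: the column you will pick in Step 3
    for question in questions:
        writer.writerow([question])  # csv quotes any question containing commas or quotes
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical placeholders: replace with your own test questions.
    print(build_input_csv([f"Placeholder question {i}" for i in range(1, 6)]), end="")
```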

Step 3 - Select the input column

From the dropdown, select the column you want to use as the input. The outputs will appear as additional columns in the results.
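Selecting the input column amounts to telling the runner which column's values to feed to each workflow. The sketch below, a hypothetical helper and not Gooey's implementation, reads CSV text, checks that the chosen column exists, and returns its values in row order.

```python
import csv
import io
from typing import List


def select_input_column(csv_text: str, column: str) -> List[str]:
    """Return the values in `column`, one per data row, in file order."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Reading fieldnames consumes the header row; it is None for an empty file.
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"column {column!r} not found; available: {reader.fieldnames}")
    # Keep one value per row (missing cells become "") so outputs stay aligned with inputs.
    return [row[column] or "" for row in reader]


if __name__ == "__main__":
    sample = "id,question\n1,Placeholder question 1\n2,Placeholder question 2\n"
    print(select_input_column(sample, "question"))
```

Failing early on a missing column name is cheaper than discovering an empty or misaligned results file after a large run.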

Check out the example described here: https://gooey.ai/bulk/?example_id=r0ugpna8
