Bulk Evaluation
Before you head into production, you can automate your AI Copilot tests at scale with our tools!
Why do you need bulk tests?
When building your AI Copilot, you will often have to tweak the settings to ensure that the responses show parity and are grounded and verifiable. How can you do this at scale?
There are several components to test:
testing the prompts
ensuring the synthetic data retrieval works
checking the suitability of the language model and its advanced settings
evaluating the final AI Copilot to produce the Golden Answers
Features of Bulk Runner
Run several iterations of your workflows at scale
Choose any of the API Response Outputs to populate your test
Get output in CSV for further data analysis
LINK TO WORKFLOW: https://gooey.ai/bulk/
Step 1 - Add your workflow links
Add your workflow links. In this example, we have run the same AI Copilot with 3 different Dense Embedding Weightages (from Very Dense to Very Sparse).
Step 2 - Add your CSV/Google Sheets
Add your CSV with the relevant columns that will be the inputs for the run. In this example, we have made a CSV of 5 questions that the 3 workflow runs will use.
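If you would rather generate the input file than write it by hand, the following is a minimal sketch using Python's standard csv module. The column name "question" and the sample questions are placeholders chosen for illustration; the Bulk Runner does not require these names, so use whatever columns your workflow expects.

```python
import csv
import io

# Hypothetical placeholder questions; replace them with your own test set.
QUESTIONS = [
    "Placeholder question 1",
    "Placeholder question 2",
    "Placeholder question 3",
    "Placeholder question 4",
    "Placeholder question 5",
]


def build_input_csv(questions, column="question"):
    """Return CSV text with one header row and one row per question."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([column])  # header: the column you will pick as the input
    for question in questions:
        writer.writerow([question])  # commas and quotes are escaped by csv
    return buffer.getvalue()


if __name__ == "__main__":
    print(build_input_csv(QUESTIONS), end="")
```

Save the printed text as a .csv file, or paste it into a Google Sheet, before adding it to the run.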
Step 3 - Select the input column
From the dropdown box, select the column that you need as the input. The outputs will appear as separate columns.
CHECK OUT THE EXAMPLE MENTIONED HERE: https://gooey.ai/bulk/?example_id=r0ugpna8
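Once you have downloaded the output CSV, a short script can give a first comparison of the runs. This is only a sketch under stated assumptions: the column names in the sample ("question", "run_1_output", "run_2_output") are hypothetical, and the real names in your export depend on the workflows and outputs you selected. It treats every column other than the input column as an output, then counts the non-empty answers and their average length.

```python
import csv
import io


def summarize_outputs(csv_text, input_column):
    """For every non-input column, count non-empty cells and average their length."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    output_columns = [c for c in (reader.fieldnames or []) if c != input_column]

    summary = {}
    for column in output_columns:
        cells = [(row.get(column) or "").strip() for row in rows]
        filled = [cell for cell in cells if cell]
        summary[column] = {
            "answered": len(filled),
            "total": len(rows),
            "avg_chars": sum(len(c) for c in filled) / len(filled) if filled else 0.0,
        }
    return summary


# Hypothetical export with two output columns; real column names will differ.
SAMPLE = (
    "question,run_1_output,run_2_output\n"
    "Q1,Answer A,\n"
    "Q2,Answer BB,Answer C\n"
)

if __name__ == "__main__":
    for column, stats in summarize_outputs(SAMPLE, "question").items():
        print(column, stats)
```

Answer length and empty-cell counts are only rough signals; judging whether responses are grounded and verifiable still requires reading them or comparing them against your Golden Answers.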