Evaluation workflow overview
Evaluation for GenAI output is inherently challenging because GenAI model responses are varied and context-dependent.
Onboard evaluation artifacts
Evaluation for GenAI output begins with onboarding, where you define the key elements of the benchmark.
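The onboarding step can be sketched as defining the benchmark's artifacts in code. This is an illustrative sketch, not a specific library's API: the `EvalCase`, `Benchmark`, and `must_include` names are hypothetical stand-ins for test prompts, expected qualities, and a pass threshold.

```python
from dataclasses import dataclass, field

# Hypothetical artifact definitions for a benchmark: test prompts,
# required answer content, and a pass threshold. All names are
# illustrative, not from any specific evaluation framework.

@dataclass
class EvalCase:
    prompt: str         # input sent to the GenAI model
    must_include: list  # substrings a good answer should contain

@dataclass
class Benchmark:
    name: str
    cases: list = field(default_factory=list)
    pass_threshold: float = 0.8  # fraction of criteria a response must meet

benchmark = Benchmark(
    name="support-bot-v1",
    cases=[
        EvalCase("What is your refund policy?", ["30 days", "receipt"]),
        EvalCase("How do I reset my password?", ["reset link", "email"]),
    ],
)
print(len(benchmark.cases))  # 2
```

Keeping cases, criteria, and thresholds in one declarative structure makes the benchmark versionable alongside the system it evaluates.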
Run an evaluation benchmark
Once you've onboarded the artifacts, run the benchmark against the GenAI model to establish a baseline.
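A benchmark run can be sketched as a loop that sends each test prompt to the model, scores the response against its criteria, and aggregates a pass rate. This is a minimal sketch under stated assumptions: `generate` is a canned stand-in for your real GenAI endpoint, and the substring-based `score_case` is one simple scoring rule among many.

```python
# Stand-in for a call to the GenAI model under evaluation.
def generate(prompt: str) -> str:
    canned = {
        "What is your refund policy?": "Refunds within 30 days with a receipt.",
        "How do I reset my password?": "We will email you a reset link.",
    }
    return canned.get(prompt, "")

def score_case(response: str, must_include: list) -> float:
    # Score = fraction of required substrings present in the response.
    if not must_include:
        return 1.0
    hits = sum(1 for needle in must_include if needle.lower() in response.lower())
    return hits / len(must_include)

cases = [
    ("What is your refund policy?", ["30 days", "receipt"]),
    ("How do I reset my password?", ["reset link", "email"]),
]

scores = [score_case(generate(prompt), required) for prompt, required in cases]
pass_rate = sum(s >= 0.8 for s in scores) / len(scores)
print(f"pass rate: {pass_rate:.0%}")  # pass rate: 100%
```

The aggregate pass rate from this first run is the baseline that later refinements are measured against.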
Refine evaluation benchmark
After running the initial evaluation, you may need to refine it. This step is iterative, with the end goal of having a benchmark that fully aligns with business goals.
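One way to drive the refinement loop is to compare the benchmark's automated verdicts against a small set of human labels and flag disagreements, which point at criteria or thresholds that need revising. The scores and labels below are illustrative placeholders, not real data.

```python
# Illustrative automated scores and human pass/fail labels per case.
auto_scores = {"case-1": 0.9, "case-2": 0.5, "case-3": 0.95, "case-4": 0.4}
human_pass = {"case-1": True, "case-2": True, "case-3": True, "case-4": False}

threshold = 0.8

# Cases where the automated verdict contradicts the human label are
# candidates for revised criteria, a new threshold, or a better scorer.
disagreements = [
    case for case, score in auto_scores.items()
    if (score >= threshold) != human_pass[case]
]
agreement = 1 - len(disagreements) / len(auto_scores)

print(disagreements)            # ['case-2']
print(f"agreement: {agreement:.0%}")  # agreement: 75%
```

When agreement stops improving across iterations, the benchmark is aligned well enough to trust its verdicts.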
Refine GenAI system
The end goal for GenAI output evaluation is to use the insights to refine your LLM system until it is production-worthy and meets your criteria.
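The refinement loop above can be closed by re-running the same benchmark after each system change (a prompt tweak, a model swap) and comparing pass rates, so improvements and regressions surface before release. The per-case scores here are illustrative placeholders.

```python
# Illustrative per-case scores from two runs of the same benchmark:
# the current production system and a candidate revision.
baseline = {"case-1": 0.9, "case-2": 0.6, "case-3": 0.7}
candidate = {"case-1": 0.95, "case-2": 0.85, "case-3": 0.65}

threshold = 0.8

def pass_rate(scores: dict) -> float:
    return sum(s >= threshold for s in scores.values()) / len(scores)

# Cases that passed before but fail now are regressions to investigate.
regressions = [c for c in baseline
               if baseline[c] >= threshold and candidate[c] < threshold]

print(f"baseline {pass_rate(baseline):.0%} -> candidate {pass_rate(candidate):.0%}")
print(regressions)  # []
```

Ship the candidate only when its pass rate meets your production bar and the regression list is empty.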