From Black-Box to White-Box: Generative AI Evaluation on Google's Vertex AI
My session, “From Black-Box to White-Box: Generative AI Evaluation on Google’s Vertex AI”, will give attendees a strategic framework for moving from subjective model testing to rigorous, data-driven evaluation using Vertex AI.
Key topics to be covered:
Deconstructing LLM Hallucinations: An analysis of the root causes of model errors, specifically statistical auto-completion (“guessing”), fuzzy memory, and the training bias that leads models to prioritize helpfulness over truthfulness.
Vertex AI Evaluation Service: An overview of the fully managed platform for evaluating generative AI models and agents, enabling developers to assess criteria such as safety, groundedness, and instruction following.
Adaptive Rubrics & Metrics: A deep dive into using adaptive rubrics to generate unique, prompt-specific pass/fail tests that act like unit tests for AI, alongside standard computation-based metrics.
The 3-Step Evaluation Framework: A practical, code-first guide to preparing evaluation datasets, defining and running an EvalTask with the Vertex AI SDK, and visualizing actionable insights via radar charts and Vertex AI Experiments (a minimal sketch of this workflow appears below).
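To give a flavor of the code-first approach, here is a minimal sketch of the three steps using the Vertex AI SDK for Python. The project ID, experiment name, dataset contents, and metric selection are illustrative assumptions, and the exact import path (vertexai.evaluation vs. vertexai.preview.evaluation) depends on your SDK version.

```python
# Minimal sketch of the 3-step evaluation workflow (illustrative values only).
# Assumes the Vertex AI SDK for Python; module paths may differ by SDK version.
import pandas as pd
import vertexai
from vertexai.evaluation import EvalTask, MetricPromptTemplateExamples

# Hypothetical project and region.
vertexai.init(project="my-project", location="us-central1")

# Step 1: Prepare an evaluation dataset of prompts and candidate responses.
eval_dataset = pd.DataFrame(
    {
        "prompt": [
            "Summarize the key terms of the attached contract.",
            "What is the capital of Australia?",
        ],
        "response": [
            "The contract runs for 12 months with a 30-day termination clause.",
            "The capital of Australia is Canberra.",
        ],
    }
)

# Step 2: Define and run an EvalTask mixing model-based and
# computation-based metrics; runs are logged to Vertex AI Experiments.
eval_task = EvalTask(
    dataset=eval_dataset,
    metrics=[
        MetricPromptTemplateExamples.Pointwise.GROUNDEDNESS,
        MetricPromptTemplateExamples.Pointwise.INSTRUCTION_FOLLOWING,
        "exact_match",
    ],
    experiment="my-eval-experiment",  # hypothetical experiment name
)
result = eval_task.evaluate()

# Step 3: Inspect aggregate scores and per-row results, which can feed
# radar charts or the Vertex AI Experiments UI for visualization.
print(result.summary_metrics)
print(result.metrics_table.head())
```

The session walks through each of these steps in more depth, including how adaptive rubrics slot in alongside the standard metrics shown here.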