Pydantic AI Output Validators: Runtime Validation with Dependencies

One of the best things about Pydantic AI is how seamlessly it integrates with Pydantic's validation system. You get automated output validation with the ability to retry model calls when validation fails - no manual prompt engineering to get the format right, just define your models and let the framework handle the back-and-forth with the LLM.

I've been playing around with Pydantic AI's output validation features and wanted to jot down some notes about the @output_validator decorator, particularly when you need access to runtime dependencies for validation.

The Basic Setup

Here's a simple example where we're generating SQL queries and want to validate them based on some runtime policy:

import os

import logfire
from dotenv import load_dotenv
from pydantic import BaseModel
from pydantic_ai import Agent, ModelRetry, RunContext

load_dotenv()

logfire.configure(token=os.getenv("LOGFIRE_TOKEN"))
logfire.instrument_pydantic_ai()


class SQLOutput(BaseModel):
    sql_query: str


class SQLValidationDeps(BaseModel):
    allow_select_star: bool


agent = Agent[SQLValidationDeps, SQLOutput](
    "anthropic:claude-3-5-haiku-latest",
    output_type=SQLOutput,
    deps_type=SQLValidationDeps,
    system_prompt="Generate PostgreSQL flavored SQL queries based on user input.",
    retries=3,  # allow up to 3 validation-triggered retries before giving up
)


@agent.output_validator
async def validate_sql(
    ctx: RunContext[SQLValidationDeps], output: SQLOutput
) -> SQLOutput:
    """Validate that SQL queries follow the SELECT * policy."""

    if not ctx.deps.allow_select_star and "SELECT *" in output.sql_query.upper():
        raise ModelRetry(
            "SELECT * is not allowed. Please specify explicit column names."
        )

    return output


# Example usage
if __name__ == "__main__":
    # Demo with SELECT * disabled
    result = agent.run_sync(
        "Get all users who were active yesterday",
        deps=SQLValidationDeps(allow_select_star=False),
    )
    print("Result with SELECT * disabled:")
    print(result.output)
    print()

    # Demo with SELECT * enabled
    result = agent.run_sync(
        "Show me all user data", deps=SQLValidationDeps(allow_select_star=True)
    )
    print("Result with SELECT * enabled:")
    print(result.output)

Why Output Validators Instead of Field Validators?

I initially thought I could just use Pydantic's regular @field_validator on the SQLOutput model. The key issue is that field validators run during model instantiation and don't have access to runtime dependencies.
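
For contrast, here's a minimal sketch of the field validator approach (the no_select_star name is mine, not from the original code). The validator only ever sees the field value, so the policy has to be hard-coded - there's no handle on the per-request deps:

from pydantic import BaseModel, field_validator


class SQLOutput(BaseModel):
    sql_query: str

    @field_validator("sql_query")
    @classmethod
    def no_select_star(cls, v: str) -> str:
        # This runs at model instantiation. There's no RunContext here,
        # so SQLValidationDeps.allow_select_star is out of reach and the
        # policy has to be baked in.
        if "SELECT *" in v.upper():
            raise ValueError("SELECT * is not allowed")
        return v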

In my actual work use case, I needed to run EXPLAIN {query} on the generated SQL to ensure it only referenced functions actually available in the database. To do that, I had to hit the database with the user's credentials, which were passed in on the request. A @field_validator has no way to access that information, since it isn't part of the model being validated.

Output validators solve this: they run after the output model has been instantiated but before it's returned to the caller, with full access to the RunContext and its dependencies.
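
Here's a rough sketch of what that looks like, assuming asyncpg as the driver; the ExplainDeps model, its dsn field, and the validator body are illustrative, not the actual work code:

import asyncpg  # assumed driver for this sketch; any async client works
from pydantic import BaseModel
from pydantic_ai import Agent, ModelRetry, RunContext


class ExplainDeps(BaseModel):
    dsn: str  # per-request connection string built from the user's credentials


explain_agent = Agent[ExplainDeps, SQLOutput](
    "anthropic:claude-3-5-haiku-latest",
    output_type=SQLOutput,  # reusing SQLOutput from the example above
    deps_type=ExplainDeps,
    retries=3,
)


@explain_agent.output_validator
async def validate_with_explain(
    ctx: RunContext[ExplainDeps], output: SQLOutput
) -> SQLOutput:
    # Connect using credentials supplied at run time - exactly the
    # information a @field_validator could never see.
    conn = await asyncpg.connect(ctx.deps.dsn)
    try:
        await conn.execute(f"EXPLAIN {output.sql_query}")
    except asyncpg.PostgresError as e:
        # Feed the database's complaint back to the model and retry
        raise ModelRetry(f"EXPLAIN rejected the query: {e}")
    finally:
        await conn.close()
    return output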

The ModelRetry Exception

When validation fails, you raise ModelRetry - this is a specific exception type that tells Pydantic AI to retry the generation process. It's not just any exception; it's a signal to the framework that the model should try again with the feedback you provide.

This has implications for error handling. If you're wrapping the validation logic in try/except blocks, be careful: a broad except that swallows ModelRetry means the framework never gets the retry signal, and the invalid output is returned as if it had passed validation:

try:
    # some validation logic
    if validation_fails:
        raise ModelRetry("Try again with better parameters")
except ModelRetry:
    # Re-raise ModelRetry exceptions - don't catch them!
    raise
except Exception as e:
    # Handle other exceptions
    logger.error(f"Unexpected error: {e}")
    raise ModelRetry("An unexpected error occurred, please try again")

Retry Configuration

The retries=3 parameter on the agent is crucial. This controls how many times the model will attempt to generate a valid response before giving up. You need to think about this number carefully:

  • Too low and you give up before the model gets a chance to fix simple, correctable mistakes
  • Too high and you'll waste tokens and time on responses that are fundamentally flawed
  • The default is 1, which might be too conservative for complex validation scenarios

In the SQL example, 3 retries seems reasonable - enough to handle simple mistakes like using SELECT * when it's not allowed, but not so many that we're burning through tokens on fundamentally broken queries.
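
One thing that helps within a fixed budget: RunContext exposes a retry counter, so the validator can escalate its feedback on later attempts. A small sketch, reworking the validator from above (the escalation wording is mine):

@agent.output_validator
async def validate_sql(
    ctx: RunContext[SQLValidationDeps], output: SQLOutput
) -> SQLOutput:
    if not ctx.deps.allow_select_star and "SELECT *" in output.sql_query.upper():
        # ctx.retry is the number of retries so far, so later attempts
        # can get blunter, more prescriptive feedback
        if ctx.retry >= 2:
            raise ModelRetry(
                "You are still using SELECT *. List every column explicitly "
                "in the SELECT clause - do not use the * wildcard."
            )
        raise ModelRetry(
            "SELECT * is not allowed. Please specify explicit column names."
        )
    return output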

The Retry Loop

When validation fails and you raise ModelRetry, the agent will keep trying until either:

  1. The model generates a response that passes validation, or
  2. The maximum number of retries is exceeded

If the retry limit is hit, you'll get an UnexpectedModelBehavior exception with a message like Exceeded maximum retries (1) for result validation. This is the failure mode to plan for when choosing the retry count above.
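
If you'd rather handle that case gracefully than let it propagate, catch it around the run call - a minimal sketch:

from pydantic_ai.exceptions import UnexpectedModelBehavior

try:
    result = agent.run_sync(
        "Get all users who were active yesterday",
        deps=SQLValidationDeps(allow_select_star=False),
    )
    print(result.output)
except UnexpectedModelBehavior as e:
    # Every attempt failed validation - fall back or surface the error
    print(f"No valid query after retries: {e}")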
