In this article, you will learn how to use Pydantic to validate, parse, and serialize structured data in Python using type hints.
Topics we will cover include:
- Defining core models with type coercion and clear validation errors
- Using optional fields, defaults, and Field constraints effectively
- Writing custom validators, handling nested structures, and exporting JSON
Let’s not waste any more time.
The Complete Guide to Pydantic for Python Developers
Introduction
Python’s flexibility with data types is convenient when coding, but it can lead to runtime errors when your code receives unexpected data formats. Such errors are especially common when you’re working with APIs, processing configuration files, or handling user input. Data validation, therefore, becomes necessary for building reliable applications.
Pydantic addresses this challenge by providing automatic data validation and serialization using Python’s type hint system, allowing you to define exactly what your data should look like and automatically enforcing those rules.
This article covers the basics of using Pydantic for data validation using type hints. Here’s what you’ll learn:
- Creating and validating data structures with type hints
- Handling optional fields and default values
- Building custom validation logic for specific requirements
- Working with nested models and complex data structures
Let’s begin with the basics. Before you proceed, make sure you have Pydantic installed (`pip install pydantic`) so you can follow along with the examples.
Basic Pydantic Models
Unlike manual data validation approaches that require writing extensive if-statements and type checks, Pydantic integrates well with your existing Python code. It uses Python’s type hints (which you might already be using) and transforms them into powerful validation logic.
When data doesn’t match your specifications, you get clear, actionable error messages instead of cryptic runtime exceptions. This reduces debugging time and makes your code more maintainable and self-documenting.
Pydantic models inherit from BaseModel and use Python type hints to define the expected data structure:
```python
from pydantic import BaseModel

class User(BaseModel):
    name: str
    age: int
    email: str

# Create a user
user = User(name="Alice", age="25", email="alice@example.com")
print(user.age)
print(type(user.age))
```
Output:

```
25
<class 'int'>
```
This code defines a User model with three required fields. When creating a user instance, Pydantic automatically converts the string “25” to the integer 25. If conversion isn’t possible (like passing “abc” for age), it raises a validation error with a clear message about what went wrong. This automatic type coercion is particularly useful when working with JSON data or form inputs where everything arrives as strings.
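When coercion is impossible, Pydantic raises a `ValidationError` that you can catch and inspect. A minimal sketch reusing the same `User` model (the name `Bob` is just an illustrative value):

```python
from pydantic import BaseModel, ValidationError

class User(BaseModel):
    name: str
    age: int
    email: str

# "abc" cannot be coerced to an int, so validation fails
try:
    User(name="Bob", age="abc", email="bob@example.com")
except ValidationError as e:
    errors = e.errors()
    # Each error records the field location and a readable message
    print(errors[0]["loc"], errors[0]["msg"])
```

The `loc` entry tells you exactly which field failed, which is handy once a model has more than a few fields.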
Optional Fields and Defaults
Real-world data often has missing or optional fields. Pydantic handles this with Optional types and default values:
```python
from pydantic import BaseModel, Field
from typing import Optional

class Product(BaseModel):
    name: str
    price: float
    description: Optional[str] = None
    in_stock: bool = True
    category: str = Field(default="general", min_length=1)

# All these work
product1 = Product(name="Widget", price=9.99)
product2 = Product(name="Gadget", price=15.50, description="Useful tool")
```
The Optional[str] type means description can be a string or None. Fields with default values don’t need to be provided when creating instances. The Field() function adds validation constraints.
Here it ensures category has at least one character. This flexibility allows your models to handle incomplete data gracefully while still enforcing important business rules.
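To confirm that the `min_length=1` constraint actually fires, here is a quick sketch with the same `Product` model:

```python
from pydantic import BaseModel, Field, ValidationError
from typing import Optional

class Product(BaseModel):
    name: str
    price: float
    description: Optional[str] = None
    in_stock: bool = True
    category: str = Field(default="general", min_length=1)

# An empty string violates min_length=1, so the model is rejected
try:
    Product(name="Widget", price=9.99, category="")
    rejected = False
except ValidationError:
    rejected = True
print(rejected)  # True
```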
Custom Validators in Pydantic
Sometimes you need validation logic beyond basic type checking. Validators let you implement custom rules:
```python
from pydantic import BaseModel, field_validator
import re

class Account(BaseModel):
    username: str
    email: str
    password: str

    @field_validator('username')
    @classmethod
    def validate_username(cls, v):
        if len(v) < 3:
            raise ValueError('Username must be at least 3 characters')
        if not v.isalnum():
            raise ValueError('Username must be alphanumeric')
        return v.lower()  # Normalize to lowercase

    @field_validator('email')
    @classmethod
    def validate_email(cls, v):
        pattern = r'^[\w\.-]+@[\w\.-]+\.\w+$'
        if not re.match(pattern, v):
            raise ValueError('Invalid email format')
        return v

    @field_validator('password')
    @classmethod
    def validate_password(cls, v):
        if len(v) < 8:
            raise ValueError('Password must be at least 8 characters')
        return v

account = Account(
    username="JohnDoe123",
    email="john@example.com",
    password="secretpass123"
)
```
Validators run automatically during model creation. They can transform data (like converting usernames to lowercase) or reject invalid values with descriptive error messages.
The cls parameter gives access to the class, and v is the value being validated. Validators run in the order their fields are defined, and a validator can read fields that were already validated through an optional `ValidationInfo` argument.
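In Pydantic v2, that cross-field access goes through `info.data`, which holds the fields validated so far. A small sketch using a hypothetical `Signup` model (not part of the examples above):

```python
from pydantic import BaseModel, ValidationError, ValidationInfo, field_validator

class Signup(BaseModel):
    password: str
    password_confirm: str

    @field_validator('password_confirm')
    @classmethod
    def passwords_match(cls, v: str, info: ValidationInfo) -> str:
        # info.data contains already-validated fields, in definition order;
        # "password" is validated before "password_confirm"
        if 'password' in info.data and v != info.data['password']:
            raise ValueError('Passwords do not match')
        return v

ok = Signup(password='secretpass123', password_confirm='secretpass123')

try:
    Signup(password='secretpass123', password_confirm='different')
    mismatch_rejected = False
except ValidationError:
    mismatch_rejected = True
```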
Nested Models and Complex Structures
Real applications deal with hierarchical data. Pydantic makes nested validation straightforward:
```python
from pydantic import BaseModel, field_validator
from typing import List, Optional
from datetime import datetime

class Address(BaseModel):
    street: str
    city: str
    state: str
    zip_code: str

    @field_validator('zip_code')
    @classmethod
    def validate_zip(cls, v):
        if not v.isdigit() or len(v) != 5:
            raise ValueError('ZIP code must be 5 digits')
        return v

class Contact(BaseModel):
    name: str
    phone: str
    email: Optional[str] = None

class Company(BaseModel):
    name: str
    founded: datetime
    address: Address
    contacts: List[Contact]
    employee_count: int
    is_public: bool = False

# Complex nested data gets fully validated
company_data = {
    "name": "Tech Corp",
    "founded": "2020-01-15T10:00:00",
    "address": {
        "street": "123 Main St",
        "city": "San Francisco",
        "state": "CA",
        "zip_code": "94105"
    },
    "contacts": [
        {"name": "John Smith", "phone": "555-0123"},
        {"name": "Jane Doe", "phone": "555-0456", "email": "jane@techcorp.com"}
    ],
    "employee_count": 150
}

company = Company(**company_data)
```
Pydantic validates the entire structure recursively. The address gets validated according to the Address model rules, each contact in the contacts list is validated as a Contact model, and the datetime string is automatically parsed. If any part of the nested structure is invalid, you get a detailed error showing exactly where the problem occurs.
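The error's `loc` field reflects that nesting: with the models above, an invalid ZIP code surfaces with a path pointing into the address. A trimmed sketch:

```python
from pydantic import BaseModel, ValidationError, field_validator

class Address(BaseModel):
    street: str
    city: str
    state: str
    zip_code: str

    @field_validator('zip_code')
    @classmethod
    def validate_zip(cls, v):
        if not v.isdigit() or len(v) != 5:
            raise ValueError('ZIP code must be 5 digits')
        return v

class Company(BaseModel):
    name: str
    address: Address

try:
    Company(name='Tech Corp',
            address={'street': '123 Main St', 'city': 'San Francisco',
                     'state': 'CA', 'zip_code': 'bad'})
except ValidationError as e:
    loc = e.errors()[0]['loc']  # path into the nested structure
    print(loc)
```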
If all goes well, the company object will look like:
```
Company(name='Tech Corp', founded=datetime.datetime(2020, 1, 15, 10, 0), address=Address(street='123 Main St', city='San Francisco', state='CA', zip_code='94105'), contacts=[Contact(name='John Smith', phone='555-0123', email=None), Contact(name='Jane Doe', phone='555-0456', email='jane@techcorp.com')], employee_count=150, is_public=False)
```
Working with APIs and JSON
Pydantic is well suited to handling API responses and JSON data, which often arrive in unpredictable formats.
This example shows handling typical API challenges: mixed data types (age as string), various datetime formats, and optional fields:
```python
from pydantic import BaseModel, Field, field_validator
from typing import Union, Optional
from datetime import datetime
import json

class APIResponse(BaseModel):
    status: str
    message: Optional[str] = None
    data: Optional[dict] = None
    timestamp: datetime = Field(default_factory=datetime.now)

class UserProfile(BaseModel):
    id: int
    username: str
    full_name: Optional[str] = None
    age: Optional[int] = Field(None, ge=0, le=150)  # Age constraints
    created_at: Union[datetime, str]  # Handle multiple formats
    is_verified: bool = False

    @field_validator('created_at', mode='before')
    @classmethod
    def parse_created_at(cls, v):
        if isinstance(v, str):
            try:
                return datetime.fromisoformat(v.replace('Z', '+00:00'))
            except ValueError:
                raise ValueError('Invalid datetime format')
        return v

# Simulate API response
api_json = '''
{
    "status": "success",
    "data": {
        "id": 123,
        "username": "alice_dev",
        "full_name": "Alice Johnson",
        "age": "28",
        "created_at": "2023-01-15T10:30:00Z",
        "is_verified": true
    }
}
'''

response_data = json.loads(api_json)
api_response = APIResponse(**response_data)

if api_response.data:
    user = UserProfile(**api_response.data)
    print(f"User {user.username} created at {user.created_at}")
```
When you load the JSON response and create the user object, you’ll get the following output:
```
User alice_dev created at 2023-01-15 10:30:00+00:00
```
The mode="before" parameter on validators means they run before type conversion, allowing you to handle string inputs before they’re converted to the target type. Field constraints like ge=0, le=150 ensure age values are reasonable.
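Those bounds behave like any other validation: an out-of-range value raises `ValidationError`. A trimmed, hypothetical `Profile` model to illustrate:

```python
from pydantic import BaseModel, Field, ValidationError
from typing import Optional

class Profile(BaseModel):
    age: Optional[int] = Field(None, ge=0, le=150)

in_range = Profile(age='28').age   # coerced to int 28, within bounds

try:
    Profile(age=200)               # violates le=150
    out_of_range_rejected = False
except ValidationError:
    out_of_range_rejected = True
```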
Error Handling and Validation
When validation fails, Pydantic provides structured error information:
```python
from pydantic import BaseModel, ValidationError, field_validator
from typing import List

class Order(BaseModel):
    order_id: int
    customer_email: str
    items: List[str]
    total: float

    @field_validator('total')
    @classmethod
    def positive_total(cls, v):
        if v <= 0:
            raise ValueError('Total must be positive')
        return v

# Invalid data
bad_data = {
    "order_id": "not_a_number",
    "customer_email": "invalid_email",
    "items": "should_be_list",
    "total": -10.50
}

try:
    order = Order(**bad_data)
except ValidationError as e:
    print("Validation errors:")
    for error in e.errors():
        field = error['loc'][0]
        message = error['msg']
        print(f"  {field}: {message}")

    # Get JSON representation of errors
    print("\nJSON errors:")
    print(e.json(indent=2))
```
Output:
```
Validation errors:
  order_id: Input should be a valid integer, unable to parse string as an integer
  items: Input should be a valid list
  total: Value error, Total must be positive

JSON errors:
[
  {
    "type": "int_parsing",
    "loc": [
      "order_id"
    ],
    "msg": "Input should be a valid integer, unable to parse string as an integer",
    "input": "not_a_number",
    "url": "https://errors.pydantic.dev/2.11/v/int_parsing"
  },
  {
    "type": "list_type",
    "loc": [
      "items"
    ],
    "msg": "Input should be a valid list",
    "input": "should_be_list",
    "url": "https://errors.pydantic.dev/2.11/v/list_type"
  },
  {
    "type": "value_error",
    "loc": [
      "total"
    ],
    "msg": "Value error, Total must be positive",
    "input": -10.5,
    "ctx": {
      "error": "Total must be positive"
    },
    "url": "https://errors.pydantic.dev/2.11/v/value_error"
  }
]
```
Pydantic’s error objects contain detailed information about what went wrong and where. Each error includes the field location, error type, and a human-readable message. This makes it easy to provide meaningful feedback to users or log detailed error information for debugging.
Serialization and Export
Converting models back to dictionaries or JSON is straightforward:
```python
from pydantic import BaseModel
from datetime import datetime

class Event(BaseModel):
    name: str
    date: datetime
    attendees: int
    is_public: bool = True

event = Event(
    name="Python Meetup",
    date=datetime(2024, 3, 15, 18, 30),
    attendees=45
)

# Export to dictionary
event_dict = event.model_dump()
print(event_dict)

# Export to JSON string
event_json = event.model_dump_json()
print(event_json)

# Export with exclusions
public_data = event.model_dump(exclude={'attendees'})
print(public_data)

# Export with custom serialization
formatted_json = event.model_dump_json(indent=2)
print(formatted_json)
```
Output:
```
{'name': 'Python Meetup', 'date': datetime.datetime(2024, 3, 15, 18, 30), 'attendees': 45, 'is_public': True}
{"name":"Python Meetup","date":"2024-03-15T18:30:00","attendees":45,"is_public":true}
{'name': 'Python Meetup', 'date': datetime.datetime(2024, 3, 15, 18, 30), 'is_public': True}
{
  "name": "Python Meetup",
  "date": "2024-03-15T18:30:00",
  "attendees": 45,
  "is_public": true
}
```
The model_dump() and model_dump_json() methods provide flexible export options. You can exclude sensitive fields, include only specific fields, or customize how values are serialized. This is particularly useful when creating API responses where you need different representations of the same data for different contexts.
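Serialization also round-trips: `model_validate_json()`, the parsing counterpart of `model_dump_json()`, rebuilds an equal instance from the exported string. A short sketch with the same `Event` model:

```python
from pydantic import BaseModel
from datetime import datetime

class Event(BaseModel):
    name: str
    date: datetime
    attendees: int
    is_public: bool = True

event = Event(name="Python Meetup", date=datetime(2024, 3, 15, 18, 30), attendees=45)

# Dump to JSON, then parse back; field values survive the round trip
restored = Event.model_validate_json(event.model_dump_json())
print(restored == event)  # True
```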
Conclusion
Pydantic transforms data validation from a tedious, error-prone task into an automatic, declarative process. Using Python’s type system, it provides runtime guarantees about your data structure while maintaining clean, readable code. Pydantic helps you catch errors early and build more reliable applications with less boilerplate code.
This article should give you a good foundation in Pydantic, from basic models to custom validators and nested structures. We’ve covered how to define data models with type hints, handle optional fields and defaults, create custom validation logic, and work with complex nested structures.
As you apply these concepts in your projects, you’ll learn additional features like serialization options, configuration settings, and advanced validation patterns. The patterns you’ve learned here will scale from simple scripts to complex applications. Keep experimenting with Pydantic’s features, and you’ll find it becomes an essential tool in your Python development workflow.
