Pydantic - data validation library in Python

Understanding Pydantic in Python: Revolutionizing Data Validation

FastAPI | Python | Spatial Data Science | January 11, 2024

Pydantic is a data validation and settings management library for Python 3.6 and above.


Data validation and settings management play a pivotal role in developing robust applications in Python programming. Pydantic, a Python library, has emerged as a game-changer in this domain. By leveraging Python type annotations, Pydantic simplifies data validation and parsing, bringing efficiency and clarity to code.


Why Type Hints in Python


Type hints in Python serve a different purpose than type declarations in statically typed languages like C or Java. In statically typed languages, the type of every variable, function parameter, and return value must be explicitly declared and is strictly enforced at compile time. This means that type-related errors are caught before the program even runs, contributing to the overall robustness and efficiency of the code.


Python, however, is a dynamically typed language: variable types are determined at runtime, and the same variable can hold data of different types at different points during execution. While this provides flexibility and ease of coding, it can also lead to more complex bugs and type-related errors that only emerge at runtime.
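
As a small illustration (the function and values here are made up), the same name can be rebound to different types, and a type mistake only surfaces when the offending line actually runs:

value = 42            # an int
value = "forty-two"   # now a str; perfectly legal in Python

def add_tax(price):
    return price * 1.2

add_tax(100)          # works at runtime
# add_tax("100")      # would raise a TypeError only when this line executes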




Type hints in Python address this by letting developers optionally annotate variables, function parameters, and return values with their expected types. These hints make the code more readable and understandable and help developers catch potential issues earlier in the development process, often with the help of IDEs and linters. They also facilitate better code documentation and improved maintainability, especially in larger codebases where understanding the type of data being passed around can be challenging.
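
Here is a minimal sketch of the same function with type hints (again purely illustrative): nothing extra is enforced at runtime, but an IDE or a static checker such as mypy can flag a mismatch before the code ever runs:

def add_tax(price: float, rate: float = 0.2) -> float:
    """Return the price including tax."""
    return price * (1 + rate)

total: float = add_tax(100.0)   # fine
# add_tax("100")                # a static checker flags this before the code is run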


What is Pydantic


Pydantic is a data validation and settings management library for Python 3.6 and above. It is built on Python type annotations, which allow for user-friendly and concise data parsing. The library ensures that the data you work with adheres to the formats and types you expect, raising informative errors when data is invalid. This functionality is powerful not only for debugging but also for ensuring data integrity throughout your application.
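
As a minimal sketch (the model and values are made up for illustration), a Pydantic model both coerces and validates incoming data, and raises a ValidationError that pinpoints each invalid field:

from pydantic import BaseModel, ValidationError

class User(BaseModel):
    id: int
    name: str
    signup_year: int

# In Pydantic's default (lax) mode, the string "2021" is coerced to the int 2021
user = User(id=1, name="Ada", signup_year="2021")
print(user.signup_year)  # 2021

try:
    User(id="not-a-number", name="Ada", signup_year=2021)
except ValidationError as e:
    print(e)  # explains that 'id' is not a valid integer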


Key Features of Pydantic


  1. Easy Data Parsing: Pydantic allows for quick parsing of complex data, such as JSON payloads, into Python classes. This is particularly useful in web frameworks and APIs.
  2. Speed: Pydantic's core validation logic is written in Rust, which makes it very fast.
  3. Automatic Data Validation: The library validates data as it is parsed, so you can catch errors early and ensure only valid data flows through your systems.
  4. Data Validation Modes: Pydantic can run in strict mode (strict=True), where data is not converted, or in lax mode (strict=False, the default), where Pydantic tries to coerce data to the correct type where appropriate.
  5. Editor Support: Pydantic's reliance on standard Python types means it works well with code editors, offering autocomplete and type-checking features.
  6. Custom Validation: While it offers extensive built-in validators, Pydantic also allows you to define custom validation functions, offering flexibility for complex scenarios. Both the validation modes and a custom validator appear in the sketch after this list.
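
The short sketch below illustrates points 4 and 6 (the model and the validation rule are made up for illustration): lax mode coerces the string "42" to an integer, strict mode rejects the same input, and a field_validator enforces a custom rule.

from pydantic import BaseModel, ValidationError, field_validator

class Order(BaseModel):
    quantity: int
    sku: str

    @field_validator("sku")
    @classmethod
    def sku_must_be_uppercase(cls, v: str) -> str:
        # Custom rule: reject SKUs that are not fully uppercase
        if v != v.upper():
            raise ValueError("sku must be uppercase")
        return v

# Lax (default) mode coerces the string "42" to the int 42
print(Order.model_validate({"quantity": "42", "sku": "ABC1"}))

# Strict mode refuses the same coercion and raises a ValidationError
try:
    Order.model_validate({"quantity": "42", "sku": "ABC1"}, strict=True)
except ValidationError as e:
    print(e)

# The custom validator rejects a lowercase SKU
try:
    Order(quantity=1, sku="abc1")
except ValidationError as e:
    print(e)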

Pydantic in Action: Exploring Popular Use Cases


Pydantic's flexibility and ease of use have made it popular in a wide range of Python applications; it is used by all FAANG companies and 20 of the 25 largest companies on the NASDAQ.

Pydantic is downloaded over 70M times/month!

Let us explore some of its most common use cases, showcasing its versatility and effectiveness.


API Development with FastAPI:

  1. FastAPI, one of the fastest-growing Python web frameworks, integrates Pydantic for request and response handling.
  2. Developers define data models using Pydantic, which FastAPI uses to handle data serialization and validation automatically.
  3. This integration speeds up API development and ensures the APIs are type-safe, reducing runtime errors.
  4. Pydantic models in FastAPI also drive automatic documentation generation, making API endpoints self-documenting and easier to consume.

        
"""
This code snippet defines an API endpoint where the data structure
for the request body is a Pydantic model.
"""
from typing import Optional

from fastapi import FastAPI
from pydantic import BaseModel

class Item(BaseModel):
    name: str
    description: Optional[str] = None  # optional fields default to None
    price: float
    tax: Optional[float] = None

app = FastAPI()

@app.post("/items/")
async def create_item(item: Item):
    # FastAPI has already validated the request body against Item at this point
    return item
        
    

Robust Settings Management:

    • Pydantic's BaseSettings class (provided by the separate pydantic-settings package in Pydantic v2) is a boon for application configuration. It supports reading from environment variables, files, and complex hierarchical settings.
    • This feature is particularly valuable in cloud-native and containerized applications, where configuration often comes from environment variables.
    • Using Pydantic, developers ensure that their application settings are loaded, validated, and documented.

        
"""
This example demonstrates loading configuration from an environment file.
"""
# In Pydantic v2, BaseSettings lives in the separate pydantic-settings package
from pydantic_settings import BaseSettings

class Settings(BaseSettings):
    app_name: str
    admin_email: str
    items_per_page: int = 10

# Values are read from environment variables and the given .env file
settings = Settings(_env_file='.env')
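
As a small illustration of the same idea (the variable names and values here are assumptions), the model can also be populated straight from environment variables; by default, pydantic-settings matches them to field names case-insensitively:

import os

from pydantic_settings import BaseSettings

class Settings(BaseSettings):
    app_name: str
    admin_email: str
    items_per_page: int = 10

# Environment variables are matched to field names, case-insensitively by default
os.environ["APP_NAME"] = "demo-app"
os.environ["ADMIN_EMAIL"] = "admin@example.com"

settings = Settings()
print(settings.app_name, settings.admin_email, settings.items_per_page)
# demo-app admin@example.com 10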
        
    

Data Science and Machine Learning:



In the realm of data science and machine learning, data validation is critical.

Pydantic can validate data schemas for machine learning models, ensuring that the input data matches the expected format.

It also assists in preprocessing steps, where data from various sources can be normalized and validated seamlessly.


        
"""
Pydantic ensures that the data matches the expected format before it's fed
into a machine learning model.
"""
from pydantic import BaseModel, ValidationError

class MLModelData(BaseModel):
    age: int
    salary: float
    department: str

try:
    # Simulating a single data row from a dataset
    data = MLModelData.model_validate({'age': 30, 'salary': 70000, 'department': 'HR'})
except ValidationError as e:
    print(e.json())
        
    

Integrating with ORM Tools like SQLAlchemy:

    • When used with Object-Relational Mapping (ORM) tools like SQLAlchemy, Pydantic helps in validating and serializing database records.
    • This integration simplifies converting complex query results into Python models that are easier to work with and understand.

        
"""
In this example, Pydantic validates and serializes a SQLAlchemy database
record into a plain Python object.
"""
from pydantic import BaseModel, ConfigDict
from my_app.models import User

class UserSchema(BaseModel):
    # from_attributes lets the model read data from ORM object attributes
    model_config = ConfigDict(from_attributes=True)

    id: int
    name: str
    email: str

# Assuming db_session is a SQLAlchemy session
db_user = db_session.query(User).first()
user_data = UserSchema.model_validate(db_user)
        
    

Enhancing Testing with Pytest:

    • In testing frameworks like Pytest, Pydantic ensures that test data adheres to the same schemas as production data.
    • This consistency reduces the likelihood of tests passing with data that would be invalid in a live environment.

        
"""
This code shows using Pydantic in Pytest fixtures to ensure consistency in test data.
"""
import pytest
from my_app.models import UserSchema

@pytest.fixture
def user_data():
    return UserSchema(name="John Doe", email="john@example.com")

def test_user_creation(user_data):
    assert user_data.name == "John Doe"
        
    

Compatibility with Django and Flask:

    • While Django and Flask do not natively integrate with Pydantic, community-driven plugins and extensions have bridged this gap.
    • These integrations bring Pydantic’s powerful validation to these widely-used frameworks, enhancing their capability in handling data validation and serialization.

        
"""
Flask example with Pydantic for request validation.
This Flask route uses Pydantic to validate incoming JSON data.
"""
from flask import Flask, request
from pydantic import BaseModel, ValidationError

app = Flask(__name__)

class UserRequest(BaseModel):
    username: str
    password: str

@app.route('/user', methods=['POST'])
def create_user():
    try:
        # model_validate_json parses and validates the raw JSON body in one step
        user = UserRequest.model_validate_json(request.data)
    except ValidationError as e:
        return str(e), 400
    return "User Created", 201
        
    

Data Parsing in ETL Processes:

    • Pydantic is also an asset in Extract, Transform, Load (ETL) processes, where data consistency and format are crucial.
    • It can be used to validate data at each stage of the ETL pipeline, ensuring data integrity and reducing errors in data processing.

        
"""
Pydantic is used here to validate each data item in an ETL process
"""
from pydantic import BaseModel, ValidationError

class ProductData(BaseModel):
    product_id: int
    name: str
    price: float

raw_data = {'product_id': 1, 'name': 'Widget', 'price': 19.99}

try:
    product = ProductData(**raw_data)
except ValidationError as e:
    print("Data validation error:", e)
        
    

Conclusion


Pydantic's broad applicability across different domains of Python programming underscores its value. From API development and data science to configuration management and traditional web frameworks, Pydantic stands out as a versatile, powerful tool for modern Python development. Its role in ensuring data integrity and streamlining development workflows cannot be overstated, making it a must-learn library for Python developers.