Background
This post is inspired by the article "Improve Serialization Performance in Django Rest Framework". Its author compared several serialization approaches in Django Rest Framework, such as simple functions and regular serializers. After reading it, I noticed that the package versions used are somewhat outdated: Python 3.7, Django 2.1.1, and Django Rest Framework 3.9.4. I therefore decided to recreate the experiments with recent package versions to check whether the conclusions still hold.
Introduction
Before delving into the detailed experiments, I will list the methods I plan to compare, along with the versions of packages in my development environment:
Serializer solutions
- Data Class
- Regular Serializer
- Model Serializer
- Simple function
- Pydantic
In recent years, Pydantic has emerged as the most widely used data validation library for Python, which is why I've included it in my comparison list.
Versions in My Local Environment
- Python 3.10
- Django 5.0.6
- Django Rest Framework 3.15.1
- Pydantic 2.7.3
Environment Setup
I created two models, Product and Order, in a new Django project for these experiments. Order has a foreign key to Product, which makes the experiments more representative of day-to-day work.
models.py
from django.db import models


class Product(models.Model):
    id = models.AutoField(primary_key=True)
    name = models.CharField(max_length=100)
    price = models.DecimalField(max_digits=10, decimal_places=2)
    stock = models.IntegerField()
    remark = models.TextField()
    created_at = models.DateTimeField(auto_now_add=True)

    def __str__(self):
        return self.name


class Order(models.Model):
    id = models.AutoField(primary_key=True)
    product = models.ForeignKey(Product, on_delete=models.CASCADE)
    quantity = models.IntegerField()
    order_date = models.DateTimeField(auto_now_add=True)

    def __str__(self):
        return f"Order {self.id} for {self.product.name}"
I also created a Django management command to populate the database with test data.
populate_data.py
import random

from django.utils import timezone
from django.core.management.base import BaseCommand
from myapp.models import Product, Order


class Command(BaseCommand):
    help = 'Populate the database with test data'

    def handle(self, *args, **kwargs):
        Product.objects.all().delete()
        Order.objects.all().delete()

        # Create products
        products = []
        for i in range(1000):
            product = Product(
                name=f'Product {i}',
                price=random.uniform(10.0, 100.0),
                stock=random.randint(1, 100),
                remark='Remark for product',
                created_at=timezone.now()
            )
            products.append(product)

        # Bulk create products with a batch size of 500
        Product.objects.bulk_create(products, batch_size=500)

        # Fetch all products to get their IDs
        all_products = list(Product.objects.all())

        # Create orders
        orders = []
        for i in range(10000):
            order = Order(
                product=random.choice(all_products),
                quantity=random.randint(1, 10),
                order_date=timezone.now()
            )
            orders.append(order)

        # Bulk create orders with a batch size of 500
        Order.objects.bulk_create(orders, batch_size=500)

        self.stdout.write(self.style.SUCCESS('Successfully populated the database with test data'))
Next, we need to run the following commands to complete data preparation.
$ python manage.py makemigrations
$ python manage.py migrate
$ python manage.py populate_data
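The populate command passes a binary float from random.uniform to a field declared with decimal_places=2, so most of the generated digits carry no information. If you would rather generate exact two-decimal values up front, one option is sketched below. The helper name random_price and its parameters are hypothetical and not part of the original command; the default bounds simply mirror the 10.0 to 100.0 range used there.

```python
import random
from decimal import Decimal


def random_price(low_cents: int = 1000, high_cents: int = 10000) -> Decimal:
    """Return a random price with exactly two decimal places.

    Hypothetical helper for illustration only. It draws an integer number
    of cents and converts it to a Decimal, avoiding float rounding noise.
    """
    cents = random.randint(low_cents, high_cents)
    return (Decimal(cents) / Decimal(100)).quantize(Decimal("0.01"))


# A few sample prices, each a Decimal between 10.00 and 100.00.
prices = [random_price() for _ in range(5)]
print(prices)
```

In the command, this would replace the random.uniform call, for example price=random_price().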
Experiment
As previously mentioned, we have various methods for comparison. In this section, we will implement the necessary serializer solutions.
Data Class
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class ProductData:
    id: int
    name: str
    price: float
    stock: int
    remark: str
    created_at: datetime


@dataclass
class OrderData:
    id: int
    product: ProductData
    quantity: int
    order_date: datetime
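The dataclasses above only hold data; they do not produce a JSON-ready structure by themselves. As a minimal sketch of how the final conversion step could look, the standard library's dataclasses.asdict turns a nested instance into plain dictionaries, and json.dumps can handle the datetime fields through its default hook. The sample values below are made up for illustration.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass
class ProductData:
    id: int
    name: str
    price: float
    stock: int
    remark: str
    created_at: datetime


@dataclass
class OrderData:
    id: int
    product: ProductData
    quantity: int
    order_date: datetime


# Hypothetical sample values, used only for illustration.
now = datetime(2024, 6, 1, 12, 0, tzinfo=timezone.utc)
order = OrderData(
    id=1,
    product=ProductData(1, "Product 0", 19.99, 5, "Remark for product", now),
    quantity=2,
    order_date=now,
)

# asdict() recurses into nested dataclasses, so "product" becomes a plain dict.
order_dict = asdict(order)

# datetime is not JSON-serializable by default; convert it via the default hook.
order_json = json.dumps(order_dict, default=lambda value: value.isoformat())
print(order_json)
```

Note that the benchmark in this post only measures constructing the dataclass instances, not this extra conversion step.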
Regular Serializer
from rest_framework import serializers


class ProductSerializer(serializers.Serializer):
    id = serializers.IntegerField()
    name = serializers.CharField(max_length=100)
    price = serializers.DecimalField(max_digits=10, decimal_places=2)
    stock = serializers.IntegerField()
    remark = serializers.CharField()
    created_at = serializers.DateTimeField()


class OrderSerializer(serializers.Serializer):
    id = serializers.IntegerField()
    product = ProductSerializer()
    quantity = serializers.IntegerField()
    order_date = serializers.DateTimeField()
Model Serializer
from rest_framework import serializers
from .models import Product, Order


class ProductModelSerializer(serializers.ModelSerializer):
    class Meta:
        model = Product
        fields = ['id', 'name', 'price', 'stock', 'remark', 'created_at']


class OrderModelSerializer(serializers.ModelSerializer):
    product = ProductModelSerializer()

    class Meta:
        model = Order
        fields = ['id', 'product', 'quantity', 'order_date']
Simple Function
from typing import Dict, Any


def serialize_product(product: Product) -> Dict[str, Any]:
    return {
        'id': product.id,
        'name': product.name,
        'price': float(product.price),
        'stock': product.stock,
        'remark': product.remark,
        'created_at': product.created_at.isoformat() if product.created_at else None,
    }


def serialize_order(order: Order) -> Dict[str, Any]:
    return {
        'id': order.id,
        'product': serialize_product(order.product),
        'quantity': order.quantity,
        'order_date': order.order_date.isoformat() if order.order_date else None,
    }
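To see what these functions return without a database, the sketch below runs the same logic against stand-in objects built with types.SimpleNamespace. The stand-ins and their values are assumptions for illustration; in the real project the arguments are Product and Order model instances.

```python
import json
from datetime import datetime, timezone
from decimal import Decimal
from types import SimpleNamespace
from typing import Any, Dict


def serialize_product(product) -> Dict[str, Any]:
    return {
        "id": product.id,
        "name": product.name,
        # Decimal is not JSON-serializable, hence the float conversion.
        "price": float(product.price),
        "stock": product.stock,
        "remark": product.remark,
        "created_at": product.created_at.isoformat() if product.created_at else None,
    }


def serialize_order(order) -> Dict[str, Any]:
    return {
        "id": order.id,
        "product": serialize_product(order.product),
        "quantity": order.quantity,
        "order_date": order.order_date.isoformat() if order.order_date else None,
    }


# Stand-ins for model instances (hypothetical values).
now = datetime(2024, 6, 1, 12, 0, tzinfo=timezone.utc)
product = SimpleNamespace(
    id=1, name="Product 0", price=Decimal("19.99"), stock=5,
    remark="Remark for product", created_at=now,
)
order = SimpleNamespace(id=7, product=product, quantity=2, order_date=now)

payload = serialize_order(order)
body = json.dumps(payload)  # works because every value is a JSON-native type
print(body)
```

The output contains only strings, numbers, and nested dictionaries, which is why this approach needs no further conversion before being returned as JSON.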
Pydantic
from pydantic import BaseModel
from typing import List
from datetime import datetime


class ProductDataPydantic(BaseModel):
    id: int
    name: str
    price: float
    stock: int
    remark: str
    created_at: datetime


class OrderDataPydantic(BaseModel):
    id: int
    product: ProductDataPydantic
    quantity: int
    order_date: datetime
Experiment Results
For this experiment, I prepared a Django command and used line_profiler to measure the performance of each method. As shown below, the command has five functions, one for each method under evaluation.
profile_serialization.py
from django.core.management.base import BaseCommand
from django.db import connection
from myapp.models import Product, Order
from myapp.dataclasses import ProductData, OrderData
from myapp.serializers import (
    ProductSerializer,
    ProductModelSerializer,
    OrderSerializer,
    OrderModelSerializer,
    serialize_product,
    serialize_order,
)
from myapp.pydantic_models import ProductDataPydantic, OrderDataPydantic

# Note: the `profile` decorator is not imported here. It is injected into
# builtins by `kernprof -l`, so this command must be run through kernprof.


class Command(BaseCommand):
    help = 'Profile serialization performance'

    def handle(self, *args, **kwargs):
        # Read data from the database with select_related
        orders = list(Order.objects.select_related('product').all())
        print("length of orders:", len(orders))

        # Dataclass serialization
        self.profile_dataclass_serialization(orders)
        # Django Serializer
        self.profile_django_serializer(orders)
        # Django ModelSerializer
        self.profile_django_model_serializer(orders)
        # Simple Function-based Serialization
        self.profile_simple_function(orders)
        # Pydantic Serialization
        self.profile_pydantic_serialization(orders)

    @profile
    def profile_dataclass_serialization(self, orders):
        order_data = [
            OrderData(
                id=order.id,
                product=ProductData(
                    id=order.product.id,
                    name=order.product.name,
                    price=float(order.product.price),
                    stock=order.product.stock,
                    remark=order.product.remark,
                    created_at=order.product.created_at
                ),
                quantity=order.quantity,
                order_date=order.order_date
            ) for order in orders
        ]

    @profile
    def profile_django_serializer(self, orders):
        order_serializer = OrderSerializer(orders, many=True)
        order_data = [order for order in order_serializer.data]

    @profile
    def profile_django_model_serializer(self, orders):
        order_model_serializer = OrderModelSerializer(orders, many=True)
        order_data = [order for order in order_model_serializer.data]

    @profile
    def profile_simple_function(self, orders):
        simple_serialized_orders = [serialize_order(order) for order in orders]

    @profile
    def profile_pydantic_serialization(self, orders):
        order_data = [
            OrderDataPydantic(
                id=order.id,
                product=ProductDataPydantic(
                    id=order.product.id,
                    name=order.product.name,
                    price=float(order.product.price),
                    stock=order.product.stock,
                    remark=order.product.remark,
                    created_at=order.product.created_at
                ),
                quantity=order.quantity,
                order_date=order.order_date
            ) for order in orders
        ]
You can run the following command to get the final results; the table below it shows the total time for each method.
$ kernprof -l -v manage.py profile_serialization
| Method | Result (seconds) | Data Validation |
|---|---|---|
| Dataclass serialization | 0.05273 | N |
| Django Serializer | 0.443061 | Y |
| Django ModelSerializer | 0.423694 | Y |
| Simple Function-based Serialization | 0.032002 | N |
| Pydantic Serialization | 0.086857 | Y |
It's not surprising that simple function-based serialization performed best in this comparison, given its simplicity and lack of data validation. However, I was somewhat surprised by the performance of Django's serializers: I had the impression that ModelSerializer would perform worse, but the test showed nearly the same performance for both (Serializer and ModelSerializer). This is intriguing, and we will explore it in the Further Discussion section.
Looking at the results table again, the most appealing solution to me is Pydantic: it performs best among the methods with data validation, roughly five times faster than either DRF serializer (about 0.087 s versus 0.42 to 0.44 s). The gap to the methods without validation is comparatively small: Pydantic takes about 1.6 times as long as the dataclass approach and about 2.7 times as long as the simple function. This combination of speed and validation is likely one reason Pydantic has become such a popular serialization and data validation tool in Python.
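For readers who want to compare the methods relative to each other, the short script below computes ratios from the timings reported in the results table. The numbers are copied from the table; the variable names are only illustrative.

```python
# Timings in seconds, copied from the results table.
timings = {
    "Dataclass serialization": 0.05273,
    "Django Serializer": 0.443061,
    "Django ModelSerializer": 0.423694,
    "Simple Function-based Serialization": 0.032002,
    "Pydantic Serialization": 0.086857,
}

# Express every method as a multiple of the fastest one (the simple function).
baseline = timings["Simple Function-based Serialization"]
relative = {name: seconds / baseline for name, seconds in timings.items()}

for name, ratio in sorted(relative.items(), key=lambda item: item[1]):
    print(f"{name:<40} {ratio:5.1f}x")

# How much faster Pydantic is than the regular DRF Serializer.
drf_vs_pydantic = timings["Django Serializer"] / timings["Pydantic Serialization"]
print(f"Django Serializer / Pydantic: {drf_vs_pydantic:.1f}x")
```

These ratios come from a single run on one machine, so treat them as indicative rather than precise.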
Further Discussion
In this section, we will discuss the performance results of the Serializer and ModelSerializer. Upon investigation, both have similar implementation approaches. There are two parts to the implementation:
- Field Handling: Both Serializer and ModelSerializer handle fields in a similar manner once they are defined. For read operations, both serializers iterate over the fields and generate the output dictionary.
- Field Definitions: While Serializer requires explicit field definitions, ModelSerializer introspects the model and automatically creates the fields.
I believe Django and DRF have addressed the earlier performance issues with ModelSerializer, which is reflected in today's experiment results. ModelSerializer has a slight overhead during initialization due to model introspection. However, this overhead is negligible for read operations because it happens once per serializer instance rather than once per serialized object, resulting in performance that is almost identical to that of Serializer.
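The reasoning above can be illustrated with a toy model in plain Python. This is not DRF's implementation: the classes ExplicitSerializer, IntrospectingSerializer, and Record are hypothetical stand-ins that only mimic the idea that field discovery runs once at construction, while the per-object loop is shared.

```python
from typing import Any, Dict, List


class Record:
    """Stand-in for a model instance; not a real Django model."""

    field_names = ["id", "quantity"]

    def __init__(self, id: int, quantity: int) -> None:
        self.id = id
        self.quantity = quantity


class ExplicitSerializer:
    """Fields are listed by hand, as with a regular Serializer."""

    def __init__(self, field_names: List[str]) -> None:
        self.field_names = list(field_names)

    def to_representation(self, obj: Any) -> Dict[str, Any]:
        # Per-object work: iterate over the fields and build a dict.
        return {name: getattr(obj, name) for name in self.field_names}

    def serialize_many(self, objs: List[Any]) -> List[Dict[str, Any]]:
        return [self.to_representation(obj) for obj in objs]


class IntrospectingSerializer(ExplicitSerializer):
    """Fields are discovered from the class, as with a ModelSerializer."""

    introspection_runs = 0  # counts how often field discovery happens

    def __init__(self, record_class: type) -> None:
        # One-time cost: discover the fields when the serializer is created.
        type(self).introspection_runs += 1
        super().__init__(record_class.field_names)


records = [Record(i, i % 10 + 1) for i in range(10000)]
explicit = ExplicitSerializer(["id", "quantity"]).serialize_many(records)
introspected = IntrospectingSerializer(Record).serialize_many(records)

print(explicit == introspected)                    # same output
print(IntrospectingSerializer.introspection_runs)  # discovery ran once
```

With 10,000 objects, the single discovery step is dwarfed by the shared per-object loop, which matches the near-identical timings measured for the two DRF serializers.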
Conclusion
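This experiment took some time to complete, and I acknowledge that it may still have omissions or deficiencies; corrections are welcome. The key takeaway from the results above is that a simple function is the fastest option when you do not need validation, while Pydantic is the fastest of the tested options when you do.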