Best Practices: Normalize while designing, denormalize while optimizing
Let's say we start with a data structure like this.
Name | Origin | Power | Latitude | Longitude | Country | Time |
---|---|---|---|---|---|---|
Blitz | Alien | Freeze | 10 | -15 | USA | 2023/04/29 |
Blitz | Alien | Flight | 10 | -15 | USA | 2021/01/02 |
Hexa | Scientist | Telekinesis | 35 | 139 | Japan | 2010/09/01 |
Hexa | Scientist | Telekinesis | 35 | 139 | Japan | 2012/01/08 |
Traveller | Billionaire | Time Travel | 43 | 1 | France | 2010/11/09 |
Since each `Name` always has the same `Origin`, that pair can be moved out into a separate table.
`Origin` table
Name | Origin |
---|---|
Blitz | Alien |
Hexa | Scientist |
Traveller | Billionaire |
`Powers` table
Name | Power | Latitude | Longitude | Country | Time |
---|---|---|---|---|---|
Blitz | Freeze | 10 | -15 | USA | 2023/04/29 |
Blitz | Flight | 10 | -15 | USA | 2021/01/02 |
Hexa | Telekinesis | 35 | 139 | Japan | 2010/09/01 |
Hexa | Telekinesis | 35 | 139 | Japan | 2012/01/08 |
Traveller | Time Travel | 43 | 1 | France | 2010/11/09 |
Also, notice that `Latitude` and `Longitude` depend on the country, so we can create a `Locations` table.
Location_id | Latitude | Longitude | Country |
---|---|---|---|
1 | 10 | -15 | USA |
2 | 35 | 139 | Japan |
3 | 43 | 1 | France |
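The `Powers` table can then be simplified to the following, with `location_id` pointing into `Locations` and a numeric `user_id` in place of `Name`.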
user_id | power | location_id | time |
---|---|---|---|
2 | Freeze | 1 | 2023/04/29 |
2 | Flight | 1 | 2021/01/02 |
4 | Telekinesis | 2 | 2010/09/01 |
4 | Telekinesis | 2 | 2012/01/08 |
7 | Time Travel | 3 | 2010/11/09 |
Normalized tables are easier to read as a user of the database, and it becomes easier to cherry-pick the data you want.
However, getting back the first, flat table requires several joins, which slows down queries.
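To make the join cost concrete, here is a minimal sketch using Python's built-in `sqlite3` module rather than Django. The table names (`origins`, `locations`, `powers`) are assumptions chosen for this illustration, and `powers` is keyed by name instead of `user_id` to keep the example self-contained. Rebuilding the original flat rows takes two joins:

```python
import sqlite3

# In-memory database holding the three normalized tables from above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE origins (name TEXT PRIMARY KEY, origin TEXT);
    CREATE TABLE locations (
        location_id INTEGER PRIMARY KEY,
        latitude INTEGER, longitude INTEGER, country TEXT
    );
    CREATE TABLE powers (name TEXT, power TEXT, location_id INTEGER, time TEXT);
""")
conn.executemany("INSERT INTO origins VALUES (?, ?)", [
    ("Blitz", "Alien"), ("Hexa", "Scientist"), ("Traveller", "Billionaire"),
])
conn.executemany("INSERT INTO locations VALUES (?, ?, ?, ?)", [
    (1, 10, -15, "USA"), (2, 35, 139, "Japan"), (3, 43, 1, "France"),
])
conn.executemany("INSERT INTO powers VALUES (?, ?, ?, ?)", [
    ("Blitz", "Freeze", 1, "2023/04/29"),
    ("Blitz", "Flight", 1, "2021/01/02"),
    ("Hexa", "Telekinesis", 2, "2010/09/01"),
    ("Hexa", "Telekinesis", 2, "2012/01/08"),
    ("Traveller", "Time Travel", 3, "2010/11/09"),
])

# Two joins are needed to get back the flat view we started with.
flat_rows = conn.execute("""
    SELECT p.name, o.origin, p.power, l.latitude, l.longitude, l.country, p.time
    FROM powers AS p
    JOIN origins AS o ON o.name = p.name
    JOIN locations AS l ON l.location_id = p.location_id
    ORDER BY p.rowid
""").fetchall()

for row in flat_rows:
    print(row)
```

Each extra table split off during normalization adds another join to this query, which is where denormalizing for performance comes in.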
In Django, we can state that each latitude/longitude pair in `Locations` must be unique by setting `unique_together` in the `Meta` class of the model.
class Locations(models.Model):
    ...

    class Meta:
        unique_together = ("latitude", "longitude")
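At the database level, `unique_together` ends up as a composite `UNIQUE` constraint. The sketch below shows that behaviour with the standard-library `sqlite3` module instead of Django; the schema is hand-written for illustration and the `"Example"` row is made up. Note that the Django documentation recommends `UniqueConstraint` in `Meta.constraints` for new code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE locations (
        location_id INTEGER PRIMARY KEY,
        latitude INTEGER,
        longitude INTEGER,
        country TEXT,
        UNIQUE (latitude, longitude)
    )
""")

insert = "INSERT INTO locations (latitude, longitude, country) VALUES (?, ?, ?)"
conn.execute(insert, (10, -15, "USA"))
# Same latitude but a different longitude: the pair is still unique, so this is allowed.
conn.execute(insert, (10, 139, "Example"))

# Repeating an existing (latitude, longitude) pair violates the constraint.
try:
    conn.execute(insert, (10, -15, "USA"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM locations").fetchone()[0]
print(duplicate_rejected, row_count)
```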
When designing a data schema, we often end up with the same fields repeated across models, which violates the DRY principle.
For this, Django provides abstract base classes.
All we need is an `abstract = True` statement in `class Meta`.
An example makes this easier to understand.
from django.db import models

class ContactInfo(models.Model):
    name = models.CharField(max_length=30)
    email = models.EmailField(max_length=30)
    address = models.CharField(max_length=30)

    class Meta:
        abstract = True

class Customer(ContactInfo):
    phone = models.CharField(max_length=30)

class Staff(ContactInfo):
    position = models.CharField(max_length=30)
`Staff` and `Customer` both inherit from `ContactInfo`.
`ContactInfo` is an abstract class, so no table is created for it in the database, but the `Customer` and `Staff` tables each get the `name`, `email` and `address` fields.
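As a rough picture of the resulting schema, the sketch below hand-writes tables of the shape described above using `sqlite3`. This is an approximation, not Django's actual DDL: real table names are prefixed with the app label, and the `id` column here stands for Django's automatic primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Only the two concrete models get tables; there is no table for ContactInfo.
conn.executescript("""
    CREATE TABLE customer (
        id INTEGER PRIMARY KEY,
        name VARCHAR(30), email VARCHAR(30), address VARCHAR(30),
        phone VARCHAR(30)
    );
    CREATE TABLE staff (
        id INTEGER PRIMARY KEY,
        name VARCHAR(30), email VARCHAR(30), address VARCHAR(30),
        position VARCHAR(30)
    );
""")

def column_names(table):
    """Return the column names of a table, in declaration order."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]

tables = sorted(
    row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
print(tables)
print(column_names("customer"))
print(column_names("staff"))
```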
Django models can become very chunky. It is a good idea to move methods that do not interact directly with the database out of the model.
Let's say we have a model that checks if the user is part of another service.
models.py
class Profile(models.Model):
    ...

    def is_registered(self):
        url = "http://api.regcheck.com/?q={0}".format(self.user.username)
        return webclient.get(url)
This can be refactored to
models.py
from .services import RegisterAPI

class Profile(models.Model):
    ...

    def is_registered(self):
        return RegisterAPI.is_registered(self.user.username)
services.py
API_URL = "http://api.regcheck.com/?q={0}"

class RegisterAPI:
    ...

    @staticmethod
    def is_registered(username):
        # Static method, so it can be called on the class without an instance.
        url = API_URL.format(username)
        return webclient.get(url)
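One practical payoff of this split is that the service can be exercised without a database or a network connection. The following is a minimal plain-Python sketch of that idea, not the book's code: `FakeWebClient` is a hypothetical stand-in for `webclient`, `Profile` is a plain class instead of a Django model, and the client is passed in through the constructor as a variation on the class above.

```python
API_URL = "http://api.regcheck.com/?q={0}"

class FakeWebClient:
    """Hypothetical stand-in for webclient: records URLs, makes no request."""

    def __init__(self):
        self.requested = []

    def get(self, url):
        self.requested.append(url)
        return True

class RegisterAPI:
    """Service object that owns the URL format and the HTTP call."""

    def __init__(self, client):
        self.client = client

    def is_registered(self, username):
        return self.client.get(API_URL.format(username))

class Profile:
    """Plain-Python stand-in for the Django model; it only delegates."""

    def __init__(self, username, api):
        self.username = username
        self._api = api

    def is_registered(self):
        return self._api.is_registered(self.username)

client = FakeWebClient()
profile = Profile("blitz", RegisterAPI(client))
result = profile.is_registered()
print(result, client.requested)
```

The model no longer knows anything about URLs, so a change to the external API only touches `services.py`.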
Don't store unnecessary data.
This is obvious advice, but during development we often find ourselves creating a column for every value we need.
To avoid this, use `@property`.
In the polls tutorial, we used:
class Question(models.Model):
    ...

    def was_published_recently(self):
        return self.pub_date >= timezone.now() - datetime.timedelta(days=1)
This value is derived from `pub_date`, so it does not need its own column. Adding `@property` lets us read it like a field:
class Question(models.Model):
    ...

    @property
    def was_published_recently(self):
        return self.pub_date >= timezone.now() - datetime.timedelta(days=1)
`was_published_recently` is not stored in the `Question` table, but reading `question.was_published_recently` on an instance (without parentheses, since it is now a property) returns the same value the method did.
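The access pattern can be tried without Django. Below is a minimal sketch in which `Question` is a plain Python class standing in for the model, and the standard library's timezone-aware `datetime` replaces `django.utils.timezone`:

```python
import datetime

class Question:
    """Plain-Python stand-in for the polls model (no database involved)."""

    def __init__(self, pub_date):
        self.pub_date = pub_date

    @property
    def was_published_recently(self):
        now = datetime.datetime.now(datetime.timezone.utc)
        return self.pub_date >= now - datetime.timedelta(days=1)

now = datetime.datetime.now(datetime.timezone.utc)
fresh = Question(now - datetime.timedelta(hours=1))
stale = Question(now - datetime.timedelta(days=3))

# Read like an attribute; there are no parentheses.
print(fresh.was_published_recently)
print(stale.was_published_recently)
```

Calling `fresh.was_published_recently()` would raise a `TypeError`, because the property already evaluates to a `bool`.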
If the property is computationally expensive, use `@cached_property` (from `django.utils.functional`).
The value is computed on first access and then cached on the instance for as long as that instance exists.
It is applied the same way as `@property`.
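The caching behaviour can be sketched with `functools.cached_property` from the standard library (Python 3.8+), used here as a stand-in for Django's decorator of the same name. The `Report` class and its counter are hypothetical, added only to make the number of computations visible:

```python
from functools import cached_property

class Report:
    def __init__(self, values):
        self.values = values
        self.compute_calls = 0  # counts how often the body actually runs

    @cached_property
    def total(self):
        self.compute_calls += 1
        return sum(self.values)

report = Report([1, 2, 3])
first = report.total   # computed here
second = report.total  # served from the instance cache
calls_after_two_reads = report.compute_calls

# The cache does not notice changes to the underlying data.
report.values.append(4)
cached_after_change = report.total

# Deleting the attribute clears the cache, so the next read recomputes.
del report.total
recomputed = report.total
print(first, second, cached_after_change, recomputed, report.compute_calls)
```

The trade-off is staleness: a cached value should only be used when the underlying data does not change during the lifetime of the instance, or when the cache is cleared explicitly.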
Access data using only one dot.
Rather than having callers reach through `question.pub_date.year`, define a `pub_year` property on `Question`.
This has the added benefit of hiding the internal data structure of your models from the code that uses them.
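A minimal sketch of the one-dot rule, again with a plain Python class in place of the Django model. Storing the date under `_pub_date` is an assumption made for this illustration, to show that callers no longer depend on how it is kept:

```python
import datetime

class Question:
    def __init__(self, pub_date):
        self._pub_date = pub_date  # internal detail, free to change later

    @property
    def pub_year(self):
        # Callers ask the question for its year instead of reaching into the date.
        return self._pub_date.year

question = Question(datetime.date(2023, 4, 29))
print(question.pub_year)  # 2023
```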
This post is based on
Django Design Patterns and Best Practices: Easily Build Maintainable Websites with Powerful and Relevant Django Design Patterns, by Arun Ravindran.
I recommend the book for a more detailed explanation.