Background
Recently, we encountered an unusual bug, and upon investigation, we discovered that the root cause was related to a singleton object we had implemented earlier. The bug was deeply hidden, making it quite challenging to track down. In this blog post, I will walk through the entire journey. Let's dive in.
Singleton
First, we need to know what a singleton is and how to implement one in Python. The Singleton pattern restricts the instantiation of a class to a single instance and provides a global point of access to that instance. The following shows a simple example of creating a singleton object in Python:
class ConfigGetter:
    _instance = None
    _config_cache = {}

    def __init__(self) -> None:
        print("__init__")

    def __new__(cls, *args, **kwargs):
        print("cls._instance", cls._instance)
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance
We use the magic method __new__ to implement the singleton in Python. __new__ is a static method that receives the class as its first argument and is responsible for creating a new instance of that class. It's called before __init__, and its primary purpose is to return the new instance.
from rest_framework import status
from rest_framework.response import Response
from rest_framework.views import APIView

class TestView(APIView):
    def get(self, request):
        config_getter = ConfigGetter()
        return Response(status=status.HTTP_200_OK)
Next, we use a simple API to test whether the singleton object works as expected. We start the web service (Django) and call the API twice. You should see output like the following:
Django version 3.2.15, using settings 'config.settings.dev'
Starting development server at http://127.0.0.1:8000/
Quit the server with CONTROL-C.
cls._instance None
__init__
cls._instance <src.ConfigGetter object at 0x10aee0e50>
__init__
The first time, cls._instance was None, but the second time it was already a ConfigGetter object. The singleton appears to work correctly, but be careful: __init__ still executes on every call (twice here), which means the attributes of the singleton object may be re-assigned depending on what your __init__ does.
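If your __init__ does real setup work, a common way to avoid repeating it is to guard it with a flag. A minimal sketch (one possible approach, not the only one):

class ConfigGetter:
    _instance = None
    _config_cache = {}

    def __new__(cls, *args, **kwargs):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def __init__(self) -> None:
        # __init__ is invoked on every ConfigGetter() call,
        # so only run the expensive setup the first time.
        if getattr(self, "_initialized", False):
            return
        self._initialized = True
        print("__init__ (first and only time)")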
Case Study
Alright, after demonstrating how to implement the singleton in Python, let's return to the story we initially intended to share. We have a configuration table that controls which crawlers are enabled. When the code runs, it checks this configuration table through the ConfigGetter object.
Here's an example of the code:
class ConfigGetter:
    _instance = None
    _config_cache = {}

    def __new__(cls, *args, **kwargs):
        print("cls._instance", cls._instance)
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def get_config(self, scac_code: str) -> CarrierConfig:
        if scac_code not in self._config_cache:
            config = self._prepare_config(scac_code=scac_code)
            self._config_cache[scac_code] = config
        return self._config_cache[scac_code]
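For reference, a rough sketch of what the omitted _prepare_config method might look like; the CarrierConfig Django model and its scac_code field are assumptions for illustration, not the exact production code:

    def _prepare_config(self, scac_code: str) -> CarrierConfig:
        # Hypothetical ORM lookup against the config table.
        return CarrierConfig.objects.get(scac_code=scac_code)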
The _prepare_config method loads the data from the database, and as you can see, our clever idea at the time was to combine an in-process cache with the singleton object to reduce the number of database queries and improve performance. So, what problems does this approach introduce?
The answer is that we can't immediately reflect database changes in the running program. To elaborate: if I update the config table and want the program to pick up the change immediately, can we achieve this with the code we designed above? Absolutely not. So, can we implement it correctly while also reducing database queries?
Problem definition
Okay, let's recap our problem. We have a web service running on Django, and we want to use a config table to control which crawlers are currently enabled. We have two key criteria to fulfill:
- After updating the config table, the program should immediately apply these changes.
- We need to ensure good performance. We don't want the program to query the config table every time because these changes are infrequent.
In the beginning, we proposed a simple solution for handling this case: reset the _config_cache after we update the config table. It looks like this:
class ConfigGetter:
    _instance = None
    _config_cache = {}

    def __new__(cls, *args, **kwargs):
        print("cls._instance", cls._instance)
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def get_config(self, scac_code: str) -> CarrierConfig:
        if scac_code not in self._config_cache:
            config = self._prepare_config(scac_code=scac_code)
            self._config_cache[scac_code] = config
        return self._config_cache[scac_code]

    def reset(self):
        self._config_cache = {}
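The reset() call would then be wired into whatever code path updates the config table. A minimal sketch, assuming the table is backed by a Django model named CarrierConfig and wired up via a post_save signal (both the model name and the wiring are assumptions, not our exact code):

from django.db.models.signals import post_save
from django.dispatch import receiver

# CarrierConfig and ConfigGetter are assumed to be importable here.
@receiver(post_save, sender=CarrierConfig)
def invalidate_config_cache(sender, instance, **kwargs):
    # Ask the singleton to drop its cached entries.
    ConfigGetter().reset()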
But after giving it some thought, we realized this solution may not work as expected. In a Django application that runs with multiple processes, each process will have its own Singleton object, because each process operates independently and maintains its own separate memory space.
What does that mean? If our Django application runs with 5 worker processes, it's hard to reset _config_cache in all of them, so here's the next idea: could we create a singleton object shared across all processes?
import multiprocessing

class Singleton:
    _instance = None
    _config_cache = {}
    _lock = multiprocessing.Lock()

    def __new__(cls):
        with cls._lock:
            if cls._instance is None:
                cls._instance = super().__new__(cls)
        return cls._instance

    def get_config(self, scac_code: str) -> CarrierConfig:
        if scac_code not in self._config_cache:
            config = self._prepare_config(scac_code=scac_code)
            self._config_cache[scac_code] = config
        return self._config_cache[scac_code]

    def reset(self):
        self._config_cache = {}
Alright, this seems to be shaping up nicely, doesn't it? We proceeded with testing and reflecting on whether there were any potential issues with this solution. Another concern surfaced: multiprocessing.Lock() only synchronizes access between processes; it does not make them share memory. If the Django and Celery code runs inside the same Python interpreter instance (i.e., the same process), they naturally share the same Singleton object, lock or no lock.
However, if you are running the Django and Celery processes as separate interpreter instances (which is the usual setup, for example running them on separate servers), they will not share the same Singleton object, because each process still keeps its own _instance and _config_cache. In this case, you would need a different method to share the Singleton state across processes, such as a shared memory object or a separate server process to manage the Singleton instance.
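To make this concrete, here is a minimal, self-contained sketch (independent of our Django code) showing that a child process mutates its own copy of the cache rather than the parent's; the class here is a stripped-down stand-in for our Singleton:

import multiprocessing
import os

class Singleton:
    _instance = None
    _config_cache = {}
    _lock = multiprocessing.Lock()

    def __new__(cls):
        with cls._lock:
            if cls._instance is None:
                cls._instance = super().__new__(cls)
        return cls._instance

def fill_and_report(label: str) -> None:
    s = Singleton()
    s._config_cache[label] = True
    print(os.getpid(), label, s._config_cache)

if __name__ == "__main__":
    fill_and_report("parent")
    child = multiprocessing.Process(target=fill_and_report, args=("child",))
    child.start()
    child.join()
    # The parent's cache never gains the "child" entry: the child process
    # mutated its own copy of _config_cache in its own memory space.
    print(os.getpid(), "parent after child:", Singleton._config_cache)

Running it, the final line printed by the parent only ever contains the "parent" entry, regardless of whether the platform starts the child via fork or spawn.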
Summary
In the end, we opted for the Redis cache solution to resolve this issue. After updating the config table, we clear the Redis cache, forcing the program to query the table again and rebuild the cache; a rough sketch of this direction follows below. I hope you found this journey insightful, and if you have any great ideas or better solutions, please leave a comment; I would greatly appreciate it. Thanks for reading!
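For completeness, here is a minimal sketch of the Redis-backed direction, using redis-py; the connection settings, key pattern, TTL, and the load_config_from_db stub are assumptions for illustration, not our exact production code:

import json

import redis

r = redis.Redis(host="localhost", port=6379, db=0)  # assumed connection settings

CONFIG_KEY = "carrier_config:{scac_code}"  # hypothetical key pattern

def load_config_from_db(scac_code: str) -> dict:
    # Stand-in for the real database query.
    return {"scac_code": scac_code, "enabled": True}

def get_config(scac_code: str) -> dict:
    key = CONFIG_KEY.format(scac_code=scac_code)
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)
    # Cache miss: fall back to the database, then repopulate Redis
    # so every process sees the same value.
    config = load_config_from_db(scac_code)
    r.set(key, json.dumps(config), ex=3600)  # assumed 1-hour TTL
    return config

def invalidate_config(scac_code: str) -> None:
    # Called right after the config table is updated; the next get_config()
    # in any process will rebuild the entry from the database.
    r.delete(CONFIG_KEY.format(scac_code=scac_code))

Because the cache now lives in Redis rather than in each process's memory, a single invalidation is visible to every Django and Celery worker at once.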