Memory leak in a Python project
I have recently analyzed and fixed an interesting issue. One of the Metal3 components, Ironic Inspector, started showing linear growth of its memory consumption, while other containers in the Metal3 pod did not exhibit similar issues. The memory usage of a service that normally requires 100-200 MiB exceeded 1 GiB within a month of being mostly idle.
This blog post describes what I did to debug the problem, and how an unorthodox approach to resource clean-up can cause unexpected troubles down the road.
How much memory does a container consume?
Should be trivial to figure out, right? Right?..
OpenShift has a metrics API (which the reporter used), but it's not available on dev-scripts (the testing environment we commonly use) for whatever reason :( But that's just Linux, so...
... no, the "ps" command is not installed either ...
... okay, let's resort to the basics:
$ oc exec -c metal3-ironic-inspector metal3-69f467dd6-b9wtl -- grep Rss /proc/1/status
RssAnon:          941804 kB
RssFile:           11248 kB
RssShmem:              0 kB
Here, the problem is already noticeable: this service has no reason to consume more than 900 MiB of RAM.
How does it develop?
To confirm the bug, I need to see that the memory usage actually grows (as opposed to allocating too much memory once). Let's commit a Small Bash Crime, knowing that the process of interest will have PID 1 in the container.
while true; do
    date -u; date -u >> ~/report
    oc exec -c metal3-ironic-inspector metal3-8f9b5d6c8-mpp9v -- \
        grep Rss /proc/1/status >> ~/report
    sleep 600
done
Now we have a report that looks like this (although much longer):
Thu May 23 11:56:35 UTC 2024
RssAnon:          941804 kB
RssFile:           11248 kB
RssShmem:              0 kB
Thu May 23 12:06:35 UTC 2024
RssAnon:          942056 kB
RssFile:           11248 kB
RssShmem:              0 kB
Thu May 23 12:16:36 UTC 2024
RssAnon:          942336 kB
RssFile:           11248 kB
RssShmem:              0 kB
Thu May 23 12:26:36 UTC 2024
RssAnon:          942704 kB
RssFile:           11248 kB
RssShmem:              0 kB
Thu May 23 12:36:36 UTC 2024
RssAnon:          942912 kB
RssFile:           11248 kB
RssShmem:              0 kB
The growth is steady, just as reported. Bug confirmed.
Now, what is leaking?
To leak memory in Python, you need to allocate an object and then keep a reference to it somewhere forever. Usually, that somewhere is a global variable or some sort of cache (spoiler: this case is more interesting).
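For illustration only, the classic pattern looks something like this (a made-up sketch, not Inspector code):

_CACHE = []  # module-level state that never goes away


def handle_request(payload):
    # Pretend we compute something from the payload.
    result = payload * 1000
    # Every call appends, and nothing ever removes entries, so the process
    # memory keeps growing for as long as the service handles requests.
    _CACHE.append(result)
    return result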
Checking the Inspector logs, I quickly discovered what exactly it was doing all this time: largely nothing. Just running a periodic task to sync its database with Ironic. This task is therefore the only obvious culprit.
Time to tracemalloc!
The standard tracemalloc module allows taking snapshots of memory, comparing these snapshots, and reporting the difference including the exact line where the memory was allocated.
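In isolation, that approach looks roughly like this (a standalone sketch, not the actual Inspector patch - that comes below):

import tracemalloc

tracemalloc.start()

first = tracemalloc.take_snapshot()

# Allocate something noticeable so the comparison has content to show.
leak = ["x" * 100 for _ in range(10000)]

second = tracemalloc.take_snapshot()

# Compare the snapshots grouped by source line; biggest differences first.
for stat in second.compare_to(first, 'lineno')[:10]:
    print(stat)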
Game plan: take snapshots periodically, compare each new one with the previous one. At which point? Well, I have a periodic task AND this periodic task is our suspect, so why not put this logic at the end of it?
Starting tracemalloc
... is easy:
diff --git a/ironic_inspector/cmd/all.py b/ironic_inspector/cmd/all.py
index 5dcf2a9..942df91 100644
--- a/ironic_inspector/cmd/all.py
+++ b/ironic_inspector/cmd/all.py
@@ -13,6 +13,7 @@
 """The Ironic Inspector service."""
 
 import sys
+import tracemalloc
 
 from oslo_config import cfg
 from oslo_service import service
@@ -26,6 +27,8 @@ CONF = cfg.CONF
 
 
 def main(args=sys.argv[1:]):
+    tracemalloc.start()
+
     # Parse config file and command line options, then start logging
     service_utils.prepare_service(args)
Collecting samples
Proudly stol^Wadapted from the Python documentation:
diff --git a/ironic_inspector/conductor/manager.py b/ironic_inspector/conductor/manager.py
index 44e2c43..e32511d 100644
--- a/ironic_inspector/conductor/manager.py
+++ b/ironic_inspector/conductor/manager.py
@@ -13,6 +13,7 @@
 import sys
 import traceback as traceback_mod
+import tracemalloc
 
 from eventlet import semaphore
 from futurist import periodics
@@ -219,7 +220,10 @@ def periodic_clean_up():  # pragma: no cover
     pxe_filter.driver().sync(ir_utils.get_client())
 
 
-def sync_with_ironic(conductor):
+_SNAPSHOT = None
+
+
+def _do_sync(conductor):
     if (conductor.coordinator is not None
             and not conductor.coordinator.is_leader):
         LOG.debug('The conductor is not a leader, skipping syncing '
@@ -234,6 +238,22 @@ def sync_with_ironic(conductor):
     node_cache.delete_nodes_not_in_list(ironic_node_uuids)
 
 
+def sync_with_ironic(conductor):
+    global _SNAPSHOT
+
+    _do_sync(conductor)
+
+    new_snapshot = tracemalloc.take_snapshot()
+    if _SNAPSHOT is not None:
+        diff = new_snapshot.compare_to(_SNAPSHOT, 'lineno')
+        if diff:
+            for stat in diff[:10]:
+                LOG.info("MEMORY %s", stat)
+        else:
+            LOG.debug("No memory usage difference")
+    _SNAPSHOT = new_snapshot
Looking at the report
Each time the periodic task is run, something like this is logged:
/usr/lib64/python3.9/tracemalloc.py:193: size=0 B (-92.5 KiB), count=0 (-1974)
/usr/lib/python3.9/site-packages/openstack/config/cloud_region.py:193: size=30.6 MiB (+5868 B), count=229614 (+42), average=140 B
/usr/lib/python3.9/site-packages/openstack/config/cloud_region.py:74: size=26.0 MiB (+4987 B), count=355355 (+65), average=77 B
/usr/lib/python3.9/site-packages/openstack/config/_util.py:32: size=24.2 MiB (+4632 B), count=5467 (+1), average=4632 B
/usr/lib/python3.9/site-packages/openstack/resource.py:278: size=30.7 KiB (-2856 B), count=187 (-17), average=168 B
/usr/lib/python3.9/site-packages/openstack/config/cloud_region.py:79: size=14.5 MiB (+2777 B), count=229614 (+42), average=66 B
/usr/lib/python3.9/site-packages/eventlet/semaphore.py:43: size=8832 KiB (+1872 B), count=30717 (+6), average=294 B
/usr/lib/python3.9/site-packages/openstack/utils.py:84: size=9884 KiB (+1752 B), count=54428 (+9), average=186 B
/usr/lib/python3.9/site-packages/openstack/resource.py:574: size=558 KiB (-1008 B), count=3401 (-6), average=168 B
/usr/lib/python3.9/site-packages/openstack/connection.py:395: size=3971 KiB (+744 B), count=10933 (+2), average=372 B
Importantly, with each iteration, certain sizes always increase and never decrease. As you can guess, those are the ones that had reached the MiB range by the time this sample was taken.
I will ignore the eventlet allocation and hope that it goes away, because the last thing I want to do is dive into eventlet's code. Everything else clearly points at openstacksdk - the library we use to talk to Ironic.
Looking at the source
First, let's take a look at utils.py line 84. This was not literally the first thing that caught my attention, but it exposes a completely separate, smaller issue. Here is the code:
def get_string_format_keys(fmt_string, old_style=True):
    if old_style:
        class AccessSaver:
            def __init__(self):
                self.keys = []

            def __getitem__(self, key):
                self.keys.append(key)

        a = AccessSaver()
        fmt_string % a

        return a.keys
    # else ...
Ouch. A lesser-known fact: creating new types in Python is not exactly cheap. Many years ago, I ended up creating namedtuples on each API request, which made that API unbelievably slow. Now it seems like they are not necessarily deallocated?
A naive attempt to verify this idea with a simple Python script did not yield a conclusive result, and it's possible that this leak is simply a byproduct of the larger issue. I have proposed a trivial fix to create the class only once and left it at that. If you understand how classes are deallocated, definitely let me know!
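For illustration, the idea behind that trivial fix looks roughly like this (a sketch, not the exact patch): hoist the helper class to module level so it is created once at import time.

class _AccessSaver:
    """Records every key that a %-style format string looks up."""

    def __init__(self):
        self.keys = []

    def __getitem__(self, key):
        self.keys.append(key)


def get_string_format_keys(fmt_string, old_style=True):
    if old_style:
        # The class is no longer re-created on every call.
        a = _AccessSaver()
        fmt_string % a
        return a.keys
    # else ...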
What else do we have?
The other lines seem pretty boring at first sight:
if project_name not in conf:
    _disable_service(
        config_dict,
        st,
        reason="No section for project '{project}' (service type "
        "'{service_type}') was present in the config.".format(
            project=project_name, service_type=st
        ),
    )
    continue
def _make_key(key, service_type):
    if not service_type:
        return key
    else:
        service_type = service_type.lower().replace('-', '_')
        return "_".join([service_type, key])
def normalize_keys(config):
    new_config = {}
    for key, value in config.items():
        key = key.replace('-', '_')
        if isinstance(value, dict):
            new_config[key] = normalize_keys(value)
        elif isinstance(value, bool):
            new_config[key] = value
        elif isinstance(value, int) and key not in (
            'verbose_level',
            'api_timeout',
        ):
            new_config[key] = str(value)
        elif isinstance(value, float):
            new_config[key] = str(value)
        else:
            new_config[key] = value
    return new_config
At first, nothing particularly interesting: allocating some strings in the first two examples, presumably reallocating the dictionary in the last one. The location of the first extract is important, though: it's part of the from_conf function that creates a new CloudRegion object, which, in turn, is part of a Connection object. By re-tracing the code paths leading to these seemingly boring allocations, I was able to conclude that connections are somehow not deallocated, which causes their configuration dictionaries to accumulate in memory.
Do we actually close connections?
Well, we do not:
def get_client(token=None):
    session = _get_ironic_session()
    try:
        return openstack.connection.Connection(
            session=session, oslo_conf=CONF).baremetal
    except Exception as exc:
        LOG.error('Failed to establish a connection with ironic, '
                  'reason: %s', exc)
        raise
Not closing connections is sloppy, but it's usually not a huge problem in programming languages with garbage collection. Once the baremetal proxy is no longer used by the caller, it, and thus the connection, becomes orphaned, and Python eventually deletes them. We won't have a chance to handle potential exceptions (which is why it's still a bad practice), but memory should not be leaked.
At this point, I could add the missing close call or, better, keep the connection for the lifetime of the service (which is what I'm inclined to do in reality), but I wanted to understand the issue.
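For illustration, the caching variant could look roughly like this (a sketch only, reusing the names from the snippet above; the real fix may end up looking different):

_CONNECTION = None


def get_client(token=None):
    # Create the connection once and reuse it for the lifetime of the
    # service instead of building a new one on every periodic run.
    global _CONNECTION
    if _CONNECTION is None:
        session = _get_ironic_session()
        _CONNECTION = openstack.connection.Connection(
            session=session, oslo_conf=CONF)
    return _CONNECTION.baremetal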
Diving into close
Let's check out the Connection's close call. How many seconds will it take you to go "wait, what"?
def close(self):
    self.config.set_auth_cache()
    if self.__pool_executor:
        self.__pool_executor.shutdown()
    atexit.unregister(self.close)
Okay, okay, I was in a slightly different position: I was going through the git history and stumbled upon the patch that makes close look like this. I had some context, and here is the commit message:
finalizers execution order is not guaranteed in python3. This leads to absence of builtins like "open" in the finalizers. Depending on the keyring backend used inside of the set_auth_cache access to the file system may be required.

Replace use of destructor to another builtin library "atexit" to implement a more reliable exit handler.
Wait, what?
The first paragraph is correct: you should not rely on the __del__ method being called in a normal Python environment. If your process is already exiting, many things may have been deleted. And yes, even the open built-in function may no longer be available. What made me blink in disbelief is the second part. Here is the change:
diff --git a/openstack/connection.py b/openstack/connection.py
index 819e16bd3..5722a01e2 100644
--- a/openstack/connection.py
+++ b/openstack/connection.py
@@ -176,6 +176,7 @@ try to find it and if that fails, you would create it::
 Additional information about the services can be found in the
 :ref:`service-proxies` documentation.
 """
+import atexit
 import concurrent.futures
 import warnings
 import weakref
@@ -472,9 +473,8 @@ class Connection(
                 'additional_metric_tags'
             ] = self.config.config['additional_metric_tags']
 
-    def __del__(self):
-        # try to force release of resources and save authorization
-        self.close()
+        # Register cleanup steps
+        atexit.register(self.close)
Since the connection's close method does something quite non-trivial that may easily fail during process shutdown, the author moved this logic from the unreliable __del__ to the atexit machinery, so that it runs before anything is deallocated. Problem solved?
But what happens if close is never called? The answer is: exactly the thing described in this blog post.
Grand finale
Now we have all the pieces of the puzzle. Inspector was written long ago under the assumption that connections are pretty lightweight and closing them is not necessary. In fact, I'm not even sure the close call existed back then!

A new connection was opened on every run of the periodic task but was never closed. Thanks to Python's garbage collector, everything worked smoothly.

Some time passed, and certain non-trivial logic was added to close. It was still not a problem because Inspector has never used any features that required proper clean-up. But then the patch above landed.
You see, things like self.close are bound methods: they contain a reference to the self object (otherwise, the method body would not be able to use it). Passing self.close to atexit.register makes the latter store the former somewhere in Python's guts, which, in turn, makes the corresponding object, i.e. the Connection, live for the entire duration of the service.
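A minimal way to see this effect (a standalone sketch, unrelated to the actual openstacksdk code):

import atexit
import gc
import weakref


class Thing:
    def close(self):
        print("closing", self)


# Without atexit: once unreachable, the object is collected as usual.
plain = Thing()
plain_ref = weakref.ref(plain)
del plain
gc.collect()
print(plain_ref())   # None - the object is gone

# With atexit: the bound method stored by atexit keeps the object alive.
kept = Thing()
kept_ref = weakref.ref(kept)
atexit.register(kept.close)
del kept
gc.collect()
print(kept_ref())    # still a live Thing instance

At process exit, atexit will still call close on the surviving instance, which is what the patch wanted, just with an unintended side effect on the object's lifetime.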
I'm not sure which lessons we need to learn from this story. Possibly:
- Always close things that are called Connection, File, Stream, etc.
- Even better, cache connections if you use them constantly.
- Don't get too creative with rarely used features, especially if they have a global effect on the process.
- Really, avoid global state; it will bite you when you expect it the least.
- tracemalloc rocks (and I somehow did not know about it before)!