Wednesday, May 23, 2018

tracking memory leaks in python

Tracking leaks in Python is difficult for a multitude of reasons:

  1. It's a GC'd language, which means objects don't get freed immediately (see the cycle sketch after this list)
  2. It uses pool allocators (pymalloc), so freed memory isn't necessarily returned to the OS
  3. The re module keeps a cache of compiled expressions
  4. tracemalloc may not give you good call stacks: https://bugs.python.org/issue33565
  5. ThreadPoolExecutor creates a thread per submit() until you hit max_workers; the default max_workers is os.cpu_count() * 5
  6. tracemalloc itself consumes memory to store the traces it collects
  7. Modules like requests/aiohttp/aiobotocore/etc. which use sockets typically keep a pool of connections whose size may fluctuate over time
  8. Memory fragmentation
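
To see why point 1 bites, here's a minimal sketch showing that objects caught in a reference cycle stay alive until the cyclic collector runs, so a measurement taken too early can look like a leak:

    import gc

    class Node:
        def __init__(self):
            self.ref = None

    def make_cycle():
        a, b = Node(), Node()
        a.ref, b.ref = b, a  # reference cycle: refcounting alone can't free these

    make_cycle()
    # Both Node objects are unreachable but stay alive until the cyclic GC runs
    print(sum(isinstance(o, Node) for o in gc.get_objects()))  # 2
    gc.collect()
    print(sum(isinstance(o, Node) for o in gc.get_objects()))  # 0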

Here are work-arounds for these issues:
  1. Call gc.collect() from a place that isn't holding onto object references when you want a stable point (see the combined sketch after this list)
  2. From Python 3.6 onwards, use PYTHONMALLOC=malloc to bypass the pool allocator
  3. Call re._cache.clear() from a similar place to #1
  4. No known work-around (I'm trying to help ensure it does something better in the future)
  5. When you start tracemalloc, ensure you start it after all the threads have been created; this means you've submitted at least max_workers jobs to the pools. Another hack is temporarily changing ThreadPoolExecutor to create all its threads on the first submit
  6. Don't rely on RSS when using tracemalloc
  7. Try setting the pool sizes to 1
  8. Run your leak tests for longer periods, or if you're using large chunks of memory, try reducing the chunk sizes
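
Work-arounds #1, #3, and #5 can be combined into a small helper. Here's a minimal sketch; the helper names are mine, and it peeks at the private pool._max_workers attribute, so treat it as illustrative rather than definitive:

    import concurrent.futures
    import gc
    import re
    import threading
    import tracemalloc

    def warm_pool(pool):
        # Force the executor to spawn all of its worker threads up front
        # (work-around #5): each job blocks on a barrier until max_workers
        # threads are running, guaranteeing no lazy thread creation later.
        # Note: _max_workers is a private attribute of ThreadPoolExecutor.
        barrier = threading.Barrier(pool._max_workers)
        futures = [pool.submit(barrier.wait) for _ in range(pool._max_workers)]
        concurrent.futures.wait(futures)

    def stable_point():
        # Work-arounds #1 and #3: call from code that isn't holding onto
        # object references, right before you take a measurement.
        re._cache.clear()  # drop the re module's compiled-pattern cache
        gc.collect()       # collect cyclic garbage so it doesn't look like a leak

    pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)
    warm_pool(pool)        # all worker threads now exist
    stable_point()
    tracemalloc.start(25)  # deeper call stacks make the traces more useful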

The way I approach it is two-fold:
  1. Try to use tracemalloc to figure out specifically where leaks are coming from; I use a helper like this one: https://gist.github.com/thehesiod/2f56f98370bea45f021d3704b21707a9 (see the snapshot-diff sketch after this list)
  2. Use the memory_profiler module to binary search through the codebase to figure out what is causing a leak from a high level. This basically means disabling parts of your application until you find the trigger (see the memory_profiler sketch below).
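
For step 1, here's a minimal sketch of the snapshot-diff idea behind that helper (run_suspect_workload is a stand-in for whatever code path you're testing, not part of the gist):

    import tracemalloc

    leaked = []

    def run_suspect_workload():
        # Stand-in for the code path you suspect of leaking
        leaked.extend(bytearray(1024) for _ in range(1000))

    tracemalloc.start(25)  # keep 25 frames so the call stacks are useful
    before = tracemalloc.take_snapshot()
    run_suspect_workload()
    after = tracemalloc.take_snapshot()

    # Allocations that survived the workload bubble to the top of the diff
    for stat in after.compare_to(before, 'lineno')[:10]:
        print(stat)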

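And for step 2, a minimal memory_profiler sketch (pip install memory_profiler; process_batch is a hypothetical suspect function):

    from memory_profiler import profile

    @profile
    def process_batch():
        # Decorate suspect functions one at a time; the per-line report shows
        # which lines grow memory, letting you binary-search from a high level.
        data = [bytearray(1 << 20) for _ in range(50)]
        return sum(len(b) for b in data)

    if __name__ == '__main__':
        process_batch()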
