Edit 2019-01-12: fixed here: numpy/pull/11977
Here is a minimal example. Running this code results in unbounded memory use, looking like a memory leak:
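A sketch of such an example, reconstructed from the profiler listing further down in this post. The loop body and the hypergeom parameters come from that listing; the iterations argument and the return value are additions for convenience.

```python
from scipy.stats import hypergeom


def main1(iterations=1000):
    # Each pass builds a new frozen distribution and evaluates its cdf once.
    for _ in range(iterations):
        x = hypergeom(100, 30, 40).cdf(3)
    return x


if __name__ == "__main__":
    main1()
```

On a NumPy version that predates the fix mentioned at the top, raising the iteration count should make the process footprint keep growing.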
It turns out that this isn’t really a memory leak but rather a problem with NumPy’s vectorize class, which creates a circular reference in some situations. Here’s the GitHub issue that I opened: numpy/issues/11867.
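To illustrate what a circular reference means here, the following is a pure-Python analogy, not NumPy's actual code. The class Wrapper and its attributes are hypothetical. The object caches a callable made from its own method, so the object and the callable keep each other alive until the cycle collector runs.

```python
import gc
import weakref


class Wrapper:
    """Hypothetical object that caches a callable built from its own method."""

    def __init__(self):
        # The bound method refers to self, and self stores the bound method:
        # a reference cycle.
        self._cached = self._compute

    def _compute(self, x):
        return x + 1


def cycle_demo():
    was_enabled = gc.isenabled()
    gc.disable()  # keep the cycle collector from running on its own
    try:
        obj = Wrapper()
        ref = weakref.ref(obj)
        del obj
        alive_before = ref() is not None  # the cycle keeps the object alive
        gc.collect()
        alive_after = ref() is not None   # the collector has broken the cycle
        return alive_before, alive_after
    finally:
        if was_enabled:
            gc.enable()


if __name__ == "__main__":
    print(cycle_demo())
```

Here cycle_demo() returns (True, False): reference counting alone never frees the object, and only an explicit collection does.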
In the meantime, a workaround is to manually delete the _ufunc attribute after calling cdf on the frozen distribution.
Alternatively, avoid the frozen distribution and call hypergeom.cdf directly, passing the shape parameters as arguments.
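A minimal sketch of the two call styles side by side, using SciPy's public hypergeom API. The helper names cdf_frozen and cdf_direct are made up for illustration; the argument names M, n and N follow SciPy's documentation for hypergeom.

```python
from scipy.stats import hypergeom


def cdf_frozen(k, M, n, N):
    # Creates a new frozen distribution object on every call.
    return hypergeom(M, n, N).cdf(k)


def cdf_direct(k, M, n, N):
    # Passes the shape parameters straight to cdf; no frozen object is created.
    return hypergeom.cdf(k, M, n, N)


if __name__ == "__main__":
    print(cdf_frozen(3, 100, 30, 40), cdf_direct(3, 100, 30, 40))
```

Both functions compute the same value, so the direct form can replace the frozen one inside a hot loop.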
It’s worth mentioning that memory_profiler is a great tool for finding memory leaks:
$ python3 geomprofile.py
Filename: geomprofile.py

Line #    Mem usage    Increment   Line Contents
================================================
     4     69.2 MiB     69.2 MiB   @profile
     5                             def main1():
     6     79.2 MiB      0.0 MiB       for _ in range(1000):
     7     79.2 MiB     10.0 MiB           x = hypergeom(100, 30, 40).cdf(3)
We see that the hypergeom line contributed an increase in memory use of 10 MiB.
Drilling down into NumPy’s vectorize took a bit of manual debugging; I didn’t have as much luck with memory_profiler there.
In a production situation one might not have the luxury of finding the real cause of the memory leak immediately. In that case it might be enough to wrap the offending code in a call to multiprocessing so that the leaked memory is reclaimed frequently. A lightweight option is to use processify. See Liau Yung Siang’s blog post for more details.
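As a rough sketch of the idea, the helper below runs a function in a short-lived child process using only the standard library. It is not processify's implementation. The names run_in_child, _child_main and leaky_work are hypothetical, and the sketch assumes a POSIX system where the "fork" start method is available.

```python
import multiprocessing as mp


def _child_main(queue, func, args, kwargs):
    """Run func in the child and send back either its result or its exception."""
    try:
        queue.put((True, func(*args, **kwargs)))
    except Exception as exc:  # forwarded to the parent and re-raised there
        queue.put((False, exc))


def run_in_child(func, *args, **kwargs):
    """Call func(*args, **kwargs) in a child process and return its result.

    Anything the call leaks belongs to the child and goes away when it exits.
    """
    ctx = mp.get_context("fork")  # assumption: POSIX only
    queue = ctx.Queue()
    proc = ctx.Process(target=_child_main, args=(queue, func, args, kwargs))
    proc.start()
    ok, payload = queue.get()  # read before join() so a large result cannot block the child
    proc.join()
    if not ok:
        raise payload
    return payload


def leaky_work(n):
    """Hypothetical stand-in for the offending code."""
    junk = [bytearray(1024) for _ in range(n)]
    return len(junk)


if __name__ == "__main__":
    print(run_in_child(leaky_work, 1000))
```

The return value and any raised exception have to be picklable to travel back through the queue. If the child dies without reporting anything, the parent would wait indefinitely, so production code would want a timeout on queue.get().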