odoo/upgrade-util#94

Created by Upgrade, Christophe Simonis (chs)
Merged at 5f83f3aa5b9a0734915429c9ccee83fea46035f2

Statuses:

label
odoo-dev:master-processpool-chunksize-chs
head
077e33d554351245fd170aed9432d354018eadb4
merged
1 year ago by Platform, Nicolas Seinlet (nse)
odoo/upgrade-util
master #94

[IMP] snippets.convert_html_columns: a batch processing story

TLDR: RTFM

Once upon a time, in a countryside farm in Belgium...

At first, upgrading databases was straightforward. But as time passed, the databases grew, and some CPU-intensive computations took so long that a solution had to be found. Fortunately, the Python standard library has the perfect module for this task: concurrent.futures.
Then Python 3.10 appeared, and ProcessPoolExecutor started to hang occasionally for no apparent reason. Soon, our hero found out he wasn't the only one suffering from this issue [1]. Unfortunately, the proposed solution looked like overkill. Still, it revealed that the issue had already been known [2] for a few years. Although an official patch wasn't ready to be committed, the discussion about its legitimacy [3] led our hero to a nicer solution.

By default, ProcessPoolExecutor.map submits elements to the pool one by one (chunksize defaults to 1). This is pretty inefficient when there are a lot of elements to process. Passing a large value for the chunksize argument makes it submit them in batches instead.
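
To make the effect of chunksize concrete, here is a minimal sketch. It is not the code from this PR: the `shout` worker, the `convert_all` helper, its `executor_cls` parameter, and the value 1000 are all hypothetical and chosen only for illustration.

```python
import concurrent.futures


def shout(text):
    # Hypothetical stand-in for a CPU-intensive HTML conversion.
    return text.upper()


def convert_all(values, chunksize=1000,
                executor_cls=concurrent.futures.ProcessPoolExecutor):
    # Executor.map defaults to chunksize=1, i.e. one task per element.
    # With ProcessPoolExecutor, a larger chunksize groups elements into
    # batches, so far fewer tasks travel between the processes.
    # 1000 is an arbitrary illustrative value, not a recommendation.
    with executor_cls() as executor:
        return list(executor.map(shout, values, chunksize=chunksize))


if __name__ == "__main__":
    rows = ["<p>row %d</p>" % i for i in range(10_000)]
    converted = convert_all(rows)
    print(len(converted), converted[0])
```

Results come back in input order whatever the chunk size, so the only thing that changes is how much work each submitted task carries. The `executor_cls` parameter is there so the same helper can be tried with `concurrent.futures.ThreadPoolExecutor`, which accepts chunksize but ignores it.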

Who would have thought that a bigger chunk size would solve a performance issue?
As always, the answer was in the documentation [4].


  1. https://stackoverflow.com/questions/74633896/processpoolexecutor-using-map-hang-on-large-load

  2. python/cpython#74028

  3. python/cpython#114975

  4. https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.Executor.map