
BUG: np.dot is not thread-safe with OpenBLAS #11046


Closed

artemru opened this issue May 4, 2018 · 21 comments

@artemru

artemru commented May 4, 2018

I'm using numpy (1.14.1) linked against OpenBLAS 0.2.18, and it looks like np.dot
(which uses the dgemm routine from OpenBLAS) is not thread-safe:

import numpy as np
from multiprocessing.pool import ThreadPool

dim = 4   # for larger values of dim, there's no issue
a = np.arange(10**5 // dim) / 10.**5
b = np.arange(10**5).reshape(-1, dim) / 10.**5

# Run the same dot product concurrently in 4 threads.
pp = ThreadPool(4)
threaded_result = pp.map(a.dot, [b] * 4)
pp.close()
pp.terminate()

result = a.dot(b)
print([np.max(np.abs(x - result)) for x in threaded_result])

# prints, e.g.:
# [1822.7068840452998, 1540.2636287421, 96.10628199050007, 0.0]
# or other rather random results, whereas it should print all zeros

I don't know if this kind of behavior is expected. Is it a numpy bug or rather an OpenBLAS one?

Notes:

  • numpy with MKL BLAS does not have this issue at all
  • everything runs fine if OpenBLAS threading is turned off (export OPENBLAS_NUM_THREADS=1); see the sketch after this list
  • I don't know how to test OpenBLAS 0.2.20, which may solve this
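
A minimal sketch of that workaround, assuming the variable is set before numpy (and therefore OpenBLAS) is first imported, since OpenBLAS only reads it at initialization:

import os
# Must be set before numpy is imported; OpenBLAS reads the variable
# once, when the library is initialized.
os.environ["OPENBLAS_NUM_THREADS"] = "1"

import numpy as np  # BLAS kernels now run single-threaded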

Some extra info if needed:

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                32
On-line CPU(s) list:   0-31
Thread(s) per core:    2
Core(s) per socket:    8
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 62
Model name:            Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Stepping:              4
CPU MHz:               2500.060
BogoMIPS:              5000.12
Hypervisor vendor:     Xen
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              25600K
NUMA node0 CPU(s):     0-7,16-23
NUMA node1 CPU(s):     8-15,24-31
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology eagerfpu pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm retpoline kaiser fsgsbase smep erms xsaveopt
np.show_config()
lapack_opt_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    define_macros = [('HAVE_CBLAS', None)]
    language = c
blas_opt_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    define_macros = [('HAVE_CBLAS', None)]
    language = c
openblas_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    define_macros = [('HAVE_CBLAS', None)]
    language = c
blis_info:
  NOT AVAILABLE
openblas_lapack_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    define_macros = [('HAVE_CBLAS', None)]
    language = c
lapack_mkl_info:
  NOT AVAILABLE
blas_mkl_info:
  NOT AVAILABLE
@mattip
Member

mattip commented May 4, 2018

Recurrence of #4813; there, the issue was solved by upgrading to OpenBLAS 0.2.9.

@artemru
Author

artemru commented May 4, 2018

With multiprocessing everything is fine; this is about multi-threading. I think it's a different issue.

@artemru
Author

artemru commented May 16, 2018

Could anyone reproduce this issue?

@pv
Member

pv commented May 16, 2018

Yes, reproducible on Fedora 28 with OpenBLAS 0.2.20; it didn't seem to occur with ATLAS.
NumPy, IIRC, just assumes the BLAS/LAPACK libraries are thread-safe; there are no extra locks.
I'm not sure anything needs to be done on the NumPy side; this looks like an OpenBLAS issue.

@charris
Member

charris commented May 16, 2018

There is some experimentation going on with the OpenBLAS versions. It would be good to have a test for this, probably in numpy/linalg/tests/, maybe in test_regressions.py or in its own test_threading.py module.
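
For illustration, a minimal sketch of what such a test could look like (the test name, pool size, and use of assert_allclose are my own choices here, not an existing numpy test):

import numpy as np
from multiprocessing.pool import ThreadPool


def test_dot_thread_safety():
    # Regression test for gh-11046: concurrent np.dot calls returned
    # corrupted results with some threaded OpenBLAS builds.
    dim = 4
    a = np.arange(10**5 // dim) / 10.**5
    b = np.arange(10**5).reshape(-1, dim) / 10.**5

    expected = a.dot(b)
    with ThreadPool(4) as pool:
        results = pool.map(a.dot, [b] * 4)

    for res in results:
        # With the bug present, the error could be as large as ~1e3.
        np.testing.assert_allclose(res, expected)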

@ogrisel
Contributor

ogrisel commented May 22, 2018

@artemru did you report this bug to the OpenBLAS developers? If so, what is the URL of the report?

@artemru artemru closed this as completed May 22, 2018
@artemru artemru reopened this May 22, 2018
@artemru
Author

artemru commented May 23, 2018

@ogrisel, indeed it looks like a pure OpenBLAS issue. I did not report it to the OpenBLAS developers (lack of time, and I'm not fluent in C++). Yet it's not clear to me whether OpenBLAS guarantees thread-safety; I've just looked at https://github.com/xianyi/OpenBLAS/wiki/faq, which says: "If your application is already multi-threaded, it will conflict with OpenBLAS multi-threading. Thus, you must set OpenBLAS to use single thread as following."

@seberg
Member

seberg commented Oct 30, 2018

This issue is troubling me, but I am not quite sure how it can be solved. Possibly we can push OpenBLAS to fix the big bugs? Even then, it would be annoying if other BLAS implementations are also not thread-safe (by default).

I don't like hacks, but for this one I don't mind how seriously ugly the solution is; I would just prefer that there is one at all...

@ogrisel
Contributor

ogrisel commented Oct 31, 2018

I think we should work with upstream OpenBLAS to make it thread-safe.

@ogrisel
Contributor

ogrisel commented Oct 31, 2018

It would also be interesting to try building OpenBLAS with OpenMP instead of its internal libpthread backend and check whether the race condition reported by @artemru still happens in that case. OpenMP runtimes are thread-safe by design (I believe), so it's likely that this would fix the issue.

In the past, @matthew-brett decided to build the OpenBLAS included in the numpy & scipy wheels with the libpthread backend instead of OpenMP, so as to avoid the fork-safety issues of GOMP, the GCC implementation of the OpenMP runtime. @njsmith submitted a patch to the GOMP developers to make it fork-safe, but the review stalled: https://gcc.gnu.org/ml/gcc-patches/2014-02/msg00813.html. As a result, C libraries that use OpenMP are still liable to deadlock or crash Python programs that use multiprocessing with the fork start method.

Nowadays I suspect that OpenBLAS could be built with OpenMP using clang, so as to avoid running into the GOMP fork-safety limitations. clang/llvm use the OpenMP runtime implementation open-sourced by Intel, and as far as I know it is fork-safe.

Edit: the thread-safety issue in OpenBLAS is apparently unrelated to its threading backend (pthread vs. OpenMP), as it also occurs when OpenBLAS is compiled with the single-thread-mode flag (OpenMathLib/OpenBLAS#1844).

@charris
Member

charris commented Oct 31, 2018

Note that the current NumPy wheels are linked against OpenBLAS 0.3.0.

@artemru
Author

artemru commented Oct 31, 2018

It's also reproducible with OpenBLAS 0.3.0.

@seberg
Member

seberg commented Oct 31, 2018

Just to note, I have opened an issue at OpenBLAS (OpenMathLib/OpenBLAS#1844) to hopefully continue the discussion there. Since I do not know the technical details here, any continuation of the discussion there would be very welcome. For all I know right now, this seems like a high-priority issue (it also happens by default on Linux systems when OpenBLAS is used), and if we can provide some help to OpenBLAS, that would be good.

As far as I can see, downstream users have no reason to suspect such issues, and this could randomly, once in a while, produce incorrect results (frankly, I might suspect such issues, but half the people who work in environments similar to mine are probably not even aware that OpenBLAS uses threads).

@mattip mattip changed the title np.dot is not thread-safe with OpenBLAS==0.2.18 np.dot is not thread-safe with OpenBLAS Nov 1, 2018
@mattip mattip changed the title np.dot is not thread-safe with OpenBLAS BUG: np.dot is not thread-safe with OpenBLAS Nov 1, 2018
@mattip
Member

mattip commented Nov 1, 2018

There might be a need to hold the GIL for some LAPACK/BLAS implementations if they cannot promise thread safety. Unfortunately, we do not have a way to query, at runtime, which implementation we are using; see issue #11826.
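
For what it's worth, OpenBLAS itself exports a query function, openblas_get_config. A hedged sketch of probing for it at runtime follows; the extension module name below comes from newer numpy versions, and the symbol-lookup details are assumptions that vary by platform and build:

import ctypes
import numpy as np

def probe_openblas():
    """Return the OpenBLAS config string if numpy is linked against it."""
    # dlopen numpy's core extension; dlsym on that handle also searches
    # its dependencies, where an OpenBLAS build exposes its symbols.
    core = ctypes.CDLL(np.core._multiarray_umath.__file__)
    try:
        get_config = core.openblas_get_config
    except AttributeError:
        return None  # some other BLAS (MKL, ATLAS, reference, ...)
    get_config.restype = ctypes.c_char_p
    return get_config().decode("ascii")

print(probe_openblas())  # e.g. "OpenBLAS 0.3.0 ... MAX_THREADS=32"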

@ogrisel
Contributor

ogrisel commented Nov 1, 2018

I think it's better to work with upstream to ensure that the implementations are all thread-safe. MKL is thread-safe, and OpenBLAS can probably be fixed. I don't know about BLIS, but I would believe so.

@seberg
Member

seberg commented Nov 1, 2018

Well, maybe we can add code such as the one you linked to in order to change the number of threads. If numpy recognizes the BLAS implementation, it could release the GIL, and refuse to release it if it sees one it does not recognize. For OpenBLAS and the other typical implementations, of course, the bug itself should rather be fixed.
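
As an illustration of the runtime-thread-count idea, a sketch using OpenBLAS's C-level control function openblas_set_num_threads (a real OpenBLAS export; the library lookup is the hand-wavy part, since in numpy itself the handle would come from the BLAS it was actually linked against):

import ctypes
import ctypes.util

# Locate an OpenBLAS shared library; this may find a different copy
# than the one numpy is linked against, so treat it as illustrative.
libpath = ctypes.util.find_library("openblas")
openblas = ctypes.CDLL(libpath)

# Force the BLAS kernels to run single-threaded: the same effect as
# OPENBLAS_NUM_THREADS=1, but switchable at runtime.
openblas.openblas_set_num_threads(1)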

@bbbbbbbbba

My impression is that, for reasonable behavior under multithreading, the thread server (blas_server.c) in OpenBLAS may need to be rewritten completely. Currently, if the calling program spawns multiple threads, each of those threads becomes a main thread, and they all share the same n-1 worker threads, which is not that bad (since the amount of parallelism is upper-bounded by n anyway). However, blas_server.c doesn't expect there to be more than one main thread, so it makes a lot of questionable design choices, e.g.:

  • When dispatching tasks, busy-waiting on the worker threads until one becomes idle;
  • When waiting for results, waiting on each worker thread as long as it is busy, even if it is busy with a task some other main thread gave it.

Despite not affecting correctness, these problems lead to worse performance than one can reasonably expect; worse than, say, if each main thread spawned its own n-1 worker threads, or if they shared the same n-1 worker threads in a sensible way. A rough illustration follows this comment.

And then there are some one-off things that become outright bugs in a multithreaded setting, like this global buffer, and some other bug I don't yet understand that happens with this code snippet. This last one has been frustrating me for quite a while.
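
A rough way to observe that contention from Python; this is a measurement sketch, not a fix, and the matrix sizes, thread counts, and outcome all depend on the machine and the OpenBLAS build:

import time
import numpy as np
from multiprocessing.pool import ThreadPool

a = np.random.rand(2000, 2000)
b = np.random.rand(2000, 2000)

# Serial baseline: a single main thread drives OpenBLAS's worker pool.
t0 = time.perf_counter()
for _ in range(8):
    a.dot(b)
serial = time.perf_counter() - t0

# Several main threads now contend for the same worker pool; per the
# analysis above, busy-waiting in blas_server.c can make this no
# faster (or even slower) although the total work is identical.
with ThreadPool(4) as pool:
    t0 = time.perf_counter()
    pool.map(lambda _: a.dot(b), range(8))
threaded = time.perf_counter() - t0

print("serial: %.2fs  threaded: %.2fs" % (serial, threaded))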

@matthew-brett
Contributor

Would you consider opening an OpenBLAS issue on GitHub, to give the discussion a home?

@bbbbbbbbba

There is already an OpenBLAS issue (OpenMathLib/OpenBLAS#1844), and I have been trying to discuss it there for a while. I decided to escape here for my mental health.

@mattip
Member

mattip commented Nov 12, 2018

OpenBLAS fixed OpenMathLib/OpenBLAS#1844.

@seberg
Member

seberg commented Jan 5, 2019

I guess we can close this, since OpenBLAS is fixed and we are making sure to link against a newer version (we even point it out in the release notes).

@seberg seberg closed this as completed Jan 5, 2019