-
Notifications
You must be signed in to change notification settings - Fork 1.6k
pthread locking oddities as exemplified by test/sblat2 #995
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Just as a quick headsup, I succeeded in making the helgrind warnings in memory.c go away by adding locks around the release_info[] changes in alloc_mmap() and the accesses to memory[position] in blas_memory_alloc()/blas_memory_free(). (Some of the spots in blas_memory_alloc were already guarded by locks conditional on the value of memory[position].used, which seems to make little sense as that itself is subject to races.) |
Updated helgrind log (now using helgrind from current valgrind 3.12.0 on Skylake): |
Changed files |
In the discussion of #990, @juliantaylor mentioned seeing infrequent lockups with pthreads that were exacerbated by attempts to speed up or remove YIELDING. One simple test case he mentioned was the sblat2 test, so I tried to run the "helgrind" thread debugger from the valgrind tool suite (valgrind.org, pertinent documentation at http://valgrind.org/docs/manual/hg-manual.html ) on a DEBUG=1 build of the current develop branch with test/sblat2 <./sblat2.dat . (Hardware involved is a lowly Atom but this should not matter at least for the init phase, I can rerun with better hardware if necessary.)
helgrind log is here: sblat2.txt
The initial lock acquisition order error it records looks to be a gfortran (4.8.5) glitch, things get interesting after the Thread-Announcement heading.
At first glance, the subsequent messages suggest that in driver/others/memory.c the "LOCK_COMMAND" in line 969 (and UNLOCK in 1003) should be moved outside the scope of the "if..." that immediately precedes it (in order to guard reads as well as writes to memory_initialized).
Also in that file, the alloc_mmap() function for the NO_WARMUP case (line 395 onwards) seems to need locks around the assignments to release_info and in particular the incrementing of release_pos.
Comments from people more thread-aware than myself greatly appreciated...
The text was updated successfully, but these errors were encountered: