
OpenBLAS 0.2.19 and above is not working on Ubuntu 16.04 for POWER8 #1037


Closed
rakshithprakash opened this issue Dec 27, 2016 · 42 comments

@rakshithprakash

Hi,
I was running HPL-2.2 with OpenBLAS 0.2.19 and 0.2.20 and found that the benchmark never exits, even after running overnight with a very small problem size such as 200. I cross-verified with 0.2.18, where it completes in less than a second. Please find the command that I'm using:

mpirun -np 2 -bind-to-core --allow-run-as-root --mca btl sm,self,tcp xhpl

My guest configuration is as follows:

Number of cores : 2 cores
SMT mode : 8
Memory : 16GB
OS : Ubuntu 16.04 LTS
Kernel version : 4.4.0-21-generic

I verified the same thing on x86 and could see that it is working fine.

After looking at the perf data for 0.2.19, I could see that 90% of the time is spent in inner_thread.

Attaching the perf data for 0.2.19:

2 19_profile (attached screenshot)

The annotation of inner_thread looks like this:

annotations.txt

I observed the same behavior on another Ubuntu machine as well, but it works fine on RHEL.

@martin-frbg
Collaborator

Most of the POWER8-specific changes since 0.2.18 appear to have happened in a narrow timespan between April 19 and May 22. While I cannot possibly comment on the POWER8 assembly, I understand that wernsaar also adjusted some thresholds for thread creation in the GEMM functions, which may simply have made thread contention more likely, and that 8310d4d dropped the ALLOC_SHM define from Makefile.power without explanation (though it may have been spurious all along).

Unless somebody else comes up with a better idea, I wonder if you could try a snapshot from somewhere in the middle of what appears to have been the crucial period (say 0551e57 from April 26), and/or see whether limiting the number of threads created on each node via OMP_NUM_THREADS has any influence.

Also, do the Ubuntu and RHEL machines you mention use the exact same binary, or was OpenBLAS built separately on each (implying different compiler versions and/or options in use)?
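For reference, a minimal sketch of the two experiments suggested above, assuming the OpenBLAS sources are a git checkout and that xhpl is relinked against (or pointed at) the rebuilt library; paths and options are illustrative, not from the original report:

    # build a snapshot from the middle of the suspect period
    cd OpenBLAS
    git checkout 0551e57
    make clean && make

    # and/or restrict the number of threads per node at run time
    export OMP_NUM_THREADS=1
    mpirun -np 2 --allow-run-as-root --mca btl sm,self,tcp xhpl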

@brada4
Contributor

brada4 commented Dec 28, 2016

Did you use any flags when compiling OpenBLAS?
Also, the full gcc and gfortran versions are more important than the kernel version (normally one assumes the kernel shipped with the distribution, fully patched, or anything in between).

Attach to the frozen process with a debugger and dump all backtraces:

    $ script
    $ gdb
    > attach <pid>
    > thread apply all backtrace
    ..... here is the interesting output
    > detach
    > quit
    $ exit

And attach the resulting typescript file here.

@rakshithprakash
Author

I compiled OpenBLAS using just the make command (default options) on both Ubuntu and RHEL, set LD_LIBRARY_PATH to point at it, and ran HPL with the following command: mpirun -np 2 -bind-to-core --allow-run-as-root --mca btl sm,self,tcp xhpl

I have used GCC and Gfortran version 5.3.1 on both Ubuntu and RHEL.

I also tried to collect the traces, but I see the following error: [ No Source Available ]

Am I missing something? Does it require connecting any additional debuggers, given that this is a guest machine on KVM? Are there any other ways of collecting the traces?

@martin-frbg
Collaborator

You may need to build a debuggable version of OpenBLAS and HPL first to get any meaningful backtraces with gdb.
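As a rough sketch of what such a rebuild could look like, assuming OpenBLAS's DEBUG=1 make option is used for the debug symbols and that -g is added to CCFLAGS in HPL's arch makefile (both are assumptions here, not steps confirmed in this thread):

    # OpenBLAS with debug symbols
    cd OpenBLAS-0.2.19
    make clean
    make DEBUG=1

    # HPL: add -g to CCFLAGS in the arch makefile, then rebuild, e.g.
    # CCFLAGS = $(HPL_DEFS) -m64 -O3 -g -mcpu=power8 -mtune=power8
    cd /home/hpl-2.2/hpl-2.2
    make arch=ppc64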

@brada4
Contributor

brada4 commented Dec 29, 2016

Just a quick peek into the problem - can you compare CPUID on compile machine and run machine?

@brada4
Contributor

brada4 commented Dec 29, 2016

Does it work single-threaded and/or without the MPI binding options?
By default OpenBLAS spins up pthreads for all available CPUs or for the number of CPUs detected at compile time, whichever is smaller (set OPENBLAS_NUM_THREADS to something lower if needed); maybe the MPI binding/affinity somehow hurts a default build that does not try to bind processors.
You can start gdb in the build root directory where all the source files are in place (more or less).
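For example, a quick check along these lines (the thread limit and the MPI options are illustrative, based on the command used earlier in this issue):

    # limit OpenBLAS to one thread per MPI rank and drop the binding option
    export OPENBLAS_NUM_THREADS=1
    mpirun -np 2 --allow-run-as-root --mca btl sm,self,tcp xhpl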

@grisuthedragon
Contributor

@rakshithprakash Can you provide your make.inc from the HPL benchmark so that I can recompile it easily?

@rakshithprakash
Author

rakshithprakash commented Jan 3, 2017

@brada4 Attaching the traces :

backtraces.txt

@rakshithprakash
Author

@brada4 The compile machine and the run machine are the same in my case.
I also removed the binding from the mpi command and did a run, but I'm seeing the same issue again.
This is the command I used: mpirun --allow-run-as-root --mca btl sm,self,tcp xhpl

@rakshithprakash
Author

@grisuthedragon Please find the attached Makefile

Makefile_ppc.txt

@grisuthedragon
Contributor

I tested the current development version of OpenBLAS on an IBM POWER8+ running CentOS 7.3, and everything works fine with the HPL benchmark.

I compiled OpenBLAS using

 make NUM_THREADS=1 USE_OPENMP=0 

because for the HPL benchmark it is quite common to use only the parallelization coming from the MPI processes.
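A sketch of that setup, with illustrative paths and rank count (the build line is the one quoted above; the rest is an assumption about how the benchmark is wired up):

    # single-threaded OpenBLAS; all parallelism comes from the MPI ranks
    cd OpenBLAS
    make NUM_THREADS=1 USE_OPENMP=0

    # point HPL's LAlib at this build, relink xhpl, then run one rank per core
    mpirun -np 2 ./xhpl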

@martin-frbg
Collaborator

Probably related to #660, running with only one thread is bound to avoid any deadlocks from multithreading.

@grisuthedragon
Contributor

Even with multithreading enabled in OpenBLAS, the HPL code works fine on my machine without running into a deadlock.

@brada4
Contributor

brada4 commented Jan 3, 2017

I was looking for
t a a bt (i.e. a backtrace from all threads)

The current backtrace looks like an OpenMP-enabled system OpenBLAS (symlinked to /usr/lib/libblas.so.3 via update-alternatives). Are you sure HPL is linked against the freshly built OpenBLAS? (Check with ldd.)

@rakshithprakash
Author

@brada4 Please find the attached backtrace for all threads :

backtraces_allthreads.txt

I also looked at the ldd output and can see that libopenblas resolves to the 0.2.19 version:

libopenblas.so.0 => /home/hpl-2.2/openblas-2.19/OpenBLAS-0.2.19/libopenblas.so.0 (0x00003fffb4790000)

@brada4
Contributor

brada4 commented Jan 3, 2017

I have doubts about the linked imports of xhpl (the program the backtrace comes from).
You can try LD_PRELOAD=...../openblas.so.0 xhpl (with the full path, in the hope of overriding the system library).
A cleaner way would be to register an alternative for libblas.so.3 to work around the HPL build system mistake.

#2  exec_blas._omp_fn.0 () at blas_server_omp.c:312
#3  0x00003fff867be8a4 in GOMP_parallel () from /usr/lib/powerpc64le-linux-gnu/libgomp.so.1
#4  0x00003fff86ecc7e4 in exec_blas (num=<optimized out>, queue=<optimized out>) at blas_server_omp.c:305
---Type <return> to continue, or q <return> to quit---
#5  0x00003fff86dfbee0 in gemm_driver (args=<optimized out>, range_m=<optimized out>, range_n=<optimized out>, sa=<optimized out>, sb=<optimized out>, mypos=0) at level3_thread.c:672
#6  0x00003fff86dfc1f4 in dgemm_thread_nt (args=<optimized out>, range_m=<optimized out>, range_n=<optimized out>, sa=<optimized out>, sb=<optimized out>, mypos=<optimized out>)
    at level3_thread.c:733
#7  0x00003fff87ba7cd0 in dgemm_ () from /usr/lib/libblas.so.3
#8  0x00000000100121f0 in HPL_dgemm ()

@rakshithprakash
Author

I did export LD_PRELOAD=/home/hpl-2.2/openblas-2.19/OpenBLAS-0.2.19 and I'm seeing the following error:

ERROR: ld.so: object '/home/hpl-2.2/openblas-2.19/OpenBLAS-0.2.19' from LD_PRELOAD cannot be preloaded (cannot read file data): ignored.

And this is the ldd output with export LD_LIBRARY_PATH=/home/hpl-2.2/openblas-2.19/OpenBLAS-0.2.19:

ldd ./xhpl
linux-vdso64.so.1 => (0x00003fffa0ef0000)
libblas.so.3 => /usr/lib/libblas.so.3 (0x00003fffa0e50000)
libmpi.so.12 => /usr/lib/libmpi.so.12 (0x00003fffa0d20000)
libc.so.6 => /lib/powerpc64le-linux-gnu/libc.so.6 (0x00003fffa0b40000)
libopenblas.so.0 => /home/hpl-2.2/openblas-2.19/OpenBLAS-0.2.19/libopenblas.so.0 (0x00003fff9ff00000)
libm.so.6 => /lib/powerpc64le-linux-gnu/libm.so.6 (0x00003fff9fe10000)
libibverbs.so.1 => /usr/lib/libibverbs.so.1 (0x00003fff9fde0000)
libopen-rte.so.12 => /usr/lib/libopen-rte.so.12 (0x00003fff9fd30000)
libopen-pal.so.13 => /usr/lib/libopen-pal.so.13 (0x00003fff9fc60000)
libpthread.so.0 => /lib/powerpc64le-linux-gnu/libpthread.so.0 (0x00003fff9fc20000)
/lib64/ld64.so.2 (0x0000000058cf0000)
libgfortran.so.3 => /usr/lib/powerpc64le-linux-gnu/libgfortran.so.3 (0x00003fff9fae0000)
libgomp.so.1 => /usr/lib/powerpc64le-linux-gnu/libgomp.so.1 (0x00003fff9fa90000)
libdl.so.2 => /lib/powerpc64le-linux-gnu/libdl.so.2 (0x00003fff9fa60000)
libhwloc.so.5 => /usr/lib/powerpc64le-linux-gnu/libhwloc.so.5 (0x00003fff9f9f0000)
libutil.so.1 => /lib/powerpc64le-linux-gnu/libutil.so.1 (0x00003fff9f9c0000)
libgcc_s.so.1 => /lib/powerpc64le-linux-gnu/libgcc_s.so.1 (0x00003fff9f990000)
libnuma.so.1 => /usr/lib/powerpc64le-linux-gnu/libnuma.so.1 (0x00003fff9f960000)
libltdl.so.7 => /usr/lib/powerpc64le-linux-gnu/libltdl.so.7 (0x00003fff9f930000)

@martin-frbg
Collaborator

Unlike LD_LIBRARY_PATH, you need to include the name of the library itself in LD_PRELOAD, so:
export LD_PRELOAD=/home/hpl-2.2/openblas-2.19/OpenBLAS-0.2.19/libopenblas.so.0
(and it does look a bit strange that ldd shows separate entries for libopenblas.so.0 and libblas.so.3, although you mentioned that both link to the same file)
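A quick way to confirm which BLAS the dynamic loader actually picks up (the grep is just an illustration; with the preload in place, ldd lists the preloaded libopenblas.so.0 as a separate entry ahead of /usr/lib/libblas.so.3):

    export LD_PRELOAD=/home/hpl-2.2/openblas-2.19/OpenBLAS-0.2.19/libopenblas.so.0
    ldd ./xhpl | grep -i blas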

@brada4
Contributor

brada4 commented Jan 4, 2017

If I look at the Makefile_ppc.txt attached earlier, it uses both -lblas and -lopenblas.
Taking out -lblas will remove the ambiguity about which BLAS actually gets used.

@martin-frbg
Collaborator

Users of Ubuntu 16.04 on POWER8 may also want to take note of Ubuntu Bug #1641241 here:
https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/1641241
describing a misbehaviour of the hardware lock elision code included in recent versions of glibc that is apparently specific to the POWER platform (and worked around by the update linked at the end of the page)
(Found via a bug report by bhart in tensorflow/tensorflow#5482)

@rakshithprakash
Author

I did an export LD_PRELOAD for both versions of OpenBLAS, 0.2.18 and 0.2.19. Below is the ldd output for 0.2.19:

ldd ./xhpl
linux-vdso64.so.1 => (0x00003fff87de0000)
/home/hpl-2.2/openblas-2.19/OpenBLAS-0.2.19/libopenblas.so.0 (0x00003fff871a0000)
libblas.so.3 => /usr/lib/libblas.so.3 (0x00003fff87100000)
libmpi.so.12 => /usr/lib/libmpi.so.12 (0x00003fff86fd0000)
libc.so.6 => /lib/powerpc64le-linux-gnu/libc.so.6 (0x00003fff86df0000)
libm.so.6 => /lib/powerpc64le-linux-gnu/libm.so.6 (0x00003fff86d00000)
libpthread.so.0 => /lib/powerpc64le-linux-gnu/libpthread.so.0 (0x00003fff86cc0000)
libgfortran.so.3 => /usr/lib/powerpc64le-linux-gnu/libgfortran.so.3 (0x00003fff86b80000)
libgomp.so.1 => /usr/lib/powerpc64le-linux-gnu/libgomp.so.1 (0x00003fff86b30000)
libibverbs.so.1 => /usr/lib/libibverbs.so.1 (0x00003fff86b00000)
libopen-rte.so.12 => /usr/lib/libopen-rte.so.12 (0x00003fff86a50000)
libopen-pal.so.13 => /usr/lib/libopen-pal.so.13 (0x00003fff86980000)
/lib64/ld64.so.2 (0x0000000044b20000)
libgcc_s.so.1 => /lib/powerpc64le-linux-gnu/libgcc_s.so.1 (0x00003fff86950000)
libdl.so.2 => /lib/powerpc64le-linux-gnu/libdl.so.2 (0x00003fff86920000)
libhwloc.so.5 => /usr/lib/powerpc64le-linux-gnu/libhwloc.so.5 (0x00003fff868b0000)
libutil.so.1 => /lib/powerpc64le-linux-gnu/libutil.so.1 (0x00003fff86880000)
libnuma.so.1 => /usr/lib/powerpc64le-linux-gnu/libnuma.so.1 (0x00003fff86850000)
libltdl.so.7 => /usr/lib/powerpc64le-linux-gnu/libltdl.so.7 (0x00003fff86820000)

Both versions, 0.2.18 and 0.2.19, seem to work now after using LD_PRELOAD, but 0.2.19 takes approximately 2.4x as long to complete as 0.2.18. Please find the results below:

0.2.18:

T/V             N    NB     P     Q        Time      Gflops
WR11R2C4     2000   140     1     2        2.44   2.190e+00

0.2.19:

T/V             N    NB     P     Q        Time      Gflops
WR11R2C4     2000   140     1     2        5.98   8.922e-01

I collected the perf data and annotations; please find them below:

Samples: 293K of event 'cycles:ppp', Event count (approx.): 293830000000
Overhead  Command  Shared Object                   Symbol

28.97%  xhpl  libopenblas_power8p-r0.2.19.so  [.] inner_thread
24.71%  xhpl  libopenblas_power8p-r0.2.19.so  [.] inner_thread
 6.20%  xhpl  libgomp.so.1.0.0                [.] 0x0000000000016350
 5.89%  xhpl  libgomp.so.1.0.0                [.] 0x0000000000016790
 5.27%  xhpl  libopenblas_power8p-r0.2.19.so  [.] LDGEMM_L4x16_LOOP
 4.83%  xhpl  libgomp.so.1.0.0                [.] 0x000000000001635c
 4.07%  xhpl  libgomp.so.1.0.0                [.] 0x000000000001679c
 2.23%  xhpl  mca_btl_sm.so                   [.] mca_btl_sm_component_progress
 1.73%  xhpl  mca_pml_ob1.so                  [.] mca_pml_ob1_progress
 1.33%  xhpl  libopen-pal.so.13.0.2           [.] opal_progress
 1.01%  xhpl  mca_pml_ob1.so                  [.] mca_pml_ob1_iprobe

    │ START_RPCC();

    │ /* thread has to wait */
    │ while(job[current].working[mypos][CACHE_LINE_SIZE * bufferside] == 0) {YIELDING;};
    │1dad54: add r10,r26,r20
    │1dad58: rldicr r10,r10,3,60
    │1dad5c: ldx r9,r30,r10
    0.00 │1dad60: cmpdi cr7,r9,0
    │1dad64: bne cr7,1dad9c <inner_thread+0x65c>
    │1dad68: nop
    │1dad6c: ori r2,r2,0
    7.92 │1dad70: nop
    │1dad74: nop
    │1dad78: nop
    10.82 │1dad7c: nop
    6.36 │1dad80: nop
    │1dad84: nop
    17.26 │1dad88: nop
    23.68 │1dad8c: nop
    │1dad90: ldx r9,r30,r10
    3.85 │1dad94: cmpdi cr7,r9,0
    │1dad98: beq cr7,1dad70 <inner_thread+0x630>

    │ STOP_RPCC(waiting2);

@martin-frbg
Copy link
Collaborator

Not sure how to read the perf data (do you have the 0.2.18 values for comparison?). Are these results reproducible (and was the workload on the machine the same during both runs)? If the values are stable, it could be that the changed thresholds mentioned above are not favorable for the matrix sizes in this particular benchmark. Perhaps @grisuthedragon has benchmark results from his machine readily available?

@rakshithprakash
Author

@martin-frbg Here is the perf data for 0.2.18 for comparison:

Samples: 301K of event 'cycles:ppp', Event count (approx.): 301774000000
Overhead  Command  Shared Object                   Symbol

14.60%  xhpl  libopenblas_power8p-r0.2.18.so  [.] dgemm_kernel
 5.74%  xhpl  [kernel.kallsyms]               [k] update_curr
 5.54%  xhpl  [kernel.kallsyms]               [k] __schedule
 4.34%  xhpl  [kernel.kallsyms]               [k] pick_next_task_fair
 4.11%  xhpl  [kernel.kallsyms]               [k] __calc_delta
 3.93%  xhpl  [kernel.kallsyms]               [k] _switch
 3.85%  xhpl  [kernel.kallsyms]               [k] pick_next_entity
 3.35%  xhpl  [kernel.kallsyms]               [k] clear_buddies
 3.04%  xhpl  [kernel.kallsyms]               [k] _raw_spin_lock
 2.55%  xhpl  [kernel.kallsyms]               [k] __perf_event_task_sched_out
 2.36%  xhpl  [kernel.kallsyms]               [k] __switch_to
 2.18%  xhpl  [kernel.kallsyms]               [k] update_min_vruntime
 2.12%  xhpl  [kernel.kallsyms]               [k] put_prev_entity

Yes it's reproducible.

@brada4
Contributor

brada4 commented Jan 4, 2017

Can you fix the conflicting libblas and libopenblas dependencies?
So far what I see is a mistake in building HPL, nothing more.

@martin-frbg
Collaborator

Is it conceivable that you built 0.2.18 with different options, something more like the NUM_THREADS=1 USE_OPENMP=0 that grisuthedragon recommended above for HPL? (No libgomp and no reference to threading in its perf results would explain the lower overhead...)

@brada4
Contributor

brada4 commented Jan 4, 2017

Most likely /usr/lib/libblas.so.3 is the Ubuntu-supplied OpenBLAS 0.2.18, built with OpenMP and without CBLAS or LAPACK...
# update-alternatives --list
should confirm it.

@grisuthedragon
Contributor

@martin-frbg Here is the result of my benchmark (gcc 4.8.5, glibc 2.17, CentOS 7.3, kernel 4.8 [from Fedora 25], current OpenBLAS 0.2.20dev, OpenMPI 1.8).

I optimized the HPL.dat for my machine; it now contains the following:

HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any)
6            device out (6=stdout,7=stderr,file)
4            # of problems sizes (N)
10240 20480 30720 40960 30 34 35  Ns
1            # of NBs
96 32 64 96 128 160 192 224 256      NBs
0            PMAP process mapping (0=Row-,1=Column-major)
1          # of process grids (P x Q)
4       Ps
5       Qs
16.0         threshold
1            # of panel fact
2 1 2        PFACTs (0=left, 1=Crout, 2=Right)
1            # of recursive stopping criterium
2 4          NBMINs (>= 1)
1            # of panels in recursion
2            NDIVs
1            # of recursive panel fact.
2 1 2        RFACTs (0=left, 1=Crout, 2=Right)
1            # of broadcast
0            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1            # of lookahead depth
0            DEPTHs (>=0)
2            SWAP (0=bin-exch,1=long,2=mix)
64           swapping threshold
0            L1 in (0=transposed,1=no-transposed) form
0            U  in (0=transposed,1=no-transposed) form
1            Equilibration (0=no,1=yes)
8            memory alignment in double (> 0)

and for the largest experiment (N = 40960) I get:

OMP_NUM_THREADS=1 mpirun --allow-run-as-root  -np 20 ./xhpl 
...
 ================================================================================
T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR00R2R2       40960    96     4     5             107.79              4.251e+02
HPL_pdgesv() start time Wed Jan  4 21:05:28 2017

HPL_pdgesv() end time   Wed Jan  4 21:07:16 2017

--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=        0.0028599 ...... PASSED
================================================================================
...

which is quite good compared to the pure DGEMM performance (490 GFlop/s) obtained with the DGEMM benchmark of OpenBLAS.

@brada4
Contributor

brada4 commented Jan 4, 2017

Again, check with ldd whether you use the Fedora-supplied /usr/lib64/libopenblas(p/o).so, the one you built, or both.
It does not lead anywhere if random combinations of libraries are mixed into the picture.

@rakshithprakash
Author

@brada4 Please find below the results of the update-alternatives --list command, for both the LD_PRELOAD and LD_LIBRARY_PATH setups:

update-alternatives --list libblas.so.3

/usr/lib/libblas/libblas.so.3
/usr/lib/openblas-base/libblas.so.3

update-alternatives --list libopenblas.so.0

update-alternatives: error: no alternatives for libopenblas.so.0

@rakshithprakash
Author

@martin-frbg I used just the make command (default options) for both 0.2.18 & 0.2.19.

@rakshithprakash
Author

@grisuthedragon Hi, could you please give it a try on Ubuntu 16.04?

@brada4
Contributor

brada4 commented Jan 19, 2017

The main idea is to avoid linking to the system BLAS (remove the -lblas option), use -L/where/openblas/is/built -lopenblas, and check with ldd that you are actually testing the OpenBLAS build you intended to test.

@grisuthedragon
Contributor

@rakshithprakash I do not have Ubuntu 16.04 running on this machine. Furthermore, IBM suggests RHEL/CentOS for this type of machine.

@brada4
Contributor

brada4 commented Jan 19, 2017

@grisuthedragon On the IBM site I find the contrary statement.... This problem will not be fixed by switching to Red Hat, since the system default BLAS will still be linked in addition to OpenBLAS.

@rakshithprakash
Author

@brada4 Removing the -lblas option doesn't seem to work for me. Please find the error below :

mpicc -L/home/hpl-2.2/openblas-2.19/OpenBLAS-0.2.19/ -lopenblas -L/opt/ibm/lib/ -lm -R/opt/ibm/lib -o /home/hpl-2.2/hpl-2.2/bin/ppc64/xhpl HPL_pddriver.o HPL_pdinfo.o HPL_pdtest.o /home/hpl-2.2/hpl-2.2/lib/ppc64/libhpl.a
/home/hpl-2.2/hpl-2.2/lib/ppc64/libhpl.a(HPL_idamax.o): In function `HPL_idamax':
HPL_idamax.c:(.text+0x38): undefined reference to `idamax_'
/home/hpl-2.2/hpl-2.2/lib/ppc64/libhpl.a(HPL_dgemv.o): In function `HPL_dgemv':
HPL_dgemv.c:(.text+0xa8): undefined reference to `dgemv_'
HPL_dgemv.c:(.text+0x12c): undefined reference to `dgemv_'
/home/hpl-2.2/hpl-2.2/lib/ppc64/libhpl.a(HPL_dcopy.o): In function `HPL_dcopy':
HPL_dcopy.c:(.text+0x3c): undefined reference to `dcopy_'
/home/hpl-2.2/hpl-2.2/lib/ppc64/libhpl.a(HPL_daxpy.o): In function `HPL_daxpy':
HPL_daxpy.c:(.text+0x44): undefined reference to `daxpy_'
/home/hpl-2.2/hpl-2.2/lib/ppc64/libhpl.a(HPL_dscal.o): In function `HPL_dscal':
HPL_dscal.c:(.text+0x3c): undefined reference to `dscal_'
/home/hpl-2.2/hpl-2.2/lib/ppc64/libhpl.a(HPL_dtrsv.o): In function `HPL_dtrsv':
HPL_dtrsv.c:(.text+0xc0): undefined reference to `dtrsv_'
/home/hpl-2.2/hpl-2.2/lib/ppc64/libhpl.a(HPL_dger.o): In function `HPL_dger':
HPL_dger.c:(.text+0x74): undefined reference to `dger_'
HPL_dger.c:(.text+0xbc): undefined reference to `dger_'
/home/hpl-2.2/hpl-2.2/lib/ppc64/libhpl.a(HPL_dgemm.o): In function `HPL_dgemm':
HPL_dgemm.c:(.text+0xd8): undefined reference to `dgemm_'
HPL_dgemm.c:(.text+0x17c): undefined reference to `dgemm_'
/home/hpl-2.2/hpl-2.2/lib/ppc64/libhpl.a(HPL_dtrsm.o): In function `HPL_dtrsm':
HPL_dtrsm.c:(.text+0xf4): undefined reference to `dtrsm_'
HPL_dtrsm.c:(.text+0x1c0): undefined reference to `dtrsm_'
collect2: error: ld returned 1 exit status
Makefile:76: recipe for target 'dexe.grd' failed
make[2]: *** [dexe.grd] Error 1
make[2]: Leaving directory '/home/hpl-2.2/hpl-2.2/testing/ptest/ppc64'
Make.top:64: recipe for target 'build_tst' failed
make[1]: *** [build_tst] Error 2
make[1]: Leaving directory '/home/hpl-2.2/hpl-2.2'
Makefile:72: recipe for target 'build' failed
make: *** [build] Error 2

And my makefile is:

# ----------------------------------------------------------------------
# - shell --------------------------------------------------------------
# ----------------------------------------------------------------------
SHELL        = /bin/sh
CD           = cd
CP           = cp
LN_S         = ln -s
MKDIR        = mkdir
RM           = /bin/rm -f
TOUCH        = touch
# ----------------------------------------------------------------------
# - Platform identifier ------------------------------------------------
# ----------------------------------------------------------------------
ARCH         = ppc64
# ----------------------------------------------------------------------
# - HPL Directory Structure / HPL library ------------------------------
# ----------------------------------------------------------------------
TOPdir       = /home/hpl-2.2/hpl-2.2
INCdir       = $(TOPdir)/include
BINdir       = $(TOPdir)/bin/$(ARCH)
LIBdir       = $(TOPdir)/lib/$(ARCH)
HPLlib       = $(LIBdir)/libhpl.a
# ----------------------------------------------------------------------
# - Message Passing library (MPI) --------------------------------------
# ----------------------------------------------------------------------
# MPinc tells the C compiler where to find the Message Passing library
# header files, MPlib is defined to be the name of the library to be
# used. The variable MPdir is only used for defining MPinc and MPlib.
MPdir        =
MPinc        =
MPlib        =
# ----------------------------------------------------------------------
# - Linear Algebra library (BLAS or VSIPL) -----------------------------
# ----------------------------------------------------------------------
# LAinc tells the C compiler where to find the Linear Algebra library
# header files, LAlib is defined to be the name of the library to be
# used. The variable LAdir is only used for defining LAinc and LAlib.
LAdir        =
LAinc        =
LAlib        =
# ----------------------------------------------------------------------
# - F77 / C interface --------------------------------------------------
# ----------------------------------------------------------------------
# You can skip this section if and only if you are not planning to use
# a BLAS library featuring a Fortran 77 interface. Otherwise, it is
# necessary to fill out the F2CDEFS variable with the appropriate
# options. One and only one option should be chosen in each of
# the 3 following categories:
#
# 1) name space (How C calls a Fortran 77 routine)
# -DAdd_      : all lower case and a suffixed underscore (Suns, Intel, ...), [default]
# -DNoChange  : all lower case (IBM RS6000),
# -DUpCase    : all upper case (Cray),
# -DAdd__     : the FORTRAN compiler in use is f2c.
#
# 2) C and Fortran 77 integer mapping
# -DF77_INTEGER=int   : Fortran 77 INTEGER is a C int, [default]
# -DF77_INTEGER=long  : Fortran 77 INTEGER is a C long,
# -DF77_INTEGER=short : Fortran 77 INTEGER is a C short.
#
# 3) Fortran 77 string handling
# -DStringSunStyle  : The string address is passed at the string location
#                     on the stack, and the string length is then passed
#                     as an F77_INTEGER after all explicit stack arguments, [default]
# -DStringStructPtr : The address of a structure is passed by a Fortran 77
#                     string, and the structure is of the form:
#                     struct {char *cp; F77_INTEGER len;},
# -DStringStructVal : A structure is passed by value for each Fortran 77
#                     string, and the structure is of the form:
#                     struct {char *cp; F77_INTEGER len;},
# -DStringCrayStyle : Special option for Cray machines, which uses Cray fcd
#                     (fortran character descriptor) for interoperation.
F2CDEFS      = -DAdd_ -DF77_INTEGER=int -DStringSunStyle
# ----------------------------------------------------------------------
# - HPL includes / libraries / specifics -------------------------------
# ----------------------------------------------------------------------
HPL_INCLUDES = -I$(INCdir) -I$(INCdir)/$(ARCH) $(LAinc) $(MPinc)
HPL_LIBS     = $(HPLlib) $(LAlib) $(MPlib)
# - Compile time options -----------------------------------------------
# -DHPL_COPY_L           force the copy of the panel L before bcast;
# -DHPL_CALL_CBLAS       call the cblas interface;
# -DHPL_CALL_VSIPL       call the vsip library;
# -DHPL_DETAILED_TIMING  enable detailed timers;
#
# By default HPL will:
#    *) not copy L before broadcast,
#    *) call the BLAS Fortran 77 interface,
#    *) not display detailed timing information.
HPL_OPTS     =
# ----------------------------------------------------------------------
HPL_DEFS     = $(F2CDEFS) $(HPL_OPTS) $(HPL_INCLUDES)
# ----------------------------------------------------------------------
# - Compilers / linkers - Optimization flags ---------------------------
# ----------------------------------------------------------------------
export OMPI_CFLAGS:=
CC           = mpicc
CCNOOPT      = $(HPL_DEFS) -m64
CCFLAGS      = $(HPL_DEFS) -m64 -O3 -mcpu=power8 -mtune=power8
LINKER       = mpicc
LINKFLAGS    = -L/home/hpl-2.2/openblas-2.19/OpenBLAS-0.2.19/ -lopenblas -L/opt/ibm/lib/ -lm -R/opt/ibm/lib
ARCHIVER     = ar
ARFLAGS      = r
RANLIB       = echo

I tried adding another -L/home/hpl-2.2/openblas-2.19/OpenBLAS-0.2.19/ -lblas and it didn't work. I can get the same makefile to compile by using the -lblas option in LAlib.

@martin-frbg
Collaborator

Please put the "-lopenblas" in the LAlib list where the -lblas was; libhpl.a depends on it and the order within the library list matters.
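For illustration, the relevant part of the arch makefile could then look roughly like this (the paths are the ones used earlier in this thread; treat the exact layout as an assumption, noting that HPL_LIBS already places $(LAlib) after $(HPLlib)):

    LAdir        = /home/hpl-2.2/openblas-2.19/OpenBLAS-0.2.19
    LAinc        =
    LAlib        = -L$(LAdir) -lopenblas
    # keep only the non-BLAS pieces in LINKFLAGS
    LINKFLAGS    = -L/opt/ibm/lib/ -lm -R/opt/ibm/lib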

@grisuthedragon
Contributor

@rakshithprakash
Or, if you do not have OpenBLAS in the default search path of your compiler, put -L/home/hpl-2.2/openblas-2.19/OpenBLAS-0.2.19/ -lopenblas into the LAlib variable, where /home/hpl-2.2/openblas-2.19/OpenBLAS-0.2.19/ is the place where you have a compiled version of OpenBLAS. If the linker uses the shared library in this case, you may also have to add /home/hpl-2.2/openblas-2.19/OpenBLAS-0.2.19/ to the LD_LIBRARY_PATH environment variable.

@brada4 If you're using the IBM XL compilers + ESSL + CUDA, then IBM support told me the other way around. But no more of that here. ;-)

@rakshithprakash
Author

It compiles now after adding the entire path to the LAlib variable, but I do not see that path in the ldd output.

ldd ./xhpl
linux-vdso64.so.1 => (0x00003fff83f10000)
libopenblas.so.0 => /usr/lib/libopenblas.so.0 (0x00003fff83500000)
libmpi.so.12 => /usr/lib/libmpi.so.12 (0x00003fff833d0000)
libc.so.6 => /lib/powerpc64le-linux-gnu/libc.so.6 (0x00003fff831f0000)
libm.so.6 => /lib/powerpc64le-linux-gnu/libm.so.6 (0x00003fff83100000)
libpthread.so.0 => /lib/powerpc64le-linux-gnu/libpthread.so.0 (0x00003fff830c0000)
libgfortran.so.3 => /usr/lib/powerpc64le-linux-gnu/libgfortran.so.3 (0x00003fff82f80000)
libibverbs.so.1 => /usr/lib/libibverbs.so.1 (0x00003fff82f50000)
libopen-rte.so.12 => /usr/lib/libopen-rte.so.12 (0x00003fff82ea0000)
libopen-pal.so.13 => /usr/lib/libopen-pal.so.13 (0x00003fff82dd0000)
/lib64/ld64.so.2 (0x00000000446b0000)
libgcc_s.so.1 => /lib/powerpc64le-linux-gnu/libgcc_s.so.1 (0x00003fff82da0000)
libdl.so.2 => /lib/powerpc64le-linux-gnu/libdl.so.2 (0x00003fff82d70000)
libhwloc.so.5 => /usr/lib/powerpc64le-linux-gnu/libhwloc.so.5 (0x00003fff82d00000)
libutil.so.1 => /lib/powerpc64le-linux-gnu/libutil.so.1 (0x00003fff82cd0000)
libnuma.so.1 => /usr/lib/powerpc64le-linux-gnu/libnuma.so.1 (0x00003fff82ca0000)
libltdl.so.7 => /usr/lib/powerpc64le-linux-gnu/libltdl.so.7 (0x00003fff82c70000)

But after exporting LD_LIBRARY_PATH I can run it and see the output for both 0.2.18 and 0.2.19.

0.2.18:

================================================================================
T/V             N    NB     P     Q        Time      Gflops
WR11R2C4     2000   140     1     2        1.81   2.948e+00

0.2.19:

================================================================================
T/V             N    NB     P     Q        Time      Gflops
WR11R2C4     2000   140     1     2        0.35   1.512e+01

@brada4
Contributor

brada4 commented Jan 20, 2017

They probably register ESSL as an alternative for libblas.so.3, and then everything works well by default. You could try the same with OpenBLAS: 'make install' will install /opt/OpenBLAS/lib/libopenblas.so. Then run

    update-alternatives --install /usr/lib/libblas.so.3 libblas.so.3 /opt/OpenBLAS/lib/libopenblas.so 1
    update-alternatives --config libblas.so.3

After that you can easily switch between BLAS implementations as you go, without hard-coding any implementation.

@martin-frbg
Collaborator

So if I read your most recent results correctly, 0.2.19 is now performing better than 0.2.18 (Gflops went from 2.948 to 15.12 for that test)?

@grisuthedragon
Contributor

@brada4 I do not think that they use ESSL as the alternative for libblas.so.3, because ESSL is designed to work with the XL compiler and therefore its Fortran symbols do not have the trailing underscore. Installing ESSL as an alternative would therefore break all applications.

@brada4
Contributor

brada4 commented Jan 20, 2017

Just build HPL against -lblas and update the alternatives. It is the easiest way.
