https://mflowcode.github.io/documentation/md_expectedPerformance.html

Most of these numbers are incorrect. It's unclear where things went wrong. @wilfonba and I already confirmed that the A100 numbers are incorrect.

One comment is that this page should include an example of exactly how to run the performance test locally, e.g. `./mfc.sh run -n 8 -j 8 ./examples/3D_performance_test/case.py --case-optimization -t pre_process simulation` or some such for CPU, with `--gpu` added for GPU cases (see the sketch below).
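For concreteness, a sketch of the two invocations (the CPU command is the one quoted above; appending `--gpu` for the GPU case follows the suggestion above, and the exact flag placement is an assumption):

```shell
# CPU case: 8 tasks and 8 build jobs, case-optimized, running pre_process then simulation
./mfc.sh run -n 8 -j 8 ./examples/3D_performance_test/case.py --case-optimization -t pre_process simulation

# GPU case: same invocation with --gpu appended (assumed placement)
./mfc.sh run -n 8 -j 8 ./examples/3D_performance_test/case.py --case-optimization -t pre_process simulation --gpu
```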
I ran the `3D_performance_test` example with 4M and 8M grid points on my M1 Max (8 cores, gfortran 14.1.0) and got:

Performance: 74.107741811522786 ns/gp/eq/rhs
Performance: 70.347097355807136 ns/gp/eq/rhs
Performance: 71.969625308176333 ns/gp/eq/rhs

which is a factor of 5x faster than what's on the website for the M2 chip. I know the M1 Max is probably faster than the M2 for this workload, but not 5x faster. Again, @wilfonba replicated this problem on NVIDIA A100s as well. These results should all be updated.
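To make the gap concrete (this is derived only from the 5x statement above, not re-checked against the website): a 5x discrepancy against the roughly 72 ns/gp/eq/rhs measured here implies the page lists something on the order of

```math
5 \times 72\ \mathrm{ns/gp/eq/rhs} \approx 360\ \mathrm{ns/gp/eq/rhs}
```

for the M2, which is the implausible margin being flagged.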
We can remove the Summit performance results in favor of generic V100 test results. We also don't need separate 1M, 4M, and 8M grid point cases; the numbers are so similar regardless. I think we should just converge on 8M grid points (a 200^3 simulation) for all performance tests, which is big enough to be meaningful but not so big that it overwhelms the memory of any real device.

Open to other suggestions!
I'm gathering some more info, all using 8M grid points. This is everything I have. I didn't run a test on Frontier, but we should also update that number.
| Device | Site | Notes | Compiler | Performance (ns/gp/eq/rhs) |
|---|---|---|---|---|
| Intel Xeon Gold 6226 (Cascade Lake) @ 2.70 GHz, 12-core CPU | Phoenix | best performance using 12 cores | Intel oneAPI 2022.1.0 | 151.599077472947 |
| AMD EPYC 7713 (Milan), 64-core CPU | | best performance using 32 cores | GCC 12.1.0 | 137.48353539352445 |
| Apple M1 Max, 8 cores | | | GCC 14.1 | 71.969625308176333 |
| NVIDIA RTX 6000 | Phoenix | single-precision GPU upconverting to DP in software | NVHPC 22.11 | 3.851041689413657 |
| NVIDIA A40 | NCSA Delta | single-precision GPU upconverting to DP in software | NVHPC 22.11 | 3.316569112456631 |
| AMD MI250X, 1 GCD | | | CCE 16.0.1 | 1.0871197509246793 |
| NVIDIA A30 | RG | | NVHPC 24.1 | 1.055906093866407 |
| NVIDIA V100-32GB | Phoenix | | NVHPC 24.5 | 0.9892712201437496 |
| NVIDIA A100-80GB | Phoenix | | NVHPC 22.11 | 0.6163026871295073 |
| NVIDIA H100 80GB PCIe | Rogues Gallery | | NVHPC 24.5 | 0.4362547841810634 |
| NVIDIA GH200 | Rogues Gallery | only the GPU is used | NVHPC 24.1 | (missing) |
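For anyone reproducing these figures, the unit itself spells out the normalization; the formula below is my reading of it rather than something defined in this thread: wall-clock nanoseconds per grid point, per equation, per right-hand-side evaluation,

```math
\mathrm{ns/gp/eq/rhs} = \frac{T_{\mathrm{wall}} \times 10^{9}}{N_{\mathrm{gp}} \times N_{\mathrm{eq}} \times N_{\mathrm{rhs}}}.
```

On that reading the entries are directly comparable, e.g. the M1 Max sits at about 71.97 / 0.6163 ≈ 117x the per-RHS cost of the A100-80GB.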