Compilers and CPU benchmarks
Data
First number is the timing in seconds (lower is better).
Second number is the factor relative to the best number (in red) for each of A, B, C, D, JAC, regardless of the platform.
Number in angle brackets <> is the speedup over the single-CPU timing.
OS is GNU/Linux-2.4.2X, various distributions
CHARMM is c31a2, includes 12 DEC 2003 (R2) version of GAMESS for QM calculations
pref.dat was used
Altix (ia64): 16 CPUs
Pentium4 (ia32): P4 3.2GHz, 8 boxes (CPUs), GigE
AMD Opteron (x86_64): 2 X Opteron 244
MDGRAPE-2S: GRAPE (the ia32 row after each MDGRAPE-2S row is the time for no cutoff on the host only)
NOTE: None of the relative performance factors are set yet
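The second number in each entry is defined above as a factor relative to the best time. A minimal Python sketch of that calculation follows. The function name `relative_factor` is hypothetical, and the choice of reference time is an assumption: the only factors in the table that differ from 1.00 are on the host-only ia32 rows, and for the 1 CPU block they appear to be taken relative to the MDGRAPE-2S row directly above, which is what the example reproduces.

```python
def relative_factor(time_s, best_s):
    """Return time_s as a multiple of the reference (best) time, rounded to 2 places."""
    if best_s <= 0:
        raise ValueError("reference time must be positive")
    return round(time_s / best_s, 2)


# 1 CPU block: MDGRAPE-2S timings and the host-only ia32 timings listed after them.
mdgrape_s = [33.8, 2294.9]
host_only_s = [712.1, 60148.4]

factors = [relative_factor(h, m) for h, m in zip(host_only_s, mdgrape_s)]
print(factors)  # [21.07, 26.21], the factors listed on the host-only row
```

Once the remaining factors are set, the same function would apply with `best_s` taken as the smallest timing in the relevant column.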
Each row gives machine, compiler, then the results in the order A B C D JAC SHC5. Rows with fewer than six results keep their source order; the empty cells were lost, so which column each of those values belongs to is uncertain.

machine        compiler     A  B  C  D  JAC  SHC5

1 CPU
x86_64-2.2GHz  gcc-3.4      37.1,1.00  65.1,1.00
ia32-3.2GHz    gcc-3.4      45.8,1.00  89.5,1.00  515.3,1.00  2592.4,1.00  707.6,1.00  804.6,1.00
ia32-3.2GHz    ifort-8.0    40.9,1.00  83.1,1.00  399.7,1.00  2208.0,1.00  672.1,1.00  768.5,1.00
ia64-1.4GHz    gcc-3.4      99.7,1.00  146.3,1.00  1061.0,1.00  7832.5.7,1.00  1406.4,1.00  1372.9,1.00
ia64-1.4GHz    ifort-8.0    78.7,1.00  107.2,1.00  698.3,1.00  2769.3,1.00  619.0,1.00  1120.7,1.00
x86_64-1.8GHz  gcc-3.4      48.0,1.00  81.9,1.00  452.3,1.00  2725.2,1.00  772.3,1.00  779.5,1.00
x86_64-1.8GHz  pathf90-1.3  51.7,1.00  76.4,1.00  702.7,1.00
x86_64-1.8GHz  pgf77-5.1    48.7  85.2  RT/E  RT/E  712.9,1.00  786.9,1.00
x86_64-1.8GHz  ifort-8.0    54.5  97.5  465.6  897.3,1.00  950.5,1.00
Mac-G5-2.0GHz  xlf-8.1      99.9
Mac-G5-2.0GHz  gcc-3.4      114.3
IBM-Pwr4-1GHz  xlf-8.1      100.5,1.00  159.9,1.00  1267.9,1.00  1282.5,1.00
IBM-Pwr4-1GHz  gcc-3.2(64)  150.9,1.00  248.1,1.00
IBM-Pwr4-1GHz  gcc-3.2(32)  164.7,1.00  251.7,1.00
MDGRAPE-2S     ifort-8.0    33.8,1.00  N/A  N/A  2294.9,1.00
ia32-3.2GHz    ifort-8.0    712.1,21.07  N/A  N/A  60148.4,26.21

2 CPUs
x86_64-2.2GHz  gcc-3.4      18.7,1.00<1.98>  33.0,1.00<1.97>
ia32-3.2GHz    gcc-3.4      24.3,1.00<1.88>  52.1,1.00<1.72>  270.8,1.00<1.90>  1286.1,1.00<2.01>  429.5,1.00<1.65>  410.0,1.00<1.96>
ia32-3.2GHz    ifort-8.0    21.8,1.00<1.88>  48.3,1.00<1.72>  207.0,1.00<1.93>  1141.5,1.00<1.93>  407.9,1.00<1.65>  385.0,1.00<2.00>
ia64-1.4GHz    gcc-3.4      50.5,1.00<1.97>  74.6,1.00<1.96>  537.7,1.00<1.97>  4050.1,1.00<1.93>  728.6,1.00<1.93>  699.5,1.00<1.96>
ia64-1.4GHz    ifort-8.0    39.8,1.00<1.98>  53.6,1.00<2.00>  354.4,1.00<1.97>  1458.6,1.00<1.90>  331.9,1.0<1.87>  580.0,1.00<1.93>
x86_64-1.8GHz  gcc-3.4      24.6,1.00<1.95>  44.1,1.00<1.86>  244.0,1.00<1.85>  1376.7,1.00<1.97>
Mac-G5-2.0GHz  gcc-3.4      63.5,1.00<1.80>
IBM-Pwr4-1GHz  xlf-8.1      51.8,1.00<1.94>  83.3,1.00<1.92>  658.0,1.00<1.93>  657.8,1.0<1.95>
MDGRAPE-2S     ifort-8.0    18.5,1.00<1.83>  N/A  N/A  1161.1,1.0<1.98>
ia32-3.2GHz    ifort-8.0    360.3,19.5<2.0>  N/A  N/A  30348.0,26.09<2.0>

4 CPUs
x86_64-2.2GHz  gcc-3.4      9.6,1.00<3.86>  17.4,1.00<3.74>
ia32-3.2GHz    gcc-3.4      14.0,1.00<3.27>  32.1,1.00<2.79>  133.7,1.00<3.85>  656.2,1.00<3.95>  274.7,1.00<2.58>  219.0,1.00<3.67>
ia32-3.2GHz    ifort-8.0    12.6,1.00<3.25>  30.4,1.00<2.73>  106.4,1.00<3.76>  578.0,1.00<3.82>  264.8,1.00<2.54>  200.9,1.00<3.83>
ia64-1.4GHz    gcc-3.4      26.0,1.00<3.83>  38.2,1.00<3.83>  275.4,1.00<3.85>  1997.6,1.00<3.92>  379.4,1.00<3.71>  362.1,1.00<3.79>
ia64-1.4GHz    ifort-8.0    20.3,1.00<3.88>  28.2,1.00<3.80>  182.0,1.00<3.84>  719.2,1.00<3.85>  176.1,1.00<3.52>  295.5,1.00<3.79>
x86_64-2.2GHz  gcc-3.4      18.7,1.00<1.98>  33.0,1.00<1.97>
IBM-Pwr4-1GHz  xlf-8.1      27.3,1.00<3.68>  44.1,1.00<3.63>  362.6,1.00<3.50>  408.9,1.0<3.14>
MDGRAPE-2S     ifort-8.0    11.1,1.00<3.05>  N/A  N/A  593.2,1.00<3.87>
ia32-3.2GHz    ifort-8.0    184.8,16.7<3.9>  N/A  N/A  15077.0,25.18<4.0>

8 CPUs
x86_64-2.2GHz  gcc-3.4      5.6,1.00<6.63>  11.3,1.00<5.76>
ia32-3.2GHz    gcc-3.4      9.0,1.00<5.09>  23.2,1.00<3.86>  71.2,1.00<7.24>  350.4,1.00<7.40>  204.5,1.00<3.46>  125.3,1.00<6.42>
ia32-3.2GHz    ifort-8.0    8.3,1.00<4.92>  22.2,1.00<3.74>  58.5,1.00<6.83>  301.5,1.00<7.32>  198.1,1.00<3.39>  119.8,1.00<6.41>
ia64-1.4GHz    gcc-3.4      13.5,1.00<7.39>  20.8,1.00<7.03>  143.7,1.00<7.38>  1031.9,1.00<7.59>  211.6,1.00<6.65>  188.0,1.00<7.30>
ia64-1.4GHz    ifort-8.0    10.8,1.00<7.29>  16.0,1.00<6.70>  94.9,1.00<7.36>  369.3,1.00<7.50>  107.7,1.00<5.75>  154.9,1.00<7.23>
MDGRAPE-2S     ifort-8.0    7.7,1.00<4.40>  N/A  N/A  315.3,1.00<7.29>
ia32-3.2GHz    ifort-8.0    93.7,12.2<7.60>  N/A  N/A  7533.0,23.29<7.98>

16 CPUs
ia64-1.4GHz    gcc-3.4      7.7,1.00<12.95>  13.1,1.0<11.17>  78.1,1.00<13.59>  519.8,1.00<15.07>  135.4,1.0<10.39>  105.3,1.00<13.04>
ia64-1.4GHz    ifort-8.0    6.4,1.00<12.30>  10.8,1.00<9.93>  50.8,1.00<13.75>  191.3,1.00<14.46>  85.0,1.00<7.28>  88.4,1.00<12.68>
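The entries in the table follow the pattern time,factor<speedup>, where the speedup is the single-CPU time divided by the N-CPU time. The following Python sketch parses such entries and recomputes the speedup and the parallel efficiency (speedup divided by CPU count). The helper names `parse_cell`, `speedup`, and `efficiency` are hypothetical, and efficiency is a derived quantity that the table itself does not list. The sample data is benchmark A for ia64-1.4GHz with ifort-8.0, a row that has all six results in every block.

```python
import re

NUM = r"\d+(?:\.\d+)?"
# time, optional ",factor", optional "<speedup>"
CELL = re.compile(rf"^({NUM})(?:,({NUM}))?(?:<({NUM})>)?$")


def parse_cell(cell):
    """Split 'time,factor<speedup>' into floats; absent parts become None.

    Returns None for entries that are not well-formed timings
    (for example N/A, RT/E, or a number with two decimal points).
    """
    m = CELL.match(cell)
    if m is None:
        return None
    return tuple(float(g) if g is not None else None for g in m.groups())


def speedup(t1, tn):
    """Speedup of an N-CPU run over the single-CPU run."""
    return t1 / tn


def efficiency(t1, tn, n):
    """Parallel efficiency: speedup divided by the number of CPUs."""
    return speedup(t1, tn) / n


# ia64-1.4GHz ifort-8.0, benchmark A, keyed by CPU count.
cells = {
    1: "78.7,1.00",
    2: "39.8,1.00<1.98>",
    4: "20.3,1.00<3.88>",
    8: "10.8,1.00<7.29>",
    16: "6.4,1.00<12.30>",
}

t1 = parse_cell(cells[1])[0]
for n, cell in cells.items():
    t, _, listed = parse_cell(cell)
    print(n, round(speedup(t1, t), 2), listed, round(efficiency(t1, t, n), 2))
```

For this row the recomputed speedups agree with the listed ones to two decimal places. Entries such as N/A or RT/E parse to None, so a caller can skip them instead of failing.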
Notes:
Compile options:
- gcc-ia32: g77 -malign-double -O3 -march=pentium4 -mmmx -msse2 -mfpmath=sse -fomit-frame-pointer -fschedule-insns2 -fno-backslash -fugly-complex -fno-globals -Wno-globals
- gcc-ia64: g77 -fno-backslash -fugly-complex -fno-globals -Wno-globals -O3 -minline-float-divide-max-throughput
- gcc-x86_64: g77 -O3 -msse2 -mmmx -mfpmath=sse -fno-backslash -fugly-complex -fno-globals -Wno-globals
- ifort-ia32: ifort -O3 -tpp7 -132 -axW -w95 -cm
- ifort-ia64: ifort -O2 -tpp2 -132 -ftz -WB -w95 -cm -i8
- pgf77-x86_64: pgf77 -fastsse -tp k8-64
- pathf90-x86_64: pathf90 -O2
- xlf-Mac-G5: xlf -O5 -qarch=g5
- xlf-IBM-pwr4: xlf90_r -O3 -qfixed -qalign=4k -qarch=auto -qtune=auto -qmaxmem=-1 -q64 -qintsize=8 -qposition=appendold
- gcc-IBM-pwr4: g77 -O3 -fno-globals -Wno-globals
- N/A Either the method or the instruction set is not available
- RT/E Runtime Error
- A Molecular dynamics with the spherical cutoff method, MbCO + 3830 waters (14026 atoms), 100 steps
- B Molecular dynamics with the periodic boundary method (PMEwald), MbCO + 4985 waters (17491 atoms), 100 steps
- C HF/6-31G quantum mechanical calculation of a 36-atom system (nanotube model), RUNTYP=GRADIENT; files can be found here
- D The same as C, but with the B3LYP/6-31G DFT method; files can be found here
- JAC The Joint Amber Charmm benchmark JAC1000 (23558 atoms)
- SHC5 + water (130711 atoms)
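Because benchmarks A and B both run 100 steps, a table timing converts directly into a cost per MD step. The Python sketch below does that conversion and also normalizes by atom count. The function names are hypothetical, and the sketch assumes the whole reported time is spent in the 100 steps, which ignores any setup cost. The sample timings are the 1 CPU results for ia32-3.2GHz with ifort-8.0. A and B use different methods, so the per-atom figures are illustrative rather than a like-for-like comparison.

```python
STEPS = 100  # benchmarks A and B both run 100 MD steps

benchmarks = {
    "A": {"atoms": 14026, "time_s": 40.9},  # spherical cutoff
    "B": {"atoms": 17491, "time_s": 83.1},  # PMEwald
}


def seconds_per_step(time_s, steps=STEPS):
    """Wall time per MD step, assuming all of time_s is spent stepping."""
    return time_s / steps


def microseconds_per_atom_step(time_s, atoms, steps=STEPS):
    """Wall time per atom per step, in microseconds."""
    return time_s / (steps * atoms) * 1e6


for name, b in benchmarks.items():
    print(
        name,
        round(seconds_per_step(b["time_s"]), 3),
        round(microseconds_per_atom_step(b["time_s"], b["atoms"]), 1),
    )
```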
Milan Hodoscek Last modified: Sun Feb 20 09:47:27 CEST 2006