Benchmarking GCC and ifort compilers on new hardware

Compilers and CPU benchmarks

Data

The first number in each cell is the timing in seconds (lower is better)
The second number is the factor relative to the best timing (shown in red) for each of A, B, C, D, JAC, regardless of the platform
The number in angle brackets <> is the speedup over the single-CPU timing
OS is GNU/Linux-2.4.2X, various distributions
CHARMM version is c31a2, which includes the 12 DEC 2003 (R2) version of GAMESS for QM calculations
pref.dat was used
Altix (ia64): 16 CPUs
Pentium4 (ia32): P4 3.2GHz, 8 boxes (CPUs), GigE
AMD Opteron (x86_64): 2 X Opteron 244
MDGRAPE-2S: GRAPE (the ia32 row after each MDGRAPE row is the time for no cutoff, on the host only)
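
The cell layout described in the legend (time, then factor, then an optional speedup in angle brackets) can be read mechanically. The following is a minimal sketch, assuming cells look like 45.8,1.00 or 24.3,1.00<1.88>; the names Cell and parse_cell are hypothetical and not part of CHARMM or of this page. Blank cells, N/A and RT/E are returned as None rather than guessed at.

```python
import re
from typing import NamedTuple, Optional


class Cell(NamedTuple):
    seconds: float             # wall time in seconds
    factor: Optional[float]    # factor relative to the best timing, if given
    speedup: Optional[float]   # speedup over the 1-CPU run, if given


# time, optional ",factor", optional "<speedup>"
_CELL = re.compile(
    r"^\s*(\d+(?:\.\d+)?)(?:,(\d+(?:\.\d+)?))?(?:<(\d+(?:\.\d+)?)>)?\s*$"
)


def parse_cell(text: str) -> Optional[Cell]:
    """Parse one table cell; return None for blank, N/A, RT/E or malformed text."""
    match = _CELL.match(text)
    if match is None:
        return None
    seconds, factor, speedup = match.groups()
    return Cell(
        float(seconds),
        float(factor) if factor is not None else None,
        float(speedup) if speedup is not None else None,
    )


if __name__ == "__main__":
    for raw in ("45.8,1.00", "24.3,1.00<1.88>", "48.7", "N/A", "RT/E"):
        print(repr(raw), "->", parse_cell(raw))
```

Returning None for anything that does not match keeps malformed entries visible to the caller instead of silently coercing them into numbers.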

NOTE: None of the relative performance factors are set yet
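
Since the factors are still placeholders, here is a hedged sketch of how the factor defined in the legend could be filled in: divide each timing by the smallest timing in the same benchmark column. The function name relative_factors and the dictionary keys are hypothetical; the timings are a subset of the 1-CPU column A values from this page, and restricting the subset to general-purpose CPU rows is an assumption made only for the example.

```python
from typing import Dict


def relative_factors(timings: Dict[str, float]) -> Dict[str, float]:
    """Return each timing divided by the best (smallest) timing, rounded to 2 places."""
    best = min(timings.values())
    return {label: round(seconds / best, 2) for label, seconds in timings.items()}


# Subset of the 1-CPU, benchmark A timings (seconds) taken from the table.
timings_a_1cpu = {
    "x86_64-2.2GHz gcc-3.4": 37.1,
    "ia32-3.2GHz gcc-3.4": 45.8,
    "ia32-3.2GHz ifort-8.0": 40.9,
    "ia64-1.4GHz ifort-8.0": 78.7,
}

if __name__ == "__main__":
    for label, factor in relative_factors(timings_a_1cpu).items():
        print(f"{label:24s} {factor:.2f}")
```

The only rows that already carry a factor other than 1.00 are the host-only ia32 rows listed after MDGRAPE-2S. For 1 CPU, the listed 21.07 equals 712.1 / 33.8, which suggests that factor is taken relative to the MDGRAPE-2S timing in the row above it.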

1 CPU
machine | compiler | A | B | C | D | JAC | SHC5
x86_64-2.2GHz | gcc-3.4 | 37.1,1.00 | 65.1,1.00
ia32-3.2GHz | gcc-3.4 | 45.8,1.00 | 89.5,1.00 | 515.3,1.00 | 2592.4,1.00 | 707.6,1.00 | 804.6,1.00
ia32-3.2GHz | ifort-8.0 | 40.9,1.00 | 83.1,1.00 | 399.7,1.00 | 2208.0,1.00 | 672.1,1.00 | 768.5,1.00
ia64-1.4GHz | gcc-3.4 | 99.7,1.00 | 146.3,1.00 | 1061.0,1.00 | 7832.5.7,1.00 | 1406.4,1.00 | 1372.9,1.00
ia64-1.4GHz | ifort-8.0 | 78.7,1.00 | 107.2,1.00 | 698.3,1.00 | 2769.3,1.00 | 619.0,1.00 | 1120.7,1.00
x86_64-1.8GHz | gcc-3.4 | 48.0,1.00 | 81.9,1.00 | 452.3,1.00 | 2725.2,1.00 | 772.3,1.00 | 779.5,1.00
x86_64-1.8GHz | pathf90-1.3 | 51.7,1.00 | 76.4,1.00 | 702.7,1.00
x86_64-1.8GHz | pgf77-5.1 | 48.7 | 85.2 | RT/E | RT/E | 712.9,1.00 | 786.9,1.00
x86_64-1.8GHz | ifort-8.0 | 54.5 | 97.5 | 465.6 | | 897.3,1.00 | 950.5,1.00
Mac-G5-2.0GHz | xlf-8.1 | 99.9
Mac-G5-2.0GHz | gcc-3.4 | 114.3
IBM-Pwr4-1GHz | xlf-8.1 | 100.5,1.00 | 159.9,1.00 | | 1267.9,1.00 | 1282.5,1.00
IBM-Pwr4-1GHz | gcc-3.2(64) | 150.9,1.00 | 248.1,1.00
IBM-Pwr4-1GHz | gcc-3.2(32) | 164.7,1.00 | 251.7,1.00
MDGRAPE-2S | ifort-8.0 | 33.8,1.00 | | N/A | N/A | | 2294.9,1.00
ia32-3.2GHz | ifort-8.0 | 712.1,21.07 | | N/A | N/A | | 60148.4,26.21
2 CPUs
x86_64-2.2GHz | gcc-3.4 | 18.7,1.00<1.98> | 33.0,1.00<1.97>
ia32-3.2GHz | gcc-3.4 | 24.3,1.00<1.88> | 52.1,1.00<1.72> | 270.8,1.00<1.90> | 1286.1,1.00<2.01> | 429.5,1.00<1.65> | 410.0,1.00<1.96>
ia32-3.2GHz | ifort-8.0 | 21.8,1.00<1.88> | 48.3,1.00<1.72> | 207.0,1.00<1.93> | 1141.5,1.00<1.93> | 407.9,1.00<1.65> | 385.0,1.00<2.00>
ia64-1.4GHz | gcc-3.4 | 50.5,1.00<1.97> | 74.6,1.00<1.96> | 537.7,1.00<1.97> | 4050.1,1.00<1.93> | 728.6,1.00<1.93> | 699.5,1.00<1.96>
ia64-1.4GHz | ifort-8.0 | 39.8,1.00<1.98> | 53.6,1.00<2.00> | 354.4,1.00<1.97> | 1458.6,1.00<1.90> | 331.9,1.0<1.87> | 580.0,1.00<1.93>
x86_64-1.8GHz | gcc-3.4 | 24.6,1.00<1.95> | 44.1,1.00<1.86> | 244.0,1.00<1.85> | 1376.7,1.00<1.97>
Mac-G5-2.0GHz | gcc-3.4 | 63.5,1.00<1.80>
IBM-Pwr4-1GHz | xlf-8.1 | 51.8,1.00<1.94> | 83.3,1.00<1.92> | | 658.0,1.00<1.93> | 657.8,1.0<1.95>
MDGRAPE-2S | ifort-8.0 | 18.5,1.00<1.83> | | N/A | N/A | | 1161.1,1.0<1.98>
ia32-3.2GHz | ifort-8.0 | 360.3,19.5<2.0> | | N/A | N/A | | 30348.0,26.09<2.0>
4 CPUs
x86_64-2.2GHz | gcc-3.4 | 9.6,1.00<3.86> | 17.4,1.00<3.74>
ia32-3.2GHz | gcc-3.4 | 14.0,1.00<3.27> | 32.1,1.00<2.79> | 133.7,1.00<3.85> | 656.2,1.00<3.95> | 274.7,1.00<2.58> | 219.0,1.00<3.67>
ia32-3.2GHz | ifort-8.0 | 12.6,1.00<3.25> | 30.4,1.00<2.73> | 106.4,1.00<3.76> | 578.0,1.00<3.82> | 264.8,1.00<2.54> | 200.9,1.00<3.83>
ia64-1.4GHz | gcc-3.4 | 26.0,1.00<3.83> | 38.2,1.00<3.83> | 275.4,1.00<3.85> | 1997.6,1.00<3.92> | 379.4,1.00<3.71> | 362.1,1.00<3.79>
ia64-1.4GHz | ifort-8.0 | 20.3,1.00<3.88> | 28.2,1.00<3.80> | 182.0,1.00<3.84> | 719.2,1.00<3.85> | 176.1,1.00<3.52> | 295.5,1.00<3.79>
x86_64-2.2GHz | gcc-3.4 | 18.7,1.00<1.98> | 33.0,1.00<1.97>
IBM-Pwr4-1GHz | xlf-8.1 | 27.3,1.00<3.68> | 44.1,1.00<3.63> | | 362.6,1.00<3.50> | 408.9,1.0<3.14>
MDGRAPE-2S | ifort-8.0 | 11.1,1.00<3.05> | | N/A | N/A | | 593.2,1.00<3.87>
ia32-3.2GHz | ifort-8.0 | 184.8,16.7<3.9> | | N/A | N/A | | 15077.0,25.18<4.0>
8 CPUs
x86_64-2.2GHz | gcc-3.4 | 5.6,1.00<6.63> | 11.3,1.00<5.76>
ia32-3.2GHz | gcc-3.4 | 9.0,1.00<5.09> | 23.2,1.00<3.86> | 71.2,1.00<7.24> | 350.4,1.00<7.40> | 204.5,1.00<3.46> | 125.3,1.00<6.42>
ia32-3.2GHz | ifort-8.0 | 8.3,1.00<4.92> | 22.2,1.00<3.74> | 58.5,1.00<6.83> | 301.5,1.00<7.32> | 198.1,1.00<3.39> | 119.8,1.00<6.41>
ia64-1.4GHz | gcc-3.4 | 13.5,1.00<7.39> | 20.8,1.00<7.03> | 143.7,1.00<7.38> | 1031.9,1.00<7.59> | 211.6,1.00<6.65> | 188.0,1.00<7.30>
ia64-1.4GHz | ifort-8.0 | 10.8,1.00<7.29> | 16.0,1.00<6.70> | 94.9,1.00<7.36> | 369.3,1.00<7.50> | 107.7,1.00<5.75> | 154.9,1.00<7.23>
MDGRAPE-2S | ifort-8.0 | 7.7,1.00<4.40> | | N/A | N/A | | 315.3,1.00<7.29>
ia32-3.2GHz | ifort-8.0 | 93.7,12.2<7.60> | | N/A | N/A | | 7533.0,23.29<7.98>
16 CPUs
ia64-1.4GHz | gcc-3.4 | 7.7,1.00<12.95> | 13.1,1.0<11.17> | 78.1,1.00<13.59> | 519.8,1.00<15.07> | 135.4,1.0<10.39> | 105.3,1.00<13.04>
ia64-1.4GHz | ifort-8.0 | 6.4,1.00<12.30> | 10.8,1.00<9.93> | 50.8,1.00<13.75> | 191.3,1.00<14.46> | 85.0,1.00<7.28> | 88.4,1.00<12.68>
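
The speedups in angle brackets can be cross-checked from the timings themselves: speedup on N CPUs is the 1-CPU time divided by the N-CPU time, and dividing that by N gives the parallel efficiency. The sketch below uses the benchmark A timings for ia64-1.4GHz with ifort-8.0 from the tables above; the function names and the dictionary layout are hypothetical, and efficiency is an extra quantity that does not appear on this page.

```python
from typing import Dict


def speedup(t1: float, tn: float) -> float:
    """Speedup over the single-CPU run: T(1) / T(N)."""
    return t1 / tn


def efficiency(t1: float, tn: float, ncpu: int) -> float:
    """Parallel efficiency: speedup divided by the number of CPUs."""
    return speedup(t1, tn) / ncpu


# Benchmark A, ia64-1.4GHz, ifort-8.0: CPU count -> time in seconds.
ia64_ifort_a: Dict[int, float] = {1: 78.7, 2: 39.8, 4: 20.3, 8: 10.8, 16: 6.4}

if __name__ == "__main__":
    t1 = ia64_ifort_a[1]
    for ncpu, tn in sorted(ia64_ifort_a.items()):
        s = speedup(t1, tn)
        e = efficiency(t1, tn, ncpu)
        print(f"{ncpu:2d} CPUs  time {tn:5.1f} s  speedup {s:5.2f}  efficiency {e:4.2f}")
```

For this row the recomputed speedups agree with the published values (1.98, 3.88, 7.29, 12.30) to two decimal places.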

Notes:


Compile options:
  • gcc-ia32: g77 -malign-double -O3 -march=pentium4 -mmmx -msse2 -mfpmath=sse -fomit-frame-pointer -fschedule-insns2 -fno-backslash -fugly-complex -fno-globals -Wno-globals
  • gcc-ia64: g77 -fno-backslash -fugly-complex -fno-globals -Wno-globals -O3 -minline-float-divide-max-throughput
  • gcc-x86_64: g77 -O3 -msse2 -mmmx -mfpmath=sse -fno-backslash -fugly-complex -fno-globals -Wno-globals
  • ifort-ia32: ifort -O3 -tpp7 -132 -axW -w95 -cm
  • ifort-ia64: ifort -O2 -tpp2 -132 -ftz -WB -w95 -cm -i8
  • pgf77-x86_64: pgf77 -fastsse -tp k8-64
  • pathf90-x86_64: pathf90 -O2
  • xlf-Mac-G5: xlf -O5 -qarch=g5
  • xlf-IBM-pwr4: xlf90_r -O3 -qfixed -qalign=4k -qarch=auto -qtune=auto -qmaxmem=-1 -q64 -qintsize=8 -qposition=appendold
  • gcc-IBM-pwr4: g77 -O3 -fno-globals -Wno-globals
  • N/A: either the method or the instruction set is not available
  • RT/E: runtime error
  • A: Spherical cutoff method molecular dynamics, MbCO + 3830 waters (14026 atoms), 100 steps
  • B: Periodic boundary method (PMEwald) molecular dynamics, MbCO + 4985 waters (17491 atoms), 100 steps
  • C: HF/6-31G quantum mechanical calculation of a 36-atom system (nanotube model), RUNTYP=GRADIENT; files can be found here
  • D: The same as C, except that the B3LYP/6-31G DFT method is used; files can be found here
  • JAC: The Joint Amber Charmm benchmark JAC1000 (23558 atoms)
  • SHC5 + water (130711 atoms)
See also the older page and Spatial decomposition benchmarks
Milan Hodoscek
Last modified: Sun Feb 20 09:47:27 CEST 2006