Results depend on number of cores
#19367 11/19/08 12:23 PM
Joined: Oct 2007
Posts: 24
Forum Member
Hello,

I'm currently negotiating with several companies to buy an HPC cluster for CHARMM simulations. One of them is running a few benchmarks for me and has observed very strange behavior: the results of the simulation depend on the number of cores used. For instance, the energy changes when a different number of cores is used, but it stays the same if the simulation is rerun with the same number of cores.

Does anybody have any idea how that can happen? I suspect a problem with the compilation. The person running the benchmarks for me uses AMD Opterons and Intel Xeons connected via different InfiniBand versions (DDR, ConnectX). Currently he is using MVAPICH...
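
One effect that behaves this way is worth ruling in or out first: floating-point addition is not associative. A parallel run that splits a sum across cores and then combines the partial sums can round differently from a serial run, while any fixed core count repeats the same order and therefore the same result. Below is a minimal Python sketch of that effect. It is not CHARMM code; `naive_sum` and `chunked_sum` are hypothetical helpers, and the input values are made up so that the rounding is easy to see.

```python
import math

def naive_sum(values):
    """Left-to-right floating-point sum, as a simple serial loop would do it."""
    total = 0.0
    for v in values:
        total += v
    return total

def chunked_sum(values, n_chunks):
    """Sum each chunk separately, then add the partial sums.

    This mimics a reduction over n_chunks cores (an illustration only).
    """
    size = math.ceil(len(values) / n_chunks)
    partials = [naive_sum(values[i:i + size]) for i in range(0, len(values), size)]
    return naive_sum(partials)

# Made-up values, chosen so that the small terms are lost next to the large ones.
values = [1e16, 1.0, -1e16, 1.0]

serial = chunked_sum(values, 1)     # one "core"
parallel = chunked_sum(values, 2)   # two "cores"
print(serial, parallel)             # the two orderings give different sums
print(chunked_sum(values, 2) == parallel)  # the same chunking repeats exactly
```

In a real simulation, differences from this source are normally tiny relative to the energies involved, so on its own this would not be expected to explain a large jump in an individual term.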

Any ideas would be highly appreciated,

Alexander

Re: Results depend on number of cores
Melbourne #19368 11/19/08 05:35 PM
Joined: Sep 2003
Posts: 8,470
rmv Online Content
Forum Member
The CHARMM release version is relevant information for this sort of problem.

It may be important to make sure that code run on AMD is compiled on AMD, and likewise for Intel.

Re: Results depend on number of cores
Melbourne #19369 11/20/08 06:30 AM
Joined: Nov 2003
Posts: 200
Forum Member
This may be a compiler problem. Is the test case using PME, and was CHARMM compiled with the Intel compiler? If so, please check whether the EWKSum term is what changes between different processor counts. If that is where the trouble lies, please tell us which version of CHARMM you are using. If all of that is true and you are not using c35b1, you should either get c35b1 or put in a fix for the loop that the Intel compiler calculates incorrectly. If you want the workaround/fix, let us know.
all the best
mike


Physical mail: Dr. Michael F. Crowley National Renewable Energy Laboratory, MS 3323 1617 Cole Blvd. Golden, CO 80401
Re: Results depend on number of cores
crowley #19370 11/20/08 08:57 AM
Joined: Oct 2007
Posts: 24
Forum Member
Thanks for the replies. The CHARMM version used is c34b2 and the compiler is Intel 10.1.0xx. The test case uses PME, and EWKSum does change, but so do a lot of other terms. Here are a few examples:

First, an energy calculation directly before starting dyna (the energy values are exactly the same):

1 core:

Code:
  SPACE FOR  9801618 ATOM PAIRS AND        0 GROUP PAIRS

Image nonbond list generation found:
3025528 ATOM PAIRS WERE FOUND FOR ATOM LIST
0 ATOM PAIRS WERE FOUND FOR ATOM SELF LIST
183614 GROUP PAIRS REQUIRED ATOM SEARCHES

ENER ENR: Eval# ENERgy Delta-E GRMS
ENER INTERN: BONDs ANGLes UREY-b DIHEdrals IMPRopers
ENER CROSS: CMAPs
ENER EXTERN: VDWaals ELEC HBONds ASP USER
ENER IMAGES: IMNBvdw IMELec IMHBnd RXNField EXTElec
ENER EWALD: EWKSum EWSElf EWEXcl EWQCor EWUTil
---------- --------- --------- --------- --------- ---------
ENER> 0 -9081.22997 0.00000 10.96532
ENER INTERN> 604.79079 2389.87023 903.96418 1632.99950 34.58238
ENER CROSS> -54.29590
ENER EXTERN> -537.37735 -9789.44662 0.00000 0.00000 0.00000
ENER IMAGES> -650.01027 -1555.85652 0.00000 0.00000 0.00000
ENER EWALD> 198.54308 -83075.37800 80816.38452 0.00000 0.00000
---------- --------- --------- --------- --------- ---------



2 cores:

Code:
  SPACE FOR  4932684 ATOM PAIRS AND        0 GROUP PAIRS

Image nonbond list generation found:
1516930 ATOM PAIRS WERE FOUND FOR ATOM LIST
0 ATOM PAIRS WERE FOUND FOR ATOM SELF LIST
93463 GROUP PAIRS REQUIRED ATOM SEARCHES

ENER ENR: Eval# ENERgy Delta-E GRMS
ENER INTERN: BONDs ANGLes UREY-b DIHEdrals IMPRopers
ENER CROSS: CMAPs
ENER EXTERN: VDWaals ELEC HBONds ASP USER
ENER IMAGES: IMNBvdw IMELec IMHBnd RXNField EXTElec
ENER EWALD: EWKSum EWSElf EWEXcl EWQCor EWUTil
---------- --------- --------- --------- --------- ---------
ENER> 0 -9081.22997 0.00000 10.96532
ENER INTERN> 604.79079 2389.87023 903.96418 1632.99950 34.58238
ENER CROSS> -54.29590
ENER EXTERN> -537.37735 -9789.44662 0.00000 0.00000 0.00000
ENER IMAGES> -650.01027 -1555.85652 0.00000 0.00000 0.00000
ENER EWALD> 198.54308 -83075.37800 80816.38452 0.00000 0.00000
---------- --------- --------- --------- --------- ---------



And this is what the output file shows directly after the dyna command. There are already some differences (e.g. TOTKe), but EWKSum is exactly the same:

1 core

Code:
            SHAKE TOLERANCE =     0.10000E-09
NUMBER OF DEGREES OF FREEDOM = 18993
DYNA DYN: Step Time TOTEner TOTKe ENERgy TEMPerature
DYNA PROP: GRMS HFCTote HFCKe EHFCor VIRKe
DYNA INTERN: BONDs ANGLes UREY-b DIHEdrals IMPRopers
DYNA CROSS: CMAPs
DYNA EXTERN: VDWaals ELEC HBONds ASP USER
DYNA IMAGES: IMNBvdw IMELec IMHBnd RXNField EXTElec
DYNA EWALD: EWKSum EWSElf EWEXcl EWQCor EWUTil
DYNA PRESS: VIRE VIRI PRESSE PRESSI VOLUme
DYNA XTLE: XTLTe SURFtension XTLPe XTLtemp
---------- --------- --------- --------- --------- ---------
DYNA> 0 273440.00000 -7341.90637 1891.57194 -9233.47831 100.23507
DYNA PROP> 16.15086 -5742.79485 3776.96226 1599.11152 17348.06802
DYNA INTERN> 604.79307 2389.87625 903.96516 1632.99977 34.58272
DYNA CROSS> -54.29595
DYNA EXTERN> -534.02804 -9801.59021 0.00000 0.00000 0.00000
DYNA IMAGES> -560.95772 -1785.31413 0.00000 0.00000 0.00000
DYNA EWALD> 195.48436 -83075.37800 80816.38441 0.00000 0.00000
DYNA PRESS> 2187.56955 -13752.94823 -1864.51518 -9575.82281 80448.70962
DYNA XTLE> 5478.13231 -410.23408 9021.71357 190.09164
---------- --------- --------- --------- --------- ---------



2 cores

Code:
            SHAKE TOLERANCE =     0.10000E-09
NUMBER OF DEGREES OF FREEDOM = 18993
DYNA DYN: Step Time TOTEner TOTKe ENERgy TEMPerature
DYNA PROP: GRMS HFCTote HFCKe EHFCor VIRKe
DYNA INTERN: BONDs ANGLes UREY-b DIHEdrals IMPRopers
DYNA CROSS: CMAPs
DYNA EXTERN: VDWaals ELEC HBONds ASP USER
DYNA IMAGES: IMNBvdw IMELec IMHBnd RXNField EXTElec
DYNA EWALD: EWKSum EWSElf EWEXcl EWQCor EWUTil
DYNA PRESS: VIRE VIRI PRESSE PRESSI VOLUme
DYNA XTLE: XTLTe SURFtension XTLPe XTLtemp
---------- --------- --------- --------- --------- ---------
DYNA> 0 273440.00000 -7238.32196 1995.15635 -9233.47831 105.72404
DYNA PROP> 16.15086 -5715.54164 4011.83455 1522.78032-131895.98735
DYNA INTERN> 604.79307 2389.87625 903.96516 1632.99977 34.58272
DYNA CROSS> -54.29595
DYNA EXTERN> -534.02804 -9801.59021 0.00000 0.00000 0.00000
DYNA IMAGES> -560.95772 -1785.31413 0.00000 0.00000 0.00000
DYNA EWALD> 195.48436 -83075.37800 80816.38441 0.00000 0.00000
DYNA PRESS> 2187.56955 85743.08868 -1864.51518 75360.36564 80448.70962
DYNA XTLE> 5478.13231 -410.23408 9021.71357 190.09164
---------- --------- --------- --------- --------- ---------
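
A quick bookkeeping check on the two step-0 blocks above helps narrow down where the discrepancy sits. The Python sketch below uses only numbers copied from the `DYNA>` lines and the reported number of degrees of freedom. The Boltzmann constant value and the relation T = 2·KE / (N_dof·k_B) are assumptions about how the temperature is derived, not something confirmed in this thread.

```python
# Values copied from the step-0 "DYNA>" lines of the two runs above.
runs = {
    "1 core":  {"TOTEner": -7341.90637, "TOTKe": 1891.57194,
                "ENERgy": -9233.47831, "TEMP": 100.23507},
    "2 cores": {"TOTEner": -7238.32196, "TOTKe": 1995.15635,
                "ENERgy": -9233.47831, "TEMP": 105.72404},
}
NDEGF = 18993          # "NUMBER OF DEGREES OF FREEDOM" from the output
KBOLTZ = 1.987191e-3   # kcal/(mol K); assumed value of the Boltzmann constant

def temperature(kinetic, ndegf=NDEGF, kboltz=KBOLTZ):
    """Temperature implied by a kinetic energy (assumed relation, see lead-in)."""
    return 2.0 * kinetic / (ndegf * kboltz)

for name, r in runs.items():
    # Residual of the identity TOTEner = TOTKe + ENERgy for this run.
    residual = r["TOTEner"] - (r["TOTKe"] + r["ENERgy"])
    print(name,
          "| TOTEner - (TOTKe + ENERgy) =", round(residual, 5),
          "| T from TOTKe =", round(temperature(r["TOTKe"]), 3),
          "| reported T =", r["TEMP"])

# How much of the total-energy gap between the runs is kinetic?
d_total = runs["2 cores"]["TOTEner"] - runs["1 core"]["TOTEner"]
d_kinetic = runs["2 cores"]["TOTKe"] - runs["1 core"]["TOTKe"]
print("difference in TOTEner:", round(d_total, 5))
print("difference in TOTKe:  ", round(d_kinetic, 5))
```

If each run is internally consistent and the gap in TOTEner equals the gap in TOTKe, the step-0 difference lies in the kinetic (and virial/pressure) terms rather than in the potential energy, which is identical in both runs.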



And now after 500 steps (everything is different):

1 core

Code:
  DYNAMC> Averages for the last      500  steps:
AVER DYN: Step Time TOTEner TOTKe ENERgy TEMPerature
AVER PROP: GRMS HFCTote HFCKe EHFCor VIRKe
AVER INTERN: BONDs ANGLes UREY-b DIHEdrals IMPRopers
AVER CROSS: CMAPs
AVER EXTERN: VDWaals ELEC HBONds ASP USER
AVER IMAGES: IMNBvdw IMELec IMHBnd RXNField EXTElec
AVER EWALD: EWKSum EWSElf EWEXcl EWQCor EWUTil
AVER PRESS: VIRE VIRI PRESSE PRESSI VOLUme
AVER XTLE: XTLTe SURFtension XTLPe XTLtemp
---------- --------- --------- --------- --------- ---------
AVER> 500 273441.00000 -5835.46347 5725.31172 -11560.77519 303.38629
AVER PROP> 14.42637 -5802.59950 5824.99706 32.86397 3477.14088
AVER INTERN> 401.98253 1709.34009 702.86663 1451.66328 21.20112
AVER CROSS> -62.57592
AVER EXTERN> -541.05259 -10707.03146 0.00000 0.00000 0.00000
AVER IMAGES> -531.27835 -1927.80596 0.00000 0.00000 0.00000
AVER EWALD> 185.26080 -83075.37800 80812.03265 0.00000 0.00000
AVER PRESS> 2005.50102 -4323.59494 -1706.05532 -373.60968 80618.47275
AVER XTLE> 1551.53522 360.84755 7348.35461 203.62736
---------- --------- --------- --------- --------- ---------



2 cores

Code:
  DYNAMC> Averages for the last      500  steps:
AVER DYN: Step Time TOTEner TOTKe ENERgy TEMPerature
AVER PROP: GRMS HFCTote HFCKe EHFCor VIRKe
AVER INTERN: BONDs ANGLes UREY-b DIHEdrals IMPRopers
AVER CROSS: CMAPs
AVER EXTERN: VDWaals ELEC HBONds ASP USER
AVER IMAGES: IMNBvdw IMELec IMHBnd RXNField EXTElec
AVER EWALD: EWKSum EWSElf EWEXcl EWQCor EWUTil
AVER PRESS: VIRE VIRI PRESSE PRESSI VOLUme
AVER XTLE: XTLTe SURFtension XTLPe XTLtemp
---------- --------- --------- --------- --------- ---------
AVER> 500 273441.00000 -4928.50816 5725.26873 -10653.77689 303.38401
AVER PROP> 14.60199 -4893.31386 5833.88633 35.19430 4454.44963
AVER INTERN> 425.61360 1846.64863 742.73793 1513.11914 24.31595
AVER CROSS> -60.29523
AVER EXTERN> -461.05474 -10268.14187 0.00000 0.00000 0.00000
AVER IMAGES> -463.63354 -1889.90502 0.00000 0.00000 0.00000
AVER EWALD> 199.90583 -83075.37800 80812.29044 0.00000 0.00000
AVER PRESS> -52.28660 -2917.34649 41.44072 819.52347 81378.70542
AVER XTLE> 2049.14198 100.42642 6949.13554 1121.64762
---------- --------- --------- --------- --------- ---------
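
The term-by-term check suggested earlier in the thread (is EWKSum what changes between processor counts?) can be scripted instead of done by eye. Below is a rough Python sketch, assuming the output layout shown in this post: header rows of the form `PREFIX LABEL: names` and value rows of the form `PREFIX LABEL> numbers`, with the first value row carrying no label. `parse_energy_block` and `diff_terms` are hypothetical helper names, and the sample text is an excerpt of the step-0 DYNA lines posted above.

```python
import re

HEADER = re.compile(r"^\s*(ENER|DYNA|AVER)\s+(\w+):\s+(.*)$")
VALUES = re.compile(r"^\s*(ENER|DYNA|AVER)(?:\s+(\w+))?>\s*(.*)$")
# Matching number by number also splits fused fields such as
# "1522.78032-131895.98735".
NUMBER = re.compile(r"-?\d+(?:\.\d+)?")

def parse_energy_block(text):
    """Return {term name: value} for one energy block (later rows overwrite earlier ones)."""
    names = {}   # (prefix, label) -> list of column names
    first = {}   # prefix -> label of the first header row (its value row is unlabelled)
    terms = {}
    for line in text.splitlines():
        h = HEADER.match(line)
        if h:
            prefix, label, cols = h.groups()
            names[(prefix, label)] = cols.split()
            first.setdefault(prefix, label)
            continue
        v = VALUES.match(line)
        if v:
            prefix, label, rest = v.groups()
            label = label or first.get(prefix)
            cols = names.get((prefix, label), [])
            for name, num in zip(cols, NUMBER.findall(rest)):
                terms[name] = float(num)
    return terms

def diff_terms(a, b, tol=1e-4):
    """Terms present in both blocks whose values differ by more than tol."""
    return {k: (a[k], b[k]) for k in a if k in b and abs(a[k] - b[k]) > tol}

HEADERS = """
DYNA DYN: Step Time TOTEner TOTKe ENERgy TEMPerature
DYNA PROP: GRMS HFCTote HFCKe EHFCor VIRKe
DYNA EWALD: EWKSum EWSElf EWEXcl EWQCor EWUTil
"""
one_core = HEADERS + """
DYNA> 0 273440.00000 -7341.90637 1891.57194 -9233.47831 100.23507
DYNA PROP> 16.15086 -5742.79485 3776.96226 1599.11152 17348.06802
DYNA EWALD> 195.48436 -83075.37800 80816.38441 0.00000 0.00000
"""
two_cores = HEADERS + """
DYNA> 0 273440.00000 -7238.32196 1995.15635 -9233.47831 105.72404
DYNA PROP> 16.15086 -5715.54164 4011.83455 1522.78032-131895.98735
DYNA EWALD> 195.48436 -83075.37800 80816.38441 0.00000 0.00000
"""

changed = diff_terms(parse_energy_block(one_core), parse_energy_block(two_cores))
for name, (v1, v2) in changed.items():
    print(f"{name:12s} {v1:16.5f} {v2:16.5f}")
```

To use it on real runs, read each output file, cut out the block of interest (for example the step-0 DYNA block), and pass the two strings to `parse_energy_block`. The tolerance in `diff_terms` is arbitrary and should be chosen to suit the printed precision.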



If there is a fix for this I would be very interested in it.

Alexander


Moderated by  lennart, rmv 
