Performance of the beta version on IA64
We compared the performance of the Auto-Parallelizer beta version with that of icc 11.0.074, the most efficient compiler for the IA64 platform, and of gcc 4.3.1, the most widely used compiler. The comparison was done on 6 benchmarks from SPEC/CPU2006 and on 6 benchmarks from NAS Parallel Benchmarks 3.3. The following host was used:
4 x Intel Itanium 2, 1.0 GHz, with 3 GB of memory
Compilation flags:
| Compiler | Flags |
|---|---|
| icc | icc -O2 -ipo -no-prec-div |
| icc + parallel | icc -O2 -parallel -ipo -no-prec-div |
| gcc | gcc -O2 |
| utl | [see below] |
Measurements on SPEC/CPU2006 benchmarks
Utl options used to compile the SPEC/CPU2006 benchmarks:
| Benchmark | Utl options |
|---|---|
| 410.bwaves | -Ws,--alias-fortran -Ws,--strict-types |
| 436.cactusADM | -Ws,--alias-fortran -Ws,--strict-types (for Fortran sources) |
| 437.leslie3d | -Ws,--alias-fortran -Ws,--strict-types |
| 459.GemsFDTD | -Ws,--inter-module -Ws,--alias-fortran -Ws,--strict-types |
| 462.libquantum | -Ws,--inter-module -Ws,--pto-wilson |
| 470.lbm | -Ws,--inter-module -Ws,--pto-wilson |
The comparison results are shown below, first as a diagram and then as a table of measurements.
Measurements on NAS Parallel Benchmarks
Utl options used to compile the NAS Parallel Benchmarks:
| Benchmark | Utl options |
|---|---|
| BT | -Ws,--strict-types -Ws,--alias-fortran -Ws,--opt-force -Ws,--inter-module -Ws,--inline -Ws,--localize -Ws,--lowerscope |
| CG | -Ws,--alias-fortran -Ws,--inter-module -Ws,--inline |
| EP | -Ws,--strict-types -Ws,--alias-fortran -Ws,--inter-module -Ws,--inline -Ws,--lowerscope |
| MG | -Ws,--strict-types -Ws,--alias-fortran -Ws,--inter-module -Ws,--inline |
| SP | -Ws,--strict-types -Ws,--alias-fortran -Ws,--inter-module -Ws,--inline -Ws,--localize -Ws,--lowerscope -Ws,--inline |
| UA | -Ws,--strict-types -Ws,--alias-fortran -Ws,--inter-module -Ws,--inline |
The comparison results are shown below, first as a diagram and then as a table of measurements.
* The MG and CG benchmarks were run with class B input data to reduce measurement error, because they finish too quickly on class A data.
The other benchmarks were run with class A input data.
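The reason a longer run reduces measurement error can be stated with a simple model. This is a sketch under our own assumption, not a measured property of the test host: each timing carries an absolute error $\delta$ (timer resolution, system noise) that does not depend on the run time $T$.

$$
\varepsilon_{\mathrm{rel}} = \frac{\delta}{T}, \qquad
\frac{\delta}{kT} = \frac{1}{k}\,\varepsilon_{\mathrm{rel}}
$$

Under this assumption, an input class that makes a benchmark run $k$ times longer reduces the relative error of its timing by the same factor $k$.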
Measurement results on a large machine
We also had the opportunity to measure the performance of Auto-Parallelizer on a Bull NovaScale 5325 machine with the following configuration:
32 x dual-core Intel Itanium 2, 1.6 GHz, with 256 GB of memory
Measurements on SPEC/CPU2006 benchmarks
The comparison results are shown below, first as a diagram and then as a table of measurements.
Measurements on NAS Parallel Benchmarks
The comparison results are shown below, first as a diagram and then as a table of measurements.
* The MG and CG benchmarks were run with class B input data to reduce measurement error, because they finish too quickly on class A data.
The other benchmarks were run with class A input data.