Performance of Beta version on PowerPC
We compared the performance of the Auto-Parallelizer beta version with that of the most efficient compiler for the PowerPC platform, xlc 10.1, and with that of the most widely used compiler, gcc (version 4.3.1). The comparison was done on 5 benchmarks from SPEC/CPU2006 and 6 benchmarks from NAS Parallel Benchmarks 3.3. The following host was used for the comparison:
2 x PPC970FX 2.2 GHz with 4 GB of onboard memory
Measurements were performed on hosts provided by the Joint Supercomputer Center.
Compilation flags:
| xlc | xlc -O3 -qtune=auto -qarch=auto -qipa |
| xlc + smp | xlc -O3 -qtune=auto -qarch=auto -qipa -qsmp |
| gcc | gcc -O2 |
| utl | [see below] |
Measurements on SPEC/CPU2006 benchmarks
Utl options used to compile the SPEC/CPU2006 benchmarks:
| 410.bwaves | -Ws,--alias-fortran -Ws,--strict-types |
| 437.leslie3d | -Ws,--alias-fortran -Ws,--strict-types |
| 459.GemsFDTD | -Ws,--inter-module -Ws,--alias-fortran -Ws,--strict-types |
| 462.libquantum | -Ws,--inter-module -Ws,--pto-wilson |
| 470.lbm | -Ws,--inter-module -Ws,--pto-wilson |
The comparison results are given below: first as a diagram, then as a table of measurements.
Measurements on NAS Parallel Benchmarks
Utl options used to compile the NAS Parallel Benchmarks:
| BT | -Ws,--strict-types -Ws,--alias-fortran -Ws,--opt-force -Ws,--inter-module -Ws,--inline -Ws,--localize -Ws,--lowerscope |
| CG | -Ws,--alias-fortran -Ws,--inter-module -Ws,--inline |
| EP | -Ws,--strict-types -Ws,--alias-fortran -Ws,--inter-module -Ws,--inline -Ws,--lowerscope |
| MG | -Ws,--strict-types -Ws,--alias-fortran -Ws,--inter-module -Ws,--inline |
| SP | -Ws,--strict-types -Ws,--alias-fortran -Ws,--inter-module -Ws,--inline -Ws,--localize -Ws,--lowerscope -Ws,--inline |
| UA | -Ws,--strict-types -Ws,--alias-fortran -Ws,--inter-module -Ws,--inline |
The comparison results are given below: first as a diagram, then as a table of measurements.
* - The MG and CG benchmarks were measured with class B input data. This was done to reduce measurement error, because these benchmarks run too quickly on class A data.
The other benchmarks were measured with class A input data.
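The reasoning behind the choice of input class can be illustrated with a small sketch. It assumes a simple model in which every run carries a roughly fixed amount of timing noise, so the relative error shrinks as the run gets longer. The function name and all numbers below are hypothetical and chosen only for illustration; they are not measured values from these benchmarks.

```python
def relative_timing_error(run_seconds: float, jitter_seconds: float) -> float:
    """Return the share of a measured run time that may be noise.

    Assumes a fixed absolute jitter per run (a simplifying assumption).
    """
    if run_seconds <= 0:
        raise ValueError("run_seconds must be positive")
    return jitter_seconds / run_seconds


# Hypothetical values, not taken from the measurements on this page.
JITTER = 0.5       # assumed run-to-run noise, in seconds
short_run = 2.0    # assumed duration of a run on a small input
long_run = 50.0    # assumed duration of a run on a larger input

short_error = relative_timing_error(short_run, JITTER)
long_error = relative_timing_error(long_run, JITTER)

print(f"short run: {short_error:.1%} relative error")
print(f"long run:  {long_error:.1%} relative error")
```

Under this model, the same absolute noise that distorts a two-second run by a quarter of its duration becomes a one percent effect on a fifty-second run, which is why a larger input class gives more trustworthy timings for fast benchmarks.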