Benchmark performance refers to the use of a set of integer and floating-point programs (known collectively as a benchmark) that are designed to test different performance aspects of the computing system(s) under test. Benchmark programs should be designed to provide fair and effective comparisons among high-performance computing systems. For a benchmark to be meaningful, it should evaluate faithfully the performance for the intended use of the system.
Examples of such benchmarks are the Dhrystone and Whetstone benchmarks. These are synthetic (not real) programs intended to measure the performance of real machines.
The Dhrystone benchmark addresses integer performance. It consists of 100 statements and uses no floating-point operations or data. The rate obtained from Dhrystone is used to compute a MIPS index as a performance measure, and this reliance on a single synthetic rate makes Dhrystone a rather unreliable source of performance measurement.
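The MIPS index derived from Dhrystone is conventionally normalized against the VAX 11/780, which executed about 1757 Dhrystones per second and is taken as the 1-MIPS reference machine. A minimal sketch of that conversion follows; the measured rate used below is hypothetical:

```python
# Convert a measured Dhrystone rate into the conventional "Dhrystone MIPS"
# (DMIPS) index by normalizing against the VAX 11/780, which ran roughly
# 1757 Dhrystones per second and is rated as the 1-MIPS reference machine.
VAX_DHRYSTONES_PER_SEC = 1757.0

def dhrystone_mips(dhrystones_per_sec: float) -> float:
    """Return the DMIPS index for a measured Dhrystone rate."""
    return dhrystones_per_sec / VAX_DHRYSTONES_PER_SEC

# Hypothetical measured rate for a machine under test:
rate = 175_700.0  # Dhrystones per second
print(f"DMIPS = {dhrystone_mips(rate):.1f}")  # 100x the VAX rate gives 100.0 DMIPS
```

Because the whole comparison collapses into this one number, two machines with very different behavior on real workloads can report the same DMIPS, which is the unreliability noted above.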
The Whetstone, on the other hand, is a kernel program that addresses floating-point performance for arithmetic operations, array indexing, conditional branches, and subroutine calls. The execution speed obtained using Whetstone is used solely to determine system performance, yielding a single-figure performance measure, which likewise makes it unreliable.
Synthetic benchmarks were superseded by several application software segments that reflect real engineering and scientific applications. These include PERFECT (Performance Evaluation for Cost-Effective Transformations), TPC measure for database I/O performance, and SPEC (Standard Performance Evaluation Corporation) measure.
The SPEC is a non-profit corporation formed to “establish, maintain, and endorse a standardized set of relevant benchmarks that can be applied to the newest generation of high-performance computers”.
The first SPEC benchmark suite was released in 1989 (SPEC89). It consisted of ten engineering/scientific programs. Two measures were derived from SPEC89: SPECmark, which measures the execution rates of the ten programs, and SPECthruput, which examines the system’s throughput. Owing to its unsatisfactory results, SPEC89 was replaced by SPEC92 in 1992.
SPEC92 consists of two suites: CINT92, which consists of six integer-intensive C programs, and CFP92, which consists of 14 floating-point-intensive C and FORTRAN programs.
In SPEC92, the measure SPECratio represents the ratio of the predetermined reference time to the actual execution time for a given program, so a larger SPECratio indicates faster execution. SPEC92 also uses the measure SPECint92, the geometric mean of the SPECratios for the programs in CINT92.
Similarly, the measure SPECfp92 is the geometric mean of the SPECratios for the programs in CFP92. In using SPEC for performance measures, three major steps have to be taken: building the tools, preparing auxiliary files, and running the benchmark suites. The tools are used to compile, run, and evaluate the benchmarks.
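The SPEC92 metrics can be sketched as follows. The per-program reference and measured times below are hypothetical, and SPECratio is taken here as the reference time divided by the measured time, so larger values mean faster execution:

```python
import math

def spec_ratio(reference_time: float, measured_time: float) -> float:
    """SPECratio for one program: reference time / measured execution time."""
    return reference_time / measured_time

def geometric_mean(values):
    """Geometric mean, as used for SPECint92 and SPECfp92."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Hypothetical (reference_time, measured_time) pairs, in seconds,
# standing in for the six CINT92 programs:
cint92_times = [(100.0, 20.0), (80.0, 16.0), (120.0, 30.0),
                (90.0, 15.0), (60.0, 12.0), (150.0, 25.0)]

ratios = [spec_ratio(ref, run) for ref, run in cint92_times]
specint92 = geometric_mean(ratios)
print(f"SPECint92 = {specint92:.2f}")
```

The geometric mean is used rather than the arithmetic mean because it is the mean consistent with ratios: it gives the same composite result regardless of which machine is chosen as the reference.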
Compilation information, such as the optimization flags and references to alternate source code, is kept in what are called makefile wrappers and configuration files. The tools and the auxiliary files are then used to compile and execute the code and compute the SPEC metrics.
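To make the role of a makefile wrapper concrete, the fragment below shows the kind of compilation information such a file records. The variable names and syntax here are illustrative assumptions, not the actual SPEC file format:

```make
# Hypothetical makefile-wrapper fragment (names and syntax are assumptions):
CC        = cc             # compiler used for the run
COPTIMIZE = -O2            # optimization flags recorded with the result
EXTRA_SRC = alt/portable.c # reference to alternate source code
```

Recording the flags alongside the results is what lets a reported SPEC metric be reproduced and audited later.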