TY - GEN
T1 - A comparison of two approaches to parallel simulation of multiprocessors
AU - Over, Andrew
AU - Clarke, Bill
AU - Strazdins, Peter
PY - 2007
Y1 - 2007
N2 - The design trend towards CMPs has made the simulation of multiprocessor systems a necessity and has also made multiprocessor systems widely available. While a serial multiprocessor simulation necessarily imposes a linear slowdown, running such a simulation in parallel may help mitigate this effect. In this paper we document our experiences with two different methods of parallelizing Sparc Sulima, a simulator of UltraSPARC IIICu-based multiprocessor systems. In the first approach, a simple interconnect model within the simulator is parallelized non-deterministically using careful locking. In the second, a detailed interconnect model is parallelized while preserving determinism using parallel discrete event simulation (PDES) techniques. While both approaches demonstrate a threefold speedup using 4 threads on workloads from the NAS parallel benchmarks, speedup proved constrained by loadbalancing between simulated processors. A theoretical model is developed to help understand why observed speedup is less than ideal. An analysis of the related speed-accuracy tradeoff in the first approach with respect to the simulation time quantum is also given; the results show that, for both serial and parallel simulation, a quantum in the order of a few hundreds of cycles represents a 'sweet-spot', but parallel simulation is significantly more accurate for a given quantum size. As with the speedup analysis, these effects are workload dependent.
AB - The design trend towards CMPs has made the simulation of multiprocessor systems a necessity and has also made multiprocessor systems widely available. While a serial multiprocessor simulation necessarily imposes a linear slowdown, running such a simulation in parallel may help mitigate this effect. In this paper we document our experiences with two different methods of parallelizing Sparc Sulima, a simulator of UltraSPARC IIICu-based multiprocessor systems. In the first approach, a simple interconnect model within the simulator is parallelized non-deterministically using careful locking. In the second, a detailed interconnect model is parallelized while preserving determinism using parallel discrete event simulation (PDES) techniques. While both approaches demonstrate a threefold speedup using 4 threads on workloads from the NAS parallel benchmarks, speedup proved constrained by loadbalancing between simulated processors. A theoretical model is developed to help understand why observed speedup is less than ideal. An analysis of the related speed-accuracy tradeoff in the first approach with respect to the simulation time quantum is also given; the results show that, for both serial and parallel simulation, a quantum in the order of a few hundreds of cycles represents a 'sweet-spot', but parallel simulation is significantly more accurate for a given quantum size. As with the speedup analysis, these effects are workload dependent.
UR - http://www.scopus.com/inward/record.url?scp=36949012714&partnerID=8YFLogxK
U2 - 10.1109/ISPASS.2007.363732
DO - 10.1109/ISPASS.2007.363732
M3 - Conference contribution
SN - 1424410819
SN - 9781424410811
T3 - ISPASS 2007: IEEE International Symposium on Performance Analysis of Systems and Software
SP - 12
EP - 22
BT - ISPASS 2007
T2 - ISPASS 2007: IEEE International Symposium on Performance Analysis of Systems and Software
Y2 - 25 April 2007 through 27 April 2007
ER -