Abstract
The performance of modern multiprocessor systems is often limited by the delays of interconnections or long latencies of memory subsystems. Instruction-level multithreading is a technique to tolerate such long latencies by switching from one instruction thread to another and continuing instruction execution concurrently with the long-latency operations. Using timed Petri net models, the paper analyzes performance limitations introduced by different components of distributed-memory multithreaded multiprocessor systems. Simulation results are used to compare performance improvements obtained by replicating critical components of the system to those obtained using components with better performance characteristics.
