AbstractA dynamic speculative multithreaded processor automatically extracts thread level parallelism from sequential binary applications without software support. The hardware is responsible for partitioning the program into threads and managing inter-thread dependencies. Current published dynamic thread partitioning algorithms work by detecting loops, procedures, or partitioning at fixed intervals. Research has thus far examined these algorithms in isolation from one another. This paper makes two contributions. First, it quantitatively compares different dynamic partitioning algorithms in the context of a fixed microarchitecture. The architecture is a single-chip shared memory multiprocessor enhanced to allow thread and value speculation. Second, this paper presents a new dynamic partitioning algorithm called MEM-slicing. Insights into the development and operation of this algorithm are presented. The technique is particularly suited to irregular, non-numeric programs, and greatly outperforms other algorithms in this domain. MEM-slicing is shown to be an important tool to enable the automatic parallelization of irregular binary applications. Over SPECint95, an average speedup of 3.4 is achieved on 8 processors.