@InProceedings{SenMuller00,
  title     = "{S}ynchronisation in a {M}ultithreaded {P}rocessor",
  author    = "Sen, Shondip and Muller, Henk and May, David",
  editor    = "Welch, Peter H. and Bakkers, Andr\'{e} W. P.",
  pages     = "137--144",
  booktitle = "{C}ommunicating {P}rocess {A}rchitectures 2000",
  isbn      = "1 58603 077 9",
  year      = "2000",
  month     = sep,
  abstract  = "A multithreaded architecture exploits instruction-level parallelism by interleaving instructions from disjoint thread contexts. As each thread executes within its own instruction stream with private data (the context registers), there is no interdependency between instructions from different threads. This allows high resource utilisation of a superscalar pipelined processor at very low cost in terms of complexity and silicon area. A new synchronisation mechanism for a multithreaded architecture is outlined. Two new instructions have been introduced to perform one-to-one and n-way synchronisation. The operation allows synchronisations to be requested and actioned efficiently on chip in as little as four clock cycles. Barriers and CSP-style channels can easily be constructed with this new synchronisation instruction. A brief examination of the performance of this multithreaded architecture shows that the optimum number of contexts per multithreaded processing element is four, based on test programs."
}