A high-speed method for maintaining a summary of thread activity reduces the number of remote-memory operations for an n-processor,
multiple-node computer
system from n² to (2n−1) operations. The method uses a hierarchical summary-of-thread-activity
data structure that includes structures such as first-level and second-level bit masks. The first-level bit
mask is accessible to all nodes and contains one bit per node, each bit indicating whether the corresponding node contains a processor that has not yet passed through a
quiescent state. The second-level bit
mask is local to each node and contains one bit per processor on that node, each bit indicating whether the corresponding processor has not yet passed through a
quiescent state. The method includes determining, from a
data structure on the processor's node (such as a second-level bit mask), whether the processor has passed through a
quiescent state. If so, it is then determined from the same
data structure whether all other processors on its node have passed through a quiescent state. If so, it is then indicated in a data structure accessible to all nodes (such as the first-level bit mask) that all processors on the processor's node have passed through a quiescent state. The local
generation number can also be stored in the data structure accessible to all nodes. If a processor determines from this data structure that it is the last processor to pass through a quiescent state, the processor updates the data structure, stored in the memory of each node, that holds the number of the
current generation.
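
The two-level check described above can be illustrated with a minimal, single-threaded Python sketch. It is not the claimed implementation: the names `Node`, `QuiescenceSummary`, and `note_quiescent` are hypothetical, locking and remote-memory access are not modeled, and the sketch assumes every node has the same number of processors and that the masks are rearmed as soon as a generation completes.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Per-node state (hypothetical layout, for illustration only)."""
    cpu_mask: int        # second-level mask: a set bit means "not yet quiescent"
    generation: int = 0  # node-local copy of the current generation number


class QuiescenceSummary:
    """Hierarchical summary: one first-level mask plus one mask per node."""

    def __init__(self, num_nodes: int, cpus_per_node: int) -> None:
        self.full_cpu_mask = (1 << cpus_per_node) - 1
        self.full_node_mask = (1 << num_nodes) - 1
        # First-level mask, visible to all nodes: a set bit means the node
        # still has at least one processor that has not been quiescent.
        self.node_mask = self.full_node_mask
        self.nodes = [Node(self.full_cpu_mask) for _ in range(num_nodes)]

    def note_quiescent(self, node_id: int, cpu_id: int) -> bool:
        """Record a quiescent state; return True if it ended the generation."""
        node = self.nodes[node_id]
        bit = 1 << cpu_id
        if not node.cpu_mask & bit:
            return False  # already recorded for this generation
        node.cpu_mask &= ~bit  # local update only
        if node.cpu_mask:
            return False  # other processors on this node are still pending
        # Last processor on this node: touch the shared first-level mask.
        self.node_mask &= ~(1 << node_id)
        if self.node_mask:
            return False  # other nodes are still pending
        # Last processor overall: advance the generation stored on each node
        # and rearm both levels of masks (an assumption of this sketch).
        for n in self.nodes:
            n.generation += 1
            n.cpu_mask = self.full_cpu_mask
        self.node_mask = self.full_node_mask
        return True
```

In this sketch a processor touches the shared first-level mask only when it is the last on its node to become quiescent, and touches every node's generation number only when it is the last overall; all other updates stay in the node-local mask.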