However, as the number of threads, or the workload within each thread, increases, a point may be reached at which computational cycles, i.e., CPU power, become the limiting factor.
In Intel's terminology, the single chip is referred to as a “package.”
While multi-threading does not provide the performance of a true multi-processor or multi-core system, it can improve the utilization of on-chip resources, leading to greater throughput for several important workload types, by exploiting additional instruction-level parallelism that is exposed by executing the instruction streams associated with multiple threads concurrently.
However, if both threads demand large amounts of cache, they will compete for the
limited capacity and likely slow each other down.
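The competition for limited cache capacity can be illustrated with a toy model. The following is a minimal sketch, not a description of any real processor: it assumes a single shared cache with least-recently-used replacement, strictly interleaved accesses, and threads that cycle through a fixed working set. The function name `hit_rates` and all sizes are hypothetical values chosen for illustration.

```python
from collections import OrderedDict

def hit_rates(capacity, working_sets, rounds=50):
    """Interleave accesses from each thread over one shared LRU cache.

    working_sets[i] is the number of distinct lines thread i cycles through.
    Returns the per-thread hit rate. All parameters are illustrative.
    """
    cache = OrderedDict()                 # key: (thread, line), ordered by recency
    hits = [0] * len(working_sets)
    accesses = [0] * len(working_sets)
    for step in range(rounds * max(working_sets)):
        for tid, size in enumerate(working_sets):
            key = (tid, step % size)      # each thread walks its own lines cyclically
            accesses[tid] += 1
            if key in cache:
                hits[tid] += 1
                cache.move_to_end(key)    # mark as most recently used
            else:
                cache[key] = True
                if len(cache) > capacity:
                    cache.popitem(last=False)   # evict least recently used line
    return [h / a for h, a in zip(hits, accesses)]

alone = hit_rates(capacity=64, working_sets=[48])           # one thread, fits
small_pair = hit_rates(capacity=64, working_sets=[24, 24])  # both fit together
big_pair = hit_rates(capacity=64, working_sets=[48, 48])    # combined demand exceeds capacity
print(alone, small_pair, big_pair)
```

In this model, two threads whose combined working set fits in the cache behave much as a single thread does, whereas two threads whose combined demand exceeds the capacity evict each other's lines before they can be reused.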
Because at least one resource is shared between the logical processors of a multi-threaded system, the problem can arise that one thread might be “anti-cooperative,” meaning that it does not conform to a predetermined notion of “fairness.” Examples of anti-cooperative execution behavior include using so much of the shared resource (otherwise “hoarding” it), or causing some other state change in the resource, such that a co-executing thread cannot execute as efficiently as it would if it had exclusive or at least “normal” use of the resource, or such that hardware or software intervention is required.
Although most anti-cooperative applications in the specific SMT architecture they studied caused performance degradations of less than five percent, Grunwald and Ghiasi showed that a malicious application could degrade the performance of another
workload running on the same physical
package by as much as 90% through, for example, the use of self-modifying code in a tight loop.
Existing OS schedulers are not designed to cope with problems such as a microarchitectural denial-of-service conflict (or outright attack). Rather, known schedulers may adjust the amount of execution time allocated to each of a set of runnable threads, but this ignores the possibility that the allotted execution time of a given thread may be wasted because of the actions of a co-executing, anti-cooperative thread.
For example, as Grunwald points out, self-modifying code can lead to frequent complete flushes of a shared
trace cache, which means that the cached information of the other running thread will also be lost, such that many
processing cycles are needed to build it back up again, over and over.
Even though the “nice” thread will have its allotted
execution time, it will not be able to use it efficiently and the OS scheduler will not be able to do anything to improve the situation, assuming that the scheduler detects the situation at all.
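The effect of repeated flushes on a thread that keeps its full allotment of execution time can be sketched with a purely illustrative cycle-count model. The function name `useful_cycles`, the flush interval, and the rebuild cost are assumptions chosen for illustration; they are not measurements from Grunwald's study or from any real processor.

```python
def useful_cycles(total_cycles, flush_interval, rebuild_cost):
    """Count cycles in which the "nice" thread makes progress.

    flush_interval: cycles between complete flushes of the shared cache
                    caused by a co-executing thread (None means no flushes).
    rebuild_cost:   cycles the nice thread spends refilling the cache
                    after each flush before it makes progress again.
    """
    useful = 0
    warmup_left = rebuild_cost            # the cache starts out cold
    for cycle in range(total_cycles):
        if flush_interval and cycle > 0 and cycle % flush_interval == 0:
            warmup_left = rebuild_cost    # cached information is lost again
        if warmup_left > 0:
            warmup_left -= 1              # cycle spent rebuilding, not progressing
        else:
            useful += 1
    return useful

# The nice thread is allotted the same 10,000 cycles in both cases.
baseline = useful_cycles(10_000, flush_interval=None, rebuild_cost=60)
attacked = useful_cycles(10_000, flush_interval=100, rebuild_cost=60)
print(baseline, attacked)
```

Under these assumed numbers the allotted time is identical in both runs, yet the thread completes far less useful work when the shared cache is flushed periodically, which is the situation a scheduler that only adjusts execution time cannot see.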
One problem with both of Snavely's approaches is the Sample and Optimize phases, during which the processors are devoted to test cases.
Because Snavely's method is two-pass, it is not suitable for detecting and alleviating anti-cooperative behavior at actual run time.
Yet another disadvantage of Snavely's approaches is that his systems do not directly attempt to detect anti-cooperative behavior.
In other words, Snavely assumes that threads will cooperate as well during actual “working” execution as they did during the Sample phase, but this assumption may not be correct: Snavely cannot detect and deal with previously undetected, run-time anti-cooperativeness.
In the presence of run-time anti-cooperative execution behavior, however, merely allocating more
CPU time to a thread does not ensure optimal execution progress.
As Grunwald points out, however, even very small thread segments (with self-modifying code, for example) can cause severe performance degradation of another running thread, such that merely reducing allocated time may not eliminate the problem. For example, a processor may have 90% of the total CPU time, but the 10% used by another, co-scheduled and highly anti-cooperative thread might cause much of the first processor's 90% to be wasted recovering from the resource hoarding of the anti-cooperative thread.
For example, a user may suppose that a particular important process contains self-modifying code in a tight loop, or has in the past caused problems for co-scheduled threads in an SMT architecture.
Stalling or suspending this thread would therefore benefit other threads, but would lead to a worse result from the user's perspective.
Proposed mechanisms for dealing with the problem of
shared resource hoarding in multi-threaded architectures fail to provide the user with any ability to influence how the OS addresses the problem.