Mechanism for Scheduling Execution of Threads for Fair Resource Allocation in a Multi-Threaded and/or Multi-Core Processing System

A multi-threaded and/or multi-core processing system technology, applied in the field of scheduling, that addresses the limited capacity of shared resources and the limits of computational cycles (CPU power), with the effect of reducing the number of threads and increasing the throughput of several important workload types.

Inactive Publication Date: 2010-08-12
VMWARE INC
Cites: 4; Cited by: 69

AI Technical Summary

Problems solved by technology

However, as the number of threads, or the workload within each thread, increases, the point may be reached where computational cycles (i.e., CPU power) become the limiting factor.
In Intel's terminology, the single chip is referred to as a “package.” While multi-threading does not provide the performance of a true multi-processor or multi-core system, it can improve the utilization of on-chip resources, leading to greater throughput for several important workload types, by exploiting additional instruction-level parallelism that is exposed by executing the instruction streams associated with multiple threads concurrently.
However, if both threads demand large amounts of cache, they will compete for the limited capacity and likely slow each other down.
Because at least one resource is shared between the logical processors of a multi-threaded system, the problem can arise that one thread might be “anti-cooperative,” meaning that it does not conform to a predetermined notion of “fairness.” Examples of anti-cooperative execution behavior include using so much of, or otherwise “hoarding,” the shared resource, or causing some other state change in the resource, such that a co-executing thread cannot execute as efficiently as it would if it had exclusive or at least “normal” use of the resource, or such that hardware or software intervention is required.
Grunwald and Ghiasi showed that, although most anti-cooperative applications caused performance degradations of less than five percent on the specific SMT architecture they studied, a malicious application could degrade the performance of another workload running on the same physical package by as much as 90%, for example through the use of self-modifying code in a tight loop.
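The notion of “fairness” above can be made concrete in many ways. As a minimal sketch, assuming a thread's progress is measured as instructions retired per cycle (IPC), the following Python computes a victim thread's slowdown relative to running alone and flags its partner when the loss exceeds a threshold. The `ProgressSample` type, the IPC metric, and the 25% threshold are illustrative assumptions, not taken from the source; the sample numbers merely echo the under-5% and 90% degradations reported by Grunwald and Ghiasi.

```python
from dataclasses import dataclass


@dataclass
class ProgressSample:
    """Progress of one thread in instructions per cycle (IPC); a hypothetical metric."""
    solo_ipc: float        # IPC when the thread runs alone in its functional group
    co_running_ipc: float  # IPC while sharing the group with a partner thread


def slowdown(sample: ProgressSample) -> float:
    """Fraction of solo throughput lost while co-running (0.0 means no loss)."""
    if sample.solo_ipc <= 0:
        raise ValueError("solo_ipc must be positive")
    return max(0.0, 1.0 - sample.co_running_ipc / sample.solo_ipc)


def partner_is_anti_cooperative(victim: ProgressSample, max_slowdown: float = 0.25) -> bool:
    """Hypothetical fairness rule: flag the partner if the victim loses more than max_slowdown."""
    return slowdown(victim) > max_slowdown


if __name__ == "__main__":
    mild = ProgressSample(solo_ipc=1.2, co_running_ipc=1.15)    # roughly 4% slower
    severe = ProgressSample(solo_ipc=1.2, co_running_ipc=0.12)  # roughly 90% slower
    for name, s in (("mild", mild), ("severe", severe)):
        print(f"{name}: slowdown={slowdown(s):.2f} flagged={partner_is_anti_cooperative(s)}")
```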
Existing OS schedulers are not designed to cope with such problems as a microarchitectural denial of service conflict (or outright attack); rather, known schedulers may adjust the amount of execution time allocated to each of a set of runnable threads, but this ignores that the allotted execution time of a given thread may be wasted because of the actions of a co-executing, anti-cooperative thread.
For example, as Grunwald points out, self-modifying code can lead to frequent complete flushes of a shared trace cache, which means that the cached information of the other running thread will also be lost, such that many processing cycles are needed to build it back up again, over and over.
Even though the “nice” thread will have its allotted execution time, it will not be able to use it efficiently and the OS scheduler will not be able to do anything to improve the situation, assuming that the scheduler detects the situation at all.
One problem with both of Snavely's approaches lies in the Sample and Optimize phases, during which the processors are devoted to test cases.
Because Snavely's method is two-pass, it is not suitable for detecting and alleviating anti-cooperative behavior at actual run time.
Yet another disadvantage of Snavely's approaches is that his systems do not directly attempt to detect anti-cooperative behavior.
In other words, Snavely assumes that threads will cooperate as well during actual “working” execution as they did during the Sample phase, but this assumption may not be correct—Snavely cannot detect and deal with previously undetected, run-time anti-cooperativeness.
In the presence of run-time anti-cooperative execution behavior, however, merely allocating more CPU time to a thread does not ensure optimal execution progress.
As Grunwald points out, however, even very small thread segments (with self-modifying code, for example) can cause severe performance degradation of another running thread, such that merely reducing allocated time may not eliminate the problem. For example, a processor may have 90% of the total CPU time, but the 10% used by another, co-scheduled and highly anti-cooperative thread might cause much of that 90% to be wasted recovering from the resource hoarding of the anti-cooperative thread.
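The arithmetic behind this example can be sketched in a few lines of Python. The model assumes, purely for illustration, that the fraction of the victim's time lost to recovery (here 70%) is set by the anti-cooperative thread's behavior rather than by how much CPU time that thread receives, which is the situation Grunwald describes; the 70% figure and the helper names are hypothetical.

```python
def useful_share(allotted_share: float, wasted_fraction: float) -> float:
    """Share of total CPU time a thread actually turns into forward progress."""
    return allotted_share * (1.0 - wasted_fraction)


def victim_progress(partner_share: float, wasted_fraction: float) -> float:
    """Victim's useful share when the rest of the CPU time goes to an interfering partner."""
    return useful_share(1.0 - partner_share, wasted_fraction)


if __name__ == "__main__":
    # Shrinking the anti-cooperative thread's allotment barely helps the victim...
    for partner in (0.10, 0.05):
        print(f"partner gets {partner:.0%}: victim useful share {victim_progress(partner, 0.70):.1%}")
    # ...whereas removing the interference (wasted_fraction = 0) restores the full allotment.
    print(f"no interference: victim useful share {victim_progress(0.10, 0.0):.1%}")
```

Under these assumptions, halving the offender's share raises the victim's useful share only from 27.0% to 28.5%, while eliminating the interference restores 90.0%.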
For example, a user may suppose that a particular important process contains self-modifying code in a tight loop, or has in the past caused problems for co-scheduled threads in an SMT architecture.
Stalling or suspending this thread would therefore benefit other threads, but would lead to a worse result from the user's perspective.
Proposed mechanisms for dealing with the problem of shared resource hoarding in multi-threaded architectures fail to provide the user with any ability to influence how the OS addresses the problem.
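One way a scheduler could give the user such influence is a per-thread policy consulted whenever anti-cooperative behavior is sensed. The sketch below is a hypothetical design, not the mechanism described in the source: the `Policy` values, the `resolve` function, and the thread names are assumptions. A thread the user marks `KEEP_RUNNING` (for example, an important process known to contain self-modifying code) is never the one displaced; its co-runner is moved instead.

```python
from enum import Enum
from typing import Dict, Optional, Tuple


class Policy(Enum):
    ISOLATE = "isolate"            # default: move the offending thread to another group
    KEEP_RUNNING = "keep_running"  # user says the offender is important: move its victim instead
    IGNORE = "ignore"              # user accepts the interference: take no action


def resolve(offender: str, victim: str, user_policy: Dict[str, Policy],
            default: Policy = Policy.ISOLATE) -> Tuple[str, Optional[str]]:
    """Return (action, thread_to_move) for a sensed anti-cooperative pair."""
    policy = user_policy.get(offender, default)
    if policy is Policy.IGNORE:
        return ("none", None)
    if policy is Policy.KEEP_RUNNING:
        return ("migrate", victim)
    return ("migrate", offender)


if __name__ == "__main__":
    policies = {"render_job": Policy.KEEP_RUNNING, "batch_scan": Policy.IGNORE}
    print(resolve("render_job", "web_server", policies))  # the victim is moved
    print(resolve("batch_scan", "web_server", policies))  # nothing happens
    print(resolve("smc_loop", "web_server", policies))    # default: the offender is moved
```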




Embodiment Construction

[0046] The main idea of the invention is flexible enforcement of performance isolation using the hardware capabilities of SMT/multi-core processors. The simplest embodiment of the invention is illustrated in FIG. 1: a pair of “partnered” processors CPU0, CPU1 are associated in a functional group 101 such that they share at least one group resource 102 under the control of known hardware mechanisms within the group. As just one example, in a simultaneous multi-threaded (SMT, or, here, simply “multi-threaded”) architecture such as Intel Corp.'s Hyper-Threading Technology, there are two logical processors per package (a type of group), but a hardware mechanism in the processor package itself determines how each thread accesses the trace caches.
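The relationship between functional groups, their partnered processors, and the shared group resource can be modeled as a small data structure. This Python sketch assumes a machine with two packages of two logical CPUs each, every package sharing a trace cache; the class and method names are hypothetical, and only the concepts (functional group 101, shared resource 102, partnered CPUs) come from the text.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class FunctionalGroup:
    """A package or core whose logical CPUs share at least one resource."""
    group_id: int
    cpus: Tuple[int, ...]
    shared_resources: Tuple[str, ...] = ("trace_cache",)


@dataclass
class Topology:
    groups: List[FunctionalGroup]

    def group_of(self, cpu: int) -> FunctionalGroup:
        """Find the functional group that contains a logical CPU."""
        for group in self.groups:
            if cpu in group.cpus:
                return group
        raise KeyError(f"unknown CPU {cpu}")

    def partners(self, cpu: int) -> Tuple[int, ...]:
        """Logical CPUs that share the group's resources with `cpu`."""
        return tuple(c for c in self.group_of(cpu).cpus if c != cpu)


if __name__ == "__main__":
    # Two hypothetical SMT packages, each with two logical CPUs sharing a trace cache.
    topo = Topology([FunctionalGroup(0, (0, 1)), FunctionalGroup(1, (2, 3))])
    print(topo.partners(0), topo.group_of(3).group_id, topo.group_of(3).shared_resources)
```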

[0047] A scheduler 610 schedules each of a plurality (two are shown by way of example) of logically cooperating executable threads Ta, Tb for execution on the processors CPU0, CPU1, while an activity sensor 615 within or accessible by the scheduler ...


Abstract

A thread scheduling mechanism is provided that flexibly enforces performance isolation of multiple threads to alleviate the effect of anti-cooperative execution behavior with respect to a shared resource, for example, hoarding a cache or pipeline, using the hardware capabilities of simultaneous multi-threaded (SMT) or multi-core processors. Given a plurality of threads, including first and second threads, running on at least two processors in at least one functional processor group, the occurrence of a rescheduling condition indicating anti-cooperative execution behavior is sensed, and, if present, at least one of the threads is rescheduled such that the first and second threads no longer execute in the same functional processor group at the same time.
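The sense-and-reschedule step can be sketched in the spirit of scheduler 610 and activity sensor 615 from [0047]. This is a minimal sketch under stated assumptions: the activity sensor is reduced to a per-thread event rate (for example, hypothetical counts of trace-cache flushes per millisecond), the rescheduling condition is a fixed threshold, and a flagged thread is migrated to a CPU in a functional group where nothing else is running. None of these names or policies are taken from the source.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Machine:
    group_of_cpu: Dict[int, int]  # logical CPU -> functional group id
    placement: Dict[str, int]     # running thread -> logical CPU

    def co_runners(self, thread: str) -> List[str]:
        """Other threads currently running in the same functional group."""
        group = self.group_of_cpu[self.placement[thread]]
        return [t for t, cpu in self.placement.items()
                if t != thread and self.group_of_cpu[cpu] == group]

    def cpu_in_idle_group(self) -> Optional[int]:
        """Lowest-numbered CPU in a group with no running threads, if any."""
        busy = {self.group_of_cpu[cpu] for cpu in self.placement.values()}
        for cpu in sorted(self.group_of_cpu):
            if self.group_of_cpu[cpu] not in busy:
                return cpu
        return None


def reschedule_if_needed(machine: Machine, event_rate: Dict[str, float],
                         threshold: float) -> List[Tuple[str, int]]:
    """Move each thread whose sensed event rate exceeds `threshold` away from its co-runners."""
    moves = []
    for thread in sorted(event_rate):
        if event_rate[thread] <= threshold or not machine.co_runners(thread):
            continue  # no rescheduling condition, or no co-runner to protect
        target = machine.cpu_in_idle_group()
        if target is None:
            continue  # no free group; a real scheduler might deschedule the thread instead
        machine.placement[thread] = target
        moves.append((thread, target))
    return moves


if __name__ == "__main__":
    m = Machine(group_of_cpu={0: 0, 1: 0, 2: 1, 3: 1}, placement={"Ta": 0, "Tb": 1})
    print(reschedule_if_needed(m, {"Ta": 5.0, "Tb": 500.0}, threshold=100.0))
    print(m.placement, m.co_runners("Ta"))
```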

Description

CROSS-REFERENCE TO RELATED APPLICATION(S)

[0001] This application claims the benefit of U.S. patent application Ser. No. 11/015,506, filed on 16 Dec. 2004, now issued as U.S. Pat. No. 7,707,578.

BACKGROUND OF THE INVENTION

[0002] 1. Field of the Invention

[0003] This invention relates to schedulers as found in modern operating systems and in particular to a scheduler for use in a computer system with a multi-threaded and/or multi-core architecture.

[0004] 2. Background Art

[0005] As is well known, modern computer systems consist of one or more central processing units (CPUs), as well as supporting hardware such as memory and memory management units (MMU) for each CPU, as well as less essential peripheral hardware such as I/O devices like network interfaces, disks, printers, etc. Software is also part of a computer system; typically, a software application provides the ultimate utility of the computer system for users.

[0006] Users often want to use more than one of these software applications, p...


Application Information

Patent Type & Authority: Applications (United States)
IPC(8): G06F9/46; G06F9/44; G06F9/455; G06F12/08; G06F9/305
CPC: G06F9/5027; G06F2209/485; G06F2209/483
Inventors: ZEDLEWSKI, JOHN R.; WALDSPURGER, CARL A.
Owner: VMWARE INC