Unified fanless radiator for data center cooling systems
A unified fanless radiator that combines air-to-liquid and liquid-to-liquid heat exchangers is introduced into the data center to address the high cost and low efficiency of existing cooling systems for high-heat-density computing components, providing efficient and economical cooling.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- NVIDIA CORP
- Filing Date
- 2021-09-13
- Publication Date
- 2026-06-30
AI Technical Summary
Existing data center cooling systems rely on complex mechanical equipment, which makes them inefficient and costly and makes it difficult to meet the high heat density and varying cooling requirements of computing components economically and effectively.
A unified fanless radiator, combined with air-to-liquid and liquid-to-liquid heat exchangers, handles air cooling and liquid cooling separately through main and auxiliary cooling circuits. This simplifies or replaces equipment such as the CDU, CRAC, CRAH, RDHX, and SCHX and achieves efficient liquid cooling.
The approach reduces the capital and operating costs of the cooling system, improves the power usage effectiveness (PUE) of the data center, simplifies the cooling process, reduces noise, and improves cooling efficiency.
[Figure: CN114190049B abstract drawing]
Description
Technical Field
[0001] At least one embodiment relates to a cooling system, including systems and methods for operating such cooling systems. In at least one embodiment, such a cooling system can be used in a data center containing one or more racks or computing servers.
Background Technology
[0002] Data center cooling systems use fans to circulate air through server components. Some supercomputers or other high-capacity computers may use water or other cooling systems instead of air cooling systems to draw heat from the server components or racks within the data center to areas outside the data center. Cooling systems can include coolers within the data center area, which may include areas outside the data center itself. Further, the areas outside the data center may include cooling towers or other external heat exchangers that receive heated coolant from the data center and dissipate the heat to the environment (or an external cooling medium) via forced air or other means. The cooled coolant is then recycled back into the data center. Coolers and cooling towers together form a cooling facility.
Attached Figure Description
[0003] Figure 1 shows an exemplary data center cooling system subject to the improvements described in at least one embodiment;
[0004] Figure 2 illustrates server-level features associated with a unified fanless radiator for a data center cooling system, according to at least one embodiment;
[0005] Figure 3A shows rack-level features associated with a unified fanless radiator for a data center cooling system, according to at least one embodiment;
[0006] Figure 3B shows features of a liquid-to-liquid heat exchanger associated with a unified fanless radiator for a data center cooling system, according to at least one embodiment;
[0007] Figure 4 illustrates data center-level features associated with a unified fanless radiator for a data center cooling system, according to at least one embodiment;
[0008] Figure 5 illustrates a method associated with the data center cooling systems of Figures 2-4, according to at least one embodiment;
[0009] Figure 6 shows a distributed system according to at least one embodiment;
[0010] Figure 7 shows an exemplary data center according to at least one embodiment;
[0011] Figure 8 shows a client-server network according to at least one embodiment;
[0012] Figure 9 shows a computer network according to at least one embodiment;
[0013] Figure 10A shows a networked computer system according to at least one embodiment;
[0014] Figure 10B shows a networked computer system according to at least one embodiment;
[0015] Figure 10C shows a networked computer system according to at least one embodiment;
[0016] Figure 11 shows one or more components of a system environment in which services can be provided as third-party network services, according to at least one embodiment;
[0017] Figure 12 shows a cloud computing environment according to at least one embodiment;
[0018] Figure 13 illustrates a set of functional abstraction layers provided by a cloud computing environment according to at least one embodiment;
[0019] Figure 14 illustrates a chip-level supercomputer according to at least one embodiment;
[0020] Figure 15 illustrates a rack-mounted supercomputer according to at least one embodiment;
[0021] Figure 16 shows a rack-mounted supercomputer according to at least one embodiment;
[0022] Figure 17 shows a supercomputer at the entire system level according to at least one embodiment;
[0023] Figure 18A illustrates inference and/or training logic according to at least one embodiment;
[0024] Figure 18B illustrates inference and/or training logic according to at least one embodiment;
[0025] Figure 19 illustrates the training and deployment of a neural network according to at least one embodiment;
[0026] Figure 20 shows the system architecture of a network according to at least one embodiment;
[0027] Figure 21 shows the system architecture of a network according to at least one embodiment;
[0028] Figure 22 shows a control plane protocol stack according to at least one embodiment;
[0029] Figure 23 shows a user plane protocol stack according to at least one embodiment;
[0030] Figure 24 shows the components of a core network according to at least one embodiment;
[0031] Figure 25 shows components of a system supporting Network Function Virtualization (NFV) according to at least one embodiment;
[0032] Figure 26 shows a processing system according to at least one embodiment;
[0033] Figure 27 shows a computer system according to at least one embodiment;
[0034] Figure 28 shows a system according to at least one embodiment;
[0035] Figure 29 shows an exemplary integrated circuit according to at least one embodiment;
[0036] Figure 30 shows a computing system according to at least one embodiment;
[0037] Figure 31 shows an APU according to at least one embodiment;
[0038] Figure 32 shows a CPU according to at least one embodiment;
[0039] Figure 33 shows an exemplary accelerator integration slice according to at least one embodiment;
[0040] Figures 34A-34B show an exemplary graphics processor according to at least one embodiment;
[0041] Figure 35A shows a graphics core according to at least one embodiment;
[0042] Figure 35B shows a GPGPU according to at least one embodiment;
[0043] Figure 36A shows a parallel processor according to at least one embodiment;
[0044] Figure 36B shows a processing cluster according to at least one embodiment;
[0045] Figure 36C shows a graphics multiprocessor according to at least one embodiment;
[0046] Figure 37 shows a software stack of a programming platform according to at least one embodiment;
[0047] Figure 38 illustrates a CUDA implementation of the software stack of Figure 37, according to at least one embodiment;
[0048] Figure 39 illustrates a ROCm implementation of the software stack of Figure 37, according to at least one embodiment;
[0049] Figure 40 illustrates an OpenCL implementation of the software stack of Figure 37, according to at least one embodiment;
[0050] Figure 41 shows software supported by a programming platform according to at least one embodiment; and
[0051] Figure 42 illustrates compiling code for execution on the programming platforms of Figures 37-40, according to at least one embodiment.
Detailed Implementation
[0052] In the following description, numerous specific details are set forth to provide a more thorough understanding of at least one embodiment. However, it will be apparent to those skilled in the art that the inventive concept can be practiced without one or more of these specific details.
[0053] In at least one embodiment, air cooling for high-density servers may be ineffective or inefficient because changing computing loads in today's computing components cause sudden high heat demands. In at least one embodiment, because cooling requirements vary across a range from minimum to maximum, an appropriate cooling system must be used to meet these requirements economically. In at least one embodiment, liquid cooling systems can be used for medium to high cooling requirements. In at least one embodiment, different cooling requirements also reflect different thermal characteristics of the data center. In at least one embodiment, the heat generated by components, servers, and racks is collectively referred to as a thermal characteristic or a cooling requirement, because the cooling requirement must fully address the thermal characteristic.
[0054] In at least one embodiment, a data center liquid cooling system is disclosed. In at least one embodiment, the data center cooling system addresses thermal characteristics in associated computing or data center equipment, such as graphics processing units (GPUs), switches, dual in-line memory modules (DIMMs), or central processing units (CPUs). Furthermore, in at least one embodiment, the associated computing or data center equipment may be a processing card having one or more GPUs, switches, or CPUs thereon. In at least one embodiment, each of the GPU, switch, and CPU may be a heat-generating feature of the computing device. In at least one embodiment, the GPU, CPU, or switch may have one or more cores, and each core may be a heat-generating feature.
[0055] In at least one embodiment, cooling of high-heat-density racks for GPUs, switches, CPUs, and storage devices may rely on liquid coolants, such as water alone or water with additives. In at least one embodiment, air cooling may be used for server or switch components that generate less heat. In at least one embodiment, the data center cooling system of this invention addresses problems that arise when a data center combines liquid and air cooling, such as the complexity of the various underlying cooling devices required to serve both needs. In at least one embodiment, the data center cooling system of this invention can replace complex mechanical devices such as coolant distribution units (CDUs), computer room air conditioners (CRACs), computer room air handlers (CRAHs), rear door heat exchangers (RDHXs), and subcooling heat exchangers (SCHXs), all of which rely partially or substantially on fan-based cooling. In at least one embodiment, the CRAH, CRAC, RDHX, and SCHX components support air cooling, and the CDU supports liquid cooling. In at least one embodiment, all air-cooling and liquid-cooling functions are consolidated in a unified fanless radiator, which serves as the source of cool air for air-cooled components via an air-to-liquid heat exchanger while simultaneously cooling the liquid for liquid-cooled cold plates and immersion cooling systems via a liquid-to-liquid heat exchanger.
[0056] In at least one embodiment, for a data center designed with cold aisle or hot aisle containment, a unified fanless radiator is placed above or below a row of racks in the data center. In at least one embodiment, primary coolant from a data center cooling station or facility, or from the main cooling loop of a liquid-side economizer cooling tower, can be supplied to the radiator. In at least one embodiment, the radiator may include a path for hot air to travel, wherein the hot air is driven by server fans, component fans, rack-mounted fans, or wall fans in the data center. In at least one embodiment, this path may pass through multiple fins that retain heat from the hot air. In at least one embodiment, a path is provided for the primary coolant to adequately cool the hot air. In at least one embodiment, another path is provided for the primary coolant to cool the auxiliary or second coolant, which returns to the cold aisle of the rack row housing the liquid-cooled components. In at least one embodiment, yet another path is provided for cooled air to return to the cold aisle of the rack row so that the aforementioned fans can circulate it. In at least one embodiment, the hot air and the auxiliary coolant are cooled simultaneously in their respective heat exchangers within the radiator. In at least one embodiment, this cooling occurs in series or in parallel within the radiator: either the primary coolant first cools the auxiliary coolant and then the hot air, or the primary coolant cools the auxiliary coolant and the hot air at the same time. In at least one embodiment, the auxiliary coolant is associated with one or more servers, cold plates, or racks and flows through server and rack manifolds.
[0057] In at least one embodiment, this disclosure enables sections or portions of a radiator to be implemented as different heat exchangers. In at least one embodiment, a unified fanless radiator is adapted to work with one or more racks in a data center and can be a single unit suitable for cooling both air and liquid. In at least one embodiment, a unified fanless radiator is adapted to eliminate data center-level fans and can provide unified air and liquid cooling for high-heat-density racks while reducing the capital cost, operating cost, and data center footprint of cooling units, all of which improve the data center's power usage effectiveness (PUE).
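As a rough illustration of the PUE claim, PUE is conventionally defined as total facility power divided by IT equipment power, so a lower value is better and 1.0 is the ideal. The sketch below applies that definition to made-up loads; the function name and every kW figure are assumptions chosen for illustration and do not come from the patent.

```python
def power_usage_effectiveness(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Return PUE = total facility power / IT equipment power (1.0 is the ideal)."""
    if it_equipment_kw <= 0:
        raise ValueError("IT equipment power must be positive")
    if total_facility_kw < it_equipment_kw:
        raise ValueError("total facility power cannot be below IT equipment power")
    return total_facility_kw / it_equipment_kw


# Hypothetical loads in kW, chosen only to show the direction of the effect.
IT_LOAD_KW = 1000.0
OTHER_OVERHEAD_KW = 100.0      # lighting, power conversion losses, and so on
FAN_BASED_COOLING_KW = 400.0   # assumed draw of a CDU/CRAC/CRAH-style cooling plant
FANLESS_COOLING_KW = 250.0     # assumed draw of a radiator-based cooling plant

pue_before = power_usage_effectiveness(
    IT_LOAD_KW + OTHER_OVERHEAD_KW + FAN_BASED_COOLING_KW, IT_LOAD_KW)
pue_after = power_usage_effectiveness(
    IT_LOAD_KW + OTHER_OVERHEAD_KW + FANLESS_COOLING_KW, IT_LOAD_KW)
print(f"PUE before: {pue_before:.2f}, after: {pue_after:.2f}")
```

Any reduction in cooling power for the same IT load lowers the ratio, which is the sense in which removing fan-based equipment would improve PUE.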
[0058] In at least one embodiment, the radiator described herein simplifies data center cooling by replacing coolant distribution units (CDUs), computer room air conditioners (CRACs), computer room air handlers (CRAHs), rear door heat exchangers (RDHXs), and subcooling heat exchangers (SCHXs). In at least one embodiment, instead of such equipment, the radiator is supported by a main cooling loop from the cooler facility that supplies main coolant directly to the radiator. In at least one embodiment, the radiator may be a unified heat exchanger for both air and liquid and is located above or below one or more racks. In at least one embodiment, the radiator functions largely as an air-to-liquid heat exchanger, with the main coolant drawing heat away from the racks along paths that travel through, between, or around the radiator's fins. In at least one embodiment, a portion of the radiator functions as a liquid-to-liquid heat exchanger, supporting one or more auxiliary cooling loops whose auxiliary coolants draw heat from computing components, servers, or racks within the data center. In at least one embodiment, in addition to simplifying the cooling process, the radiator also reduces the piping and the distance that the auxiliary coolant must travel before reaching the computing components, servers, or racks. In at least one embodiment, because the auxiliary coolant is directly responsible for absorbing heat from the computing components, servers, or racks in the data center, a shorter path for exchanging the absorbed heat with the main coolant lets it cool the corresponding equipment faster.
[0059] In at least one embodiment, a fanless radiator above or below one or more racks in a data center supports a first configuration as an air-to-liquid heat exchanger, in which a main cooling circuit removes a first amount of heat from the fanless radiator, and a second configuration as a liquid-to-liquid heat exchanger, in which an auxiliary cooling circuit exchanges a second amount of heat with the main cooling circuit that traverses the fanless radiator.
[0060] In at least one embodiment, a data center cooling system that consists primarily of an external cooler with fan-based cooling, and that is further supported by various cooling pipelines and auxiliary fan-based systems such as a coolant distribution unit (CDU), computer room air conditioner (CRAC), computer room air handler (CRAH), rear door heat exchanger (RDHX), and subcooling heat exchanger (SCHX), is largely replaced by the radiator and associated minimal components of the present invention. In at least one embodiment, the radiator used as a heat exchanger located above or below one or more racks is fanless. In at least one embodiment, this reduces the noise generated by the many replaced components described above.
[0061] In at least one embodiment, Figure 1 illustrates an exemplary data center 100 whose cooling system is improved as described herein. In at least one embodiment, the data center 100 may be one or more server rooms 102 having racks 110 and auxiliary equipment for housing one or more servers on one or more server trays. In at least one embodiment, the data center 100 is supported by a cooling tower 104 located outside the data center 100. In at least one embodiment, the cooling tower 104 dissipates heat from within the data center 100 by acting on a main cooling loop 106. In at least one embodiment, a coolant distribution unit (CDU) 112 is used between the main cooling loop 106 and a second or auxiliary cooling loop 108 to enable heat to be extracted from the second or auxiliary cooling loop 108 to the main cooling loop 106. In at least one embodiment, the auxiliary cooling loop 108 can reach into the server trays through various plumbing lines as needed. In at least one embodiment, loops 106, 108 are shown as line diagrams, but those skilled in the art will recognize that one or more plumbing features may be used. In at least one embodiment, flexible polyvinyl chloride (PVC) pipes can be used with an associated piping system to allow fluid to move along each of loops 106, 108. In at least one embodiment, one or more coolant pumps can be used to maintain pressure differentials within loops 106, 108 so that coolant moves in response to temperature sensors at different locations, including in a room, in one or more racks 110, and/or in server enclosures or server trays within racks 110.
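The pump behavior described above (coolant moved in response to temperature sensors in the room, racks, and server trays) could be sketched as a simple mapping from the hottest sensor reading to a pump speed command. The patent does not specify a control law; the linear ramp, the function name, and the thresholds below are all assumptions for illustration only.

```python
def pump_speed_fraction(sensor_temps_c, low_c=30.0, high_c=45.0, min_speed=0.2):
    """Map the hottest sensor reading to a pump speed between min_speed and 1.0.

    Assumed policy: hold min_speed at or below low_c, run at full speed at or
    above high_c, and ramp linearly in between.
    """
    temps = list(sensor_temps_c)
    if not temps:
        raise ValueError("at least one temperature reading is required")
    if high_c <= low_c:
        raise ValueError("high_c must be greater than low_c")
    hottest = max(temps)
    ramp = (hottest - low_c) / (high_c - low_c)
    ramp = min(1.0, max(0.0, ramp))  # clamp to the range [0, 1]
    return min_speed + (1.0 - min_speed) * ramp


# Hypothetical readings from room, rack, and server-tray sensors (degrees C).
readings = {"room": 24.0, "rack_110": 33.0, "tray_3": 37.5}
print(f"pump speed command: {pump_speed_fraction(readings.values()):.2f}")
```

Driving the command from the hottest reading is a conservative choice for a sketch like this; a real controller would more likely regulate differential pressure or flow directly.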
[0062] In at least one embodiment, the coolant in the main cooling circuit 106 and the auxiliary cooling circuit 108 may be at least water with an additive, such as ethylene glycol or propylene glycol. In at least one embodiment, in operation, each of the main cooling circuit and the auxiliary cooling circuit has its own coolant. In at least one embodiment, the coolant in the auxiliary cooling circuit may be dedicated to the requirements of components in the server tray or rack 110. In at least one embodiment, the CDU 112 is capable of sophisticated control over the coolant in circuits 106, 108, independently or simultaneously. In at least one embodiment, the CDU may be adapted to control the flow rate so that the coolant is appropriately distributed to extract heat generated within the rack 110. In at least one embodiment, additional tubing 114 is provided from the auxiliary cooling circuit 108 to enter each server tray and supply coolant to the electrical and/or computing components.
[0063] In at least one embodiment, the terms electrical components and computing components are used interchangeably to refer to heat-generating components that benefit from the data center cooling system. In at least one embodiment, conduit 118, which forms part of auxiliary cooling loop 108, may be referred to as a room manifold. In at least one embodiment, conduit 116, which extends from conduit 118, may also be part of auxiliary cooling loop 108 but may be referred to as a row manifold. In at least one embodiment, conduit 114 enters the rack as part of auxiliary cooling loop 108 but may be referred to as a rack cooling manifold. In at least one embodiment, row manifold 116 extends along a row in data center 100 to all racks in that row. In at least one embodiment, the piping of auxiliary cooling loop 108, including manifolds 118, 116, and 114, can be improved by at least one embodiment of the present disclosure. In at least one embodiment, a cooler 120 may be provided in the main cooling loop within data center 102 to support cooling prior to the cooling tower. In at least one embodiment, to the extent that additional loops exist within the main cooling loop, those skilled in the art will recognize upon reading this disclosure that these additional loops provide cooling outside the rack and outside the auxiliary cooling loop, and may be treated together with the main cooling loop for purposes of this disclosure.
[0064] In at least one embodiment, during operation, heat generated within the server trays of rack 110 can be transferred to coolant leaving rack 110 via the flexible tubing of manifold 114 of second cooling circuit 108. In at least one embodiment, second coolant from CDU 112 (in auxiliary cooling circuit 108) moves toward rack 110 to cool it. In at least one embodiment, second coolant from CDU 112 passes from one side of the manifold having conduit 118, via manifold 116, to one side of rack 110, and via conduit 114 into one side of a server tray. In at least one embodiment, used or returning second coolant (second coolant that leaves carrying heat away from the computing components) exits from the other side of the server tray (e.g., entering on the left side of the rack and, after circulating through the server tray or through components on the server tray, exiting on the right side of the rack). In at least one embodiment, the used second coolant exiting the server tray or rack 110 leaves through a different side (such as the exit side) of conduit 114 and moves to a parallel, also exiting, side of manifold 116. In at least one embodiment, from manifold 116, the used second coolant moves in a parallel portion of manifold 118, traveling in the opposite direction to the incoming second coolant (which may also be refreshed second coolant), toward CDU 112.
[0065] In at least one embodiment, the used second coolant exchanges its heat with the main coolant in the main cooling circuit 106 via CDU 112. In at least one embodiment, the used second coolant is thereby refreshed (e.g., cooled relative to its temperature when it returned as used second coolant) and is ready to be circulated back to the computing components via the second cooling circuit 108. In at least one embodiment, various flow and temperature control features in CDU 112 control the heat exchanged from the used second coolant or the flow of the second coolant into and out of CDU 112. In at least one embodiment, CDU 112 is also capable of controlling the flow of the main coolant in the main cooling circuit 106.
[0066] In at least one embodiment, the radiator may be associated with one or more racks. In at least one embodiment, a first portion of the radiator may be adapted to function as an air-to-liquid heat exchanger with a main cooling loop to remove first heat from one or more racks. In at least one embodiment, a second portion of the radiator may be adapted to function as a liquid-to-liquid heat exchanger with an auxiliary cooling loop to exchange second heat with the main cooling loop. In at least one embodiment, the radiator may be adapted to perform supplemental or economical cooling for the data center. In at least one embodiment, a flow controller including a pump or valve may be used to regulate flow between different paths, such as between CDUs and radiators in the data center. In at least one embodiment, the flow controller may be controlled to direct auxiliary coolant flow to the radiator and back to the computing device, server tray, or rack. In at least one embodiment, the flow controller may be controlled to direct auxiliary coolant flow to the CDU and back to the computing device, server tray, or rack. In at least one embodiment, the flow controller may be controlled to direct auxiliary coolant flow to both the radiator and the CDU.
[0067] In at least one embodiment, these pumps and valves may have both mechanical and electrical components. In at least one embodiment, for an electric pump, a signal is provided from an electrical component, which may be located within the pump or remote from it. In at least one embodiment, this signal may cause the pump to start or change speed to control the flow rate or volume of coolant through the electric pump. In at least one embodiment, when the flow controller is a valve, its opening and closing are controlled by a signal input to an electrical component, which may be located within the valve or remote from it.
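The three routing options for the auxiliary coolant (radiator only, CDU only, or both) and the open/close and start/stop signals sent to the valves and pump can be sketched as follows. This is a minimal illustration, not the patent's control logic: `Route`, `FlowCommand`, `choose_route`, and the capacity-based policy are hypothetical names and assumptions introduced here.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    """Where the auxiliary coolant is sent before returning to the rack."""
    RADIATOR = "radiator"
    CDU = "cdu"
    BOTH = "both"


@dataclass(frozen=True)
class FlowCommand:
    """Signals sent to the electrical components of the valves and the pump."""
    radiator_valve_open: bool
    cdu_valve_open: bool
    pump_running: bool


def choose_route(heat_load_kw: float, radiator_capacity_kw: float,
                 radiator_available: bool = True) -> Route:
    """Assumed policy: prefer the radiator, add the CDU when load exceeds capacity."""
    if not radiator_available:
        return Route.CDU
    if heat_load_kw <= radiator_capacity_kw:
        return Route.RADIATOR
    return Route.BOTH


def command_for(route: Route, heat_load_kw: float) -> FlowCommand:
    """Translate a route into valve open/close and pump start/stop signals."""
    return FlowCommand(
        radiator_valve_open=route in (Route.RADIATOR, Route.BOTH),
        cdu_valve_open=route in (Route.CDU, Route.BOTH),
        pump_running=heat_load_kw > 0,
    )


# Hypothetical heat loads (kW) against an assumed 50 kW radiator capacity.
for load in (0.0, 30.0, 80.0):
    route = choose_route(load, radiator_capacity_kw=50.0)
    print(load, route.name, command_for(route, load))
```

Keeping the routing decision separate from the signal translation mirrors the text's split between what the flow controller is directed to do and how the pump or valve receives its signal.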
[0068] In at least one embodiment, server-level features 200, such as those shown in Figure 2, can be associated with a unified fanless radiator 250 of a data center cooling system. In at least one embodiment, Figure 2 is a block diagram showing a plan view of a server tray 202 and an associated unified fanless radiator 250. In at least one embodiment, the radiator 250 is associated with the server tray or enclosure 202, and therefore with one or more racks in a data center. In at least one embodiment, the server tray or enclosure 202 includes one or more flow loops 214A, B, which carry auxiliary coolant from a rack manifold of the rack in which the server tray or enclosure 202 is mounted. In at least one embodiment, the radiator 250 has the form factor of one or more server enclosures or trays. In at least one embodiment, auxiliary coolant may flow into the server tray or enclosure 202 via a server manifold 204. In at least one embodiment, the server manifold 204 keeps the inlet coolant to cold plates 210A-D separate from the outlet coolant from cold plates 210A-D, or may buffer all inlet and outlet coolant to achieve a uniform temperature.
[0069] In at least one embodiment, cold plates 210A-D are associated with computing devices 220A-D. In at least one embodiment, auxiliary coolant may flow into manifold 204, reach cold plates 210A, B via inlet line 210, and exit cold plates 210A, B via intermediate line 216 and outlet line 212. In at least one embodiment, the second flow loop 214B may be functionally similar to the first flow loop 214A, or may be supported by a different coolant source and server manifold than the first flow loop 214A. In at least one embodiment, when the second flow loop 214B is supported by a different coolant source, the different coolants flow into and out of server manifold 204, which has separate lines 206A, B to handle them in the same manner as for the first flow loop 214A. In at least one embodiment, inlet lines 206A, B and outlet lines 208A, B are coupled to radiator 250 via flow diverters.
[0070] In at least one embodiment, a flow diverter is, or includes, a flow controller. In at least one embodiment, the flow controller is adapted to control the flow between the server tray 202 and the radiator 250 via lines 264 and 266 on the radiator side and via inlet lines 206A, B and outlet lines 208A, B on the server tray side. In at least one embodiment, the flow between the server tray 202 and the radiator 250 passes through a rack manifold, such as the rack manifold of Figure 3A. In at least one embodiment, the radiator 250 has a first portion or region 260 having an auxiliary cooling circuit 262. In at least one embodiment, the general region outside region 260 is a second portion or region that includes fins 258. In at least one embodiment, any region of the radiator having the auxiliary cooling circuit 262 may be referred to as the first region, and any region of the radiator having the fins 258 may be referred to as the second region.
[0071] In at least one embodiment, the fins 258 in the second region function as an air-to-liquid heat exchanger. In at least one embodiment, the main cooling circuit 256 passes through the second region and has an inlet line 252 and an outlet line 254 for the passage of main coolant. In at least one embodiment, the auxiliary cooling circuit 262 in the first region functions as a liquid-to-liquid heat exchanger. In at least one embodiment, the main coolant in the main cooling circuit passing through both the first and second regions can originate from a cooling facility outside the data center. In at least one embodiment, the main coolant in the main circuit 256 is adapted to absorb first heat from one or more racks via the fins and to absorb second heat from the auxiliary cooling circuit through thermal contact with the auxiliary coolant. In at least one embodiment, this contact can be via a conductive interface or a heat dissipation interface.
[0072] In at least one embodiment, the first region 260 may be adjacent to the inlet where the main cooling circuit enters the radiator 250. In at least one embodiment, this allows for efficient heat removal from the auxiliary coolant. In at least one embodiment, the main cooling circuit 256 traverses the first region 260 and then traverses the second region, which has the multiple fins of the radiator 250. In at least one embodiment, the auxiliary cooling circuit is located in the first region near the inlet of a first branch of the main cooling circuit into the radiator. In at least one embodiment, a second branch of the main cooling circuit traverses the second region having the multiple fins of the radiator. In at least one embodiment, this gives the main cooling circuit at least two parallel branches. In at least one embodiment, the parallel branches allow the main coolant to flow into the first and second regions simultaneously, so that efficient heat removal occurs in both regions at the same time. In at least one embodiment, the efficiency of heat removal depends at least on the temperature difference available between the main coolant and each of the first and second regions when the main coolant enters. In at least one embodiment, when the main coolant acts in series and first absorbs heat from the first region, its capacity to absorb heat from the second region is reduced to whatever heat it can still absorb before it reaches saturation. In at least one embodiment, after saturation, the main coolant may not be able to effectively remove heat from any region.
[0073] In at least one embodiment, the first heat is radiated in part from computing devices 220A-D in server tray 202. In at least one embodiment, heat is radiated away as a result of fans and/or heat sinks associated with devices 220A-D. In at least one embodiment, heat is radiated, for example, at the top cover of server tray 202 as a result of server fans associated with server tray 202. In at least one embodiment, heat is radiated due to wall fans within the server associated with the rack or due to wall fans mounted on the rack itself. In at least one embodiment, if the radiator is located above the rack, heat radiates upward toward radiator 250. In at least one embodiment, the radiator has an opening 268 that allows heat to rise through the fins 258. In at least one embodiment, the fins 258 retain heat. In at least one embodiment, the heat is absorbed by the main coolant in the main cooling circuit 256 as the main coolant flows through the radiator 250 along a tortuous path.
[0074] In at least one embodiment, the radiator 250 has a second portion or region 260 serving as a liquid-to-liquid heat exchanger. In at least one embodiment, auxiliary coolant may enter directly or via a rack manifold into the auxiliary circuit 262 of the liquid-to-liquid heat exchanger formed by the auxiliary circuit 262 and the main circuit below (or above) the main cooling circuit 256. In at least one embodiment, a separate branch path of the main cooling circuit may serve the first and second regions independently and simultaneously. In at least one embodiment, the liquid-to-liquid heat exchanger in the second portion or region 260 enables at least one auxiliary cooling circuit 262 with auxiliary coolant to exchange a second heat with the main cooling circuit 256 (or a branch from the main cooling circuit 256). In at least one embodiment, the second heat is associated with at least one computing component 220A-D of one or more racks. In at least one embodiment, another server-level CDU may exchange heat between the first coolant associated with the cold plate 210A-D and the second coolant associated with the rack manifold, and the second coolant may be operatively used as an auxiliary coolant cooled in the radiator 250. In at least one embodiment, this arrangement is equivalent to guiding auxiliary coolant from a server tray (with or without rack manifold).
[0075] In at least one embodiment, the main cooling loop may branch into two streams. In at least one embodiment, a first stream of the main cooling loop may be used to absorb heat from fins 258, and a second stream of the main cooling loop may be used to absorb heat from auxiliary cooling loop 262. In at least one embodiment, the two streams make the initial temperatures of the main coolant reaching the respective first and second regions of the radiator 250 equal. In at least one embodiment, this configuration represents a parallel cooling configuration for the radiator 250. In at least one embodiment, when a single stream of the main cooling loop is implemented in the radiator 250, the main coolant first absorbs heat from auxiliary cooling loop 262 and then continues to absorb or extract heat from fins 258. In at least one embodiment, this configuration represents a series cooling configuration for the radiator 250. In at least one embodiment, the inlet line 252 and outlet line 254 of the main cooling loop 256 branch from a larger main cooling loop associated with a cooler or cooling facility outside the data center hosting the rack. In at least one embodiment, another branch of the larger main cooling loop may be associated with a CDU.
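The difference between the parallel and series configurations can be illustrated with a simple steady-state energy balance. The following is a minimal sketch only and is not part of the described embodiments: the function names, the water-like specific heat, the equal flow split, and all numeric values are assumptions chosen for illustration.

```python
# Illustrative energy balance only; names and numbers are assumptions.
CP_WATER = 4186.0  # J/(kg*K), assumed specific heat of a water-like main coolant


def temp_rise(heat_w, mass_flow_kg_s, cp=CP_WATER):
    """Temperature rise (K) of a coolant stream absorbing heat_w watts."""
    return heat_w / (mass_flow_kg_s * cp)


def series_temps(t_in, q_liquid_w, q_air_w, mass_flow_kg_s):
    """Single stream: liquid-to-liquid region first, then the fins.

    Returns (inlet temp at liquid-to-liquid region, inlet temp at fins, outlet temp).
    """
    t_after_liquid = t_in + temp_rise(q_liquid_w, mass_flow_kg_s)
    t_out = t_after_liquid + temp_rise(q_air_w, mass_flow_kg_s)
    return t_in, t_after_liquid, t_out


def parallel_temps(t_in, q_liquid_w, q_air_w, mass_flow_kg_s, fin_fraction=0.5):
    """Two branches: both regions receive main coolant at the same temperature t_in.

    Returns (inlet temp at liquid-to-liquid region, inlet temp at fins, mixed outlet temp).
    """
    m_fin = mass_flow_kg_s * fin_fraction
    m_liq = mass_flow_kg_s - m_fin
    t_fin_out = t_in + temp_rise(q_air_w, m_fin)
    t_liq_out = t_in + temp_rise(q_liquid_w, m_liq)
    # Flow-weighted mixing of the two branch outlets.
    t_mixed = (m_fin * t_fin_out + m_liq * t_liq_out) / mass_flow_kg_s
    return t_in, t_in, t_mixed
```

Under these assumptions, the series case delivers warmer coolant to the fins than to the liquid-to-liquid region, while the parallel case delivers coolant at the same inlet temperature to both regions. The mixed outlet temperature is the same in both cases because the total heat absorbed is the same.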
[0076] In at least one embodiment, the rack-level feature 300 shown in Figure 3A can be associated with a unified fanless radiator 320 of a data center cooling system. In at least one embodiment, the rack-level feature 300 shows a side view of the radiator of Figure 2. In at least one embodiment, the curved path of the main cooling circuit 340 within the radiator 320 is diagonally oriented. In at least one embodiment, the main cooling circuit 340 within the radiator 320 is horizontally or vertically oriented, and may employ paths other than the curved path. In at least one embodiment, rack-level feature 300 includes a rack 302 having branches 304, 306 for one or more rack manifolds or rack cooling manifolds 314A, 314B. In at least one embodiment, rack manifolds 314A, 314B are shown as single manifolds for coolant inlet and outlet, but a single manifold on one of the branches 304, 306 and having multiple internal channels may be used to receive and discharge coolant from auxiliary cooling circuits. In at least one embodiment, the channels allow coolant to flow from a drain manifold to a server manifold or directly to a server tray, and allow coolant to flow from a server manifold or server tray to a drain manifold.
[0077] In at least one embodiment, rack manifolds 314A, 314B have branched inlets 310A, B and outlets 312A, B. In at least one embodiment, the branched inlets and outlets enable a local cooling loop from rack 302 to radiator 320 and back to rack 302; or a data center cooling loop from rack 302 to CDU and back to rack 302. In at least one embodiment, both the local cooling loop and the data center cooling loop can operate simultaneously using different levels of coolant directed to radiator 320 and CDU. In at least one embodiment, only one of the local cooling loop or the data center cooling loop can be operated at any given time.
[0078] In at least one embodiment, auxiliary coolant from the CDU may enter the inlet-side rack manifold 314A via a branched inlet 310A and inlet line 310. In at least one embodiment, an auxiliary cooling loop without a CDU may exist between the radiator 320 and the rack 302. In at least one embodiment, the auxiliary coolant continues to cool one or more computing devices 332 within the server tray 308 or cools the server tray 308 via line 320. In at least one embodiment, the auxiliary coolant exits via line 318 back to the outlet-side rack manifold 314B. In at least one embodiment, these rack manifolds for entry and exit are on the same branch 304, 306. In at least one embodiment, the auxiliary coolant exits the rack manifold 314B via outlet line 312 and a branched outlet 312A.
[0079] In at least one embodiment, flow controllers 310C and 312C implement a local cooling loop between rack 302 and radiator 320. In at least one embodiment, flow controller 312C on the outlet side of rack manifold 314B diverts auxiliary coolant to auxiliary loop 338 of radiator 320. In at least one embodiment, flow controllers 310C and 312C may be at least one flow controller for diverting auxiliary coolant from a first path including CDU and associated with the main cooling loop to a second path associated with the radiator. In at least one embodiment, main coolant from the main cooling loop associated with the cooling facility enters radiator 320 via inlet line 322. In at least one embodiment, a portion of the main coolant flows through branch 326, while the remainder flows through another branch 340 of the main cooling loop. In at least one embodiment, this allows simultaneous cooling of heat sinks and auxiliary coolant in auxiliary cooling loop 338.
[0080] In at least one embodiment, the main cooling loop traverses a plurality of fins 324 of the radiator 320 horizontally, vertically, or diagonally. In at least one embodiment, the plurality of fins 324 retain heat from one or more racks 302. In at least one embodiment, the main coolant absorbs a first heat from the rack 302, wherein the first heat is represented at least in part by the heat retained in the plurality of fins 324 of the radiator. In at least one embodiment, hot air 336 carrying heat from one or more computing devices 332 (via associated heat sinks and / or fans 334), server tray fans 330, or other rack-related fans can pass through the fins 324 of the radiator 320. In at least one embodiment, the fins 324 retain heat from the hot air 336. In at least one embodiment, the plurality of fins 324 are adapted to allow the main cooling loop to coil through the plurality of fins. In at least one embodiment, the fins are in physical contact with the main cooling loop to achieve conductive heat transfer, or in proximity to the main cooling loop to achieve radiative heat transfer. In at least one embodiment, the plurality of fins 324 of the radiator 320 are aligned with the heat-generating features or surfaces 332 of one or more racks 302, so that the fins can effectively retain heat from the heat-generating surfaces. In at least one embodiment, effectiveness is achieved by allowing as much hot air 336 as possible to engage with the surfaces of the fins 324.
[0081] In at least one embodiment, the auxiliary cooling loop is adapted to provide supplemental or economical cooling concurrently with or separate from a second auxiliary cooling loop associated with a cooling distribution unit (CDU) of a data center cooling system. In at least one embodiment, the radiator 320 can be used in a mobile data center environment without access to high-capacity cooler facilities. However, in at least one embodiment, the radiator 320 can be used to support an existing data center cooling system for economical or supplemental cooling.
[0082] In at least one embodiment, the radiator can be engaged when it is determined that the maximum cooling capacity of the radiator 320 can address the cooling requirements or thermal characteristics. In at least one embodiment, the radiator 320 can be engaged by the provided flow controllers 310C, 312C, enabling a local cooling loop having a path from rack 302 to radiator 320 and back to rack 302. In at least one embodiment, this represents economical cooling in a data center cooling system. In at least one embodiment, the radiator can be engaged when it is determined that the maximum cooling capacity of the radiator 320 can only partially address the cooling requirements or thermal characteristics, so that a CDU is also engaged. In at least one embodiment, the radiator 320 can be engaged together with a CDU via flow controllers 310C, 312C. In at least one embodiment, the result is a combined local and data center cooling loop having a first path for a portion of the auxiliary coolant from rack 302 to radiator 320 and back to rack 302, and a second path for another portion of the auxiliary coolant from rack 302 to a CDU (e.g., as in Figure 4) and back to rack 302. In at least one embodiment, this represents supplemental cooling in the data center cooling system.
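The choice between the two modes can be sketched as a simple capacity comparison. This is a hedged illustration rather than the disclosed control logic: it assumes that economical cooling corresponds to the radiator-only local loop and that supplemental cooling corresponds to the radiator combined with a CDU, and the names `CoolingMode`, `select_cooling_mode`, and `split_load`, as well as the wattage inputs, are hypothetical.

```python
from enum import Enum


class CoolingMode(Enum):
    # Assumed reading of the two modes described in the text.
    ECONOMICAL = "radiator only"       # local loop: rack -> radiator -> rack
    SUPPLEMENTAL = "radiator and CDU"  # combined local and data center loops


def select_cooling_mode(required_w, radiator_max_w):
    """Pick a mode by comparing the cooling requirement with radiator capacity."""
    if required_w <= radiator_max_w:
        return CoolingMode.ECONOMICAL
    return CoolingMode.SUPPLEMENTAL


def split_load(required_w, radiator_max_w):
    """Return (watts handled by the radiator, watts left for the CDU)."""
    to_radiator = min(required_w, radiator_max_w)
    return to_radiator, required_w - to_radiator
```

For example, with an assumed radiator capacity of 8 kW, a 5 kW requirement selects the radiator-only mode, while a 12 kW requirement selects the combined mode and leaves 4 kW for the CDU path.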
[0083] In at least one embodiment, flow controllers 310C and 312C are associated with at least one auxiliary cooling circuit to control supplemental or economical cooling by allowing additional coolant from the second auxiliary cooling circuit to flow concurrently with or separately from the auxiliary coolant. In at least one embodiment, the radiator may be located above or below rack 302. In at least one embodiment, the main cooling circuit may traverse above one or more racks or alongside a manifold associated with the auxiliary cooling circuit. In at least one embodiment, when the radiator 320 is located below rack 302, fans associated with the computing device, server, and rack may be adapted to draw hot air toward the bottom of the rack and below.
[0084] In at least one embodiment, the liquid-to-liquid heat exchanger feature 350 shown in Figure 3B can be associated with a unified fanless radiator 320 in a data center cooling system. In at least one embodiment, the liquid-to-liquid heat exchanger of feature 350 shown in Figure 3B can be used as an alternative to heat exchanger 260 of Figure 2. In at least one embodiment, feature 350 represents a plate-type liquid-to-liquid heat exchanger. In at least one embodiment, feature 350 includes a front cover 352 having a gasket or seal on its periphery, and a rear cover 354 having a corresponding gasket or seal on its periphery. In at least one embodiment, plates 356 and 358 are present to distribute main or auxiliary coolant via an activation portion 370 of each respective plate.
[0085] In at least one embodiment, covers 352, 354 and plates 356, 358 include through ports that may be self-sealing ports. In at least one embodiment, main coolant flows into the heat exchanger through port 362A, then flows via path 364A and path 364B, and exits the heat exchanger via port 362B. In at least one embodiment, auxiliary coolant flows into the heat exchanger through port 360A, then flows via path 366A and path 366B, and exits the heat exchanger via port 360B. In at least one embodiment, alternating plates among plates 356, 358 are used for either main or auxiliary coolant. In at least one embodiment, plate 356 has an activation portion 370 that guides main coolant from path 364A to the activation portion of the corresponding plate. In at least one embodiment, a gasket or seal 372 prevents main coolant from penetrating into the paths 366A, 366B of the auxiliary coolant. In at least one embodiment, the activation portion 370 directs the main coolant from the activation portion to path 364B for exiting the heat exchanger. In at least one embodiment, plate 358 has a similar activation portion that directs auxiliary coolant from path 366A, which receives the auxiliary coolant, to path 366B for auxiliary coolant outflow. In at least one embodiment, plate 358 for auxiliary coolant has corresponding seals or gaskets to prevent the auxiliary coolant from mixing with the main coolant.
[0086] In at least one embodiment, the activation portion 370 spreads the main or auxiliary coolant over as much surface area as possible within the respective plate. In at least one embodiment, the plates 356 for the main coolant sandwich the plate 358 for the auxiliary coolant. In at least one embodiment, the surfaces of the plates are capable of indirectly transferring heat to the main coolant, or of conducting heat, to cool the auxiliary coolant. In at least one embodiment, auxiliary coolant from the branched outlets of the rack manifold enters at inlet port 360A and exits at outlet port 360B for recirculation within the rack or additional cooling in the CDU. In at least one embodiment, the main coolant enters at inlet port 362A and exits at outlet port 362B. In at least one embodiment, the respective coolants are distributed in the respective activation portions of the respective plates, providing a large surface area between the plates for liquid-to-liquid heat exchange. In at least one embodiment, the main coolant exiting outlet port 362B is directed to an air-to-liquid heat exchanger to receive further heat from the fins 324. In at least one embodiment, the main coolant exiting the outlet port 362B is directed from the radiator to the cooling facility, or is directed back to the main cooling circuit at a mixing point, where it mixes with the main coolant that was previously branched from the main cooling circuit to the air-to-liquid heat exchanger and has returned to that mixing point.
[0087] In at least one embodiment, the data center-level feature 400 shown in Figure 4 may be associated with a unified fanless radiator 420 of a data center cooling system. In at least one embodiment, the main cooling loop 422 branches into 422A, 422B toward the CDU 406 and the radiator 420. In at least one embodiment, any portion of the cooling loop in the data center that includes piping for the main coolant cooled in the cooler facility 408 is referred to as the main cooling loop, and is used as at least a part or component of the main cooling loop, unless otherwise stated. In at least one embodiment, the radiator 420 is used without the CDU 406. In at least one embodiment, the main cooling loop 422 thus circulates the main coolant only to the radiator 420 and back to the cooler facility 408. In at least one embodiment, these branches from the main cooling loop 422 are still referred to herein as the main cooling loop, either individually or collectively.
[0088] In at least one embodiment, the main cooling circuits 422, 422A use an air-to-liquid heat exchanger 420A to absorb first heat from one or more racks 404 using a main coolant. In at least one embodiment, the main coolant is also used to exchange second heat from an auxiliary coolant in an auxiliary cooling circuit using a liquid-to-liquid heat exchanger 420B. In at least one embodiment, the main cooling circuit is associated with a cooler facility 408, and the auxiliary cooling circuit is associated with at least one computing device in one or more racks 404, such as shown in Figures 2 and 3.
[0089] In at least one embodiment, the auxiliary cooling circuit refers to all piping carrying auxiliary coolant or any coolant different from the main coolant. In at least one embodiment, there may be multiple heat exchangers and other components isolating the auxiliary coolant to provide multiple smaller cooling circuits. In at least one embodiment, these multiple smaller cooling circuits are still referred to herein as auxiliary cooling circuits, either individually or all together.
[0090] In at least one embodiment, the main cooling circuit traverses multiple fins of a radiator 420 having an air-to-liquid heat exchanger 420A and a liquid-to-liquid heat exchanger 420B. In at least one embodiment, the multiple fins retain heat from at least one computing device, and the main coolant absorbs the first heat from one or more racks from the heat retained in the multiple fins. In at least one embodiment, auxiliary coolant may be connected from the CDU 406 via an inlet line 412 and an outlet line 414 between the CDU 406 and the manifold 410. In at least one embodiment, a flow controller 424 allows the CDU 406 to be connected to or disconnected from a data center cooling system, which may then rely solely on the radiator 420.
[0091] In at least one embodiment, the flow between the manifold 410 and the rack 404 is supported by corresponding inlet lines 416 and outlet lines 418. In at least one embodiment, a flow controller 424 may be used with a flow controller on each of the inlet lines 416 and outlet lines 418, and all flow controllers associated with the rack manifold may be used to control supplemental or economical cooling in the data center cooling system. In at least one embodiment, some of these flow controllers may be removed from or disconnected from the auxiliary coolant in the CDU 406, such that the rack 404 is engaged only with the radiator 420. In at least one embodiment, if the CDU 406 is re-engaged, some of these flow controllers may be used with the flow controllers associated with the rack manifold to achieve co-current or separate flow of the supplemental coolant, such that both the radiator 420 and the CDU 406 are available simultaneously. In at least one embodiment, the supplemental coolant may be an auxiliary coolant similar to the auxiliary coolant used in the rack 404 with the radiator 420. In at least one embodiment, the auxiliary cooling circuit with CDU may be referred to as the second auxiliary cooling circuit, the auxiliary cooling circuit of the radiator forms the first auxiliary cooling circuit, and the additional coolant may flow in combination with or separately from the second coolant.
[0092] In at least one embodiment, at least one processor can be used with each of the respective flow controllers discussed with reference to Figures 2-4 to engage or disengage the respective loops, CDUs, radiators, and branch lines for the main cooling loop and for one or more auxiliary cooling loops. In at least one embodiment, the electrical components of the flow controller may receive signals from at least one processor and may cause a mechanical response to throttle or increase the flow rate of coolant through the respective loops, to and from the CDU, to and from the radiator, and through the branch lines for the main cooling loop and for the auxiliary cooling loops.
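The interaction between a processor and a flow controller can be sketched as a signal that throttles or increases flow. The following is a minimal sketch under stated assumptions: the `FlowController` class, the fractional valve opening, the proportional rule in `control_step`, and the gain value are all hypothetical and are not specified in the description.

```python
from dataclasses import dataclass


@dataclass
class FlowController:
    """Hypothetical valve model: opening is a fraction between 0.0 and 1.0."""
    name: str
    opening: float = 0.0

    def apply_signal(self, delta):
        """Apply a signed change sent by the processor, clamped to [0.0, 1.0]."""
        self.opening = max(0.0, min(1.0, self.opening + delta))
        return self.opening


def control_step(controller, measured_c, target_c, gain=0.05):
    """Proportional step: open further when hotter than target, throttle when cooler."""
    return controller.apply_signal(gain * (measured_c - target_c))
```

In this sketch, a coolant temperature above the target increases the valve opening and a temperature below the target throttles it, with the opening never leaving the fully closed to fully open range.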
[0093] In at least one embodiment, each of the at least one processor has inference and / or training logic 1815, code and / or data storage 1801 for storing forward and / or output weights and / or input / output data, and / or other parameters for configuring neurons or layers of a neural network trained and / or used for inference in aspects of one or more embodiments. In at least one embodiment, training logic 1815 may include or be coupled to code and / or data storage 1801 to store graph code or other software to control timing and / or sequence, wherein weight and / or other parameter information is loaded to configure the logic, including integer and / or floating-point units (collectively referred to as arithmetic logic units (ALUs)). In at least one embodiment, code (such as graph code) loads weight or other parameter information into the processor ALU based on the architecture of the neural network to which such code corresponds. In at least one embodiment, code and / or data storage 1801 stores weight parameters and / or input / output data for each layer of a neural network that is trained or used in conjunction with one or more embodiments during forward propagation of input / output data and / or weight parameters during training and / or inference using aspects of one or more embodiments. In at least one embodiment, any portion of the code and / or data storage 1801 may be included together with other on-chip or off-chip data storage devices, including the processor's L1, L2, or L3 cache memory or system memory.
[0094] In at least one embodiment, the inference and / or training logic 1815 of at least one processor is part of a building management system (BMS) for controlling flow controllers at one or more locations at the server, rack, and row levels. In at least one embodiment, determining supplemental or economical cooling (or both); or engaging an air-to-liquid heat exchanger or a liquid-to-liquid heat exchanger (or both) may be provided to one or more neural networks of the inference and / or training logic 1815 to enable the neural networks to infer which flow controllers should smoothly engage or disengage. In at least one embodiment, the neural networks may be trained to make inferences based on previously associated thermal characteristics or cooling requirements from computing devices, servers, or racks, and the cooling capacity or capability indicated by radiators in each and both of their heat exchangers. In at least one embodiment, previous cooling requirements satisfied by the air-to-liquid heat exchanger may enable the neural networks to make similar inferences about future similar cooling requirements to be satisfied (considering minor variations therefrom) by adjusting the flow controllers to engage the air-to-liquid heat exchanger. In at least one embodiment, similarly, prior cooling requirements satisfied by a liquid-to-liquid heat exchanger can enable one or more neural networks to make similar inferences about future similar cooling requirements (considering minor variations therefrom) by adjusting flow controllers to engage liquid-to-liquid heat exchangers. In at least one embodiment, one or more neural networks can determine and send selections to flow controllers (such as to electrical components associated with the flow controllers) to engage or disengage appropriate heat exchangers.
[0095] In at least one embodiment, Figure 5 shows a method 500 associated with the data center cooling system of Figures 2 to 4. In at least one embodiment, step 502 provides a radiator associated with one or more racks in the data center. In at least one embodiment, a further step 504 of the method enables a first portion of the radiator to function as an air-to-liquid heat exchanger. In at least one embodiment, step 506 of the method enables a second portion of the radiator to function as a liquid-to-liquid heat exchanger. In at least one embodiment, step 508 determines whether the current thermal characteristics of one or more racks require supplemental or economical cooling.
[0096] In at least one embodiment, step 510 may be performed when step 508 results in a positive determination. In at least one embodiment, the heat exchanger may be enabled via steps 504 and 506. In at least one embodiment, step 510 enables the main cooling circuit using the main coolant to draw away first heat from one or more racks. Further, in at least one embodiment, step 512 enables at least one auxiliary cooling circuit with an auxiliary coolant to exchange second heat with the main cooling circuit, the second heat being associated with at least one computing component in one or more racks.
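The ordering of steps 502 to 512 can be sketched as a short routine. This only illustrates the described sequence: the function name, the boolean argument standing in for the determination of step 508, and the wording of the step labels are assumptions. The description does not state what happens on a negative determination, so the sketch simply stops there.

```python
def method_500_steps(needs_cooling):
    """Return the ordered steps that would run for a given step-508 outcome."""
    steps = [
        "502: provide radiator associated with one or more racks",
        "504: enable first portion as air-to-liquid heat exchanger",
        "506: enable second portion as liquid-to-liquid heat exchanger",
        "508: determine whether supplemental or economical cooling is required",
    ]
    if needs_cooling:
        # Steps 510 and 512 follow only a positive determination in step 508.
        steps.append("510: main cooling circuit draws first heat from racks")
        steps.append("512: auxiliary circuit exchanges second heat with main circuit")
    return steps
```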
[0097] In at least one embodiment, another step or sub-step of method 500 includes using at least one flow controller to divert auxiliary coolant from a branch that returns to a cooling distribution unit (CDU) associated with the main cooling circuit, such that the auxiliary coolant flows to the radiator instead. In at least one embodiment, another step or sub-step of method 500 includes enabling the main cooling circuit to traverse a plurality of fins of the radiator. In at least one embodiment, the plurality of fins retain heat from one or more racks. In at least one embodiment, the main coolant is adapted to absorb first heat from the heat retained in the plurality of fins of the radiator. In at least one embodiment, another step or sub-step of method 500 includes positioning at least one auxiliary cooling circuit in a first region near the inlet of the main cooling circuit into the radiator. In at least one embodiment, the main cooling circuit is adapted to traverse the first region and then traverse the second region having the plurality of fins of the radiator. In at least one embodiment, another step or sub-step of method 500 includes positioning at least one auxiliary cooling circuit in a first region near the inlet of a first branch of the main cooling circuit into the radiator. In at least one embodiment, a second branch of the main cooling circuit is provided to traverse the second region having the plurality of fins of the radiator.
[0098] In at least one embodiment, a further step or sub-step of method 500 includes implementing a plurality of fins for the radiator. In at least one embodiment, the plurality of fins retain heat from fans associated with computing devices and server trays of one or more racks. In at least one embodiment, the plurality of fins are adapted to allow a main cooling loop to coil through the plurality of fins. In at least one embodiment, a further step or sub-step of method 500 includes aligning the plurality of fins of the radiator with the heat-generating surfaces of one or more racks and enabling the plurality of fins to retain heat from the heat-generating surfaces. In at least one embodiment, a further step or sub-step of method 500 includes providing at least one auxiliary cooling loop as supplemental or economical cooling, said supplemental or economical cooling operating concurrently with or separately from a second auxiliary cooling loop associated with a cooling distribution unit (CDU) of a data center cooling system. In at least one embodiment, a further step or sub-step of method 500 includes controlling the supplemental or economical cooling using a flow controller associated with the at least one auxiliary cooling loop by enabling additional coolant from the second auxiliary cooling loop to flow concurrently with or separately from the auxiliary coolant. In at least one embodiment, another step or sub-step of method 500 includes positioning a radiator having an air-to-liquid heat exchanger and a liquid-to-liquid heat exchanger above or below one or more racks. In at least one embodiment, the main cooling circuit may cross above one or more racks or alongside a manifold associated with an auxiliary cooling circuit.
[0099] Servers and data centers
[0100] The following figures illustrate, but are not limited to, systems based on exemplary network servers and data centers that can be used to implement at least one embodiment.
[0101] Figure 6 illustrates a distributed system 600 according to at least one embodiment. In at least one embodiment, the distributed system 600 includes one or more client computing devices 602, 604, 606, and 608 configured to execute and operate client applications, such as web browsers, proprietary clients, and / or variations thereof, on one or more networks 610. In at least one embodiment, a server 612 may be communicatively coupled to remote client computing devices 602, 604, 606, and 608 via network 610.
[0102] In at least one embodiment, server 612 may be adapted to run one or more services or software applications, such as services and applications that manage session activity for single sign-on (SSO) access across multiple data centers. In at least one embodiment, server 612 may also provide other services, or the software applications may include non-virtual and virtual environments. In at least one embodiment, these services may be provided as web-based services or cloud services or under a Software as a Service (SaaS) model to users of client computing devices 602, 604, 606, and / or 608. In at least one embodiment, users operating client computing devices 602, 604, 606, and / or 608 may in turn use one or more client applications to interact with server 612 to utilize the services provided by these components.
[0103] In at least one embodiment, software components 618, 620, and 622 of system 600 are implemented on server 612. In at least one embodiment, one or more components of system 600 and / or the services provided by these components may also be implemented by one or more client computing devices 602, 604, 606, and / or 608. In at least one embodiment, a user operating a client computing device can then utilize one or more client applications to use the services provided by these components. In at least one embodiment, these components may be implemented using hardware, firmware, software, or a combination thereof. It should be understood that various different system configurations are possible and may differ from the distributed system 600. Therefore, the embodiment shown in Figure 6 is at least one embodiment of a distributed system for implementing the system of the embodiments, and is not intended to be limiting.
[0104] In at least one embodiment, client computing devices 602, 604, 606, and / or 608 may include different types of computing systems. In at least one embodiment, the client computing device may include a portable handheld device (e.g., a cellular phone, computing tablet, or personal digital assistant (PDA)) or a wearable device (e.g., a Google head-mounted display), running software such as Microsoft Windows and / or various mobile operating systems such as iOS, Windows Phone, Android, BlackBerry 10, Palm OS, and / or variants thereof. In at least one embodiment, the device may support different applications, such as various internet-related applications, email, and short message service (SMS) applications, and may use various other communication protocols. In at least one embodiment, the client computing device may also include a general-purpose personal computer, which, by way of at least one embodiment, includes personal computers and / or laptops running different versions of Microsoft, Apple, and / or Linux operating systems.
[0105] In at least one embodiment, the client computing device can be a workstation computer running any of a variety of commercially available operating systems or UNIX-like operating systems, including but not limited to various GNU / Linux operating systems such as Google Chrome OS. In at least one embodiment, the client computing device may further include electronic devices capable of communicating via one or more networks 610, such as thin client computers, internet-enabled gaming systems (e.g., a Microsoft Xbox game console with or without a gesture input device), and / or personal messaging devices. Although the distributed system 600 in Figure 6 is shown as having four client computing devices, any number of client computing devices can be supported. Other devices (such as devices with sensors) can interact with the server 612.
[0106] In at least one embodiment, network 610 in distributed system 600 can be any type of network capable of supporting data communication using any of the various available protocols, including but not limited to TCP / IP (Transmission Control Protocol / Internet Protocol), SNA (Systems Network Architecture), IPX (Internetwork Packet Exchange), AppleTalk, and / or variations thereof. In at least one embodiment, network 610 can be a local area network (LAN), a network based on Ethernet, Token Ring, a wide area network, the Internet, a virtual network, a virtual private network (VPN), an intranet, an extranet, a public switched telephone network (PSTN), an infrared network, a wireless network (e.g., a network operating under any of the IEEE 802.11 protocol suite and / or any other wireless protocol), and / or any combination of these and / or other networks.
[0107] In at least one embodiment, server 612 may consist of one or more general-purpose computers, dedicated server computers (including, in at least one embodiment, PC (personal computer) servers, mid-range servers, mainframe computers, and rack servers), server farms, server clusters, or any other suitable arrangement and / or combination. In at least one embodiment, server 612 may include one or more virtual machines running a virtual operating system or other computing architectures involving virtualization. In at least one embodiment, one or more flexible pools of logical storage devices may be virtualized to maintain virtual storage devices for the server. In at least one embodiment, the virtual network may be controlled by server 612 using software-defined networking. In at least one embodiment, server 612 may be adapted to run one or more services or software applications.
[0108] In at least one embodiment, server 612 can run any operating system, and any commercially available server operating system. In at least one embodiment, server 612 can also run any of a variety of additional server applications and / or mid-level applications, including an HTTP (Hypertext Transfer Protocol) server, an FTP (File Transfer Protocol) server, a CGI (Common Gateway Interface) server, etc. Servers, database servers, and / or variations thereof. In at least one embodiment, exemplary database servers include, but are not limited to, those commercially available from Oracle, Microsoft, Sybase, IBM (International Business Machines), and / or variations thereof.
[0109] In at least one embodiment, server 612 may include one or more applications to analyze and merge data feeds and / or event updates received from users of client computing devices 602, 604, 606, and 608. In at least one embodiment, data feeds and / or event updates may include, but are not limited to, data received from one or more third-party information sources and continuous data streams. feed, Updates or real-time updates may include real-time events related to sensor data applications, financial quotes, network performance measurement tools (e.g., network monitoring and business management applications), clickstream analysis tools, vehicle traffic monitoring, and / or their changes. In at least one embodiment, server 612 may also include one or more applications for displaying data feeds and / or real-time events via one or more display devices of client computing devices 602, 604, 606, and 608.
[0110] In at least one embodiment, the distributed system 600 may further include one or more databases 614 and 616. In at least one embodiment, the databases may provide mechanisms for storing information such as user interaction information, usage pattern information, adaptation rule information, and other information. In at least one embodiment, databases 614 and 616 may reside in various locations. In at least one embodiment, one or more of databases 614 and 616 may reside on non-transient storage media local to (and / or within) server 612. In at least one embodiment, databases 614 and 616 may be located remotely from server 612 and communicate with server 612 via a network-based or dedicated connection. In at least one embodiment, databases 614 and 616 may reside in a storage area network (SAN). In at least one embodiment, any necessary files for performing functions belonging to server 612 may be stored locally on server 612 and / or remotely as appropriate. In at least one embodiment, databases 614 and 616 may include relational databases, such as databases adapted to store, update, and retrieve data in response to SQL-formatted commands.
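As a minimal sketch of a relational database that stores and retrieves data in response to SQL-formatted commands, the following uses Python's built-in sqlite3 module with an in-memory database. The table name `user_interaction`, its columns, and the helper functions are hypothetical and chosen only for illustration; the source does not specify a schema or a database product for databases 614 and 616.

```python
import sqlite3


def open_usage_db(path=":memory:"):
    """Open a database and create a hypothetical user-interaction table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS user_interaction ("
        "client_id INTEGER NOT NULL, action TEXT NOT NULL)"
    )
    return conn


def record_interaction(conn, client_id, action):
    """Store one interaction using a parameterized SQL INSERT."""
    conn.execute(
        "INSERT INTO user_interaction (client_id, action) VALUES (?, ?)",
        (client_id, action),
    )
    conn.commit()


def actions_for(conn, client_id):
    """Retrieve the actions recorded for one client, in insertion order."""
    rows = conn.execute(
        "SELECT action FROM user_interaction WHERE client_id = ? ORDER BY rowid",
        (client_id,),
    )
    return [row[0] for row in rows]
```

Parameterized queries (the `?` placeholders) are used here so that values are never spliced into the SQL text; a remote or SAN-resident database would be reached through a different connection mechanism than the local file or in-memory store assumed in this sketch.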
[0111] Figure 7 shows an exemplary data center 700 according to at least one embodiment. In at least one embodiment, the data center 700 includes, but is not limited to, a data center infrastructure layer 710, a framework layer 720, a software layer 730, and an application layer 740.
[0112] In at least one embodiment, as shown in Figure 7, the data center infrastructure layer 710 may include a resource coordinator 712, grouped computing resources 714, and node computing resources (“node CRs”) 716(1)-716(N), where “N” represents any whole, positive integer. In at least one embodiment, node CRs 716(1)-716(N) may include, but are not limited to, any number of central processing units (“CPUs”) or other processors (including accelerators, field-programmable gate arrays (“FPGAs”), graphics processors, etc.), memory devices (e.g., dynamic read-only memory), storage devices (e.g., solid-state drives or disk drives), network input / output (“NW I / O”) devices, network switches, virtual machines (“VMs”), power modules, and cooling modules, etc. In at least one embodiment, one or more of node CRs 716(1)-716(N) may be servers having one or more of the aforementioned computing resources.
[0113] In at least one embodiment, the grouped computing resources 714 may include individual groups (not shown) of node CRs housed in one or more racks, or a plurality of racks (also not shown) housed in data centers in various geographic locations. The individual groups of node CRs within the grouped computing resources 714 may include computing, networking, memory, or storage resources that can be configured or allocated to support groups of one or more workloads. In at least one embodiment, several node CRs, including CPUs or processors, may be grouped within one or more racks to provide computing resources to support one or more workloads. In at least one embodiment, the one or more racks may also include any number of power modules, cooling modules, and network switches, in any combination.
[0114] In at least one embodiment, resource coordinator 712 may configure or otherwise control one or more node CRs 716(1)-716(N) and / or grouped computing resources 714. In at least one embodiment, resource coordinator 712 may include a Software Design Infrastructure (“SDI”) management entity for data center 700. In at least one embodiment, resource coordinator 712 may include hardware, software, or some combination thereof.
[0115] In at least one embodiment, as shown in Figure 7, the framework layer 720 includes, but is not limited to, a job scheduler 732, a configuration manager 734, a resource manager 736, and a distributed file system 738. In at least one embodiment, the framework layer 720 may include a framework supporting software 752 of the software layer 730 and / or one or more applications 742 of the application layer 740. In at least one embodiment, the software 752 or applications 742 may respectively include web-based service software or applications, such as services or applications provided by Amazon Web Services, Google Cloud, and Microsoft Azure. In at least one embodiment, the framework layer 720 may be, but is not limited to, a free and open-source software web application framework, such as Apache Spark™ (hereinafter referred to as "Spark"), which can utilize the distributed file system 738 for large-scale data processing (e.g., "big data"). In at least one embodiment, the job scheduler 732 may include a Spark driver to facilitate the scheduling of workloads supported by the various layers of the data center 700. In at least one embodiment, the configuration manager 734 may be able to configure different layers, such as the software layer 730 and the framework layer 720, including Spark and the distributed file system 738, for supporting large-scale data processing. In at least one embodiment, resource manager 736 is capable of managing clustered or grouped computing resources mapped to or allocated to support distributed file system 738 and job scheduler 732. In at least one embodiment, clustered or grouped computing resources may include grouped computing resources 714 on data center infrastructure layer 710. In at least one embodiment, resource manager 736 may coordinate with resource coordinator 712 to manage these mapped or allocated computing resources.
[0116] In at least one embodiment, the software 752 included in the software layer 730 may include software used by at least a portion of node CRs 716(1)-716(N), grouped computing resources 714, and / or the distributed file system 738 of the framework layer 720. One or more types of software may include, but are not limited to, Internet web page search software, email virus scanning software, database software, and streaming video content software.
[0117] In at least one embodiment, one or more applications 742 included in the application layer 740 may include one or more types of applications used by at least a portion of node CRs 716(1)-716(N), grouped computing resources 714, and / or the distributed file system 738 of the framework layer 720. One or more types of applications may include, but are not limited to, CUDA applications, 5G network applications, artificial intelligence applications, data center applications, and / or variations thereof.
[0118] In at least one embodiment, any of the configuration manager 734, resource manager 736, and resource coordinator 712 can implement any number and type of self-modification actions based on any amount and type of data acquired in any technically feasible manner. In at least one embodiment, self-modification actions can mitigate potentially poor configuration decisions by data center operators of data center 700 and can prevent underutilization and / or poor performance of the data center.
[0119] Figure 8 illustrates a client-server network 804 formed by a plurality of interconnected network server computers 802, according to at least one embodiment. In at least one embodiment, each network server computer 802 stores data accessible to other network server computers 802 and to client computers 806 and networks 808 linked to the wide area network 804. In at least one embodiment, the configuration of the client-server network 804 may change over time as client computers 806 and one or more networks 808 connect to and disconnect from the network 804, and as one or more backbone server computers 802 are added to or removed from the network 804. In at least one embodiment, the client-server network includes client computers 806 and networks 808 when they are connected to network server computers 802. In at least one embodiment, the term "computer" includes any device or machine capable of accepting data, applying prescribed processes to the data, and providing the results of those processes.
[0120] In at least one embodiment, the client-server network 804 stores information accessible to the network server computers 802, the remote networks 808, and the client computers 806. In at least one embodiment, the network server computers 802 are formed from mainframe computers, minicomputers, and / or microcomputers, each having one or more processors. In at least one embodiment, the server computers 802 are linked together via wired and / or wireless transmission media, such as conductive wires, fiber optic cables, and / or microwave transmission media, satellite transmission media, or other conductive, optical, or electromagnetic wave transmission media. In at least one embodiment, the client computers 806 access the network server computers 802 via similar wired or wireless transmission media. In at least one embodiment, a client computer 806 can be linked to the client-server network 804 using a modem and a standard telephone communication network. In at least one embodiment, alternative carrier systems (such as cable and satellite communication systems) can also be used to link to the client-server network 804. In at least one embodiment, other private or time-sharing carrier systems can be used. In at least one embodiment, the network 804 is a global information network, such as the Internet. In at least one embodiment, the network is a private intranet using protocols similar to those of the Internet but with added security measures and restricted access controls. In at least one embodiment, network 804 is a private or semi-private network using proprietary communication protocols.
[0121] In at least one embodiment, the client computer 806 is any end-user computer, and may also be a mainframe computer, minicomputer, or microcomputer with one or more microprocessors. In at least one embodiment, a server computer 802 may sometimes be used as a client computer to access another server computer 802. In at least one embodiment, the remote network 808 may be a local area network (LAN), a network added to a wide area network via an independent service provider (ISP) for the Internet, or another group of computers interconnected via wired or wireless transmission media with fixed or time-varying configurations. In at least one embodiment, the client computer 806 may link to and access network 804 independently or via a remote network 808.
[0122] Figure 9 illustrates a computer network 908 connecting one or more computers, according to at least one embodiment. In at least one embodiment, network 908 can be any type of electrically connected computer group, including, for example, the Internet, an intranet, a local area network (LAN), a wide area network (WAN), or an interconnected combination of these network types. In at least one embodiment, the connection within network 908 can be a remote modem, Ethernet (IEEE 802.3), Token Ring (IEEE 802.5), Fiber Distributed Data Interface (FDDI), Asynchronous Transfer Mode (ATM), or any other communication protocol. In at least one embodiment, the computing devices linked to the network can be desktop computers, servers, portable devices, handheld devices, set-top boxes, personal digital assistants (PDAs), terminals, or any other desired type or configuration. In at least one embodiment, depending on their functionality, network-connected devices can vary widely in terms of processing power, internal memory, and other performance characteristics.
[0123] In at least one embodiment, communication within the network, as well as communication to or from computing devices connected to the network, can be wired or wireless. In at least one embodiment, network 908 may at least partially comprise the worldwide public Internet, which typically connects multiple users according to the Transmission Control Protocol / Internet Protocol (TCP / IP) specification based on a client-server model. In at least one embodiment, a client-server network is the dominant model for communication between two computers. In at least one embodiment, a client computer (“client”) issues one or more commands to a server computer (“server”). In at least one embodiment, the server fulfills client commands by accessing available network resources and returning information to the client in accordance with the client commands. In at least one embodiment, client computer systems and network resources residing on the network server are assigned network addresses for identification during communication between network elements. In at least one embodiment, communication from other network-connected systems to the server will include the network address of the relevant server / network resource as part of the communication, such that the appropriate destination of the data / request is identified as the recipient. In at least one embodiment, when network 908 includes the global Internet, the network address is an IP address in TCP / IP format, which can at least partially route data to email accounts, websites, or other Internet tools residing on the server. In at least one embodiment, information and services residing on the web server can be made available to the web browser of the client computer by mapping a domain name (e.g., www.site.com) to the IP address of the web server.
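The mapping from a domain name to a server's IP address can be sketched as a simple lookup table. This is an illustration only, not the Domain Name System itself: the `NameTable` class and its methods are hypothetical, and the address used below comes from the 192.0.2.0/24 range reserved for documentation.

```python
import ipaddress


class NameTable:
    """Hypothetical in-memory table mapping domain names to IP addresses."""

    def __init__(self):
        self._records = {}

    def register(self, domain, address):
        # ip_address() raises ValueError if the address is malformed.
        self._records[domain.lower()] = ipaddress.ip_address(address)

    def resolve(self, domain):
        # Domain names are compared case-insensitively; None means "unknown".
        return self._records.get(domain.lower())


table = NameTable()
table.register("www.site.com", "192.0.2.10")
web_server_address = table.resolve("WWW.SITE.COM")
```

A client would then open its connection to `web_server_address` rather than to the name; in practice this lookup is delegated to a resolver rather than held in a local dictionary.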
[0124] In at least one embodiment, multiple clients 902, 904, and 906 are connected to network 908 via corresponding communication links. In at least one embodiment, each of these clients can access network 908 via any desired form of communication, such as via dial-up modem connection, cable link, digital subscriber line (DSL), wireless or satellite link, or any other form of communication. In at least one embodiment, each client can communicate using any machine compatible with network 908 (e.g., personal computer (PC), workstation, dedicated terminal, personal data assistant (PDA), or other similar device). In at least one embodiment, clients 902, 904, and 906 may or may not be located in the same geographical area.
[0125] In at least one embodiment, multiple servers 910, 912, and 914 are connected to network 908 to serve clients communicating with network 908. In at least one embodiment, each server is typically a powerful computer or device that manages network resources and responds to client commands. In at least one embodiment, the server includes computer-readable data storage media, such as hard disk drives and RAM, that store program instructions and data. In at least one embodiment, servers 910, 912, and 914 run applications that respond to client commands. In at least one embodiment, server 910 may run a web server application for responding to client requests for HTML pages and may also run a mail server application for receiving and routing emails. In at least one embodiment, other applications, such as an FTP server or media server for streaming audio / video data to clients, may also run on server 910. In at least one embodiment, different servers may be dedicated to performing different tasks. In at least one embodiment, server 910 may be a dedicated web server for managing website-related resources for different users, while server 912 may be dedicated to providing email management. In at least one embodiment, the other servers may be dedicated to a combination of two or more services typically available or provided over a network, such as media (audio, video, etc.), File Transfer Protocol (FTP), or other services. In at least one embodiment, each server may be located in the same location as, or a different location from, the other servers. In at least one embodiment, multiple servers may exist to perform mirroring tasks for users, thereby mitigating congestion or minimizing traffic directed to and from a single server. In at least one embodiment, servers 910, 912, and 914 are under the control of a web hosting provider that maintains and delivers third-party content over network 908.
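One way mirrored servers can spread traffic so that no single server carries it all is round-robin selection. The sketch below assumes that policy purely for illustration; the source does not state how requests are distributed among mirrors, and the `MirrorPool` class is hypothetical.

```python
import itertools


class MirrorPool:
    """Hypothetical round-robin selector over a set of mirrored servers."""

    def __init__(self, servers):
        servers = list(servers)
        if not servers:
            raise ValueError("at least one mirror is required")
        # cycle() repeats the sequence endlessly in its original order.
        self._cycle = itertools.cycle(servers)

    def next_server(self):
        """Return the mirror that should handle the next request."""
        return next(self._cycle)


pool = MirrorPool(["server-910", "server-912", "server-914"])
first_four = [pool.next_server() for _ in range(4)]
```

Round-robin ignores how busy each mirror is; a deployment that tracks load would need a different selection rule.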
[0126] In at least one embodiment, the web hosting provider delivers services to two different types of clients. In at least one embodiment, one type, which may be referred to as a browser, requests content such as web pages, email messages, video clips, etc., from servers 910, 912, 914. In at least one embodiment, a second type (which may be referred to as a user) hires the web hosting provider to maintain network resources (such as websites) and make them available to the browser. In at least one embodiment, the user contracts with the web hosting provider to make memory space, processor capacity, and communication bandwidth available to the network resources they desire, according to the amount of server resources the user expects to utilize.
[0127] In at least one embodiment, in order for a web hosting provider to serve both clients, the application managing network resources hosted on the server must be properly configured. In at least one embodiment, the program configuration process involves defining a set of parameters that at least partially control the application's response to browser requests and also at least partially define the server resources available to a particular user.
[0128] In one embodiment, intranet server 916 communicates with network 908 via a communication link. In at least one embodiment, intranet server 916 communicates with server manager 918. In at least one embodiment, server manager 918 includes a database of application configuration parameters used by servers 910, 912, and 914. In at least one embodiment, a user modifies database 920 via intranet 916, and server manager 918 interacts with servers 910, 912, and 914 to modify application parameters such that they match the contents of the database. In at least one embodiment, a user logs into intranet server 916 by connecting to intranet 916 via computer 902 and entering authentication information such as username and password.
[0129] In at least one embodiment, when a user wishes to register for a new service or modify an existing service, the intranet server 916 authenticates the user and provides the user with an interactive screen display / control panel that allows the user access to configuration parameters for a specific application. In at least one embodiment, multiple modifiable text boxes describing aspects of the user's website or other network resources are presented to the user. In at least one embodiment, if the user desires to increase the storage space reserved for their website on the server, a field is provided where the user specifies the desired storage space. In at least one embodiment, in response to receiving this information, the intranet server 916 updates the database 920. In at least one embodiment, the server manager 918 forwards the information to the appropriate server and uses the new parameters during application operation. In at least one embodiment, the intranet server 916 is configured to provide the user with access to configuration parameters of network resources (e.g., web pages, email, FTP sites, media sites, etc.) that the user has contracted with a web hosting service provider.
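The update flow described above (a user changes a parameter in the database, and the server manager brings the affected server into agreement with it) can be sketched as follows. The class names, the dictionary-based database, and the `storage_mb` parameter are assumptions made for illustration and are not taken from the source.

```python
class HostedServer:
    """Hypothetical stand-in for one of the hosted servers."""

    def __init__(self, name):
        self.name = name
        self.params = {}

    def apply(self, params):
        # Keep a copy so later database edits are not visible until synced.
        self.params = dict(params)


class ServerManager:
    """Hypothetical manager that pushes database parameters to servers."""

    def __init__(self, database, servers):
        self.database = database  # maps server name -> parameter dict
        self.servers = {server.name: server for server in servers}

    def sync(self):
        """Update servers whose parameters differ; return their names."""
        updated = []
        for name, desired in self.database.items():
            server = self.servers.get(name)
            if server is not None and server.params != desired:
                server.apply(desired)
                updated.append(name)
        return updated
```

For example, after a user raises the reserved storage for a website, a single `sync()` call would update only the server hosting that site and leave the others untouched.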
[0130] Figure 10A illustrates a networked computer system 1000A according to at least one embodiment. In at least one embodiment, the networked computer system 1000A includes a plurality of nodes or personal computers (“PCs”) 1002, 1018, 1020. In at least one embodiment, the personal computer or node 1002 includes a processor 1014, memory 1016, a camera 1004, a microphone 1006, a mouse 1008, a speaker 1010, and a monitor 1012. In at least one embodiment, PCs 1002, 1018, 1020 may each run one or more desktop servers, such as those on an internal network within a given company, or may be servers on a general network not limited to a specific environment. In at least one embodiment, each PC node in the network has a server, such that each PC node in the network represents a specific network server with a specific network URL address. In at least one embodiment, each server defaults to a default webpage for the user of that server, and the default webpage itself may contain embedded URLs pointing to further subpages for that user on that server, or to other servers or pages on other servers on the network.
[0131] In at least one embodiment, nodes 1002, 1018, 1020 and other nodes of the network are interconnected via medium 1022. In at least one embodiment, medium 1022 may be a communication channel such as Integrated Services Digital Network (“ISDN”). In at least one embodiment, different nodes of the networked computer system may be connected via different communication media, including local area networks (“LANs”), plain old telephone service (“POTS”), sometimes referred to as the Public Switched Telephone Network (“PSTN”), and / or variations thereof. In at least one embodiment, different nodes of the network may also constitute users of computer systems interconnected via a network such as the Internet. In at least one embodiment, each server on the network (running from a specific node of the network at a given instance) has a unique address or identifier within the network, which may be specified according to a URL.
[0132] In at least one embodiment, multiple multipoint conferencing units (“MCUs”) can therefore be used to transmit data to and from various nodes or “endpoints” of the conferencing system. In at least one embodiment, nodes and / or MCUs can be interconnected via ISDN links or via a local area network (“LAN”), in addition to various other communication media (such as nodes connected via the Internet). In at least one embodiment, nodes of the conferencing system can typically be directly connected to a communication medium (such as a LAN) or via an MCU, and the conferencing system may include other nodes or components, such as routers, servers, and / or variations thereof.
[0133] In at least one embodiment, processor 1014 is a general-purpose programmable processor. In at least one embodiment, the processor of a node in the networked computer system 1000A may also be a dedicated video processor. In at least one embodiment, the different peripheral devices and components of a node (such as those of node 1002) may differ from those of other nodes. In at least one embodiment, nodes 1018 and 1020 may be configured to be the same as or different from node 1002. In at least one embodiment, the node may be implemented on any suitable computer system other than a PC system.
[0134] Figure 10B illustrates a networked computer system 1000B according to at least one embodiment. In at least one embodiment, system 1000B illustrates a network (such as LAN 1024) that can be used to interconnect various nodes that can communicate with each other. In at least one embodiment, multiple nodes, such as PC nodes 1026, 1028, and 1030, are attached to LAN 1024. In at least one embodiment, nodes may also be connected to the LAN via a network server or other means. In at least one embodiment, system 1000B includes other types of nodes or elements, including, in at least one embodiment, routers, servers, and nodes.
[0135] Figure 10C illustrates a networked computer system 1000C according to at least one embodiment. In at least one embodiment, system 1000C illustrates a WWW system with communication across a backbone communication network (such as the Internet 1032), which can be used to interconnect various nodes of the network. In at least one embodiment, the WWW is a set of protocols that operates over the Internet and allows a graphical interface system to operate on it to access information via the Internet. In at least one embodiment, the Internet 1032 attached to the WWW consists of multiple nodes, such as PCs 1040, 1042, and 1044. In at least one embodiment, nodes interface with other nodes of the WWW via WWW HTTP servers (such as servers 1034 and 1036). In at least one embodiment, PC 1044 may be a PC forming a node of network 1032, and PC 1044 itself runs its server 1036, although PC 1044 and server 1036 are shown separately in Figure 10C for illustrative purposes.
[0136] In at least one embodiment, the WWW is a distributed type of application characterized by WWW HTTP, the WWW protocol, which runs on top of the Internet's Transmission Control Protocol / Internet Protocol (“TCP / IP”). In at least one embodiment, the WWW can therefore be characterized by a set of protocols (i.e., HTTP) running on the Internet as its “backbone”.
[0137] In at least one embodiment, a web browser is an application running on a node of a network that, in a WWW-compatible network system, allows a user of a specific server or node to view such information and thus allows the user to search for linked graphics and text-based files using hypertext links embedded in documents or files available from servers that understand HTTP. In at least one embodiment, when a user uses another server on a network such as the Internet to retrieve a given webpage from a first server associated with a first node, the retrieved document may have different hypertext links embedded therein, and a local copy of the retrieved page is created locally by the user. In at least one embodiment, when a user clicks a hypertext link, locally stored information associated with the selected hypertext link is generally sufficient to allow the user's machine to open a connection over the Internet to the server indicated by the hypertext link.
[0138] In at least one embodiment, more than one user may be coupled to each HTTP server via a LAN (such as LAN 1038, as shown with respect to WWW HTTP server 1034). In at least one embodiment, system 1000C may also include other types of nodes or elements. In at least one embodiment, the WWW HTTP server is an application running on a machine such as a PC. In at least one embodiment, each user may be considered to have a unique “server,” as shown with respect to PC 1044. In at least one embodiment, a server may be considered to be a server such as WWW HTTP server 1034 that provides access to the network for a LAN or multiple nodes or multiple LANs. In at least one embodiment, there are multiple users, each user having a desktop PC or a node on the network, each desktop PC potentially establishing a server for its users. In at least one embodiment, each server is associated with a specific network address or URL that, when accessed, provides a default webpage for that user at that specific network address or URL. In at least one embodiment, the webpage may contain further links (embedded URLs) pointing to further subpages for that user on that server, or to other servers on the network or to pages on other servers on the network.
[0139] Cloud computing and services
[0140] The following figures illustrate, but are not limited to, exemplary cloud-based systems that can be used to implement at least one embodiment.
[0141] In at least one embodiment, cloud computing is a computing style in which dynamically scalable and often virtualized resources are provided as a service over the Internet. In at least one embodiment, users do not need knowledge of, expertise in, or control over the technical infrastructure supporting them, which may be referred to as "in the cloud." In at least one embodiment, cloud computing incorporates infrastructure as a service, platform as a service, software as a service, and other variations that share the common theme of relying on the Internet to meet users' computing needs. In at least one embodiment, a typical cloud deployment, such as in a private cloud (e.g., an enterprise network) or a data center (DC) in a public cloud (e.g., the Internet), may consist of thousands of servers (or alternatively, VMs), hundreds of Ethernet, Fibre Channel, or Fibre Channel over Ethernet (FCoE) ports, switching and storage infrastructure, etc. In at least one embodiment, the cloud may also consist of network service infrastructure such as IPsec VPN hubs, firewalls, load balancers, wide area network (WAN) optimizers, etc. In at least one embodiment, remote subscribers can securely access cloud applications and services via a VPN tunnel (such as an IPsec VPN tunnel).
[0142] In at least one embodiment, cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage devices, applications, and services) that can be quickly configured and released with minimal management effort or service provider interaction.
[0143] In at least one embodiment, cloud computing is characterized by on-demand self-service, where consumers can automatically and unilaterally provision computing power, such as server time and network storage, as needed, without human interaction with each service provider. In at least one embodiment, cloud computing is characterized by broad network access, where capabilities are available on the network and accessed via standard mechanisms that facilitate use by heterogeneous thin or thick client platforms (e.g., mobile phones, laptops, and PDAs). In at least one embodiment, cloud computing is characterized by resource pooling, where a provider's computing resources are pooled to serve multiple consumers using a multi-tenant model, where different physical and virtual resources are dynamically assigned and reassigned based on consumer demand. In at least one embodiment, there is a sense of location independence because consumers typically have no control or knowledge of the exact location of the provided resources, but may be able to specify the location at a higher level of abstraction (e.g., country, state, or data center).
[0144] In at least one embodiment, resources include storage, processing, memory, network bandwidth, and virtual machines. In at least one embodiment, cloud computing is characterized by rapid elasticity, where capacity can be rapidly and elastically provisioned (in some cases automatically) to scale out quickly and rapidly released to scale in quickly. In at least one embodiment, for consumers, the capacity available for provisioning generally appears unlimited and can be purchased at any time in any quantity. In at least one embodiment, cloud computing is characterized by measured service, where the cloud system automatically controls and optimizes resource usage by leveraging metering capabilities at some level of abstraction suitable for the type of service (e.g., storage, processing, bandwidth, and active user accounts). In at least one embodiment, resource usage can be monitored, controlled, and reported, thereby providing transparency for both the providers and consumers of the services utilized.
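Rapid elasticity driven by metered usage can be sketched as a function that compares measured demand with per-instance capacity and decides whether to provision or release instances. The function names, the units, and the rule of rounding up to whole instances are assumptions for illustration; the source does not prescribe a scaling policy.

```python
import math


def instances_needed(demand, capacity_per_instance, minimum=1):
    """Smallest whole number of instances able to serve the metered demand."""
    if capacity_per_instance <= 0:
        raise ValueError("capacity_per_instance must be positive")
    return max(minimum, math.ceil(demand / capacity_per_instance))


def scaling_action(current, demand, capacity_per_instance):
    """Return (action, count): provision, release, or leave capacity as is."""
    target = instances_needed(demand, capacity_per_instance)
    if target > current:
        return ("scale_out", target - current)
    if target < current:
        return ("scale_in", current - target)
    return ("hold", 0)
```

With a demand of 250 requests per second and instances rated at 100 requests per second each (both hypothetical figures), three instances are needed, so a pool of two would scale out by one.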
[0145] In at least one embodiment, cloud computing may be associated with different services. In at least one embodiment, cloud Software as a Service (SaaS) may refer to a service that provides consumers with the ability to use the provider's applications running on cloud infrastructure. In at least one embodiment, applications may be accessed from different client devices via a thin client interface such as a web browser (e.g., web-based email). In at least one embodiment, consumers do not manage or control the underlying cloud infrastructure, including networks, servers, operating systems, storage, or even the capabilities of individual applications, with possible exceptions of limited user-specific application configuration settings.
[0146] In at least one embodiment, cloud Platform as a Service (PaaS) can refer to a service in which the capability provided to consumers is to deploy consumer-created or acquired applications onto cloud infrastructure, these applications being created using programming languages and tools supported by the provider. In at least one embodiment, the consumer does not manage or control the underlying cloud infrastructure, including networks, servers, operating systems, or storage, but has control over the deployed applications and, possibly, the configuration of the application hosting environment.
[0147] In at least one embodiment, cloud infrastructure as a service (IaaS) can refer to a service in which the capabilities provided to consumers are processing, storage, networking, and other basic computing resources that consumers can deploy and run, including operating systems and applications. In at least one embodiment, consumers do not manage or control the underlying cloud infrastructure, but instead have control over the operating system, storage, deployed applications, and possibly limited control over selected networking components (e.g., host firewalls).
[0148] In at least one embodiment, cloud computing can be deployed in different ways. In at least one embodiment, a private cloud can refer to a cloud infrastructure that is operated solely for one organization. In at least one embodiment, a private cloud can be managed by the organization or a third party and can exist on-site or off-site. In at least one embodiment, a community cloud can refer to a cloud infrastructure shared by several organizations and supporting a specific community with shared concerns (e.g., mission, security requirements, policies, and compliance considerations). In at least one embodiment, a community cloud can be managed by the organizations or a third party and can exist on-site or off-site. In at least one embodiment, a public cloud can refer to a cloud infrastructure that is available to the general public or a large industry group and is owned by an organization providing cloud services. In at least one embodiment, a hybrid cloud can refer to a cloud infrastructure that is composed of two or more clouds (private, community, or public) that remain distinct entities but are bound together by standardized or proprietary technologies that enable data and application portability (e.g., cloud bursting for load balancing between clouds). In at least one embodiment, the cloud computing environment is service-oriented, focusing on statelessness, loose coupling, modularity, and semantic interoperability.
[0149] Figure 11 illustrates one or more components of a system environment 1100 according to at least one embodiment, in which services can be provided as third-party network services. In at least one embodiment, the third-party network may be referred to as a cloud, cloud network, cloud computing network, and / or variations thereof. In at least one embodiment, system environment 1100 includes one or more client computing devices 1104, 1106, and 1108, which can be used by users to interact with a third-party network infrastructure system 1102 that provides third-party network services (which may be referred to as cloud computing services). In at least one embodiment, third-party network infrastructure system 1102 may include one or more computers and / or servers.
[0150] It should be understood that the third-party network infrastructure system 1102 depicted in Figure 11 may have components other than those depicted. Furthermore, Figure 11 depicts one embodiment of a third-party network infrastructure system. In at least one embodiment, the third-party network infrastructure system 1102 may have more or fewer components than depicted in Figure 11, may combine two or more components, or may have a different configuration or arrangement of components.
[0151] In at least one embodiment, client computing devices 1104, 1106, and 1108 may be configured to operate client applications, such as web browsers, proprietary client applications, or other applications, which may be used by a user of the client computing devices to interact with the third-party network infrastructure system 1102 in order to use the services it provides. Although the exemplary system environment 1100 is shown as having three client computing devices, any number of client computing devices can be supported. In at least one embodiment, other devices, such as devices with sensors, may interact with the third-party network infrastructure system 1102. In at least one embodiment, network 1110 may facilitate communication and data exchange between client computing devices 1104, 1106, and 1108 and the third-party network infrastructure system 1102.
[0152] In at least one embodiment, the services provided by the third-party network infrastructure system 1102 may include hosting services provided on demand to users of the third-party network infrastructure system. In at least one embodiment, various services may also be provided, including but not limited to online data storage and backup solutions, web-based email services, hosted office suites and document collaboration services, database management and processing, managed technical support services, and / or variations thereof. In at least one embodiment, the services provided by the third-party network infrastructure system can be dynamically expanded to meet the needs of its users.
[0153] In at least one embodiment, a specific instance of a service provided by the third-party network infrastructure system 1102 may be referred to as a "service instance". In at least one embodiment, generally, any service available to a user from the third-party network service provider system via a communication network (such as the Internet) is referred to as a "third-party network service". In at least one embodiment, in a public third-party network environment, the servers and systems comprising the third-party network service provider system are different from the servers and systems on the customer's own premises. In at least one embodiment, the third-party network service provider system may host applications, and users may subscribe to and use applications on demand via a communication network (such as the Internet).
[0154] In at least one embodiment, services within a third-party network infrastructure may include protected computer network access to storage, hosted databases, hosted web servers, software applications, or other services provided to users by a third-party network provider. In at least one embodiment, services may include password-protected access to remote storage devices on a third-party network via the Internet. In at least one embodiment, services may include a web-based hosted relational database and a scripting language middleware engine for private use by network developers. In at least one embodiment, services may include access to email software applications hosted on a website hosted by a third-party network provider.
[0155] In at least one embodiment, the third-party network infrastructure system 1102 may include a set of application, middleware, and database services delivered to customers in a self-service, subscription-based, elastically scalable, reliable, highly available, and secure manner. In at least one embodiment, the third-party network infrastructure system 1102 may also provide "big data" related computing and analytics services. In at least one embodiment, the term "big data" is generally used to refer to extremely large datasets that can be stored and manipulated by analysts and researchers to visualize large amounts of data, detect trends, and / or otherwise interact with the data. In at least one embodiment, big data and related applications may be hosted and / or manipulated by the infrastructure system at many levels and at different scales. In at least one embodiment, dozens, hundreds, or thousands of processors linked in parallel may act on such data to present the data or to simulate external forces on the data or what it represents. In at least one embodiment, these datasets may involve structured data (such as data organized in a database or otherwise according to a structured model) and / or unstructured data (e.g., emails, images, data blobs (binary large objects), web pages, complex event processing). In at least one embodiment, by leveraging the ability of an embodiment to focus more (or fewer) computing resources on an objective relatively quickly, the third-party network infrastructure system can be better used to perform tasks on large datasets based on the needs of enterprises, government agencies, research organizations, private individuals, groups of like-minded individuals or organizations, or other entities.
[0156] In at least one embodiment, the third-party network infrastructure system 1102 can be adapted to automatically provide, manage, and track customer subscriptions to services provided by the third-party network infrastructure system 1102. In at least one embodiment, the third-party network infrastructure system 1102 can provide third-party network services via different deployment models. In at least one embodiment, services can be provided under a public third-party network model, wherein the third-party network infrastructure system 1102 is owned by an organization selling third-party network services, and the services are available to the general public or various industry enterprises. In at least one embodiment, services can be provided under a private third-party network model, in which the third-party network infrastructure system 1102 operates only for a single organization and can provide services to one or more entities within that organization. In at least one embodiment, third-party network services can also be provided under a community third-party network model, wherein the third-party network infrastructure system 1102 and the services provided by the third-party network infrastructure system 1102 are shared by several organizations in the relevant community. In at least one embodiment, third-party network services can also be provided under a hybrid third-party network model, which is a combination of two or more different models.
[0157] In at least one embodiment, the services provided by the third-party network infrastructure system 1102 may include one or more services offered under the Software as a Service (SaaS) category, Platform as a Service (PaaS) category, Infrastructure as a Service (IaaS) category, or other service categories that include hybrid services. In at least one embodiment, a customer may subscribe to one or more services provided by the third-party network infrastructure system 1102 via a subscription order. In at least one embodiment, the third-party network infrastructure system 1102 then performs processing to provide the services in the customer's subscription order.
[0158] In at least one embodiment, the services provided by the third-party network infrastructure system 1102 may include, but are not limited to, application services, platform services, and infrastructure services. In at least one embodiment, application services may be provided by the third-party network infrastructure system via a SaaS platform. In at least one embodiment, the SaaS platform may be configured to provide third-party network services belonging to the SaaS category. In at least one embodiment, the SaaS platform may provide the ability to build and deliver a suite of on-demand applications on an integrated development and deployment platform. In at least one embodiment, the SaaS platform may manage and control the underlying software and infrastructure used to provide SaaS services. In at least one embodiment, by utilizing the services provided by the SaaS platform, customers can utilize applications running on the third-party network infrastructure system. In at least one embodiment, customers can obtain application services without needing to purchase separate licenses and support. In at least one embodiment, a variety of different SaaS services may be provided. In at least one embodiment, this may include, but is not limited to, services providing solutions for sales performance management, enterprise integration, and business agility for large organizations.
[0159] In at least one embodiment, the platform services may be provided by the third-party network infrastructure system 1102 via a PaaS platform. In at least one embodiment, the PaaS platform may be configured to provide third-party network services belonging to the PaaS category. In at least one embodiment, the platform services may include, but are not limited to, services that enable organizations to consolidate existing applications on a shared, common architecture, and the ability to build new applications that utilize shared services provided by the platform. In at least one embodiment, the PaaS platform may manage and control the underlying software and infrastructure used to provide the PaaS services. In at least one embodiment, customers can access the PaaS services provided by the third-party network infrastructure system 1102 without needing to purchase separate licenses and support.
[0160] In at least one embodiment, by leveraging services provided by the PaaS platform, customers can employ programming languages and tools supported by a third-party network infrastructure system and also control the deployed services. In at least one embodiment, the platform services provided by the third-party network infrastructure system may include database third-party network services, middleware third-party network services, and third-party network services. In at least one embodiment, the database third-party network service may support a shared service deployment model that enables organizations to aggregate database resources and provide databases as a service to customers in the form of a database third-party network. In at least one embodiment, within the third-party network infrastructure system, the middleware third-party network service can provide customers with a platform to develop and deploy different business applications, and the third-party network service can provide customers with a platform to deploy applications.
[0161] In at least one embodiment, various infrastructure services may be provided by an IaaS platform within a third-party network infrastructure system. In at least one embodiment, infrastructure services facilitate the management and control of underlying computing resources (such as storage, networking, and other basic computing resources) by customers utilizing services provided by SaaS and PaaS platforms.
[0162] In at least one embodiment, the third-party network infrastructure system 1102 may further include infrastructure resources 1130 for providing resources for offering various services to customers of the third-party network infrastructure system. In at least one embodiment, infrastructure resources 1130 may include a pre-integrated and optimized combination of hardware (such as servers, storage, and networking resources) to perform services provided by PaaS platforms and SaaS platforms, as well as other resources.
[0163] In at least one embodiment, resources in the third-party network infrastructure system 1102 can be shared by multiple users and dynamically reallocated as needed. In at least one embodiment, resources can be allocated to users in different time zones. In at least one embodiment, the third-party network infrastructure system 1102 can enable a first group of users in a first time zone to utilize the resources of the third-party network infrastructure system for a specified number of hours, and subsequently enable the reallocation of the same resources to another group of users located in a different time zone, thereby maximizing resource utilization.
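The time-zone hand-off described in this paragraph can be pictured with a minimal sketch. It assumes that each user group holds the shared resource during a fixed daily window expressed in UTC hours; the schedule format, the group names, and the functions `holder_at` and `utilization` are hypothetical and are not taken from the embodiment.

```python
from typing import List, Optional, Tuple

# Each entry: (user group, start hour UTC inclusive, end hour UTC exclusive).
# The windows are hypothetical and assumed not to overlap.
Schedule = List[Tuple[str, int, int]]


def holder_at(schedule: Schedule, hour_utc: int) -> Optional[str]:
    """Return the group holding the shared resource at the given UTC hour."""
    hour = hour_utc % 24
    for group, start, end in schedule:
        if start <= end:
            active = start <= hour < end
        else:
            # Window wraps past midnight, e.g. 22 -> 6.
            active = hour >= start or hour < end
        if active:
            return group
    return None  # resource is idle


def utilization(schedule: Schedule) -> float:
    """Fraction of the 24-hour day during which some group holds the resource."""
    busy = sum(1 for hour in range(24) if holder_at(schedule, hour) is not None)
    return busy / 24


one_zone = [("group-1", 0, 8)]
two_zones = [("group-1", 0, 8), ("group-2", 8, 16)]
```

In this sketch, reassigning the same resource to a second group in another time zone raises the utilization from 8/24 to 16/24 of the day, which is the effect the paragraph attributes to time-zone-based reallocation.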
[0164] In at least one embodiment, multiple internal shared services 1132 shared by different components or modules of the third-party network infrastructure system 1102 may be provided to enable services to be provided by the third-party network infrastructure system 1102. In at least one embodiment, these internal shared services may include, but are not limited to, security and identity services, integration services, enterprise library services, enterprise manager services, virus scanning and whitelisting services, high availability, backup and recovery services, services for enabling third-party network support, email services, notification services, file transfer services, and / or variations thereof.
[0165] In at least one embodiment, the third-party network infrastructure system 1102 can provide comprehensive management of third-party network services (e.g., SaaS, PaaS, and IaaS services) within the third-party network infrastructure system. In at least one embodiment, the third-party network management functionality may include capabilities for provisioning, managing, and tracking customer subscriptions received by the third-party network infrastructure system 1102, and / or variations thereof.
[0166] In at least one embodiment, as shown in Figure 11, third-party network management functions can be provided by one or more modules, such as order management module 1120, order coordination module 1122, order provisioning module 1124, order management and monitoring module 1126, and identity management module 1128. In at least one embodiment, these modules may include or be provided using one or more computers and / or servers, which may be general-purpose computers, dedicated server computers, server groups, server clusters, or any other suitable arrangement and / or combination.
[0167] In at least one embodiment, in step 1134, a customer using a client device, such as client computing device 1104, 1106, or 1108, interacts with the third-party network infrastructure system 1102 by requesting one or more services provided by the third-party network infrastructure system 1102 and placing an order for a subscription to one or more of those services. In at least one embodiment, the customer may access a third-party network user interface (UI), such as third-party network UI 1112, third-party network UI 1114, and / or third-party network UI 1116, and place a subscription order via these UIs. In at least one embodiment, the order information received by the third-party network infrastructure system 1102 in response to the customer placing an order may include information identifying the customer and the one or more services provided by the third-party network infrastructure system 1102 to which the customer wishes to subscribe.
[0168] In at least one embodiment, in step 1136, the order information received from the customer may be stored in the order database 1118. In at least one embodiment, if this is a new order, a new record may be created for the order. In at least one embodiment, the order database 1118 may be one of several databases operated by the third-party network infrastructure system 1102 and operated in conjunction with other system components.
[0169] In at least one embodiment, in step 1138, the order information may be forwarded to an order management module 1120, which may be configured to perform billing and accounting functions related to the order, such as verifying the order and, after verification, booking the order.
[0170] In at least one embodiment, in step 1140, information about the order may be transmitted to an order coordination module 1122, which is configured to coordinate the provisioning of services and resources for the order placed by the customer. In at least one embodiment, the order coordination module 1122 may use the services of the order provisioning module 1124 for provisioning. In at least one embodiment, the order coordination module 1122 enables the management of business processes associated with each order and applies business logic to determine whether the order should proceed to provisioning.
[0171] In at least one embodiment, in step 1142, upon receiving a new subscription order, the order coordination module 1122 sends a request to the order provisioning module 1124 to allocate and configure the resources required to satisfy the subscription order. In at least one embodiment, the order provisioning module 1124 implements resource allocation for the services ordered by the customer. In at least one embodiment, the order provisioning module 1124 provides a level of abstraction between the third-party network services provided by the third-party network infrastructure system 1102 and the physical implementation layer used to provision the resources for the requested services. In at least one embodiment, this allows the order coordination module 1122 to be isolated from implementation details, such as whether services and resources are actually provisioned in real time or are pre-provisioned and allocated / assigned only upon request.
[0172] In at least one embodiment, in step 1144, once the services and resources are provisioned, a notification can be sent to the subscribing customer indicating that the requested service is now ready for use. In at least one embodiment, information (e.g., a link) can be sent to the customer, enabling the customer to begin using the requested service.
[0173] In at least one embodiment, in step 1146, the customer's subscription order can be managed and tracked by the order management and monitoring module 1126. In at least one embodiment, the order management and monitoring module 1126 can be configured to collect usage statistics regarding customer use of the subscribed services. In at least one embodiment, statistics can be collected for the amount of storage used, the amount of data transferred, the number of users, the amount of system up time and system down time, and / or variations thereof.
[0174] In at least one embodiment, the third-party network infrastructure system 1102 may include an identity management module 1128 configured to provide identity services, such as access management and authorization services within the third-party network infrastructure system 1102. In at least one embodiment, the identity management module 1128 may control information about customers who wish to utilize services provided by the third-party network infrastructure system 1102. In at least one embodiment, such information may include information that authenticates the identities of such customers and information describing which actions those customers are authorized to perform relative to different system resources (e.g., files, directories, applications, communication ports, memory segments, etc.). In at least one embodiment, the identity management module 1128 may also manage descriptive information about each customer, as well as information about how and by whom that descriptive information can be accessed and modified.
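The isolation described in this paragraph can be sketched as a small interface with two interchangeable back ends. This is a minimal illustration that assumes the coordinating code only ever calls one method; the class names `Provisioner`, `OnDemandProvisioner`, and `PooledProvisioner`, the function `fulfil`, and the resource naming scheme are all hypothetical and do not describe the actual modules 1122 and 1124.

```python
from abc import ABC, abstractmethod
from typing import List


class Provisioner(ABC):
    """Abstraction between a requested service and the physical resources behind it."""

    @abstractmethod
    def acquire(self, service: str) -> str:
        """Return an identifier for a resource that will back the service."""


class OnDemandProvisioner(Provisioner):
    """Creates a resource at request time (provisioned in real time)."""

    def __init__(self) -> None:
        self._count = 0

    def acquire(self, service: str) -> str:
        self._count += 1
        return f"{service}-fresh-{self._count}"


class PooledProvisioner(Provisioner):
    """Hands out resources that were prepared earlier (pre-provisioned)."""

    def __init__(self, pool: List[str]) -> None:
        self._pool = list(pool)

    def acquire(self, service: str) -> str:
        if not self._pool:
            raise RuntimeError("pre-provisioned pool is exhausted")
        return self._pool.pop(0)


def fulfil(services: List[str], provisioner: Provisioner) -> List[str]:
    """Coordinator-side code: identical regardless of which back end is used."""
    return [provisioner.acquire(service) for service in services]
```

Because `fulfil` depends only on the abstract `acquire` call, swapping the on-demand back end for the pooled one requires no change on the coordinating side, which is the kind of decoupling the paragraph describes.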
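The order flow of steps 1134 to 1146 can be summarized in a minimal sketch. It assumes a single in-memory order store and a trivial verification rule; every class, function, and status name below is hypothetical, and the comments that mention reference numerals only indicate which described module a function loosely stands in for.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Order:
    customer_id: str
    services: List[str]
    status: str = "received"
    log: List[str] = field(default_factory=list)


class OrderDatabase:
    """Stands in for order database 1118 (step 1136)."""

    def __init__(self) -> None:
        self.records: Dict[int, Order] = {}

    def store(self, order: Order) -> int:
        order_id = len(self.records) + 1  # a new record for a new order
        self.records[order_id] = order
        order.log.append("stored")
        return order_id


def verify_and_book(order: Order) -> bool:
    """Stands in for order management module 1120 (step 1138)."""
    ok = bool(order.customer_id) and bool(order.services)
    order.status = "booked" if ok else "rejected"
    order.log.append(order.status)
    return ok


def provision(order: Order) -> Dict[str, str]:
    """Stands in for order provisioning module 1124 (step 1142)."""
    order.log.append("provisioned")
    return {service: f"resource-for-{service}" for service in order.services}


def coordinate(order: Order) -> Dict[str, str]:
    """Stands in for order coordination module 1122 (steps 1140 and 1144)."""
    if order.status != "booked":  # assumed business rule: only booked orders proceed
        return {}
    resources = provision(order)
    order.status = "active"
    order.log.append("notified")  # customer is told the service is ready
    return resources


def process(order: Order, db: OrderDatabase) -> int:
    order_id = db.store(order)
    if verify_and_book(order):
        coordinate(order)
    return order_id
```

For a valid order, the log ends up as stored, booked, provisioned, notified, mirroring the sequence of steps; an order that fails verification stops after being stored and rejected.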
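The two kinds of information the identity management module is said to hold, namely data that authenticates a customer and data describing permitted actions on resources, can be illustrated with a minimal sketch. The class `IdentityManager`, its methods, and the resource strings are hypothetical, and the unsalted SHA-256 digest is used only to keep the example short; a production system would use a dedicated password-hashing scheme.

```python
import hashlib
import hmac
from typing import Dict, Set, Tuple


def _digest(secret: str) -> str:
    # Illustration only: not a suitable password-storage scheme.
    return hashlib.sha256(secret.encode("utf-8")).hexdigest()


class IdentityManager:
    """Toy stand-in for an identity service: authentication plus authorization."""

    def __init__(self) -> None:
        self._secrets: Dict[str, str] = {}
        self._grants: Dict[str, Set[Tuple[str, str]]] = {}

    def register(self, customer: str, secret: str) -> None:
        self._secrets[customer] = _digest(secret)
        self._grants.setdefault(customer, set())

    def grant(self, customer: str, resource: str, action: str) -> None:
        """Record that the customer may perform the action on the resource."""
        self._grants.setdefault(customer, set()).add((resource, action))

    def authenticate(self, customer: str, secret: str) -> bool:
        stored = self._secrets.get(customer)
        # compare_digest avoids leaking information through comparison timing.
        return stored is not None and hmac.compare_digest(stored, _digest(secret))

    def is_authorized(self, customer: str, resource: str, action: str) -> bool:
        return (resource, action) in self._grants.get(customer, set())
```

Authentication and authorization are kept as separate checks here, so a caller would first confirm identity and then ask whether a specific action on a specific resource (a file, directory, application, port, and so on) is permitted.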
[0175] Figure 12 illustrates a cloud computing environment 1202 according to at least one embodiment. In at least one embodiment, the cloud computing environment 1202 includes one or more computer systems / servers 1204, with which computing devices such as a personal digital assistant (PDA) or cellular phone 1206A, desktop computer 1206B, laptop computer 1206C, and / or automotive computer system 1206N communicate. In at least one embodiment, this allows infrastructure, platforms, and / or software to be provided as services from the cloud computing environment 1202 so that each client does not need to maintain such resources individually. It should be understood that the types of computing devices 1206A-N shown in Figure 12 are intended to be illustrative only, and that the cloud computing environment 1202 can communicate with any type of computerized device via any type of network and / or network-addressable connection (e.g., using a web browser).
[0176] In at least one embodiment, the computer system / server 1204, which may be represented as a cloud computing node, may operate with many other general-purpose or special-purpose computing system environments or configurations. In at least one embodiment, computing systems, environments, and / or configurations suitable for use with the computer system / server 1204 include, but are not limited to, personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputer systems, mainframe computer systems, and distributed cloud computing environments that include any of the aforementioned systems or devices, and / or variations thereof.
[0177] In at least one embodiment, the computer system / server 1204 can be described in the general context of computer system executable instructions (such as program modules) executed by the computer system. In at least one embodiment, the program module includes routines, programs, objects, components, logic, data structures, etc., that perform a specific task or implement a specific abstract data type. In at least one embodiment, the exemplary computer system / server 1204 can be practiced in a distributed big data computing environment, where tasks are performed by remote processing devices linked via a communication network. In at least one embodiment, in a distributed cloud computing environment, the program module may reside in both local and remote computer system storage media, including memory storage devices.
[0178] Figure 13 illustrates a set of functional abstraction layers provided by the cloud computing environment 1202 (Figure 12) according to at least one embodiment. It should be understood in advance that the components, layers, and functions shown in Figure 13 are intended to be illustrative only, and that the components, layers, and functions may vary.
[0179] In at least one embodiment, the hardware and software layer 1302 includes hardware and software components. In at least one embodiment, the hardware components include mainframes, different RISC (Reduced Instruction Set Computer) based servers, different computing systems, supercomputing systems, storage devices, networks, networking components, and / or variations thereof. In at least one embodiment, the software components include network application server software, different application server software, different database software, and / or variations thereof.
[0180] In at least one embodiment, the virtualization layer 1304 provides an abstraction layer from which exemplary virtual entities such as virtual servers, virtual storage, virtual networks (including virtual private networks), virtual applications, virtual clients, and / or variations thereof can be provided.
[0181] In at least one embodiment, the management layer 1306 provides various functionalities. In at least one embodiment, resource provisioning provides dynamic procurement of computing resources and other resources used to perform tasks within the cloud computing environment. In at least one embodiment, metering provides usage tracking of resources within the cloud computing environment, and billing or invoicing for the consumption of these resources. In at least one embodiment, resources may include application software licenses. In at least one embodiment, security provides identity verification for users and tasks, and protection for data and other resources. In at least one embodiment, the user interface provides access to the cloud computing environment for both users and system administrators. In at least one embodiment, service level management provides the allocation and management of cloud computing resources so that required service levels are met. In at least one embodiment, service level agreement (SLA) management provides the pre-arrangement and procurement of cloud computing resources for which future requirements are anticipated according to the SLA.
[0182] In at least one embodiment, workload layer 1308 provides functionality utilizing a cloud computing environment. In at least one embodiment, workloads and functions that can be provided from this layer include: mapping and navigation, software development and management, educational services, data analysis and processing, transaction processing, and service delivery.
[0183] Supercomputing
[0184] The following figures illustrate, but are not limited to, exemplary supercomputer-based systems that can be used to implement at least one embodiment.
[0185] In at least one embodiment, a supercomputer can refer to a hardware system exhibiting substantially parallelism and comprising at least one chip, wherein the chips in the system are interconnected via a network and housed in a hierarchically organized enclosure. In at least one embodiment, a large hardware system filling a server room with several racks is at least one embodiment of a supercomputer, each rack comprising several board / rack modules, each board / rack module comprising several chips all interconnected by a scalable network. In at least one embodiment, a single rack of such a large hardware system is at least one other embodiment of a supercomputer. In at least one embodiment, a single chip exhibiting substantially parallelism and comprising several hardware components can also be considered a supercomputer, because as feature size may decrease, the amount of hardware that can be incorporated into a single chip may also increase.
[0186] Figure 14A chip-level supercomputer according to at least one embodiment is illustrated. In at least one embodiment, the main computation is performed within a finite state machine (1404) referred to as a thread unit, inside an FPGA or ASIC chip. In at least one embodiment, a task and synchronization network (1402) connects to the finite state machine and is used to dispatch threads and perform operations in the correct order. In at least one embodiment, a memory network (1406, 1410) is used to access a multi-level partitioned on-chip cache hierarchy (1408, 1412). In at least one embodiment, a memory controller (1416) and an off-chip memory network (1414) are used to access off-chip memory. In at least one embodiment, an I / O controller (1418) is used for cross-chip communication when the design is not suitable for a single logic chip.
[0187] Figure 15 A supercomputer at the rock module level is illustrated according to at least one embodiment. In at least one embodiment, within the rack module, there are multiple FPGA or ASIC chips (1502) connected to one or more DRAM cells (1504) constituting the main accelerator memory. In at least one embodiment, each FPGA / ASIC chip is connected to its adjacent FPGA / ASIC chip (1506) using a wide bus on the board via differential high-speed signaling. In at least one embodiment, each FPGA / ASIC chip is also connected to at least one high-speed serial communication cable.
[0188] Figure 16 illustrates a supercomputer at the rack level according to at least one embodiment. Figure 17 illustrates a supercomputer at the whole-system level according to at least one embodiment. In at least one embodiment, referring to Figure 16 and Figure 17, a scalable, possibly incomplete, hypercube network is implemented between rack modules within a rack and across the racks of the entire system using high-speed serial optical or copper cables (1602, 1702). In at least one embodiment, one of the FPGA / ASIC chips in the accelerator is connected to the host system (1704) via a PCI-Express connection. In at least one embodiment, the host system includes a host microprocessor (1708), on which the software portion of the application runs, and a memory consisting of one or more host memory DRAM units (1706) that is kept coherent with the memory on the accelerator. In at least one embodiment, the host system may be a separate module on one of the racks or may be integrated with one of the modules of the supercomputer. In at least one embodiment, a cube-connected cycles topology provides communication links to create a hypercube network for a large supercomputer. In at least one embodiment, a group of FPGA / ASIC chips on a rack module may act as a single hypercube node, so that the total number of external links per group is greater than that of a single chip. In at least one embodiment, a group comprises chips A, B, C, and D on a rack module, with an internal wide differential bus connecting A, B, C, and D in a ring organization. In at least one embodiment, there are 12 serial communication cables connecting the rack module to the outside world. In at least one embodiment, chip A on the rack module is connected to serial communication cables 0, 1, and 2. In at least one embodiment, chip B is connected to cables 3, 4, and 5. In at least one embodiment, chip C is connected to cables 6, 7, and 8. In at least one embodiment, chip D is connected to cables 9, 10, and 11. In at least one embodiment, the entire group {A, B, C, D} constituting the rack module can form a hypercube node within a supercomputer system of up to 2^12 = 4096 rack modules (16384 FPGA / ASIC chips). In at least one embodiment, for chip A to send a message outward on link 4 of group {A, B, C, D}, the message must first be routed to chip B over the on-board differential wide bus connection. In at least one embodiment, a message that arrives at group {A, B, C, D} on link 4 (i.e., arrives at chip B) and is destined for chip A must likewise first be routed to the correct destination chip (A) within group {A, B, C, D}. In at least one embodiment, parallel supercomputer systems of other sizes can also be implemented.
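The link-to-chip assignment and the node count described above can be illustrated with a short Python sketch. It is not taken from the specification: the function names are hypothetical, the ring order A, B, C, D is assumed, and the rule that link i leads to the rack module whose identifier differs in bit i is the conventional hypercube wiring, assumed here for illustration.

```python
"""Sketch: a rack module treated as one node of a 12-dimensional hypercube."""

LINKS_PER_MODULE = 12          # serial cables per rack module (from the text)
CHIPS = ("A", "B", "C", "D")   # chips on the on-board ring (ring order assumed)
LINKS_PER_CHIP = LINKS_PER_MODULE // len(CHIPS)   # 3 cables per chip


def chip_for_link(link: int) -> str:
    """Return the chip that owns an external link (A: 0-2, B: 3-5, C: 6-8, D: 9-11)."""
    if not 0 <= link < LINKS_PER_MODULE:
        raise ValueError(f"link {link} out of range")
    return CHIPS[link // LINKS_PER_CHIP]


def neighbor(module_id: int, link: int) -> int:
    """Assumed hypercube wiring: link i reaches the module whose ID differs in bit i."""
    chip_for_link(link)  # validates the link number
    return module_id ^ (1 << link)


def ring_hops(src: str, dst: str) -> int:
    """Fewest hops between two chips on the four-chip ring."""
    distance = abs(CHIPS.index(src) - CHIPS.index(dst))
    return min(distance, len(CHIPS) - distance)


def send(module_id: int, src_chip: str, link: int) -> dict:
    """Describe how src_chip gets a message out of its module on the given link."""
    exit_chip = chip_for_link(link)
    return {
        "exit_chip": exit_chip,
        "on_board_hops": ring_hops(src_chip, exit_chip),
        "next_module": neighbor(module_id, link),
    }


max_modules = 2 ** LINKS_PER_MODULE
print(max_modules, max_modules * len(CHIPS))      # 4096 16384
print(send(module_id=0, src_chip="A", link=4))
# {'exit_chip': 'B', 'on_board_hops': 1, 'next_module': 16}
```

The second line of output mirrors the example in the text: a message from chip A that leaves on link 4 first takes one on-board hop to chip B.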
[0189] Artificial intelligence
[0190] The following figures illustrate, but are not limited to, exemplary artificial intelligence-based systems that can be used to implement at least one embodiment.
[0191] Figure 18A illustrates inference and / or training logic 1815 for performing inference and / or training operations associated with one or more embodiments. Details regarding the inference and / or training logic 1815 are provided below in conjunction with Figures 18A and / or 18B.
[0192] In at least one embodiment, the inference and / or training logic 1815 may include, but is not limited to, a code and / or data storage device 1801 for storing forward and / or output weights and / or input / output data, and / or other parameters for configuring neurons or layers of a neural network that is trained and / or used for inference in aspects of one or more embodiments. In at least one embodiment, the training logic 1815 may include or be coupled to the code and / or data storage device 1801 to store graph code or other software that controls the timing and / or order in which weight and / or other parameter information is loaded to configure logic, including integer and / or floating-point units (collectively, arithmetic logic units (ALUs)). In at least one embodiment, code (such as graph code) loads weight or other parameter information into the processor ALUs based on the architecture of the neural network to which such code corresponds. In at least one embodiment, the code and / or data storage device 1801 stores the weight parameters and / or input / output data of each layer of a neural network that is trained or used in conjunction with one or more embodiments during forward propagation of input / output data and / or weight parameters during training and / or inference using aspects of one or more embodiments. In at least one embodiment, any portion of the code and / or data storage device 1801 may be included with other on-chip or off-chip data storage, including the processor's L1, L2, or L3 cache or system memory.
[0193] In at least one embodiment, any portion of the code and / or data storage device 1801 may be internal or external to one or more processors or other hardware logic devices or circuits. In at least one embodiment, the code and / or data storage device 1801 may be a cache memory, dynamic random-addressable memory (“DRAM”), static random-addressable memory (“SRAM”), non-volatile memory (e.g., flash memory), or other storage device. In at least one embodiment, the choice of whether the code and / or data storage device 1801 is internal or external to the processor, or the inclusion of DRAM, SRAM, flash memory, or some other storage type in at least one embodiment, may depend on the available on-chip storage relative to off-chip storage, the latency requirements of the training and / or inference functions being performed, the batch size of data used in the inference and / or training of the neural network, or some combination of these factors.
[0194] In at least one embodiment, the inference and / or training logic 1815 may include, but is not limited to, a code and / or data storage device 1805 for storing backward and / or output weights and / or input / output data corresponding to neurons or layers of a neural network that is trained and / or used for inference in aspects of one or more embodiments. In at least one embodiment, the code and / or data storage device 1805 stores the weight parameters and / or input / output data of each layer of a neural network that is trained or used in conjunction with one or more embodiments during backward propagation of input / output data and / or weight parameters during training and / or inference using aspects of one or more embodiments. In at least one embodiment, the training logic 1815 may include or be coupled to the code and / or data storage device 1805 to store graph code or other software that controls the timing and / or order in which weight and / or other parameter information is loaded to configure logic, including integer and / or floating-point units (collectively, arithmetic logic units (ALUs)).
[0195] In at least one embodiment, code (such as graph code) causes weight or other parameter information to be loaded into the processor ALUs based on the architecture of the neural network to which such code corresponds. In at least one embodiment, any portion of the code and / or data storage device 1805 may be included with other on-chip or off-chip data storage, including the processor's L1, L2, or L3 cache or system memory. In at least one embodiment, any portion of the code and / or data storage device 1805 may be internal or external to one or more processors or other hardware logic devices or circuits. In at least one embodiment, the code and / or data storage device 1805 may be a cache memory, DRAM, SRAM, non-volatile memory (e.g., flash memory), or other storage device. In at least one embodiment, the choice of whether the code and / or data storage device 1805 is internal or external to the processor, or comprises DRAM, SRAM, flash memory, or some other storage type, may depend on the available on-chip storage relative to off-chip storage, the latency requirements of the training and / or inference functions being performed, the batch size of data used in the inference and / or training of the neural network, or some combination of these factors.
[0196] In at least one embodiment, code and / or data storage 1801 and code and / or data storage 1805 may be separate storage structures. In at least one embodiment, code and / or data storage 1801 and code and / or data storage 1805 may be combined storage structures. In at least one embodiment, code and / or data storage devices 1801 and 1805 may be partially combined and partially separated. In at least one embodiment, any portion of code and / or data storage 1801 and 1805 may be included together with other on-chip or off-chip data storage (including the processor's L1, L2, or L3 cache or system memory).
[0197] In at least one embodiment, the inference and / or training logic 1815 may include, but is not limited to, one or more arithmetic logic units (“ALUs”) 1810, including integer and / or floating-point units, to perform logical and / or mathematical operations based at least in part on, or indicated by, training and / or inference code (e.g., graph code), the result of which may produce activations (e.g., output values from layers or neurons within a neural network) stored in activation storage device 1820, said activations being functions of input / output and / or weight parameter data stored in code and / or data storage device 1801 and / or code and / or data storage device 1805. In at least one embodiment, activations stored in activation storage device 1820 are generated according to linear algebra and / or matrix-based mathematics performed by ALUs 1810 in response to executing instructions or other code, wherein weight values stored in code and / or data storage device 1805 and / or code and / or data storage device 1801 are used as operands along with other values such as bias values, gradient information, momentum values, or other parameters or hyperparameters, any or all of which may be stored in code and / or data storage device 1805, code and / or data storage device 1801, or another on-chip or off-chip storage.
[0198] In at least one embodiment, one or more ALUs 1810 are included within one or more processors or other hardware logic devices or circuits, while in another embodiment, one or more ALUs 1810 may be external to the processor or other hardware logic device or circuit that uses them (e.g., a coprocessor). In at least one embodiment, ALUs 1810 may be included within an execution unit of a processor, or otherwise within a bank of ALUs accessible by the execution units of a processor, whether within the same processor or distributed among processors of different types (e.g., central processing units, graphics processing units, fixed-function units, etc.). In at least one embodiment, code and / or data storage devices 1801 and 1805 and activation storage device 1820 may share a processor or other hardware logic device or circuit, while in another embodiment, they may be in different processors or other hardware logic devices or circuits, or in some combination of the same and different processors or other hardware logic devices or circuits. In at least one embodiment, any portion of the activation storage device 1820 may be included with other on-chip or off-chip data storage, including the processor's L1, L2, or L3 cache or system memory. Furthermore, inference and / or training code may be stored with other code accessible to a processor or other hardware logic or circuit, and may be fetched and / or processed using the processor's fetch, decode, scheduling, execution, retirement, and / or other logic circuits.
[0199] In at least one embodiment, the activation storage device 1820 may be a cache memory, DRAM, SRAM, non-volatile memory (e.g., flash memory), or other storage device. In at least one embodiment, the activation storage device 1820 may be wholly or partially located within or outside one or more processors or other logic circuits. In at least one embodiment, the choice of whether the activation storage device 1820 is internal or external to the processor, or comprises DRAM, SRAM, flash memory, or some other storage type, may depend on the available on-chip storage relative to off-chip storage, the latency requirements of the training and / or inference functions being performed, the batch size of data used in the inference and / or training of the neural network, or some combination of these factors.
[0200] In at least one embodiment, the inference and / or training logic 1815 shown in Figure 18A can be used in conjunction with an application-specific integrated circuit (“ASIC”), such as a processing unit from Google, an inference processing unit (IPU) from Graphcore™, or a processor from Intel Corporation (for example, a "Lake Crest" processor). In at least one embodiment, the inference and / or training logic 1815 shown in Figure 18A can be used in conjunction with central processing unit (“CPU”) hardware, graphics processing unit (“GPU”) hardware, or other hardware such as a field-programmable gate array (“FPGA”).
[0201] Figure 18B illustrates inference and / or training logic 1815 according to at least one embodiment. In at least one embodiment, the inference and / or training logic 1815 may include, but is not limited to, hardware logic in which computational resources are dedicated or otherwise used exclusively in conjunction with weight values or other information corresponding to one or more layers of neurons within a neural network. In at least one embodiment, the inference and / or training logic 1815 shown in Figure 18B can be used in conjunction with an application-specific integrated circuit (ASIC), such as a processing unit from Google, an inference processing unit (IPU) from Graphcore™, or a processor from Intel Corporation (for example, a "Lake Crest" processor). In at least one embodiment, the inference and / or training logic 1815 shown in Figure 18B can be used in conjunction with central processing unit (CPU) hardware, graphics processing unit (GPU) hardware, or other hardware such as a field-programmable gate array (FPGA). In at least one embodiment, the inference and / or training logic 1815 includes, but is not limited to, code and / or data storage devices 1801 and 1805, which can be used to store code (e.g., graph code), weight values, and / or other information, including bias values, gradient information, momentum values, and / or other parameter or hyperparameter information. In at least one embodiment shown in Figure 18B, each of code and / or data storage devices 1801 and 1805 is associated with dedicated computing resources (e.g., computing hardware 1802 and computing hardware 1806, respectively). In at least one embodiment, each of computing hardware 1802 and computing hardware 1806 includes one or more ALUs that perform mathematical functions (such as linear algebraic functions) only on the information stored in code and / or data storage devices 1801 and 1805, respectively, with the results stored in activation storage device 1820.
[0202] In at least one embodiment, each code and / or data memory 1801 and 1805 and corresponding computing hardware 1802 and 1806 corresponds to a different layer of the neural network, such that the activation obtained from one storage / computation pair 1801 / 1802 of the code and / or data storage 1801 and computing hardware 1802 is provided as input to the next storage / computation pair 1805 / 1806 of the code and / or data storage 1805 and computing hardware 1806 to reflect the conceptual organization of the neural network. In at least one embodiment, each of the storage / computation pairs 1801 / 1802 and 1805 / 1806 may correspond to more than one neural network layer. In at least one embodiment, additional storage / computation pairs (not shown) following or paralleling the storage / computation pairs 1801 / 1802 and 1805 / 1806 may be included in the inference and / or training logic 1815.
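The chaining of storage / computation pairs described above can be sketched in Python as follows. This is an illustrative model only: the class name `StorageComputePair`, the ReLU activation, and the example weights are assumptions and do not come from the specification, which describes hardware rather than software objects.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class StorageComputePair:
    """One weight store plus its dedicated compute (e.g. pair 1801 / 1802)."""
    weights: List[List[float]]   # one row per output neuron
    biases: List[float]

    def forward(self, inputs: Sequence[float]) -> List[float]:
        # Matrix-vector product plus bias, then ReLU (activation choice is assumed).
        outputs = []
        for row, bias in zip(self.weights, self.biases):
            z = sum(w * x for w, x in zip(row, inputs)) + bias
            outputs.append(max(0.0, z))
        return outputs


def run_pipeline(pairs: Sequence[StorageComputePair], inputs: Sequence[float]) -> List[float]:
    """Feed the activation produced by each pair into the next pair."""
    activation = list(inputs)
    for pair in pairs:
        activation = pair.forward(activation)
    return activation


# Two pairs standing in for 1801 / 1802 and 1805 / 1806 (weights are made up).
pair_a = StorageComputePair(weights=[[1.0, -1.0], [0.5, 0.5]], biases=[0.0, 1.0])
pair_b = StorageComputePair(weights=[[2.0, 1.0]], biases=[-1.0])
print(run_pipeline([pair_a, pair_b], [3.0, 1.0]))   # [6.0]
```

Appending further pairs to the list corresponds to the additional storage / computation pairs that may follow those shown.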
[0203] Figure 19 illustrates the training and deployment of a deep neural network according to at least one embodiment. In at least one embodiment, an untrained neural network 1906 is trained using a training dataset 1902. In at least one embodiment, the training framework 1904 is a PyTorch framework, while in other embodiments, the training framework 1904 is TensorFlow, Boost, Caffe, Microsoft Cognitive Toolkit / CNTK, MXNet, Chainer, Keras, Deeplearning4j, or another training framework. In at least one embodiment, the training framework 1904 trains the untrained neural network 1906, enabling it to be trained using the processing resources described herein to generate a trained neural network 1908. In at least one embodiment, the weights may be chosen randomly or by pre-training using a deep belief network. In at least one embodiment, training may be performed in a supervised, partially supervised, or unsupervised manner.
[0204] In at least one embodiment, supervised learning is used to train the untrained neural network 1906, wherein the training dataset 1902 includes inputs paired with the desired outputs for those inputs, or wherein the training dataset 1902 includes inputs with known outputs and the outputs of the neural network 1906 are graded manually. In at least one embodiment, the untrained neural network 1906 is trained in a supervised manner: inputs from the training dataset 1902 are processed, and the resulting outputs are compared with a set of expected or desired outputs. In at least one embodiment, errors are then propagated back through the untrained neural network 1906. In at least one embodiment, the training framework 1904 adjusts the weights that control the untrained neural network 1906. In at least one embodiment, the training framework 1904 includes tools for monitoring how well the untrained neural network 1906 is converging toward a model (such as the trained neural network 1908) suitable for generating correct answers (such as results 1914) based on input data (such as a new dataset 1912). In at least one embodiment, the training framework 1904 trains the untrained neural network 1906 repeatedly while adjusting the weights, using a loss function and an adjustment algorithm (such as stochastic gradient descent), to refine the output of the untrained neural network 1906. In at least one embodiment, the training framework 1904 trains the untrained neural network 1906 until it achieves the desired accuracy. In at least one embodiment, the trained neural network 1908 can then be deployed to implement any number of machine learning operations.
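As a minimal sketch of the supervised loop described above (process inputs, compare with desired outputs, propagate the error back, adjust weights with stochastic gradient descent, stop at the desired accuracy), the following Python example trains a single linear neuron. The model, the squared-error loss, the learning rate, and the stopping threshold are all illustrative assumptions; a real training framework would operate on a multi-layer network.

```python
import random


def train(dataset, lr=0.05, target_loss=1e-6, max_epochs=10_000, seed=0):
    """Fit y ~ w*x + b by per-sample stochastic gradient descent."""
    rng = random.Random(seed)
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)   # randomly chosen initial weights
    data = list(dataset)
    loss = float("inf")
    for _ in range(max_epochs):
        rng.shuffle(data)                  # visit samples in random order
        for x, desired in data:
            error = (w * x + b) - desired  # compare output with desired output
            w -= lr * error * x            # gradient of 0.5 * error**2 w.r.t. w
            b -= lr * error                # gradient of 0.5 * error**2 w.r.t. b
        # Monitor convergence with the mean squared error over the dataset.
        loss = sum(((w * x + b) - y) ** 2 for x, y in data) / len(data)
        if loss < target_loss:             # desired accuracy reached
            break
    return w, b, loss


# Labeled data: each input is paired with its desired output (here y = 2x + 1).
training_dataset = [(x, 2.0 * x + 1.0) for x in (-2.0, -1.0, 0.0, 1.0, 2.0)]
w, b, final_loss = train(training_dataset)
print(round(w, 2), round(b, 2))
```

After training, `w * x + b` plays the role of the trained network applied to new inputs.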
[0205] In at least one embodiment, unsupervised learning is used to train the untrained neural network 1906, wherein the untrained neural network 1906 attempts to train itself using unlabeled data. In at least one embodiment, the training dataset 1902 for unsupervised learning includes input data without any associated output data or "ground truth" data. In at least one embodiment, the untrained neural network 1906 can learn groupings within the training dataset 1902 and can determine how individual inputs relate to the training dataset 1902. In at least one embodiment, unsupervised training can be used to generate a self-organizing map in the trained neural network 1908 that is capable of performing operations useful in reducing the dimensionality of the new dataset 1912. In at least one embodiment, unsupervised training can also be used to perform anomaly detection, which allows the identification of data points in the new dataset 1912 that deviate from the normal patterns of the new dataset 1912.
[0206] In at least one embodiment, semi-supervised learning can be used, which is a technique in which the training dataset 1902 includes a mixture of labeled and unlabeled data. In at least one embodiment, the training framework 1904 can be used to perform incremental learning, such as through transfer learning techniques. In at least one embodiment, incremental learning enables the trained neural network 1908 to adapt to a new dataset 1912 without forgetting the knowledge instilled in the trained neural network 1908 during the initial training.
[0207] 5G network
[0208] The following figures illustrate, but are not limited to, exemplary 5G network-based systems that can be used to implement at least one embodiment.
[0209] Figure 20 illustrates an architecture of a system 2000 of a network according to at least one embodiment. In at least one embodiment, system 2000 is shown to include user equipment (UE) 2002 and UE 2004. In at least one embodiment, UEs 2002 and 2004 are shown as smartphones (e.g., handheld touchscreen mobile computing devices capable of connecting to one or more cellular networks), but may also include any mobile or non-mobile computing device, such as a personal digital assistant (PDA), pager, laptop computer, desktop computer, wireless handheld device, or any computing device including a wireless communication interface.
[0210] In at least one embodiment, either UE 2002 or UE 2004 may include an Internet of Things (IoT) UE, which may include a network access layer designed for low-power IoT applications utilizing short-lived UE connections. In at least one embodiment, the IoT UE may utilize technologies such as machine-to-machine (M2M) or machine-type communication (MTC) for exchanging data with an MTC server or device via a Public Land Mobile Network (PLMN), Proximity-Based Service (ProSe) or Device-to-Device (D2D) communication, sensor networks, or the IoT network. In at least one embodiment, the M2M or MTC data exchange may be machine-initiated data exchange. In at least one embodiment, the IoT network describes interconnected IoT UEs, which may include uniquely identifiable embedded computing devices (within the Internet infrastructure) with short-lived connections. In at least one embodiment, the IoT UE may execute background applications (e.g., keep-alive messages, state updates, etc.) to facilitate connectivity to the IoT network.
[0211] In at least one embodiment, UE 2002 and UE 2004 may be configured to connect (e.g., communicatively couple) to a radio access network (RAN) 2016. In at least one embodiment, RAN 2016 may be an Evolved Universal Mobile Telecommunications System (UMTS) Terrestrial Radio Access Network (E-UTRAN), a NextGen RAN (NG RAN), or some other type of RAN. In at least one embodiment, UE 2002 and UE 2004 utilize connection 2012 and connection 2014, respectively, each connection including a physical communication interface or layer. In at least one embodiment, connections 2012 and 2014 are shown as air interfaces that enable communicative coupling and may be consistent with cellular communication protocols such as the Global System for Mobile Communications (GSM) protocol, Code Division Multiple Access (CDMA) network protocol, Push-to-Talk (PTT) protocol, PTT over Cellular (POC) protocol, Universal Mobile Telecommunications System (UMTS) protocol, 3GPP Long Term Evolution (LTE) protocol, 5G protocol, New Radio (NR) protocol, and variants thereof.
[0212] In at least one embodiment, UEs 2002 and 2004 may also directly exchange communication data via ProSe interface 2006. In at least one embodiment, ProSe interface 2006 may alternatively be referred to as a sidelink interface, which includes one or more logical channels, including but not limited to the Physical Sidelink Control Channel (PSCCH), Physical Sidelink Shared Channel (PSSCH), Physical Sidelink Discovery Channel (PSDCH), and Physical Sidelink Broadcast Channel (PSBCH).
[0213] In at least one embodiment, UE 2004 is shown as configured to access an access point (AP) 2010 via connection 2008. In at least one embodiment, connection 2008 may include a local wireless connection, such as a connection consistent with any IEEE 802.11 protocol, wherein AP 2010 would comprise a wireless fidelity (Wi-Fi) router. In at least one embodiment, AP 2010 is shown as connected to the Internet without being connected to the core network of the wireless system.
[0214] In at least one embodiment, RAN 2016 may include one or more access nodes that enable connections 2012 and 2014. In at least one embodiment, these access nodes (ANs) may be referred to as base stations (BSs), NodeBs, evolved NodeBs (eNBs), next-generation NodeBs (gNBs), RAN nodes, and so on, and may include ground stations (e.g., terrestrial access points) or satellite stations providing coverage within a geographic area (e.g., a cell). In at least one embodiment, RAN 2016 may include one or more RAN nodes for providing macrocells (e.g., macro RAN node 2018) and one or more RAN nodes for providing femtocells or picocells, that is, cells with smaller coverage areas, smaller user capacity, or higher bandwidth compared to macrocells (e.g., low-power (LP) RAN node 2020).
[0215] In at least one embodiment, either RAN node 2018 or 2020 may terminate the air interface protocol and may be the first contact point for UEs 2002 and 2004. In at least one embodiment, either RAN node 2018 or 2020 may implement different logical functions of RAN 2016, including but not limited to Radio Network Controller (RNC) functions such as radio bearer management, uplink and downlink dynamic radio resource management, and data packet scheduling and mobility management.
[0216] In at least one embodiment, UE 2002 and UE 2004 may be configured to communicate with each other or with either RAN node 2018 or RAN node 2020 via a multi-carrier communication channel using orthogonal frequency division multiplexing (OFDM) communication signals according to various communication technologies, such as, but not limited to, orthogonal frequency division multiple access (OFDMA) communication technology (e.g., for downlink communication) or single-carrier frequency division multiple access (SC-FDMA) communication technology (e.g., for uplink and ProSe or sidelink communication), and / or variations thereof. In at least one embodiment, the OFDM signal may include multiple orthogonal subcarriers.
[0217] In at least one embodiment, a downlink resource grid can be used for downlink transmissions from either of RAN nodes 2018 and 2020 to UEs 2002 and 2004, while uplink transmissions can utilize similar techniques. In at least one embodiment, the grid can be a time-frequency grid, referred to as a resource grid or time-frequency resource grid, which represents the physical resources in the downlink within each time slot. In at least one embodiment, this time-frequency plane representation is common practice for OFDM systems, which makes it intuitive for radio resource allocation. In at least one embodiment, each column and each row of the resource grid corresponds to one OFDM symbol and one OFDM subcarrier, respectively. In at least one embodiment, the duration of the resource grid in the time domain corresponds to one time slot in a radio frame. In at least one embodiment, the smallest time-frequency unit in the resource grid is referred to as a resource element. In at least one embodiment, each resource grid comprises multiple resource blocks, which describe the mapping of certain physical channels to resource elements. In at least one embodiment, each resource block comprises a set of resource elements. In at least one embodiment, in the frequency domain, a resource block can represent the smallest quantity of resources that can currently be allocated. In at least one embodiment, there are several different physical downlink channels transmitted using such resource blocks.
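The grid structure described above (rows as subcarriers, columns as OFDM symbols, resource blocks as sets of resource elements) can be sketched as a small Python data structure. The text does not give dimensions; the values of 12 subcarriers per resource block and 7 OFDM symbols per slot are assumed from the LTE normal-cyclic-prefix configuration, and the helper names are hypothetical.

```python
from typing import List, Optional, Tuple

SUBCARRIERS_PER_RB = 12   # assumption: LTE value, not stated in the text
SYMBOLS_PER_SLOT = 7      # assumption: LTE normal cyclic prefix

Grid = List[List[Optional[str]]]


def make_grid(num_resource_blocks: int) -> Grid:
    """One slot: rows are OFDM subcarriers, columns are OFDM symbols."""
    rows = num_resource_blocks * SUBCARRIERS_PER_RB
    return [[None] * SYMBOLS_PER_SLOT for _ in range(rows)]


def rb_elements(rb_index: int) -> List[Tuple[int, int]]:
    """(subcarrier, symbol) coordinates of every resource element in one block."""
    first = rb_index * SUBCARRIERS_PER_RB
    return [(sc, sym)
            for sc in range(first, first + SUBCARRIERS_PER_RB)
            for sym in range(SYMBOLS_PER_SLOT)]


def allocate(grid: Grid, rb_index: int, owner: str) -> None:
    """Assign a whole resource block, the smallest allocatable unit, to one owner."""
    for sc, sym in rb_elements(rb_index):
        if grid[sc][sym] is not None:
            raise ValueError(f"resource block {rb_index} is already allocated")
        grid[sc][sym] = owner


grid = make_grid(num_resource_blocks=6)
allocate(grid, 2, "UE 2002")
allocate(grid, 3, "UE 2004")
used = sum(cell is not None for row in grid for cell in row)
print(len(grid), len(grid[0]), used)   # 72 7 168
```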
[0218] In at least one embodiment, the Physical Downlink Shared Channel (PDSCH) can carry user data and higher-layer signaling to UEs 2002 and 2004. In at least one embodiment, the Physical Downlink Control Channel (PDCCH) can carry information such as transmission format and resource allocation related to the PDSCH channel. In at least one embodiment, it can also inform UEs 2002 and 2004 of transmission format, resource allocation, and HARQ (Hybrid Automatic Repeat Request) information related to the uplink shared channel. In at least one embodiment, typically, downlink scheduling (allocating control and shared channel resource blocks to UE 2002 within the cell) can be performed at either RAN node 2018 or 2020 based on channel quality information fed back from either UE 2002 or 2004. In at least one embodiment, downlink resource allocation information can be transmitted on the PDCCH used for (e.g., allocated to) each of UEs 2002 and 2004.
[0219] In at least one embodiment, the PDCCH can use Control Channel Elements (CCEs) to transmit control information. In at least one embodiment, PDCCH complex-valued symbols can first be organized into quadruplets before being mapped to resource elements, and then permuted using a sub-block interleaver for rate matching. In at least one embodiment, one or more of these CCEs can be used to transmit each PDCCH, where each CCE can correspond to nine sets of four physical resource elements referred to as resource element groups (REGs). In at least one embodiment, four Quadrature Phase Shift Keying (QPSK) symbols can be mapped to each REG. In at least one embodiment, depending on the size of the downlink control information (DCI) and channel conditions, one or more CCEs can be used to transmit the PDCCH. In at least one embodiment, there can be four or more different PDCCH formats (e.g., aggregation levels, L = 1, 2, 4, or 8) with different numbers of CCEs as defined in LTE.
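The arithmetic implied by the paragraph above can be worked through in a few lines of Python. The figures of nine REGs per CCE, four resource elements per REG, and aggregation levels 1, 2, 4, and 8 come from the text; the value of two bits per QPSK symbol is a standard property of QPSK. The function name is hypothetical.

```python
REGS_PER_CCE = 9              # nine resource element groups per CCE (from the text)
RES_PER_REG = 4               # four physical resource elements per REG (from the text)
BITS_PER_QPSK_SYMBOL = 2      # QPSK carries two bits per symbol
AGGREGATION_LEVELS = (1, 2, 4, 8)


def pdcch_capacity(aggregation_level: int) -> dict:
    """Resources occupied by one PDCCH that uses `aggregation_level` CCEs."""
    if aggregation_level not in AGGREGATION_LEVELS:
        raise ValueError(f"unsupported aggregation level {aggregation_level}")
    regs = aggregation_level * REGS_PER_CCE
    resource_elements = regs * RES_PER_REG   # one QPSK symbol per resource element
    return {
        "cces": aggregation_level,
        "regs": regs,
        "resource_elements": resource_elements,
        "coded_bits": resource_elements * BITS_PER_QPSK_SYMBOL,
    }


for level in AGGREGATION_LEVELS:
    print(pdcch_capacity(level))
# {'cces': 1, 'regs': 9, 'resource_elements': 36, 'coded_bits': 72}
# {'cces': 2, 'regs': 18, 'resource_elements': 72, 'coded_bits': 144}
# {'cces': 4, 'regs': 36, 'resource_elements': 144, 'coded_bits': 288}
# {'cces': 8, 'regs': 72, 'resource_elements': 288, 'coded_bits': 576}
```

A higher aggregation level spends more resource elements on the same control information, which is why the level is chosen according to the DCI size and channel conditions.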
[0220] In at least one embodiment, the Enhanced Physical Downlink Control Channel (EPDCCH) using PDSCH resources can be used for control information transmission. In at least one embodiment, one or more Enhanced Control Channel Elements (ECCEs) can be used to transmit the EPDCCH. In at least one embodiment, each ECCE can correspond to nine sets of four physical resource elements referred to as Enhanced Resource Element Groups (EREGs). In at least one embodiment, the ECCE can have a different number of EREGs in some cases.
[0221] In at least one embodiment, RAN 2016 is shown communicatively coupled to core network (CN) 2038 via S1 interface 2022. In at least one embodiment, CN 2038 may be an evolved packet core (EPC) network, a NextGen packet core (NPC) network, or some other type of CN. In at least one embodiment, S1 interface 2022 is divided into two parts: S1-U interface 2026, which carries service data between RAN nodes 2018 and 2020 and serving gateway (S-GW) 2030; and S1-Mobility Management Entity (MME) interface 2024, which is the signaling interface between RAN nodes 2018 and 2020 and MME 2028.
[0222] In at least one embodiment, CN 2038 includes MME 2028, S-GW 2030, Packet Data Network (PDN) Gateway (P-GW) 2034, and Home Subscriber Server (HSS) 2032. In at least one embodiment, MME 2028 may be similar in function to the control plane of a legacy Serving General Packet Radio Service (GPRS) Support Node (SGSN). In at least one embodiment, MME 2028 may manage mobility aspects of access, such as gateway selection and tracking area list management. In at least one embodiment, HSS 2032 may include a database for network users, comprising subscription-related information to support network entities in handling communication sessions. In at least one embodiment, CN 2038 may include one or more HSSs 2032, depending on the number of mobile users, device capacity, network organization, and so on. In at least one embodiment, HSS 2032 may provide support for routing / roaming, authentication, authorization, naming / addressing resolution, location dependencies, and so on.
[0223] In at least one embodiment, S-GW 2030 can terminate the S1 interface 2022 toward RAN 2016 and route data packets between RAN 2016 and CN 2038. In at least one embodiment, S-GW 2030 can be a local mobility anchor point for inter-RAN node handovers and can also provide an anchor for inter-3GPP mobility. In at least one embodiment, other responsibilities may include lawful interception, charging, and some policy enforcement.
[0224] In at least one embodiment, P-GW 2034 can terminate the SGi interface toward the PDN.
[0225] In at least one embodiment, P-GW 2034 can route data packets between the EPC network 2038 and external networks, such as a network including the application server 2040 (alternatively referred to as an application function (AF)), via the Internet Protocol (IP) interface 2042. In at least one embodiment, the application server 2040 can be an element offering applications that use IP bearer resources with the core network (e.g., UMTS Packet Service (PS) domain, LTE PS data services, etc.). In at least one embodiment, P-GW 2034 is shown as communicatively coupled to the application server 2040 via the IP communication interface 2042. In at least one embodiment, the application server 2040 can also be configured to support one or more communication services (e.g., Voice over Internet Protocol (VoIP) sessions, PTT sessions, group communication sessions, social networking services, etc.) for UEs 2002 and 2004 via CN 2038.
[0226] In at least one embodiment, P-GW 2034 may also be a node for policy enforcement and charging data collection. In at least one embodiment, the Policy and Charging Rules Function (PCRF) 2036 is the policy and charging control element of CN 2038. In at least one embodiment, in a non-roaming scenario, there may be a single PCRF in the Home Public Land Mobile Network (HPLMN) associated with the UE's Internet Protocol Connectivity Access Network (IP-CAN) session. In at least one embodiment, in a roaming scenario with local breakout of traffic, there may be two PCRFs associated with the UE's IP-CAN session: a home PCRF (H-PCRF) within the HPLMN and a visited PCRF (V-PCRF) within the Visited Public Land Mobile Network (VPLMN). In at least one embodiment, PCRF 2036 may be communicatively coupled to the application server 2040 via P-GW 2034. In at least one embodiment, the application server 2040 may signal PCRF 2036 to indicate a new service flow and to select the appropriate Quality of Service (QoS) and charging parameters. In at least one embodiment, PCRF 2036 can provision this rule into a Policy and Charging Enforcement Function (PCEF) (not shown) with the appropriate traffic flow template (TFT) and QoS class identifier (QCI), which commences the QoS and charging as specified by the application server 2040.
[0227] Figure 21 The architecture of a system 2100 of a network according to some embodiments is shown. In at least one embodiment, the system 2100 is shown to include a UE 2102, a 5G access node or RAN node (shown as (R)AN node 2108), a user plane function (UPF 2104), a data network (DN 2106), and in at least one embodiment, operator services, internet access or third-party services, and a 5G core network (5GC) (shown as CN 2110).
[0228] In at least one embodiment, CN2110 includes authentication server functionality (AUSF2114); core access and mobility management functionality (AMF2112); session management functionality (SMF2118); network exposure functionality (NEF2116); policy control functionality (PCF2122); network function (NF) repository functionality (NRF2120); unified data management (UDM2124); and application functionality (AF2126). In at least one embodiment, CN2110 may also include other elements not shown, such as structured data storage network functionality (SDSF), unstructured data storage network functionality (UDSF), and variations thereof.
[0229] In at least one embodiment, UPF2104 may act as an anchor point for intra- and inter-RAT mobility, an external PDU session point of interconnection to DN2106, and a branching point supporting multihomed PDU sessions. In at least one embodiment, UPF2104 may also perform packet routing and forwarding, packet inspection, enforcement of the user plane part of policy rules, lawful packet interception (UP collection), traffic usage reporting, QoS processing for the user plane (e.g., packet filtering, gating, UL / DL rate enforcement), uplink traffic verification (e.g., SDF-to-QoS flow mapping), transport-level packet marking in the uplink and downlink, and downlink packet buffering and downlink data notification triggering. In at least one embodiment, UPF2104 may include an uplink classifier to support routing traffic flows to the data network. In at least one embodiment, DN2106 may represent different network operator services, Internet access, or third-party services.
[0230] In at least one embodiment, AUSF2114 can store data for authentication of UE2102 and handle authentication-related functions. In at least one embodiment, AUSF2114 can facilitate a common authentication framework for different access types.
[0231] In at least one embodiment, AMF2112 can be responsible for registration management (e.g., for registering UE2102, etc.), connection management, reachability management, mobility management, and lawful interception of AMF-related events, as well as access authentication and authorization. In at least one embodiment, AMF2112 can provide transport for SM messages for SMF2118 and act as a transparent proxy for routing SM messages. In at least one embodiment, AMF2112 can also provide transport for Short Message Service (SMS) messages between UE2102 and an SMS function (SMSF) (not shown in Figure 21). In at least one embodiment, AMF2112 may act as a Security Anchor Function (SEA), which may include interaction with AUSF2114 and UE2102 and receipt of an intermediate key established as a result of the UE2102 authentication process. In at least one embodiment, where USIM-based authentication is used, AMF2112 may retrieve security material from AUSF2114. In at least one embodiment, AMF2112 may also include a Security Context Management (SCM) function, which receives from the SEA a key that it uses to derive access network-specific keys. Furthermore, in at least one embodiment, AMF2112 may be the termination point of the RAN CP interface (N2 reference point) and the termination point of NAS (N1) signaling, and may perform NAS encryption and integrity protection.
[0232] In at least one embodiment, AMF2112 can also support NAS signaling with UE2102 via the N3 interworking function (N3IWF) interface. In at least one embodiment, the N3IWF can be used to provide access for untrusted entities. In at least one embodiment, the N3IWF can be a termination point for the N2 and N3 interfaces of the control plane and user plane, respectively, and can thus relay N2 signaling from the SMF and AMF for PDU sessions and QoS, encapsulate / decapsulate packets for IPsec and N3 tunneling, mark N3 user plane packets in the uplink, and enforce QoS corresponding to the N3 packet marking, taking into account the QoS requirements associated with such marking received via N2. In at least one embodiment, the N3IWF can also relay uplink and downlink control plane NAS (N1) signaling between UE2102 and AMF2112, and relay uplink and downlink user plane packets between UE2102 and UPF2104. In at least one embodiment, the N3IWF also provides a mechanism for establishing an IPsec tunnel with the UE2102.
[0233] In at least one embodiment, the SMF2118 may be responsible for session management (e.g., session establishment, modification, and release, including tunnel maintenance between the UPF and AN nodes); UE IP address allocation and management (including optional authorization); selection and control of UP functions; configuration of traffic steering at the UPF to route traffic to appropriate destinations; termination of interfaces towards policy control functions; control of the policy enforcement and QoS portions; lawful interception (for SM events and interfaces to the LI system); termination of the SM portion of NAS messages; downlink data notification; initiation of AN-specific SM information, which is sent to the AN via the AMF over N2; and determination of the SSC mode of the session. In at least one embodiment, the SMF2118 may include the following roaming functions: handling local enforcement to apply QoS SLAs (VPLMN); charging data collection and charging interface (VPLMN); lawful interception (for SM events in the VPLMN and interface to the LI system); and supporting interaction with external DNs to transmit signaling for PDU session authorization / authentication performed by the external DNs.
[0234] In at least one embodiment, the NEF2116 can provide means for securely exposing services and capabilities provided to third parties by 3GPP network functions, internal exposure / re-exposure, application functions (e.g., AF2126), edge computing or fog computing systems, etc. In at least one embodiment, the NEF2116 can authenticate, authorize, and / or throttle AFs. In at least one embodiment, the NEF2116 can also translate information exchanged with the AF2126 and information exchanged with internal network functions. In at least one embodiment, the NEF2116 can translate between AF service identifiers and internal 5GC information. In at least one embodiment, the NEF2116 can also receive information from other network functions (NFs) based on the exposure capabilities of other network functions. In at least one embodiment, this information can be stored as structured data at the NEF2116 or stored at a data storage NF using a standardized interface. In at least one embodiment, the stored information can then be re-exposed by the NEF2116 to other NFs and AFs, and / or used for other purposes, such as analysis.
[0235] In at least one embodiment, the NRF2120 may support service discovery functionality, receiving NF discovery requests from NF instances and providing information about discovered NF instances to the NF instances. In at least one embodiment, the NRF2120 also maintains information about available NF instances and the services they support.
[0236] In at least one embodiment, the PCF2122 may provide policy rules to control plane functions for enforcement, and may also support a unified policy framework for managing network behavior. In at least one embodiment, the PCF2122 may also implement a front-end (FE) to access subscription information related to policy decisions in the UDR of the UDM2124.
[0237] In at least one embodiment, UDM2124 can process subscription-related information to support network entities in handling communication sessions and can store subscription data of UE2102. In at least one embodiment, UDM2124 may include two parts: an application FE and a user data repository (UDR). In at least one embodiment, UDM may include a UDM FE responsible for handling credentials, location management, subscription management, etc. In at least one embodiment, several different front-ends may serve the same user in different transactions. In at least one embodiment, UDM-FE accesses subscription information stored in the UDR and performs authentication credential processing; user identification handling; access authorization; registration / mobility management; and subscription management. In at least one embodiment, the UDR can interact with PCF2122.
[0238] In at least one embodiment, the UDM2124 may also support SMS management, wherein the SMS-FE implements similar application logic as described above.
[0239] In at least one embodiment, AF2126 can provide application influence on traffic routing, access to Network Capability Exposure (NCE), and interaction with a policy framework for policy control. In at least one embodiment, NCE can be a mechanism allowing 5GC and AF2126 to provide information to each other via NEF2116, which can be used for edge computing implementations. In at least one embodiment, network operator and third-party services can be hosted near the access point of attachment of UE2102 to achieve efficient service delivery by reducing end-to-end latency and load on the transport network. In at least one embodiment, for edge computing implementations, 5GC can select a UPF2104 close to UE2102 and perform traffic steering from UPF2104 to DN2106 via the N6 interface. In at least one embodiment, this can be based on UE subscription data, UE location, and information provided by AF2126. In at least one embodiment, AF2126 can influence UPF (re)selection and traffic routing. In at least one embodiment, based on operator deployment, when AF2126 is considered a trusted entity, network operators can allow AF2126 to interact directly with the relevant NF.
[0240] In at least one embodiment, CN2110 may include an SMSF, which may be responsible for SMS subscription checks and authentication, and relay SM messages to / from UE2102 to / from other entities, such as SMS-GMSC / IWMSC / SMS routers. In at least one embodiment, the SMSF may also interact with AMF2112 and UDM2124 for a notification process that UE2102 is available for SMS delivery (e.g., setting a UE unreachable flag and notifying UDM2124 when UE2102 is available for SMS).
[0241] In at least one embodiment, system 2100 may include the following service-based interfaces: Namf: a service-based interface presented by AMF; Nsmf: a service-based interface presented by SMF; Nnef: a service-based interface presented by NEF; Npcf: a service-based interface presented by PCF; Nudm: a service-based interface presented by UDM; Naf: a service-based interface presented by AF; Nnrf: a service-based interface presented by NRF; Nausf: a service-based interface presented by AUSF.
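The interface names listed above follow a visible pattern: the letter "N" followed by the lower-case abbreviation of the presenting network function. The following Python fragment is only an illustrative sketch of that naming pattern; the names `NETWORK_FUNCTIONS` and `sbi_name` are hypothetical and are not part of any embodiment or standard API.

```python
# Network functions that present a service-based interface in system 2100,
# in the order the paragraph lists them.
NETWORK_FUNCTIONS = ["AMF", "SMF", "NEF", "PCF", "UDM", "AF", "NRF", "AUSF"]


def sbi_name(nf: str) -> str:
    """Derive the interface name from the NF abbreviation (hypothetical helper)."""
    return "N" + nf.lower()


# Map each interface name to the network function that presents it.
SERVICE_BASED_INTERFACES = {sbi_name(nf): nf for nf in NETWORK_FUNCTIONS}

for name, nf in SERVICE_BASED_INTERFACES.items():
    print(f"{name}: service-based interface presented by {nf}")
```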
[0242] In at least one embodiment, system 2100 may include the following reference points: N1: a reference point between the UE and the AMF; N2: a reference point between the (R)AN and the AMF; N3: a reference point between the (R)AN and the UPF; N4: a reference point between the SMF and the UPF; N6: a reference point between the UPF and the data network. In at least one embodiment, there may be more reference points and / or service-based interfaces between NF services in the NFs; for clarity, these interfaces and reference points are omitted. In at least one embodiment, the N5 reference point may be between the PCF and the AF; the N7 reference point may be between the PCF and the SMF; the N11 reference point may be between the AMF and the SMF, etc. In at least one embodiment, CN2110 may include an Nx interface, which is an inter-CN interface between the MME and the AMF2112 to enable interoperability between CN2110 and CN7221.
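As an illustrative sketch only, the reference points named above can be held in a small lookup table that answers which reference points terminate at a given element. The table covers N1, N2, N3, N4, N6, and N11 as described in this paragraph; the names `REFERENCE_POINTS` and `reference_points_of` are hypothetical.

```python
# Reference point -> the two elements it connects, as described for system 2100.
REFERENCE_POINTS = {
    "N1": ("UE", "AMF"),
    "N2": ("(R)AN", "AMF"),
    "N3": ("(R)AN", "UPF"),
    "N4": ("SMF", "UPF"),
    "N6": ("UPF", "DN"),
    "N11": ("AMF", "SMF"),
}


def reference_points_of(element: str) -> list[str]:
    """Return the reference points that terminate at `element`, in numeric order."""
    names = [name for name, ends in REFERENCE_POINTS.items() if element in ends]
    return sorted(names, key=lambda name: int(name[1:]))


print(reference_points_of("AMF"))  # reference points terminating at the AMF
print(reference_points_of("UPF"))  # reference points terminating at the UPF
```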
[0243] In at least one embodiment, system 2100 may include a plurality of RAN nodes (such as (R)AN nodes 2108), wherein an Xn interface is defined between two or more (R)AN nodes 2108 (e.g., gNBs) connected to 5GC 410, between (R)AN nodes 2108 (e.g. gNBs) connected to CN2110 and eNBs (e.g. macro RAN nodes), and / or between two eNBs connected to CN2110.
[0244] In at least one embodiment, the Xn interface may include an Xn user plane (Xn-U) interface and an Xn control plane (Xn-C) interface. In at least one embodiment, Xn-U may provide non-guaranteed delivery of user plane PDUs and support / provide data forwarding and flow control functions. In at least one embodiment, Xn-C may provide management and error handling functions, functions for managing the Xn-C interface, and mobility support for UE 2102 in connected mode (e.g., CM-CONNECTED), including functions for managing UE mobility in connected mode between one or more (R)AN nodes 2108. In at least one embodiment, mobility support may include context transfer from the old (source) serving (R)AN node 2108 to the new (target) serving (R)AN node 2108; and control of user plane tunneling between the old (source) serving (R)AN node 2108 and the new (target) serving (R)AN node 2108.
[0245] In at least one embodiment, the Xn-U protocol stack may include a transport network layer built on top of the Internet Protocol (IP) transport layer and a GTP-U layer built on top of UDP and / or (one or more) IP layers to carry user plane PDUs. In at least one embodiment, the Xn-C protocol stack may include an application layer signaling protocol (referred to as the Xn Application Protocol (Xn-AP)) and a transport network layer built on top of the SCTP layer. In at least one embodiment, the SCTP layer may be built on top of the IP layer. In at least one embodiment, the SCTP layer provides guaranteed delivery of application layer messages. In at least one embodiment, point-to-point transmission is used to deliver signaling PDUs at the transport IP layer. In at least one embodiment, the Xn-U protocol stack and / or the Xn-C protocol stack may be the same as or similar to the user plane and / or control plane protocol stacks shown and described herein.
[0246] Figure 22 is an illustration of a control plane protocol stack according to some embodiments. In at least one embodiment, control plane 2200 is shown as a communication protocol stack between UE 2002 (or alternatively, UE 2004), RAN 2016, and MME 2028.
[0247] In at least one embodiment, PHY layer 2202 can transmit or receive information used by MAC layer 2204 through one or more air interfaces. In at least one embodiment, PHY layer 2202 can also perform link adaptation or adaptive modulation and coding (AMC), power control, cell search (e.g., for initial synchronization and handover purposes), and other measurements used by higher layers (e.g., RRC layer 2210). In at least one embodiment, PHY layer 2202 can further perform error detection on the transport channels, forward error correction (FEC) coding / decoding of the transport channels, modulation / demodulation of the physical channels, interleaving, rate matching, mapping onto the physical channels, and multiple-input multiple-output (MIMO) antenna processing.
[0248] In at least one embodiment, MAC layer 2204 can perform mapping between logical channels and transport channels, multiplexing of MAC service data units (SDUs) from one or more logical channels onto a transport block (TB) to be delivered to the PHY via the transport channel, demultiplexing of MAC SDUs from a transport block (TB) delivered from the PHY via the transport channel onto one or more logical channels, multiplexing of MAC SDUs onto TBs, scheduling information reporting, error correction via hybrid automatic repeat request (HARQ), and logical channel prioritization.
[0249] In at least one embodiment, the RLC layer 2206 can operate in multiple operating modes, including: Transparent Mode (TM), Unacknowledged Mode (UM), and Acknowledged Mode (AM). In at least one embodiment, the RLC layer 2206 can perform the transfer of upper-layer protocol data units (PDUs), error correction via Automatic Repeat Request (ARQ) for AM data transmission, and the concatenation, segmentation, and reassembly of RLC SDUs for UM and AM data transmission. In at least one embodiment, the RLC layer 2206 can also perform re-segmentation of RLC data PDUs for AM data transmission, reordering of RLC data PDUs for UM and AM data transmission, detection of duplicate data for UM and AM data transmission, discarding of RLC SDUs for UM and AM data transmission, detection of protocol errors for AM data transmission, and RLC re-establishment.
[0250] In at least one embodiment, the PDCP layer 2208 can perform header compression and decompression of IP data, maintain PDCP sequence numbers (SNs), perform in-sequence delivery of higher-layer PDUs at re-establishment of lower layers, eliminate duplicates of lower-layer SDUs at re-establishment of lower layers for radio bearers mapped on RLC AM, encrypt and decrypt control plane data, perform integrity protection and integrity verification of control plane data, perform timer-based discard of data, and perform security operations (e.g., encryption, decryption, integrity protection, integrity verification, etc.).
[0251] In at least one embodiment, the main services and functions of RRC layer 2210 may include broadcasting system information (e.g., included in a Master Information Block (MIB) or System Information Block (SIB)) associated with the Non-Access Stratum (NAS), broadcasting system information associated with the Access Stratum (AS), paging, establishment, maintenance, and release of RRC connections between the UE and the E-UTRAN (e.g., RRC connection paging, RRC connection establishment, RRC connection modification, and RRC connection release), establishment, configuration, maintenance, and release of point-to-point radio bearers, security functions including key management, inter-Radio Access Technology (RAT) mobility, and measurement configuration for UE measurement reporting. In at least one embodiment, the MIB and SIB may include one or more Information Elements (IEs), each of which may include a separate data field or data structure.
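The mode-dependent RLC functions described above can be summarized in a small table. The Python fragment below is a minimal sketch that records only those functions for which this paragraph names the applicable modes; functions listed without a mode (transfer of upper-layer PDUs and the final RLC function in the list) are deliberately left out. The names `RlcMode`, `RLC_FUNCTIONS`, and `functions_for` are hypothetical.

```python
from enum import Enum


class RlcMode(Enum):
    TM = "Transparent Mode"
    UM = "Unacknowledged Mode"
    AM = "Acknowledged Mode"


# RLC function -> modes in which the paragraph says it applies.
RLC_FUNCTIONS = {
    "ARQ error correction": {RlcMode.AM},
    "concatenation, segmentation and reassembly of RLC SDUs": {RlcMode.UM, RlcMode.AM},
    "re-segmentation of RLC data PDUs": {RlcMode.AM},
    "reordering of RLC data PDUs": {RlcMode.UM, RlcMode.AM},
    "duplicate detection": {RlcMode.UM, RlcMode.AM},
    "RLC SDU discard": {RlcMode.UM, RlcMode.AM},
    "protocol error detection": {RlcMode.AM},
}


def functions_for(mode: RlcMode) -> list[str]:
    """List the tabulated functions that apply in the given RLC mode."""
    return [name for name, modes in RLC_FUNCTIONS.items() if mode in modes]


for mode in RlcMode:
    print(mode.name, functions_for(mode))
```

In this table, AM carries every listed function, UM carries the subset that does not depend on acknowledgements, and TM carries none of the mode-specific functions.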
[0252] In at least one embodiment, UE 2002 and RAN 2016 may use a Uu interface (e.g., LTE-Uu interface) to exchange control plane data via a protocol stack including PHY layer 2202, MAC layer 2204, RLC layer 2206, PDCP layer 2208 and RRC layer 2210.
[0253] In at least one embodiment, a Non-Access Stratum (NAS) protocol (NAS protocol 2212) forms the highest layer of the control plane between UE 2002 and MME 2028. In at least one embodiment, NAS protocol 2212 supports the mobility and session management procedures of UE 2002 to establish and maintain an IP connection between UE 2002 and P-GW 2034.
[0254] In at least one embodiment, the S1 Application Protocol (S1-AP) layer (S1-AP layer 2222) can support the functionality of the S1 interface and include elementary procedures (EPs). In at least one embodiment, an EP is a unit of interaction between RAN2016 and CN2038. In at least one embodiment, the S1-AP layer services can include two groups: UE-associated services and non-UE-associated services. In at least one embodiment, these services perform functions including, but not limited to: E-UTRAN Radio Access Bearer (E-RAB) management, UE capability indication, mobility, NAS signaling transport, RAN Information Management (RIM), and configuration transfer.
[0255] In at least one embodiment, the Stream Control Transmission Protocol (SCTP) layer (or alternatively, the Stream Control Transmission Protocol / Internet Protocol (SCTP / IP) layer) (SCTP layer 2220) may ensure reliable delivery of signaling messages between RAN 2016 and MME 2028 based, in part, on the IP protocol supported by IP layer 2218. In at least one embodiment, L2 layer 2216 and L1 layer 2214 may refer to the communication links (e.g., wired or wireless) used by the RAN node and MME to exchange information.
[0256] In at least one embodiment, RAN2016 and one or more MME2028 may use the S1-MME interface to exchange control plane data via a protocol stack including L1 layer 2214, L2 layer 2216, IP layer 2218, SCTP layer 2220 and S1-AP layer 2222.
[0257] Figure 23 is an illustration of a user plane protocol stack according to at least one embodiment. In at least one embodiment, user plane 2300 is shown as a communication protocol stack between UE2002, RAN2016, S-GW2030, and P-GW2034. In at least one embodiment, user plane 2300 may utilize the same protocol layers as control plane 2200. In at least one embodiment, UE2002 and RAN 2016 may utilize a Uu interface (e.g., LTE-Uu interface) to exchange user plane data via a protocol stack including PHY layer 2202, MAC layer 2204, RLC layer 2206, and PDCP layer 2208.
[0258] In at least one embodiment, the General Packet Radio Service (GPRS) Tunneling Protocol for the user plane (GTP-U) layer (GTP-U layer 2304) can be used to carry user data within the GPRS core network and between the radio access network and the core network. In at least one embodiment, the transmitted user data can be data packets of any format, such as IPv4, IPv6, or PPP. In at least one embodiment, the UDP and IP Security (UDP / IP) layer (UDP / IP layer 2302) can provide checksums for data integrity, port numbers for addressing different functions at the source and destination, and encryption and authentication of selected data streams. In at least one embodiment, the RAN2016 and S-GW2030 can utilize the S1-U interface to exchange user plane data via a protocol stack including L1 layer 2214, L2 layer 2216, UDP / IP layer 2302, and GTP-U layer 2304. In at least one embodiment, the S-GW2030 and P-GW2034 can utilize the S5 / S8a interface to exchange user plane data via a protocol stack including L1 layer 2214, L2 layer 2216, UDP / IP layer 2302, and GTP-U layer 2304. In at least one embodiment, as discussed above with respect to Figure 22, the NAS protocol supports the mobility and session management procedures of UE 2002 to establish and maintain an IP connection between UE 2002 and P-GW 2034.
[0259] Figure 24 illustrates components 2400 of a core network according to at least one embodiment. In at least one embodiment, components of CN2038 may be implemented in one physical node or in separate physical nodes, including components for reading and executing instructions from a machine-readable or computer-readable medium (e.g., a non-transitory machine-readable storage medium). In at least one embodiment, network function virtualization (NFV) is used to virtualize any or all of the aforementioned network node functions via executable instructions stored in one or more computer-readable storage media (described in further detail below). In at least one embodiment, a logical instance of CN2038 may be referred to as network slice 2402 (e.g., network slice 2402 is shown as including HSS2032, MME2028, and S-GW2030). In at least one embodiment, a logical instance of a portion of CN2038 may be referred to as network subslice 2404 (e.g., network subslice 2404 is shown as including P-GW2034 and PCRF2036).
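The per-interface protocol stacks described for control plane 2200 and user plane 2300 can be written down as ordered lists, lowest layer first. The following Python fragment is an illustrative sketch only; the dictionary `STACKS` and the helper `shared_layers` are hypothetical names, and the layer lists are taken from the paragraphs describing the Uu, S1-MME, S1-U, and S5 / S8a interfaces.

```python
# (plane, interface) -> protocol layers, lowest layer first.
STACKS = {
    ("control", "Uu"): ["PHY", "MAC", "RLC", "PDCP", "RRC"],
    ("control", "S1-MME"): ["L1", "L2", "IP", "SCTP", "S1-AP"],
    ("user", "Uu"): ["PHY", "MAC", "RLC", "PDCP"],
    ("user", "S1-U"): ["L1", "L2", "UDP/IP", "GTP-U"],
    ("user", "S5/S8a"): ["L1", "L2", "UDP/IP", "GTP-U"],
}


def shared_layers(a: tuple[str, str], b: tuple[str, str]) -> list[str]:
    """Return the layers of stack `a` that also appear in stack `b`, keeping order."""
    other = set(STACKS[b])
    return [layer for layer in STACKS[a] if layer in other]


# Layers the user plane reuses from the control plane on the Uu interface.
print(shared_layers(("control", "Uu"), ("user", "Uu")))
# Layers common to the S1-MME control stack and the S1-U user stack.
print(shared_layers(("control", "S1-MME"), ("user", "S1-U")))
```

Laid out this way, it is easy to see that the user plane on the Uu interface reuses every control plane layer except RRC, and that the S1-U and S5 / S8a stacks are identical.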
[0260] In at least one embodiment, the NFV architecture and infrastructure can be used to virtualize one or more network functions onto physical resources comprising a combination of industry-standard server hardware, storage hardware, or switches, which may alternatively be performed by dedicated hardware. In at least one embodiment, the NFV system can be used to perform virtual or reconfigurable implementations of one or more EPC components / functions.
[0261] Figure 25 is a block diagram illustrating components of a system 2500 for supporting Network Functions Virtualization (NFV) according to at least one embodiment. In at least one embodiment, the system 2500 is shown to include a virtualization infrastructure manager (shown as VIM2502), a network functions virtualization infrastructure (shown as NFVI2504), a VNF manager (shown as VNFM2506), virtualized network functions (shown as VNF2508), an element manager (shown as EM2510), an NFV orchestrator (shown as NFVO2512), and a network manager (shown as NM2514).
[0262] In at least one embodiment, VIM2502 manages the resources of NFVI2504. In at least one embodiment, NFVI2504 may include physical or virtual resources and applications (including hypervisors) for executing system 2500. In at least one embodiment, VIM2502 may utilize NFVI2504 to manage the lifecycle of virtual resources (e.g., the creation, maintenance, and teardown of virtual machines (VMs) associated with one or more physical resources), track VM instances, track performance, fault and security of VM instances and associated physical resources, and expose VM instances and associated physical resources to other management systems.
[0263] In at least one embodiment, VNFM 2506 can manage VNF 2508. In at least one embodiment, VNF 2508 can be used to perform EPC components / functions. In at least one embodiment, VNFM 2506 can manage the lifecycle of VNF 2508 and track the performance, faults, and security of the virtual aspects of VNF 2508. In at least one embodiment, EM 2510 can track the performance, faults, and security of the functional aspects of VNF 2508. In at least one embodiment, the tracking data from VNFM 2506 and EM 2510 may include performance measurement (PM) data used by VIM 2502 or NFVI 2504. In at least one embodiment, both VNFM 2506 and EM 2510 can scale up / down the number of VNFs in system 2500.
[0264] In at least one embodiment, NFVO2512 can coordinate, authorize, release, and occupy resources of NFVI2504 to provide requested services (e.g., perform EPC functions, components, or slices). In at least one embodiment, NM2514 can provide an end-user function package responsible for managing a network that may include network elements with VNFs, non-virtualized network functions, or both (VNF management may occur via EM2510).
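The management relationships among the components of system 2500 can be sketched as a simple mapping from each managing component to the component it manages, tracks, or coordinates. This Python fragment is illustrative only; `OVERSEES` and `overseers_of` are hypothetical names, and the relationships are limited to those stated in the paragraphs above.

```python
# Managing component -> components it manages, tracks, or coordinates.
OVERSEES = {
    "VIM": ["NFVI"],   # manages the resources of the NFVI
    "VNFM": ["VNF"],   # manages VNF lifecycle, tracks virtual aspects
    "EM": ["VNF"],     # tracks functional aspects of the VNF
    "NFVO": ["NFVI"],  # coordinates NFVI resources for requested services
}


def overseers_of(component: str) -> list[str]:
    """Return the components that manage, track, or coordinate `component`."""
    return [mgr for mgr, targets in OVERSEES.items() if component in targets]


print(overseers_of("VNF"))   # components responsible for the VNF
print(overseers_of("NFVI"))  # components responsible for the NFVI
```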
[0265] Computer-based systems
[0266] The following figures present, but are not limited to, exemplary computer-based systems that can be used to implement at least one embodiment.
[0267] Figure 26 illustrates a processing system 2600 according to at least one embodiment. In at least one embodiment, the system 2600 includes one or more processors 2602 and one or more graphics processors 2608, and may be a single-processor desktop system, a multi-processor workstation system, or a server system having a large number of processors 2602 or processor cores 2607. In at least one embodiment, the processing system 2600 is a processing platform incorporated within a system-on-a-chip (SoC) integrated circuit for use in mobile, handheld, or embedded devices.
[0268] In at least one embodiment, the processing system 2600 may include or be integrated into a server-based gaming platform or a game console, including a game and media console, a mobile game console, a handheld game console, or an online game console. In at least one embodiment, the processing system 2600 is a mobile phone, smartphone, tablet computing device, or mobile internet device. In at least one embodiment, the processing system 2600 may also include, be coupled to, or be integrated into a wearable device, such as a smartwatch wearable device, smart glasses device, augmented reality device, or virtual reality device. In at least one embodiment, the processing system 2600 is a television or set-top box device having one or more processors 2602 and a graphical interface generated by one or more graphics processors 2608.
[0269] In at least one embodiment, each of the one or more processors 2602 includes one or more processor cores 2607 to process instructions that, when executed, perform operations for system and user software. In at least one embodiment, each of the one or more processor cores 2607 is configured to process a particular instruction set 2609. In at least one embodiment, the instruction set 2609 may facilitate Complex Instruction Set Computing (CISC), Reduced Instruction Set Computing (RISC), or computation via Very Long Instruction Word (VLIW). In at least one embodiment, the plurality of processor cores 2607 may each process a different instruction set 2609, which may include instructions that facilitate the emulation of other instruction sets. In at least one embodiment, the processor cores 2607 may also include other processing devices, such as digital signal processors (DSPs).
[0270] In at least one embodiment, processor 2602 includes cache memory 2604. In at least one embodiment, processor 2602 may have a single internal cache or multiple levels of internal caches. In at least one embodiment, the cache memory is shared among various components of processor 2602. In at least one embodiment, processor 2602 also uses an external cache (e.g., a Level 3 (L3) cache or a last-level cache (LLC)) (not shown), which can be shared among processor cores 2607 using known cache coherence techniques. In at least one embodiment, processor 2602 further includes a register file 2606, which may include different types of registers for storing different types of data (e.g., integer registers, floating-point registers, status registers, and instruction pointer registers). In at least one embodiment, register file 2606 may include general-purpose registers or other registers.
[0271] In at least one embodiment, one or more processors 2602 are coupled to one or more interface buses 2610 to transmit communication signals, such as address, data, or control signals, between the processors 2602 and other components in the system 2600. In at least one embodiment, the interface bus 2610 may be a processor bus, such as a version of the Direct Media Interface (DMI) bus. In at least one embodiment, the interface bus 2610 is not limited to the DMI bus and may include one or more peripheral component interconnect buses (e.g., PCI, PCI Express), memory buses, or other types of interface buses. In at least one embodiment, the processor 2602 includes an integrated memory controller 2616 and a platform controller hub 2630. In at least one embodiment, the memory controller 2616 facilitates communication between storage devices and other components of the processing system 2600, while the platform controller hub (PCH) 2630 provides connectivity to input / output (I / O) devices via a local I / O bus.
[0272] In at least one embodiment, storage device 2620 may be a dynamic random access memory (DRAM) device, a static random access memory (SRAM) device, a flash memory device, a phase-change memory device, or another device with suitable performance for use as processor memory. In at least one embodiment, storage device 2620 may be used as system memory of processing system 2600 to store data 2622 and instructions 2621 for use when one or more processors 2602 execute an application or process. In at least one embodiment, memory controller 2616 is also coupled to an optional external graphics processor 2612, which may communicate with one or more graphics processors 2608 of processor 2602 to perform graphics and media operations. In at least one embodiment, display device 2611 may be connected to processor 2602. In at least one embodiment, display device 2611 may include one or more internal display devices, such as those in mobile electronic devices or portable computer devices, or external display devices connected via a display interface (e.g., DisplayPort). In at least one embodiment, the display device 2611 may include a head-mounted display (HMD), such as a stereoscopic display device for virtual reality (VR) or augmented reality (AR) applications.
[0273] In at least one embodiment, the platform controller hub 2630 enables peripheral devices to connect to the storage device 2620 and the processor 2602 via a high-speed I / O bus. In at least one embodiment, the I / O peripheral devices include, but are not limited to, an audio controller 2646, a network controller 2634, a firmware interface 2628, a wireless transceiver 2626, a touch sensor 2625, and a data storage device 2624 (e.g., a hard disk drive, flash memory, etc.). In at least one embodiment, the data storage device 2624 may be connected via a memory interface (e.g., SATA) or via a peripheral bus, such as a peripheral component interconnect bus (e.g., PCI, PCIe). In at least one embodiment, the touch sensor 2625 may include a touchscreen sensor, a pressure sensor, or a fingerprint sensor. In at least one embodiment, the wireless transceiver 2626 may be a Wi-Fi transceiver, a Bluetooth transceiver, or a mobile network transceiver, such as a 3G, 4G, or LTE transceiver. In at least one embodiment, the firmware interface 2628 enables communication with the system firmware, and in at least one embodiment, may be a Unified Extensible Firmware Interface (UEFI). In at least one embodiment, network controller 2634 may enable network connectivity to a wired network. In at least one embodiment, a high-performance network controller (not shown) is coupled to interface bus 2610. In at least one embodiment, audio controller 2646 is a multi-channel high-definition audio controller. In at least one embodiment, processing system 2600 includes an optional legacy I / O controller 2640 for coupling legacy (e.g., Personal System 2 (PS / 2)) devices to processing system 2600. In at least one embodiment, platform controller hub 2630 may also be connected to one or more Universal Serial Bus (USB) controllers 2642 that connect input devices, such as a keyboard and mouse combination 2643, a camera 2644, or other USB input devices.
[0274] In at least one embodiment, instances of the memory controller 2616 and platform controller hub 2630 may be integrated into a discrete external graphics processor, such as external graphics processor 2612. In at least one embodiment, the platform controller hub 2630 and / or the memory controller 2616 may be external to one or more processors 2602. For example, in at least one embodiment, the processing system 2600 may include an external memory controller 2616 and a platform controller hub 2630, which may be configured as a memory controller hub and a peripheral controller hub in a system chipset communicating with the processor 2602.
[0275] Figure 27 illustrates a computer system 2700 according to at least one embodiment. In at least one embodiment, the computer system 2700 may be a system having interconnected devices and components, a System-on-a-Chip (SoC), or some combination thereof. In at least one embodiment, the computer system 2700 is formed with a processor 2702, which may include execution units for executing instructions. In at least one embodiment, the computer system 2700 may include, but is not limited to, components such as the processor 2702, which employs execution units including logic to execute algorithms for processing data. In at least one embodiment, the computer system 2700 may include a processor, such as a member of a processor family or a Xeon™, XScale™ and / or StrongARM™, Core™, or Nervana™ microprocessor available from Intel Corporation of Santa Clara, California, although other systems (including PCs, engineering workstations, set-top boxes, etc.) with other microprocessors may also be used. In at least one embodiment, computer system 2700 may execute a version of the Windows operating system available from Microsoft Corporation of Redmond, Washington, although other operating systems (UNIX and Linux in at least one embodiment), embedded software, and / or graphical user interfaces may also be used.
[0276] In at least one embodiment, the computer system 2700 can be used in other devices, such as handheld devices and embedded applications. Some of the handheld devices in at least one embodiment include cellular phones, Internet Protocol (IP) devices, digital cameras, personal digital assistants (“PDAs”), and handheld PCs. In at least one embodiment, the embedded application can include a microcontroller, a digital signal processor (“DSP”), a SoC, a network computer (“NetPC”), a set-top box, a network hub, a wide area network (“WAN”) switch, or any other system that can execute one or more instructions according to at least one embodiment.
[0277] In at least one embodiment, computer system 2700 may include, but is not limited to, processor 2702, which may include, but is not limited to, one or more execution units 2708 configured to execute a Compute Unified Device Architecture (“CUDA”) program (CUDA is developed by NVIDIA Corporation of Santa Clara, California). In at least one embodiment, a CUDA program is at least a portion of a software application written in the CUDA programming language. In at least one embodiment, the computer system 2700 is a single-processor desktop or server system. In at least one embodiment, the computer system 2700 may be a multiprocessor system. In at least one embodiment, the processor 2702 may include, but is not limited to, a CISC microprocessor, a RISC microprocessor, a VLIW microprocessor, a processor implementing a combination of instruction sets, or any other processor device, such as a digital signal processor. In at least one embodiment, the processor 2702 may be coupled to a processor bus 2710, which can transmit data signals between the processor 2702 and other components in the computer system 2700.
[0278] In at least one embodiment, processor 2702 may include, but is not limited to, a Level 1 (“L1”) internal cache memory (“cache”) 2704. In at least one embodiment, processor 2702 may have a single internal cache or multiple levels of internal cache. In at least one embodiment, the cache memory may reside external to processor 2702. In at least one embodiment, processor 2702 may include a combination of internal and external caches. In at least one embodiment, register file 2706 may store different types of data in various registers, including but not limited to integer registers, floating-point registers, status registers, and instruction pointer registers.
[0279] In at least one embodiment, an execution unit 2708, including but not limited to logic for performing integer and floating-point operations, is also located within processor 2702. Processor 2702 may also include a microcode (“ucode”) read-only memory (“ROM”) for storing microcode for certain macro instructions. In at least one embodiment, execution unit 2708 may include logic for processing a packed instruction set 2709. In at least one embodiment, by including the packed instruction set 2709 in the instruction set of general-purpose processor 2702, along with associated circuitry for executing the instructions, operations used by many multimedia applications can be performed using packed data in general-purpose processor 2702. In at least one embodiment, many multimedia applications can be executed more quickly and efficiently by using the full width of the processor’s data bus to perform operations on the packed data, which may eliminate the need to transfer smaller data units across the processor’s data bus to perform one or more operations on one data element at a time.
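The benefit of operating on packed data can be illustrated in software. The following is a minimal sketch only, not the packed instruction set 2709 itself: it assumes four unsigned 8-bit elements packed into one 32-bit word and emulates a lane-wise wrapping add with ordinary integer arithmetic. The function names `packed_add_u8` and `scalar_add_u8` are hypothetical and chosen for illustration.

```python
LOW7 = 0x7F7F7F7F   # low seven bits of each 8-bit lane
HIGH = 0x80808080   # top bit of each 8-bit lane


def packed_add_u8(a: int, b: int) -> int:
    """Add four 8-bit lanes of two 32-bit words at once, wrapping per lane."""
    # Adding only the low 7 bits of each lane cannot carry into the next lane.
    low = (a & LOW7) + (b & LOW7)
    # The top bit of each lane is combined by XOR, which is addition
    # with the carry-out discarded.
    return (low ^ ((a ^ b) & HIGH)) & 0xFFFFFFFF


def scalar_add_u8(a: int, b: int) -> int:
    """Reference version: process one 8-bit element at a time."""
    out = 0
    for lane in range(4):
        shift = 8 * lane
        total = ((a >> shift) & 0xFF) + ((b >> shift) & 0xFF)
        out |= (total & 0xFF) << shift
    return out


# Lanes (high to low): 0x01+0x01, 0xFF+0x01 (wraps), 0x7F+0x01, 0x10+0x30
result = packed_add_u8(0x01FF7F10, 0x01010130)
```

The packed version performs one pass over the full word, whereas the scalar reference loops over each element, which mirrors the "one data element at a time" case described above.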
[0280] In at least one embodiment, the execution unit 2708 may also be used in a microcontroller, embedded processor, graphics device, DSP, and other types of logic circuitry. In at least one embodiment, the computer system 2700 may include, but is not limited to, the memory 2720. In at least one embodiment, the memory 2720 may be implemented as a DRAM device, an SRAM device, a flash memory device, or other storage device. The memory 2720 may store instructions 2719 and / or data 2721 represented by data signals that can be executed by the processor 2702.
[0281] In at least one embodiment, the system logic chip may be coupled to processor bus 2710 and memory 2720. In at least one embodiment, the system logic chip may include, but is not limited to, a memory controller hub (“MCH”) 2716, and processor 2702 may communicate with MCH 2716 via processor bus 2710. In at least one embodiment, MCH 2716 may provide a high-bandwidth memory path 2718 to memory 2720 for instruction and data storage, as well as for storage of graphics commands, data, and textures. In at least one embodiment, MCH 2716 may initiate data signals between processor 2702, memory 2720, and other components in computer system 2700, and bridge data signals between processor bus 2710, memory 2720, and system I / O 2722. In at least one embodiment, the system logic chip may provide a graphics port for coupling to a graphics controller. In at least one embodiment, MCH 2716 may be coupled to memory 2720 via high-bandwidth memory path 2718, and graphics / video card 2712 may be coupled to MCH 2716 via Accelerated Graphics Port (“AGP”) interconnect 2714.
[0282] In at least one embodiment, computer system 2700 may use system I / O 2722 as a proprietary hub interface bus to couple MCH 2716 to I / O controller hub (“ICH”) 2730. In at least one embodiment, ICH 2730 may provide direct connectivity to certain I / O devices via a local I / O bus. In at least one embodiment, the local I / O bus may include, but is not limited to, a high-speed I / O bus for connecting peripheral devices to memory 2720, chipset, and processor 2702. Examples may include, but are not limited to, an audio controller 2729, a firmware hub (“Flash BIOS”) 2728, a wireless transceiver 2726, data storage 2724, a conventional I / O controller 2723 including user input 2725 and a keyboard interface, a serial expansion port 2777 (e.g., USB), and a network controller 2734. Data storage 2724 may include a hard disk drive, floppy disk drive, CD-ROM device, flash memory device, or other mass storage device.
[0283] In at least one embodiment, Figure 27 illustrates a system comprising interconnected hardware devices or "chips". In at least one embodiment, Figure 27 may illustrate an exemplary SoC. In at least one embodiment, the devices shown in Figure 27 may be interconnected with proprietary interconnects, standardized interconnects (e.g., PCIe), or some combination thereof. In at least one embodiment, one or more components of system 2700 are interconnected using Compute Express Link (CXL) interconnects.
[0284] Figure 28 illustrates a system 2800 according to at least one embodiment. In at least one embodiment, system 2800 is an electronic device utilizing processor 2810. In at least one embodiment, system 2800 may be, for example, but not limited to, a laptop computer, tower server, rack server, blade server, desktop computer, tablet computer, mobile device, telephone, embedded computer, or any other suitable electronic device.
[0285] In at least one embodiment, system 2800 may include, but is not limited to, processor 2810 communicatively coupled to any suitable number or type of components, peripherals, modules, or devices. In at least one embodiment, processor 2810 is coupled using a bus or interface, such as an I²C bus, a System Management Bus (“SMBus”), a Low Pin Count (“LPC”) bus, a Serial Peripheral Interface (“SPI”), a High Definition Audio (“HDA”) bus, a Serial Advanced Technology Attachment (“SATA”) bus, a USB (versions 1, 2, and 3), or a Universal Asynchronous Receiver / Transmitter (“UART”) bus. In at least one embodiment, Figure 28 illustrates a system comprising interconnected hardware devices or "chips". In at least one embodiment, Figure 28 may illustrate an exemplary SoC. In at least one embodiment, the devices shown in Figure 28 may be interconnected with proprietary interconnects, standardized interconnects (e.g., PCIe), or some combination thereof. In at least one embodiment, one or more components of Figure 28 are interconnected using Compute Express Link (CXL) interconnects.
[0286] In at least one embodiment, Figure 28 may include a display 2824, a touchscreen 2825, a touchpad 2830, a near-field communication unit (“NFC”) 2845, a sensor hub 2840, a thermal sensor 2846, an Express Chipset (“EC”) 2835, a trusted platform module (“TPM”) 2838, a BIOS / firmware / flash memory (“BIOS, FW Flash”) 2822, a DSP 2860, a solid-state drive (“SSD”) or hard disk drive (“HDD”) 2820, a wireless local area network unit (“WLAN”) 2850, a Bluetooth unit 2852, a wireless wide area network unit (“WWAN”) 2856, a global positioning system (GPS) 2855, a camera (“USB 3.0 camera”) 2854 (e.g., a USB 3.0 camera), or a low-power double data rate (“LPDDR”) memory unit (“LPDDR3”) 2815 implemented, in at least one embodiment, according to the LPDDR3 standard. These components may each be implemented in any suitable manner.
[0287] In at least one embodiment, other components may be communicatively coupled to processor 2810 via the components discussed above. In at least one embodiment, accelerometer 2841, ambient light sensor (“ALS”) 2842, compass 2843, and gyroscope 2844 may be communicatively coupled to sensor hub 2840. In at least one embodiment, thermal sensor 2839, fan 2837, keyboard 2846, and touchpad 2830 may be communicatively coupled to EC 2835. In at least one embodiment, speaker 2863, earphone 2864, and microphone (“mic”) 2865 may be communicatively coupled to audio unit (“audio codec and Class D amplifier”) 2864, which in turn may be communicatively coupled to DSP 2860. In at least one embodiment, audio unit 2864 may include, but is not limited to, audio encoder / decoder (“codec”) and Class D amplifier. In at least one embodiment, SIM card (“SIM”) 2857 may be communicatively coupled to WWAN unit 2856. In at least one embodiment, components such as WLAN unit 2850, Bluetooth unit 2852, and WWAN unit 2856 can be implemented as next-generation form factor (NGFF).
[0288] Figure 29 illustrates an exemplary integrated circuit 2900 according to at least one embodiment. In at least one embodiment, the exemplary integrated circuit 2900 is a SoC (System-on-a-Chip) that can be manufactured using one or more IP cores. In at least one embodiment, the integrated circuit 2900 includes one or more application processors 2905 (e.g., CPUs), at least one graphics processor 2910, and may additionally include an image processor 2915 and / or a video processor 2920, any of which may be a modular IP core. In at least one embodiment, the integrated circuit 2900 includes peripheral or bus logic, which includes a USB controller 2925, a UART controller 2930, an SPI / SDIO controller 2935, and an I²S / I²C controller 2940. In at least one embodiment, integrated circuit 2900 may include display device 2945 coupled to one or more of a high-definition multimedia interface (HDMI) controller 2950 and a Mobile Industry Processor Interface (MIPI) display interface 2955. In at least one embodiment, storage may be provided by flash memory subsystem 2960, which includes flash memory and a flash memory controller. In at least one embodiment, a memory interface may be provided via memory controller 2965 for accessing SDRAM or SRAM memory devices. In at least one embodiment, some integrated circuits also include an embedded security engine 2970.
[0289] Figure 30 illustrates a computing system 3000 according to at least one embodiment. In at least one embodiment, the computing system 3000 includes a processing subsystem 3001 having one or more processors 3002 and a system memory 3004 communicating via an interconnect path that may include a memory hub 3005. In at least one embodiment, the memory hub 3005 may be a separate component within a chipset assembly or may be integrated within one or more processors 3002. In at least one embodiment, the memory hub 3005 is coupled to an I / O subsystem 3011 via a communication link 3006. In at least one embodiment, the I / O subsystem 3011 includes an I / O hub 3007 that enables the computing system 3000 to receive input from one or more input devices 3008. In at least one embodiment, the I / O hub 3007 may enable a display controller, which may be included in one or more processors 3002, to provide output to one or more display devices 3010A. In at least one embodiment, one or more display devices 3010A coupled to the I / O hub 3007 may include local, internal, or embedded display devices.
[0290] In at least one embodiment, the processing subsystem 3001 includes one or more parallel processors 3012 coupled to a memory hub 3005 via a bus or other communication link 3013. In at least one embodiment, the communication link 3013 may be one of many standards-based communication link technologies or protocols, such as, but not limited to, PCIe, or may be a vendor-specific communication interface or communication fabric. In at least one embodiment, the one or more parallel processors 3012 form a compute-intensive parallel or vector processing system that may include a large number of processing cores and / or processing clusters, such as a many integrated core (MIC) processor. In at least one embodiment, the one or more parallel processors 3012 form a graphics processing subsystem capable of outputting pixels to one or more display devices 3010A coupled via an I / O hub 3007. In at least one embodiment, the one or more parallel processors 3012 may also include a display controller and a display interface (not shown) to enable direct connection to one or more display devices 3010B.
[0291] In at least one embodiment, system storage unit 3014 may be connected to I / O hub 3007 to provide a storage mechanism for computing system 3000. In at least one embodiment, I / O switch 3016 may be used to provide an interface mechanism to enable connectivity between I / O hub 3007 and other components, such as network adapter 3018 and / or wireless network adapter 3019 which may be integrated into the platform, and various other devices that can be added via one or more additional devices 3020. In at least one embodiment, network adapter 3018 may be an Ethernet adapter or another wired network adapter. In at least one embodiment, wireless network adapter 3019 may include one or more Wi-Fi, Bluetooth, NFC, or other network devices comprising one or more radios.
[0292] In at least one embodiment, the computing system 3000 may include other components not explicitly shown, including USB or other port connections, optical storage drives, video capture devices and / or variations thereof, which may also be connected to the I / O hub 3007. In at least one embodiment, the communication paths that interconnect the various components in Figure 30 can be implemented using any suitable protocol, such as PCI (Peripheral Component Interconnect) based protocols (e.g., PCIe), or other bus or point-to-point communication interfaces and / or protocols (e.g., NVLink high-speed interconnect or interconnect protocols).
[0293] In at least one embodiment, one or more parallel processors 3012 include circuitry optimized for graphics and video processing (including video output circuitry in at least one embodiment) and constitute a graphics processing unit (GPU). In at least one embodiment, one or more parallel processors 3012 include circuitry optimized for general-purpose processing. In at least one embodiment, components of the computing system 3000 may be integrated with one or more other system elements on a single integrated circuit. In at least one embodiment, one or more parallel processors 3012, memory hub 3005, processor 3002, and I / O hub 3007 may be integrated into a system-on-a-chip (SoC) integrated circuit. In at least one embodiment, components of the computing system 3000 may be integrated into a single package to form a system-in-package (SIP) configuration. In at least one embodiment, at least a portion of the components of the computing system 3000 may be integrated into a multi-chip module (MCM) that can be interconnected with other multi-chip modules into a modular computing system. In at least one embodiment, the I / O subsystem 3011 and display device 3010B are omitted from the computing system 3000.
[0294] Processing system
[0295] The following figures illustrate, but are not limited to, exemplary processing systems that can be used to implement at least one embodiment.
[0296] Figure 31 illustrates an accelerated processing unit (“APU”) 3100 according to at least one embodiment. In at least one embodiment, the APU 3100 was developed by AMD Inc. of Santa Clara, California. In at least one embodiment, the APU 3100 can be configured to execute applications, such as CUDA programs. In at least one embodiment, the APU 3100 includes, but is not limited to, a core complex 3110, a graphics complex 3140, a fabric 3160, an I / O interface 3170, a memory controller 3180, a display controller 3192, and a multimedia engine 3194. In at least one embodiment, the APU 3100 can include, but is not limited to, any combination of any number of core complexes 3110, any number of graphics complexes 3140, any number of display controllers 3192, and any number of multimedia engines 3194. For illustrative purposes, multiple instances of similar objects are indicated herein by reference numerals, wherein the reference numeral identifies the object, and a number in parentheses identifies the instance where needed.
[0297] In at least one embodiment, the core complex 3110 is a CPU, the graphics complex 3140 is a GPU, and the APU 3100 is a processing unit that integrates, without limitation, the core complex 3110 and the graphics complex 3140 onto a single chip. In at least one embodiment, some tasks may be assigned to the core complex 3110, while other tasks may be assigned to the graphics complex 3140. In at least one embodiment, the core complex 3110 is configured to execute main control software associated with the APU 3100, such as an operating system. In at least one embodiment, the core complex 3110 is the main processor of the APU 3100, which controls and coordinates the operation of other processors. In at least one embodiment, the core complex 3110 issues commands to control the operation of the graphics complex 3140. In at least one embodiment, the core complex 3110 may be configured to execute host executable code derived from CUDA source code, and the graphics complex 3140 may be configured to execute device executable code derived from CUDA source code.
[0298] In at least one embodiment, the core complex 3110 includes, but is not limited to, cores 3120(1)-3120(4) and L3 cache 3130. In at least one embodiment, the core complex 3110 may include, but is not limited to, any combination of any number of cores 3120 and any number and type of cache. In at least one embodiment, the cores 3120 are configured to execute instructions of a specific instruction set architecture (“ISA”). In at least one embodiment, each core 3120 is a CPU core.
[0299] In at least one embodiment, each core 3120 includes, but is not limited to, a fetch / decode unit 3122, an integer execution engine 3124, a floating-point execution engine 3126, and an L2 cache 3128. In at least one embodiment, the fetch / decode unit 3122 fetches instructions, decodes these instructions, generates micro-operations, and dispatches individual micro-instructions to the integer execution engine 3124 and the floating-point execution engine 3126. In at least one embodiment, the fetch / decode unit 3122 may simultaneously dispatch one micro-instruction to the integer execution engine 3124 and another micro-instruction to the floating-point execution engine 3126. In at least one embodiment, the integer execution engine 3124 performs, without limitation, integer and memory operations. In at least one embodiment, the floating-point execution engine 3126 performs, without limitation, floating-point and vector operations. In at least one embodiment, the fetch / decode unit 3122 dispatches micro-instructions to a single execution engine that replaces both the integer execution engine 3124 and the floating-point execution engine 3126.
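The simultaneous dispatch described above can be sketched as a simple in-order pairing rule. This is an illustrative model under stated assumptions, not the dispatch logic of fetch / decode unit 3122: it assumes each micro-instruction is already tagged as `"int"` (integer or memory) or `"fp"` (floating-point or vector), and that at most one micro-instruction per engine is issued per cycle. The function name `dispatch` and the tuple layout are hypothetical.

```python
def dispatch(micro_ops):
    """Group an in-order stream of (kind, name) micro-instructions into cycles.

    kind is "int" for the integer engine or "fp" for the floating-point engine.
    Two adjacent micro-instructions share a cycle only if they target
    different engines.
    """
    cycles = []
    i = 0
    while i < len(micro_ops):
        kind, name = micro_ops[i]
        cycle = {kind: name}
        i += 1
        # Pair with the next micro-instruction if it uses the other engine.
        if i < len(micro_ops) and micro_ops[i][0] != kind:
            next_kind, next_name = micro_ops[i]
            cycle[next_kind] = next_name
            i += 1
        cycles.append(cycle)
    return cycles


stream = [("int", "add"), ("fp", "fmul"), ("int", "load"),
          ("int", "sub"), ("fp", "fadd")]
schedule = dispatch(stream)
# schedule has three cycles: add+fmul, load alone, sub+fadd
```

A real front end would also track operand dependencies and engine availability; the sketch only shows how two engines allow two micro-instructions to be issued in the same cycle.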
[0300] In at least one embodiment, each core 3120(i) can access an L2 cache 3128(i) included in core 3120(i), where i is an integer representing a specific instance of core 3120. In at least one embodiment, each core 3120 included in core complex 3110(j) is connected to other cores 3120 included in core complex 3110(j) via an L3 cache 3130(j) included in core complex 3110(j), where j is an integer representing a specific instance of core complex 3110. In at least one embodiment, a core 3120 included in core complex 3110(j) can access all L3 caches 3130(j) included in core complex 3110(j), where j is an integer representing a specific instance of core complex 3110. In at least one embodiment, the L3 cache 3130 may include, but is not limited to, any number of slices.
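The indexing scheme above, with a private L2 cache per core (i) and one L3 cache shared within a core complex (j), can be sketched as a small data structure. This is a minimal model under assumptions: the fill policy (a miss fills both L3 and L2) and the names `Core`, `CoreComplex`, `build_complex`, and `load` are hypothetical and are not taken from the description.

```python
from dataclasses import dataclass, field


@dataclass
class Core:
    core_id: int                                # index i within its complex
    l2: dict = field(default_factory=dict)      # private L2 cache of core (i)


@dataclass
class CoreComplex:
    complex_id: int                             # index j
    cores: list = field(default_factory=list)
    l3: dict = field(default_factory=dict)      # L3 cache (j), shared by its cores


def build_complex(complex_id: int, num_cores: int = 4) -> CoreComplex:
    """Create a core complex with cores numbered 1..num_cores."""
    cx = CoreComplex(complex_id)
    cx.cores = [Core(i) for i in range(1, num_cores + 1)]
    return cx


def load(cx: CoreComplex, core: Core, addr: int) -> str:
    """Return which level served the access: "L2", "L3", or "memory"."""
    if addr in core.l2:
        return "L2"
    if addr in cx.l3:
        core.l2[addr] = cx.l3[addr]             # assumed policy: fill private L2
        return "L3"
    value = f"data@{addr:#x}"                   # placeholder for a memory fetch
    cx.l3[addr] = value                         # assumed policy: fill shared L3
    core.l2[addr] = value
    return "memory"


cx = build_complex(1)
first = load(cx, cx.cores[0], 0x40)    # miss everywhere
second = load(cx, cx.cores[1], 0x40)   # another core hits the shared L3
third = load(cx, cx.cores[1], 0x40)    # same core now hits its own L2
```

The second access shows the role of the shared L3 cache as the connection point between cores of the same complex.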
[0301] In at least one embodiment, the graphics complex 3140 can be configured to perform computational operations in a highly parallel manner. In at least one embodiment, the graphics complex 3140 is configured to perform graphics pipeline operations, such as drawing commands, pixel operations, geometric calculations, and other operations associated with rendering an image to a display. In at least one embodiment, the graphics complex 3140 is configured to perform graphics-independent operations. In at least one embodiment, the graphics complex 3140 is configured to perform both graphics-related and graphics-independent operations.
[0302] In at least one embodiment, the graphics complex 3140 includes, but is not limited to, any number of computing units 3150 and an L2 cache 3142. In at least one embodiment, the computing units 3150 share the L2 cache 3142. In at least one embodiment, the L2 cache 3142 is partitioned. In at least one embodiment, the graphics complex 3140 includes, but is not limited to, any number of computing units 3150 and any number (including zero) and type of caches. In at least one embodiment, the graphics complex 3140 includes, but is not limited to, any amount of dedicated graphics hardware.
[0303] In at least one embodiment, each computing unit 3150 includes, but is not limited to, any number of SIMD units 3152 and shared memory 3154. In at least one embodiment, each SIMD unit 3152 implements a SIMD architecture and is configured to execute operations in parallel. In at least one embodiment, each computing unit 3150 may execute any number of thread blocks, but each thread block executes on a single computing unit 3150. In at least one embodiment, a thread block includes, but is not limited to, any number of execution threads. In at least one embodiment, a workgroup is a thread block. In at least one embodiment, each SIMD unit 3152 executes a different warp. In at least one embodiment, a warp is a group of threads (e.g., 16 threads), where each thread in the warp belongs to a single thread block and is configured to process a different set of data based on a single set of instructions. In at least one embodiment, predication can be used to disable one or more threads in a warp. In at least one embodiment, a lane is a thread. In at least one embodiment, a work item is a thread. In at least one embodiment, a wavefront is a warp. In at least one embodiment, different wavefronts in a thread block can be synchronized together and communicate via shared memory 3154.
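The relationship between thread blocks, warps, and per-thread disabling can be sketched in plain Python. This is a simplified model, not GPU code: it assumes the example warp width of 16 threads given above, and the names `split_into_warps` and `execute_warp` are hypothetical.

```python
WARP_SIZE = 16  # example warp width taken from the text


def split_into_warps(thread_ids, warp_size=WARP_SIZE):
    """Partition the threads of one thread block into consecutive warps."""
    return [thread_ids[i:i + warp_size]
            for i in range(0, len(thread_ids), warp_size)]


def execute_warp(op, values, active):
    """Apply one instruction to every lane of a warp.

    Lanes whose flag in `active` is False are disabled: they keep their
    previous value, while the remaining lanes all apply the same operation
    to their own data.
    """
    return [op(v) if on else v for v, on in zip(values, active)]


# A thread block of 40 threads forms warps of 16, 16, and 8 threads.
block = list(range(40))
warps = split_into_warps(block)

# One instruction (doubling) applied to the first warp, with odd lanes disabled.
first_warp = warps[0]
mask = [tid % 2 == 0 for tid in first_warp]
doubled = execute_warp(lambda x: x * 2, first_warp, mask)
```

On hardware the lanes of a warp run in lockstep on a SIMD unit; the list comprehension here only models the outcome, namely the same operation on different data with some lanes masked off.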
[0304] In at least one embodiment, fabric 3160 is a system interconnect that facilitates data and control transfers across core complex 3110, graphics complex 3140, I / O interface 3170, memory controller 3180, display controller 3192, and multimedia engine 3194. In at least one embodiment, in addition to or instead of fabric 3160, APU 3100 may also include, but is not limited to, any number and type of system interconnects that facilitate data and control transfers across any number and type of components that may be directly or indirectly linked, either internally or externally to APU 3100. In at least one embodiment, I / O interface 3170 represents any number and type of I / O interfaces (e.g., PCI, PCI-Extended (“PCI-X”), PCIe, Gigabit Ethernet (“GBE”), USB, etc.). In at least one embodiment, various types of peripheral devices are coupled to I / O interface 3170. In at least one embodiment, the peripheral devices coupled to the I / O interface 3170 may include, but are not limited to, keyboards, mice, printers, scanners, joysticks or other types of game controllers, media recording devices, external storage devices, network interface cards, etc.
[0305] In at least one embodiment, the display controller 3192 displays images on one or more display devices (e.g., liquid crystal display (LCD) devices). In at least one embodiment, the multimedia engine 3194 includes, but is not limited to, any number and type of multimedia-related circuitry, such as video decoders, video encoders, image signal processors, etc. In at least one embodiment, the memory controller 3180 facilitates data transfer between the APU 3100 and the unified system memory 3190. In at least one embodiment, the core complex 3110 and the graphics complex 3140 share the unified system memory 3190.
[0306] In at least one embodiment, the APU 3100 implements a memory subsystem, including but not limited to any number and type of memory controllers 3180 and memory devices (e.g., shared memory 3154) that can be dedicated to a single component or shared among multiple components. In at least one embodiment, the APU 3100 implements a cache subsystem, including but not limited to one or more cache memories (e.g., L2 cache 3128, L3 cache 3130, and L2 cache 3142), each of which may be private to a component or shared among any number of components (e.g., core 3120, core complex 3110, SIMD unit 3152, computing unit 3150, and graphics complex 3140).
[0307] Figure 32 illustrates a CPU 3200 according to at least one embodiment. In at least one embodiment, the CPU 3200 was developed by AMD Inc. of Santa Clara, California. In at least one embodiment, the CPU 3200 can be configured to execute an application. In at least one embodiment, the CPU 3200 is configured to execute host control software, such as an operating system. In at least one embodiment, the CPU 3200 issues commands to control the operation of an external GPU (not shown). In at least one embodiment, the CPU 3200 can be configured to execute host executable code derived from CUDA source code, and the external GPU can be configured to execute device executable code derived from such CUDA source code. In at least one embodiment, the CPU 3200 includes, but is not limited to, any number of core complexes 3210, a fabric 3260, an I / O interface 3270, and a memory controller 3280.
[0308] In at least one embodiment, the core complex 3210 includes, but is not limited to, cores 3220(1)-3220(4) and L3 cache 3230. In at least one embodiment, the core complex 3210 may include, but is not limited to, any combination of any number of cores 3220 and any number and type of cache. In at least one embodiment, the cores 3220 are configured to execute instructions of a specific ISA. In at least one embodiment, each core 3220 is a CPU core.
[0309] In at least one embodiment, each core 3220 includes, but is not limited to, a fetch / decode unit 3222, an integer execution engine 3224, a floating-point execution engine 3226, and an L2 cache 3228. In at least one embodiment, the fetch / decode unit 3222 fetches instructions, decodes these instructions, generates micro-operations, and dispatches individual micro-instructions to the integer execution engine 3224 and the floating-point execution engine 3226. In at least one embodiment, the fetch / decode unit 3222 may simultaneously dispatch one micro-instruction to the integer execution engine 3224 and another micro-instruction to the floating-point execution engine 3226. In at least one embodiment, the integer execution engine 3224 performs, without limitation, integer and memory operations. In at least one embodiment, the floating-point execution engine 3226 performs, without limitation, floating-point and vector operations. In at least one embodiment, the fetch / decode unit 3222 dispatches micro-instructions to a single execution engine that replaces both the integer execution engine 3224 and the floating-point execution engine 3226.
[0310] In at least one embodiment, each core 3220(i) can access an L2 cache 3228(i) included in core 3220(i), where i is an integer representing a specific instance of core 3220. In at least one embodiment, each core 3220 included in core complex 3210(j) is connected to other cores 3220 in core complex 3210(j) via an L3 cache 3230(j) included in core complex 3210(j), where j is an integer representing a specific instance of core complex 3210. In at least one embodiment, a core 3220 included in core complex 3210(j) can access all L3 caches 3230(j) included in core complex 3210(j), where j is an integer representing a specific instance of core complex 3210. In at least one embodiment, the L3 cache 3230 may include, but is not limited to, any number of slices.
[0311] In at least one embodiment, fabric 3260 is a system interconnect that facilitates data and control transfers across core complexes 3210(1)-3210(N) (where N is a positive integer), I / O interface 3270, and memory controller 3280. In at least one embodiment, in addition to or instead of fabric 3260, CPU 3200 may also include, but is not limited to, any number and type of system interconnects that facilitate data and control transfers across any number and type of components that may be directly or indirectly linked, either inside or outside CPU 3200. In at least one embodiment, I / O interface 3270 represents any number and type of I / O interfaces (e.g., PCI, PCI-X, PCIe, GBE, USB, etc.). In at least one embodiment, various types of peripheral devices are coupled to I / O interface 3270. In at least one embodiment, peripheral devices coupled to I / O interface 3270 may include, but are not limited to, displays, keyboards, mice, printers, scanners, joysticks or other types of game controllers, media recording devices, external storage devices, network interface cards, etc.
[0312] In at least one embodiment, memory controller 3280 facilitates data transfer between CPU 3200 and system memory 3290. In at least one embodiment, core complex 3210 and graphics complex 3240 share system memory 3290. In at least one embodiment, CPU 3200 implements a memory subsystem, which includes, but is not limited to, any number and type of memory controllers 3280 and memory devices that may be dedicated to a component or shared among multiple components. In at least one embodiment, CPU 3200 implements a cache subsystem, which includes, but is not limited to, one or more cache memories (e.g., L2 cache 3228 and L3 cache 3230), each of which may be private to one component or shared among any number of components (e.g., core 3220 and core complex 3210).
[0313] Figure 33 illustrates an exemplary accelerator integration slice 3390 according to at least one embodiment. As used herein, a "slice" includes a designated portion of the processing resources of an accelerator integration circuit. In at least one embodiment, the accelerator integration circuit provides cache management, memory access, context management, and interrupt management services for multiple graphics processing engines among multiple graphics acceleration modules. Each graphics processing engine may comprise a separate GPU. Alternatively, the graphics processing engines may include different types of graphics processing engines within a GPU, such as graphics execution units, media processing engines (e.g., video encoders / decoders), samplers, and blit engines. In at least one embodiment, a graphics acceleration module may be a GPU having multiple graphics processing engines. In at least one embodiment, the graphics processing engines may be individual GPUs integrated on a common package, line card, or chip.
[0314] The application's effective address space 3382 within system memory 3314 stores process element 3383. In one embodiment, process element 3383 is stored in response to a GPU call 3381 from an application 3380 executing on processor 3307. Process element 3383 contains the processing state of the corresponding application 3380. A work descriptor (WD) 3384 contained in process element 3383 may be a single job requested by the application or may contain a pointer to a queue of jobs. In at least one embodiment, WD 3384 is a pointer to a job request queue in the application's effective address space 3382.
[0315] The graphics acceleration module 3346 and / or individual graphics processing engines may be shared by all or some processes in the system. In at least one embodiment, infrastructure may be included for establishing a processing state and sending WD 3384 to the graphics acceleration module 3346 to begin operation in a virtualized environment.
[0316] In at least one embodiment, a dedicated-process programming model is implementation-specific. In this model, a single process owns the graphics acceleration module 3346 or an individual graphics processing engine. Since the graphics acceleration module 3346 is owned by a single process, the hypervisor initializes the accelerator integration circuit for the owning partition, and the operating system initializes the accelerator integration circuit for the owning process when the graphics acceleration module 3346 is allocated.
[0317] During operation, the WD fetch unit 3391 in the accelerator integration slice 3390 fetches the next WD 3384, which includes an indication of the work to be performed by one or more graphics processing engines of the graphics acceleration module 3346. Data from the WD 3384 can be stored in registers 3345 and used by the memory management unit (MMU) 3339, interrupt management circuitry 3347, and / or context management circuitry 3348, as shown. At least one embodiment of the MMU 3339 includes segment / page walk circuitry for accessing segment / page tables 3386 within the OS virtual address space 3385. The interrupt management circuitry 3347 can handle interrupt events (INT) 3392 received from the graphics acceleration module 3346. When performing graphics operations, the effective address 3393 generated by the graphics processing engine is translated into a real address by the MMU 3339.
[0318] In one embodiment, the same register set 3345 is duplicated for each graphics processing engine and / or graphics acceleration module 3346 and can be initialized by the hypervisor or operating system. Each of these duplicated registers can be included in the accelerator integration slice 3390. Exemplary registers that can be initialized by the hypervisor are shown in Table 1.
[0319] Table 1 – Hypervisor-Initialized Registers
[0320]
[0321]
[0322] Table 2 shows exemplary registers that can be initialized by the operating system.
[0323] Table 2 – Operating-System-Initialized Registers
[0324]
1 Process and Thread Identification
2 Effective Address (EA) Context Save / Restore Pointer
3 Virtual Address (VA) Accelerator Utilization Record Pointer
4 Virtual Address (VA) Storage Segment Table Pointer
5 Authority Mask
6 Work Descriptor
[0325] In one embodiment, each WD 3384 is specific to a particular graphics acceleration module 3346 and / or a particular graphics processing engine. It contains all the information required for the graphics processing engine to perform its work, or it may be a pointer to a memory location where the application has established a command queue for the work to be done.
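The two forms of a work descriptor described above (a self-contained job, or a pointer to a queue of work set up by the application) can be sketched as follows. The class `WorkDescriptor`, its field names, and the use of a `deque` as a stand-in for a pointer into application memory are all hypothetical; the description does not define a concrete layout.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Optional


@dataclass
class WorkDescriptor:
    """Hypothetical WD: either an inline job or a reference to a job queue."""
    inline_job: Optional[str] = None
    queue: Optional[Deque[str]] = None   # stands in for a pointer into application memory

    def next_job(self) -> Optional[str]:
        # Inline form: the descriptor itself carries the single job.
        if self.inline_job is not None:
            job, self.inline_job = self.inline_job, None
            return job
        # Pointer form: follow the reference and take the oldest queued job.
        if self.queue:
            return self.queue.popleft()
        return None                      # nothing left to do
```

A fetch unit modeled on this sketch would call `next_job()` repeatedly until it returns `None`, regardless of which form the descriptor takes.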
[0326] Figures 34A-34B illustrate exemplary graphics processors according to at least one embodiment herein. In at least one embodiment, any exemplary graphics processor may be manufactured using one or more IP cores. In addition to what is illustrated, other logic and circuitry may be included in at least one embodiment, including additional graphics processors / cores, peripheral interface controllers, or general-purpose processor cores. In at least one embodiment, the exemplary graphics processors are used within a System-on-a-Chip (SoC).
[0327] Figure 34A shows an exemplary graphics processor 3410 of a SoC integrated circuit according to at least one embodiment, which can be manufactured using one or more IP cores. Figure 34B shows an additional exemplary graphics processor 3440 of a SoC integrated circuit according to at least one embodiment, which can be manufactured using one or more IP cores. In at least one embodiment, the graphics processor 3410 of Figure 34A is a low-power graphics processor core. In at least one embodiment, the graphics processor 3440 of Figure 34B is a higher-performance graphics processor core. In at least one embodiment, each graphics processor 3410, 3440 may be a variant of the graphics processor 510 of Figure 5.
[0328] In at least one embodiment, the graphics processor 3410 includes a vertex processor 3405 and one or more fragment processors 3415A-3415N (e.g., 3415A, 3415B, 3415C, 3415D to 3415N-1 and 3415N). In at least one embodiment, the graphics processor 3410 can execute different shader programs via separate logic, such that the vertex processor 3405 is optimized to perform operations for vertex shader programs, while one or more fragment processors 3415A-3415N perform fragment (e.g., pixel) shading operations for fragment or pixel shader programs. In at least one embodiment, the vertex processor 3405 performs the vertex processing stage of the 3D graphics pipeline and generates primitive and vertex data. In at least one embodiment, the fragment processors 3415A-3415N use the primitive and vertex data generated by the vertex processor 3405 to generate a framebuffer for display on a display device. In at least one embodiment, the fragment processors 3415A-3415N are optimized to execute fragment shader programs as provided in the OpenGL API, which can be used to perform operations similar to those of pixel shader programs provided in the Direct 3D API.
[0329] In at least one embodiment, the graphics processor 3410 additionally includes one or more MMUs 3420A-3420B, caches 3425A-3425B, and circuit interconnects 3430A-3430B. In at least one embodiment, one or more MMUs 3420A-3420B provide a virtual-to-physical address mapping for the graphics processor 3410, including for the vertex processor 3405 and / or fragment processors 3415A-3415N, which can reference vertex or image / texture data stored in memory, in addition to the vertex or image / texture data stored in one or more caches 3425A-3425B. In at least one embodiment, one or more MMUs 3420A-3420B can be synchronized with other MMUs within the system, including one or more MMUs associated with one or more application processors 505, image processors 515, and / or video processors 520 of Figure 5, enabling each processor 505-520 to participate in a shared or unified virtual memory system. In at least one embodiment, one or more circuit interconnects 3430A-3430B enable the graphics processor 3410 to connect to other IP cores within the SoC via the SoC's internal bus or via a direct connection.
[0330] In at least one embodiment, the graphics processor 3440 includes one or more MMUs 3420A-3420B, caches 3425A-3425B, and circuit interconnects 3430A-3430B of the graphics processor 3410 of Figure 34A. In at least one embodiment, the graphics processor 3440 includes one or more shader cores 3455A-3455N (e.g., 3455A, 3455B, 3455C, 3455D, 3455E, 3455F, to 3455N-1 and 3455N) that provide a unified shader core architecture, wherein a single core or type of core can execute all types of programmable shader code, including shader program code for implementing vertex shaders, fragment shaders, and / or compute shaders. In at least one embodiment, the number of shader cores may vary. In at least one embodiment, the graphics processor 3440 includes an inter-core task manager 3445 that acts as a thread dispatcher to assign execution threads to one or more shader cores 3455A-3455N, and a tiling unit 3458 to accelerate tile-based rendering operations, wherein rendering operations of a scene are subdivided in image space, for example, to take advantage of local spatial coherence within the scene or to optimize the use of internal caches.
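The image-space subdivision used in tile-based rendering can be sketched as below. This is an illustrative helper only: the function name `split_into_tiles`, the square tile shape, and the row-major ordering are assumptions and are not taken from the description.

```python
def split_into_tiles(width, height, tile_size):
    """Return (x, y, w, h) rectangles that cover a width x height image.

    Tiles on the right and bottom edges are clipped to the image bounds.
    """
    tiles = []
    for y in range(0, height, tile_size):
        for x in range(0, width, tile_size):
            w = min(tile_size, width - x)    # clip the last column of tiles
            h = min(tile_size, height - y)   # clip the last row of tiles
            tiles.append((x, y, w, h))
    return tiles
```

Each returned rectangle could then be rendered independently, so that the working set for one tile stays small enough to benefit from an on-chip cache.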
[0331] Figure 35A shows a graphics core 3500 according to at least one embodiment. In at least one embodiment, the graphics core 3500 may be included within the graphics processor 2410 of Figure 24. In at least one embodiment, the graphics core 3500 may be a unified shader core 3455A-3455N as in Figure 34B. In at least one embodiment, the graphics core 3500 includes a shared instruction cache 3502, texture units 3518, and cache / shared memory 3520, which are common to execution resources within the graphics core 3500. In at least one embodiment, the graphics core 3500 may include multiple slices 3501A-3501N or a partition for each core, and the graphics processor may include multiple instances of the graphics core 3500. Slices 3501A-3501N may include supporting logic, including local instruction caches 3504A-3504N, thread schedulers 3506A-3506N, thread dispatchers 3508A-3508N, and a set of registers 3510A-3510N. In at least one embodiment, slices 3501A-3501N may include a set of additional functional units (AFU) 3512A-3512N, floating-point units (FPU) 3514A-3514N, integer arithmetic logic units (ALU) 3516A-3516N, address calculation units (ACU) 3513A-3513N, double-precision floating-point units (DPFPU) 3515A-3515N, and matrix processing units (MPU) 3517A-3517N.
[0332] In one embodiment, the FPU 3514A-3514N can perform single-precision (32-bit) and half-precision (16-bit) floating-point operations, while the DPFPU 3515A-3515N can perform double-precision (64-bit) floating-point operations. In at least one embodiment, the ALU 3516A-3516N can perform variable-precision integer operations with 8-bit, 16-bit, and 32-bit precision, and can be configured for mixed-precision operations. In at least one embodiment, the MPU 3517A-3517N can also be configured for mixed-precision matrix operations, including half-precision floating-point operations and 8-bit integer operations. In at least one embodiment, the MPU 3517A-3517N can perform various matrix operations to accelerate CUDA programs, including enabling accelerated general matrix-to-matrix multiplication (GEMM). In at least one embodiment, the AFU 3512A-3512N can perform additional logical operations not supported by floating-point or integer units, including trigonometric operations (e.g., sine, cosine, etc.).
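As a rough illustration of a mixed-precision GEMM with 8-bit integer inputs, the sketch below multiplies two matrices whose entries are restricted to the int8 range while accumulating in a wider integer. The function name `gemm_int8` is hypothetical, and the use of a wider accumulator is an assumption about how such an operation is commonly arranged; the description does not state the accumulator width.

```python
def gemm_int8(a, b):
    """Compute C = A @ B for int8-valued inputs with a wide accumulator.

    a is an n x k list of lists and b is a k x m list of lists.
    """
    n, k, m = len(a), len(b), len(b[0])
    for row in a + b:
        if not all(-128 <= v <= 127 for v in row):
            raise ValueError("inputs must fit in a signed 8-bit integer")
    c = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = 0   # Python ints are unbounded; assumed to model a wider hardware accumulator
            for p in range(k):
                acc += a[i][p] * b[p][j]
            c[i][j] = acc
    return c
```

The point of the wider accumulator is visible with extreme inputs: `gemm_int8([[127, 127]], [[127], [127]])` yields `[[32258]]`, a value that no longer fits in 8 bits even though every input does.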
[0333] Figure 35B illustrates a general-purpose graphics processing unit (GPGPU) 3530 according to at least one embodiment. In at least one embodiment, the GPGPU 3530 is highly parallel and suitable for deployment on a multi-chip module. In at least one embodiment, the GPGPU 3530 can be configured to enable highly parallel computational operations to be performed by a GPU array. In at least one embodiment, the GPGPU 3530 can be directly linked to other instances of the GPGPU 3530 to create a multi-GPU cluster to improve execution time for CUDA programs. In at least one embodiment, the GPGPU 3530 includes a host interface 3532 for connection to a host processor. In at least one embodiment, the host interface 3532 is a PCIe interface. In at least one embodiment, the host interface 3532 can be a vendor-specific communication interface or communication fabric. In at least one embodiment, the GPGPU 3530 receives commands from the host processor and uses a global scheduler 3534 to assign execution threads associated with those commands to a set of computing clusters 3536A-3536H. In at least one embodiment, computing clusters 3536A-3536H share cache memory 3538. In at least one embodiment, cache memory 3538 can be used as a higher-level cache for cache memories within computing clusters 3536A-3536H.
[0334] In at least one embodiment, the GPGPU 3530 includes memory 3544A-3544B coupled to computing clusters 3536A-3536H via a set of memory controllers 3542A-3542B. In at least one embodiment, memory 3544A-3544B may include various types of memory devices, including dynamic random access memory (DRAM) or graphics random access memory, such as synchronous graphics random access memory (SGRAM), including graphics double data rate (GDDR) memory.
[0335] In at least one embodiment, computing clusters 3536A-3536H each include a set of graphics cores, such as the graphics core 3500 of Figure 35A, which may include various types of integer and floating-point logic units capable of performing computational operations at various precisions, including computations suitable for CUDA programs. In at least one embodiment, at least a subset of the floating-point units in each computing cluster 3536A-3536H may be configured to perform 16-bit or 32-bit floating-point operations, while a different subset of the floating-point units may be configured to perform 64-bit floating-point operations.
[0336] In at least one embodiment, multiple instances of the GPGPU 3530 can be configured to operate as a computing cluster. In at least one embodiment, the computing clusters 3536A-3536H can implement any technically feasible communication technology for synchronization and data exchange. In at least one embodiment, the multiple instances of the GPGPU 3530 communicate via a host interface 3532. In at least one embodiment, the GPGPU 3530 includes an I / O hub 3539 that couples the GPGPU 3530 to a GPU link 3540, enabling direct connection to other instances of the GPGPU 3530. In at least one embodiment, the GPU link 3540 is coupled to a dedicated GPU-to-GPU bridge, enabling communication and synchronization among the multiple instances of the GPGPU 3530. In at least one embodiment, the GPU link 3540 is coupled to a high-speed interconnect for sending and receiving data to and from other GPGPUs or parallel processors. In at least one embodiment, the multiple instances of the GPGPU 3530 reside in separate data processing systems and communicate via a network device accessible via the host interface 3532. In at least one embodiment, the GPU link 3540 may be configured to connect to a host processor, in addition to or instead of the host interface 3532. In at least one embodiment, the GPGPU 3530 may be configured to execute CUDA programs.
[0337] Figure 36A shows a parallel processor 3600 according to at least one embodiment. In at least one embodiment, various components of the parallel processor 3600 may be implemented using one or more integrated circuit devices, such as programmable processors, application-specific integrated circuits (ASICs), or FPGAs.
[0338] In at least one embodiment, the parallel processor 3600 includes a parallel processing unit 3602. In at least one embodiment, the parallel processing unit 3602 includes an I / O unit 3604 that enables communication with other devices, including other instances of the parallel processing unit 3602. In at least one embodiment, the I / O unit 3604 can be directly connected to other devices. In at least one embodiment, the I / O unit 3604 is connected to other devices using a hub or switch interface (e.g., memory hub 1405). In at least one embodiment, the connection between the memory hub 1405 and the I / O unit 3604 forms a communication link. In at least one embodiment, the I / O unit 3604 is connected to a host interface 3606 and a memory crossbar switch 3616, wherein the host interface 3606 receives commands for performing processing operations, and the memory crossbar switch 3616 receives commands for performing memory operations.
[0339] In at least one embodiment, when host interface 3606 receives a command buffer via I / O unit 3604, host interface 3606 can direct work operations to execute those commands to front end 3608. In at least one embodiment, front end 3608 is coupled to scheduler 3610, which is configured to assign commands or other work items to processing array 3612. In at least one embodiment, scheduler 3610 ensures that processing array 3612 is correctly configured and in a valid state before assigning tasks to processing array 3612. In at least one embodiment, scheduler 3610 is implemented via firmware logic executed on a microcontroller. In at least one embodiment, the microcontroller-implemented scheduler 3610 can be configured to perform complex scheduling and work assignment operations at both coarse and fine granularity, enabling fast preemption and context switching of threads executing on processing array 3612. In at least one embodiment, host software can submit workloads for scheduling on processing array 3612 via one of multiple graphics processing doorbells. In at least one embodiment, the workload can then be automatically distributed across the processing array 3612 by the scheduler 3610 logic within the microcontroller that includes the scheduler 3610.
[0340] In at least one embodiment, the processing array 3612 may include up to "N" processing clusters (e.g., clusters 3614A, 3614B to 3614N). In at least one embodiment, each cluster 3614A-3614N of the processing array 3612 may execute a large number of concurrent threads. In at least one embodiment, the scheduler 3610 may use various scheduling and / or work allocation algorithms to allocate work to the clusters 3614A-3614N of the processing array 3612, which may vary depending on the workload generated by each type of program or computation. In at least one embodiment, scheduling may be handled dynamically by the scheduler 3610, or may be partially assisted by compiler logic during the compilation of program logic configured to be executed by the processing array 3612. In at least one embodiment, different clusters 3614A-3614N of the processing array 3612 may be assigned to process different types of programs or to perform different types of computations.
[0341] In at least one embodiment, the processing array 3612 can be configured to perform various types of parallel processing operations. In at least one embodiment, the processing array 3612 is configured to perform general-purpose parallel computing operations. In at least one embodiment, the processing array 3612 may include logic for performing processing tasks, including filtering video and / or audio data, performing modeling operations, including physics operations, and performing data transformations.
[0342] In at least one embodiment, the processing array 3612 is configured to perform parallel graphics processing operations. In at least one embodiment, the processing array 3612 may include additional logic to support the execution of such graphics processing operations, including but not limited to texture sampling logic for performing texture operations, as well as tessellation logic and other vertex processing logic. In at least one embodiment, the processing array 3612 may be configured to execute shader programs related to graphics processing, such as, but not limited to, vertex shaders, tessellation shaders, geometry shaders, and pixel shaders. In at least one embodiment, the parallel processing unit 3602 may transfer data from system memory via I / O unit 3604 for processing. In at least one embodiment, during processing, the transferred data may be stored in on-chip memory (e.g., parallel processor memory 3622) and then written back to system memory.
[0343] In at least one embodiment, when the parallel processing unit 3602 is used to perform graphics processing, the scheduler 3610 may be configured to divide the processing workload into tasks of approximately equal size to better distribute graphics processing operations among the multiple clusters 3614A-3614N of the processing array 3612. In at least one embodiment, portions of the processing array 3612 may be configured to perform different types of processing. In at least one embodiment, a first portion may be configured to perform vertex shading and topology generation, a second portion may be configured to perform tessellation and geometry shading, and a third portion may be configured to perform pixel shading or other screen-space operations to generate a rendered image for display. In at least one embodiment, intermediate data generated by one or more of the clusters 3614A-3614N may be stored in a buffer to allow intermediate data to be transferred between the clusters 3614A-3614N for further processing.
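Dividing a workload into tasks of approximately equal size can be sketched with a simple range partition. The function name `split_evenly` and the representation of work as a count of items are assumptions for illustration; the description does not say how the scheduler sizes tasks.

```python
def split_evenly(num_items, num_clusters):
    """Return half-open (start, end) ranges whose sizes differ by at most one."""
    base, extra = divmod(num_items, num_clusters)
    ranges, start = [], 0
    for c in range(num_clusters):
        size = base + (1 if c < extra else 0)   # the first `extra` clusters take one more item
        ranges.append((start, start + size))
        start += size
    return ranges
```

For example, ten items over four clusters gives ranges of sizes 3, 3, 2, and 2, so no cluster receives more than one item beyond any other.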
[0344] In at least one embodiment, the processing array 3612 may receive processing tasks to be executed via a scheduler 3610, which receives commands defining the processing tasks from a front end 3608. In at least one embodiment, a processing task may include indices of data to be processed, such as surface (patch) data, primitive data, vertex data, and / or pixel data, as well as state parameters and commands defining how the data is to be processed (e.g., what program to execute). In at least one embodiment, the scheduler 3610 may be configured to fetch the indices corresponding to a task, or may receive the indices from the front end 3608. In at least one embodiment, the front end 3608 may be configured to ensure that the processing array 3612 is configured to a valid state before initiating a workload specified by an incoming command buffer (e.g., a batch buffer, push buffer, etc.).
[0345] In at least one embodiment, each of one or more instances of the parallel processing unit 3602 may be coupled to the parallel processor memory 3622. In at least one embodiment, the parallel processor memory 3622 may be accessed via a memory crossbar switch 3616, which may receive memory requests from the processing array 3612 and the I / O unit 3604. In at least one embodiment, the memory crossbar switch 3616 may access the parallel processor memory 3622 via a memory interface 3618. In at least one embodiment, the memory interface 3618 may include a plurality of partition units (e.g., partition units 3620A, 3620B to 3620N), each of which may be coupled to a portion (e.g., a memory unit) of the parallel processor memory 3622. In at least one embodiment, the number of partition units 3620A-3620N is configured to be equal to the number of memory units, such that the first partition unit 3620A has a corresponding first memory unit 3624A, the second partition unit 3620B has a corresponding second memory unit 3624B, and the Nth partition unit 3620N has a corresponding Nth memory unit 3624N. In at least one embodiment, the number of partition units 3620A-3620N may not be equal to the number of memory devices.
[0346] In at least one embodiment, memory units 3624A-3624N may include various types of memory devices, including dynamic random access memory (DRAM) or graphics random access memory, such as synchronous graphics random access memory (SGRAM), including graphics double data rate (GDDR) memory. In at least one embodiment, memory units 3624A-3624N may also include 3D stacked memory, including but not limited to high bandwidth memory (HBM). In at least one embodiment, render targets such as frame buffers or texture maps may be stored across memory units 3624A-3624N, allowing partition units 3620A-3620N to write portions of each render target in parallel to efficiently utilize the available bandwidth of the parallel processor memory 3622. In at least one embodiment, local instances of the parallel processor memory 3622 may be excluded in favor of a unified memory design that combines system memory with local cache memory.
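One common way to spread a render target across several memory units so that portions can be written in parallel is round-robin striping of addresses. The sketch below is an assumption-laden illustration: the function name `unit_for_address`, the 256-byte stripe size, and the round-robin mapping are hypothetical and are not specified in the description.

```python
def unit_for_address(addr, num_units, stripe_bytes=256):
    """Map a byte address to (memory unit index, byte offset inside that unit).

    Consecutive stripes of stripe_bytes are assigned to units in round-robin order.
    """
    stripe = addr // stripe_bytes            # which stripe the address falls in
    unit = stripe % num_units                # round-robin choice of memory unit
    local_stripe = stripe // num_units       # how many stripes this unit already holds
    return unit, local_stripe * stripe_bytes + addr % stripe_bytes
```

Under this mapping, a write that covers `num_units` consecutive stripes touches every memory unit exactly once, which is what would let the corresponding partition units work in parallel.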
[0347] In at least one embodiment, any of the clusters 3614A-3614N of the processing array 3612 can process data to be written to any memory unit 3624A-3624N within the parallel processor memory 3622. In at least one embodiment, the memory crossbar switch 3616 can be configured to transfer the output of each cluster 3614A-3614N to any partition unit 3620A-3620N or to another cluster 3614A-3614N, which can perform further processing operations on the output. In at least one embodiment, each cluster 3614A-3614N can communicate with the memory interface 3618 via the memory crossbar switch 3616 to read from or write to various external storage devices. In at least one embodiment, the memory crossbar switch 3616 has a connection to the memory interface 3618 for communication with I / O unit 3604, and a connection to a local instance of parallel processor memory 3622, thereby enabling processing units within different processing clusters 3614A-3614N to communicate with system memory or other memory not local to parallel processing unit 3602. In at least one embodiment, the memory crossbar switch 3616 may use virtual channels to separate traffic flows between clusters 3614A-3614N and partition units 3620A-3620N.
[0348] In at least one embodiment, multiple instances of the parallel processing unit 3602 may be provided on a single add-in card, or multiple add-in cards may be interconnected. In at least one embodiment, different instances of the parallel processing unit 3602 may be configured to interoperate, even if the different instances have different numbers of processing cores, different amounts of local parallel processor memory, and / or other configuration differences. In at least one embodiment, some instances of the parallel processing unit 3602 may include higher-precision floating-point units relative to other instances. In at least one embodiment, a system incorporating one or more instances of the parallel processing unit 3602 or the parallel processor 3600 can be implemented in various configurations and form factors, including but not limited to desktop, laptop, or handheld personal computers, servers, workstations, game consoles, and / or embedded systems.
[0349] Figure 36B illustrates a processing cluster 3694 according to at least one embodiment. In at least one embodiment, the processing cluster 3694 is included within a parallel processing unit. In at least one embodiment, the processing cluster 3694 is an example of one of the processing clusters 3614A-3614N of Figure 36A. In at least one embodiment, the processing cluster 3694 can be configured to execute a number of threads in parallel, wherein the term "thread" refers to an instance of a specific program executing on a particular set of input data. In at least one embodiment, a Single Instruction Multiple Data (SIMD) instruction issuing technique is used to support the parallel execution of a large number of threads without providing multiple independent instruction units. In at least one embodiment, a Single Instruction Multiple Threading (SIMT) technique is used to support the parallel execution of a large number of generally synchronous threads, which uses a common instruction unit configured to issue instructions to a set of processing engines within each processing cluster 3694.
[0350] In at least one embodiment, the operation of the processing cluster 3694 can be controlled by a pipeline manager 3632 that assigns processing tasks to the SIMT parallel processors. In at least one embodiment, the pipeline manager 3632 receives instructions from the scheduler 3610 of Figure 36A and manages the execution of these instructions via the graphics multiprocessor 3634 and / or texture unit 3636. In at least one embodiment, the graphics multiprocessor 3634 is an exemplary instance of a SIMT parallel processor. However, in at least one embodiment, the processing cluster 3694 may include various types of SIMT parallel processors with different architectures. In at least one embodiment, the processing cluster 3694 may include one or more instances of the graphics multiprocessor 3634. In at least one embodiment, the graphics multiprocessor 3634 can process data, and the data crossbar 3640 can be used to distribute the processed data to one of a number of possible destinations (including other shader units). In at least one embodiment, the pipeline manager 3632 can facilitate the distribution of processed data by specifying the destination of the processed data to be distributed via the data crossbar 3640.
[0351] In at least one embodiment, each graphics multiprocessor 3634 within the processing cluster 3694 may include the same set of functional execution logic (e.g., arithmetic logic units, load / store units (LSUs), etc.). In at least one embodiment, the functional execution logic may be configured in a pipelined manner, wherein new instructions may be issued before previous instructions complete. In at least one embodiment, the functional execution logic supports a variety of operations, including integer and floating-point arithmetic, comparison operations, Boolean operations, shift operations, and computation of various algebraic functions. In at least one embodiment, the same functional unit hardware may be used to perform different operations, and any combination of functional units may exist.
[0352] In at least one embodiment, instructions sent to the processing cluster 3694 constitute threads. In at least one embodiment, a set of threads executed across a set of parallel processing engines is a thread group. In at least one embodiment, the thread group executes a program on different input data. In at least one embodiment, each thread within the thread group may be assigned to a different processing engine within the graphics multiprocessor 3634. In at least one embodiment, the thread group may include fewer threads than the number of processing engines within the graphics multiprocessor 3634. In at least one embodiment, when the number of threads included in the thread group is less than the number of processing engines, one or more processing engines may be idle during the cycles in which the thread group is being processed. In at least one embodiment, the thread group may also include more threads than the number of processing engines within the graphics multiprocessor 3634. In at least one embodiment, when the thread group includes more threads than the number of processing engines within the graphics multiprocessor 3634, processing can be performed over consecutive clock cycles. In at least one embodiment, multiple thread groups can be executed simultaneously on the graphics multiprocessor 3634.
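The relationship between thread-group size and the number of processing engines can be made concrete with a little arithmetic. The sketch below assumes, purely for illustration, that each engine handles one thread per pass and that a partially filled pass leaves the remaining engines idle; the function name `schedule_thread_group` is hypothetical.

```python
import math


def schedule_thread_group(num_threads, num_engines):
    """Return (passes, idle_slots) needed to run one thread group.

    passes is the number of consecutive passes over the engines, and
    idle_slots counts engine slots left unused across those passes.
    """
    passes = math.ceil(num_threads / num_engines)
    idle_slots = passes * num_engines - num_threads
    return passes, idle_slots
```

With 32 engines, a group of 24 threads finishes in one pass with 8 engines idle, while a group of 80 threads needs three consecutive passes and leaves 16 slots unused in total.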
[0353] In at least one embodiment, the graphics multiprocessor 3634 includes an internal cache memory for performing load and store operations. In at least one embodiment, the graphics multiprocessor 3634 may forgo the internal cache and use a cache memory within the processing cluster 3694 (e.g., L1 cache 3648). In at least one embodiment, each graphics multiprocessor 3634 may also access L2 caches within partition units (e.g., partition units 3620A-3620N of Figure 36A), which are shared among all processing clusters 3694 and can be used to transfer data between threads. In at least one embodiment, the graphics multiprocessor 3634 can also access off-chip global memory, which may include one or more of local parallel processor memory and / or system memory. In at least one embodiment, any memory external to the parallel processing unit 3602 can be used as global memory. In at least one embodiment, the processing cluster 3694 includes multiple instances of the graphics multiprocessor 3634, which can share common instructions and data that may be stored in the L1 cache 3648.
[0354] In at least one embodiment, each processing cluster 3694 may include an MMU 3645 configured to map virtual addresses to physical addresses. In at least one embodiment, one or more instances of the MMU 3645 may reside within the memory interface 3618 of FIG. 36. In at least one embodiment, the MMU 3645 includes a set of page table entries (PTEs) used to map a virtual address to a physical address of a tile (more on tiles) and, optionally, to a cache line index. In at least one embodiment, the MMU 3645 may include address translation lookaside buffers (TLBs) or caches that may reside within the graphics multiprocessor 3634, the L1 cache 3648, or the processing cluster 3694. In at least one embodiment, physical addresses are processed to distribute surface data access locality so that requests can be interleaved efficiently among partition units. In at least one embodiment, a cache line index may be used to determine whether a request for a cache line is a hit or a miss.
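The virtual-to-physical mapping with a translation buffer described above can be sketched as follows. This Python sketch is illustrative only: the class `ToyMMU`, the 4096-byte page size, and the dictionary-based page table are assumptions made for the example and are not taken from this disclosure.

```python
from typing import Dict

PAGE_SIZE = 4096  # assumed page size in bytes (hypothetical)


class ToyMMU:
    """Toy MMU: page table lookup fronted by a small translation cache."""

    def __init__(self, page_table: Dict[int, int]):
        # Maps virtual page number -> physical frame number.
        self.page_table = page_table
        self.tlb: Dict[int, int] = {}
        self.tlb_hits = 0
        self.tlb_misses = 0

    def translate(self, vaddr: int) -> int:
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self.tlb:
            # Translation already cached: no page table walk needed.
            self.tlb_hits += 1
            frame = self.tlb[vpn]
        else:
            self.tlb_misses += 1
            if vpn not in self.page_table:
                raise KeyError(f"no page table entry for virtual page {vpn}")
            frame = self.page_table[vpn]
            self.tlb[vpn] = frame  # cache the translation for later accesses
        return frame * PAGE_SIZE + offset


if __name__ == "__main__":
    mmu = ToyMMU({0: 5, 1: 9})
    print(hex(mmu.translate(4100)), mmu.tlb_hits, mmu.tlb_misses)
    print(hex(mmu.translate(4200)), mmu.tlb_hits, mmu.tlb_misses)
```

The second access to the same virtual page is served from the translation cache, which is the effect a TLB is intended to provide.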
[0355] In at least one embodiment, the processing cluster 3694 may be configured such that each graphics multiprocessor 3634 is coupled to a texture unit 3636 to perform texture mapping operations, such as determining texture sample locations, reading texture data, and filtering texture data. In at least one embodiment, texture data is read from an internal texture L1 cache (not shown) or from an L1 cache within the graphics multiprocessor 3634, and is fetched from an L2 cache, local parallel processor memory, or system memory as needed. In at least one embodiment, each graphics multiprocessor 3634 outputs a processed task to a data crossbar 3640 to provide the processed task to another processing cluster 3694 for further processing, or to store the processed task in an L2 cache, local parallel processor memory, or system memory via a memory crossbar 3616. In at least one embodiment, the pre-raster operations unit (preROP) 3642 is configured to receive data from the graphics multiprocessor 3634 and direct the data to ROP units, which may be located with the partition units described herein (e.g., partition units 3620A-3620N of FIG. 36). In at least one embodiment, the preROP unit 3642 may perform optimizations for color blending, organize pixel color data, and perform address translations.
[0356] Figure 36C illustrates a graphics multiprocessor 3696 according to at least one embodiment. In at least one embodiment, the graphics multiprocessor 3696 is the graphics multiprocessor 3634 of Figure 36B. In at least one embodiment, the graphics multiprocessor 3696 is coupled to the pipeline manager 3632 of the processing cluster 3694. In at least one embodiment, the graphics multiprocessor 3696 has an execution pipeline including, but not limited to, an instruction cache 3652, an instruction unit 3654, an address mapping unit 3656, a register file 3658, one or more GPGPU cores 3662, and one or more LSUs 3666. The GPGPU cores 3662 and LSUs 3666 are coupled to cache memory 3672 and shared memory 3670 via memory and cache interconnect 3668.
[0357] In at least one embodiment, instruction cache 3652 receives a stream of instructions to be executed from pipeline manager 3632. In at least one embodiment, instructions are cached in instruction cache 3652 and dispatched for execution by instruction unit 3654. In at least one embodiment, instruction unit 3654 may dispatch instructions as thread groups (e.g., warps), with each thread of a thread group assigned to a different execution unit within GPGPU cores 3662. In at least one embodiment, an instruction can access any of a local, shared, or global address space by specifying an address within a unified address space. In at least one embodiment, address mapping unit 3656 may be used to translate addresses in the unified address space into distinct memory addresses that can be accessed by LSUs 3666.
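The translation from a unified address space to distinct local, shared, and global addresses described above can be sketched as a range lookup. This Python sketch is illustrative only: the window boundaries in `WINDOWS` and the function name `map_unified_address` are hypothetical, since this disclosure does not specify any address layout.

```python
# Hypothetical window layout: (space name, base address, exclusive limit).
# The disclosure does not define these ranges; they exist only for the example.
WINDOWS = [
    ("local", 0x0000_0000, 0x0001_0000),
    ("shared", 0x0001_0000, 0x0002_0000),
    ("global", 0x0002_0000, 0x1_0000_0000),
]


def map_unified_address(addr: int):
    """Return (address space, offset within that space) for a unified address."""
    for space, base, limit in WINDOWS:
        if base <= addr < limit:
            return space, addr - base
    raise ValueError(f"address {addr:#x} is outside the unified address space")


if __name__ == "__main__":
    for addr in (0x0000_0040, 0x0001_0010, 0x0002_0000):
        print(hex(addr), map_unified_address(addr))
```

In this sketch an instruction supplies a single address, and the lookup decides which memory the access is directed to.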
[0358] In at least one embodiment, register file 3658 provides a set of registers for functional units of graphics multiprocessor 3696. In at least one embodiment, register file 3658 provides temporary storage for operands of data paths connected to functional units of graphics multiprocessor 3696 (e.g., GPGPU core 3662, LSU 3666). In at least one embodiment, register file 3658 is partitioned among each functional unit, such that a dedicated portion of register file 3658 is allocated to each functional unit. In at least one embodiment, register file 3658 is partitioned among different thread groups being executed by graphics multiprocessor 3696.
[0359] In at least one embodiment, each of the GPGPU cores 3662 may include an FPU and / or an integer ALU for executing instructions of the graphics multiprocessor 3696. The GPGPU cores 3662 may be architecturally similar or may differ in architecture. In at least one embodiment, a first portion of the GPGPU cores 3662 includes a single-precision FPU and an integer ALU, while a second portion of the GPGPU cores 3662 includes a double-precision FPU. In at least one embodiment, the FPUs may implement the IEEE 754-2008 standard for floating-point arithmetic or enable variable-precision floating-point arithmetic. In at least one embodiment, the graphics multiprocessor 3696 may additionally include one or more fixed-function or special-function units to perform specific functions, such as copy-rectangle or pixel blending operations. In at least one embodiment, one or more of the GPGPU cores 3662 may also include fixed-function or special-function logic.
[0360] In at least one embodiment, the GPGPU cores 3662 include SIMD logic capable of executing a single instruction on multiple sets of data. In at least one embodiment, the GPGPU cores 3662 can physically execute SIMD4, SIMD8, and SIMD16 instructions, and logically execute SIMD1, SIMD2, and SIMD32 instructions. In at least one embodiment, SIMD instructions for the GPGPU cores can be generated at compile time by a shader compiler, or generated automatically when executing programs written and compiled for Single Program Multiple Data (SPMD) or SIMT architectures. In at least one embodiment, multiple threads of a program configured for a SIMT execution model can be executed via a single SIMD instruction. In at least one embodiment, eight SIMT threads that perform the same or similar operations can be executed in parallel via a single SIMD8 logic unit.
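The distinction between logical and physical SIMD widths can be illustrated with a short sketch. This Python example is illustrative only and rests on an assumption not stated in this disclosure: that a logically wider operation is carried out as several passes over a narrower physical unit. The function name `execute_simd` and the default width of 8 are hypothetical.

```python
def execute_simd(op, a, b, physical_width=8):
    """Apply op lane by lane, issuing one physical-width chunk per pass.

    Assumption: a wide logical operation is split into sequential passes.
    Returns the per-lane results and the number of passes issued.
    """
    if len(a) != len(b):
        raise ValueError("operand vectors must have the same number of lanes")
    results = []
    passes = 0
    for start in range(0, len(a), physical_width):
        chunk_a = a[start:start + physical_width]
        chunk_b = b[start:start + physical_width]
        # All lanes in a chunk receive the same operation on different data.
        results.extend(op(x, y) for x, y in zip(chunk_a, chunk_b))
        passes += 1
    return results, passes


if __name__ == "__main__":
    lanes_a = list(range(32))
    lanes_b = [1] * 32
    out, n_passes = execute_simd(lambda x, y: x + y, lanes_a, lanes_b)
    print(n_passes, out[:4])
```

With 32 logical lanes and an 8-lane physical unit, the sketch issues four passes, each applying one operation to eight data elements.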
[0361] In at least one embodiment, the memory and cache interconnect 3668 is an interconnect network that connects each functional unit of the graphics multiprocessor 3696 to the register file 3658 and to the shared memory 3670. In at least one embodiment, the memory and cache interconnect 3668 is a crossbar interconnect that allows the LSUs 3666 to perform load and store operations between the shared memory 3670 and the register file 3658. In at least one embodiment, the register file 3658 can operate at the same frequency as the GPGPU cores 3662, so data transfer between the GPGPU cores 3662 and the register file 3658 has very low latency. In at least one embodiment, the shared memory 3670 can be used to enable communication between threads that execute on functional units within the graphics multiprocessor 3696. In at least one embodiment, the cache memory 3672 can be used as a data cache, for example to cache texture data communicated between functional units and the texture unit 3636. In at least one embodiment, the shared memory 3670 can also be used as a program-managed cache. In at least one embodiment, threads executing on the GPGPU cores 3662 can programmatically store data in the shared memory in addition to the automatically cached data stored in cache memory 3672.
[0362] In at least one embodiment, a parallel processor or GPGPU, as described herein, is communicatively coupled to host / processor cores to accelerate graphics operations, machine learning operations, pattern analysis operations, and various general-purpose GPU (GPGPU) functions. In at least one embodiment, the GPU may be communicatively coupled to the host processor / cores via a bus or other interconnect (e.g., a high-speed interconnect such as PCIe or NVLink). In at least one embodiment, the GPU may be integrated on the same package or chip as the cores and communicatively coupled to the cores via an internal processor bus / interconnect (i.e., internal to the package or chip). In at least one embodiment, regardless of the manner in which the GPU is connected, the processor cores may assign work to the GPU in the form of sequences of commands / instructions contained in a work descriptor (WD). In at least one embodiment, the GPU then uses dedicated circuitry / logic to efficiently process these commands / instructions.
[0363] General computing
[0364] The following figures set forth, without limitation, exemplary software constructs used in general computing to implement at least one embodiment.
[0365] Figure 37 illustrates a software stack of a programming platform according to at least one embodiment. In at least one embodiment, the programming platform is a platform for leveraging hardware on a computing system to accelerate computational tasks. In at least one embodiment, software developers can access the programming platform through libraries, compiler directives, and / or extensions to programming languages. In at least one embodiment, the programming platform may be, but is not limited to, CUDA, Radeon Open Compute Platform (“ROCm”), OpenCL (OpenCL™ is developed by the Khronos Group), SYCL, or Intel oneAPI.
[0366] In at least one embodiment, the software stack 3700 of the programming platform provides an execution environment for the application 3701. In at least one embodiment, the application 3701 may include any computer software capable of being launched on the software stack 3700. In at least one embodiment, the application 3701 may include, but is not limited to, artificial intelligence (“AI”) / machine learning (“ML”) applications, high-performance computing (“HPC”) applications, virtual desktop infrastructure (“VDI”) or data center workloads.
[0367] In at least one embodiment, application 3701 and software stack 3700 run on hardware 3707. In at least one embodiment, hardware 3707 may include one or more GPUs, CPUs, FPGAs, AI engines, and / or other types of computing devices that support a programming platform. In at least one embodiment, such as with CUDA, software stack 3700 may be vendor-specific and compatible only with devices from a particular vendor. In at least one embodiment, such as with OpenCL, software stack 3700 may be used with devices from different vendors. In at least one embodiment, hardware 3707 includes a host connected to one or more devices that can be accessed via application programming interface (API) calls to perform computational tasks. In at least one embodiment, a device within hardware 3707 may include, but is not limited to, a GPU, FPGA, AI engine, or other computing device (but may also include a CPU) and its memory, as opposed to a host within hardware 3707, which may include, but is not limited to, a CPU (but may also include a computing device) and its memory.
[0368] In at least one embodiment, the software stack 3700 of the programming platform includes, but is not limited to, a number of libraries 3703, a runtime 3705, and a device kernel driver 3706. In at least one embodiment, each of the libraries 3703 may include data and programming code that can be used by computer programs and leveraged during software development. In at least one embodiment, the libraries 3703 may include, but are not limited to, pre-written code and subroutines, classes, values, type specifications, configuration data, documentation, help data, and / or message templates. In at least one embodiment, the libraries 3703 include functions that are optimized for execution on one or more types of devices. In at least one embodiment, the libraries 3703 may include, but are not limited to, functions for performing mathematical, deep learning, and / or other types of operations on devices. In at least one embodiment, the libraries 3703 are associated with corresponding APIs 3702, which may include one or more APIs that expose functions implemented in the libraries 3703.
[0369] In at least one embodiment, application 3701 is written as source code that is compiled into executable code, as discussed in greater detail below in conjunction with Figure 42. In at least one embodiment, the executable code of application 3701 may run, at least in part, on an execution environment provided by software stack 3700. In at least one embodiment, during execution of application 3701, code may be reached that needs to run on a device, as opposed to a host. In such a case, in at least one embodiment, runtime 3705 may be called to load and launch the requisite code on the device. In at least one embodiment, runtime 3705 may include any technically feasible runtime system that is able to support execution of application 3701.
[0370] In at least one embodiment, runtime 3705 is implemented as one or more runtime libraries associated with corresponding APIs, which are shown as API 3704. In at least one embodiment, one or more such runtime libraries may include, but are not limited to, functions for memory management, execution control, device management, error handling, and / or synchronization, among others. In at least one embodiment, memory management functions may include, but are not limited to, functions for allocating, deallocating, and copying device memory, and for transferring data between host memory and device memory. In at least one embodiment, execution control functions may include, but are not limited to, functions for launching a function on a device (sometimes referred to as a "kernel" when the function is a global function callable from a host), and functions for setting attribute values in a buffer maintained by a runtime library for a given function to be executed on a device.
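The memory management and execution control functions described above can be sketched with a toy runtime. This Python sketch is illustrative only: `ToyRuntime` and all of its method names are hypothetical and do not correspond to the API of CUDA, ROCm, OpenCL, or any other real platform, and device memory is simulated with ordinary Python lists.

```python
class ToyRuntime:
    """Toy host-side runtime that simulates device memory with Python lists."""

    def __init__(self):
        self._device_memory = {}  # handle -> simulated device buffer
        self._next_handle = 1

    def allocate(self, num_elements):
        """Memory management: allocate a zero-filled device buffer."""
        handle = self._next_handle
        self._next_handle += 1
        self._device_memory[handle] = [0] * num_elements
        return handle

    def copy_to_device(self, handle, host_data):
        """Memory management: transfer data from host to device memory."""
        buffer = self._device_memory[handle]
        if len(host_data) != len(buffer):
            raise ValueError("host data size does not match device buffer")
        buffer[:] = host_data

    def copy_to_host(self, handle):
        """Memory management: transfer data from device back to host memory."""
        return list(self._device_memory[handle])

    def launch(self, kernel, handle):
        """Execution control: run the kernel once per element (one 'thread' each)."""
        buffer = self._device_memory[handle]
        for thread_index in range(len(buffer)):
            kernel(thread_index, buffer)

    def deallocate(self, handle):
        """Memory management: release a device buffer."""
        del self._device_memory[handle]


def double_kernel(thread_index, buffer):
    # Each simulated thread updates exactly one element.
    buffer[thread_index] *= 2


if __name__ == "__main__":
    rt = ToyRuntime()
    buf = rt.allocate(3)
    rt.copy_to_device(buf, [1, 2, 3])
    rt.launch(double_kernel, buf)
    print(rt.copy_to_host(buf))
    rt.deallocate(buf)
```

The usage at the bottom follows the typical order of operations: allocate, copy in, launch, copy out, deallocate.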
[0371] In at least one embodiment, the runtime libraries and the corresponding API 3704 can be implemented in any technically feasible manner. In at least one embodiment, one (or any number of) APIs may expose a low-level set of functions for fine-grained control of a device, while another (or any number of) APIs may expose a higher-level set of such functions. In at least one embodiment, a high-level runtime API can be built on top of a low-level API. In at least one embodiment, one or more runtime APIs may be language-specific APIs that are layered on top of a language-independent runtime API.
[0372] In at least one embodiment, device kernel driver 3706 is configured to facilitate communication with an underlying device. In at least one embodiment, device kernel driver 3706 may provide low-level functionality upon which APIs, such as API 3704, and / or other software rely. In at least one embodiment, device kernel driver 3706 may be configured to compile intermediate representation (“IR”) code into binary code at runtime. In at least one embodiment, for CUDA, device kernel driver 3706 may compile parallel thread execution (“PTX”) IR code, which is not hardware-specific, into binary code for a specific target device at runtime (with caching of the compiled binary code), which is sometimes referred to as “finalizing” code. In at least one embodiment, doing so allows the finalized code to run on a target device that may not have existed when the source code was originally compiled into PTX code. Alternatively, in at least one embodiment, the device source code may be compiled into binary code offline, without requiring device kernel driver 3706 to compile the IR code at runtime.
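The runtime compilation of IR code with caching of the compiled binary can be sketched as a cache keyed by the IR contents and the target device. This Python sketch is illustrative only: `ToyJitCache`, `fake_compile`, and the target strings are hypothetical stand-ins, and no real compiler or driver interface is invoked.

```python
import hashlib


class ToyJitCache:
    """Compile IR for a target on first use, then reuse the cached binary."""

    def __init__(self, compile_fn):
        self._compile_fn = compile_fn  # stand-in for a real IR-to-binary compiler
        self._cache = {}
        self.compile_count = 0

    def get_binary(self, ir_code: str, target: str) -> bytes:
        # The same IR may need a different binary for each target device.
        key = (hashlib.sha256(ir_code.encode("utf-8")).hexdigest(), target)
        if key not in self._cache:
            self.compile_count += 1
            self._cache[key] = self._compile_fn(ir_code, target)
        return self._cache[key]


def fake_compile(ir_code: str, target: str) -> bytes:
    # Placeholder "binary": a real compiler would emit machine code here.
    return f"{target}:{len(ir_code)}".encode("utf-8")


if __name__ == "__main__":
    jit = ToyJitCache(fake_compile)
    jit.get_binary("toy ir text", "target_a")
    jit.get_binary("toy ir text", "target_a")  # served from the cache
    jit.get_binary("toy ir text", "target_b")  # new target, compiled again
    print(jit.compile_count)
```

Keying the cache on the target as well as the IR reflects the point that the same hardware-independent IR can be finalized separately for each device it runs on.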
[0373] Figure 38 illustrates a CUDA implementation of the software stack 3700 of Figure 37, according to at least one embodiment. In at least one embodiment, the CUDA software stack 3800, on which an application 3801 can be launched, includes CUDA libraries 3803, a CUDA runtime 3805, a CUDA driver 3807, and a device kernel driver 3808. In at least one embodiment, the CUDA software stack 3800 executes on hardware 3809, which may include a CUDA-enabled GPU developed by NVIDIA Corporation of Santa Clara, California.
[0374] In at least one embodiment, application 3801, CUDA runtime 3805, and device kernel driver 3808 can perform functions similar to those of application 3701, runtime 3705, and device kernel driver 3706, respectively, which are described above in conjunction with Figure 37. In at least one embodiment, the CUDA driver 3807 includes a library (libcuda.so) that implements the CUDA driver API 3806. In at least one embodiment, similar to the CUDA runtime API 3804 implemented by the CUDA runtime library (cudart), the CUDA driver API 3806 may expose, but is not limited to, functions for memory management, execution control, device management, error handling, synchronization, and / or graphics interoperability. In at least one embodiment, the CUDA driver API 3806 differs from the CUDA runtime API 3804 in that the CUDA runtime API 3804 simplifies device code management by providing implicit initialization, context (analogous to a process) management, and module (analogous to dynamically loaded libraries) management. In contrast to the high-level CUDA runtime API 3804, in at least one embodiment, the CUDA driver API 3806 is a low-level API that provides finer-grained control over the device, particularly with respect to contexts and module loading. In at least one embodiment, the CUDA driver API 3806 may expose functions for context management that are not exposed by the CUDA runtime API 3804. In at least one embodiment, the CUDA driver API 3806 is also language-independent and supports, for example, OpenCL in addition to the CUDA runtime API 3804. Furthermore, in at least one embodiment, development libraries, including the CUDA runtime 3805, can be considered separate from the driver components, including the user-mode CUDA driver 3807 and the kernel-mode device driver 3808 (sometimes also referred to as the "display" driver).
[0375] In at least one embodiment, CUDA library 3803 may include, but is not limited to, mathematical libraries, deep learning libraries, parallel algorithm libraries, and / or signal / image / video processing libraries, which parallel computing applications (e.g., application 3801) may utilize. In at least one embodiment, CUDA library 3803 may include mathematical libraries, such as the cuBLAS library, which is an implementation of basic linear algebra subroutines (“BLAS”) for performing linear algebra operations; the cuFFT library for computing the Fast Fourier Transform (“FFT”); and the cuRAND library for generating random numbers, etc. In at least one embodiment, CUDA library 3803 may include deep learning libraries, such as the cuDNN library for primitives of deep neural networks and the TensorRT platform for high-performance deep learning inference, etc.
[0376] Figure 39 illustrates a ROCm implementation of the software stack 3700 of Figure 37, according to at least one embodiment. In at least one embodiment, the ROCm software stack 3900, on which the application 3901 can be launched, includes a language runtime 3903, a system runtime 3905, a thunk 3907, a ROCm kernel driver 3908, and a device kernel driver. In at least one embodiment, the ROCm software stack 3900 executes on hardware 3909, which may include a ROCm-enabled GPU developed by AMD Inc. of Santa Clara, California.
[0377] In at least one embodiment, application 3901 can perform functions similar to those of the application 3701 discussed above in conjunction with Figure 37. Additionally, in at least one embodiment, the language runtime 3903 and system runtime 3905 can perform functions similar to those of the runtime 3705 discussed above in conjunction with Figure 37. In at least one embodiment, the language runtime 3903 differs from the system runtime 3905 in that the system runtime 3905 is a language-independent runtime that implements the ROCr system runtime API 3904 and makes use of the Heterogeneous System Architecture (“HSA”) runtime API. In at least one embodiment, the HSA runtime API is a thin, user-mode API that exposes interfaces for accessing and interacting with AMD GPUs, including functions for memory management, execution control via architected dispatch of kernels, error handling, system and agent information, and runtime initialization and shutdown, among others. In at least one embodiment, in contrast to the system runtime 3905, the language runtime 3903 is an implementation of a language-specific runtime API 3902 layered on top of the ROCr system runtime API 3904. In at least one embodiment, the language runtime API may include, but is not limited to, the Heterogeneous compute Interface for Portability (“HIP”) language runtime API, the Heterogeneous Compute Compiler (“HCC”) language runtime API, or the OpenCL API, among others. In particular, the HIP language is an extension of the C++ programming language with functionally similar versions of CUDA mechanisms, and in at least one embodiment, the HIP language runtime API includes functions similar to those of the CUDA runtime API 3804 discussed above in conjunction with Figure 38, such as functions for memory management, execution control, device management, error handling, and synchronization.
[0378] In at least one embodiment, the thunk (ROCt) 3907 is an interface that can be used to interact with the underlying ROCm driver 3908. In at least one embodiment, the ROCm driver 3908 is a ROCk driver, which is a combination of an AMD GPU driver and an HSA kernel driver (amdkfd). In at least one embodiment, the AMD GPU driver is a device kernel driver for GPUs developed by AMD, which performs functions similar to those of the device kernel driver 3706 discussed above in conjunction with Figure 37. In at least one embodiment, the HSA kernel driver is a driver that allows different types of processors to share system resources more effectively via hardware features.
[0379] In at least one embodiment, various libraries (not shown) may be included in the ROCm software stack 3900 above the language runtime 3903, and may provide functionality similar to that of the CUDA libraries 3803 discussed above in conjunction with Figure 38. In at least one embodiment, the various libraries may include, but are not limited to, mathematical, deep learning, and / or other libraries, such as the hipBLAS library, which implements functions similar to those of CUDA cuBLAS, and the rocFFT library for computing FFTs, which is similar to CUDA cuFFT, among others.
[0380] Figure 40 illustrates an OpenCL implementation of the software stack 3700 of Figure 37, according to at least one embodiment. In at least one embodiment, the OpenCL software stack 4000, on which the application 4001 can be launched, includes an OpenCL framework 4005, an OpenCL runtime 4006, and a driver 4007. In at least one embodiment, the OpenCL software stack 4000 executes on hardware 3809 that is not vendor-specific. In at least one embodiment, because OpenCL is supported by devices developed by different vendors, specific OpenCL drivers may be required to interoperate with hardware from such vendors.
[0381] In at least one embodiment, the application 4001, the OpenCL runtime 4006, the device kernel driver 4007, and the hardware 4008 can perform functions similar to those of the application 3701, runtime 3705, device kernel driver 3706, and hardware 3707, respectively, which are discussed above in conjunction with Figure 37. In at least one embodiment, application 4001 further includes an OpenCL kernel 4002 with code that is to be executed on a device.
[0382] In at least one embodiment, OpenCL defines a "platform" that allows a host to control devices connected to that host. In at least one embodiment, the OpenCL framework provides a platform layer API and a runtime API, shown as Platform API 4003 and Runtime API 4005. In at least one embodiment, Runtime API 4005 uses contexts to manage the execution of kernels on devices. In at least one embodiment, each identified device can be associated with a respective context, which Runtime API 4005 can use to manage command queues, program objects, kernel objects, and shared memory objects, among other things, for that device. In at least one embodiment, Platform API 4003 exposes functions that permit device contexts to be used to select and initialize devices, submit work to devices via command queues, and enable data transfers to and from devices, among other things. Additionally, in at least one embodiment, the OpenCL framework provides various built-in functions (not shown), including mathematical functions, relational functions, and image processing functions, among others.
[0383] In at least one embodiment, compiler 4004 is also included in the OpenCL framework 4005. In at least one embodiment, source code may be compiled offline before executing an application or online during execution of the application. In contrast to CUDA and ROCm, in at least one embodiment, OpenCL applications may be compiled online by compiler 4004, which is included to be representative of any number of compilers that can be used to compile source code and / or IR code (e.g., Standard Portable Intermediate Representation (“SPIR-V”) code) into binary code. Alternatively, in at least one embodiment, OpenCL applications may be compiled offline before such applications are executed.
[0384] Figure 41 illustrates software supported by a programming platform according to at least one embodiment. In at least one embodiment, the programming platform 4104 is configured to support various programming models 4103, middleware and / or libraries 4102, and frameworks 4101 that the application 4100 may rely upon. In at least one embodiment, the application 4100 may be an AI / ML application implemented using, for example, a deep learning framework (MXNet, PyTorch, or TensorFlow in at least one embodiment), which may rely on libraries such as cuDNN, the NVIDIA Collective Communications Library (“NCCL”), and / or the NVIDIA Developer Data Loading Library (“DALI”) CUDA libraries to provide accelerated computation on the underlying hardware.
[0385] In at least one embodiment, the programming platform 4104 can be one of the CUDA, ROCm, or OpenCL platforms described above in conjunction with Figure 38, Figure 39, and Figure 40, respectively. In at least one embodiment, the programming platform 4104 supports multiple programming models 4103, which are abstractions of the underlying computing system that permit the expression of algorithms and data structures. In at least one embodiment, the programming models 4103 may expose features of the underlying hardware in order to improve performance. In at least one embodiment, the programming models 4103 may include, but are not limited to, CUDA, HIP, OpenCL, C++ Accelerated Massive Parallelism (“C++AMP”), Open Multi-Processing (“OpenMP”), Open Accelerators (“OpenACC”), and / or Vulkan Compute.
[0386] In at least one embodiment, the libraries and / or middleware 4102 provide implementations of abstractions of the programming models 4103. In at least one embodiment, such libraries include data and programming code that can be used by computer programs and leveraged during software development. In at least one embodiment, such middleware includes software that provides services to applications beyond those available from the programming platform 4104. In at least one embodiment, the libraries and / or middleware 4102 may include, but are not limited to, cuBLAS, cuFFT, cuRAND, and other CUDA libraries, or rocBLAS, rocFFT, rocRAND, and other ROCm libraries. Additionally, in at least one embodiment, the libraries and / or middleware 4102 may include the NCCL and ROCm Communication Collectives Library (“RCCL”) libraries, which provide communication routines for GPUs; the MIOpen library for deep learning acceleration; and / or the Eigen library for linear algebra, matrix and vector operations, geometric transformations, numerical solvers, and related algorithms.
[0387] In at least one embodiment, the application framework 4101 depends on libraries and / or middleware 4102. In at least one embodiment, each application framework 4101 is a software framework for implementing a standard structure of application software. In at least one embodiment, AI / ML applications can be implemented using frameworks such as Caffe, Caffe2, TensorFlow, Keras, PyTorch, or MxNet deep learning frameworks.
[0388] Figure 42 illustrates compiling code to execute on one of the programming platforms of Figures 37-40, according to at least one embodiment. In at least one embodiment, compiler 4201 receives source code 4200, which includes both host code and device code. In at least one embodiment, compiler 4201 is configured to convert source code 4200 into host executable code 4202 for execution on a host and device executable code 4203 for execution on a device. In at least one embodiment, source code 4200 may be compiled offline before executing the application or compiled online during application execution.
[0389] In at least one embodiment, source code 4200 may include code in any programming language supported by compiler 4201, such as C++, C, Fortran, etc. In at least one embodiment, source code 4200 may be included in a single-source file, which has a mixture of host code and device code, and indicates the location of the device code therein. In at least one embodiment, the single-source file may be a .cu file including CUDA code or a .hip.cpp file including HIP code. Alternatively, in at least one embodiment, source code 4200 may include multiple source code files instead of a single-source file, in which the host code and device code are separate.
[0390] In at least one embodiment, compiler 4201 is configured to compile source code 4200 into host executable code 4202 for execution on a host and device executable code 4203 for execution on a device. In at least one embodiment, compiler 4201 performs operations including parsing source code 4200 into an abstract syntax tree (AST), performing optimizations, and generating executable code. In at least one embodiment in which source code 4200 comprises a single source file, compiler 4201 may separate the device code from the host code within such a single source file, compile the device code and host code into device executable code 4203 and host executable code 4202, respectively, and link device executable code 4203 and host executable code 4202 together in a single file, as discussed in greater detail below in conjunction with Figure 26.
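The three compiler stages named above (parsing into an AST, optimizing, and generating executable code) can be sketched with Python's standard `ast` module. This sketch is illustrative only: it operates on Python source rather than CUDA or HIP source, it does not model the separation of host code and device code, and the names `ConstantFolder` and `compile_source` are hypothetical. Constant folding is used here simply as one example of an optimization.

```python
import ast
import operator

# Arithmetic operators that this toy optimizer knows how to fold.
_FOLDABLE = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


class ConstantFolder(ast.NodeTransformer):
    """Replace arithmetic on numeric literals with the computed literal."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold the operands first
        op = _FOLDABLE.get(type(node.op))
        if (
            op is not None
            and isinstance(node.left, ast.Constant)
            and isinstance(node.right, ast.Constant)
            and isinstance(node.left.value, (int, float))
            and isinstance(node.right.value, (int, float))
        ):
            folded = ast.Constant(value=op(node.left.value, node.right.value))
            return ast.copy_location(folded, node)
        return node


def compile_source(source: str):
    tree = ast.parse(source)               # 1. parse source into an AST
    tree = ConstantFolder().visit(tree)    # 2. perform an optimization
    ast.fix_missing_locations(tree)
    return compile(tree, "<toy>", "exec")  # 3. generate executable code


if __name__ == "__main__":
    namespace = {}
    exec(compile_source("x = 2 * 3 + 4"), namespace)
    print(namespace["x"])
```

After the optimization stage, the expression on the right-hand side is a single literal, so the generated code no longer performs the arithmetic at run time.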
[0391] In at least one embodiment, the host executable code 4202 and the device executable code 4203 can be in any suitable format, such as binary code and / or IR code. In the case of CUDA, in at least one embodiment, the host executable code 4202 may include native object code, while the device executable code 4203 may include code in a PTX intermediate representation. In at least one embodiment, in the case of ROCm, both the host executable code 4202 and the device executable code 4203 can include target binary code.
[0392] Other variations are within the spirit of this disclosure. Therefore, while the disclosed technology is readily adaptable to various modifications and alternative configurations, certain embodiments thereof are illustrated in the accompanying drawings and have been described in detail above. However, it should be understood that the disclosure is not intended to be limited to one or more specific forms disclosed, but rather, it is intended to cover all modifications, alternative configurations, and equivalents falling within the spirit and scope of this disclosure as defined in the appended claims.
[0393] Unless otherwise stated or clearly contradicted by the context, the terms “a,” “an,” and “the,” and similar references, used in the context of describing the disclosed embodiments (particularly in the context of the appended claims), should be interpreted as encompassing both singular and plural forms, rather than as definitions of the terms. Unless otherwise stated, the terms “comprising,” “having,” “including,” and “containing” should be interpreted as open-ended terms (meaning “including, but not limited to”). The term “connected,” when unmodified and referring to a physical connection, should be interpreted as partially or wholly contained within, attached to, or joined together, even if something intervenes. Unless otherwise indicated herein, recitations of numerical ranges herein are intended only as shorthand for referring individually to each separate value falling within that range, and each separate value is incorporated into the specification as if it were individually recited herein. In at least one embodiment, unless otherwise indicated or contradicted by the context, the use of the term “set” (e.g., “a set of items”) or “subset” should be interpreted as a non-empty collection comprising one or more members. Furthermore, unless otherwise indicated or contradicted by the context, the term “subset” of a corresponding set does not necessarily denote a proper subset of the corresponding set; rather, the subset and the corresponding set can be equal.
[0394] Unless otherwise explicitly stated or clearly contradicted by the context, conjunctive phrases such as “at least one of A, B, and C” or “at least one of A, B and C” are understood in context to generally mean that an item, term, etc., can be A or B or C, or any non-empty subset of the set of A, B, and C. In at least one embodiment of a set with three members, the conjunctive phrases “at least one of A, B, and C” and “at least one of A, B and C” refer to any of the following sets: {A}, {B}, {C}, {A, B}, {A, C}, {B, C}, {A, B, C}. Therefore, such conjunctive language is generally not intended to imply that some embodiments require the presence of at least one of A, at least one of B, and at least one of C. Additionally, unless otherwise stated or contradicted by the context, the term “plurality” indicates a state of being plural (e.g., “a plurality of items” means multiple items). In at least one embodiment, the number of items in a plurality is at least two, but may be more if explicitly indicated or indicated by the context. Furthermore, unless otherwise stated or clearly understood from the context, the phrase “based on” means “based at least in part on” rather than “based solely on”.
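As an illustration only, the seven sets listed above for a three-member set can be enumerated mechanically. The following is a minimal Python sketch; the helper name `non_empty_subsets` is hypothetical and is not part of the disclosure.

```python
from itertools import combinations


def non_empty_subsets(items):
    """Return every non-empty subset of `items` as frozensets, smallest first.

    Hypothetical helper, used only to illustrate the reading of
    "at least one of A, B, and C" given in the paragraph above.
    """
    members = list(items)
    return [
        frozenset(combo)
        for size in range(1, len(members) + 1)
        for combo in combinations(members, size)
    ]


subsets = non_empty_subsets(["A", "B", "C"])
for subset in subsets:
    print(sorted(subset))

# A set with n members has 2**n - 1 non-empty subsets, so 7 for n = 3.
print(len(subsets))
```

The full set {A, B, C} appears in the result, which is consistent with the statement that a “subset” may be equal to its corresponding set, while the empty set does not appear.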
[0395] Unless otherwise indicated herein or clearly contradicted by the context, the operations of the processes described herein may be performed in any suitable order. In at least one embodiment, processes such as those described herein (or variations and/or combinations thereof) are performed under the control of one or more computer systems configured with executable instructions and are implemented as code (e.g., executable instructions, one or more computer programs, or one or more applications) executing collectively on one or more processors, by hardware, or by a combination thereof. In at least one embodiment, the code is stored on a computer-readable storage medium in the form of a computer program comprising a plurality of instructions executable by one or more processors. In at least one embodiment, the computer-readable storage medium is a non-transitory computer-readable storage medium that excludes transitory signals (e.g., propagating transient electrical or electromagnetic transmissions) but includes non-transitory data storage circuitry (e.g., buffers, caches, and queues). In at least one embodiment, code (e.g., executable code or source code) is stored on a set of one or more non-transitory computer-readable storage media having stored thereon executable instructions (or other memory for storing executable instructions) that, when executed by one or more processors of a computer system (i.e., as a result of being executed), cause the computer system to perform the operations described herein. In at least one embodiment, the set of non-transitory computer-readable storage media comprises multiple non-transitory computer-readable storage media, and one or more of the individual non-transitory storage media lack all of the code, while the multiple non-transitory computer-readable storage media collectively store all of the code. In at least one embodiment, the executable instructions are executed such that different instructions are executed by different processors; for example, a non-transitory computer-readable storage medium stores the instructions, and a main central processing unit (“CPU”) executes some of the instructions while a graphics processing unit (“GPU”) executes other instructions. In at least one embodiment, different components of the computer system have separate processors, and the different processors execute different subsets of the instructions.
[0396] Therefore, in at least one embodiment, a computer system is configured to implement one or more services that individually or collectively perform the operations of the processes described herein, and such a computer system is configured with suitable hardware and/or software that enables the operations to be performed. Furthermore, a computer system implementing at least one embodiment of this disclosure is a single device and, in another embodiment, is a distributed computer system comprising multiple devices that operate differently, such that the distributed computer system performs the operations described herein and no single device performs all of the operations.
[0397] The use of any and all examples or exemplary language (e.g., “such as”) provided herein is intended only to better illustrate embodiments of this disclosure and does not limit the scope of the disclosure unless otherwise claimed. No language in the specification should be construed as indicating that any unclaimed element is essential to the practice of the disclosure.
[0398] All references cited herein, including publications, patent applications, and patents, are hereby incorporated by reference to the same extent as if each reference were individually and specifically indicated to be incorporated by reference and were set forth in its entirety herein.
[0399] In the specification and claims, the terms “coupled” and “connected” and their derivatives may be used. It should be understood that these terms may not be intended to be synonyms with each other. Rather, in one...
Claims
1. A data center cooling system, comprising: a radiator including an air-liquid heat exchanger and a liquid-liquid heat exchanger, the air-liquid heat exchanger being used to transfer a first heat from hot air to a main coolant in a main cooling circuit, the liquid-liquid heat exchanger being used to transfer a second heat from an auxiliary coolant in an auxiliary cooling circuit to the main coolant in the main cooling circuit, the auxiliary coolant being transferred at least partially from a cooling distribution unit (CDU) to the liquid-liquid heat exchanger, and the first heat and the second heat being associated with at least one computing component of one or more racks.
2. The data center cooling system according to claim 1, further comprising: at least one flow controller for transferring the auxiliary coolant from a first passage to a second passage, the first passage including the CDU and the second passage including the radiator.
3. The data center cooling system according to claim 1 or 2, wherein the air-liquid heat exchanger includes a plurality of heat sinks that absorb and retain heat from the hot air, and the main coolant absorbs the first heat from the heat retained by the plurality of heat sinks.
4. The data center cooling system according to claim 1 or 2, wherein the liquid-liquid heat exchanger transfers the second heat to the main coolant in the main cooling circuit before the air-liquid heat exchanger transfers the first heat to the main coolant in the main cooling circuit.
5. The data center cooling system according to claim 1 or 2, wherein the liquid-liquid heat exchanger transfers the second heat to the main coolant in the main cooling circuit concurrently with the air-liquid heat exchanger transferring the first heat to the main coolant.
6. The data center cooling system according to claim 1 or 2, wherein the air-liquid heat exchanger includes a plurality of heat sinks, and the hot air is at least partially driven through the plurality of heat sinks by one or more fans associated with the one or more racks.
7. The data center cooling system according to claim 1 or 2, wherein the air-liquid heat exchanger includes a plurality of heat sinks aligned with heating surfaces of the one or more racks, the plurality of heat sinks absorbing and retaining heat from the heating surfaces.
8. The data center cooling system according to claim 1 or 2, wherein the CDU transfers a third heat from a first portion of the auxiliary coolant in the auxiliary cooling circuit, the third heat being associated with at least one computing component in the one or more racks, and the liquid-liquid heat exchanger transfers the second heat from a second portion of the auxiliary coolant in the auxiliary cooling circuit.
9. The data center cooling system according to claim 1 or 2, further comprising: one or more flow controllers for enabling the auxiliary coolant in the auxiliary cooling circuit to flow into and out of the liquid-liquid heat exchanger and into and out of the CDU.
10. The data center cooling system according to claim 1 or 2, wherein the radiator is located above or below the one or more racks.
11. A data center cooling system, comprising: a main cooling circuit that uses an air-liquid heat exchanger of a unified radiator to absorb, with a main coolant, a first heat from hot air from one or more racks, and that uses a liquid-liquid heat exchanger of the unified radiator to exchange a second heat from an auxiliary coolant in an auxiliary cooling circuit, the auxiliary coolant being transferred at least partially from a cooling distribution unit (CDU) to a second portion of the unified radiator, wherein the main cooling circuit is associated with a cooling facility and the auxiliary cooling circuit is associated with at least one computing device.
12. The data center cooling system according to claim 11, wherein the air-liquid heat exchanger includes a plurality of heat sinks for absorbing and retaining heat from the hot air from the one or more racks as the hot air flows through the plurality of heat sinks, and for transferring the first heat, from the heat retained by the plurality of heat sinks, to the main coolant.
13. The data center cooling system according to claim 11 or 12, wherein the main cooling circuit passes through the liquid-liquid heat exchanger before entering the air-liquid heat exchanger.
14. The data center cooling system according to claim 11 or 12, wherein the main cooling circuit passes through both the air-liquid heat exchanger and the liquid-liquid heat exchanger of the unified radiator.
15. The data center cooling system according to claim 11 or 12, further comprising: one or more flow controllers for enabling the auxiliary coolant in the auxiliary cooling circuit to flow into and out of the liquid-liquid heat exchanger.
16. A method for a liquid cooling system for a data center, comprising: providing a radiator associated with one or more racks in a data center; configuring a first portion of the radiator to function as an air-liquid heat exchanger with a main cooling circuit that uses a main coolant to transfer a first heat from the one or more racks; and configuring a second portion of the radiator to function as a liquid-liquid heat exchanger, such that at least one auxiliary cooling circuit with an auxiliary coolant is able to exchange a second heat with the main cooling circuit, the auxiliary coolant being transferred at least partially from a cooling distribution unit (CDU) to the second portion of the radiator, the first heat and the second heat being associated with at least one computing component of the one or more racks.
17. The method according to claim 16, further comprising: using at least one flow controller to provide a branch for the auxiliary coolant intended to return to the CDU, so that the auxiliary coolant flows to the radiator.
18. The method according to claim 16 or 17, further comprising: providing the air-liquid heat exchanger with a plurality of heat sinks that absorb and retain heat from hot air from the one or more racks, the main coolant absorbing the first heat from the heat retained in the plurality of heat sinks.
19. The method according to claim 16 or 17, further comprising: causing the main cooling circuit to pass through the liquid-liquid heat exchanger before entering the air-liquid heat exchanger.
20. The method according to claim 16 or 17, further comprising: configuring the main cooling circuit to pass through both the air-liquid heat exchanger and the liquid-liquid heat exchanger simultaneously.
21. The method according to claim 16 or 17, further comprising: providing the air-liquid heat exchanger with a plurality of heat sinks, and allowing hot air from the one or more racks to flow through the plurality of heat sinks, driven at least in part by one or more fans associated with the one or more racks.
22. The method according to claim 16 or 17, further comprising: providing the air-liquid heat exchanger with a plurality of heat sinks aligned with heating surfaces of the one or more racks, the plurality of heat sinks being used to absorb and retain heat from the heating surfaces.
23. The method according to claim 16 or 17, further comprising: enabling the CDU to transfer a third heat from a first portion of the auxiliary coolant in the auxiliary cooling circuit, the third heat being associated with at least one computing component of the one or more racks, and enabling the liquid-liquid heat exchanger to transfer the second heat from a second portion of the auxiliary coolant in the auxiliary cooling circuit.
24. The method according to claim 16 or 17, further comprising: using one or more flow controllers to control the flow of the auxiliary coolant into and out of the liquid-liquid heat exchanger and into and out of the CDU.
25. The method according to claim 16 or 17, further comprising: positioning the radiator having the air-liquid heat exchanger and the liquid-liquid heat exchanger above or below the one or more racks.
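For illustration only, the series arrangement recited in claims 4, 13, and 19 (main coolant passing through the liquid-liquid heat exchanger before the air-liquid heat exchanger) can be sketched with a simple lumped energy balance, T_out = T_in + Q / (m_dot * c_p). This is a minimal Python sketch under stated assumptions: the main coolant is assumed to be water, and every flow rate, temperature, and heat load below is a hypothetical value chosen for easy arithmetic, not a figure from the disclosure.

```python
WATER_CP = 4186.0  # J/(kg*K), approximate specific heat of liquid water


def outlet_temperature(inlet_c, heat_w, mass_flow_kg_s, cp=WATER_CP):
    """Coolant temperature (deg C) after absorbing heat_w watts.

    Lumped steady-state balance: T_out = T_in + Q / (m_dot * cp).
    """
    if mass_flow_kg_s <= 0:
        raise ValueError("mass flow must be positive")
    return inlet_c + heat_w / (mass_flow_kg_s * cp)


# Hypothetical operating point (illustrative values only).
main_inlet_c = 20.0       # main coolant temperature entering the radiator
main_flow_kg_s = 2.0      # main coolant mass flow rate
second_heat_w = 41860.0   # heat from the auxiliary coolant (liquid-liquid stage)
first_heat_w = 20930.0    # heat from the hot air (air-liquid stage)

# Stage 1: liquid-liquid heat exchanger adds the second heat.
after_liquid_liquid = outlet_temperature(main_inlet_c, second_heat_w, main_flow_kg_s)
# Stage 2: air-liquid heat exchanger adds the first heat.
after_air_liquid = outlet_temperature(after_liquid_liquid, first_heat_w, main_flow_kg_s)

print(f"after liquid-liquid stage: {after_liquid_liquid:.1f} C")  # 25.0 C
print(f"after air-liquid stage:    {after_air_liquid:.1f} C")     # 27.5 C
```

In this simplified model the final main-coolant temperature depends only on the total heat absorbed, so the ordering of the two stages changes only the intermediate temperature that each heat exchanger sees.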