The invention relates to a computer rack, frame or system having a direct current (DC) power supply positioned at the upper portion of the rack. In one variation, the DC power supply is placed in the highest shelf in the computer rack. In another variation, the DC power supply is placed on top of the computer rack. In yet another variation, a dual-column computer rack with a back-to-back configuration is implemented with DC power supplies placed in the top shelf of one of the computer columns. The DC power supply may comprise one or more direct current power supply modules configured to provide failover protection. In another aspect of the invention, the power supply modules are placed in a separate rack and provide direct current to support computers in one or more computer racks.
A latency-tolerant system for executing video processing operations. The system includes a host interface for implementing communication between the video processor and a host CPU, a scalar execution unit coupled to the host interface and configured to execute scalar video processing operations, and a vector execution unit coupled to the host interface and configured to execute vector video processing operations. A command FIFO is included for enabling the vector execution unit to operate on a demand-driven basis by accessing the command FIFO. A memory interface is included for implementing communication between the video processor and a frame buffer memory. A DMA engine is built into the memory interface for implementing DMA transfers between a plurality of different memory locations and for loading the command FIFO with data and instructions for the vector execution unit.
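The demand-driven command FIFO described above can be sketched functionally. This is a toy model only; the class and method names are illustrative and do not come from the patent:

```python
from collections import deque

class CommandFIFO:
    """Toy model of a command FIFO: a DMA engine pushes command
    packets; the vector unit pops them on a demand-driven basis."""
    def __init__(self):
        self._q = deque()

    def dma_load(self, commands):
        # DMA engine fills the FIFO with instructions/data for the vector unit.
        self._q.extend(commands)

    def pop(self):
        # The vector unit fetches its next command only when it is ready
        # for one (demand driven); an empty FIFO yields None.
        return self._q.popleft() if self._q else None

fifo = CommandFIFO()
fifo.dma_load([("EXEC", "macro_a"), ("EXEC", "macro_b")])
while (cmd := fifo.pop()) is not None:
    print(cmd)
```

The point of the structure is decoupling: the DMA engine can refill the FIFO while the vector unit drains it at its own pace, which is what makes the design latency tolerant.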
A host connected to a switch using a PCI Express (PCIe) link. At the switch, the packets are received and routed as appropriate and provided to a conventional switch network port for egress. The conventional networking hardware on the host is substantially moved to the port at the switch, with various software portions retained as a driver on the host. This saves cost and space and reduces latency significantly. As networking protocols have multiple threads or flows, these flows can correlate to PCIe queues, easing QoS handling. The data provided over the PCIe link is essentially just the payload of the packet, so sending the packet from the switch as a different protocol just requires doing the protocol-specific wrapping. In some embodiments, this use of different protocols can be done dynamically, allowing the bandwidth of the PCIe link to be shared between various protocols.
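The payload-only transfer with protocol-specific wrapping at the switch can be illustrated with a minimal sketch; the function name, the Ethernet II framing and the EtherType chosen here are assumptions for illustration, not details from the patent:

```python
# Hypothetical framing at the switch egress: the host sends only payloads
# over the PCIe link; the switch prepends whichever protocol header the
# egress port needs.
ETH_HDR_LEN = 14   # Ethernet II: 6-byte dst + 6-byte src + 2-byte EtherType

def wrap_ethernet(payload: bytes, dst: bytes, src: bytes) -> bytes:
    ethertype = b"\x08\x00"            # IPv4, chosen for illustration
    return dst + src + ethertype + payload

frame = wrap_ethernet(b"hello", b"\xff" * 6, b"\xaa" * 6)
assert len(frame) == ETH_HDR_LEN + 5   # header added only at the switch
```

Switching the egress protocol amounts to swapping the wrapping function, which is why the abstract can claim dynamic sharing of the PCIe link among protocols.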
The present invention discloses a render farm based on a CPU cluster. A distributed parallel cluster rendering system is constructed with high-efficiency, low-energy-consumption CPUs so that its computing power reaches, and can even exceed, the computing performance of a supercomputer. The invention solves the batch rendering problem in the digital content production process. Using the render farm based on the CPU cluster according to the invention, the production of three-dimensional animation, video special effects, architectural design visualization and the like can be completed with high efficiency. The render farm based on the CPU cluster according to the invention further has the advantages of increasing rendering speed by more than 40 times, reducing the investment cost of building the render farm by 20%-70%, and cutting energy consumption in the production process by 60%-80%.
A stream-based memory access system for a video processor for executing video processing operations. The video processor includes a scalar execution unit configured to execute scalar video processing operations and a vector execution unit configured to execute vector video processing operations. A frame buffer memory is included for storing data for the scalar execution unit and the vector execution unit. A memory interface is included for establishing communication between the scalar execution unit, the vector execution unit and the frame buffer memory. The frame buffer memory comprises a plurality of tiles. The memory interface implements a first stream comprising a first sequential access of tiles and a second stream comprising a second sequential access of tiles for the vector execution unit or the scalar execution unit.
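The notion of a stream as a sequential access of tiles can be modeled in a few lines; the tile size, the 8x8 frame and the function name below are hypothetical choices for illustration:

```python
import numpy as np

TILE = 4  # hypothetical square tile edge length in pixels

def tile_stream(frame, tile_ids):
    """Yield tiles of a frame buffer in the sequential order given by
    tile_ids, modeling one memory-interface stream."""
    h, w = frame.shape
    tiles_per_row = w // TILE
    for t in tile_ids:
        r, c = divmod(t, tiles_per_row)
        yield frame[r * TILE:(r + 1) * TILE, c * TILE:(c + 1) * TILE]

frame = np.arange(64).reshape(8, 8)
# Two independent streams (say, one serving the scalar unit, one the
# vector unit) each perform their own sequential access of tiles.
stream_a = list(tile_stream(frame, [0, 1, 2, 3]))
stream_b = list(tile_stream(frame, [0, 2, 1, 3]))
```

Each stream is just an ordered walk over tiles, so the two execution units can consume the same frame buffer in different orders without interfering.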
The invention discloses a miniaturized test device for a rocket-borne computer, which comprises a main control module, an interface module and a power supply module. The main control module adopts a ZYNQ-series SoC as the main control chip: the ARM is responsible for state control and for processing and displaying test data, while the FPGA is responsible for generating real-time signals for external equipment and collecting external real-time signals. The interface module realizes A/D (analog-to-digital) conversion, D/A (digital-to-analog) conversion, level conversion, analog signal filtering and the like. The power supply module provides the required power for the test equipment and for the rocket-borne unit under test. The miniaturized test device makes the ground test equipment portable and enables rapid testing.
A blade server cabinet is constructed with a cabinet height of 7U and a midplane set in the cabinet. Ten connectors arranged in parallel on the midplane accommodate ten 1U compute blades. The midplane is positioned at the middle of the cabinet, with the space on one side holding the ten 1U compute blades and the space at the other end holding modules.
A radar system in which Coded Aperture Radar (CAR) processing is performed on received radar signals reflected by one or more objects in a field of view that reflect a transmitted signal covering the field of view with K sweeps, each sweep including Q frequency changes. For Type II CAR, the transmitted signal also includes N modulated codes per frequency step. The received radar signals are modulated by a plurality of binary modulators, the results of which are applied to a mixer. The output of the mixer, which for one acquisition results in a set of QK (for Type I CAR) or QKN (for Type II CAR) complex data samples, is distributed among a number of digital channels, each corresponding to a desired beam direction. For each channel, the complex digital samples are multiplied, sample by sample, by a complex signal mask that is different for each channel.
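The per-channel mask multiplication can be sketched with NumPy. The dimensions, the random unit-magnitude masks and the final coherent sum below are illustrative assumptions, not the patented processing chain:

```python
import numpy as np

Q, K = 8, 4                      # frequency steps and sweeps (Type I CAR)
n_channels = 3                   # desired beam directions
rng = np.random.default_rng(0)

# One acquisition: QK complex data samples out of the mixer.
samples = rng.standard_normal(Q * K) + 1j * rng.standard_normal(Q * K)

# One complex mask per digital channel, each corresponding to a beam
# direction; unit-magnitude phase masks are assumed here.
masks = np.exp(1j * rng.uniform(0, 2 * np.pi, (n_channels, Q * K)))

# For each channel, multiply the QK samples, sample by sample, by its mask.
channel_outputs = masks * samples          # shape (n_channels, Q*K)
beams = channel_outputs.sum(axis=1)        # one coherent value per beam
```

Broadcasting does the "sample by sample" multiplication for all channels at once; a Type II system would simply use QKN samples per acquisition instead of QK.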
Disclosed is a liquid cooling module for computer servers, including: a pump; a fan; a heat exchanger; at least two ventilation grilles; an open central longitudinal space between the pump and the heat exchanger, arranged to facilitate airflow therein from the grille of one short side wall to the grille of the other short side wall, this airflow being driven by the fan; a portion of a secondary hydraulic circuit located in the liquid cooling module for circulating a fluid coolant, including no bypass that would allow the pump to operate in a closed circuit and that would be likely to clutter this open longitudinal space; and a circuit control board positioned in the longitudinal extension of the open central longitudinal space so as to be directly swept by the airflow.
In order to realize computational programming, the invention provides an in-package lookup table-based programmable processor, comprising a logic chip and a programmable storage chip located in the same package. The programmable storage chip comprises a lookup table circuit (LUT), and the logic chip comprises an arithmetic logic circuit (ALC). According to user needs, the LUT stores data related to a desired function; the ALC performs arithmetic operations on that data.
In order to realize computational programming, the invention provides a backside lookup table (BS-LUT)-based programmable processor, comprising a lookup table circuit (LUT) located on the backside of a processor substrate and an arithmetic logic circuit (ALC) located on the front of the processor substrate. According to user needs, the LUT stores data related to a desired function. The ALC performs arithmetic operations on that data.
The invention discloses a high-density server based on a fusion extension framework. The high-density server comprises a dense node system and a hardware framework; the hardware framework comprises a chassis, and the dense node system is arranged in the chassis; the dense node system comprises a basic module and a variable module; the basic module achieves the power supply, cooling, management and output functions of the server; the variable module meets users' individualized requirements for calculation density and storage density. Compared with the prior art, the high-density server based on the fusion extension framework has a height of 4U; calculation, balancing and storage of dense nodes are achieved in the same chassis, and various application requirements are flexibly met.
The present invention discloses a configurable processor with an in-package look-up table. The configurable processor comprises a programmable memory die and a logic die located in the same package. The programmable memory die comprises a look-up table circuit (LUT) for storing data related to a desired function. The logic die comprises an arithmetic logic circuit (ALC) for performing arithmetic operations on the data read out from the LUT.
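The LUT-plus-ALC division of labor that runs through these processor designs can be illustrated with a toy example: a small table stores samples of a desired function, and an arithmetic stage reads two adjacent entries and interpolates. The table size, the choice of sin(x), and linear interpolation are assumptions for illustration, not details from any of the patents:

```python
import math

# Hypothetical 8-entry LUT storing coarse samples of sin(x) on [0, pi/2].
N = 8
LUT = [math.sin(i * (math.pi / 2) / (N - 1)) for i in range(N)]

def alc_sin(x):
    """Arithmetic-logic stage: read two adjacent LUT entries and
    linearly interpolate between them."""
    pos = x / (math.pi / 2) * (N - 1)
    i = min(int(pos), N - 2)
    frac = pos - i
    return LUT[i] + frac * (LUT[i + 1] - LUT[i])

approx = alc_sin(0.5)   # close to math.sin(0.5)
```

Reprogramming the processor for a different function amounts to rewriting the LUT contents; the arithmetic circuit stays fixed, which is what makes the scheme configurable.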
The invention provides an in-package lookup table (IP-LUT)-based processor for calculating a mathematical function. The processor comprises a logic chip and a storage chip; the storage chip comprises a lookup table circuit (LUT), and the data stored by the LUT is correlated to the mathematical function; the logic chip comprises an arithmetic logic circuit (ALC), which performs arithmetic operations on the relevant data of the mathematical function; the storage chip and the logic chip are located in the same package.
The invention discloses a video-based fast template matching GPU implementation method, which comprises the following steps: (1) copying image data to the GPU device, wherein copying images to the device and image template matching are executed concurrently as two independent streams, one executing the template matching calculation for the current frame while the other copies the next frame; (2) storing the image template data in shared memory and the image data in texture memory, using random access via the texture memory to compute the match degree with the image template; (3) computing in a multi-thread parallel mode, wherein each thread calculates the template match value for one position; and (4) determining the maximum or minimum template match value by means of global and shared atomic operations to obtain the template matching result. According to the method disclosed by the invention, template matching time can be markedly shortened and the practical value of the template matching algorithm improved.
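Step (3)'s one-thread-per-position computation and step (4)'s reduction can be modeled on the CPU. The sketch below uses sum of absolute differences as the match measure, which is an assumed choice; on the GPU the inner loops would be parallel threads and the final minimum an atomic reduction:

```python
import numpy as np

def match_template(image, template):
    """Exhaustive template matching: each (y, x) position (one GPU thread
    in the patented method) computes a match value, here the sum of
    absolute differences; the minimum over all positions wins."""
    ih, iw = image.shape
    th, tw = template.shape
    scores = np.empty((ih - th + 1, iw - tw + 1))
    for y in range(scores.shape[0]):
        for x in range(scores.shape[1]):
            patch = image[y:y + th, x:x + tw]
            scores[y, x] = np.abs(patch - template).sum()
    # Reduction step (the atomic min on the GPU): best-matching position.
    return np.unravel_index(scores.argmin(), scores.shape)

img = np.zeros((16, 16))
img[5:8, 9:12] = 1.0                        # plant a 3x3 block of ones
pos = match_template(img, np.ones((3, 3)))  # → (5, 9)
```

Each score depends only on its own position, which is why the per-position work parallelizes trivially and only the final min/max needs atomic coordination.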
The invention provides a backside lookup table (BS-LUT)-based emulation processor for emulating a system. The system comprises a subsystem. The emulation processor comprises a lookup table circuit (LUT) and an arithmetic logic circuit (ALC). The LUT is located on the backside of a processor substrate, and the data stored by the LUT is correlated to a mathematical model of the subsystem. The ALC is located on the front of the processor substrate and performs arithmetic operations on the relevant data of the model. The LUT is electrically coupled with the ALC by multiple through-silicon vias (TSVs).
The invention provides a NAND operation circuit for in-memory computation, a memory chip and a computer. The NAND operation circuit comprises a bit line pre-charging circuit, a storage array and an inverting structure, connected in sequence. The NAND operation circuit is located in the memory of the computer so that the memory has a NAND operation function and NAND operations on data can be completed in the memory. This avoids long-distance transmission, between the memory and an ALU, of the data requiring NAND operation; improves the speed at which such data is operated on, relative to the CPU; shortens part of the CPU's data processing time, that is, shortens the CPU's operation time; improves calculation density and calculation bandwidth; and realizes the operation using a single memory process.
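The functional effect of the described circuit (pre-charge, discharge through the storage array, inverting output stage) can be modeled as follows. The assumption that the bit line's discharge flag senses the AND of the two selected cells is a simplification of the topology made for illustration:

```python
def in_memory_nand(row_a, row_b):
    """Toy functional model of the in-memory NAND circuit: the pre-charged
    bit line discharges only when both selected cells on it store 1, so the
    sensed discharge flag equals a AND b per bit line; the inverting output
    stage then produces NAND without moving the data to an ALU."""
    out = []
    for a, b in zip(row_a, row_b):
        discharged = 1 if (a and b) else 0   # series pull-down senses AND
        out.append(1 - discharged)           # inverting stage: NAND
    return out

print(in_memory_nand([0, 0, 1, 1], [0, 1, 0, 1]))   # [1, 1, 1, 0]
```

Because NAND is functionally complete, a memory that computes it in place can in principle compose any Boolean function without shipping operands to the CPU.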
The invention provides an emulation processor for emulating a system. The to-be-emulated system comprises a subsystem. The emulation processor comprises a storage chip and a logic chip; the storage chip comprises a lookup table circuit (LUT), and the data stored by the LUT is correlated to a mathematical model of the subsystem; the logic chip comprises an arithmetic logic circuit (ALC), which performs arithmetic operations on the relevant data of the model; the storage chip and the logic chip are located in the same package.