Method and system for performing channel allocation in encryption, decryption and authentication functionalities
The method and system address inefficiencies in GCM hardware by parallelizing AES and GHASH operations and decoupling encryption/decryption and authentication logic, improving performance and resource utilization in high-speed communication systems.
Patent Information
- Authority / Receiving Office
- WO · WO
- Patent Type
- Applications
- Current Assignee / Owner
- HUAWEI TECH CO LTD
- Filing Date
- 2024-12-12
- Publication Date
- 2026-06-18
AI Technical Summary
Existing hardware implementations of Galois Counter Mode (GCM) face inefficiencies due to underutilized GHASH logic, limited parallelism in data processing, and bottlenecks in processing one data packet per cycle, leading to resource underutilization and increased power consumption.
A method and system for parallelizing AES and GHASH operations, decoupling encryption/decryption and authentication logic, and dynamically allocating channels based on packet arrival and processing requirements, using a buffer to store computations and separate cryptographic functions to process multiple data blocks simultaneously.
Enhances performance and resource utilization, reduces physical layout and power consumption, and optimizes throughput in high-speed communication systems, particularly in mobile and IoT devices.
Description
METHOD AND SYSTEM FOR PERFORMING CHANNEL ALLOCATION IN ENCRYPTION, DECRYPTION AND AUTHENTICATION FUNCTIONALITIES
TECHNICAL FIELD
[0001] The present disclosure relates generally to the field of cryptographic systems; and more specifically, to a method of performing channel allocation in a hardware implementation of encryption-decryption functionality and a hardware implementation of authentication functionality in Galois Counter Mode and a system for the same.
BACKGROUND
[0002] In recent years, high-performance hardware architectures for cryptographic functions have evolved significantly. In particular, Galois Counter Mode (GCM) has been widely adopted to enhance data security in wireless communication systems and to validate data accuracy in wireless networks. As a block cipher mode of operation, GCM provides both data confidentiality and authentication. GCM is a cornerstone of security protocols such as Internet Protocol Security (IPsec) and Transport Layer Security (TLS), and is also used to secure wireless networks and storage systems. GCM security relies on two key operations: the Advanced Encryption Standard (AES) for encryption / decryption, and the Galois hash (GHASH) for data authentication. To accommodate the high-speed requirements of modern data transmission, especially with large data volumes, AES and GHASH operations must be processed more efficiently.
[0003] The existing hardware implementations of GCM have encountered persistent issues related to high-speed networks and large data volumes. Traditional approaches allocate resources (or channels) for AES and GHASH operations but fail to fully utilize the allocated resources (or channels) due to the constraints of sequential processing. The sequential handling of data packets creates bottlenecks, particularly when only one data packet is processed per clock cycle. Such constraints lead to resource underutilization and bottlenecks in packet handling. Additionally, a limited number of processing channels further contributes to inefficient data processing, resulting in degraded overall performance in GCM-based systems. In parallel, traditional implementations of cryptographic functions, such as AES and GHASH, also struggle to balance performance with efficient resource utilization, particularly in scenarios where physical space and power consumption are critical factors. The growing demand for cryptography in mobile devices and Internet of Things (IoT) systems has increased the requirement for solutions that provide robust security while minimizing physical space and power consumption. The current hardware implementations of AES and GHASH within GCM frameworks often fall short of achieving optimal performance while maintaining efficient resource utilization and reducing power consumption and physical space. Thus, there exists a technical problem of inefficient resource utilization in GCM hardware, primarily due to underutilized GHASH logic, limited parallelism in data processing, and bottlenecks in processing one data packet per cycle.
[0004] Therefore, in light of the foregoing discussion, there exists a need to overcome the aforementioned drawbacks associated with the conventional ways of using the GCM mode of operation.
SUMMARY
[0005] The present disclosure provides a method of performing channel allocation in a hardware implementation of encryption / decryption functionality and a hardware implementation of authentication functionality. The present disclosure provides a solution to the existing problem of inefficient resource utilization in GCM hardware, primarily due to underutilized GHASH logic, limited parallelism in data processing, and bottlenecks in processing one data packet per cycle. An aim of the present disclosure is to provide a solution that overcomes at least partially the problems encountered in the prior art and provide an improved method for channel allocation in GCM hardware implementations of encryption, decryption, and authentication functionalities. The method involves parallelizing AES and GHASH operations to process multiple data blocks simultaneously, reducing bottlenecks and improving throughput of an overall communication system. The approach enhances the performance while minimizing power consumption, making the communication architecture suitable for high-speed and resource-constrained systems, such as mobile and IoT devices.
[0006] The object of the present disclosure is achieved by the solutions provided in the enclosed independent claims. Advantageous implementations of the present disclosure are further defined in the dependent claims.
[0007] In one aspect, the present disclosure provides a method of performing channel allocation in a hardware implementation of encryption / decryption functionality and in a hardware implementation of authentication functionality, where the hardware implementations are components of the Galois Counter Mode, GCM. The method comprises receiving packets of data over a plurality of channels. The method further comprises performing, via the hardware implementation of the encryption / decryption functionality of the GCM, computations on a first packet received over a first of the plurality of channels, using a first cryptographic key from a key generator component of the hardware implementation of the encryption / decryption functionality of the GCM. The method further comprises storing the computations in a buffer and using the stored computations from the buffer to carry out processing using the hardware implementation of the authentication functionality of the GCM. The logic circuitry of the hardware implementation of the encryption / decryption functionality and the logic circuitry of the hardware implementation of the authentication functionality are decoupled via the use of the buffer. Furthermore, each of the hardware implementation of the encryption / decryption functionality and the hardware implementation of the authentication functionality separately allocates packets into the plurality of channels using respectively separate criteria. Within a single clock cycle, a first subset of the plurality of channels processes an existing packet while a second subset of the plurality of channels begins processing a new packet, received after the existing packet. In the hardware implementation of the authentication functionality, a plurality of Galois field multipliers are separated from an exclusive OR circuit assembly that sums data from each of the plurality of channels. The first subset of channels is summed for the existing packet by a first exclusive OR circuit of the exclusive OR circuit assembly and the second subset of channels is summed for the new packet by a second exclusive OR circuit of the exclusive OR circuit assembly.
[0008] The disclosed method resolves the technical problem of inefficient resource utilization in GCM hardware, primarily due to underutilized GHASH logic, limited parallelism in data processing, and bottlenecks in processing one data packet per cycle. The disclosed method comprises separate hardware implementations for encryption / decryption and authentication, both of which receive data packets over multiple channels. The computations are performed on the received packets using the cryptographic key and the results are stored in the buffer. Thereafter, the stored computations from the buffer are utilized to carry out further processing. The decoupling of the logic circuitry of the encryption / decryption hardware from the logic circuitry of the authentication hardware via the buffer allows packets to be allocated separately into channels based on different criteria. Additionally, the method enables parallel processing, where one subset of the plurality of channels processes the existing packet while another subset begins processing the new packet in the same clock cycle. Furthermore, the separation of the plurality of Galois field multipliers from the exclusive OR circuit assembly in the authentication hardware enhances efficiency and reduces the required resources. These features work together synergistically to achieve a reduction in authentication area for encryption and decryption, resulting in improved performance and resource utilization in GCM implementations. The method enables a parallel implementation of GHASH, a prominent component of GCM, which reduces the number of required channels. The parallel implementation of GHASH reduces the physical layout and power consumption while maintaining performance metrics.
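By way of illustration only, the separation of the multipliers from the exclusive OR circuit assembly can be sketched in software as follows. The function name `xor_assembly`, the `owner` tags, and the sample values are assumptions made for this sketch and do not appear in the disclosure; the per-channel multiplier outputs are taken as given integers.

```python
from functools import reduce
from operator import xor


def xor_assembly(products, owner):
    """Sum per-channel multiplier outputs separately for two packets.

    products[i] is the (already computed) multiplier output of channel i.
    owner[i] is 0 if channel i carries the existing packet and 1 if it
    carries the new packet. Both names are hypothetical.
    """
    # First exclusive OR circuit: channels working on the existing packet.
    first = reduce(xor, (p for p, o in zip(products, owner) if o == 0), 0)
    # Second exclusive OR circuit: channels working on the new packet.
    second = reduce(xor, (p for p, o in zip(products, owner) if o == 1), 0)
    return first, second


# Four channels in one cycle: two finish the existing packet, two start a new one.
sums = xor_assembly([0b1010, 0b0110, 0b0001, 0b1000], [0, 0, 1, 1])
```

Because the multipliers only produce per-channel products, the same multiplier set can serve either packet in a given cycle; only the routing into the two exclusive OR sums changes.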
[0009] In an implementation form, the hardware implementation of the encryption / decryption functionality uses a hardware multiplexer to decide whether a received packet, received over one of the plurality of channels, is a new packet or a previously received packet. If the received packet is determined to be a new packet, a key generator component of the hardware implementation of the encryption / decryption functionality is used to perform a computation during encryption / decryption.
[0010] The utilization of the hardware multiplexer also contributes to a reduction in the overall hardware resources required, including gate count, and minimizes power consumption, leading to enhanced performance and efficiency, especially in space- and energy-constrained GCM implementations.
[0011] In a further implementation form, if the received packet is determined to be a previously received packet, stored computations from the buffer are used during encryption / decryption.
[0012] By avoiding re-computations of encryption or decryption operations for the previously received packet, reduced power consumption can be obtained.
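As a minimal software sketch of this selection, assuming a packet identifier is available to distinguish new packets from previously received ones, the behaviour can be modelled as below. The class `KeyMaterialMux`, the dictionary used in place of the stored computations, and `toy_key_generator` are all hypothetical placeholders rather than elements of the disclosure.

```python
class KeyMaterialMux:
    """Toy model of the new-packet / previously-received-packet selection."""

    def __init__(self, key_generator):
        self._key_generator = key_generator  # hypothetical callable
        self._stored = {}                    # stand-in for stored computations
        self.generator_calls = 0

    def select(self, packet_id):
        is_new_packet = packet_id not in self._stored
        if is_new_packet:
            # New packet: run the key generator once and keep the result.
            self.generator_calls += 1
            self._stored[packet_id] = self._key_generator(packet_id)
        # Previously received packet: reuse the stored result.
        return self._stored[packet_id]


def toy_key_generator(packet_id):
    # Placeholder only; a real design would derive key material here.
    return f"round-keys-for-packet-{packet_id}"


mux = KeyMaterialMux(toy_key_generator)
arrivals = [7, 7, 7, 9, 7, 9]  # blocks of packets 7 and 9, interleaved
selected = [mux.select(p) for p in arrivals]
```

In this sketch the generator runs once per distinct packet, however many blocks of that packet arrive, which is the source of the saving described above.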
[0013] In a further implementation form, the hardware implementation of the encryption / decryption functionality operates over 128-bit data blocks.
[0014] The use of 128-bit data blocks ensures compatibility and adherence to Advanced Encryption Standard (AES) encryption / decryption algorithms.
[0015] In a further implementation form, the hardware implementation of the authentication functionality uses a finite field multiplication operation.
[0016] The use of finite field multiplication in the authentication process enhances security and computational efficiency. The structured nature of finite field multiplication also enables optimized hardware designs, potentially reducing power consumption. Additionally, the finite field multiplication supports scalability, allowing the authentication system to be adapted for different security levels by adjusting the field size without fundamentally changing the underlying algorithm.
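For illustration, the finite field multiplication used by GHASH can be sketched in software following the bitwise multiplication algorithm of NIST SP 800-38D, in which the leftmost bit of a 128-bit block is the coefficient of x^0. The function name `gf128_mul` and the integer representation of blocks are choices made for this sketch, not taken from the disclosure, and a hardware multiplier would be structured differently.

```python
# Reduction constant R = 11100001 || 0^120 for x^128 + x^7 + x^2 + x + 1.
R = 0xE1 << 120
ONE = 1 << 127  # multiplicative identity in GCM bit order


def gf128_mul(x: int, y: int) -> int:
    """Multiply two 128-bit blocks (as Python ints) in GF(2^128), GCM convention."""
    z = 0
    v = y
    for i in range(128):
        # Bit i of x, counting from the leftmost (most significant) bit.
        if (x >> (127 - i)) & 1:
            z ^= v
        # Multiply v by x: shift right, reduce if a bit fell off the end.
        if v & 1:
            v = (v >> 1) ^ R
        else:
            v >>= 1
    return z


a = 0x0123456789ABCDEF0123456789ABCDEF
b = 0xFEDCBA9876543210FEDCBA9876543210
c = 0x00112233445566778899AABBCCDDEEFF
product = gf128_mul(a, b)
```

Addition in this field is a bitwise exclusive OR, so multiplication distributes over XOR; that property is what allows per-channel products to be summed independently of the multipliers.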
[0017] In a further implementation form, the buffer is a First-In-First-Out, FIFO, buffer.
[0018] The use of the FIFO buffer allows for relaxed timing in processing the packets. The FIFO buffer ensures that the complete packet arrives before the data packet is processed, assuming that both the arrival and processing rates are the same. This allows for efficient handling of the packets without the requirement for additional synchronization mechanisms.
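The decoupling effect of the FIFO buffer can be sketched with a simple cycle-by-cycle model. The function `run_decoupled`, the `stall_cycles` parameter, and the toy `encrypt` / `authenticate` callables are assumptions made for this sketch; they stand in for the encryption / decryption and authentication hardware and perform no real cryptography.

```python
from collections import deque


def run_decoupled(blocks, encrypt, authenticate, stall_cycles=frozenset()):
    """Producer pushes one result per cycle; consumer pops unless stalled."""
    pending = deque(blocks)
    fifo = deque()
    results = []
    max_depth = 0
    cycle = 0
    while pending or fifo:
        if pending:
            # Encryption side writes its computation into the FIFO.
            fifo.append(encrypt(pending.popleft()))
        max_depth = max(max_depth, len(fifo))
        if fifo and cycle not in stall_cycles:
            # Authentication side reads the oldest stored computation.
            results.append(authenticate(fifo.popleft()))
        cycle += 1
    return results, max_depth


def toy_encrypt(block):
    return block ^ 0xFF  # placeholder, not a cipher


def toy_authenticate(block):
    return block + 1     # placeholder, not GHASH


free_run = run_decoupled([1, 2, 3], toy_encrypt, toy_authenticate)
stalled_run = run_decoupled([1, 2, 3], toy_encrypt, toy_authenticate, {0, 1})
```

In both runs the authentication side sees the computations in arrival order; a stall on the consumer side only increases the FIFO occupancy and does not require the producer to wait.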
[0019] In a further implementation form, the GCM is a mode of operation of symmetric key cryptographic block ciphers.
[0020] The use of GCM as the mode of operation for symmetric key cryptographic block ciphers offers enhanced security and data integrity.
[0021] In a further implementation form, the key expansion is used in the hardware implementation of the encryption / decryption functionality.
[0022] The key expansion is required to enhance the security and effectiveness of the encryption / decryption process. By generating unique key logic for each round, the method ensures that the encryption / decryption operations are not vulnerable to attacks that exploit patterns or weaknesses in the key. This further results in strengthening the overall security of a cryptographic system.
[0023] In a further implementation form, key expansion is performed in the same way for all cipher blocks that belong to the same packet.
[0024] The reason for performing key expansion in the same way for all cipher blocks within a packet is to ensure consistency and uniformity in the encryption process. By using the same key expansion logic, all blocks within a packet have the same set of expanded keys, which enhances the security and efficiency of the encryption algorithm.
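As an illustrative sketch, the standard AES-128 key expansion of FIPS 197 can be written as below; the expanded round keys are computed once and would then be reused for every block of the same packet. The disclosure does not specify a key length, so AES-128 is an assumption here, and the helper names (`expand_key_128`, `_build_sbox`, `_gmul`) are hypothetical.

```python
def _gmul(a: int, b: int) -> int:
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r


def _rotl8(x: int, n: int) -> int:
    return ((x << n) | (x >> (8 - n))) & 0xFF


def _build_sbox():
    """Derive the AES S-box: field inverse followed by the affine transform."""
    sbox = []
    for x in range(256):
        inv = 0 if x == 0 else next(y for y in range(1, 256) if _gmul(x, y) == 1)
        s = inv ^ _rotl8(inv, 1) ^ _rotl8(inv, 2) ^ _rotl8(inv, 3) ^ _rotl8(inv, 4)
        sbox.append(s ^ 0x63)
    return sbox


SBOX = _build_sbox()


def expand_key_128(key: bytes):
    """Return the 44 32-bit words of the AES-128 key schedule."""
    if len(key) != 16:
        raise ValueError("AES-128 key must be 16 bytes")
    w = [int.from_bytes(key[4 * i:4 * i + 4], "big") for i in range(4)]
    rcon = 1
    for i in range(4, 44):
        t = w[i - 1]
        if i % 4 == 0:
            t = ((t << 8) | (t >> 24)) & 0xFFFFFFFF                       # RotWord
            t = int.from_bytes(bytes(SBOX[b] for b in t.to_bytes(4, "big")), "big")  # SubWord
            t ^= rcon << 24                                               # round constant
            rcon = _gmul(rcon, 2)
        w.append(w[i - 4] ^ t)
    return w


# Expanded once per packet; every block of that packet reuses `round_keys`.
round_keys = expand_key_128(bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c"))
```

The key used above is the example key from FIPS 197 Appendix A, chosen so that the schedule can be checked against the published values.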
[0025] In a further implementation form, the hardware implementation of the authentication functionality is performed in a parallel manner.
[0026] The hardware implementation of the authentication functionality is performed in a parallel manner to optimize efficiency and reduce resource usage. By utilizing parallel processing, the method manifests the ability to handle high input rates and support various packet sizes without compromising on authentication performance.
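One common way to parallelize GHASH, shown here only as a sketch and not necessarily the arrangement used in the disclosed hardware, is to multiply a group of blocks by descending powers of the hash key H and then XOR the products. The functions `ghash_sequential` and `ghash_parallel`, the `lanes` parameter, and the sample values are assumptions; the field multiplication follows NIST SP 800-38D and is included so that the block is self-contained. Padding and the final length block of full GCM are omitted.

```python
R = 0xE1 << 120  # reduction constant for GF(2^128) in GCM bit order


def gf128_mul(x: int, y: int) -> int:
    z, v = 0, y
    for i in range(128):
        if (x >> (127 - i)) & 1:
            z ^= v
        v = (v >> 1) ^ R if v & 1 else v >> 1
    return z


def ghash_sequential(h: int, blocks):
    """Reference form: one multiplication per block, strictly in order."""
    y = 0
    for x in blocks:
        y = gf128_mul(y ^ x, h)
    return y


def ghash_parallel(h: int, blocks, lanes: int = 4):
    """Process up to `lanes` blocks per step using H^lanes ... H^1."""
    powers = [h]  # powers[i] == H^(i+1)
    for _ in range(lanes - 1):
        powers.append(gf128_mul(powers[-1], h))
    y = 0
    for start in range(0, len(blocks), lanes):
        group = blocks[start:start + lanes]
        n = len(group)
        # Independent multiplications: these could run in separate channels.
        products = [
            gf128_mul(group[j] ^ (y if j == 0 else 0), powers[n - 1 - j])
            for j in range(n)
        ]
        # Separate XOR stage sums the channel outputs.
        y = 0
        for p in products:
            y ^= p
    return y


H = 0x66E94BD4EF8A2C3B884CFA59CA342B2E  # arbitrary sample value for the sketch
DATA = [(i * 0x0123456789ABCDEF0123456789ABCDEF + 7) % (1 << 128) for i in range(1, 10)]
```

Both functions compute the same polynomial in H; the parallel form only regroups the terms, which is valid because multiplication distributes over XOR.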
[0027] In a further implementation form, the packets are Ethernet packets.
[0028] By dynamically allocating channels based on the packet size and processing requirements, the method aims to maximize the throughput and minimize processing delays.
[0029] In a further implementation form, the hardware implementation of the encryption / decryption functionality processes the received packets at an operating frequency of 1.2 GHz.
[0030] The purpose of implementing the encryption / decryption functionality at the operating frequency of 1.2 GHz is to ensure efficient and fast processing of received packets.
[0031] In another aspect, the present disclosure provides a system comprising means adapted for carrying out all the steps of the method of the present disclosure, causing the system to perform the method.
[0032] The system achieves all the advantages and technical effects of the method of the present disclosure.
[0033] It is to be appreciated that all the aforementioned implementation forms can be combined.
[0034] It has to be noted that all devices, elements, circuitry, units and means described in the present application could be implemented in the software or hardware elements or any kind of combination thereof. All steps which are performed by the various entities described in the present application as well as the functionalities described to be performed by the various entities are intended to mean that the respective entity is adapted to or configured to perform the respective steps and functionalities. Even if, in the following description of specific embodiments, a specific functionality or step to be performed by external entities is not reflected in the description of a specific detailed element of that entity which performs that specific step or functionality, it should be clear for a skilled person that these methods and functionalities can be implemented in respective software or hardware elements, or any kind of combination thereof. It will be appreciated that features of the present disclosure are susceptible to being combined in various combinations without departing from the scope of the present disclosure as defined by the appended claims.
[0035] Additional aspects, advantages, features and objects of the present disclosure would be made apparent from the drawings and the detailed description of the illustrative implementations construed in conjunction with the appended claims that follow.
BRIEF DESCRIPTION OF THE DRAWINGS
[0036] The summary above, as well as the following detailed description of illustrative embodiments, is better understood when read in conjunction with the appended drawings. For the purpose of illustrating the present disclosure, exemplary constructions of the disclosure are shown in the drawings. However, the present disclosure is not limited to specific methods and instrumentalities disclosed herein. Moreover, those skilled in the art will understand that the drawings are not to scale. Wherever possible, like elements have been indicated by identical numbers.
[0037] Embodiments of the present disclosure will now be described, by way of example only, with reference to the following diagrams wherein:
[0038] FIG. 1A is a network environment diagram of a system for communication supporting Galois Counter Mode (GCM) of operation, in accordance with an embodiment of the present disclosure;
[0039] FIG. 1B is a block diagram that illustrates the various exemplary components of a transmitter, in accordance with an embodiment of the present disclosure;
[0040] FIG. 2 is a flowchart of a method of performing channel allocation in a hardware implementation of encryption / decryption functionality and authentication functionality of GCM implementation, in accordance with an embodiment of the present disclosure;
[0041] FIG. 3 illustrates a parallel GCM architecture with four channels for high-performance encryption and authentication, in accordance with an embodiment of the present disclosure;
[0042] FIG. 4 illustrates a data packet scheduling or transmission scheme across a plurality of channels over several clock cycles, in accordance with an embodiment of the present disclosure;
[0043] FIG. 5 illustrates a flow of operations in an Advanced Encryption Standard (AES) encryption process for a single channel, in accordance with an embodiment of the present disclosure; and
[0044] FIGs. 6A, 6B, and 6C collectively, illustrate a schematic diagram of GHASH implementation, in accordance with an embodiment of the present disclosure.
[0045] In the accompanying drawings, an underlined number is employed to represent an item over which the underlined number is positioned or an item to which the underlined number is adjacent. A non-underlined number relates to an item identified by a line linking the non-underlined number to the item. When a number is non-underlined and accompanied by an associated arrow, the non-underlined number is used to identify a general item at which the arrow is pointing.
DETAILED DESCRIPTION OF EMBODIMENTS
[0046] The following detailed description illustrates embodiments of the present disclosure and ways in which they can be implemented. Although some modes of carrying out the present disclosure have been disclosed, those skilled in the art would recognize that other embodiments for carrying out or practicing the present disclosure are also possible.
[0047] FIG. 1A is a network environment diagram of a system for communication supporting Galois Counter Mode (GCM) of operation, in accordance with an embodiment of the present disclosure. With reference to FIG. 1A, there is shown a system 100 for communication that includes a transmitter 102 and a receiver 104. The transmitter 102 and the receiver 104 communicate with each other through a plurality of channels 106. The plurality of channels 106 includes N channels, such as a first channel 106A, a second channel 106B, and up to an Nth channel 106N.
[0048] The system 100 is specifically designed for an efficient parallel hardware implementation of AES and GHASH functions, which significantly reduces the physical layout and the power consumption while maintaining all performance metrics when implemented as part of IPsec for Ethernet traffic in various networking devices, such as switches and Network Interface Cards (NICs). The system 100 for communication supports the Galois Counter Mode of operation to ensure secure and authenticated data transmission. The system 100 is designed to handle high-speed network traffic while maintaining data confidentiality and integrity. The system 100 is beneficial for applications that require encryption / decryption for security purposes where space and energy efficiency are paramount, including mobile devices, IoT devices, and other compact, high-performance cryptographic systems.
[0049] The transmitter 102 may include suitable logic circuitry, interfaces, or code that is configured to communicate with the receiver 104 via the plurality of channels 106. Examples of the transmitter 102 may include, but are not limited to, an Internet-of-Things (IoT) device, a smartphone, a machine-type communication (MTC) device, a computing device, a server, an IoT controller, a drone, customized hardware for wireless telecommunication, or any other portable or non-portable electronic device. In the system 100 for communication, the transmitter 102 may have a single antenna for communication with the receiver 104. However, in another implementation of the system 100 for communication, the transmitter 102 may have more than one antenna for communication with the receiver 104 and may act as a transceiver (i.e., the transmitter 102 may function as a transmitter as well as a receiver of data).
[0050] The receiver 104 may include suitable circuitry, interfaces, or code that is configured to receive one or more data packets from the transmitter 102, via the plurality of channels 106. Examples of the receiver 104 may include, but are not limited to, an IoT device, a smartphone, an MTC device, an Internet-of-Things (IoT) controller, a base station, a server, customized hardware for wireless telecommunication, a receiver, or any other portable or non-portable electronic device. In one implementation scenario of the system 100 for communication, the receiver 104 may have a single antenna for communication with the transmitter 102. However, in another implementation scenario of the system 100 for communication, the receiver 104 may have more than one antenna for communication with the transmitter 102.
[0051] The plurality of channels 106 may be referred to as multiple pathways through which data or signals travel from the transmitter 102 to the receiver 104. Each channel of the plurality of channels 106 represents a separate route for transmitting data. The plurality of channels 106 can be physical, such as different frequency bands in wireless communication, or logical, like separate data streams over the same medium. The plurality of channels 106 allows enhanced data transmission efficiency, reliability, and increased capacity, as different signals can be transmitted simultaneously without interference. Examples of the plurality of channels 106 may include different frequency bands in cellular networks (like 4G or 5G), separate Wi-Fi channels in home or office networks, and distinct satellite communication frequencies. The plurality of channels 106 is widely used in applications such as mobile phone communication, Wi-Fi networks, Bluetooth devices, and satellite-based systems, ensuring seamless data transfer across various devices and services. The plurality of channels 106 may be represented by a wireless communication network because of the presence of multiple wireless paths for data communication between the transmitter 102 and the receiver 104.
[0052] The system 100 for communication is capable of handling high-bandwidth traffic, often processing data streams of up to 800 Gbps or more. The system 100 for communication operates at high clock speeds, commonly around 1.2 GHz, to meet the demands of modern network traffic. The functionality of system 100 revolves around the GCM algorithm, which combines the Advanced Encryption Standard (AES) for data confidentiality and the GHASH function for data authentication. This dual approach ensures that data is not only encrypted but also protected against tampering or unauthorized modification during transmission. When processing network traffic, the system 100 is configured to handle a variety of packet sizes, ranging from small packets of 64 bytes to larger packets of several kilobytes. Each packet typically includes headers (such as Ethernet, IP, and ESP headers) and payload data. The system 100 efficiently processes these packets, applying the GCM algorithm to the appropriate portions of the data. The system 100 employs multiple parallel channels for both AES encryption / decryption and GHASH computation. These parallel channels allow the system 100 to process multiple blocks of data simultaneously, achieving the throughput required for high-speed networks. The system 100 also manages various cryptographic elements, including key generation and management, counter values for the AES-CTR mode, and the computation and verification of authentication tags. All these operations are performed with minimal latency to maintain line-rate processing.
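As a back-of-the-envelope illustration of the figures quoted above, the number of 128-bit blocks that must be handled per clock cycle for a given line rate can be estimated as follows. This is raw arithmetic only, assuming one 128-bit block per lane per cycle and ignoring framing overhead and idle cycles; the function names are hypothetical and the resulting lane count is not a figure stated in the disclosure.

```python
def lanes_needed(line_rate_bps: int, clock_hz: int, block_bits: int = 128) -> int:
    """Smallest number of parallel lanes whose raw rate covers the line rate."""
    bits_per_second_per_lane = clock_hz * block_bits
    # Ceiling division on integers avoids floating-point rounding.
    return -(-line_rate_bps // bits_per_second_per_lane)


def raw_throughput_bps(lanes: int, clock_hz: int, block_bits: int = 128) -> int:
    """Raw block throughput of `lanes` lanes at the given clock."""
    return lanes * clock_hz * block_bits


def blocks_per_packet(packet_bytes: int, block_bits: int = 128) -> int:
    """Number of 128-bit blocks needed to cover a packet of the given size."""
    return -(-packet_bytes * 8 // block_bits)


LINE_RATE = 800_000_000_000  # 800 Gbps
CLOCK = 1_200_000_000        # 1.2 GHz

lanes = lanes_needed(LINE_RATE, CLOCK)
small_packet_blocks = blocks_per_packet(64)
```

A 64-byte packet spans four 128-bit blocks, so at such rates more than one small packet can fall within a single clock cycle, which is the situation the per-cycle channel allocation is meant to handle.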
[0053] One of the key challenges in conventional systems is to balance performance with efficient resource utilization and power consumption. As network speeds continue to increase, there is a growing requirement for more efficient implementations that can scale without proportionally increasing hardware resources or power requirements. The system 100 addresses these challenges by allowing more efficient parallel processing and smart resource allocation. The system 100 is thereby configured to overcome the aforementioned limitations of conventional systems, particularly in terms of resource utilization, scalability, and power efficiency. The system 100 is configured to meet the ever-growing demands of high-speed networks while maintaining or even reducing hardware complexity and power consumption. Such advancements are required for the continued evolution of secure, high-performance networking technologies across various sectors of the digital economy.
[0054] Moreover, the transmitter 102 is configured to perform the authenticated encryption operation and the receiver 104 is configured to perform the authenticated decryption operation. The authenticated decryption operation is similar to the authenticated encryption operation, but with the order of the hash step and the encrypt step reversed. Because of the similarity between the authenticated encryption and the authenticated decryption operations, the authenticated encryption operation executed at the transmitter 102 is described in detail, for example, in FIG. 1B.
[0055] FIG. 1B is a block diagram that illustrates various exemplary components of a transmitter, in accordance with an embodiment of the present disclosure. FIG. 1B is described in conjunction with elements from FIG. 1A. With reference to FIG. 1B, there is shown a block diagram of the transmitter 102, which includes an encryption / decryption hardware 108 and an authentication hardware 110. The encryption / decryption hardware 108 and the authentication hardware 110 are configured to perform encryption / decryption functionality and authentication functionality, respectively. Furthermore, the encryption / decryption hardware 108 comprises a key generator component 112, a hardware multiplexer 114, and a logic circuitry 116. The authentication hardware 110 includes a plurality of Galois field multipliers 118, an exclusive OR circuit assembly 120, and a logic circuitry 122. The exclusive OR circuit assembly 120 includes a first exclusive OR circuit 120A and a second exclusive OR circuit 120B. The logic circuitry 116 of the encryption / decryption hardware 108 and the logic circuitry 122 of the authentication hardware 110 are decoupled via the use of a buffer 124.
[0056] The encryption / decryption hardware 108 may include suitable circuitry, interfaces, or code that is configured to perform data encryption and decryption functionality. The encryption functionality refers to a process of converting plaintext data into ciphertext using a cryptographic algorithm and a cryptographic key. The decryption functionality refers to a process of converting ciphertext back into its original plaintext form using the appropriate cryptographic key and algorithm within a cryptographic system. The AES algorithm is used for the conversion of plaintext into ciphertext block by block (e.g., data blocks of 128-bit length). The application of the encryption / decryption hardware 108 includes, but is not limited to, implementation scenarios requiring high-performance data protection, such as financial transactions, secure communication channels, and data storage systems. Examples of the encryption / decryption hardware 108 may include, but are not limited to, devices utilizing the AES algorithm or GCM for secure and fast processing of sensitive information.
[0057] The authentication hardware 110 may include suitable logic, circuitry, interfaces, or code that is configured to perform data authentication functionality. The authentication functionality refers to the capability of the authentication hardware 110 to verify the identity of a user or a device or an entity through the use of cryptographic protocols and techniques. The authentication hardware 110 ensures data integrity and authenticity by performing the GHASH function as part of the Galois counter mode. Examples of the authentication hardware 110 may include but are not limited to, a hardware used in smart cards, mobile devices, IoT devices, servers, and the like.
[0058] The key generator component 112 may include suitable logic, circuitry, interfaces, or code that is configured to generate and manage cryptographic keys within a cryptographic system, such as the system 100 for communication. The key generator component 112 is configured to generate a sequence of random or pseudo-random (i.e., a sequence of numbers or values that appear random) values that serve as the cryptographic key, to ensure the confidentiality and integrity of transmitted or stored data.
[0059] The hardware multiplexer 114 may include suitable logic, circuitry, interfaces, or code that is configured to select one of several input signals and forward the selected input signal to a single output line. The hardware multiplexer 114 efficiently manages the data flow in the system 100 by reducing the number of data paths. The hardware multiplexer 114 is configured to manage the parallel processing of AES and GHASH functionality. More specifically, the hardware multiplexer 114 is configured to direct Galois field multiplication results to the appropriate exclusive OR channels during GHASH processing. The hardware multiplexer 114 allows the system 100 to process multiple packets concurrently while optimizing channel utilization.
[0060] The logic circuitry 116 may include suitable logic, circuitry, interfaces, or code that is configured to perform logical operations on one or more data packets and produce a corresponding output based on predetermined rules. The logic circuitry 116 may comprise logical gates, such as AND, OR, NOT, XOR and the like. The logic circuitry 116 forms the backbone of digital systems, enabling decision-making and processing functions within hardware components. The logic circuitry 116 is widely utilized in computing systems, embedded devices, and control systems, where operations like arithmetic calculations, data comparison, and signal controlling are performed. Examples of the logic circuitry 116 may include, but are not limited to, processors executing binary instructions, memory management systems handling data flow, and microcontrollers controlling automated devices.
[0061] The plurality of Galois field multipliers 118 may include suitable logic, circuitry, interfaces, or code that is configured to perform GHASH function used in the Galois / Counter Mode (GCM) of operation. The plurality of Galois field multipliers 118 may be configured to perform multiplication operations in a finite field, specifically, the Galois Field GF (2^128) , which is required for the authentication aspect of GCM of operation.
[0062] The exclusive OR (XOR) circuit assembly 120 may include suitable logic, circuitry, interfaces, or code that is configured to perform a logical operation that outputs true only when the number of true inputs is odd. The exclusive OR circuit assembly 120 is commonly used for data arithmetic operations, error detection, encryption algorithms, and parity bit generation in data communication systems. Applications of the exclusive OR circuit assembly 120 may include, but are not limited to, cryptographic systems for combining plaintext with a cryptographic key during encryption, error-checking systems for detecting transmission errors, processors for performing arithmetic calculations, and the like.
[0063] The functioning and examples of the logic circuitry 122 of the authentication hardware 110 are similar to those of the logic circuitry 116 of the encryption / decryption hardware 108. However, the logic circuitry 116 of the encryption / decryption hardware 108 is decoupled from the logic circuitry 122 of the authentication hardware 110 via the use of the buffer 124.
[0064] The buffer 124 may include suitable logic, circuitry, interfaces, or code that is configured to temporarily store data before data transmission (for example, the data transmission from the transmitter 102 to the receiver 104) . The buffer 124 can be in the form of a reserved memory space, used to temporarily hold data while the data is being transferred from one place to another within a cryptographic system, such as the system 100 for communication.
[0065] In operation, there is provided the system 100 for performing channel allocation in a hardware implementation of encryption / decryption functionality and in a hardware implementation of authentication functionality, where the hardware implementations are components of the Galois Counter Mode (GCM) , configured to receive packets of data over the plurality of channels 106.
[0066] The system 100 is specifically designed for the Galois Counter Mode (GCM) and is configured to receive packets of data over the plurality of channels 106. The system 100 is configured to efficiently utilize the hardware resources by dynamically allocating channels of the plurality of channels 106 based on packets of data arrival and processing requirements. When the packets of data arrive, the system 100 may be configured to determine current load on each channel of the plurality of channels 106 and assign the data packets to least loaded channel of the plurality of channels 106. The dynamic channel allocation approach ensures that the plurality of channels 106 is fully utilized, which leads to improved throughput and reduced latency in GCM operations. The purpose of channel allocation is to efficiently handle the processing of data packets in the GCM hardware implementations. By allocating channels, the system 100 can effectively manage the encryption / decryption and authentication processes, ensuring smooth and secure data transmission.
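The least-loaded assignment described above can be sketched as follows. This is a minimal illustration only: the function name, the use of a per-channel count of pending 128-bit blocks as the "load", and the three-channel example are assumptions made for the sketch and are not taken from the disclosure.

```python
from typing import List

def assign_to_least_loaded(loads: List[int], packet_blocks: int) -> int:
    """Pick the channel with the smallest pending load and charge the packet to it.

    `loads[i]` is a hypothetical count of 128-bit blocks still queued on channel i.
    Ties are resolved in favour of the lowest channel index.
    """
    channel = min(range(len(loads)), key=loads.__getitem__)
    loads[channel] += packet_blocks
    return channel

# Three idle channels (assumed count) receiving packets of 4, 2, 5 and 1 blocks.
loads = [0, 0, 0]
picked = [assign_to_least_loaded(loads, n) for n in (4, 2, 5, 1)]
```

In this example the fourth packet lands on the second channel, because that channel holds the fewest pending blocks when the packet arrives.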
[0067] The system 100 is further configured to perform, via the hardware implementation of the encryption / decryption functionality of the GCM, computations on a first packet received over a first of the plurality of channels 106, using a first cryptographic key from the key generator component 112 of the hardware implementation of the encryption / decryption functionality of the GCM. The term "cryptographic key" refers to a piece of information used in cryptographic systems (such as the system 100) to encrypt or decrypt data. The encryption / decryption hardware 108 is configured to encrypt the first packet (or the first data packet) received over the first channel 106A of the plurality of channels 106 using the first cryptographic key. The key generator component 112 is configured to generate the first cryptographic key. Moreover, the encryption of the first data packet is done using the AES scheme. The encryption of the first data packet is done to efficiently allocate channels and perform computations for encryption / decryption and authentication processes. By utilizing the encryption / decryption hardware 108, the system 100 can handle high input rates and various packet sizes while reducing the required number of channels and gates for the data authentication process.
[0068] In accordance with an embodiment, the hardware implementation of the encryption / decryption functionality operates over 128 bits data blocks. The term "128 bits data blocks" refers to data units comprising 128 bits, which are commonly used in cryptographic systems for various operations, such as encryption, decryption, and data integrity verification. The encryption / decryption hardware 108 processes data blocks of 128 bits. This is achieved through the use of AES encryption / decryption algorithm, which specifically operates on 128-bit blocks of data. The AES algorithm utilizes a key, along with the input data, to perform the encryption or decryption process. The computational result of one 128-bit data block serves as the input for the calculation of the next block, allowing for the sequential processing of data blocks. The encryption of the 128-bit data blocks is done to ensure compatibility and adherence to the AES encryption / decryption algorithm. The AES is designed to work with 128-bit blocks, and using a different block size would result in incompatibility and potential errors. Additionally, processing data in blocks allows for efficient and streamlined encryption / decryption operations, as each block can be independently processed. By adhering to the standard block size, the system 100 ensures compatibility with other AES implementations and facilitates interoperability.
[0069] In accordance with an embodiment, the packets are Ethernet packets. The term "Ethernet packets" refers to the units of data that are transmitted over an Ethernet network, containing a header and payload, and conforming to the Ethernet protocol standards for communication between devices in a local area network (LAN) . The system 100 is configured to process Ethernet packets by allocating channels for encryption logic. Each arriving packet requires the allocation of two channels, one for counter0 calculation and one for H calculation. By allocating channels for these calculations, the system 100 ensures efficient processing of Ethernet packets. The system 100 allows for the utilization of available channels in each clock cycle, optimizing the processing time. This further enables the system 100 to handle packets of varying sizes and efficiently allocate resources for encryption and decryption operations.
[0070] In accordance with an embodiment, the hardware implementation of the encryption / decryption functionality processes the received packets at an operating frequency of 1.2 GHz. The processing of the received packets at the operating frequency of 1.2 GHz results in a significant reduction in authentication area for encryption and decryption processes. By utilizing only 6 channels instead of 8, which is equivalent to 400K gates, the system 100 achieves a significant reduction in authentication area. This reduction in area contributes to improved efficiency and cost-effectiveness of the encryption / decryption functionality. Additionally, the parallel implementation of GHASH and AES algorithms ensures that packets of any size can be processed efficiently, enabling high-speed data transmission and secure communication. The system 100 is designed to handle a stream of incoming packets at a rate of 800 Gbps, requiring 83.333 bytes to be processed per clock cycle, which translates to handling approximately five 128-bit blocks (16 bytes per block) in each cycle. Operating at 1.2 GHz allows the encryption / decryption hardware 108 to handle high-throughput data streams efficiently, processing Ethernet packets at line rate while maintaining security. The ability to process 83.333 bytes per clock cycle ensures that the encryption / decryption hardware 108 can support data transmission rates of 800 Gbps. At 1.2GHz frequency, the encryption / decryption hardware 108 uses parallel channels to perform encryption and decryption on multiple blocks simultaneously, ensuring that the system 100 meets the line rate.
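The per-cycle figures quoted above follow from simple arithmetic, reproduced in the short sketch below. The constant names are illustrative. Rounding the block count up to six is consistent with the six channels mentioned in the paragraph, although the text does not state that this is how the channel count was derived.

```python
import math

LINE_RATE_BPS = 800e9   # 800 Gbps line rate, from the text
CLOCK_HZ = 1.2e9        # 1.2 GHz operating frequency, from the text
BLOCK_BYTES = 16        # one 128-bit block

# Bits per cycle divided by 8 gives bytes per cycle.
bytes_per_cycle = LINE_RATE_BPS / CLOCK_HZ / 8
# Number of 128-bit blocks that must be absorbed each cycle (fractional).
blocks_per_cycle = bytes_per_cycle / BLOCK_BYTES
# Whole channels needed if each channel handles one block per cycle (assumption).
channels_needed = math.ceil(blocks_per_cycle)
```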
[0071] In accordance with an embodiment, the hardware implementation of the encryption / decryption functionality uses the hardware multiplexer 114 to make a determination as to whether a received packet, received over one of the plurality of channels 106, is a new packet or a previously received packet, and if the received packet is determined to be a new packet, the key generator component 112 of the hardware implementation of the encryption / decryption functionality is used to perform a computation during encryption / decryption. The hardware multiplexer 114 is responsible for determining whether the received packet, which is received over one of the plurality of channels 106, is the new packet or the previously received packet. To achieve this, the encryption logic (or the encryption / decryption hardware 108) allocates two channels for each arriving packet. One channel is used for calculating the counter0, while the other channel is used for calculating H. The counters to be encrypted are then pushed to the logic, along with an input indicating whether they belong to the current packet (current key expansions) or the next packet (next key expansions). The purpose of using the hardware multiplexer 114 in the encryption / decryption functionality is to efficiently determine whether the received packet is new or previously received. By allocating two channels for each packet, the system 100 can perform the required calculations for encryption and decryption. This approach saves channels, which is equivalent to a significant number of gates, resulting in a reduction of, for example, 25% in the authentication area for encryption and decryption processes. This optimization improves the overall efficiency and performance of the system 100.
[0072] Moreover, if the received packet is determined as the new packet, the key generator component 112 of the encryption / decryption hardware 108 is utilized to perform a computation during the encryption / decryption process. This computation involves allocating channels for counter calculation and H calculation, depending on the residue of the current packet. The number of channels required for the residue of the current packet can vary from 1 to 6. If the residue requires 1, 2, or 3 channels, the last 3 channels can be used for the next packet within the same clock cycle. However, if the residue requires 4 to 6 channels, the next packet is not processed within the same clock cycle. By utilizing the key generator component 112 to perform the computations during encryption / decryption, the system 100 can enhance the security and efficiency of the process. Allocating channels based on the residue of the current packet allows for optimized resource utilization and enables seamless processing of subsequent packets within the same clock cycle, whenever possible.
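The residue rule described above can be expressed as a small predicate. This is a sketch under stated assumptions: six channels in total and three channels reserved for the start of the next packet, both taken from the figures in the paragraph; the function and constant names are hypothetical.

```python
TOTAL_CHANNELS = 6        # channel count used in the text
NEXT_PACKET_CHANNELS = 3  # "the last 3 channels" available to the next packet

def can_start_next_packet(residue_channels: int) -> bool:
    """Return True if the next packet may begin in the same clock cycle.

    `residue_channels` is the number of channels (1 to 6) still needed by the
    tail of the current packet.
    """
    if not 1 <= residue_channels <= TOTAL_CHANNELS:
        raise ValueError("residue must occupy between 1 and 6 channels")
    return TOTAL_CHANNELS - residue_channels >= NEXT_PACKET_CHANNELS

# Decision for every possible residue size.
decisions = [can_start_next_packet(r) for r in range(1, TOTAL_CHANNELS + 1)]
```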
[0073] The system 100 is further configured to store the computations in the buffer 124. By storing the computations in the buffer 124, the system 100 can seamlessly transition from one block of data to the next, utilizing the computational result of the previous block as an input. This allows for continuous and uninterrupted processing, enhancing the overall performance and speed of the system 100.
[0074] In accordance with an embodiment, if the received packet is determined to be a previously received packet, stored computations from the buffer 124 are used during encryption / decryption. When the packet is received, the system 100 checks if it has been received before. If the received packet is determined as the previously received packet, the system 100 retrieves stored computations from the buffer 124. These stored computations are then utilized during the process of encryption / decryption. Specifically, the system 100 XORs the plain data with the stored computations to encrypt the data, and XORs the encrypted data with the stored computations to decrypt it. The purpose of using stored computations from the buffer 124 during encryption / decryption is to ensure the symmetric nature of the process. By XORing the data with the same computations used during the initial encryption, the system 100 can effectively decrypt the data. This approach simplifies the encryption / decryption process and reduces the requirement for additional computations or complex algorithms. The reuse of the stored computations from the buffer 124 enhances the overall throughput of the system 100 by avoiding redundant calculations.
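The symmetric XOR step described above can be illustrated as follows. The sketch assumes that the stored computation is a 16-byte value (for example, an encrypted counter block); the byte string used here is a placeholder chosen only for illustration and the helper name is hypothetical.

```python
def xor_block(data: bytes, stored: bytes) -> bytes:
    """XOR a data block with a stored computation of the same length."""
    if len(data) != len(stored):
        raise ValueError("data and stored computation must have equal length")
    return bytes(d ^ s for d, s in zip(data, stored))

# Placeholder for a computation previously stored in the buffer.
stored_computation = bytes.fromhex("58e2fccefa7e3061367f1d57a4e7455a")

plaintext = b"sixteen byte msg"
ciphertext = xor_block(plaintext, stored_computation)   # encrypt direction
recovered = xor_block(ciphertext, stored_computation)   # decrypt direction
```

Because XOR is its own inverse, the same stored value serves both directions, which is why no additional computation is needed for a previously received packet.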
[0075] The system 100 is further configured to use the stored computations from the buffer 124 to carry out processing using the hardware implementation of the authentication functionality of the GCM. Upon completion of encryption operations by the encryption / decryption hardware 108 on the first packet received through the plurality of channels 106, the resultant computations are stored in the buffer 124 and extracted from the buffer 124 by the authentication hardware 110. The authentication hardware 110 subsequently applies the GCM authentication algorithms to the extracted data to verify data integrity and authenticity. By employing the buffer 124 as intermediary storage, the system 100 permits independent operation of the authentication functionality from the encryption / decryption process, leading to enhanced resource utilization and throughput. By utilizing the stored computations from the buffer 124, the system 100 can optimize the processing of data using the hardware implementation of the GCM authentication functionality. The system's approach allows for improved efficiency in channel allocation, reducing physical layout and power consumption.
[0076] The logic circuitry 116 of the hardware implementation of the encryption / decryption functionality and the logic circuitry 122 of the hardware implementation of the authentication functionality are decoupled via the use of the buffer 124. The decoupling ensures that encryption / decryption and authentication processes can occur independently of each other without the risk of resource contention, which often leads to delays or bottlenecks in the data processing. By decoupling the logic circuitry 116 of the encryption / decryption hardware 108 and the logic circuitry 122 of the authentication hardware 110, the system 100 allows each logic circuitry to operate independently without interference, enhancing overall performance of the system 100. By separating the two functionalities, the system 100 can save channels and gates, resulting in reduced area requirements for authentication in both the encryption (at the transmitter 102) and decryption (at the receiver 104) directions.
[0077] Each of the hardware implementations of the encryption / decryption functionality and the hardware implementation of the authentication functionality separately allocates packets into the plurality of channels 106 using respectively separate criteria. The encryption / decryption hardware 108 may distribute packets on the plurality of channels 106 based on size, priority, or encryption algorithm, while the authentication hardware 110 may distribute packets on the plurality of channels 106 based on tag size, computational complexity, or security requirements. The separate allocation strategies enable fine-tuned load balancing, accommodating processing demand variations between the encryption / decryption hardware 108 and the authentication hardware 110. Consequently, the system 100 for communication manifests an improved throughput, and gains flexibility to adapt to diverse security policies and performance optimization requirements.
[0078] Within a single clock cycle, a first subset of the plurality of channels 106 processes an existing packet while a second subset of the plurality of channels 106 begins processing a new packet, received after the existing packet. In other words, within the single clock cycle, the system 100 divides the available channels into two subsets: the first subset processes the existing data packet, while the second subset begins processing the new data packet received after the existing data packet. By allowing the beginning of the next arriving packet to be processed simultaneously with the end of the current processed packet, the system 100 maximizes the utilization of available channels. The system 100 can therefore handle multiple packets within the single clock cycle, which reduces processing time and improves overall system performance.
[0079] In the hardware implementation of the authentication functionality, the plurality of Galois field multipliers 118 are separated from the exclusive OR circuit assembly 120 that sums data from each of the plurality of channels 106. A first subset of the plurality of channels 106 is summed for the existing packet by the first exclusive OR circuit 120A of the exclusive OR circuit assembly 120, and a second subset of the plurality of channels 106 is summed for the new packet by the second exclusive OR circuit 120B of the exclusive OR circuit assembly 120. The system 100 separates the plurality of Galois field multipliers 118 from the exclusive OR circuit assembly 120 in the hardware implementation of authentication functionality (or the authentication hardware 110). This separation results in two sets of channels: the Galois field multiplication channels (GF channels) and the remaining channels. The exclusive OR circuit assembly 120 sums data from each of these channels. Decoupling the plurality of Galois field multipliers 118 from the exclusive OR circuit assembly 120 enables more efficient parallel processing of multiple data streams and improves the utilization of the multipliers, which enhances the performance and efficiency of the hardware implementation of authentication functionality. Additionally, the system 100 can introduce extended multiplexers to choose H values from the new packet, which are calculated using the existing Galois field multipliers. This further enhances the system's ability to process and authenticate packets efficiently.
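The two-accumulator arrangement described above can be sketched in software as follows. The sketch assumes that each channel delivers one multiplier output per cycle together with a tag saying which packet it belongs to; the function name and the small integers standing in for 128-bit products are illustrative assumptions.

```python
from functools import reduce
from operator import xor
from typing import List, Tuple

EXISTING, NEW = 0, 1  # hypothetical per-channel packet tags

def sum_per_packet(products: List[int], owner: List[int]) -> Tuple[int, int]:
    """XOR-sum multiplier outputs separately for the existing and the new packet.

    `products[i]` is the Galois field multiplier output on channel i and
    `owner[i]` says which packet that channel is serving in this cycle.
    """
    existing = reduce(xor, (p for p, o in zip(products, owner) if o == EXISTING), 0)
    new = reduce(xor, (p for p, o in zip(products, owner) if o == NEW), 0)
    return existing, new

# Six channels: the first three finish the existing packet, the last three start the new one.
products = [0x11, 0x22, 0x44, 0x0F, 0xF0, 0x01]
owner = [EXISTING, EXISTING, EXISTING, NEW, NEW, NEW]
existing_sum, new_sum = sum_per_packet(products, owner)
```

Keeping the two sums separate is what lets the tail of one packet and the head of the next share the same set of multipliers in one cycle.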
[0080] In accordance with an embodiment, the hardware implementation of the authentication functionality uses a finite field multiplication operation. The term "finite field multiplication operation" refers to a mathematical operation performed on elements within a finite field, where two elements are multiplied together to obtain a third element within the same finite field. The authentication hardware 110 is configured to multiply the additional authenticated data (AAD) by H in the finite field GF (2^128) . This multiplication is performed using polynomial multiplication in the finite field GF (2^128) . The GHASH operation, which is part of the authentication process, is also carried out in the same finite field using polynomial multiplication. The polynomial multiplication in GF (2^128) ensures that the field operations required for authentication are both efficient and secure. The GHASH function takes two inputs, for example, a 128-bit block of data (referred to as a "block" ) and a 128-bit hash key (referred to as "H" ) . The output of the GHASH function is also a 128-bit block. By implementing the finite field multiplication operation in the authentication hardware 110, the system 100 can achieve both high security and improved efficiency, making the finite field multiplication suitable for real-time applications.
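For illustration, the multiplication in GF (2^128) can be written in software following the bitwise algorithm published in NIST SP 800-38D. This is a reference sketch only and is not a description of the Galois field multipliers 118; blocks are represented as Python integers whose most significant bit is the leftmost bit of the 128-bit block, and the function name is an assumption.

```python
R = 0xE1 << 120  # reduction constant 11100001 followed by 120 zero bits

def gf128_mul(x: int, y: int) -> int:
    """Multiply two 128-bit blocks in GF(2^128) using the GCM bit ordering."""
    z, v = 0, y
    for i in range(127, -1, -1):      # walk x from its leftmost bit to its rightmost
        if (x >> i) & 1:
            z ^= v                    # add (XOR) the current multiple of y
        # Multiply v by the field generator and reduce if a bit falls off the right.
        v = (v >> 1) ^ R if v & 1 else v >> 1
    return z

ONE = 1 << 127  # multiplicative identity in this bit ordering
h = 0x66E94BD4EF8A2C3B884CFA59CA342B2E  # E(K, 0^128) for the all-zero AES-128 key
c = 0x0388DACE60B6A392F328C2B971B2FE78  # a sample 128-bit block
product = gf128_mul(c, h)
```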
[0081] In accordance with an embodiment, the buffer 124 is a First-In-First-Out, FIFO, buffer. The buffer 124 as the FIFO buffer ensures that the packets are processed in the order of their arrival, with the first packet received being the first to be processed. The FIFO buffer serves as a temporary storage mechanism for computations and data packets, ensuring orderly processing in the sequence of arrival. By employing the FIFO architecture, the buffer 124 maintains the chronological integrity of incoming data, required for preserving the order of operations in the GCM implementation. The FIFO architecture allows for efficient management of data flow between the encryption / decryption hardware 108 and the authentication hardware 110, preventing bottlenecks and ensuring smooth pipeline operation. By employing the FIFO buffer, the system 100 enables the simultaneous processing of the end of the current packet and the beginning of the next arriving packet within the same clock cycle. Such optimization is possible when there are at least 3 available channels at the end of the current packet. The use of the FIFO buffer improves the overall processing efficiency and reduces any potential delays in packet processing, leading to enhanced system performance.
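The decoupling role of the FIFO buffer can be sketched with a simple queue. The two stage functions and the tuple layout below are hypothetical; the point of the sketch is only that the producing stage and the consuming stage never call each other and that arrival order is preserved.

```python
from collections import deque
from typing import Deque, Optional, Tuple

Entry = Tuple[int, bytes]          # (packet id, stored computation), illustrative layout
fifo: Deque[Entry] = deque()

def encryption_stage(packet_id: int, computation: bytes) -> None:
    """Producer side: push a finished computation into the buffer."""
    fifo.append((packet_id, computation))

def authentication_stage() -> Optional[Entry]:
    """Consumer side: pop the oldest stored computation, if any."""
    return fifo.popleft() if fifo else None

# The producer runs ahead of the consumer by three packets.
for pid in (1, 2, 3):
    encryption_stage(pid, bytes([pid]) * 16)

# The consumer later drains the buffer in arrival order.
order = [authentication_stage()[0] for _ in range(3)]
drained = authentication_stage()
```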
[0082] In accordance with an embodiment, the GCM is a mode of operation of symmetric key cryptographic block ciphers. The term "symmetric key cryptographic block ciphers" refers to cryptographic algorithms that use the same key for both encryption and decryption, operate on fixed-size blocks of data, and employ symmetric operations to provide confidentiality and integrity of the information. The system 100 is configured to utilize the GCM as the mode of operation for symmetric key cryptographic block ciphers. The GCM mode involves advancing a counter for each subsequent block in the payload, creating the ciphertext. The counter is then encrypted using the AES and XORed with the respective payload block. Additionally, the GCM employs the GHASH calculation, which is responsible for data integrity and authentication. The authentication tag for GCM is obtained by XORing the GHASH calculation with an encryption of an initial vector. The use of GCM as the mode of operation for symmetric key cryptographic block ciphers offers enhanced security and data integrity. The GCM provides both data encryption and authentication, making the system 100 suitable for applications that require secure communication. By utilizing GCM, the system 100 ensures that the transmitted data remains confidential and unaltered during transmission.
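The relationships described above can be summarized in the notation commonly used for GCM (for example, in NIST SP 800-38D). The symbols below are assumptions introduced for illustration: P_i and C_i are the i-th plaintext and ciphertext blocks, CTR_0 corresponds to counter0 derived from the initial vector, A is the additional authenticated data, and T is the authentication tag before any truncation.

```latex
\begin{aligned}
H &= E_K\!\left(0^{128}\right) \\
\mathrm{CTR}_i &= \mathrm{inc}\!\left(\mathrm{CTR}_{i-1}\right), \qquad i \ge 1 \\
C_i &= P_i \oplus E_K\!\left(\mathrm{CTR}_i\right) \\
T &= \mathrm{GHASH}_H(A, C) \oplus E_K\!\left(\mathrm{CTR}_0\right)
\end{aligned}
```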
[0083] In accordance with an embodiment, key expansion is used in the hardware implementation of the encryption / decryption functionality. The key expansion process starts with an initial encryption key (128, 192, or 256 bits, depending on the AES variant). The initial encryption key undergoes a series of transformations to generate a set of round keys, which are used in the AES encryption / decryption functionality. Mathematically, if K represents the initial key, the key expansion generates a sequence of round keys K0, K1, K2, ..., Kn using a predefined key expansion algorithm. Each round key Ki is used in the AES encryption of the corresponding cipher block, where the encryption process can be represented in form of Equation (1): Ci = E (Ki, Yi) (1).
[0084] In Equation (1) , Ci represents the ciphertext for the ith block, E denotes the AES encryption function, Ki is the round key for that block, and Yi is the corresponding plaintext block. The same key expansion is reused for every cipher block within the same packet, eliminating the requirement to recalculate keys for each block. Once the key expansion is completed for the first block, subsequent blocks can use the same expanded keys, allowing parallel encryption / decryption of all blocks in the packet. This uniform approach minimizes the requirement for additional hardware, reduces computational cycles, and speeds up the overall encryption process. The key expansion ensures that each round of AES has a unique key logic, which is used for the subsequent encryption / decryption operations. The key expansion is required to enhance the security and effectiveness of the encryption / decryption process. By generating unique key logic for each round, the system 100 ensures that the encryption / decryption operations are not vulnerable to attacks that exploit patterns or weaknesses in the key. This helps to strengthen the overall security of the system 100.
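The reuse of one key expansion across all blocks of a packet can be illustrated with a software sketch of the AES-128 key schedule as specified in FIPS-197, in which the expanded round keys are applied one per round inside the cipher and the same schedule serves every block encrypted under the same key. The function names are assumptions, the S-box is derived by brute force for brevity rather than speed, and only the 128-bit key size is covered.

```python
from typing import List

def _gf8_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r

def _rotl8(x: int, n: int) -> int:
    return ((x << n) | (x >> (8 - n))) & 0xFF

def _build_sbox() -> List[int]:
    """AES S-box: multiplicative inverse followed by the affine transform."""
    sbox = []
    for x in range(256):
        inv = next((y for y in range(1, 256) if _gf8_mul(x, y) == 1), 0)
        sbox.append(inv ^ _rotl8(inv, 1) ^ _rotl8(inv, 2)
                    ^ _rotl8(inv, 3) ^ _rotl8(inv, 4) ^ 0x63)
    return sbox

SBOX = _build_sbox()

def expand_key_128(key: bytes) -> List[bytes]:
    """Return the 11 round keys (16 bytes each) for a 128-bit AES key."""
    if len(key) != 16:
        raise ValueError("this sketch covers AES-128 only")
    words = [key[4 * i:4 * i + 4] for i in range(4)]
    rcon = 1
    for i in range(4, 44):
        temp = words[i - 1]
        if i % 4 == 0:
            temp = temp[1:] + temp[:1]                 # RotWord
            temp = bytes(SBOX[b] for b in temp)        # SubWord
            temp = bytes([temp[0] ^ rcon]) + temp[1:]  # add round constant
            rcon = _gf8_mul(rcon, 2)
        words.append(bytes(a ^ b for a, b in zip(words[i - 4], temp)))
    return [b"".join(words[4 * r:4 * r + 4]) for r in range(11)]

# Expanded once per packet; every block of that packet would share this list.
round_keys = expand_key_128(bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c"))
```

The key used in the example is the sample key from FIPS-197 Appendix A.1, so the resulting schedule can be compared against the published values.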
[0085] In accordance with an embodiment, key expansion is performed in the same way for all cipher blocks that belong to the same packet. The term "cipher blocks" refers to discrete units or components within a cryptographic system that are designed to provide secure and reliable communication, data storage, or computational functions, thereby enhancing the overall security and integrity of the system 100. By reusing the same set of expanded keys for all cipher blocks within the single packet, the system 100 minimizes the computational overhead typically associated with generating new keys for each block. Furthermore, the uniform key expansion facilitates parallel processing of cipher blocks, which improves processing speed and efficiency, particularly in systems that require fast encryption and decryption. By using the same key expansion logic, all blocks within the packet will have the same set of expanded keys, which enhances the security and efficiency of the encryption algorithm.
[0086] In accordance with an embodiment, the hardware implementation of the authentication functionality is performed in a parallel manner. For each block of encrypted data, a Galois Field multiplication of the form H×Xi is performed, where H is the 128-bit hash key (H = E (K, 0^128)) and Xi denotes the ith data block. In the parallel implementation of the authentication hardware 110, the channels calculate H×X1, H×X2, ..., H×Xn simultaneously for n blocks. The results of these multiplications are then summed, as required in the GHASH operation, to generate the final authentication tag as shown in Equation (2):
[0087] By performing these multiplications and XOR operations in parallel, the system 100 accelerates the process of generating the authentication tag for a given packet. The parallel implementation of GHASH offers significant enhancement in terms of speed and efficiency. By calculating H×Xi for multiple data blocks concurrently, the total processing time for the authentication tag is reduced, making the system 100 more suitable for high-throughput environments, such as secure data transmission networks. The authentication hardware 110 is required to perform in a parallel manner leading to efficient resource utilization and improved throughput of the system 100. By utilizing parallel processing, the system 100 can handle high input rates and support various packet sizes without compromising on authentication performance. This approach allows for significant savings in terms of channels and gates required for the authentication process.
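A software sketch of the parallel form is given below. It assumes the standard GHASH definition from NIST SP 800-38D, in which block Xi of an m-block input is weighted by the power H^(m-i+1), so that each channel can multiply its block by a precomputed power of H and the products are then XORed. This weighting is an assumption of the sketch and is not spelled out in the text above; the function names are illustrative.

```python
from functools import reduce
from operator import xor
from typing import List

R = 0xE1 << 120  # GCM reduction constant

def gf128_mul(x: int, y: int) -> int:
    """Bitwise GF(2^128) multiplication in the GCM bit ordering."""
    z, v = 0, y
    for i in range(127, -1, -1):
        if (x >> i) & 1:
            z ^= v
        v = (v >> 1) ^ R if v & 1 else v >> 1
    return z

def ghash_serial(h: int, blocks: List[int]) -> int:
    """Reference form: one multiplication after another (Horner's rule)."""
    y = 0
    for x in blocks:
        y = gf128_mul(y ^ x, h)
    return y

def ghash_parallel(h: int, blocks: List[int]) -> int:
    """Parallel form: independent products with powers of H, then an XOR sum."""
    m = len(blocks)
    powers = [h]                                   # powers[k] holds H^(k+1)
    for _ in range(m - 1):
        powers.append(gf128_mul(powers[-1], h))
    # Each product below is independent and could run on its own channel.
    products = [gf128_mul(x, powers[m - 1 - i]) for i, x in enumerate(blocks)]
    return reduce(xor, products, 0)

h = 0x66E94BD4EF8A2C3B884CFA59CA342B2E             # E(K, 0^128) for the all-zero key
blocks = [0x0388DACE60B6A392F328C2B971B2FE78,      # one ciphertext block
          0x80]                                    # length block: 0 AAD bits, 128 ciphertext bits
serial_result = ghash_serial(h, blocks)
parallel_result = ghash_parallel(h, blocks)
```

Both forms yield the same value; the parallel form trades the serial dependency between multiplications for a set of precomputed powers of H.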
[0088] Thus, the system 100 resolves the technical problem of inefficient resource utilization in GCM hardware, primarily due to underutilized GHASH logic, limited parallelism in data processing, and bottlenecks in processing one data packet per cycle. The system 100 comprises separate hardware implementations for encryption / decryption and authentication (i.e., the encryption / decryption hardware 108 and the authentication hardware 110), both of which receive data packets over the plurality of channels 106. The encryption / decryption hardware 108 performs computations on the received packets using the cryptographic key and stores the results in the buffer 124. The authentication hardware 110 then uses the stored computations from the buffer 124 to carry out further processing. The key advantage of the system 100 is that the logic circuitry 116 of the encryption / decryption hardware 108 is decoupled from the logic circuitry 122 of the authentication hardware 110 through the use of the buffer 124, allowing for separate allocation of packets into channels based on different criteria. Additionally, the system 100 enables parallel processing, where one subset of the plurality of channels 106 processes the existing packet while another subset begins processing the new packet in the same clock cycle. Furthermore, the separation of the plurality of Galois field multipliers 118 from the exclusive OR circuit assembly 120 in the authentication hardware 110 enhances efficiency and reduces the required resources. These features work together to achieve a reduction in authentication area for encryption and decryption, resulting in improved performance and resource utilization in GCM implementations. The system 100 enables parallel implementation of GHASH, a prominent component of GCM, which reduces the number of required channels. The reduction in channels leads to significant gate savings, resulting in a reduction in authentication area for encryption and decryption processes. Due to the reduction in authentication area for encryption and decryption processes, the system 100 manifests an enhanced efficiency in terms of space and energy consumption. The system's parallel implementation of GHASH reduces the physical layout and power consumption while maintaining performance metrics. The system 100 is particularly beneficial for applications that require secure encryption / decryption, such as mobile devices, IoT devices, and compact cryptographic systems.
[0089] FIG. 2 is a flowchart of a method of performing channel allocation in a hardware implementation of encryption / decryption functionality and authentication functionality of GCM implementation, in accordance with an embodiment of the present disclosure. FIG. 2 is described in conjunction with the FIGs. 1A and 1B. With reference to FIG. 2, there is shown a method 200 of performing channel allocation in a hardware implementation of encryption / decryption functionality and authentication functionality of GCM implementation. The method 200 includes steps 202 to 208. The system 100 (of FIG. 1) is configured to execute the method 200.
[0090] There is provided the method 200 comprising performing channel allocation in a hardware implementation of encryption / decryption functionality and authentication functionality of the GCM. The method 200 involves receiving packets of data across the plurality of channels 106 and performing cryptographic computations on each packet, beginning with the first packet processed using a cryptographic key generated by the key generator component 112. The cryptographic computations are then temporarily stored in the buffer 124, which is used to decouple the logic circuitry 116 of the encryption / decryption hardware 108 from the logic circuitry 122 of the authentication hardware 110. The decoupled logic allows for parallel processing, enabling each functionality to allocate packets to channels based on distinct criteria. The method 200 further ensures that, within a single clock cycle, some of the channels handle existing packets while other channels begin processing newly received packets, thereby enhancing throughput within the single clock cycle. The method 200 further comprises implementing authentication functionality using the plurality of Galois field multiplications independently from summing of data from multiple channels using exclusive OR operations. The method 200 further includes summing data from a first plurality of channels associated with an existing packet using a first exclusive OR operation, and summing data from a second plurality of channels associated with a new packet using a second exclusive OR operation.
[0091] At step 202, the method 200 comprises receiving packets of data over the plurality of channels 106. The plurality of channels 106 provides parallel data paths, allowing several packets to be received simultaneously. By utilizing the plurality of channels 106, the method 200 ensures efficient data flow, minimizes data congestion and reduces latency.
[0092] At step 204, the method 200 further comprises performing, via the hardware implementation of the encryption / decryption functionality of the GCM, computations on a first packet received over a first of the plurality of channels 106, using a first cryptographic key from the key generator component 112 of the hardware implementation of the encryption / decryption functionality of the GCM. The cryptographic computations are executed using the encryption / decryption hardware 108. The first packet undergoes an encryption / decryption process based on the specified mode of operation, leveraging the optimized circuitry of the encryption / decryption hardware 108 to ensure secure and efficient processing. The use of the first cryptographic key ensures that the data within the packet is securely encrypted for transmission and securely decrypted upon receipt for further use, aligning with the overall cryptographic workflows.
[0093] At step 206, the method 200 further comprises storing the computations in the buffer 124. The encryption / decryption computations are performed on the received packet; thereafter, the computed data is stored in the buffer 124 for temporary retention. The buffer 124 serves as an intermediary storage component, ensuring that the computed data is readily available for subsequent operations. By temporarily storing the computed data, the buffer 124 manages the timing and coordination between different stages of processing, allowing the method 200 to efficiently retrieve and utilize the data for further steps, such as data authentication or additional cryptographic tasks.
[0094] At step 208, the method 200 further comprises using the stored computation from the buffer 124 to carry out processing using hardware implementation of the authentication functionality of the GCM. Once the cryptographic computations have been stored in the buffer 124, the authentication hardware 110 retrieves the computations from the buffer 124 to perform integrity checks and verification processes. The authentication functionality applies specific algorithms, such as Galois field (GF) multiplications, to validate the authenticity of the data. By leveraging the stored computations from the buffer 124, the method 200 ensures that the authentication functionality operates independently from the encryption / decryption functionality while maintaining data continuity.
[0095] The logic circuitry 116 of the hardware implementation of the encryption / decryption functionality and the logic circuitry 122 of the hardware implementation of the authentication functionality are decoupled via the use of the buffer 124. The decoupling allows the encryption / decryption hardware 108 to complete its computations and store the results in the buffer 124, while the authentication hardware 110 retrieves the data as required for further processing.
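The decoupling described above can be pictured with a small software model. The following is only an illustrative sketch, not the disclosed hardware: the class name `DecoupledPipeline`, its methods, and the tuple format of the stored computations are hypothetical, and a Python `deque` merely stands in for the buffer 124.

```python
from collections import deque


class DecoupledPipeline:
    """Toy model: an encrypt stage and an authenticate stage joined by a FIFO."""

    def __init__(self):
        self.fifo = deque()        # stands in for the buffer 124 (assumption)
        self.authenticated = []    # packet ids handled by the second stage

    def encrypt_stage(self, packet_id, blocks):
        # Stage 1 finishes its work and stores the result. It never waits
        # for the authentication stage.
        computed = [("enc", packet_id, block) for block in blocks]
        self.fifo.append((packet_id, computed))

    def authenticate_stage(self):
        # Stage 2 pulls the oldest stored computation, if there is one.
        if not self.fifo:
            return None
        packet_id, computed = self.fifo.popleft()
        self.authenticated.append(packet_id)
        return packet_id, len(computed)
```

Because each stage touches only the queue, either side can run ahead of or behind the other, which is the property the buffer provides between the two hardware implementations.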
[0096] The hardware implementation of the encryption / decryption functionality and the hardware implementation of the authentication functionality each separately allocate packets into the plurality of channels 106 using respectively separate criteria. The encryption / decryption hardware 108 may distribute packets on the plurality of channels 106 based on size, priority, or encryption algorithm, while the authentication hardware 110 may distribute packets on the plurality of channels 106 based on tag size, computational complexity, or security requirements.
[0097] Within a single clock cycle, a first subset of the plurality of channels 106 processes an existing packet while a second subset of the plurality of channels 106 begins processing a new packet, received after the existing packet. That is, within the single clock cycle, the method 200 divides the available channels into two subsets: the first subset processes the existing data packet, while the second subset begins processing the new data packet received after the existing data packet.
[0098] In the hardware implementation of the authentication functionality, the plurality of Galois field multipliers 118 is separated from the exclusive OR circuit assembly 120 that sums data from each of the plurality of channels 106, wherein the first subset of the plurality of channels 106 are summed for the existing packet by a first exclusive OR circuit 120A of the exclusive OR circuit assembly 120 and the second subset of the plurality of channels 106 are summed for the new packet by a second exclusive OR circuit 120B of the exclusive OR circuit assembly 120. Decoupling the plurality of Galois field multipliers 118 from the exclusive OR circuit assembly 120 enables more efficient parallel processing of multiple data streams. Additionally, the method 200 can introduce extended multiplexers to choose H values for the new packet, which are calculated using the existing Galois field multipliers.
[0099] In accordance with an embodiment, the hardware implementation of the encryption / decryption functionality uses the hardware multiplexer 114 to make a determination as to whether a received packet, received over one of the plurality of channels 106, is a new packet or a previously received packet, and if the received packet is determined to be a new packet, the key generator component 112 of the hardware implementation of the encryption / decryption functionality is used to perform a computation during encryption / decryption. The purpose of using the hardware multiplexer 114 in the encryption / decryption functionality is to efficiently determine whether the received packet is new or previously received. By allocating two channels for each packet, the method 200 can perform the required calculations for encryption and decryption. This approach leads to channel saving, which is equivalent to a significant number of gates, resulting in a reduction of, for example, 25% in the authentication area for encryption and decryption processes.
[0100] In accordance with an embodiment, if the received packet is determined to be a previously received packet, stored computations from the buffer 124 are used during encryption / decryption. This computation involves allocating channels for counter calculation and H calculation, depending on the residue of the current packet. Allocating channels based on the residue of the current packet allows for optimized resource utilization and enables seamless processing of subsequent packets within the same clock cycle, whenever possible.
[0101] In accordance with an embodiment, the hardware implementation of the encryption / decryption functionality operates over 128-bit data blocks. The 128-bit block size allows more data to be encrypted or decrypted in a single operation, reducing the number of total operations required for large datasets.
[0102] In accordance with an embodiment, the hardware implementation of the authentication functionality uses a finite field multiplication operation. By implementing the finite field multiplication operation in the authentication hardware 110, the method 200 can achieve both high security and improved efficiency, making the finite field multiplication suitable for real-time applications.
[0103] In accordance with an embodiment, the buffer 124 is a First-In-First-Out, FIFO, buffer. The FIFO buffer serves as a temporary storage mechanism for computations and data packets, ensuring orderly processing in the sequence of arrival. By employing the FIFO architecture, the buffer 124 maintains the chronological integrity of incoming data, required for preserving the order of operations in the GCM implementation.
[0104] In accordance with an embodiment, the GCM is a mode of operation of symmetric key cryptographic block ciphers. The method 200 is configured to utilize the GCM as the mode of operation for symmetric key cryptographic block ciphers. The GCM mode involves advancing a counter for each subsequent block in the payload. Each counter value is encrypted using the AES and XORed with the respective payload block, creating the ciphertext. The use of GCM as the mode of operation for symmetric key cryptographic block ciphers offers enhanced security and data integrity.
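The counter handling described in this paragraph can be sketched as follows. This is a minimal illustration under stated assumptions: `toy_block_cipher` is a placeholder built from truncated SHA-256 and is not AES, and the helper names `inc32` and `ctr_xor` are hypothetical. The increment of the rightmost 32 bits of the counter block follows the GCM specification (NIST SP 800-38D).

```python
import hashlib


def inc32(counter_block: bytes) -> bytes:
    """Increment the rightmost 32 bits of a 16-byte counter block, modulo 2**32."""
    prefix = counter_block[:12]
    ctr = int.from_bytes(counter_block[12:], "big")
    return prefix + ((ctr + 1) % (1 << 32)).to_bytes(4, "big")


def toy_block_cipher(key: bytes, block: bytes) -> bytes:
    # ASSUMPTION: placeholder for AES, NOT a real cipher. Truncated SHA-256
    # is used only so that the sketch runs with the standard library.
    return hashlib.sha256(key + block).digest()[:16]


def ctr_xor(key: bytes, initial_counter: bytes, payload: bytes) -> bytes:
    """Encrypt or decrypt: XOR each 16-byte payload block with E(key, counter)."""
    out = bytearray()
    counter = initial_counter
    for offset in range(0, len(payload), 16):
        keystream = toy_block_cipher(key, counter)
        chunk = payload[offset:offset + 16]
        out.extend(p ^ k for p, k in zip(chunk, keystream))
        counter = inc32(counter)          # advance the counter for the next block
    return bytes(out)
```

Since each block depends only on its own counter value, the blocks can be handed to separate channels and processed in the same cycle; applying `ctr_xor` twice with the same key and initial counter returns the original payload.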
[0105] In accordance with an embodiment, the packets are Ethernet packets. The method 200 is configured to process Ethernet packets by allocating channels for encryption logic. Each arriving packet requires the allocation of two channels, one for counter0 calculation and one for H calculation.
[0106] In accordance with an embodiment, the hardware implementation of the encryption / decryption functionality processes the received packets at an operating frequency of 1.2 GHz. The processing of the received packets at the operating frequency of 1.2 GHz results in a significant reduction in authentication area for encryption and decryption processes. Operating at 1.2 GHz allows the encryption / decryption hardware 108 to handle high-throughput data streams efficiently, processing Ethernet packets at line rate while maintaining security.
[0107] The steps 202 to 208 are only illustrative, and other alternatives can also be provided where one or more steps are added, or one or more steps are provided in a different sequence without departing from the scope of the claims herein.
[0108] There is provided a computer program comprising instructions for carrying out all the steps of the method 200. The computer program is executed on a computer system. The computer program is implemented as an algorithm, embedded in software stored in a non-transitory computer-readable storage medium having program instructions stored thereon, the program instructions being executable by the one or more processors in the computer system to execute the method 200. The non-transitory computer-readable storage medium may include, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. Examples of implementation of the computer-readable storage medium include, but are not limited to, an Electrically Erasable Programmable Read-Only Memory (EEPROM), a Random Access Memory (RAM), a Read Only Memory (ROM), a Hard Disk Drive (HDD), a Flash memory, a Secure Digital (SD) card, a Solid-State Drive (SSD), and / or a CPU cache memory.
[0109] FIG. 3 illustrates a parallel GCM architecture with four channels for high-performance encryption and authentication, in accordance with an embodiment of the present disclosure. FIG. 3 is described in conjunction with the FIGs. 1A, 1B, and 2. With reference to FIG. 3, there is shown a parallel GCM architecture 300 with four channels for high-performance encryption and authentication. The parallel GCM architecture 300 operates in two modes of operation namely, a first mode of operation 302 which describes AES encryption process and a second mode of operation 304 which describes GHASH authentication process.
[0110] Each AES channel handles 128-bit data blocks, allowing the system 100 to process 64 bytes per cycle. In the first mode of operation 302, the AES encryption process starts with the encryption key (i.e., the cryptographic key) generated by the key generator component 112. The key generator component 112 expands the encryption key across all AES channels (e.g., Y1, Y2, Y3, and Y4), ensuring that the same key is used consistently across different AES channels and rounds. The system 100 uses four data inputs, such as Din 1, Din 2, Din 3, and Din 4. Each data input represents a 128-bit block of incoming data for one of the four parallel channels. Each data input is fed into one of the four AES encryption channels. Afterward, each input data block undergoes a masking operation before being passed into the AES encryption process. The masking function adds an extra layer of security to the data input by masking the raw input (i.e., data input) before encryption, ensuring that sensitive information remains protected during processing. Each channel (Y1, Y2, Y3, and Y4) comprises a dedicated AES encryption block, where a 128-bit data block is encrypted. The AES process involves multiple rounds (e.g., up to 14 rounds) of data transformation using the key generated by the key generator component 112. Each round manipulates the data based on the key to produce an encrypted output. The outputs from the four AES channels are labelled as Dout 1, Dout 2, Dout 3, and Dout 4. After the AES encryption process, another masking operation is applied to the encrypted data. This second masking operation helps to maintain data security and prevent any exposure of sensitive information after the encryption is complete. The encrypted outputs (i.e., Dout 1, Dout 2, Dout 3, and Dout 4) from the AES channels are passed to the second mode of operation 304 of the GHASH authentication process.
[0111] The second mode of operation 304 deals with GHASH authentication process which begins with using the plurality of the Galois field multipliers 118 required for computation of the authentication tag. The output from the four AES channels is used in the second mode of operation 304, where mathematical operations are performed to generate authentication tags. Within the GHASH authentication process, each encrypted output (Dout 1, Dout 2, Dout 3, and Dout 4) is processed in parallel using the plurality of Galois Field multipliers 118. The GHASH authentication process calculates the authentication tag, which is required for validating the integrity of the data. The encrypted outputs are accumulated using registers (Reg 1, Reg 2, Reg 3, and Reg 4) , and a final output is computed through a hash function (H4 and H) .
[0112] Moreover, the AES encryption process is decoupled from the GHASH authentication process via the use of the buffer 124, which has been described in detail, for example, in FIG. 1B. The parallel GCM architecture 300 enables a cryptographic system (e.g., the system 100) to process 64 bytes per cycle (i.e., 16 bytes per channel). The parallel GCM architecture 300 is designed to handle high-speed Ethernet streams at 1.2 GHz, which corresponds to approximately 83.333 bytes per clock cycle for an 800 Gbps input stream. Given that each AES channel can process 16 bytes per cycle, the system 100 requires the plurality of channels 106 to operate in parallel to meet this demand. In the case of handling a stream of 129-byte packets, two channels are used for authentication while the remaining channels are used for data processing. To maintain line-rate performance, at least 6 parallel AES and GHASH channels are required.
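The figures quoted in this paragraph follow from simple arithmetic, sketched below. The constant names are illustrative only, and the calculation covers raw payload bytes alone, without any per-packet overhead channels.

```python
import math

LINE_RATE_BPS = 800e9      # 800 Gbps input stream
CLOCK_HZ = 1.2e9           # 1.2 GHz operating frequency
BYTES_PER_CHANNEL = 16     # one 128-bit block per channel per cycle

# Bytes that arrive during one clock cycle at line rate.
bytes_per_cycle = LINE_RATE_BPS / 8 / CLOCK_HZ

# Smallest whole number of 16-byte channels that keeps up with that rate.
min_channels = math.ceil(bytes_per_cycle / BYTES_PER_CHANNEL)

print(f"{bytes_per_cycle:.3f} bytes/cycle, at least {min_channels} channels")
# prints "83.333 bytes/cycle, at least 6 channels"
```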
[0113] FIG. 4 illustrates a data packet scheduling or transmission scheme across a plurality of channels over several clock cycles, in accordance with an embodiment of the present disclosure. FIG. 4 is described in conjunction with elements from FIGs. 1A, 1B, 2 and 3. With reference to FIG. 4, there is shown a data packet scheduling scheme 400 that allows for efficient handling of multiple data packets across six channels over six clock cycles. The six channels are represented as Channel #0, Channel #1, Channel #2, Channel #3, Channel #4 and Channel #5, and the six clock cycles are represented as Cycle #0, Cycle #1, Cycle #2, Cycle #3, Cycle #4 and Cycle #5. Each cell of the data packet scheduling scheme 400 includes a “packet X” where X can be N-1, N, N+1, up to N+4, which represents data packets in a sequential order. A few data packets are marked with the keyword “curr”, which indicates that the current packet is being processed, and a few data packets are marked with the keyword “new”, which indicates that a new packet is being processed.
[0114] The system 100 for communication (of FIG. 1) is configured to employ the data packet scheduling scheme 400. The data packet scheduling scheme 400 allows the start of a new packet to be processed simultaneously with the end of the current packet within the same clock cycle. This approach maximizes throughput and efficiency of the system 100. In the data packet scheduling scheme 400, two entirely new packets cannot be processed in the same clock cycle, which maintains a balance between efficiency and processing capacity. The notation "Packet N (curr)" indicates the current packet being processed, while "Packet N+1 (new)" represents a newly arriving packet. Furthermore, a packet marked as "new" in one cycle becomes the "current" packet in the subsequent cycle. This is evident in the progression from cycle to cycle in the data packet scheduling scheme 400. The system 100 is configured to use counters for encryption, which are fed into the logic with indicators specifying whether the counters belong to the current packet (using current key expansions) or to a new packet (requiring next key expansions). The data packet scheduling scheme 400 effectively demonstrates how packets flow through the channels over time, with new packets being introduced as current packets are completed, ensuring continuous processing without gaps. This approach allows for a seamless transition between packets, optimizing the use of available processing resources and minimizing idle time in the encryption logic. The data packet scheduling scheme 400 is particularly useful in scenarios requiring continuous, high-throughput packet processing and encryption of high-speed data streams.
[0115] The data packet scheduling scheme 400 initiates with the Channel #0 loaded with the packet N-1 for the clock cycle #0, while Channels #1 to Channel #5 introduce new Packet N. During Cycle #1, Channel #0 introduces its new Packet N, as the other channels transition to Packet N+1. From Cycles #2 through Cycle #5, all channels synchronize, transmitting the same packet sequence progressing from N+1 to N+3. New packets (designated as N) are introduced in Cycle #0 and Cycle #1. The packet designation increases with each cycle (N-1 → N → N+1 → N+2 → N+3) , representing the progression of packets through the transmission queue. Channel #0 operates on a one-cycle delay compared to the other channels when introducing new packets. The staggering creates an offset in packet transmission across the six channels, potentially allowing for more efficient use of network resources and improved overall performance of the system 100 for communication.
[0116] FIG. 5 illustrates a flow of operations in an Advanced Encryption Standard (AES) encryption process for a single channel, in accordance with an embodiment of the present disclosure. FIG. 5 is described in conjunction with elements of FIGs. 1A, 1B, 2, 3 and 4. With reference to FIG. 5, there is shown a schematic diagram 500 that depicts a flow of operations in an AES encryption process for a single channel.
[0117] As described with reference to FIG. 4, a key expansion is required for each packet, and the data packet scheduling scheme 400 does not allow two new packets to be processed within the same clock cycle. The expanded key for the current packet can therefore be taken from a sample registered in the previous cycle, while the key expansion logic is configured to calculate an expanded key for a new packet. The curr / next indication, as shown in FIG. 5, is used to decide from where to take the key expansion values (from the key expansion logic (i.e., the key expansion 502 block) or from a sample of the logic from the previous cycle) and thus saves the implementation of another key expansion logic.
[0118] The key expansion process begins with an input “Key” feeding into a key expansion 502 block. The key expansion 502 block is configured to generate multiple round keys (or expanded keys), represented by "Round #1 Sample 504" and "Round #2 Sample 504A". Thereafter, key sample selection is performed using hardware multiplexers, such as the hardware multiplexer 114. The multiple round keys (or expanded keys), that is, "Round #1 Sample 504" and "Round #2 Sample 504A", are passed through the hardware multiplexer 114. The hardware multiplexer 114 can be controlled by a "Curr / next" signal, allowing selection between current and next round keys. The "Curr / next" signal is driven into the hardware multiplexer 114, which has two inputs: one input straight from the key expansion 502 block (for new data packets) and one input from a key sample (for the current data packet). The selected key samples feed into another set of multiplexers, which also receive input from "Sample" blocks. These likely represent the intermediate states of data being encrypted. The core encryption happens in an "AES round 506" block. The AES round 506 block receives inputs from the key sample multiplexers and a "Text" input, presumably the plaintext to be encrypted. The schematic diagram 500 shows XOR operations (represented by ⊕ symbols) before and after the AES round 506 block, integrating the round keys with the data. A "Key length" input is also shown in the schematic diagram 500, which likely determines the number of rounds in the AES process. The AES encryption process produces multiple "Sample" outputs, representing the encrypted data at various stages. The schematic diagram 500 emphasizes the iterative nature of the AES encryption process, the significance of key expansion and round key selection, and the integration of expanded keys with the data being encrypted. The multiplexers (i.e., the hardware multiplexer 114) allow for efficient reuse of key samples across multiple encryption rounds or channels, potentially optimizing the hardware implementation.
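The selection performed by the "Curr / next" signal can be modelled in a few lines. This is a behavioural sketch only: the function name `select_round_keys`, its parameters, and the representation of round keys as a list are hypothetical and are not taken from FIG. 5.

```python
def select_round_keys(curr_next, sampled_keys, expand_key, new_key=None):
    """Model of the Curr / next multiplexer feeding an AES channel.

    curr_next    -- "curr" or "next", the per-counter indicator
    sampled_keys -- round keys registered (sampled) in a previous cycle
    expand_key   -- callable modelling the single key expansion block
    new_key      -- key of the newly arrived packet (needed for "next")
    """
    if curr_next == "curr":
        # Current packet: reuse the registered sample, no recomputation.
        return sampled_keys
    if curr_next == "next":
        # New packet: the one key expansion block computes fresh round keys.
        return expand_key(new_key)
    raise ValueError("curr_next must be 'curr' or 'next'")
```

Because at most one new packet starts per clock cycle, a single `expand_key` instance is enough in this model; every channel still working on the current packet reads the sample instead.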
[0119] Additionally, the AES round 506 block refers to one iteration of a series of operations that transform the input data (i.e., plaintext) into output data (i.e., ciphertext) or vice versa during the AES encryption / decryption process. The AES encryption / decryption process works in a series of rounds, where each round applies a set of transformations to the data to secure it. The National Institute of Standards and Technology (NIST) standard specifies three instantiations of AES: AES-128, AES-192, and AES-256, where the suffix indicates the bit length of the key. The key length refers to the size of the encryption key used in the AES encryption process. The key length determines the strength of the encryption and the number of rounds performed during the AES encryption / decryption process. The block size (i.e., the length of the data inputs and outputs) is 128 bits in each of the three AES cases. The cipher blocks go through 10 to 14 rounds of fixed transformation. The number of rounds depends on the size of the key used, such as 10 rounds for a 128-bit key (or AES-128), 12 rounds for a 192-bit key (or AES-192), and 14 rounds for a 256-bit key (or AES-256). In each of the rounds of the AES encryption process, the key is manipulated by the key expansion 502 block. Each key expansion is used as logic for the next round of encryption. The key expansion of a certain round is identical for all cipher blocks that belong to the same packet (since the same key applies to all blocks in a single packet, the key expansion is the same as well). The AES encryption process uses 4 data manipulation functions in each round, that is, SubBytes, ShiftRows, MixColumns and AddRoundKey (i.e., XOR data with the relevant key expansion).
[0120] For each counter (as mentioned, the advanced counter is to be used as a cipher text) the AES runs the following steps:
[0121] 1. Get the counter;
[0122] 2. Call AddRoundKey on counter // first round;
[0123] 3. Call SubBytes on counter;
[0124] 4. Call ShiftRows on counter;
[0125] 5. Call MixColumns on counter;
[0126] 6. Call AddRoundKey on counter;
[0127] 7. Repeat steps 3 –6:
[0128] 7.1. For AES 128, 8 times;
[0129] 7.2. For AES 192, 10 times;
[0130] 7.3. For AES 256, 12 times;
[0131] 8. Call SubBytes on counter;
[0132] 9. Call ShiftRows on counter;
[0133] 10. Call AddRoundKey on counter.
[0134] Counter is now the cipher text.
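The step list above can be checked against the round counts given earlier (10, 12 and 14 rounds). The sketch below only generates the order of the transformations and does not implement them; the function name `aes_step_schedule` is illustrative.

```python
ROUNDS = {128: 10, 192: 12, 256: 14}   # rounds per key length


def aes_step_schedule(key_bits: int) -> list:
    """Return the ordered list of transformations applied to one counter block."""
    rounds = ROUNDS[key_bits]
    steps = ["AddRoundKey"]                        # step 2: initial key addition
    for _ in range(rounds - 1):                    # steps 3 to 6, plus the repeats of step 7
        steps += ["SubBytes", "ShiftRows", "MixColumns", "AddRoundKey"]
    steps += ["SubBytes", "ShiftRows", "AddRoundKey"]   # steps 8 to 10: no MixColumns
    return steps


for bits in (128, 192, 256):
    schedule = aes_step_schedule(bits)
    repeats = schedule.count("MixColumns") - 1     # extra passes over steps 3 to 6
    print(bits, schedule.count("AddRoundKey"), repeats)
# prints:
# 128 11 8
# 192 13 10
# 256 15 12
```

The second column is the number of round keys consumed per block (one more than the number of rounds), and the third column matches the repeat counts of step 7.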
[0135] In the AES encryption process, each block of data is encrypted separately from the rest of the data blocks; therefore, the AES encryption process is suitable for parallelization. Instead of implementing another key expansion logic for another concurrent packet, the proposed AES encryption process incorporates 14×128-bit registers and 2×128-bit 14×multiplexers (Muxes) to achieve the same functionality. This addition is more than offset by the over 1M gates that are saved by removing two channels.
[0136] An embodiment of the disclosure provides a packet processing scheme across multiple channels (for example, six channels) over multiple clock cycles. The packet processing scheme is described via different scenarios of packet processing across six channels, and an exemplary scenario of a packet processing.
[0137] Four cases represent different scenarios of packet processing across the 6 channels. This embodiment also focuses on a specific example over two clock cycles, illustrating how a cryptographic system (e.g., the system 100) handles packet transitions.
[0138] One part of this embodiment of packet processing demonstrates how the system 100 is configured to manage packet processing and channel allocation to maximise throughput. The four different cases for packet processing are demonstrated.
[0139] Case 1: the residue of the current packet requires 1 channel (i.e., Channel 1), and the last 3 channels (i.e., Channel 4, Channel 5 and Channel 6) can be used for the next packet within the same clock cycle.
[0140] Case 2: the residue of the current packet requires 2 channels (i.e., Channel 1 and Channel 2), and the last 3 channels (i.e., Channel 4, Channel 5 and Channel 6) can be used for the next packet within the same clock cycle.
[0141] Case 3: the residue of the current packet requires 3 channels (i.e., Channel 1, Channel 2 and Channel 3), and the last 3 channels (i.e., Channel 4, Channel 5 and Channel 6) can be used for the next packet within the same clock cycle.
[0142] Case 4: the residue of the current packet requires 4-6 channels (e.g., Channel 1, Channel 2, Channel 3 and Channel 4), and the next packet is not processed within the same clock cycle.
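The four cases can be expressed as one allocation rule. The following is a minimal sketch; the function name `allocate` and the use of tuples of channel numbers are illustrative choices, not part of the disclosure.

```python
NUM_CHANNELS = 6
NEXT_PACKET_CHANNELS = (4, 5, 6)   # the "last 3 channels" of Cases 1 to 3


def allocate(residue_channels: int):
    """Return (channels for the current packet, channels for the next packet)."""
    if not 1 <= residue_channels <= NUM_CHANNELS:
        raise ValueError("residue must occupy between 1 and 6 channels")
    current = tuple(range(1, residue_channels + 1))
    # Cases 1 to 3: the residue fits in the first three channels, so the last
    # three can start the next packet. Case 4: no room in this cycle.
    nxt = NEXT_PACKET_CHANNELS if residue_channels <= 3 else ()
    return current, nxt
```

For example, `allocate(2)` returns `((1, 2), (4, 5, 6))`, while `allocate(5)` returns `((1, 2, 3, 4, 5), ())`, so the next packet waits for the following cycle.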
[0143] One part of this embodiment demonstrates a specific example of processing a data packet of, for example, 170 bytes. The data packet requires GHASH processing of 144 bytes (170-26). With six channels of 16 bytes each, this means 6 channels are used in the first clock cycle of processing the data packet (96 bytes) and 3 channels are used in the second clock cycle of processing (the remaining 48 bytes). Since there are 3 available channels in the second cycle, if there is another arriving packet waiting to be processed, then the last 3 channels are used for the next data packet.
[0144] Clock 1: All 6 channels are processing the data packet (or first data packet) .
[0145] Clock 2: The first 3 channels complete the processing of the first packet, while the last 3 channels begin processing the next packet.
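The two-clock split for the 144 bytes of GHASH processing can be reproduced with a short calculation, assuming six channels that each take one 16-byte block per cycle. The function name `channels_per_cycle` is illustrative.

```python
def channels_per_cycle(ghash_bytes: int, num_channels: int = 6, block_bytes: int = 16):
    """Return how many channels a packet occupies in each successive clock cycle."""
    blocks = -(-ghash_bytes // block_bytes)        # ceiling division
    usage = []
    while blocks > 0:
        used = min(blocks, num_channels)
        usage.append(used)
        blocks -= used
    return usage


usage = channels_per_cycle(144)
free_in_last_cycle = 6 - usage[-1]
print(usage, free_in_last_cycle)   # prints "[6, 3] 3"
```

The three channels left free in the second cycle are the ones that can begin the next packet.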
[0146] The packet processing scheme of this embodiment maximizes efficiency by allowing immediate transition to the next packet when possible, reducing idle time and optimizing channel utilization in packet processing.
[0147] Furthermore, in order to enhance the throughput of the system 100 for communication, the GHASH authentication process can be divided into two parts: first, the Galois field H multiplication (the computationally intensive part), and second, the XOR addition of the multiplication results. By decoupling these operations, a low-cost addition to the XOR scheme can be introduced, enabling the computation of two packets in the same cycle (the end of the current packet and the beginning of the next one). After this separation, 6 channels are available in each clock cycle as Galois field multiplication channels (GF channels) with corresponding XOR addition channels (current XOR channels). For concurrent processing of a subsequent packet, an additional 3 XOR channels (subsequent XOR channels) are introduced. This approach allows for more efficient utilization of the Galois field multipliers, which are the most resource-intensive components in the authentication hardware 110. For this, the system 100 is configured to operate in two modes:
[0148] Case 1: when all channels are in use or no new packet has arrived, the output from all GF channels is directed to the current XOR channels.
[0149] Case 2: when a new packet arrives and at least 3 channels are idle, data from the last three GF channels is redirected to the subsequent XOR channels.
[0150] Such design enhances overall throughput by maximizing the use of available resources and allowing for seamless transition between packets.
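The two routing modes can be sketched as a function that takes the six GF channel outputs for one clock cycle and returns the two XOR sums. This is a behavioural model under assumptions: the name `route_gf_outputs` is hypothetical, 128-bit values are represented as Python integers, and a GF channel that is idle in a cycle is assumed to contribute zero.

```python
from functools import reduce
from operator import xor


def route_gf_outputs(gf_outputs, new_packet_arrived: bool, idle_channels: int):
    """Return (current_sum, subsequent_sum) for one clock cycle.

    gf_outputs -- six integers, one per GF channel (an idle channel yields 0)
    """
    if new_packet_arrived and idle_channels >= 3:
        # Case 2: the last three GF channels feed the subsequent XOR channels.
        current, subsequent = gf_outputs[:3], gf_outputs[3:]
    else:
        # Case 1: every GF output belongs to the current packet.
        current, subsequent = gf_outputs, []
    return reduce(xor, current, 0), reduce(xor, subsequent, 0)
```

Only the cheap XOR stage is duplicated in this model; the six multipliers are shared between the two packets.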
[0151] One part of this embodiment further describes the use of each channel as a GF channel. Each channel is a reference GHASH channel that multiplies an encrypted 16-byte data block with a respective power of a hash key (H). H is a reference hash key. The hash key is created by encrypting 128 zero bits with a respective key. The power of H is calculated by Galois field multiplication with H. The GF multiplier is a Galois field multiplier that is implemented with around 50K gates. The addition in the context of GHASH calculation is exclusive OR over 128 bits. For instance, in the first clock cycle, GF channels utilize values from H to H^6, engaging six GF multipliers for the current packet. In the second clock cycle, while GF Channel 1 continues to process the value H for the current packet, GF Channels 4, 5, and 6 are allocated to handle H values of the next packet (H', H'^2, and H'^3). In the first clock cycle, when all channels are in use or no new packet has arrived, output from all GF channels is directed to the current exclusive OR channels for immediate processing. In the second clock cycle, if a new packet arrives while at least three channels are idle, data from the last three GF channels is rerouted to the subsequent XOR channels for processing. The flexible routing strategy allows the system 100 to quickly adapt to varying payloads and packet sizes. By implementing the structured approach to packet handling and channel utilization, the system 100 achieves efficient and parallel processing of data packets.
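The reason each channel can use its own power of H is that the usual block-by-block GHASH recurrence expands into a sum of independent products. The sketch below illustrates this in software under stated assumptions: `gf_mul` follows the bitwise multiplication algorithm of NIST SP 800-38D on Python integers, while `ghash_serial` and `ghash_parallel` are hypothetical helper names, and length blocks and additional authenticated data are left out.

```python
R = 0xE1 << 120          # reduction constant for the GCM field polynomial
ONE = 1 << 127           # multiplicative identity in GCM's bit ordering


def gf_mul(x: int, y: int) -> int:
    """Multiply two 128-bit field elements (bitwise algorithm, NIST SP 800-38D)."""
    z, v = 0, y
    for i in range(128):
        if (x >> (127 - i)) & 1:       # bit i of x, counted from the left
            z ^= v
        v = (v >> 1) ^ R if v & 1 else v >> 1
    return z


def ghash_serial(h: int, blocks) -> int:
    """Reference form: fold one block per step, one multiplier in a loop."""
    y = 0
    for x in blocks:
        y = gf_mul(y ^ x, h)
    return y


def ghash_parallel(h: int, blocks) -> int:
    """Channel form: each block is multiplied by its own power of H, then XORed."""
    n = len(blocks)
    powers = [h]                                   # powers[k] holds H^(k+1)
    for _ in range(n - 1):
        powers.append(gf_mul(powers[-1], h))
    acc = 0
    for i, x in enumerate(blocks):
        acc ^= gf_mul(x, powers[n - 1 - i])        # first block gets the highest power
    return acc
```

With six blocks, `ghash_parallel` uses H^6 down to H, one product per channel, and returns the same value as `ghash_serial`. The products do not depend on one another, so only the final XOR has to know which packet a product belongs to.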
[0152] FIGs. 6A, 6B, and 6C collectively illustrate a schematic diagram of GHASH implementation, in accordance with an embodiment of the present disclosure. FIGs. 6A, 6B, and 6C are described in conjunction with elements from FIGs. 1A, 1B, 2, 3, 4, and 5. With reference to FIGs. 6A, 6B and 6C, there is shown a schematic diagram 600 that depicts the GHASH implementation in a parallel GCM architecture. The schematic diagram 600 represents the buffer 124, a data aligner and control 602, an integrity check value (ICV) sampling 604, the plurality of Galois field multipliers 118 (represented as “H”), and an exclusive OR operation performed by the exclusive OR circuit assembly 120.
[0153] The initial processing is shown in FIG. 6A. The data aligner and control 602 for arranging and synchronizing data is shown in FIG. 6B. The data packet processing, GF channels, control and synchronization, and output results are shown in FIG. 6C.
[0154] The data aligner and control 602 is a component in the parallel GCM architecture, responsible for organizing and preparing incoming data streams for parallel processing. The data aligner and control 602 receives data, including additional authenticated data (AAD) and packet data, typically in large chunks (such as 1024-bit segments). The data aligner and control 602 breaks the large chunks into smaller, manageable data packets that can be fed into the plurality of the Galois field multipliers 118 and other processing channels. By aligning the data into the correct format and ensuring that the data is synchronized across all processing stages, the data aligner and control 602 ensures smooth data flow throughout the encryption and authentication process.
[0155] The ICV sampling 604 refers to a process that verifies and ensures the integrity of processed data. The ICV sampling 604 is positioned at the culmination of the data flow, receiving inputs from various processing stages including the buffer 124 and the Galois field (GF) operations. The operation of the GCM architecture is initialised by receiving multiple input data packets. The input data packets are immediately processed using two buffers, and each buffer corresponds to the buffer 124 (FIFO). The buffer 124 can be configured to manage and organize the incoming data packets and temporarily store the data packets. The outputs from the buffer 124 are then strategically routed toward the central processing sections, including the ICV sampling 604. The ICV sampling 604, positioned at the top of the schematic diagram 600, receives processed data from the buffer 124 and other system components, performing critical integrity checks and final data validation before providing an output. FIG. 6A also represents “Partial sum for previous iteration” and “Partial sum for current iteration”, which perform cumulative calculations on the data streams, possibly aggregating or summarizing information from multiple channels.
[0156] Now referring to FIG. 6B, which represents the parallel processing of cryptographic data using the plurality of the GF multipliers 118 and XOR channels. After the data is aligned and prepared for further processing by the data aligner and control 602, the data is divided across multiple channels for simultaneous processing. The plurality of GF multipliers 118 is configured to perform calculations on the incoming data to generate hash values (H), which are used for packet authentication. In this stage, the current and next packets are processed concurrently across two clock cycles. In Clock 1, all six GF channels are used for the current packet, while in Clock 2, the last three GF channels (e.g., Channel 4, Channel 5, and Channel 6) are allocated to the next packet's computations, resulting in enhanced resource utilization. These results are then fed into XOR gates, where the outputs of the GF multipliers are combined through parallel XOR operations. The parallel GCM architecture includes multiplexers to select between different H values and control the flow of data, ensuring smooth transitions between the current and next packet's computations. FIG. 6B also includes a plurality of “X sample” blocks, which appear to be responsible for data sampling or synchronization at various points in the processing chain, where an X sample means an H sample and X may be H, H^3 or H^2. FIG. 6B also incorporates several multiplication operations, indicated by circled "x" symbols.
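The two-clock allocation can be sketched in software as follows. This is one possible reading of the description, not the disclosed circuit: the function name allocate, the mapping of channel k to the k-th power of the hash key, and the rule that Channels 4 to 6 are lent to the next packet only when all three are idle are assumptions made for illustration.

```python
NUM_CHANNELS = 6
SHARED_CHANNELS = (4, 5, 6)  # channels that may serve the next packet


def allocate(current_remaining: int, next_waiting: int):
    """Return {channel: (packet, power of H)} for one clock cycle.

    current_remaining: 16-byte blocks still pending for the current packet.
    next_waiting: 16-byte blocks available from the next packet (0 if none).
    """
    alloc = {}
    used = min(current_remaining, NUM_CHANNELS)
    # Current packet: channel k works with H^k (assumed mapping).
    for ch in range(1, used + 1):
        alloc[ch] = ("current", ch)
    # Next packet: only when the last three channels are all idle.
    if next_waiting > 0 and used <= NUM_CHANNELS - len(SHARED_CHANNELS):
        take = min(next_waiting, len(SHARED_CHANNELS))
        for k in range(take):
            alloc[SHARED_CHANNELS[k]] = ("next", k + 1)
    return alloc
```

Under these assumptions, allocate(6, 0) assigns all six channels to the current packet with powers H to H^6, while allocate(1, 3) keeps Channel 1 on H for the current packet and assigns Channels 4, 5, and 6 to H', H'^2, and H'^3 of the next packet.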
[0157] Now referring to FIG. 6C, which represents additional hardware multiplexers and H sample selectors that determine which hash values (H) are used for the next round of computations, ensuring the simultaneous processing of both the current and next packets. The outputs of the hardware multiplexer 114 can be directed to the XOR channels based on the clock cycle, ensuring efficient utilization of channels throughout the GHASH implementation. FIG. 6C also incorporates the GF sample block, which handles the arithmetic operations for specific data segments within the packet (e.g., dividing data into 128-bit blocks). The results of the arithmetic operations are provided to further XOR gates that combine the outputs from multiple channels, ultimately generating the cryptographic authentication tag or hash value. The use of AND gates and registers further refines the process by controlling which channels are active and storing the partial sums from the XOR operations.
[0158] Moreover, the proposed channel allocation scheme during parallel implementation of AES encryption and GHASH authentication processes requires the following additions for the last 3 channels: extended multiplexers to choose H values from the new packet (which are calculated with the existing Galois field multipliers), 2×128 XOR gates to sum all three channels, 3×128 AND gates to enable the XOR computation for each channel, a 2×16 bit MUX to choose between the computation of the current packet and that of the next packet, and a 2×128 bits register for the new partial sum. Such additions are offset by the over 200K gates that are saved by reducing 2 channels.
[0159] As described, each AES channel encrypts / decrypts 16 bytes of data. Consider an 800 Gbps stream of incoming Ethernet packets to be processed, either on a single Ethernet port of 800 Gbps or on an aggregation of multiple Ethernet ports. If the stream is processed at an operating frequency of 1.2 GHz, the encryption / decryption hardware 108 is required to process 83.333 bytes per clock cycle (800 Gbps / 1.2 GHz = 666.67 bits = 83.333 bytes), which means only 6 parallel channels are required for GHASH (83.333 bytes / 16 bytes = 5.208 channels, which is rounded up to 6). However, in a conventional scheme, only one packet can be processed in a cycle, and 6 channels are not enough; the minimum requirement is 8 channels. The solution to the aforementioned technical problem is a parallel implementation of the AES and GHASH algorithms, which requires only 6 channels to support an 800 Gbps input rate at 1.2 GHz for any received packet size. This saves 2 channels, which are equivalent to 400K gates for the authentication process, a reduction of 25% in authentication area for encryption (Tx direction at the transmitter 102) and decryption (Rx direction at the receiver 104). A total of 1.4M gates reduction is obtained using AES and GHASH optimization.
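The channel count stated above can be reproduced with a short calculation. The helper name channels_needed is hypothetical, and the sketch only restates the arithmetic of the paragraph; it does not model why a conventional scheme requires 8 channels.

```python
import math


def channels_needed(rate_bps: float, clock_hz: float, block_bytes: int = 16):
    """Return (bytes per clock cycle, number of 16-byte channels required)."""
    bytes_per_cycle = rate_bps / 8 / clock_hz
    return bytes_per_cycle, math.ceil(bytes_per_cycle / block_bytes)


bytes_per_cycle, channels = channels_needed(800e9, 1.2e9)
# Fraction of channels saved when going from 8 channels to the computed count.
saving = (8 - channels) / 8
```

For an 800 Gbps stream at 1.2 GHz this gives about 83.333 bytes per cycle and 6 channels, and removing 2 of 8 channels corresponds to the 25% reduction in authentication area.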
[0160] Modifications to embodiments of the present disclosure described in the foregoing are possible without departing from the scope of the present disclosure as defined by the accompanying claims. Expressions such as "including", "comprising", "incorporating", "have", "is" used to describe and claim the present disclosure are intended to be construed in a non-exclusive manner, namely allowing for items, components or elements not explicitly described also to be present. Reference to the singular is also to be construed to relate to the plural. The word "exemplary" is used herein to mean "serving as an example, instance or illustration". Any embodiment described as “exemplary” is not necessarily to be construed as preferred or advantageous over other embodiments and / or to exclude the incorporation of features from other embodiments. The word "optionally" is used herein to mean "is provided in some embodiments and not provided in other embodiments". It is appreciated that certain features of the present disclosure, which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the present disclosure, which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable combination or as suitable in any other described embodiment of the disclosure.
Claims
1. A method (200) of performing channel allocation in a hardware implementation of encryption / decryption functionality and in a hardware implementation of authentication functionality, where the hardware implementations are components of the Galois Counter Mode, GCM, comprising steps of:
receiving packets of data over a plurality of channels (106);
performing, via the hardware implementation of the encryption / decryption functionality of the GCM, computations on a first packet received over a first of the plurality of channels (106), using a first cryptographic key from a key generator component (112) of the hardware implementation of the encryption / decryption functionality of the GCM;
storing the computations in a buffer (124);
using the stored computations from the buffer (124) to carry out processing using the hardware implementation of the authentication functionality of the GCM;
wherein logic circuitry (116) of the hardware implementation of the encryption / decryption functionality and logic circuitry (122) of the hardware implementation of the authentication functionality are decoupled via the use of the buffer (124);
wherein each of the hardware implementation of the encryption / decryption functionality and the hardware implementation of the authentication functionality separately allocates packets into the plurality of channels (106) using respectively separate criteria;
wherein, within a single clock cycle, a first subset of the plurality of channels (106) processes an existing packet while, within a second subset of the plurality of channels (106) begin processing a new packet, received after the existing packet; and
wherein, in the hardware implementation of the authentication functionality, a plurality of Galois field multipliers (118) are separated from an exclusive OR circuit assembly (120) that sums data from each of the plurality of channels (106), wherein the first plurality of channels are summed for the existing packet by a first exclusive OR circuit (120A) of the exclusive OR circuit assembly (120) and wherein the second plurality of channels are summed for the new packet by a second exclusive OR circuit (120B) of the exclusive OR circuit assembly (120).
2. The method (200) of claim 1, wherein the hardware implementation of the encryption / decryption functionality uses a hardware multiplexer (114) to make a determination as to whether a received packet, received over one of the plurality of channels (106), is a new packet or a previously received packet, and if the received packet is determined to be a new packet, the key generator component (112) of the hardware implementation of the encryption / decryption functionality is used to perform a computation during encryption / decryption.
3. The method (200) of claim 2, wherein if the received packet is determined to be a previously received packet, stored computations from the buffer (124) are used during encryption / decryption.
4. The method (200) of claim 1, wherein the hardware implementation of the encryption / decryption functionality operates over 128 bits data blocks.
5. The method (200) of claim 1, wherein the hardware implementation of the authentication functionality uses a finite field multiplication operation.
6. The method (200) of claim 1, wherein the buffer (124) is a First-In-First-Out, FIFO, buffer.
7. The method (200) of claim 1, wherein the GCM is a mode of operation of symmetric key cryptographic block ciphers.
8. The method (200) of claim 1, wherein key expansion is used in the hardware implementation of the encryption / decryption functionality.
9. The method (200) of claim 8, wherein key expansion is performed in the same way for all cyber blocks that belong to the same packet.
10. The method (200) of claim 1, wherein the hardware implementation of the authentication functionality is performed in a parallel manner.
11. The method (200) of claim 1, wherein the packets are Ethernet packets.
12. The method (200) of claim 1, wherein the hardware implementation of the encryption / decryption functionality processes the received packets at an operating frequency of 1.2 GHz.
13. A system (100) comprising means adapted for carrying out all the steps of the method (200) according to any preceding method claim.