SmartNIC-based network traffic acceleration device and method

The SmartNIC-based network traffic acceleration device optimizes data transmission by using SmartNICs with onboard processors for deduplication and AI-specific compression, addressing network slowdowns and host load, ensuring efficient and accurate data transfer.

WO2026141817A1 · PCT designated stage · Publication Date: 2026-07-02 · UNIST (ULSAN NAT INST OF SCI & TECH)

Patent Information

Authority / Receiving Office
WO · WO
Patent Type
Applications
Current Assignee / Owner
UNIST (ULSAN NAT INST OF SCI & TECH)
Filing Date
2025-07-15
Publication Date
2026-07-02


Abstract

The present disclosure relates to a SmartNIC-based network traffic acceleration device and method, the device comprising: a memory for storing at least one instruction for SmartNIC-based network traffic acceleration; and a processor for performing an operation according to the instruction, wherein the processor performs network optimization in a WAN environment through a SmartNIC on the basis of a WAN optimizer using the SmartNIC, and performs data lightweighting and compression of artificial intelligence data so as to optimize data transmission.

Description

SmartNIC-based network traffic acceleration device and method

[0001] The present disclosure relates to a SmartNIC-based network traffic acceleration device and method, and specifically, to a device and method that improves data processing speed by utilizing a SmartNIC, which is a network interface card (NIC).

[0002] In 2022, the International Telecommunication Union (ITU) estimated mobile and fixed broadband traffic to calculate global and regional aggregates. According to the estimates, global mobile broadband traffic in 2022 reached 913 exabytes (EB), more than double the traffic in 2019 (419 EB). This represents an approximately fivefold increase in fixed broadband traffic in 2022 compared to 2019. Furthermore, mobile and fixed broadband traffic recorded an average annual growth rate of 30% from 2019 to 2022, and network usage is predicted to increase due to the expansion of related industries.

[0003] Meanwhile, SaaS (Software as a Service) is a service that allows users to receive software over the internet without the need to install or manage it themselves. Examples of SaaS include shared drives and OTT (over-the-top) media services. Shared drives include Google Drive™, Microsoft 365™, and Dropbox™, and OTT services include Netflix™, Apple TV™, and Disney Plus™.

[0004] SaaS is highly accessible because it can be used anywhere with an internet connection, and it is convenient because users do not need to manage the software, so both usage and usage time continue to increase. Furthermore, the global SaaS market is expected to continue growing, reaching $295.08 billion by 2025. Since SaaS performs tasks over the internet, data transmission is required. However, the increase in SaaS usage leads to an increase in data traffic across the entire network, causing a slowdown in overall network speed. Telecommunications operators in the European Union have been urging the implementation of new legislation that would require U.S. tech companies, such as Alphabet's Google™ (GOOGL.O), Meta's Facebook™ (META.O), and Netflix™ (NFLX.O), to bear part of the costs of European telecommunications networks, arguing that these companies consume the majority of internet traffic in the European region.

[0005] Furthermore, artificial intelligence requires large amounts of data and GPUs for model training, and cloud computing is frequently used for this purpose. In such cases, a large volume of training data must be transmitted to the training server. Additionally, since training a single model takes a long time, training is often conducted across multiple servers (distributed computing, federated learning). Moreover, when training a Large Language Model (LLM) using distributed computing, information regarding each layer of the model, queries, and KVCache must be transmitted between servers; KVCache stores the Key (K) and Value (V) generated for each layer in the form of tensors. Due to the large scale of the data in KVCache, it causes an increase in overall network traffic and requires additional computations to reduce this data size. Federated Learning (FL) is being researched to reduce network traffic because it trains models individually on each client (e.g., mobile devices, computers) and transmits only the results to a central server for integration.

[0006] Furthermore, unlike general data, data used for AI training exists in special forms (tensors, caches, gradients, etc.), and compression that does not consider these specific characteristics negatively affects the accuracy of AI models; in particular, in the case of LLM, the KVCache used in the early stages has a significant impact on model accuracy.

[0007] In addition, Sky Computing aims to integrate various cloud service providers into an integrated ecosystem where users can submit workloads without being tied to a specific provider.

[0008] There are issues between cloud platforms, such as the difficulty of transferring data (data gravity), the tendency to continue using a specific vendor's services (vendor lock-in), and functional mismatches between different clouds (compatibility issues).

[0009] Sky Computing relies on an intercloud broker that handles job placements across different clouds based on user preferences regarding cost, performance, or other criteria. 'SkyPilot', presented at NSDI 2023, proposes such an intercloud broker.

[0010] Conventional Network Interface Cards (NICs) are developed based on Application-Specific Integrated Circuits (ASICs), which have the advantages of easy mass production and low cost, but have the disadvantage that it is impossible to change their internal logic.

[0011] To address this, a new type of network interface card, the Smart Network Interface Card (SmartNIC), was developed. By possessing its own onboard processor and the necessary computing resources, the SmartNIC can handle various workloads on behalf of the Host. In addition, by using the GPU Direct technology provided by NVIDIA™, the amount of memory computation is reduced because the GPU is used directly without additional memory for computation.

[0012] The purpose of the embodiments disclosed in this disclosure is to provide a SmartNIC-based network traffic acceleration device and method that utilizes a SmartNIC, a new type of network interface card (NIC), to transmit data quickly and efficiently through techniques such as network optimization in a WAN environment, data lightweighting, and AI data compression, thereby maximizing network performance and increasing data processing speed.

[0013] However, the problem to be solved according to one embodiment is not limited only to that mentioned above.

[0014] A SmartNIC-based network traffic acceleration device according to the present disclosure for achieving the aforementioned technical problem comprises: a memory storing at least one instruction for SmartNIC-based network traffic acceleration; and a processor that performs an operation according to the instruction, wherein the processor can optimize data transmission by performing network optimization in a WAN environment through the SmartNIC based on a WAN Optimizer utilizing SmartNIC, and by performing data lightweighting and artificial intelligence data compression.

[0015] At this time, the processor performs deduplication and compression processes to lightweight the data in the SmartNIC, and can perform compression of the artificial intelligence data if the data is related to artificial intelligence.

[0016] In addition, to eliminate duplication within the transmitted data, the processor can divide the transmitted data brought in via RDMA / DMA into chunks of a certain size, calculate the hash value of each chunk, check whether that hash value exists in the cache of the SmartNIC, and, if it exists in the cache, replace the chunk with a corresponding token.
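The deduplication step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the chunk size, the use of SHA-256 as the hash, the `("raw", ...)` / `("token", ...)` tuple encoding, and the function names `deduplicate` and `restore` are all hypothetical, and a plain dictionary stands in for the SmartNIC cache.

```python
import hashlib

CHUNK_SIZE = 8  # assumed fixed chunk size in bytes (illustrative only)


def deduplicate(data: bytes, cache: dict) -> list:
    """Split data into fixed-size chunks; replace already-cached chunks with tokens."""
    out = []
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).digest()
        if digest in cache:
            out.append(("token", digest))  # chunk seen before: send only its hash
        else:
            cache[digest] = chunk          # first occurrence: cache it and send it
            out.append(("raw", chunk))
    return out


def restore(items: list, cache: dict) -> bytes:
    """Receiver side: rebuild the original bytes from raw chunks and tokens."""
    parts = []
    for kind, value in items:
        if kind == "raw":
            cache[hashlib.sha256(value).digest()] = value
            parts.append(value)
        else:
            parts.append(cache[value])     # look the chunk up by its hash
    return b"".join(parts)


sender_cache, receiver_cache = {}, {}
payload = b"ABCDEFGH" * 3 + b"12345678"
encoded = deduplicate(payload, sender_cache)
decoded = restore(encoded, receiver_cache)
```

In this sketch the second and third copies of `ABCDEFGH` are sent as tokens, and the receiver rebuilds them from its own cache, which it fills as raw chunks arrive.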

[0017] In addition, for the purpose of data lightweighting, the processor can first divide the deduplicated data into chunks of a certain size, compress each chunk at a different compression level, and then transmit it to the network.
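As a rough sketch of per-chunk compression, the example below compresses each fixed-size chunk with its own level using Python's standard `zlib` module. The disclosure does not state here which compressor is used or how a level is assigned to each chunk, so `zlib`, the cycling level policy, and the names `compress_chunks` and `decompress_chunks` are assumptions made only for illustration.

```python
import zlib


def compress_chunks(data: bytes, chunk_size: int, levels: list) -> list:
    """Compress each fixed-size chunk with its own zlib level (0 to 9)."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    out = []
    for idx, chunk in enumerate(chunks):
        level = levels[idx % len(levels)]  # assumed policy: cycle through the levels
        out.append((level, zlib.compress(chunk, level)))
    return out


def decompress_chunks(items: list) -> bytes:
    """Receiver side: decompress every chunk and reassemble the data."""
    return b"".join(zlib.decompress(blob) for _, blob in items)


data = b"network traffic " * 64          # 1024 bytes of sample input
packed = compress_chunks(data, 256, [1, 6, 9])
restored = decompress_chunks(packed)
```

Recording the level next to each compressed chunk lets the receiving end check how each chunk was lightweighted, although zlib itself does not need the level to decompress.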

[0018] Additionally, the processor can: reduce data noise and identify trends in network traffic tracking through a pre-configured algorithm 1 based on an ARIMA model; train, through a pre-configured algorithm 2, a Random Forest Regression (RFR) model that calculates transmission time between the transmitting and receiving ends based on the available computing resources of both ends, and predict the processing time of the data lightweighting between the transmitting and receiving ends through the RFR model; and, through a pre-configured algorithm 3, input the network traffic predicted through the trend identification of algorithm 1 into the RFR model trained by algorithm 2, calculate the estimated time for each of a plurality of lightweighting combinations through the RFR model, and select the lightweighting combination having the lowest estimated time as the degree of data lightweighting.

[0019] In addition, the processor can generate the chunk by moving a window with the length of the chunk one byte at a time along the direction of data progression.
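The sliding-window chunk generation described above can be illustrated with a short generator. This is only a sketch: the name `sliding_chunks` and the choice to yield the byte offset alongside each window are assumptions.

```python
def sliding_chunks(data: bytes, chunk_len: int):
    """Yield (offset, window) pairs, advancing the window one byte at a time."""
    for offset in range(len(data) - chunk_len + 1):
        yield offset, data[offset:offset + chunk_len]


windows = list(sliding_chunks(b"abcdef", 4))
```

For a six-byte input and a window length of four, the window starts at offsets 0, 1, and 2. Input shorter than the window length produces no chunks in this sketch.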

[0020] In addition, the processor provides a compression algorithm in the SmartNIC and can store input data and queries in the form of tensors to enable compression of the artificial intelligence data.

[0021] In addition, when the processor receives lightweight data from the network, it can check the degree of lightweighting of the received data, perform decompression and redundancy restoration on the checked data, and transmit it to the host via RDMA / DMA.

[0022] Additionally, the present disclosure describes a traffic acceleration method performed by a processor of a SmartNIC-based network traffic acceleration device, wherein the processor may include the step of optimizing a network in a WAN environment through a SmartNIC based on a WAN Optimizer utilizing a SmartNIC; and the step of optimizing data transmission by the processor performing data lightweighting and compression of artificial intelligence data.

[0023] According to the present disclosure, the transmission data of a general-purpose application can be lightweighted, and the data lightweighting is performed on the SmartNIC, thereby providing the effect of reducing host overhead and saving computing resources.

[0024] In addition, the present disclosure provides the effect of offering various options for data lightweighting depending on the device used by the user (hardware characteristics such as whether a CPU, GPU, or SmartNIC is installed).

[0025] In addition, according to the present disclosure, deduplication techniques and degrees, compression algorithms, and compression levels are selected in consideration of user requirements, because the characteristics of the data and the choice of compression algorithm and level significantly affect CPU usage, compression calculation time, and compression ratio.

[0026] In addition, the present disclosure provides the effect of minimizing the impact on model training accuracy and shortening training time through transmission that considers the specificity of AI data, and of reducing host overhead by performing data lightweighting on the SmartNIC instead of on the host, which is busy with model training.

[0027] In addition, when AI data is transmitted from one server to another, network transmission takes a long time due to the massive size of the data. According to the present disclosure, when data parallelism is performed during distributed learning, the gradients calculated on each server can be exchanged with each other for weight updates.

[0028] In addition, according to the present disclosure, data parallelism enables distributed training on multiple servers during deep learning training, and StellaTrain is utilized to compress gradients and enable high-speed transmission.

[0029] Furthermore, according to the present disclosure, utilizing CacheGen's encoding technique, which takes into account the specificity of the data, provides the effect of minimizing the impact on model accuracy, and because large-scale data in the form of tensors can be compressed into a bitstream, a server can efficiently transmit the data used for training when another server requests it.

[0030] In addition, according to the present disclosure, the time required for the inference process is reduced and the speed of service provision is increased, thereby providing an effect of improving the quality of the service.

[0031] In addition, according to the present disclosure, various deduplication and compression algorithms and degrees are provided based on given data or models, taking into account the characteristics of the transmitted data and the device being used, as well as user requirements, thereby providing the effect of enabling a higher degree of data compression.

[0032] In addition, according to the present disclosure, data lightweighting through deduplication and compression of transmitted data reduces network transmission time and network traffic, thereby providing the effect of reducing network usage fees.

[0033] In addition, according to the present disclosure, the degree of data lightweighting can be determined by considering the available computing resources of the receiving end as well as the transmitting end, thereby providing the effect of enabling faster end-to-end data transmission.

[0034] In addition, according to the present disclosure, the effect of reducing host computational load and traffic through SmartNIC offloading and saving energy by offloading network functions to BlueField (a type of SmartNIC) is provided.

[0035] In addition, according to the present disclosure, a reduction in the computational load of the CPU reduces the heat generation of the CPU, thereby reducing energy consumption, and the RDMA technology used for network packet processing provides the effect of minimizing the CPU usage of the Host.

[0036] In addition, according to the present disclosure, the effect of realizing a Green Network is provided by reducing overall network power consumption through the reduction of network traffic.

[0037] In addition, according to the present disclosure, network infrastructure and technology through a green network can increase energy efficiency and minimize environmental impact, and use sustainable technology to reduce power consumption and minimize carbon emissions, thereby providing the effect of making the entire ecosystem of the network environmentally friendly.

[0038] The effects obtainable from the exemplary embodiments of the present disclosure are not limited to those mentioned above, and other unmentioned effects can be clearly derived and understood by those skilled in the art to which the exemplary embodiments of the present disclosure belong from the description below. That is, unintended effects resulting from the implementation of the exemplary embodiments of the present disclosure can also be derived by those skilled in the art from the exemplary embodiments of the present disclosure.

[0039] FIG. 1 is a drawing showing the structure of a SmartNIC according to an embodiment.

[0040] FIG. 2 is a diagram showing a service architecture providing a SmartNIC-based network traffic acceleration system according to an embodiment.

[0041] FIG. 3 is a drawing showing the WANOPS system architecture according to an embodiment.

[0042] FIG. 4 is a drawing showing a port version of the WANOPS architecture according to an embodiment.

[0043] FIG. 5 is a diagram showing the configuration of the WANOPS architecture according to an embodiment.

[0044] FIG. 6 is a drawing showing a linker segment according to an embodiment.

[0045] FIG. 7 is a diagram showing the structure and details of an RDMA request (RDMA_REQ) and a return value between a Host and a SmartNIC according to an embodiment.

[0046] FIG. 8 is a sequence diagram illustrating the client-side data communication process using WANProxy.

[0047] FIG. 9 is a sequence diagram illustrating the server-side data communication process using WANProxy in an embodiment.

[0048] FIG. 10 is a block diagram of a SmartNIC-based network traffic acceleration device according to an embodiment.

[0049] FIG. 11 is a diagram illustrating a data compression process according to an embodiment.

[0050] FIGS. 12 to 14 are drawings illustrating algorithms 1 to 3 for determining the degree of data lightweighting according to an embodiment.

[0051] FIG. 15 is a drawing showing the deduplication process according to an embodiment.

[0052] FIG. 16 is a table showing the performance of the compression algorithm evaluated on the reference dataset in the embodiment, and a diagram showing the characterization results using seven datasets on the SoC core and the BlueField-2 C-engine when executing four compression designs at various dimensions.

[0053] FIG. 17 is a diagram illustrating a data compression process according to an embodiment.

[0054] FIG. 18 is a drawing showing a CDN of a SmartNIC-based network traffic acceleration system according to an embodiment.

[0055] FIG. 19 is a diagram illustrating the data processing process in sky computing in an embodiment.

[0056] FIG. 20 is a drawing showing an AI distributed computing system by WANOPS according to an embodiment.

[0057] FIG. 21 is a drawing showing a multimodal-based WANOP according to an embodiment.

[0058] FIG. 22 is a diagram illustrating a SmartNIC-based network traffic acceleration method according to an embodiment.

[0059] Hereinafter, various embodiments of the present disclosure are described in conjunction with the accompanying drawings. As various embodiments of the present disclosure may be subject to various modifications and may have various forms, specific embodiments are illustrated in the drawings and described in detail. However, this is not intended to limit the various embodiments of the present disclosure to specific forms, and it should be understood that they include all modifications and / or equivalents and substitutions that fall within the spirit and scope of the various embodiments of the present disclosure. In relation to the description of the drawings, similar reference numerals have been used for similar components.

[0060] In various embodiments of the present disclosure, terms such as “comprising” or “having” are intended to indicate the existence of the features, numbers, steps, actions, components, parts, or combinations thereof described in the specification, and should be understood as not precluding the existence or addition of one or more other features, numbers, steps, actions, components, parts, or combinations thereof.

[0061] In various embodiments of the present disclosure, expressions such as “or” include any and all combinations of the words listed together. For example, “A or B” may include A, may include B, or may include both A and B.

[0062] Expressions such as "first" or "second" used in various embodiments of the present disclosure may modify various components of the various embodiments, but do not limit such components. For example, such expressions do not limit the order and / or importance of such components and may be used to distinguish one component from another.

[0063] When it is mentioned that a component is "connected" or "joined" to another component, it should be understood that the component may be directly connected or joined to the other component, but that a new component may also exist between the component and the other component.

[0064] In the embodiments of the present disclosure, terms such as "module," "unit," "part," etc. are used to refer to a component that performs at least one function or operation, and such component may be implemented in hardware or software, or in a combination of hardware and software. Additionally, a plurality of "modules," "units," "parts," etc. may be integrated into at least one module or chip and implemented as at least one processor, except where each needs to be implemented in specific individual hardware.

[0065] Terms such as those defined in commonly used dictionaries should be interpreted as having meanings consistent with their meanings in the context of the relevant technology, and should not be interpreted in an ideal or overly formal sense unless explicitly defined in the various embodiments of the present disclosure.

[0066] Hereinafter, embodiments of the present disclosure will be described in detail with reference to the attached drawings.

[0067] FIG. 1 is a diagram showing the structure of a SmartNIC according to an embodiment.

[0068] In (1) of FIG. 1, (a) shows the structure of a SmartNIC, and (b) shows the structures of a general NIC and a SmartNIC. Conventional network interface cards (NICs) are developed based on Application-Specific Integrated Circuits (ASICs). While this has the advantages of being easy to mass-produce and inexpensive, it has the disadvantage that it is impossible to change the internal logic. In the embodiment, a new type of network interface card (SmartNIC) is used to solve this problem. By having its own on-board processor and the necessary computing resources, the SmartNIC can process various workloads on behalf of the host. In addition, by using GPU Direct technology, the amount of memory computation can be reduced by using the GPU directly without using additional memory for computation.

[0069] FIG. 2 is a diagram showing a service architecture providing a SmartNIC-based network traffic acceleration system according to an embodiment.

[0070] As illustrated in (2) of FIG. 2, each of the two communicating devices consists of a Host and a SmartNIC. Here, the two devices refer to the entity transmitting the data and the entity receiving the data, which correspond to servers, clients, and nodes within a cloud environment. An application runs on the Host of each device, and a data optimization module according to the embodiment runs on the SmartNIC. At the transmitting end, data proceeds from the Host to the SmartNIC, and from the SmartNIC to the WAN (external network). In the embodiment, the application running on the Host uses the WANOPS (WAN Optimizer by SmartNIC) library according to the embodiment. When the application running on the Host uses a socket function, the WANOPS library hooks it and transmits the data to the SmartNIC via RDMA / DMA. Subsequently, the SmartNIC performs data optimization and performs the socket communication on behalf of the Host. At the receiving end, data is transmitted from the WAN (external network) to the SmartNIC, and from the SmartNIC to the Host. The application running on the Host uses the WANOPS library according to the embodiment.

[0071] In the embodiment, when an application running on the host uses a socket function, the WANOPS library hooks it and sends a socket communication control message to the SmartNIC via RDMA / DMA. The SmartNIC performs the socket communication on behalf of the host, and when data arrives, the SmartNIC restores the lightweighted data to the original data and transmits it to the host via RDMA / DMA.

[0072] FIG. 3 is a diagram showing the WANOPS system architecture according to an embodiment.

[0073] As illustrated in (3) of FIG. 3, data transmission can be performed between a host and a SmartNIC, or between a SmartNIC and a WAN (external network). Data transmission between a host and a SmartNIC takes place within a single device and uses RDMA / DMA. Since it does not pass through a complex network stack, fast transmission is possible. Furthermore, it enables data transmission without CPU intervention, which means the CPU can handle other tasks. Data transmission between a SmartNIC and a WAN, in contrast, takes place between two devices over a network. Conventional WANProxies increase the computational load on the host because they perform encoding or decoding processes on the host before transmission. If the host does not perform encoding or decoding processes, an additional device is required to operate the WANProxy. On the other hand, in the embodiment, lightweighted data is transmitted through the SmartNIC's TCP / UDP socket.

[0074] Figure 4 is a diagram showing a port version of the WANOPS architecture according to an embodiment.

[0075] As illustrated in (4) of Figure 4, the main components and operation methods of a data transmission system between a client and a server, such as data processing through a proxy server, WAN-based communication, and high-speed data transmission using RDMA / DMA technology, are visualized. The functions of the encoder and decoder utilizing the proxy server's cache are focused on significantly improving network efficiency.

[0076] FIG. 5 is a diagram showing the configuration of the WANOPS architecture according to an embodiment.

[0077] (5) of FIG. 5 shows the structure of data processing and network transmission, visually representing the data flow and key functions between the Host and the SmartNIC. In the embodiment, RDMA / DMA technology is used to transmit data at high speed with minimal CPU intervention, and the SmartNIC's AI-based compression / decompression function focuses on optimizing network bandwidth and performance. Additionally, the SmartNIC distinguishes between general data and AI-related data to perform encoding and decoding. General data undergoes deduplication and compression, while AI-related data undergoes AI compression tailored to the characteristics of the data being sent, which replaces the deduplication and compression processes. In other words, data lightweighting is performed with different methods depending on the type of data. Furthermore, caching is used in deduplication, which is a general data lightweighting method. WANOPS is the core of WAN optimization and plays a role in further enhancing network efficiency.

[0078] FIG. 6 is a diagram showing a linker segment according to an embodiment.

[0079] As illustrated in (6) of FIG. 6, the Linker Segment defines a message format for data transmission between the Host and the Smart NIC. In the embodiment, data transmission consists of a Request and a Return structure. The Request structure is divided into imm_data and a Segment. The Request transmits data necessary for executing socket-related functions in the Smart NIC. The Return transmits the result returned after executing the socket-related function in the Smart NIC to the Host. Additionally, the data transmitted between the Host and the Smart NIC enables the socket-related function called from the Host to operate with the same semantics in the Smart NIC. The Linker Segment illustrated in FIG. 6 is a message format designed to increase the efficiency and stability of data transmission between the Host and the Smart NIC. It guarantees the accuracy and performance of data communication by defining details such as RDMA requests, data size, network ports, and processing results.

[0080] FIG. 7 illustrates the structure of an RDMA request (RDMA_REQ) and a return value between a Host and a Smart NIC according to an embodiment, and shows the details of the RDMA request and the return value. Table (7) in FIG. 7 shows in detail how an RDMA request (RDMA_REQ) is processed and the values returned upon success or failure. In the embodiment, the return value upon success may be a socket file descriptor (Sockfd), a data size (buf_sz), or a simple success code (0). Upon failure, -1 is generally returned to indicate failure. This information is directly related to the protocol design of the Linker Segment and clearly explains the operation of RDMA-based data transmission.
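To make the Request and Return structure more concrete, the following sketch packs and unpacks such messages with Python's `struct` module. The actual field layout is defined in FIGS. 6 and 7 and is not reproduced in the text, so everything here is hypothetical: the field order and widths, the network byte order, the request identifier `RDMA_REQ_SEND`, and all function names. Only the return convention (a socket descriptor, a data size, or 0 on success, and -1 on failure) comes from the description above.

```python
import struct

# Hypothetical layout, NOT taken from FIG. 6: network byte order,
# imm_data (uint32), then a segment made of a request id (uint16),
# a port (uint16), buf_sz (uint32), and buf_sz bytes of payload.
REQ_HEADER = struct.Struct("!IHHI")
RETURN = struct.Struct("!i")  # signed result: Sockfd, buf_sz, 0, or -1

RDMA_REQ_SEND = 3  # hypothetical request identifier


def pack_request(imm_data: int, req: int, port: int, payload: bytes) -> bytes:
    """Host side: build a Request message for the SmartNIC."""
    return REQ_HEADER.pack(imm_data, req, port, len(payload)) + payload


def unpack_request(message: bytes):
    """SmartNIC side: split a Request message back into its fields."""
    imm_data, req, port, buf_sz = REQ_HEADER.unpack_from(message)
    payload = message[REQ_HEADER.size:REQ_HEADER.size + buf_sz]
    return imm_data, req, port, payload


def pack_return(value: int) -> bytes:
    """SmartNIC side: build a Return message carrying the result."""
    return RETURN.pack(value)


def unpack_return(message: bytes) -> int:
    """Host side: read the result and treat -1 as failure."""
    (value,) = RETURN.unpack(message)
    if value == -1:
        raise OSError("socket-related request failed on the SmartNIC")
    return value


msg = pack_request(7, RDMA_REQ_SEND, 8080, b"hello")
fields = unpack_request(msg)
sent = unpack_return(pack_return(5))  # e.g. five bytes were sent
```

Raising an exception on -1 mirrors how a failed socket call is usually surfaced to Python callers; a C-style library would more likely hand the -1 back unchanged.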

[0081] FIGS. 8 and 9 are sequence diagrams illustrating the client-side and server-side data communication processes using WANProxy, respectively. In the embodiment, FIGS. 8 and 9 illustrate how the SmartNIC handles a socket-related function called from the host. They therefore show what data is transmitted in the overall flow and where each socket-related function is actually executed. Additionally, in the embodiment, the formats of FIGS. 6 and 7 are used when each socket function is called. FIG. 8 represents the interaction between an RDMA-based client, a host (SmartNIC Linker), and WANProxy. Referring to (8) in FIG. 8, the communication process consists of connection establishment, data transmission, and connection termination. Connection establishment creates and connects a socket through an RDMA request. During data transmission, data transmission / reception and ACKs are verified. During connection termination, the connection is safely terminated through the FIN / ACK procedure.

[0082] WANProxy serves as the central hub of the WAN network, maintaining a TCP / IP-based connection with the SmartNIC. This process combines RDMA and WANProxy to maximize the efficiency of network transmission. Furthermore, WANProxy is the underlying program used by WANOPS, and it operates within the SmartNIC. The Proxy in FIG. 4 refers to WANProxy, and by extension to WANOPS.

[0083] FIG. 9 is a sequence diagram illustrating the server-side data communication process using WANProxy in an embodiment. FIG. 9 shows, step by step, the interaction between the RDMA server (WebServer), the Host-SmartNIC Linker (Socket Server), and WANProxy. Referring to (9) in FIG. 9, the process includes initialization and setup, which involves creating a socket, binding, and waiting for a connection; a client connection process that establishes a socket connection through a 3-way handshake; a process of delivering messages that arrive from outside the WANProxy; a data transmission process; and a connection termination process. During data transmission, data transmission / reception and ACKs are verified. During connection termination, the connection is safely terminated through a FIN / ACK procedure. In the embodiment, WANProxy acts as an intermediate hub for data transmission between the server and the client and efficiently supports TCP / IP-based network communication.

[0084] The SmartNIC-based network traffic acceleration device according to the embodiment utilizes SmartNIC, a new type of network interface card (NIC), to transmit data quickly and efficiently through techniques such as network optimization in a WAN environment, data lightweighting, and AI data compression, thereby maximizing network performance and increasing data processing speed.

[0085] In addition, the SmartNIC-based network traffic acceleration device according to the embodiment performs two processes for data lightweighting in the SmartNIC, namely deduplication and compression, and performs AI compression if the data is related to AI. In addition, the device first divides the deduplicated data into fixed chunks, compresses each chunk at a different compression level, and transmits each chunk to the network. In addition, the device determines the degree of data lightweighting using pre-configured algorithms 1, 2, and 3.

[0086] Algorithm 1 illustrated in Fig. 12 below is an algorithm that uses an ARIMA (Autoregressive Integrated Moving Average) model in network traffic tracking to reduce noise in data and identify trends, and may include at least one model among ARIMA, SARIMA (Seasonal ARIMA), and ARIMAX (ARIMA with Exogenous variables).
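As a rough illustration of the idea behind Algorithm 1, the sketch below fits the simplest member of the ARIMA family, ARIMA(1,1,0) without a constant term, using plain least squares. It is an assumption made for illustration only: the source does not state the model order or the fitting procedure, and the function name and sample values are hypothetical. A production system would more likely rely on a statistics library.

```python
def arima_110_forecast(series: list[float]) -> float:
    """One-step-ahead forecast from an ARIMA(1,1,0) model without a constant term."""
    if len(series) < 3:
        raise ValueError("need at least three observations")
    # d = 1: difference the series once to remove the trend.
    diffs = [b - a for a, b in zip(series, series[1:])]
    # p = 1: least-squares estimate of phi in diff[t] = phi * diff[t-1] + noise.
    num = sum(cur * prev for prev, cur in zip(diffs, diffs[1:]))
    den = sum(prev * prev for prev in diffs[:-1])
    phi = num / den if den else 0.0
    # Undo the differencing to obtain the next traffic level.
    return series[-1] + phi * diffs[-1]

traffic_mbps = [10.0, 12.0, 14.0, 16.0]       # hypothetical traffic samples
next_mbps = arima_110_forecast(traffic_mbps)  # steady growth of 2 per step continues
```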

[0087] In addition, Algorithm 2 shown in Fig. 13 below is an algorithm that trains a Random Forest Regression (RFR) model to calculate the transmission time between the transmitting and receiving ends based on the available computing resources of the transmitting and receiving ends, and can predict the data lightweighting processing time between the transmitting and receiving ends.

[0088] In addition, Algorithm 3 illustrated in Fig. 14 below is an algorithm that determines the degree of data lightweighting using the RFR model trained by Algorithm 2. The network traffic predicted through the trend identification of Algorithm 1 is input into the RFR model trained by Algorithm 2, the estimated time for each of a plurality of lightweighting combinations is calculated through the RFR model, and the lightweighting value with the lowest estimated time among the calculated estimated times can be selected as the degree of data lightweighting. That is, Algorithm 3 determines, through the RFR model trained by Algorithm 2, the degree of data lightweighting that minimizes the estimated time between the transmitting and receiving ends.
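The selection step of Algorithm 3 amounts to evaluating every lightweighting combination and keeping the one with the lowest estimated time. The sketch below shows only that search. The function toy_estimator is a hypothetical stand-in for the trained RFR model, and all of its coefficients, the candidate chunk sizes, and the candidate levels are invented for illustration.

```python
from itertools import product

def toy_estimator(bandwidth_mbps: float, dedup_chunk_kib: int, level: int) -> float:
    """Hypothetical stand-in for the trained RFR model; every coefficient is invented."""
    size_mbit = 800.0                                   # assumed 100 MB payload
    ratio = 1.0 + 0.3 * level + 8.0 / dedup_chunk_kib   # assumed size reduction factor
    compute_s = 0.4 * level + 16.0 / dedup_chunk_kib    # assumed processing cost at both ends
    return compute_s + (size_mbit / ratio) / bandwidth_mbps

def choose_lightweighting(bandwidth_mbps, chunk_sizes, levels, estimator):
    """Return (estimated_time, chunk_size, level) for the combination with the lowest estimate."""
    candidates = [
        (estimator(bandwidth_mbps, c, l), c, l)
        for c, l in product(chunk_sizes, levels)
    ]
    return min(candidates, key=lambda t: t[0])

predicted_bandwidth_mbps = 50.0   # in the embodiment this would come from Algorithm 1
best_time, best_chunk, best_level = choose_lightweighting(
    predicted_bandwidth_mbps, [4, 8, 16], [1, 3, 6, 9], toy_estimator)
```

The exhaustive search is practical here because the number of combinations (deduplication unit times compression level) is small.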

[0089] In addition, the SmartNIC-based network traffic acceleration device according to the embodiment provides various compression algorithms in the SmartNIC and stores input data and queries in the form of tensors to enable AI compression. In addition, AI compression supports both tensors and gradients.

[0090] FIG. 10 is a block diagram of a smartnic-based network traffic acceleration device according to an embodiment. In the embodiment, a server is a computing system that provides services to other computers or devices in a computer network or stores and manages data. The server (200) accepts requests from other computers or devices called clients and provides responses or data to those requests. The configuration of the server (200) shown in FIG. 2 is merely a simplified example.

[0091] The communication module (110) can be configured regardless of the mode of communication, such as wired or wireless, and can be configured with various communication networks, such as a Personal Area Network (PAN) or a Wide Area Network (WAN). Additionally, the communication module (110) can operate based on the known World Wide Web (WWW) and may utilize wireless transmission technologies used for short-range communication, such as Infrared Data Association (IrDA) or Bluetooth. For example, the communication module (110) may be responsible for transmitting and receiving data necessary to perform a technique according to one embodiment of the present disclosure.

[0092] Memory (120) may refer to any type of storage medium. For example, memory (120) may include at least one type of storage medium among flash memory type, hard disk type, multimedia card micro type, card type memory (e.g., SD or XD memory, etc.), RAM (Random Access Memory), SRAM (Static Random Access Memory), ROM (Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), PROM (Programmable Read-Only Memory), magnetic memory, magnetic disk, and optical disk. Such memory (120) may also constitute the database shown in FIG. 1.

[0093] The memory (120) can store at least one instruction that can be executed by the processor (130). Additionally, the memory (120) can store any form of information generated or determined by the processor (130) and any form of information received by the server (200). Additionally, the memory (120) stores various types of modules, instruction sets, or models.

[0094] The processor (130) can perform technical features according to embodiments of the present disclosure to be described below by executing at least one instruction stored in memory (120). In one embodiment, the processor (130) may be composed of at least one core and may include a processor for data analysis and / or processing, such as a central processing unit (CPU) of a computer device, a general purpose graphics processing unit (GPGPU), or a tensor processing unit (TPU).

[0095] This processor (130) can train a neural network or model designed in a machine learning or deep learning manner. To this end, the processor (130) can perform calculations for training the neural network, such as processing input data for training, extracting features from input data, calculating errors, and updating the weights of the neural network using backpropagation. Additionally, the processor (130) can perform inference for a specific purpose using a model implemented in an artificial neural network manner.

[0096] In the embodiment, the processing of the processor (130), including the AI decompression process, is described based on a service architecture that provides a SmartNIC-based network traffic acceleration system. The SmartNIC performs two main processes for data lightweighting: deduplication and compression. If the transmitted data is related to AI, the deduplication and compression processes are performed using AI features. The deduplication process removes duplicates within the transmitted data. In the embodiment, Round Robin can be performed. Additionally, the processor (130) performs a compression process to reduce the size of the transmitted data. In the embodiment, zlib, zstd, lz4, DOCA compression, and GPU compression (nvCOMP) can be performed. Furthermore, through the AI compression process, a data lightweighting algorithm that considers the specificity of AI data is executed.

[0097] Additionally, the processor (130) performs compression based on real-time network changes. In the embodiment, the processor (130) compresses the entire previously deduplicated data into a single compression level, divides it into fixed chunks, and transmits it to the network.

[0098] FIG. 11 is a diagram illustrating a data compression process according to an embodiment.

[0099] As illustrated in (11) of FIG. 11, the processor (130) first divides the deduplicated data into fixed-size chunks, then compresses each chunk at a different compression level and transmits each chunk to a network. FIGS. 12 to 14 are diagrams showing Algorithms 1 to 3 for determining the degree of data lightweighting according to an embodiment. Referring to (12), (13) and (14) in FIGS. 12 to 14, in the embodiment, the processor (130) reduces noise in the data and identifies trends in network traffic tracking through the ARIMA-based Algorithm 1. Through Algorithm 2, the processor (130) trains a Random Forest Regression (RFR) model to calculate the transmission time between the transmitting and receiving ends based on the available computing resources of both ends, thereby predicting the data lightweighting processing time between the transmitting and receiving ends. Through Algorithm 3, the processor (130) inputs the network traffic predicted by the trend identification of Algorithm 1 into the RFR model trained by Algorithm 2, calculates the estimated time for each of a plurality of lightweighting combinations through the RFR model, and selects the lightweighting value with the lowest estimated time among the calculated estimated times as the degree of data lightweighting.

[0100] At this time, when determining the degree of data lightweighting, the processor (130) considers the available computing resources of both the transmitting (encoding) end and the receiving (decoding) end. This differs from conventional data deduplication and compression, which consider only the available resources of the transmitting end. As a result, determining the degree of data lightweighting from the available resources of both the transmitting and receiving ends enables faster end-to-end data transmission.

[0101] FIG. 15 is a diagram showing a deduplication process according to an embodiment.

[0102] Referring to (15) in FIG. 15, the processor (130) according to the embodiment replaces a chunk with its corresponding token when the chunk is in the cache. In the embodiment, when the processor (130) creates chunks, a window with the length of a chunk moves 1 byte at a time along the direction of data flow. Through this, the embodiment can eliminate more duplication than dividing the data into fixed units as in the conventional method. Additionally, the processor (130) performs the hashing process quickly through hashing using Round robin, which reuses previously checked values. The embodiment provides improved performance and better collision resistance compared to conventional hashes. Subsequently, the processor (130) performs a cache matching process. The cache consists of chunks and tokens, and a token is data that is much smaller than a chunk. The same data can be restored because the transmitting side and the receiving side share the same cache values. In the embodiment, the cache is stored in memory and on disk, and frequently used cache entries are kept in memory. Previously, deduplication took a long time because it had to wait until data was saved to both memory and disk; however, this technique reduces time by executing the next operation immediately after data is saved to memory. Data is saved to disk sequentially after being saved to memory.
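The sliding-window deduplication described above can be sketched as follows. This is an illustration under stated assumptions rather than the embodiment's implementation: a polynomial rolling hash is assumed as the hash that reuses previously computed values, the hash value itself is used as the token, and the names window_hash, learn, dedup, and restore as well as the constants BASE and MOD are hypothetical. The cache is a plain dictionary that both sides are assumed to hold.

```python
BASE, MOD = 257, (1 << 61) - 1   # assumed rolling-hash parameters

def window_hash(block: bytes) -> int:
    """Polynomial hash of one window, computed from scratch."""
    h = 0
    for b in block:
        h = (h * BASE + b) % MOD
    return h

def learn(cache: dict, data: bytes, w: int) -> None:
    """Add non-overlapping w-byte chunks of already transferred data to the shared cache."""
    for i in range(0, len(data) - w + 1, w):
        chunk = data[i:i + w]
        cache[window_hash(chunk)] = chunk

def dedup(data: bytes, cache: dict, w: int) -> list:
    """Slide a w-byte window 1 byte at a time; emit a token on a cache hit, else a literal byte."""
    out, literal = [], bytearray()
    top = pow(BASE, w - 1, MOD)
    i, h = 0, None
    while i + w <= len(data):
        if h is None:
            h = window_hash(data[i:i + w])
        if cache.get(h) == data[i:i + w]:        # byte comparison guards against collisions
            if literal:
                out.append(("lit", bytes(literal)))
                literal.clear()
            out.append(("tok", h))
            i += w
            h = None                             # the next window is rehashed from scratch
        else:
            literal.append(data[i])
            if i + w < len(data):                # roll: drop data[i], append data[i + w]
                h = ((h - data[i] * top) * BASE + data[i + w]) % MOD
            i += 1
    literal.extend(data[i:])
    if literal:
        out.append(("lit", bytes(literal)))
    return out

def restore(items: list, cache: dict) -> bytes:
    """Receiver side: expand tokens back into chunks using the shared cache."""
    return b"".join(cache[v] if kind == "tok" else v for kind, v in items)

cache = {}
first = b"HEADER--" + b"payload-block-0123456789abcdef" * 4   # 128 bytes
learn(cache, first, w=16)
second = b"xyz" + first                                       # same content shifted by 3 bytes
items = dedup(second, cache, w=16)
```

In this example the second transfer is offset by three bytes, so a division into fixed units starting at byte 0 would find no duplicates, while the byte-by-byte window finds all eight cached chunks.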

[0103] FIG. 16 is a figure citing existing research results to explain the contents related to the embodiment. Referring to FIG. 16, (a) represents the CPU utilization of the SoC core, (b) represents the compression latency, (c) represents the decompression latency, (d) represents the compression throughput, and (e) represents the decompression throughput.

[0104] Referring to (16) in FIG. 16, the processor (130) performs compression to reduce the size of the data being transmitted. In the embodiment, the processor (130) provides various compression algorithms. For example, the processor (130) can perform compression using zlib, zstd, lz4, DOCA compression, and GPU Compression (nvCOMP). Also, referring to FIG. 16, the results (compression ratios) of compressing the same data using various compression techniques can be verified in 'Dynamic code compression for JavaScript engine' published in Practice and Experience 2022. Additionally, the processor (130) provides a hardware-based compression technique. In the embodiment, compression can be performed faster than the same compression algorithm computed on a CPU. Furthermore, by utilizing the Compression-engine (C-engine) within the SmartNIC, the compression time is reduced by more than 70%, and with GPU Compression (nvCOMP) available in the SmartNIC, large amounts of data can be compressed and decompressed quickly using a GPU. In addition, the SmartNIC enables GPU usage path optimization through GPU Direct RDMA. In the embodiment, this allows for faster operation by bypassing the host.

[0105] FIG. 17 is a diagram illustrating a data compression process according to an embodiment.

[0106] Referring to (17) in FIG. 17, in the case of artificial intelligence, specifically an LLM, the processor (130) stores input data and queries as a KV Cache in the form of tensors (multidimensional vectors). CacheGen converts the original KV cache into a bitstream in three steps. The processor (130) executes the three steps of CacheGen in the deduplication and compression steps. In an embodiment, the processor (130) can perform compression through Threshold-v compression based on StellaTrain. This allows the processor to estimate a threshold v such that only k elements have a magnitude greater than v, thereby significantly reducing the computational burden associated with sorting.
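The idea of Threshold-v compression, keeping only the elements whose magnitude exceeds an estimated threshold instead of sorting the whole tensor, can be sketched as follows. The way v is estimated here (sorting a small random sample) is an assumption made for illustration; the source does not state how the threshold is estimated, and the function names and parameters are hypothetical.

```python
import random

def estimate_threshold(values: list[float], k: int, sample_size: int = 256, seed: int = 0) -> float:
    """Estimate v so that roughly k entries satisfy |x| > v, sorting only a small sample."""
    rng = random.Random(seed)
    sample = values if len(values) <= sample_size else rng.sample(values, sample_size)
    mags = sorted((abs(x) for x in sample), reverse=True)
    # Number of sample entries that should lie above v, scaled from k.
    keep = max(1, round(k * len(mags) / len(values)))
    return mags[min(keep, len(mags) - 1)]

def sparsify(values: list[float], v: float) -> list[tuple[int, float]]:
    """Keep (index, value) pairs whose magnitude exceeds v; all other entries are dropped."""
    return [(i, x) for i, x in enumerate(values) if abs(x) > v]

rng = random.Random(42)
grad = [rng.gauss(0.0, 1.0) for _ in range(10_000)]   # hypothetical gradient values
v = estimate_threshold(grad, k=100)
kept = sparsify(grad, v)   # approximately, not exactly, k entries survive
```

Only the 256-element sample is sorted, so the cost of choosing v does not grow with the tensor size; the price is that the number of kept elements is only approximately k.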

[0107] Additionally, the processor (130) performs the three stages of CacheGen. In the embodiment, Delta encoding is used to reduce the difference between keys or vectors, and similar data is reconstructed to facilitate compression. Additionally, the processor (130) performs layer-wise quantization. In the embodiment, the processor (130) performs slightly more aggressive quantization in the later parts of the overall AI model because the degree to which it affects the accuracy of the model decreases as it progresses toward the later parts.
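Delta encoding followed by layer-wise quantization can be illustrated with the sketch below. It is not CacheGen's implementation: the choice of the first token as the anchor, the uniform quantizer, the 8-bit and 4-bit widths, and the rule that the second half of the layers is quantized more aggressively are all assumptions chosen to keep the example short.

```python
def quantize(values, bits, scale):
    """Uniformly quantize values in [-scale, scale] to signed integer codes."""
    qmax = 2 ** (bits - 1) - 1
    return [max(-qmax, min(qmax, round(x / scale * qmax))) for x in values]

def dequantize(codes, bits, scale):
    """Map integer codes back to approximate real values."""
    qmax = 2 ** (bits - 1) - 1
    return [c * scale / qmax for c in codes]

def bits_for_layer(layer, num_layers, early_bits=8, late_bits=4):
    """Assumed schedule: later layers tolerate coarser quantization, so they get fewer bits."""
    return early_bits if layer < num_layers // 2 else late_bits

def encode_layer(tokens, layer, num_layers, scale=1.0):
    """Keep the first token as the anchor and store the others as quantized deltas from it."""
    anchor = tokens[0]
    bits = bits_for_layer(layer, num_layers)
    deltas = [[x - a for x, a in zip(tok, anchor)] for tok in tokens[1:]]
    return anchor, bits, [quantize(d, bits, scale) for d in deltas]

def decode_layer(anchor, bits, codes, scale=1.0):
    """Rebuild every token as anchor plus dequantized delta."""
    out = [list(anchor)]
    for q in codes:
        out.append([a + d for a, d in zip(anchor, dequantize(q, bits, scale))])
    return out

tokens = [[0.50, -0.20], [0.52, -0.18], [0.47, -0.25]]   # hypothetical key vectors
early = encode_layer(tokens, layer=0, num_layers=4)      # 8-bit deltas
late = encode_layer(tokens, layer=3, num_layers=4)       # 4-bit deltas
early_out = decode_layer(*early)
late_out = decode_layer(*late)
```

Because neighboring tokens are similar, the deltas are small and cluster around zero, which is what makes the later entropy coding step effective.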

[0108] Additionally, the processor (130) enables the use of fewer bits as the distribution of data becomes more skewed through Smart Arithmetic Coding (AC). In the embodiment, data can be grouped by channel or layer to reduce entropy. That is, in the embodiment, the data is represented with fewer bits through the grouping process. In the embodiment, entropy is the number of bits of an element.
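The effect of skew and grouping on the number of bits can be checked numerically with Shannon entropy, which is the lower bound on the average number of bits per symbol that an arithmetic coder can approach. The sketch below is a generic illustration, not the embodiment's coder; the two channels and their symbol values are hypothetical.

```python
import math
from collections import Counter

def entropy_bits(symbols) -> float:
    """Shannon entropy, in bits per symbol, of an observed symbol sequence."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def grouped_entropy_bits(groups) -> float:
    """Average bits per symbol when each group is coded with its own distribution."""
    total = sum(len(g) for g in groups)
    return sum(len(g) / total * entropy_bits(g) for g in groups)

channel_a = [0] * 90 + [1] * 10    # skewed toward 0
channel_b = [7] * 90 + [6] * 10    # skewed toward 7
mixed = channel_a + channel_b      # one distribution shared by both channels

bits_mixed = entropy_bits(mixed)                              # about 1.47 bits per symbol
bits_grouped = grouped_entropy_bits([channel_a, channel_b])   # about 0.47 bits per symbol
```

Each channel on its own is highly skewed, so coding the channels separately needs roughly one bit per symbol less than coding the mixed stream with a single distribution.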

[0109] Meanwhile, when the processor (130) receives lightweight data from the network, it can check the degree of lightweighting of the received data and perform decompression and redundancy restoration on the checked data and transmit it to the host via RDMA / DMA. At this time, the decompression and redundancy restoration may be in the reverse order of the compression and deduplication described above.
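The receive path, in which the degree of lightweighting is checked first and the sender's steps are then undone in reverse order, can be sketched as follows. The frame layout is hypothetical: a one-byte header carrying the zlib level (0 meaning uncompressed), followed by records that are either literal bytes or 4-byte token identifiers. The names pack and unpack and the token table are assumptions for illustration.

```python
import struct
import zlib

def pack(items, level: int) -> bytes:
    """Sender: serialize deduplicated items, then compress; the header records the level used."""
    body = bytearray()
    for kind, value in items:                 # kind is b"L" (literal bytes) or b"T" (token id)
        data = value if kind == b"L" else struct.pack("!I", value)
        body += kind + struct.pack("!I", len(data)) + data
    raw = bytes(body)
    payload = zlib.compress(raw, level) if level else raw
    return bytes([level]) + payload

def unpack(frame: bytes, token_table: dict) -> bytes:
    """Receiver: read the level, decompress, then expand tokens back into chunks."""
    level, payload = frame[0], frame[1:]
    raw = zlib.decompress(payload) if level else payload   # step 1: undo compression
    out, pos = bytearray(), 0
    while pos < len(raw):                                  # step 2: undo deduplication
        kind = raw[pos:pos + 1]
        (length,) = struct.unpack_from("!I", raw, pos + 1)
        data = raw[pos + 5:pos + 5 + length]
        out += data if kind == b"L" else token_table[struct.unpack("!I", data)[0]]
        pos += 5 + length
    return bytes(out)

token_table = {1: b"A" * 64}                  # cache shared by both SmartNICs (assumed)
items = [(b"L", b"head"), (b"T", 1), (b"L", b"tail")]
frame = pack(items, level=6)
original = unpack(frame, token_table)
```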

[0110] The following describes the data processing scenarios of WANOPS.

[0111] FIG. 18 is a diagram showing a Content Delivery Network (CDN) of a smartnic-based network traffic acceleration system according to an embodiment.

[0112] As illustrated in (18) of FIG. 18, the embodiment provides a Content Delivery Network (CDN), which is a system for quickly delivering content of a website or application to a user. The CDN consists of servers distributed across multiple geographical locations, and by providing data from the server closest to the user, it increases transmission speed and reduces waiting time. Data transmission is frequent between the servers constituting the CDN for the synchronization of content.

[0113] Data lightweighting is necessary when the latest content needs to be retrieved from the source server or when the content requested by the user is unavailable, as the size of the transmitted data is large.

[0114] The communication scenario between CDN servers is described in the following examples. Data is transmitted from Server A of the CDN to Server B. In the examples, the Host of Server A transmits the content data to be sent to the SmartNIC via RDMA / DMA. The SmartNIC of Server A deduplicates and compresses the data and transmits it to the SmartNIC of Server B.

[0115] The SmartNIC on Server B decompresses the received data and reverses the deduplication to convert it back into the original data, then transmits it to the Host on Server B via RDMA / DMA. In the embodiment, the Hosts of Servers A and B use the WANOPS library to transmit data to the SmartNIC via RDMA / DMA without any additional code changes. Furthermore, since the SmartNIC performs data lightweighting, the computing resources of the Host can be significantly saved. Additionally, the embodiment enables RDMA / DMA transmission while reducing computing resource consumption through data lightweighting.

[0116] In addition, data lightweighting enables faster transmission, allowing for the provision of quick services to users.

[0117] Sky computing is a next-generation cloud computing paradigm focused on interoperability and portability among multiple cloud service providers. Sky computing aims to enable users to flexibly use services across multiple cloud environments without being dependent on a single cloud provider. Sky computing requires data transfer between multiple clouds. However, since each cloud provider charges users network usage fees for data transfer, frequent data transfer between multiple clouds can lead to high costs. Accordingly, the processor (130) transfers data through a sky computing transfer scenario. In the embodiment, to transfer data from Cloud A to Cloud B, the processor (130) transfers data from Cloud A to the SmartNIC of Cloud A via RDMA / DMA. The SmartNIC of Cloud A then deduplicates and compresses the data and transfers it to the SmartNIC of Cloud B. The SmartNIC of Cloud B decompresses the received data and reverses the deduplication to convert it into the original data, and transfers it to the Host of Cloud B via RDMA / DMA. Through this, the processor (130) enables a reduction in cloud usage fees and saves computing resources through SmartNIC offloading. Additionally, it saves network usage fees through data lightweighting and saves computing resources through the selection of data lightweighting options (deduplication unit, compression level). Furthermore, data lightweighting enables faster transmission, allowing for faster service provision to users and reducing traffic on the entire network.

[0118] FIG. 19 is a diagram illustrating the data processing process in sky computing in an embodiment.

[0119] With reference to (19) in FIG. 19, a sky computing transmission scenario is described below. Data is to be transmitted from Cloud A to Cloud B. First, data from Cloud A is transmitted to Cloud A's SmartNIC via RDMA / DMA. Cloud A's SmartNIC deduplicates and compresses the data and transmits it to Cloud B's SmartNIC. Cloud B's SmartNIC decompresses the received data and reverses the deduplication to convert it into the original data, and transmits it to Cloud B's Host via RDMA / DMA. The processor (130) enables the reduction of cloud usage fees through WANOPS and saves computing resources through SmartNIC offloading. Additionally, it saves network usage fees through data lightweighting and saves computing resources through the selection of data lightweighting options (deduplication unit, compression level). Furthermore, in the embodiment, faster transmission is possible through data lightweighting, enabling the provision of faster services to users and reducing overall network traffic.

[0120] Additionally, the processor (130) enables distributed computing, a computing method in which multiple computers (or nodes) cooperate with each other to perform tasks. This allows for increased efficiency and performance by having multiple computers divide and perform tasks simultaneously, rather than a single central computer handling all tasks.

[0121] FIG. 20 is a diagram showing an AI distributed computing system by WANOPS according to an embodiment.

[0122] As illustrated in (20) of FIG. 20, the AI distributed computing process in the embodiment enables model partitioning, data transmission and synchronization, computation optimization, and verification processes. A scenario for data transmission in distributed computing of an ML model is described. In the embodiment, nodes A, B, and C are used for distributed computing, and each node trains on its own data and calculates the gradient for that data. The Host of each node transmits the calculated gradient to the SmartNIC via RDMA / DMA, and each SmartNIC performs gradient compression using StellaTrain. The compressed gradient value is transmitted to the other nodes, and each received compressed gradient value is recovered through StellaTrain and transmitted to the Host via RDMA / DMA. Each node then updates its weights using the received gradient values. In the embodiment, by performing StellaTrain in the SmartNIC, gradient transmission and reception are made possible without overhead on the Host. Additionally, the SmartNIC of each node enables the optimization of the compression ratio through cache-aware compression of StellaTrain.
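The gradient exchange among nodes A, B, and C can be sketched as one synchronization step. This is a simplified illustration, not StellaTrain: the compressor here keeps the k largest-magnitude entries, the gradients are averaged over all three nodes including the local one, and a plain SGD update is applied. The function names, gradient values, and learning rate are hypothetical.

```python
def compress(grad: list[float], k: int) -> dict[int, float]:
    """Keep the k largest-magnitude entries as {index: value}; a stand-in compressor."""
    idx = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)[:k]
    return {i: grad[i] for i in idx}

def decompress(sparse: dict[int, float], size: int) -> list[float]:
    """Rebuild a dense gradient, filling dropped entries with zero."""
    dense = [0.0] * size
    for i, v in sparse.items():
        dense[i] = v
    return dense

def sync_step(weights: list[float], local_grads: list[list[float]], k: int, lr: float) -> list[float]:
    """Exchange compressed gradients, average them, and apply one SGD update."""
    size = len(weights)
    received = [decompress(compress(g, k), size) for g in local_grads]
    avg = [sum(col) / len(received) for col in zip(*received)]
    return [w - lr * g for w, g in zip(weights, avg)]

weights = [1.0, 1.0, 1.0]
grads = [
    [0.9, 0.0, 0.1],    # node A
    [0.0, -0.6, 0.1],   # node B
    [0.3, 0.0, 0.0],    # node C
]
new_weights = sync_step(weights, grads, k=1, lr=0.5)
```

With k=1 each node sends a single (index, value) pair instead of the full gradient, which is the kind of traffic reduction the scenario relies on; the small entries that were dropped do not contribute to this update.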

[0123] Additionally, the processor (130) performs AI KV Cache request processing. The LLM receives a question (Query) and a file (Context) from the user and creates a Key-Value (KV) Cache. When the KV Cache is reused on another server, the network latency of the data transmission is a significant overhead. Below, a scenario for transferring the KV Cache between two servers is described.

[0124] Server A is running an LLM that has received context C as input, and Server B requests C's KV Cache from Server A. When Server A receives the request from Server B, it transmits C's KV Cache to A's SmartNIC via RDMA / DMA. A's SmartNIC performs AI Compression (Delta encoding, Layer-wise quantization, Smart AC) on the KV Cache and transmits the resulting data to B's SmartNIC. B's SmartNIC restores the received compressed data to the original KV Cache data by performing AI decompression using GPU Direct. Subsequently, it transmits this data to B's Host via RDMA / DMA.

[0125] In the embodiment, data lightweighting is performed in the SmartNIC through WANOPS to save the Host's computing resources, and the network traffic tracking used to select the degree of data lightweighting is offloaded to the SmartNIC so that the Host's computing resources are not used. In addition, network latency for KV Cache reuse is reduced, which shortens the inference time and improves service.

[0126] FIG. 21 is a diagram showing multimodal-based WANOPS according to an embodiment.

[0127] Referring to (21) in FIG. 21, the processor (130) transmits data collected from various equipment to the SmartNIC where WANOPS operates, through a multimodal process that combines and analyzes or processes different types of data modes. In the embodiment, the various equipment transmits data (after collecting or processing it) to the SmartNIC so that it can be forwarded to the server. The SmartNIC receives the data and transmits it to a host or another server. When the destination is a host, the data is transmitted via RDMA / DMA without data lightweighting. When the destination is another server, the data is transmitted after data lightweighting is performed. In the embodiment, when lightweighted data is received, the SmartNIC restores it to the original data and transmits it to the host via RDMA / DMA.

[0128] Below, we will examine Fig. 22.

[0129] The SmartNIC-based network traffic acceleration method illustrated in FIG. 22 can be performed by a SmartNIC-based network traffic acceleration device (100) including a processor (130).

[0130] Meanwhile, FIG. 22 is merely illustrative, and the concept of the present invention is not to be interpreted as being limited to that shown in FIG. 22. For example, each step may be configured in a different order than that shown in FIG. 22, at least one of the steps shown in FIG. 22 may not be performed, or one or more steps not shown in FIG. 22 may be additionally performed.

[0131] Below, a SmartNIC-based network traffic acceleration method will be described in turn. Since the operation (function) of the SmartNIC-based network traffic acceleration method according to the embodiment is essentially the same as the function of the system, descriptions that overlap with FIGS. 2 to 21 will be omitted.

[0132] FIG. 22 is a diagram illustrating a smartnic-based network traffic acceleration method according to an embodiment.

[0133] Referring to FIG. 22, in step S110, network optimization in a WAN environment is performed through SmartNIC based on a WAN Optimizer utilizing SmartNIC. In step S120, data transmission is optimized by performing data lightweighting and artificial intelligence data compression.

[0134] The SmartNIC-based network traffic accelerator according to the embodiment can reduce the total amount of traffic by enabling WANOPS to perform data lightweighting on data transmitted from devices not equipped with SmartNICs. Additionally, it eliminates the load on the Host by utilizing only SmartNIC resources. It allows received data to be transmitted immediately to other devices, and it also allows data to be lightweighted on the SmartNIC and transmitted after the Host has used it.

[0135] In addition, the SmartNIC-based network traffic accelerator according to the embodiment enables the lightweighting of transmission data for general-purpose applications and provides the effect of reducing host overhead and saving computing resources by utilizing SmartNICs to perform data lightweighting.

[0136] In addition, the smartnic-based network traffic acceleration device according to the embodiment provides the effect of offering various options that can be used for data lightweighting depending on the device used by the user (hardware characteristics such as CPU, GPU, and whether smartnic is installed).

[0137] In addition, the SmartNIC-based network traffic accelerator according to the embodiment provides a selection of deduplication algorithms, compression algorithms, and compression levels that takes user requirements into account, because the characteristics of the data and the chosen compression algorithm and level have a significant impact on CPU usage, compression calculation time, and compression ratio.

[0138] In addition, the smartnic-based network traffic accelerator according to the embodiment minimizes the impact on model training accuracy and shortens training time through transmission that considers the specificity of AI data, and provides the effect of reducing host overhead because data lightweighting is performed on the smartnic instead of the host busy with model training.

[0139] In addition, the SmartNIC-based network traffic accelerator according to the embodiment reduces the network transmission time when AI data is transferred from one server to another, which was previously long because of the massive size of the data.

[0140] In addition, the SmartNIC-based network traffic accelerator according to the embodiment provides the effect of enabling high-speed transmission of gradients by utilizing StellaTrain when the gradients calculated at each server are exchanged for weight updates during data parallelism.

[0141] In addition, the SmartNIC-based network traffic accelerator according to the embodiment utilizes CacheGen's encoding technique, which considers the specificity of the data, to minimize the impact on model accuracy. Because large-scale tensor-type data can be compressed into bitstream form, the embodiment provides the effect of enabling a server to transmit data used for training efficiently when another server requests it.

[0142] In addition, the smartnic-based network traffic accelerator according to the embodiment reduces the time required for the inference process and increases the service provision speed, thereby providing an effect of improving service quality.

[0143] In addition, the smartnic-based network traffic accelerator according to the embodiment provides various deduplication and compression algorithms and degrees that consider the characteristics of the transmitted data and the device being used, as well as user requirements, based on given data or models, thereby providing the effect of enabling a higher degree of data compression.

[0144] In addition, the SmartNIC-based network traffic acceleration device according to the embodiment provides the effect of reducing network usage fees by reducing network transmission time and network traffic through data lightweighting, that is, deduplication and compression of the transmitted data.

[0145] In addition, the SmartNIC-based network traffic accelerator according to the embodiment reduces host computational load and traffic through SmartNIC offloading and offloads network functions to BlueField (a type of SmartNIC) to achieve the effect of saving about 29% (127W) of energy.

[0146] In addition, the smartnic-based network traffic acceleration device according to the embodiment reduces energy consumption by decreasing the CPU's heat generation through a reduction in the CPU's computational load, and the RDMA technology used for network packet processing provides the effect of minimizing the host's CPU usage.

[0147] In addition, the smartnic-based network traffic accelerator according to the embodiment provides the effect of realizing a Green Network by reducing overall network power consumption through the reduction of network traffic.

[0148] In addition, the smartnic-based network traffic accelerator according to the embodiment provides the effect of increasing energy efficiency and minimizing environmental impact through network infrastructure and technology via a green network, reducing power consumption and minimizing carbon emissions by using sustainable technology, and making the entire network ecosystem eco-friendly.

[0149] Meanwhile, the methods according to the various embodiments of the present invention described above can be implemented in the form of an application or software program that can be installed on an existing electronic device.

[0150] In addition, the whole or part of the method may be composed of multiple software function modules and implemented on an operating system (OS). Alternatively, each step may be composed of a single software function module, or each step may be combined to form a single software function module and implemented on an operating system. Therefore, even if all of the embodiments of the present disclosure are not implemented as a single software function module, if multiple software function modules implement each step of the present disclosure and multiple software function modules are implemented on a single operating system, it can be understood that the method of the present disclosure has been implemented.

[0151] In addition, the methods according to the various embodiments of the present invention described above can be implemented solely through software upgrades or hardware upgrades of existing electronic devices. Furthermore, the various embodiments of the present invention described above can also be performed through an embedded server equipped in an electronic device or an external server of the electronic device.

[0152] Meanwhile, according to one embodiment of the present invention, the various embodiments described above may be implemented as software comprising instructions stored on a computer-readable recording medium using software, hardware, or a combination thereof. In some cases, the embodiments described herein may be implemented as the processor itself. According to the software implementation, embodiments such as the procedures and functions described herein may be implemented as separate software modules. Each of the software modules may perform one or more functions and operations described herein.

[0153] Meanwhile, a computer or a similar device may include a device according to the disclosed embodiments, which is capable of calling instructions stored from a storage medium and operating according to the called instructions. When said instructions are executed by a processor, the processor may perform a function corresponding to said instructions directly or by using other components under the control of said processor. The instructions may include code generated or executed by a compiler or an interpreter.

[0154] A computer-readable recording medium may be provided in the form of a non-transitory computer-readable recording medium. Here, "non-transitory" simply means that the storage medium does not contain a signal and is tangible, without distinguishing whether data is stored semi-permanently or temporarily on the storage medium. In this context, a non-transitory computer-readable medium refers to a medium that stores data semi-permanently and is readable by a device, rather than a medium that stores data for a short moment, such as registers, caches, or memory. Specific examples of non-transitory computer-readable media may include CDs, DVDs, hard disks, Blu-ray discs, USBs, memory cards, and ROMs.

[0155] As described above, exemplary embodiments have been disclosed in the drawings and specification. Although specific terms have been used to describe the embodiments in this specification, they are used only for the purpose of explaining the technical concept of this disclosure and are not intended to limit the meaning or the scope of this disclosure as defined in the claims. Therefore, those skilled in the art will understand that various modifications and equivalent alternative embodiments are possible therefrom. Accordingly, the true technical scope of protection of this disclosure should be determined by the technical concept of the appended claims.

Claims

1. A device comprising: a memory for storing at least one instruction for SmartNIC-based network traffic acceleration; and a processor that performs an operation according to the instruction, wherein the processor, based on a WAN optimizer utilizing a SmartNIC, performs network optimization in a WAN environment through the SmartNIC and optimizes data transmission by performing data lightweighting and artificial intelligence data compression.

2. The device of claim 1, wherein the processor performs deduplication and compression processes in the SmartNIC to lightweight data, and performs compression of the artificial intelligence data when the data is related to artificial intelligence.

3. The device of claim 2, wherein, to remove duplication within transmitted data, the processor divides the transmitted data brought in by RDMA/DMA into chunks of a certain size, calculates a hash value of each chunk, checks whether the hash value exists in a cache of the SmartNIC, and, if the hash value exists in the cache, replaces the chunk with a corresponding token.

4. The device of claim 3, wherein, to lightweight the data, the processor first divides the deduplicated data into chunks of a certain size, compresses each chunk at a different compression level, and transmits the compressed data to a network.

5. The device of claim 4, wherein the processor reduces data noise and identifies trends in traffic tracking of the network through a pre-configured Algorithm 1 based on an ARIMA model.

6. The device of claim 5, wherein the processor, through a pre-configured Algorithm 2, trains a Random Forest Regression (RFR) model to calculate the transmission time between a transmitting end and a receiving end based on the available computing resources of the transmitting and receiving ends, and predicts the processing time of the data lightweighting between the transmitting and receiving ends through the RFR model.

7. The device of claim 6, wherein the processor, through a pre-configured Algorithm 3, inputs the network traffic predicted through the trend identification of Algorithm 1 into the RFR model trained by Algorithm 2, calculates estimated times for a plurality of lightweighting combinations through the RFR model, and selects, as the degree of lightweighting of the data, the lightweighting value having the lowest estimated time among the calculated estimated times.

8. The device of claim 4, wherein the processor generates the chunks by moving a window having the length of the chunk 1 byte at a time along the direction of data progress.

9. The device of claim 1, wherein the processor provides a compression algorithm in the SmartNIC and stores input data and queries in the form of tensors to enable compression of the artificial intelligence data.

10. The device of claim 2, wherein, when lightweighted data is received from the network, the processor checks the degree of lightweighting of the received data, performs decompression and deduplication restoration on the checked data, and transmits the restored data to a host via RDMA/DMA.

11. A traffic acceleration method performed by a processor of a SmartNIC-based network traffic acceleration device, the method comprising: a step in which the processor optimizes the network in a WAN environment through the SmartNIC based on a WAN optimizer utilizing the SmartNIC; and a step in which the processor optimizes data transmission by performing data lightweighting and artificial intelligence data compression.

12. The method of claim 11, wherein the processor performs deduplication and compression processes in the SmartNIC to lightweight data, and performs compression of the artificial intelligence data when the data is related to artificial intelligence.

13. The method of claim 12, wherein, to remove duplication within transmitted data, the processor divides the transmitted data brought in by RDMA/DMA into chunks of a certain size, calculates a hash value of each chunk, checks whether the hash value exists in a cache of the SmartNIC, and, if the hash value exists in the cache, replaces the chunk with a corresponding token.

14. The method of claim 13, wherein, to lightweight the data, the processor first divides the deduplicated data into chunks of a certain size, compresses each chunk at a different compression level, and transfers the compressed data to the network.

15. The method of claim 14, wherein the processor: through a pre-configured Algorithm 1 based on an ARIMA model, reduces data noise and identifies trends in traffic tracking of the network; through a pre-configured Algorithm 2, trains a Random Forest Regression (RFR) model to calculate the transmission time between the transmitting and receiving ends based on the available computing resources of the transmitting and receiving ends, and predicts the processing time of the data lightweighting between the transmitting and receiving ends through the RFR model; and through a pre-configured Algorithm 3, inputs the network traffic predicted through the trend identification of Algorithm 1 into the RFR model trained by Algorithm 2, calculates estimated times for a plurality of lightweighting combinations through the RFR model, and selects, as the degree of lightweighting of the data, the lightweighting value having the lowest estimated time among the calculated estimated times.
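
For illustration only, the deduplication, per-chunk compression, and restoration steps recited in the claims above might be sketched in Python as follows. This is a minimal host-side sketch under stated assumptions, not the disclosed SmartNIC implementation: the 4096-byte chunk size, the use of SHA-256 as the hash, zlib as the compressor, the record layout, the `pick_level` callback, and the premise that the receiving end keeps a cache that mirrors the transmitting end's are all hypothetical choices that the claims do not specify. For brevity the sketch also compresses each non-duplicate chunk directly, rather than re-chunking the deduplicated stream.

```python
import hashlib
import zlib

CHUNK_SIZE = 4096  # assumption: the claims only say "a certain size"


def lighten(data, cache, pick_level=lambda index: 6, chunk_size=CHUNK_SIZE):
    """Transmitting side: tokenize cached chunks, compress the rest.

    `cache` maps a chunk hash to the chunk bytes (stand-in for the NIC cache).
    `pick_level` is a hypothetical callback returning a zlib level (0-9)
    for each chunk index, so chunks may use different compression levels.
    """
    records = []
    for index, offset in enumerate(range(0, len(data), chunk_size)):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).digest()
        if digest in cache:
            # Duplicate chunk: send only a token that refers to the cache entry.
            records.append(("token", digest))
        else:
            # New chunk: remember it, then compress at the chosen level.
            cache[digest] = chunk
            records.append(("data", zlib.compress(chunk, pick_level(index))))
    return records


def restore(records, cache):
    """Receiving side: decompress data records and expand tokens."""
    out = bytearray()
    for record in records:
        if record[0] == "token":
            # Assumes the receiver cache already holds this chunk.
            out += cache[record[1]]
        else:
            chunk = zlib.decompress(record[1])
            cache[hashlib.sha256(chunk).digest()] = chunk
            out += chunk
    return bytes(out)


if __name__ == "__main__":
    sender_cache, receiver_cache = {}, {}
    payload = b"A" * (3 * CHUNK_SIZE) + b"B" * 100
    # Alternate between a high and a low compression level per chunk.
    records = lighten(payload, sender_cache,
                      pick_level=lambda i: 1 if i % 2 else 9)
    assert restore(records, receiver_cache) == payload
```

In this sketch a repeated chunk costs only a 32-byte token on the wire, at the price of keeping the two caches consistent; how the disclosed device synchronizes or evicts cache entries is not stated in the claims and is left out here.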