Codeword generation for semantic multimodal communication

The semantic communication system addresses the challenge of transmitting and recovering semantic information from multimodal data by using neural networks to generate binary codewords based on Euclidean distances, enhancing data transmission efficiency and accuracy in noisy channels.

WO2026136147A1 · PCT designated stage · Publication Date: 2026-06-25 · OHIO STATE INNOVATION FOUND

Patent Information

Authority / Receiving Office
WO · WO
Patent Type
Applications
Current Assignee / Owner
OHIO STATE INNOVATION FOUND
Filing Date
2025-12-12
Publication Date
2026-06-25

AI Technical Summary

Technical Problem

Conventional communication systems fail to effectively transmit and recover semantic information from multimodal data, especially in high-volume and latency-sensitive applications, without considering the meaning of the data.

Method used

A semantic communication system that uses a transformer model to extract semantic embeddings, trains neural networks to generate binary codewords based on Euclidean distances, and employs a codebook for efficient transmission and recovery of semantic information, incorporating channel effects to enhance robustness.

Benefits of technology

The system achieves reduced semantic distortion and improved data transmission efficiency by assigning codewords that reflect Euclidean distances, ensuring accurate recovery of semantic information even in noisy channels.

✦ Generated by Eureka AI based on patent content.

Patent Text Reader

Abstract

Semantic communication systems and methods of implementing semantic communication are disclosed. A method includes obtaining a dataset of semantic embeddings. The semantic embeddings include semantic information extracted from multimodal messages using a transformer model. The method also includes using the semantic embeddings to train one or more neural networks, and obtaining binary codewords corresponding to the semantic embeddings for transmission based on training the one or more neural networks.

Description

Attorney Docket: T2025-079 (751501-2040)

CODEWORD GENERATION FOR SEMANTIC MULTIMODAL COMMUNICATION

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This application claims priority under 35 U.S.C. § 119(e) to Provisional Patent Application No. 63/734,458, filed December 16, 2024, the entire contents of which are incorporated herein by reference.

BACKGROUND

[0002] Wireless communication systems are becoming ubiquitous, and the volume and types of data being communicated are increasing. Multimodal communication involves a signal or message that includes more than one of, for example, text, audio, images, symbols, videos, gestures, and facial expressions. Conventional communication systems entail the transmission of digital sequences that facilitate receiving and recovering the bits of the multimodal data without consideration of its meaning. To handle the increasing volume of data being conveyed, as well as the strict latency requirements of emerging communication applications (e.g., digital twins, fully autonomous vehicles), semantic communication has become a developing approach. In semantic communication, semantic information - focused on the meaning rather than the exact bits - is extracted from the multimodal data. A codebook is used to obtain a codeword corresponding with the semantic information. The codeword is transmitted and, at the receiver, the same codebook is used to recover the semantic information from the received codeword.

BRIEF SUMMARY OF THE INVENTION

[0003] Certain aspects of the concepts and embodiments described herein are summarized below. The aspects are representative and not exhaustively listed. In alternate embodiments, certain features and elements can be added, omitted, and interchanged with each other. Additionally, variations, extensions, and modifications to the example embodiments can be achieved by those skilled in the art without departing from the concepts, so as to encompass equivalent and related structures.

[0004] Various embodiments are disclosed for semantic communication systems and methods of implementing semantic communication. An example method includes obtaining a dataset of semantic embeddings. The semantic embeddings include semantic information extracted from multimodal messages using a transformer model. The method also includes using the semantic embeddings to train one or more neural networks, and obtaining binary codewords corresponding to the semantic embeddings for transmission based on training the one or more neural networks.

[0005] In some aspects, the semantic embeddings are used to train one neural network to obtain soft output, and the method also includes obtaining the binary codewords from the soft output via binarizing and mapping to distinct codewords. Training the one neural network may include using a Euclidean distance between pairs of the semantic embeddings to generate the soft output. The obtaining the binary codewords from the soft output may include assigning the binary codewords such that a Hamming distance between pairs of the binary codewords is based on the Euclidean distance between the pairs of the semantic embeddings associated with the pairs of the binary codewords.

[0006] In some aspects, the method may include compressing the dataset of semantic embeddings into a clustered dataset of clusters of the semantic embeddings. The clusters of the semantic embeddings in the clustered dataset may be used to train one neural network to obtain soft output, and the method may also include obtaining the binary codewords from the soft output by binarizing and mapping to distinct codewords.

[0007] In some aspects, the using the semantic embeddings to train the one or more neural networks includes training a neural network of an encoder and training a neural network of a decoder to obtain a trained codebook. The training the neural network of the encoder may include populating the trained codebook and obtaining encoder-side indices of the trained codebook. In some aspects the training the neural network of the decoder includes using channel effects to obtain perturbed indices from the encoder-side indices, and using the perturbed indices as input in a neural network model of the decoder. The obtaining the codewords may be based on indices of the codebook.

[0008] An example semantic communication system includes processing circuitry to obtain a dataset of semantic embeddings. The semantic embeddings include semantic information extracted from multimodal messages using a transformer model. The processing circuitry also uses the semantic embeddings to train one or more neural networks, and obtains binary codewords corresponding to the semantic embeddings for transmission based on training the one or more neural networks.

[0009] In some aspects, the processing circuitry uses the semantic embeddings to train one neural network to obtain soft output, and to obtain the binary codewords from the soft output by binarizing and mapping to distinct codewords. The processing circuitry may use a Euclidean distance between pairs of the semantic embeddings to generate the soft output. In some aspects, the processing circuitry assigns the binary codewords such that a Hamming distance between pairs of the codewords is based on the Euclidean distance between the pairs of the semantic embeddings associated with the pairs of the binary codewords.

[0010] In some aspects, the processing circuitry compresses the dataset of semantic embeddings into a clustered dataset of clusters of the semantic embeddings. The processing circuitry may also use the clusters of the semantic embeddings in the clustered dataset to train one neural network to obtain soft output, and obtain the binary codewords from the soft output by binarizing and mapping to distinct codewords.

[0011] In some aspects, the one or more neural networks include a neural network of an encoder and a neural network of a decoder that are trained to obtain a trained codebook. The processing circuitry may populate the trained codebook and obtain encoder-side indices of the trained codebook. In some aspects, the processing circuitry may use channel effects to obtain perturbed indices from the encoder-side indices, and use the perturbed indices as input in a neural network model of the decoder. The processing circuitry may obtain the binary codewords based on indices of the codebook.

BRIEF DESCRIPTION OF THE DRAWINGS

[0012] Many aspects of the present disclosure can be better understood with reference to the following drawings. The components in the drawings are not necessarily to scale, with emphasis instead being placed upon clearly illustrating the principles of the disclosure. Moreover, in the drawings, like reference numerals designate corresponding parts throughout the several views. Repetition of labels for some components may be omitted for clarity of the illustrations.

[0013] FIG. 1 is a block diagram of aspects of a transmitter and receiver in an exemplary semantic multimodal communication system according to various embodiments.

[0014] FIG. 2 is a process flow of a method of performing semantic quantization according to various embodiments.

[0015] FIG. 3 is a process flow of a method of performing semantic compression according to various embodiments.

[0016] FIG. 4 is a process flow of a method of generating a semantic vector-quantized autoencoder according to various embodiments.

[0017] FIG. 5 is a block diagram detailing aspects of processing circuitry of the semantic multimodal communication system according to various embodiments.

DETAILED DESCRIPTION

[0018] As noted above, semantic communication systems may facilitate the communication of large volumes of multimodal data while conforming with latency requirements of demanding real-time applications. In semantic communication, the relevant meaning within data may be extracted as semantic information, referred to as embedding(s). A codebook is used to assign a codeword to a message embedding and the codeword is transmitted, rather than the data itself. At the receiver, the same codebook is used to obtain the semantic information (i.e., embedding) from the received codeword. In classification-oriented semantic communication, the recovered semantic information is used to obtain a classification embedded in the semantic information, which represents the actual knowledge of interest in the multimodal data.

[0019] In this context, systems and methods directed to codeword generation in semantic communication are described. According to various embodiments, codeword generation may focus on aspects of generating a codebook or the corresponding codewords. The codebook refers to a discrete set of embeddings that may be computed from multimodal data by a transformer model. Binary codewords corresponding to respective embeddings are transmitted for recovery and processing of the associated embedding at a receiver. According to different embodiments detailed herein, a dataset $\mathbf{Q}_{\text{pre}}$ of $n$ embeddings may be used for semantic quantization, semantic compression, or to train a semantic vector-quantized autoencoder (VQ-AE). Generally, vectors and matrices are bolded.

[0020] According to some embodiments, the dataset $\mathbf{Q}_{\text{pre}}$ may be used as a shared codebook for semantic quantization. The binary codewords may be generated by training a neural network in consideration of minimizing semantic distortion, which quantifies a loss in the meaning resulting from a difference in the recovered embedding at a receiver as compared with the embedding at the transmitter. Smaller Hamming distances between the codewords corresponding to embeddings that are closer in Euclidean space result in lower semantic distortion for transmissions over noisy channels. A neural network may be trained to assign a fixed-length binary codeword with the objective of assigning codewords with Hamming distances that reflect Euclidean distances between their corresponding embeddings. That is, binary codewords are assigned such that embeddings that are closer in Euclidean space have corresponding codewords that have smaller Hamming distances between them.

[0021] According to some further embodiments, the dataset $\mathbf{Q}_{\text{pre}}$ may be used for semantic compression, which refers to clustering similar embeddings in $\mathbf{Q}_{\text{pre}}$, thereby exploiting semantic redundancies. An affinity propagation (AP) algorithm may be used for the clustering. Rather than a binary codeword being assigned to each embedding in the codebook, a binary codeword may be assigned to each cluster of embeddings in the codebook.

[0022] According to alternate or additional embodiments, the dataset $\mathbf{Q}_{\text{pre}}$ may be used as training data to train a semantic VQ-AE, which outputs discrete representations for message embeddings (i.e., the semantic information in multimodal data). The VQ-AE includes an encoder and decoder, which constitute an autoencoder, and a codebook. The training of the autoencoder includes simulating channel conditions, which affect the encoder, decoder, and codebook. Following training, the learned codebook and encoder-decoder are shared between the transmitter and receiver. Subsequently, an embedding of multimodal data may be assigned to a learned codebook element and the codebook index in binary may be used as the codeword for transmission.

[0023] Turning to the drawings, FIG. 1 is a block diagram of aspects of a transmitter 20 and receiver 30 in an exemplary semantic multimodal communication system 10 according to various embodiments. At the transmitter 20, input multimodal data 5 is provided to a semantic encoder 105 that is shown with two operational components. The semantic encoder 105 may include a neural network transformer model 110 such as, for example, contrastive language-image pretraining (CLIP), which produces a vector of a multimodal embedding $\mathbf{q}$. The semantic encoder 105 may also include a vector quantizer 120, which is detailed according to various embodiments below. The vector quantizer 120 maps the embedding $\mathbf{q}$ to a binary codeword. The binary codeword may correspond to an index of the codebook assigned to message embedding $\mathbf{q}$. A channel encoder and modulator 130 produces a transmit signal $\mathbf{x}$ as the channel input by encoding the binary codeword.

[0024] At the receiver 30, the signal $\mathbf{y}$ is received as the channel output and is given by:

$\mathbf{y} = h\mathbf{x} + \mathbf{z}$ [EQ. 1]

The transmitted signal $\mathbf{x}$ and received signal $\mathbf{y}$ are complex vectors of length $b$ ($\mathbf{x} \in \mathbb{C}^b$ and $\mathbf{y} \in \mathbb{C}^b$). In EQ. 1, $h$ represents the fading coefficient and is a member of the set of complex numbers ($h \in \mathbb{C}$), and $\mathbf{z}$ denotes additive noise, which is a Gaussian random variable with mean 0 and variance $\sigma^2 \mathbf{I}_b$, where $\mathbf{I}$ is an identity matrix. A channel decoder 140 obtains a reconstructed binary codeword, and a semantic decoder 150 obtains the associated embedding $\hat{\mathbf{q}}$. The semantic decoder 150 uses the same codebook as the semantic encoder 105. A classifier neural network 160 obtains a classification $c$ using the embedding $\hat{\mathbf{q}}$. Aspects of the vector quantizer 120 are detailed with reference to FIGS. 2-4 according to various embodiments.
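
The channel model of EQ. 1 can be illustrated with a minimal sketch. The function name `fading_channel`, the sample values, and the choice of circularly symmetric complex noise (variance split evenly between real and imaginary parts) are assumptions made for illustration only.

```python
import math
import random

def fading_channel(x, h, sigma2, rng):
    """Apply y = h*x + z per channel use, with z complex Gaussian of variance sigma2."""
    std = math.sqrt(sigma2 / 2.0)  # assumption: variance split over real and imaginary parts
    return [h * xi + complex(rng.gauss(0.0, std), rng.gauss(0.0, std)) for xi in x]

rng = random.Random(0)
x = [complex(1, 1), complex(-1, 1), complex(1, -1)]  # b = 3 channel uses (made-up symbols)
h = complex(0.8, -0.3)                               # hypothetical fading coefficient
y_clean = fading_channel(x, h, 0.0, rng)             # noiseless case: y equals h*x
y_noisy = fading_channel(x, h, 0.1, rng)
```

With `sigma2 = 0.0` the output reduces to `h * x`, which is a quick sanity check on the model.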

[0025] FIG. 2 is a process flow of a method 200 of performing semantic quantization according to various embodiments. The processes of the method 200 are directed to assigning binary codewords with Hamming distances that reflect Euclidean distances between their corresponding embeddings. That is, based on the processes, embeddings that are closer in Euclidean space have corresponding distinct binary codewords that have smaller Hamming distances between them.

[0026] At 210, training a neural network using the dataset $\mathbf{Q}_{\text{pre}}$ to obtain soft output $\mathbf{W} = g(\mathbf{Q}_{\text{pre}}; \boldsymbol{\theta})$ is based on processes 220, 230, and 240, with $\boldsymbol{\theta}$ being a learnable parameter vector and the resulting soft output $\mathbf{W}$ being an $n \times l_{cw}$ matrix of real numbers ($\mathbf{W} \in \mathbb{R}^{n \times l_{cw}}$), where $l_{cw}$ is the fixed length of the codewords, which is $\lceil \log_2 n \rceil$, and, as previously noted, $n$ is the number of embeddings in the dataset $\mathbf{Q}_{\text{pre}}$.

[0027] At 220, initializing the learnable parameter vector $\boldsymbol{\theta}$ may involve a Xavier initialization, for example. The Xavier initialization is a weight initialization technique for neural networks that includes setting initial weights from a distribution with zero mean and a variance that depends on the number of input and output neurons in the layer, with a goal of keeping the variance of activations and gradients constant across layers.
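
As a minimal sketch of the initialization described above, the uniform variant of Xavier initialization draws weights from $U(-a, a)$ with $a = \sqrt{6 / (\text{fan\_in} + \text{fan\_out})}$, which gives zero mean and variance $2 / (\text{fan\_in} + \text{fan\_out})$. The uniform variant, the function name `xavier_uniform`, and the layer sizes are assumptions; the text does not state which variant is used.

```python
import math
import random

def xavier_uniform(fan_in, fan_out, rng):
    """Sample a fan_out x fan_in weight matrix from U(-a, a), a = sqrt(6 / (fan_in + fan_out))."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_in)] for _ in range(fan_out)]

rng = random.Random(0)
weights = xavier_uniform(fan_in=8, fan_out=4, rng=rng)  # hypothetical layer sizes
limit = math.sqrt(6.0 / (8 + 4))
```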

[0028] At 230, processes 231-236 are performed iteratively in batches. Given the size $n$ of the dataset $\mathbf{Q}_{\text{pre}}$, each batch may involve a data subset $\mathbf{Q}_b$ of size $n_b$. As detailed below, the processes (231-236) that are implemented as part of 230 provide soft output $\mathbf{W}$ (at 210), which is generated in consideration of the Euclidean distances between embeddings in each data subset of each batch.

[0029] At 231, the processes performed for each batch include computing input pairwise distances (i.e., Euclidean distances between pairs of embeddings) that are an $n_b \times n_b$ matrix of real numbers ($\mathbf{D} \in \mathbb{R}^{n_b \times n_b}$) as follows (for all $i, j \in \{0, 1, \ldots, n_b - 1\}$):

$\mathbf{D}[i,j] = \|\mathbf{Q}_b[i] - \mathbf{Q}_b[j]\|_2$ [EQ. 1]

At 232, generating triplets is, itself, an iterative process (for each $j \in \{0, 1, \ldots, n_b - 1\}$). At each value of $j$, an anchor sample $a_s$ is assigned the value of $j$ (i.e., the index of each iteration for the processes at 232), a positive sample $p_s$ is assigned the column index with minimum value of $\mathbf{D}[j,k]$ where $j$ and $k$ are not the same, and a negative sample $n_s$ is assigned the column index with maximum value of $\mathbf{D}[j,k]$. The model is trained to ensure that the anchor sample $a_s$ (each index value of $j$) is closer to the positive sample $p_s$ (column index with minimum value of $\mathbf{D}[j,k]$, corresponding to closer embeddings) than to the negative sample $n_s$ (column index with maximum value of $\mathbf{D}[j,k]$, corresponding to farther embeddings). The iterative process results in a vector of positive samples $\mathbf{p}_s$ and a vector of negative samples $\mathbf{n}_s$ for each batch.
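
The distance computation at 231 and the triplet generation at 232 can be sketched as follows. The function names and the toy two-dimensional batch are illustrative assumptions; real embeddings would have many more dimensions.

```python
import math

def pairwise_distances(batch):
    """D[i][j] = Euclidean distance between embeddings i and j of the batch."""
    return [[math.dist(a, b) for b in batch] for a in batch]

def mine_triplets(D):
    """For each anchor j, take the nearest other index as positive and the farthest as negative."""
    positives, negatives = [], []
    for j, row in enumerate(D):
        others = [k for k in range(len(row)) if k != j]
        positives.append(min(others, key=lambda k: row[k]))
        negatives.append(max(others, key=lambda k: row[k]))
    return positives, negatives

Q_b = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (5.0, 1.0)]  # toy batch, n_b = 4
D = pairwise_distances(Q_b)
p_s, n_s = mine_triplets(D)
```

For this batch, the first two points form one close pair and the last two form another, so each anchor's positive is its pair partner and its negative lies in the other pair.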

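[0030] At 233, the processes include computing the model output $\mathbf{W}_b = g(\mathbf{Q}_b; \boldsymbol{\theta})$ for each batch of the data subset $\mathbf{Q}_b$. At 234, computing output pairwise distances that are an $n_b \times n_b$ matrix of real numbers ($\mathbf{D}_W \in \mathbb{R}^{n_b \times n_b}$) is as follows (for all $i, j \in \{0, 1, \ldots, n_b - 1\}$):

$\mathbf{D}_W[i,j] = \|\mathbf{W}_b[i] - \mathbf{W}_b[j]\|_2$ [EQ. 2]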

[0031] At 235, computing total loss for each batch during each iteration refers to using a loss function $\mathcal{L}_{cw}$ for the training associated with each batch. The loss function $\mathcal{L}_{cw}$ is obtained as a convex combination of weighted components, namely triplet loss, diversity loss, and orthogonality loss:

$\mathcal{L}_{cw} = \alpha_1 \mathcal{L}_{\text{Triplet}} + \alpha_2 \mathcal{L}_{\text{Diversity}} + \alpha_3 \mathcal{L}_{\text{Orthogonality}}$ [EQ. 3]

EQ. 3 may be written as:

$\mathcal{L}_{cw} = \alpha_1 \sum_{j=0}^{n_b - 1} \max\left(0, \|\mathbf{W}_b[j] - \mathbf{W}_b[p_j]\| - \|\mathbf{W}_b[j] - \mathbf{W}_b[n_j]\| + \epsilon\right) + \alpha_2 \mathcal{L}_{\text{Diversity}} + \alpha_3 \mathcal{L}_{\text{Orthogonality}}$ [EQ. 4]

In EQ. 4, in the first term, $p_j = \mathbf{p}_s[j]$, $n_j = \mathbf{n}_s[j]$, and $\epsilon$ is the margin, which may be used to adjust how much closer $p_j$ must be to $j$ (i.e., the anchor) than $n_j$ is to the anchor. The third term uses a squared Frobenius norm. The triplet loss (first term in EQ. 4) is used to align the model outputs $\mathbf{W}_b$ for each batch with positions of embeddings in the data subset $\mathbf{Q}_b$ associated with indices (i.e., to align intra-embedding and corresponding intra-codeword distances). The triplets are regenerated per batch.
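
The triplet component of the loss can be sketched numerically as a standard hinge on the difference between anchor-positive and anchor-negative distances. This sketch covers only the triplet term; summing over anchors without normalization, the function name, and the toy values are assumptions, and the diversity and orthogonality terms are not modeled.

```python
import math

def triplet_loss(W_b, positives, negatives, margin):
    """Sum over anchors j of max(0, ||W_b[j]-W_b[p_j]|| - ||W_b[j]-W_b[n_j]|| + margin)."""
    total = 0.0
    for j, (p, n) in enumerate(zip(positives, negatives)):
        d_pos = math.dist(W_b[j], W_b[p])
        d_neg = math.dist(W_b[j], W_b[n])
        total += max(0.0, d_pos - d_neg + margin)
    return total

W_b = [(0.0, 0.0), (0.0, 1.0), (3.0, 0.0)]  # toy soft outputs for one batch
positives = [1, 0, 1]                       # hypothetical triplets from the embedding space
negatives = [2, 2, 0]
loss_small_margin = triplet_loss(W_b, positives, negatives, margin=0.5)
loss_large_margin = triplet_loss(W_b, positives, negatives, margin=3.0)
```

Only the third anchor violates the smaller margin, so it alone contributes to `loss_small_margin`; with the larger margin every anchor contributes.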

[0032] The triplet loss alone can lead to duplicate assignments of codewords. Thus, regularization terms are introduced at 236 in the form of the second and third terms in EQ. 4, for example. Specifically, at 236, a regularization technique may be implemented. For example, the AdamW optimizer may be applied ($\text{AdamW}(\boldsymbol{\theta}, \mathcal{L}_{cw})$). The exemplary AdamW algorithm is a known optimization algorithm in deep learning that decouples weight decay from gradient-based updates. At 240, the vector $\boldsymbol{\theta}$ is obtained based on the processes at 230.

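[0033] At 250, following the iterative batch-wise processes to obtain the soft output $\mathbf{W}$, the soft output $\mathbf{W}$ is binarized as follows:

$\mathbf{W}_{\text{bin}}[i,j] = \begin{cases} 1 & \text{if } \mathbf{W}[i,j] \geq 0 \\ 0 & \text{if } \mathbf{W}[i,j] < 0 \end{cases}$ [EQ. 5]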

[0034] At 260, processes are implemented to map the binarized soft output $\mathbf{W}_{\text{bin}}$ to distinct binary codewords of a fixed length $l_{cw}$. A matrix of distinct binary codewords is initialized as $\tilde{\mathbf{W}} \in \{0,1\}^{2^{l_{cw}} \times l_{cw}}$. The minimization that is implemented as part of the processes at 260 ensures that each binarized soft output vector in the matrix (binarized soft output $\mathbf{W}_{\text{bin}}$) is mapped to a distinct binary codeword by minimizing the Hamming distance between the vector and its assigned binary codeword.

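[0035] A Hamming distance matrix is computed for every $i$ and $j$ as:

$\mathbf{D}_H[i,j] = d_H(\mathbf{W}_{\text{bin}}[i], \tilde{\mathbf{W}}[j])$ [EQ. 6]

A binary decision variable $\mathbf{A}$ is defined ($\mathbf{A} \in \{0,1\}^{n \times 2^{l_{cw}}}$):

$\mathbf{A}[i,j] = \begin{cases} 1 & \text{if index } i \text{ is assigned to codeword } j \\ 0 & \text{otherwise} \end{cases}$ [EQ. 7]

Additional processes involve minimizing with respect to $\mathbf{A}$ such that EQ. 8 below is true for all $i$ and EQ. 9 below is true for all $j$.

$\sum_{j=0}^{2^{l_{cw}} - 1} \mathbf{A}[i,j] = 1$ [EQ. 8]

$\sum_{i=0}^{n-1} \mathbf{A}[i,j] \leq 1$ [EQ. 9]

The minimization may be expressed as:

$\min_{\mathbf{A}} \sum_{i=0}^{n-1} \sum_{j=0}^{2^{l_{cw}} - 1} \mathbf{A}[i,j]\, \mathbf{D}_H[i,j]$ [EQ. 10]

The assigned binary codewords $\tilde{\mathbf{W}}[j]$ are returned for all $i$ where $\mathbf{A}[i,j] = 1$.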
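
The binarization at 250 and the mapping to distinct codewords at 260 can be sketched on a toy example. Exhaustive search over one-to-one assignments is used here purely for illustration and is feasible only for very small $n$; a dedicated linear assignment solver would normally be substituted. The function names and soft-output values are assumptions.

```python
from itertools import permutations, product

def binarize(W):
    """Map each soft output to 1 if it is non-negative and to 0 otherwise."""
    return [[1 if w >= 0 else 0 for w in row] for row in W]

def hamming(a, b):
    """Number of positions in which two equal-length bit sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def assign_distinct_codewords(W_bin):
    """Brute-force the one-to-one assignment with minimum total Hamming distance."""
    l_cw = len(W_bin[0])
    codebook = list(product((0, 1), repeat=l_cw))  # all 2**l_cw candidate codewords
    best = min(
        permutations(range(len(codebook)), len(W_bin)),
        key=lambda cols: sum(hamming(W_bin[i], codebook[j]) for i, j in enumerate(cols)),
    )
    return [codebook[j] for j in best]

W = [[0.7, -0.2], [0.4, -0.9], [-0.3, 0.5]]  # toy soft output, n = 3, l_cw = 2
W_bin = binarize(W)                          # rows 0 and 1 collide after binarization
codewords = assign_distinct_codewords(W_bin)
```

Because the first two rows binarize to the same pattern, one of them has to move to a codeword at Hamming distance 1, which is the minimum total cost for this example.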

[0036] FIG. 3 is a process flow of a method 300 of performing semantic compression according to various embodiments. At 310, the compression entails obtaining a clustered dataset $\bar{\mathbf{Q}}_{\text{pre}}$ of embeddings from the dataset $\mathbf{Q}_{\text{pre}}$, which has $n$ embeddings. The number of clusters ($\bar{n}$) may not be predefined but may, instead, be identified by implementing a clustering algorithm such as affinity propagation (AP). The embeddings may be assigned to clusters by minimizing semantic distortion. For $l'$ such that $0 \leq l' < \bar{n}$, the assignment may be applied as follows:

$\text{AP}_{\bar{\mathbf{Q}}_{\text{pre}}}(\mathbf{q}) = \arg\min_{l'} \delta(\mathbf{q}, \bar{\mathbf{Q}}_{\text{pre}}[l'])$ [EQ. 11]

In EQ. 11, $\delta$ refers to the semantic distortion.

[0037] The clustered dataset $\bar{\mathbf{Q}}_{\text{pre}}$ has $\bar{n}$ clusters, where $\bar{n} \ll n$. The AP algorithm is tunable such that the number of clusters can be adjusted. It selects centroids from the data points, referred to as “exemplars.” Each sample (i.e., embedding) is assigned to an exemplar (cluster) by optimizing a defined similarity metric. According to embodiments, the similarity metric that is optimized is Euclidean distance, which is minimized.
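
The assignment of embeddings to exemplars can be sketched as a nearest-exemplar lookup under Euclidean distance. This sketch does not implement affinity propagation itself: the exemplars are chosen by hand here, whereas AP would select them from the data. All names and values are illustrative assumptions.

```python
import math

def assign_to_exemplars(embeddings, exemplars):
    """Return, for each embedding, the index of its closest exemplar (Euclidean distance)."""
    return [
        min(range(len(exemplars)), key=lambda l: math.dist(q, exemplars[l]))
        for q in embeddings
    ]

Q_pre = [(0.0, 0.1), (0.2, 0.0), (4.0, 4.1), (3.9, 4.0), (0.1, 0.3)]  # toy dataset, n = 5
exemplars = [Q_pre[1], Q_pre[2]]  # hypothetical exemplars; AP would pick these itself
labels = assign_to_exemplars(Q_pre, exemplars)
```

Each exemplar is itself a data point, so it is always assigned to its own cluster.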

[0038] At 320, once the clustered dataset $\bar{\mathbf{Q}}_{\text{pre}}$ is obtained (at 310), the processes shown in FIG. 2 may be implemented. At 330, based on implementing the method 200 (at 320), distinct binary codewords $\tilde{\mathbf{W}}$ may be obtained for each of the $\bar{n}$ clusters. Each binary codeword may correspond to a cluster label $\bar{l}$. At the receiver, the reconstructed cluster label $\hat{l}$, rather than the embedding label $i$, is recovered at the semantic decoder 150 to retrieve the associated embedding $\hat{\mathbf{q}} = \bar{\mathbf{Q}}_{\text{pre}}[\hat{l}]$. The classifier neural network 160 may then obtain the classification $c$.

[0039] FIG. 4 is a process flow of a method 400 of training a semantic VQ-AE according to various embodiments. According to embodiments discussed with reference to FIG. 4, rather than using the dataset $\mathbf{Q}_{\text{pre}}$ of embeddings as a codebook (or compressed codebook) as in embodiments discussed with reference to FIGS. 2 and 3, the processes train a codebook using the embeddings in the dataset $\mathbf{Q}_{\text{pre}}$ as training data. According to the processes, an encoder and decoder that together comprise an autoencoder are trained along with the codebook. Channel effects are considered in the training. Thus, the codewords associated with the trained codebook reduce semantic distortion.

[0040] At 410, a signal-to-noise ratio (SNR) range ($\text{SNR}_{\min}$, $\text{SNR}_{\max}$) is obtained. The values of $\text{SNR}_{\min}$ and $\text{SNR}_{\max}$ may be calculated from parameters of the semantic multimodal communication system 10 (e.g., transmit power, bandwidth, channel models), for example. In some embodiments, empirical values of $\text{SNR}_{\min}$ and $\text{SNR}_{\max}$ may be obtained. Additionally, at 410, the dataset $\mathbf{Q}_{\text{pre}}$ of embeddings is obtained and a learnable parameter vector of the encoder $\boldsymbol{\gamma}_{\text{enc}}$, a learnable parameter vector of the decoder $\boldsymbol{\gamma}_{\text{dec}}$, and the output codebook $\mathbf{E}$ are initialized. $\mathbf{E}$ is a $d \times k$ matrix of real numbers ($\mathbf{E} \in \mathbb{R}^{d \times k}$), where $d$ (number of latent space points for quantizing the encoder’s output vectors at 422) and $k$ (dimension of latent space) are inputs, along with $\alpha$, which is a design parameter of the neural networks used by the encoder and decoder (i.e., autoencoder).

[0041] According to various embodiments, the encoder and decoder both employ a neural network with fully connected (FC) layers and a design parameter $\alpha$. In the encoder, the initial FC layer maps $m$-dimensional embeddings to $\alpha^{\overline{j}}$-dimensional space, then iteratively scales down dimensions by $\alpha^{-1}$ until $\alpha^{\underline{j}}$ is reached. The $\overline{j}$ and $\underline{j}$ are chosen such that $\overline{j}$ represents the largest number where $\alpha^{\overline{j}} < m$ and $\underline{j}$ represents the smallest number where $k < \alpha^{\underline{j}}$. The final FC layer in the encoder maps directly to a $k$-length vector of real numbers (i.e., to the latent space). Conversely, the decoder mirrors the encoder’s architecture, progressively increasing dimensions to reconstruct the embeddings from the codebook elements. The design parameter $\alpha > 1$ determines the model’s depth, with smaller values yielding more layers.
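
The layer-width schedule described above can be sketched as a small helper that lists the encoder's FC dimensions from the input down to the latent space. The function name, the variable names `j_hi` and `j_lo`, the restriction to integer values of the design parameter, and the example sizes are assumptions for illustration.

```python
def encoder_layer_dims(m, k, alpha):
    """Widths of the encoder's FC layers, from input dimension m down to latent dimension k."""
    j_hi = 0
    while alpha ** (j_hi + 1) < m:  # largest exponent j with alpha**j < m
        j_hi += 1
    j_lo = 0
    while alpha ** j_lo <= k:       # smallest exponent j with k < alpha**j
        j_lo += 1
    hidden = [alpha ** j for j in range(j_hi, j_lo - 1, -1)]
    return [m] + hidden + [k]

dims_alpha2 = encoder_layer_dims(m=512, k=16, alpha=2)  # hypothetical sizes
dims_alpha4 = encoder_layer_dims(m=512, k=16, alpha=4)
decoder_dims = list(reversed(dims_alpha2))              # the decoder mirrors the encoder
```

Comparing the two schedules shows the stated trade-off: the smaller design parameter produces the deeper network.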

[0042] At 420, processes (421-426) described below are performed in batches, with each batch pertaining to a subset $\mathbf{Q}_b$ of size $n_b$ of the dataset $\mathbf{Q}_{\text{pre}}$. At 421, for each batch, the encoder maps each embedding $\mathbf{q}$ in the subset to a latent vector $\mathbf{v}$ according to:

$\mathbf{v} = f_{\text{enc}}(\mathbf{q}; \boldsymbol{\gamma}_{\text{enc}})$ [EQ. 12]

The vector $\mathbf{v}$ is a vector of length $k$ of real numbers ($\mathbf{v} \in \mathbb{R}^k$). At 422, quantizing the vector refers to assigning it to elements in the codebook $\mathbf{E}$ as follows:

$\text{CD}(\mathbf{v}) = \arg\min_{p'} \|\mathbf{v} - \mathbf{E}[p']\|_2$ [EQ. 13]

In EQ. 13, $p'$ is such that $0 \leq p' < d$. Each of the codebook elements $\mathbf{e}$ to which the vector $\mathbf{v}$ is assigned is denoted $\mathbf{E}[p]$ ($\mathbf{e} = \mathbf{E}[p]$), where $p$ is the index of the element. For the entire batch, the vector of indices is $\mathbf{p} = \text{CD}(\mathbf{V})$. That is, the vector of indices $\mathbf{p}$, whose binary forms are used as codewords according to embodiments related to FIG. 4, corresponds to indices of the subset $\mathbf{Q}_b$ of embeddings based on the associated elements $\mathbf{e}$ of the codebook $\mathbf{E}$. At 423, perturbing the indices $\mathbf{p}$ to $\tilde{\mathbf{p}}$ involves implementing processes 425, 430, 435, 440, and 445, as discussed below, to account for channel effects between the encoder and decoder.
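
The quantization step of EQ. 13 amounts to a nearest-neighbor lookup in the codebook, as in the minimal sketch below. The encoder is not modeled; the latent vectors are supplied directly, and the codebook values and names are illustrative assumptions.

```python
import math

def quantize(v, codebook):
    """Index of the codebook element nearest to latent vector v (Euclidean distance)."""
    return min(range(len(codebook)), key=lambda p: math.dist(v, codebook[p]))

E = [(0.0, 0.0), (1.0, 1.0), (-1.0, 2.0)]    # toy codebook: d = 3 elements, k = 2
V = [(0.9, 0.8), (-0.7, 1.6), (0.1, -0.2)]   # latent vectors for one batch
p = [quantize(v, E) for v in V]              # vector of indices for the batch
```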

[0043] The processes 425, 430, 435, 440, and 445 involve simulating the channel between the transmitter 20 and the receiver 30. This is because the modeled channel effects are incorporated during training to make the learned codebook and encoder-decoder robust to channel impairments during inference. At 425, the codeword length $l_{cw}$ is computed as:

$l_{cw} = \lceil \log_2 d \rceil$ [EQ. 14]

At 430, sampling SNR provides an SNR sample $\gamma_{\text{SNR}}$, which is a uniform random variable in the SNR range ($\gamma_{\text{SNR}} \sim U(\text{SNR}_{\min}, \text{SNR}_{\max})$). The exemplary channel encoder and modulator 130 implements quadrature phase shift keying (QPSK) modulation. Thus, at 435, the processes include computing the QPSK bit error probability $p_b$. The computation is based on the SNR sample $\gamma_{\text{SNR}}$ and may use a known function (e.g., Q-function). At 440, the processes include converting the vector of indices $\mathbf{p}$ to a binary matrix $\mathbf{P}_{\text{bin}} \in \{0,1\}^{n_b \times l_{cw}}$.

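[0044] At 445, processes used to obtain the perturbed indices $\tilde{\mathbf{p}}$ include defining a matrix $\mathbb{1}_{p_b} \in \{0,1\}^{n_b \times l_{cw}}$ for every $i$ and $j$ as:

$\mathbb{1}_{p_b}[i,j] = \begin{cases} 1 & \text{with probability } p_b \\ 0 & \text{otherwise} \end{cases}$ [EQ. 15]

The perturbation of the binary matrix $\mathbf{P}_{\text{bin}}$ is given by:

$\tilde{\mathbf{P}}_{\text{bin}} = (\mathbf{P}_{\text{bin}} + \mathbb{1}_{p_b}) \bmod 2$ [EQ. 16]

The perturbed indices $\tilde{\mathbf{p}}$ are then obtained by converting $\tilde{\mathbf{P}}_{\text{bin}}$ back to decimal indices $\mathbf{p}'$ and applying modulo to handle overflows, as follows:

$\tilde{\mathbf{p}} = \mathbf{p}' \bmod d$ [EQ. 17]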
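
The channel simulation of EQS. 14-17 can be sketched as follows. Interpreting the SNR sample as a per-bit $E_b/N_0$ in dB and using the Gray-coded QPSK expression $p_b = Q(\sqrt{2 E_b/N_0})$ over an additive white Gaussian noise channel are assumptions; the function names and the SNR range are likewise hypothetical. Bit flips are applied with XOR, which is equivalent to addition modulo 2.

```python
import math
import random

def qpsk_bit_error_prob(ebn0_db):
    """p_b = Q(sqrt(2*Eb/N0)) with Q(x) = 0.5*erfc(x/sqrt(2)), i.e. 0.5*erfc(sqrt(Eb/N0))."""
    ebn0 = 10.0 ** (ebn0_db / 10.0)
    return 0.5 * math.erfc(math.sqrt(ebn0))

def perturb_indices(indices, d, p_b, rng):
    """Flip each bit of each index with probability p_b, then wrap the result into [0, d)."""
    l_cw = max(1, (d - 1).bit_length())  # ceil(log2(d)) for d >= 2, computed without floats
    perturbed = []
    for p in indices:
        mask = 0
        for bit in range(l_cw):
            if rng.random() < p_b:
                mask |= 1 << bit
        perturbed.append((p ^ mask) % d)
    return perturbed

rng = random.Random(0)
snr_db = rng.uniform(0.0, 10.0)          # hypothetical (SNR_min, SNR_max) in dB
p_b = qpsk_bit_error_prob(snr_db)
p_tilde = perturb_indices([0, 3, 5], d=6, p_b=p_b, rng=rng)
```

Setting `p_b` to 0 leaves the indices unchanged, and setting it to 1 flips every bit, which makes the modulo wrap-around visible when `d` is not a power of two.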

[0045] At 424, passing the perturbed vector of indices $\tilde{\mathbf{p}}$ through the decoder results in:

$\hat{\mathbf{q}} = f_{\text{dec}}(\mathbf{E}[\tilde{p}]; \boldsymbol{\gamma}_{\text{dec}})$ [EQ. 18]

The processes at 425 and 426 have similarities to those discussed with reference to FIG. 2 at 235 and 236. At 425, computing loss $\mathcal{L}_{AE}$, which is the loss function used to train the encoder and decoder neural networks and the codebook, is according to the following:

$\mathcal{L}_{AE} = \|\mathbf{q} - \hat{\mathbf{q}}\|_2^2 + \|\text{sg}(\mathbf{v}) - \mathbf{e}\|_2^2 + \beta \|\mathbf{v} - \text{sg}(\mathbf{e})\|_2^2$ [EQ. 19]

In EQ. 19, the first term provides reconstruction loss associated with the accuracy of embedding reconstruction at the decoder. In the second term, sg denotes the stop gradient operator, which prevents parameter updates by ensuring zero partial derivatives. This term is used to train the codebook through a goal of minimizing the distance between the codebook elements $\mathbf{e}$ and the latent vector $\mathbf{v}$. The third term provides commitment loss associated with deviation of the encoder from the codebook elements $\mathbf{e}$, and $\beta$ represents the commitment parameter. At 426, implementing the optimization algorithm may include using a regularization technique such as an AdamW optimizer ($\text{AdamW}(\boldsymbol{\gamma}_{\text{enc}}, \boldsymbol{\gamma}_{\text{dec}}, \mathbf{E}, \mathcal{L}_{AE})$).
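
The forward value of the loss in EQ. 19 can be sketched for a single embedding. The stop gradient operator changes only which parameters receive gradients, not the numerical value, so it appears here only in comments; no training is performed. The function names and toy values are assumptions.

```python
def sq_norm(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def vqae_loss(q, q_hat, v, e, beta):
    """Forward value of the three loss terms and their total for one embedding."""
    reconstruction = sq_norm(q, q_hat)
    codebook = sq_norm(v, e)           # in training, gradient reaches e only (sg on v)
    commitment = beta * sq_norm(v, e)  # in training, gradient reaches v only (sg on e)
    return reconstruction, codebook, commitment, reconstruction + codebook + commitment

q, q_hat = (1.0, 2.0), (1.5, 2.0)  # toy embedding and its reconstruction
v, e = (0.2, 0.4), (0.0, 0.5)      # toy latent vector and assigned codebook element
rec, cb, com, total = vqae_loss(q, q_hat, v, e, beta=0.25)
```

The codebook and commitment terms share the same forward value up to the factor `beta`; they differ only in where the gradient is allowed to flow.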

[0046] At 450, as shown in FIG. 4, the learned parameter vector of the encoder $\boldsymbol{\gamma}_{\text{enc}}$, the learned parameter vector of the decoder $\boldsymbol{\gamma}_{\text{dec}}$, and the learned codebook $\mathbf{E}$ are obtained by performing the processes of 420 for every batch of embedding subsets $\mathbf{Q}_b$. These are shared between the transmitter 20 and receiver 30. Embeddings may then be assigned to the learned codebook $\mathbf{E}$ using EQS. 12 and 13 and processes discussed above. The corresponding codebook index $p$, which serves as the codeword according to some embodiments, may be converted to a binary value $\mathbf{p}_{\text{bin}}$ for encoding and transmission.

[0047] At the receiver 30, the reconstructed index $\hat{p}$ (i.e., codeword) can be used to retrieve the corresponding element $\mathbf{e}$ from the codebook $\mathbf{E}$ ($\mathbf{e} = \mathbf{E}[\hat{p}]$). A recovered embedding $\hat{\mathbf{q}}$, which is a vector, may be obtained using the trained decoder according to the following, which is similar to EQ. 18:

$\hat{\mathbf{q}} = f_{\text{dec}}(\mathbf{e}; \boldsymbol{\gamma}_{\text{dec}})$ [EQ. 20]

As shown in FIG. 1, the classifier neural network 160 may provide the classification $c$ from the embedding $\hat{\mathbf{q}}$.

[0048] FIG. 5 is a block diagram detailing aspects of processing circuitry 500 of the semantic multimodal communication system 10 according to various embodiments. The processing circuitry 500 may implement one or more of the methods 200, 300, and 400, as discussed with reference to FIGS. 2-4, respectively. Aspects of the processing circuitry 500 may be embodied in the transmitter 20 or the receiver 30.

[0049] The processing circuitry 500 may be implemented within a server or any other system providing computing capability or may employ a plurality of computing devices arranged, for example, in one or more server banks, computer banks, or other arrangements. The components of the processing circuitry 500 discussed herein and otherwise known to be included are not limited to a specific number or geographic location or proximity relative to other components. For example, the processing circuitry 500 may include a plurality of computing devices that together may comprise a hosted computing resource, a grid computing resource, and/or any other distributed computing arrangement. In some cases, the processing circuitry 500 may correspond to an elastic computing resource where the allotted capacity of processing, network, storage, or other computing-related resources may vary over time.

[0050] The processing circuitry 500 may include one or more processors 510 and memory 520, including computer-readable media 520a to store instructions that are processed by one or more of the processors 510 and one or more databases 520b to store data. Computer-readable instructions should be understood as including software generated using programming languages such as, for example, C, C++, C#, Objective C, Java®, JavaScript®, Perl, PHP, Visual Basic®, Python®, Ruby, Flash®, or other programming languages. The processing circuitry 500 may also include communication components 530 to facilitate wireless and/or wired communication via the processing circuitry 500. Components of processing circuitry 500 may communicate via any known local interface 540 (e.g., a data bus with an accompanying address/control bus or other bus structure). As previously noted, the components are not limited to being arranged or housed together. Thus, wireless and/or wired communication may be employed among the components of the processing circuitry 500 (e.g., local interface 540 may be implemented as a network).

[0051] Any reference to processor 510 should be understood to mean one or more of the processors 510 (implemented sequentially or in parallel), and any reference to processor 510 should be understood to refer to the same, different, or a combination of the same and different processors 510 as other references to processor 510.

[0052] One or more processors 510 may comprise technologies that include, but are not limited to, discrete logic circuits having logic gates for implementing various logic functions upon an application of one or more data signals, application specific integrated circuits (ASICs) having appropriate logic gates, field-programmable gate arrays (FPGAs), or other components, etc. Such technologies are generally well known by those skilled in the art and, consequently, are not described in detail herein.

[0053] Memory 520 is defined herein as including both volatile and nonvolatile memory and data storage components. Volatile components are those that do not retain data values upon loss of power. Nonvolatile components are those that retain data upon a loss of power. Thus, the memory 520 may comprise, for example, random access memory (RAM), read-only memory (ROM), hard disk drives, solid-state drives, USB flash drives, memory cards accessed via a memory card reader, floppy disks accessed via an associated floppy disk drive, optical discs accessed via an optical disc drive, magnetic tapes accessed via an appropriate tape drive, and/or other memory components, or a combination of any two or more of these memory components. In addition, the RAM may comprise, for example, static random access memory (SRAM), dynamic random access memory (DRAM), or magnetic random access memory (MRAM) and other such devices. The ROM may comprise, for example, a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or other like memory device. In the context of the present disclosure, a computer-readable medium 520 can be any medium that can contain, store, or maintain the logic or application described herein for use by or in connection with processing circuitry 500.

[0054] The processing circuitry 500 may additionally include user interface components 550 including one or more displays and input devices. The user interface components 550 may include, for example, one or more display devices such as liquid crystal display (LCD) displays, gas plasma-based flat panel displays, organic light emitting diode (OLED) displays, electrophoretic ink (E ink) displays, LCD projectors, or other types of display devices, etc. Input devices may include a keyboard, mouse, handheld console, etc.

[0055] The features, structures, or characteristics described above may be combined in one or more embodiments in any suitable manner, and the features discussed in the various embodiments are interchangeable, if possible. In the following description, numerous specific details are provided in order to fully understand the embodiments of the present disclosure. However, a person skilled in the art will appreciate that the technical solution of the present disclosure may be practiced without one or more of the specific details, or other methods, components, materials, and the like may be employed. In other instances, well-known structures, materials, or operations are not shown or described in detail to avoid obscuring aspects of the present disclosure.

[0056] When relative terms such as “on,” “below,” “upper,” “lower,” “front,” “back,” and “rear” are used in the specification to describe the relative relationship of one component to another component, these terms are used in this specification for convenience only, for example, as a direction in relation to an orientation shown in the drawings. When a structure is “on” another structure, it is possible that the structure is integrally formed on another structure, or that the structure is “directly” disposed on another structure, or that the structure is “indirectly” disposed on the other structure through other structures.

[0057] In this specification, the terms such as “a,” “an,” “the,” and “said” are used to indicate the presence of one or more elements and components. The terms “comprise,” “include,” “have,” “contain,” and their variants are used to be open ended, and are meant to include additional elements, components, etc., in addition to the listed elements, components, etc. unless otherwise specified in the appended claims.

[0058] The terms “first,” “second,” etc. are used only as labels, rather than a limitation for a number of the objects. It is understood that if multiple components are shown, the components may be referred to as a “first” component, a “second” component, and so forth, to the extent applicable.

[0059] Disjunctive language such as the phrase “at least one of X, Y, or Z,” unless specifically stated otherwise, is understood as used in general to present that an item, term, etc., may be either X, Y, or Z, or any combination thereof (e.g., X, Y, and/or Z). Thus, such disjunctive language is not generally intended to, and should not, imply that certain embodiments require at least one of X, at least one of Y, and at least one of Z to each be present.

[0060] The above-described embodiments of the present disclosure are merely possible examples of implementations set forth for a clear understanding of the principles of the disclosure. Many variations and modifications may be made to the above-described embodiment(s) without departing substantially from the principles of the disclosure. All such modifications and variations are intended to be included herein within the scope of this disclosure and protected by the following claims.

Claims

Attorney Docket: T2025-079 (751501-2040)

Therefore, the following is claimed:

1. A computer-implemented method comprising:
obtaining a dataset of semantic embeddings, wherein the semantic embeddings include semantic information extracted from multimodal messages using a transformer model;
using the semantic embeddings to train one or more neural networks; and
obtaining binary codewords corresponding to the semantic embeddings for transmission based on training the one or more neural networks.

2. The method according to claim 1, wherein the semantic embeddings are used to train one neural network to obtain soft output, and the method further comprises obtaining the binary codewords from the soft output via binarizing and mapping to distinct codewords.

3. The method according to claim 2, wherein training the one neural network includes using a Euclidean distance between pairs of the semantic embeddings to generate the soft output.

4. The method according to claim 3, wherein the obtaining the binary codewords from the soft output includes assigning the binary codewords such that a Hamming distance between pairs of the binary codewords is based on the Euclidean distance between the pairs of the semantic embeddings associated with the pairs of the binary codewords.

5. The method according to claim 1, further comprising compressing the dataset of semantic embeddings into a clustered dataset of clusters of the semantic embeddings.

6. The method according to claim 5, wherein the clusters of the semantic embeddings in the clustered dataset are used to train one neural network to obtain soft output, and the method further comprises obtaining the binary codewords from the soft output by binarizing and mapping to distinct codewords.

7. The method according to claim 1, wherein the using the semantic embeddings to train the one or more neural networks includes training a neural network of an encoder and training a neural network of a decoder to obtain a trained codebook.

8. The method according to claim 7, wherein the training the neural network of the encoder includes populating the trained codebook and obtaining encoder-side indices of the trained codebook.

9. The method according to claim 8, wherein the training the neural network of the decoder comprises:
using channel effects to obtain perturbed indices from the encoder-side indices, and
using the perturbed indices as input in a neural network model of the decoder.

10. The method according to claim 7, wherein the obtaining the codewords is based on indices of the codebook.

11. A semantic communication system comprising:
processing circuitry configured to:
obtain a dataset of semantic embeddings, wherein the semantic embeddings include semantic information extracted from multimodal messages using a transformer model;
use the semantic embeddings to train one or more neural networks; and
obtain binary codewords corresponding to the semantic embeddings for transmission based on training the one or more neural networks.

12. The semantic communication system according to claim 11, wherein the processing circuitry is configured to use the semantic embeddings to train one neural network to obtain soft output, and to obtain the binary codewords from the soft output by binarizing and mapping to distinct codewords.

13. The semantic communication system according to claim 12, wherein the processing circuitry is configured to use a Euclidean distance between pairs of the semantic embeddings to generate the soft output.

14. The semantic communication system according to claim 13, wherein the processing circuitry is configured to assign the binary codewords such that a Hamming distance between pairs of the codewords is based on the Euclidean distance between the pairs of the semantic embeddings associated with the pairs of the binary codewords.

15. The semantic communication system according to claim 11, wherein the processing circuitry is further configured to compress the dataset of semantic embeddings into a clustered dataset of clusters of the semantic embeddings.

16. The semantic communication system according to claim 15, wherein the processing circuitry is configured to:
use the clusters of the semantic embeddings in the clustered dataset to train one neural network to obtain soft output, and
obtain the binary codewords from the soft output by binarizing and mapping to distinct codewords.

17. The semantic communication system according to claim 11, wherein the one or more neural networks include a neural network of an encoder and a neural network of a decoder that are trained to obtain a trained codebook.

18. The semantic communication system according to claim 17, wherein the processing circuitry is configured to populate the trained codebook and obtain encoder-side indices of the trained codebook.

19. The semantic communication system according to claim 18, wherein the processing circuitry is further configured to:
use channel effects to obtain perturbed indices from the encoder-side indices, and
use the perturbed indices as input in a neural network model of the decoder.

20. The semantic communication system according to claim 17, wherein the processing circuitry is configured to obtain the binary codewords based on indices of the codebook.
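
By way of non-limiting illustration only, the binarizing and mapping to distinct codewords recited in claims 2 and 12, the Hamming distance comparison of claims 4 and 14, and the use of channel effects in claims 9 and 19 might be sketched as follows. The sketch makes several assumptions that do not come from the disclosure: the soft output is taken to lie in [0, 1] and is thresholded at 0.5, collisions are resolved greedily by moving a duplicate to the nearest unused codeword, and the channel is modeled as independent bit flips. All function names and the example values are hypothetical, and the neural network that would produce the soft output is not shown.

```python
import numpy as np


def binarize(soft, threshold=0.5):
    """Threshold soft outputs (assumed to lie in [0, 1]) into bits."""
    return (np.asarray(soft) >= threshold).astype(np.uint8)


def bits_to_index(bits):
    """Read each row of bits as an unsigned integer, most significant bit first."""
    weights = 1 << np.arange(bits.shape[1] - 1, -1, -1)
    return bits @ weights


def index_to_bits(index, n_bits):
    """Inverse of bits_to_index for a single integer."""
    return np.array(
        [(index >> s) & 1 for s in range(n_bits - 1, -1, -1)], dtype=np.uint8
    )


def map_to_distinct(bits):
    """Assumed greedy rule: the first user of a codeword keeps it, and a later
    duplicate moves to the unused codeword closest in Hamming distance."""
    n, n_bits = bits.shape
    if n > 2 ** n_bits:
        raise ValueError("not enough codewords for the number of embeddings")
    used = set()
    out = np.empty_like(bits)
    for i, idx in enumerate(bits_to_index(bits)):
        idx = int(idx)
        if idx in used:
            free = [c for c in range(2 ** n_bits) if c not in used]
            idx = min(free, key=lambda c: bin(c ^ idx).count("1"))
        used.add(idx)
        out[i] = index_to_bits(idx, n_bits)
    return out


def hamming(a, b):
    """Number of bit positions in which two codewords differ."""
    return int(np.count_nonzero(a != b))


def bit_flip_channel(bits, flip_prob, rng):
    """Assumed channel model: each bit flips independently with flip_prob."""
    flips = (rng.random(bits.shape) < flip_prob).astype(np.uint8)
    return bits ^ flips


# Hypothetical soft outputs: rows 0 and 1 stand for two nearby embeddings,
# row 2 for a distant one.
soft_output = np.array([[0.9, 0.8, 0.1],
                        [0.7, 0.9, 0.2],
                        [0.1, 0.2, 0.9]])
codewords = map_to_distinct(binarize(soft_output))
print(codewords)
print(hamming(codewords[0], codewords[1]), hamming(codewords[0], codewords[2]))

# Perturbed codewords of the kind a decoder could be trained on.
received = bit_flip_channel(codewords, flip_prob=0.1, rng=np.random.default_rng(0))
```

In this example, rows 0 and 1 binarize to the same bits, so the second row is moved to a codeword one bit away, while the distant third row ends up three bits from the first. The exhaustive search over unused codewords is exponential in the codeword length and is suitable only for short codewords in a sketch of this kind.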