Multi-modal knowledge embedding method for battery energy storage material data and related device

By constructing a multimodal feature vector library and extracting feature vectors using BERT, crystal graph convolutional networks, and temporal convolutional networks, and combining cross-attention and contrastive learning, multimodal fusion vectors are generated, solving the problem of data heterogeneity in battery energy storage materials and achieving comprehensive understanding and accurate prediction of battery energy storage materials.

CN122287809A · Pending · Publication Date: 2026-06-26 · BEIJING INST OF TECH +1

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications (China)
Current Assignee / Owner
BEIJING INST OF TECH
Filing Date
2026-02-04
Publication Date
2026-06-26

AI Technical Summary

Technical Problem

Existing general artificial intelligence technologies cannot effectively integrate text descriptions, numerical data, structural data, and sequence data in the field of battery energy storage materials. This results in the generated battery energy storage material representation vectors being one-sided and incomplete, failing to accurately reflect the true characteristics and behavior of battery energy storage materials. Consequently, systematic biases occur in tasks such as component optimization, performance prediction, or synthesis path recommendation.

Method used

A multimodal feature vector library is constructed. Text feature vectors are extracted using the BERT model, image feature vectors are extracted using the crystal graph convolutional network model, and temporal feature vectors are extracted using the temporal convolutional network. Feature alignment and fusion are performed through a cross-attention and contrastive learning collaborative mechanism to generate multimodal fused vectors.

Benefits of technology

This has enabled a comprehensive understanding of battery energy storage materials, accurately reflecting their true characteristics and behavior, improving the accuracy of composition optimization, performance prediction, and synthesis route recommendation, and solving the heterogeneity problem between different modal data.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure FT_1
  • Figure FT_2
  • Figure FT_3

Abstract

This application discloses a multimodal knowledge embedding method and related apparatus for battery energy storage material data, relating to the field of cross-modal learning. The method includes generating a corresponding intermediate-aligned feature vector set and a set of positive and negative sample pairs for each battery energy storage material in a constructed multimodal feature vector library; calculating a scalar loss value based on the positive and negative sample pair set and determining whether it reaches a minimum; if not, updating the model parameters used in generating the intermediate-aligned feature vector set and returning to the step of generating the intermediate-aligned feature vector set; if yes, performing multimodal fusion on the feature vectors in the intermediate-aligned feature vector set corresponding to each battery energy storage material to obtain the corresponding multimodal fused vector for that battery energy storage material. This application effectively solves the heterogeneity problem between different modal data of battery energy storage materials, comprehensively analyzes the correlation between the composition, structure, and performance of battery energy storage materials, and establishes a comprehensive understanding of battery energy storage materials.

Description

Technical Field

[0001] This application relates to the field of battery energy storage materials, and in particular to a multimodal knowledge embedding method and related apparatus for battery energy storage material data.

Background Technology

[0002] As the global energy structure transitions towards a green and low-carbon model, new power systems are gradually becoming the core carrier of the energy revolution, playing an increasingly crucial, fundamental, and strategic role. These new power systems are no longer simply traditional power transmission networks, but have evolved into intelligent infrastructure supporting energy security and promoting the low-carbon transformation of the economy and society. Battery energy storage technology can provide instantaneous, flexible, and stable support services to these new power systems, enhancing their reliability and resilience. Clearly, without large-scale, high-efficiency, and long-life battery energy storage technology, new power systems cannot achieve stable operation. Therefore, the construction of new power systems presents an unprecedented and urgent demand for battery energy storage technology, and breakthroughs in this technology highly depend on the discovery of new battery energy storage materials and the optimization of existing materials.

[0003] Traditional R&D of battery energy storage materials relies on trial and error and expert experience. This approach is time-consuming and costly and has become a core bottleneck restricting the development of battery energy storage technology. In recent years, artificial intelligence (AI) technology has demonstrated powerful data-driven discovery capabilities in many fields, opening new directions for the intelligent transformation of battery energy storage material R&D. AI aims to accelerate R&D by using machine learning methods to uncover the inherent patterns linking composition, structure, process, and performance in massive amounts of data. However, when general AI is applied to the highly complex and specialized field of battery energy storage materials, most studies focus on a single type of material data and exhibit the limitations of "single-point breakthroughs": for example, they predict performance solely from chemical composition or classify materials solely from crystal structure images. Such general AI approaches fail to build a panoramic understanding of battery energy storage materials and ignore the deep correlations and synergistic effects between different modalities. Because they do not effectively integrate text descriptions, numerical data, structural data, and sequence data, the resulting representation vectors of battery energy storage materials are one-sided and incomplete. These vectors fail to accurately reflect the true characteristics and behavior of the materials, leading to systematic biases in tasks such as composition optimization, performance prediction, or synthesis path recommendation and thus hindering the development of battery energy storage materials.

Summary of the Invention

[0004] The purpose of this application is to provide a multimodal knowledge embedding method and related device for battery energy storage material data, which can accurately reflect the real characteristics and behavior of battery energy storage materials and establish a comprehensive understanding of battery energy storage materials.

[0005] To achieve the above objectives, this application provides the following solutions. In a first aspect, this application provides a multimodal knowledge embedding method for battery energy storage material data, including: constructing a multimodal feature vector library, which is a database built by processing text data and image data of various battery energy storage materials and which includes those materials together with the text feature vector, image feature vector, and time-series feature vector corresponding to each material; independently performing a first operation on each battery energy storage material in the library to generate the intermediate aligned feature vector set of that material, the set including a first feature vector (a text feature vector enhanced by image structure information and time-series performance information), a second feature vector (an image feature vector guided by text semantic information), and a third feature vector (a time-series feature vector guided by image structure information); independently performing a second operation on the first feature vector of each material to obtain the set of positive and negative sample pairs corresponding to that first feature vector, the set including a set of positive sample pairs and a set of negative sample pairs; and calculating a scalar loss value based on the sets of positive and negative sample pairs of all materials in the library and determining whether the scalar loss value has reached its minimum, to obtain a first judgment result.
When the first judgment result is yes, the feature vectors in the intermediate aligned feature vector set of each material are fused multimodally to obtain the multimodal fused vector of that material. When the first judgment result is no, the model parameters used in the first operation are updated, and the process returns to the step of independently performing the first operation on each material in the library to generate its intermediate aligned feature vector set.

[0006] In a second aspect, this application provides a computer device, including: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the steps of the multimodal knowledge embedding method for battery energy storage material data as described above.

[0007] In a third aspect, this application provides a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the steps of the multimodal knowledge embedding method for battery energy storage material data as described above.

[0008] In a fourth aspect, this application provides a computer program product, including a computer program that, when executed by a processor, implements the steps of the multimodal knowledge embedding method for battery energy storage material data as described above.

[0009] According to the specific embodiments provided in this application, this application has the following technical effects. This application provides a multimodal knowledge embedding method and related apparatus for battery energy storage material data. First, a multimodal feature vector library is constructed, containing the feature vectors corresponding to text modal data, image modal data, and time-series modal data for subsequent processing. Second, a first operation is independently performed on each battery energy storage material in the library to generate the intermediate aligned feature vector set of that material. This set includes a first feature vector, a second feature vector, and a third feature vector: the first is a text feature vector enhanced by image structure information and time-series performance information, the second is an image feature vector guided by text semantic information, and the third is a time-series feature vector guided by image structure information. Because the intermediate aligned feature vector set is formed by associating information across the different modalities, the heterogeneity problem between the different modal data of battery energy storage materials is solved. Next, a second operation is performed on the first feature vector of each material to obtain the corresponding set of positive and negative sample pairs, a scalar loss value is calculated, and it is determined whether the loss value has reached its minimum.
If not, the model parameters used in generating the intermediate aligned feature vector set are updated, and the process returns to the step of generating that set. If so, multimodal fusion is performed on the feature vectors in the intermediate aligned feature vector set of each material to obtain the corresponding multimodal fused vector. This application thus effectively solves the heterogeneity problem between different modal data of battery energy storage materials, comprehensively analyzes the correlation between their composition, structure, and performance, and establishes a comprehensive understanding of them.

Attached Figure Description

[0010] To more clearly illustrate the technical solutions in the embodiments of this application or the prior art, the drawings used in the embodiments will be briefly introduced below. Obviously, the drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0011] Figure 1 is an application environment diagram of a multimodal knowledge embedding method for battery energy storage material data in one embodiment of this application; Figure 2 is a flowchart of a multimodal knowledge embedding method for battery energy storage material data provided in an embodiment of this application; Figure 3 is a schematic structural diagram of a computer device provided in an embodiment of this application.

Detailed Implementation

[0012] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.

[0013] To make the above-mentioned objectives, features and advantages of this application more apparent and understandable, the application will be further described in detail below with reference to the accompanying drawings and specific embodiments.

[0014] The multimodal knowledge embedding method for battery energy storage material data provided by this application can be applied, for example, in the application environment shown in Figure 1, in which terminal 102 communicates with server 104 via a network. A data storage system can store the data that server 104 needs to process; it can be set up independently, integrated into server 104, or placed in the cloud or on another server. Terminal 102 can send the battery energy storage material data to be processed to server 104. Server 104 receives the data and constructs a multimodal feature vector library from it. This library is a database built by processing text and image data of various battery energy storage materials, and it includes those materials together with the text feature vector, image feature vector, and time-series feature vector of each material. Server 104 independently performs a first operation on each material in the library to generate that material's intermediate aligned feature vector set, which includes a first feature vector (a text feature vector enhanced by image structure information and time-series performance information), a second feature vector (an image feature vector guided by text semantic information), and a third feature vector (a time-series feature vector guided by image structure information).
It then independently performs a second operation on the first feature vector of each material to obtain the corresponding set of positive and negative sample pairs, which includes a set of positive sample pairs and a set of negative sample pairs. Based on these sets for all materials in the library, the current scalar loss value is calculated, and it is determined whether that value has reached its minimum, giving a first judgment result. When the first judgment result is yes, the feature vectors in each material's intermediate aligned feature vector set are multimodally fused to obtain that material's multimodal fused vector. When the first judgment result is no, the model parameters in the first operation are updated, and the process returns to the step of independently performing the first operation on each material to generate its intermediate aligned feature vector set. Server 104 can feed the resulting multimodal fused vectors back to terminal 102. In some embodiments, the method can also be implemented by server 104 or terminal 102 alone.
For example, terminal 102 can directly process the battery energy storage material data, or server 104 can obtain the data from the data storage system and process it.

[0015] The terminal 102 can be, but is not limited to, various desktop computers, laptops, smartphones, tablets, IoT devices, and portable wearable devices. IoT devices can include smart speakers, smart TVs, smart air conditioners, and smart in-vehicle devices. Portable wearable devices can include smartwatches, smart bracelets, and head-mounted devices. The server 104 can be implemented using a standalone server or a server cluster composed of multiple servers, or it can be a cloud server.

[0016] In one exemplary embodiment, as shown in Figure 2, a multimodal knowledge embedding method for battery energy storage material data is provided. The method is executed by a computer device, specifically by a terminal or a server alone, or by a terminal and a server jointly. In this embodiment, the method is described taking its application to server 104 in Figure 1 as an example, and includes the following steps 201 to 204. Step 201: Construct a multimodal feature vector library; the multimodal feature vector library is a database constructed by processing text data and image data of various battery energy storage materials; the multimodal feature vector library includes various battery energy storage materials and the text feature vector, image feature vector, and time-series feature vector corresponding to each battery energy storage material.
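The library in Step 201 can be pictured as one record per material holding its three modal vectors. The sketch below is a hypothetical in-memory version in Python; the class names `MaterialEntry` and `FeatureLibrary` and the vector sizes (768 for text, 128 for image, 512 for time series) are illustrative assumptions, not details given in the application.

```python
from dataclasses import dataclass, field

import numpy as np


@dataclass
class MaterialEntry:
    """One battery energy storage material and its three modal feature vectors."""
    name: str
    text_vec: np.ndarray    # e.g. output of a BERT-style text encoder
    image_vec: np.ndarray   # e.g. output of a CGCNN-style crystal encoder
    series_vec: np.ndarray  # e.g. output of a TCN-style sequence encoder


@dataclass
class FeatureLibrary:
    """Hypothetical in-memory stand-in for the multimodal feature vector library."""
    entries: dict = field(default_factory=dict)

    def add(self, entry: MaterialEntry) -> None:
        self.entries[entry.name] = entry

    def materials(self) -> list:
        return sorted(self.entries)


rng = np.random.default_rng(0)
lib = FeatureLibrary()
for name in ("LiFePO4", "LiCoO2"):
    # Random vectors stand in for real encoder outputs; dimensions are illustrative.
    lib.add(MaterialEntry(name, rng.normal(size=768), rng.normal(size=128), rng.normal(size=512)))
```

Keying entries by material name keeps the three vectors of each material together, which is what the per-material first and second operations iterate over.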

[0017] Step 202: Perform the first operation independently on each battery energy storage material in the multimodal feature vector library to generate an intermediate aligned feature vector set corresponding to the corresponding battery energy storage material; the intermediate aligned feature vector set includes a first feature vector, a second feature vector, and a third feature vector; the first feature vector is a text feature vector enhanced by image structure information and temporal performance information; the second feature vector is an image feature vector guided by text semantic information; the third feature vector is a temporal feature vector guided by image structure information.
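One way to realize the first operation of Step 202 is single-head cross-attention, with each modality represented as a sequence of token, atom, or time-step embeddings projected to a shared width. The following is a minimal NumPy sketch under those assumptions: text queries attend to the concatenated image and time-series sequences (first vector), image queries attend to text (second vector), and time-series queries attend to image (third vector). The residual connection, mean pooling, random projection weights, and function names are illustrative choices, not the application's disclosed architecture.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(query_seq, context_seq, Wq, Wk, Wv):
    """Single-head cross-attention: every row of query_seq attends to context_seq."""
    Q = query_seq @ Wq
    K = context_seq @ Wk
    V = context_seq @ Wv
    A = softmax(Q @ K.T / np.sqrt(K.shape[-1]))
    return query_seq + A @ V  # residual keeps the query modality's own information


def first_operation(text, image, series, params):
    """Return the (first, second, third) aligned vectors, mean-pooled over positions."""
    first = cross_attention(text, np.vstack([image, series]), *params["text"])
    second = cross_attention(image, text, *params["image"])
    third = cross_attention(series, image, *params["series"])
    return first.mean(axis=0), second.mean(axis=0), third.mean(axis=0)


d = 16
rng = np.random.default_rng(0)
params = {m: [rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(3)]
          for m in ("text", "image", "series")}
text = rng.normal(size=(12, d))    # 12 token embeddings
image = rng.normal(size=(8, d))    # 8 atom embeddings
series = rng.normal(size=(50, d))  # 50 time-step embeddings
v1, v2, v3 = first_operation(text, image, series, params)
```

The parameters in `params` are what Step 204 would update when the first judgment result is "no".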

[0018] Step 203: Perform the second operation independently on the first feature vector of each battery energy storage material in the multimodal feature vector library to obtain the set of positive and negative sample pairs corresponding to the first feature vector of each battery energy storage material; the set of positive and negative sample pairs includes a set of positive sample pairs and a set of negative sample pairs.
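The application does not spell out how the second operation forms its pairs. A common contrastive-learning choice, assumed here, is to pair each material's first feature vector with its own second and third feature vectors as positives and with other materials' vectors as negatives, then score the batch with an InfoNCE-style loss over cosine similarities. The sketch below implements that assumption; the temperature `tau=0.07` and all function names are hypothetical.

```python
import numpy as np


def l2_normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def build_pairs(n):
    """Index pairs (i, j): positives where j == i, negatives where j != i."""
    pos = [(i, i) for i in range(n)]
    neg = [(i, j) for i in range(n) for j in range(n) if j != i]
    return pos, neg


def info_nce(anchors, candidates, tau=0.07):
    """Mean InfoNCE loss: row i's positive is candidates[i]; all other rows are negatives."""
    a = l2_normalize(anchors)
    c = l2_normalize(candidates)
    logits = a @ c.T / tau                       # cosine similarities / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


def scalar_loss(first, second, third, tau=0.07):
    """Average of the text-image and text-time-series contrastive terms."""
    return 0.5 * (info_nce(first, second, tau) + info_nce(first, third, tau))


rng = np.random.default_rng(0)
first = rng.normal(size=(4, 8))  # first feature vectors of 4 materials
pos, neg = build_pairs(4)
loss_aligned = scalar_loss(first, first.copy(), first.copy())
```

When every material's modal vectors coincide, each row's positive dominates its negatives, so the loss falls below log(n), the value for uniform similarities; minimizing it pulls matched modalities together and pushes other materials apart.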

[0019] Step 204: Based on the set of positive and negative sample pairs corresponding to the first feature vector of each battery energy storage material in the multimodal feature vector library, calculate the scalar loss value for this operation, and determine whether the scalar loss value has reached the minimum value to obtain a first judgment result. When the first judgment result is yes, perform multimodal fusion on the feature vectors in the intermediate aligned feature vector set corresponding to each battery energy storage material to obtain the multimodal fused vector corresponding to the corresponding battery energy storage material. When the first judgment result is no, update the model parameters in the first operation, and return to the step of independently performing the first operation on each battery energy storage material in the multimodal feature vector library to generate the intermediate aligned feature vector set corresponding to the corresponding battery energy storage material.
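The loop in Step 204 can be sketched as follows. Because a training loop cannot know that it has reached the true minimum, this sketch assumes the first judgment result is "yes" once the loss stops decreasing by more than a small tolerance. A toy quadratic stands in for the real scalar loss, and concatenation stands in for the unspecified multimodal fusion; both, along with the function names, are assumptions for illustration.

```python
import numpy as np


def train_until_min(loss_fn, grad_fn, theta, lr=0.1, tol=1e-8, max_iters=10_000):
    """Evaluate the scalar loss; stop once it no longer decreases by more than tol
    (treated here as "minimum reached"), otherwise update parameters and repeat."""
    prev = loss = float("inf")
    for it in range(max_iters):
        loss = loss_fn(theta)
        if prev - loss <= tol:               # first judgment result: yes
            return theta, loss, it
        theta = theta - lr * grad_fn(theta)  # first judgment result: no, update and repeat
        prev = loss
    return theta, loss, max_iters


def fuse(first, second, third):
    """Hypothetical fusion step: concatenate the three aligned vectors."""
    return np.concatenate([first, second, third])


# Toy stand-in for the real alignment loss: a quadratic with its minimum at theta = 3.
theta, final_loss, iters = train_until_min(lambda t: (t - 3.0) ** 2,
                                           lambda t: 2.0 * (t - 3.0), 0.0)
fused = fuse(np.ones(4), np.zeros(4), np.full(4, 2.0))
```

In a real system `grad_fn` would come from automatic differentiation of the contrastive loss with respect to the first-operation parameters.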

[0020] By implementing steps 201 to 204 above, text modal data, image modal data, and time-series modal data of battery energy storage materials can be processed to generate a multimodal fusion vector that aggregates text semantics, image structure, and time-series performance. This vector can accurately reflect the true characteristics and behavior of battery energy storage materials and establish a comprehensive understanding of them.

[0021] In another exemplary embodiment of this application, step 201 is replaced by steps 301 to 302: Step 301: The acquired text data and image data of various battery energy storage materials are classified into text modality, image modality and time-series modality to obtain text modality data, image modality data and time-series modality data.

[0022] Step 302: Extract features from the text modal data, the image modal data, and the time-series modal data respectively to obtain the text feature vector, image feature vector, and time-series feature vector corresponding to each battery energy storage material.

[0023] Through the feature extraction process described above for multimodal data, the modal data can be represented as feature vectors for subsequent processing.

[0024] As an optional implementation, the process of acquiring text and image data of various battery energy storage materials is replaced by the following steps 401 to 402, taking lithium-ion battery cathode materials as an example. Step 401: Collect text and image data of battery energy storage materials from multiple sources such as structured databases, academic literature, patent documents, and experimental records; for example, obtain crystal structure image data of the material from the Materials Project database, collect text data on synthesis methods from academic literature, and extract text and image data of charge-discharge cycle curves from experimental records.

[0025] Step 402: Preprocess the acquired data according to its category. Text data such as chemical formulas and material descriptions require character correction and synonym normalization; numerical data require unified measurement units, outlier removal, and normalization; image data such as XRD patterns and microscopic images require size unification, background noise reduction, and intensity normalization; and charge-discharge curves require filtering and denoising, followed by interpolation and resampling to align the sequence lengths.
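The preprocessing rules in Step 402 map onto a few standard array operations. The NumPy sketch below assumes a 3-sigma outlier rule, min-max normalization, a moving-average filter for denoising, and linear interpolation to resample charge-discharge curves to a common length; these specific choices and the default of 1000 points are illustrative, not prescribed by the application.

```python
import numpy as np


def remove_outliers(x, z=3.0):
    """Drop values more than z standard deviations from the mean (assumed rule)."""
    mu, sd = x.mean(), x.std()
    return x if sd == 0 else x[np.abs(x - mu) <= z * sd]


def min_max(x):
    """Scale values to [0, 1]; a constant array maps to zeros."""
    lo, hi = x.min(), x.max()
    return np.zeros_like(x, dtype=float) if hi == lo else (x - lo) / (hi - lo)


def moving_average(y, window=5):
    """Simple smoothing filter; np.convolve(mode="same") zero-pads the edges."""
    return np.convolve(y, np.ones(window) / window, mode="same")


def resample_curve(t, y, n_points=1000):
    """Linearly interpolate a curve onto n_points evenly spaced samples.
    t must be increasing, as np.interp requires."""
    grid = np.linspace(t.min(), t.max(), n_points)
    return np.interp(grid, t, y)
```

Resampling every curve to the same `n_points` is what lets the later TCN step consume sequences of equal length.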

[0026] The purpose of this step is to eliminate the heterogeneity and noise of the original data, and to ensure that the multi-source data meets the high-quality requirements of being machine readable and scale-consistent, thus laying the foundation for subsequent feature fusion and model training.

[0027] In an optional implementation, the applicant found that applying general artificial intelligence technology to the highly complex and specialized field of battery energy storage materials has several limitations. For example, general AI may fail to understand the many technical terms, complex chemical formulas, and crystallographic symbols used for battery energy storage materials, which limits its ability to accurately model their constitutive relations and leads to semantic biases or reasoning errors in tasks such as composition optimization, performance prediction, or synthesis route recommendation.

[0028] To model battery energy storage materials accurately and reduce semantic biases or reasoning errors in tasks such as composition optimization, performance prediction, or synthesis path recommendation, step 302 above is replaced by steps 501 to 503. Step 501: Use the BERT model to extract text feature vectors from the text modal data.

[0029] The BERT model is a pre-trained language model based on the Transformer encoder structure. It features a 12-layer Transformer encoder architecture and uses a bidirectional attention mechanism to deeply understand word contextual relationships, transforming material text modal data into high-dimensional semantic vectors. Compared to RNN models that can only capture unidirectional semantics or the Word2Vec model that lacks contextual understanding, BERT's advantage lies in its ability to simultaneously consider the preceding and following context, accurately representing the complex technical terms and grammatical structures in materials science, and significantly improving the feature extraction quality for descriptive information such as material composition and properties.

[0030] The input is a raw text description, for example a domain-specific description such as "LiFePO4 is synthesized by a solid-state method, has an olivine structure, and reaches a specific capacity of up to 170 mAh/g".

[0031] After the raw text description is input, the example operations are as follows. First, each word is converted into an initial word vector through a word embedding layer, forming the initial word vector sequence:

$$H^{(0)} = \left[h_1^{(0)}, h_2^{(0)}, \dots, h_n^{(0)}\right]^{\top} \in \mathbb{R}^{n \times d}, \qquad h_i^{(0)} = W_e x_i + b_e,$$

where $H^{(0)}$ is the initial word vector sequence, $h_i^{(0)}$ is the initial word vector of the $i$-th word, $\mathbb{R}^{n \times d}$ is the $n \times d$-dimensional original text matrix space, $x_i$ is the encoded vector of word $w_i$, $W_e$ is the word embedding matrix, and $b_e$ is the bias vector. Next, a bidirectional attention mechanism captures the semantic relationships between words within the text, such as the dependencies of "LiFePO4" on "olivine structure" and "specific capacity". Linear projections generate the Query, Key, and Value vectors, mapping the word vectors into a space suited to attention calculation and avoiding dimension mismatches between the original vectors:

$$Q^{(l)} = H^{(l-1)} W_Q^{(l)} + b_Q^{(l)}, \qquad K^{(l)} = H^{(l-1)} W_K^{(l)} + b_K^{(l)}, \qquad V^{(l)} = H^{(l-1)} W_V^{(l)} + b_V^{(l)},$$

where $H^{(l-1)}$ is the input of layer $l$, i.e., the hidden state sequence output by layer $l-1$; $W_Q^{(l)}$, $W_K^{(l)}$, $W_V^{(l)}$ are the layer-specific trainable projection weight matrices of layer $l$; $b_Q^{(l)}$, $b_K^{(l)}$, $b_V^{(l)}$ are the corresponding trainable bias vectors; and $Q^{(l)}$, $K^{(l)}$, $V^{(l)}$ are the Query, Key, and Value matrices computed by layer $l$. Finally, the self-attention weights and the attention output are calculated. The attention scores are normalized with softmax so that the attention weights form a proper distribution and excessively large values do not cause gradient anomalies:

$$A^{(l)} = \operatorname{softmax}\!\left(\frac{Q^{(l)} K^{(l)\top}}{\sqrt{d_k}}\right), \qquad O^{(l)} = A^{(l)} V^{(l)}.$$

This self-attention computation is the core of the $l$-th Transformer layer; it is performed once in every layer and outputs "context-enhanced word vectors".

[0032] Here, $A^{(l)}$ denotes the attention weights, $d_k$ is the dimension of the input text feature vectors, and $O^{(l)}$ is the attention output computed by layer $l$. The resulting text modal feature vector $\mathbf{v}_T$ is obtained from the encoder output. The BERT model used in the above process extracts text modal feature vectors through a bidirectional attention mechanism, enabling a deep understanding of technical terms and complex semantic relationships in materials science texts. This deep semantic understanding allows the generated text feature vectors to reflect material characteristics more comprehensively, improving the discriminativeness and consistency of the feature representations and the accuracy of subsequent cross-modal alignment. It also improves the accuracy of material retrieval and the overall efficiency of the system, laying a reliable semantic foundation for multimodal fusion and effectively addressing the text data heterogeneity problem of traditional methods.
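The embedding and attention equations of the BERT step can be checked numerically. The NumPy sketch below implements the embedding layer and one bidirectional self-attention step as written in [0031], using random weights and hypothetical token ids; a real BERT layer additionally uses multi-head attention, feed-forward sublayers, residual connections, and layer normalization, which are omitted here.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def embed(token_ids, W_e, b_e):
    """h_i^(0) = W_e x_i + b_e with one-hot x_i; W_e has shape (d, vocab)."""
    X = np.eye(W_e.shape[1])[token_ids]  # (n, vocab) one-hot rows
    return X @ W_e.T + b_e               # (n, d): row i is h_i^(0)


def self_attention(H, Wq, bq, Wk, bk, Wv, bv):
    """One bidirectional (unmasked) scaled dot-product attention step for layer l."""
    Q, K, V = H @ Wq + bq, H @ Wk + bk, H @ Wv + bv
    A = softmax(Q @ K.T / np.sqrt(K.shape[-1]))  # every token attends to every token
    return A, A @ V


rng = np.random.default_rng(0)
vocab, d = 50, 8
W_e, b_e = rng.normal(size=(d, vocab)), np.zeros(d)
token_ids = [3, 17, 42, 5]  # hypothetical ids for a tokenized material description
H0 = embed(token_ids, W_e, b_e)
params = [rng.normal(scale=d ** -0.5, size=s) for s in [(d, d), (d,)] * 3]
A, O = self_attention(H0, *params)
```

The one-hot product is equivalent to a row lookup in the embedding matrix, which is how embedding layers are implemented in practice; the absence of a mask is what makes the attention bidirectional.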

[0033] Step 502: Extract image modal feature vectors from the image modal data using the corresponding convolutional network model.

[0034] For example, image modal feature vectors can be extracted from material crystal image data using a Crystal Graph Convolutional Network (CGCNN) model. CGCNN is a graph neural network designed specifically for crystal structures: it treats atoms as nodes and chemical bonds as edges, aggregates neighboring atomic information through multiple graph convolution layers, and finally outputs a fixed-dimensional feature vector through global pooling. This vector characterizes the composition and structural features of the crystal. Compared with a traditional GCNN model, CGCNN extracts features directly from interatomic interactions, possesses translation and rotation invariance, and reveals the correlation between material structure and properties. A specific example follows. First, the crystal graph is defined as

$$G = (V, E),$$

where $V$ is the set of atomic nodes and $E$ is the set of chemical bond edges. Second, each convolutional layer updates the atomic features:

$$h_i^{(l+1)} = \sigma\!\left(\sum_{j \in \mathcal{N}(i)} f\!\left(h_i^{(l)}, h_j^{(l)}, e_{ij}\right)\right),$$

where $h_i^{(l)}$ is the feature vector of atom $i$ at layer $l$, $\mathcal{N}(i)$ is the set of nearest-neighbor atoms of atom $i$, $e_{ij}$ is the edge feature between atoms $i$ and $j$, $f$ is the function that fuses the features of adjacent atoms, and $\sigma$ is an activation function. Finally, global pooling aggregates all atomic features into the feature of the crystal as a whole:

$$h_G = W_p\left(\frac{1}{N}\sum_{i=1}^{N} h_i^{(L)}\right) + b_p,$$

where $h_G$ is the feature of the whole crystal, $N$ is the total number of atoms in the crystal, $L$ is the number of convolutional layers, and $W_p$ and $b_p$ are the pooling layer parameters. The global crystal feature $h_G$ serves as the resulting image modal feature vector. The above steps use a crystal graph convolutional network to abstract the material's crystal structure into a graph of atomic nodes and chemical bond edges, so that structural feature vectors are extracted directly at the microscopic scale.
Furthermore, by aggregating neighboring atomic information through multi-layer graph convolutions, the approach overcomes the inability of traditional image processing methods to analyze the spatial topological relationships of crystal structures directly.
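The CGCNN update and pooling equations can be sketched in NumPy as below. The fusion function f is assumed to be a linear map of the concatenated features [h_i; h_j; e_ij], σ is assumed to be softplus, vectors are treated as rows, and the function names are hypothetical; none of these choices is fixed by the application. The usage example checks one property that follows from summing over neighbors and mean pooling: relabeling the atoms does not change the crystal vector. Translation and rotation invariance typically come from building edge features out of interatomic distances, which this sketch does not model.

```python
import numpy as np


def softplus(x):
    return np.logaddexp(0.0, x)


def conv_layer(h, edges, edge_feat, W_f, b_f):
    """h: (N, d) atom features; edges: directed neighbor pairs (i, j);
    edge_feat: (len(edges), d_e); W_f: (2d + d_e, d)."""
    msg = np.zeros_like(h)
    for (i, j), e in zip(edges, edge_feat):
        z = np.concatenate([h[i], h[j], e])
        msg[i] += z @ W_f + b_f  # f(h_i, h_j, e_ij): hypothetical linear fusion
    return softplus(msg)         # sigma applied to the neighbor sum


def crystal_vector(h, edges, edge_feat, layers, W_p, b_p):
    for W_f, b_f in layers:
        h = conv_layer(h, edges, edge_feat, W_f, b_f)
    return h.mean(axis=0) @ W_p + b_p  # mean pooling, then pooling-layer parameters


rng = np.random.default_rng(0)
N, d, d_e = 3, 4, 2
h0 = rng.normal(size=(N, d))
edges = [(0, 1), (1, 0), (1, 2), (2, 1)]
edge_feat = rng.normal(size=(len(edges), d_e))
layers = [(rng.normal(size=(2 * d + d_e, d)), np.zeros(d)) for _ in range(2)]
W_p, b_p = rng.normal(size=(d, 8)), np.zeros(8)
vec = crystal_vector(h0, edges, edge_feat, layers, W_p, b_p)

# Relabel atoms: new atom k is old atom perm[k]; the crystal vector should not change.
perm = [2, 0, 1]
inv = {old: new for new, old in enumerate(perm)}
edges_p = [(inv[i], inv[j]) for i, j in edges]
vec_perm = crystal_vector(h0[perm], edges_p, edge_feat, layers, W_p, b_p)
```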

[0035] Step 503: Extract the temporal modality feature vector using the TCN model. The TCN (temporal convolutional network) uses dilated causal convolutions and residual connections; stacking convolutional layers progressively enlarges the receptive field so that long-term dependencies in the time-series data are captured, and the network finally outputs a fixed-dimensional vector representing the global features of the entire sequence. Compared with recurrent models traditionally used for time series (such as the LSTM model), the TCN's convolutional structure supports parallel computation and trains more efficiently, and its dilation mechanism together with its residual connections alleviates the vanishing-gradient problem. It can therefore model the long-term trends in the evolution of material properties more accurately and is suited to time-series data, such as charge-discharge curves, in which long-range correlations must be captured.

[0036] For example, when time-series performance data such as a 1000-point capacity-fade curve are input, the TCN model outputs a 512-dimensional dynamic behavior vector representing the performance evolution trend and its key features.

[0037] The TCN model first performs causal convolution, implemented as a one-dimensional convolution whose input is padded on the left. Specifically, for a layer with kernel size k, k-1 zeros are padded to the left of the input sequence, so the first output element is computed from the first input element only. This ensures that when the model produces the output at time step t, it depends only on information at time t and earlier and cannot "see" future information.
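A minimal NumPy sketch of this left-padded causal convolution follows; `causal_conv1d` is a hypothetical helper name and a single input channel is assumed.

```python
import numpy as np

def causal_conv1d(x, w):
    """y[t] = sum_{i=0}^{k-1} w[i] * x[t - i], computed by padding k - 1 zeros on the left."""
    x = np.asarray(x, dtype=float)
    k = len(w)
    xp = np.concatenate([np.zeros(k - 1), x])
    # np.convolve flips the kernel, which matches the w[i] * x[t - i] indexing;
    # "valid" mode then returns exactly len(x) outputs.
    return np.convolve(xp, w, mode="valid")

x = np.array([1.0, 2.0, 3.0, 4.0])
w = np.array([0.5, 0.25])
y = causal_conv1d(x, w)   # -> [0.5, 1.25, 2.0, 2.75]; y[0] uses x[0] only
```

Changing a later sample such as `x[3]` leaves `y[0:3]` untouched, which is exactly the "no look-ahead" property described above.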

[0038] The causal convolution is:
$$y_t = \sum_{i=0}^{k-1} w_i \, x_{t-i}$$
where $y_t$ is the output at time step $t$, $x_{t-i}$ is the input at time step $t-i$, $w_i$ are the convolution-kernel weights, and $k$ is the kernel size. Next, a dilation factor $d$ is introduced. Dilated convolution inserts $d-1$ zeros between adjacent kernel weights, enlarging the receptive field without increasing the number of parameters. With dilation factor $d$, the output at time $t$ becomes:
$$y_t = \sum_{i=0}^{k-1} w_i \, x_{t - d \cdot i}$$
When $d=1$, this is a standard convolution; when $d=2$, the kernel reads the inputs at indices $t, t-2, t-4, \ldots$. When $d$ is increased layer by layer, the receptive field of $L$ stacked layers is:
$$R = 1 + (k-1)\sum_{l=1}^{L} d_l$$
so doubling the dilation at each layer ($d_l = 2^{l-1}$) makes the receptive field grow exponentially with depth. The resulting temporal modal feature vector is $\mathbf{v}_{ts} = \mathrm{TCN}(x_1, \ldots, x_n)$. These steps use the TCN model, whose stacked dilated convolutions progressively enlarge the receptive field, to model long-term patterns in time-series data and accurately represent the evolution of material properties. This provides a stable and efficient temporal-feature foundation for multimodal alignment and improves the reliability and real-time performance of the overall system in material property prediction and state-of-health monitoring. The dilated causal convolutions and residual connections also address the low training efficiency, vanishing gradients, and serial-computation bottleneck that traditional recurrent neural networks face when capturing long-term dependencies.
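The dilated convolution and the receptive-field formula can be checked with a minimal NumPy sketch; the helper names and the layer settings in the example are illustrative assumptions, not values from the source.

```python
import numpy as np

def dilated_causal_conv1d(x, w, d):
    """y[t] = sum_{i=0}^{k-1} w[i] * x[t - d*i]; taps before the sequence start read 0."""
    x = np.asarray(x, dtype=float)
    k = len(w)
    pad = (k - 1) * d                       # left padding grows with the dilation
    xp = np.concatenate([np.zeros(pad), x])
    return np.array([
        sum(w[i] * xp[t + pad - d * i] for i in range(k))
        for t in range(len(x))
    ])

def receptive_field(k, dilations):
    """R = 1 + (k - 1) * sum_l d_l for stacked dilated causal layers."""
    return 1 + (k - 1) * sum(dilations)

y = dilated_causal_conv1d(np.arange(1, 9), [1.0, 1.0], d=2)   # y[t] = x[t] + x[t-2]
R = receptive_field(2, [2 ** l for l in range(10)])           # 1 + 1023 = 1024
```

With kernel size 2 and ten layers whose dilation doubles (1, 2, ..., 512), R = 1024, which would be enough to cover a 1000-sample capacity-fade curve such as the one in [0036].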

[0039] In another exemplary embodiment of this application, to align the semantic spaces of text, image, and temporal modal data, step 202 is replaced by steps 601-603, where the target battery energy storage material is any battery energy storage material in the multimodal feature vector library. Step 601: a query matrix is determined by the first learning model from the text feature vector of the target battery energy storage material, and a key matrix and a value matrix are determined by the second learning model from its image feature vector. Attention weights capture the association between text semantic information and image structural information, producing an image feature vector of the target battery energy storage material guided by text semantic information.

[0040] Step 602: a query matrix is determined by the third learning model from the text-guided image feature vector of the target battery energy storage material, and a key matrix and a value matrix are determined by the fourth learning model from its time-series feature vector. Attention weights capture the correlation between image structure information and time-series performance information, producing a time-series feature vector of the target battery energy storage material guided by image structure information.

[0041] Step 603: apply a residual connection and layer normalization to the text feature vector, the second feature vector, and the third feature vector of the target battery energy storage material to obtain the first feature vector, i.e., the text feature vector enhanced by image structure information and time-series performance information.

[0042] In another exemplary embodiment of this application, before step 601 is performed, each modality's feature vector is passed through its own fully connected (linear) layer so that the modalities are projected to a uniform dimension:
$$\tilde{\mathbf{v}}_{ts} = \mathbf{W}_{ts}\mathbf{v}_{ts} + \mathbf{b}_{ts}$$
$$\tilde{\mathbf{v}}_{img} = \mathbf{W}_{img}\mathbf{v}_{img} + \mathbf{b}_{img}$$
where $\mathbf{v}_{ts}$ is the time-series feature vector of the target battery energy storage material, and $\mathbf{W}_{ts}$ and $\mathbf{b}_{ts}$ are the weight matrix and bias matrix that project it onto the dimension of the corresponding text feature vector; $\mathbf{v}_{img}$ is the image feature vector of the target battery energy storage material, and $\mathbf{W}_{img}$ and $\mathbf{b}_{img}$ are the weight matrix and bias matrix that project it onto the dimension of the corresponding text feature vector.
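A minimal NumPy sketch of the two projections, written in the row-vector convention `v @ W + b`, which is equivalent to $\mathbf{W}\mathbf{v}+\mathbf{b}$ with a transposed weight matrix. The text dimension of 768 and image dimension of 256 are hypothetical (only the 512-dimensional TCN output appears in the text), and random initialization stands in for trained weights.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical dimensions: only the 512-d TCN output is mentioned in the text.
d_txt, d_img, d_ts = 768, 256, 512

def make_projection(d_in, d_out, rng):
    """One modality-specific linear layer v -> v @ W + b."""
    W = rng.normal(scale=d_in ** -0.5, size=(d_in, d_out))
    b = np.zeros(d_out)
    return lambda v: v @ W + b

project_img = make_projection(d_img, d_txt, rng)   # plays the role of W_img, b_img
project_ts = make_projection(d_ts, d_txt, rng)     # plays the role of W_ts, b_ts
v_img_tilde = project_img(rng.normal(size=d_img))
v_ts_tilde = project_ts(rng.normal(size=d_ts))
```

Because each modality keeps its own layer, the image and time-series encoders can have any output size; only the projected vectors must share the text dimension used by the attention steps.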

[0043] The text-guided image feature vector of the target battery energy storage material in step 601 is obtained as follows. For example, the text "olivine structure" attends to the corresponding atomic-arrangement features in the image, so that the image features are fused with the textual semantic information:
$$\mathbf{F}_2 = \mathbf{A}_{ti}\mathbf{V}_{img}$$
where $\mathbf{F}_2$ is the second feature vector, $\mathbf{A}_{ti}$ is the text-image attention weight matrix, and $\mathbf{V}_{img}$ is the value matrix determined by the second learning model from the image feature vector of the target battery energy storage material.

[0044] $\mathbf{A}_{ti}$ is calculated as:
$$\mathbf{A}_{ti} = \mathrm{softmax}\left(\frac{\mathbf{Q}_{txt}\mathbf{K}_{img}^{\top}}{\sqrt{d_{txt}}}\right)$$
where $\mathbf{Q}_{txt}$ is the query matrix determined by the first learning model from the text feature vector of the target battery energy storage material, $\mathbf{K}_{img}$ is the key matrix determined by the second learning model from the image feature vector of the target battery energy storage material, $\mathrm{softmax}(\cdot)$ is the normalized exponential function, and $d_{txt}$ is the dimension of the text feature vector of the target battery energy storage material.

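[0045] $\mathbf{Q}_{txt}$ is calculated as:
$$\mathbf{Q}_{txt} = \mathbf{W}_Q\mathbf{v}_{txt} + \mathbf{b}_Q$$
where $\mathbf{v}_{txt}$ is the text feature vector of the target battery energy storage material, and $\mathbf{W}_Q$ and $\mathbf{b}_Q$ are the weight matrix and bias matrix of the first learning model. $\mathbf{K}_{img}$ and $\mathbf{V}_{img}$ are calculated as:
$$\mathbf{K}_{img} = \mathbf{W}_K\tilde{\mathbf{v}}_{img} + \mathbf{b}_K, \qquad \mathbf{V}_{img} = \mathbf{W}_V\tilde{\mathbf{v}}_{img} + \mathbf{b}_V$$
where $\tilde{\mathbf{v}}_{img}$ is the projection of the image feature vector of the target battery energy storage material onto the dimension of the corresponding text feature vector, and $\mathbf{W}_K$, $\mathbf{W}_V$, $\mathbf{b}_K$, and $\mathbf{b}_V$ are the key weight matrix, value weight matrix, key bias matrix, and value bias matrix of the second learning model.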
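The text-to-image cross-attention of step 601 can be sketched in NumPy as below. If each modality were a single pooled vector, $\mathbf{Q}\mathbf{K}^\top$ would be a scalar and the softmax would always return 1, so this sketch assumes each modality is represented as a small set of vectors (text tokens, atoms); that representation, the sizes, and the random parameters are assumptions.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_src, kv_src, Wq, bq, Wk, bk, Wv, bv):
    """Return (A @ V, A) with A = softmax(Q K^T / sqrt(d)).

    Q comes from query_src (e.g. text), K and V from kv_src (e.g. projected image).
    d is the feature dimension of query_src, matching the sqrt(d_txt) scaling.
    """
    Q = query_src @ Wq + bq
    K = kv_src @ Wk + bk
    V = kv_src @ Wv + bv
    A = softmax(Q @ K.T / np.sqrt(query_src.shape[-1]))
    return A @ V, A

rng = np.random.default_rng(1)
d = 16
txt = rng.normal(size=(5, d))        # hypothetical: 5 text-token vectors
img_tilde = rng.normal(size=(7, d))  # hypothetical: 7 projected atom/region vectors
params = [0.25 * rng.normal(size=(d, d)) if i % 2 == 0 else np.zeros(d) for i in range(6)]
F2, A_ti = cross_attention(txt, img_tilde, *params)   # F2: (5, 16), A_ti: (5, 7)
```

The same function would cover step 602 by passing the text-guided image features as the query source and the projected time-series vectors as the key/value source, with the third and fourth learning models' parameters.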

[0046] In another exemplary embodiment of this application, the time-series feature vector of the target battery energy storage material guided by image structure information in step 602 above is obtained as follows:
$$\mathbf{F}_3 = \mathbf{A}_{it}\mathbf{V}_{ts}$$
where $\mathbf{F}_3$ is the third feature vector, $\mathbf{A}_{it}$ is the image-temporal attention weight matrix, and $\mathbf{V}_{ts}$ is the value matrix determined by the fourth learning model from the time-series feature vector of the target battery energy storage material.

[0047] $\mathbf{A}_{it}$ is calculated as:
$$\mathbf{A}_{it} = \mathrm{softmax}\left(\frac{\mathbf{Q}_{img}\mathbf{K}_{ts}^{\top}}{\sqrt{d_{img}}}\right)$$
where $\mathbf{Q}_{img}$ is the query matrix determined by the third learning model from the text-guided image feature vector of the target battery energy storage material, $\mathbf{K}_{ts}$ is the key matrix determined by the fourth learning model from the time-series feature vector of the target battery energy storage material, $\mathrm{softmax}(\cdot)$ is the normalized exponential function, and $d_{img}$ is the dimension of the image feature vector of the target battery energy storage material. $\mathbf{Q}_{img}$ is calculated as:
$$\mathbf{Q}_{img} = \mathbf{W}'_Q\mathbf{F}_2 + \mathbf{b}'_Q$$
where $\mathbf{F}_2$ is the second feature vector of the target battery energy storage material, and $\mathbf{W}'_Q$ and $\mathbf{b}'_Q$ are the weight matrix and bias matrix of the third learning model. $\mathbf{K}_{ts}$ and $\mathbf{V}_{ts}$ are calculated as:
$$\mathbf{K}_{ts} = \mathbf{W}'_K\tilde{\mathbf{v}}_{ts} + \mathbf{b}'_K, \qquad \mathbf{V}_{ts} = \mathbf{W}'_V\tilde{\mathbf{v}}_{ts} + \mathbf{b}'_V$$
where $\tilde{\mathbf{v}}_{ts}$ is the projection of the time-series feature vector of the target battery energy storage material onto the dimension of the corresponding text feature vector, and $\mathbf{W}'_K$, $\mathbf{W}'_V$, $\mathbf{b}'_K$, and $\mathbf{b}'_V$ are the key weight matrix, value weight matrix, key bias matrix, and value bias matrix of the fourth learning model.

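[0048] In another exemplary embodiment of this application, the text feature vector of the target battery energy storage material enhanced by image structure information and time-series performance information in step 603 above is obtained as follows:
$$\mathbf{F}_1 = \mathrm{LN}\big(\mathbf{v}_{txt} + \mathbf{F}_2 + \mathbf{F}_3\big)$$
where $\mathbf{F}_1$ is the first feature vector, $\mathbf{v}_{txt}$ is the text feature vector, $\mathbf{F}_2$ and $\mathbf{F}_3$ are the second and third feature vectors, and $\mathrm{LN}(\cdot)$ is the layer normalization function.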
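Step 603 amounts to a residual sum followed by layer normalization. In this minimal NumPy sketch the three inputs are random stand-ins of identical shape (the residual sum requires $\mathbf{v}_{txt}$, $\mathbf{F}_2$ and $\mathbf{F}_3$ to share a shape), and the learned gain and bias that layer normalization usually carries are omitted; both are assumptions.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """LN over the last axis, without the optional learned gain and bias."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

rng = np.random.default_rng(2)
# Stand-ins with matching shapes for v_txt, F2 and F3 (5 vectors of dimension 16).
v_txt, F2, F3 = (rng.normal(size=(5, 16)) for _ in range(3))
F1 = layer_norm(v_txt + F2 + F3)   # residual sum, then normalization per vector
```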

[0049] In another exemplary embodiment of this application, the set of positive and negative sample pairs for the first feature vector of each battery energy storage material in step 203 above is obtained as follows: positive and negative sample pairs are constructed from the intermediate aligned feature vector sets of the various battery energy storage materials obtained in step 202, teaching the model to distinguish similar from dissimilar data and providing the optimization target for alignment.

[0050] "Similar" and "dissimilar" feature pairs are defined as follows: positive sample pairs are multimodal features of the same material, which should lie close together in the feature space, and negative sample pairs are features of different materials, which should lie far apart:
$$\mathcal{P}_a = \big\{(\mathbf{F}_1^{(a)}, \mathbf{F}_2^{(a)}),\ (\mathbf{F}_1^{(a)}, \mathbf{F}_3^{(a)}),\ (\mathbf{F}_2^{(a)}, \mathbf{F}_3^{(a)})\big\}$$
$$\mathcal{N}_a = \big\{(\mathbf{F}_1^{(a)}, \mathbf{F}_2^{(b)}),\ (\mathbf{F}_1^{(a)}, \mathbf{F}_3^{(c)})\big\}$$
where $\mathcal{P}_a$ is the set of positive sample pairs, $\mathcal{N}_a$ is the set of negative sample pairs, $a$ denotes the battery energy storage material corresponding to the anchor vector, $b$ denotes the second type of battery energy storage material, and $c$ denotes the third type of battery energy storage material; the materials $a$, $b$, and $c$ are all different battery energy storage materials.
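A minimal sketch of the pair construction, identifying each vector by (material index, vector name). Drawing exactly one material b and one material c at random per anchor is an assumption; the text only requires a, b and c to be distinct. The helper name `build_pairs` is hypothetical.

```python
import random

def build_pairs(n_materials, anchor, rng=random):
    """Return (positives, negatives) as pairs of (material_index, vector_name).

    Positives: (F1_a, F2_a), (F1_a, F3_a), (F2_a, F3_a).
    Negatives: (F1_a, F2_b), (F1_a, F3_c) with a, b, c pairwise distinct.
    """
    if n_materials < 3:
        raise ValueError("need at least three materials so that a, b, c differ")
    a = anchor
    positives = [((a, "F1"), (a, "F2")), ((a, "F1"), (a, "F3")), ((a, "F2"), (a, "F3"))]
    # sample() draws without replacement, so b != c; both exclude the anchor a.
    b, c = rng.sample([m for m in range(n_materials) if m != a], 2)
    negatives = [((a, "F1"), (b, "F2")), ((a, "F1"), (c, "F3"))]
    return positives, negatives

positives, negatives = build_pairs(n_materials=4, anchor=0, rng=random.Random(42))
```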

[0051] In another exemplary embodiment of this application, the scalar loss value for the sets of positive and negative sample pairs of the first feature vectors of the battery energy storage materials in step 204 above is calculated as:
$$\mathcal{L} = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{\exp\big(\mathrm{sim}(\mathbf{z}_i, \mathbf{z}_i^{+})/\tau\big)}{\sum_{\mathbf{z}_k \in \mathcal{P}_i \cup \mathcal{N}_i}\exp\big(\mathrm{sim}(\mathbf{z}_i, \mathbf{z}_k)/\tau\big)}$$
where $N$ is the number of battery energy storage materials in the multimodal feature vector library; $\mathbf{z}_i$ is the first feature vector, i.e., the anchor vector, of the $i$-th battery energy storage material; $\mathbf{z}_i^{+}$ is the positive sample paired with the anchor vector; $\mathbf{z}_k$ ranges over all samples in the anchor's positive and negative sample pair sets $\mathcal{P}_i \cup \mathcal{N}_i$; $\mathrm{sim}(\cdot,\cdot)$ is the similarity function; and $\tau$ is the temperature parameter, which adjusts the smoothness of the distribution and the sensitivity of the loss function; specifically, here $\tau = \ldots$.
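A minimal NumPy sketch of this loss, assuming cosine similarity for sim(·,·) and one positive per anchor; the temperature 0.5 in the example is illustrative only, since no value is given here. `info_nce` is a hypothetical helper name.

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def info_nce(anchors, positives, candidates, tau):
    """L = -(1/N) sum_i log( exp(sim(z_i, z_i+)/tau) / sum_k exp(sim(z_i, z_k)/tau) ).

    anchors    : list of N anchor vectors z_i
    positives  : list of N positive partners z_i+
    candidates : list of N lists; candidates[i] holds every z_k paired with z_i
                 (positives and negatives, the positive included)
    """
    losses = []
    for z, zp, cand in zip(anchors, positives, candidates):
        logits = np.array([cosine(z, zk) for zk in cand]) / tau
        m = logits.max()
        log_denom = m + np.log(np.exp(logits - m).sum())   # stable log-sum-exp
        losses.append(log_denom - cosine(z, zp) / tau)
    return float(np.mean(losses))

z, zp, zn = np.array([1.0, 0.0]), np.array([1.0, 0.0]), np.array([-1.0, 0.0])
loss = info_nce([z], [zp], [[zp, zn]], tau=0.5)   # equals log(1 + e^-4)
```

The loss shrinks toward 0 as the positive becomes more similar to the anchor than every negative; a smaller τ sharpens that contrast.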

[0052] In another exemplary embodiment of this application, the update of the model parameters in the first operation in step 204 specifically includes: The parameters of the first learning model, the second learning model, the third learning model, and the fourth learning model are updated respectively.

[0053] As an optional implementation, in another exemplary embodiment of this application, the model parameters in the first operation in step 204 above are updated as:
$$\theta \leftarrow \mathcal{U}\big(\theta, \nabla_{\theta}\mathcal{L}\big)$$
where $\mathcal{U}$ denotes the parameter update rule; $\theta$ comprises all parameters of the learning models, namely the query weight matrices, query bias matrices, key weight matrices, key bias matrices, value weight matrices, and value bias matrices; and $\mathcal{L}$ is the scalar loss.
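As one concrete choice of update rule, plain gradient descent $\theta \leftarrow \theta - \eta\nabla_\theta\mathcal{L}$ is sketched below; the optimizer, the learning rate, and the quadratic toy loss that stands in for the contrastive loss are all assumptions.

```python
import numpy as np

def gradient_descent_step(theta, grads, lr=0.1):
    """theta <- theta - lr * grad for every named parameter (e.g. W_Q, b_Q, W_K, ...)."""
    return {name: theta[name] - lr * grads[name] for name in theta}

rng = np.random.default_rng(3)
target = rng.normal(size=(4, 4))   # toy optimum standing in for trained weights

def toy_loss(theta):
    return float(((theta["W_Q"] - target) ** 2).sum() + (theta["b_Q"] ** 2).sum())

def toy_grads(theta):
    return {"W_Q": 2.0 * (theta["W_Q"] - target), "b_Q": 2.0 * theta["b_Q"]}

theta = {"W_Q": np.zeros((4, 4)), "b_Q": np.ones(4)}
history = [toy_loss(theta)]
for _ in range(50):
    theta = gradient_descent_step(theta, toy_grads(theta))
    history.append(toy_loss(theta))
```

In this framing, the "has the loss reached its minimum" check of step 204 would in practice be a tolerance on the change in loss between iterations; that stopping rule is also an assumption.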

[0054] The above process can be summarized as a "cross-attention and contrastive learning collaborative mechanism". Cross-attention (steps 601-603) first establishes fine-grained semantic relationships between the features of different modalities; contrastive learning (constructing positive and negative sample pairs and minimizing the loss in steps 203-204) then pulls the associated features into global alignment in a unified semantic space. Together the two form a "local association, global optimization" closed loop that addresses the semantic gap and inconsistent spatial distribution caused by modal heterogeneity in multimodal features. Cross-attention compensates for a weakness of contrastive learning, which "attends only to global distances and lacks local semantic relationships": through dynamic weight allocation it captures detailed correspondences between modalities. Repeated rounds of attention interaction and contrastive-loss optimization form a feedback loop, so the resulting features have both fine-grained semantic consistency and a well-organized global spatial distribution.

[0055] By combining the cross-attention and contrastive learning collaborative mechanism with the gated fusion method, the construction of multimodal knowledge embedding vectors for energy storage materials achieves accurate alignment of the text, image, and time-series semantic spaces, improves cross-modal retrieval accuracy, and effectively addresses the heterogeneity of material data.

[0056] In another exemplary embodiment of this application, in step 204 above, when the scalar loss value is determined to have reached its minimum, the aligned three-modal features of the corresponding battery energy storage material are obtained as:
$$\mathbf{v}^{*}_{txt} = \mathbf{F}_1, \qquad \mathbf{v}^{*}_{img} = \mathbf{F}_2, \qquad \mathbf{v}^{*}_{ts} = \mathbf{F}_3$$
where $\mathbf{F}_1$, $\mathbf{F}_2$, and $\mathbf{F}_3$ are the first, second, and third feature vectors at the minimum loss. As an optional implementation, in another exemplary embodiment of this application, the multimodal fusion of the feature vectors in the intermediate aligned feature vector set of each battery energy storage material described in step 204 above is replaced by steps 701 to 705. Step 701: calculate the gating weights:
$$g_{txt} = \mathbf{W}_g\mathbf{v}^{*}_{txt} + b_g$$
where $g_{txt}$ is the text-modality gating weight, $\mathbf{W}_g$ is the weight matrix of the gating network, used to project text features into the gate space, $\mathbf{v}^{*}_{txt}$ is the text-modality feature vector from the intermediate aligned feature vector set, and $b_g$ is the bias term of the gating network. The gating weights $g_{img}$ and $g_{ts}$ of the image and temporal modality feature vectors are computed in the same way as for the text modality and are not repeated here.

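[0057] Step 702: normalize the gating weights:
$$\alpha_m = \frac{\exp(g_m)}{\exp(g_{txt}) + \exp(g_{img}) + \exp(g_{ts})}, \qquad m \in \{txt, img, ts\}$$
where $g_{txt}$, $g_{img}$, and $g_{ts}$ are the gating weights from step 701 and $\alpha_{txt}$, $\alpha_{img}$, and $\alpha_{ts}$ are the normalized weights.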
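A minimal NumPy sketch of steps 701 and 702, assuming a scalar gate per modality (so the gate weight matrix acts as a row vector) with modality-specific parameters, and softmax normalization across modalities; those choices, the dimension 16, and the random parameters are assumptions.

```python
import numpy as np

def gate_weights(vectors, gate_params):
    """Steps 701-702 sketch: scalar gates g_m = w_m . v_m + b_m, softmax-normalized.

    vectors     : [v_txt*, v_img*, v_ts*], each of shape (d,)
    gate_params : [(w_m, b_m), ...], w_m of shape (d,), b_m a float
    """
    g = np.array([v @ w + b for v, (w, b) in zip(vectors, gate_params)])
    e = np.exp(g - g.max())   # softmax across the three modalities
    return e / e.sum()

rng = np.random.default_rng(4)
d = 16
vectors = [rng.normal(size=d) for _ in range(3)]
gate_params = [(0.1 * rng.normal(size=d), 0.0) for _ in range(3)]
alphas = gate_weights(vectors, gate_params)   # three positive weights summing to 1
```

A larger gate for one modality (for example a large text bias) pushes its weight toward 1, which is the input-dependent emphasis described in [0061].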

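[0058] Step 703: perform weighted fusion:
$$\mathbf{F}_0 = \alpha_{txt}\mathbf{v}^{*}_{txt} + \alpha_{img}\mathbf{v}^{*}_{img} + \alpha_{ts}\mathbf{v}^{*}_{ts}$$
where $\alpha_{txt}$, $\alpha_{img}$, and $\alpha_{ts}$ are the normalized gating weights from step 702, $\mathbf{v}^{*}_{txt}$, $\mathbf{v}^{*}_{img}$, and $\mathbf{v}^{*}_{ts}$ are the aligned text, image, and temporal feature vectors, and $\mathbf{F}_0$ is the initial fused feature vector.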

[0059] Step 704: perform feature dimension compression:
$$\mathbf{F}_c = \mathbf{W}_c\mathbf{F}_0 + \mathbf{b}_c$$
where $\mathbf{F}_0$ is the initial fused feature vector from step 703, $\mathbf{F}_c$ is the compressed fused feature vector, $\mathbf{W}_c$ is the compression weight matrix used for dimensionality reduction, and $\mathbf{b}_c$ is the compression bias vector. This step reduces computational complexity and the number of parameters, improves efficiency, mitigates the risk of overfitting, removes redundant information, and retains the most essential cross-modal information.
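A minimal NumPy sketch of steps 703 and 704 in the row-vector convention; the dimensions (16 compressed to 8), the example weights, and the random compression matrix are hypothetical.

```python
import numpy as np

def fuse_and_compress(vectors, alphas, W_c, b_c):
    """Step 703: F0 = sum_m alpha_m * v_m, a weighted sum that keeps dimension d.
    Step 704: Fc = F0 @ W_c + b_c, with W_c of shape (d, d_c) and d_c < d."""
    F0 = sum(a * v for a, v in zip(alphas, vectors))
    return F0, F0 @ W_c + b_c

rng = np.random.default_rng(5)
d, d_c = 16, 8                                    # hypothetical sizes
vectors = [rng.normal(size=d) for _ in range(3)]  # v_txt*, v_img*, v_ts*
alphas = np.array([0.5, 0.3, 0.2])                # e.g. normalized gate weights
F0, Fc = fuse_and_compress(vectors, alphas, rng.normal(size=(d, d_c)), np.zeros(d_c))
```

Concatenation would produce a 3d-dimensional vector; the weighted sum stays at d before compression, which is the dimension-control property described in [0061].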

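[0060] Step 705: feature normalization and verification:
$$\mathbf{F}_n = \mathrm{Norm}(\mathbf{F}_c)$$
$$\mathbf{F}_u = \begin{cases} \mathbf{F}_n, & \mathrm{sim}(\mathbf{F}_n, \mathbf{F}_{ref}) \ge \delta \\ \lambda\,\mathbf{F}_n, & \text{otherwise} \end{cases}$$
where $\mathbf{F}_c$ is the compressed fused feature vector from step 704, $\mathbf{F}_n$ is the normalized fused feature vector, $\mathrm{Norm}(\cdot)$ is the normalization function, $\mathbf{F}_u$ is the unified multimodal fusion vector, $\mathrm{sim}(\cdot,\cdot)$ is the similarity function, $\mathbf{F}_{ref}$ is the reference feature vector, $\delta$ is the similarity threshold, and $\lambda$ is a scaling factor used to shrink the feature vector.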
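A minimal NumPy sketch of one reading of step 705: L2 normalization for the normalization function, cosine similarity for sim(·,·), and shrinking by λ when the similarity to the reference vector falls below δ. The reading of the formula, the choice of L2 norm and cosine similarity, and the example values of δ and λ are assumptions.

```python
import numpy as np

def normalize_and_verify(Fc, F_ref, delta, lam):
    """Step 705 sketch: L2-normalize Fc, then check cosine similarity to F_ref.

    Returns (F_u, sim): F_u is Fn when sim >= delta, otherwise lam * Fn.
    """
    Fn = Fc / np.linalg.norm(Fc)
    sim = float(Fn @ F_ref / np.linalg.norm(F_ref))
    F_u = Fn if sim >= delta else lam * Fn
    return F_u, sim

F_u, sim = normalize_and_verify(np.array([3.0, 4.0]), np.array([3.0, 4.0]), delta=0.5, lam=0.5)
```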

[0061] The above steps use a gated fusion method to integrate cross-modal information adaptively. The core of gated fusion is that learnable gate weights dynamically adjust the contribution of each modal vector. Gated fusion is highly adaptive: the gate weights adjust automatically to the input features, for example increasing the weight of the text vector when material composition is analyzed and emphasizing the temporal vector when structural evolution is studied. For dimensionality control, gated fusion keeps the output dimension fixed by weighted summation rather than vector concatenation, which avoids feature inflation. Through this dynamic weight allocation, gated fusion adaptively optimizes feature integration, reducing material performance prediction errors and improving computational efficiency.

[0062] This application provides a multimodal knowledge embedding method for battery energy storage material data. First, text and image data are collected from sources including structured databases, academic literature, patent documents, and experimental records, and are standardized and preprocessed to form a multimodal database. The data are then classified into three modalities: text, image, and time series. The BERT model extracts text feature vectors, a crystal graph convolutional network (CGCNN) processes crystal image data to extract image feature vectors, and a temporal convolutional network (TCN) analyzes time-series data to generate dynamic vectors that capture performance evolution trends. After the feature vectors of each modality are generated, a cross-attention and contrastive learning collaborative mechanism aligns these vectors from different sources in the semantic space to keep the modalities consistent. A gated fusion method then fuses the multimodal vectors. Finally, a unified multimodal fusion vector that comprehensively represents the material information is output, providing a solid data foundation for subsequent material retrieval, analysis, and prediction.

[0063] The core advantage of this application is its combination of dynamic adaptability and strong interpretability. By integrating text, image, and time-series data into one cross-modal learning framework, it addresses the poor adaptability of general-purpose models in battery energy storage materials science. The method uses the cross-attention and contrastive learning collaborative mechanism to semantically align multimodal information and a hierarchical feature fusion strategy to generate a unified vector representation that encodes the correlations among material composition, structure, and performance. This overcomes the limitations of traditional single-modal analysis and provides a new technical path for the intelligent design and performance optimization of energy storage materials. It also addresses the insufficient cross-modal semantic alignment accuracy and the large matching errors in cross-modal retrieval that arise because text, image, and time-series information are scattered across the semantic space.

[0064] This application also provides an application scenario for the above multimodal knowledge embedding method for battery energy storage material data. Specifically, the method provided in this embodiment can be applied to a scenario in which battery energy storage material representation vectors are displayed. The scenario comprises a data production stage, a data processing stage, and a vector display stage. Battery energy storage material data enter the data processing stage from the data production stage, undergo multimodal information extraction and fusion to yield a unified multimodal fusion vector, and then enter the downstream vector display stage. The method provided in this embodiment belongs to the multimodal data fusion step within the data processing stage: during the extraction and fusion of multimodal information, the data are processed with the cross-attention and contrastive learning collaborative mechanism and the gated fusion method to generate the unified multimodal fusion vector.

[0065] In one exemplary embodiment, a computer device is provided, which may be a server or a terminal; its internal structure may be as shown in Figure 3. The computer device includes a processor, a memory, an input/output (I/O) interface, and a communication interface. The processor, memory, and I/O interface are connected via a system bus, and the communication interface is connected to the system bus via the I/O interface. The processor provides computing and control capabilities. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides the environment in which the operating system and the computer program in the non-volatile storage medium run. The database stores unified multimodal fusion vector data. The I/O interface is used to exchange information between the processor and external devices. The communication interface is used to communicate with external terminals over a network connection. When executed by the processor, the computer program implements a multimodal knowledge embedding method for battery energy storage material data.

[0066] Those skilled in the art will understand that the structure shown in Figure 3 is merely a block diagram of part of the structure related to the present application and does not limit the computer device to which the present application is applied. A specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components. In an exemplary embodiment, a computer device is provided, including a memory and a processor. The memory stores a computer program, and the processor executes the computer program to implement the steps in the above method embodiments.

[0067] In one exemplary embodiment, a computer-readable storage medium is provided storing a computer program that, when executed by a processor, implements the steps in the above-described method embodiments.

[0068] In one exemplary embodiment, a computer program product is provided, including a computer program that, when executed by a processor, implements the steps in the above-described method embodiments.

[0069] It should be noted that the user information (including but not limited to user device information and user personal information) and data (including but not limited to data used for analysis, stored data, and displayed data) involved in this application are information and data authorized by the user or fully authorized by all parties. Moreover, the collection, use, and processing of the relevant data comply with the relevant data protection laws and policies of the country or region concerned and are carried out with the authorization of the owner of the corresponding device.

[0070] Those skilled in the art will understand that all or part of the processes in the above embodiments can be implemented by a computer program instructing related hardware. The computer program can be stored in a non-volatile computer-readable storage medium. When executed, the computer program can include the processes of the embodiments described above. Any references to memory, databases, or other media used in the embodiments provided in this application can include at least one of non-volatile and volatile memory. Non-volatile memory can include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetic random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, etc. Volatile memory can include random access memory (RAM) or external cache memory, etc. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM).

[0071] The databases involved in the embodiments provided in this application may include at least one of a relational database and a non-relational database. Non-relational databases may include, but are not limited to, blockchain-based distributed databases. The processors involved in the embodiments provided in this application may be general-purpose processors, central processing units, graphics processing units, digital signal processors, programmable logic devices, quantum-computing-based data processing logic devices, and the like, without being limited thereto.

[0072] The technical features of the above embodiments can be combined in any way. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described. However, as long as there is no contradiction in the combination of these technical features, they should be considered to be within the scope of this specification.

[0073] This document uses specific examples to illustrate the principles and implementation methods of this application. The descriptions of the above embodiments are only for the purpose of helping to understand the methods and core ideas of this application. Furthermore, those skilled in the art will recognize that, based on the ideas of this application, there will be changes in the specific implementation methods and application scope. Therefore, the content of this specification should not be construed as a limitation of this application.

Claims

1. A multimodal knowledge embedding method for battery energy storage material data, characterized in that, The multimodal knowledge embedding method for battery energy storage material data includes: A multimodal feature vector library is constructed; the multimodal feature vector library is a database constructed by processing text data and image data of various battery energy storage materials; the multimodal feature vector library includes various battery energy storage materials and text feature vectors, image feature vectors and time-series feature vectors corresponding to each battery energy storage material; For each battery energy storage material in the multimodal feature vector library, a first operation is performed independently to generate an intermediate aligned feature vector set corresponding to the corresponding battery energy storage material; the intermediate aligned feature vector set includes a first feature vector, a second feature vector, and a third feature vector; the first feature vector is a text feature vector enhanced by image structure information and temporal performance information; the second feature vector is an image feature vector guided by text semantic information; the third feature vector is a temporal feature vector guided by image structure information; The second operation is performed independently on the first feature vector of each battery energy storage material in the multimodal feature vector library to obtain a set of positive and negative sample pairs corresponding to the first feature vector of each battery energy storage material; the set of positive and negative sample pairs includes a set of positive sample pairs and a set of negative sample pairs. 
Based on the set of positive and negative sample pairs corresponding to the first feature vector of each battery energy storage material in the multimodal feature vector library, the scalar loss value is calculated, and it is determined whether the scalar loss value has reached the minimum value to obtain a first judgment result. When the first judgment result is yes, the feature vectors in the intermediate aligned feature vector set corresponding to each battery energy storage material are fused in a multimodal manner to obtain the multimodal fused vector corresponding to the corresponding battery energy storage material. When the first judgment result is no, the model parameters in the first operation are updated, and the process returns to the step of independently performing the first operation on each battery energy storage material in the multimodal feature vector library to generate an intermediate aligned feature vector set corresponding to the corresponding battery energy storage material.

2. The multimodal knowledge embedding method for battery energy storage material data according to claim 1, characterized in that, The construction of the multimodal feature vector library specifically includes: The acquired text and image data of various battery energy storage materials are classified into text modality, image modality, and time-series modality to obtain text modality data, image modality data, and time-series modality data. Feature extraction is performed on the text modal data, the image modal data, and the time-series modal data respectively to obtain the text feature vector, image feature vector, and time-series feature vector corresponding to each battery energy storage material.

3. The multimodal knowledge embedding method for battery energy storage material data according to claim 1, characterized in that, The first operation specifically includes: Based on the text feature vector corresponding to the target battery energy storage material, a query matrix is determined using the first learning model, and a key matrix and value matrix are determined using the second learning model based on the image feature vector corresponding to the target battery energy storage material. Attention weights are used to capture the association between text semantic information and image structural information, and an image feature vector corresponding to the target battery energy storage material guided by text semantic information is generated. Based on the image feature vector guided by text information corresponding to the target battery energy storage material, the query matrix determined by the third learning model, and the key matrix and value matrix determined by the fourth learning model based on the time-series feature vector corresponding to the target battery energy storage material, attention weights are used to capture the correlation between image structure information and time-series performance information, and the time-series feature vector guided by image structure information corresponding to the target battery energy storage material is generated. The text feature vectors, second feature vectors, and third feature vectors corresponding to the target battery energy storage material are subjected to residual connection and layer normalization to obtain text feature vectors enhanced by image structure information and temporal performance information. The target battery energy storage material is any battery energy storage material in the multimodal feature vector library.

4. The multimodal knowledge embedding method for battery energy storage material data according to claim 3, characterized in that, The model parameter updates in the first operation specifically include: The parameters of the first learning model, the second learning model, the third learning model, and the fourth learning model are updated respectively.

5. The multimodal knowledge embedding method for battery energy storage material data according to claim 1, characterized in that the second operation specifically includes: taking the first feature vector of each battery energy storage material in the multimodal feature vector library as an anchor vector, and obtaining a set of positive sample pairs and a set of negative sample pairs for the anchor vector of each battery energy storage material. The set of positive sample pairs includes: a vector pair consisting of the anchor vector and the second feature vector in the intermediate aligned feature vector set containing the anchor vector; a vector pair consisting of the anchor vector and the third feature vector in that set; and a vector pair consisting of the second feature vector and the third feature vector in that set. The set of negative sample pairs includes: a vector pair consisting of the anchor vector and the second feature vector corresponding to a second type of battery energy storage material, and a vector pair consisting of the anchor vector and the third feature vector corresponding to a third type of battery energy storage material. The second type and the third type of battery energy storage material are both battery energy storage materials in the multimodal feature vector library, and the battery energy storage material corresponding to the anchor vector, the second type of battery energy storage material, and the third type of battery energy storage material are mutually different battery energy storage materials.
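
The pair construction in the second operation can be sketched in Python. The sketch assumes the library is a dictionary mapping a hypothetical material identifier to its intermediate aligned feature vector set, stored under the hypothetical keys `first`, `second`, and `third`. It draws one second-type and one third-type material at random; the claim does not say how these materials are chosen, and the contrastive loss applied to the pairs is not shown.

```python
import random


def build_pairs(library, rng):
    """Build positive and negative sample pairs for every anchor material.

    library maps a material id to {"first": vec, "second": vec, "third": vec},
    i.e. its intermediate aligned feature vector set (hypothetical layout).
    """
    ids = list(library)
    if len(ids) < 3:
        raise ValueError("need at least three materials to form negative pairs")
    pairs = {}
    for anchor_id in ids:
        entry = library[anchor_id]
        anchor = entry["first"]
        positives = [
            (anchor, entry["second"]),          # anchor with its own second feature vector
            (anchor, entry["third"]),           # anchor with its own third feature vector
            (entry["second"], entry["third"]),  # second with third, same material
        ]
        others = [m for m in ids if m != anchor_id]
        # Two distinct materials, both different from the anchor material.
        second_id, third_id = rng.sample(others, 2)
        negatives = [
            (anchor, library[second_id]["second"]),
            (anchor, library[third_id]["third"]),
        ]
        pairs[anchor_id] = {
            "positive": positives,
            "negative": negatives,
            "negative_ids": (second_id, third_id),
        }
    return pairs


# Usage with four hypothetical materials and toy 2-dimensional vectors.
library = {
    name: {"first": [float(i), 0.0], "second": [float(i), 1.0], "third": [float(i), 2.0]}
    for i, name in enumerate(["mat_A", "mat_B", "mat_C", "mat_D"])
}
pairs = build_pairs(library, random.Random(0))
```

Passing the random generator in explicitly keeps the sampling reproducible; a training loop would typically resample negatives each epoch.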

6. The multimodal knowledge embedding method for battery energy storage material data according to claim 1, characterized in that the second feature vector is calculated as $F_2 = A_{TI} V_I$, where $F_2$ is the second feature vector corresponding to the target battery energy storage material, $A_{TI}$ is the text-image attention weight matrix, and $V_I$ is the value matrix determined with the second learning model based on the image feature vector corresponding to the target battery energy storage material. $A_{TI}$ is calculated as $A_{TI} = \mathrm{softmax}\left(\frac{Q_T K_I^{\top}}{\sqrt{d_T}}\right)$, where $Q_T$ is the query matrix determined with the first learning model based on the text feature vector corresponding to the target battery energy storage material, $K_I$ is the key matrix determined with the second learning model based on the image feature vector corresponding to the target battery energy storage material, $\mathrm{softmax}(\cdot)$ is the normalized exponential function, and $d_T$ is the dimension of the text feature vector corresponding to the target battery energy storage material. $Q_T$ is calculated as $Q_T = F_T W_1 + b_1$, where $F_T$ is the text feature vector corresponding to the target battery energy storage material, and $W_1$ and $b_1$ are the weight matrix and bias matrix of the first learning model, respectively. $K_I$ and $V_I$ are calculated as $K_I = \hat{F}_I W_K + b_K$ and $V_I = \hat{F}_I W_V + b_V$, where $\hat{F}_I$ is the projection of the image feature vector corresponding to the target battery energy storage material onto the dimension of the text feature vector corresponding to the target battery energy storage material, and $W_K$, $W_V$, $b_K$, and $b_V$ are the key weight matrix, value weight matrix, key bias matrix, and value bias matrix of the second learning model, respectively. $\hat{F}_I$ is calculated as $\hat{F}_I = F_I W_{PI} + b_{PI}$, where $F_I$ is the image feature vector corresponding to the target battery energy storage material, and $W_{PI}$ and $b_{PI}$ are, respectively, the weight matrix and bias matrix used to project that image feature vector onto the dimension of the text feature vector corresponding to the target battery energy storage material. The target battery energy storage material is any battery energy storage material in the multimodal feature vector library.
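
The text-image attention of claim 6 can be illustrated with a plain-Python sketch: project the image features to the text dimension, form the query with the first learning model and the key and value with the second learning model, apply softmax to the dot products scaled by the square root of the text dimension, and weight the values. It uses a row-vector convention (each matrix is a list of rows and an affine map is computed as x @ w + b). The parameter names (`W_PI`, `W_1`, `W_K`, `W_V`, and their biases), the treatment of the text and image features as small matrices of rows, and the random initialization are assumptions for illustration only.

```python
import math
import random


def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m); matrices are lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def linear(x, w, b):
    """Row-vector affine map x @ w + b, with bias b added to every row."""
    return [[v + bias for v, bias in zip(row, b)] for row in matmul(x, w)]


def softmax_rows(m):
    """Softmax over each row; the row maximum is subtracted for numerical stability."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def text_image_attention(f_t, f_i, p):
    """Return (F2, A_TI) for text features f_t (n_t x d_t) and image features f_i (n_i x d_i)."""
    d_t = len(f_t[0])
    f_i_hat = linear(f_i, p["W_PI"], p["b_PI"])  # project image features to d_t
    q_t = linear(f_t, p["W_1"], p["b_1"])        # first learning model: query
    k_i = linear(f_i_hat, p["W_K"], p["b_K"])    # second learning model: key
    v_i = linear(f_i_hat, p["W_V"], p["b_V"])    # second learning model: value
    scores = matmul(q_t, transpose(k_i))
    a_ti = softmax_rows([[s / math.sqrt(d_t) for s in row] for row in scores])
    return matmul(a_ti, v_i), a_ti


def random_params(d_t, d_i, seed=0):
    """Hypothetical random initialization, for illustration only."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    zeros = [0.0] * d_t
    return {
        "W_PI": mat(d_i, d_t), "b_PI": list(zeros),
        "W_1": mat(d_t, d_t), "b_1": list(zeros),
        "W_K": mat(d_t, d_t), "b_K": list(zeros),
        "W_V": mat(d_t, d_t), "b_V": list(zeros),
    }


# Usage: 2 text rows of width 4 attend over 5 image feature rows of width 3.
params = random_params(d_t=4, d_i=3)
rng = random.Random(1)
f_t = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(2)]
f_i = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(5)]
f2, a_ti = text_image_attention(f_t, f_i, params)
```

Each row of the attention matrix sums to 1, so every row of the result is a convex combination of the projected image value rows, expressed in the text feature dimension.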

7. The multimodal knowledge embedding method for battery energy storage material data according to claim 1, characterized in that the third feature vector is calculated as $F_3 = A_{IS} V_S$, where $F_3$ is the third feature vector corresponding to the target battery energy storage material, $A_{IS}$ is the image-temporal attention weight matrix, and $V_S$ is the value matrix determined with the fourth learning model based on the time-series feature vector corresponding to the target battery energy storage material. $A_{IS}$ is calculated as $A_{IS} = \mathrm{softmax}\left(\frac{Q_I K_S^{\top}}{\sqrt{d_I}}\right)$, where $Q_I$ is the query matrix determined with the third learning model based on the image feature vector guided by text information corresponding to the target battery energy storage material, $K_S$ is the key matrix determined with the fourth learning model based on the time-series feature vector corresponding to the target battery energy storage material, $\mathrm{softmax}(\cdot)$ is the normalized exponential function, and $d_I$ is the dimension of the image feature vector corresponding to the target battery energy storage material. $Q_I$ is calculated as $Q_I = F_2 W_3 + b_3$, where $F_2$ is the second feature vector of the target battery energy storage material, and $W_3$ and $b_3$ are the weight matrix and bias matrix of the third learning model, respectively. $K_S$ and $V_S$ are calculated as $K_S = \hat{F}_S W_K' + b_K'$ and $V_S = \hat{F}_S W_V' + b_V'$, where $\hat{F}_S$ is the projection of the time-series feature vector corresponding to the target battery energy storage material onto the dimension of the text feature vector corresponding to the target battery energy storage material, and $W_K'$, $W_V'$, $b_K'$, and $b_V'$ are the key weight matrix, value weight matrix, key bias matrix, and value bias matrix of the fourth learning model, respectively. $\hat{F}_S$ is calculated as $\hat{F}_S = F_S W_{PS} + b_{PS}$, where $F_S$ is the time-series feature vector of the target battery energy storage material, and $W_{PS}$ and $b_{PS}$ are, respectively, the weight matrix and bias matrix used to project that time-series feature vector onto the dimension of the text feature vector corresponding to the target battery energy storage material.
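
The image-temporal attention of claim 7 follows the same pattern, with the text-guided image features supplying the query (third learning model) and the projected time-series features supplying the key and value (fourth learning model). As in the claim, the scores are scaled by the square root of the image feature dimension, which the sketch takes as an explicit argument. The parameter names `W_PS`, `W_3`, `W_Kp`, `W_Vp` (standing in for the primed matrices) and their biases, the stand-in second feature vector, and the channel layout of the time series are hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m); matrices are lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def linear(x, w, b):
    """Row-vector affine map x @ w + b, with bias b added to every row."""
    return [[v + bias for v, bias in zip(row, b)] for row in matmul(x, w)]


def softmax_rows(m):
    """Softmax over each row; the row maximum is subtracted for numerical stability."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def image_temporal_attention(f2, f_s, p, d_image):
    """Return (F3, A_IS) for text-guided image features f2 (n_t x d_t)
    and time-series features f_s (n_s x d_s)."""
    f_s_hat = linear(f_s, p["W_PS"], p["b_PS"])  # project time-series features to d_t
    q_i = linear(f2, p["W_3"], p["b_3"])         # third learning model: query
    k_s = linear(f_s_hat, p["W_Kp"], p["b_Kp"])  # fourth learning model: key (W_K', b_K')
    v_s = linear(f_s_hat, p["W_Vp"], p["b_Vp"])  # fourth learning model: value (W_V', b_V')
    scores = matmul(q_i, transpose(k_s))
    # The claim divides by the square root of the image feature dimension d_I.
    a_is = softmax_rows([[s / math.sqrt(d_image) for s in row] for row in scores])
    return matmul(a_is, v_s), a_is


def random_params(d_t, d_s, seed=0):
    """Hypothetical random initialization, for illustration only."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    zeros = [0.0] * d_t
    return {
        "W_PS": mat(d_s, d_t), "b_PS": list(zeros),
        "W_3": mat(d_t, d_t), "b_3": list(zeros),
        "W_Kp": mat(d_t, d_t), "b_Kp": list(zeros),
        "W_Vp": mat(d_t, d_t), "b_Vp": list(zeros),
    }


# Usage: a stand-in F2 (2 x 4) attends over 6 time steps with 3 channels each;
# the image feature dimension d_I = 3 is an assumed value.
params = random_params(d_t=4, d_s=3)
rng = random.Random(2)
f2 = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(2)]
f_s = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(6)]
f3, a_is = image_temporal_attention(f2, f_s, params, d_image=3)
```

Chaining the two attention steps means the time-series information reaches the text representation only through the image-guided query, which is how the claims order the three modalities.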

8. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor executes the computer program to implement the multimodal knowledge embedding method for battery energy storage material data according to any one of claims 1-7.

9. A computer-readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it implements the multimodal knowledge embedding method for battery energy storage material data according to any one of claims 1-7.

10. A computer program product, comprising a computer program, characterized in that, when the computer program is executed by a processor, it implements the multimodal knowledge embedding method for battery energy storage material data according to any one of claims 1-7.