Cross-business data collaborative analysis method and system based on multi-modal AI

By constructing a unified semantic space across businesses and a dynamically updated representation difference mapping table, and utilizing a base matching network and a perspective correction network, the problems of unified representation of multimodal data and differences in business perspectives are solved, thereby improving the accuracy and adaptability of cross-business collaborative analysis.

CN122241087A · Pending · Publication Date: 2026-06-19 · HENAN ZEYUAN NETWORK TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
HENAN ZEYUAN NETWORK TECH CO LTD
Filing Date
2026-03-18
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing technologies struggle to achieve unified semantic representation while maintaining the information integrity of multimodal data. They suffer from differences in business perspectives, lack effective semantic alignment mechanisms, and the models cannot adapt to the dynamic changes in business systems.

Method used

A unified semantic space is constructed across businesses. Standardized representation of multimodal data is achieved through a base matching network and a perspective correction network. The representation difference mapping table is dynamically updated, and key paths are identified in a semantic association graph that a graph neural network then uses for collaborative analysis.

Benefits of technology

It achieves standardized representation of multimodal data in a unified semantic space, dynamically optimizes business understanding biases, and improves the accuracy and interpretability of cross-business collaborative analysis.

✦ Generated by Eureka AI based on patent content.


Abstract

This invention discloses a cross-business data collaborative analysis method and system based on multimodal AI, belonging to the field of artificial intelligence technology. It addresses two technical problems in existing technologies: the semantic gap between multiple business systems is difficult to bridge, and differences in business perspectives cannot be quantified. The method includes: constructing a cross-business unified semantic space containing atomic event types and entity types and assigning prototype vectors; inputting multimodal data into a base matching network to obtain base matching vectors, querying a perspective correction parameter set from a representation difference mapping table based on business identifiers, and inputting both into a perspective correction network to obtain corrected semantic representation vectors; parsing the collaborative analysis task to obtain target types and retrieving associated vectors; constructing a cross-business semantic association graph using the associated vectors as nodes, identifying key paths connecting different business systems, and outputting the collaborative analysis results after differentiation and fusion through a graph neural network. This invention achieves semantic alignment and perspective-aware collaborative analysis across business systems.

Description

Technical Field

[0001] This invention belongs to the field of artificial intelligence technology, specifically relating to a cross-business data collaborative analysis method and system based on multimodal AI.

Background Technology

[0002] As enterprises deepen their digital transformation, different business systems (such as production systems, marketing systems, after-sales systems, and financial systems) accumulate massive amounts of multimodal data, including image data, text data, and time-series data. This data contains rich business information. Taking the manufacturing industry as an example, the production system (the first business system) has accumulated massive amounts of equipment sensor time-series data and quality inspection image data; while the after-sales system (the second business system) records a large amount of text data of user complaints. When 'equipment downtime' occurs, the production system focuses on repair costs, while the after-sales system focuses on customer churn rate...

[0003] In existing technologies, cross-business data collaborative analysis mainly faces the following technical challenges. First, images, text, and time-series data have different data structures and feature spaces, making it difficult for existing technologies to achieve unified semantic representation while maintaining the integrity of the information in each modality. Second, different business systems have inherent biases in their understanding of the same event. For example, the production system views "equipment failure" in terms of downtime and maintenance costs, while the after-sales system views it in terms of user complaints and satisfaction. Existing technologies cannot effectively quantify and model these differences in business perspectives, leading to information distortion during cross-business collaborative analysis. Third, different business systems have different localized expressions for the same atomic event or entity. For example, system A calls it "customer complaint," while system B calls it "customer grievance." Existing technologies lack effective semantic alignment mechanisms, making it difficult to achieve semantic interoperability across business systems. Fourth, the understanding biases of business systems change dynamically with adjustments to business rules and the accumulation of practical experience. Once the semantic alignment models in existing technologies are trained, they remain fixed and cannot adapt to the continuous evolution of business systems.

[0004] Therefore, cross-business data collaborative analysis methods and systems based on multimodal AI have emerged.

Summary of the Invention

[0005] This invention aims to solve at least one of the technical problems existing in the prior art; to this end, this invention proposes a cross-business data collaborative analysis method and system based on multimodal AI to solve the technical problems of difficulty in uniformly representing multimodal data, difficulty in quantifying differences in business perspectives, difficulty in bridging semantic gaps, and lack of dynamic optimization capabilities in collaborative analysis.

[0006] To address the aforementioned problems, the first aspect of this invention provides a cross-business data collaborative analysis method based on multimodal AI, comprising the following steps:
S1: Obtain the first multimodal data from the first business system and the second multimodal data from the second business system.
S2: Construct a cross-business unified semantic space, which includes a predefined set of atomic event types and a set of entity types, and assign a prototype vector to each atomic event type and entity type.
S3: Input the first multimodal data into the base matching network, composed of a cross-modal interaction layer and a semantic decoupling layer, to obtain the base matching vector, and obtain the business identifier of the business system to which the first multimodal data belongs. Query the perspective correction parameter set corresponding to the business identifier from the pre-built representation difference mapping table. Input the base matching vector and the perspective correction parameter set into the perspective correction network to obtain the corrected matching score vector, which is used as the first semantic representation vector corresponding to the first multimodal data. Perform the same operation on the second multimodal data to obtain the second semantic representation vector.
S4: Receive the collaborative analysis task, and parse it to obtain the target event type and target entity type.
S5: Based on the target event type and target entity type, retrieve the associated vectors from the first semantic representation vector set and the second semantic representation vector set respectively to obtain the first associated vector set and the second associated vector set.
S6: Construct a cross-business semantic association graph using the semantic membership vectors of the first associated vector set and the second associated vector set as nodes, identify the shortest path connecting the first business system node and the second business system node in the association graph as the critical path, and mark the nodes on the critical path as high-weight nodes. Input the association graph into a graph neural network containing multiple graph convolutional layers, and output the cross-business collaborative analysis results after feature differentiation and fusion of high-weight nodes and ordinary-weight nodes.

[0007] Preferably, in step S2, constructing the cross-business unified semantic space includes the following steps: Construct a cross-business atomic event library and an entity library, where the atomic event library contains the smallest-granularity business event types that co-occur across multiple business systems and the entity library contains business object types that co-occur across multiple business systems. For each business system, obtain the system-specific business rule documents and business process documents, and use natural language processing technology to extract the system's local semantic concept system. Compare the local semantic concept system of each business system with the cross-business atomic event library and entity library to identify the different ways each business system expresses the same atomic event or entity, and construct the representation difference mapping table. Construct a semantic decoupling-recoupling encoder network comprising a modality-specific coding subnet, a business-specific semantic decoupling subnet, and a cross-business semantic recoupling subnet. Construct a first loss function, a second loss function, and a third loss function, and train the semantic decoupling-recoupling encoder network using the training dataset. After training, take the output space of the cross-business semantic recoupling subnet as the cross-business unified semantic space, and determine the prototype vector of each atomic event type and entity type in that space.

[0008] Preferably, in step S3, the process of generating the first semantic representation vector and the second semantic representation vector includes the following steps: The first multimodal data is input into the modality-specific coding subnet, which outputs the multimodal fusion feature. The business identifier of the business system to which the first multimodal data belongs is obtained, together with the representation difference mapping table. A perspective-aware matcher is constructed, comprising a base matching network and one perspective correction network per type, so that the number of perspective correction networks equals the total number of atomic event types and entity types. The multimodal fusion feature is input to the base matching network, which outputs the base matching vector; each component of this vector is the base matching score between the first multimodal data and one atomic event type or entity type. The perspective correction parameter set corresponding to the business identifier is queried from the representation difference mapping table; each parameter in the set is the understanding correction amount of that business system for one atomic event type or entity type. For each atomic event type or entity type, the base matching score and the perspective correction parameter are input into the corresponding perspective correction network to calculate the corrected matching score, using the parameters of that network, including weight coefficients determined through model training. All corrected matching scores are combined into a vector, which serves as the semantic membership vector corresponding to the first multimodal data. The second multimodal data is input into the modality-specific coding subnet and the perspective-aware matcher in the same way as the first multimodal data, and the corresponding semantic membership vector is output as the second semantic representation vector.
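
The correction step can be sketched as follows. The text states only that each corrected score is computed from the base matching score, the perspective correction parameter, and trained weight coefficients; the formula itself is not reproduced in this text. The sketch therefore assumes a sigmoid over a weighted sum, and the names `correct_scores`, `w_score`, and `w_delta` are hypothetical.

```python
import math

def correct_scores(base_scores, deltas, w_score, w_delta):
    """Apply a per-type perspective correction to base matching scores.

    Assumed form (not taken from the patent text):
        corrected_k = sigmoid(w_score[k] * base_k + w_delta[k] * delta_k)
    """
    if not (len(base_scores) == len(deltas) == len(w_score) == len(w_delta)):
        raise ValueError("all inputs must have one entry per type")
    corrected = []
    for s, d, a, b in zip(base_scores, deltas, w_score, w_delta):
        z = a * s + b * d
        corrected.append(1.0 / (1.0 + math.exp(-z)))
    return corrected

# Two types; the same base scores seen from two business perspectives.
base = [0.6, 0.2]
ones = [1.0, 1.0]  # placeholder weights; in practice these would be trained
prod_view = correct_scores(base, [0.12, -0.05], ones, ones)
after_sales_view = correct_scores(base, [-0.08, 0.15], ones, ones)
```

With identical base scores, the two business identifiers yield different semantic membership vectors, which is the behavior the perspective correction is meant to provide.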

[0009] Preferably, obtaining the business identifier of the business system to which the first multimodal data belongs and performing feature fusion includes the following steps: The image data in the first multimodal data is input into an image encoder based on a residual network, which outputs the image feature vector. The text data is input into a pre-trained language model encoder based on a bidirectional Transformer, which outputs the text feature vector. The time-series data is input into a bidirectional long short-term memory network encoder, which outputs the time-series feature vector. The image feature vector, text feature vector, and time-series feature vector are concatenated and then mapped to a lower dimension through a fully connected layer to output the multimodal fusion feature.
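
A minimal sketch of the concatenate-then-project fusion step, in plain Python. The three input vectors stand in for the encoder outputs, and the function name `fuse_features`, the vector sizes, and the random weights are illustrative assumptions rather than values from the text.

```python
import random

def fuse_features(image_vec, text_vec, series_vec, weight, bias):
    """Concatenate per-modality features and project them with one dense layer."""
    concat = list(image_vec) + list(text_vec) + list(series_vec)
    if any(len(row) != len(concat) for row in weight):
        raise ValueError("each weight row must match the concatenated length")
    if len(weight) != len(bias):
        raise ValueError("bias must have one entry per output dimension")
    return [sum(w * x for w, x in zip(row, concat)) + b
            for row, b in zip(weight, bias)]

rng = random.Random(0)
image_vec = [0.1, 0.4, 0.3, 0.9]   # stand-in for the residual-network encoder output
text_vec = [0.7, 0.2, 0.5]         # stand-in for the bidirectional Transformer encoder output
series_vec = [0.3, 0.8]            # stand-in for the bidirectional LSTM encoder output
out_dim, in_dim = 4, 9             # 9 concatenated inputs reduced to 4 outputs
weight = [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(out_dim)]
bias = [0.0] * out_dim
fused = fuse_features(image_vec, text_vec, series_vec, weight, bias)
```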

[0010] Preferably, the construction and training of the base matching network includes the following steps: Construct a base matching network consisting of a cross-modal interaction layer and a semantic decoupling layer. The cross-modal interaction layer includes a multi-head attention mechanism that captures the interaction relationships between different modal feature dimensions in the multimodal fusion feature and outputs an interaction-enhanced feature. The semantic decoupling layer includes one feature selection subnetwork per atomic event type or entity type; each subnetwork selects, from the interaction-enhanced feature, the feature dimensions related to its type. Each feature selection subnetwork outputs a selection mask whose length equals the dimension of the interaction-enhanced feature. The interaction-enhanced feature and the selection mask are multiplied element-wise, and the result is fed into the fully connected layer of the subnetwork to obtain the matching score for that type. The base matching network is trained using a training dataset with three objectives. First objective: minimize the base matching loss. Second objective: minimize the mask sparsity loss. Third objective: minimize the mask diversity loss. The base matching loss uses an indicator variable that equals 1 when the training sample belongs to the given atomic event type or entity type and 0 otherwise; the mask sparsity loss uses the L1 norm of each selection mask; and the mask diversity loss uses, for each pair of selection masks, the number of dimensions in which both masks have the value 1.
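
The two mask regularizers can be illustrated on binary masks. This is a sketch under stated assumptions: the text names the L1 norm and the count of dimensions where two masks are both 1, but how the terms are summed, normalized, or weighted in the actual loss is not given here. The function names are hypothetical.

```python
from itertools import combinations

def mask_sparsity(masks):
    """Sum of L1 norms of the selection masks (smaller means sparser masks)."""
    return sum(sum(abs(v) for v in m) for m in masks)

def mask_overlap(masks):
    """Total number of dimensions that are 1 in both masks, over all mask pairs."""
    return sum(
        sum(1 for a, b in zip(m1, m2) if a == 1 and b == 1)
        for m1, m2 in combinations(masks, 2)
    )

# One mask per type, each over a 4-dimensional interaction-enhanced feature.
masks = [
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 1],
]
sparsity = mask_sparsity(masks)  # 2 + 2 + 1 = 5
overlap = mask_overlap(masks)    # only the first two masks share a dimension: 1
```

Minimizing the first quantity pushes each type to use few feature dimensions; minimizing the second pushes different types to use different dimensions.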

[0011] Preferably, determining and updating the understanding correction amount includes the following steps: Collect an initial sample set that the business system has labeled as a given atomic event type or entity type, calculate the initial understanding correction amount, and store it in the representation difference mapping table. Read the understanding correction amount currently used by the business system from the representation difference mapping table, acquire newly added unlabeled data from the business system in real time, input the unlabeled data into the perspective-aware matcher to obtain the corrected matching score, and push samples whose matching score is greater than or equal to the preset matching threshold to the manual review queue of the business system. When the business system returns the manual review score for a pushed sample, record the base matching score, the corrected score, and the manual review score of that sample. Construct a function that evaluates the effect of the correction. When the accumulated number of samples reaches a preset threshold, calculate the average correction effect and the average gradient of the corrected score. Update the understanding correction amount based on the average correction effect and the average gradient, using a preset update step size. Store the updated understanding correction amount in the representation difference mapping table, replacing the original value.
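
The feedback loop can be sketched as a small accumulator. The evaluation function and the update rule are not reproduced in this text, so the sketch assumes the effect is the signed gap between the manual review score and the corrected score, and that the update is the step size times the average effect times the average gradient. The class `CorrectionUpdater` and its parameters are hypothetical, and the recorded base matching score is omitted because the assumed rule does not use it.

```python
class CorrectionUpdater:
    """Accumulates reviewed samples and updates one understanding correction amount."""

    def __init__(self, delta, step=0.1, batch_size=3):
        self.delta = delta            # current correction amount from the mapping table
        self.step = step              # preset update step size
        self.batch_size = batch_size  # preset sample-count threshold
        self._effects = []
        self._grads = []

    def record(self, corrected_score, review_score, grad):
        """Store one reviewed sample; return True if the correction was updated."""
        # Assumed effect function: signed gap between reviewer and model.
        self._effects.append(review_score - corrected_score)
        self._grads.append(grad)
        if len(self._effects) < self.batch_size:
            return False
        mean_effect = sum(self._effects) / len(self._effects)
        mean_grad = sum(self._grads) / len(self._grads)
        self.delta += self.step * mean_effect * mean_grad
        self._effects.clear()
        self._grads.clear()
        return True

updater = CorrectionUpdater(delta=0.12, step=0.1, batch_size=3)
samples = [(0.70, 0.80, 1.0), (0.65, 0.75, 1.0), (0.60, 0.70, 1.0)]
flags = [updater.record(c, r, g) for c, r, g in samples]
```

Here the reviewers consistently score about 0.1 above the model, so the third sample triggers an update that raises the correction amount slightly.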

[0012] Preferably, step S4 includes the following steps: The collaborative analysis task is input into the task parser, which outputs preliminary analysis results, which include a set of candidate target event types and a set of candidate target entity types. The preliminary analysis results are sent to the first business system and the second business system respectively. Receive a first confirmation result returned by the first business system and a second confirmation result returned by the second business system. The first confirmation result and the second confirmation result respectively include the score of each business system’s degree of recognition of the candidate target event type and the candidate target entity type. The first confirmation result and the second confirmation result are merged, and the candidate target event types and candidate target entity types are reordered based on the score of the degree of recognition after fusion. The top N event types after reordering are selected as target event types, and the top M entity types after reordering are selected as target entity types.
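
A minimal sketch of the confirm-and-rerank step for event types. The text does not say how the two confirmation results are merged, so a plain average is assumed here; `select_targets` and the example scores are illustrative only.

```python
def select_targets(first_scores, second_scores, top_n):
    """Fuse per-system recognition scores by averaging and keep the top_n candidates."""
    candidates = set(first_scores) | set(second_scores)
    fused = {
        c: (first_scores.get(c, 0.0) + second_scores.get(c, 0.0)) / 2.0
        for c in candidates
    }
    # Sort by fused score (descending), then by name so ties are deterministic.
    ranked = sorted(fused, key=lambda c: (-fused[c], c))
    return ranked[:top_n]

# Recognition scores returned by the first and second business systems.
first = {"equipment failure": 0.9, "cost expenditure": 0.6, "customer complaint": 0.3}
second = {"equipment failure": 0.7, "cost expenditure": 0.2, "customer complaint": 0.8}
targets = select_targets(first, second, top_n=2)
```

The same function would apply to candidate entity types with its own cutoff M.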

[0013] Preferably, step S5 includes the following steps: Convert the target event type and target entity type into a target indication vector. Construct a hierarchical retrieval index organized according to the category hierarchy of atomic event types and entity types, in which the first layer is a coarse-grained category index and the second layer is a fine-grained index of atomic event types and entity types. Based on the category hierarchy of the target event type and the target entity type, locate the corresponding category cluster in the coarse-grained category index, then retrieve the semantic membership vectors most similar to the target indication vector in the fine-grained index corresponding to that category cluster. The first candidate vector set is obtained by retrieving from the first semantic representation vector set, and the second candidate vector set is obtained by retrieving from the second semantic representation vector set. Calculate the similarity between each vector in the first candidate vector set and the target indication vector, and select the top K vectors with the highest similarity as the first associated vector set. The same process is applied to the second candidate vector set to obtain the second associated vector set.
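
The coarse-then-fine retrieval can be sketched with a dictionary as the coarse category index and cosine similarity as the ranking measure. The text does not name the similarity measure used at this step, so cosine similarity is an assumption, as are the names `retrieve_top_k`, the category labels, and the vector IDs.

```python
import math

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v)

def retrieve_top_k(index, category, target, k):
    """Look up the coarse category, then rank its vectors by similarity to target."""
    bucket = index.get(category, [])
    ranked = sorted(bucket, key=lambda item: cosine(item[1], target), reverse=True)
    return [vec_id for vec_id, _ in ranked[:k]]

# Coarse layer: category name. Fine layer: (vector ID, semantic membership vector).
index = {
    "equipment": [("p1", [0.9, 0.1, 0.0]), ("p2", [0.2, 0.7, 0.1]), ("p3", [0.8, 0.3, 0.0])],
    "customer": [("p4", [0.0, 0.1, 0.9])],
}
target = [1.0, 0.0, 0.0]  # indication vector for a target type in the "equipment" category
top = retrieve_top_k(index, "equipment", target, k=2)
```

Restricting the search to one category cluster means vectors in other clusters are never scored, which is the point of the two-layer index.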

[0014] Preferably, step S6 includes the following steps: Use each semantic membership vector in the first and second association vector sets as a node to construct the initial graph structure; The business logic relationships between atomic event types and entity types are obtained from the business knowledge graph. Edges are added between related nodes according to the business logic relationships to obtain a cross-business semantic association graph. Identify the key path in the cross-business semantic association graph, where the key path is the shortest path connecting the first business system node and the second business system node; Nodes on the critical path are marked as high-weight nodes, and the remaining nodes are marked as ordinary-weight nodes. The cross-business semantic association graph is input into a graph neural network, which contains multiple graph convolutional layers. During graph convolution, the feature update frequency of high-weight nodes is higher than that of ordinary weight nodes. After multi-layer graph convolution, the final feature representation of all nodes is obtained. The feature representation of high-weight nodes is distinguished and fused with the feature representation of ordinary-weight nodes to obtain the global feature representation. The global feature representation is input into the classifier, and the cross-business collaborative analysis results are output.
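
Critical-path identification on an unweighted association graph can be sketched with breadth-first search. The node names and edges below are invented for illustration, and choosing one source node and one target node is an assumption; the text does not specify how the endpoint nodes of each business system are selected.

```python
from collections import deque

def shortest_path(adjacency, source, target):
    """Breadth-first search; returns the node list of one shortest path, or []."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in adjacency.get(node, []):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return []

adjacency = {
    "prod:equipment_failure": ["prod:repair_completed", "as:fault_report"],
    "prod:repair_completed": ["prod:equipment_failure", "as:work_order_closed"],
    "as:fault_report": ["prod:equipment_failure", "as:customer_complaint"],
    "as:work_order_closed": ["prod:repair_completed", "as:customer_complaint"],
    "as:customer_complaint": ["as:fault_report", "as:work_order_closed"],
}
path = shortest_path(adjacency, "prod:equipment_failure", "as:customer_complaint")
high_weight = set(path)                  # nodes on the critical path
ordinary = set(adjacency) - high_weight  # all remaining nodes
```

The two node sets would then be passed to the graph neural network so that high-weight and ordinary-weight nodes can be treated differently during graph convolution.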

[0015] A second aspect of the present invention provides a cross-business data collaborative analysis system based on multimodal AI, comprising the following modules:
Data acquisition module: used to acquire the first multimodal data of the first business system and the second multimodal data of the second business system.
Semantic space construction module: used to construct a cross-business unified semantic space, which includes a predefined set of atomic event types and a set of entity types, and to assign a prototype vector to each atomic event type and entity type.
Semantic representation generation module: used to input the first multimodal data into the base matching network, composed of a cross-modal interaction layer and a semantic decoupling layer, to obtain the base matching vector; obtain the business identifier of the business system to which the first multimodal data belongs; query the perspective correction parameter set corresponding to the business identifier from the pre-built representation difference mapping table; and input the base matching vector and the perspective correction parameter set into the perspective correction network to obtain the corrected matching score vector, which is used as the first semantic representation vector corresponding to the first multimodal data. The same operation is performed on the second multimodal data to obtain the second semantic representation vector.
Task parsing module: used to receive collaborative analysis tasks and parse them to obtain the target event type and target entity type.
Association retrieval module: used to retrieve associated vectors from the first semantic representation vector set and the second semantic representation vector set respectively, according to the target event type and the target entity type, so as to obtain the first associated vector set and the second associated vector set.
Collaborative analysis module: used to construct a cross-business semantic association graph with the semantic membership vectors in the first and second associated vector sets as nodes, identify the shortest path connecting the nodes of the first and second business systems in the association graph as the critical path, and mark the nodes on the critical path as high-weight nodes. The association graph is then input into a graph neural network containing multiple graph convolutional layers. After feature differentiation and fusion between high-weight nodes and ordinary-weight nodes, the module outputs the cross-business collaborative analysis results.

[0016] The beneficial effects of this invention are as follows.

This invention constructs a dual-network architecture consisting of a base matching network and a perspective correction network. The base matching network maps multimodal data to a unified semantic space across businesses, outputting a base matching vector that represents the initial matching degree between the data and each atomic event type and entity type. The perspective correction network queries the corresponding understanding correction amount from the representation difference mapping table based on the business system identifier, corrects the base matching score, and obtains the corrected matching score as perceived from the business perspective. This achieves standardized representation of multimodal data in a unified semantic space. At the same time, the perspective correction network explicitly models the understanding bias of different business systems for the same semantic concept, enabling the same data to obtain differentiated semantic representations under different business perspectives and providing a more accurate semantic foundation for cross-business collaborative analysis.

This invention analyzes the business rule documents and business process documents of the business systems, extracts the local semantic concept system, compares it with the cross-business atomic event library and entity library, identifies the different ways each business system expresses the same atomic event or entity, and constructs a representation difference mapping table. It further introduces a manual review and feedback mechanism to evaluate the effect of the understanding correction in real time. The understanding correction amount in the representation difference mapping table is dynamically updated based on the average correction effect and average gradient, forming a closed-loop optimization. This transforms the business knowledge unique to each business system into computable structured information and provides knowledge guidance for perspective correction. The dynamic update mechanism enables the understanding correction to be continuously optimized as business practice and rules change, so the system becomes more accurate with use, avoiding the defect of traditional models that cannot adapt to business changes once deployed and realizing continuous iterative optimization of the model.

After constructing a cross-business semantic association graph, this invention identifies key paths connecting nodes of different business systems, marks nodes on key paths as high-weight nodes, and assigns higher aggregation weights to high-weight nodes during graph convolution. This makes the final global feature representation more focused on the key semantic information connecting different business systems. The key path identification mechanism enables the model to automatically discover the most critical semantic chains in cross-business collaborative analysis, avoids interference from irrelevant or weakly related information, and improves the accuracy and interpretability of collaborative analysis.

During the parsing of collaborative analysis tasks, this invention sends preliminary parsing results to the relevant business systems for confirmation, avoiding misunderstandings that may arise from a single parser. Based on the degree of recognition returned by each business system, the candidate target types are reordered, and the types ranked highest after fusion are selected as the final target types. This ensures that the target type of the collaborative analysis task conforms to the actual understanding of each business system.

Attached Figure Description

[0017] Figure 1 is a schematic diagram of the method flow of the present invention; Figure 2 is a schematic diagram of the module flow of the present invention.

Detailed Implementation

[0018] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0019] As shown in Figure 1, this invention is a cross-business data collaborative analysis method based on multimodal AI, comprising the following steps:
S1: Obtain the first multimodal data from the first business system and the second multimodal data from the second business system.
S2: Construct a cross-business unified semantic space, which includes a predefined set of atomic event types and a set of entity types, and assign a prototype vector to each atomic event type and entity type.
S3: Input the first multimodal data into the base matching network, composed of a cross-modal interaction layer and a semantic decoupling layer, to obtain the base matching vector, and obtain the business identifier of the business system to which the first multimodal data belongs. Query the perspective correction parameter set corresponding to the business identifier from the pre-built representation difference mapping table. Input the base matching vector and the perspective correction parameter set into the perspective correction network to obtain the corrected matching score vector, which is used as the first semantic representation vector corresponding to the first multimodal data. Perform the same operation on the second multimodal data to obtain the second semantic representation vector.
S4: Receive the collaborative analysis task, and parse it to obtain the target event type and target entity type.
S5: Based on the target event type and target entity type, retrieve the associated vectors from the first semantic representation vector set and the second semantic representation vector set respectively to obtain the first associated vector set and the second associated vector set.
S6: Construct a cross-business semantic association graph using the semantic membership vectors of the first associated vector set and the second associated vector set as nodes, identify the shortest path connecting the first business system node and the second business system node in the association graph as the critical path, and mark the nodes on the critical path as high-weight nodes. Input the association graph into a graph neural network containing multiple graph convolutional layers, and output the cross-business collaborative analysis results after feature differentiation and fusion of high-weight nodes and ordinary-weight nodes.

[0020] Specifically, the first multimodal data and the second multimodal data each include more than one of the following types: image data, text data, and time-series data.

[0021] In one embodiment of the present invention, step S2, constructing the cross-business unified semantic space, includes the following steps: Construct a cross-business atomic event library and an entity library, where the atomic event library contains the smallest-granularity business event types that co-occur across multiple business systems and the entity library contains business object types that co-occur across multiple business systems. For each business system, obtain the system-specific business rule documents and business process documents, and use natural language processing technology to extract the system's local semantic concept system. Compare the local semantic concept system of each business system with the cross-business atomic event library and entity library to identify the different ways each business system expresses the same atomic event or entity, and construct the representation difference mapping table. Construct a semantic decoupling-recoupling encoder network comprising a modality-specific coding subnet, a business-specific semantic decoupling subnet, and a cross-business semantic recoupling subnet. Construct a first loss function, a second loss function, and a third loss function, and train the semantic decoupling-recoupling encoder network using the training dataset. After training, take the output space of the cross-business semantic recoupling subnet as the cross-business unified semantic space, and determine the prototype vector of each atomic event type and entity type in that space.

[0022] Specifically, a predefined set of atomic event types and a predefined set of entity types that co-occur across multiple business systems serve as unified semantic anchors. For each business system, its business rule documents and business process documents are obtained. For the production system, for example, these are the "Equipment Maintenance Procedures," "Fault Handling Procedures," and "Production Anomaly Management Specifications." Natural language processing technology is used to extract the system's own local semantic concept system. Specifically, this includes: preprocessing the documents, including removing irrelevant characters, paragraph segmentation, and sentence segmentation; using a named entity recognition model to identify business objects in the documents, such as "equipment," "product," "customer," and "work order"; using an event extraction model to identify business events in the documents, such as "equipment failure," "customer complaint," and "cost overrun"; and using a relation extraction model to identify the semantic relationships between entities and events, such as the relationship between "equipment failure" and "customer complaint." The identified entities, events, and their relationships are then organized into a local semantic concept system. Each concept includes a concept name, a concept type, a concept description, and associated relationships. (In this embodiment, the atomic event types and entity types are defined in advance by domain experts based on the common characteristics of the business systems, to ensure that they cover the main business scenarios and have appropriate granularity.) The local semantic concept system of each business system is compared with the cross-business atomic event library and entity library to identify the different ways each business system represents the same atomic event or entity, and a representation difference mapping table is constructed.
For each local semantic concept of each business system, a pre-trained language model encodes the concept into a semantic vector; each atomic event type and each entity type is likewise encoded into a semantic vector. The cosine similarity between the local semantic concepts and the atomic event types and entity types is calculated to establish matching relationships. A similarity threshold of 0.7 is set, and matching relationships above the threshold are treated as candidates. "Many-to-one" relationships are identified, that is, cases where multiple local semantic concepts match the same atomic event type or entity type; for example, "equipment shutdown" and "equipment abnormality" in the production system and "fault reporting" in the after-sales system all match the atomic event "equipment failure." For each atomic event type, the differences in how the business systems describe the event are analyzed, including differences in terminology, granularity, and perspective. Based on the matching results, a preliminary representation difference mapping table is constructed to record the mapping between each atomic event type and the local semantic concepts of each business system. For each business system and each atomic event type, labeled samples are collected and input into the base matching network to obtain the distribution of base matching scores, from which statistical features such as mean, variance, skewness, and kurtosis are extracted. The business rule documents of the business system are semantically parsed to extract the rule text fragments related to the atomic event type, and these fragments are input into a pre-trained language model to obtain the semantic vector of the business rules. A correction mapping function is constructed that takes the distribution statistical features and the rule semantic vector as input and calculates the understanding correction amount.
The correction mapping function is either a learnable neural network or a weighted fusion method, and is determined by optimization on a validation set. The calculated understanding correction amount is stored in the representation difference mapping table and associated with the corresponding business system and atomic event type, forming the complete representation difference mapping table. During system operation, samples with matching scores higher than the preset matching threshold are pushed to the manual review queue and manual review scores are collected; a correction effect evaluation function is constructed to evaluate the effect of the current correction amount; and when the accumulated samples reach the preset sample number threshold, the understanding correction amount is updated according to the average correction effect and gradient information, and the corresponding entries in the mapping table are updated. The representation difference mapping table is shown in Table 1:

| Business system ID | Atomic event type ID | Atomic event type name | Understanding correction amount | Confidence | Last updated | Local expression examples |
|---|---|---|---|---|---|---|
| PROD-01 | EVT-001 | Equipment failure | +0.12 | 0.95 | 2024-03-15 | Equipment downtime, equipment malfunction, production interruption |
| PROD-01 | EVT-003 | Cost expenditure | -0.05 | 0.88 | 2024-03-15 | Material consumption, energy expenditure, labor costs |
| AF-02 | EVT-001 | Equipment failure | -0.08 | 0.92 | 2024-03-15 | Fault reporting, machine malfunction, equipment problems |
| AF-02 | EVT-002 | Customer complaints | +0.15 | 0.96 | 2024-03-14 | User complaints, customer feedback, after-sales opinions |
| FIN-03 | EVT-003 | Cost expenditure | +0.20 | 0.97 | 2024-03-11 | Expenses, costs incurred, cash outflows |
| PROD-01 | EVT-004 | Repair completed | -0.03 | 0.85 | 2024-03-13 | Maintenance completed, equipment restored, fault resolved |
| AF-02 | EVT-004 | Repair completed | +0.07 | 0.90 | 2024-03-15 | Repair completed, customer confirmed, work order closed |
Table 1

The system constructs a semantic decoupling-recoupling encoder network, which includes a modality-specific encoding subnet, a business-specific semantic decoupling subnet, and a cross-business semantic recoupling subnet. The modality-specific encoding subnet is composed of an image encoder, a text encoder, and a temporal encoder connected in parallel. The image encoder uses a ResNet-50 structure to encode the input image into a 2048-dimensional image feature vector; the text encoder uses a BERT structure to encode the input text into a 768-dimensional text feature vector; and the temporal encoder uses a bidirectional LSTM structure to encode the input time-series data into a 512-dimensional temporal feature vector. The business-specific semantic decoupling subnet includes a feature fusion layer and a feature separation layer. The feature fusion layer concatenates the image, text, and temporal feature vectors, inputs them to a fully connected layer for dimensionality reduction and fusion, and outputs multimodal fusion features. The feature separation layer includes a main branch and N business-specific branches, where N equals the total number of business systems. The main branch extracts cross-business shared features from the fusion features. Each business-specific branch includes a gating unit, which dynamically fuses the shared features and the original features according to the business identifier and outputs the expression style features of the business system and the cross-business common semantic content features. The cross-business semantic recoupling subnet includes a basis vector mapper, which maps the cross-business common semantic content features to the cross-business unified semantic space and outputs semantic membership vectors.
The first loss function is constructed from the feature distance, in the target semantic space, between samples of the same atomic event type or entity type, and is used to pull samples of the same type together. Its terms are: K, the total number of atomic event types and entity types; the number of samples belonging to the k-th class; the semantic membership vector of the i-th sample belonging to the k-th class; and the prototype vector of the k-th class. The second loss function is constructed from the feature distance, in the target semantic space, between samples of different atomic event types or entity types, and is used to push heterogeneous samples apart. Its additional terms are the prototype vector of the j-th class and the preset inter-class distance threshold m, which is set to m = 1.0 in this embodiment. The third loss function is constructed from the difference between the second features of samples of the same atomic event type or entity type, namely the business-specific expression style features of two different samples belonging to the same k-th class, and is used to constrain the expression style features of samples of the same class to be as consistent as possible. The total loss function is the weighted sum of the three loss functions, $L = \lambda_1 L_1 + \lambda_2 L_2 + \lambda_3 L_3$, where the weighting coefficients $\lambda_1$, $\lambda_2$, and $\lambda_3$ are obtained through the analytic hierarchy process (AHP); in this embodiment they are 0.5, 0.3, and 0.2, respectively. A training dataset is prepared containing multimodal data from multiple business systems together with the corresponding atomic event type and entity type labels. The training parameters are set as follows: batch size 64, initial learning rate 0.001, Adam optimizer, and 100 training epochs. Forward propagation: the training samples are input into the network and pass in turn through the modality-specific encoding subnet, the business-specific semantic decoupling subnet, and the cross-business semantic recoupling subnet.
Output semantic membership vectors and business-specific expression style features. Calculate the loss using the constructed loss function, calculate the gradient, and update the network parameters. After each training epoch, evaluate the model performance on the validation set, including classification accuracy and inter-class distance. After training, save the optimal model parameters. After training, the output space of the cross-business semantic recoupling subnet is defined as the cross-business unified semantic space. For each atomic event type, samples belonging to that category from all training samples are collected and input into the trained semantic decoupling-recoupling encoder network to obtain the corresponding semantic membership vector. The mean of the semantic membership vectors of all samples in each category is calculated as the prototype vector of that category. Store the prototype vectors of all categories into the prototype vector library for subsequent semantic representation generation and matching calculation.
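The exact loss formulas are not reproduced in this text, so the following is only a minimal sketch of one plausible reading: a squared-distance pull toward each class prototype, a hinge push between prototypes with margin m = 1.0, and a pairwise style-consistency term, combined with the weights 0.5, 0.3, and 0.2 given in this embodiment. All function names, and the specific distance forms, are assumptions made for illustration; only the class-mean prototype, the margin, and the weights come from the description.

```python
import math

def mean_vector(vectors):
    """Element-wise mean of equally sized vectors (used as the class prototype)."""
    count = len(vectors)
    return [sum(v[d] for v in vectors) / count for d in range(len(vectors[0]))]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def class_prototypes(samples_by_class):
    """Prototype of each class = mean of its semantic membership vectors."""
    return {label: mean_vector(vecs) for label, vecs in samples_by_class.items()}

def pull_loss(samples_by_class, prototypes):
    """Assumed first loss: mean squared distance of samples to their own prototype."""
    per_class = [
        sum(squared_distance(z, prototypes[label]) for z in vecs) / len(vecs)
        for label, vecs in samples_by_class.items()
    ]
    return sum(per_class) / len(per_class)

def push_loss(prototypes, margin=1.0):
    """Assumed second loss: hinge penalty when two prototypes are closer than the margin."""
    labels = sorted(prototypes)
    penalties = [
        max(0.0, margin - math.sqrt(squared_distance(prototypes[a], prototypes[b])))
        for i, a in enumerate(labels) for b in labels[i + 1:]
    ]
    return sum(penalties) / len(penalties) if penalties else 0.0

def style_loss(styles_by_class):
    """Assumed third loss: mean pairwise squared gap between same-class style features."""
    gaps = [
        squared_distance(a, b)
        for vecs in styles_by_class.values()
        for i, a in enumerate(vecs) for b in vecs[i + 1:]
    ]
    return sum(gaps) / len(gaps) if gaps else 0.0

def total_loss(l1, l2, l3, weights=(0.5, 0.3, 0.2)):
    """Weighted sum with the AHP weights stated in this embodiment."""
    return weights[0] * l1 + weights[1] * l2 + weights[2] * l3

# Toy data (hypothetical): two classes, two samples each, 2-dimensional vectors.
samples = {"EVT-001": [[1.0, 0.0], [0.8, 0.2]], "EVT-002": [[0.0, 1.0], [0.2, 0.8]]}
styles = {"EVT-001": [[0.5, 0.5], [0.7, 0.5]], "EVT-002": [[0.1, 0.1], [0.1, 0.1]]}
prototypes = class_prototypes(samples)
loss = total_loss(pull_loss(samples, prototypes), push_loss(prototypes), style_loss(styles))
```

In this toy case the two prototypes are already farther apart than the margin, so only the pull and style terms contribute to the total.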

[0023] In one embodiment of the present invention, step S3, the generation process of the first semantic representation vector set and the second semantic representation vector set, includes the following steps: the first multimodal data is input into the modality-specific encoding subnet, which outputs the multimodal fusion features; the business identifier B of the business system to which the first multimodal data belongs and the representation difference mapping table are obtained; a perspective-aware matcher is constructed, which includes one base matching network and K perspective correction networks, where K is the total number of atomic event types and entity types; the multimodal fusion features are input into the base matching network, which outputs a base matching vector whose i-th element is the base matching score between the first multimodal data and the i-th atomic event type or entity type; the perspective correction parameter set corresponding to business identifier B is queried from the representation difference mapping table, where the i-th parameter is the understanding correction amount of business system B for the i-th atomic event type or entity type; for each atomic event type or entity type, the base matching score and the perspective correction parameter are input into the corresponding perspective correction network to calculate the corrected matching score, where the two weight coefficients of the i-th perspective correction network are determined through model training; and all K corrected matching scores are combined into a vector, which serves as the semantic membership vector corresponding to the first multimodal data. The second multimodal data is input into the modality-specific encoding subnet and the perspective-aware matcher in the same way as the first multimodal data, and the corresponding semantic membership vectors are output as the second semantic representation vector set.

[0024] Specifically, the first multimodal data of the first business system is acquired, including image data, text data, and time-series data. The first multimodal data is input into the modality-specific encoding subnet and each modality is encoded separately. In this embodiment, ResNet-50 is used as the image encoder, and the last fully connected layer is removed to obtain a 2048-dimensional image feature vector; a pre-trained BERT model is used as the text encoder, and the output at the [CLS] position is taken as the text feature to obtain a 768-dimensional text feature vector; and a two-layer bidirectional LSTM is used as the temporal encoder, and the hidden state at the last time step is taken as the time-series feature to obtain a 512-dimensional time-series feature vector. The feature vectors of the three modalities are concatenated, and the concatenated features are input into a fully connected layer for dimensionality reduction to obtain 256-dimensional multimodal fusion features. In this embodiment, the first business system is a production system with business identifier B = "PROD_01"; the perspective correction parameter set corresponding to business identifier B is read from the pre-built representation difference mapping table. K parallel perspective correction networks are built, each corresponding to one atomic event type or entity type. Taking the i-th perspective correction network as an example, it is a single-layer fully connected network whose input is a two-dimensional vector formed from the base matching score and the understanding correction amount, and whose output is the corrected matching score. Its two weight coefficients are determined through model training; in this embodiment the first weight coefficient is 1.0 and the second is 0.5. The corrected matching score vector is used as the semantic membership vector corresponding to the first multimodal data.
This vector has the following characteristics: its i-th element indicates the degree to which the sample belongs to the i-th atomic event type or entity type; the elements do not necessarily sum to 1, so multiple labels are allowed (a sample may belong to several types at once); and the vector dimension K is fixed and equal to the total number of atomic event types and entity types. For the second multimodal data of the second business system (e.g., the after-sales system), exactly the same steps are performed. Repeating the above steps for all multimodal data of the first business system yields the first semantic representation vector set; likewise, repeating them for all multimodal data of the second business system yields the second semantic representation vector set. The generated semantic representation vector sets are stored in a vector database and indexed for subsequent fast retrieval. In this embodiment, the FAISS library is used to build an IVF index, which divides the vectors into 100 cluster centers, with each center associated with 256 vectors, to achieve approximate nearest neighbor retrieval.
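The perspective correction step can be sketched as follows. The exact correction formula is not reproduced in this text, so the sketch assumes a linear single-layer form, corrected = w_score × base score + w_correction × correction amount, clamped to [0, 1]; the clamping, the function names, and the example base scores are assumptions. The weights 1.0 and 0.5 and the correction amounts are taken from this embodiment and Table 1.

```python
# Understanding correction amounts keyed by (business system ID, type ID), from Table 1.
CORRECTION_TABLE = {
    ("PROD-01", "EVT-001"): 0.12,
    ("PROD-01", "EVT-003"): -0.05,
    ("PROD-01", "EVT-004"): -0.03,
    ("AF-02", "EVT-001"): -0.08,
    ("AF-02", "EVT-002"): 0.15,
    ("AF-02", "EVT-004"): 0.07,
    ("FIN-03", "EVT-003"): 0.20,
}

def corrected_score(base_score, correction, w_score=1.0, w_correction=0.5):
    """Assumed linear perspective correction, clamped to the range [0, 1]."""
    raw = w_score * base_score + w_correction * correction
    return min(1.0, max(0.0, raw))

def membership_vector(business_id, type_ids, base_scores, table=CORRECTION_TABLE):
    """Apply the per-type correction; types without a table entry are left uncorrected."""
    return [
        corrected_score(score, table.get((business_id, type_id), 0.0))
        for type_id, score in zip(type_ids, base_scores)
    ]

# Hypothetical base matching scores for one sample, viewed from two business systems.
type_ids = ["EVT-001", "EVT-002", "EVT-003"]
base_scores = [0.80, 0.30, 0.40]
production_view = membership_vector("PROD-01", type_ids, base_scores)
after_sales_view = membership_vector("AF-02", type_ids, base_scores)
```

With the same base scores, the production system's entry for "equipment failure" is raised while the after-sales system's entry is lowered, which is the perspective difference the mapping table is meant to capture.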

[0025] In one embodiment of the present invention, obtaining the business identifier of the business system to which the first multimodal data belongs and performing feature fusion includes the following steps: the image data in the first multimodal data is input into an image encoder based on a residual network, which outputs an image feature vector; the text data is input into a pre-trained language model encoder based on a bidirectional Transformer, which outputs a text feature vector; the time-series data is input into a bidirectional long short-term memory network encoder, which outputs a time-series feature vector; and the image feature vector, text feature vector, and time-series feature vector are concatenated and then passed through a fully connected layer for dimensionality reduction mapping, which outputs the multimodal fusion features.
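As a rough illustration of the fusion step, the sketch below concatenates a 2048-dimensional image vector, a 768-dimensional text vector, and a 512-dimensional time-series vector (3328 dimensions in total) and projects them to 256 dimensions with a single fully connected layer. The encoders themselves are omitted; the random weight initialization and the names `make_linear` and `fuse_features` are assumptions for illustration only.

```python
import math
import random

IMAGE_DIM, TEXT_DIM, TEMPORAL_DIM, FUSED_DIM = 2048, 768, 512, 256

def make_linear(in_dim, out_dim, seed=0):
    """Create a fully connected layer with small random weights (stand-in for trained ones)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(in_dim)
    weights = [[rng.uniform(-scale, scale) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weights, bias

def fuse_features(image_vec, text_vec, temporal_vec, weights, bias):
    """Concatenate the three modality vectors and apply the dimensionality-reducing layer."""
    concatenated = image_vec + text_vec + temporal_vec
    if len(concatenated) != len(weights[0]):
        raise ValueError("concatenated feature size does not match the layer input size")
    return [sum(w * x for w, x in zip(row, concatenated)) + b for row, b in zip(weights, bias)]

# Placeholder encoder outputs (hypothetical constant vectors of the stated sizes).
weights, bias = make_linear(IMAGE_DIM + TEXT_DIM + TEMPORAL_DIM, FUSED_DIM)
fused = fuse_features([0.1] * IMAGE_DIM, [0.2] * TEXT_DIM, [0.3] * TEMPORAL_DIM, weights, bias)
```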

[0026] In one embodiment of the present invention, the construction and training of the base matching network includes the following steps: construct a base matching network consisting of a cross-modal interaction layer and a semantic decoupling layer. The cross-modal interaction layer includes a multi-head attention mechanism that captures the interaction relationships between the different modal feature dimensions of the multimodal fusion features and outputs interaction-enhanced features. The semantic decoupling layer includes K feature selection subnetworks, each corresponding to one atomic event type or entity type and used to select, from the interaction-enhanced features, the feature dimensions related to that type. Each feature selection subnetwork outputs a selection mask with the same dimension as the interaction-enhanced features; the interaction-enhanced features and the selection mask are multiplied element-wise, and the result is fed into the fully connected layer of the subnetwork to obtain the i-th base matching score. The base matching network is trained using a training dataset with three training objectives: first, minimize the base matching loss; second, minimize the mask sparsity loss; third, minimize the mask diversity loss. In the base matching loss, the indicator variable takes the value 1 when the training sample belongs to the i-th atomic event type or entity type, and 0 otherwise. The mask sparsity loss is based on the L1 norm of each selection mask, and the mask diversity loss is based on the number of dimensions on which two different selection masks simultaneously take the value 1.
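The forward pass of one feature selection subnetwork and the three training objectives can be sketched as below. The mask-then-sigmoid scoring, the L1 sparsity term, and the overlap count follow the description; the use of binary cross-entropy for the base matching loss is an assumption (the text only defines the indicator variable), as are all function names and the toy values.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matching_score(features, mask, weights, bias):
    """Mask the interaction-enhanced features, then apply a fully connected layer and sigmoid."""
    selected = [f * m for f, m in zip(features, mask)]
    return sigmoid(sum(w * x for w, x in zip(weights, selected)) + bias)

def matching_loss(scores, labels, eps=1e-12):
    """Assumed base matching loss: mean binary cross-entropy over the K types."""
    terms = [
        y * math.log(s + eps) + (1 - y) * math.log(1 - s + eps)
        for s, y in zip(scores, labels)
    ]
    return -sum(terms) / len(terms)

def sparsity_loss(masks):
    """Sum of the L1 norms of all selection masks."""
    return sum(sum(abs(m) for m in mask) for mask in masks)

def diversity_loss(masks):
    """Number of dimensions on which two different masks are both 1, summed over mask pairs."""
    overlap = 0
    for i in range(len(masks)):
        for j in range(i + 1, len(masks)):
            overlap += sum(1 for a, b in zip(masks[i], masks[j]) if a == 1 and b == 1)
    return overlap

# Toy example (hypothetical values): 3 feature dimensions, K = 2 types.
features = [0.5, -1.0, 2.0]
masks = [[1, 0, 1], [0, 1, 1]]
score_first_type = matching_score(features, masks[0], weights=[1.0, 1.0, 0.5], bias=-0.5)
```

Here the first mask drops the second feature dimension, so only dimensions one and three contribute to the first type's score.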

[0027] Specifically, a base matching network is constructed to map multimodal fusion features to base matching vectors. The cross-modal interaction layer adopts a standard Transformer multi-head attention structure, with the multimodal fusion features as input and the interaction-enhanced features as output. In the semantic decoupling layer, each feature selection subnetwork generates a selection mask through learnable parameters. This embodiment uses the Gumbel-Softmax technique to generate differentiable binary masks, which preserves differentiability during training while outputting a near-binary mask. The interaction-enhanced features $H$ are multiplied element-wise by the selection mask $M_i$; this operation retains the feature dimensions related to the i-th class and masks out irrelevant ones. The filtered features are then input into a fully connected layer and activated by a sigmoid function to obtain the i-th base matching score, $s_i = \sigma(W_i (M_i \odot H) + b_i)$, where $\sigma$ is the sigmoid activation function, $W_i$ is the weight matrix, and $b_i$, the bias term of the fully connected layer in the i-th feature selection subnetwork, is a scalar. For i = 1 to K, these operations are performed in parallel to obtain K base matching scores, which are combined into the base matching vector. Multimodal data samples are collected from multiple business systems and given multiple labels by domain experts; each sample may belong to several atomic event types or entity types at once. The samples are divided into training, validation, and test sets. The total training loss is constructed from the losses of the first, second, and third objectives, $L = L_{match} + \lambda_1 L_{sparse} + \lambda_2 L_{div}$, where $\lambda_1$ and $\lambda_2$ are weighting coefficients; in this embodiment $\lambda_1$ is 0.01 and $\lambda_2$ is 0.005.
The sparsity weight is kept small because sparsity is only an auxiliary constraint and should not be strong enough to interfere with the main task. The diversity weight is smaller still because the diversity constraint only matters in multi-class scenarios, and enforcing non-overlapping features too strictly would discard useful features. The Adam optimizer is used with an initial learning rate of 0.001, a batch size of 64, and a training duration of 100 epochs. After each training epoch, classification accuracy, mask sparsity, and mask overlap are evaluated on the validation set. An early stopping mechanism is employed: training stops when the validation loss has not decreased for 10 consecutive epochs.
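The weighted training objective and the early stopping rule described above can be sketched as follows. The weights 0.01 and 0.005, the 100-epoch limit, and the patience of 10 epochs come from this embodiment; the function names and the example loss curve are hypothetical.

```python
def total_training_loss(match_loss, sparse_loss, div_loss, lam_sparse=0.01, lam_div=0.005):
    """Base matching loss plus the two lightly weighted auxiliary mask losses."""
    return match_loss + lam_sparse * sparse_loss + lam_div * div_loss

def early_stop_epoch(val_losses, patience=10, max_epochs=100):
    """Return the 1-based epoch at which training stops.

    Training stops once the validation loss has failed to improve on its best
    value for `patience` consecutive epochs, or when the epochs run out.
    """
    best = float("inf")
    stale_epochs = 0
    last_epoch = 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        last_epoch = epoch
        if loss < best:
            best = loss
            stale_epochs = 0
        else:
            stale_epochs += 1
        if stale_epochs >= patience:
            break
    return last_epoch

# Hypothetical validation curve: improves for 3 epochs, then plateaus.
val_curve = [1.0, 0.9, 0.8] + [0.85] * 20
stopped_at = early_stop_epoch(val_curve)
```

With this curve the best loss is reached at epoch 3, so the tenth non-improving epoch is epoch 13 and training stops there.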

[0028] In one embodiment of the present invention, determining and updating the understanding correction amount includes the following steps: collect the initial sample set of the business system labeled as the i-th atomic event type or entity type, calculate the initial understanding correction amount, and store it in the representation difference mapping table; read the understanding correction amount currently used by the business system from the representation difference mapping table, acquire newly added unlabeled data from the business system in real time, input the unlabeled data into the perspective-aware matcher to obtain the corrected matching score, and push samples whose matching score is greater than or equal to the preset matching threshold to the manual review queue of the business system; when the business system returns the manual review score for a pushed sample, record the sample's base matching score, corrected score, and manual review score; construct a correction effect evaluation function; when the accumulated number of samples reaches a preset threshold, calculate the average correction effect and the average gradient of the corrected score; update the understanding correction amount based on the average correction effect and the average gradient, using a preset update step size; and store the updated understanding correction amount in the representation difference mapping table, replacing the original value.

[0029] Specifically, for each business system and each atomic event type, the initial sample set already labeled as the i-th atomic event type in that business system is collected. The initial samples are input into the trained base matching network to obtain the base matching score of each sample, and the mean base matching score of the sample set is calculated. At the same time, the global mean base matching score of the i-th class samples across all business systems is calculated, and the initial understanding correction amount is computed from these two means. During system operation, the current understanding correction amount of business system B is read from the representation difference mapping table. The system acquires newly added unlabeled data from business system B in real time, inputs it into the perspective-aware matcher, and obtains the corrected matching score. Samples whose corrected matching score is greater than or equal to the preset matching threshold (obtained through statistical analysis of historical data) are pushed to the manual review queue of the business system to await scoring by reviewers. When the business system returns the manual review score for a pushed sample, the sample's base matching score, corrected score, and manual review score are recorded, and each piece of feedback is stored in a temporary buffer for further processing. The manual review score expresses the reviewer's confidence that the sample belongs to the i-th atomic event type or entity type. For each sample that has received a manual review score, a correction effect evaluation function is constructed. A positive value of this function indicates that the corrected score is closer to the manual review score than the base matching score is, so the correction is effective.
A negative value of the correction effect evaluation function indicates that the corrected score lies farther from the manual review score than the base matching score does, so the correction is ineffective. A value of zero indicates that the scores before and after correction are equally far from the manual review score. When the accumulated number of samples reaches a preset threshold (the minimum number of samples required to trigger a correction update, pre-set according to the historical data volume and update frequency requirements of the business system, with a value ranging from 50 to 500), the average correction effect and the average gradient of the corrected score are calculated. For the j-th sample, the quantities involved are its correction effect value, its corrected score, and its manual review score; the gradient term is the partial derivative, with respect to the understanding correction amount, of the absolute difference between the j-th sample's corrected score and its manual review score. The update step size is a preset hyperparameter that controls the adjustment range of the correction amount; it is set according to the sensitivity of the business system to changes in the correction amount, with a value ranging from 0.01 to 0.1. When the average correction effect is positive, the current correction is effective and should be maintained or slightly reinforced: the corresponding term in the update formula is negative, which causes the understanding correction amount to increase. When the average correction effect is negative, the current correction is ineffective and must be adjusted in the opposite direction: the corresponding term in the update formula is positive, which causes the understanding correction amount to decrease. When the average correction effect is zero, the scores before and after correction are equally far from the manual review score, the correction amount has no significant impact, and the update magnitude is determined only by the gradient term.

[0030] The updated understanding correction amount is stored in the representation difference mapping table, replacing the original value. As an example of the dynamic update phase in this embodiment, suppose that for the atomic event "equipment failure" in the production system the initial base matching score is 0.8 and the actual severity score obtained by manual review is 0.9. The correction effect of the single sample is calculated according to the evaluation function. Because the corrected score deviates from the human expectation, the system obtains a negative average gradient, which is substituted into the update formula with the update step size set to 0.05. In this way the system can automatically fine-tune the correction parameter for the production system's perspective within one manual review cycle (e.g., after accumulating 50 samples), so that the collaborative analysis model's weighting of the "equipment failure" event converges with the actual business logic.
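The feedback-driven update can be sketched as below. The evaluation and update formulas are not reproduced in this text, so three things here are assumptions chosen only to match the sign behaviour described: the effect function E = |base − manual| − |corrected − manual|, the linear correction with weight 0.5 (which makes the gradient of |corrected − manual| equal to ±0.5), and the update rule new = old − step × (mean gradient − mean effect). The corrected score 0.86 is a hypothetical value; the base score 0.8, manual score 0.9, step size 0.05, and 50-sample threshold are from this embodiment.

```python
def correction_effect(base, corrected, manual):
    """Positive when the corrected score is closer to the manual score than the base score."""
    return abs(base - manual) - abs(corrected - manual)

def sign(x):
    return (x > 0) - (x < 0)

def update_correction(current, feedback, step=0.05, w_correction=0.5, min_samples=50):
    """Update the understanding correction amount from buffered review feedback.

    `feedback` is a list of (base score, corrected score, manual review score) tuples.
    The amount is left unchanged until at least `min_samples` entries have accumulated.
    """
    if len(feedback) < min_samples:
        return current
    count = len(feedback)
    mean_effect = sum(correction_effect(b, c, m) for b, c, m in feedback) / count
    # d|corrected - manual| / d(correction) under the assumed linear correction.
    mean_gradient = sum(sign(c - m) * w_correction for _, c, m in feedback) / count
    return current - step * (mean_gradient - mean_effect)

# 50 identical feedback records for "equipment failure" in the production system.
feedback_buffer = [(0.8, 0.86, 0.9)] * 50
updated_amount = update_correction(0.12, feedback_buffer)
```

In this example the corrected score still falls short of the manual score, so the average gradient is negative and the correction amount grows from 0.12 toward the reviewers' judgment.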

[0031] In one embodiment of the present invention, step S4 includes the following steps: The collaborative analysis task is input into the task parser, which outputs preliminary analysis results, which include a set of candidate target event types and a set of candidate target entity types. The preliminary analysis results are sent to the first business system and the second business system respectively. Receive a first confirmation result returned by the first business system and a second confirmation result returned by the second business system. The first confirmation result and the second confirmation result respectively include the score of each business system’s degree of recognition of the candidate target event type and the candidate target entity type. The first confirmation result and the second confirmation result are merged, and the candidate target event types and candidate target entity types are reordered based on the score of the degree of recognition after fusion. The top N event types after reordering are selected as target event types, and the top M entity types after reordering are selected as target entity types.

[0032] Specifically, the system receives collaborative analysis tasks initiated by users, such as "analyze the impact of recent equipment failures on customer complaints," and inputs the task text into a task parser. The task parser consists of a pre-trained language model (e.g., BERT) and an intent classifier. The language model encodes the task text into a task semantic vector, and the intent classifier maps the task semantic vector to the atomic event type space and the entity type space, outputting the probability that each atomic event type is a target event type and the probability that each entity type is a target entity type. A probability threshold is set (0.3 in this embodiment, obtained by analyzing historical data); atomic event types with a probability greater than or equal to the threshold are selected as the candidate target event type set E_candidate, and entity types with a probability greater than or equal to the threshold are selected as the candidate target entity type set O_candidate. In this embodiment, the preliminary analysis results are as follows. Candidate target event type set: E_candidate = {"Equipment Failure" (0.85), "Customer Complaint" (0.62), "Repair Completed" (0.28), "Cost Expenditure" (0.15)}, where the values in parentheses are probabilities. Candidate target entity type set: O_candidate = {"Equipment" (0.78), "Customer" (0.65), "Work Order" (0.32), "Product" (0.12)}. The preliminary analysis results are sent to the first business system (production system) and the second business system (after-sales system), respectively, requesting each business system to confirm the candidate target event types and candidate target entity types based on local business knowledge.
The content sent includes: a list of candidate target event types with their preliminary probabilities, a list of candidate target entity types with their preliminary probabilities, and a confirmation request asking each business system to score its degree of recognition of each candidate type based on local business practice. The score ranges from 0 to 1, where 1 indicates that the type should be a target type and 0 indicates that it should not. The first confirmation result is received from the first business system (production system) and contains the production system's recognition score for each candidate type. Event type recognition: "Equipment failure" (0.95), "Customer complaint" (0.30), "Repair completed" (0.85), "Cost expenditure" (0.60). Entity type recognition: "Equipment" (0.98), "Customer" (0.25), "Work order" (0.90), "Product" (0.40). The second confirmation result is received from the second business system (after-sales system) and contains the after-sales system's recognition score for each candidate type. Event type recognition: "Equipment failure" (0.80), "Customer complaint" (0.95), "Repair completed" (0.40), "Cost expenditure" (0.20). Entity type recognition: "Equipment" (0.70), "Customer" (0.96), "Work order" (0.50), "Product" (0.30). The preliminary probabilities of the task parser are then fused with the recognition scores of the business systems. This embodiment adopts weighted fusion with the following weights: task parser 0.3, production system 0.35, after-sales system 0.35. For each candidate type, the three scores are weighted and summed to obtain a fusion score.
Based on the fusion score, the candidate types are reordered, and the number of target types to select is set: the top N = 2 event types and the top M = 2 entity types are selected as the final target event types and target entity types. This embodiment also includes the following special case handling mechanisms. Case 1: a business system does not return a confirmation result. If a business system does not return a confirmation result within a specified time (e.g., 24 hours), the historical average recognition score of that business system is used as the default value, or the weight of that business system is redistributed to the other confirmers. Case 2: the business systems seriously disagree. If the difference between the recognition scores of two business systems for the same candidate type exceeds a threshold, a disagreement handling mechanism is triggered, for example increasing the weight of the task parser, decreasing the weights of the business systems, or marking the type as pending and subject to manual arbitration.
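The weighted fusion and reordering can be worked through with the scores given in this embodiment. The sketch below also illustrates one of the Case 1 options, redistributing a missing system's weight proportionally among the remaining sources; proportional renormalization is an assumption about how the redistribution is done, and the function and key names are hypothetical.

```python
WEIGHTS = {"parser": 0.3, "production": 0.35, "after_sales": 0.35}

EVENT_SCORES = {
    "parser": {"Equipment failure": 0.85, "Customer complaint": 0.62,
               "Repair completed": 0.28, "Cost expenditure": 0.15},
    "production": {"Equipment failure": 0.95, "Customer complaint": 0.30,
                   "Repair completed": 0.85, "Cost expenditure": 0.60},
    "after_sales": {"Equipment failure": 0.80, "Customer complaint": 0.95,
                    "Repair completed": 0.40, "Cost expenditure": 0.20},
}

ENTITY_SCORES = {
    "parser": {"Equipment": 0.78, "Customer": 0.65, "Work order": 0.32, "Product": 0.12},
    "production": {"Equipment": 0.98, "Customer": 0.25, "Work order": 0.90, "Product": 0.40},
    "after_sales": {"Equipment": 0.70, "Customer": 0.96, "Work order": 0.50, "Product": 0.30},
}

def fuse_scores(scores_by_source, weights):
    """Weighted sum per candidate type; weights of absent sources are renormalized away."""
    active = {src: w for src, w in weights.items() if src in scores_by_source}
    weight_sum = sum(active.values())
    candidate_types = next(iter(scores_by_source.values())).keys()
    return {
        t: sum(w / weight_sum * scores_by_source[src][t] for src, w in active.items())
        for t in candidate_types
    }

def top_types(fused, count):
    """Candidate types ordered by fusion score, highest first, truncated to `count`."""
    ranked = sorted(fused.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:count]]

fused_events = fuse_scores(EVENT_SCORES, WEIGHTS)
target_events = top_types(fused_events, 2)
target_entities = top_types(fuse_scores(ENTITY_SCORES, WEIGHTS), 2)

# Case 1 illustration: the after-sales system did not respond in time.
partial = {src: s for src, s in EVENT_SCORES.items() if src != "after_sales"}
fallback_events = top_types(fuse_scores(partial, WEIGHTS), 2)
```

With all three sources, "Equipment failure" (0.8675) and "Customer complaint" (0.6235) rank first among events, and "Equipment" and "Customer" among entities. Without the after-sales scores, "Repair completed" overtakes "Customer complaint", which shows how much the result depends on each system's confirmation.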

[0033] In one embodiment of the present invention, step S5 includes the following steps: Convert the target event type and target entity type into a target indication vector; A hierarchical retrieval index is constructed, which is organized according to the category hierarchy of atomic event type and entity type. The first layer is a coarse-grained category index, and the second layer is a fine-grained atomic event type and entity type index. Based on the category hierarchy of the target event type and the target entity type, after locating the corresponding category cluster in the coarse-grained category index, the semantic membership vector most similar to the target indicator vector is retrieved in the fine-grained index corresponding to the category cluster. The first candidate vector set is obtained by retrieving from the first semantic representation vector set, and the second candidate vector set is obtained by retrieving from the second semantic representation vector set; Calculate the similarity between each vector in the first candidate vector set and the target indicator vector, and select the top K vectors with the highest similarity as the first associated vector set; The same process is applied to the second candidate vector set to obtain the second associated vector set.

[0034] Specifically, the determined target event type and target entity type are converted into a target indicator vector $Q$: for the $i$-th atomic event type or entity type, if the type belongs to the target set, then $Q_i = 1$; otherwise $Q_i = 0$. A hierarchical retrieval index is pre-constructed for the first semantic representation vector set and the second semantic representation vector set. Atomic event types and entity types are divided into several coarse-grained category clusters according to business semantics, and each coarse-grained cluster contains multiple semantically similar fine-grained types. The first layer is the coarse-grained cluster index, and the second layer is the vector index within each fine-grained type. In this embodiment, 20 fine-grained types are divided into 5 coarse-grained clusters, such as an equipment-related cluster, a customer-related cluster, and a cost-related cluster. The dimensions of the target indicator vector $Q$ with a value of 1 determine the coarse-grained clusters to which the target fine-grained types belong. In the fine-grained indexes of those clusters (in this example, the equipment-related cluster and the customer-related cluster), the semantic representation vectors most similar to the target indicator vector $Q$ are retrieved, and each cluster returns its N most similar semantic representation vectors. In this embodiment, N=200, and the search results of the two clusters are merged to obtain the first candidate vector set Candidate1, which contains 400 vectors. The same hierarchical search operation is performed on the second semantic representation vector set to obtain the second candidate vector set Candidate2. The cosine similarity between each vector in the first candidate vector set Candidate1 and the target indicator vector $Q$ is calculated, the vectors are sorted by similarity from high to low, and the top K vectors with the highest similarity are selected as the first associated vector set R1. The same operation is performed on the second candidate vector set Candidate2 to obtain the second associated vector set R2. The first and second associated vector sets are used as outputs and passed to step S6 for collaborative analysis. The hierarchical retrieval mechanism in this embodiment limits the retrieval scope to semantically related category clusters through coarse-grained cluster filtering, avoiding retrieval over the entire vector space. This improves retrieval efficiency while ensuring the semantic relevance of the retrieval results.
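
The two-layer retrieval followed by cosine re-ranking can be sketched as below. This is a simplified illustration under stated assumptions, not the embodiment's index: each coarse-grained cluster is represented as a flat list of vectors scanned exhaustively, and the names `hierarchical_retrieve`, `type_to_cluster`, `cluster_index`, and `per_cluster` are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hierarchical_retrieve(
    query: Vector,                          # target indicator vector Q (0/1 entries)
    type_to_cluster: Dict[int, str],        # fine-grained type index -> cluster name
    cluster_index: Dict[str, List[Vector]], # cluster name -> stored vectors
    per_cluster: int,                       # vectors returned per cluster (200 in the text)
    top_k: int,                             # size of the associated vector set
) -> List[Vector]:
    # Layer 1: the dimensions of Q equal to 1 select the coarse-grained clusters.
    clusters = sorted({type_to_cluster[i] for i, q in enumerate(query) if q == 1})
    # Layer 2: keep the most similar vectors within each selected cluster.
    candidates: List[Vector] = []
    for name in clusters:
        ranked = sorted(cluster_index[name], key=lambda v: cosine(v, query), reverse=True)
        candidates.extend(ranked[:per_cluster])
    # Re-rank the merged candidate set and keep the top K.
    candidates.sort(key=lambda v: cosine(v, query), reverse=True)
    return candidates[:top_k]
```

Running the function once per semantic representation vector set would yield R1 and R2. Clusters not selected in the first layer are never scanned, which is where the efficiency gain described above comes from.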

[0035] In one embodiment of the present invention, step S6 includes the following steps: Use each semantic membership vector in the first and second association vector sets as a node to construct the initial graph structure; The business logic relationships between atomic event types and entity types are obtained from the business knowledge graph. Edges are added between related nodes according to the business logic relationships to obtain a cross-business semantic association graph. Identify the critical path in the cross-business semantic association graph, where the critical path is the shortest path connecting the first business system node and the second business system node; Nodes on the critical path are marked as high-weight nodes, and the remaining nodes are marked as ordinary-weight nodes. The cross-business semantic association graph is input into a graph neural network, which contains multiple graph convolutional layers. During graph convolution, the feature update frequency of high-weight nodes is higher than that of ordinary-weight nodes. After multi-layer graph convolution, the final feature representation of all nodes is obtained. The feature representation of high-weight nodes and the feature representation of ordinary-weight nodes are treated differently and then fused to obtain the global feature representation. The global feature representation is input into the classifier, and the cross-business collaborative analysis results are output.
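
A critical path of this kind can be found with a multi-source breadth-first search, assuming the association graph is unweighted and undirected. The sketch below is illustrative only; the function name `critical_path` and the node naming scheme are hypothetical, and it returns a single shortest path even if several exist.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Set


def critical_path(
    adjacency: Dict[str, List[str]],
    sources: Set[str],   # nodes of the first business system
    targets: Set[str],   # nodes of the second business system
) -> Optional[List[str]]:
    """Return one shortest path from any source to any target, or None."""
    # All sources start at distance 0, so the first target reached is closest.
    parent: Dict[str, Optional[str]] = {s: None for s in sources}
    queue: Deque[str] = deque(sorted(sources))
    while queue:
        node = queue.popleft()
        if node in targets:
            # Walk the parent links back to the source to rebuild the path.
            path: List[str] = []
            current: Optional[str] = node
            while current is not None:
                path.append(current)
                current = parent[current]
            return path[::-1]
        for neighbour in adjacency.get(node, []):
            if neighbour not in parent:
                parent[neighbour] = node
                queue.append(neighbour)
    return None
```

The nodes of the returned path would then be marked as high-weight nodes, for example with `high = set(path)`, and all other nodes as ordinary-weight nodes.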

[0036] Specifically, each semantic membership vector in the first and second association vector sets is used as a graph node. Each node carries the following information: node feature (the semantic membership vector), node attribute (the business system to which it belongs), and node label (the corresponding atomic event type or entity type). An initial graph structure $G_0 = (V, E)$ is then constructed, where $V$ is the set of nodes and $E$ is the set of edges, which is initially empty. The business logic relationship between each atomic event type and entity type is obtained from the pre-built business knowledge graph. In this embodiment, the business knowledge graph is constructed as follows: using a pre-trained relation extraction model, triplet data is extracted from the "Equipment Maintenance Procedures" of the production system, and the edge associations of the graph structure are initialized accordingly. The business knowledge graph contains the following types of relationships: causal relationships ("equipment failure" leads to "customer complaint"), association relationships ("equipment failure" is associated with "equipment"), temporal relationships ("repair completed" occurs after "equipment failure"), and compositional relationships ("product" contains "equipment"). For each pair of nodes $v_i$ and $v_j$ in the graph, it is queried whether the corresponding atomic event types or entity types have a direct relationship in the business knowledge graph; if so, an edge whose type corresponds to that business logic relationship is added between $v_i$ and $v_j$, resulting in the cross-business semantic association graph $G$. In the cross-business semantic association graph $G$, the critical path connecting the first business system node and the second business system node is identified. The critical path is defined using the following terms: the starting point is any first business system node; the end point is any second business system node; the path length is the number of edges traversed by the path; and the critical path is one or more of the shortest paths among all paths connecting a first business system node and a second business system node. In this embodiment, a breadth-first search algorithm is used to calculate the shortest paths from all starting points to the end points. The cross-business semantic association graph $G$ is input into a graph neural network containing three graph convolutional layers, and the standard graph convolutional network (GCN) is used for node feature updates. During the graph convolution process, high-weight nodes participate in feature updates in every layer, while ordinary-weight nodes participate in feature updates only in the first and third layers; their features remain unchanged in the second layer. After three layers of graph convolution, the final feature representation of all nodes is obtained. The feature representations of high-weight nodes and ordinary-weight nodes are fused by weighting to obtain the global feature representation, which combines the mean of the features of the high-weight nodes and the mean of the features of the ordinary-weight nodes. The global feature representation is input into a classifier consisting of a fully connected layer and a softmax layer, and the output is the cross-business collaborative analysis result. In this embodiment, the output is the probability of three categories, "no impact", "slight impact" and "significant impact", and the final judgment is "significant impact". The output also includes interpretable information such as critical path visualization and node importance ranking to help business personnel understand the basis of the analysis results.
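
The layer schedule and the fusion step can be sketched in plain Python as follows. This is a rough illustration, not the embodiment's network: the weight matrices are assumed square so that frozen nodes keep a compatible dimension, the fusion form `alpha * mean_high + (1 - alpha) * mean_ordinary` and the value of `alpha` are assumptions (the text only states that the two means are fused by weighting), and the classifier is omitted.

```python
import math
from typing import Dict, List, Set

Features = Dict[str, List[float]]
Matrix = List[List[float]]


def gcn_layer(h: Features, adj: Dict[str, Set[str]], weight: Matrix,
              active: Set[str]) -> Features:
    """One GCN-style layer; nodes outside `active` keep their features."""
    degree = {v: len(adj[v]) + 1 for v in h}  # +1 accounts for the self-loop
    out: Features = {}
    for v in h:
        if v not in active:
            out[v] = list(h[v])
            continue
        # Symmetrically normalised sum over the node and its neighbours.
        agg = [0.0] * len(h[v])
        for u in adj[v] | {v}:
            scale = 1.0 / math.sqrt(degree[v] * degree[u])
            agg = [a + scale * x for a, x in zip(agg, h[u])]
        # Linear map followed by ReLU.
        out[v] = [max(0.0, sum(a * w for a, w in zip(agg, column)))
                  for column in zip(*weight)]
    return out


def mean_features(h: Features, nodes: Set[str]) -> List[float]:
    """Element-wise mean of the feature vectors of `nodes` (must be non-empty)."""
    return [sum(column) / len(nodes) for column in zip(*(h[v] for v in nodes))]


def global_feature(h: Features, adj: Dict[str, Set[str]], weights: List[Matrix],
                   high: Set[str], alpha: float = 0.7) -> List[float]:
    everyone = set(h)
    # Layers 1 and 3 update all nodes; layer 2 updates only high-weight nodes.
    schedule = [everyone, high, everyone]
    for weight, active in zip(weights, schedule):
        h = gcn_layer(h, adj, weight, active)
    h_high = mean_features(h, high)
    h_ord = mean_features(h, everyone - high)
    # Assumed fusion: convex combination of the two means.
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(h_high, h_ord)]
```

In a full pipeline the returned vector would be passed to the fully connected and softmax layers to obtain the three class probabilities.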

[0037] As shown in Figure 2, the present invention also provides a cross-business data collaborative analysis system based on multimodal AI, comprising the following modules: Data acquisition module: used to acquire the first multimodal data of the first business system and the second multimodal data of the second business system; Semantic space construction module: used to construct a cross-business unified semantic space, which includes a predefined set of atomic event types and a set of entity types, and to assign a prototype vector to each atomic event type and entity type; Semantic representation generation module: used to input the first multimodal data into the base matching network composed of a cross-modal interaction layer and a semantic decoupling layer to obtain the base matching vector, obtain the business identifier of the business system to which the first multimodal data belongs, query the perspective correction parameter set corresponding to the business identifier from the pre-built representation difference mapping table, and input the base matching vector and the perspective correction parameter set into the perspective correction network to obtain the corrected matching score vector, which is used as the first semantic representation vector corresponding to the first multimodal data; the same operation is performed on the second multimodal data to obtain the second semantic representation vector; Task parsing module: used to receive collaborative analysis tasks and parse them to obtain the target event type and target entity type; Association retrieval module: used to retrieve association vectors from the first semantic representation vector set and the second semantic representation vector set respectively according to the target event type and the target entity type, so as to obtain the first association vector set and the second association vector set; Collaborative analysis module: used to construct a cross-business semantic association graph with the semantic membership vectors in the first and second association vector sets as nodes, identify the shortest path connecting the nodes of the first and second business systems in the association graph as the critical path, mark the nodes on the critical path as high-weight nodes, input the association graph into a graph neural network containing multiple graph convolutional layers, and, after differentiated feature processing and fusion of high-weight nodes and ordinary-weight nodes, output the cross-business collaborative analysis results.

[0038] The above embodiments are only used to illustrate the technical methods of the present invention and are not intended to limit it. Although the present invention has been described in detail with reference to preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions can be made to the technical methods of the present invention without departing from the spirit and scope of the technical methods of the present invention.

Claims

1. A cross-business data collaborative analysis method based on multimodal AI, characterized in that it includes the following steps: S1: Obtain the first multimodal data from the first business system and the second multimodal data from the second business system; S2: Construct a cross-business unified semantic space, which includes a predefined set of atomic event types and a set of entity types, and assign a prototype vector to each atomic event type and entity type; S3: Input the first multimodal data into the base matching network composed of a cross-modal interaction layer and a semantic decoupling layer to obtain the base matching vector, obtain the business identifier of the business system to which the first multimodal data belongs, query the perspective correction parameter set corresponding to the business identifier from the pre-built representation difference mapping table, and input the base matching vector and the perspective correction parameter set into the perspective correction network to obtain the corrected matching score vector, which is used as the first semantic representation vector corresponding to the first multimodal data; perform the same operation on the second multimodal data to obtain the second semantic representation vector; S4: Receive the collaborative analysis task, and parse the collaborative analysis task to obtain the target event type and target entity type; S5: Based on the target event type and target entity type, retrieve the associated vectors from the first semantic representation vector set and the second semantic representation vector set respectively to obtain the first associated vector set and the second associated vector set; S6: Construct a cross-business semantic association graph using the semantic membership vectors of the first association vector set and the second association vector set as nodes, identify the shortest path connecting the first business system node and the second business system node in the association graph as the critical path, and mark the nodes on the critical path as high-weight nodes; input the association graph into a graph neural network containing multiple graph convolutional layers, and output the cross-business collaborative analysis results after differentiated feature processing and fusion of high-weight nodes and ordinary-weight nodes.

2. The cross-business data collaborative analysis method based on multimodal AI according to claim 1, characterized in that, In step S2, constructing a unified semantic space across businesses includes the following steps: Construct a cross-business atomic event library and an entity library. The atomic event library contains the smallest granularity business event types that co-occur across multiple business systems, and the entity library contains business object types that co-occur across multiple business systems. For each business system, obtain the system-specific business rule documents and business process documents, and use natural language processing technology to extract the system's local semantic concept system; The local semantic concept system of each business system is compared and analyzed with the cross-business atomic event library and entity library to identify different ways of expressing the same atomic event or entity in each business system and to construct a representation difference mapping table. Construct a semantic decoupling-recoupling encoder network, the network comprising: a modality-specific coding subnet, a service-specific semantic decoupling subnet, and a cross-service semantic recoupling subnet; Construct a first loss function, a second loss function, and a third loss function, and train the semantic decoupling-recoupling encoder network using the training dataset; After training, the output space of the cross-business semantic recoupling subnet is determined as the cross-business unified semantic space, and the prototype vectors of each atomic event type and entity type in the cross-business unified semantic space are determined.

3. The cross-business data collaborative analysis method based on multimodal AI according to claim 1, characterized in that, in step S3, the generation process of the first semantic representation vector and the second semantic representation vector includes the following steps: the first multimodal data is input into a modality-specific coding subnet, and the multimodal fusion feature is output; the business identifier of the business system to which the first multimodal data belongs and the representation difference mapping table are obtained; a perspective-aware matcher is constructed, which includes a base matching network and $n$ perspective correction networks, where $n$ is the total number of atomic event types and entity types; the multimodal fusion feature is input into the base matching network, which outputs a base matching vector whose $i$-th component is the base matching score between the first multimodal data and the $i$-th atomic event type or entity type; the perspective correction parameter set corresponding to the business identifier is queried from the representation difference mapping table, where the $i$-th element of the set is the understanding correction amount of the business system for the $i$-th atomic event type or entity type; for each atomic event type or entity type, the base matching score and the corresponding perspective correction parameter are input into the corresponding perspective correction network to calculate the corrected matching score, where the weight coefficients of the $i$-th perspective correction network are determined through model training; all $n$ corrected matching scores are combined into a vector, which serves as the semantic membership vector corresponding to the first multimodal data; the second multimodal data is input into the modality-specific coding subnet and the perspective-aware matcher in the same way as the first multimodal data, and the corresponding semantic membership vector is output as the second semantic representation vector set.

4. The cross-business data collaborative analysis method based on multimodal AI according to claim 3, characterized in that the process of obtaining the business identifier of the business system to which the first multimodal data belongs and performing feature fusion includes the following steps: the image data in the first multimodal data is input into an image encoder based on a residual network, which outputs an image feature vector; the text data is input into a pre-trained language model encoder based on a bidirectional Transformer, which outputs a text feature vector; the time-series data is input into a bidirectional long short-term memory network encoder, which outputs a time-series feature vector; the image feature vector, text feature vector, and time-series feature vector are concatenated and then subjected to dimensionality reduction mapping through a fully connected layer to output the multimodal fusion feature.

5. The cross-business data collaborative analysis method based on multimodal AI according to claim 3, characterized in that the construction and training of the base matching network includes the following steps: construct a base matching network consisting of a cross-modal interaction layer and a semantic decoupling layer; the cross-modal interaction layer includes a multi-head attention mechanism for capturing the interaction relationships between the feature dimensions of different modalities in the multimodal fusion feature, and outputs an interaction-enhanced feature; the semantic decoupling layer includes $n$ feature selection subnetworks, each of which corresponds to one atomic event type or entity type and is used to select, from the interaction-enhanced feature, the feature dimensions related to that type; the $i$-th feature selection subnetwork outputs a selection mask $m_i$ of dimension $d$, where $d$ is the dimension of the interaction-enhanced feature; the selection mask $m_i$ and the interaction-enhanced feature are multiplied element-wise, and the result is fed into the fully connected layer of that subnetwork to obtain the $i$-th base matching score; the base matching network is trained using a training dataset, and the training objectives include: first objective: minimize the base matching loss function; second objective: minimize the mask sparsity loss; third objective: minimize the mask diversity loss; where the base matching loss uses an indicator variable $y_i$ whose value is 1 when the training sample belongs to the $i$-th atomic event type or entity type and 0 otherwise, the mask sparsity loss uses the L1 norm $\lVert m_i \rVert_1$ of the selection mask $m_i$, and the mask diversity loss uses the number of dimensions in which two selection masks $m_i$ and $m_j$ simultaneously have a value of 1.

6. The cross-business data collaborative analysis method based on multimodal AI according to claim 3, characterized in that the determination and updating of the understanding correction amount includes the following steps: collect an initial sample set that the business system has labeled as the $i$-th atomic event type or entity type, calculate the initial understanding correction amount, and store it in the representation difference mapping table; read the understanding correction amount currently used by the business system from the representation difference mapping table, acquire newly added unlabeled data from the business system in real time, input the unlabeled data into the perspective-aware matcher to obtain the corrected matching score, and push samples whose matching score is greater than or equal to a preset matching threshold to the manual review queue of the business system; when the business system returns the manual review score for a pushed sample, record the base matching score, the corrected score, and the manual review score of the sample; construct a correction effect evaluation function; when the accumulated number of samples reaches a preset threshold, calculate the average correction effect and the average gradient of the corrected score; update the understanding correction amount based on the average correction effect and the average gradient, using a preset update step size; store the updated understanding correction amount in the representation difference mapping table, replacing the original understanding correction amount.

7. The cross-business data collaborative analysis method based on multimodal AI according to claim 1, characterized in that, Step S4 includes the following steps: The collaborative analysis task is input into the task parser, which outputs preliminary analysis results, which include a set of candidate target event types and a set of candidate target entity types. The preliminary analysis results are sent to the first business system and the second business system respectively. Receive a first confirmation result returned by the first business system and a second confirmation result returned by the second business system. The first confirmation result and the second confirmation result respectively include the score of each business system’s degree of recognition of the candidate target event type and the candidate target entity type. The first confirmation result and the second confirmation result are merged, and the candidate target event types and candidate target entity types are reordered based on the score of the degree of recognition after fusion. The top N event types after reordering are selected as target event types, and the top M entity types after reordering are selected as target entity types.

8. The cross-business data collaborative analysis method based on multimodal AI according to claim 1, characterized in that, Step S5 includes the following steps: Convert the target event type and target entity type into a target indication vector; A hierarchical retrieval index is constructed, which is organized according to the category hierarchy of atomic event type and entity type. The first layer is a coarse-grained category index, and the second layer is a fine-grained atomic event type and entity type index. Based on the category hierarchy of the target event type and the target entity type, after locating the corresponding category cluster in the coarse-grained category index, the semantic membership vector most similar to the target indicator vector is retrieved in the fine-grained index corresponding to the category cluster. The first candidate vector set is obtained by retrieving from the first semantic representation vector set, and the second candidate vector set is obtained by retrieving from the second semantic representation vector set; Calculate the similarity between each vector in the first candidate vector set and the target indicator vector, and select the top K vectors with the highest similarity as the first associated vector set; The same process is applied to the second candidate vector set to obtain the second associated vector set.

9. The cross-business data collaborative analysis method based on multimodal AI according to claim 1, characterized in that step S6 includes the following steps: use each semantic membership vector in the first and second association vector sets as a node to construct the initial graph structure; obtain the business logic relationships between atomic event types and entity types from the business knowledge graph, and add edges between related nodes according to the business logic relationships to obtain a cross-business semantic association graph; identify the critical path in the cross-business semantic association graph, where the critical path is the shortest path connecting the first business system node and the second business system node; mark the nodes on the critical path as high-weight nodes and the remaining nodes as ordinary-weight nodes; input the cross-business semantic association graph into a graph neural network containing multiple graph convolutional layers, where, during graph convolution, the feature update frequency of high-weight nodes is higher than that of ordinary-weight nodes; after multi-layer graph convolution, obtain the final feature representation of all nodes; treat the feature representation of high-weight nodes and the feature representation of ordinary-weight nodes differently and fuse them to obtain the global feature representation; input the global feature representation into the classifier and output the cross-business collaborative analysis results.

10. A cross-business data collaborative analysis system based on multimodal AI, used to implement the cross-business data collaborative analysis method based on multimodal AI as described in any one of claims 1-9, characterized in that it includes the following modules: Data acquisition module: used to acquire the first multimodal data of the first business system and the second multimodal data of the second business system; Semantic space construction module: used to construct a cross-business unified semantic space, which includes a predefined set of atomic event types and a set of entity types, and to assign a prototype vector to each atomic event type and entity type; Semantic representation generation module: used to input the first multimodal data into the base matching network composed of a cross-modal interaction layer and a semantic decoupling layer to obtain the base matching vector, obtain the business identifier of the business system to which the first multimodal data belongs, query the perspective correction parameter set corresponding to the business identifier from the pre-built representation difference mapping table, and input the base matching vector and the perspective correction parameter set into the perspective correction network to obtain the corrected matching score vector, which is used as the first semantic representation vector corresponding to the first multimodal data; the same operation is performed on the second multimodal data to obtain the second semantic representation vector; Task parsing module: used to receive collaborative analysis tasks and parse them to obtain the target event type and target entity type; Association retrieval module: used to retrieve association vectors from the first semantic representation vector set and the second semantic representation vector set respectively according to the target event type and the target entity type, so as to obtain the first association vector set and the second association vector set; Collaborative analysis module: used to construct a cross-business semantic association graph with the semantic membership vectors in the first and second association vector sets as nodes, identify the shortest path connecting the nodes of the first and second business systems in the association graph as the critical path, mark the nodes on the critical path as high-weight nodes, input the association graph into a graph neural network containing multiple graph convolutional layers, and, after differentiated feature processing and fusion of high-weight nodes and ordinary-weight nodes, output the cross-business collaborative analysis results.