Multi-source heterogeneous data fusion processing method and system
By constructing dynamic meta-paths over a graph database structure and classifying features by modality for targeted processing, the method addresses the under-use of correlation information when fusing multi-source heterogeneous data, achieves efficient data fusion and resource utilization, and improves the accuracy and efficiency of data analysis.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- SHENZHEN TIANDUN DATA TECHNOLOGY CO LTD
- Filing Date
- 2025-08-20
- Publication Date
- 2026-06-16
AI Technical Summary
Traditional methods lack a unified correlation framework for fusing multi-source heterogeneous data, which makes it difficult to capture the correlation information between data of different modalities. They also waste computational resources or process data inefficiently, and during feature fusion they ignore the weight differences between modal features, which degrades the data analysis results.
A dynamic meta-path is constructed that incorporates multimodal data nodes and heterogeneous nodes into a unified association framework based on a graph database structure, and the interaction data between nodes is monitored in real time so that the meta-path can be updated dynamically. Associated features are classified by modal attribute identifier and matched with the processing rules of the corresponding processing units to generate modal feature vectors, and the fusion weights are calculated from node association strength and processing-unit performance.
It effectively captures the correlation between multi-source heterogeneous data, improves the data fusion effect and processing efficiency, rationally allocates computing resources, and improves the accuracy and efficiency of data analysis.

Figure CN121009504B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of data processing technology, specifically to a method and system for multi-source heterogeneous data fusion processing.
Background Technology
[0002] In today's era of rapid information development, data has become a core resource driving social progress and economic development. With the widespread application of technologies such as the Internet, the Internet of Things, and social media, the speed and scale of data generation have exploded. Moreover, this data often exists in a multi-source heterogeneous form, including multiple modalities such as text, images, and audio, as well as different types of heterogeneous nodes such as users, products, and tags. How to effectively integrate these multi-source heterogeneous data and explore their potential value has become a major challenge facing current data processing technologies.
[0003] However, traditional methods have significant shortcomings in fusing multi-source heterogeneous data. First, they lack a unified correlation framework for multimodal data, which makes it difficult to capture and use the correlation information between modalities: when processing mixed text, image, and audio data, they handle each modality independently and ignore the inherent connections between them, which limits the effectiveness of data fusion. Second, they struggle to adjust computational resources dynamically to the modal characteristics and processing requirements of the data, so processing large-scale multi-source heterogeneous data often wastes computational resources or runs inefficiently. Third, in the feature fusion stage they often rely on simple weighted averaging or concatenation and ignore the weight differences between modal features, so the fused feature vector may fail to reflect the true characteristics of the original data, which affects subsequent analysis results.
Summary of the Invention
[0004] The purpose of this invention is to overcome the shortcomings of existing technologies by providing a method and system for multi-source heterogeneous data fusion processing. The invention collects multimodal data nodes (text, images, audio) and heterogeneous nodes (users, products, tags), performs attribute analysis and standardization, constructs meta-paths from the initial association relationships between nodes, and incorporates all nodes and meta-paths into a unified association framework based on a graph database structure. It monitors the interaction data between nodes in real time and updates the meta-paths dynamically, expanding and adjusting their structure and adding or deleting nodes and edges. During meta-path expansion it captures association features between nodes of different types and forms an association feature set with modal identifiers. In this way it captures the associations between different data sources, turns them into usable feature information, exploits the potential value of multi-source heterogeneous data, and solves the problem that traditional methods cannot effectively use multimodal association information.
[0005] To solve the above technical problems, the present invention provides the following technical solution. In one aspect, a method for multi-source heterogeneous data fusion processing comprises the following steps:
[0006] Constructing dynamic meta-paths: collect multimodal data nodes and heterogeneous nodes, perform attribute analysis and standardization, construct meta-paths, incorporate all nodes and meta-paths into a unified association framework based on a graph database structure, update the meta-paths dynamically by monitoring new interaction data between nodes in real time, and mine cross-node association features to form an association feature set with modal identifiers;
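To make the meta-path idea concrete, the following is a minimal in-memory sketch, assuming a simple adjacency-map stand-in for the graph database. The class `MetaPathGraph` and its methods are hypothetical names: a meta-path is treated as a sequence of node types, and its instances are the concrete node sequences whose types match it.

```python
from collections import defaultdict


class MetaPathGraph:
    """Toy stand-in for the graph database: typed vertices and weighted edges."""

    def __init__(self):
        self.node_type = {}              # node id -> type, e.g. "user", "text"
        self.adj = defaultdict(dict)     # node id -> {neighbor id: edge weight}

    def add_node(self, node_id, node_type):
        self.node_type[node_id] = node_type

    def add_edge(self, u, v, weight=1.0):
        # Undirected edge carrying the initial association weight.
        self.adj[u][v] = weight
        self.adj[v][u] = weight

    def instances(self, meta_path):
        """Enumerate node sequences whose types follow meta_path, without revisiting nodes."""
        paths = [[n] for n, t in self.node_type.items() if t == meta_path[0]]
        for wanted in meta_path[1:]:
            paths = [p + [nb] for p in paths for nb in self.adj[p[-1]]
                     if self.node_type.get(nb) == wanted and nb not in p]
        return paths


g = MetaPathGraph()
for node, kind in [("u1", "user"), ("q1", "text"), ("p1", "product"), ("p2", "product")]:
    g.add_node(node, kind)
g.add_edge("u1", "q1")   # user issued search text q1
g.add_edge("q1", "p1")   # search text led to product p1
g.add_edge("q1", "p2")   # ... and to product p2
print(g.instances(("user", "text", "product")))
```

Here the "user-search text-product" meta-path has two instances, one per product reached from the same search.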
[0007] Allocating computing power mapping channels: classify the associated features in the associated feature set by modal attribute identifier, and match each category of associated features with the processing rules of its corresponding processing unit;
[0008] Generating modal feature vectors: using the determined processing units and processing rules, process each category of associated features to generate a feature vector for each modality, and record the metadata of each modal feature vector and the performance data of the corresponding processing unit;
[0009] Determining the fusion weight allocation: extract the tightness of node connections from the initial association relationships between nodes, quantify it as the meta-path association strength, and normalize it; calculate a performance score from the performance data of the processing units; then calculate the fusion weight of each modal feature vector from the association strength and the performance score;
[0010] Weighted fusion of feature vectors: preprocess the modal feature vectors to unify them into the same dimensional space, weight the preprocessed vectors by their fusion weights to obtain a preliminary fused feature vector, and then post-process the preliminary fused feature vector (outlier correction, normalization, and redundant feature removal) to complete the data fusion.
[0011] Furthermore, in constructing the dynamic meta-paths, multimodal data nodes (text, images, audio) and heterogeneous nodes (users, products, tags) are collected, and attribute analysis and standardization are performed on them. Meta-paths are constructed from the initial relationships between nodes, with each node as a vertex and each initial relationship as an edge, and all nodes and meta-paths are incorporated into a unified association framework based on a graph database structure. The meta-paths are updated dynamically by monitoring the interaction data between nodes in real time, which includes expanding the meta-paths, adjusting their structure, and adding or deleting nodes and edges. During meta-path expansion, the association features between nodes of different types are captured, and the modal attributes of each node's associated data are identified through a preset modal recognition rule base. Each associated feature is assigned a unique modal identifier, forming an associated feature set with modal identifiers. The modal recognition rule base contains feature parameters and judgment logic for each modality: for text-type associated features, judgment rules on character encoding range and a word density threshold; for image-type associated features, recognition parameters for pixel matrix dimension range and color space features; and for audio-type associated features, judgment rules on sampling rate range and a waveform amplitude threshold.
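The modal recognition rule base can be sketched as a table of thresholds plus judgment functions. Everything below is an assumption for illustration, because no concrete values are given: the printable-character ratio (standing in for the character encoding range check), the word density, pixel dimension range, color spaces, sampling rate range, and amplitude limit are placeholders, and `recognize_modality` and `tag_features` are hypothetical names.

```python
from typing import Optional

# Hypothetical rule base: every threshold here is an illustrative assumption.
RULES = {
    "text": {"min_printable_ratio": 0.9, "min_word_density": 0.05},
    "image": {"dim_range": (16, 8192), "color_spaces": {"RGB", "YCbCr", "GRAY"}},
    "audio": {"rate_range": (8_000, 48_000), "max_amplitude": 1.0},
}


def recognize_modality(item: dict) -> Optional[str]:
    """Return "text", "image", "audio", or None when no rule matches."""
    if "text" in item:
        s, r = item["text"], RULES["text"]
        if not s:
            return None
        printable_ratio = sum(c.isprintable() for c in s) / len(s)  # character-range check
        word_density = len(s.split()) / len(s)                       # words per character
        if printable_ratio >= r["min_printable_ratio"] and word_density >= r["min_word_density"]:
            return "text"
    elif "pixels" in item:
        r = RULES["image"]
        pixels = item["pixels"]
        h = len(pixels)
        w = len(pixels[0]) if pixels else 0
        lo, hi = r["dim_range"]
        if lo <= h <= hi and lo <= w <= hi and item.get("color_space") in r["color_spaces"]:
            return "image"
    elif "sample_rate" in item:
        r = RULES["audio"]
        lo, hi = r["rate_range"]
        samples = item.get("samples", [])
        if (lo <= item["sample_rate"] <= hi and samples
                and max(abs(x) for x in samples) <= r["max_amplitude"]):
            return "audio"
    return None


def tag_features(items):
    """Attach a unique modal identifier such as "text-0" to every recognized item."""
    tagged = []
    for idx, item in enumerate(items):
        modality = recognize_modality(item)
        if modality is not None:
            tagged.append((f"{modality}-{idx}", modality))
    return tagged


text_item = {"text": "red running shoes size 42"}
image_item = {"pixels": [[0] * 16 for _ in range(16)], "color_space": "RGB"}
audio_item = {"sample_rate": 16_000, "samples": [0.1, -0.2, 0.05]}
print(tag_features([text_item, {"sample_rate": 1_000, "samples": [0.1]}, audio_item]))
```

Items that match no rule (here, audio sampled below the assumed 8 kHz floor) are left untagged rather than guessed.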
[0012] Furthermore, in allocating the computing power mapping channels, the associated features in the associated feature set are classified by modal attribute identifier into text, image, and audio categories, and each category is matched with the processing rules of its corresponding processing unit: text features with the word vector conversion rules of the CPU processing unit, image features with the convolutional feature extraction technique of the GPU processing unit, and audio features with the spectral feature mapping table of the audio processing unit.
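The category-to-unit matching amounts to a dispatch table. The sketch below assumes hypothetical unit names (`cpu`, `gpu`, `audio_unit`) and rule labels; it only groups tagged features into per-unit work queues and does not model the processing rules themselves.

```python
from collections import defaultdict

# Hypothetical routing table mirroring text -> CPU, image -> GPU, audio -> audio unit.
ROUTES = {
    "text": ("cpu", "word_vector_conversion"),
    "image": ("gpu", "convolutional_feature_extraction"),
    "audio": ("audio_unit", "spectral_feature_mapping"),
}


def allocate(tagged_features):
    """Group (feature_id, modality) pairs into per-unit queues tagged with the matched rule."""
    queues = defaultdict(list)
    for feature_id, modality in tagged_features:
        unit, rule = ROUTES[modality]
        queues[unit].append((feature_id, rule))
    return dict(queues)


queues = allocate([("f1", "text"), ("f2", "image"), ("f3", "text")])
print(queues)
```

An unknown modality raises `KeyError`, which keeps unroutable features from being silently dropped.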
[0013] Furthermore, in generating the modal feature vectors, each type of associated feature is processed with its assigned processing unit and rules: the CPU processing unit applies the word vector conversion rules to text features to generate fixed-dimension word vectors; the GPU processing unit applies convolutional feature extraction to image features to generate high-dimensional convolutional feature vectors; and the audio processing unit applies the spectral feature mapping table to audio features to generate spectral feature vectors of a preset dimension. The metadata of each modal feature vector and the performance data of the corresponding processing unit are recorded at the same time.
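Generating a vector while recording its metadata and the unit's performance data can be sketched as a timing wrapper. As an assumption, the hashing trick stands in for the unspecified word vector conversion rules; `hashed_word_vector` and `run_with_metadata` are hypothetical helpers, and the recorded fields are examples only.

```python
import time
import zlib


def hashed_word_vector(tokens, dim=8):
    """Fixed-dimension text vector via the hashing trick (stand-in for word vector rules)."""
    vec = [0.0] * dim
    for tok in tokens:
        # crc32 is deterministic across runs, unlike Python's salted str hash().
        vec[zlib.crc32(tok.encode("utf-8")) % dim] += 1.0
    return vec


def run_with_metadata(unit, modality, fn, *args):
    """Run a processing function and record vector metadata plus elapsed time on the unit."""
    start = time.perf_counter()
    vector = fn(*args)
    elapsed = time.perf_counter() - start
    meta = {"unit": unit, "modality": modality, "dim": len(vector), "elapsed_s": elapsed}
    return vector, meta


vec, meta = run_with_metadata("cpu", "text", hashed_word_vector, ["red", "shoes", "red"], 8)
print(meta["dim"], sum(vec))
```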
[0014] Furthermore, in determining the fusion weight allocation, the tightness of node connections is extracted from the initial association relationships between nodes, quantified as the meta-path association strength, and normalized. A performance score is calculated from the processing-unit performance data using the processing-unit performance scoring formula, and the performance scores are likewise normalized. The fusion weights of the modal feature vectors are then calculated from the normalized association strength and performance score using the fusion weight allocation formula.
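One way to quantify and normalize meta-path association strength is sketched below, under assumptions not specified in the text: the tightness of one meta-path instance is the mean of its edge weights, a meta-path's strength is the mean tightness over its instances, and normalization divides each modality's strength by the total so the values sum to 1.

```python
def path_tightness(edge_weights):
    """Mean edge weight along one meta-path instance (assumed tightness measure)."""
    return sum(edge_weights) / len(edge_weights)


def metapath_strength(instances):
    """Average tightness over all instances of one meta-path."""
    return sum(path_tightness(w) for w in instances) / len(instances)


def normalize_shares(values):
    """Divide by the total so normalized strengths sum to 1 across modalities."""
    total = sum(values.values())
    return {k: v / total for k, v in values.items()}


raw = {
    "text": metapath_strength([[2.0, 4.0], [3.0, 3.0]]),  # two instances, tightness 3.0 each
    "image": metapath_strength([[1.0, 1.0]]),
    "audio": metapath_strength([[2.0, 2.0]]),
}
strength = normalize_shares(raw)
print(strength)
```

Share-of-total normalization is chosen here over min-max scaling so that the weakest modality does not collapse to a strength of exactly zero.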
[0015] Furthermore, in determining the fusion weight allocation, the performance score of each processing unit is calculated with the processing-unit performance scoring formula from its performance data: the current performance utilization of the processing unit, its computed throughput, its historical average response time, and its current response time, together with a stability weight and a stability coefficient.
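Because the exact performance scoring formula is not given here, the following is a hypothetical scoring function built from the same inputs: current utilization, throughput, historical average and current response time, and a stability weight and coefficient. It rewards spare capacity, throughput, and responding faster than usual; its form and default values are assumptions, not the patent's formula.

```python
def performance_score(utilization, throughput, t_hist, t_now,
                      stab_weight=0.2, stab_coef=1.0):
    """HYPOTHETICAL processing-unit score (not the patent's formula).

    utilization: current load in [0, 1]; throughput: items per second;
    t_hist / t_now: historical average vs. current response time (same units).
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be in [0, 1]")
    headroom = 1.0 - utilization          # spare capacity
    speed_ratio = t_hist / t_now          # > 1 when the unit is faster than usual
    return (1.0 - stab_weight) * headroom * throughput * speed_ratio + stab_weight * stab_coef


fast_idle = performance_score(0.5, 100.0, 20.0, 10.0)   # half loaded, twice as fast as usual
slow_busy = performance_score(0.9, 100.0, 10.0, 20.0)   # nearly saturated, twice as slow
print(fast_idle, slow_busy)
```

The raw scores are on an arbitrary scale, so they would still need the normalization step described above before entering the fusion weights.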
[0016] Furthermore, in determining the fusion weight allocation, a comprehensive score for each modal feature vector is calculated from the association strength and the performance score using the fusion weight allocation formula. The quantities in the formula are: the fusion weight; the weighting coefficient for association strength; the total number of modalities; the normalized meta-path association strength; the normalized performance score; the normalized meta-path association strength and normalized performance score of the i-th modality; and the modality index i.
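Since the exact fusion weight allocation formula is not given here, the sketch below assumes one simple form consistent with the listed quantities: a convex mix of normalized association strength and normalized performance score, controlled by a weighting coefficient `alpha`, renormalized over all modalities so the weights sum to 1. `fusion_weights` is a hypothetical helper.

```python
def fusion_weights(strength, perf, alpha=0.6):
    """HYPOTHETICAL weights: alpha * strength + (1 - alpha) * perf, renormalized to sum to 1.

    strength, perf: {modality: normalized value}; alpha: association-strength coefficient.
    """
    raw = {m: alpha * strength[m] + (1.0 - alpha) * perf[m] for m in strength}
    total = sum(raw.values())
    return {m: v / total for m, v in raw.items()}


w = fusion_weights(
    strength={"text": 0.5, "image": 0.25, "audio": 0.25},
    perf={"text": 0.2, "image": 0.5, "audio": 0.3},
    alpha=0.5,
)
print(w)
```

With `alpha=0.5` the image modality ends up with the largest weight because its processing unit scores best, even though text has the strongest association.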
[0017] Furthermore, in the weighted fusion of feature vectors, the modal feature vectors are preprocessed and unified into the same dimensional space by feature concatenation. The feature vector of each modality is then weighted with its fusion weight according to the feature vector weighted fusion formula to obtain the preliminary fused feature vector. Finally, the preliminary fused feature vector undergoes outlier correction, feature normalization, and redundant feature removal, completing the data fusion and forming a fused feature system.
[0018] Furthermore, the feature vector weighted fusion formula is defined over the following quantities: the value of the j-th element of the fused feature vector; the fusion weight of the i-th modal feature vector; the j-th element of the i-th modal feature vector; a feature enhancement coefficient; the natural constant e; the total number of modalities; the modality index i; and the element index j.
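A simplified version of the weighted element-wise fusion is sketched below. It computes each fused element as the sum over modalities of fusion weight times element value and omits the feature enhancement term, whose exact form involving the natural constant e is not given here. Zero-padding to a common length is an assumed stand-in for the dimension-unification preprocessing; `pad_to` and `weighted_fusion` are hypothetical helpers.

```python
def pad_to(vec, dim):
    """Zero-pad (or truncate) a vector to a common dimension (assumed unification step)."""
    return (list(vec) + [0.0] * dim)[:dim]


def weighted_fusion(vectors, weights):
    """Fused element j = sum over modalities i of w_i * x_ij (enhancement term omitted)."""
    dim = max(len(v) for v in vectors.values())
    aligned = {m: pad_to(v, dim) for m, v in vectors.items()}
    return [sum(weights[m] * aligned[m][j] for m in aligned) for j in range(dim)]


fused = weighted_fusion(
    {"text": [1.0, 2.0], "image": [4.0, 0.0, 2.0], "audio": [2.0]},
    {"text": 0.5, "image": 0.25, "audio": 0.25},
)
print(fused)  # [2.0, 1.0, 0.5]
```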
[0019] In another aspect, a multi-source heterogeneous data fusion processing system comprises:
[0020] Dynamic meta-path construction module: collects multimodal data nodes and heterogeneous nodes, performs attribute analysis and standardization on them, constructs meta-paths from the initial association relationships between nodes, incorporates all nodes and meta-paths into a unified association framework based on a graph database structure, updates the meta-paths dynamically by monitoring new interaction data between nodes in real time, and mines cross-node association features to form an association feature set with modal identifiers;
[0021] Computing power mapping channel allocation module: classifies the associated features in the associated feature set by modal attribute identifier and matches each category with the processing rules of its corresponding processing unit;
[0022] Modal feature vector generation module: processes each category of associated features with the determined processing units and processing rules to generate a feature vector for each modality, and records the metadata of each modal feature vector and the performance data of the corresponding processing unit;
[0023] Fusion weight allocation determination module: extracts the tightness of node connections in the meta-paths from the initial association relationships between nodes, quantifies it as meta-path association strength and normalizes it, calculates a performance score from the processing-unit performance data, and then calculates the fusion weight of each modal feature vector from the association strength and the performance score;
[0024] Feature vector weighted fusion module: obtains the modal feature vectors and fusion weights, preprocesses the feature vectors to unify them into the same dimensional space, computes a weighted combination of the preprocessed vectors to obtain the preliminary fused feature vector, and then applies outlier correction, normalization, and redundant feature removal to the preliminary fused feature vector to complete the data fusion.
[0025] Compared with existing technologies, this multi-source heterogeneous data fusion processing method and system have the following advantages:
[0026] First, this invention collects multimodal data nodes (text, images, audio) and heterogeneous nodes (users, products, tags), performs attribute analysis and standardization, constructs meta-paths from the initial relationships between nodes, and incorporates all nodes and meta-paths into a unified association framework based on a graph database structure. It monitors the interaction data between nodes in real time and updates the meta-paths dynamically, expanding and adjusting their structure and adding or deleting nodes and edges, and it captures cross-type node association features during meta-path expansion to form a set of association features with modal identifiers. This captures the associations between different data sources, turns them into usable feature information, exploits the potential value of multi-source heterogeneous data, and solves the problem that traditional methods cannot effectively use multimodal association information.
[0027] Second, this invention classifies associated features by modal attribute identifier and matches each category with the processing rules of its corresponding processing unit, so that processing-unit computing power is allocated flexibly and used efficiently. It also extracts node connection tightness from the initial association relationships between nodes, quantifies it as meta-path association strength, and normalizes it; calculates performance scores from processing-unit performance data; and then calculates the fusion weights of the modal feature vectors from the association strength and performance scores. Because the fusion weights account for both data association and processing-unit performance, the invention avoids the wasted computing resources and low processing efficiency of traditional methods and improves both data fusion quality and processing efficiency.
[0028] Other advantages, objectives and features of the invention will be set forth in part in the description which follows, and in part will be apparent to those skilled in the art from the following examination or study, or may be learned from the practice of the invention.
Description of the Drawings
[0029] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the accompanying drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are merely some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without any creative effort.
[0030] Figure 1 is a flowchart of the multi-source heterogeneous data fusion processing method;
[0031] Figure 2 is a framework diagram of the multi-source heterogeneous data fusion processing system.
Detailed Implementation
[0032] To further illustrate the technical means and effects by which the present invention achieves its intended purpose, the specific implementations, structures, features, and effects of the invention are described in detail below with reference to the accompanying drawings and preferred embodiments.
Example 1:
[0033] Constructing dynamic meta-paths: In a user behavior analysis scenario on an e-commerce platform, multimodal data nodes such as user search text (text nodes), product images (image nodes), and customer service call recordings (audio nodes) are collected, along with heterogeneous nodes such as user accounts (user nodes), product IDs (product nodes), and category tags (tag nodes). Keywords are extracted from text nodes and their encoding format is standardized, image nodes are standardized to a common resolution, and audio nodes are standardized to a common sampling rate. Meta-paths are constructed from initial associations such as "user-search text-product" and "user-customer service call-product", with each node as a vertex and each initial association as an edge, and all nodes and meta-paths are incorporated into a unified association framework based on a graph database structure. By monitoring user interaction data such as clicks on product images and modifications to product tags in real time, the meta-paths are expanded (e.g., by adding a "user-product image-tag" path) and edge weights are adjusted (e.g., by increasing the edge weight between a user and frequently clicked products). Association features are then identified through the modal recognition rule base: text features (e.g., search keywords) by character encoding range and word density threshold, image features (e.g., product images) by pixel matrix dimension and color space features, and audio features (e.g., call recordings) by sampling rate range and waveform amplitude threshold, forming an association feature set with modal identifiers.
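The dynamic updates in this example (reinforcing an edge on repeated clicks, registering a new meta-path type, deleting nodes together with their edges) can be sketched as an event-driven updater. The class `DynamicMetaPaths`, the fixed increment per interaction, and keying edges by unordered node pairs are all assumptions for illustration.

```python
from collections import defaultdict


class DynamicMetaPaths:
    """Toy event-driven updater; the increment rule and data layout are assumptions."""

    def __init__(self, increment=1.0):
        self.increment = increment
        self.edge_weight = defaultdict(float)   # frozenset({u, v}) -> weight
        self.meta_paths = {("user", "text", "product"), ("user", "audio", "product")}

    def on_interaction(self, u, v):
        # A click creates the edge on first sight and reinforces it afterwards.
        self.edge_weight[frozenset((u, v))] += self.increment

    def extend(self, meta_path):
        # Register a new meta-path type discovered from interactions.
        self.meta_paths.add(tuple(meta_path))

    def remove_node(self, node):
        # Deleting a node also deletes every edge that touches it.
        for key in [k for k in self.edge_weight if node in k]:
            del self.edge_weight[key]


dyn = DynamicMetaPaths()
for _ in range(3):
    dyn.on_interaction("u1", "p1")   # user u1 clicks product p1 three times
dyn.on_interaction("u1", "p2")
dyn.extend(["user", "image", "tag"])
print(dict(dyn.edge_weight))
```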
[0034] Allocating computing power mapping channels: the labeled associated features in the associated feature set are classified into text (search keywords, review text), image (main product images, detail images), and audio (customer service calls, voice reviews). Text features are matched with the word vector conversion rules of the CPU processing unit, image features with the convolutional feature extraction technique of the GPU processing unit, and audio features with the spectral feature mapping table of the audio processing unit.
[0035] Generating modal feature vectors: Based on the determined processing units and processing rules, various types of associated features are processed in a targeted manner. Specifically, the CPU converts text-related associated features into fixed-dimensional word vectors according to word vector conversion rules, the GPU extracts high-dimensional convolutional feature vectors of image-related features through convolutional feature extraction technology, and the audio processing unit converts audio-related associated features into preset-dimensional spectral feature vectors according to the spectral feature mapping table. At the same time, metadata such as the generation time of text vectors, the memory usage of image vectors, and the computation time of audio vectors are recorded, as well as performance data such as CPU utilization, GPU frame rate, and audio processing unit response speed.
[0036] Determining the fusion weight allocation: from the initial associations between nodes, the click count and dwell time of "user-product" node connections are extracted, quantified as meta-path association strength, and normalized. At the same time, the performance score of each processing unit is calculated from its performance data using the processing-unit performance scoring formula, whose inputs are the current performance utilization, computed throughput, historical average response time, and current response time of the processing unit, together with a stability weight and a stability coefficient. The performance scores are normalized, and the fusion weights of the text, image, and audio feature vectors are then calculated from the association strength and performance score using the fusion weight allocation formula, which is defined over the fusion weight, the weighting coefficient for association strength, the total number of modalities, the normalized meta-path association strength and normalized performance score of each modality, and the modality index, as shown in Figure 1.
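Turning click counts and dwell time into a tightness value for each "user-product" edge could look like the following. Scaling each signal by its maximum over the observed edges and mixing the two with a coefficient `a` (0.5 by default) are assumptions, not values from this example; `edge_tightness` is a hypothetical helper.

```python
def edge_tightness(stats, a=0.5):
    """stats: {(user, product): (clicks, dwell_seconds)} -> tightness in [0, 1].

    Each signal is divided by its maximum over all edges, then mixed with weight a.
    """
    max_clicks = max(c for c, _ in stats.values()) or 1   # guard against all-zero counts
    max_dwell = max(d for _, d in stats.values()) or 1
    return {edge: a * c / max_clicks + (1 - a) * d / max_dwell
            for edge, (c, d) in stats.items()}


t = edge_tightness({("u1", "p1"): (10, 120.0), ("u1", "p2"): (5, 60.0)})
print(t)  # {('u1', 'p1'): 1.0, ('u1', 'p2'): 0.5}
```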
[0037] Weighted fusion of feature vectors: using the modal feature vectors and fusion weights, the text, image, and audio feature vectors are preprocessed and unified to the same dimension by feature concatenation. Each modal feature vector is then weighted according to the feature vector weighted fusion formula, which is defined over the j-th element of the fused feature vector, the fusion weight of the i-th modal feature vector, the j-th element of the i-th modal feature vector, a feature enhancement coefficient, the natural constant e, the total number of modalities, the modality index i, and the element index j, yielding the preliminary fused feature vector. The preliminary fused feature vector then undergoes outlier correction (e.g., removing elements that exceed the mean by more than a set multiple of the standard deviation), normalization to a specified interval, and redundant feature removal (e.g., retaining only the dimensions that pass a variance threshold), completing the data fusion.
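The post-processing chain in this example can be sketched over a batch of fused vectors (rows are samples, columns are dimensions). As assumptions: values beyond the mean plus or minus `k` standard deviations are clipped to that bound rather than removed, so the batch keeps its shape; normalization is per-dimension min-max scaling to [0, 1]; and dimensions whose variance is at or below `var_threshold` are dropped. `postprocess` is a hypothetical helper.

```python
import statistics


def postprocess(rows, k=3.0, var_threshold=1e-3):
    """Clip outliers, min-max normalize each dimension, then drop low-variance dimensions."""
    kept = []
    for col in zip(*rows):
        mu, sd = statistics.fmean(col), statistics.pstdev(col)
        lo_b, hi_b = mu - k * sd, mu + k * sd
        col = [min(max(x, lo_b), hi_b) for x in col]                   # outlier correction
        lo, hi = min(col), max(col)
        col = [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in col]  # scale to [0, 1]
        if statistics.pvariance(col) > var_threshold:                  # variance threshold
            kept.append(col)
    return [list(r) for r in zip(*kept)]


batch = [[1.0, 5.0], [2.0, 5.0], [3.0, 5.0]]
print(postprocess(batch))  # [[0.0], [0.5], [1.0]]: the constant dimension is removed
```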
[0038] In summary, in the user behavior analysis scenario of an e-commerce platform, dynamic meta-paths integrate multimodal data nodes such as text, images, and audio with heterogeneous nodes such as users, products, and tags, and processing them yields a set of associated features with modal identifiers. Corresponding computing power mapping channels and processing rules are then assigned to the associated features by modal attribute to generate the modal feature vectors and record the relevant metadata and performance data. Next, the fusion weights are determined from node association strength and processing-unit performance. Finally, dimension unification, weighted calculation, and subsequent optimization achieve effective fusion of multi-source heterogeneous user behavior data, giving the e-commerce platform integrated feature support for accurately analyzing user preferences and optimizing product recommendations.
Example 2:
[0039] Constructing dynamic meta-paths: In an intelligent security monitoring scenario, multimodal data nodes such as surveillance video frames (image nodes), ambient sounds (audio nodes), and alarm text records (text nodes) are collected, along with heterogeneous nodes such as camera devices (device nodes), personnel IDs (user nodes), and event tags (tag nodes). Image nodes are standardized to a common frame rate, audio nodes to a common sampling rate, and text nodes to a common storage format. Meta-paths are constructed from initial associations such as "camera-video frame-person" and "camera-ambient sound-event", with each node as a vertex and each initial association as an edge, and all nodes and meta-paths are incorporated into a unified association framework based on a graph database structure. By monitoring interaction data in real time, such as camera switching triggered by personnel movement and alarm linkage triggered by abnormal sounds, the meta-path structure is adjusted (e.g., by extending the path to "Camera A-recorded frame-person-Camera B") and "sound-event-tag" edges are added. Association features are identified through the modal recognition rule base: text features (alarm records) by character encoding and word density, image features (recorded frames) by pixel matrix dimension and color space features, and audio features (ambient sound) by sampling rate range and waveform amplitude threshold, forming an association feature set with modal identifiers.
[0040] Allocation of computing power mapping channels: Based on the associated feature set, the labeled associated features are classified into text (alarm records, device logs), image (video frames, face screenshots) and audio (ambient sound, voice commands). Among them, the text category matches the word vector conversion rules of the CPU, the image category matches the convolutional feature extraction technology of the GPU, and the audio category matches the spectral feature mapping table of the audio processing unit.
[0041] Generating modal feature vectors: using the determined processing units and rules, each type of associated feature is processed in a targeted manner. The CPU converts text features into fixed-dimension word vectors according to the word vector conversion rules; the GPU extracts high-dimensional convolutional feature vectors from image features by convolutional feature extraction; and the audio processing unit converts audio features into spectral feature vectors of a preset dimension according to the spectral feature mapping table. Metadata such as the vocabulary size of the text vectors, the inference time of the image vectors, and the spectral resolution of the audio vectors are recorded, along with performance data such as CPU load, GPU memory usage, and audio processing unit latency, as shown in Figure 2.
[0042] Determining the fusion weight allocation: from the initial associations between nodes, the tracking duration of "camera-person" connections and the trigger count of "sound-event" connections are extracted, quantified as meta-path association strength, and normalized. At the same time, the performance score of each processing unit is calculated from its performance data using the processing-unit performance scoring formula. The performance scores are normalized, and the fusion weights of the text, image, and audio feature vectors are then calculated from the association strength and performance score using the fusion weight allocation formula.
[0043] Weighted fusion of feature vectors: using the modal feature vectors and fusion weights, the text, image, and audio feature vectors are preprocessed and unified to the same dimension by feature concatenation, and each modal feature vector is weighted according to the feature vector weighted fusion formula to obtain the preliminary fused feature vector. Outlier correction (e.g., replacing outliers with the median), normalization to a specified interval, and redundant feature removal (e.g., selecting key dimensions by mutual information) are then performed to complete the fusion of the security monitoring data.
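The median-replacement outlier correction in this example might look like the following. The detection rule (distance from the median greater than `k` times the median absolute deviation) is an assumption, since the example specifies only the replacement value; the mutual-information dimension filtering is not sketched. `replace_outliers_with_median` is a hypothetical helper.

```python
import statistics


def replace_outliers_with_median(values, k=3.0):
    """Replace values farther than k * MAD from the median with the median (assumed rule)."""
    values = list(values)
    med = statistics.median(values)
    mad = statistics.median([abs(x - med) for x in values])
    if mad == 0:
        # No spread estimate available; leave the values unchanged.
        return values
    return [med if abs(x - med) > k * mad else x for x in values]


print(replace_outliers_with_median([1.0, 2.0, 3.0, 2.0, 100.0]))  # [1.0, 2.0, 3.0, 2.0, 2.0]
```

The median-based rule is used here because a single extreme value inflates the mean and standard deviation, which can hide that same value from a mean-based test.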
[0044] In summary, in intelligent security monitoring scenarios, a dynamic meta-path is first constructed over multimodal data nodes such as surveillance video frames, ambient sounds, and alarm texts, and heterogeneous nodes such as devices, personnel, and event tags, forming a set of associated features with modal identifiers after processing. Corresponding computing power channels and processing rules are then matched to each category of associated features to generate modal feature vectors and record metadata and performance data. Next, the fusion weights are calculated from node association strength and processing-unit performance. Finally, dimension unification, weighted fusion, outlier correction, normalization, and redundant feature removal complete the fusion of multi-source heterogeneous security monitoring data, giving intelligent security systems an integrated feature basis for accurately identifying events and improving monitoring efficiency.
[0045] The above description is merely a preferred embodiment of the present invention and is not intended to limit the present invention in any way. Although the present invention has been disclosed above with reference to preferred embodiments, it is not intended to limit the present invention. Any person skilled in the art can make some modifications or alterations to the above-disclosed technical content to create equivalent embodiments without departing from the scope of the present invention. Any simple modifications, equivalent changes and alterations made to the above embodiments based on the technical essence of the present invention without departing from the scope of the present invention shall still fall within the scope of the present invention.
Claims
1. A method for fusing and processing multi-source heterogeneous data, characterized in that the method comprises the following steps. Constructing dynamic meta-paths: multimodal data nodes (including text, images, and audio) and heterogeneous nodes (including users, products, and tags) are collected. Attribute analysis and standardization are performed on these nodes. Meta-paths are constructed from the initial relationships between nodes, with each node as a vertex and each initial relationship as a connecting edge. All nodes and meta-paths are incorporated into a unified association framework using a graph database structure. The meta-paths are dynamically updated in real time by monitoring interaction data between nodes; the updates include expanding meta-paths, adjusting their structure, and adding or deleting nodes and edges. During meta-path expansion, association features between nodes of different types are captured. At the same time, the modal attributes of each node's associated data are identified through a preset modal recognition rule base. Each associated feature is assigned a unique modal identifier, forming an associated feature set with modal identifiers. The modal recognition rule base contains feature parameters and judgment logic for the different modal data: for text-type associated features, judgment rules for the character encoding range and a word density threshold; for image-type associated features, recognition parameters for the pixel matrix dimension range and color space features; and for audio-type associated features, judgment rules for the sampling rate range and a waveform amplitude threshold.
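The modal recognition rule base can be pictured as a table of per-modality checks that assigns each accepted feature a unique modal identifier. The sketch below is hypothetical. The rule types (character encoding, word density, pixel matrix dimensions, color space, sampling rate, amplitude) come from the claim, but every threshold, the payload layout, the identifier format `<modality>-<n>`, and all names are assumptions. The share of alphanumeric characters is used as a crude proxy for word density, and the channel count as a proxy for color space.

```python
from itertools import count
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical thresholds: the rule types follow the claim, the numbers do not.
RULES: Dict[str, Dict[str, Any]] = {
    "text": {"min_word_density": 0.5},
    "image": {"side_range": (8, 8192), "channels": {1, 3, 4}},
    "audio": {"rate_range": (8_000, 192_000), "max_amplitude": 1.0},
}


def recognize(payload: Dict[str, Any]) -> Optional[str]:
    """Return the modality whose rules the payload satisfies, or None."""
    if "text" in payload:
        text = payload["text"]
        try:
            text.encode("utf-8")  # character-encoding check
        except UnicodeEncodeError:
            return None
        # Share of alphanumeric characters, a crude proxy for word density.
        density = sum(c.isalnum() for c in text) / len(text) if text else 0.0
        return "text" if density >= RULES["text"]["min_word_density"] else None
    if "shape" in payload:
        h, w, c = payload["shape"]  # pixel matrix dimensions and channel count
        lo, hi = RULES["image"]["side_range"]
        ok = lo <= h <= hi and lo <= w <= hi and c in RULES["image"]["channels"]
        return "image" if ok else None
    if "sample_rate" in payload:
        lo, hi = RULES["audio"]["rate_range"]
        peak = max((abs(s) for s in payload.get("samples", [])), default=0.0)
        ok = lo <= payload["sample_rate"] <= hi and peak <= RULES["audio"]["max_amplitude"]
        return "audio" if ok else None
    return None


def tag_features(payloads: List[Dict[str, Any]]
                 ) -> Tuple[List[Tuple[str, Dict[str, Any]]], List[Dict[str, Any]]]:
    """Attach a unique modal identifier such as 'image-1' to each recognized feature."""
    counters = {m: count(1) for m in RULES}
    tagged: List[Tuple[str, Dict[str, Any]]] = []
    rejected: List[Dict[str, Any]] = []
    for payload in payloads:
        modality = recognize(payload)
        if modality is None:
            rejected.append(payload)
        else:
            tagged.append((f"{modality}-{next(counters[modality])}", payload))
    return tagged, rejected
```

Features that match no rule are returned separately rather than silently dropped, so they can be inspected or routed to a fallback.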
Allocating computing power mapping channels: based on the associated feature set, the associated features are classified according to their modal attribute identifiers, and each category of associated features is matched with the processing rules of the corresponding processing unit. Generating modal feature vectors: based on the determined processing units and processing rules, each category of associated features is processed in a targeted manner to generate feature vectors of the different modalities, and the metadata of each modal feature vector and the performance data of the corresponding processing unit are recorded. Determining the fusion weight allocation: based on the initial association relationships between nodes, the tightness of node connections is extracted, quantified as the meta-path association strength, and normalized. A performance score is calculated from the processing unit performance data using the processing unit performance scoring formula, and the performance scores are normalized. The fusion weights of the different modal feature vectors are then calculated from the association strength and the performance score using the fusion weight allocation formula [formula not reproduced in the source]. The terms of this formula are:
- the fusion weight;
- the correlation strength weighting coefficient;
- the total number of modalities;
- the normalized meta-path association strength and the normalized performance score of the modality being weighted;
- the normalized meta-path association strength and the normalized performance score of the m-th modality, where m is the modality index.

Weighted fusion of feature vectors: based on the different modal feature vectors and the fusion weights, the feature vectors of the different modalities are preprocessed and unified into the same dimensional space, and the preprocessed feature vectors are weighted to obtain the preliminary fused feature vector.
The preliminary fused feature vector is then processed to complete the data fusion.
2. The multi-source heterogeneous data fusion processing method according to claim 1, characterized in that, in allocating the computing power mapping channels, the associated features in the associated feature set are classified according to their modal attribute identifiers into text, image, and audio categories, and each category is matched with the processing rules of the corresponding processing unit: text associated features are matched with the word vector conversion rules of the CPU processing unit, image associated features with the convolutional feature extraction technique of the GPU processing unit, and audio associated features with the spectral feature mapping table of the audio processing unit.
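The channel matching in claim 2 amounts to a routing table from modality to a (processing unit, processing rule) pair. In the following sketch, the table entries mirror the claim, but the string labels, the function name, and the assumption that each modal identifier begins with its modality name followed by "-" are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple

# Hypothetical routing table mirroring claim 2; the strings are labels only.
CHANNEL_TABLE: Dict[str, Tuple[str, str]] = {
    "text": ("CPU", "word_vector_conversion"),
    "image": ("GPU", "convolutional_feature_extraction"),
    "audio": ("AUDIO", "spectral_feature_mapping"),
}


def allocate_channels(modal_ids: List[str]) -> Dict[Tuple[str, str], List[str]]:
    """Group modal-identified features by (processing unit, processing rule).

    Assumes each identifier starts with its modality, e.g. 'image-3'.
    """
    queues: DefaultDict[Tuple[str, str], List[str]] = defaultdict(list)
    for modal_id in modal_ids:
        modality = modal_id.split("-", 1)[0]
        if modality not in CHANNEL_TABLE:
            raise ValueError(f"no channel for modality {modality!r}")
        queues[CHANNEL_TABLE[modality]].append(modal_id)
    return dict(queues)
```

A dispatcher can then drain each queue on its assigned unit. An unknown modality raises an error instead of falling through to an arbitrary unit.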
3. The multi-source heterogeneous data fusion processing method according to claim 1, characterized in that, in generating the modal feature vectors, each type of associated feature is processed in a targeted manner according to the determined processing units and processing rules: the CPU processing unit processes text associated features according to the word vector conversion rules to generate fixed-dimension word vectors; the GPU processing unit processes image associated features through convolutional feature extraction to generate high-dimensional convolutional feature vectors; and the audio processing unit processes audio associated features through the spectral feature mapping table to generate spectral feature vectors of a preset dimension. At the same time, the metadata of the different modal feature vectors and the performance data of the corresponding processing units are recorded.
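The generation step pairs each feature vector with its metadata and the processing unit's performance data. The sketch below uses deliberately simple stand-ins for the three processing rules:

- a hashed bag-of-words in place of the word vector conversion rules;
- a single fixed 2x2 edge kernel in place of GPU convolutional feature extraction;
- the magnitudes of the first few discrete Fourier transform bins in place of the spectral feature mapping table.

It records only the response time as performance data. All names, dimensions, and the kernel are hypothetical.

```python
import cmath
import time
import zlib
from typing import Any, Callable, Dict, List, Tuple


def text_to_vector(text: str, dim: int = 8) -> List[float]:
    """Hashed bag-of-words: stand-in for the CPU word vector conversion rules."""
    vec = [0.0] * dim
    for token in text.lower().split():
        # crc32 is deterministic across runs, unlike Python's built-in str hash.
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    return vec


def image_to_vector(img: List[List[float]]) -> List[float]:
    """Valid 2x2 convolution with a fixed edge kernel: stand-in for GPU features."""
    k = [[1.0, -1.0], [1.0, -1.0]]
    return [sum(k[a][b] * img[i + a][j + b] for a in range(2) for b in range(2))
            for i in range(len(img) - 1) for j in range(len(img[0]) - 1)]


def audio_to_vector(samples: List[float], bins: int = 4) -> List[float]:
    """DFT magnitudes of the first `bins` frequencies: stand-in spectral features."""
    n = len(samples)
    return [abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(samples)))
            for k in range(bins)]


PROCESSORS: Dict[str, Tuple[str, Callable[[Any], List[float]]]] = {
    "text": ("CPU", text_to_vector),
    "image": ("GPU", image_to_vector),
    "audio": ("AUDIO", audio_to_vector),
}


def generate(modality: str, data: Any) -> Tuple[List[float], Dict[str, Any], Dict[str, Any]]:
    """Run the matched processor and record metadata plus performance data."""
    unit, fn = PROCESSORS[modality]
    start = time.perf_counter()
    vec = fn(data)
    elapsed = time.perf_counter() - start
    metadata = {"modality": modality, "dim": len(vec), "unit": unit}
    performance = {"unit": unit, "response_time_s": elapsed}
    return vec, metadata, performance
```

In a real deployment the recorded response times would feed the performance scoring step, and the stand-in processors would be replaced by the actual word-vector, convolutional, and spectral pipelines.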
4. The multi-source heterogeneous data fusion processing method according to claim 1, characterized in that, in determining the fusion weight allocation, the current utilization rate, computational throughput, historical average response time, and current response time of each processing unit are obtained from the processing unit performance data, and the performance score is calculated using the processing unit performance scoring formula: [formula not reproduced in the source], where the terms subscripted stab denote the stability weight and the stability coefficient.
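The scoring formula of claim 4 is missing from the source. The sketch below shows one hypothetical way to combine the named inputs: utilization rate, throughput, historical average and current response time, stability weight, and stability coefficient. Idle capacity scales throughput, and an exponential stability factor penalizes drift of the current response time away from the historical average. The functional form, the default values, and the name `performance_score` are all assumptions.

```python
import math


def performance_score(utilization: float, throughput: float,
                      hist_response_s: float, cur_response_s: float,
                      stab_weight: float = 0.4, stab_coef: float = 2.0) -> float:
    """Hypothetical performance score for one processing unit.

    headroom  = 1 - utilization            (idle capacity)
    stability = exp(-stab_coef * drift)    (drift = relative response-time change)
    score     = throughput * headroom * ((1 - stab_weight) + stab_weight * stability)
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must lie in [0, 1]")
    if not 0.0 <= stab_weight <= 1.0:
        raise ValueError("stab_weight must lie in [0, 1]")
    if hist_response_s <= 0 or cur_response_s <= 0:
        raise ValueError("response times must be positive")
    drift = abs(cur_response_s - hist_response_s) / hist_response_s
    stability = math.exp(-stab_coef * drift)
    return throughput * (1.0 - utilization) * ((1.0 - stab_weight) + stab_weight * stability)
```

A unit whose current response time matches its history keeps its full throughput-times-headroom score. A unit drifting by 50% with the defaults loses about a quarter of it. The scores would still need normalizing across units before entering the fusion weight calculation.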
5. The multi-source heterogeneous data fusion processing method according to claim 1, characterized in that, in the weighted fusion of feature vectors, the feature vectors of the different modalities are preprocessed based on those feature vectors and the fusion weights, and all feature vectors are unified into the same dimensional space by feature concatenation. The feature vectors of the individual modalities are then weighted according to the feature vector weighted fusion formula to obtain the preliminary fused feature vector, which is further processed by outlier correction, feature normalization, and redundant feature removal to complete the data fusion.
6. The multi-source heterogeneous data fusion processing method according to claim 5, characterized in that the weighted fusion feature vector is calculated according to the feature vector weighted fusion formula: [formula not reproduced in the source]. The terms of this formula are:
- the value of the j-th element of the fused feature vector;
- the fusion weight of the m-th modal feature vector;
- the j-th element of the m-th modal feature vector;
- the feature enhancement coefficient;
- the natural constant e;
- the total number of modalities;
- the modality index m;
- the element index j within the feature vector.
7. A multi-source heterogeneous data fusion processing system applicable to the multi-source heterogeneous data fusion processing method according to any one of claims 1 to 6, characterized in that the system comprises:
- a dynamic meta-path construction module, which collects multimodal data nodes and heterogeneous nodes, performs attribute analysis and standardization on these nodes, constructs meta-paths from the initial association relationships between nodes, incorporates all nodes and meta-paths into a unified association framework using a graph database structure, dynamically updates the meta-paths by monitoring new interaction data between nodes in real time, and mines cross-node association features to form an associated feature set with modal identifiers;
- a computing power mapping channel allocation module, which classifies the associated features in the associated feature set according to their modal attribute identifiers and matches each category with the processing rules of the corresponding processing unit;
- a modal feature vector generation module, which processes each category of associated features according to the determined processing units and processing rules to generate feature vectors of the different modalities, and records the metadata of each modal feature vector and the performance data of the corresponding processing unit;
- a fusion weight allocation determination module, which extracts the tightness of node connections from the meta-paths based on the initial association relationships between nodes, quantifies it as the meta-path association strength and normalizes it, calculates the performance score from the processing unit performance data, and then calculates the fusion weights of the different modal feature vectors from the association strength and the performance score;
- a feature vector weighted fusion module, which obtains the feature vectors of the different modalities and the fusion weights, preprocesses the feature vectors to unify them into the same dimensional space, weights the preprocessed feature vectors to obtain the preliminary fused feature vector, and then performs outlier correction, normalization, and redundant feature removal on it to complete the data fusion.