Multi-modal based user identity consolidation method and apparatus, device, and medium
Using multimodal feature fusion and clustering, the method addresses the lag and mismatches that arise when user identity merging lacks strong identity identifiers, enabling accurate merging of user identities without such identifiers.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- CHINA PING AN PROPERTY INSURANCE CO LTD
- Filing Date
- 2026-03-06
- Publication Date
- 2026-06-12
AI Technical Summary
In existing technologies, user identity merging methods rely on strong user identity identifiers pre-stored in the master data management system, which cannot capture dynamic behavioral data of users in the business system in a timely manner. This results in lag in user merged data and mismatches or omissions in identity, leading to low accuracy.
The method acquires user data of the target object; extracts user information features, user behavior features, user device fingerprint features, and user social graph features; performs feature fusion and clustering; and generates identity merging profile data, thereby merging user identities without strong identity identifiers.
It improves the accuracy of user identity merging and effectively resolves mismatched and omitted user identities across different systems.
Figure CN122196888A_ABST
Description
Technical Field
[0001] This application relates to the field of artificial intelligence technology and is applicable to the financial technology field, in particular to a multimodal user identity merging method, apparatus, device, and medium.
Background Technology
[0002] Currently, traditional user identity merging methods typically use Master Data Management (MDM) systems and strong user identifiers (such as ID card numbers and mobile phone numbers) to perform cross-business identity matching for users in different business systems, and then aggregate the successfully matched user data. For example, in an insurance scenario, if the MDM system pre-stores a binding, through an ID card number, between Zhang San in the auto insurance system and Zhang San in the group accident insurance system, it can be directly determined that Zhang San in the two systems is the same user. However, if the user's strong identifier changes (for example, Zhang San changes his mobile phone number), manual verification is required, comparing auxiliary information such as user names and addresses from the different insurance business systems, to merge the user's identities across those systems. This approach relies solely on the strong user identity identifiers pre-stored in the MDM system and cannot capture dynamic user behavior data in the business systems in a timely manner, so user identity merging lags. Furthermore, if the strong user identity identifier changes or is missing, the same user may be mismatched or missed because of inconsistent identity categories across business systems. Consequently, the accuracy of user identity merging is low, and improving it has become an urgent problem.
Summary of the Invention
[0003] To address the technical problem of improving the accuracy of user identity merging, the main objective of this application is to propose a multimodal user identity merging method, apparatus, device, and medium, aiming to improve the accuracy of user identity merging.
[0004] To achieve the above objective, a first aspect of this application proposes a multimodal user identity merging method, the method comprising: obtaining target user data of a target object, wherein the target object has at least two identity categories and each identity category is different; performing feature extraction on the target user data to obtain target user features, wherein the target user features include at least two of the following: user information features, user behavior features, user device fingerprint features, and user social graph features; fusing at least two of the user information features, user behavior features, user device fingerprint features, and user social graph features to obtain target fused user features; performing feature fusion on the target fused user features of any two of the identity categories to obtain multi-identity fusion features; and performing identity clustering on the target object based on the multi-identity fusion features to obtain identity merging profile data of the target object.
[0005] In some embodiments, performing feature fusion on the target fused user features of any two of the identity categories to obtain the multi-identity fusion features includes: for any of the identity categories, performing associated user feature aggregation on the target fused user features to obtain user aggregation features; performing feature interference processing on the target fused user features to obtain interference fusion features; weighting the user aggregation features and the interference fusion features to obtain enhanced fusion features; and performing identity feature fusion on the enhanced fusion features of any two of the identity categories to obtain the multi-identity fusion features.
[0006] In some embodiments, performing identity feature fusion on the enhanced fusion features of any two of the identity categories to obtain the multi-identity fusion features includes: encrypting the enhanced fusion features of any of the identity categories to obtain encrypted fusion features; sharing the encrypted fusion features to obtain encrypted sharing features; aggregating the encrypted sharing features of any of the identity categories to obtain encrypted aggregation features; decrypting the encrypted aggregation features of any of the identity categories to obtain decrypted aggregation features; and weighting the decrypted aggregation features of any two of the identity categories to obtain the multi-identity fusion features.
[0007] In some embodiments, performing feature interference processing on the target fused user features to obtain the interference fusion features includes: randomly discarding feature dimensions of the target fused user features to obtain missing fusion features; adding noise to the target fused user features to obtain noise fusion features; and determining the interference fusion features based on the missing fusion features and the noise fusion features.
[0008] In some embodiments, performing identity clustering on the target object based on the multi-identity fusion features to obtain the identity merging profile data of the target object includes: determining, based on the multi-identity fusion features, the participating identity categories of the target object and the identity category users of those participating identity categories; performing similarity calculation on the identity category users to obtain identity feature similarity; obtaining the nearest-neighbor users of each identity category user, and calculating the similarity between the identity category user and its nearest-neighbor users to obtain nearest-neighbor identity similarity; dividing the identity category users into identity clusters based on the identity feature similarity and the nearest-neighbor identity similarity to obtain a target identity cluster; and fusing the identity data of the identity category users based on the target identity cluster to obtain the identity merging profile data.
[0009] In some embodiments, the step of fusing identity data of the user category based on the target identity cluster to obtain the merged identity profile data includes: The matching identity user of the target object is determined based on the target identity cluster; According to the preset identity matching rules, the matched users are subjected to identity matching detection to obtain identity matching detection data; The identity data of the target object is merged based on the identity matching detection data to obtain the identity merged profile data.
[0010] In some embodiments, the identity matching detection data is either that the matched identity user matches, or that the matched identity user does not match; The step of merging identity data of the target object based on the identity matching detection data to obtain the identity merged profile data includes: If the matched users match, the target user data of the matched users are merged to obtain the merged identity profile data. If the matched identity user does not match, then a similarity calculation is performed on the matched identity user to obtain the matching user similarity. A matching rule score is obtained by calculating the rule score for the matched identity user according to the identity matching rule. The matching confidence score is calculated based on the matching user similarity and the matching rule score. Based on the matching confidence level, the target user data of the matched identity user is merged to obtain the identity merged profile data.
[0011] To achieve the above objectives, a second aspect of this application provides a multimodal user identity merging apparatus, the apparatus comprising: The user data acquisition module is used to acquire target user data of the target object; wherein the target object has at least two identity categories, and each identity category is different; The feature extraction module is used to extract features from the target user data to obtain target user features; wherein, the target user features include at least two of the following: user information features, user behavior features, user device fingerprint features, and user social graph features; The user feature fusion module is used to fuse at least two of the user information features, user behavior features, user device fingerprint features, and user social graph features to obtain the target fused user features. The identity feature fusion module is used to perform feature fusion on the target fused user features of any two identity categories to obtain multiple identity fusion features; The identity clustering module is used to perform identity clustering on the target object based on the multi-identity fusion features to obtain the identity merging profile data of the target object.
[0012] To achieve the above objectives, a third aspect of this application provides an electronic device, which includes a memory and a processor. The memory stores a computer program, and the processor executes the computer program to implement the method described in the first aspect.
[0013] To achieve the above objectives, a fourth aspect of the present application provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the method of the first aspect described above.
[0014] The multimodal user identity merging method, apparatus, device, and medium proposed in this application first capture the complete initial features of a user by extracting and fusing at least two of user information features, user behavior features, user device fingerprint features, and user social graph features. Even in the absence of strong identity identifiers, accurate user association can still be achieved based on behavior, device fingerprints, or social graphs. Second, by performing feature fusion on the target fused user features of any two identity categories, user features associated with different identity categories can be further captured. Finally, by performing identity clustering on the target object based on the multi-identity fusion features, identity merging profile data of the target object is obtained. This enables the merging of user identities without strong identity identifiers, effectively solving the problem of mismatched and omitted user identities in different systems and significantly improving the accuracy of user identity merging.
Attached Figure Description
[0015] Figure 1 is a flowchart of a multimodal user identity merging method provided in an embodiment of this application; Figure 2 is a flowchart of step S104 in Figure 1; Figure 3 is a flowchart of step S203 in Figure 2; Figure 4 is a flowchart of step S205 in Figure 2; Figure 5 is a flowchart of step S105 in Figure 1; Figure 6 is a flowchart of step S505 in Figure 5; Figure 7 is a flowchart of step S603 in Figure 6; Figure 8 is a schematic diagram of the structure of the multimodal user identity merging apparatus provided in the embodiments of this application; Figure 9 is a schematic diagram of the hardware structure of the electronic device provided in the embodiments of this application.
Detailed Implementation
[0016] To make the objectives, technical solutions, and advantages of this application clearer, the following detailed description is provided in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the scope of this application.
[0017] It should be noted that although functional modules are divided in the device schematic diagram and a logical order is shown in the flowchart, in some cases, the steps shown or described may be performed in a different order than the module division in the device or the order in the flowchart. The terms "first," "second," etc., in the specification, claims, and the aforementioned drawings are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence.
[0018] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of this application only and is not intended to limit this application.
[0019] First, let's analyze some of the terms used in this application: Artificial intelligence (AI) is a new branch of computer science that studies, develops, and applies theories, methods, technologies, and systems to simulate, extend, and expand human intelligence. It aims to understand the essence of intelligence and produce intelligent machines that can react in a way similar to human intelligence. Research in this field includes robotics, speech recognition, image recognition, natural language processing, and expert systems. AI can simulate the information processes of human consciousness and thought. Furthermore, AI utilizes digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceiving the environment, acquiring knowledge, and using that knowledge to achieve optimal results.
[0020] This application provides a multimodal user identity merging method, apparatus, device, and medium, aiming to improve the accuracy of user identity merging.
[0021] The multimodal user identity merging method, apparatus, device, and medium provided in this application are specifically described through the following embodiments. First, the multimodal user identity merging method in this application is described.
[0022] The embodiments of this application can acquire and process relevant data based on artificial intelligence technology. Artificial intelligence (AI) refers to the theories, methods, technologies, and application systems that use digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain optimal results.
[0023] Foundational technologies for artificial intelligence generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operating / interactive systems, and mechatronics. AI software technologies mainly encompass computer vision, robotics, biometrics, speech processing, natural language processing, and machine learning / deep learning.
[0024] The multimodal user identity merging method provided in this application relates to the field of artificial intelligence technology. This method can be applied to a terminal, a server, or software running on either a terminal or a server. In some embodiments, the terminal can be a smartphone, tablet, laptop, desktop computer, etc.; the server can be configured as an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms; the software can be an application implementing the multimodal user identity merging method, but is not limited to the above forms.
[0025] This application can be used in a wide variety of general-purpose or special-purpose computer system environments or configurations. Examples include: personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, and distributed computing environments including any of the above systems or devices. This application can be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, etc., that perform specific tasks or implement specific abstract data types. This application can also be practiced in distributed computing environments where tasks are performed by remote processing devices connected via a communication network. In distributed computing environments, program modules can reside in local and remote computer storage media, including storage devices.
[0026] Figure 1 is an optional flowchart of a multimodal user identity merging method provided in the embodiments of this application. The method in Figure 1 may include, but is not limited to, steps S101 to S105.
[0027] Step S101: Obtain target user data of the target object; wherein the target object has at least two identity categories, and each identity category is different.
[0028] Step S102: Extract features from the target user data to obtain target user features; wherein, the target user features include at least two of the following: user information features, user behavior features, user device fingerprint features, and user social graph features.
[0029] Step S103: At least two of the following features are fused: user information features, user behavior features, user device fingerprint features, and user social graph features, to obtain the target fused user features.
[0030] Step S104: Perform feature fusion on the target fused user features of any two identity categories to obtain multi-identity fusion features.
[0031] Step S105: Cluster the target object's identity based on the multi-identity fusion features to obtain the target object's identity merging profile data.
[0032] Steps S101 to S105 as illustrated in this application embodiment firstly capture the complete initial features of a user by extracting and fusing at least two of the following: user information features, user behavior features, user device fingerprint features, and user social graph features. Even in the absence of strong identity identifiers, accurate user association can still be achieved based on behavior, device fingerprints, or social graphs. Secondly, by performing feature fusion on the target user features of any two identity categories, user features associated with different identity categories can be further captured. Finally, identity clustering is performed on the target object based on the multi-identity fusion features to obtain the identity merging profile data of the target object. This enables the merging of user identities without strong identity identifiers, effectively solving the problem of mismatched or missing user identities in different systems and significantly improving the accuracy of user identity merging.
[0033] In step S101 of some embodiments, specifically, the target object refers to the user who needs to be identified and merged in different business systems, and the target object has at least two identity categories, each of which is different; wherein, the identity category refers to the business attribute category of the user in different insurance business systems.
[0034] For example, in an insurance scenario, the insurance business system can be a car insurance business system, an accident insurance business system, and a home insurance business system. For the car insurance business system, the identity categories can include high-risk driving customers (such as those with more than 3 accidents in a year), safe driving quality customers, new energy vehicle exclusive customers, and potential renewal loss customers, etc. For the accident insurance business system, the identity categories can include group accident insurance corporate customers, individual high-frequency travel customers, high-risk occupation customers, and short-term insurance repeat customers, etc. For the home insurance business system, the identity categories can include high-end residential customers, rental housing customers, comprehensive protection package customers, and smart home device associated customers, etc.
[0035] For example, in an insurance scenario, user Zhang San may at the same time be a high-risk driving customer in the auto insurance business system, a group accident insurance corporate customer in the accident insurance business system, and a smart home device associated customer in the home insurance business system.
[0036] Specifically, target user data refers to the collection of raw data related to the target object collected from different insurance business systems. This target user data may include multimodal data such as user information (e.g., name, age, gender), user behavior data (e.g., transaction records, login frequency), user device information (e.g., IMEI number, MAC address), and user social graph data (e.g., contact relationships, common service objects).
[0037] Specifically, customer behavior data, transaction records, device information, communication logs, and service interaction records can be collected from various insurance business systems (such as auto insurance, accident insurance, and home insurance) to determine target user data.
[0038] In step S102 of some embodiments, user information features refer to the information feature vectors built from a user's basic information in the business system, such as age, gender, occupation, and identity category.
[0039] Specifically, user behavior features refer to the behavioral feature vectors extracted from user behavior data.
[0040] Specifically, user device fingerprint features refer to the unique device identifier feature vector extracted from user device fingerprint data.
[0041] Specifically, user social graph features are feature vectors extracted from user social graph data.
[0042] Specifically, the target user data can be deduplicated and have its missing values filled to obtain preprocessed target user data. The preprocessed user information data can be embedded to obtain numerical features of user information. Dynamic behavioral indicators can be computed over a sliding window on the preprocessed user behavior data to obtain numerical features of behavior. The preprocessed user device fingerprint data can be hashed and encrypted, with the first 8 bits retained as a numerical feature of device identification. From the preprocessed user social graph data, numerical features such as degree centrality (e.g., total number of user contacts, number of associated service objects) and relationship strength (e.g., frequency of group interaction, duration of joint service participation) can be calculated. All numerical features can then be standardized by Z-score to obtain, in a unified format, feature vectors such as user information features, user behavior features, user device fingerprint features, and user social graph features.
[0043] In step S103 of some embodiments, specifically, the target fused user features refer to the feature vector obtained by integrating at least two of the user information features, user behavior features, user device fingerprint features, and user social graph features.
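The preprocessing described in the paragraph above can be illustrated with a minimal Python sketch. The function names, the 7-day window, the choice of SHA-256, and the reading of the retained prefix as the first 8 hexadecimal characters of the digest are assumptions made for illustration; the application does not specify them.

```python
import hashlib
from statistics import mean, pstdev


def sliding_window_counts(daily_events, window=7):
    """Sum of events in each trailing window (a simple dynamic behavior indicator)."""
    return [
        sum(daily_events[max(0, i - window + 1): i + 1])
        for i in range(len(daily_events))
    ]


def device_fingerprint_feature(raw_fingerprint, keep=8):
    """Hash a raw device identifier and keep a short prefix as a numeric feature.

    Assumption: SHA-256 and an 8-hex-character prefix stand in for the
    unspecified hashing scheme in the source.
    """
    digest = hashlib.sha256(raw_fingerprint.encode("utf-8")).hexdigest()
    return int(digest[:keep], 16)


def z_score(values):
    """Standardize a feature column; a constant column maps to all zeros."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]
```

Hashing before retaining a prefix means the raw device identifier is not stored in the feature vector, at the cost of possible collisions between devices that share a prefix.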
[0044] Specifically, at least two of the following features can be combined: user information features, user behavior features, user device fingerprint features, and user social graph features, to obtain the target fused user features.
[0045] For example, in an insurance scenario, Li Si is a new energy vehicle exclusive customer of the auto insurance business system. Li Si's target user features in the unified format, namely user information features (name Li Si, age 32, gender male, etc.), user behavior features (such as auto insurance products clicked in the past 7 days), user device fingerprint features (such as abc123), and user social graph features (such as Li Si's contact relationships), are concatenated to obtain the final target fused user feature vector.
[0046] In this embodiment, by fusing at least two of the following features—user information features, user behavior features, user device fingerprint features, and user social graph features—the target fused user features are obtained. This allows for the capture of the user's complete initial features, and even in the absence of strong identity identifiers, accurate user association can still be achieved based on behavior, device fingerprints, or social graphs.
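As a minimal sketch of step S103, the fusion can be read as plain concatenation of the per-modality vectors, which matches the splicing described in the Li Si example. The function name `fuse_user_features` is hypothetical, and the application may also cover other fusion operators.

```python
def fuse_user_features(*modal_vectors):
    """Concatenate at least two per-modality feature vectors into one fused vector."""
    if len(modal_vectors) < 2:
        raise ValueError("at least two feature modalities are required")
    fused = []
    for vec in modal_vectors:
        fused.extend(vec)
    return fused
```

Because every modality is optional beyond the first two, a user with no usable device fingerprint can still be represented by, for example, information and behavior features alone.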
[0047] Referring to Figure 2, in some embodiments, step S104 includes, but is not limited to, steps S201 to S204:
Step S201: For any identity category, perform associated user feature aggregation on the target fused user features to obtain user aggregation features.
[0048] Step S202: Perform feature interference processing on the target fused user features to obtain interference fused features.
[0049] Step S203: Perform feature weighting on the user aggregation features and interference fusion features to obtain enhanced fusion features.
[0050] Step S204: Perform identity feature fusion on the enhanced fusion features of any two identity categories to obtain multiple identity fusion features.
[0051] In step S201 of some embodiments, specifically, the user aggregation feature refers to the graph embedding vector representation of the target object and its associated users within the same business system.
[0052] Specifically, the target fused user features can be compressed into a fused embedding vector of the target dimension (e.g., 128 dimensions) using a shared encoder (e.g., a two-layer MLP with ReLU activation function). Based on identity categories (e.g., high-risk driving customers in a car insurance system), a user relationship graph is constructed with the target object as nodes and user behavior similarity and device overlap as edge weights. Through the GraphSAGE aggregation mechanism of GNN (Graph Neural Network), the fused features of the target object are concatenated with the features of adjacent users to obtain a graph embedding vector containing related user features.
[0053] For example, in the insurance scenario, a car insurance claims association network graph centered on the target object Zhang San can be constructed for the identity category of high-risk driving customers. The edge weights are jointly determined by factors such as the frequency and number of accidents of Zhang San and related accident users. Furthermore, the graph embedding vector containing the features of users related to Zhang San's accidents can be extracted from the car insurance claims association network graph through GNN.
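The aggregation in step S201 can be sketched as a single GraphSAGE-style mean aggregation step that concatenates a node's own features with the edge-weighted mean of its neighbors' features. This is a simplified illustration: a trained GraphSAGE layer would additionally apply a learned weight matrix and a nonlinearity, and the data layout (dictionaries keyed by user) is an assumption.

```python
def sage_aggregate(features, edges, node):
    """Concatenate a node's features with the weighted mean of its neighbors' features.

    features: dict mapping user id -> list of floats (fused embedding)
    edges:    dict mapping user id -> dict of neighbor id -> edge weight
    """
    self_vec = features[node]
    neighbors = edges.get(node, {})
    total_weight = sum(neighbors.values())
    dim = len(self_vec)
    if total_weight == 0:
        # Isolated node: no neighbor information to aggregate.
        neigh_vec = [0.0] * dim
    else:
        neigh_vec = [
            sum(w * features[n][d] for n, w in neighbors.items()) / total_weight
            for d in range(dim)
        ]
    return self_vec + neigh_vec
```

With edge weights derived from behavior similarity or device overlap, closely related users contribute more to the neighbor part of the resulting embedding.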
[0054] Referring to Figure 3, in some embodiments, step S202 includes, but is not limited to, steps S301 to S303:
Step S301: Randomly discard the target fused user features to obtain missing fused features.
[0055] Step S302: Add noise to the target fused user features to obtain noise fused features.
[0056] Step S303: Determine the interference fusion features based on the missing fusion features and the noise fusion features.
[0057] In step S301 of some embodiments, specifically, missing fusion features refer to user feature representations generated by randomly discarding some feature dimensions to simulate data missingness.
[0058] Specifically, a CL (Contrastive Learning) network can be used to randomly select a portion of the feature dimensions in the target fusion user feature vector according to a preset feature dropout rate (e.g., 20%), and force the selected feature values to be zero while retaining the original values of the remaining dimensions to generate missing fusion features.
[0059] For example, in an insurance scenario, for a target user with multiple features such as device fingerprint, login frequency, and transfer amount, random discarding may set the "device fingerprint" feature value to zero while retaining the login frequency and transfer amount features, simulating the real situation in which device information is temporarily missing after a user changes devices.
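The random discarding of step S301 can be sketched as zeroing a fixed fraction of feature dimensions. The function name `drop_features` and the optional seeded generator are illustrative assumptions; the 20% rate follows the example rate given in the text.

```python
import random


def drop_features(vector, drop_rate=0.2, rng=None):
    """Zero out a random subset of dimensions and keep the rest unchanged."""
    if rng is None:
        rng = random.Random()
    n_drop = round(len(vector) * drop_rate)
    dropped = set(rng.sample(range(len(vector)), n_drop))
    return [0.0 if i in dropped else v for i, v in enumerate(vector)]
```

Passing a seeded `random.Random` instance makes the augmentation reproducible, which is convenient when comparing training runs.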
[0060] In step S302 of some embodiments, specifically, the noise fusion feature refers to the user feature representation generated by temporally perturbing the target fused user features.
[0061] Specifically, since the target fusion user features are time-series based features, a sliding perturbation can be applied to the time-series sequence of the target fusion user features through a CL network to simulate behavioral fluctuations and generate noisy fusion features.
[0062] For example, a sliding window perturbation can be applied to the temporal characteristics of the accident frequency of Zhang San, a high-risk driving customer: the sequence of Zhang San's accident frequency over the past six months, [3,5,4,6,2,4,5], can be randomly adjusted to [5,4,6,2,4,5,3] to simulate the natural fluctuations in Zhang San's accident behavior.
[0063] In step S303 of some embodiments, specifically, the interference fusion feature refers to the feature vector representation, learned through contrastive learning, that is robust to both feature missingness and noise.
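The numbers in the example above are consistent with a cyclic left shift of the sequence by one position. A minimal sketch under that reading follows; the source does not state the exact perturbation rule, so `cyclic_shift` is only one possible interpretation.

```python
def cyclic_shift(sequence, shift=1):
    """Rotate a time series to the left by `shift` positions."""
    if not sequence:
        return []
    k = shift % len(sequence)
    return sequence[k:] + sequence[:k]


print(cyclic_shift([3, 5, 4, 6, 2, 4, 5], 1))  # [5, 4, 6, 2, 4, 5, 3]
```

In practice the shift amount could be drawn at random for each training sample so that the model sees several temporally perturbed views of the same user.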
[0064] Specifically, the interference contrast loss between missing fusion features and noisy fusion features can be calculated using a contrastive loss function of the following InfoNCE-style form:
$$L = -\log \frac{\exp\left(\mathrm{sim}(z_i^{m}, z_j^{n})/\tau\right)}{\exp\left(\mathrm{sim}(z_i^{m}, z_j^{n})/\tau\right) + \sum_{k} \exp\left(\mathrm{sim}(z_i^{m}, z_k^{n})/\tau\right)}$$
[0065] Where $L$ represents the interference contrast loss, $z_i^{m}$ denotes the i-th missing fusion feature, $z_j^{n}$ denotes the j-th noise fusion feature, $z_k^{n}$ denotes the noise fusion features of other users, $\mathrm{sim}(z_i^{m}, z_j^{n})$ denotes the similarity between the missing fusion feature and the noise fusion feature, $\mathrm{sim}(z_i^{m}, z_k^{n})$ denotes the similarity between the missing fusion feature and the noise fusion features of other users, and $\tau$ denotes the temperature parameter.
[0066] Furthermore, the objective of this loss function is to maximize the similarity of the positive sample pair formed by the missing fusion feature and the noise fusion feature of the same user, while minimizing the similarity between the target user and the features of all other users. In this way, the missing fusion features and the noise fusion features of the same user are mapped to nearby points in the feature space, yielding an essential user feature representation that is not affected by interference.
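The contrastive objective described here, with a positive pair per user, other users' features as negatives, and a temperature parameter, matches the general shape of an InfoNCE-style loss. The sketch below assumes cosine similarity and averages the loss over users; both choices, and the function names, are illustrative assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def interference_contrast_loss(missing, noisy, temperature=0.1):
    """Average InfoNCE-style loss; missing[i] and noisy[i] are two views of user i."""
    losses = []
    for i, m in enumerate(missing):
        logits = [cosine(m, n) / temperature for n in noisy]
        # Log-sum-exp with max subtraction for numerical stability.
        max_logit = max(logits)
        log_denominator = max_logit + math.log(
            sum(math.exp(l - max_logit) for l in logits)
        )
        # -log(exp(positive) / sum(exp(all))) for user i.
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)
```

The loss is small when each user's two views are closer to each other than to any other user's view, and grows when views of different users are confused.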
[0067] By using steps S301 to S303, we can not only avoid over-reliance on a few key features in the target fused user features, but also comprehensively consider all available information of the fused user features. This also ensures that even in the case of data interference, the essential information of the fused user features can still be extracted, providing accurate feature data support for subsequent identity fusion.
[0068] In step S203 of some embodiments, specifically, the enhanced fusion feature is a feature vector representation that integrates user aggregation features and interference fusion features.
[0069] Specifically, appropriate weights can be assigned to the user aggregation features output by GNN and the interference fusion features output by CL, and the user aggregation features and interference fusion features can be weighted together to obtain the enhanced fusion features.
[0070] For example, if the weight of the user aggregation feature is 0.6 and the weight of the interference fusion feature is 0.4, then the enhanced fusion feature is the sum of the user aggregation feature × 0.6 and the interference fusion feature × 0.4.
[0071] Referring to Figure 4, in some embodiments, step S204 includes, but is not limited to, steps S401 to S405:
Step S401: Encrypt the enhanced fusion feature of any identity category to obtain the encrypted fusion feature.
[0072] Step S402: Share the encrypted fusion feature to obtain the encrypted sharing feature.
[0073] Step S403: Aggregate the encrypted sharing features of any identity category to obtain the encrypted aggregate feature.
[0074] Step S404: Decrypt the encrypted aggregate feature of any identity category to obtain the decrypted aggregate feature.
[0075] Step S405: Weight the decrypted aggregate features of any two identity categories to obtain the multi-identity fusion feature.
[0076] In step S401 of some embodiments, specifically, the encrypted fusion feature refers to the enhanced fusion feature represented in ciphertext form.
[0077] Specifically, Gaussian noise conforming to (ε,δ) differential privacy requirements can be added to the enhanced fusion feature to achieve privacy protection of the feature. Then, using the key of each business system, the noise-added enhanced fusion feature is homomorphically encrypted using the Paillier homomorphic encryption algorithm to generate an encrypted fusion feature in ciphertext form. Here, ε represents the differential privacy strength, and δ represents the differential privacy defect degree (the probability that the ε guarantee does not hold).
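[0078] Furthermore, the variance of the Gaussian noise can be calculated using the following formula:
$$\sigma^2 = \frac{2\ln(1.25/\delta)\,\Delta f^2}{\varepsilon^2}$$
[0079] where $\sigma^2$ represents the variance of the Gaussian noise, $\Delta f$ represents the global sensitivity, $\varepsilon$ represents the differential privacy strength, and $\delta$ represents the differential privacy defect degree.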
[0080] In differential privacy, global sensitivity refers to the maximum change in the output value of a query when a record is added or deleted from the dataset.
[0081] For example, in an insurance scenario, for the annual number of accidents of Zhang San, a high-risk driving customer, if the global sensitivity is 1, the differential privacy strength is 0.1 (representing the strength of privacy protection), and the differential privacy defect is 0.0001 (representing the reliability of privacy protection), Gaussian noise can be determined using the above formula.
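The parameter values in this example can be worked through with a short Python sketch. It assumes the classic Gaussian-mechanism calibration σ = Δf · sqrt(2 ln(1.25/δ)) / ε, which is the standard bound for 0 < ε < 1; the function names `gaussian_noise_std` and `add_gaussian_noise` are hypothetical.

```python
import math
import random


def gaussian_noise_std(sensitivity, epsilon, delta):
    """Standard deviation of Gaussian noise for (epsilon, delta)-DP.

    Assumed calibration: sigma = sensitivity * sqrt(2 * ln(1.25 / delta)) / epsilon.
    """
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


def add_gaussian_noise(feature, sigma, rng=random):
    """Add independent zero-mean Gaussian noise to every feature component."""
    return [value + rng.gauss(0.0, sigma) for value in feature]


# Values from the example: sensitivity 1, strength 0.1, defect degree 0.0001.
sigma = gaussian_noise_std(sensitivity=1.0, epsilon=0.1, delta=0.0001)
print(round(sigma, 2))  # about 43.44, i.e. a variance of roughly 1887

noisy_feature = add_gaussian_noise([4.0, 2.0, 7.0], sigma)
```

With such a small ε the noise is large relative to an annual accident count, which illustrates the trade-off between privacy strength and feature fidelity.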
[0082] In step S402 of some embodiments, specifically, the encrypted sharing feature refers to the encrypted fusion feature that the business system corresponding to each identity category sends to the other participants (i.e., other business systems).
[0083] For example, in an insurance scenario, in a federated learning system composed of multiple parties such as auto insurance, accident insurance, and home insurance, each insurance business system can send its own encrypted fusion features to the server for joint analysis.
[0084] In step S403 of some embodiments, specifically, the encrypted aggregated feature refers to the feature after aggregating the encrypted sharing features from multiple participants (i.e., multiple business systems) in the ciphertext state.
[0085] For example, federated learning can be used to sum the encrypted sharing features shared by all participants without decrypting the encrypted sharing features of each business system, thus obtaining the encrypted aggregate feature.
[0086] For example, in an insurance scenario, the server can add the encrypted shared features from the auto insurance business system, accident insurance business system, and home insurance business system to obtain a new encrypted aggregate feature that represents the sum of the features from the three parties. During this process, the server can never see the original features of any of the participating parties.
[0087] In step S404 of some embodiments, specifically, the decrypted aggregate feature refers to the plaintext aggregate data obtained after an authorized party (one of the participants) decrypts the encrypted aggregate feature using a private key.
[0088] Specifically, authorized parties with decryption privileges (i.e., all participants) can use their private keys to decrypt the encrypted aggregate feature, obtaining the plaintext sum of the encrypted shared features of all participants. The plaintext sum is then divided by the number of all participants to obtain the decrypted aggregate feature.
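Steps S401 to S404 can be illustrated with a toy, pure-Python Paillier scheme. This sketch makes several assumptions that are not stated in the application: all participants encrypt under one shared public key (ciphertexts under different keys cannot be added), features are encoded as fixed-point integers with a hypothetical `SCALE`, and the primes are far too small for real use. All function names are hypothetical; a production system would use a vetted cryptographic library.

```python
import math
import random

P, Q = 2**31 - 1, 2**61 - 1   # toy primes (Mersenne primes), insecure key size
SCALE = 1000                  # hypothetical fixed-point scale for float features


def keygen(p=P, q=Q):
    """Return ((n, g), (lam, mu)) for a Paillier key pair with g = n + 1."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
    g = n + 1
    mu = pow((pow(g, lam, n * n) - 1) // n, -1, n)  # modular inverse, Python 3.8+
    return (n, g), (lam, mu)


def encrypt(pub, value):
    """Encrypt one float as a fixed-point integer modulo n."""
    n, g = pub
    m = round(value * SCALE) % n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)


def add_ciphertexts(pub, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    n, _ = pub
    return (c1 * c2) % (n * n)


def decrypt(pub, priv, c):
    """Decrypt and map the result back to a signed float."""
    n, _ = pub
    lam, mu = priv
    m = (((pow(c, lam, n * n) - 1) // n) * mu) % n
    if m > n // 2:   # values above n/2 encode negative numbers
        m -= n
    return m / SCALE


def aggregate_and_average(pub, priv, encrypted_vectors):
    """Sum the shared ciphertext vectors (S403), then decrypt and average (S404)."""
    total = encrypted_vectors[0]
    for vec in encrypted_vectors[1:]:
        total = [add_ciphertexts(pub, a, b) for a, b in zip(total, vec)]
    return [decrypt(pub, priv, c) / len(encrypted_vectors) for c in total]


pub, priv = keygen()
plain_vectors = [[0.2, -1.5], [0.4, 0.5], [0.6, 1.0]]   # auto, accident, home
shared = [[encrypt(pub, x) for x in vec] for vec in plain_vectors]  # S401, S402
averaged = aggregate_and_average(pub, priv, shared)
```

The aggregating server only ever handles the values in `shared` and their products, so it never sees any participant's plaintext features.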
[0089] In step S405 of some embodiments, specifically, the multi-identity fusion feature refers to the fusion of the identity vector representations of the target object on different business systems.
[0090] For example, suppose Zhang San's user identity category in the auto insurance business system is a high-risk driving customer with a corresponding weight of 0.6, his user identity category in the accident insurance business system is a group accident insurance corporate customer with a corresponding weight of 0.3, and his user identity category in the home insurance business system is a smart home device associated customer with a corresponding weight of 0.1. By weighting and combining Zhang San's user features as a high-risk driving customer, a group accident insurance corporate customer, and a smart home device associated customer, Zhang San's multi-identity fusion feature can be obtained.
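The weighting in this example can be sketched as a weighted sum of per-category feature vectors. The vectors below are made-up illustrative values, and the function name `fuse_identity_features` and the category keys are hypothetical; weights are normalized so they sum to 1, which is an assumption.

```python
def fuse_identity_features(features_by_category, weights_by_category):
    """Weighted sum of the decrypted aggregate features of several identity categories."""
    dims = {len(vector) for vector in features_by_category.values()}
    if len(dims) != 1:
        raise ValueError("all feature vectors must have the same dimension")
    dim = dims.pop()
    total_weight = sum(weights_by_category[c] for c in features_by_category)
    fused = [0.0] * dim
    for category, vector in features_by_category.items():
        weight = weights_by_category[category] / total_weight  # normalize (assumed)
        for i, value in enumerate(vector):
            fused[i] += weight * value
    return fused


features = {
    "auto_high_risk_driver": [1.0, 0.0],        # illustrative values only
    "accident_group_corporate": [0.0, 1.0],
    "home_smart_device": [1.0, 1.0],
}
weights = {
    "auto_high_risk_driver": 0.6,
    "accident_group_corporate": 0.3,
    "home_smart_device": 0.1,
}
multi_identity_feature = fuse_identity_features(features, weights)
# Approximately [0.7, 0.4]: 0.6*1 + 0.1*1 and 0.3*1 + 0.1*1.
```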
[0091] Through steps S401 to S405, by means of feature encryption, encrypted feature sharing, encrypted feature aggregation, feature decryption, and identity feature weighting, a comprehensive feature vector reflecting the multiple identities of the target object can be generated. This not only ensures the security and privacy of cross-system data fusion, but also comprehensively reflects the identity feature information of the target object in different business systems, providing comprehensive user identity information for subsequent identity clustering, thereby helping to improve the accuracy of subsequent user identity merging.
[0092] Through steps S201 to S204, by aggregating associated user features, it is possible to further capture other user features associated with the target object under the same identity category. By using feature interference processing, it is ensured that even when data noise and missing features occur, the essential user features can still be captured. Combined with feature weighting, important user features are highlighted and the influence of noise is suppressed. Enhanced fusion features from any two identity categories are fused to obtain multi-identity fusion features that can comprehensively reflect the user's multiple identity information. This enables the effective capture of associated user features of different identity categories even when faced with missing strong user identity identifiers or cross-system behavioral differences, which helps to improve the accuracy of subsequent user identity merging.
[0093] Referring to Figure 5, in some embodiments, step S105 includes, but is not limited to, steps S501 to S505:
Step S501: Determine the participating identity categories of the target object and the identity category users of the participating identity categories based on the multi-identity fusion features.
[0094] Step S502: Calculate the similarity between identity category users to obtain the identity feature similarity.
[0095] Step S503: Obtain the nearest neighbor users of the identity category users, and calculate the similarity between the identity category users and the nearest neighbor users to obtain the nearest neighbor identity similarity.
[0096] Step S504: Divide the identity category users into identity clusters based on the identity feature similarity and the nearest neighbor identity similarity to obtain the target identity cluster.
[0097] Step S505: Perform identity data fusion on the identity category users according to the target identity cluster to obtain the identity merged profile data.
[0098] In step S501 of some embodiments, specifically, the identity category user refers to a user who has similar characteristics to the target object in different business systems and needs to be associated with it.
[0099] For example, in an insurance scenario, when it is necessary to confirm that customer Zhang San in the auto insurance business system and Zhang San in the accident insurance business system are the same person, the participating identity categories can be determined based on the multi-identity fusion characteristics. These categories are Zhang San's high-risk driving customer category in the auto insurance business system and Zhang San's group accident insurance corporate customer category in the accident insurance business system. Furthermore, all users in the auto insurance and accident insurance systems who have similar characteristics to Zhang San belong to the identity category users.
[0100] In step S502 of some embodiments, specifically, identity feature similarity refers to the distance between users of different identity categories in the multi-identity fusion feature space.
[0101] Specifically, the Euclidean distance between users of different identity categories in the multi-identity fusion feature space can be calculated, and the similarity of identity features can be determined based on the Euclidean distance. The closer the distance, the higher the similarity of identity features.
[0102] For example, the similarity between Zhang San, a high-risk driving customer in the auto insurance system, and Zhang San, a group accident insurance corporate customer in the accident insurance system, can be determined by calculating the Euclidean distance between their multiple identity fusion features.
[0103] In step S503 of some embodiments, specifically, the nearest neighbor user refers to the user closest to the identity category user, and the nearest neighbor identity similarity refers to the distance between the identity category user and the nearest neighbor user in the multi-identity fusion feature space.
[0104] Specifically, the similarity of nearest neighbor identities can be determined by calculating the Euclidean distance between users of different identity categories and their nearest neighbor users.
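Steps S502 and S503 can be sketched in Python as follows. The feature vectors and user keys are made-up illustrative values, and the mapping from distance to a similarity score, 1 / (1 + d), is an assumption; the text only states that a smaller distance means a higher similarity.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_to_similarity(distance):
    """Map a distance to (0, 1]; closer points get higher similarity (assumed form)."""
    return 1.0 / (1.0 + distance)


def nearest_neighbor(user_id, features):
    """Return (neighbor_id, distance) of the closest other user in feature space."""
    candidates = (
        (euclidean(features[user_id], vector), other)
        for other, vector in features.items()
        if other != user_id
    )
    distance, neighbor = min(candidates)
    return neighbor, distance


fusion_features = {
    "auto:zhang_san": [0.70, 0.40],       # illustrative values only
    "accident:zhang_san": [0.72, 0.38],
    "auto:li_si": [0.10, 0.90],
}
pair_distance = euclidean(
    fusion_features["auto:zhang_san"], fusion_features["accident:zhang_san"]
)
identity_feature_similarity = distance_to_similarity(pair_distance)
neighbor_id, neighbor_distance = nearest_neighbor("auto:zhang_san", fusion_features)
nearest_neighbor_similarity = distance_to_similarity(neighbor_distance)
```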
[0105] In step S504 of some embodiments, specifically, the target identity cluster refers to user groups formed by hierarchical density clustering.
[0106] Specifically, the HDBSCAN (Hierarchical Density-Based Spatial Clustering of Applications with Noise) algorithm can be used to treat the multi-identity fusion features corresponding to all user identity categories as multiple data points, and connect these multiple data points into a minimum spanning tree (MST). Each node represents the multi-identity fusion feature of a user, and the edge weights can be identity feature similarity and nearest neighbor identity similarity.
[0107] Specifically, the minimum spanning tree can be pruned based on a stability metric to determine the target identity cluster containing the cluster ID, cluster center vector, and a list of associated user IDs; where the stability metric can be the maximum distance range that a cluster can have before splitting into smaller subclusters.
[0108] For example, in an insurance scenario, suppose the multi-identity fusion features of Zhang San as a high-risk driving customer under auto insurance, a group accident insurance corporate customer under accident insurance, and a smart home device associated customer under home insurance are tightly connected in the tree structure, and this connection forms a stable cluster. A target identity cluster is then generated that includes a unified cluster ID (Zhang San), a cluster center vector (i.e., Zhang San's multi-identity fusion feature), and a list of associated user IDs (Zhang San as a high-risk driving customer under auto insurance, as a group accident insurance corporate customer under accident insurance, and as a smart home device associated customer under home insurance), so as to determine that the accounts in the three different insurance business systems actually belong to the same individual, Zhang San.
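The tree construction described in step S504 can be sketched in pure Python. The sketch assumes the usual HDBSCAN edge weight, the mutual reachability distance max(core(a), core(b), d(a, b)), as the way pairwise distance and nearest neighbor distance are combined. It also replaces stability-based pruning with a fixed edge-weight cut, which is a deliberate simplification; a full implementation (for example `sklearn.cluster.HDBSCAN` in scikit-learn 1.3 or later) builds a condensed tree and selects clusters by stability. All names and data values are hypothetical.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def core_distances(points, k=1):
    """Distance from each point to its k-th nearest neighbor."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(euclidean(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return result


def mutual_reachability_mst(points, k=1):
    """Prim's algorithm; returns MST edges as (weight, i, j)."""
    core = core_distances(points, k)

    def weight(a, b):
        return max(core[a], core[b], euclidean(points[a], points[b]))

    in_tree, edges = {0}, []
    while len(in_tree) < len(points):
        w, a, b = min(
            (weight(a, b), a, b)
            for a in in_tree
            for b in range(len(points))
            if b not in in_tree
        )
        edges.append((w, a, b))
        in_tree.add(b)
    return edges


def cut_into_clusters(n_points, edges, max_edge_weight):
    """Keep edges up to the threshold and label connected components (union-find)."""
    parent = list(range(n_points))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for w, a, b in edges:
        if w <= max_edge_weight:
            parent[find(a)] = find(b)
    return [find(i) for i in range(n_points)]


# Three accounts assumed to belong to one person, two to another (made-up values).
points = [[0.70, 0.40], [0.72, 0.38], [0.71, 0.41], [0.10, 0.90], [0.12, 0.88]]
mst_edges = mutual_reachability_mst(points)
labels = cut_into_clusters(len(points), mst_edges, max_edge_weight=0.1)
```

Accounts that share a label form one candidate identity cluster; the cluster center can then be taken as the mean of the member vectors.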
[0109] Referring to Figure 6, in some embodiments, step S505 includes, but is not limited to, steps S601 to S603:
Step S601: Determine the matching identity users of the target object based on the target identity cluster.
[0110] Step S602: Perform identity matching detection on the matching identity users according to the preset identity matching rules to obtain identity matching detection data.
[0111] Step S603: Merge the identity data of the target object based on the identity matching detection data to obtain the identity merged profile data.
[0112] In step S601 of some embodiments, specifically, matching identity users refers to a set of users belonging to the same user from different business systems.
[0113] For example, if it is found that Zhang San, a high-risk driving customer in auto insurance, and Zhang San, a group accident insurance corporate customer in accident insurance, belong to the same target identity cluster, then Zhang San the auto insurance customer and Zhang San the accident insurance customer constitute matching identity users.
[0114] In step S602 of some embodiments, specifically, the identity matching rule is a set of predefined business logic judgment conditions. The identity matching rule includes strong identifier matching rules and weak feature verification rules, which are used to verify the strength of user association between different identity categories.
[0115] Specifically, the identity matching detection data is the result of verifying whether the identity of the user matches according to the identity matching rules.
[0116] For example, strong identifier matching rules (such as ID card number and mobile phone number) can be used to check whether the ID card numbers of car insurance customer Zhang San and accident insurance customer Zhang San are the same. If they are the same, the matching identity users match. If they are not the same, the matching identity users do not match, and weak feature verification rules (such as name, email, permanent address, etc.) are needed to assist in further matching the identity users.
[0117] Referring to Figure 7, in some embodiments, the identity matching detection data indicates either that the matching identity users match or that the matching identity users do not match. Step S603 includes, but is not limited to, steps S701 to S705:
Step S701: If the matching identity users match, merge the target user data of the matching identity users to obtain the identity merged profile data.
[0118] Step S702: If the matching identity users do not match, perform similarity calculation on the matching identity users to obtain the matching user similarity.
[0119] Step S703: Calculate the rule score for the matching identity users according to the identity matching rules to obtain the matching rule score.
[0120] Step S704: Calculate the matching confidence for the matching identity users based on the matching user similarity and the matching rule score.
[0121] Step S705: Based on the matching confidence, merge the target user data of the matching identity users to obtain the identity merged profile data.
[0122] In step S701 of some embodiments, specifically, identity merging profile data refers to a unified view that integrates complete information about the user across all business systems.
[0123] For example, if it is detected that the ID numbers of car insurance customer Zhang San and home insurance customer Zhang San are the same, then after confirming that car insurance customer Zhang San and home insurance customer Zhang San are the same person, the system will combine Zhang San's vehicle information, historical accident records, driving behavior score in the car insurance system, and Zhang San's residential address, property insured value, security equipment related data, device fingerprints and usage behavior trajectory on the home insurance APP in the home insurance system to form a complete profile of Zhang San that includes data from both the car insurance and home insurance systems.
[0124] In step S702 of some embodiments, specifically, matching user similarity refers to the spatial distance between matching identity users.
[0125] Specifically, the embedding similarity of the matching identity users within the target identity cluster can be calculated from the output of the HDBSCAN clustering process, the temporal behavior similarity of the matching identity users can be calculated using DTW (Dynamic Time Warping), and the user information similarity of the matching identity users can then be calculated. The embedding similarity, temporal behavior similarity, and user information similarity are weighted and summed to obtain the matching user similarity.
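A minimal sketch of this combination is shown below. It implements the textbook DTW recurrence with absolute difference as the local cost; converting the DTW distance to a similarity with 1 / (1 + d) and the weights (0.5, 0.3, 0.2) are assumptions, since the application does not specify them. The function names are hypothetical.

```python
def dtw_distance(a, b):
    """Dynamic Time Warping distance between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def matching_user_similarity(embedding_sim, behavior_a, behavior_b, info_sim,
                             weights=(0.5, 0.3, 0.2)):
    """Weighted sum of embedding, temporal behavior, and user information similarity."""
    temporal_sim = 1.0 / (1.0 + dtw_distance(behavior_a, behavior_b))  # assumed mapping
    w_embedding, w_temporal, w_info = weights
    return w_embedding * embedding_sim + w_temporal * temporal_sim + w_info * info_sim


# Two login-frequency sequences with the same shape but different length.
same_shape = dtw_distance([1, 2, 3], [1, 2, 2, 3])   # 0.0: warping absorbs the repeat
shifted = dtw_distance([1, 2, 3], [2, 3, 4])         # 2.0
similarity = matching_user_similarity(0.9, [1, 2, 3], [1, 2, 2, 3], 0.8)
```

DTW is useful here because the same person's behavior in two business systems may follow the same pattern at a different pace, which a point-by-point comparison would penalize.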
[0126] In step S703 of some embodiments, specifically, the matching rule score is the score of the matching identity user satisfying the weak feature verification rule in the identity matching rule, wherein the weak feature verification rule is a matching rule that cannot definitively prove the user's identity on its own, but can provide auxiliary or biased evidence.
[0127] For example, weak feature verification rules could include device fingerprint similarity (e.g., the same model of mobile phone), behavioral sequence similarity (e.g., login time habits), and social network overlap (e.g., the same emergency contact).
[0128] Specifically, each weak feature verification rule can be multiplied by its corresponding weak rule weight, and all the weak rules that are satisfied can be summed in a weighted manner to obtain the matching rule score.
[0129] For example, suppose the weak feature verification rules are a device fingerprint match (weight 0.4), similar login time habits (weight 0.3), and the same emergency contact (weight 0.3). For car insurance user Zhang San and home insurance user Zhang San, if it is detected that the two use the same model of mobile phone and have highly consistent login time patterns, but have different emergency contacts, then the matching rule score is R = (1 × 0.4) + (1 × 0.3) + (0 × 0.3) = 0.7.
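The arithmetic of this example can be expressed as a short Python sketch. The rule names in `WEAK_RULE_WEIGHTS` and the function name `matching_rule_score` are hypothetical labels for the three rules and weights given in the example.

```python
WEAK_RULE_WEIGHTS = {
    "device_fingerprint": 0.4,
    "login_time_habit": 0.3,
    "emergency_contact": 0.3,
}


def matching_rule_score(rule_results, weights=WEAK_RULE_WEIGHTS):
    """Sum the weights of the weak feature verification rules that are satisfied."""
    return sum(
        weights[name] * (1 if satisfied else 0)
        for name, satisfied in rule_results.items()
    )


# Same phone model and login habits, different emergency contact.
rule_score = matching_rule_score({
    "device_fingerprint": True,
    "login_time_habit": True,
    "emergency_contact": False,
})
# rule_score is approximately 0.7
```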
[0130] In step S704 of some embodiments, specifically, the matching confidence refers to the degree of identity matching that combines the similarity between matched users and the matching rule score.
[0131] Specifically, the matching confidence score can be obtained by weighted summation of the matching user similarity score and the matching rule score.
[0132] For example, if the matching user similarity is 0.9 with a weight of 0.6, and the matching rule score is 0.7 with a weight of 0.4, then 0.9 × 0.6 + 0.7 × 0.4 = 0.82, which means the matching confidence is 0.82.
[0133] In step S705 of some embodiments, specifically, if the matching confidence is greater than a preset matching threshold (e.g., 0.8), it is determined that the matching identity users match, and the target user data of the matching identity users can be merged to obtain identity merged profile data.
[0134] In one optional embodiment of this application, if the matching confidence is less than the preset matching threshold (e.g., 0.8) and greater than or equal to a preset trust threshold (e.g., 0.5), the reliability of the match between the matching identity users cannot be determined, and manual review is required for confirmation. If the manual review determines that the users match, the target user data of the matching identity users are merged. If the manual review determines that the users do not match, they are treated as different customers and no merging is performed. Furthermore, if the matching confidence is less than the preset trust threshold (e.g., 0.5), the matching identity users are determined to be different customers and no merging is performed.
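The confidence calculation and the threshold logic of steps S704 and S705 can be sketched as follows. The weights and thresholds are the example values from the text; the function names and the returned labels are hypothetical. A confidence exactly equal to the matching threshold is routed to manual review here, which is an assumption because the text does not cover that boundary case.

```python
def matching_confidence(user_similarity, rule_score,
                        similarity_weight=0.6, rule_weight=0.4):
    """Weighted sum of the matching user similarity and the matching rule score."""
    return user_similarity * similarity_weight + rule_score * rule_weight


def merge_decision(confidence, match_threshold=0.8, trust_threshold=0.5):
    """Map a matching confidence to one of three outcomes."""
    if confidence > match_threshold:
        return "merge"
    if confidence >= trust_threshold:
        return "manual_review"   # includes confidence == match_threshold (assumed)
    return "no_merge"


confidence = matching_confidence(user_similarity=0.9, rule_score=0.7)
decision = merge_decision(confidence)
# confidence is approximately 0.82, so decision is "merge"
```

Keeping a middle band for manual review limits automatic merges to high-confidence pairs while still allowing uncertain pairs to be resolved by a reviewer.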
[0135] Through steps S701 to S705, the matching identity rules and matching confidence scores are combined to evaluate whether the matched identity users are truly the same user, thereby achieving identity merging for the same user. This ensures the efficiency of user identity merging under high matching degree conditions, while also ensuring the accuracy of user identity merging by combining matching rule scores and matching confidence scores.
[0136] Through steps S601 to S603, the range of user candidates can be narrowed down by clustering results, and user identity matching verification can be performed by combining identity matching rules, so as to realize the merging of user identities without strong identity identifiers and effectively ensure the reliability of user identity merging.
[0137] Through steps S501 to S505, clustering algorithms and identity matching rules can be combined to achieve the merging of user identities without strong identity identifiers. This effectively solves the problem of fragmented customer information caused by users using different identity identifiers in different business systems, avoids mismatches and omissions in user identities in different business systems, and significantly improves the accuracy of user identity merging.
[0138] In one optional embodiment of this application, after determining the identity merging profile data, target services can be provided to the target object based on the identity merging profile data.
[0139] For example, in an insurance scenario, if the identity merging profile shows that the target is a high-risk driving customer in auto insurance (e.g., 4 accidents in the past two years), a high-risk occupation customer in accident insurance (e.g., high-rise construction workers), and a rental housing customer in home insurance, the target can be identified as a high-risk composite insurance customer. A combined protection plan including higher coverage accident insurance, extended vehicle damage insurance clauses, and tenant liability insurance can be recommended to the target. Risk prevention and control services such as high-rise construction safety training courses and safe driving training courses can also be provided to assist the target. This achieves a service upgrade from passive underwriting to proactive risk management.
[0140] This application first captures a user's complete initial characteristics by extracting and fusing at least two of the following: user information features, user behavior features, user device fingerprint features, and user social graph features. Even in the absence of strong identity identifiers, accurate user association can still be achieved based on behavior, device fingerprints, or social graphs. Second, by fusing the features of target users from any two identity categories, it is possible to further capture user features associated with different identity categories. Finally, by clustering the target object based on the multi-identity fusion features, identity merging profile data of the target object is obtained. This enables the merging of user identities without strong identity identifiers, effectively solving the problem of mismatched or missing user identities in different systems and significantly improving the accuracy of user identity merging.
[0141] Referring to Figure 8, this application also provides a multimodal user identity merging apparatus, which can implement the above-described multimodal user identity merging method. The apparatus includes: a user data acquisition module, used to acquire target user data of the target object, wherein the target object has at least two identity categories, and each identity category is different; a feature extraction module, used to extract features from the target user data to obtain target user features, wherein the target user features include at least two of the following: user information features, user behavior features, user device fingerprint features, and user social graph features; a user feature fusion module, used to fuse at least two of the following: user information features, user behavior features, user device fingerprint features, and user social graph features, to obtain the target fused user features; an identity feature fusion module, used to fuse the features of target users from any two identity categories to obtain multi-identity fusion features; and an identity clustering module, used to cluster the identities of the target object based on the multi-identity fusion features to obtain the identity merging profile data of the target object.
[0142] In some embodiments, the identity feature fusion module can be specifically used to implement: The user feature aggregation submodule is used to perform associated user feature aggregation on target fused user features of any identity category to obtain user aggregate features; The feature interference submodule is used to perform feature interference processing on the target fused user features to obtain interference fused features; The feature weighting submodule is used to perform feature weighting on user aggregated features and interference fusion features to obtain enhanced fusion features; The identity feature fusion submodule is used to perform identity feature fusion on the enhanced fusion features of any two identity categories to obtain multiple identity fusion features.
[0143] Specifically, the identity feature fusion module can be used to implement the above steps S201 to S204, which will not be elaborated here.
[0144] In some embodiments, the feature interference submodule can be specifically used to implement: a random discard processing unit, used to randomly discard the target fused user features to obtain missing fusion features; a noise addition processing unit, used to add noise to the target fused user features to obtain noise fusion features; and an interference processing unit, used to determine interference fusion features based on the missing fusion features and the noise fusion features.
[0145] Specifically, the feature interference submodule can be used to implement the above steps S301 to S303, which will not be elaborated here.
[0146] In some embodiments, the identity feature fusion submodule can be specifically used to implement: The feature encryption unit is used to perform feature encryption on the enhanced fusion feature of any identity category to obtain the encrypted fusion feature; An encryption feature sharing unit is used to share encryption fusion features to obtain encryption shared features. The encrypted feature aggregation unit is used to aggregate the encrypted identity sharing features of any identity category to obtain encrypted aggregated features; The feature decryption unit is used to decrypt the encrypted aggregate features of any identity category to obtain the decrypted aggregate features; The identity feature weighting unit is used to weight the decryption aggregate features of any two identity categories to obtain multi-identity fusion features.
[0147] Specifically, the identity feature fusion submodule can be used to implement steps S401 to S405 above, which will not be elaborated here.
[0148] In some embodiments, the identity clustering module can be specifically used to implement: The identity category user acquisition submodule is used to determine the participating identity category of the target object and the identity category user participating in the identity category based on the multi-identity fusion characteristics; The identity category user similarity calculation submodule is used to calculate the similarity between users of different identity categories to obtain the identity feature similarity. The nearest neighbor user similarity calculation submodule is used to obtain the nearest neighbor users of the identity category user, and to calculate the similarity between the identity category user and the nearest neighbor user to obtain the nearest neighbor identity similarity. The identity clustering submodule is used to divide users of identity categories into identity clusters based on identity feature similarity and nearest neighbor identity similarity, so as to obtain the target identity cluster; The identity data fusion submodule is used to fuse identity data of users of different identity categories based on the target identity cluster, and obtain merged identity profile data.
[0149] Specifically, the identity clustering module can be used to implement steps S501 to S505 above, which will not be elaborated here.
[0150] In some embodiments, the identity data fusion submodule can be specifically used to implement: The matching identity user acquisition unit is used to determine the matching identity user of the target object based on the target identity cluster; The identity matching and detection unit is used to perform identity matching detection on users with matching identities according to preset identity matching rules, and obtain identity matching detection data. The identity data merging unit is used to merge the identity data of the target object based on the identity matching detection data to obtain the identity merged profile data.
[0151] Specifically, the identity data fusion submodule can be used to implement steps S601 to S603 above, which will not be elaborated here.
[0152] In some embodiments, the identity matching detection data indicates either that the matching identity users match or that the matching identity users do not match, and the identity data merging unit can be specifically used to implement: an identity data merging subunit, used to merge the target user data of the matching identity users if the matching identity users match, to obtain the identity merged profile data; a matching user similarity calculation subunit, used to calculate the similarity between the matching identity users if the matching identity users do not match, to obtain the matching user similarity; a rule score calculation subunit, used to calculate the rule score for the matching identity users according to the identity matching rules, to obtain the matching rule score; and a matching confidence calculation subunit, used to calculate the matching confidence of the matching identity users based on the matching user similarity and the matching rule score. The identity data merging subunit is further used to merge the target user data of the matching identity users according to the matching confidence, so as to obtain the identity merged profile data.
[0153] Specifically, the identity data merging unit can be used to implement the above steps S701 to S705, which will not be described in detail here.
[0154] This application also provides an electronic device, which includes a memory and a processor. The memory stores a computer program, and the processor executes the computer program to implement the above-described multimodal user identity merging method. This electronic device can be any smart terminal, including tablet computers, in-vehicle computers, etc.
[0155] Referring to Figure 9, which illustrates the hardware structure of an electronic device according to another embodiment, the electronic device includes: a processor 901, which can be implemented using a general-purpose CPU (Central Processing Unit), microprocessor, application-specific integrated circuit (ASIC), or one or more integrated circuits, and is used to execute relevant programs to implement the technical solutions provided in the embodiments of this application; a memory 902, which can be implemented as a read-only memory (ROM), static storage device, dynamic storage device, or random access memory (RAM), and which can store the processing system and other application programs, wherein, when the technical solutions provided in the embodiments of this specification are implemented through software or firmware, the relevant program code is stored in the memory 902 and is called by the processor 901 to execute the multimodal user identity merging method of the embodiments of this application; an input/output interface 903, used to implement information input and output; a communication interface 904, used to enable communication and interaction between this device and other devices, wherein communication can be achieved through wired means (such as USB, Ethernet cable, etc.) or wireless means (such as mobile network, WIFI, Bluetooth, etc.); and a bus 905, which transmits information between the various components of the device (e.g., processor 901, memory 902, input/output interface 903, and communication interface 904). The processor 901, memory 902, input/output interface 903, and communication interface 904 are connected to each other within the device via the bus 905.
[0156] This application also provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the above-described multimodal user identity merging method.
[0157] Memory, as a non-transitory computer-readable storage medium, can be used to store non-transitory software programs and non-transitory computer-executable programs. Furthermore, memory may include high-speed random access memory, and may also include non-transitory memory, such as at least one disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, memory may optionally include memory remotely located relative to the processor, and these remote memories can be connected to the processor via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
[0158] The multimodal user identity merging method, apparatus, electronic device, and storage medium provided in the embodiments of this application acquire historical insurance data, which includes historical insurance categories, historical insurance task data matching the historical insurance categories, and historical operational behavior data of the historical insurance task data. A pre-trained multimodal user identity merging model is acquired, which includes a feature extraction layer, a behavior analysis layer, a feature filtering layer, a task data prediction layer, and a task warning layer. Target insurance task data is acquired, and the feature extraction layer extracts features from the target insurance task data to obtain target task features and from the historical operational behavior data to obtain historical behavior features. The behavior analysis layer scores the historical behavior features to obtain behavior feature scores. The feature filtering layer filters the historical behavior features based on the behavior feature scores and the target task features to obtain target behavior features. The task data prediction layer predicts on the target task features based on the target behavior features to obtain predicted task data. The task warning layer issues a task achievement warning based on the predicted task data to obtain a target task warning level. By using the multimodal user identity merging model to filter historical behavior features for different insurance tasks, this application can identify the behavior features that influence insurance task prediction, accommodate the changing needs of different insurance tasks, and improve the accuracy of user identity merging. Furthermore, predicting the target insurance task data makes it possible to predict the execution progress of the insurance task. Finally, issuing task achievement warnings based on the predicted task data allows early warnings for insurance tasks that fail to meet the standards, significantly improving the accuracy of multimodal user identity merging.
[0159] The embodiments described in this application are for the purpose of more clearly illustrating the technical solutions of the embodiments of this application, and do not constitute a limitation on the technical solutions provided by the embodiments of this application. As those skilled in the art will know, with the evolution of technology and the emergence of new application scenarios, the technical solutions provided by the embodiments of this application are also applicable to similar technical problems.
[0160] Those skilled in the art will understand that the technical solutions shown in the figures do not constitute a limitation on the embodiments of this application, and that the embodiments may include more or fewer steps than shown, combine certain steps, or use different steps.
[0161] The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs.
[0162] Those skilled in the art will understand that all or some of the steps in the methods disclosed above, as well as the functional modules/units in the systems and devices, can be implemented as software, firmware, hardware, or suitable combinations thereof.
[0163] The terms “first,” “second,” “third,” “fourth,” etc. (if present) in the specification and accompanying drawings of this application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that the embodiments of this application described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms “comprising” and “having,” and any variations thereof, are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or apparatus that comprises a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such processes, methods, products, or apparatus.
[0164] It should be understood that in this application, "at least one (item)" means one or more, and "a plurality of" means two or more. "And/or" is used to describe the relationship between related objects, indicating that three relationships can exist. For example, "A and/or B" can represent three cases: only A exists, only B exists, and both A and B exist simultaneously, where A and B can be singular or plural. The character "/" generally indicates that the preceding and following related objects are in an "or" relationship. "At least one (item) of the following" or similar expressions refer to any combination of these items, including any combination of single or plural items. For example, at least one (item) of a, b, or c can represent: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where each of a, b, and c can be single or multiple.
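The enumeration in this paragraph can be checked mechanically. The following is a minimal sketch that lists every non-empty combination of a, b, and c; the helper name `at_least_one` is hypothetical and is not part of this application, and the sketch does not model the case where a, b, or c is itself plural.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def at_least_one(items: Sequence[str]) -> List[Tuple[str, ...]]:
    """Return every non-empty combination of items, smallest combinations first."""
    return [
        combo
        for size in range(1, len(items) + 1)
        for combo in combinations(items, size)
    ]


cases = at_least_one(["a", "b", "c"])
print([" and ".join(case) for case in cases])
# ['a', 'b', 'c', 'a and b', 'a and c', 'b and c', 'a and b and c']
```

Three items yield seven cases, which agrees with the seven listed in the paragraph.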
[0165] In the several embodiments provided in this application, it should be understood that the disclosed apparatus and methods can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division of the units described above is only a logical functional division, and other division methods are possible in actual implementation. For instance, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
[0166] The units described above as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.
[0167] Furthermore, the functional units in the various embodiments of this application can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or as a software functional unit.
[0168] If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes multiple instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods of the various embodiments of this application. The aforementioned storage medium includes various media capable of storing programs, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.
[0169] The preferred embodiments of the present application have been described above with reference to the accompanying drawings, but this does not limit the scope of the claims of the present application. Any modifications, equivalent substitutions, and improvements made by those skilled in the art without departing from the scope and substance of the embodiments of the present application shall be within the scope of the claims of the present application.
Claims
1. A multimodal user identity merging method, characterized in that the method includes: obtaining target user data of a target object, wherein the target object has at least two identity categories, and each identity category is different; performing feature extraction on the target user data to obtain target user features, wherein the target user features include at least two of the following: user information features, user behavior features, user device fingerprint features, and user social graph features; fusing at least two of the user information features, user behavior features, user device fingerprint features, and user social graph features to obtain target fused user features; performing feature fusion on the target fused user features of any two of the identity categories to obtain multi-identity fusion features; and performing identity clustering on the target object based on the multi-identity fusion features to obtain identity merging profile data of the target object.
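For illustration only, the two fusion steps of claim 1 can be sketched as follows. The claim does not prescribe a fusion operator, so concatenation (across feature types) and an element-wise mean (across identity categories) are assumptions, as are the function names, the category names, and the sample values.

```python
from typing import Dict, List

Vector = List[float]


def fuse_modalities(features: Dict[str, Vector]) -> Vector:
    """Concatenate per-feature-type vectors in sorted key order (assumed operator)."""
    if len(features) < 2:
        raise ValueError("at least two feature types are required")
    fused: Vector = []
    for name in sorted(features):
        fused.extend(features[name])
    return fused


def fuse_identities(first: Vector, second: Vector) -> Vector:
    """Element-wise mean of two identity categories' fused vectors (assumed operator)."""
    if len(first) != len(second):
        raise ValueError("fused vectors must have the same length")
    return [(x + y) / 2.0 for x, y in zip(first, second)]


# Hypothetical features of one target object in two identity categories.
auto_insurance = {"behavior": [0.2, 0.4], "device_fingerprint": [1.0, 0.0]}
accident_insurance = {"behavior": [0.4, 0.6], "device_fingerprint": [1.0, 0.0]}

multi_identity = fuse_identities(
    fuse_modalities(auto_insurance),
    fuse_modalities(accident_insurance),
)
print(multi_identity)
```

The resulting vector would then be the input to the identity clustering step.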
2. The method according to claim 1, characterized in that performing feature fusion on the target fused user features of any two of the identity categories to obtain the multi-identity fusion features includes: for any of the identity categories, performing association and aggregation on the target fused user features to obtain aggregated user features; performing feature interference processing on the target fused user features to obtain interference fusion features; weighting the aggregated user features and the interference fusion features to obtain enhanced fusion features; and performing identity feature fusion on the enhanced fusion features of any two of the identity categories to obtain the multi-identity fusion features.
3. The method according to claim 2, characterized in that performing identity feature fusion on the enhanced fusion features of any two of the identity categories to obtain the multi-identity fusion features includes: encrypting the enhanced fusion features of any of the identity categories to obtain encrypted fusion features; sharing the encrypted fusion features to obtain encrypted sharing features; aggregating the encrypted sharing features of any of the identity categories to obtain encrypted aggregation features; decrypting the encrypted aggregation features of any of the identity categories to obtain decrypted aggregation features; and weighting the decrypted aggregation features of any two of the identity categories to obtain the multi-identity fusion features.
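The encrypt, share, aggregate, decrypt, and weight sequence of claim 3 can be illustrated with a toy additive-masking scheme over fixed-point integers. The claim does not name an encryption scheme, so the masking approach, the constants `MODULUS` and `SCALE`, all function names, and the sample vectors are assumptions; the sketch is not secure and is not the claimed implementation.

```python
import secrets
from typing import List

MODULUS = 2**61 - 1  # assumed modulus for the toy masking scheme
SCALE = 10**6        # assumed fixed-point scale (six decimal places)


def encode(vec: List[float]) -> List[int]:
    return [round(v * SCALE) % MODULUS for v in vec]


def decode(vec: List[int]) -> List[float]:
    # Residues above MODULUS // 2 represent negative numbers.
    return [((v - MODULUS) if v > MODULUS // 2 else v) / SCALE for v in vec]


def encrypt(vec: List[float], mask: List[int]) -> List[int]:
    return [(e + m) % MODULUS for e, m in zip(encode(vec), mask)]


def aggregate(columns: List[List[int]]) -> List[int]:
    return [sum(col) % MODULUS for col in zip(*columns)]


def decrypt(aggregated: List[int], masks: List[List[int]]) -> List[float]:
    total_mask = aggregate(masks)
    return decode([(a - m) % MODULUS for a, m in zip(aggregated, total_mask)])


def secure_sum(vectors: List[List[float]]) -> List[float]:
    """Encrypt each vector, share the ciphertexts, aggregate them, then decrypt."""
    masks = [[secrets.randbelow(MODULUS) for _ in v] for v in vectors]
    shared = [encrypt(v, m) for v, m in zip(vectors, masks)]  # only ciphertexts are shared
    return decrypt(aggregate(shared), masks)


def weight(first: List[float], second: List[float], w: float = 0.5) -> List[float]:
    return [w * x + (1.0 - w) * y for x, y in zip(first, second)]


# Hypothetical enhanced fusion features from two records per identity category.
category_a = secure_sum([[0.25, -0.5], [0.75, 1.5]])
category_b = secure_sum([[2.0, 0.0], [1.0, 3.0]])
multi_identity = weight(category_a, category_b)
print(multi_identity)  # [2.0, 2.0]
```

Because the masks cancel at decryption, the aggregator in this sketch only ever handles masked values, while the decrypted aggregate equals the plain sum of the inputs.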
4. The method according to claim 2, characterized in that performing feature interference processing on the target fused user features to obtain the interference fusion features includes: randomly discarding features from the target fused user features to obtain missing fusion features; performing noise processing on the target fused user features to obtain noise fusion features; and determining the interference fusion features based on the missing fusion features and the noise fusion features.
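As a non-limiting sketch, the interference processing of claim 4 and the weighting step of claim 2 might look like the following. Zeroing dropped elements, Gaussian noise, averaging the two perturbed vectors, and the parameters `drop_prob`, `sigma`, and `weight` are all assumptions, since the claims do not fix them.

```python
import random
from typing import List

Vector = List[float]


def random_drop(vec: Vector, drop_prob: float, rng: random.Random) -> Vector:
    """Zero each element with probability drop_prob (missing fusion features)."""
    return [0.0 if rng.random() < drop_prob else v for v in vec]


def add_noise(vec: Vector, sigma: float, rng: random.Random) -> Vector:
    """Add zero-mean Gaussian noise to each element (noise fusion features)."""
    return [v + rng.gauss(0.0, sigma) for v in vec]


def interference_features(vec: Vector, drop_prob: float, sigma: float,
                          rng: random.Random) -> Vector:
    """Average the dropped and noisy variants (assumed combination rule)."""
    missing = random_drop(vec, drop_prob, rng)
    noisy = add_noise(vec, sigma, rng)
    return [(m + n) / 2.0 for m, n in zip(missing, noisy)]


def enhance(aggregated: Vector, interfered: Vector, weight: float = 0.7) -> Vector:
    """Weighted sum of aggregated and interference features (claim 2 weighting)."""
    return [weight * a + (1.0 - weight) * i for a, i in zip(aggregated, interfered)]


rng = random.Random(42)  # seeded only to make the example repeatable
fused = [0.3, 0.5, 1.0, 0.0]
interfered = interference_features(fused, drop_prob=0.2, sigma=0.05, rng=rng)
enhanced = enhance(fused, interfered)
print(enhanced)
```

Here the unperturbed vector stands in for the aggregated user features; in practice those would come from the association and aggregation step.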
5. The method according to claim 1, characterized in that performing identity clustering on the target object based on the multi-identity fusion features to obtain the identity merging profile data of the target object includes: determining, based on the multi-identity fusion features, the participating identity categories of the target object and the identity category users of the participating identity categories; performing similarity calculation on the identity category users to obtain identity feature similarity; obtaining nearest neighbor users of the identity category users, and calculating the similarity between the identity category users and the nearest neighbor users to obtain nearest neighbor identity similarity; dividing the identity category users into identity clusters based on the identity feature similarity and the nearest neighbor identity similarity to obtain a target identity cluster; and fusing the identity data of the identity category users based on the target identity cluster to obtain the identity merging profile data.
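One possible reading of the clustering in claim 5 is sketched below. Cosine similarity, the mean similarity to the k nearest neighbors, the two thresholds, and connected-component grouping via union-find are assumptions chosen for illustration; the user keys and vectors are hypothetical.

```python
import math
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_identities(users: Dict[str, List[float]], k: int = 1,
                       sim_threshold: float = 0.9,
                       nn_threshold: float = 0.9) -> List[List[str]]:
    ids = sorted(users)
    # Identity feature similarity for every ordered pair of users.
    sim = {(a, b): cosine(users[a], users[b]) for a in ids for b in ids if a != b}
    # Nearest neighbor identity similarity: mean similarity to the k nearest users.
    nn_sim = {}
    for a in ids:
        top = sorted((sim[(a, b)] for b in ids if b != a), reverse=True)[:k]
        nn_sim[a] = sum(top) / len(top) if top else 0.0

    parent = {a: a for a in ids}

    def find(a: str) -> str:
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    # Link two users only when both similarity criteria pass.
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if sim[(a, b)] >= sim_threshold and min(nn_sim[a], nn_sim[b]) >= nn_threshold:
                parent[find(b)] = find(a)

    clusters: Dict[str, List[str]] = {}
    for a in ids:
        clusters.setdefault(find(a), []).append(a)
    return sorted(clusters.values())


users = {
    "auto:zhang": [1.0, 0.1, 0.0],
    "accident:zhang": [0.9, 0.2, 0.0],
    "auto:li": [0.0, 0.1, 1.0],
}
print(cluster_identities(users))
# [['accident:zhang', 'auto:zhang'], ['auto:li']]
```

Requiring both criteria keeps a user whose nearest neighbors are all dissimilar from being pulled into a cluster by a single borderline pairwise match.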
6. The method according to claim 5, characterized in that fusing the identity data of the identity category users based on the target identity cluster to obtain the identity merging profile data includes: determining matching identity users of the target object based on the target identity cluster; performing identity matching detection on the matching identity users according to preset identity matching rules to obtain identity matching detection data; and merging the identity data of the target object based on the identity matching detection data to obtain the identity merging profile data.
7. The method according to claim 6, characterized in that the identity matching detection data indicates either that the matching identity users match or that the matching identity users do not match; and merging the identity data of the target object based on the identity matching detection data to obtain the identity merging profile data includes: if the matching identity users match, merging the target user data of the matching identity users to obtain the identity merging profile data; if the matching identity users do not match, performing similarity calculation on the matching identity users to obtain matching user similarity; performing rule score calculation on the matching identity users according to the identity matching rules to obtain a matching rule score; calculating a matching confidence based on the matching user similarity and the matching rule score; and merging the target user data of the matching identity users based on the matching confidence to obtain the identity merging profile data.
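The decision flow of claim 7 can be sketched as follows. The claim does not specify how the rule score or the confidence is computed, so the weighted field-agreement score, the linear combination with `alpha`, the merge `threshold`, and the record fields are assumptions.

```python
from typing import Dict, Optional

Record = Dict[str, Optional[str]]


def rule_score(a: Record, b: Record, rules: Dict[str, float]) -> float:
    """Weighted fraction of rule fields on which both records agree."""
    total = sum(rules.values())
    hit = sum(
        weight
        for field, weight in rules.items()
        if a.get(field) is not None and a.get(field) == b.get(field)
    )
    return hit / total if total else 0.0


def matching_confidence(similarity: float, score: float, alpha: float = 0.6) -> float:
    """Linear blend of feature similarity and rule score (assumed formula)."""
    return alpha * similarity + (1.0 - alpha) * score


def should_merge(a: Record, b: Record, similarity: float,
                 rules: Dict[str, float], threshold: float = 0.75) -> bool:
    score = rule_score(a, b, rules)
    if score == 1.0:
        return True  # every rule matches: merge directly
    return matching_confidence(similarity, score) >= threshold


# Hypothetical records for the same person in two business systems.
rules = {"name": 1.0, "address": 1.0}
auto_user = {"name": "Zhang San", "address": "Road A", "phone": "old-number"}
accident_user = {"name": "Zhang San", "address": "Road B", "phone": "new-number"}

print(should_merge(auto_user, accident_user, similarity=0.95, rules=rules))  # True
print(should_merge(auto_user, accident_user, similarity=0.50, rules=rules))  # False
```

With a similarity of 0.95 the confidence is 0.6 × 0.95 + 0.4 × 0.5 = 0.77, which clears the assumed threshold of 0.75; with a similarity of 0.50 it is 0.50 and the records stay separate.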
8. A multimodal user identity merging device, characterized in that the device includes: a user data acquisition module, used to acquire target user data of a target object, wherein the target object has at least two identity categories, and each identity category is different; a feature extraction module, used to perform feature extraction on the target user data to obtain target user features, wherein the target user features include at least two of the following: user information features, user behavior features, user device fingerprint features, and user social graph features; a user feature fusion module, used to fuse at least two of the user information features, user behavior features, user device fingerprint features, and user social graph features to obtain target fused user features; an identity feature fusion module, used to perform feature fusion on the target fused user features of any two of the identity categories to obtain multi-identity fusion features; and an identity clustering module, used to perform identity clustering on the target object based on the multi-identity fusion features to obtain identity merging profile data of the target object.
9. An electronic device, characterized in that the electronic device includes a memory and a processor, the memory storing a computer program, and the processor executing the computer program to implement the multimodal user identity merging method according to any one of claims 1 to 7.
10. A computer-readable storage medium storing a computer program, characterized in that, when the computer program is executed by a processor, it implements the multimodal user identity merging method according to any one of claims 1 to 7.