Instant messaging data processing methods, devices, electronic equipment and storage media

By combining the desensitization processing of instant messaging data with the annotation information generation network, the problem of low accuracy in automatic image recognition in existing technologies is solved, and efficient identification and accurate annotation of abnormal images are achieved.

CN117542050B · Active · Publication Date: 2026-06-30 · TENCENT CLOUD COMPUTING (BEIJING) CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
TENCENT CLOUD COMPUTING (BEIJING) CO LTD
Filing Date
2022-08-02
Publication Date
2026-06-30


Abstract

This invention discloses an instant messaging data processing method, apparatus, electronic device, and storage medium. The method includes: acquiring instant messaging data to be processed; performing de-identification processing on the instant messaging data to be processed, and extracting data features from the de-identified instant messaging data to obtain de-identified data features; inputting the de-identified data features into a labeling information generation network to generate labeling results corresponding to the image to be processed in the instant messaging data, the labeling results including a first labeling score for each candidate labeling word among multiple candidate labeling words; determining a second labeling score for each candidate labeling word based on the distance between the image to be processed and the target nearest neighbor historical labeled image; and selecting at least one target labeling word from the multiple candidate labeling words as labeling information for the image to be processed based on the first and second labeling scores of each candidate labeling word. This invention improves the accuracy of abnormal image recognition in IM scenarios.

Description

Technical Field

[0001] This invention relates to the field of computer technology, and in particular to an instant messaging data processing method, apparatus, electronic device, and storage medium.

Background Technology

[0002] With the development of Internet technology, instant messaging (IM) applications that integrate audio and video conferencing functions are used more and more widely. IM scenarios often involve image transmission, which requires abnormal images (such as vulgar images) to be identified and processed automatically.

[0003] In the process of realizing this invention, the inventors found that the relevant technologies have a low accuracy rate in automatically recognizing images in IM scenarios, and cannot meet the relevant business processing needs for abnormal images in IM scenarios.

Summary of the Invention

[0004] To address the problems of existing technologies, embodiments of the present invention provide an instant messaging data processing method, apparatus, electronic device, and storage medium. The technical solution is as follows:

[0005] In one aspect, an instant messaging data processing method is provided, the method comprising: obtaining instant messaging data to be processed; the instant messaging data to be processed includes an image to be processed, context information associated with the image to be processed, and object attribute information;

[0006] The instant messaging data to be processed is desensitized, and the data features of the desensitized instant messaging data are extracted to obtain the desensitized data features;

[0007] The desensitized data features are input into the annotation information generation network to generate annotation results corresponding to the image to be processed; the annotation results include the first annotation score of each candidate tag word among multiple candidate tag words;

[0008] For each candidate tag word, a second annotation score is determined based on the distance between the image to be processed and the target nearest neighbor historical annotation image; the annotation information of the target nearest neighbor historical annotation image includes the candidate tag word;

[0009] Based on the first and second annotation scores of each candidate tag word, at least one target tag word is selected from the multiple candidate tag words as annotation information for the image to be processed; wherein, the at least one target tag word comes from different tag semantic paths.
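The scoring steps above can be sketched as follows. This is only an illustration under stated assumptions: the text does not specify how the distance is mapped to a second annotation score or how the two scores are combined, so the Euclidean distance, the `1 / (1 + d)` mapping, the weighted sum, and the names `second_scores`, `combine`, and `lam` are all hypothetical.

```python
import numpy as np

def second_scores(query_feat, history, candidates):
    """Second annotation score per candidate tag word.

    history: list of (feature_vector, set_of_tag_words) for historical annotated images.
    For each candidate word, only historical images whose annotation contains
    that word are considered, and the nearest one determines the score.
    """
    scores = {}
    for word in candidates:
        dists = [float(np.linalg.norm(query_feat - feat))
                 for feat, labels in history if word in labels]
        # Assumption: a closer nearest neighbour yields a higher score;
        # the score is 0 when no historical image carries the word.
        scores[word] = 1.0 / (1.0 + min(dists)) if dists else 0.0
    return scores

def combine(first, second, lam=0.5):
    """Assumed combination rule: weighted sum of first and second scores."""
    return {w: lam * first[w] + (1.0 - lam) * second.get(w, 0.0) for w in first}
```

With a query feature at the origin, a historical "cat" image at distance 0 and a historical "dog" image at distance 5, the second scores would be 1.0 and 1/6 respectively, and a word with no matching historical image would score 0.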

[0010] In another aspect, an instant messaging data processing device is provided, the device comprising:

[0011] The first acquisition module is used to acquire instant messaging data to be processed; the instant messaging data to be processed includes an image to be processed, context information associated with the image to be processed, and object attribute information;

[0012] The desensitization processing module is used to desensitize the instant messaging data to be processed, extract the data features of the desensitized instant messaging data to be processed, and obtain the desensitized data features.

[0013] The annotation generation module is used to input the desensitized data features into the annotation information generation network to generate annotation results corresponding to the image to be processed; the annotation results include the first annotation score of each candidate tag word among multiple candidate tag words;

[0014] The annotation score determination module is used to determine a second annotation score for each candidate label word based on the distance between the image to be processed and the target nearest neighbor historical annotation image; the annotation information of the target nearest neighbor historical annotation image includes the candidate label word;

[0015] The annotation information determination module is used to select at least one target tag word from a plurality of candidate tag words as annotation information for the image to be processed, based on the first annotation score and the second annotation score of each candidate tag word; wherein the at least one target tag word comes from different tag semantic paths.

[0016] In one exemplary embodiment, the desensitization processing module includes:

[0017] The equivalence group partitioning module is used to anonymize the instant messaging data to be processed based on the k-anonymity algorithm to obtain multiple equivalence groups.

[0018] The sensitive attribute set determination module is used to extract sensitive attributes from each equivalence group to obtain a sensitive attribute set corresponding to each equivalence group.

[0019] The attribute probability adjustment module is used to determine the original probability of each sensitive attribute in each set of sensitive attributes, and add random noise conforming to a Laplace distribution to each of the original probabilities to obtain the target probability value corresponding to each sensitive attribute in the set of sensitive attributes.

[0020] The equivalence group update module is used to update the sensitive attributes in the corresponding equivalence group based on the target probability value corresponding to each sensitive attribute in each sensitive attribute set, so as to obtain the de-sensitized instant messaging data to be processed.

[0021] In one exemplary embodiment, the apparatus further includes a construction module for constructing the tag semantic path, the construction module comprising:

[0022] The second acquisition module is used to acquire sample instant communication data and corresponding reference annotation information; the sample instant communication data includes sample images, sample context information associated with the sample images, and sample object attribute information, and the reference annotation information includes at least one reference tag word;

[0023] The first determining module is used to determine a set of reference tags based on the reference annotation information corresponding to the instant communication data of each sample;

[0024] The second determining module is used to determine the hierarchical relationship between reference tag words in the reference tag word set based on a preset semantic dictionary;

[0025] The semantic path construction module is used to construct at least one tag semantic path based on the hierarchical relationship between reference tag words in the reference tag word set;

[0026] The path hierarchy weight determination module is used to determine the path hierarchy weight of each path level in each of the aforementioned tag semantic paths.
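The construction described above can be sketched as follows. The preset semantic dictionary is modelled here as a hypothetical `parent_of` mapping (word to its more general parent) that is assumed to be acyclic, and the rule that deeper, more specific levels receive larger weights is an assumption, since the text does not say how the path hierarchy weights are determined.

```python
def build_semantic_paths(tag_words, parent_of):
    """Build general-to-specific chains restricted to the reference tag words."""
    tags = set(tag_words)

    def nearest_tag_parent(word):
        # Walk up the dictionary until an ancestor that is itself a tag word is found.
        p = parent_of.get(word)
        while p is not None and p not in tags:
            p = parent_of.get(p)
        return p

    parent = {w: nearest_tag_parent(w) for w in tags}
    has_child = {p for p in parent.values() if p is not None}
    paths = []
    for leaf in sorted(tags - has_child):
        chain, node = [], leaf
        while node is not None:
            chain.append(node)
            node = parent[node]
        if len(chain) >= 2:  # a lone word has no hierarchy to express
            paths.append(chain[::-1])
    return paths

def level_weights(path):
    """Assumed weighting: level i (1-based, most general first) gets i / sum(1..n)."""
    total = sum(range(1, len(path) + 1))
    return [(i + 1) / total for i in range(len(path))]
```

For the tag words animal, dog, husky, and car, with a dictionary in which husky is a kind of dog and dog is (via mammal) a kind of animal, this yields the single path animal, dog, husky; car forms no path because none of its ancestors is a tag word.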

[0027] In one exemplary embodiment, the annotation information determination module includes:

[0028] The third determining module is used to obtain the target annotation score of each candidate tag word based on the first annotation score and the second annotation score of each candidate tag word;

[0029] The sorting position determination module is used to sort the multiple candidate tag words in descending order based on the target annotation score, and determine the sorting position information of each candidate tag word;

[0030] The fourth determining module is used to determine, based on the at least one tag semantic path, multiple first candidate tag words belonging to the same tag semantic path and the path level of each first candidate tag word in the same tag semantic path;

[0031] The fifth determining module is used to determine the target weight of each first candidate tag word based on the path level weight corresponding to the path level in the same tag semantic path and the sorting position information corresponding to the first candidate tag word.

[0032] The sixth determining module is used to select a target first candidate tag word from the plurality of first candidate tag words based on the target weight, and to use the target first candidate tag word and the second candidate tag word as the at least one target tag word; the second candidate tag word is a candidate tag word other than the first candidate tag word among the plurality of candidate tag words.
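The selection logic above can be sketched as follows. The text states only that the target weight depends on the path level weight and the sorting position, so the formula used here (level weight divided by the 1-based rank) and the function name `select_target_words` are assumptions made for illustration.

```python
def select_target_words(target_scores, paths, path_level_weights):
    """Keep one word per tag semantic path, plus every word that is on no shared path.

    target_scores: dict of candidate tag word to target annotation score.
    paths: list of tag semantic paths (lists of words, most general first).
    path_level_weights: one list of level weights per path, aligned with `paths`.
    """
    ranked = sorted(target_scores, key=target_scores.get, reverse=True)
    rank = {w: i + 1 for i, w in enumerate(ranked)}  # sorting position, 1 = best

    selected, on_path = [], set()
    for path, weights in zip(paths, path_level_weights):
        firsts = [(w, weights[lvl]) for lvl, w in enumerate(path) if w in target_scores]
        if len(firsts) < 2:
            continue  # fewer than two candidates share this path, nothing to deduplicate
        on_path.update(w for w, _ in firsts)
        # Assumed target weight: path level weight divided by sorting position.
        best = max(firsts, key=lambda t: t[1] / rank[t[0]])[0]
        if best not in selected:
            selected.append(best)

    # Second candidate tag words: candidates that are not first candidate tag words.
    selected += [w for w in ranked if w not in on_path]
    return selected
```

Under this sketch, if husky, dog, and animal all appear among the candidates and lie on one path, only one of them is kept, so the final annotation does not repeat the same meaning at several levels of generality.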

[0033] In one exemplary embodiment, the apparatus further includes a training module, the training module comprising:

[0034] The reference annotation vector determination module is used to determine the reference annotation vector corresponding to the sample image based on the reference annotation information corresponding to the sample instant messaging data.

[0035] The sample desensitization module is used to desensitize the sample instant messaging data, extract the data features of the desensitized sample instant messaging data, and obtain the sample desensitized data features.

[0036] The prediction module is used to input the desensitized data features of the sample and the random training noise data into the generative network of the conditional generative adversarial model to predict the annotation information and obtain the prediction annotation result.

[0037] The first loss determination module is used to input the predicted annotation result, the sample desensitized data features, and the reference annotation vector into the discriminant network of the generative adversarial model, and determine the first loss based on the discrimination result of the discriminant network; the discrimination result indicates the probability that the predicted annotation result belongs to the reference annotation information and the degree of matching between the predicted annotation result and the sample image;

[0038] The second loss determination module is used to determine a second loss based on a first extracted feature and a second extracted feature obtained from the discriminant network; the first extracted feature is a feature extracted in the discrimination process corresponding to the prediction labeling result, and the second extracted feature is a feature extracted in the discrimination process corresponding to the reference labeling vector;

[0039] The parameter adjustment module is used to adjust the model parameters of the conditional generative adversarial model based on the first loss and the second loss until a preset training termination condition is met; wherein, the generative network at the end of training serves as the annotation information generation network.
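The two losses described above can be sketched numerically as follows. The text does not give their exact form, so this sketch assumes a standard generator-side adversarial loss for the first loss and a feature-matching style loss for the second; the function names and the weighting factor `beta` are hypothetical, and the networks themselves are not modelled.

```python
import numpy as np

def first_loss(d_prob_fake, eps=1e-12):
    """Assumed adversarial loss: -log D(predicted annotation), averaged over the batch."""
    p = np.clip(np.asarray(d_prob_fake, dtype=float), eps, 1.0)
    return float(-np.mean(np.log(p)))

def second_loss(feat_pred, feat_ref):
    """Assumed feature-matching loss between discriminator features.

    feat_pred: features extracted while discriminating the predicted annotation results.
    feat_ref:  features extracted while discriminating the reference annotation vectors.
    Both have shape (batch, feature_dim); batch means are compared.
    """
    diff = np.mean(feat_pred, axis=0) - np.mean(feat_ref, axis=0)
    return float(np.sum(diff ** 2))

def total_loss(d_prob_fake, feat_pred, feat_ref, beta=1.0):
    """Assumed objective used to adjust the model parameters."""
    return first_loss(d_prob_fake) + beta * second_loss(feat_pred, feat_ref)
```

In an actual training loop these quantities would be computed inside an automatic differentiation framework so that gradients can flow to the generative network; NumPy is used here only to show the arithmetic.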

[0040] In one exemplary implementation, the reference annotation vector determination module includes:

[0041] The vector construction module is used to construct an initial one-hot encoded vector corresponding to the target dimension of the sample image based on the number of reference tag words in the reference tag word set; each dimension of the initial one-hot encoded vector corresponds to a reference tag word;

[0042] The traversal module is used to traverse each dimension of the initial one-hot encoded vector. For the current dimension, the value of the current dimension is determined based on the matching of the reference tag words corresponding to the current dimension and the reference annotation information corresponding to the sample instant messaging data.

[0043] The vector embedding module is used to perform vector embedding based on the one-hot encoded vector of the target dimension obtained at the end of the traversal, so as to obtain the reference annotation vector corresponding to the sample image.

[0044] In one exemplary implementation, the traversal module includes:

[0045] The seventh determining module is used to take a preset value as the current dimension value of the initial one-hot encoding vector when the reference tag word corresponding to the current dimension exists in the reference annotation information corresponding to the sample instant messaging data.

[0046] The eighth determining module is used to determine the current dimension value of the initial one-hot encoding vector based on the co-occurrence of the reference tag word corresponding to the current dimension and the reference annotation information in the sample image set when the reference tag word corresponding to the current dimension does not exist in the reference annotation information corresponding to the sample instant messaging data.
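The traversal above can be sketched as follows. The preset value of 1.0 and the way co-occurrence is turned into a number (the highest conditional co-occurrence rate with any tag word actually present) are assumptions, as the text does not fix either; the subsequent vector embedding step is omitted.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(all_annotations):
    """Count single-word and pairwise occurrences over the sample image set."""
    single, pair = Counter(), Counter()
    for labels in all_annotations:
        labels = set(labels)
        single.update(labels)
        pair.update(frozenset(p) for p in combinations(sorted(labels), 2))
    return single, pair

def reference_vector(vocab, labels, single, pair, preset=1.0):
    """One dimension per reference tag word in `vocab`.

    A word present in the sample's reference annotation gets the preset value.
    Otherwise the value is derived from co-occurrence statistics (assumed rule).
    """
    labels = set(labels)
    vec = []
    for word in vocab:
        if word in labels:
            vec.append(preset)
        else:
            rates = [pair[frozenset((word, r))] / single[r] for r in labels if single[r]]
            vec.append(max(rates, default=0.0))
    return vec
```

For example, if "grass" appears together with "dog" in two of the three samples annotated "dog", a sample annotated only with "dog" would receive 2/3 in the "grass" dimension rather than 0, reflecting that the missing word is plausibly relevant.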

[0047] In another aspect, an electronic device is provided, including a processor and a memory, wherein the memory stores at least one instruction or at least one program, the at least one instruction or the at least one program being loaded and executed by the processor to implement the instant messaging data processing method of any of the above aspects.

[0048] In another aspect, a computer-readable storage medium is provided, wherein at least one instruction or at least one program is stored therein, the at least one instruction or the at least one program being loaded and executed by a processor to implement the instant messaging data processing method as described above.

[0049] In another aspect, a computer program product or computer program is provided, which includes computer instructions stored in a computer-readable storage medium. A processor of an electronic device reads the computer instructions from the computer-readable storage medium and executes the computer instructions, causing the electronic device to perform the instant messaging data processing method of any of the above aspects.

[0050] This invention de-identifies instant messaging data, including images, and extracts data features from the de-identified data to obtain de-identified data features. These features are then input into a labeling information generation network to generate labeling results for the images. The labeling results include a first labeling score for each candidate label, and a second labeling score is determined for each candidate label based on the distance between the image and its nearest neighbor historical labeled image. Based on the first and second labeling scores, at least one target label is selected from the candidate labels as the labeling information for the image. The selected target labels come from different label semantic paths, so the labeling results are corrected by incorporating image similarity and semantically repetitive labels are avoided in the labeling information. This significantly improves the accuracy and efficiency of identifying abnormal images in IM scenarios.

Attached Figure Description

[0051] To more clearly illustrate the technical solutions in the embodiments of the present invention, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0052] Figure 1 is a schematic diagram of an implementation environment provided by an embodiment of the present invention;

[0053] Figure 2 is a flowchart illustrating an instant messaging data processing method provided in an embodiment of the present invention;

[0054] Figure 3 is a schematic diagram of the process for constructing a tag semantic path provided in an embodiment of the present invention;

[0055] Figure 4 is a schematic diagram of the process of selecting at least one target tag word from multiple candidate tag words as annotation information for an image to be processed, provided by an embodiment of the present invention;

[0056] Figure 5 is a flowchart illustrating the process of determining the reference annotation vector corresponding to a sample image, as provided in an embodiment of the present invention;

[0057] Figure 6 is a structural block diagram of an instant messaging data processing device provided in an embodiment of the present invention;

[0058] Figure 7 is a hardware structure block diagram of an electronic device provided in an embodiment of the present invention.

Detailed Implementation

[0059] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0060] It should be noted that the terms "first," "second," etc., in the specification, claims, and accompanying drawings of this invention are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that the embodiments of the invention described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms "comprising" and "having," and any variations thereof, are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or server that comprises a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such processes, methods, products, or devices.

[0061] It is understood that in the specific embodiments of this application, data such as user information are involved. When the above embodiments of this application are applied to specific products or technologies, user permission or consent is required, and the collection, use and processing of related data must comply with the relevant laws, regulations and standards of the relevant countries and regions.

[0062] Figure 1 is a schematic diagram of an implementation environment provided by an embodiment of the present invention. The implementation environment includes a terminal 110 and a server 120, wherein the terminal 110 and the server 120 can communicate through a wired or wireless network connection.

[0063] Terminal 110 includes, but is not limited to, mobile phones, computers, smart voice interaction devices, smart home appliances, vehicle terminals, and aircraft. Terminal 110 is equipped with an instant messaging application (App) that provides audio and video conferencing functions. This application can be a standalone application or a subroutine within an application.

[0064] Server 120 can provide background services for applications in terminal 110. Specifically, this background service can be an image recognition service in instant messaging scenarios, such as identifying vulgar images in instant messaging. Specifically, server 120 can identify images by predicting the tag words of images in instant messaging data. For example, when it predicts that the tag word corresponding to an image contains the tag word A, it determines that the image is an abnormal image. Server 120 can be a standalone physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms.

[0065] In one exemplary embodiment, both terminal 110 and server 120 can be node devices in a blockchain system, capable of sharing acquired and generated information with other node devices in the blockchain system, thus enabling information sharing among multiple node devices. Multiple node devices in the blockchain system can be configured with the same blockchain, which consists of multiple blocks, and adjacent blocks are related, ensuring that any data tampering in any block can be detected by the next block. This prevents data tampering in the blockchain and guarantees the security and reliability of the data in the blockchain.

[0066] The embodiments of the present invention can be applied to various scenarios, including but not limited to cloud technology, artificial intelligence, smart transportation, and assisted driving.

[0067] Cloud technology is a general term for the network technology, information technology, integration technology, management platform technology, and application technology applied under the cloud computing business model. These can form resource pools that are used on demand, flexibly and conveniently. Cloud computing technology will become an important support. The backend services of technical network systems, such as video websites, image websites, and many portal websites, require substantial computing and storage resources. With the rapid development and application of the internet industry, every item may have its own identification mark in the future, which will need to be transmitted to backend systems for logical processing. Data at different levels will be processed separately, and all kinds of industry data will require strong system support, which can only be achieved through cloud computing.

[0068] Artificial intelligence (AI) is the theory, methods, technology, and application systems that use digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to achieve optimal results. In other words, AI is a comprehensive technology within computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a way similar to human intelligence. AI studies the design principles and implementation methods of various intelligent machines, enabling them to possess the functions of perception, reasoning, and decision-making.

[0069] Artificial intelligence (AI) is a comprehensive discipline encompassing a wide range of fields, including both hardware and software technologies. Fundamental AI technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operating / interactive systems, and mechatronics. AI software technologies primarily include computer vision, speech processing, natural language processing, as well as machine learning / deep learning, autonomous driving, and intelligent transportation.

[0070] Computer vision (CV) is a science that studies how to enable machines to "see." More specifically, it refers to machine vision, which uses cameras and computers to replace human eyes in recognizing and measuring targets, and then performs image processing to create images more suitable for human observation or transmission to instruments. As a scientific discipline, computer vision studies related theories and technologies, attempting to build artificial intelligence systems capable of extracting information from images or multidimensional data. Computer vision technologies typically include image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content / behavior recognition, 3D object reconstruction, 3D technology, virtual reality, augmented reality, simultaneous localization and mapping (SLAM), autonomous driving, intelligent transportation, and common biometric recognition technologies such as facial recognition and fingerprint recognition.

[0071] Figure 2 is a flowchart of an instant messaging data processing method provided by an embodiment of the present invention. This method can be applied to the server in Figure 1. It should be noted that this specification provides the operation steps of the method as described in the embodiments or flowcharts, but more or fewer operation steps may be included based on conventional or non-inventive labor. The order of steps listed in the embodiments is only one of many possible execution orders and does not represent the only execution order. In actual system or product execution, the methods shown in the embodiments or drawings can be executed sequentially or in parallel (e.g., in a parallel processor or multi-threaded processing environment). Specifically, as shown in Figure 2, the method may include:

[0072] S201, Obtain instant messaging data to be processed.

[0073] The instant messaging data to be processed includes the image to be processed, the context information associated with the image to be processed, and the object attribute information.

[0074] The object attribute information is the attribute information of the objects in the IM scenario corresponding to the instant messaging data to be processed. For example, if multiple users interact through an audio and video conferencing application, the object attribute information includes the attribute information of those users, such as each user's name, nickname, age, and class.

[0075] S203, perform desensitization processing on the instant messaging data to be processed, extract the data features of the desensitized instant messaging data to be processed, and obtain the desensitized data features.

[0076] Desensitization refers to transforming or modifying sensitive data under given rules and strategies to achieve data de-identification or data transformation. Sensitive data refers to data that, if leaked, may cause serious harm to society or individuals, including personal privacy data.

[0077] In this embodiment of the invention, when desensitizing instant messaging data, the objects to be desensitized in the instant messaging data can be identified first, and desensitization is then performed on those objects. Specifically, in IM scenarios, interactive information such as images and text is, for security reasons, generally obtained as encrypted, non-plaintext data. That is, the image to be processed and its associated context information in the instant messaging data are already encrypted. Therefore, when desensitizing instant messaging data, only the object attribute information needs to be desensitized in order to protect users' sensitive data in the IM scenario.

[0078] In one exemplary embodiment, to ensure complete desensitization of sensitive information and improve the desensitization effect, step S203 above may include the following steps when performing desensitization processing on the instant messaging data to be processed:

[0079] The instant messaging data to be processed is anonymized based on the k-anonymity algorithm to obtain multiple equivalence groups;

[0080] For each equivalence group, the sensitive attributes in the equivalence group are extracted to obtain the sensitive attribute set corresponding to each equivalence group;

[0081] For each set of sensitive attributes, the original probability of each sensitive attribute in the set of sensitive attributes is determined, and random noise conforming to a Laplace distribution is added to each of the original probabilities to obtain the target probability value corresponding to each sensitive attribute in the set of sensitive attributes.

[0082] Based on the target probability value corresponding to each sensitive attribute in each set of sensitive attributes, the sensitive attributes in the corresponding equivalence group are updated to obtain the desensitized instant messaging data to be processed.

[0083] The k-anonymity algorithm publishes data at lower precision through generalization (describing the data in a more general, abstract way) and concealment (not publishing certain data items), so that each record shares the same quasi-identifier attribute values with at least k-1 other records in the data table. All records with the same quasi-identifier attribute values form an equivalence group. A quasi-identifier attribute is one that cannot locate an object on its own as a single column, but that can potentially identify a particular object when combined with information from other columns.
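The grouping idea can be sketched as follows. This is a minimal illustration rather than a full k-anonymity algorithm: the quasi-identifiers (age and class), the generalization rule (age banded by decade), and the choice to suppress groups smaller than k are all assumptions, and the function names are hypothetical.

```python
from collections import defaultdict

def generalize(record):
    """Hypothetical generalization: band the age by decade and keep the class as is."""
    decade = record["age"] // 10 * 10
    return (f"{decade}-{decade + 9}", record["class"])

def equivalence_groups(records, k):
    """Group records by generalized quasi-identifiers and keep groups of size >= k."""
    groups = defaultdict(list)
    for r in records:
        groups[generalize(r)].append(r)
    # Concealment: records whose group is smaller than k are not published.
    return {key: rows for key, rows in groups.items() if len(rows) >= k}
```

With k = 2, three users aged 21, 25, and 29 in class A fall into the group ("20-29", "A"), while a single user aged 34 in class A would be withheld because no other record shares that generalized description.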

[0084] Sensitive attributes refer to information that needs protection, such as disease information and revenue information. In this embodiment of the invention, for each equivalence group $M_i$, the sensitive attributes in $M_i$ are extracted to obtain the sensitive attribute set corresponding to each equivalence group; for example, $S_i$ denotes the sensitive attribute set corresponding to equivalence group $M_i$.

[0085] The original probability of each sensitive attribute in a sensitive attribute set can be obtained statistically. For example, for each sensitive attribute set, the probability of occurrence of each sensitive attribute in that set can be calculated and used as the original probability of that attribute. For example, if $S_i = \{s_1, s_2, s_3\}$, then the probability of occurrence of each sensitive attribute in $S_i$ is 1/3.

[0086] In another example, the original probability of each sensitive attribute in the sensitive attribute set can also be determined based on the correspondence between the sensitive attribute and the preset probability. A lower preset probability indicates a higher sensitivity level, meaning a higher need for desensitization. Therefore, by finding the correspondence between sensitive attributes and preset probabilities, the preset probability corresponding to each sensitive attribute can be determined as the original probability of the corresponding sensitive attribute.

[0087] In the embodiments of the present invention, for each sensitive attribute set, such as $S_i$, random noise $\mathrm{Lap}(\Delta S_i / \varepsilon_i)$ following a Laplace distribution is constructed, where $\Delta S_i$ denotes the sensitivity and $\varepsilon_i$ denotes the privacy budget in ε-differential privacy.

[0088] Further, adding this Laplace-distributed random noise to the original probability of each sensitive attribute in S_i yields the target probability value of each sensitive attribute in S_i. Denoting the target probability value by α_p, it can be expressed by the following formula:

[0089] α_p = α + Lap(ΔS_i / ε_i)

[0090] where α denotes the original probability of a given sensitive attribute in S_i.
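The noise addition α_p = α + Lap(ΔS_i / ε_i) can be sketched with the standard library alone. The function names are hypothetical, and drawing the Laplace sample by inverse-CDF sampling is an implementation choice of this sketch, not something the text prescribes.

```python
import math
import random


def laplace_noise(scale, rng):
    """Draw one zero-mean Laplace sample with the given scale (inverse CDF)."""
    if scale == 0:
        return 0.0
    u = rng.random() - 0.5  # uniform in [-0.5, 0.5)
    # Guard against log(0), which could only occur at u == -0.5.
    magnitude = max(1.0 - 2.0 * abs(u), 1e-300)
    return -scale * math.copysign(1.0, u) * math.log(magnitude)


def target_probabilities(original, sensitivity, epsilon, rng=None):
    """Apply alpha_p = alpha + Lap(sensitivity / epsilon) to each attribute."""
    rng = rng or random.Random()
    scale = sensitivity / epsilon
    return {
        attr: alpha + laplace_noise(scale, rng)
        for attr, alpha in original.items()
    }
```

A smaller privacy budget ε_i gives a larger noise scale and therefore stronger perturbation. The noisy values are not clipped or renormalized here, because the text does not say how values outside the range 0 to 1 are handled.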

[0091] In this embodiment of the invention, after obtaining the target probability value of each sensitive attribute in each sensitive attribute set, the corresponding sensitive attributes in the corresponding equivalence group are updated based on the target probability value, thereby making the sensitive attributes appearing in the updated corresponding equivalence group highly similar, ensuring complete desensitization of sensitive information and improving the desensitization effect.

[0092] In step S203 above, when extracting data features from the anonymized instant messaging data, different feature extraction methods can be used for different modalities. Specifically, for the image to be processed, a pre-trained deep learning model can be used to extract its image features. For example, the pre-trained deep learning model can be a VGG network with a deep convolutional network structure. For text modal data such as context information and object attribute information, character vector embedding and word vector embedding can be performed respectively. Concatenating the character vectors obtained from character vector embedding yields the corresponding character vector features, and concatenating the word vectors obtained from word vector embedding yields the corresponding word vector features. Then, for context information, the character vector features and word vector features are concatenated to obtain context features, and for object attribute information, the character vector features and word vector features are concatenated to obtain object attribute features. Finally, the anonymized data features can be obtained by concatenating image features, context features, and object attribute features.

[0093] S205, the desensitized data features are input into the annotation information generation network to generate the annotation results corresponding to the image to be processed.

[0094] The annotation results include the first annotation score of each candidate tag word among multiple candidate tag words.

[0095] The annotation information generation network is the generative network obtained by training a conditional generative adversarial network on sample instant messaging data and the corresponding reference annotation information. Taking the desensitized data features as input, the trained generative network can generate multiple candidate tag words and a first annotation score for each candidate tag word. The sample instant messaging data includes sample images, sample context information associated with the sample images, and sample object attribute information. The reference annotation information includes at least one reference tag word.

[0096] The training of the annotation information generation network will be described in detail in a later section of this invention.

[0097] S207, for each candidate tag word, determine the second annotation score of the candidate tag word based on the distance between the image to be processed and the target nearest neighbor historical annotation image.

[0098] The annotation information of the target nearest neighbor historical annotated image includes the candidate tag words. The target nearest neighbor historical annotated image is determined from the set of historical annotated images obtained based on historical instant messaging data. Historical annotated images refer to historical images in historical instant messaging data that have been annotated with annotation information.

[0099] In specific implementation, for each candidate tag, target historical labeled images containing that candidate tag can be identified in the historical labeled image set, thus obtaining a target historical labeled image set. Then, the distance between the image to be processed and each target historical labeled image in the target historical labeled image set is determined, and the top preset number of target historical labeled images with the smallest distance are selected as the target nearest neighbor historical labeled images of the image to be processed. The sum of the distances corresponding to these preset number of target nearest neighbor historical labeled images can then be used as the second labeling score of the candidate tag. The preset number can be set according to actual needs, for example, it can be one or more.
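The computation of the second annotation score can be sketched as follows. The text does not name a distance metric, so Euclidean distance over feature vectors is an assumption of this sketch, as are the function name and the `features`/`tags` record layout.

```python
import math


def second_annotation_score(query_features, history, tag, preset_number=1):
    """Sum the smallest distances to historical images annotated with `tag`."""
    distances = sorted(
        math.dist(query_features, item["features"])  # assumed Euclidean
        for item in history
        if tag in item["tags"]
    )
    return sum(distances[:preset_number])


# Hypothetical historical annotated images with two-dimensional features.
history = [
    {"features": [0.0, 0.0], "tags": {"cat"}},
    {"features": [3.0, 4.0], "tags": {"cat", "pet"}},
    {"features": [1.0, 0.0], "tags": {"dog"}},
]
```

If no historical annotated image carries the candidate tag word, this sketch returns 0; the text does not specify that case.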

[0100] S209, based on the first annotation score and the second annotation score of each of the candidate tag words, at least one target tag word is selected from the plurality of candidate tag words as the annotation information for annotating the image to be processed.

[0101] The at least one target tag word comes from different tag semantic paths. Different tag semantic paths indicate different semantic information. The tag words in each tag semantic path indicate the same semantic information. Therefore, the semantic information indicated by the at least one target tag word is different. That is, there will be no semantically duplicated tag words in the annotation information of the image to be processed.

[0102] In a specific implementation, for each candidate tag word, the sum of its first annotation score and second annotation score can be calculated to obtain the target annotation score of that candidate tag word. Then, at least one target tag word is selected from the multiple candidate tag words, in combination with the tag semantic paths and the target annotation scores, as the annotation information for the image to be processed. When calculating the target annotation score, score weights can also be assigned to the first annotation score and the second annotation score, and the target annotation score is then obtained as the weighted sum of the two scores with their corresponding score weights.
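Both variants of the target annotation score fit in one small function. The function and parameter names are hypothetical; default weights of 1 reproduce the plain sum.

```python
def target_annotation_score(first_score, second_score,
                            first_weight=1.0, second_weight=1.0):
    """Weighted sum of the first and second annotation scores."""
    return first_weight * first_score + second_weight * second_score
```

Calling it with only the two scores gives the unweighted sum, while passing explicit weights gives the weighted variant.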

[0103] Based on this, in an exemplary implementation, the method of the embodiment of the present invention further includes constructing the tag semantic paths. As shown in Figure 3, constructing the tag semantic paths may include:

[0104] S301, obtaining sample instant messaging data and corresponding reference annotation information.

[0105] The sample instant messaging data includes sample images, sample context information associated with the sample images, and sample object attribute information, and the reference annotation information includes at least one reference tag word.

[0106] S303, determining a reference tag word set based on the reference annotation information corresponding to each sample instant messaging data.

[0107] Specifically, an initial reference tag word set can be obtained first based on the reference annotation information corresponding to each sample instant messaging data, and then the initial reference tag word set is de-duplicated to obtain the reference tag word set.

[0108] S305, determining the hyponymy relationship between reference tag words in the reference tag word set based on a preset semantic dictionary.

[0109] The preset semantic dictionary is a network-structured dictionary organized by word meaning, which contains the hyponymy relationships between words. For example, "vehicle" is the hypernym of "train", and "train" and "car" are both hyponyms of "vehicle".

[0110] Exemplarily, the preset semantic dictionary can be WordNet. WordNet is a lexical semantic network created by Princeton University, in which nouns, verbs, adjectives, and adverbs are each organized into a network of synonyms. Each synonym set represents a basic semantic concept, and these sets are also connected by various relationships, including hyponymy relationships (verbs, nouns), entailment relationships (verbs), similarity relationships (nouns), member part relationships (nouns), etc. The embodiment of the present invention determines the hyponymy relationship between reference tag words in the reference tag word set based on the hyponymy relationship in WordNet.

[0111] S307, Based on the hierarchical relationship between reference tag words in the reference tag word set, construct at least one tag semantic path.

[0112] In each tag semantic path, the reference tag words become progressively more specific, for example vehicle-train-high-speed train.

[0113] S309, determine the path level weight of each path level in each of the aforementioned tag semantic paths.

[0114] In practice, the path level weights in each tag semantic path can be set according to the rule that the lower the path level in the tag semantic path, the greater the path level weight. This makes it easier for lower-level tags to be selected in subsequent selections, which is more conducive to achieving accurate description of the image and thus improving the accuracy of image recognition.
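Steps S305 to S309 can be sketched as follows. The `hypernym_of` mapping is a hypothetical stand-in for a lookup in the preset semantic dictionary, and assigning weight n to the n-th level is only one possible scheme that satisfies the rule that lower levels receive greater weights.

```python
def build_semantic_paths(tag_words, hypernym_of):
    """Build general-to-specific paths from a child -> hypernym mapping."""
    tags = set(tag_words)
    children = {tag: [] for tag in tags}
    roots = []
    for tag in sorted(tags):
        parent = hypernym_of.get(tag)
        if parent in tags:
            children[parent].append(tag)
        else:
            roots.append(tag)  # no hypernym inside the tag word set

    paths = []

    def walk(tag, prefix):
        path = prefix + [tag]
        if not children[tag]:
            paths.append(path)  # reached the most specific tag word
        for child in children[tag]:
            walk(child, path)

    for root in roots:
        walk(root, [])
    return paths


def path_level_weights(path):
    """Give level n (1 = most general) the weight n, so deeper levels weigh more."""
    return {tag: level for level, tag in enumerate(path, start=1)}


# Hypothetical hyponymy relations, as a dictionary lookup might return them.
hypernym_of = {"train": "vehicle", "car": "vehicle", "high-speed train": "train"}
tag_words = ["vehicle", "train", "car", "high-speed train", "flower"]
```

In this sketch a hypernym with several hyponyms, such as "vehicle", appears in more than one path, and a tag word with no relatives forms a path of length one.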

[0115] In one exemplary implementation, as shown in Figure 4, when selecting at least one target tag word from the multiple candidate tag words as annotation information for the image to be processed based on the first and second annotation scores of each candidate tag word, step S209 above may include:

[0116] S401, based on the first annotation score and the second annotation score of each candidate tag word, the target annotation score of each candidate tag word is obtained.

[0117] The target annotation score for each candidate tag word is the sum of its first annotation score and its second annotation score.

[0118] S403, based on the target annotation score, sort the multiple candidate tag words in descending order to determine the sorting position information of each candidate tag word.

[0119] Specifically, the sorting position information of each candidate tag can be the sorted sequence number, for example, the sequence number of the first one is 1, the sequence number of the second one is 2, and so on.

[0120] S405, based on the at least one tag semantic path, determine a plurality of first candidate tag words belonging to the same tag semantic path and the path level of each first candidate tag word in the same tag semantic path.

[0121] Specifically, when multiple candidate tag words belong to the same tag semantic path, these multiple candidate tag words are the aforementioned multiple first candidate tag words, thus obtaining at least one set of first candidate tag words, and each set of first candidate tag words corresponds to a tag semantic path.

[0122] In this embodiment of the invention, the candidate tag words other than the first candidate tag words are taken as second candidate tag words. It can be understood that the tag semantic path to which a second candidate tag word belongs contains no other candidate tag word.

[0123] S407, based on the path level weight corresponding to the path level of each first candidate tag word in the same tag semantic path and the sorting position information corresponding to the first candidate tag word, determine the target weight of each first candidate tag word.

[0124] In specific implementation, for each first candidate tag in each first candidate tag set, the path level weight corresponding to the first candidate tag can be divided by its corresponding sorting position information, and the result is used as the target weight of the first candidate tag.

[0125] S409, based on the target weight, select a target first candidate tag word from the plurality of first candidate tag words, and use the target first candidate tag word and the second candidate tag word as the at least one target tag word.

[0126] In specific implementation, for each set of first candidate tag words, the target first candidate tag word with the largest target weight can be selected, thereby obtaining at least one target first candidate tag word. This at least one target first candidate tag word corresponds to different tag semantic paths. Then, the at least one target first candidate tag word and the second candidate tag word are merged to obtain at least one target tag word. This at least one target tag word is used as the annotation information for the image to be processed, which greatly improves the accuracy of the annotation of the image to be processed and improves the recognition accuracy.
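Steps S403 to S409 can be sketched as a single selection function. The function name and the input layout (dictionaries keyed by tag word) are hypothetical, and ties in target weight are resolved in favor of the more general tag word, which the text does not address.

```python
def select_target_tags(target_scores, semantic_paths, level_weights):
    """Pick one tag word per shared semantic path plus all unshared ones."""
    # S403: rank candidates by descending target annotation score (1-based).
    ranked = sorted(target_scores, key=target_scores.get, reverse=True)
    position = {tag: index for index, tag in enumerate(ranked, start=1)}

    selected = []
    grouped = set()
    for path in semantic_paths:
        # S405: candidates on the same path are first candidate tag words.
        first_candidates = [tag for tag in path if tag in target_scores]
        if len(first_candidates) < 2:
            continue
        grouped.update(first_candidates)
        # S407: target weight = path level weight / sorting position.
        weights = {
            tag: level_weights[tag] / position[tag] for tag in first_candidates
        }
        # S409: keep the first candidate with the largest target weight.
        best = max(first_candidates, key=weights.get)
        if best not in selected:
            selected.append(best)

    # The remaining candidates are second candidate tag words; all are kept.
    second_candidates = [tag for tag in ranked if tag not in grouped]
    return selected + second_candidates


# Hypothetical target annotation scores, paths, and path level weights.
target_scores = {
    "vehicle": 0.9, "high-speed train": 0.8, "train": 0.7, "flower": 0.6,
}
semantic_paths = [["vehicle", "train", "high-speed train"], ["flower"]]
level_weights = {"vehicle": 1, "train": 2, "high-speed train": 3, "flower": 1}
```

Here "high-speed train" is ranked second but sits at the deepest level, so its target weight 3/2 beats 1/1 for "vehicle" and 2/3 for "train", and it is the only tag word kept from that path.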

[0127] As can be seen from the above technical solutions of the embodiments of the present invention, the embodiments of the present invention correct the annotation results of the images to be processed by combining the similarity between images, and avoid semantically repetitive tag words in the annotation information by combining the semantic path of the tag, which greatly improves the accuracy and efficiency of identifying abnormal images in IM scenarios.

[0128] The following describes how the conditional generative adversarial network is trained on sample instant messaging data to obtain the annotation information generation network. Specifically, the training may include the following steps:

[0129] (1) Based on the reference annotation information corresponding to the sample instant messaging data, determine the reference annotation vector corresponding to the sample image in the sample instant messaging data.

[0130] Specifically, at least one reference tag word in the reference annotation information can be represented as a vector, and then the vector representations can be concatenated to obtain the reference annotation vector corresponding to the reference annotation information.

[0131] In one exemplary implementation, to improve training performance and thus the accuracy of the annotation results generated by the annotation information generation network, the reference annotation vector corresponding to the sample image can be determined as shown in Figure 5, which includes the following steps:

[0132] S501, based on the number of reference tag words in the reference tag word set, construct an initial one-hot encoding vector of the target dimension for the sample image.

[0133] The reference tag word set is obtained based on the reference annotation information corresponding to the instant messaging data of each sample. Each dimension of the initial one-hot encoding vector corresponds to a reference tag word, that is, the target dimension of the initial one-hot encoding vector is consistent with the number of reference tag words.

[0134] S503, traverse each dimension of the initial one-hot encoded vector. For the current dimension, determine the value of the current dimension based on the matching of the reference tag word corresponding to the current dimension with the reference annotation information corresponding to the sample instant messaging data.

[0135] Specifically, when the reference tag word corresponding to the current dimension matches the reference annotation information corresponding to the sample instant messaging data, a preset value (such as value 1) can be used as the value of the current dimension of the corresponding initial one-hot encoding vector; conversely, when the reference tag word corresponding to the current dimension does not match the reference annotation information corresponding to the sample instant messaging data, the value of the current dimension of the corresponding initial one-hot encoding vector is determined based on the correlation between the reference tags.

[0136] In one exemplary implementation, referring again to Figure 5, step S503 above, which determines the value of the current dimension based on how the reference tag word corresponding to the current dimension matches the reference annotation information corresponding to the sample instant messaging data, may include:

[0137] S5031, determine whether there is a reference tag word corresponding to the current dimension in the reference annotation information corresponding to the sample instant messaging data.

[0138] Specifically, if the result of the judgment is that it exists, then step S5033 can be executed; otherwise, if the result of the judgment is that it does not exist, then step S5035 can be executed.

[0139] S5033, the preset value is used as the current dimension value of the initial one-hot encoding vector.

[0140] The preset value can be set according to actual needs, for example, it can be the value 1.

[0141] S5035, based on the co-occurrence of the reference tag word corresponding to the current dimension and the reference annotation information in the sample image set, determine the current dimension value of the initial one-hot encoding vector.

[0142] Co-occurrence refers to the reference tag word corresponding to the current dimension and the reference annotation information of the sample image appearing together in the reference annotation information of some sample image in the sample image set.

[0143] Assume the reference tag word corresponding to the current dimension is m_i, and the reference annotation information corresponding to sample instant messaging data k (which includes sample image k) is M_k = {m_k}, with m_i not in M_k. The value of the current dimension in the initial one-hot encoding vector corresponding to sample image k can then be calculated by a formula from sum(m_i) and sum(m_i, M_k), where sum(m_i) denotes the number of sample images in the sample image set whose reference annotation information includes m_i, and sum(m_i, M_k) denotes the number of sample images in the sample image set whose reference annotation information includes both m_i and M_k.

[0144] In the above implementation, when constructing the one-hot encoding vector corresponding to the sample image, the correlation between reference label words is considered. This effectively improves the accuracy of the one-hot encoding vector while ensuring that no new noise label words are introduced, thereby improving the training effect of the model.
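The value assignment in steps S5031 to S5035 can be sketched as follows. The exact formula is not reproduced in this text, so the ratio sum(m_i, M_k) / sum(m_i) used here is an assumption of this sketch, as are the function name `soft_one_hot` and the use of Python sets for annotation information.

```python
def soft_one_hot(reference_tags, sample_tags, all_sample_tags, preset_value=1.0):
    """
    reference_tags: ordered reference tag words, one per vector dimension
    sample_tags: set of tag words annotated on sample image k (M_k)
    all_sample_tags: one tag-word set per image in the sample image set
    """
    vector = []
    for tag in reference_tags:
        if tag in sample_tags:
            # S5033: the tag word is present in the reference annotation.
            vector.append(preset_value)
            continue
        # S5035: use co-occurrence counts over the sample image set.
        with_tag = sum(1 for tags in all_sample_tags if tag in tags)
        with_both = sum(
            1 for tags in all_sample_tags if tag in tags and sample_tags <= tags
        )
        # ASSUMPTION: ratio sum(m_i, M_k) / sum(m_i); the source formula is missing.
        vector.append(with_both / with_tag if with_tag else 0.0)
    return vector


# Hypothetical sample image set with four annotated images.
reference_tags = ["cat", "pet", "car"]
all_sample_tags = [{"cat"}, {"cat", "pet"}, {"pet"}, {"car"}]
vector = soft_one_hot(reference_tags, {"cat"}, all_sample_tags)
```

For a sample image annotated only with "cat", the dimension for "pet" receives 0.5 because one of the two images tagged "pet" is also tagged "cat", while "car" never co-occurs with "cat" and stays at 0.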

[0145] S505, based on the one-hot encoded vector of the target dimension obtained at the end of the traversal, perform vector embedding to obtain the reference annotation vector corresponding to the sample image.

[0146] In practice, multiple vector embedding methods can be used to embed the one-hot encoded vector of the target dimension, and then the embedded vectors obtained by each vector embedding method can be concatenated to obtain the reference annotation vector corresponding to the sample image.

[0147] For example, a pre-trained language model can be used to embed the one-hot encoded vector of the target dimension corresponding to the sample image. This pre-trained language model can be a bidirectional encoder based on Transformer, such as the BERT model. Alternatively, Word2vec can be used to embed the one-hot encoded vector of the target dimension corresponding to the sample image.

[0148] (2) De-identify the sample instant messaging data, extract the data features of the de-identified sample instant messaging data, and obtain the sample de-identified data features;

[0149] For the specific desensitization processing and the feature extraction on the desensitized sample instant messaging data, refer to the relevant descriptions of the Figure 2 embodiment above, which are not repeated here.

[0150] (3) Input the sample desensitized data features and random training noise data into the generative network of the conditional generative adversarial model to predict annotation information and obtain the predicted annotation result.

[0151] The sample desensitized data features serve as the constraint condition. They are concatenated with the random training noise data and used as the input of the generative network of the conditional generative adversarial model, so that the generative network predicts tag words under this constraint and produces the predicted annotation result.
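The construction of the generative network input can be sketched as a plain concatenation. The function name is hypothetical, and drawing the noise from a standard normal distribution is an assumption, since the text only calls it random training noise data.

```python
import random


def generator_input(condition_features, noise_dim, rng=None):
    """Concatenate the constraint features with freshly drawn noise."""
    rng = rng or random.Random()
    noise = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]  # assumed Gaussian
    return list(condition_features) + noise
```

The constraint features occupy the leading positions of the input and stay fixed for a given sample, while the noise part changes from draw to draw.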

[0152] (4) Input the predicted annotation result, the sample desensitized data features, and the reference annotation vector into the discriminant network of the conditional generative adversarial model, and determine the first loss based on the discrimination result of the discriminant network.

[0153] The discrimination result indicates the probability that the predicted annotation result belongs to the reference annotation information and the degree of matching between the predicted annotation result and the sample image.

[0154] The first loss is the loss function of the conditional generative adversarial model. Specifically, the first loss L_cGAN can be expressed as:

[0155]

[0156] where y denotes the constraint condition, i.e., the sample desensitized data features; x denotes the reference annotation vector; z denotes the random training noise data; G denotes the generative network; and D denotes the discriminant network.
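For orientation, the conventional conditional GAN objective written with the symbols x, y, z, G, and D defined here takes the form below. This is the textbook expression and is given as an assumption about the first loss, not as a transcription of the formula in the filing.

```latex
L_{cGAN} = \mathbb{E}_{x}\left[\log D(x \mid y)\right]
         + \mathbb{E}_{z}\left[\log\left(1 - D\left(G(z \mid y) \mid y\right)\right)\right]
```

In this form the discriminant network is trained to increase the value and the generative network to decrease it, with both conditioned on the same constraint y.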

[0157] (5) Determine the second loss based on the first extracted feature and the second extracted feature obtained from the discriminant network.

[0158] The first extracted feature is the feature extracted for the predicted annotation result during discrimination, and the second extracted feature is the feature extracted for the reference annotation vector during discrimination. In a specific implementation, both extracted features can be taken from the output of one of the last fully connected layers of the discriminant network, for example the penultimate fully connected layer.

[0159] Generating annotation results with a generative adversarial network (GAN) is a sampling process: each iteration of GAN training produces a batch of predicted annotation results S_b = G(z_b), where b denotes one batch, and also samples real annotation information X_b ~ p_d from the real data, where p_d is the distribution of the real data. To learn a mapping G(·): z → p_d from the noise data z to the real annotation information X_b, the generated predicted annotation results S_b and the real annotation information X_b are expected to be sampled from determinantal point processes with the same kernel matrix. That is, the kernel matrix of the determinantal point process from which S_b is sampled is expected to equal the kernel matrix of the determinantal point process from which X_b is sampled. The former can be obtained from the first extracted feature that the discriminant network extracts for the predicted annotation result, and the latter from the second extracted feature that it extracts for the reference annotation vector, where D(·) denotes the corresponding feature extracted by the discriminant network.

[0160] The loss function L_DPP that the determinantal point process introduces into the training of the conditional generative adversarial model is therefore expressed in terms of these two kernel matrices; this is the second loss.
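The idea of comparing two kernel matrices built from discriminant-network features can be sketched as follows. Both the Gram-matrix construction of the kernel and the squared Frobenius gap are assumptions of this sketch: the kernel definition and the L_DPP formula are not reproduced in this text, so the code is a stand-in penalty rather than the second loss itself.

```python
def gram_kernel(features):
    """Kernel matrix K with K[i][j] = dot(features[i], features[j])."""
    return [
        [sum(a * b for a, b in zip(row_i, row_j)) for row_j in features]
        for row_i in features
    ]


def kernel_gap(first_features, second_features):
    """Squared Frobenius distance between the two batch kernel matrices."""
    k_first = gram_kernel(first_features)
    k_second = gram_kernel(second_features)
    return sum(
        (a - b) ** 2
        for row_a, row_b in zip(k_first, k_second)
        for a, b in zip(row_a, row_b)
    )
```

The gap is zero when the features of the predicted batch and the reference batch induce the same kernel matrix, which is the condition the text expects training to approach.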

[0161] (6) Based on the first loss and the second loss, adjust the model parameters of the conditional generative adversarial model until the preset training termination condition is met.

[0162] The generative network at the end of training serves as the annotation information generation network.

[0163] Specifically, the sum of the first loss and the second loss can be calculated to obtain the total loss L = L_cGAN + L_DPP. The model parameters of the conditional generative adversarial model are adjusted based on the total loss until the preset training termination condition is met, and the generative network at the end of training is used as the annotation information generation network in this embodiment of the invention. The preset training termination condition can be either the total loss reaching a preset loss threshold or the number of iterations reaching a preset iteration threshold.
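The total loss and the termination check can be sketched as two small helpers. The names are hypothetical, and reading "reaching a preset loss threshold" as the loss falling to or below the threshold is an assumption of this sketch.

```python
def total_loss(first_loss, second_loss):
    """L = L_cGAN + L_DPP."""
    return first_loss + second_loss


def should_stop(loss_value, iteration, loss_threshold, max_iterations):
    """Stop when either preset training termination condition is met."""
    return loss_value <= loss_threshold or iteration >= max_iterations
```

A training loop would call `should_stop` after each parameter update and keep the generative network from the final iteration.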

[0164] The above implementation introduces a determinantal point process into the training of the conditional generative adversarial model and incorporates the determinantal point process loss into the loss of the conditional generative adversarial model. This allows the trained annotation information generation network to generate multiple different candidate tag words for the same input, thereby improving the diversity of the generated annotation results.

[0165] Corresponding to the instant messaging data processing methods provided in the above embodiments, this embodiment of the invention also provides an instant messaging data processing device. Since the instant messaging data processing device provided in this embodiment of the invention corresponds to the instant messaging data processing methods provided in the above embodiments, the implementation methods of the aforementioned instant messaging data processing methods are also applicable to the instant messaging data processing device provided in this embodiment, and will not be described in detail in this embodiment.

[0166] Figure 6 is a schematic diagram of an instant messaging data processing device according to an embodiment of the present invention. The device has the function of implementing the instant messaging data processing method described in the method embodiments above, and this function can be implemented in hardware or by hardware executing corresponding software. As shown in Figure 6, the instant messaging data processing device 600 may include:

[0167] The first acquisition module 610 is used to acquire instant messaging data to be processed; the instant messaging data to be processed includes an image to be processed, context information associated with the image to be processed, and object attribute information;

[0168] The desensitization processing module 620 is used to desensitize the instant messaging data to be processed, extract the data features of the desensitized instant messaging data to be processed, and obtain the desensitized data features.

[0169] The annotation generation module 630 is used to input the desensitized data features into the annotation information generation network to generate annotation results corresponding to the image to be processed; the annotation results include the first annotation score of each candidate label word among multiple candidate label words;

[0170] The annotation score determination module 640 is used to determine a second annotation score for each candidate tag word based on the distance between the image to be processed and the target nearest neighbor historical annotation image; the annotation information of the target nearest neighbor historical annotation image includes the candidate tag word;

[0171] The annotation information determination module 650 is used to select at least one target tag word from a plurality of candidate tag words as annotation information for the image to be processed, based on the first annotation score and the second annotation score of each candidate tag word; wherein the at least one target tag word comes from different tag semantic paths.

[0172] In one exemplary embodiment, the desensitization processing module 620 includes:

[0173] The equivalence group partitioning module is used to anonymize the instant messaging data to be processed based on the k-anonymity algorithm to obtain multiple equivalence groups.

[0174] The sensitive attribute set determination module is used to extract sensitive attributes from each equivalence group to obtain a sensitive attribute set corresponding to each equivalence group.

[0175] The attribute probability adjustment module is used to determine the original probability of each sensitive attribute in each set of sensitive attributes, and add random noise conforming to a Laplace distribution to each of the original probabilities to obtain the target probability value corresponding to each sensitive attribute in the set of sensitive attributes.

[0176] The equivalence group update module is used to update the sensitive attributes in the corresponding equivalence group based on the target probability value corresponding to each sensitive attribute in each sensitive attribute set, so as to obtain the de-sensitized instant messaging data to be processed.

[0177] In one exemplary embodiment, the apparatus further includes a construction module for constructing the tag semantic path, the construction module comprising:

[0178] The second acquisition module is used to acquire sample instant communication data and corresponding reference annotation information; the sample instant communication data includes sample images, sample context information associated with the sample images, and sample object attribute information, and the reference annotation information includes at least one reference tag word;

[0179] The first determining module is used to determine a set of reference tags based on the reference annotation information corresponding to the instant communication data of each sample;

[0180] The second determining module is used to determine the hierarchical relationship between reference tag words in the reference tag word set based on a preset semantic dictionary;

[0181] The semantic path construction module is used to construct at least one tag semantic path based on the hierarchical relationship between reference tag words in the reference tag word set;

[0182] The path hierarchy weight determination module is used to determine the path hierarchy weight of each path level in each of the aforementioned tag semantic paths.

[0183] In one exemplary embodiment, the annotation information determination module 650 includes:

[0184] The third determining module is used to obtain the target annotation score of each candidate tag word based on the first annotation score and the second annotation score of each candidate tag word;

[0185] The sorting position determination module is used to sort the multiple candidate tag words in descending order based on the target annotation score, and determine the sorting position information of each candidate tag word;

[0186] The fourth determining module is used to determine, based on the at least one tag semantic path, multiple first candidate tag words belonging to the same tag semantic path and the path level of each first candidate tag word in the same tag semantic path;

[0187] The fifth determining module is used to determine the target weight of each first candidate tag word based on the path level weight corresponding to the path level in the same tag semantic path and the sorting position information corresponding to the first candidate tag word.

[0188] The sixth determining module is used to select a target first candidate tag word from the plurality of first candidate tag words based on the target weight, and to use the target first candidate tag word and the second candidate tag word as the at least one target tag word; the second candidate tag word is a candidate tag word other than the first candidate tag word among the plurality of candidate tag words.

[0189] In one exemplary embodiment, the apparatus further includes a training module, the training module comprising:

[0190] The reference annotation vector determination module is used to determine the reference annotation vector corresponding to the sample image based on the reference annotation information corresponding to the sample instant messaging data.

[0191] The sample desensitization module is used to desensitize the sample instant messaging data, extract the data features of the desensitized sample instant messaging data, and obtain the sample desensitized data features.

[0192] The prediction module is used to input the sample desensitized data features and random training noise data into the generative network of the conditional generative adversarial model to predict annotation information and obtain the predicted annotation result.

[0193] The first loss determination module is used to input the predicted annotation result, the sample desensitized data features, and the reference annotation vector into the discriminant network of the conditional generative adversarial model, and determine the first loss based on the discrimination result of the discriminant network; the discrimination result indicates the probability that the predicted annotation result belongs to the reference annotation information and the degree of matching between the predicted annotation result and the sample image;

[0194] The second loss determination module is used to determine a second loss based on a first extracted feature and a second extracted feature obtained from the discriminant network; the first extracted feature is a feature extracted in the discrimination process corresponding to the predicted annotation result, and the second extracted feature is a feature extracted in the discrimination process corresponding to the reference annotation vector;

[0195] The parameter adjustment module is used to adjust the model parameters of the conditional generative adversarial model based on the first loss and the second loss until a preset training termination condition is met; wherein, the generative network at the end of training serves as the annotation information generation network.
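
The two losses used by the parameter adjustment module can be sketched as follows. This is a minimal illustration, assuming that the first loss is a binary cross-entropy term on the probability output by the discriminant network and that the second loss is the mean squared difference between the two extracted feature vectors (a feature-matching style loss). The embodiment does not fix these formulas here, and the names `first_loss`, `second_loss`, `total_generator_loss` and the balancing weight `lam` are hypothetical.

```python
import math
from typing import Sequence


def first_loss(d_prob_on_prediction: float, eps: float = 1e-12) -> float:
    """Generator-side adversarial loss (assumed form): small when the
    discriminant network assigns a high probability to the predicted result."""
    p = min(max(d_prob_on_prediction, eps), 1.0 - eps)  # clamp for log safety
    return -math.log(p)


def second_loss(first_feature: Sequence[float], second_feature: Sequence[float]) -> float:
    """Mean squared difference between the features extracted for the predicted
    annotation result and those extracted for the reference annotation vector."""
    if not first_feature or len(first_feature) != len(second_feature):
        raise ValueError("feature vectors must be non-empty and of equal length")
    squared = sum((a - b) ** 2 for a, b in zip(first_feature, second_feature))
    return squared / len(first_feature)


def total_generator_loss(
    d_prob_on_prediction: float,
    first_feature: Sequence[float],
    second_feature: Sequence[float],
    lam: float = 1.0,
) -> float:
    """Weighted sum of both losses; `lam` is an assumed balancing weight."""
    return first_loss(d_prob_on_prediction) + lam * second_loss(first_feature, second_feature)
```

In an actual training loop, a value of this kind would be minimized with respect to the generative network's parameters, with the discriminant network updated in alternation, until the preset training termination condition is met.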

[0196] In one exemplary implementation, the reference annotation vector determination module includes:

[0197] The vector construction module is used to construct, based on the number of reference tag words in the reference tag word set, an initial one-hot encoded vector of the target dimension corresponding to the sample image; each dimension of the initial one-hot encoded vector corresponds to one reference tag word;

[0198] The traversal module is used to traverse each dimension of the initial one-hot encoded vector. For the current dimension, the value of the current dimension is determined based on the matching of the reference tag words corresponding to the current dimension and the reference annotation information corresponding to the sample instant messaging data.

[0199] The vector embedding module is used to perform vector embedding based on the one-hot encoded vector of the target dimension obtained at the end of the traversal, so as to obtain the reference annotation vector corresponding to the sample image.

[0200] In one exemplary implementation, the traversal module includes:

[0201] The seventh determining module is used to take a preset value as the value of the current dimension of the initial one-hot encoded vector when the reference tag word corresponding to the current dimension exists in the reference annotation information corresponding to the sample instant messaging data.

[0202] The eighth determining module is used to determine the value of the current dimension of the initial one-hot encoded vector, when the reference tag word corresponding to the current dimension does not exist in the reference annotation information corresponding to the sample instant messaging data, based on the co-occurrence in the sample image set of the reference tag word corresponding to the current dimension and the reference annotation information.
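
The traversal performed by the seventh and eighth determining modules can be sketched as follows. This is only an illustration: the preset value of 1.0, the co-occurrence measure (the largest fraction of sample images containing an annotated tag word that also contain the tag word of the current dimension), and every name in the code are assumptions. The subsequent vector embedding step is omitted.

```python
from typing import List, Sequence, Set


def build_label_vector(
    reference_tags: Sequence[str],
    image_tags: Set[str],
    all_image_tags: List[Set[str]],
    preset: float = 1.0,
) -> List[float]:
    """Return one value per reference tag word for a single sample image."""
    vector = []
    for tag in reference_tags:  # traverse each dimension of the vector
        if tag in image_tags:
            # Seventh determining module: the tag word is annotated, use the preset value.
            vector.append(preset)
            continue
        # Eighth determining module: derive a soft value from co-occurrence with
        # the annotated tag words across the sample image set (assumed measure).
        best = 0.0
        for annotated in image_tags:
            with_annotated = [tags for tags in all_image_tags if annotated in tags]
            if not with_annotated:
                continue
            both = sum(1 for tags in with_annotated if tag in tags)
            best = max(best, both / len(with_annotated))
        vector.append(best)
    return vector
```

Under this assumed measure, a tag word that is absent from the reference annotation information but frequently appears together with an annotated tag word receives a value between 0 and the preset value instead of a hard 0.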

[0203] It should be noted that the apparatus provided in the above embodiments is only illustrated by the division of the above functional modules when implementing its functions. In actual applications, the above functions can be assigned to different functional modules as needed, that is, the internal structure of the device can be divided into different functional modules to complete all or part of the functions described above. In addition, the apparatus and method embodiments provided in the above embodiments belong to the same concept, and the specific implementation process can be found in the method embodiments, which will not be repeated here.

[0204] This invention provides an electronic device including a processor and a memory. The memory stores at least one instruction or at least one program, which is loaded and executed by the processor to implement any of the instant messaging data processing methods provided in the above method embodiments.

[0205] Memory can be used to store software programs and modules. The processor executes various functional applications and data processing by running the software programs and modules stored in the memory. Memory can primarily include a program storage area and a data storage area. The program storage area can store the operating system, application programs required for the functions, etc.; the data storage area can store data created based on the use of the device, etc. Furthermore, memory can include high-speed random access memory, and can also include non-volatile memory, such as at least one disk storage device, flash memory device, or other non-volatile solid-state storage device. Accordingly, memory can also include a memory controller to provide the processor with access to the memory.

[0206] The methods and embodiments provided in this invention can be executed in a computer terminal, server, or similar computing device; that is, the aforementioned electronic device may include a computer terminal, server, or similar computing device. Taking running on a server as an example, Figure 7 is a hardware structure block diagram of a server running an instant messaging data processing method provided in an embodiment of the present invention. As shown in Figure 7, the server 700 can vary significantly due to different configurations or performance. It may include one or more central processing units (CPUs) 710 (CPUs 710 may include, but are not limited to, microprocessors such as MCUs or programmable logic devices such as FPGAs), a memory 730 for storing data, and one or more storage media 720 (e.g., one or more mass storage devices) for storing application programs 723 or data 722. The memory 730 and storage media 720 may be temporary or persistent storage. The program stored in the storage media 720 may include one or more modules, and each module may include a series of instruction operations on the server. Furthermore, the CPU 710 may be configured to communicate with the storage media 720 and execute, on the server 700, the series of instruction operations stored in the storage media 720. Server 700 may also include one or more power supplies 760, one or more wired or wireless network interfaces 750, one or more input/output interfaces 740, and/or one or more operating systems 721, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.

[0207] The input / output interface 740 can be used to receive or send data via a network. Specific examples of the network described above may include a wireless network provided by the communication provider of server 700. In one example, the input / output interface 740 includes a network interface controller (NIC), which can connect to other network devices via a base station to communicate with the Internet. In another example, the input / output interface 740 may be a radio frequency (RF) module for wireless communication with the Internet.

[0208] Those skilled in the art will understand that the structure shown in Figure 7 is for illustrative purposes only and does not limit the structure of the aforementioned electronic device. For example, server 700 may also include more or fewer components than those shown in Figure 7, or have a configuration different from that shown in Figure 7.

[0209] Embodiments of the present invention also provide a computer-readable storage medium, which can be disposed in an electronic device to store at least one instruction or at least one program related to implementing an instant messaging data processing method. The at least one instruction or the at least one program is loaded and executed by the processor to implement any of the instant messaging data processing methods provided in the above-described method embodiments.

[0210] Optionally, in this embodiment, the storage medium may include, but is not limited to, various media capable of storing program code, such as USB flash drives, read-only memory (ROM), random access memory (RAM), portable hard drives, magnetic disks, or optical disks.

[0211] It should be noted that the order of the above embodiments of the present invention is merely for descriptive purposes and does not represent the superiority or inferiority of the embodiments. Furthermore, specific embodiments have been described above. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps described in the claims can be performed in a different order than that shown in the embodiments and still achieve the desired result. Additionally, the processes depicted in the drawings do not necessarily require a specific or sequential order to achieve the desired result. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.

[0212] The various embodiments in this specification are described in a progressive manner. For similar or identical parts, the embodiments can be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, the apparatus embodiments are basically similar to the method embodiments, so their description is relatively brief; for relevant parts, refer to the descriptions of the method embodiments.

[0213] Those skilled in the art will understand that all or part of the steps of the above embodiments can be implemented by hardware or by a program instructing related hardware. The program can be stored in a computer-readable storage medium, such as a read-only memory, a disk, or an optical disk.

[0214] The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the protection scope of the present invention.

Claims

1. A method for processing instant messaging data, characterized in that the method includes:
acquiring instant messaging data to be processed, the instant messaging data to be processed including an image to be processed, context information associated with the image to be processed, and object attribute information;
desensitizing the instant messaging data to be processed, and extracting data features of the desensitized instant messaging data to obtain desensitized data features;
inputting the desensitized data features into an annotation information generation network to generate an annotation result corresponding to the image to be processed, the annotation result including a first annotation score of each candidate tag word among multiple candidate tag words;
for each candidate tag word, determining a second annotation score based on the distance between the image to be processed and a target nearest neighbor historical annotation image, wherein the annotation information of the target nearest neighbor historical annotation image includes the candidate tag word;
obtaining a target annotation score of each candidate tag word based on the first annotation score and the second annotation score of the candidate tag word;
sorting the multiple candidate tag words in descending order based on the target annotation scores to determine sorting position information of each candidate tag word;
determining, based on at least one tag semantic path, multiple first candidate tag words belonging to the same tag semantic path and the path level of each first candidate tag word in the same tag semantic path, wherein each tag semantic path is composed of reference tag words having a hierarchical relationship;
determining a target weight of each first candidate tag word based on the path level weight corresponding to the path level of the first candidate tag word in the same tag semantic path and the sorting position information corresponding to the first candidate tag word;
selecting a target first candidate tag word from the multiple first candidate tag words based on the target weight, using the target first candidate tag word and the second candidate tag word as at least one target tag word, and using the at least one target tag word as annotation information for annotating the image to be processed, wherein the second candidate tag word is a candidate tag word other than the first candidate tag words among the multiple candidate tag words.

2. The method according to claim 1, characterized in that desensitizing the instant messaging data to be processed includes:
anonymizing the instant messaging data to be processed based on a k-anonymity algorithm to obtain multiple equivalence groups;
for each equivalence group, extracting the sensitive attributes in the equivalence group to obtain a sensitive attribute set corresponding to the equivalence group;
for each sensitive attribute set, determining the original probability of each sensitive attribute in the sensitive attribute set, and adding random noise conforming to a Laplace distribution to each original probability to obtain a target probability value corresponding to each sensitive attribute in the sensitive attribute set;
updating, based on the target probability value corresponding to each sensitive attribute in each sensitive attribute set, the sensitive attributes in the corresponding equivalence group to obtain the desensitized instant messaging data to be processed.

3. The method according to claim 1 or 2, characterized in that the method further includes constructing the tag semantic path, wherein constructing the tag semantic path includes:
acquiring sample instant messaging data and corresponding reference annotation information, wherein the sample instant messaging data includes a sample image, sample context information associated with the sample image, and sample object attribute information, and the reference annotation information includes at least one reference tag word;
determining a reference tag word set based on the reference annotation information corresponding to each piece of sample instant messaging data;
determining the hierarchical relationship between the reference tag words in the reference tag word set based on a predefined semantic dictionary;
constructing at least one tag semantic path based on the hierarchical relationship between the reference tag words in the reference tag word set;
determining the path level weight of each path level in each tag semantic path.

4. The method according to claim 3, characterized in that the method further includes:
determining a reference annotation vector corresponding to the sample image based on the reference annotation information corresponding to the sample instant messaging data;
desensitizing the sample instant messaging data, and extracting data features of the desensitized sample instant messaging data to obtain sample desensitized data features;
inputting the sample desensitized data features and random training noise data into the generative network of a conditional generative adversarial model to predict annotation information and obtain a predicted annotation result;
inputting the predicted annotation result, the sample desensitized data features, and the reference annotation vector into the discriminant network of the conditional generative adversarial model, and determining a first loss based on the discrimination result of the discriminant network, wherein the discrimination result indicates the probability that the predicted annotation result belongs to the reference annotation information and the degree of matching between the predicted annotation result and the sample image;
determining a second loss based on a first extracted feature and a second extracted feature obtained from the discriminant network, wherein the first extracted feature is a feature extracted in the discrimination process corresponding to the predicted annotation result, and the second extracted feature is a feature extracted in the discrimination process corresponding to the reference annotation vector;
adjusting the model parameters of the conditional generative adversarial model based on the first loss and the second loss until a preset training termination condition is met, wherein the generative network at the end of training serves as the annotation information generation network.

5. The method according to claim 4, characterized in that determining the reference annotation vector corresponding to the sample image based on the reference annotation information corresponding to the sample instant messaging data includes:
constructing, based on the number of reference tag words in the reference tag word set, an initial one-hot encoded vector of a target dimension corresponding to the sample image, wherein each dimension of the initial one-hot encoded vector corresponds to one reference tag word;
traversing each dimension of the initial one-hot encoded vector and, for the current dimension, determining the value of the current dimension based on the matching between the reference tag word corresponding to the current dimension and the reference annotation information corresponding to the sample instant messaging data;
performing vector embedding based on the one-hot encoded vector of the target dimension obtained at the end of the traversal to obtain the reference annotation vector corresponding to the sample image.

6. The method according to claim 5, characterized in that determining the value of the current dimension based on the matching between the reference tag word corresponding to the current dimension and the reference annotation information corresponding to the sample instant messaging data includes:
if the reference tag word corresponding to the current dimension exists in the reference annotation information corresponding to the sample instant messaging data, using a preset value as the value of the current dimension of the initial one-hot encoded vector;
if the reference tag word corresponding to the current dimension does not exist in the reference annotation information corresponding to the sample instant messaging data, determining the value of the current dimension of the initial one-hot encoded vector based on the co-occurrence, in the sample image set, of the reference tag word corresponding to the current dimension and the reference annotation information.

7. An instant messaging data processing device, characterized in that the device includes:
a first acquisition module, used to acquire instant messaging data to be processed, the instant messaging data to be processed including an image to be processed, context information associated with the image to be processed, and object attribute information;
a desensitization processing module, used to desensitize the instant messaging data to be processed, and extract data features of the desensitized instant messaging data to obtain desensitized data features;
an annotation generation module, used to input the desensitized data features into an annotation information generation network to generate an annotation result corresponding to the image to be processed, the annotation result including a first annotation score of each candidate tag word among multiple candidate tag words;
an annotation score determination module, used to determine, for each candidate tag word, a second annotation score based on the distance between the image to be processed and a target nearest neighbor historical annotation image, wherein the annotation information of the target nearest neighbor historical annotation image includes the candidate tag word;
an annotation information determination module, used to: obtain a target annotation score of each candidate tag word based on the first annotation score and the second annotation score of the candidate tag word; sort the multiple candidate tag words in descending order based on the target annotation scores and determine sorting position information of each candidate tag word; determine, based on at least one tag semantic path, multiple first candidate tag words belonging to the same tag semantic path and the path level of each first candidate tag word in the same tag semantic path, wherein each tag semantic path is composed of reference tag words having a hierarchical relationship; determine a target weight of each first candidate tag word based on the path level weight corresponding to the path level of the first candidate tag word in the same tag semantic path and the sorting position information corresponding to the first candidate tag word; and select a target first candidate tag word from the multiple first candidate tag words based on the target weight, use the target first candidate tag word and the second candidate tag word as at least one target tag word, and use the at least one target tag word as annotation information for annotating the image to be processed, wherein the second candidate tag word is a candidate tag word other than the first candidate tag words among the multiple candidate tag words.

8. An electronic device, characterized in that the electronic device includes a processor and a memory, wherein the memory stores at least one instruction or at least one program, the at least one instruction or the at least one program being loaded and executed by the processor to implement the instant messaging data processing method as described in any one of claims 1 to 6.

9. A computer-readable storage medium, characterized in that the computer-readable storage medium stores at least one instruction or at least one program, which is loaded and executed by a processor to implement the instant messaging data processing method as described in any one of claims 1 to 6.

10. A computer program product, characterized in that it includes a computer program that, when executed by a processor, implements the instant messaging data processing method according to any one of claims 1 to 6.