Content recognition method and apparatus
By employing specific identification methods based on content type and business type to obtain anomaly indices, the problem of inaccurate anomaly content identification in existing technologies is solved, achieving refined content identification and improving identification accuracy and stability.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- BEIJING SANKUAI ONLINE TECH CO LTD
- Filing Date
- 2022-07-21
- Publication Date
- 2026-06-23
AI Technical Summary
Existing technologies exhibit discrepancies in content recognition performance across different business types, resulting in inaccurate identification of abnormal content.
Based on content type and business type, different identification methods are used to obtain anomaly indices. Anomaly lexicon, semantic recognition model and image processing technology are used to determine whether the supplied content is abnormal.
It enables refined identification of supply content for different business types, improving the accuracy and stability of abnormal content identification.
Description
Technical Field
[0001] This application relates to the field of Internet technology, and in particular to a content recognition method and device.
Background Technology
[0002] With the development of internet technology, internet platforms can display increasingly diverse content. Therefore, it is necessary to identify whether the content displayed on internet platforms is abnormal in order to address it promptly and purify the online environment. Abnormal content refers to content that has a higher probability of being reported by viewers than a certain threshold; for example, abnormal content may contain vulgar, seductive, terrifying, or disgusting material.
[0003] In related technologies, web crawlers are used to capture images or videos to be identified, and deep learning networks are used to extract feature information from the content. The extracted feature information is then analyzed using a constructed model to determine human posture, detect special sensitive parts, perform comprehensive image and text detection, and video detection, resulting in multi-dimensional detection results. The detection results are then comprehensively evaluated, and the evaluation score is compared with a judgment threshold to determine whether the content is abnormal.
[0004] However, for application scenarios involving multiple business types, the methods in related technologies use the same judgment threshold for every business type. As a result, the recognition effect varies across business types, and the recognition results for abnormal content are not accurate enough.
Summary of the Invention
[0005] This application provides a content recognition method and device that can solve the problems in related technologies.
[0006] Firstly, a content recognition method is provided, the method comprising:
[0007] Obtain multiple supply contents to be identified, wherein the content type of any supply content includes at least one of text, image, audio or video, and each supply content has a corresponding business type;
[0008] Based on the content type of the multiple supply contents, content identification is performed on the multiple supply contents, and an anomaly index corresponding to each of the multiple supply contents is obtained based on the identification results. The anomaly index is used to indicate the probability that the supply contents belong to anomalous content.
[0009] Based on the correspondence between business type and abnormal threshold, the abnormal thresholds corresponding to the multiple supply contents are determined respectively. Based on the relationship between the abnormal index and the corresponding abnormal threshold, it is determined whether any supply content among the multiple supply contents is abnormal content.
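The per-business-type threshold lookup described in the steps above can be sketched as follows. This is a minimal illustration, not the claimed implementation: the threshold values, the `DEFAULT_THRESHOLD` fallback, the `SupplyContent` class, and the rule that "index greater than threshold" means abnormal are all assumptions, since the text only says the decision depends on the relationship between the two values.

```python
from dataclasses import dataclass
from typing import Mapping

# Hypothetical thresholds; the source gives no concrete values.
ANOMALY_THRESHOLDS = {"live_streaming": 0.6, "group_buying": 0.8}
DEFAULT_THRESHOLD = 0.7  # assumed fallback for business types not listed above


@dataclass
class SupplyContent:
    content_id: str
    business_type: str
    anomaly_index: float  # probability-like score in [0, 1]


def is_abnormal(
    item: SupplyContent,
    thresholds: Mapping[str, float] = ANOMALY_THRESHOLDS,
) -> bool:
    """Compare the anomaly index with the threshold of the item's business type."""
    threshold = thresholds.get(item.business_type, DEFAULT_THRESHOLD)
    # Assumption: content is abnormal when its index exceeds the threshold.
    return item.anomaly_index > threshold
```

With these assumed values, the same anomaly index of 0.7 would be flagged for a live streaming item but not for a group buying item, which is the behavior the per-business-type correspondence is meant to allow.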
[0010] In one possible implementation, the plurality of supply contents include a plurality of first texts and a plurality of second texts, wherein the text length of the first texts is less than a length threshold, and the text length of the second texts is not less than the length threshold;
[0011] The step of performing content identification on the multiple supply contents and obtaining anomaly indices corresponding to each of the multiple supply contents based on the identification results includes:
[0012] The plurality of first texts are matched with an abnormal vocabulary library and a high-frequency vocabulary library respectively. Based on the matching results, the abnormal index corresponding to the plurality of first texts is obtained. The abnormal vocabulary library includes a plurality of abnormal words, and the high-frequency vocabulary library includes a plurality of high-frequency words. The high-frequency words are words whose display frequency is higher than the frequency threshold.
[0013] The semantic recognition model is invoked to extract the semantic features corresponding to the multiple second texts respectively, and the anomaly index corresponding to the multiple second texts is obtained based on the semantic features.
[0014] In one possible implementation, the plurality of content items includes a plurality of images; the step of performing content recognition on the plurality of content items and obtaining anomaly indices corresponding to each of the plurality of content items based on the recognition results includes:
[0015] Multiple candidate images are recalled from the multiple images according to the recognition speed, and the image features corresponding to the multiple candidate images are identified respectively. The anomaly index corresponding to the multiple images is obtained according to the recognition results. The probability that the multiple candidate images belong to abnormal content is greater than a first probability threshold, and the recognition speed is greater than a speed threshold.
[0016] In one possible implementation, each image includes corresponding text information, the text information including multiple words, and the abnormal content including the target abnormal image; the method further includes:
[0017] Based on the multiple words corresponding to the multiple images, multiple first target images are recalled from the multiple images according to multiple target anomalous words, wherein the probability of the first target image belonging to the target anomalous image is greater than a second probability threshold; the multiple first target images and the multiple candidate images are fused to obtain multiple second target images, and the target features corresponding to the multiple second target images are extracted respectively;
[0018] The step of identifying the image features corresponding to the plurality of candidate images and obtaining the anomaly index corresponding to the plurality of images based on the identification results includes: identifying the image features corresponding to the plurality of candidate images to obtain the image features corresponding to the plurality of candidate images; and obtaining the anomaly index corresponding to the plurality of images based on the image features and the target features.
[0019] In one possible implementation, the method further includes:
[0020] Obtain the initial anomaly vocabulary and initial semantic recognition model;
[0021] In response to the detection of an abnormal event, abnormal text is extracted from the abnormal event and added to the abnormal text set;
[0022] When the number of texts included in the abnormal text set is not greater than the first number threshold, the updated abnormal words are extracted from the abnormal text set, the initial abnormal word library is updated based on the updated abnormal words, and the updated initial abnormal word library is used as the abnormal word library.
[0023] When the number of texts included in the abnormal text set is greater than a first quantity threshold, the initial semantic recognition model is adjusted based on the abnormal text set, and the adjusted initial semantic recognition model is used as the semantic recognition model.
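The choice between updating the vocabulary library and adjusting the semantic model can be sketched as a simple size check. The function name and the returned action strings are hypothetical; only the comparison against the first quantity threshold comes from the text.

```python
from typing import Sequence


def choose_text_update(abnormal_texts: Sequence[str], first_quantity_threshold: int) -> str:
    """Pick the update path based on how many abnormal texts have been collected."""
    if len(abnormal_texts) > first_quantity_threshold:
        # Enough samples: adjust the initial semantic recognition model.
        return "adjust_semantic_model"
    # Too few samples: extract words and update the abnormal vocabulary library.
    return "update_abnormal_vocabulary"
```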
[0024] In one possible implementation, the step of recalling multiple candidate images from the multiple images according to the recognition speed, recognizing the image features corresponding to the multiple candidate images respectively, and obtaining the anomaly index corresponding to the multiple images respectively based on the recognition results includes:
[0025] The image recognition model is invoked to recall multiple candidate images from the multiple images according to the recognition speed. The image features corresponding to the multiple candidate images are recognized respectively, and the anomaly index corresponding to the multiple images is obtained according to the recognition results.
[0026] Before invoking the image recognition model to recall multiple candidate images from the multiple images according to the recognition speed, the method further includes:
[0027] Obtain the initial image recognition model;
[0028] In response to the detection of an abnormal event, an abnormal image is extracted from the abnormal event and added to an abnormal image set;
[0029] When the number of images in the abnormal image set is greater than the second quantity threshold, the initial image recognition model is adjusted based on the abnormal image set, and the adjusted initial image recognition model is used as the image recognition model.
[0030] In one possible implementation, the method further includes:
[0031] When the number of images included in the abnormal image set is not greater than the second quantity threshold, the image among the plurality of images whose similarity to any image included in the abnormal image set is greater than the similarity threshold is determined as the abnormal content.
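A minimal sketch of this similarity fallback is shown below. The text does not say how similarity is computed; cosine similarity over precomputed feature vectors is an assumption made here for illustration, as are the function names and the default threshold of 0.9.

```python
import math
from typing import Iterable, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def matches_abnormal_set(
    feature: Sequence[float],
    abnormal_features: Iterable[Sequence[float]],
    similarity_threshold: float = 0.9,  # assumed value
) -> bool:
    """True if the image is similar enough to any image in the abnormal set."""
    return any(
        cosine_similarity(feature, ref) > similarity_threshold
        for ref in abnormal_features
    )
```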
[0032] In one possible implementation, the method further includes:
[0033] Receiving multiple feedback messages for a first supply content, each feedback message including a corresponding abnormal label;
[0034] When the number of identical abnormal labels in the multiple feedback messages exceeds a third quantity threshold, the first supply content is determined to be abnormal content.
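The feedback rule above can be sketched by counting identical labels. The function name is hypothetical, and the labels are modeled as plain strings, which is an assumption about how feedback messages are represented.

```python
from collections import Counter
from typing import Sequence


def abnormal_by_feedback(labels: Sequence[str], third_quantity_threshold: int) -> bool:
    """True if any single abnormal label occurs more often than the threshold."""
    if not labels:
        return False
    # most_common(1) returns the label with the highest count and that count.
    _, count = Counter(labels).most_common(1)[0]
    return count > third_quantity_threshold
```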
[0035] In one possible implementation, the plurality of supplied contents are supplied to a target object; before determining the abnormal threshold corresponding to each of the plurality of supplied contents based on the correspondence between business type and abnormal threshold, the method further includes:
[0036] The tolerance index of the target object is determined based on at least one of the target object's historical feedback information or historical collection information, and the tolerance index is used to indicate the probability of providing feedback on the supplied content;
[0037] Based on the tolerance index of the target object, obtain the correspondence between the business type and the abnormal threshold corresponding to the target object.
[0038] Secondly, a content recognition device is provided, the device comprising:
[0039] The first acquisition module is used to acquire multiple supply contents to be identified. The content type of any supply content includes at least one of text, image, audio or video, and each supply content has a corresponding business type.
[0040] The identification module is used to identify the multiple supply contents according to their content types, and to obtain the anomaly index corresponding to each of the multiple supply contents based on the identification results. The anomaly index is used to indicate the probability that the supply contents belong to abnormal content.
[0041] The first determining module is used to determine the abnormal thresholds corresponding to the multiple supply contents based on the correspondence between business types and abnormal thresholds, and to determine whether any one of the multiple supply contents is abnormal content based on the relationship between the abnormal index and the corresponding abnormal threshold.
[0042] In one possible implementation, the plurality of supply contents include a plurality of first texts and a plurality of second texts, wherein the text length of the first texts is less than a length threshold, and the text length of the second texts is not less than the length threshold;
[0043] The identification module is used to match the plurality of first texts with an abnormal vocabulary library and a high-frequency vocabulary library respectively, and obtain the abnormal index corresponding to the plurality of first texts respectively based on the matching results. The abnormal vocabulary library includes a plurality of abnormal words, and the high-frequency vocabulary library includes a plurality of high-frequency words, wherein the high-frequency words are words whose display frequency is higher than a frequency threshold; and calls a semantic recognition model to extract the semantic features corresponding to the plurality of second texts respectively, and obtain the abnormal index corresponding to the plurality of second texts respectively based on the semantic features.
[0044] In one possible implementation, the plurality of supply contents includes a plurality of images;
[0045] The recognition module is used to recall multiple candidate images from the multiple images according to the recognition speed, recognize the image features corresponding to the multiple candidate images respectively, obtain the anomaly index corresponding to the multiple images respectively according to the recognition results, wherein the probability that the multiple candidate images belong to abnormal content is greater than a first probability threshold, and the recognition speed is greater than a speed threshold.
[0046] In one possible implementation, each image includes corresponding text information, the text information including multiple words, and the abnormal content including the target abnormal image;
[0047] The identification module is further configured to recall multiple first target images from the multiple images based on multiple words corresponding to the multiple images respectively, according to multiple target abnormal words, wherein the probability of the first target image belonging to the target abnormal image is greater than a second probability threshold; fuse the multiple first target images and the multiple candidate images to obtain multiple second target images, and extract the target features corresponding to the multiple second target images respectively;
[0048] The recognition module is used to recognize the image features corresponding to the multiple candidate images respectively, and obtain the image features corresponding to the multiple candidate images respectively; and obtain the anomaly index corresponding to the multiple images respectively based on the image features and the target features.
[0049] In one possible implementation, the device further includes:
[0050] The second acquisition module is used to acquire the initial abnormal vocabulary and the initial semantic recognition model;
[0051] The detection module is used to extract abnormal text from the abnormal event in response to the detection of an abnormal event and add the extracted abnormal text to the abnormal text set.
[0052] The update module is used to extract updated abnormal words from the abnormal text set when the number of texts included in the abnormal text set is not greater than a first quantity threshold, update the initial abnormal word library based on the updated abnormal words, and use the updated initial abnormal word library as the abnormal word library.
[0053] An adjustment module is used to adjust the initial semantic recognition model based on the abnormal text set when the number of texts included in the abnormal text set is greater than a first quantity threshold, and to use the adjusted initial semantic recognition model as the semantic recognition model.
[0054] In one possible implementation, the recognition module is used to call an image recognition model to recall multiple candidate images from the multiple images according to the recognition speed, recognize the image features corresponding to the multiple candidate images respectively, and obtain the anomaly index corresponding to the multiple images respectively based on the recognition results;
[0055] The second acquisition module is also used to acquire the initial image recognition model;
[0056] The detection module is also used to extract abnormal images from the abnormal events in response to the detection of abnormal events, and to add the extracted abnormal images to the abnormal image set;
[0057] The adjustment module is further configured to adjust the initial image recognition model based on the abnormal image set when the number of images included in the abnormal image set is greater than a second quantity threshold, and use the adjusted initial image recognition model as the image recognition model.
[0058] In one possible implementation, the updating module is further configured to, when the number of images included in the abnormal image set is not greater than the second quantity threshold, identify an image among the plurality of images whose similarity to any image included in the abnormal image set is greater than a similarity threshold as an abnormal image.
[0059] In one possible implementation, the device further includes:
[0060] The receiving module is used to respond to receiving multiple feedback messages for the first supply content, each feedback message including a corresponding exception label;
[0061] The second determining module is used to determine the first supply content as the abnormal content when the number of identical abnormal labels in the plurality of feedback information is greater than a third quantity threshold.
[0062] In one possible implementation, the plurality of supplied contents are supplied to a target object; the apparatus further includes:
[0063] The third determining module is used to determine the tolerance index of the target object based on at least one of the target object's historical feedback information or historical collection information, wherein the tolerance index is used to indicate the probability of providing feedback on the supplied content;
[0064] The third acquisition module is used to acquire the correspondence between the business type and the abnormal threshold corresponding to the target object based on the tolerance index of the target object.
[0065] Thirdly, a computer device is also provided, the computer device including a processor and a memory, the memory storing at least one piece of program code, the at least one piece of program code being loaded and executed by the processor to enable the computer device to implement any of the above-described content recognition methods.
[0066] Fourthly, a computer-readable storage medium is also provided, wherein at least one piece of program code is stored therein, the at least one piece of program code being loaded and executed by a processor to enable a computer to implement any of the above-described content recognition methods.
[0067] Fifthly, a computer program product or computer program is also provided, the computer program product or computer program including computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, causing the computer device to perform any of the content recognition methods described above.
[0068] The technical solution provided in this application can bring at least the following beneficial effects:
[0069] The technical solution provided in this application offers different content recognition methods for different business types of supply content, enabling flexible acquisition of corresponding anomaly thresholds when recognizing supply content of different business types. The anomaly threshold is used to determine whether the supply content is abnormal. This method achieves refined content recognition by business type, is applicable to application scenarios including multiple business types, ensures the stability of the recognition effect for content recognition of different business types, and effectively improves the accuracy of anomaly content recognition.
Attached Figure Description
[0070] To more clearly illustrate the technical solutions in the embodiments of this application, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0071] Figure 1 is a schematic diagram of the implementation environment of a content recognition method provided in an embodiment of this application;
[0072] Figure 2 is a flowchart of a content recognition method provided in an embodiment of this application;
[0073] Figure 3 is a schematic diagram of obtaining an image training sample set provided in an embodiment of this application;
[0074] Figure 4 is a schematic diagram of an image recognition process provided in an embodiment of this application;
[0075] Figure 5 is a schematic diagram of another image recognition process provided in an embodiment of this application;
[0076] Figure 6 is a schematic diagram of updating text recognition capabilities provided in an embodiment of this application;
[0077] Figure 7 is a schematic diagram of updating image recognition capabilities provided in an embodiment of this application;
[0078] Figure 8 is a schematic diagram of a content recognition device provided in an embodiment of this application;
[0079] Figure 9 is a schematic diagram of the structure of a computer device provided in an embodiment of this application.
Detailed Implementation
[0080] To make the objectives, technical solutions, and advantages of this application clearer, the embodiments of this application will be described in further detail below with reference to the accompanying drawings.
[0081] It should be stated that the information (including but not limited to object device information, object personal information, etc.) and data (including but not limited to data used for analysis, stored data, displayed data, etc.) mentioned in the technical solutions of this application are all collected and processed in compliance with relevant policies and regulations and with the consent of the corresponding subjects. The processed data is used in big data application scenarios and cannot be used to identify any natural person or be specifically associated with their privacy. For example, the multiple supply contents to be identified involved in this application were obtained with the authorization of the object or with full authorization from all parties.
[0082] With the development of internet technology, internet platforms have become a primary means of social interaction and information acquisition. Content provision is the core of these platforms; users essentially interact with the diverse content displayed on them. This content is typically published by content providers, who can be businesses, users, or the platforms themselves. When content providers publish inappropriate content, it can cause user discomfort or complaints. Therefore, content identification is necessary to purify the online environment and improve user experience.
[0083] Figure 1 is a schematic diagram illustrating the implementation environment of a content recognition method provided in an embodiment of this application. As shown in Figure 1, the implementation environment includes terminal 101 and server 102.
[0084] The terminal 101 is equipped with an application that can acquire the supply content to be identified. After the application acquires the supply content to be identified, it sends the acquired supply content to be identified to the server 102. The server 102 can apply the content identification method provided in the embodiments of this application to determine whether the supply content is abnormal content.
[0085] Alternatively, terminal 101 may have an application installed capable of acquiring the supply content to be identified. After acquiring the supply content, the application uses the content identification method provided in this embodiment to identify abnormal content from the supply content. Alternatively, server 102 may have an application installed capable of acquiring the supply content to be identified. After acquiring the supply content, the application uses the content identification method provided in this embodiment to identify abnormal content from the supply content. Optionally, the supply content to be identified includes all content displayed by the application. For example, the content type of the supply content may be text, images, audio, or video.
[0086] Optionally, terminal 101 can be any electronic product that can interact with a user through one or more means such as a keyboard, touchpad, touch screen, remote control, voice interaction or handwriting device, such as PC (Personal Computer), mobile phone, smartphone, PDA (Personal Digital Assistant), wearable device, PPC (Pocket PC), tablet computer, smart car system, smart TV, smart speaker, etc.
[0087] Terminal 101 can refer to one of a plurality of terminals, and this embodiment uses terminal 101 as an example only. Those skilled in the art will know that the number of terminals 101 can be more or less. For example, there may be only one terminal 101, or there may be dozens or hundreds of terminals 101, or even more. This embodiment does not limit the number of terminals or the type of device.
[0088] Server 102 can be a single server, a server cluster consisting of multiple servers, or any of the following: a cloud computing platform or a virtualization center. This embodiment of the application does not limit this. Server 102 communicates with terminal 101 via a wired or wireless network. Server 102 has data receiving, data processing, and data sending functions. Of course, server 102 may also have other functions, which are not limited in this embodiment of the application.
[0089] Those skilled in the art should understand that the above-described terminal 101 and server 102 are merely examples. Other existing or future terminals and servers that are applicable to this application should also be included within the scope of protection of this application, and are hereby incorporated by reference.
[0090] This application provides a content recognition method, which is applied to the above-described implementation environment and executed by terminal 101, by server 102, or by both interactively. This application does not limit the specific implementation method. As shown in Figure 2, the content recognition method provided in the embodiments of this application includes the following steps.
[0091] Step 201: Obtain multiple supply contents to be identified. The content type of any supply content includes at least one of text, image, audio or video, and each supply content has a corresponding business type.
[0092] This application does not limit the method of obtaining the multiple supply contents to be identified in the embodiments. Optionally, at least one Internet interactive application is installed and running on the terminal, and users can upload content in the Internet interactive application. Therefore, the uploaded content obtained by accessing the Internet interactive application can be used as the multiple supply contents to be identified. Alternatively, all content displayed on the terminal's display interface can be used as the multiple supply contents to be identified.
[0093] In this embodiment, since there may be multiple running internet interactive applications, and each internet interactive application may include different types of business applications, the multiple supply contents obtained may be supply contents under different business types. For example, for network platforms with different businesses, the business type may include live streaming, social networking, entertainment, or group buying, etc.; for different users on the same network platform, the business type may include fruit shop, massage parlor, or fast food restaurant, etc.
[0094] Optionally, a supply content can be based on a single upload by a user. For example, if a user uploads a product display image and a product description, the content type of the supply content includes images and text. Alternatively, a supply content can be based on a content type, i.e., one supply content corresponds to one piece of text, or one supply content corresponds to one image, etc. Optionally, the content type of each supply content may include other types besides text, images, audio, or video, and this application embodiment does not limit this.
[0095] Step 202: Identify the multiple supply contents according to their content types, and obtain the anomaly index corresponding to each of the multiple supply contents based on the identification results. The anomaly index is used to indicate the probability that the supply contents belong to anomalous content.
[0096] After acquiring multiple supply content items to be identified, content identification can be performed on these items according to their different content types to determine whether any abnormal content is included. Subsequently, the abnormal content is removed from the online platform to improve the quality of the supply content displayed on the platform.
[0097] Since the content types of the multiple supply contents to be identified include at least one of text, images, audio, or video, this application embodiment designs different identification methods for different content types. Next, the identification methods for different types of supply content will be described in turn.
[0098] First, text recognition.
[0099] For multiple texts included in multiple supply contents, text processing methods are used to identify these multiple texts to obtain abnormal texts. In this embodiment, the texts can be further distinguished into a first text and a second text based on their length. The first text is also called short text, and the second text is also called long text. For example, the length of the first text is less than a length threshold, and the length of the second text is not less than the length threshold. Optionally, the length threshold can be flexibly adjusted according to the application scenario; for example, the length threshold is 5 characters or 2 words.
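The split into first (short) and second (long) texts can be sketched as follows. Measuring length in characters with `len()` and the default threshold of 5 follow the example in the paragraph above; the function name is hypothetical.

```python
from typing import Iterable, List, Tuple


def split_by_length(
    texts: Iterable[str], length_threshold: int = 5
) -> Tuple[List[str], List[str]]:
    """Return (first_texts, second_texts) split on the length threshold."""
    first_texts: List[str] = []   # length < threshold: keyword matching path
    second_texts: List[str] = []  # length >= threshold: semantic model path
    for text in texts:
        if len(text) < length_threshold:
            first_texts.append(text)
        else:
            second_texts.append(text)
    return first_texts, second_texts
```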
[0100] Since short texts are typically combinations of adjectives and nouns, such as shop names, product names, and dish names, keyword matching can be used to identify whether the content category of a short text is an anomalous category.
[0101] In one possible implementation, the multiple supply contents include multiple first texts; obtaining anomaly indices corresponding to the multiple supply contents based on the identification results includes: matching the multiple first texts with an anomaly vocabulary library and a high-frequency vocabulary library respectively, and obtaining the anomaly indices corresponding to the multiple first texts based on the matching results. The anomaly vocabulary library includes multiple anomalous words, and the high-frequency vocabulary library includes multiple high-frequency words.
[0102] It is understood that abnormal words are words indicating that the text belongs to an abnormal category, and high-frequency words are words whose supply frequency is higher than a frequency threshold, that is, words whose display frequency is higher than a frequency threshold. Optionally, the frequency threshold can be set based on experience or flexibly adjusted according to the application scenario; for example, the frequency threshold can be 50. In this embodiment of the application, before matching multiple first texts with the abnormal word library and the high-frequency word library respectively, the method further includes: obtaining the abnormal word library and the high-frequency word library.
[0103] Optionally, the methods for obtaining the abnormal vocabulary database and the high-frequency vocabulary database are not limited in the embodiments of this application. For example, a large number of abnormal and high-frequency words can be extracted from the historical content displayed on the network platform, and the extracted large number of abnormal and high-frequency words can be directly used as the abnormal vocabulary database and the high-frequency vocabulary database. When any first text successfully matches the abnormal vocabulary database, it indicates that the first text contains abnormal words, and the content category of the first text can be determined to be an abnormal category; when any first text successfully matches the high-frequency vocabulary database, it indicates that the first text contains high-frequency words, and the content category of the first text can be determined to be a non-abnormal category.
[0104] In one possible implementation, a large number of abnormal words are extracted from the historical content displayed on the network platform and denoted as abnormal word library B1; a large number of abnormal words are collected from the identified abnormal content and denoted as abnormal word library B2; the abnormal words in abnormal word libraries B1 and B2 are expanded using a similar word algorithm, and the expanded similar words are denoted as abnormal word library B3; finally, abnormal word libraries B1, B2 and B3 are merged to obtain abnormal word library B.
[0105] Similarly, the historical content displayed on the network platform is segmented to obtain a large number of words. The frequency of each word in the historical content is counted as the word frequency. Words with a frequency greater than the frequency threshold are designated as high-frequency words, resulting in a high-frequency word library W1. Then, a similar word algorithm is used to expand the high-frequency words in the high-frequency word library W1, and the expanded similar words are designated as the high-frequency word library W2. Finally, the high-frequency word libraries W1 and W2 are merged to obtain the high-frequency word library W.
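The construction of the high-frequency word library W described above can be illustrated with a minimal sketch. It assumes the historical content has already been segmented into word lists, and it uses a plain lookup table in place of the similar word algorithm; the function name `build_high_frequency_library` and the sample words are hypothetical and are not part of this application.

```python
from collections import Counter


def build_high_frequency_library(segmented_texts, frequency_threshold, similar_words=None):
    """Return W = W1 | W2 for historical content that is already segmented."""
    # Word frequency: number of occurrences of each word in the historical content.
    counts = Counter(word for text in segmented_texts for word in text)
    # W1: words whose frequency is greater than the frequency threshold.
    w1 = {word for word, n in counts.items() if n > frequency_threshold}
    # W2: similar words expanded from W1. The lookup table is an assumption that
    # stands in for the similar word algorithm (word vectors, BERT, and so on).
    similar_words = similar_words or {}
    w2 = {s for word in w1 for s in similar_words.get(word, ())}
    # W: merge of W1 and W2.
    return w1 | w2


# Hypothetical segmented history and similar-word table.
history = [["spicy", "beef", "noodles"], ["beef", "rice"], ["beef", "noodles"]]
library = build_high_frequency_library(history, 1, {"beef": ["steak"]})
```

The abnormal word library B could be assembled in the same way by merging B1, B2, and their expansion B3.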
[0106] This application does not limit the similarity algorithm used, as long as it can obtain at least one similar word for any given word. For example, the similarity algorithm can use word vector technology to calculate similarity, or it can train a BERT (Bidirectional Encoder Representation from Transformers) model to predict similar words.
[0107] Therefore, the above methods can be used to obtain an abnormal vocabulary database and a high-frequency vocabulary database. Since both abnormal and high-frequency vocabulary are obtained from the historical content displayed on the network platform, and this content recognition method is applied to this network platform, it is accurate and targeted. Furthermore, because a similar word algorithm is used to expand the obtained abnormal and high-frequency vocabulary, the abnormal and high-frequency vocabulary databases contain a sufficient amount of vocabulary, thereby improving the accuracy of the first text recognition.
[0108] In one possible implementation, after obtaining the abnormal vocabulary database B and the high-frequency vocabulary database W, the vocabulary can be graded to obtain the abnormal level corresponding to each abnormal word and the high-frequency level corresponding to each high-frequency word. That is, the abnormal vocabulary database also includes the abnormal level corresponding to each abnormal word, and the high-frequency vocabulary database also includes the high-frequency level corresponding to each high-frequency word, so that words at different levels can be controlled with different intensity.
[0109] This application does not limit the criteria for word classification. Optionally, abnormal words can be classified according to the degree of abnormality or the frequency of their occurrence; high-frequency words can be classified according to their frequency or their importance. Similarly, this application does not limit the number of word classification levels. Optionally, abnormal words can be divided into two levels, that is, abnormal words in the abnormal word database B can be divided into levels 1 and 2, and high-frequency words can also be divided into two levels, that is, words in the high-frequency word database W can also be divided into levels 1 and 2.
[0110] In one possible implementation, obtaining anomaly indices corresponding to multiple first texts based on the matching results includes: for any first text among the multiple first texts, if the matching result of that first text contains at least one anomalous word and no high-frequency word, determining the anomaly index of that first text as a first index; if the matching result contains at least one high-frequency word and no anomalous word, determining the anomaly index of that first text as a second index; if the matching result includes at least one anomalous word and at least one high-frequency word, obtaining the highest anomaly level among the at least one anomalous word and the highest high-frequency level among the at least one high-frequency word; if the highest anomaly level is greater than or equal to the highest high-frequency level, determining the anomaly index of that first text as the first index; if the highest anomaly level is less than the highest high-frequency level, determining the anomaly index of that first text as the second index.
[0111] In this embodiment of the application, the first index indicates a high probability that any first text belongs to an abnormal category, and the second index indicates a low probability that any first text belongs to an abnormal category. For example, taking a percentage probability as an example, the first index can be any probability value greater than 60, such as 100, and the second index can be any probability value less than 40, such as 0.
[0112] For example, to identify the anomaly index of any first text, the first text is first segmented to obtain at least one word to be identified, and each word to be identified is matched against the anomalous words in the anomalous word library B. If the match fails, that is, no word to be identified is the same as any anomalous word in the anomalous word library B, the anomaly index of the first text is determined to be 0. If a word to be identified is the same as an anomalous word in the anomalous word library B and no word to be identified matches the high-frequency words in the high-frequency word library W, the anomaly index of the first text is determined to be 100. If the first text matches both the anomalous word library B and the high-frequency word library W, the highest level lvb among the matched anomalous words and the highest level lvw among the matched high-frequency words are obtained. When lvb > lvw, the anomaly index of the first text is determined to be 100, and when lvb <= lvw, the anomaly index of the first text is determined to be 0.
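The matching rule for a first text can be sketched as follows. This is only an illustration under stated assumptions: the libraries are represented as dictionaries mapping each word to its level, a larger number is assumed to mean a higher level, and the sample words are hypothetical. Paragraph [0110] assigns the first index when the two highest levels are equal, whereas the example in paragraph [0112] assigns 0 in that case, so the tie rule is exposed as the hypothetical parameter `tie_is_abnormal`.

```python
FIRST_INDEX = 100   # high probability that the text belongs to an abnormal category
SECOND_INDEX = 0    # low probability that the text belongs to an abnormal category


def short_text_anomaly_index(words, abnormal_levels, high_freq_levels, tie_is_abnormal=True):
    """Anomaly index of one segmented first text (short text)."""
    # Levels of the words that match library B and library W respectively.
    matched_b = [abnormal_levels[w] for w in words if w in abnormal_levels]
    matched_w = [high_freq_levels[w] for w in words if w in high_freq_levels]
    if not matched_b:
        return SECOND_INDEX      # no abnormal word matched
    if not matched_w:
        return FIRST_INDEX       # abnormal word matched, no high-frequency word
    lvb, lvw = max(matched_b), max(matched_w)
    # Both libraries matched: compare the highest levels.
    if lvb > lvw or (tie_is_abnormal and lvb == lvw):
        return FIRST_INDEX
    return SECOND_INDEX


# Hypothetical libraries with two levels each.
B = {"lewd": 2, "odd": 1}
W = {"noodles": 2, "beef": 1}
index = short_text_anomaly_index(["odd", "noodles"], B, W)
```

Keeping the tie rule as a parameter makes the difference between the two statements explicit instead of silently choosing one.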
[0113] For long texts, such as recommendations, introductions, and brand stories displayed on online platforms, semantic understanding is needed to identify whether they contain anomalous information in addition to keywords, thus determining whether the content category of the long text is abnormal.
[0114] In one possible implementation, the method further includes: invoking a semantic recognition model to extract semantic features corresponding to multiple second texts, and obtaining anomaly indices corresponding to the multiple second texts based on the semantic features. This application does not limit the model structure used in the semantic recognition model. Optionally, any network model used for recognizing semantic features can be used, such as a BERT model or a hidden Markov model. That is, the semantic recognition model is trained based on any network model used for recognizing semantic features.
[0115] The anomalous content may include anomalous content of multiple anomalous categories; for example, the anomalous content can be divided into a first anomalous category, a second anomalous category, and a third anomalous category. In this case, optionally, obtaining multiple anomalous indices corresponding to multiple second texts based on semantic features includes: obtaining, based on semantic features, multiple anomalous indices of the multiple second texts for the multiple anomalous categories. For example, for any second text, taking anomalous categories including a first anomalous category, a second anomalous category, and a third anomalous category as an example, the multiple anomalous indices corresponding to this second text include a first anomalous index for the first anomalous category, a second anomalous index for the second anomalous category, and a third anomalous index for the third anomalous category. It can be understood that the first anomalous index indicates the probability that the supplied content belongs to the first anomalous category, the second anomalous index indicates the probability that the supplied content belongs to the second anomalous category, and the third anomalous index indicates the probability that the supplied content belongs to the third anomalous category.
[0116] Before calling the semantic recognition model, the initial semantic model needs to be trained using a text training sample set. The semantic recognition model is then obtained based on the training results. Optionally, the initial semantic model can be a BERT model. The text training sample set includes multiple text samples, each with a corresponding semantic label. Optionally, when the semantic recognition model is a binary classification model, the semantic labels can be anomalous and non-anomalous; when the semantic recognition model is a multi-class classification model, the semantic labels can be non-anomalous and multiple different anomaly categories, which can include a first anomaly category, a second anomaly category, a third anomaly category, a fourth anomaly category, etc.
[0117] In one possible implementation, multiple text samples are input into an initial semantic model, and the initial semantic model outputs anomaly indices corresponding to the multiple text samples respectively. Alternatively, the initial semantic model outputs multiple anomaly indices corresponding to different anomaly categories for the multiple text samples. The loss function value is obtained based on the anomaly indices and semantic labels. The model parameters of the initial semantic model are adjusted based on the loss function value until the loss function value meets the loss requirements.
[0118] This application does not limit the method of obtaining the text training sample set. For example, taking the abnormal vocabulary library B and the high-frequency vocabulary library W as an example, a combined vocabulary library D1 is constructed based on the combination of various abnormal words; common abnormal sentence patterns are constructed, such as "_so comfortable", and abnormal words are filled into the sentence pattern _ to obtain the abnormal sentence pattern library D2; a semantic matching model is used to recall similar words and sentence patterns that have a similarity higher than the similarity threshold with the combined vocabulary library D1 and the abnormal sentence pattern library D2, and a similarity library D3 is constructed based on the similar words and sentence patterns. Optionally, the text in the combined vocabulary library D1, the abnormal sentence pattern library D2 and the similarity library D3 are manually labeled. If the text belongs to the abnormal category, it is added to the positive text sample set P, and if the text belongs to the non-abnormal category, it is added to the negative text sample set N. Thus, a text training sample set including the positive text sample set P and the negative text sample set N is obtained.
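The construction of the combined vocabulary library D1 and the abnormal sentence pattern library D2 can be sketched as below. The sketch assumes that "combination" means concatenating ordered pairs of abnormal words and that the slot in a sentence pattern is marked with an underscore; both function names and the placeholder words are hypothetical. The recall of the similarity library D3 and the manual labeling step are not shown.

```python
from itertools import permutations


def build_combined_library(abnormal_words, size=2):
    """D1: concatenations of ordered groups of abnormal words (pairwise by assumption)."""
    return ["".join(group) for group in permutations(abnormal_words, size)]


def build_pattern_library(patterns, abnormal_words, slot="_"):
    """D2: every sentence pattern with its slot filled by every abnormal word."""
    return [pattern.replace(slot, word) for pattern in patterns for word in abnormal_words]


# Placeholder abnormal words; a real library B would be used instead.
abnormal_words = ["w1", "w2"]
d1 = build_combined_library(abnormal_words)
d2 = build_pattern_library(["_ so comfortable"], abnormal_words)
```

The texts in D1, D2, and D3 would then be labeled and split into the positive text sample set P and the negative text sample set N.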
[0119] Optionally, real texts of identified anomaly categories from historical records can be added to the positive text sample set P, making the number of samples in the text training sample set more sufficient. This embodiment does not limit the semantic matching model; for example, the semantic matching model can be a SimNet (similarity network) model. Optionally, the similarity threshold can be set empirically or flexibly adjusted according to the application scenario; for example, the similarity threshold can be 80%.
[0120] In one possible implementation, the method further includes: invoking a semantic recognition model to extract semantic features corresponding to multiple first texts, and obtaining anomaly indices corresponding to the multiple first texts based on the semantic features. For first texts with shorter lengths, semantic recognition can also be combined to increase the accuracy of the first text recognition. In this case, the first text has two corresponding anomaly indices, and the final anomaly index corresponding to the first text needs to be determined based on these two anomaly indices. For example, the larger of the two anomaly indices can be used as the final anomaly index corresponding to the first text.
[0121] Therefore, the above process can identify the anomaly index corresponding to the text type content in multiple supply contents.
[0122] Second, image recognition.
[0123] For multiple images included in multiple supply contents, image processing methods are used to identify the multiple images to obtain abnormal images among them.
[0124] In one possible implementation, content identification is performed on multiple supply content items based on their content types, and anomaly indices are obtained for each of the multiple supply content items based on the identification results. This includes: recalling multiple candidate images from multiple images according to the identification speed, wherein the probability of each candidate image belonging to anomalous content is greater than a first probability threshold; identifying the image features corresponding to each of the multiple candidate images; and obtaining the anomaly indices corresponding to each of the multiple images based on the identification results. Optionally, the first probability threshold may be set based on experience or flexibly adjusted according to the application scenario; for example, the first probability threshold may be 50%.
[0125] In this embodiment, the method for recalling multiple candidate images from multiple images according to the recognition speed is not limited; the recognition speed only needs to be greater than a speed threshold. The speed threshold may be flexibly adjusted according to the application scenario; for example, the speed threshold is 100 images recognized per second. Exemplarily, recalling multiple candidate images from multiple images according to the recognition speed can be implemented using the MobileNetV3 model. The MobileNetV3 model is a lightweight network with few parameters and low computational cost, and its execution speed can meet the requirement of exceeding the speed threshold. Therefore, it is possible to quickly recall multiple candidate images from a large number of images whose probability of belonging to an abnormal category is greater than a first probability threshold, greatly reducing the number of images that need to be accurately determined as abnormal.
[0126] Optionally, recalling multiple candidate images from multiple images according to recognition speed can also be achieved using the MobileNetV3+CBAM (convolutional block attention module) model. CBAM is a module that combines spatial and channel attention mechanisms. CBAM is also a lightweight and general-purpose module that can be seamlessly integrated into MobileNetV3 to improve its classification capabilities. This improves the recognition ability for images where features indicating anomalies are concentrated in a specific part of the image. For example, for images containing disgusting raw meat, the focus is often on a particular part of the image; adding an attention mechanism module can improve the accuracy of the recalled candidate images.
[0127] Similarly, in this application embodiment, the image features corresponding to multiple candidate images are identified respectively. The method for obtaining the anomaly index corresponding to multiple images based on the identification results is not limited, as long as the accuracy of extracting image features is greater than the accuracy threshold. The accuracy threshold may be flexibly adjusted according to the application scenario, for example, the accuracy threshold is 80%. For example, the image recognition method can adopt the open-source BiT (Big Transfer) model. The BiT model is a pre-trained model that can achieve excellent performance and high accuracy by performing simple transfer learning on a new dataset.
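The two stages described above, a fast recall followed by an accurate identification of the recalled candidates only, can be sketched as follows. The callables `fast_score` and `accurate_score` are hypothetical stand-ins for a lightweight model such as MobileNetV3 and an accurate model such as BiT. Assigning an index of 0.0 to images that are not recalled is an assumption of this sketch; the embodiment does not state which index such images receive.

```python
def recall_then_identify(images, fast_score, accurate_score, first_probability_threshold=0.5):
    """Return {image_id: anomaly index} using a fast recall stage and an accurate stage."""
    indices = {}
    for image_id, image in images.items():
        if fast_score(image) > first_probability_threshold:
            # Candidate image: run the slower, more accurate recognition.
            indices[image_id] = accurate_score(image)
        else:
            # Assumption: images that are not recalled are treated as non-abnormal.
            indices[image_id] = 0.0
    return indices


# Toy data: each "image" is just its fast-stage probability.
accurate_calls = []

def toy_accurate(image):
    accurate_calls.append(image)
    return 0.95

toy_images = {"a": 0.9, "b": 0.2, "c": 0.5}
result = recall_then_identify(toy_images, lambda image: image, toy_accurate)
```

In the toy run only one of the three images reaches the accurate stage, which is the intended saving.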
[0128] In this embodiment of the application, when the abnormal content includes abnormal content of multiple abnormal categories, for example, the abnormal content can be divided into abnormal content of a first abnormal category, abnormal content of a second abnormal category, or abnormal content of a third abnormal category. Optionally, obtaining the abnormal index corresponding to multiple images according to image features includes: obtaining multiple abnormal indices corresponding to multiple images for multiple abnormal categories according to image features.
[0129] In one possible implementation, the anomalous content includes a target anomalous image, which belongs to a target category. Target features are used to indicate the target anomalous image belonging to the target category. The method further includes: recalling multiple first target images from multiple images based on multiple words corresponding to multiple images, where the probability that a first target image belongs to a target anomalous image is greater than a second probability threshold; fusing multiple first target images and multiple candidate images to obtain multiple second target images; and extracting target features corresponding to each of the multiple second target images. In this case, identifying the image features corresponding to each of the multiple candidate images and obtaining an anomalous index corresponding to each of the multiple images based on the identification results includes: identifying the image features corresponding to each of the multiple candidate images to obtain the image features corresponding to each of the multiple candidate images; and obtaining the anomalous index corresponding to each of the multiple images based on the image features and target features.
[0130] Optionally, obtaining anomaly indices for multiple images based on image features and target features includes: obtaining image anomaly indices and target anomaly indices for multiple images based on image features and target features, and then using the image anomaly index and target anomaly index for any given image as the two anomaly indices for that given image; or, using the larger value between the image anomaly index and the target anomaly index for any given image as the anomaly index for that given image. The image anomaly index indicates the probability that an image is an anomalous image, and the target anomaly index indicates the probability that an image is a target anomalous image.
[0131] In this embodiment, each image includes corresponding text information, which includes multiple words. For example, the text information corresponding to an image may include an image title, an image description, or text extracted from the image. Therefore, after determining multiple target anomalous words, multiple first target images can be recalled from multiple images based on these target anomalous words.
[0132] This application does not limit the method of recalling multiple first target images from multiple images based on multiple target anomalous words, as long as the probability that the recalled first target image belongs to the target anomalous image is greater than a second probability threshold. Optionally, the second probability threshold can be set based on experience, or flexibly adjusted according to the application scenario, for example, the second probability threshold is 50%. For example, the multiple target anomalous words are matched with the multiple words included in the text information corresponding to the multiple images, and the images that are successfully matched are taken as the multiple first target images.
[0133] In this embodiment of the application, since there may be duplicate images among the multiple first target images and the multiple candidate images, it is necessary to fuse the multiple first target images and the multiple candidate images to obtain multiple second target images. Optionally, the images that appear in both the multiple first target images and the multiple candidate images are identified as duplicate images, the duplicate images are deleted from the multiple first target images, and the remaining first target images are then merged with the multiple candidate images to obtain the multiple second target images.
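The word-based recall of first target images and their fusion with the candidate images can be sketched as follows. The sketch assumes images are referred to by identifiers and that the text information of each image is already segmented into words; the function names and the sample data are hypothetical.

```python
def recall_by_words(image_words, target_anomalous_words):
    """First target images: images whose text information contains a target anomalous word."""
    targets = set(target_anomalous_words)
    return [image_id for image_id, words in image_words.items() if targets & set(words)]


def fuse(first_target_images, candidate_images):
    """Second target images: first target images without duplicates, plus the candidates."""
    candidates = set(candidate_images)
    # Delete from the first target images those that are also candidate images.
    remaining = [image_id for image_id in first_target_images if image_id not in candidates]
    return remaining + list(candidate_images)


# Hypothetical text information (title, description, or text extracted from the image).
image_words = {
    "img1": ["fresh", "salad"],
    "img2": ["beach", "swimsuit"],
    "img3": ["swimsuit"],
}
first_targets = recall_by_words(image_words, ["swimsuit"])
second_targets = fuse(first_targets, ["img3", "img4"])
```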
[0134] In one possible implementation, before recalling multiple first target images from multiple images based on multiple target anomalous words, it is first necessary to obtain these multiple target anomalous words. For example, multiple target anomalous words are extracted from images based on historical target categories, such that the probability that an image containing any target anomalous word belongs to the target category is greater than a second probability threshold. Optionally, a similarity word algorithm can also be used to expand the extracted multiple target anomalous words.
[0135] Understandably, the method for extracting target features corresponding to multiple second target images needs to be flexibly set according to different target features. For example, when the target category is exposed skin, the target feature corresponding to this target category is the area of the exposed skin region. Therefore, a human parsing model technique for exposed skin recognition can be used, or an MTCNN (multi-task convolutional neural network) technique for face detection can be used.
[0136] Optionally, the anomalous content also includes a reference anomalous image, which belongs to a reference category. Reference features are used to indicate the reference anomalous image belonging to the reference category. The method further includes: recalling multiple third target images from the multiple images based on the multiple words corresponding to each image and multiple target anomalous words, where the probability that a third target image belongs to the reference anomalous image is greater than a second probability threshold; fusing the multiple third target images and the multiple candidate images to obtain multiple fourth target images; and extracting reference features corresponding to each of the multiple fourth target images. In this case, obtaining anomaly indices corresponding to multiple images based on image features includes: obtaining anomaly indices corresponding to multiple images based on image features, target features, and reference features.
[0137] In this embodiment, by employing a recall-then-identify approach, both the speed and accuracy of content recognition are improved. Furthermore, online platforms typically display a large volume of content, but during image content recognition, the GPU (Graphics Processing Unit) of the execution device is idle while the CPU (Central Processing Unit) loads images, resulting in wasted GPU resources. In one possible implementation, while the GPU is performing content recognition on any image, the CPU loads the next image to be recognized, reducing GPU idle time. Thus, by utilizing the pipeline acceleration of the CPU and GPU, the speed of image content recognition is improved, reducing the time difference between image content recognition and text content recognition.
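The overlap of image loading and recognition can be illustrated with a producer and consumer sketch. A background thread stands in for the CPU loading step and the consuming loop stands in for GPU recognition; no real GPU call is made. The callables `load_image` and `recognize` and the `prefetch` depth are hypothetical.

```python
import queue
import threading


def pipelined_recognition(paths, load_image, recognize, prefetch=2):
    """Recognize images while the next ones are loaded in a background thread."""
    buffer = queue.Queue(maxsize=prefetch)   # bounded, so loading stays a few images ahead
    sentinel = object()                      # marks the end of the stream

    def loader():
        try:
            for path in paths:
                buffer.put((path, load_image(path)))   # CPU-side work
        finally:
            buffer.put(sentinel)             # always unblock the consumer

    threading.Thread(target=loader, daemon=True).start()

    results = {}
    while True:
        item = buffer.get()
        if item is sentinel:
            break
        path, image = item
        results[path] = recognize(image)     # stand-in for recognition on the GPU
    return results


# Toy stand-ins: "loading" appends a suffix, "recognition" measures the length.
toy_results = pipelined_recognition(["a", "bb"], lambda p: p + ".pixels", len)
```

The bounded queue limits how many loaded images wait in memory; a loading error ends the stream early in this sketch rather than being reported to the caller.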
[0138] Optionally, the execution device can employ a high-performance deep learning inference engine (TensorRT) for content recognition. TensorRT can decompose the trained model and fuse it again, resulting in a highly integrated model. Furthermore, TensorRT incorporates common deep learning optimization techniques, such as model quantization and dynamic memory optimization, thus improving the speed of content recognition and reducing its memory usage.
[0139] In one possible implementation, the image recognition process described above can be executed by a constructed image recognition model. Optionally, multiple candidate images are recalled from multiple images according to a recognition speed, the image features corresponding to each of the multiple candidate images are recognized, and anomaly indices corresponding to each of the multiple images are obtained based on the recognition results. This includes: calling the image recognition model to recall multiple candidate images from multiple images according to a recognition speed, recognizing the image features corresponding to each of the multiple candidate images, and obtaining anomaly indices corresponding to each of the multiple images based on the recognition results.
[0140] Alternatively, multiple candidate images are retrieved from multiple images according to the recognition speed, and image features corresponding to each candidate image are extracted; multiple first target images are retrieved from multiple images based on multiple words corresponding to each image and multiple target anomalous words; multiple first target images and multiple candidate images are fused to obtain multiple second target images, and target features corresponding to each second target image are extracted; and anomalous indices corresponding to each image are obtained based on the image features and target features. This includes: calling an image recognition model to retrieve multiple candidate images from multiple images according to the recognition speed, recognizing the image features corresponding to each candidate image, and obtaining anomalous indices corresponding to each image based on the recognition results; retrieving multiple first target images from multiple images based on multiple words corresponding to each image and multiple target anomalous words; fusing multiple first target images and multiple candidate images to obtain multiple second target images, extracting target features corresponding to each second target image, and obtaining anomalous indices corresponding to each image based on the image features and target features.
[0141] It is understandable that when using an image recognition model to recognize images, the model parameters of the initial image recognition model need to be trained using an image training sample set to obtain the image recognition model with the target model parameters. This application does not limit the training method; for example, the MobileNetV3 model can be pre-trained using the self-supervised learning framework MoCo (Momentum Contrast). Since abnormal sample images account for a small proportion of the image training sample set, the distribution of positive and negative sample data in the image training sample set is unbalanced. Therefore, a focal loss function can be introduced during training to mitigate the unbalanced distribution of positive and negative sample data. Training the BiT model in the transfer learning stage involves adjusting the pre-trained BiT model. Since the BiT model has already been pre-trained, this adjustment is fine-tuning, that is, a slight adjustment of the model parameters. For example, fine-tuning can use the BiT-HyperRule method to adjust the model parameters of the BiT model. In addition, the training time can be adjusted according to the sample size in the image training sample set, and the image size input to the BiT model can be adjusted according to the resolution of the sample images.
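For reference, the binary focal loss mentioned above is commonly written as FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), where p_t is the predicted probability of the true class. The sketch below computes it for a single sample in plain Python. The default values gamma = 2 and alpha = 0.25 are the commonly used ones and are an assumption here, since this application does not specify them; the clamp `eps` is likewise only a numerical safeguard added for the sketch.

```python
import math


def focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Binary focal loss for one sample.

    p: predicted probability of the positive (abnormal) class.
    y: 1 for a positive sample, 0 for a negative sample.
    """
    p_t = p if y == 1 else 1.0 - p               # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha   # class-balancing weight
    p_t = min(max(p_t, eps), 1.0 - eps)          # avoid log(0)
    # (1 - p_t) ** gamma down-weights samples that are already classified well.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy = focal_loss(0.9, 1)   # well-classified positive sample
hard = focal_loss(0.1, 1)   # misclassified positive sample
```

With gamma set to 0 the expression reduces to an alpha-weighted cross-entropy, which makes the effect of the modulating factor easy to compare.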
[0142] Before training the model parameters of the initial image recognition model using the image training sample set, it is necessary to obtain the image training sample set first. This application does not limit the method of obtaining the image training sample set; it is possible to extract a large number of abnormal and non-abnormal images from the historical content displayed on the network platform as the image training sample set.
[0143] For example, see Figure 3, which is a schematic diagram illustrating an embodiment of obtaining an image training sample set. As shown in Figure 3, first, an initial abnormal image library is obtained based on abnormal images identified in historical records. Then, a similar image retrieval algorithm is used to recall, from the full image library, similar abnormal images that are similar to the abnormal images in the initial abnormal image library. The recalled similar abnormal images form a similar abnormal image library, which is then merged with the initial abnormal image library to obtain the first image sample library. The similar image retrieval algorithm can be a PHash (perceptual hash) similarity algorithm. Optionally, the images in the first image sample library are manually labeled. Images belonging to the abnormal category are added to the positive image sample set, and images belonging to the non-abnormal category are added to the negative image sample set. This results in an image training sample set including both the positive and negative image sample sets.
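The recall of similar abnormal images can be sketched with a Hamming-distance comparison of perceptual hashes. The sketch assumes that a 64-bit hash has already been computed for every image by some perceptual hash implementation, which is not shown; the distance threshold `max_distance` and the sample hash values are hypothetical.

```python
def hamming_distance(hash_a, hash_b):
    """Number of differing bits between two integer hashes."""
    return bin(hash_a ^ hash_b).count("1")


def recall_similar(seed_hashes, library_hashes, max_distance=8):
    """Images of the full library whose hash is close to any initial abnormal image."""
    return [
        image_id
        for image_id, image_hash in library_hashes.items()
        if any(hamming_distance(image_hash, seed) <= max_distance for seed in seed_hashes)
    ]


# Toy 4-bit hashes for readability; real perceptual hashes are typically 64 bits.
initial_abnormal = [0b1111]
full_library = {"x": 0b1110, "y": 0b0000}
similar = recall_similar(initial_abnormal, full_library, max_distance=1)
```

A smaller `max_distance` recalls fewer but more similar images, which affects how much manual labeling the first image sample library requires.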
[0144] As shown in Figure 3, after obtaining the first image sample library, data augmentation can be performed on the images in the first image sample library. The data-augmented images are then input into a lightweight classification model. The lightweight classification model outputs the anomaly index of each data-augmented image for the anomaly category, and the data-augmented images whose anomaly index is higher than a threshold form a second image sample library. Images in this second image sample library are added to the first image sample library as negative samples to expand the first image sample library. Optionally, data augmentation methods include random flipping, random erasing, random cropping, etc. A lightweight classification model refers to a classification model with a relatively low number of parameters and low computational cost. For example, a lightweight classification model can be the MobileNetV3 model mentioned above.
[0145] As shown in Figure 3, in this embodiment, the images in the expanded first image sample library can be repeatedly augmented and classified using the lightweight classification model to obtain a second image sample library for each cycle. This second image sample library is then used to further expand the first image sample library. After multiple iterations, the expanded first image sample library is used as the image training sample set in this embodiment, thereby significantly increasing the number of negative images in the image training sample set.
[0146] It is understandable that, since images may contain anomalous content not only in themselves but also in the text within them, this embodiment of the application can further extract the text contained in any of the plurality of images, perform text recognition on the text contained in the image using a text recognition method to obtain the text anomalous index corresponding to the image, and perform image recognition on the image using an image recognition method to obtain the image anomalous index corresponding to the image. In other words, each image includes at least one of a corresponding image anomalous index or a text anomalous index. In this case, the anomalous index of each image is obtained based on at least one of the corresponding image anomalous index or text anomalous index.
[0147] Optionally, when any image includes a corresponding image anomaly index and a text anomaly index, the anomaly index of any image is determined based on the image anomaly index and the text anomaly index, including: using the maximum value of the image anomaly index and the text anomaly index as the anomaly index of any image; or, using the weighted average of the image anomaly index and the text anomaly index as the anomaly index of any image. Thus, by performing two-dimensional recognition of both image recognition and text recognition on the images in the supplied content, the accuracy of content recognition of the images in the supplied content is improved.
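The two combination rules above (maximum or weighted average) can be sketched as below. The function name, the `mode` parameter, and the default weight are illustrative assumptions; the source does not specify any weight values.

```python
from typing import Optional


def combine_image_and_text_index(
    image_index: Optional[float],
    text_index: Optional[float],
    mode: str = "max",
    image_weight: float = 0.5,  # hypothetical weight; not given in the source
) -> float:
    """Combine an image anomaly index and a text anomaly index into one index."""
    if image_index is None and text_index is None:
        raise ValueError("at least one anomaly index is required")
    # If only one index exists, it is used directly.
    if image_index is None:
        return text_index
    if text_index is None:
        return image_index
    if mode == "max":
        return max(image_index, text_index)
    if mode == "weighted":
        return image_weight * image_index + (1.0 - image_weight) * text_index
    raise ValueError(f"unknown mode: {mode}")
```

Taking the maximum flags an image as soon as either signal is strong, while the weighted average lets one signal soften the other; which is preferable depends on the application scenario.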
[0148] This application does not limit the method of extracting text from an image. Optionally, an OCR (Optical Character Recognition) model can be used. For example, extracting text from an image includes: for a first image, detecting whether text exists in the first image, for example, using a Mask RCNN (Mask Region Convolutional Neural Network) model to identify text boxes contained in the first image; when a text box is detected in the first image, it is determined that text exists in the first image; when it is determined that text exists in the first image, detecting the text within the text box, for example, using a CRNN (Convolutional Recurrent Neural Network) model to extract the text within the text box, and obtaining the text included in the image based on the extraction result.
[0149] Third, audio recognition.
[0150] For multiple audio files included in multiple supply contents, audio processing methods are used to identify these multiple audio files in order to obtain abnormal audio files among them.
[0151] The features indicating anomalies in audio come from two parts: first, the text in the audio, which can be extracted using speech-to-text technology and then identified using the aforementioned text recognition model; and second, the spectrogram of the audio, which can be identified using spectrogram detection technology, for example, by obtaining the spectrogram of the audio and inputting it into a CNN (Convolutional Neural Network) model to identify abnormal spectrograms. This application does not limit the speech-to-text technology used; it can be any speech recognition technology, such as DTW (Dynamic Time Warping) technology or HMM (Hidden Markov Model) technology.
[0152] In one possible implementation, the multiple supply contents include multiple audio files. Identifying the multiple supply contents using at least one of text, image, audio, or video recognition, and obtaining anomaly indices for each of the multiple supply contents based on the identification results, includes: extracting multiple texts and multiple spectrograms corresponding to the multiple audio files; identifying the multiple texts using the aforementioned text recognition method, and obtaining text anomaly indices for the multiple texts based on the text recognition results; performing vulgar spectrogram detection on the multiple spectrograms, and obtaining spectrogram anomaly indices for the multiple spectrograms based on the detection results; and obtaining anomaly indices for each of the multiple audio files based on the text anomaly indices and spectrogram anomaly indices.
[0153] Optionally, anomaly indices are obtained for multiple audio files based on the text anomaly index and the spectrogram anomaly index. This includes: for any audio file, using the larger value between the text anomaly index and the spectrogram anomaly index as the anomaly index for that audio file; or, using the weighted average of the text anomaly index and the spectrogram anomaly index as the anomaly index for that audio file. Thus, by performing two-dimensional recognition of the audio in the supplied content using both spectrogram recognition and text recognition, the accuracy of content recognition for the audio in the supplied content is improved.
[0154] Fourth, video recognition.
[0155] For multiple videos included in multiple supply content, video processing methods are used to identify the multiple videos in order to obtain the abnormal videos among them.
[0156] The features that indicate anomalies in a video come from three parts: first, the text in the video, which can be extracted using the OCR technology mentioned above and then recognized using the text recognition method mentioned above; second, the audio in the video, which can be recognized using the audio recognition method mentioned above; and third, the images in the video, which can be recognized using the image recognition method mentioned above.
[0157] In one possible implementation, the multiple supply contents include multiple videos. Identifying the multiple supply contents using at least one of text, image, audio, or video recognition, and obtaining anomaly indices for each of the multiple supply contents based on the recognition results, includes: extracting multiple texts, multiple audios, and multiple images corresponding to the multiple videos; using the aforementioned text recognition method to identify the multiple texts, and obtaining a text anomaly index corresponding to the multiple texts based on the text recognition results; using the aforementioned audio recognition method to identify the multiple audios, and obtaining an audio anomaly index corresponding to the multiple audios based on the audio recognition results; using the aforementioned image recognition method to identify the multiple images, and obtaining an image anomaly index corresponding to the multiple images based on the image recognition results; and obtaining anomaly indices for each of the multiple videos based on the text anomaly index, audio anomaly index, and image anomaly index.
[0158] Optionally, obtaining anomaly indices for the multiple videos based on the text anomaly index, audio anomaly index, and image anomaly index includes: for any video among the multiple videos, taking the largest value among the text anomaly index, audio anomaly index, and image anomaly index as the anomaly index corresponding to that video; or, taking the weighted average of the text anomaly index, audio anomaly index, and image anomaly index as the anomaly index corresponding to that video.
[0159] Therefore, by performing image recognition, audio recognition, and text recognition on the videos in the supplied content, the accuracy of content recognition in the videos in the supplied content is improved.
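Since the audio anomaly index is itself derived from a text anomaly index and a spectrogram anomaly index, the video anomaly index is a nested combination. A minimal sketch, assuming the per-modality indices have already been computed; the class and function names are hypothetical, and an equal-weight mean stands in for the weighted average because the source gives no weights:

```python
from dataclasses import dataclass
from statistics import fmean
from typing import Callable, Sequence

Fuse = Callable[[Sequence[float]], float]


@dataclass
class AudioScores:
    text: float         # index of the text obtained by speech-to-text
    spectrogram: float  # index from spectrogram detection


@dataclass
class VideoScores:
    text: float         # index of the text extracted by OCR
    audio: AudioScores
    image: float        # index from image recognition


def audio_index(scores: AudioScores, fuse: Fuse = max) -> float:
    """Anomaly index of one audio: fuse its text and spectrogram indices."""
    return fuse([scores.text, scores.spectrogram])


def video_index(scores: VideoScores, fuse: Fuse = max) -> float:
    """Anomaly index of one video: fuse its text, audio, and image indices."""
    return fuse([scores.text, audio_index(scores.audio, fuse), scores.image])
```

For example, `video_index(v)` applies the maximum rule at both levels, while `video_index(v, fuse=fmean)` applies an equal-weight average at both levels.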
[0160] Step 203: Determine the abnormal thresholds corresponding to multiple supply contents based on the correspondence between business types and abnormal thresholds, and determine whether any supply content among the multiple supply contents is abnormal based on the relationship between the abnormal index and the corresponding abnormal threshold.
[0161] In this embodiment, since different business types have different anomaly characteristics, different anomaly thresholds are set for supply content of different business types. These thresholds serve as the standard for measuring whether the supplied content is abnormal, enabling accurate identification of abnormal content under different business types. For example, business types include live streaming, social media, and group buying. Since live streaming and social media may have a higher probability of generating abnormal content, the anomaly thresholds for these businesses are set more strictly. For instance, the anomaly thresholds for live streaming, social media, and group buying are 80%, 80%, and 90%, respectively. It can be understood that the anomaly thresholds for different business types can be flexibly adjusted according to the application scenario.
[0162] After determining the correspondence between business types and anomaly thresholds, since any supply content to be identified has a corresponding business type, the anomaly threshold corresponding to the business type of the supply content can be obtained, thus obtaining the anomaly threshold corresponding to the supply content.
[0163] In one possible implementation, determining whether any one of the multiple supply contents is an abnormal content based on the relationship between the abnormality index and the corresponding abnormality threshold includes: determining the supply contents whose abnormality index is greater than the corresponding abnormality threshold as abnormal content, and determining the supply contents whose abnormality index is not greater than the corresponding abnormality threshold as non-abnormal content.
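Step 203 amounts to a lookup followed by a strict comparison. The sketch below uses the example thresholds given above (80%, 80%, 90%); the dictionary keys and function name are illustrative assumptions.

```python
from typing import Mapping

# Example thresholds from the description, expressed as fractions.
ANOMALY_THRESHOLDS = {
    "live_streaming": 0.80,
    "social_media": 0.80,
    "group_buying": 0.90,
}


def is_abnormal(
    anomaly_index: float,
    business_type: str,
    thresholds: Mapping[str, float] = ANOMALY_THRESHOLDS,
) -> bool:
    """Abnormal only if the index is strictly greater than the business threshold."""
    return anomaly_index > thresholds[business_type]
```

With these values, an anomaly index of 0.85 is abnormal for live streaming but not for group buying.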
[0164] Similarly, for target images of different business types, the recognition standards for target features differ due to the varying characteristics of each business. Taking images of exposed skin as an example of anomaly targets, since most images of massage parlors and close-ups of oranges in fruit shops also exhibit similar features, different target anomaly thresholds should be designed to improve the accuracy of skin exposure recognition when identifying images of exposed skin for massage parlors and fruit shops. For instance, the target anomaly thresholds for fruit shops, massage parlors, and live streaming are 90%, 90%, and 70%, respectively.
[0165] In this embodiment, depending on the different content types of the supplied content, the anomaly threshold corresponding to any business type may include at least one of the following: text anomaly threshold, image anomaly threshold, target anomaly threshold, audio anomaly threshold, or video anomaly threshold. Optionally, the multiple supplied contents include multiple texts. Determining whether any of the multiple supplied contents is an anomaly based on the relationship between the anomaly index and the corresponding anomaly threshold includes: determining the supplied content with an anomaly index greater than the corresponding text anomaly threshold as an anomaly text, and determining the supplied content with an anomaly index not greater than the corresponding text anomaly threshold as a non-anomaly text.
[0166] Optionally, the multiple supply contents include multiple images. When any image includes two corresponding anomaly indices, namely, the two anomaly indices are the image anomaly index and the target anomaly index, the anomaly thresholds include the image anomaly threshold and the target anomaly threshold. Determining whether any image among the multiple images is an anomalous image based on the relationship between the anomaly index and the corresponding anomaly threshold includes: identifying images whose image anomaly index is greater than the corresponding image anomaly threshold and whose target anomaly index is greater than the corresponding target anomaly threshold as anomalous images; the remaining images are identified as non-anomalous images. Since a target feature recognition method is also provided for specific target anomaly images, the recognition results obtained by combining target features are more accurate.
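The joint condition on the image anomaly index and the target anomaly index can be sketched as follows. The target thresholds reuse the example values from the description (90%, 90%, 70%); the image thresholds are hypothetical placeholders, since the source gives none for these business types.

```python
from typing import Mapping

# Hypothetical image anomaly thresholds (not given in the source).
IMAGE_THRESHOLDS = {"fruit_shop": 0.85, "massage_parlor": 0.85, "live_streaming": 0.80}
# Target anomaly thresholds from the example in the description.
TARGET_THRESHOLDS = {"fruit_shop": 0.90, "massage_parlor": 0.90, "live_streaming": 0.70}


def is_abnormal_image(
    image_index: float,
    target_index: float,
    business_type: str,
    image_thresholds: Mapping[str, float] = IMAGE_THRESHOLDS,
    target_thresholds: Mapping[str, float] = TARGET_THRESHOLDS,
) -> bool:
    """Abnormal only if BOTH indices exceed their business-specific thresholds."""
    return (
        image_index > image_thresholds[business_type]
        and target_index > target_thresholds[business_type]
    )
```

The same pair of indices can therefore be abnormal for live streaming yet non-abnormal for a fruit shop, which is the intended effect of per-business thresholds.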
[0167] In this embodiment, the anomaly threshold corresponding to any business type may further include different anomaly thresholds for different anomaly categories, for example, at least one of a first anomaly threshold for a first anomaly category, a second anomaly threshold for a second anomaly category, or a third anomaly threshold for a third anomaly category. It can be understood that there is a one-to-one correspondence between anomaly thresholds and anomaly categories. Optionally, the anomaly index includes multiple anomaly indices for multiple anomaly categories, and multiple anomaly thresholds corresponding to these multiple anomaly indices are determined based on the correspondence between business types and anomaly thresholds. Determining whether any one of the multiple supply contents is abnormal content based on the relationship between the anomaly index and the corresponding anomaly threshold includes: for any anomaly category, determining the supply contents whose anomaly index for that anomaly category is greater than the anomaly threshold for that anomaly category as abnormal content of that anomaly category; the remaining supply contents are determined as non-abnormal content.
[0168] For example, based on the identification results, a first anomaly index for a first anomaly category and a second anomaly index for a second anomaly category are obtained for multiple supply contents. Based on the correspondence between business type and anomaly threshold, a first anomaly threshold for the first anomaly category and a second anomaly threshold for the second anomaly category are determined for each of the multiple supply contents. For any supply content among the multiple supply contents, if the first anomaly index is greater than the first anomaly threshold corresponding to that supply content, that supply content is determined to be an anomaly content of the first anomaly category; if the second anomaly index is greater than the second anomaly threshold corresponding to that supply content, that supply content is determined to be an anomaly content of the second anomaly category; if the first anomaly index is not greater than the first anomaly threshold corresponding to that supply content, and the second anomaly index is not greater than the second anomaly threshold corresponding to that supply content, that supply content is determined to be non-anomaly content.
[0169] In this embodiment, depending on the content types and anomaly categories of the abnormal content, the anomaly threshold corresponding to any business type may include at least one of a text anomaly threshold, an image anomaly threshold, a target anomaly threshold, an audio anomaly threshold, or a video anomaly threshold. At least one of these thresholds may also include different anomaly thresholds for different anomaly categories. For example, where the content types include text and images and the anomaly categories include a first anomaly category and a second anomaly category, the correspondence between business types and anomaly thresholds includes, for any business type, a first text anomaly threshold and a first image anomaly threshold for the first anomaly category, and a second text anomaly threshold and a second image anomaly threshold for the second anomaly category.
[0170] In one possible implementation, the multiple supply contents include multiple texts and multiple images, and the anomaly categories include a first anomaly category and a second anomaly category. Obtaining the anomaly indices corresponding to the multiple supply contents based on the identification results includes: obtaining, based on the identification results, a first text anomaly index and a first image anomaly index for the first anomaly category, and a second text anomaly index and a second image anomaly index for the second anomaly category. Determining the anomaly thresholds corresponding to the multiple supply contents based on the correspondence between business types and anomaly thresholds includes: determining, based on that correspondence, a first text anomaly threshold and a first image anomaly threshold for the first anomaly category, and a second text anomaly threshold and a second image anomaly threshold for the second anomaly category.
[0171] In this case, determining whether any one of the multiple supply contents is abnormal content based on the relationship between the anomaly index and the corresponding anomaly threshold includes: determining text with a first text anomaly index greater than the first text anomaly threshold as abnormal text of the first anomaly category, determining text with a second text anomaly index greater than the second text anomaly threshold as abnormal text of the second anomaly category, determining images with a first image anomaly index greater than the first image anomaly threshold as abnormal images of the first anomaly category, determining images with a second image anomaly index greater than the second image anomaly threshold as abnormal images of the second anomaly category, and determining the remaining supply contents as non-abnormal content.
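The thresholds described above are indexed by business type, content type, and anomaly category. One way to sketch this is a nested mapping; the structure, the function name, and any category names and threshold values supplied by a caller are assumptions for illustration only.

```python
from typing import Dict, List

# thresholds[business_type][content_type][anomaly_category] -> threshold
Thresholds = Dict[str, Dict[str, Dict[str, float]]]


def abnormal_categories(
    indices: Dict[str, float],  # anomaly index per anomaly category
    business_type: str,
    content_type: str,          # e.g. "text" or "image"
    thresholds: Thresholds,
) -> List[str]:
    """Return the anomaly categories whose index exceeds its threshold.

    An empty list means the supply content is non-abnormal.
    """
    per_category = thresholds[business_type][content_type]
    return [
        category
        for category, index in indices.items()
        if index > per_category[category]
    ]
```

A single supply content can belong to several anomaly categories at once, which is why the sketch returns a list rather than a single label.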
[0172] For example, take the case where the MobileNetV3 model is used to recall multiple candidate images from the multiple images according to recognition speed, the BiT model is used to obtain the anomaly indices of the candidate images, and the target anomalous image is an image of exposed skin. See Figure 4, which is a schematic diagram of an image recognition process provided in an embodiment of this application. As shown in Figure 4, the multiple images from the multiple supply contents to be identified are input into a MobileNetV3 model and a multi-target anomaly vocabulary recall model. The MobileNetV3 model quickly identifies the multiple images to recall multiple candidate images. The multi-target anomaly vocabulary recall model recalls multiple first target images from the multiple images, which are suspected images of exposed skin. The multiple candidate images are then input into a BiT model, which accurately identifies them to obtain, for each candidate image, multiple anomaly indices for multiple anomaly categories. A module that refines anomaly thresholds by business type provides the multiple anomaly thresholds for the multiple anomaly categories corresponding to the business type of each candidate image. Based on these anomaly indices and anomaly thresholds, the candidate images whose anomaly indices are greater than the corresponding anomaly thresholds are obtained as anomaly image set 1.
[0173] As shown in Figure 4, the multiple candidate images and the multiple first target images are fused to obtain multiple second target images, which are input into a human parsing model. The human parsing model detects the exposed skin area in each second target image. The module that refines anomaly thresholds by business type provides the exposed skin area threshold corresponding to the business type of each second target image. Based on the exposed skin area of each second target image and the corresponding exposed skin area threshold, the exposed skin images among the multiple second target images are obtained as anomaly image set 2. Anomaly image set 1 and anomaly image set 2 are merged to obtain anomaly image set 3, which is the final recognition result.
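The data flow of Figure 4 can be sketched with precomputed scores standing in for the models. Everything here is an assumption for illustration: the record fields replace the outputs of MobileNetV3, the vocabulary recall model, BiT, and the human parsing model, and `recall_threshold` is a hypothetical cutoff for the fast recall stage.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class ImageRecord:
    image_id: str
    business_type: str
    fast_score: float                   # stand-in for the MobileNetV3 output
    keyword_hit: bool                   # stand-in for the vocabulary recall model
    category_indices: Dict[str, float]  # stand-in for the BiT output
    skin_area: float                    # stand-in for the human parsing output


def recognize_images(
    images: List[ImageRecord],
    recall_threshold: float,
    category_thresholds: Dict[str, Dict[str, float]],  # business -> category -> threshold
    skin_thresholds: Dict[str, float],                 # business -> area threshold
) -> Set[str]:
    candidates = [im for im in images if im.fast_score > recall_threshold]
    first_targets = [im for im in images if im.keyword_hit]

    # Anomaly image set 1: candidates exceeding any category threshold.
    set_1 = {
        im.image_id
        for im in candidates
        if any(
            index > category_thresholds[im.business_type][category]
            for category, index in im.category_indices.items()
        )
    }

    # Second target images: fuse candidates and first targets, dropping duplicates.
    second_targets = {im.image_id: im for im in candidates + first_targets}

    # Anomaly image set 2: exposed skin area above the business-specific threshold.
    set_2 = {
        im.image_id
        for im in second_targets.values()
        if im.skin_area > skin_thresholds[im.business_type]
    }

    # Anomaly image set 3: the final result is the union of both sets.
    return set_1 | set_2
```

Running the fast recall stage first means the more expensive BiT and human parsing stages only process a subset of the images.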
[0174] As another example, take the case where the MobileNetV3+CBAM model is used to recall multiple candidate images from the multiple images according to recognition speed, the BiT model is used to obtain the anomaly indices of the candidate images, and the target anomalous image is an image of exposed skin. See Figure 5, which is a schematic diagram of another image recognition process provided in an embodiment of this application. As shown in Figure 5, the first business images and the second business images are first separated from the multiple images in the multiple supply contents to be identified. It can be understood that a first business image is an image under a business type that may contain images of exposed skin, such as the massage parlor business, and a second business image is an image under a business type whose abnormal features may be concentrated in a part of the image.
[0175] As shown in Figure 5, the multiple images from the multiple supply contents to be identified are input into the MobileNetV3 model for rapid identification, and the second business images are input into the CBAM attention module, which extracts attention features from them. Multiple first candidate images are obtained based on the output of the MobileNetV3 model, and multiple second candidate images are obtained based on the output of the CBAM attention module. The first and second candidate images are fused to obtain multiple candidate images, which are input into the BiT model for accurate identification. The multiple first business images are input into the multi-target anomaly vocabulary recall model to recall multiple first target images. The multiple first candidate images and the multiple first target images are fused into second target images, which are input into the human parsing model to detect the exposed skin area. Finally, the outputs of the BiT model and the human parsing model are input into the module that refines anomaly thresholds by business type, thereby obtaining the set of abnormal images among the multiple images.
[0176] In this embodiment of the application, a given content may simultaneously incorporate multiple dimensions such as text, images, audio, and video. Therefore, a multi-dimensional judgment method can be used to determine whether the content is anomalous. Optionally, multiple types of content corresponding to the content are obtained, and each type is identified using a corresponding recognition method. Based on the recognition results, it is determined whether any one of the multiple types of content is anomalous. Based on the results of whether each of the multiple types of content is anomalous, the overall anomalous content is determined. For example, taking a given content that includes text, images, audio, and video as an example, the aforementioned text recognition method, image recognition method, audio recognition method, and video recognition method can be used to determine whether the given content is anomalous text, anomalous image, anomalous audio, or anomalous video, respectively. Thus, it is possible to determine whether the given content is anomalous.
[0177] For example, if a certain supply content is determined to be any one of abnormal text, abnormal image, abnormal audio, or abnormal video, then the supply content is determined to be abnormal content; if a certain supply content is not determined to be any one of abnormal text, abnormal image, abnormal audio, or abnormal video, then the supply content is determined to be non-abnormal content.
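The rule in this paragraph is a logical OR over the per-modality results. A minimal sketch, with a hypothetical function name and modality keys:

```python
from typing import Mapping


def is_abnormal_content(modality_results: Mapping[str, bool]) -> bool:
    """A supply content is abnormal if any of its modalities is judged abnormal."""
    return any(modality_results.values())
```

For instance, `is_abnormal_content({"text": False, "image": True})` evaluates to `True`, while a content whose modalities are all non-abnormal evaluates to `False`.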
[0178] However, there are some business scenarios where the content being supplied may not appear abnormal when identified individually in a single modality, but can be identified as anomalous when identified in combination across multiple modalities. In such cases, simply performing an OR operation on the identification results of multiple single modalities yields low accuracy. Therefore, this application also provides a multimodal joint model, which is used to jointly represent the multimodalities of the supplied content to identify whether the supplied content is anomalous. Here, multimodal joint refers to the fusion of multiple senses for identification; in this application embodiment, the multimodalities include text, images, audio, or video, etc.
[0179] Optionally, the multimodal joint model can adopt the ViL (Vision-and-Language) BERT model. The ViL BERT model is a multimodal two-stream model that can learn joint representations of visual and textual content that are independent of specific tasks. The text stream and the visual stream interact through an attention layer to achieve joint image and text content recognition.
[0180] In summary, steps 201-203 above can obtain the identification result of whether each of the multiple supply contents is abnormal content, and abnormal content can be promptly removed from display. Since the abnormal threshold used to determine whether supply content is abnormal is set differently for different business types, this method is applicable to content identification for different business types, effectively improving the accuracy of abnormal content identification.
[0181] It is understandable that, for online platforms, in the process of identifying the supplied content using the above content identification methods, the types of abnormal content may change over time. For example, content providers may process abnormal information to avoid abnormal content and then republish the processed abnormal content on the online platform, or new abnormal content may appear on the online platform. Both of these factors will reduce the accuracy of the above content identification methods, resulting in unidentified abnormal content being displayed on the online platform for a long time.
[0182] Therefore, this application provides an identification update module, which is used to update the above content identification method based on abnormal content of a different type or newly emerging abnormal content in the network platform, so as to ensure that the content identification method provided by this application can automatically update the identification capability of abnormal content for frequently changing supply content. In addition, it also ensures that the update speed of the identification update module is fast enough to avoid unidentified abnormal content from being displayed on the network platform for a long time.
[0183] In this embodiment, in response to the detection of a new abnormal event, the identification capability is updated based on the new abnormal content. This embodiment does not limit the method of detecting new abnormal events. For example, the network platform includes a monitoring system, which acquires supply content that has been complained about by users; this complained-about supply content is considered a new abnormal event. Alternatively, quality inspectors can manually screen the supply content displayed on the network platform, and the abnormal content identified through manual screening can be considered a new abnormal event for the network platform. Optionally, the monitoring system acquires supply content that has been complained about by users, and supply content that has been repeatedly complained about by users is identified as a new abnormal event for the network platform.
[0184] It is understood that the content displayed on the online platform has been identified using the content recognition method provided in this application embodiment; that is, the content displayed on the online platform is identified as non-abnormal content. Therefore, when a user complains about the content displayed on the online platform, the complained-about content becomes a new abnormal event.
[0185] Next, the update process for the aforementioned text recognition and image recognition methods will be explained separately. It is understandable that since the recognition capabilities of the audio and video recognition methods depend on those of the text and image recognition methods, once the text and image recognition methods have completed their update, the recognition capabilities of the audio and video recognition methods will also be updated.
[0186] For text recognition, see Figure 6, which is a schematic diagram of an embodiment of updating text recognition capabilities provided in this application. As shown in Figure 6, the process includes but is not limited to steps 601-607 below.
[0187] Step 601: In response to the detection of an abnormal event, extract abnormal text from the abnormal event and add the extracted abnormal text to the abnormal text set.
[0188] In the embodiments of this application, the abnormal event includes at least one type of content such as text, image, audio or video, so the abnormal text can be extracted from the abnormal event.
[0189] Step 602: Determine whether the number of texts included in the abnormal text set is greater than the first quantity threshold. If the number of texts is not greater than the first quantity threshold, proceed to step 603; if the number of texts is greater than the first quantity threshold, proceed to step 607.
[0190] Optionally, the first quantity threshold can be set based on experience or adjusted flexibly according to the application scenario. For example, the first quantity threshold is 10.
[0191] Step 603: Extract updated abnormal words from the abnormal text set.
[0192] In this embodiment, when the number of texts included in the abnormal text set is no greater than a first threshold, the core of updating the text recognition model is updating the abnormal vocabulary library. Optionally, extracting updated abnormal words from the abnormal text set includes: performing word segmentation on the texts included in the abnormal text set, and obtaining multiple updated words based on the word segmentation results. It is understood that the texts included in the abnormal text set can be abnormal content of the text type in abnormal events, or text extracted from abnormal content of the image, audio, or video types.
[0193] Step 604: Determine whether the impact of the updated abnormal words on the business indicators is greater than the impact threshold. If the impact of the updated abnormal words on the business indicators is not greater than the impact threshold, proceed to step 605; if the impact of the updated abnormal words on the business indicators is greater than the impact threshold, proceed to step 606.
[0194] In one possible implementation, the updated abnormal words are compared with the abnormal vocabulary library to obtain candidate words, that is, updated abnormal words that are not yet in the abnormal vocabulary library, and the impact value of the candidate words on business metrics is obtained. These business metrics may refer to the click-through rates or impressions of supply content that includes the candidate words.
[0195] Optionally, the impact threshold can be set based on experience or adjusted flexibly according to the application scenario; for example, the impact threshold can be 10.
[0196] Step 605: Update the abnormal vocabulary database based on the updated abnormal vocabulary to obtain the updated abnormal vocabulary database.
[0197] Optionally, the updated abnormal vocabulary can be merged with the abnormal vocabulary library to obtain an updated abnormal vocabulary library, or candidate words that are different from the abnormal vocabulary library can be obtained from the updated abnormal vocabulary library and added to the abnormal vocabulary library.
[0198] Step 606: Manually judge the candidate words, and update the abnormal vocabulary library based on the manual judgment results to obtain the updated abnormal vocabulary library.
[0199] When the impact of updating abnormal terms on business metrics exceeds the threshold, manual determination is needed to ensure the stability of these metrics and avoid significant fluctuations. For example, candidate terms in the updated abnormal terms list that differ from those in the abnormal term list are logged for subsequent manual determination of whether to add them to the abnormal term list.
[0200] Thus, an updated abnormal vocabulary library is obtained. By replacing the abnormal vocabulary library in the keyword recognition model with the updated abnormal vocabulary library, the text recognition model is updated, and the ability to recognize abnormal text content is improved.
[0201] Step 607: Adjust the semantic recognition model based on the abnormal text set, and obtain the updated semantic recognition model based on the adjustment results.
[0202] In this embodiment, when the number of texts included in the abnormal text set is greater than a first threshold, it indicates that there is sufficient new text training data, and the model parameters of the semantic recognition model can be updated. For example, an updated text training set is obtained based on the texts included in the abnormal text set, and simple fine-tuning is performed on the aforementioned semantic recognition model to increase the ability to recognize abnormal texts without reducing the recognition accuracy.
[0203] In one possible implementation, the abnormal vocabulary library before the update is called the initial abnormal vocabulary library, and the semantic recognition model before the update is called the initial semantic recognition model. Before calling the abnormal vocabulary library and the semantic recognition model, the method further includes: obtaining the initial abnormal vocabulary library and the initial semantic recognition model; in response to detecting an abnormal event, extracting abnormal text from the abnormal event and adding the extracted abnormal text to the abnormal text set; when the number of texts included in the abnormal text set is not greater than a first quantity threshold, extracting updated abnormal words from the abnormal text set, updating the initial abnormal vocabulary library based on the updated abnormal words, and using the updated initial abnormal vocabulary library as the abnormal vocabulary library; when the number of texts included in the abnormal text set is greater than the first quantity threshold, adjusting the initial semantic recognition model based on the abnormal text set, and using the adjusted initial semantic recognition model as the semantic recognition model.
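The branch on the first quantity threshold described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of this application: the function names, the `fine_tune` callback, the threshold value passed by the caller, and the toy word-extraction rule (whitespace tokens that recur in at least two abnormal texts) are all hypothetical.

```python
from collections import Counter


def extract_updated_words(abnormal_texts, min_count=2):
    """Toy extraction rule (assumption): keep tokens that occur in at least
    `min_count` different abnormal texts."""
    counts = Counter(
        word for text in abnormal_texts for word in set(text.lower().split())
    )
    return {word for word, n in counts.items() if n >= min_count}


def update_text_recognition(abnormal_texts, vocabulary, fine_tune, threshold):
    """Choose the update path from the size of the abnormal text set.

    Returns (action, vocabulary). `fine_tune` is a hypothetical callable that
    adjusts the semantic recognition model on the given texts.
    """
    if len(abnormal_texts) > threshold:
        # Enough new samples: adjust the semantic recognition model.
        fine_tune(abnormal_texts)
        return "model_adjusted", set(vocabulary)
    # Too few samples: only extend the abnormal vocabulary library.
    return "vocabulary_updated", set(vocabulary) | extract_updated_words(abnormal_texts)
```

With a small abnormal text set only the vocabulary grows; once the set exceeds the threshold, the callback is invoked and the vocabulary is left unchanged.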
[0204] For image recognition, refer to Figure 7, which is a schematic diagram illustrating an embodiment of updating image recognition capabilities provided in this application. As shown in Figure 7, the process includes but is not limited to steps 701-704 below.
[0205] Step 701: In response to the detection of an abnormal event, extract the abnormal image from the abnormal event and add the extracted abnormal image to the abnormal image set.
[0206] In the embodiments of this application, the abnormal event includes at least one type of content such as text, image, audio or video, so the abnormal image can be extracted from the abnormal event.
[0207] Step 702: Determine whether the number of images in the abnormal image set is greater than the second quantity threshold. If the number of images is not greater than the second quantity threshold, proceed to step 703; if the number of images is greater than the second quantity threshold, proceed to step 704.
[0208] Optionally, the second quantity threshold can be set based on experience or flexibly adjusted according to the application scenario. For example, the second quantity threshold is 10.
[0209] Step 703: Match the abnormal image set with the images in the supplied content, and obtain the abnormal images based on the matching results.
[0210] Optionally, matching images in the supply content with the abnormal image set includes: obtaining the similarity between each image in the supply content and each abnormal image in the abnormal image set, and determining supply images with similarity greater than a similarity threshold as abnormal images. This application does not limit the method of obtaining similarity; for example, a PHash algorithm or a Deep Hash algorithm can be used to obtain the similarity between each image in the supply content and each abnormal image in the abnormal image set. The similarity threshold can be set empirically or flexibly adjusted according to the application scenario; for example, a similarity threshold of 80%.
[0211] Taking the PHash algorithm as an example, a discrete cosine transform is used to retain the low-frequency components of an image, and the image is compressed into an 8*8 binary string of 0s and 1s (64 bits). The Hamming distance between the PHash values of two images is then compared; if it is less than a preset threshold U, the two images are considered similar, that is, the image to be identified is determined to be an abnormal image.
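A PHash-style comparison of this kind can be sketched in pure Python as follows. The sketch rests on several assumptions that the text does not specify: the input is already a square grayscale matrix (for example 32*32), an unnormalized DCT-II is used, bits are set by comparing each of the 64 low-frequency coefficients with the median of the non-DC coefficients, and the default value of the threshold U is hypothetical.

```python
import math


def dct_2d_low(pixels, keep=8):
    """Top-left keep x keep block of the unnormalized 2-D DCT-II
    of a square grayscale matrix."""
    n = len(pixels)
    basis = [
        [math.cos((2 * x + 1) * u * math.pi / (2 * n)) for x in range(n)]
        for u in range(keep)
    ]
    # Transform along each row first, then along the columns.
    rows = [
        [sum(basis[v][y] * row[y] for y in range(n)) for v in range(keep)]
        for row in pixels
    ]
    return [
        [sum(basis[u][x] * rows[x][v] for x in range(n)) for v in range(keep)]
        for u in range(keep)
    ]


def phash(pixels):
    """64-bit perceptual hash packed into an int."""
    coeffs = [c for row in dct_2d_low(pixels) for c in row]
    ac = sorted(coeffs[1:])          # exclude the DC term from the median
    median = ac[len(ac) // 2]
    bits = 0
    for c in coeffs:
        bits = (bits << 1) | (1 if c > median else 0)
    return bits


def hamming_distance(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def is_similar(pixels_a, pixels_b, u=10):
    """Two images are treated as similar when the Hamming distance of
    their hashes is less than the threshold U (default is an assumption)."""
    return hamming_distance(phash(pixels_a), phash(pixels_b)) < u
```

Because a uniform brightness shift changes only the DC coefficient, such a shifted copy of an image keeps essentially the same hash, which is the robustness property that makes this kind of matching usable when only a few abnormal images are available.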
[0212] Step 704: Adjust the image recognition model based on the abnormal image set, and obtain the updated image recognition model based on the adjustment results.
[0213] In this embodiment, when the number of images in the abnormal image set exceeds the second quantity threshold, it indicates that there is sufficient new image training sample data, and the model parameters of the image recognition model can be updated. For example, the images in the abnormal image set are used as the updated image training set for automated update iteration on the aforementioned image recognition model. Automated iteration includes automatic data acquisition and partitioning, automatic search for a model suited to the characteristics of the current updated image training set, early termination techniques to prevent overfitting during training, and finally, verification of the updated image recognition model's performance using an image validation set. If the performance exceeds a threshold and is superior to that of the original image recognition model, the updated image recognition model automatically replaces the original one.
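The early-termination and replacement rules in this step can be sketched as follows. This is an illustrative sketch only: the function names, the `patience` parameter, and the use of validation loss and a single scalar performance score are assumptions introduced here, not details given in this application.

```python
def early_stopping_epoch(validation_losses, patience=3):
    """Return the index of the epoch at which training would stop.

    Training stops once the validation loss has failed to improve for
    `patience` consecutive epochs (hypothetical stopping rule).
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(validation_losses):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(validation_losses) - 1


def select_model(original_score, updated_score, performance_threshold):
    """Keep the updated model only if it clears the absolute performance
    threshold and also outperforms the original model."""
    if updated_score > performance_threshold and updated_score > original_score:
        return "updated"
    return "original"
```

Requiring both conditions means a retrained model that beats the original but still falls below the absolute bar, or clears the bar but is worse than the original, does not replace it.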
[0214] This allows for the updating of image recognition capabilities. By replacing the original image recognition model with the updated one, the ability to identify abnormal images is improved.
[0215] In this embodiment, the image recognition model before the update is called the initial image recognition model. Before calling the image recognition model, the method further includes: obtaining the initial image recognition model; in response to detecting an abnormal event, extracting abnormal images from the abnormal event and adding the extracted abnormal images to an abnormal image set; when the number of images included in the abnormal image set is greater than a second quantity threshold, adjusting the initial image recognition model based on the abnormal image set and using the adjusted initial image recognition model as the image recognition model; when the number of images included in the abnormal image set is not greater than the second quantity threshold, identifying images among multiple images whose similarity to any image included in the abnormal image set is greater than a similarity threshold as abnormal images.
[0216] In summary, by timely updating text and image recognition capabilities, this content recognition method maintains high accuracy in recognizing frequently changing content, preventing new types of anomalous content from being exposed on the network platform for extended periods. Furthermore, the method can update content recognition capabilities for varying amounts of updated data, meaning updates can be performed even with small amounts of data, thus ensuring a sufficiently fast update speed.
[0217] The above-mentioned content identification method and identification update module mainly identify abnormal content from the perspective of the supplied content, without involving the user object level. However, the supplied content ultimately needs to be displayed to the user object, which can be a user. Therefore, the content identification method provided in this application also introduces a user object interaction module to improve the user object experience. Specifically, the user object interaction module guides the user object to provide feedback on abnormal content through the design of abnormal tags, realizing interaction with the user object regarding abnormal content, thereby achieving proactive discovery of abnormal content and further mining personalized abnormal content information of the user object.
[0218] In this embodiment, when a user browses a network platform, if they feel uncomfortable with the content provided by the platform, they can provide feedback on the content. For example, by long-pressing the content, multiple suspected abnormal tags corresponding to the content are obtained and displayed. In response to the user selecting one of the suspected abnormal tags, feedback information regarding the content is obtained. Optionally, the feedback information also includes the abnormal tag corresponding to the content.
[0219] Optionally, this application does not limit the method of obtaining the suspected abnormal tags corresponding to the supply content. For example, multiple suspected abnormal tags can be randomly selected from the abnormal tag library; alternatively, the abnormal index of the supply content for each abnormal category can be obtained through the above content recognition method, and the K abnormal categories with the largest abnormal indices (K is a positive integer) can be taken as the K suspected abnormal tags.
[0220] For example, taking K=3, for a non-abnormal content supply, the identification result output by the above content identification method includes anomaly indices corresponding to multiple anomaly categories. For instance, the identification result shows an anomaly index of 10 for the third anomaly category, 12 for the second anomaly category, 20 for the horror category, and 25 for the first anomaly category. Among these, the top three anomaly categories in descending order of anomaly index are the first anomaly category, the horror category, and the second anomaly category. These three categories can be used as suspected anomaly tags for the content supply. When a user is dissatisfied with the content supply and provides feedback by long-pressing, the display interface can pop up the three anomaly tags—the first anomaly category, the horror category, and the second anomaly category—for the user to choose from, and the selected anomaly tag will be included in the feedback information.
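The K=3 example above amounts to a top-K selection over the per-category anomaly indices. A minimal sketch, in which the function name and the dictionary representation of the identification result are assumptions made for illustration:

```python
def suspected_tags(anomaly_indices, k=3):
    """Return the k categories with the largest anomaly index,
    in descending order of index."""
    ranked = sorted(anomaly_indices.items(), key=lambda item: item[1], reverse=True)
    return [category for category, _ in ranked[:k]]


# Values taken from the example in the text.
result = {
    "third anomaly category": 10,
    "second anomaly category": 12,
    "horror category": 20,
    "first anomaly category": 25,
}
print(suspected_tags(result, k=3))
# ['first anomaly category', 'horror category', 'second anomaly category']
```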
[0221] It is understandable that after the content is displayed for a period of time, multiple feedback messages from different users may be collected regarding the content. These multiple feedback messages can be used to proactively discover abnormal content that the aforementioned content recognition method has failed to identify. In this embodiment of the application, when multiple feedback messages from different users regarding the same abnormal content of the same abnormal type are obtained, that is, when the abnormal content is marked as the same abnormal type by most users, the abnormal content can be regarded as newly emerging abnormal content.
[0222] In one possible implementation, the method further includes: in response to receiving multiple feedback messages for the first supplied content, each feedback message including a corresponding anomaly tag, determining the content category of the first supplied content as an anomaly content category when the number of identical anomaly tags in the multiple feedback messages exceeds a third quantity threshold. Optionally, the third quantity threshold can be flexibly adjusted according to the application scenario; for example, the third quantity threshold can be 3.
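The counting rule can be sketched as below. The representation of feedback as `(user_id, tag)` pairs and the choice to count each user at most once per tag (reflecting the statement that the feedback comes from different users) are assumptions of this sketch; the default threshold of 3 is the example value given in the text.

```python
def newly_abnormal_tag(feedback, third_quantity_threshold=3):
    """Return the anomaly tag reported by more than the threshold number of
    distinct users for one piece of supplied content, or None.

    feedback: iterable of (user_id, tag) pairs (hypothetical representation).
    """
    users_by_tag = {}
    for user_id, tag in feedback:
        users_by_tag.setdefault(tag, set()).add(user_id)
    for tag, users in users_by_tag.items():
        if len(users) > third_quantity_threshold:
            return tag
    return None
```

Deduplicating by user keeps one user's repeated reports from pushing content over the threshold on their own.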
[0223] In this embodiment, multiple feedback messages from the same user to different content can be collected and used to set personalized anomaly content identification for each user. In one possible implementation, before determining the anomaly thresholds for each of the multiple content items based on the correspondence between service type and anomaly threshold, the method further includes: determining the tolerance index of the target user based on at least one of the target user's historical feedback information or historical favorites information, where the tolerance index indicates the probability of providing feedback on the content; and obtaining the correspondence between the service type and the anomaly threshold for the target user based on the target user's tolerance index.
[0224] Optionally, if multiple feedback messages from the same user for the same anomaly tag are received, it indicates that the user has a low tolerance for the abnormal content corresponding to that anomaly tag. Therefore, for this user, the anomaly threshold for this anomaly type in the above content recognition method can be appropriately reduced to identify more abnormal content corresponding to this anomaly type.
[0225] Similarly, user preference information, such as the user's favorites or follows, can be used to determine the content categories the user prefers. If those preferred categories include an abnormal category, the user has a high tolerance for that category. Therefore, for this user, the anomaly threshold for that category in the correspondence between business types and anomaly thresholds used by the above content recognition method can be appropriately increased, so that some abnormal content of that category is still shown to the user.
[0226] For example, for users who enjoy sauna and massage, images that expose large areas of skin may not cause discomfort but rather arouse their interest. Therefore, these users have a high tolerance for images of exposed skin, meaning that images of exposed skin can be displayed to a certain extent for them.
[0227] Optionally, after different anomaly thresholds are set for different business types, those thresholds can be further adjusted based on the user's tolerance index for different anomaly categories. For example, when the tolerance index is determined from historical feedback information, the number of times the target user has provided feedback on a given anomaly tag is used as the target user's tolerance index. The previously set anomaly threshold is multiplied by the target user's tolerance index to obtain the adjusted anomaly threshold for that target user. Alternatively, the product of the previously set anomaly threshold and the tolerance index can be further multiplied by a scaling factor to obtain the adjusted anomaly threshold for that target user.
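The two adjustment variants in this paragraph reduce to a single multiplication. In the sketch below the function name is hypothetical, and the value of the scaling factor is an assumption; the text does not specify how the factor is chosen or in which direction it is meant to move the threshold.

```python
def adjusted_threshold(base_threshold, tolerance_index, scaling_factor=1.0):
    """Per-user anomaly threshold: the previously set threshold multiplied by
    the user's tolerance index and, optionally, by a scaling factor.
    With scaling_factor=1.0 this is the first variant in the text."""
    return base_threshold * tolerance_index * scaling_factor
```

For instance, a base threshold of 50 with a tolerance index of 2 yields 100 under the first variant, and 25.0 when a scaling factor of 0.25 (an arbitrary illustrative value) is applied.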
[0228] Therefore, through the above-mentioned interactive module, the content recognition method provided in this application embodiment can mine and identify unique abnormal content for a user based on the user's personal preferences, feedback and other behavioral records, thereby achieving a personalized content recognition effect.
[0229] The content recognition method provided in this application offers different content recognition methods for different business types of supplied content. This allows for flexible acquisition of corresponding anomaly thresholds when recognizing supplied content of different business types. The anomaly threshold is used to determine whether the supplied content is abnormal, making the method applicable to application scenarios including multiple business types. This ensures the stability of the recognition effect for different business types of content and effectively improves the accuracy of anomaly content recognition. Furthermore, the recognition update module can update the recognition capability in real time, preventing newly emerging anomalies from being exposed on the network platform for a long time due to untimely updates, thus affecting user experience. The interaction module not only proactively discovers unrecognized anomaly content based on user feedback but also enables personalized content recognition for each user.
[0230] Referring to Figure 8, this application provides a content recognition device, which includes:
[0231] The first acquisition module 801 is used to acquire multiple supply contents to be identified. The content type of any supply content includes at least one of text, image, audio or video, and any supply content has a corresponding business type.
[0232] The identification module 802 is used to identify multiple supply contents according to their content types, and to obtain an anomaly index corresponding to each of the multiple supply contents based on the identification results. The anomaly index is used to indicate the probability that the supply contents belong to anomalies.
[0233] The first determining module 803 is used to determine the abnormal thresholds corresponding to multiple supply contents based on the correspondence between business types and abnormal thresholds, and to determine whether any supply content among the multiple supply contents is abnormal content based on the relationship between the abnormal index and the corresponding abnormal threshold.
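The determination performed by module 803 can be sketched as a per-business-type threshold lookup followed by a comparison. The function name, the tuple layout of the input, the business-type labels, and the use of a strict "greater than" comparison are assumptions of this sketch; the text only states that the decision depends on the relationship between the anomaly index and the corresponding threshold.

```python
def find_abnormal(contents, thresholds_by_business_type):
    """Return the ids of supply contents whose anomaly index exceeds the
    anomaly threshold configured for their business type.

    contents: iterable of (content_id, business_type, anomaly_index).
    """
    abnormal = []
    for content_id, business_type, anomaly_index in contents:
        threshold = thresholds_by_business_type[business_type]
        if anomaly_index > threshold:  # assumed comparison direction
            abnormal.append(content_id)
    return abnormal
```

The same anomaly index can therefore be judged abnormal for one business type and acceptable for another, which is the refinement the device is meant to provide.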
[0234] In one possible implementation, the multiple supply contents include multiple first texts and multiple second texts, wherein the text length of the first texts is less than a length threshold, and the text length of the second texts is not less than the length threshold.
[0235] The recognition module 802 is used to match multiple first texts with an abnormal vocabulary library and a high-frequency vocabulary library respectively, and obtain the abnormal index corresponding to each of the multiple first texts according to the matching results. The abnormal vocabulary library includes multiple abnormal words, and the high-frequency vocabulary library includes multiple high-frequency words. High-frequency words are words whose display frequency is higher than the frequency threshold. The semantic recognition model is called to extract the semantic features corresponding to each of the multiple second texts, and the abnormal index corresponding to each of the multiple second texts is obtained according to the semantic features.
[0236] In one possible implementation, the multiple supply contents include multiple images;
[0237] The recognition module 802 is used to recall multiple candidate images from the multiple images according to recognition speed, identify the image features corresponding to each of the candidate images, and obtain the anomaly index corresponding to each of the multiple images according to the recognition results, where the probability that each candidate image belongs to abnormal content is greater than a first probability threshold and the recognition speed is greater than a speed threshold.
[0238] In one possible implementation, each image includes corresponding text information, which includes multiple words, and the abnormal content includes the target abnormal image.
[0239] The recognition module 802 is also used to recall multiple first target images from multiple images based on multiple words corresponding to multiple images respectively, according to multiple target anomalous words, wherein the probability of the first target image belonging to the target anomalous image is greater than a second probability threshold; to fuse multiple first target images and multiple candidate images to obtain multiple second target images, and to extract target features corresponding to the multiple second target images respectively;
[0240] The recognition module 802 is used to identify the multiple candidate images to obtain the image features corresponding to each of the candidate images, and to obtain the anomaly index corresponding to each of the multiple images based on the image features and the target features.
[0241] In one possible implementation, the device further includes:
[0242] The second acquisition module is used to acquire the initial abnormal vocabulary and the initial semantic recognition model;
[0243] The detection module is used to respond to the occurrence of abnormal events, extract abnormal text from the abnormal events, and add the extracted abnormal text to the abnormal text set;
[0244] The update module is used to extract updated abnormal words from the abnormal text set when the number of texts included in the abnormal text set is not greater than a first quantity threshold, update the initial abnormal word library based on the updated abnormal words, and use the updated initial abnormal word library as the abnormal word library.
[0245] The adjustment module is used to adjust the initial semantic recognition model based on the abnormal text set when the number of texts included in the abnormal text set exceeds a first quantity threshold, and to use the adjusted initial semantic recognition model as the semantic recognition model.
[0246] In one possible implementation, the recognition module 802 is used to call an image recognition model to recall multiple candidate images from multiple images according to the recognition speed, recognize the image features corresponding to the multiple candidate images respectively, and obtain the anomaly index corresponding to the multiple images respectively based on the recognition results.
[0247] The second acquisition module is also used to acquire the initial image recognition model;
[0248] The detection module is also used to respond to the occurrence of an abnormal event, extract an abnormal image from the abnormal event, and add the extracted abnormal image to the abnormal image set;
[0249] The adjustment module is also used to adjust the initial image recognition model based on the abnormal image set when the number of images included in the abnormal image set is greater than the second quantity threshold, and to use the adjusted initial image recognition model as the image recognition model.
[0250] In one possible implementation, the updating module is further configured to identify images among the multiple images whose similarity to any image in the abnormal image set is greater than a similarity threshold as abnormal content when the number of images included in the abnormal image set is not greater than a second quantity threshold.
[0251] In one possible implementation, the device further includes:
[0252] The receiving module is used to respond to receiving multiple feedback messages for the first supply content, each feedback message including a corresponding exception label;
[0253] The second determination module is used to determine the first supply content as abnormal content when the number of identical abnormal labels in multiple feedback messages exceeds a third quantity threshold.
[0254] In one possible implementation, the multiple supply contents are supply contents for a target user object; the apparatus further includes:
[0255] The third determining module is used to determine the tolerance index of the target user object based on at least one of the target user object's historical feedback information or historical favorites information. The tolerance index is used to indicate the probability of providing feedback on the supplied content.
[0256] The third acquisition module is used to obtain the correspondence between the business type and the anomaly threshold for the target user object based on the tolerance index of the target user object.
[0257] The content recognition device provided in this application provides different content recognition methods for different types of service content, enabling flexible acquisition of corresponding anomaly thresholds when recognizing content of different service types. These anomaly thresholds are used to determine whether the content is abnormal, making the method applicable to application scenarios involving multiple service types. This ensures the stability of the recognition effect for different service types and effectively improves the accuracy of anomaly content recognition. Furthermore, the recognition update module can update the recognition capability in real time, preventing newly emerging anomalies from being exposed on the network platform for extended periods due to untimely updates, thus affecting user experience. The interaction module not only proactively discovers unrecognized anomaly content based on user feedback but also enables personalized content recognition for each user.
[0258] It should be understood that the apparatus provided in the above embodiments is only illustrated by the division of the above functional modules when implementing its functions. In actual applications, the above functions can be assigned to different functional modules as needed, that is, the internal structure of the device can be divided into different functional modules to complete all or part of the functions described above. In addition, the apparatus and method embodiments provided in the above embodiments belong to the same concept, and the specific implementation process can be found in the method embodiments, which will not be repeated here.
[0259] Please refer to Figure 9, which shows a schematic diagram of a computer device provided in one embodiment of this application. The computer device can be a terminal, such as a smartphone, tablet computer, in-vehicle terminal, laptop computer, or desktop computer. The terminal may also be referred to as an object device, portable terminal, laptop terminal, desktop terminal, or other names.
[0260] Typically, a terminal includes a processor 901 and a memory 902.
[0261] Processor 901 may include one or more processing cores, such as a quad-core processor, an octa-core processor, etc. Processor 901 may be implemented using at least one hardware form selected from DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), and PLA (Programmable Logic Array). Processor 901 may also include a main processor and a coprocessor. The main processor, also known as a CPU (Central Processing Unit), is used to process data in the wake-up state; the coprocessor is a low-power processor used to process data in the standby state. In some embodiments, processor 901 may integrate a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content required to be displayed on the screen. In some embodiments, processor 901 may also include an AI (Artificial Intelligence) processor, which is used to handle computational operations related to machine learning.
[0262] The memory 902 may include one or more computer-readable storage media, which may be non-transitory. The memory 902 may also include high-speed random access memory and non-volatile memory, such as one or more disk storage devices or flash memory devices. In some embodiments, the non-transitory computer-readable storage media in the memory 902 are used to store at least one instruction, which is executed by the processor 901 to implement the content recognition method provided in the method embodiments of this application.
[0263] In some embodiments, the terminal may also optionally include: a peripheral device interface 903 and at least one peripheral device. The processor 901, memory 902, and peripheral device interface 903 can be connected via a bus or signal line. Each peripheral device can be connected to the peripheral device interface 903 via a bus, signal line, or circuit board. Specifically, the peripheral device includes at least one of: a radio frequency circuit 904, a display screen 905, a camera assembly 906, an audio circuit 907, a positioning assembly 908, and a power supply 909.
[0264] Peripheral device interface 903 can be used to connect at least one I / O (Input / Output) related peripheral device to processor 901 and memory 902. In some embodiments, processor 901, memory 902 and peripheral device interface 903 are integrated on the same chip or circuit board; in some other embodiments, any one or two of processor 901, memory 902 and peripheral device interface 903 can be implemented on separate chips or circuit boards, which is not limited in this embodiment.
[0265] The radio frequency (RF) circuit 904 is used to receive and transmit RF signals, also known as electromagnetic signals. The RF circuit 904 communicates with communication networks and other communication devices via electromagnetic signals. The RF circuit 904 converts electrical signals into electromagnetic signals for transmission, or converts received electromagnetic signals back into electrical signals. Optionally, the RF circuit 904 includes: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module (SIM) card, etc. The RF circuit 904 can communicate with other terminals through at least one wireless communication protocol. This wireless communication protocol includes, but is not limited to: metropolitan area networks (MANs), various generations of mobile communication networks (2G, 3G, 4G, and 5G), wireless local area networks (WLANs), and / or Wireless Fidelity (WiFi) networks. In some embodiments, the RF circuit 904 may also include circuitry related to NFC (Near Field Communication), which is not limited in this application.
[0266] Display screen 905 is used to display a UI (User Interface). This UI may include graphics, text, icons, videos, and any combination thereof. When display screen 905 is a touch display screen, it also has the ability to collect touch signals on or above its surface. These touch signals can be input as control signals to processor 901 for processing. In this case, display screen 905 can also be used to provide virtual buttons and / or a virtual keyboard, also known as soft buttons and / or a soft keyboard. In some embodiments, there may be one display screen 905, located on the front panel of the terminal; in other embodiments, there may be at least two display screens 905, respectively located on different surfaces of the terminal or in a folded design; in still other embodiments, display screen 905 may be a flexible display screen, located on a curved or folded surface of the terminal. Furthermore, display screen 905 may be configured as a non-rectangular, irregular shape, i.e., a non-rectangular screen. Display screen 905 may be made of materials such as LCD (Liquid Crystal Display) or OLED (Organic Light-Emitting Diode).
[0267] The camera assembly 906 is used to acquire images or videos. Optionally, the camera assembly 906 includes a front-facing camera and a rear-facing camera. Typically, the front-facing camera is located on the front panel of the terminal, and the rear-facing camera is located on the back of the terminal. In some embodiments, there are at least two rear-facing cameras, each of which is any one of a main camera, a depth-sensing camera, a wide-angle camera, and a telephoto camera, so as to achieve background blurring by fusing the main camera and the depth-sensing camera, panoramic shooting and VR (Virtual Reality) shooting by fusing the main camera and the wide-angle camera, or other fusion shooting functions. In some embodiments, the camera assembly 906 may also include a flash. The flash can be a single-color temperature flash or a dual-color temperature flash. A dual-color temperature flash refers to a combination of a warm-light flash and a cool-light flash, which can be used for light compensation at different color temperatures.
[0268] The audio circuit 907 may include a microphone and a speaker. The microphone is used to collect sound waves from objects and the environment, converting the sound waves into electrical signals that are input to the processor 901 for processing, or input to the radio frequency circuit 904 for voice communication. For stereo sound acquisition or noise reduction purposes, multiple microphones may be used, each positioned at a different location on the terminal. The microphone may also be an array microphone or an omnidirectional microphone. The speaker is used to convert the electrical signals from the processor 901 or the radio frequency circuit 904 into sound waves. The speaker may be a conventional diaphragm speaker or a piezoelectric ceramic speaker. When the speaker is a piezoelectric ceramic speaker, it can convert electrical signals not only into audible sound waves but also into inaudible sound waves for purposes such as distance measurement. In some embodiments, the audio circuit 907 may also include a headphone jack.
[0269] The positioning component 908 is used to determine the current geographic location of the terminal for navigation or LBS (Location Based Service). The positioning component 908 can be a positioning component based on the US GPS (Global Positioning System), China's BeiDou system, Russia's GLONASS system, or the EU's Galileo system.
[0270] The power supply 909 is used to power the various components in the terminal. The power supply 909 can be AC power, DC power, a disposable battery, or a rechargeable battery. When the power supply 909 includes a rechargeable battery, the rechargeable battery can support wired or wireless charging. The rechargeable battery can also be used to support fast charging technology.
[0271] In some embodiments, the terminal further includes one or more sensors 910. The one or more sensors 910 include, but are not limited to: an accelerometer 911, a gyroscope 912, a pressure sensor 913, a fingerprint sensor 914, an optical sensor 915, and a proximity sensor 916.
[0272] The accelerometer 911 can detect the magnitude of acceleration along the three coordinate axes of a coordinate system established by the terminal. For example, the accelerometer 911 can be used to detect the components of gravitational acceleration along the three coordinate axes. The processor 901 can control the display screen 905 to display the user interface in either a landscape or portrait view based on the gravitational acceleration signal acquired by the accelerometer 911. The accelerometer 911 can also be used to acquire motion data of games or objects.
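As an illustration of the landscape / portrait decision described above, the following is a minimal sketch rather than the terminal's actual logic. The function name `choose_orientation` and the axis convention (x along the short edge, y along the long edge of the terminal) are assumptions made for this example only.

```python
def choose_orientation(gx: float, gy: float) -> str:
    """Pick a view from the gravity components along two terminal axes.

    gx: gravity component along the short edge (assumed x axis), in m/s^2.
    gy: gravity component along the long edge (assumed y axis), in m/s^2.
    """
    # When gravity acts mostly along the long edge, the terminal is upright.
    if abs(gy) >= abs(gx):
        return "portrait"
    # Otherwise gravity acts mostly along the short edge: the terminal is on its side.
    return "landscape"
```

For example, a reading dominated by the long-edge component, such as `choose_orientation(0.3, 9.7)`, selects the portrait view.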
[0273] The gyroscope 912 can detect the terminal's orientation and rotation angle. The gyroscope 912, in conjunction with the accelerometer 911, can collect 3D motion data of the object acting on the terminal. Based on the data collected by the gyroscope 912, the processor 901 can perform the following functions: motion sensing (e.g., changing the UI based on the object's tilt operation), image stabilization during shooting, game control, and inertial navigation.
[0274] The pressure sensor 913 can be disposed on the side bezel of the terminal and / or the lower layer of the display screen 905. When the pressure sensor 913 is disposed on the side bezel of the terminal, it can detect the grip signal of an object on the terminal, and the processor 901 can perform left / right hand recognition or quick operations based on the grip signal collected by the pressure sensor 913. When the pressure sensor 913 is disposed on the lower layer of the display screen 905, the processor 901 can control the operable controls on the UI based on the pressure operation of the object on the display screen 905. The operable controls include at least one of button controls, scroll bar controls, icon controls, and menu controls.
[0275] The fingerprint sensor 914 is used to collect the user's fingerprint. Either the processor 901 identifies the user's identity based on the fingerprint collected by the fingerprint sensor 914, or the fingerprint sensor 914 itself identifies the user's identity based on the collected fingerprint. When the user's identity is identified as trusted, the processor 901 authorizes the user to perform relevant sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, making payments, and changing settings. The fingerprint sensor 914 can be located on the front, back, or side of the terminal. When the terminal has physical buttons or a manufacturer's logo, the fingerprint sensor 914 can be integrated with the physical buttons or manufacturer's logo.
[0276] An optical sensor 915 is used to collect ambient light intensity. In one embodiment, the processor 901 can control the display brightness of the display screen 905 based on the ambient light intensity collected by the optical sensor 915. Specifically, when the ambient light intensity is high, the display brightness of the display screen 905 is increased; when the ambient light intensity is low, the display brightness of the display screen 905 is decreased. In another embodiment, the processor 901 can also dynamically adjust the shooting parameters of the camera assembly 906 based on the ambient light intensity collected by the optical sensor 915.
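The brightness adjustment above only fixes the direction of the change (brighter surroundings, brighter screen). A minimal sketch of one possible mapping follows; the linear curve, the function name `display_brightness`, and the default values for `min_level`, `max_level`, and `max_lux` are all assumptions chosen for illustration.

```python
def display_brightness(ambient_lux: float,
                       min_level: float = 0.1,
                       max_level: float = 1.0,
                       max_lux: float = 1000.0) -> float:
    """Map ambient light intensity to a display brightness level.

    The mapping is linear and clamped (an assumed curve): readings at or
    below 0 lux give min_level, readings at or above max_lux give max_level.
    """
    # Clamp the sensor reading into the supported range.
    clamped = min(max(ambient_lux, 0.0), max_lux)
    # Interpolate linearly between the minimum and maximum brightness.
    return min_level + (max_level - min_level) * (clamped / max_lux)
```

A real terminal would likely smooth successive readings before applying the new level, so that brief changes in lighting do not cause visible flicker.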
[0277] The proximity sensor 916, also known as a distance sensor, is typically installed on the front panel of the terminal. The proximity sensor 916 is used to detect the distance between an object and the front of the terminal. In one embodiment, when the proximity sensor 916 detects that the distance between the object and the front of the terminal is gradually decreasing, the processor 901 controls the display screen 905 to switch from a screen-on state to a screen-off state; when the proximity sensor 916 detects that the distance between the object and the front of the terminal is gradually increasing, the processor 901 controls the display screen 905 to switch from a screen-off state to a screen-on state.
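The screen switching rule above can be sketched as a small state function. This is an illustrative simplification that compares only two consecutive distance readings; the function name `next_screen_state` and the string states `"on"` and `"off"` are assumptions for this example.

```python
def next_screen_state(current_state: str,
                      previous_distance: float,
                      new_distance: float) -> str:
    """Return the display state after a new proximity reading.

    current_state is "on" or "off"; distances are in any consistent unit.
    """
    if new_distance < previous_distance:
        # The object is approaching the front of the terminal: turn the screen off.
        return "off"
    if new_distance > previous_distance:
        # The object is moving away from the front of the terminal: turn the screen on.
        return "on"
    # No change in distance: keep the current state.
    return current_state
```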
[0278] Those skilled in the art will understand that the structure shown in Figure 9 does not constitute a limitation on the computer device, which may include more or fewer components than shown, combine certain components, or use a different component arrangement.
[0279] In an exemplary embodiment, a computer device is also provided, comprising a processor and a memory storing at least one piece of program code. The at least one piece of program code is loaded and executed by the processor to enable the computer device to implement any of the content recognition methods described above.
[0280] In an exemplary embodiment, a computer-readable storage medium is also provided, which stores at least one piece of program code that is loaded and executed by a processor of a computer device to enable the computer device to implement any of the content recognition methods described above.
[0281] Optionally, the aforementioned computer-readable storage medium may be a read-only memory (ROM), a random access memory (RAM), a compact disc read-only memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, or the like.
[0282] In an exemplary embodiment, a computer program product or computer program is also provided, which includes computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes the computer instructions, causing the computer device to perform any of the content recognition methods described above.
[0283] The terms "first," "second," "third," and "fourth," etc., used in the specification, claims, and accompanying drawings of this application are used to distinguish different objects, not to describe a specific order. Furthermore, the terms "comprising" and "having," and any variations thereof, are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or apparatus that includes a series of steps or units is not limited to the listed steps or units, but may optionally include steps or units not listed, or may optionally include other steps or units inherent to these processes, methods, products, or apparatuses.
[0284] The above description is merely an optional embodiment of this application and is not intended to limit this application. Any modifications, equivalent substitutions, improvements, etc., made within the principles of this application should be included within the protection scope of this application.
Claims
1. A content recognition method, characterized in that the method includes: obtaining multiple supply contents to be identified, wherein the content type of any supply content includes at least one of text, image, audio, or video, and each supply content has a corresponding business type; performing content identification on the multiple supply contents based on their content types, and obtaining, based on the identification results, an anomaly index corresponding to each of the multiple supply contents, wherein the anomaly index indicates the probability that the supply content is abnormal content; determining, based on the correspondence between business type and abnormal threshold, the abnormal threshold corresponding to each of the multiple supply contents; and determining, based on the relationship between each anomaly index and the corresponding abnormal threshold, whether any supply content among the multiple supply contents is abnormal content.
2. The method according to claim 1, characterized in that the multiple supply contents include multiple first texts and multiple second texts, wherein the text length of each first text is less than a length threshold and the text length of each second text is not less than the length threshold; the step of performing content identification on the multiple supply contents and obtaining anomaly indices corresponding to each of the multiple supply contents based on the identification results includes: matching the multiple first texts against an abnormal vocabulary library and a high-frequency vocabulary library respectively, and obtaining the anomaly indices corresponding to the multiple first texts based on the matching results, wherein the abnormal vocabulary library includes multiple abnormal words, the high-frequency vocabulary library includes multiple high-frequency words, and a high-frequency word is a word whose display frequency is higher than a frequency threshold; and invoking a semantic recognition model to extract the semantic features corresponding to each of the multiple second texts, and obtaining the anomaly indices corresponding to the multiple second texts based on the semantic features.
3. The method according to claim 1, characterized in that the multiple supply contents include multiple images; the step of performing content identification on the multiple supply contents and obtaining anomaly indices corresponding to each of the multiple supply contents based on the identification results includes: recalling multiple candidate images from the multiple images according to a recognition speed, identifying the image features corresponding to each of the multiple candidate images, and obtaining the anomaly indices corresponding to the multiple images based on the recognition results, wherein the probability that each candidate image belongs to abnormal content is greater than a first probability threshold, and the recognition speed is greater than a speed threshold.
4. The method according to claim 3, characterized in that each image has corresponding text information, the text information includes multiple words, and the abnormal content includes a target abnormal image; the method further includes: recalling, based on the multiple words corresponding to each of the multiple images, multiple first target images from the multiple images according to multiple target abnormal words, wherein the probability that each first target image belongs to the target abnormal image is greater than a second probability threshold; and fusing the multiple first target images with the multiple candidate images to obtain multiple second target images, and extracting the target features corresponding to each of the multiple second target images; the step of identifying the image features corresponding to the multiple candidate images and obtaining the anomaly indices corresponding to the multiple images based on the recognition results includes: identifying the multiple candidate images to obtain the image features corresponding to each of the multiple candidate images; and obtaining, based on the image features and the target features, the anomaly index corresponding to each of the multiple images.
5. The method according to claim 2, characterized in that the method further includes: obtaining an initial abnormal vocabulary library and an initial semantic recognition model; in response to the detection of an abnormal event, extracting abnormal text from the abnormal event and adding it to an abnormal text set; when the number of texts included in the abnormal text set is not greater than a first quantity threshold, extracting updated abnormal words from the abnormal text set, updating the initial abnormal vocabulary library based on the updated abnormal words, and using the updated initial abnormal vocabulary library as the abnormal vocabulary library; and when the number of texts included in the abnormal text set is greater than the first quantity threshold, adjusting the initial semantic recognition model based on the abnormal text set, and using the adjusted initial semantic recognition model as the semantic recognition model.
6. The method according to claim 3, characterized in that the step of recalling multiple candidate images from the multiple images according to the recognition speed, identifying the image features corresponding to each of the multiple candidate images, and obtaining the anomaly indices corresponding to the multiple images based on the recognition results includes: invoking an image recognition model to recall multiple candidate images from the multiple images according to the recognition speed, identifying the image features corresponding to each of the multiple candidate images, and obtaining the anomaly indices corresponding to the multiple images based on the recognition results; before invoking the image recognition model to recall multiple candidate images from the multiple images according to the recognition speed, the method further includes: obtaining an initial image recognition model; in response to the detection of an abnormal event, extracting an abnormal image from the abnormal event and adding it to an abnormal image set; and when the number of images in the abnormal image set is greater than a second quantity threshold, adjusting the initial image recognition model based on the abnormal image set, and using the adjusted initial image recognition model as the image recognition model.
7. The method according to claim 6, characterized in that the method further includes: when the number of images included in the abnormal image set is not greater than the second quantity threshold, determining as abnormal content any image among the multiple images whose similarity to any image included in the abnormal image set is greater than a similarity threshold.
8. The method according to any one of claims 1-7, characterized in that the method further includes: receiving multiple feedback messages regarding a first supply content, each feedback message including a corresponding abnormal label; and when the number of identical abnormal labels in the multiple feedback messages exceeds a third quantity threshold, determining that the first supply content is abnormal content.
9. The method according to any one of claims 1-7, characterized in that the multiple supply contents are intended for a target object; before determining the abnormal thresholds corresponding to the multiple supply contents based on the correspondence between business type and abnormal threshold, the method further includes: determining a tolerance index of the target object based on at least one of the target object's historical feedback information or historical collection information, wherein the tolerance index indicates the probability that the target object provides feedback on supplied content; and obtaining, based on the tolerance index of the target object, the correspondence between business type and abnormal threshold that corresponds to the target object.
10. A computer device, characterized in that the computer device includes a processor and a memory, the memory storing at least one computer program or instruction, the at least one computer program or instruction being loaded and executed by the processor to enable the computer device to implement the content recognition method as described in any one of claims 1 to 9.
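For illustration only, the per-business-type threshold comparison recited in claim 1 can be sketched as follows. This is not the claimed implementation: the class `SupplyContent`, the function `flag_abnormal`, the business type names, the threshold values, and the rule that an anomaly index strictly greater than the threshold marks content as abnormal are all assumptions introduced for this example. The anomaly index is taken as already computed by a content-type-specific recognizer.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SupplyContent:
    content_id: str
    content_type: str     # "text", "image", "audio" or "video"
    business_type: str    # hypothetical business type label
    anomaly_index: float  # assumed to lie in [0, 1], from a type-specific recognizer


# Hypothetical correspondence between business type and abnormal threshold.
ABNORMAL_THRESHOLDS: Dict[str, float] = {
    "food_delivery": 0.8,
    "short_video": 0.6,
}
DEFAULT_THRESHOLD = 0.7  # assumed fallback for unlisted business types


def flag_abnormal(contents: List[SupplyContent],
                  thresholds: Dict[str, float] = ABNORMAL_THRESHOLDS,
                  default: float = DEFAULT_THRESHOLD) -> List[str]:
    """Return the ids of contents whose anomaly index exceeds the
    abnormal threshold of their own business type."""
    flagged = []
    for item in contents:
        # Look up the threshold that corresponds to this item's business type.
        threshold = thresholds.get(item.business_type, default)
        # Assumed relationship: strictly greater than the threshold means abnormal.
        if item.anomaly_index > threshold:
            flagged.append(item.content_id)
    return flagged
```

With these assumed values, two contents that share an anomaly index of 0.7 are treated differently: the one with business type `"short_video"` is flagged, while the one with business type `"food_delivery"` is not, which is the kind of business-specific refinement the method aims for.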