Method, device, electronic equipment and storage medium for detecting violation of media information
By acquiring key data from media information and performing type-specific recombination and matching, the problem of difficulty in detecting various media information violations in existing technologies has been solved, achieving rapid and convenient violation detection and identification.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- BEIJING BLUEPACIFIC TECH
- Filing Date
- 2021-12-28
- Publication Date
- 2026-06-16
Figure CN115827903B_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of violation detection technology, and in particular to a method, apparatus, electronic device, and storage medium for detecting violations of media information.
Background Technology
[0002] With the development of internet technology, media information such as text, audio, images, and video on the internet has also grown explosively. Among them, there is a lot of media information containing illegal content such as political, pornographic, and violent content, which has a negative impact on national security, social stability and harmony, and especially on the growth of teenagers. Therefore, it is very necessary to detect illegal media information.
[0003] Currently, the common method for detecting violations in media information is to set up corresponding detection models for the content to be detected. However, due to the wide variety of media information, the detection criteria for different types of media information are not the same, which makes it difficult for the set detection models to simultaneously detect violations of various types of media information.
[0004] Therefore, how to provide an effective solution that achieves unified violation detection for various types of media information has become a pressing problem in existing technologies.
Summary of the Invention
[0005] In a first aspect, embodiments of this application provide a method for detecting violations of media information, including:
[0006] Obtain the media information to be detected;
[0007] Key data in the media information is identified, and the key data is used to evaluate whether there is a violation;
[0008] Based on the reorganization rules corresponding to the information type of the media information, the key data is reorganized to obtain reorganized data;
[0009] The reconstructed data is matched with data in a pre-configured standard database to determine whether there is any illegal content in the media information;
[0010] The data recorded in the standard database are either data indicating the presence of violations or data indicating the absence of violations.
[0011] In one possible design, the media information is text, and the key data identified in the media information, which is data used to evaluate whether a violation has occurred, includes:
[0012] The text is matched with a pre-configured first standard thesaurus to identify keywords in the text that are suspected of being illegal.
[0013] The key data is restructured based on the restructuring rules corresponding to the information type of the media information to obtain restructured data, including:
[0014] The first character in the keyword is combined with at least one character preceding it to obtain a first recombined word, and the last character in the keyword is combined with at least one character following it to obtain a second recombined word.
[0015] The step of matching the reconstructed data with data in a pre-configured standard database to determine whether there is any illegal content in the media information includes:
[0016] The first and second recombined words are matched with a pre-configured second standard vocabulary to determine whether there are any illegal words in the text.
[0017] The first standard lexicon records words that represent violations, while the second standard lexicon records words that represent non-violations.
[0018] In one possible design, when the text contains prohibited words, the method further includes:
[0019] Mark the prohibited words in the text.
[0020] In one possible design, the media information is audio, and the key data identified in the media information, which is data used to evaluate whether a violation has occurred, includes:
[0021] Convert the audio into text;
[0022] The text is matched with a pre-configured first standard thesaurus to identify keywords in the text that are suspected of being illegal.
[0023] The key data is reorganized based on the reorganization rules corresponding to the information type of the media information to obtain reorganized data, including:
[0024] The first character in the keyword is combined with at least one character preceding it to obtain a first recombined word, and the last character in the keyword is combined with at least one character following it to obtain a second recombined word.
[0025] The step of matching the reconstructed data with data in a pre-configured standard database to determine whether there is any illegal content in the media information includes:
[0026] The first and second recombined words are matched with a pre-configured second standard vocabulary to determine whether there are any illegal words in the text.
[0027] The first standard lexicon records words that represent violations, while the second standard lexicon records words that represent non-violations.
[0028] In one possible design, the media information is an image, and the key data identified in the media information, which is data used to evaluate whether a violation has occurred, includes:
[0029] Extract the shape features, color features, and texture features of the image;
[0030] The key data is restructured based on the restructuring rules corresponding to the information type of the media information to obtain restructured data, including:
[0031] The shape features, color features, and texture features of the image are combined to obtain combined features;
[0032] The step of matching the reconstructed data with data in a pre-configured standard database to determine whether there is any illegal content in the media information includes:
[0033] The combined features are matched with features in a pre-configured standard feature library to determine whether there is any illegal content in the image;
[0034] The standard feature library records combined features that characterize the presence of illegal content. These combined features are obtained by combining the shape, color, and texture features of the image containing illegal content.
[0035] In one possible design, the media information is video, and the key data identified in the media information, which is data used to evaluate whether a violation has occurred, includes:
[0036] Extract image frames from the video;
[0037] Extract the shape features, color features, and texture features of the image frame;
[0038] The key data is restructured based on the restructuring rules corresponding to the information type of the media information to obtain restructured data, including:
[0039] The shape features, color features, and texture features of the image frame are combined to obtain the combined features;
[0040] The step of matching the reconstructed data with data in a pre-configured standard database to determine whether there is any illegal content in the media information includes:
[0041] The combined features are matched with features in a pre-configured standard feature library to determine whether there is any illegal content in the video;
[0042] The standard feature library records combined features that characterize the presence of illegal content. These combined features are obtained by combining the shape, color, and texture features of the image containing illegal content.
[0043] In one possible design, acquiring the media information to be detected includes:
[0044] Crawl the media information from the media information publishing terminal; or
[0045] Receive the media information uploaded by the user.
[0046] In a second aspect, embodiments of this application provide a media information violation detection device, including:
[0047] The acquisition unit is used to acquire the media information to be detected;
[0048] The determining unit is used to determine key data in the media information, wherein the key data is data used to evaluate whether there is a violation;
[0049] The recombination unit is used to recombine the key data based on the recombination rules corresponding to the information type of the media information to obtain recombined data;
[0050] A matching unit is used to match the reconstructed data with data in a pre-configured standard database to determine whether there is any illegal content in the media information;
[0051] The data recorded in the standard database are either data indicating the presence of violations or data indicating the absence of violations.
[0052] In a third aspect, embodiments of this application provide an electronic device, including a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with each other through the bus;
[0053] Memory, used to store computer programs;
[0054] The processor executes programs stored in memory, implementing the following process:
[0055] Obtain the media information to be detected;
[0056] Key data in the media information is identified, and the key data is used to evaluate whether there is a violation;
[0057] Based on the reorganization rules corresponding to the information type of the media information, the key data is reorganized to obtain reorganized data;
[0058] The reconstructed data is matched with data in a pre-configured standard database to determine whether there is any illegal content in the media information;
[0059] The data recorded in the standard database are either data indicating the presence of violations or data indicating the absence of violations.
[0060] In a fourth aspect, embodiments of this application provide a computer-readable storage medium storing a computer program, which, when executed by a processor, implements the following process:
[0061] Obtain the media information to be detected;
[0062] Key data in the media information is identified, and the key data is used to evaluate whether there is a violation;
[0063] Based on the reorganization rules corresponding to the information type of the media information, the key data is reorganized to obtain reorganized data;
[0064] The reconstructed data is matched with data in a pre-configured standard database to determine whether there is any illegal content in the media information;
[0065] The data recorded in the standard database are either data indicating the presence of violations or data indicating the absence of violations.
[0066] The above-described technical solutions employed in one or more embodiments of this application can achieve the following beneficial effects:
[0067] Key data used to evaluate whether media information violates regulations is identified, and the key data is reorganized based on the reorganization rules corresponding to the information type of the media information to obtain reorganized data. This reorganized data is then matched with data in a standard database to determine whether there is any illegal content in the media information. In this way, different reorganization rules can be set for different types of media information, thereby determining the reorganized data used to assess whether each type of media information contains illegal content. This allows for convenient and rapid violation detection of various types of media information based on the reorganized data, meeting the detection needs of massive amounts of media information.
Attached Figure Description
[0068] The accompanying drawings, which are included to provide a further understanding of this document and form part of this document, illustrate exemplary embodiments and their descriptions, serving to explain this document and do not constitute an undue limitation thereof. In the drawings:
[0069] Figure 1 is a flowchart of a media information violation detection method provided in one embodiment of this application.
[0070] Figure 2 is a schematic diagram of the structure of an electronic device provided in one embodiment of this application.
[0071] Figure 3 is a schematic diagram of the structure of a media information violation detection device provided in one embodiment of this application.
Detailed Implementation
[0072] To facilitate violation detection of various types of media information, this application provides a method, apparatus, electronic device, and storage medium for detecting violations of media information. This method, apparatus, electronic device, and storage medium can conveniently and quickly complete the violation detection of various types of media information, meeting the detection needs of massive amounts of media information.
[0073] The media information violation detection method provided in this application embodiment can be applied to a server or a user terminal. The user terminal can be, but is not limited to, a personal computer, smartphone, tablet computer, personal digital assistant (PDA), etc.
[0074] The method for detecting violations of media information provided in the embodiments of this application will be described in detail below. It should be understood that the execution entity described does not constitute a limitation on the embodiments of this application.
[0075] Figure 1 is a flowchart of a media information violation detection method provided in an embodiment of this application. As shown in Figure 1, the media information violation detection method may include the following steps:
[0076] Step 101: Obtain the media information to be detected.
[0077] The media information to be detected can be text, audio, images, or video, etc.
[0078] When obtaining media information to be detected, the information can be actively crawled from the media information publishing end, or it can be uploaded by the user.
[0079] Step 102: Identify the key data in the media information.
[0080] In this embodiment, the key data is used to evaluate whether a violation has occurred. Since media information can be text, audio, images, or video, the key data determined will differ for different types of media information. The following will mainly explain how to determine the key data for these four types of media information.
[0081] In the case of text-based media information, key data within the text can refer to keywords suspected of violating regulations. Specifically, in this embodiment, a first standard thesaurus is pre-configured, containing various words that characterize violations, such as sensitive or prohibited words related to politics, pornography, terrorism, or violence. When determining key data, the text can be matched against the pre-configured first standard thesaurus to identify keywords suspected of violating regulations within the text; these suspected keywords are the key data within the text.
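The keyword identification described above can be sketched as follows. This is a minimal illustration, assuming plain substring matching; the function name `find_suspected_keywords` and the placeholder lexicon entries are hypothetical and are not taken from the application.

```python
def find_suspected_keywords(text, first_lexicon):
    """Return sorted (start, end, word) spans for every lexicon word found in text."""
    hits = []
    for word in first_lexicon:
        if not word:
            continue  # ignore empty lexicon entries
        start = text.find(word)
        while start != -1:
            hits.append((start, start + len(word), word))
            start = text.find(word, start + 1)  # keep scanning for later occurrences
    return sorted(hits)


# Placeholder entries standing in for the first standard thesaurus (assumption).
FIRST_LEXICON = {"DE", "XY"}

print(find_suspected_keywords("ABCDEFGH", FIRST_LEXICON))  # [(3, 5, 'DE')]
```

Returning character positions along with each keyword keeps enough context for the later recombination and marking steps.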
[0082] For media information in the form of audio, the audio can be converted into text first, and then the text can be matched with a pre-configured first standard thesaurus to identify keywords in the text that are suspected of violating regulations. These suspected keywords are the key data in the audio.
[0083] When the media information is an image, the key data in the image can be its image features. Specifically, the shape, color, and texture features of the image can be extracted first, and these features can be used as the key data in the image.
[0084] When extracting the shape features of an image, the image outline can be extracted first, and different codes can be assigned to different types of outlines. For example, the outlines of people, trees, and cats can correspond to different codes.
[0085] Extracting color features from an image can be done by first dividing the image into multiple regions, extracting the color saturation, brightness, or contrast of each region, and then using the extracted color saturation, brightness, or contrast of each region as the color features of the image.
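One possible reading of the region-based color extraction is sketched below. It assumes the image is given as a nested list of 8-bit RGB tuples, that regions form a regular grid, and that saturation and brightness are taken from the HSV color model; none of these choices is specified in the application, and `region_color_features` is a hypothetical name.

```python
import colorsys


def region_color_features(pixels, rows=2, cols=2):
    """Return a (mean saturation, mean brightness) pair for each grid region."""
    height, width = len(pixels), len(pixels[0])
    if height < rows or width < cols:
        raise ValueError("image is smaller than the requested grid")
    features = []
    for r in range(rows):
        for c in range(cols):
            # Pixel bounds of the current region.
            y0, y1 = r * height // rows, (r + 1) * height // rows
            x0, x1 = c * width // cols, (c + 1) * width // cols
            sat_sum = val_sum = 0.0
            count = 0
            for y in range(y0, y1):
                for x in range(x0, x1):
                    red, green, blue = (ch / 255 for ch in pixels[y][x])
                    _, sat, val = colorsys.rgb_to_hsv(red, green, blue)
                    sat_sum += sat
                    val_sum += val
                    count += 1
            features.append((sat_sum / count, val_sum / count))
    return features


# A 2x2 toy image: red, black / white, green.
image = [[(255, 0, 0), (0, 0, 0)], [(255, 255, 255), (0, 255, 0)]]
print(region_color_features(image))
```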
[0086] The texture features of an image can include the coarseness and density of the texture in the image.
[0087] In this embodiment, the shape features, color features, and texture features of the image are used as key data in the image. It is understood that in other embodiments, only one or two of the shape features, color features, and texture features of the image may be used as key data in the image.
[0088] For media information that is video, image frames can be extracted first. These image frames can be one or more frames. If multiple image frames are extracted, one image frame can be extracted at regular intervals. Then, the shape, color, and texture features of the image frames are extracted and used as key data in the video.
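Extracting one frame at regular intervals can be sketched as an index computation. The parameters `fps` and `every_seconds` are assumptions introduced for illustration; the application does not state how the interval is expressed.

```python
def sample_frame_indices(total_frames, fps, every_seconds):
    """Return the indices of frames taken once every `every_seconds` seconds."""
    if total_frames < 0 or fps <= 0 or every_seconds <= 0:
        raise ValueError("total_frames must be >= 0; fps and every_seconds must be > 0")
    step = max(1, round(fps * every_seconds))  # never step by less than one frame
    return list(range(0, total_frames, step))


print(sample_frame_indices(100, 25, 1.0))  # [0, 25, 50, 75]
```

Each selected frame would then go through the same shape, color, and texture extraction as a still image.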
[0089] Step 103: Based on the reorganization rules corresponding to the information type of the media information, the key data is reorganized to obtain reorganized data.
[0090] In this embodiment of the application, different recombination rules are set for different information types of media information. After the key data in the media information is determined, the corresponding recombination rule can be selected according to the information type of the media information to recombine the key data in the media information.
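Selecting a recombination rule by information type can be sketched as a lookup table of functions. The table name `RECOMBINATION_RULES`, the shape of `key_data`, and the single-character context width are all assumptions made for illustration.

```python
from typing import Callable, Dict


def recombine_words(key_data):
    """Text/audio rule: pair each keyword's first and last character with one neighbour."""
    text, spans = key_data["text"], key_data["spans"]
    return [(text[max(0, s - 1):s + 1], text[e - 1:e + 1]) for s, e in spans]


def combine_features(key_data):
    """Image/video rule: concatenate shape, color, and texture features in that order."""
    return key_data["shape"] + key_data["color"] + key_data["texture"]


RECOMBINATION_RULES: Dict[str, Callable] = {
    "text": recombine_words,
    "audio": recombine_words,   # audio is converted to text beforehand
    "image": combine_features,
    "video": combine_features,  # applied to features of extracted frames
}


def recombine(info_type, key_data):
    """Apply the rule registered for info_type, or fail for unknown types."""
    try:
        rule = RECOMBINATION_RULES[info_type]
    except KeyError:
        raise ValueError(f"no recombination rule for {info_type!r}")
    return rule(key_data)


print(recombine("text", {"text": "ABCDEFGH", "spans": [(3, 5)]}))  # [('CD', 'EF')]
```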
[0091] When the media information is text and the key data in the text is a keyword suspected of being in violation, the key data can be reorganized by combining the first character of the keyword with at least one character preceding it to obtain the first reorganized word, and combining the last character of the keyword with at least one character following it to obtain the second reorganized word.
[0092] For example, in one embodiment, the content of the text is ABCDEFGH. Assuming that DE is a keyword suspected of being in violation, D can be combined with the preceding C to form CD as the first recombinant word, or D can be combined with the preceding BC to form BCD as the first recombinant word. E can be combined with the following F to form EF as the second recombinant word, or E can be combined with the following FG to form EFG as the second recombinant word.
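The ABCDEFGH example above can be reproduced with a short sketch. The function name `recombine_keyword` and the parameters `n_before` and `n_after` (how many neighbouring characters to attach) are hypothetical.

```python
def recombine_keyword(text, start, end, n_before=1, n_after=1):
    """Build the first and second recombined words for the keyword text[start:end].

    The first word is the keyword's first character plus up to n_before preceding
    characters; the second is its last character plus up to n_after following ones.
    Fewer characters are attached when the keyword sits at the edge of the text.
    """
    first = text[max(0, start - n_before):start + 1]
    second = text[end - 1:end + n_after]
    return first, second


text = "ABCDEFGH"
start = text.find("DE")
end = start + len("DE")
print(recombine_keyword(text, start, end))        # ('CD', 'EF')
print(recombine_keyword(text, start, end, 2, 2))  # ('BCD', 'EFG')
```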
[0093] When the media information is audio, the key data are the keywords suspected of violations in the text obtained by converting the audio. When recombining, the first character of a suspected keyword in the converted text can be combined with at least one character preceding it to obtain the first recombined word, and the last character of the keyword can be combined with at least one character following it to obtain the second recombined word.
[0094] When the media information is an image, the key data in the image are shape features, color features, and texture features. When reconstructing the key data, the shape features, color features, and texture features of the image can be combined to obtain combined features, and these combined features can be used as the reconstructed data.
[0095] When the media information is video, the key data in the video consists of the shape, color, and texture features of its image frames. When reconstructing this key data, the shape, color, and texture features of the image frames can be combined to obtain combined features, which are then used as the reconstructed data. Specifically, when combining shape, color, and texture features, they can be combined sequentially or in a specific order.
[0096] Step 104: Match the reconstructed data with the data in the pre-configured standard database to determine whether there is any illegal content in the media information.
[0097] In this embodiment of the application, a second standard lexicon and a standard feature library are also pre-configured. The second standard lexicon records various words that represent non-violations, and the standard feature library records various combined features that represent the existence of violation content. The combined features that represent the existence of violation content are features obtained by combining the shape features, color features, and texture features of the image (or image frame) containing violation content.
[0098] For media information of text or audio type, the identified first and second recombined words can be matched with a pre-configured second standard thesaurus to determine whether any illegal words exist in the text or audio. Specifically, if the matching result shows that the second standard thesaurus contains a word that matches either the first or second recombined word, then it is determined that no illegal words exist in the text or audio. If the second standard thesaurus does not contain a word that matches either the first or second recombined word, then it is determined that illegal words exist in the text or audio; that is, keywords in the text or audio suspected of containing illegal elements are illegal words.
[0099] In this embodiment, there can be multiple keywords suspected of violations. For each keyword, a first recombinant word and a second recombinant word can be identified, and the two recombinant words identified for the same keyword can be referred to as a set of recombinant words. When determining whether there are violations in the text or audio, each set of recombinant words is matched with the second standard lexicon. Only when no set of recombinant words yields a violation result is it finally determined that there are no violations in the text or audio. Otherwise, as long as matching any one set of recombinant words with the second standard lexicon yields a violation result, it is finally determined that there are violations in the text or audio.
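The decision rule for text and audio can be sketched as follows, assuming exact membership tests against the second standard lexicon. The function names and the sample lexicon entry are hypothetical.

```python
def keyword_is_violation(recombined_pair, second_lexicon):
    """A keyword is a violation only if neither recombined word is a known benign word."""
    first, second = recombined_pair
    return first not in second_lexicon and second not in second_lexicon


def text_has_violation(recombined_sets, second_lexicon):
    """The text is in violation as soon as any one set of recombined words is."""
    return any(keyword_is_violation(pair, second_lexicon) for pair in recombined_sets)


# Placeholder entry standing in for the second standard lexicon (assumption).
SECOND_LEXICON = {"CD"}

print(text_has_violation([("CD", "EF")], SECOND_LEXICON))                # False
print(text_has_violation([("CD", "EF"), ("PQ", "QR")], SECOND_LEXICON))  # True
```

In the first call the recombined word "CD" is listed as benign, so the suspected keyword is cleared; in the second call the extra set has no benign match, so the whole text is flagged.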
[0100] For media information of the image or video type, the determined combined features can be matched with features in the standard feature library to determine whether there is illegal content in the image or video. The standard feature library records combined features that represent illegal content. Therefore, as long as the combined feature matches one of the features in the standard feature library, it is determined that there is illegal content in the image or video.
[0101] In this embodiment of the application, after determining that there is illegal content in the media information, the determination result can also be fed back to the media information publisher or user so that the media information publisher or user can take corresponding corrective measures.
[0102] Furthermore, the standard feature library also records the violation types corresponding to the combination features of illegal content, such as pornography, terrorism, and violence. Thus, when it is determined that there is illegal content in an image or video, the violation type corresponding to the image or video can also be determined.
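Matching combined features against the standard feature library, and reading off the violation type, might look like the sketch below. The application does not define what counts as a match; Euclidean distance under a threshold is used here purely as an assumption, and the library entries are invented placeholders.

```python
import math


def match_feature(combined, library, threshold=0.1):
    """Return the violation type of the nearest library entry within threshold, else None."""
    best_type, best_dist = None, threshold
    for entry_features, violation_type in library:
        if len(entry_features) != len(combined):
            continue  # skip entries with a different feature layout
        dist = math.dist(combined, entry_features)
        if dist <= best_dist:
            best_type, best_dist = violation_type, dist
    return best_type


# Placeholder library: (combined feature vector, violation type) pairs.
LIBRARY = [
    ([1.0, 0.0, 0.5], "violence"),
    ([0.0, 1.0, 0.5], "pornography"),
]

print(match_feature([0.98, 0.0, 0.5], LIBRARY))  # violence
print(match_feature([0.5, 0.5, 0.5], LIBRARY))   # None
```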
[0103] In addition, in this embodiment of the application, when the media information is text and the determination result is that there are illegal words in the text, the illegal words in the text can also be marked so that the media information publishing end or the user can know the location of the illegal words in a timely manner so that the illegal words can be modified in a timely manner.
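Marking the illegal words can be sketched as wrapping each flagged span in delimiters. The delimiters `[[` and `]]` and the function name `mark_words` are assumptions; the application does not prescribe a marking format.

```python
def mark_words(text, spans, left="[[", right="]]"):
    """Wrap each (start, end) span of text in markers; overlapping spans are skipped."""
    pieces, cursor = [], 0
    for start, end in sorted(spans):
        if start < cursor:
            continue  # this span overlaps one that was already marked
        pieces.append(text[cursor:start])
        pieces.append(left + text[start:end] + right)
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


print(mark_words("ABCDEFGH", [(3, 5)]))  # ABC[[DE]]FGH
```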
[0104] In summary, the media information violation detection method provided in this application identifies key data in the media information used to evaluate whether it violates regulations. Based on the reorganization rules corresponding to the information type of the media information, the key data is reorganized to obtain reorganized data. This reorganized data is then matched with data in a standard database to determine whether any illegal content exists in the media information. Different reorganization rules can thus be set for different types of media information, thereby determining the reorganized data used to assess the presence of illegal content in each type. This enables convenient and quick violation detection of various types of media information, including text, audio, images, and video, without requiring significant manpower, thus meeting the detection needs of massive amounts of media information. Furthermore, the violation type corresponding to an image or video can be determined during violation detection. Additionally, for text, illegal words can be marked so that the media information publisher or user is aware of the location of illegal words and can promptly correct them.
[0105] Figure 2 is a schematic diagram of the structure of an electronic device provided in one embodiment of this application. Referring to Figure 2, at the hardware level, the electronic device includes a processor, and optionally also includes an internal bus, a network interface, and memory. The memory may include main memory, such as high-speed random-access memory (RAM), or non-volatile memory, such as at least one disk drive. Of course, the electronic device may also include other hardware required for other business operations.
[0106] The processor, network interface, and memory can be interconnected via an internal bus, which can be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, or an EISA (Extended Industry Standard Architecture) bus, etc. The bus can be divided into an address bus, a data bus, a control bus, etc. For ease of representation, the bus is shown in Figure 2 as a single double-headed arrow, but this does not mean that there is only one bus or one type of bus.
[0107] Memory is used to store programs. Specifically, programs may include program code, which includes computer operation instructions. Memory may include main memory and non-volatile memory, and provides instructions and data to the processor.
[0108] The processor reads the corresponding computer program from non-volatile memory into main memory and then runs it, forming a media information violation detection device at the logical level. The processor executes the program stored in memory and specifically performs the following operations:
[0109] Obtain the media information to be detected;
[0110] Key data in the media information is identified, and the key data is used to evaluate whether there is a violation;
[0111] Based on the reorganization rules corresponding to the information type of the media information, the key data is reorganized to obtain reorganized data;
[0112] The reconstructed data is matched with data in a pre-configured standard database to determine whether there is any illegal content in the media information;
[0113] The data recorded in the standard database are either data indicating the presence of violations or data indicating the absence of violations.
[0114] The method executed by the media information violation detection device disclosed in the embodiment shown in Figure 2 of this application can be applied to a processor, or implemented by a processor. The processor may be an integrated circuit chip with signal processing capabilities. During implementation, each step of the above method can be completed by integrated logic circuits in the processor's hardware or by instructions in software form. The processor can be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), etc.; it can also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components. It can implement or execute the methods, steps, and logic block diagrams disclosed in one or more embodiments of this application. The general-purpose processor can be a microprocessor or any conventional processor. The steps of the method disclosed in one or more embodiments of this application can be directly embodied in the execution of a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor. The software module can reside in a mature storage medium in the field, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers. This storage medium is located in memory, and the processor reads information from the memory and, in conjunction with its hardware, completes the steps of the above method.
[0115] The electronic device can also perform the method of Figure 1 and implement the functions of the media information violation detection device in the embodiment shown in Figure 2, which are not described in detail here.
[0116] Of course, in addition to software implementation, the electronic device of this application does not exclude other implementation methods, such as logic devices or a combination of hardware and software, etc. In other words, the execution subject of the following processing flow is not limited to each logic unit, but can also be hardware or logic devices.
[0117] This application also proposes a computer-readable storage medium that stores one or more programs, the programs including instructions that, when executed by a portable electronic device including multiple applications, enable the portable electronic device to perform the method of the embodiment shown in Figure 1, and specifically to perform the following operations:
[0118] Obtain the media information to be detected;
[0119] Key data in the media information is identified, and the key data is used to evaluate whether there is a violation;
[0120] Based on the reorganization rules corresponding to the information type of the media information, the key data is reorganized to obtain reorganized data;
[0121] The reconstructed data is matched with data in a pre-configured standard database to determine whether there is any illegal content in the media information;
[0122] The data recorded in the standard database are either data indicating the presence of violations or data indicating the absence of violations.
[0123] Figure 3 is a schematic diagram of the structure of a media information violation detection device provided in one embodiment of this application. Referring to Figure 3, in one software implementation, the media information violation detection device includes:
[0124] The acquisition unit is used to acquire the media information to be detected;
[0125] The determining unit is used to determine key data in the media information, wherein the key data is data used to evaluate whether there is a violation;
[0126] The recombination unit is used to recombine the key data based on the recombination rules corresponding to the information type of the media information to obtain recombined data;
[0127] A matching unit is used to match the reconstructed data with data in a pre-configured standard database to determine whether there is any illegal content in the media information;
[0128] The data recorded in the standard database are either data indicating the presence of violations or data indicating the absence of violations.
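The four units of the device can be pictured as four pluggable callables run in sequence. The sketch below is only one way to arrange them; the class name `ViolationDetectionDevice`, its field names, and the toy units in the usage example are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ViolationDetectionDevice:
    acquire: Callable[[], Any]      # acquisition unit
    determine: Callable[[Any], Any]  # determining unit: media -> key data
    recombine: Callable[[Any], Any]  # recombination unit: key data -> recombined data
    match: Callable[[Any], bool]     # matching unit: recombined data -> verdict

    def run(self) -> bool:
        """Run the units in order and return True if illegal content is found."""
        media = self.acquire()
        key_data = self.determine(media)
        recombined = self.recombine(key_data)
        return self.match(recombined)


# Toy units for text, with placeholder lexicons (assumptions).
text = "ABCDEFGH"
device = ViolationDetectionDevice(
    acquire=lambda: text,
    determine=lambda t: [i for i in range(len(t)) if t.startswith("DE", i)],
    recombine=lambda starts: [(text[max(0, s - 1):s + 1], text[s + 1:s + 3]) for s in starts],
    match=lambda pairs: any(a not in {"XX"} and b not in {"XX"} for a, b in pairs),
)
print(device.run())  # True
```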
[0129] In summary, the above description is merely a preferred embodiment of this document and is not intended to limit the scope of protection of this document. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of this document should be included within the scope of protection of this document.
[0130] The systems, devices, modules, or units described in the above embodiments can be implemented by computer chips or entities, or by products with certain functions. A typical implementation device is a computer. Specifically, a computer can be, for example, a personal computer, laptop computer, cellular phone, camera phone, smartphone, personal digital assistant, media player, navigation device, email device, game console, tablet computer, wearable device, or any combination of these devices.
[0131] Computer-readable media include permanent and non-permanent, removable and non-removable media that can store information using any method or technology. The information can be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, CD-ROM, digital versatile disc (DVD) or other optical storage, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.
[0132] The various embodiments in this document are described in a progressive manner. For parts that are similar or identical between embodiments, the embodiments may be referred to one another, and each embodiment focuses on describing its differences from the other embodiments. In particular, the system embodiments are basically similar to the method embodiments, so their description is relatively brief; for relevant parts, refer to the descriptions in the method embodiments.
Claims
1. A method for detecting violations of media information, characterized in that the method comprises:
obtaining media information to be detected;
determining key data in the media information, the key data being data used to evaluate whether there is a violation;
recombining the key data based on the recombination rule corresponding to the information type of the media information to obtain recombined data; and
matching the recombined data against data in a pre-configured standard database to determine whether there is any illegal content in the media information;
wherein the data recorded in the standard database is either data indicating the presence of violations or data indicating the absence of violations;
when the media information is text:
determining the key data in the media information comprises: matching the text against a pre-configured first standard lexicon to identify keywords in the text that are suspected of being illegal;
recombining the key data to obtain the recombined data comprises: combining the first character of the keyword with at least one character preceding it to obtain a first recombined word, and combining the last character of the keyword with at least one character following it to obtain a second recombined word;
matching the recombined data against the data in the standard database comprises: matching the first recombined word and the second recombined word against a pre-configured second standard lexicon to determine whether there are any illegal words in the text;
wherein the first standard lexicon contains words that represent violations, and the second standard lexicon contains words that represent non-violations;
when the media information is audio:
determining the key data in the media information comprises: converting the audio into text, and matching the text against a pre-configured first standard lexicon to identify keywords in the text that are suspected of being illegal;
recombining the key data to obtain the recombined data comprises: combining the first character of the keyword with at least one character preceding it to obtain a first recombined word, and combining the last character of the keyword with at least one character following it to obtain a second recombined word;
matching the recombined data against the data in the standard database comprises: matching the first recombined word and the second recombined word against a pre-configured second standard lexicon to determine whether there are any illegal words in the text;
wherein the first standard lexicon contains words that represent violations, and the second standard lexicon contains words that represent non-violations;
when the media information is an image:
determining the key data in the media information comprises: extracting the shape features, color features, and texture features of the image;
recombining the key data to obtain the recombined data comprises: combining the shape features, color features, and texture features of the image to obtain combined features;
matching the recombined data against the data in the standard database comprises: matching the combined features against features in a pre-configured standard feature library to determine whether there is any illegal content in the image;
wherein the standard feature library records combined features that characterize the presence of illegal content, the combined features being obtained by combining the shape features, color features, and texture features of an image containing illegal content;
when the media information is video:
determining the key data in the media information comprises: extracting image frames from the video, and extracting the shape features, color features, and texture features of the image frame;
recombining the key data to obtain the recombined data comprises: combining the shape features, color features, and texture features of the image frame to obtain combined features;
matching the recombined data against the data in the standard database comprises: matching the combined features against features in a pre-configured standard feature library to determine whether there is any illegal content in the image;
wherein the standard feature library records combined features that characterize the presence of illegal content, the combined features being obtained by combining the shape features, color features, and texture features of an image containing illegal content.
2. The method according to claim 1, characterized in that, when the text contains illegal words, the method further comprises: marking the illegal words in the text.
3. The method according to claim 1, characterized in that obtaining the media information to be detected comprises: crawling the media information from a media information publishing terminal; or receiving the media information uploaded by a user.
4. A device for detecting violations of media information, characterized in that the device comprises:
an acquisition unit, used to acquire media information to be detected;
a determining unit, used to determine key data in the media information, the key data being data used to evaluate whether there is a violation;
a recombination unit, used to recombine the key data based on the recombination rule corresponding to the information type of the media information to obtain recombined data; and
a matching unit, used to match the recombined data against data in a pre-configured standard database to determine whether there is any illegal content in the media information;
wherein the data recorded in the standard database is either data indicating the presence of violations or data indicating the absence of violations;
when the media information is text:
determining the key data in the media information comprises: matching the text against a pre-configured first standard lexicon to identify keywords in the text that are suspected of being illegal;
recombining the key data to obtain the recombined data comprises: combining the first character of the keyword with at least one character preceding it to obtain a first recombined word, and combining the last character of the keyword with at least one character following it to obtain a second recombined word;
matching the recombined data against the data in the standard database comprises: matching the first recombined word and the second recombined word against a pre-configured second standard lexicon to determine whether there are any illegal words in the text;
wherein the first standard lexicon contains words that represent violations, and the second standard lexicon contains words that represent non-violations;
when the media information is audio:
determining the key data in the media information comprises: converting the audio into text, and matching the text against a pre-configured first standard lexicon to identify keywords in the text that are suspected of being illegal;
recombining the key data to obtain the recombined data comprises: combining the first character of the keyword with at least one character preceding it to obtain a first recombined word, and combining the last character of the keyword with at least one character following it to obtain a second recombined word;
matching the recombined data against the data in the standard database comprises: matching the first recombined word and the second recombined word against a pre-configured second standard lexicon to determine whether there are any illegal words in the text;
wherein the first standard lexicon contains words that represent violations, and the second standard lexicon contains words that represent non-violations;
when the media information is an image:
determining the key data in the media information comprises: extracting the shape features, color features, and texture features of the image;
recombining the key data to obtain the recombined data comprises: combining the shape features, color features, and texture features of the image to obtain combined features;
matching the recombined data against the data in the standard database comprises: matching the combined features against features in a pre-configured standard feature library to determine whether there is any illegal content in the image;
wherein the standard feature library records combined features that characterize the presence of illegal content, the combined features being obtained by combining the shape features, color features, and texture features of an image containing illegal content;
when the media information is video:
determining the key data in the media information comprises: extracting image frames from the video, and extracting the shape features, color features, and texture features of the image frame;
recombining the key data to obtain the recombined data comprises: combining the shape features, color features, and texture features of the image frame to obtain combined features;
matching the recombined data against the data in the standard database comprises: matching the combined features against features in a pre-configured standard feature library to determine whether there is any illegal content in the image;
wherein the standard feature library records combined features that characterize the presence of illegal content, the combined features being obtained by combining the shape features, color features, and texture features of an image containing illegal content.
5. An electronic device, characterized in that it comprises a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with each other through the communication bus;
the memory is used to store a computer program;
the processor is used to execute the program stored in the memory to implement the following process:
obtaining media information to be detected;
determining key data in the media information, the key data being data used to evaluate whether there is a violation;
recombining the key data based on the recombination rule corresponding to the information type of the media information to obtain recombined data; and
matching the recombined data against data in a pre-configured standard database to determine whether there is any illegal content in the media information;
wherein the data recorded in the standard database is either data indicating the presence of violations or data indicating the absence of violations;
when the media information is text:
determining the key data in the media information comprises: matching the text against a pre-configured first standard lexicon to identify keywords in the text that are suspected of being illegal;
recombining the key data to obtain the recombined data comprises: combining the first character of the keyword with at least one character preceding it to obtain a first recombined word, and combining the last character of the keyword with at least one character following it to obtain a second recombined word;
matching the recombined data against the data in the standard database comprises: matching the first recombined word and the second recombined word against a pre-configured second standard lexicon to determine whether there are any illegal words in the text;
wherein the first standard lexicon contains words that represent violations, and the second standard lexicon contains words that represent non-violations;
when the media information is audio:
determining the key data in the media information comprises: converting the audio into text, and matching the text against a pre-configured first standard lexicon to identify keywords in the text that are suspected of being illegal;
recombining the key data to obtain the recombined data comprises: combining the first character of the keyword with at least one character preceding it to obtain a first recombined word, and combining the last character of the keyword with at least one character following it to obtain a second recombined word;
matching the recombined data against the data in the standard database comprises: matching the first recombined word and the second recombined word against a pre-configured second standard lexicon to determine whether there are any illegal words in the text;
wherein the first standard lexicon contains words that represent violations, and the second standard lexicon contains words that represent non-violations;
when the media information is an image:
determining the key data in the media information comprises: extracting the shape features, color features, and texture features of the image;
recombining the key data to obtain the recombined data comprises: combining the shape features, color features, and texture features of the image to obtain combined features;
matching the recombined data against the data in the standard database comprises: matching the combined features against features in a pre-configured standard feature library to determine whether there is any illegal content in the image;
wherein the standard feature library records combined features that characterize the presence of illegal content, the combined features being obtained by combining the shape features, color features, and texture features of an image containing illegal content;
when the media information is video:
determining the key data in the media information comprises: extracting image frames from the video, and extracting the shape features, color features, and texture features of the image frame;
recombining the key data to obtain the recombined data comprises: combining the shape features, color features, and texture features of the image frame to obtain combined features;
matching the recombined data against the data in the standard database comprises: matching the combined features against features in a pre-configured standard feature library to determine whether there is any illegal content in the image;
wherein the standard feature library records combined features that characterize the presence of illegal content, the combined features being obtained by combining the shape features, color features, and texture features of an image containing illegal content.
6. A computer-readable storage medium, characterized in that the storage medium stores a computer program which, when executed by a processor, performs the following process:
obtaining media information to be detected;
determining key data in the media information, the key data being data used to evaluate whether there is a violation;
recombining the key data based on the recombination rule corresponding to the information type of the media information to obtain recombined data; and
matching the recombined data against data in a pre-configured standard database to determine whether there is any illegal content in the media information;
wherein the data recorded in the standard database is either data indicating the presence of violations or data indicating the absence of violations;
when the media information is text:
determining the key data in the media information comprises: matching the text against a pre-configured first standard lexicon to identify keywords in the text that are suspected of being illegal;
recombining the key data to obtain the recombined data comprises: combining the first character of the keyword with at least one character preceding it to obtain a first recombined word, and combining the last character of the keyword with at least one character following it to obtain a second recombined word;
matching the recombined data against the data in the standard database comprises: matching the first recombined word and the second recombined word against a pre-configured second standard lexicon to determine whether there are any illegal words in the text;
wherein the first standard lexicon contains words that represent violations, and the second standard lexicon contains words that represent non-violations;
when the media information is audio:
determining the key data in the media information comprises: converting the audio into text, and matching the text against a pre-configured first standard lexicon to identify keywords in the text that are suspected of being illegal;
recombining the key data to obtain the recombined data comprises: combining the first character of the keyword with at least one character preceding it to obtain a first recombined word, and combining the last character of the keyword with at least one character following it to obtain a second recombined word;
matching the recombined data against the data in the standard database comprises: matching the first recombined word and the second recombined word against a pre-configured second standard lexicon to determine whether there are any illegal words in the text;
wherein the first standard lexicon contains words that represent violations, and the second standard lexicon contains words that represent non-violations;
when the media information is an image:
determining the key data in the media information comprises: extracting the shape features, color features, and texture features of the image;
recombining the key data to obtain the recombined data comprises: combining the shape features, color features, and texture features of the image to obtain combined features;
matching the recombined data against the data in the standard database comprises: matching the combined features against features in a pre-configured standard feature library to determine whether there is any illegal content in the image;
wherein the standard feature library records combined features that characterize the presence of illegal content, the combined features being obtained by combining the shape features, color features, and texture features of an image containing illegal content;
when the media information is video:
determining the key data in the media information comprises: extracting image frames from the video, and extracting the shape features, color features, and texture features of the image frame;
recombining the key data to obtain the recombined data comprises: combining the shape features, color features, and texture features of the image frame to obtain combined features;
matching the recombined data against the data in the standard database comprises: matching the combined features against features in a pre-configured standard feature library to determine whether there is any illegal content in the image;
wherein the standard feature library records combined features that characterize the presence of illegal content, the combined features being obtained by combining the shape features, color features, and texture features of an image containing illegal content.