Content type detection method and device

A content category detection method and device, applied in the field of classification and identification. The technology addresses several existing problems: unhealthy content degrades the user browsing experience, and manual identification consumes a lot of manpower and material resources and cannot keep up with harmful content. Its effects are a shorter detection time, reduced manpower and material resources, and a lower detection cost.

Active Publication Date: 2015-03-04
BEIJING BAIDU NETCOM SCI & TECH CO LTD

AI Technical Summary

Problems solved by technology

While the Internet brings convenience to people's lives, it also brings many negative effects. For example, to make profits and increase click-through rates, some websites display unhealthy content to users, which seriously degrades the browsing experience. For teenagers in particular, such content has a significant negative impact on physical and mental development.
At present, most identification of website content (such as pornographic content) relies on manual judgment. Although this method is accurate, it is inefficient, requires a lot of manpower and material resources, and cannot cope with the growing volume of harmful content on websites.


Examples

Embodiment 1

[0022] Figure 1 is a schematic flowchart of a content category detection method provided in Embodiment 1 of the present invention. This embodiment is applicable where category detection is performed on content to be detected. The method can be executed by a category detection device, which is implemented in software and/or hardware. Referring to Figure 1, the content category detection method provided in this embodiment specifically includes the following operations:

[0023] Operation 110, performing feature extraction on the content to be detected.

[0024] In this embodiment, the content to be detected may be pre-stored locally, or obtained in real time from other devices in text and/or picture format. For example, the content to be detected is web page content, in text and/or image format, obtained by parsing an HTML (HyperText Markup Language) page fetched from a server on the Internet.
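The parsing step described above can be sketched as follows. This is a minimal illustration using Python's standard `html.parser`, not the implementation described in the patent; the names `ContentExtractor` and `extract_content` are hypothetical, and the choice to skip `<script>` and `<style>` content is an assumption.

```python
from html.parser import HTMLParser


class ContentExtractor(HTMLParser):
    """Collects visible text and image URLs from an HTML page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.image_urls = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.image_urls.append(src)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def extract_content(html):
    """Return (text, image_urls) for one HTML document."""
    parser = ContentExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.text_parts), parser.image_urls
```

The returned text and image URLs would then be passed to the feature extraction of Operation 110; how the images themselves are downloaded is left open here.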

[0025] For content in text form...

Embodiment 2

[0036] Figure 2 is a schematic flowchart of a content category detection method provided by Embodiment 2 of the present invention. On the basis of Embodiment 1 above, this embodiment adds the operation of obtaining the content to be detected, and further optimizes operation 110 on that basis. Referring to Figure 2, the content category detection method provided in this embodiment specifically includes the following operations:

[0037] Operation 210, obtain the web page content according to its uniform resource locator (URL) as the content to be detected;

[0038] Operation 220, if the webpage content contains text content, perform feature extraction on the text content based on the text feature extraction algorithm, and add the feature extraction result to the feature set of the webpage content;

[0039] Operation 230, if the webpage content includes picture content, perform target feature recognition on the picture content, establish a feature vector of the pict...
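As a rough illustration of Operation 220, the sketch below extracts text features and adds them to a feature set for the web page content. The source does not name the text feature extraction algorithm, so normalized term frequency is used here purely as an assumption; the function names and the `text:` key prefix are hypothetical. Picture features (Operation 230) are not covered.

```python
import re
from collections import Counter


def extract_text_features(text):
    """Normalized term frequencies over lowercase word tokens (assumed algorithm)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    total = len(tokens)
    if total == 0:
        return {}
    counts = Counter(tokens)
    return {f"text:{term}": count / total for term, count in counts.items()}


def add_text_features(feature_set, text):
    """Operation 220: only if text content is present, add its features to the set."""
    if text:
        feature_set.update(extract_text_features(text))
    return feature_set
```

Keeping the features in a single dictionary with prefixed keys is one way to let features from other content types share the same feature set without name collisions.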

Embodiment 3

[0052] Figure 3 is a schematic flowchart of a content category detection method provided by Embodiment 3 of the present invention. On the basis of the above embodiments, this embodiment further optimizes the operation of "determining the final category detection result corresponding to the webpage content according to the category detection results obtained by at least two classifiers", and accordingly adds the operation of optimizing the classifiers and their voting weights. Referring to Figure 3, the content category detection method provided in this embodiment specifically includes the following operations:

[0053] Operation 310, performing feature extraction on the content to be detected;

[0054] Operation 320, according to the feature extraction result, use at least two classifiers suitable for the content to be detected to detect the category of the content to be detected;

[0055] Operation 330: Determine the final category detection result ...
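One plausible way to combine the results of at least two classifiers using voting weights is a weighted vote, sketched below. The source text is truncated before it describes the combination rule, so this is an assumption rather than the patented procedure; the function name, classifier names, labels, and weight values are all hypothetical.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Pick the label with the highest total voting weight.

    predictions: {classifier_name: predicted_label}
    weights:     {classifier_name: voting_weight}; missing names default to 1.0
    """
    if len(predictions) < 2:
        raise ValueError("at least two classifier results are required")
    scores = defaultdict(float)
    for name, label in predictions.items():
        scores[label] += weights.get(name, 1.0)
    # Iterate labels in sorted order so that ties resolve deterministically.
    return max(sorted(scores), key=lambda label: scores[label])
```

For example, with predictions `{"bayes": "unhealthy", "svm": "normal", "tree": "normal"}` and weights `{"bayes": 0.5, "svm": 0.3, "tree": 0.3}`, the label `"normal"` accumulates 0.6 against 0.5 and wins. Raising the weight of `"bayes"` to 0.7 would flip the result, which is why tuning the voting weights matters.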

Abstract

The embodiment of the invention discloses a content type detection method and device. The method comprises the following steps: extracting the features of content to be detected; adopting at least two kinds of classifiers matched with the content to be detected according to a feature extraction result, and performing type detection on the content to be detected; determining a final type detection result corresponding to the content to be detected according to type detection results obtained by the at least two kinds of classifiers. According to the technical scheme provided by the embodiment of the invention, the type of acquired content can be detected automatically, the detection time is shortened, and the detection cost is reduced.

Description

Technical field

[0001] The embodiments of the present invention relate to the technical field of category recognition, and in particular, to a content category detection method and device.

Background

[0002] With the development of Internet technology, the information on the Internet is growing at an exponential rate, and the ways for people to obtain and use information are becoming more and more diverse and convenient. However, while the Internet brings convenience to people's lives, it also brings many negative effects. For example, to make profits and increase click-through rates, some websites display unhealthy content to users, which seriously degrades the browsing experience. For teenagers in particular, such content has a significant negative impact on their physical and mental development.

[0003] At present, most of the identification of web...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/9535, G06F18/24
Inventor: 唐呈光, 张兵, 杨念, 耿志峰
Owner: BEIJING BAIDU NETCOM SCI & TECH CO LTD