
Webpage labeling method and device

A technology for labeling webpages, applied in the Internet field. It addresses the low recall rate of manual labeling, the narrow coverage of manually designed label systems, and their failure to meet users' real needs, and it achieves a high recall rate and wide label coverage.

Active Publication Date: 2017-08-11
BEIJING BAIDU NETCOM SCI & TECH CO LTD
5 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

Manual labeling is time-consuming and labor-intensive and has a low recall rate, and a manually designed label system covers only a narrow range of labels, so it cannot meet users' real needs.



Examples


example 1

[0054] Example 1: The webpage provides a download service for the TV series "The Legend of Zhen Huan". When a user clicks on the webpage, a corresponding query statement is generated in the query log. If a named entity recognition (NER) tool is used to analyze the query statement, the named entity word obtained is "The Legend of Zhen Huan" and the demand word is "download", so the combination pattern of the named entity word and the demand word is "The Legend of Zhen Huan + download".

example 2

[0055] Example 2: The webpage provides ticket price information for Shanghai Disneyland. When a user clicks on the webpage, a corresponding query statement is generated in the query log. If a named entity recognition (NER) tool is used to analyze the query statement, the named entity words obtained are "Shanghai" and "Disneyland" and the demand word is "ticket price", so the combination pattern of the named entity words and the demand word is "Shanghai + Disneyland + ticket price".
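The combination pattern in these examples can be illustrated with a minimal sketch. It assumes a hypothetical NER output format of (token, tag) pairs, with the tags "ENTITY" and "DEMAND" invented here for illustration; the patent text does not specify the tool or its output.

```python
from typing import List, Tuple

# Hypothetical NER output: (token, tag) pairs. The tag names are assumptions.
TaggedToken = Tuple[str, str]


def combination_pattern(tagged: List[TaggedToken]) -> str:
    """Join named entity words and demand words as 'entity + ... + demand'."""
    entities = [token for token, tag in tagged if tag == "ENTITY"]
    demands = [token for token, tag in tagged if tag == "DEMAND"]
    if not entities or not demands:
        # The query does not fit the entity + demand combination pattern.
        return ""
    return " + ".join(entities + demands)


query = [
    ("Shanghai", "ENTITY"),
    ("Disneyland", "ENTITY"),
    ("ticket price", "DEMAND"),
]
print(combination_pattern(query))
```

Queries that lack either a named entity word or a demand word return an empty string here, so a caller can filter them out before the later steps.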

[0056] S22: Obtain the page views corresponding to the query statement.

[0057] In this embodiment, after the query statements that conform to the combination pattern of named entity word and demand word are obtained, the number of page views corresponding to each of these query statements is obtained.

[0058] It should be understood that the number of page views is the total number of visits to the webpage by users.

[0059] S23: Sort the query statements according to the number of page...
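Steps S22 and S23 can be sketched as follows. The click-log format (one `(query, url)` record per visit) and the function name are hypothetical, and the descending sort order is an assumption, since the source text is truncated at this point.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def rank_queries_by_page_views(
    click_log: Iterable[Tuple[str, str]],
) -> List[Tuple[str, int]]:
    """Count page views per query statement and sort the queries by that count.

    click_log holds hypothetical (query, url) records, one per page visit.
    """
    views = Counter(query for query, _url in click_log)
    # Assumption: most-viewed first; ties are broken alphabetically
    # so that the result is deterministic.
    return sorted(views.items(), key=lambda item: (-item[1], item[0]))


log = [
    ("zhen huan download", "u1"),
    ("disneyland ticket price", "u2"),
    ("zhen huan download", "u1"),
]
print(rank_queries_by_page_views(log))
```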

example 3

[0122] Example 3: Use combinations of demand labels and features to assign the corresponding label to the webpage to be labeled.

[0123] This method labels a webpage by calculating the similarity between a demand label and the features of the webpage to be labeled. A similarity threshold must be set in advance so that the calculated similarity can be checked against it. If the similarity calculated between the demand label and the features of the webpage reaches the threshold, the demand label is assigned to the webpage.
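The threshold test described above can be sketched as follows. The source does not name a similarity measure, so cosine similarity over term counts is an assumption made here, as are the function names and the representation of each label as a list of terms.

```python
import math
from collections import Counter
from typing import Dict, List, Mapping


def cosine_similarity(a: Mapping[str, float], b: Mapping[str, float]) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def label_page(
    page_terms: List[str],
    label_terms: Dict[str, List[str]],
    threshold: float,
) -> List[str]:
    """Return every label whose similarity to the page reaches the threshold."""
    page_vector = Counter(page_terms)
    return [
        label
        for label, terms in label_terms.items()
        if cosine_similarity(Counter(terms), page_vector) >= threshold
    ]


page = ["download", "episode", "download", "hd"]
labels = {"download": ["download"], "ticket price": ["ticket", "price"]}
print(label_page(page, labels, threshold=0.5))
```

A higher threshold trades recall for precision, so in practice it would be tuned on held-out labeled pages.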

[0124] Optionally, when the classifiers are used to assign labels to the webpages to be labeled, appropriate labels can also be selected manually in combination with prior rules, to make the assigned labels more accurate. For ex...



Abstract

The invention provides a webpage labeling method and device. The method comprises the following steps: a demand label list is established; training data is mined according to the demand label list; classifiers are trained on the training data, wherein the classifiers include a maximum entropy classifier, a binary classifier, and a classifier based on combination pairs of demand labels and features; and the webpages to be labeled are labeled with the corresponding labels by the classifiers. With this method and device, a large amount of high-quality training data can be acquired without manual labeling, which saves time and labor and yields a high recall rate. Moreover, the labels are defined according to users' actual search behavior, so the label coverage is wide and users' true demands are reflected comprehensively.

Description

Technical field

[0001] The invention relates to the technical field of the Internet, in particular to a method and device for labeling webpages.

Background

[0002] Different webpages often present different content and satisfy different user needs. When a user searches for information on the Internet, each webpage returned for the input keywords contains those keywords, but the services that the webpages provide differ. For example, when a user wants to watch the TV series "The Legend of Zhen Huan" online and searches for "The Legend of Zhen Huan", the services provided by the webpages listed in the search results may include downloading "The Legend of Zhen Huan", watching it online, its synopsis, cast information, and more. Due to the variety of services provided by webpages, users may not be able to directly find webpages that meet their need...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/9562
Inventor: 陈亮宇, 肖欣延, 吕雅娟
Owner: BEIJING BAIDU NETCOM SCI & TECH CO LTD