Method and device for labeling web page theme

A web page topic labeling technology, applied in the field of web page topic labeling methods and devices, that addresses the low accuracy of existing page topic labeling and improves labeling efficiency and accuracy.

Active Publication Date: 2019-05-28
NAT COMP NETWORK & INFORMATION SECURITY MANAGEMENT CENT

AI Technical Summary

Problems solved by technology

[0003] The present invention provides a method and device for labeling webpage topics, to solve the problem of low accuracy in labeling webpage topics in the prior art.


Examples

Embodiment Construction

[0024] The present invention will be described in further detail below in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit the present invention.

[0025] This embodiment provides a method for labeling webpage topics. Figure 1 is a flowchart of the method according to an embodiment of the present invention. The steps of this embodiment are performed for each webpage.

[0026] Step S110: based on the title and text of the webpage, the topic feature vector of the webpage is obtained.

[0027] Because the title and the text of a webpage differ in length and language style, this embodiment extracts the title and the text of the webpage separately. A title feature vector is constructed from the title, and a text feature vector is constructed from the text; the title fea...
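The separate treatment of title and text described in [0027] can be illustrated with a minimal sketch. The source text is truncated before it says how the two vectors are combined, so the concatenation below is only an assumption, as are the bag-of-words representation, the toy vocabulary, and the helper names `tokenize`, `bag_of_words`, and `topic_feature_vector`.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bag_of_words(text, vocabulary):
    """Relative frequency of each vocabulary term in the text."""
    counts = Counter(tokenize(text))
    total = sum(counts[term] for term in vocabulary) or 1  # avoid division by zero
    return [counts[term] / total for term in vocabulary]


def topic_feature_vector(title, body, vocabulary):
    """Build the title and text vectors separately, then combine them."""
    title_vec = bag_of_words(title, vocabulary)
    body_vec = bag_of_words(body, vocabulary)
    # ASSUMPTION: the source is truncated at this point; plain concatenation
    # is used here only for illustration.
    return title_vec + body_vec


# Hypothetical vocabulary and page, for illustration only.
vocabulary = ["football", "match", "stock", "market"]
vec = topic_feature_vector(
    "Football match report",
    "The match ended after a late goal",
    vocabulary,
)
```

Normalizing each vector on its own keeps a short title from being swamped by a much longer text, which is one plausible reason for handling the two fields separately.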

Abstract

The invention discloses a method and device for labeling web page topics. The method includes: obtaining a topic feature vector for each web page based on the title and text of the web page; classifying the topic feature vector with a classifier trained in advance; and judging whether a type to which the topic feature vector belongs exists. If it does, the web page is labeled with that type; otherwise, the web page is marked as a web page to be labeled. The web pages to be labeled are then clustered, the type of each cluster is obtained through analysis, and each web page to be labeled is labeled with the type of the cluster it belongs to. By cascading a supervised classification method with an unsupervised clustering method, topics are acquired automatically from the web pages and the web pages are labeled, which effectively improves the efficiency and accuracy of web page topic labeling.
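The cascade in the abstract (classify first, then cluster whatever the classifier rejects) can be sketched as follows. The abstract does not name a classifier, a clustering algorithm, or the analysis used to type a cluster, so everything concrete here is an assumption: a nearest-centroid classifier with a cosine-similarity rejection threshold, a simple leader clustering, and the highest-weight vocabulary term as the cluster type. All names and data are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def classify(vec, centroids, threshold=0.8):
    """Return the most similar known type, or None if no type is close enough."""
    best_type, best_sim = None, threshold
    for type_name, centroid in centroids.items():
        sim = cosine(vec, centroid)
        if sim >= best_sim:
            best_type, best_sim = type_name, sim
    return best_type


def cluster(vectors, threshold=0.8):
    """Leader clustering: join the first similar leader, else start a new cluster."""
    leaders, assignments = [], []
    for vec in vectors:
        for index, leader in enumerate(leaders):
            if cosine(vec, leader) >= threshold:
                assignments.append(index)
                break
        else:
            leaders.append(vec)
            assignments.append(len(leaders) - 1)
    return assignments


def label_pages(pages, centroids, vocabulary):
    """Stage 1: supervised classification. Stage 2: cluster the rejected pages."""
    labels, pending = {}, []
    for page_id, vec in pages.items():
        page_type = classify(vec, centroids)
        if page_type is not None:
            labels[page_id] = page_type
        else:
            pending.append(page_id)  # "web page to be labeled"
    assignments = cluster([pages[p] for p in pending])
    for cluster_id in set(assignments):
        members = [p for p, a in zip(pending, assignments) if a == cluster_id]
        # ASSUMPTION: the cluster type is the vocabulary term with the largest
        # summed weight; the source only says the type is "obtained through analysis".
        summed = [sum(pages[p][i] for p in members) for i in range(len(vocabulary))]
        cluster_type = vocabulary[summed.index(max(summed))]
        for p in members:
            labels[p] = cluster_type
    return labels


# Hypothetical data, for illustration only.
vocabulary = ["football", "stock", "recipe"]
centroids = {"sports": [1.0, 0.0, 0.0], "finance": [0.0, 1.0, 0.0]}
pages = {
    "a": [0.9, 0.1, 0.0],
    "b": [0.1, 0.9, 0.0],
    "c": [0.0, 0.1, 0.9],
    "d": [0.0, 0.0, 1.0],
}
labels = label_pages(pages, centroids, vocabulary)
```

In this toy run, pages "a" and "b" are close to a known centroid and are labeled by the classifier, while "c" and "d" are rejected, fall into one cluster, and receive a label derived from that cluster.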

Description

Technical field

[0001] The invention relates to the technical field of data processing, in particular to a method and device for labeling webpage topics.

Background technique

[0002] Extracting and labeling webpage topics by analyzing the content of Internet webpages is an important basis for applications such as Internet data management and mining. At present, webpage topic labeling mostly relies on keyword matching: a webpage is labeled by matching its title against a set of preset keywords. However, direct matching is too simple. If the keywords in the title of the webpage change, this method can no longer label the topic accurately, and the accuracy of webpage labeling cannot be guaranteed. Another approach to webpage topic labeling is to cluster webpages and extract keywords from each cluster as the label for that type of webpage. However, because the clustering...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/35, G06F16/36
CPC: G06F16/35, G06F16/374
Inventor: 李扬曦, 杜翠兰, 李睿, 佟玲玲, 翟羽佳, 王晶, 刘洋, 秦韬, 付戈
Owner: NAT COMP NETWORK & INFORMATION SECURITY MANAGEMENT CENT