Method and device for extracting keywords in web pages

A keyword and page technology applied in the field of computer networks. It addresses problems such as keyword extraction failing to work, low versatility of keyword extraction technology, and low processing efficiency, and achieves the effect of improving versatility.

Active Publication Date: 2018-05-08
BAIDU ONLINE NETWORK TECH (BEIJING) CO LTD
7 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

[0005] The general keyword extraction of the prior art cannot work on internationalized languages, which results in low versatility, insufficient intelligence, and low processing efficiency. To solve this problem, the embodiment of the present invention provides a method and device for extracting keywords in a page.




Embodiment Construction

[0024] Various aspects of the present invention will be described in detail below with reference to the drawings and specific embodiments. Well-known modules, units, and their mutual connections, links, communications, or operations are not shown or described in detail. The described features, architectures, or functions may be combined in any manner in one or more implementations. Those skilled in the art should understand that the implementations described below are for illustration only and do not limit the protection scope of the present invention. It is also readily understood that the modules, units, or processing methods in the embodiments described herein and shown in the accompanying drawings can be combined and designed in various configurations.

[0025] Figure 1 is a flowchart of a method for extracting keywords in a page according to an embodiment of the present invention. Referring to Figure 1, the method includes:

[0026] S...


Abstract

The invention discloses a method and device for extracting keywords in a page. The method includes: analyzing the title content of the page to obtain candidate words, which form a candidate word lookup table; performing page analysis on the page to obtain text combinations, which constitute a short string set; performing string analysis on the short string set to obtain strings, which form an original weight pool; in order of the number of words contained in each string in the original weight pool, having the strings cast weight votes for the candidate words in the candidate word lookup table, where the weight value of a candidate word is increased if a string matches it; and sorting the candidate words by weight value from largest to smallest and extracting a predetermined number of the top-ranked candidate words as keywords. By adopting the present invention, the versatility of keyword extraction technology can be improved, and keyword extraction becomes more intelligent and efficient.
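
The pipeline in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify how the title, page, or strings are analyzed, so several details here are assumptions made only for illustration: candidate words are taken as word n-grams of the title, short strings are obtained by splitting the page text at punctuation, the weight pool holds word n-grams of each short string, longer strings vote first, and each match adds one point per word. The names `ngrams`, `extract_keywords`, `top_n`, and `max_len` are hypothetical.

```python
import re


def ngrams(words, max_len):
    """Yield every contiguous run of 1..max_len words as a tuple."""
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            yield tuple(words[i:i + n])


def extract_keywords(title, page_text, top_n=3, max_len=3):
    # Step 1 (assumed): candidate words are the word n-grams of the title,
    # stored in a lookup table that maps each candidate to its weight.
    title_words = re.findall(r"\w+", title.lower())
    lookup = {" ".join(g): 0 for g in ngrams(title_words, max_len)}

    # Step 2 (assumed): short strings are page text split at punctuation.
    short_strings = [s.split() for s in re.split(r"[^\w\s]+", page_text.lower())]

    # Step 3 (assumed): the weight pool holds the n-grams of each short string.
    pool = [g for words in short_strings for g in ngrams(words, max_len)]

    # Step 4: strings vote in order of word count (longest first is assumed).
    pool.sort(key=len, reverse=True)
    for g in pool:
        key = " ".join(g)
        if key in lookup:
            lookup[key] += len(g)  # assumed increment: one point per word

    # Step 5: sort by weight, largest first, and keep the top candidates.
    ranked = sorted(lookup.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, weight in ranked[:top_n] if weight > 0]
```

Calling `extract_keywords(title, page_text, top_n=5)` returns up to five title-derived candidates that also occur in the page body, highest weight first. In this simplified version the votes are purely additive, so the voting order does not change the totals; the sort is kept only to mirror the order described in the abstract.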

Description

Technical Field

[0001] The invention relates to the field of computer networks, and more specifically, to a method and device for extracting keywords in a page.

Background Technique

[0002] With the development of the network, people can handle more and more things online. However, users need to supply keywords as search content when querying for information. If the keywords in a page can be extracted and applied systematically, the query results improve with far less effort.

[0003] Keyword analysis and extraction in the existing technology relies on prior knowledge, such as word segmentation, part-of-speech tagging, and stop-word dictionaries. These natural language processing steps require an accumulated thesaurus. Commonly used are statistical methods based on TF-IDF (term frequency-inverse document frequency, a commonly used weighting technique for information retrieval and information mining), some based ...
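
For reference, the TF-IDF weighting named in the background can be sketched as follows. This is the common textbook formulation (term frequency as a relative count, inverse document frequency as log(N / df)), not something the patent text specifies; the function name `tf_idf` and the pre-tokenized input format are assumptions for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per document.

    docs is a list of documents, each already split into a list of terms.
    """
    n_docs = len(docs)

    # Document frequency: the number of documents containing each term.
    df = Counter()
    for doc in docs:
        df.update(set(doc))

    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores
```

A term that appears in every document gets a score of zero, which is why this kind of method depends on having a corpus, and usually a tokenizer, available in advance.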


Application Information

Patent Type & Authority: Patents (China)
IPC: IPC(8): G06F17/27
Inventor: 范斌
Owner: BAIDU ONLINE NETWORK TECH (BEIJING) CO LTD