Keyword extraction method and device

A keyword extraction method and device, applied in the field of data mining, addresses the problems that a word's frequency of occurrence does not determine its importance, that keyword extraction accuracy is reduced as a result, and that the extracted keywords cannot accurately indicate the content of a document.

Active Publication Date: 2018-04-06
TENCENT TECH (SHENZHEN) CO LTD

AI Technical Summary

Problems solved by technology

However, the frequency of a word in a document does not determine the importance of the word in the document. Therefore, the ke



Embodiment Construction

[0032] The following will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, not all, embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without making creative efforts belong to the protection scope of the present invention.

[0033] The keyword extraction method provided by the embodiment of the present invention can be applied to scenarios in which keywords are extracted from a webpage document read by a user, for example: extracting multiple candidate morphemes from a webpage document, and calculating the importance of each candidate morpheme based on the morpheme importance model; according to the preset rules, the plurality of candidate morphemes are arranged and combined to generate a plu...


Abstract

The embodiment of the invention provides a keyword extraction method and device. The method comprises the following steps: multiple candidate morphemes are extracted from a to-be-extracted document, and the importance of each candidate morpheme is calculated based on a morpheme importance model; the candidate morphemes are permuted and combined according to preset rules to generate multiple candidate short strings, and the integrity of each candidate short string is calculated based on a short string integrity model; a first quantity of candidate morphemes is selected from the candidate morphemes in order of importance; a second quantity of candidate short strings is selected from the candidate short strings in order of integrity; and the selected candidate morphemes and candidate short strings are determined as the keywords of the to-be-extracted document. With this keyword extraction method and device, the morphemes with high importance and the short strings with high integrity in the to-be-extracted document are extracted, and the accuracy of the extracted keywords is therefore improved.
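The steps in the abstract can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the patented implementation: the source does not specify the morpheme importance model, the short string integrity model, or the preset combination rules, so `importance` and `integrity` are hypothetical caller-supplied scoring functions, and the "preset rule" used here (every ordered arrangement of 2 to `max_len` distinct morphemes, joined by `sep`) is a stand-in chosen for simplicity. All function and parameter names are invented for this example.

```python
from itertools import permutations
from typing import Callable, List, Sequence


def generate_short_strings(morphemes: Sequence[str], max_len: int = 2,
                           sep: str = " ") -> List[str]:
    """Hypothetical preset rule: join every ordered arrangement of
    2..max_len distinct morphemes into a candidate short string."""
    strings = []
    for n in range(2, max_len + 1):
        for arrangement in permutations(morphemes, n):
            strings.append(sep.join(arrangement))
    return strings


def top_k(items: Sequence[str], score: Callable[[str], float], k: int) -> List[str]:
    """Return the k highest-scoring items (ties keep their input order)."""
    return sorted(items, key=score, reverse=True)[:k]


def extract_keywords(morphemes: Sequence[str],
                     importance: Callable[[str], float],
                     integrity: Callable[[str], float],
                     first_quantity: int,
                     second_quantity: int,
                     max_len: int = 2) -> List[str]:
    """Select the most important morphemes and the most complete short strings."""
    unique = list(dict.fromkeys(morphemes))  # de-duplicate, keep first-seen order
    short_strings = generate_short_strings(unique, max_len)
    top_morphemes = top_k(unique, importance, first_quantity)
    top_strings = top_k(short_strings, integrity, second_quantity)
    return top_morphemes + top_strings


# Toy scores standing in for the two models (assumed values, not from the source).
importance_scores = {"keyword": 0.9, "extraction": 0.8, "the": 0.1, "method": 0.5}
integrity_scores = {"keyword extraction": 0.95, "extraction method": 0.7}

keywords = extract_keywords(
    ["the", "keyword", "extraction", "method"],
    importance=lambda m: importance_scores.get(m, 0.0),
    integrity=lambda s: integrity_scores.get(s, 0.0),
    first_quantity=2,
    second_quantity=1,
)
print(keywords)  # ['keyword', 'extraction', 'keyword extraction']
```

Passing the two models in as plain scoring functions keeps the selection logic independent of how importance and integrity are actually estimated, which the source leaves to the respective models.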

Description

Technical Field

[0001] The invention relates to the field of data mining, and in particular to a keyword extraction method and device.

Background Technique

[0002] With the development of computer, communication, and Internet technology, more and more data is being accumulated. Faced with this surge of data, people hope to mine valuable information from it so that the data can be put to better use. Keyword extraction has become a hot topic in this context, because keywords can indicate or summarize the content of a document. For example, an application can extract keywords from articles a user has previously read and recommend articles that match the user's interests based on those keywords, or an advertiser can place suitable advertisements on a web page based on its keywords.

[0003] At present, there are many keyword extraction methods. The focus of these methods is to obtain the wo...


Application Information

IPC(8): G06F17/27
CPC: G06F40/216, G06F40/284, G06F40/289
Inventor 张博林乐宇夏锋陈磊刘毅冯喆
Owner TENCENT TECH (SHENZHEN) CO LTD