Keyword extraction system

A keyword extraction system, applied in special-purpose data processing, instruments, and electronic digital data processing. It addresses the problems of unsupervised extraction, which relies on the limited information inside a single document and often cannot provide enough information about the document's topic. It aims to make extraction more targeted, expand the scope of investigation, and improve computational efficiency.

Inactive Publication Date: 2017-08-01
成都数联铭品科技有限公司

AI Technical Summary

Problems solved by technology

[0004] One existing approach to keyword extraction is unsupervised: it computes a statistical property of each candidate keyword, such as TF-IDF, sorts the candidates by that score, and selects the highest-ranked ones as keywords. This approach is purely statistical and uses only information internal to the document, namely how strongly words cluster, to discover the document's topic.
Its disadvantage is that the information in a single document is limited and often insufficient for discovering the document's subject. In some documents, individual keywords occur relatively rarely yet play a very important role in expressing the subject of the article. Statistical methods alone cannot extract such words.
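
The statistical baseline described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `tfidf_keywords`, the toy corpus, and the smoothed IDF formula are all assumptions chosen for the example.

```python
import math
from collections import Counter

def tfidf_keywords(doc_tokens, corpus_tokens, top_k=3):
    """Rank the words of one document by TF-IDF against a small corpus."""
    n_docs = len(corpus_tokens)
    # Document frequency: number of corpus documents containing each word.
    df = Counter(word for doc in corpus_tokens for word in set(doc))
    tf = Counter(doc_tokens)
    scores = {}
    for word, count in tf.items():
        # Smoothed IDF (an assumption; several variants are in common use).
        idf = math.log((1 + n_docs) / (1 + df[word])) + 1
        scores[word] = (count / len(doc_tokens)) * idf
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]

# Hypothetical, already-tokenized toy corpus.
corpus = [
    ["stocks", "fell", "market", "closed"],
    ["market", "closed", "holiday"],
    ["regulator", "investigation", "company", "company", "market"],
]
print(tfidf_keywords(corpus[2], corpus))
```

In this toy corpus the repeated word "company" ranks first, while words that occur once all receive the same term frequency. A rare but topically important word therefore has no way to stand out on frequency alone, which is the limitation the paragraph describes.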

Examples

Embodiment 1

[0047] The following text is input into the keyword extraction system of the present invention for keyword extraction: "European stocks open lower; Christmas holiday arrangements of major European countries. China Securities Net News: European stocks opened lower today. The Stoxx 600 index fell 0.1% to 366.12 points; the French CAC index fell 0.1% to 4669.76 points; the British FTSE 100 index fell 0.2% to 6253.29 points; the German stock market is closed today. The Christmas holiday arrangements of major European countries are not the same. The market will be closed for three and a half days over the holiday: it closes half a day early on the 24th and is closed all day on the 25th (Christmas); the 26th (Boxing Day) falls on a Saturday, so a compensatory day off is taken on the 28th before trading resumes; the French stock market is closed for the Christmas holiday, which is December 25; the German stock market is closed for two days on D...

Embodiment 2

[0050] The following text is input into the keyword extraction system of the present invention for keyword extraction: "A certain stock is suspected of violating the securities law and was investigated by the Securities Regulatory Commission. To the Securities Regulatory Commission's "Investigation Notice". Because the company is suspected of violating securities laws and regulations, according to the relevant provisions of the Securities Law, the China Securities Regulatory Commission decided to investigate the company." The keyword extraction results are as follows: {XX shares||Securities Law||Investigation||Securities Regulatory Commission||Company||}. The keywords extracted by the existing TextRank technique are: {Securities Regulatory Commission||Company||Investigation||Securities Law||Violation||}. The system of the present invention better extracts subject words such as "XX shares", and can reflect the theme of the document better than the re...

Abstract

The invention relates to the field of natural language processing, in particular to a keyword extraction system. The system comprises a preprocessing module, a word vector conversion module, a part-of-speech tagging module, a part-of-speech screening module, and a candidate word weight calculation module. Part-of-speech screening restricts which parts of speech may be extracted as keywords and thereby steers the direction of extraction. Word vectors are trained on a large-scale corpus, and the weight of each candidate word in the document to be processed is calculated by combining cosine distance, computed from those word vectors, with TF-IDF weight. Introducing an external corpus expands the scope of investigation for keyword extraction, so that the extraction results are more reasonable, and provides a new tool for extracting keywords effectively.
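
The pipeline in the abstract (part-of-speech screening, then a weight that combines word-vector cosine similarity with TF-IDF) might be sketched as below. The abstract does not say what each candidate is compared against or how the two quantities are combined, so this sketch assumes comparison with the centroid of the candidate vectors and a simple product of the two scores. The function names, tag set, vectors, and TF-IDF values are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_candidates(tagged_tokens, vectors, tfidf, allowed_pos=("n", "v"), top_k=3):
    """Screen tokens by part of speech, then score by cosine * TF-IDF."""
    # Part-of-speech screening: keep only allowed tags that have a word vector.
    candidates = [w for w, pos in tagged_tokens if pos in allowed_pos and w in vectors]
    if not candidates:
        return []
    dim = len(vectors[candidates[0]])
    # Assumption: the document is represented by the centroid of its candidates.
    centroid = [sum(vectors[w][i] for w in candidates) / len(candidates)
                for i in range(dim)]
    # Assumption: the combined weight is the product of the two scores.
    scores = {w: cosine(vectors[w], centroid) * tfidf.get(w, 0.0)
              for w in set(candidates)}
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]

# Hypothetical inputs: (word, POS tag) pairs, 2-d word vectors, TF-IDF weights.
tagged = [("company", "n"), ("investigate", "v"), ("the", "u"),
          ("law", "n"), ("quickly", "d")]
vectors = {"company": [1.0, 0.0], "investigate": [0.8, 0.6], "law": [0.6, 0.8],
           "the": [0.0, 1.0], "quickly": [0.0, 1.0]}
tfidf = {"company": 0.5, "investigate": 0.4, "law": 0.3, "the": 0.9, "quickly": 0.2}

print(rank_candidates(tagged, vectors, tfidf))
```

In practice the vectors would come from a model trained on a large external corpus, which is how information from outside the document enters the score. The word "the" has the highest TF-IDF value in the toy data but is removed by the part-of-speech screen before scoring.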

Description

Technical field

[0001] The invention relates to the field of natural language processing, in particular to a keyword extraction system.

Background

[0002] With the rapid development of the Internet and the advent of the era of big data, a large share of the information people encounter in real life exists in the form of electronic documents. Faced with such a vast amount of information, people urgently need machines that can automatically identify the keywords that best represent the main content of an article, helping them understand that content faster and saving time in reading, processing, and using these electronic documents.

[0003] At present, this technology is called keyword extraction. Keyword extraction refers to quickly obtaining from a document multiple words or phrases that represent its subject, as a refined overview of the document's main content. Thro...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27
CPC: G06F40/216, G06F40/284, G06F40/289
Inventor: 罗镇权, 罗强, 刘世林, 练睿, 闫俊杰
Owner: 成都数联铭品科技有限公司