Method for constructing correlation networks of keywords of natural language texts

A keyword correlation network and natural language processing technology, applied to unstructured text data retrieval, text database clustering and classification, and special-purpose data processing. It addresses the inability of existing word representations to capture relationships between words, in order to achieve higher-accuracy results.

Inactive Publication Date: 2015-03-04
BEIJING ZHONGKE CHUANGYI TECH CO LTD

AI Technical Summary

Problems solved by technology

However, in this representation any two words are isolated from each other, and the vectors cannot represent the relationship between words.
As a result, synonyms written as different words, such as two distinct terms that both mean "microphone", are not recognized as sharing a meaning.
Keywords with a high degree of association therefore sometimes go unrecognized, which makes the resulting association network less accurate.
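
The excerpt does not name the representation it criticizes. Assuming it is a one-hot style encoding (an assumption, not something the source states), the problem can be illustrated with a small sketch: every pair of distinct words has cosine similarity 0, so a synonym looks no closer than an unrelated word. The vocabulary and function names below are hypothetical.

```python
import math

def one_hot(index, size):
    """Return a vector with 1.0 at `index` and 0.0 elsewhere."""
    return [1.0 if i == index else 0.0 for i in range(size)]

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Hypothetical three-word vocabulary; "mic" stands in for a synonym of "microphone".
vocabulary = ["microphone", "mic", "network"]
vectors = {word: one_hot(i, len(vocabulary)) for i, word in enumerate(vocabulary)}

# Under one-hot encoding both comparisons give the same score,
# so the synonym pair cannot be told apart from the unrelated pair.
synonym_similarity = cosine(vectors["microphone"], vectors["mic"])
unrelated_similarity = cosine(vectors["microphone"], vectors["network"])
```

Dense word vectors learned from context, as the method proposes, are intended to give the synonym pair a higher score than the unrelated pair.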



Embodiment Construction

[0029] In order to enable those skilled in the art to better understand the present invention, the present invention will be further described in detail below in conjunction with the accompanying drawings and specific embodiments.

[0030] An embodiment of the present invention provides a method for constructing a natural language text keyword association network.

[0031] As shown in Figure 1, the method includes the following steps:

[0032] Step S110, constructing a dictionary of keywords, performing word segmentation on the target corpus according to the dictionary, and obtaining multiple words.

[0033] Keyword information in the target corpus is collected by a crawler, the keywords obtained are compiled into a dictionary, and the corpus is segmented into words according to that dictionary.

[0034] The word segmentation operation includes word segmentation based on string matching. Preferably, word segmentation based on semant...
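
Dictionary-driven segmentation based on string matching can be sketched as follows. The excerpt does not say which matching strategy is used; forward maximum matching is assumed here as one common choice, and the function name, dictionary contents, and input string are hypothetical. Unspaced Latin text stands in for Chinese text, which has no word delimiters.

```python
def segment(text, dictionary, max_len=None):
    """Forward maximum matching: at each position, take the longest dictionary entry.

    Characters that start no dictionary entry are emitted as single-character words.
    """
    if max_len is None:
        max_len = max((len(word) for word in dictionary), default=1)
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until the dictionary matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words

# Hypothetical keyword dictionary; "key" and "net" are prefixes of longer entries.
keyword_dictionary = {"key", "keyword", "net", "network"}
tokens = segment("keywordnetwork", keyword_dictionary)
```

Preferring the longest match keeps multi-character keywords intact instead of splitting them into shorter dictionary entries.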


Abstract

The invention provides a method for constructing a keyword correlation network for natural language texts. The method comprises: constructing a dictionary of keywords and segmenting the target corpus according to the dictionary to obtain a plurality of words; counting, on the basis of an N-gram statistical language model, how frequently each of the obtained words co-occurs with the words before and after it; training the language model with a neural network, using the counted frequencies as training conditions, to obtain word vectors; computing the similarity between the word vectors of each pair of words and using it as the measure of the semantic correlation between the two words; and generating the keyword correlation network of the texts according to the strength of the semantic correlation between words. The method can effectively improve the accuracy of the correlation network built for the texts concerned.
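
The flow described in the abstract can be sketched end to end under simplifying assumptions. The sketch counts adjacent word pairs (the N = 2 case), then swaps the neural network training step for plain neighbour-count vectors, which is a stand-in and not the patented training procedure. All function names, the toy corpus, and the 0.5 threshold are hypothetical.

```python
import math
from collections import Counter, defaultdict

def adjacent_pair_counts(sentences):
    """Count how often each (previous word, next word) pair occurs."""
    counts = Counter()
    for words in sentences:
        counts.update(zip(words, words[1:]))
    return counts

def context_vectors(pair_counts):
    """Stand-in for trained word vectors: describe each word by its neighbour counts."""
    vectors = defaultdict(Counter)
    for (prev, nxt), n in pair_counts.items():
        vectors[prev][("next", nxt)] += n
        vectors[nxt][("prev", prev)] += n
    return vectors

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as Counters."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def build_network(vectors, threshold=0.5):
    """Link every pair of words whose vector similarity reaches the threshold."""
    words = sorted(vectors)
    edges = {}
    for i, a in enumerate(words):
        for b in words[i + 1:]:
            similarity = cosine(vectors[a], vectors[b])
            if similarity >= threshold:
                edges[(a, b)] = similarity
    return edges

# Toy segmented corpus: "mic" and "microphone" appear in identical contexts.
corpus = [
    ["use", "microphone", "now"],
    ["use", "mic", "now"],
    ["build", "network", "fast"],
]
network = build_network(context_vectors(adjacent_pair_counts(corpus)))
```

In this toy corpus the only edge produced links "mic" and "microphone", because they share the same preceding and following words. A real implementation would replace `context_vectors` with vectors obtained from the trained language model.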

Description

Technical field

[0001] The invention belongs to the technical field of natural language processing, and more specifically relates to a method for building a natural language text keyword association network.

Background art

[0002] Computer processing is often needed to handle massive volumes of scientific and technological project data, or to summarize and evaluate expert information data. In natural language processing, because of the characteristics of the Chinese language itself, processing Chinese is much more complicated than processing Latin-based Western languages. A prerequisite for enabling computers to process natural language is text quantification. One approach to text quantification is to extract the characteristic words in the text content, that is, to extract industry or field keywords from text materials such as various scientific and technological documents, scientific and te...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/35, G06F16/374
Inventor: 郭光
Owner: BEIJING ZHONGKE CHUANGYI TECH CO LTD