
Preprocessing method and device for correlation calculation

A preprocessing technology for filtering non-related words, applied in computing, electrical digital data processing, special data processing applications, etc., which addresses the inability to exclude non-related words.

Active Publication Date: 2016-04-13
NAT UNIV OF DEFENSE TECH +1
7 Cites, 14 Cited by

AI Technical Summary

Problems solved by technology

[0011] The purpose of the present invention is to provide a preprocessing method and device for correlation calculation that solve the technical problem that the prior-art TF-IDF measure cannot exclude non-related words with extended meaning in the text.



Examples


Embodiment Construction

[0043] The drawings, which constitute a part of the present application, provide a further understanding of the present invention. The exemplary embodiments and their descriptions explain the present invention and do not constitute an improper limitation of it.

[0044] Referring to Figure 1, the preprocessing method for correlation calculation provided by the present invention includes the following steps:

[0045] Step S100: After the text to be processed is segmented and tagged with parts of speech, a dictionary is constructed and a document word frequency matrix F is obtained based on the dictionary. LDA clustering is then performed on F to obtain the document-topic probability distribution p(θ) and the topic-word probability distribution.
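The dictionary and matrix construction in Step S100 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `build_frequency_matrix` and the toy documents are hypothetical, the input is assumed to be already segmented (part-of-speech tags are omitted), and the LDA fitting itself is not shown.

```python
from collections import Counter

def build_frequency_matrix(docs):
    """Return (dictionary, F), where F[d][w] counts word w in document d."""
    # Dictionary: sorted unique tokens, so the column order is deterministic.
    dictionary = sorted({tok for doc in docs for tok in doc})
    index = {tok: j for j, tok in enumerate(dictionary)}
    # One row per document, one column per dictionary word.
    F = [[0] * len(dictionary) for _ in docs]
    for i, doc in enumerate(docs):
        for tok, n in Counter(doc).items():
            F[i][index[tok]] = n
    return dictionary, F

# Hypothetical, already segmented documents (assumption for illustration).
docs = [
    ["apple", "phone", "release", "phone"],
    ["apple", "fruit", "harvest"],
]
dictionary, F = build_frequency_matrix(docs)
```

A matrix of this shape (documents by dictionary words) is the usual input to an LDA implementation, which would then estimate the two distributions named in the step.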

[0046] Step S200: Calculate the non-related topic set NP of the text to be processed through the document-topic probability distribution p(θ), and obtain ...


Abstract

The invention provides a preprocessing method and device for correlation calculation. After text segmentation, an LDA (latent Dirichlet allocation) model is used to perform topic clustering on the text, yielding a document-topic probability distribution and a topic-word probability distribution. A non-correlated topic set of the text is then calculated from the document-topic probability distribution, and the non-correlated words of the text are calculated from the topic-word probability distribution, so that words not correlated with the topic content of the documents are recognized and extracted. The filtering result is applied to subsequent correlation calculation, which reduces the interference of non-correlated words in that calculation.
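The filtering stage described above can be sketched as follows. The exact selection rules are not given in this excerpt, so both rules here are assumptions made for illustration: a topic is treated as non-correlated when its probability in the document falls below a hypothetical `threshold`, and a word is treated as non-correlated when its most probable topic lies in that set. The distributions `theta_d` and `phi` are hand-made toy values, not LDA output.

```python
def non_related_topics(theta_d, threshold):
    """Topics whose probability in this document is below the threshold (assumed rule)."""
    return {k for k, p in enumerate(theta_d) if p < threshold}

def non_related_words(doc_tokens, phi, vocab_index, np_set):
    """Tokens whose most probable topic falls in the non-related set (assumed rule)."""
    flagged = []
    for tok in doc_tokens:
        j = vocab_index[tok]
        # Topic that assigns this word the highest probability.
        best_topic = max(range(len(phi)), key=lambda k: phi[k][j])
        if best_topic in np_set:
            flagged.append(tok)
    return flagged

vocab = ["apple", "fruit", "harvest", "phone", "release"]
vocab_index = {w: j for j, w in enumerate(vocab)}
# Toy topic-word distribution: one row per topic, each row sums to 1.
phi = [
    [0.32, 0.02, 0.03, 0.38, 0.25],  # topic 0, roughly "electronics"
    [0.30, 0.35, 0.30, 0.02, 0.03],  # topic 1, roughly "agriculture"
]
theta_d = [0.9, 0.1]  # toy document-topic distribution for one document
doc = ["apple", "phone", "release", "harvest"]

np_set = non_related_topics(theta_d, threshold=0.2)
noise = non_related_words(doc, phi, vocab_index, np_set)
kept = [t for t in doc if t not in noise]
```

In this toy case the document is dominated by topic 0, so topic 1 lands in the non-correlated set and "harvest" is filtered out before any further correlation calculation.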

Description

Technical field

[0001] The present invention relates to the technical field of natural language processing, in particular to a preprocessing method and device for correlation calculation.

Background technique

[0002] With the rapid development of the Internet, a large amount of news information is generated every day, and demand for efficient retrieval and acquisition of information keeps growing. Search engines and recommendation systems provide effective ways to meet this demand. These applications rest on calculating the relationship between keywords and web content. During this calculation, however, noise is often associated with the search keyword, either because of common words (such as "applications") or because of keyword ambiguity, which degrades both search results and further analysis. Therefore, it is necessary to perform preprocessing before the correlation calculation, and filter the words that are not associated with th...


Application Information

IPC(8): G06F17/27, G06F17/30
CPC: G06F16/335, G06F16/35, G06F40/242, G06F40/284
Inventor: 修保新, 陈发君, 刘忠, 黄金才, 朱承, 程光权, 陈超, 冯旸赫, 杨文辉, 龙开亮
Owner: NAT UNIV OF DEFENSE TECH