File correlation computing system and method

A computing system and method in the field of network communication. It addresses problems such as unsatisfactory document correlation calculation, vocabulary sparsity, and poor overall algorithm performance, so as to avoid vocabulary sparsity, eliminate negative effects, and improve accuracy.

Active Publication Date: 2007-11-28
SHENZHEN SHI JI GUANG SU INFORMATION TECH
Cites: 0; cited by: 23

AI Technical Summary

Problems solved by technology

However, because natural language is highly expressive, polysemy (one word with several meanings) and synonymy (several words with one meaning) are common. In addition, the use of rhetoric makes the phen...

Embodiment Construction

[0019] The present invention will be further elaborated below according to the accompanying drawings and preferred embodiments.

[0020] As shown in FIG. 1, a document correlation calculation system of the present invention includes a document preprocessing module 1, a word segmentation module 2, a word segmentation postprocessing module 3, a sememe processing module, and a document correlation calculation module 8. The sememe processing module includes a sememe tagging module 4, a word sense disambiguation module 5, and a topic semantic vector calculation module 6, connected in sequence. As needed, the system may also include a topic semantic vector library 7, whose input is connected to the topic semantic vector calculation module 6 and whose output is connected to the document correlation calculation module 8.

[0021] Here, the document preprocessing module 1 converts input documents in different formats into standard f...
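A minimal Python sketch of the module chain of FIG. 1 may help show how the stages feed one another. Every stage body here is an assumption made for illustration, not the patent's implementation: the toy `LEXICON` stands in for a sememe knowledge base, segmentation is a regex split, disambiguation keeps the candidate sememe best supported by the rest of the document, weights are relative frequencies, and correlation is cosine similarity. All class and function names are hypothetical. The dictionary held by `CorrelationSystem` plays the role of the optional topic semantic vector library 7, so each document's vector is computed once and reused.

```python
import math
import re
from collections import Counter

# Hypothetical toy lexicon (word -> candidate sememes); a real system would use
# a full sememe knowledge base. These entries are illustrative assumptions.
LEXICON = {
    "car": ["vehicle"],
    "automobile": ["vehicle"],
    "bank": ["finance", "land"],
    "loan": ["finance"],
    "river": ["land", "water"],
}
STOPWORDS = {"the", "a", "an", "and", "of", "to"}  # assumed stopword list


def preprocess(raw):
    # Module 1 (assumed behaviour): normalise input into plain lowercase text.
    return raw.lower()


def segment(text):
    # Module 2: word segmentation; a regex split stands in for a real segmenter.
    return re.findall(r"[a-z]+", text)


def postprocess(words):
    # Module 3 (assumed behaviour): drop stopwords.
    return [w for w in words if w not in STOPWORDS]


def tag_sememes(words):
    # Module 4: attach candidate sememes to each word found in the lexicon.
    return [(w, LEXICON[w]) for w in words if w in LEXICON]


def disambiguate(tagged):
    # Module 5 (simplified): keep the candidate sememe that the whole document
    # supports most often; ties are broken alphabetically.
    support = Counter(s for _, cands in tagged for s in cands)
    return [max(cands, key=lambda s: (support[s], s)) for _, cands in tagged]


def topic_vector(sememes):
    # Module 6 (assumed weighting): relative frequency of each sememe.
    counts = Counter(sememes)
    total = sum(counts.values())
    return {s: c / total for s, c in counts.items()}


def correlation(u, v):
    # Module 8 (assumed measure): cosine similarity of two sparse vectors.
    dot = sum(w * v.get(s, 0.0) for s, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


class CorrelationSystem:
    def __init__(self):
        # Optional topic semantic vector library 7: caches doc_id -> vector.
        self.library = {}

    def vector(self, doc_id, raw):
        if doc_id not in self.library:
            words = postprocess(segment(preprocess(raw)))
            self.library[doc_id] = topic_vector(disambiguate(tag_sememes(words)))
        return self.library[doc_id]

    def compare(self, id_a, raw_a, id_b, raw_b):
        return correlation(self.vector(id_a, raw_a), self.vector(id_b, raw_b))


system = CorrelationSystem()
score = system.compare("d1", "The car and the bank loan", "d2", "An automobile loan")
print(round(score, 3))  # 0.949
```

In this toy run, "car" and "automobile" share no surface form but map to the same sememe, and "bank" resolves to its finance sense because "loan" supports it, so the two documents come out highly correlated.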

Abstract

The invention discloses a document correlation calculation system, characterized in that it comprises a document preprocessing module and a word segmentation module connected in sequence. The output of the document preprocessing module is at least one preprocessed document, and the output of the word segmentation module is a corresponding first word list. The system further comprises a sememe processing module and a document correlation calculation module. The sememe processing module converts the words of the first word list into sememes, calculates the weights of the sememes, and obtains at least one topic semantic vector corresponding to at least one document. The document correlation calculation module is connected to the topic semantic vector calculation module and calculates the correlation of at least two topic semantic vectors. The invention also discloses a document correlation calculation method. The invention can eliminate vocabulary sparsity and word ambiguity, thereby improving the accuracy of document correlation calculation.
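As a rough illustration of why converting words to sememes reduces vocabulary sparsity, the sketch below builds a topic semantic vector from an already segmented word list. The `SEMEMES` dictionary, the function name `topic_semantic_vector`, and the weighting rule (each word occurrence contributes a total weight of 1, split evenly among its sememes, then normalised) are all assumptions for illustration; the abstract does not state how the patent weights sememes.

```python
from collections import defaultdict

# Hypothetical word -> sememe dictionary; entries are illustrative only.
SEMEMES = {
    "physician": ["human", "medical"],
    "doctor": ["human", "medical"],
    "hospital": ["institution", "medical"],
    "clinic": ["institution", "medical"],
}


def topic_semantic_vector(word_list):
    """Convert a segmented word list into normalised sememe weights.

    Assumed weighting: each word occurrence contributes a total weight of 1,
    split evenly over its sememes; words missing from SEMEMES are skipped.
    """
    weights = defaultdict(float)
    for word in word_list:
        sememes = SEMEMES.get(word)
        if not sememes:
            continue
        for sememe in sememes:
            weights[sememe] += 1.0 / len(sememes)
    total = sum(weights.values())
    return {s: w / total for s, w in weights.items()} if total else {}


doc_a = ["physician", "hospital"]
doc_b = ["doctor", "clinic"]
print(set(doc_a) & set(doc_b))  # no shared words
print(topic_semantic_vector(doc_a) == topic_semantic_vector(doc_b))  # True
```

Here `doc_a` and `doc_b` share no words, so a word-level comparison finds no overlap, yet both map to the same sememe vector: human 0.25, medical 0.5, institution 0.25.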

Description

Technical field

[0001] The present invention relates to network communication technology, and more specifically to a system and method for calculating document correlation.

Background art

[0002] Document correlation is a real number between 0 and 1 that represents the degree of semantic correlation between two documents. For example, the correlation between two identical documents is 1, while the correlation between a document about programming technology and a document about politics and society is far less than 1 and close to 0. Document correlation calculation has many applications, such as document classification and clustering and the retrieval of related articles.

[0003] At present, document correlation calculation is based on topic vocabulary extraction: first, the topic words of the documents to be compared are extracted by calculation, and then the correlation of the documents to be compared is obtained by c...
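The background describes correlation as a value in [0, 1] and a prior approach built on topic word extraction. The sketch below is a hypothetical baseline of that kind, not the patent's method and not necessarily the exact prior art: topic words are assumed to be the k most frequent non-stopword tokens, and correlation is the cosine similarity of their frequency vectors, which stays within [0, 1] because every weight is non-negative. The stopword list, `k`, and all function names are assumptions.

```python
import math
from collections import Counter

STOPWORDS = {"the", "a", "an", "and", "of", "in", "to", "is"}  # assumed stopword list


def topic_words(text, k=5):
    # Assumed extraction rule: the k most frequent non-stopword tokens.
    tokens = [t for t in text.lower().split() if t not in STOPWORDS]
    return dict(Counter(tokens).most_common(k))


def cosine(u, v):
    # Cosine similarity; with non-negative weights the result lies in [0, 1].
    dot = sum(w * v.get(t, 0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def document_correlation(doc_a, doc_b, k=5):
    return cosine(topic_words(doc_a, k), topic_words(doc_b, k))


code_doc = "the compiler inlines the loop and the compiler unrolls it"
politics_doc = "parliament debates the election law"
print(document_correlation(code_doc, code_doc))      # approximately 1.0
print(document_correlation(code_doc, politics_doc))  # 0.0: no shared topic words
print(document_correlation("physician hospital", "doctor clinic"))  # 0.0 despite synonymy
```

The last call shows the weakness that motivates the invention: a word-level baseline scores two synonymous documents as unrelated because they share no surface tokens.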

Claims

Application Information

IPC(8): G06F17/27, G06F17/30
Inventor: 丁江伟 (Ding Jiangwei)
Owner: SHENZHEN SHI JI GUANG SU INFORMATION TECH