A Sentence Similarity Calculation Method Based on Information Amount

A sentence similarity calculation method, applied in computing, special data processing applications, instruments, and similar fields. It addresses the inaccuracy of existing methods on short sentences, their poor versatility, and their dependence on training data sets, and achieves improved accuracy, good versatility, and excellent time performance.

Active Publication Date: 2017-02-22
BEIJING INSTITUTE OF TECHNOLOGY

AI Technical Summary

Problems solved by technology

Traditional methods mostly reuse document similarity measures, which treat the words of a sentence as unrelated, meaningless symbols; this is not accurate enough for sentences that contain only a few words.
Commonly used hybrid methods, in turn, usually need to train parameters on related data sets or rely on empirical parameters, so they depend on training data and are not very versatile.


Examples

Detailed Description of Embodiments

[0047] The implementation process of the present invention will be described in detail below in conjunction with the accompanying drawings.

[0048] As shown in Figure 1, the method of the invention consists of five main steps:

[0049] Step 1: Input the two sentences to be compared. Denote the sentences $s_a$ and $s_b$ as:

[0050] $s_a = \{w_1^a, w_2^a, \ldots, w_n^a\}$

[0051] $s_b = \{w_1^b, w_2^b, \ldots, w_m^b\}$

[0052] where $w_i^a$ and $w_i^b$ denote the $i$-th word of sentences $s_a$ and $s_b$ respectively, and $n$ and $m$ denote the number of words in $s_a$ and $s_b$ respectively.

[0053] Step 2: Because polysemy is common, selecting a sense for each word in the input sentences removes ambiguity from the sentence semantics and prepares for the subsequent similarity calculation. The specific process is as follows:

[0054] (1) Select one word from each of the two input sentences to form a word pair;

[0055] (2) Use the meaning (or concept) of the word in the semantic network (such as WordNet) to rep...
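The sense-selection idea in Step 2 can be illustrated with a minimal Python sketch. It is not the patent's implementation: the toy is-a hierarchy, the concept probabilities, and the helper names (`PARENT`, `PROB`, `SENSES`, `common_ic`, `select_senses`) are all hypothetical, and the sketch assumes that the information amount of a concept is the negative log of its corpus probability and that a word pair is disambiguated by the sense pair whose shared ancestor carries the most information.

```python
import math
from itertools import product

# Hypothetical toy is-a hierarchy (child -> parent); a stand-in for WordNet.
PARENT = {
    "entity": None, "object": "entity", "institution": "entity",
    "bank.river": "object", "shore": "object",
    "bank.finance": "institution", "lender": "institution",
}
# Made-up concept probabilities; in practice these come from corpus counts.
PROB = {
    "entity": 1.0, "object": 0.5, "institution": 0.25,
    "bank.river": 0.05, "shore": 0.1, "bank.finance": 0.1, "lender": 0.05,
}
# Candidate senses (concepts) for each word.
SENSES = {
    "bank": ["bank.river", "bank.finance"],
    "shore": ["shore"],
    "lender": ["lender"],
}

def ancestors(concept):
    """Return the concept together with all of its ancestors."""
    found = set()
    while concept is not None:
        found.add(concept)
        concept = PARENT[concept]
    return found

def ic(concept):
    """Information amount of a concept, assumed to be -log p(concept)."""
    return -math.log(PROB[concept])

def common_ic(c1, c2):
    """Information amount of the most informative shared ancestor."""
    return max(ic(c) for c in ancestors(c1) & ancestors(c2))

def select_senses(word_a, word_b):
    """Pick the sense pair whose shared ancestor is most informative."""
    pairs = product(SENSES[word_a], SENSES[word_b])
    return max(pairs, key=lambda pair: common_ic(*pair))
```

With this toy data, `select_senses("bank", "shore")` picks the river sense of "bank", while `select_senses("bank", "lender")` picks the financial sense, because in each case that sense pair shares a more specific ancestor.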


Abstract

The invention relates to a sentence similarity calculation method based on information amount. The method first determines the sense of each word according to the concept with the maximum information amount among the words of the two sentences. It then calculates the information amount of each word and the common information amount of multiple words from the hierarchical structure of a semantic network and corpus statistics, and uses the inclusion-exclusion principle from combinatorics to calculate the total information amount of multiple words, which yields the information amount of each sentence and the total information amount of the two sentences together. Finally, sentence similarity is defined and calculated according to the Jaccard similarity principle. The method realistically simulates human judgments of sentence similarity and does not require parameters trained on a corpus, empirical parameters, a large-scale corpus, or other natural language processing techniques such as part-of-speech tagging. Its time performance is excellent: near-real-time calculation can be achieved on a conventional mainstream multi-core PC (personal computer) for sentence pairs of normal length.
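The pipeline summarized in the abstract can be sketched in Python under stated assumptions. The formulas below are one plausible reading, not the patent's confirmed definitions: the information amount of a concept is taken as $-\log p(c)$, the common information amount of several concepts as the information amount of their most informative shared ancestor, the total information amount as an inclusion-exclusion sum over all non-empty subsets, and the similarity as the Jaccard-style ratio $(IC(s_a) + IC(s_b) - IC(s_a \cup s_b)) / IC(s_a \cup s_b)$. The hierarchy, probabilities, and function names are hypothetical.

```python
import math
from itertools import combinations

# Hypothetical toy is-a hierarchy and made-up concept probabilities.
PARENT = {"entity": None, "animal": "entity", "artifact": "entity",
          "dog": "animal", "cat": "animal", "car": "artifact"}
PROB = {"entity": 1.0, "animal": 0.4, "artifact": 0.4,
        "dog": 0.1, "cat": 0.1, "car": 0.2}

def ancestors(concept):
    """Return the concept together with all of its ancestors."""
    found = set()
    while concept is not None:
        found.add(concept)
        concept = PARENT[concept]
    return found

def ic(concept):
    """Information amount of a concept, assumed to be -log p(concept)."""
    return -math.log(PROB[concept])

def common_ic(concepts):
    """Information amount shared by all given concepts (assumption:
    the most informative ancestor common to every one of them)."""
    shared = set.intersection(*(ancestors(c) for c in concepts))
    return max(ic(c) for c in shared)

def total_ic(concepts):
    """Total information amount via inclusion-exclusion over all
    non-empty subsets; repeated concepts are counted once."""
    unique = sorted(set(concepts))
    total = 0.0
    for k in range(1, len(unique) + 1):
        sign = 1.0 if k % 2 == 1 else -1.0
        for subset in combinations(unique, k):
            total += sign * common_ic(subset)
    return total

def similarity(concepts_a, concepts_b):
    """Jaccard-style ratio of shared to total information amount."""
    ic_union = total_ic(list(concepts_a) + list(concepts_b))
    if ic_union == 0.0:
        return 0.0
    ic_shared = total_ic(concepts_a) + total_ic(concepts_b) - ic_union
    return ic_shared / ic_union
```

In this toy setup, identical inputs score 1, concepts that share only the root score 0, and `similarity(["dog"], ["cat"])` falls in between. The subset enumeration grows exponentially with the number of words, so this naive form is only practical for short inputs.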

Description

Technical field

[0001] The invention relates to a sentence similarity calculation method, in particular to one based on information amount, and belongs to the technical field of natural language processing.

Background

[0002] Sentence or short-text similarity calculation is an important research topic in natural language processing. In recent years, it has played an increasingly important role in application fields such as information retrieval, machine translation, question answering, and automatic summarization. Traditional methods mostly reuse document similarity measures and treat the words of a sentence as unrelated, meaningless symbols, which is not accurate enough for sentences containing only a few words. Commonly used hybrid methods, in turn, usually need to train parameters on related data sets or use empirical parameters. The disadvantage is t...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F17/30
CPC: G06F40/30
Inventors: 吴昊, 黄河燕
Owner: BEIJING INSTITUTE OF TECHNOLOGY