Sentence similarity calculation method based on information amount

A sentence similarity calculation technology, applied in computing, special data processing applications, instruments, etc. It can solve problems such as lack of versatility, dependence on training data sets, and inaccurate similarity results for sentences.

Active Publication Date: 2014-10-08
BEIJING INSTITUTE OF TECHNOLOGY

AI Technical Summary

Problems solved by technology

Traditional methods mostly reuse document similarity measures, which treat the words of a sentence as unrelated, meaningless symbols. This is not accurate enough for sentences containing only a few words.
Commonly used hybrid methods, in turn, usually need to train parameters on related data sets or use empirical parameters. Their disadvantage is that they depend on training data sets and are not very versatile.
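
The limitation of symbol-only measures can be illustrated with a minimal sketch. The function below is a plain bag-of-words cosine similarity, chosen here as one simple example of a document-style measure; it is an assumption for illustration, not necessarily the specific baseline the patent refers to, and the name `bow_cosine` is hypothetical.

```python
from collections import Counter
from math import sqrt


def bow_cosine(sentence_a: str, sentence_b: str) -> float:
    """Cosine similarity over raw word counts (words are opaque symbols)."""
    counts_a = Counter(sentence_a.lower().split())
    counts_b = Counter(sentence_b.lower().split())
    # Only words that appear verbatim in both sentences contribute.
    dot = sum(counts_a[w] * counts_b[w] for w in counts_a.keys() & counts_b.keys())
    norm_a = sqrt(sum(c * c for c in counts_a.values()))
    norm_b = sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Two short sentences with related meaning but no shared surface words.
score = bow_cosine("the dog barked loudly", "a puppy yelped noisily")
print(score)
```

Because the two example sentences share no surface words, the score is zero even though a human reader would consider them related. With only a handful of words per sentence, there is little overlap for such a measure to work with.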

Examples

Detailed Description of the Embodiments

[0046] The implementation process of the present invention will be described in detail below in conjunction with the accompanying drawings.

[0047] As shown in Figure 1, the method of the invention is mainly divided into 5 steps:

[0048] Step 1: Input the two sentences to be compared. Denote the sentences $s_a$ and $s_b$ as:

[0049] $s_a = \{ w_i^a \mid i = 1, 2, \ldots, n \}$

[0050] $s_b = \{ w_i^b \mid i = 1, 2, \ldots, m \}$

[0051] where $w_i^a$ and $w_i^b$ respectively denote the i-th word of sentences $s_a$ and $s_b$, and n and m respectively represent the ...

Abstract

The invention relates to a sentence similarity calculation method based on information amount. The method first determines the sense of each word according to the concept with the maximum information amount among the words of the two sentences. It then calculates the information amount of each word and the common information amount shared by multiple words from the hierarchical structure of a semantic network and corpus statistics, and calculates the total information amount of multiple words using the inclusion-exclusion principle from combinatorics. This yields the information amount of each sentence and the total information amount of the two sentences together. Finally, the sentence similarity is defined and calculated according to the Jaccard similarity principle. The method closely simulates human judgments of sentence similarity. It requires no parameters trained on a corpus, no empirical parameters, no dependence on corpus scale, and no other natural language processing techniques such as part-of-speech tagging. Its time performance is excellent: for sentence pairs of normal length, near-real-time calculation can be achieved on an ordinary mainstream multi-core PC (personal computer).
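
The pipeline described in the abstract can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the patent's exact formulas: the taxonomy `PARENT` and the probabilities `PROB` are hypothetical toy data standing in for a semantic network and corpus statistics; information amount is taken as `-log2 p(concept)`; the common information amount of several concepts is assumed to be that of their most informative shared ancestor; and the word sense step is skipped by assuming each word is already mapped to one concept.

```python
from itertools import combinations
from math import log2

# Hypothetical toy taxonomy (child -> parent) and concept probabilities.
# A real system would derive these from a semantic network and a corpus.
PARENT = {"animal": "entity", "vehicle": "entity",
          "dog": "animal", "cat": "animal", "car": "vehicle"}
PROB = {"entity": 1.0, "animal": 0.5, "vehicle": 0.5,
        "dog": 0.125, "cat": 0.125, "car": 0.25}


def ic(concept):
    """Information amount of a concept: -log2 of its probability."""
    return -log2(PROB[concept])


def ancestors(concept):
    """The concept itself followed by its ancestors up to the root."""
    path = [concept]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def common_ic(concepts):
    """Assumed common information: IC of the most informative shared ancestor."""
    shared = set(ancestors(concepts[0]))
    for concept in concepts[1:]:
        shared &= set(ancestors(concept))
    return max(ic(c) for c in shared)


def total_ic(concepts):
    """Total information of a concept set via inclusion-exclusion."""
    unique = sorted(set(concepts))
    total = 0.0
    for size in range(1, len(unique) + 1):
        sign = 1 if size % 2 == 1 else -1  # add odd-sized, subtract even-sized
        for subset in combinations(unique, size):
            total += sign * common_ic(subset)
    return total


def similarity(sentence_a, sentence_b):
    """Jaccard-style ratio: shared information over total information."""
    union = total_ic(sentence_a + sentence_b)
    if union == 0:
        return 0.0
    shared = total_ic(sentence_a) + total_ic(sentence_b) - union
    return shared / union


print(similarity(["dog", "car"], ["cat", "car"]))
```

With this toy data, the pair `["dog", "car"]` and `["cat", "car"]` scores 3/7 (about 0.43): the sentences share `car` outright and share the ancestor `animal` through `dog` and `cat`. Identical sentences score 1, and sentences whose concepts meet only at the root score 0. The inclusion-exclusion loop enumerates every subset of concepts, so its cost grows exponentially with sentence length; the sketch makes no attempt to reproduce the time performance claimed in the abstract.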

Description

Technical field

[0001] The invention relates to a sentence similarity calculation method, in particular to a sentence similarity calculation method based on information amount, and belongs to the technical field of natural language processing.

Background

[0002] Similarity calculation for sentences or short texts is an important research topic in natural language processing. In recent years, it has played an increasingly important role in information retrieval, machine translation, question answering systems, automatic summarization and other application fields. Traditional methods mostly reuse document similarity measures, which treat the words of a sentence as unrelated, meaningless symbols. This is not accurate enough for sentences containing only a few words. Commonly used hybrid methods, in turn, usually need to train parameters on related data sets or use empirical parameters. The disadvantage is t...

Application Information

IPC(8): G06F17/30
CPC: G06F40/30
Inventors: 吴昊, 黄河燕
Owner: BEIJING INSTITUTE OF TECHNOLOGY