Sentence similarity calculation method and system

A sentence similarity calculation method and system, applicable to computing, special data processing applications, instruments, etc., that addresses the low accuracy and large workload of existing similarity calculation approaches.

Active Publication Date: 2016-10-12
TCL CORPORATION

AI Technical Summary

Problems solved by technology

[0005] In view of the deficiencies in the above-mentioned prior art, the purpose of the present invention is to provide users with a method and system for calculat...

Examples

Embodiment Construction

[0047] To make the object, technical solution, and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings and examples. The specific embodiments described here serve only to explain the present invention, not to limit it.

[0048] The invention provides a method for calculating sentence similarity. As shown in Figure 1, the method includes:

[0049] S1. Use the word2vec algorithm to train on the pre-built corpus to obtain vectors for all words in the corpus.

[0050] Corpus Training

[0051] Word2vec training produces the word vectors; the larger the training corpus, the more accurate the resulting vectors. The corpus for this step can be obtained by crawling relevant news articles from the Internet.
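As a rough illustration of what word2vec consumes during training, the sketch below turns a tokenized corpus into (center, context) training pairs. It assumes the skip-gram variant of word2vec and a hypothetical window size; the source does not say which variant or parameters are used, and the function name `skipgram_pairs` and the sample corpus are illustrative only. Actual vector training would normally be delegated to an existing word2vec implementation.

```python
def skipgram_pairs(sentences, window=2):
    """Build (center, context) pairs from tokenized sentences.

    Assumption: skip-gram style training data; `window` is a
    hypothetical parameter, not a value taken from the patent.
    """
    pairs = []
    for tokens in sentences:
        for i, center in enumerate(tokens):
            # Context is every token within `window` positions of the center.
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pairs.append((center, tokens[j]))
    return pairs


# Hypothetical two-sentence "news" corpus, already segmented into words.
corpus = [
    ["stocks", "rose", "sharply", "today"],
    ["stocks", "fell", "today"],
]
pairs = skipgram_pairs(corpus, window=1)
```

A larger corpus yields more pairs per word, which is one way to read the statement that more training text gives more accurate vectors.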

[0052] It is conceivable that those skilled in the art can s...

Abstract

The invention provides a sentence similarity calculation method and system. The method trains a word2vec model on a pre-built corpus to obtain vectors for all words in the corpus; performs intelligent word segmentation on the two sentences whose similarity is to be calculated; looks up the vector corresponding to each segmented word of the first and second sentences; calculates, in turn, the similarity between each segmented word of the first sentence and each segmented word of the second sentence; obtains the two sets of segmented words whose word similarity exceeds a preset threshold; calculates the similarity contribution of each group of segmented words to the whole sentence according to the offset between their positions in the two sentences; and adds the contribution values of the segmented words in the two sentences to obtain the similarity between the sentences. The method and system use word2vec to calculate the semantic similarity of words, train automatically on a massive corpus, and thereby support accurate information retrieval, document classification, and question answering.
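The steps in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented formula: cosine similarity is assumed for word similarity, each word of the first sentence is paired only with its best match in the second, the position weight `1 - |i/len1 - j/len2|` and the final normalization by the longer sentence length are hypothetical choices, and the threshold value and toy vectors are made up.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def sentence_similarity(tokens1, tokens2, vectors, threshold=0.8):
    """Sum position-weighted word similarities (hypothetical weighting)."""
    if not tokens1 or not tokens2:
        return 0.0
    total = 0.0
    for i, w1 in enumerate(tokens1):
        best_sim, best_j = 0.0, None
        for j, w2 in enumerate(tokens2):
            if w1 in vectors and w2 in vectors:
                sim = cosine(vectors[w1], vectors[w2])
                # Keep only pairs above the preset threshold.
                if sim > threshold and sim > best_sim:
                    best_sim, best_j = sim, j
        if best_j is not None:
            # Assumed penalty: relative position offset between the matched words.
            offset = abs(i / len(tokens1) - best_j / len(tokens2))
            total += best_sim * (1.0 - offset)
    # Assumed normalization so the score stays in [0, 1].
    return total / max(len(tokens1), len(tokens2))


# Toy 2-dimensional vectors standing in for trained word2vec vectors.
vectors = {
    "cat": [1.0, 0.0], "kitten": [0.9, 0.1],
    "sat": [0.0, 1.0], "slept": [0.1, 0.9],
}
score = sentence_similarity(["cat", "sat"], ["kitten", "slept"], vectors)
```

With these toy vectors, semantically close words in matching positions contribute almost fully, while the same matches in swapped positions would be discounted by the offset term.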

Description

Technical field

[0001] The invention relates to the field of language information processing, in particular to a method and system for calculating sentence similarity.

Background

[0002] Text similarity calculation is an important part of natural language processing and plays an important role in information retrieval, document classification, question answering systems, etc. By text length, text similarity calculation can be divided into long-text (chapter-level) and short-text (sentence-level, word-level) similarity calculation. Because text length varies, each calculation method has its own advantages and disadvantages. For sentence-level similarity, not only the meaning of each word in the sentence must be considered but also the order in which the words are combined, which makes this type of research more complicated.

[0003] The traditional method of calculating sentence similarity is mainly to vectorize sentences, and to form a weight vector by the...
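For contrast, the sketch below shows a simple vector-space baseline of the general kind the background alludes to. The source text is truncated before it names the weighting scheme, so the use of raw term counts here is purely an assumption, and `bow_cosine` is a hypothetical name. The example illustrates why word order matters for sentence-level similarity: a count-based vector cannot tell two sentences apart when they use the same words in a different order.

```python
import math
from collections import Counter


def bow_cosine(tokens1, tokens2):
    """Cosine similarity of term-count vectors (assumed weighting: raw counts)."""
    c1, c2 = Counter(tokens1), Counter(tokens2)
    dot = sum(c1[w] * c2[w] for w in c1)
    n1 = math.sqrt(sum(v * v for v in c1.values()))
    n2 = math.sqrt(sum(v * v for v in c2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0


# Same words, opposite meaning: the baseline scores them as identical.
same_words = bow_cosine(["dog", "bites", "man"], ["man", "bites", "dog"])
no_overlap = bow_cosine(["dog", "bites", "man"], ["stocks", "rose", "today"])
```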

Application Information

IPC(8): G06F17/27
CPC: G06F40/205
Inventor: 吴成龙
Owner: TCL CORPORATION