
Semantic similarity calculation method and device based on CTW and KM algorithms

A semantic similarity technology based on the KM algorithm, applied in the computer field, that addresses the low accuracy of existing methods.

Active Publication Date: 2019-06-07
HUBEI UNIV OF TECH
Cites: 5; Cited by: 6

AI Technical Summary

Problems solved by technology

[0006] In view of this, the present invention provides a method and device for computing semantic similarity based on the CTW and KM algorithms, to solve, or at least partially solve, the technical problem of low accuracy in existing methods.

Method used

Word2Vec is used to map each word segment into a 200-dimensional word vector space; the CTW distance between each word segment of the text to be compared and each word segment of the source text is computed and assembled into a CTW matrix; and the KM algorithm is applied to reduce the calculation scale.


Examples


Embodiment 1

[0073] This embodiment provides a method for computing semantic similarity based on the CTW and KM algorithms. Referring to Figure 1, the method includes:

[0074] First, step S1 is performed: a preset corpus is selected and trained using a preset word vector method combined with neural network learning to obtain a word vector space, in which each word vector represents the semantic information of a word segment.

[0075] Specifically, the Word2Vec deep learning platform can be used to train on the preset corpus to obtain word vectors, finally yielding word vector data with 200-dimensional features that form a word segmentation vector library (the word vector space).
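Step S1 ends with a word segmentation vector library. The excerpt does not say how that library is stored; assuming it uses the plain-text format written by the original word2vec tool (a header line with the vocabulary size and dimension, then one word followed by its vector components per line), a minimal loader could look like the sketch below. The function name `load_word_vectors` and the toy 3-dimensional data are illustrative assumptions, not taken from the patent.

```python
import io
from typing import Dict, List, TextIO


def load_word_vectors(stream: TextIO) -> Dict[str, List[float]]:
    """Parse a word vector library stored in the word2vec text format.

    Assumed layout (hypothetical for this patent): a header line
    "<vocab_size> <dimension>", then one line per word segment holding the
    token followed by <dimension> floats.
    """
    header = stream.readline().split()
    vocab_size, dim = int(header[0]), int(header[1])
    vectors: Dict[str, List[float]] = {}
    for line in stream:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip blank or malformed lines
        word, values = parts[0], [float(x) for x in parts[1:]]
        vectors[word] = values
    if len(vectors) != vocab_size:
        raise ValueError(f"expected {vocab_size} words, found {len(vectors)}")
    return vectors


# Toy library with 3-dimensional vectors (the patent uses 200 dimensions).
sample = io.StringIO("2 3\napple 0.1 0.2 0.3\nphone 0.4 0.5 0.6\n")
space = load_word_vectors(sample)
print(len(space), space["phone"])  # 2 [0.4, 0.5, 0.6]
```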

[0076] Word2Vec is a word vector computation model developed by Google that applies the idea of deep learning to automatically learn the essential information of words from large-scale text data. Deep learning learns more useful features in the data b...
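Word2Vec offers two training architectures, CBOW and skip-gram, and the excerpt does not say which one the patent uses. To illustrate how the model learns from raw text, the sketch below generates the (center, context) training pairs that the skip-gram architecture learns from. It omits the subsampling, dynamic window sizes and negative sampling of the real tool, and `skipgram_pairs` is a hypothetical helper name.

```python
from typing import List, Tuple


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Return (center, context) training pairs in the style of skip-gram.

    Each word is paired with every neighbour at most `window` positions away;
    the network is then trained to predict the context word from the center.
    """
    pairs: List[Tuple[str, str]] = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


print(skipgram_pairs(["the", "phone", "is", "new"], window=1))
```

With `window=1` the four tokens yield six pairs, since the first and last words each have only one neighbour.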

Embodiment 2

[0185] This embodiment provides a device for computing semantic similarity based on the CTW and KM algorithms. Referring to Figure 4, the device comprises:

[0186] The word vector space obtaining module 401 is configured to select a preset corpus and train it using preset word vectors combined with neural network learning to obtain a word vector space, in which each word vector represents the semantic information of a word segment;

[0187] The word component array building module 402 is configured to perform word segmentation on the text to be compared and on the source text, and then, according to the word vector space, build a word component array for each of the two texts;
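Module 402 turns each text into an array of word vectors. A minimal sketch, assuming whitespace splitting in place of a real Chinese word segmenter and assuming that word segments absent from the vector space are dropped (the patent excerpt does not say how it handles them); `build_word_component_array` and the 2-dimensional toy vectors are hypothetical.

```python
from typing import Dict, List

Vector = List[float]


def build_word_component_array(text: str, space: Dict[str, Vector]) -> List[Vector]:
    """Segment `text` and look up each word segment in the word vector space.

    Whitespace splitting stands in for a real Chinese word segmenter, and
    segments missing from the vector space are skipped (both assumptions).
    """
    return [space[token] for token in text.split() if token in space]


space = {"new": [0.9, 0.1], "phone": [0.2, 0.8], "cheap": [0.7, 0.3]}
compared = build_word_component_array("new phone", space)
source = build_word_component_array("cheap phone case", space)
print(compared)  # [[0.9, 0.1], [0.2, 0.8]]
print(source)    # [[0.7, 0.3], [0.2, 0.8]]  ("case" is not in the space)
```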

[0188] The CTW distance calculation module 403 is configured to calculate, in turn, the CTW distance between each word segment of the text to be compared and each word segment of the source text;
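The excerpt does not define the CTW distance itself. CTW most likely refers to canonical time warping, which extends dynamic time warping (DTW) by also learning canonical correlation analysis projections that bring both sequences into a common space. The sketch below implements only the plain DTW core, and it assumes, since the excerpt does not say, that each word vector is read as a one-dimensional curve of its components. Treat `dtw_distance` as a simplified, hypothetical stand-in, not the patent's CTW computation.

```python
import math
from typing import List, Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic dynamic time warping distance between two 1-D sequences.

    cost[i][j] holds the cheapest alignment of a[:i] with b[:j]; each step may
    advance in a, in b, or in both, and pays |a[i-1] - b[j-1]|.
    """
    n, m = len(a), len(b)
    cost: List[List[float]] = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # a advances alone
                                    cost[i][j - 1],      # b advances alone
                                    cost[i - 1][j - 1])  # both advance
    return cost[n][m]


# Word vectors read as curves of their components (hypothetical 4- and 5-dim data).
print(dtw_distance([0.0, 1.0, 2.0, 1.0], [0.0, 1.0, 1.0, 2.0, 1.0]))  # 0.0
```

The two curves have the same shape but different lengths, so warping aligns them at zero cost; a point-by-point distance could not even compare them.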

[0189] CTW matrix construction module 404, fo...
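The description of module 404 is truncated, but its name suggests a matrix of the per-pair distances produced by module 403. A sketch under that assumption: row i holds the distances from the i-th word segment of the text to be compared to every word segment of the source text. `build_distance_matrix` is a hypothetical name, and the L1 distance in the usage example is only a placeholder for the CTW distance.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def build_distance_matrix(compared: List[Vector], source: List[Vector],
                          distance: Callable[[Vector, Vector], float]) -> List[List[float]]:
    """Row i, column j holds distance(compared[i], source[j]).

    `distance` is where the per-pair CTW distance of module 403 would be
    plugged in; any callable with the same shape works for this sketch.
    """
    return [[distance(u, v) for v in source] for u in compared]


def l1(u: Vector, v: Vector) -> float:
    # Placeholder distance: sum of absolute component differences.
    return sum(abs(x - y) for x, y in zip(u, v))


matrix = build_distance_matrix([[1.0, 0.0], [0.0, 1.0]],
                               [[0.0, 1.0], [1.0, 1.0], [1.0, 0.0]], l1)
for row in matrix:
    print(row)
# [2.0, 1.0, 0.0]
# [0.0, 1.0, 2.0]
```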

Embodiment 3

[0220] Based on the same inventive concept, the present application also provides a computer-readable storage medium 400 (see Figure 5) that stores a computer program 411; when executed, the program implements the method of Embodiment 1.



Abstract

The invention provides a semantic similarity calculation method and device based on the CTW and KM algorithms. It aims to overcome a defect of prior-art semantic similarity calculation methods, which ignore the important influence of word segment order on semantics: the method accounts for the effect of word order on sentences while retaining a single semantic judgment rule. The method comprises: using the Word2Vec deep learning platform to divide a text into word segmentation vectors in a multi-dimensional space; obtaining a plurality of text similarity values and mapping them into a multi-dimensional vector space, where the vectors are connected to form a curve; comparing the similarity of multiple texts through these word vector curves by means of a relatively new time warping distance used for curve similarity in images; and adopting the KM algorithm to reduce the calculation scale. Compared with traditional methods such as the longest common substring and word frequency statistics, the method is more robust, is clearly effective on sentences that contain the same word segments in different orders (a case traditional methods cannot handle), and improves calculation accuracy.
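The abstract says the KM (Kuhn-Munkres) algorithm is adopted to reduce the calculation scale, but the excerpt does not show the exact objective. The sketch below assumes KM is used to pair each word segment of the text to be compared with a distinct word segment of the source text so that the summed distance in the CTW matrix is minimal (maximising a similarity matrix instead is equivalent after negating it). How the patent turns the matched cost into a final similarity score is not given in the excerpt, so the sketch stops at the assignment. `km_min_cost_assignment` is a hypothetical name.

```python
import math
from typing import List, Tuple


def km_min_cost_assignment(cost: List[List[float]]) -> Tuple[float, List[int]]:
    """Kuhn-Munkres (Hungarian) algorithm with row/column potentials.

    Matches every row (word segment of the compared text) to a distinct column
    (word segment of the source text) so that the summed cost is minimal.
    Requires len(cost) <= len(cost[0]). Returns (total cost, column per row).
    """
    n, m = len(cost), len(cost[0])
    assert n <= m, "transpose the matrix so rows <= columns"
    u = [0.0] * (n + 1)      # row potentials
    v = [0.0] * (m + 1)      # column potentials
    p = [0] * (m + 1)        # p[j]: 1-based row matched to column j (0 = free)
    way = [0] * (m + 1)      # previous column on the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [math.inf] * (m + 1)
        used = [False] * (m + 1)
        while True:          # grow a shortest augmenting path from row i
            used[j0] = True
            i0, delta, j1 = p[j0], math.inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while True:          # flip matches along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
            if j0 == 0:
                break
    match = [0] * n
    for j in range(1, m + 1):
        if p[j]:
            match[p[j] - 1] = j - 1
    total = sum(cost[r][match[r]] for r in range(n))
    return total, match


print(km_min_cost_assignment([[4, 1, 3], [2, 0, 5], [3, 2, 2]]))  # (5, [1, 0, 2])
```

A brute-force search over all pairings grows factorially with sentence length, while KM runs in polynomial time, which is consistent with the abstract's remark about reducing the calculation scale. In practice, SciPy's `linear_sum_assignment` solves the same assignment problem.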

Description

Technical field
[0001] The invention relates to the field of computer technology, and in particular to a method and device for computing semantic similarity based on the CTW and KM algorithms.
Background art
[0002] As artificial intelligence technology advances, research in natural language processing has become increasingly important. Similarity calculation, a basic and core problem in natural language processing, is widely used across artificial intelligence. For example, machine translation, speech recognition, text emotion recognition and automatic composition all require similarity models to measure how far words in a text can substitute for one another or how well questions match answers. Similarity calculation has therefore attracted the attention of many natural language processing researchers.
[0003] At present, with the introduction of the concept...


Application Information

Patent type & authority: Application (China)
IPC(8): G06F17/27; G06F17/22
Inventors: 李军 (Li Jun), 钮焱 (Niu Yan), 刘宇强 (Liu Yuqiang), 李星 (Li Xing), 童坤 (Tong Kun)
Owner: HUBEI UNIV OF TECH