
Mixed multi-feature sentence similarity calculation method and system, and storage medium

A sentence similarity calculation technology, applied in computing, computer components, special-purpose data processing applications, and similar fields. It addresses problems such as the difficulty of reflecting word importance and the neglect of the contribution that non-keyword words make to sentence meaning.

Inactive Publication Date: 2020-01-17
CHONGQING UNIV OF POSTS & TELECOMM
2 Cites, 7 Cited by

AI Technical Summary

Problems solved by technology

For example, the TF-IDF method extracts keywords using the IDF algorithm, but the algorithm's simple structure makes it difficult to reflect the importance of words. Some works measure sentence similarity by extracting common keywords, but this approach misses the contribution of words other than keywords to sentence meaning.
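To make the TF-IDF limitation concrete, here is a minimal sketch of the standard TF-IDF weighting on a toy corpus. The function name `tf_idf` and the sample sentences are illustrative assumptions and do not come from the patent.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {word: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each word.
    df = Counter(word for doc in docs for word in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            w: (count / len(doc)) * math.log(n_docs / df[w])
            for w, count in tf.items()
        })
    return weights

# Hypothetical toy corpus of tokenized sentences.
corpus = [
    ["how", "do", "i", "reset", "my", "password"],
    ["how", "do", "i", "change", "my", "email"],
    ["reset", "the", "router"],
]
scores = tf_idf(corpus)
```

In this toy corpus, "password" outweighs "how" in the first sentence only because it occurs in fewer sentences. The weight depends on counts alone, so it says nothing about a word's role in the sentence, which is the gap the text describes.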


Examples


Embodiment Construction

[0052] The technical solutions in the embodiments of the present invention will be described clearly and in detail below with reference to the drawings in the embodiments of the present invention. The described embodiments are only some of the embodiments of the invention.

[0053] The technical solution by which the present invention solves the above technical problems is as follows:

[0054] As shown in Figures 1 and 2, the present invention is a mixed multi-feature sentence similarity calculation method, storage medium and system, comprising the following steps:

[0055] Step (1): obtain the test set and training set for sentence similarity calculation, and obtain the word vector corresponding to each word in the test set and training set through the word vector model, further including:

[0056] In this embodiment, a word vector tool (for example, Word2Vec or a similar tool) can be trained on a natural language corpus to obtain the vector corresponding ...


Abstract

The invention claims a mixed multi-feature sentence similarity calculation method and system, and a storage medium. The method comprises the following steps: obtaining a test set and a training set for sentence similarity calculation, and obtaining the word vector corresponding to each word through a word vector model; calculating, by computer, the sentence word-vector similarity using a weighted sum of the word vectors with non-informative noise removed, based on a smooth inverse frequency (SIF) algorithm; based on a word dependency triple structure, calculating the dependency-syntax similarity between the test sentence and each of the ten sentences with the highest screened similarity; and, based on the mixed sentence similarity calculated from the two obtained sentence vectors, adjusting an optimization coefficient beta by means of the P@N and MRR (mean reciprocal rank) parameter determination method to obtain the sentence with the maximum sentence similarity to the sentence in the training set. The method takes into account keywords, word vectors, syntactic structure and other characteristics of sentences, so that the deeper meaning of sentences is expressed more accurately and the similarity of sentence content is judged correctly.
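The pipeline in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented implementation. The linear mix `beta * vec_sim + (1 - beta) * dep_sim`, the Jaccard overlap used for dependency triples, the P@N definition, all function names, and the toy vectors and word probabilities are assumptions. The SIF step is shown as the weighted average only; the common-component (noise) removal is omitted for brevity.

```python
import math

def sif_vector(tokens, vectors, word_prob, a=1e-3):
    """Average word vectors with SIF weights a / (a + p(w)).
    The common-component removal of full SIF is NOT performed here."""
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    known = [t for t in tokens if t in vectors]
    for t in known:
        weight = a / (a + word_prob.get(t, 0.0))
        for i, x in enumerate(vectors[t]):
            total[i] += weight * x
    return [x / max(len(known), 1) for x in total]

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def triple_similarity(triples_a, triples_b):
    """Jaccard overlap of (head, relation, dependent) triples (assumed measure)."""
    a, b = set(triples_a), set(triples_b)
    return len(a & b) / len(a | b) if a | b else 0.0

def mixed_similarity(vec_sim, dep_sim, beta):
    """Assumed linear mix of the two scores, weighted by beta."""
    return beta * vec_sim + (1.0 - beta) * dep_sim

def mean_reciprocal_rank(ranks):
    """ranks: 1-based rank of the correct sentence per query (None if absent)."""
    return sum(1.0 / r for r in ranks if r) / len(ranks)

def precision_at_n(ranks, n):
    """Share of queries whose correct sentence is ranked within the top n."""
    return sum(1 for r in ranks if r and r <= n) / len(ranks)

# Hypothetical toy data: 2-d word vectors and unigram probabilities.
vectors = {"reset": [1.0, 0.0], "password": [0.0, 1.0],
           "change": [0.9, 0.1], "the": [0.5, 0.5]}
word_prob = {"reset": 0.001, "password": 0.001, "change": 0.001, "the": 0.05}

query = ["reset", "the", "password"]
candidate = ["change", "the", "password"]
vec_sim = cosine(sif_vector(query, vectors, word_prob),
                 sif_vector(candidate, vectors, word_prob))
dep_sim = triple_similarity(
    [("reset", "dobj", "password"), ("password", "det", "the")],
    [("change", "dobj", "password"), ("password", "det", "the")],
)
score = mixed_similarity(vec_sim, dep_sim, beta=0.6)
```

Following the abstract, the dependency comparison would run only on the ten candidates with the highest word-vector similarity. Beta would then be chosen by evaluating `mean_reciprocal_rank` and `precision_at_n` over a range of candidate values; the value 0.6 here is arbitrary.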

Description

Technical field

[0001] The invention belongs to the technical field of natural language processing, and in particular relates to a sentence similarity calculation method.

Background

[0002] Sentence similarity calculation relies on giving the computer a vocabulary rich in meaning and on building a sentence similarity model from the features shared between sentences, so that the computer can quickly match the most similar sentences in the system. It has a wide range of applications across natural language processing. For example, in an automatic question answering system, searching the frequently asked questions database and finding the answer in the knowledge base that corresponds to the user's question are both solved by calculating the similarity between the question sentence and the corresponding sentences in the knowledge base. In the information filtering technology, through the calcula...


Application Information

IPC(8): G06K9/62, G06F16/35, G06F40/211
CPC: G06F16/35, G06F18/22, G06F18/2411
Inventor: 刘继明, 谭云丹, 袁野, 万晓榆, 于敏敏
Owner: CHONGQING UNIV OF POSTS & TELECOMM