Question similarity calculation method based on a plurality of features

A similarity calculation technology, applied in the fields of calculation, special data processing applications, natural language data processing, etc. It addresses several problems: the accuracy of similarity calculation depends on the completeness and correctness of the semantic dictionary, the resulting similarity accuracy is low, and existing methods do not consider that the meaning expressed by a text is composed of multi-faceted, multi-level information.

Active Publication Date: 2019-02-15
JINAN UNIVERSITY

AI Technical Summary

Problems solved by technology

However, this method relies too heavily on the semantic dictionary, so the accuracy of the similarity calculation is limited by the completeness and correctness of that dictionary. Similarly, semantics-based similarity calculation methods are less effective on long sentences with complex syntax.

[0010] At the same time, most methods in the prior art extract text representation features from a single type of information and focus on a single type of feature. They do not consider that the meaning expressed by a text is composed of multi-faceted, multi-level information, so the accuracy of the calculated similarity is not high.


Examples

Embodiment Construction

[0068] The present invention will be further described in detail below in conjunction with the embodiments and the accompanying drawings, but the embodiments of the present invention are not limited thereto.

[0069] The method for calculating question similarity based on multiple features in this embodiment uses five features to measure the similarity between two question sentences: character features, word semantic features, sentence semantic features, sentence implicit topic features, and answer semantic features. The similarities based on these five features are integrated into the final similarity between the new question and the historical question. With reference to Figure 1, each step of the method is described in detail below using an example.
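The integration of the five feature similarities can be sketched as follows. This is a minimal illustration, assuming the final score is the sum of each feature similarity multiplied by its weight, and that the weights are fitted by least-squares linear regression (here via plain gradient descent, without an intercept). The function names, the feature order, the hyperparameters, and the toy data are hypothetical and are not taken from the patent.

```python
# Assumed feature order (labels for illustration only):
# character, word semantic, sentence semantic, implicit topic, answer semantic

def final_similarity(sims, weights):
    """Weighted sum of the per-feature similarities."""
    if len(sims) != len(weights):
        raise ValueError("sims and weights must have the same length")
    return sum(s * w for s, w in zip(sims, weights))

def fit_weights(rows, targets, lr=0.1, epochs=5000):
    """Fit weights by minimizing mean squared error with batch gradient descent."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    for _ in range(epochs):
        grads = [0.0] * n_features
        for row, target in zip(rows, targets):
            err = final_similarity(row, weights) - target
            for j, x in enumerate(row):
                grads[j] += err * x
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grads)]
    return weights

if __name__ == "__main__":
    # Invented toy data: five similarities per question pair, plus a
    # relevance label (1.0 = similar questions, 0.0 = not similar).
    train_rows = [
        [0.9, 0.8, 0.9, 0.7, 0.8],
        [0.2, 0.3, 0.1, 0.2, 0.1],
        [0.7, 0.9, 0.8, 0.8, 0.6],
        [0.1, 0.2, 0.2, 0.3, 0.2],
        [0.6, 0.7, 0.7, 0.5, 0.9],
        [0.3, 0.1, 0.2, 0.1, 0.3],
    ]
    train_labels = [1.0, 0.0, 1.0, 0.0, 1.0, 0.0]
    weights = fit_weights(train_rows, train_labels)
    print(weights)
    print(final_similarity([0.6, 0.7, 0.8, 0.5, 0.4], weights))
```

In practice a library regression routine would normally replace the hand-written loop; the loop is used here only to keep the sketch dependency-free.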

[0070] (1) Enter a new question sentence Q_new: Where I can buy good oil for massage?

[0071] (2) Read a historical question sentence Q_rel: is there any place i can find scented massage oils in qatar?

[0072] S...


Abstract

The invention discloses a question similarity calculation method based on a plurality of features. The method includes the following steps: an input new question sentence is compared with the stored historical questions and their corresponding answers, and the similarities between the new question and each historical question are calculated based on character features, word semantic features, sentence semantic features, sentence implicit topic features, and answer semantic features. The final similarity is obtained by multiplying these five similarities by their corresponding weights, which are trained by linear regression. By adopting a plurality of features, the invention increases the diversity of sample attributes and improves the generalization ability of the model. At the same time, the soft cosine distance is used to fuse TF-IDF with edit distance, word semantics, and other information, which overcomes the semantic gap between words and improves the accuracy of the similarity calculation.
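The idea of a soft cosine measure that folds word-level similarity into a bag-of-words comparison can be sketched as follows. This is an illustration under stated assumptions, not the patent's formula: the word-to-word similarity is taken to be a normalized Levenshtein edit distance, raw term counts stand in for TF-IDF weights, and all function names are hypothetical. A word-embedding similarity could be swapped into `word_sim` in the same way.

```python
import math
import re
from collections import Counter

def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def word_sim(u, v):
    """Assumed word similarity: 1 minus the normalized edit distance."""
    if u == v:
        return 1.0
    return 1.0 - levenshtein(u, v) / max(len(u), len(v))

def soft_dot(x, y):
    """Sum over all word pairs of weight_x * weight_y * word similarity."""
    return sum(wx * wy * word_sim(u, v)
               for u, wx in x.items() for v, wy in y.items())

def soft_cosine(x, y):
    """Soft cosine between two {word: weight} mappings."""
    denom = math.sqrt(soft_dot(x, x)) * math.sqrt(soft_dot(y, y))
    return soft_dot(x, y) / denom if denom else 0.0

def bag(text):
    """Lowercased term counts (a stand-in for TF-IDF weights)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

if __name__ == "__main__":
    q_new = bag("Where I can buy good oil for massage?")
    q_rel = bag("is there any place i can find scented massage oils in qatar?")
    print(soft_cosine(q_new, q_rel))
```

With an ordinary cosine, "oil" and "oils" would contribute nothing to each other because they are different vector dimensions; here they contribute a similarity of 0.75, which is the kind of gap between related words that the soft cosine is meant to close.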

Description

Technical field

[0001] The invention relates to the research field of computer natural language processing and automatic question answering systems, in particular to a method for calculating question similarity based on multiple features.

Background technique

[0002] With the rapid increase of digital information, it is more difficult for people to obtain the information resources they need from the Internet. How to accurately and quickly find the required information for users in the massive digital information has brought severe challenges to natural language processing (NLP) technology and information retrieval technology. Therefore, in order to provide users with real-time and high-precision information acquisition channels, research institutions and related technology companies have begun to study automatic question answering (QA) systems. In the automatic question answering system, users only need to enter a question to get the corresponding answer directly. It is no ...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/332, G06F17/27
CPC: G06F40/211, G06F40/30
Inventor: 刘波, 彭永幸
Owner: JINAN UNIVERSITY