Question similarity calculation method based on question topics and focuses

A question similarity calculation technique, applied in computing and electronic digital data processing. It addresses shortcomings of existing approaches: word similarity is difficult to calculate reliably, statistical methods do not reflect the semantic similarity of text well, and a semantic dictionary cannot contain all words.

Status: Inactive
Publication date: 2015-09-09
ZHEJIANG UNIV
3 cites, 17 cited by

AI Technical Summary

Problems solved by technology

Methods that rely on a semantic dictionary have two obvious disadvantages: the dictionary cannot contain all words, and some words have multiple meanings, which makes it difficult to choose which meaning to use when calculating word similarity.
Other methods are based on the statistical properties of the text and therefore do not reflect its semantic similarity very well.



Examples


Detailed Description of the Embodiments

[0068] As shown in Figure 1, the method of the invention comprises the following steps:

[0069] 1) Preprocess the frequently asked question (FAQ) set data: use natural language processing tools to segment the question set data into words, remove invalid words, and record the category to which each question belongs;

[0070] The natural language processing tools in step 1) are tools such as FudanNLP, the Harbin Institute of Technology Language Technology Platform (LTP), and jieba ("stammering") word segmentation. These tools are used to segment the FAQ data, remove invalid words, construct a word vector space, and record the category to which each question belongs.
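The preprocessing in step 1) can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the `preprocess` and `build_vocabulary` helpers, the tiny `STOP_WORDS` set, and the English sample questions are all assumptions made for the example. The whitespace tokenizer stands in for a real segmenter; for Chinese text, a function such as jieba's `lcut` could be passed as `tokenize` instead.

```python
# Hypothetical list of "invalid" words; a real system would load a full list.
STOP_WORDS = {"the", "a", "is", "how", "do", "i", "to", "my"}


def preprocess(faq, tokenize=str.split, stop_words=STOP_WORDS):
    """Segment each (question, category) pair and drop invalid words.

    `tokenize` is any callable mapping a string to a list of tokens.
    """
    records = []
    for question, category in faq:
        tokens = [t.lower() for t in tokenize(question)]
        tokens = [t for t in tokens if t not in stop_words]
        # Keep the category alongside the tokens, as step 1) requires.
        records.append({"tokens": tokens, "category": category})
    return records


def build_vocabulary(records):
    """Map every remaining word to an index in the word vector space."""
    words = sorted({t for r in records for t in r["tokens"]})
    return {word: index for index, word in enumerate(words)}


# Made-up sample data, for illustration only.
faq = [
    ("How do I reset my password", "account"),
    ("How to change the email address", "account"),
    ("Is shipping free", "orders"),
]
records = preprocess(faq)
vocab = build_vocabulary(records)
```

Keeping the category in each record matters because later steps, such as a category-based specificity score, need to know which categories a word appears in.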

[0071] 2) Divide each question into topic and focus structures:

[0072] As shown in Figure 2, a word space is constructed from the word segmentation results, the specificity score of each word is calculated, and the words contained in each question are reordered according to their specificity scores to form the topic...
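The excerpt above is truncated before it defines the specificity score, so the following is only a sketch under stated assumptions. It assumes that specificity is the inverse entropy of a word's distribution over question categories (a definition used in related work on question topic and focus), and that words are ordered from least to most specific. The function names, the `eps` smoothing constant, and the sample records are hypothetical.

```python
import math
from collections import Counter, defaultdict


def specificity_scores(records, eps=1e-3):
    """Assumed definition: 1 / (entropy over categories + eps).

    A word concentrated in one category has low entropy and therefore
    a high specificity score.
    """
    category_counts = defaultdict(Counter)
    for record in records:
        for token in set(record["tokens"]):
            category_counts[token][record["category"]] += 1

    scores = {}
    for word, by_category in category_counts.items():
        total = sum(by_category.values())
        entropy = -sum(
            (c / total) * math.log(c / total) for c in by_category.values()
        )
        scores[word] = 1.0 / (entropy + eps)
    return scores


def reorder_by_specificity(tokens, scores):
    """Order a question's words from least to most specific (assumed order)."""
    return sorted(tokens, key=lambda t: scores.get(t, 0.0))


# Made-up preprocessed questions with their categories.
records = [
    {"tokens": ["hotel", "paris", "price"], "category": "travel"},
    {"tokens": ["hotel", "rome", "price"], "category": "travel"},
    {"tokens": ["laptop", "price"], "category": "electronics"},
]
scores = specificity_scores(records)
chain = reorder_by_specificity(records[0]["tokens"], scores)
```

In this toy data, "price" appears in both categories, so it scores as less specific than "hotel" or "paris" and moves to the front of the chain.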


Abstract

The present invention discloses a question similarity calculation method based on the topics and focuses of questions. Basic preprocessing, such as word segmentation, is carried out on the question data using a tokenizer. On that basis, a tree cut model based on the minimum description length (MDL) divides each question into a question topic and a question focus. For the topic structures and focus structures of two questions, a language model and a translation-based language model are used respectively to calculate similarity scores, and a joint similarity is obtained by weighted summation. A topic similarity between the two questions is then calculated using a method based on the BTM topic model, and the two similarities are combined by weighted summation to obtain the final question similarity. By introducing the structural features and topic information of questions, in addition to word statistics, the method makes fuller use of the information in each question and improves the accuracy of question similarity calculation.
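The two weighted summations described in the abstract can be sketched as follows. This is an illustrative assumption about how the scores are combined, not the patent's formula: the parameter names `alpha` and `beta`, their default values, and the choice of convex weights (each pair summing to 1) are hypothetical. The three input scores are taken as already computed by the language model, the translation-based language model, and the BTM-based method.

```python
def weighted_sum(first, second, weight):
    """Convex combination: weight * first + (1 - weight) * second."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * first + (1.0 - weight) * second


def question_similarity(topic_lm_score, focus_translation_score,
                        btm_topic_score, alpha=0.5, beta=0.7):
    """Combine the three component scores into a final similarity.

    alpha balances the topic-structure score against the focus-structure
    score; beta balances that joint score against the BTM topic similarity.
    Both weights are hypothetical tuning parameters.
    """
    joint = weighted_sum(topic_lm_score, focus_translation_score, alpha)
    return weighted_sum(joint, btm_topic_score, beta)


# Made-up component scores, for illustration only.
final_score = question_similarity(0.8, 0.4, 0.5, alpha=0.5, beta=0.5)
```

In practice the weights would be tuned on held-out question pairs, and the component scores would need to be on comparable scales before they are summed.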

Description

Technical field

[0001] The invention relates to a method for calculating question similarity, in particular to a method for calculating question similarity based on question topics and focuses.

Background

[0002] With the rapid development of the Internet, the ways in which people obtain information and knowledge are becoming more and more diverse, and question answering systems based on Frequently Asked Questions (FAQ) are one effective way. Research on question similarity calculation is of great significance to FAQ-based question answering systems, and the accuracy of question similarity calculation plays an important role in their performance. How to improve the accuracy of question similarity calculation has therefore become a focus of current research.

[0003] At present, question similarity calculation is mainly divided into four kinds of methods: the method based on word statistic...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27
Inventors: 鲁伟明, 余瑶, 吴江琴, 庄越挺
Owner: ZHEJIANG UNIV