Short-text similarity degree calculation method based on multi-feature fusion

A similarity calculation and multi-feature fusion technology, applied in computing, computer components, special data processing applications, and similar fields, that addresses the amplification of noise features and the lack of deep feature integration.

Inactive Publication Date: 2017-10-20
WUHAN UNIV OF TECH
2 Cites, 16 Cited by

AI Technical Summary

Problems solved by technology

For word-frequency feature combination, most current research combines features in a feature pool or a two-dimensional feature space, which lacks deep integration. For semantic-dimension feature extraction, current research usually applies BTM directly to the original short-text set. Doing so uses the rich word information of the original short texts to extract features, but it may also amplify the adverse effects of noise features.
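To illustrate why running BTM on raw short texts can amplify noise, the following minimal sketch extracts biterms (unordered word pairs that co-occur in one short text), which are the units BTM models. The function names and the toy corpus are hypothetical and are not taken from the patent; the sketch only shows that a single noisy token ends up in many biterms.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(tokens):
    """Return all unordered word pairs (biterms) from one tokenized short text."""
    # Sort each pair so ("a", "b") and ("b", "a") count as the same biterm.
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def biterm_counts(corpus):
    """Count biterms over a corpus given as a list of token lists."""
    counts = Counter()
    for tokens in corpus:
        counts.update(extract_biterms(tokens))
    return counts


# Toy corpus (hypothetical); "lol" plays the role of a noise word.
corpus = [["apple", "phone", "lol"], ["apple", "fruit", "lol"]]
counts = biterm_counts(corpus)

# Share of all biterm occurrences that contain the noise word.
noisy = sum(c for pair, c in counts.items() if "lol" in pair)
noise_share = noisy / sum(counts.values())
```

In this toy corpus the noise word appears in four of the six biterm occurrences, which is the kind of effect that filtering or reweighting features before BTM is meant to reduce.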

Examples

Embodiment Construction

[0072] To make the object, technical solution, and advantages of the present invention clearer, the present invention is described in further detail below in conjunction with the examples. It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit it.

[0073] Figure 1 shows the model structure diagram of HSBM (HTI-Skip_gram-BTM fusion Model), where the parameters are described as follows:

[0074] In process (I), a rounded rectangle (such as "HTI") represents a feature extraction method or model, a hexagon represents a short-text collection, and a circle represents a weight matrix: W is the HTI weight matrix obtained by the HTI method; X is the feature word vector set obtained by training the Skip_gram model; W' is the short text-feature word fusion weight matrix obtained by normalizing the HTI wei...
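The paragraph above is truncated, so the exact normalization and fusion rules are not available here. As a rough sketch only, the code below assumes L1 row normalization of W to obtain W' and assumes that W' is combined with the word vectors X by a weighted sum. Both choices, the helper names, and the toy values are assumptions for illustration, not the patent's stated procedure.

```python
def row_normalize(W):
    """Scale each row of W to sum to 1 (assumed L1 normalization).

    Rows that sum to zero are returned unchanged.
    """
    normalized = []
    for row in W:
        total = sum(row)
        normalized.append([w / total for w in row] if total > 0 else list(row))
    return normalized


def fuse(W_norm, X):
    """Weighted sum of word vectors: result[i][k] = sum_j W_norm[i][j] * X[j][k]."""
    dim = len(X[0])
    return [
        [sum(w * X[j][k] for j, w in enumerate(row)) for k in range(dim)]
        for row in W_norm
    ]


# Toy values (hypothetical): 2 short texts, 3 feature words, 2-dimensional word vectors.
W = [[2.0, 2.0, 0.0], [0.0, 1.0, 3.0]]    # text-by-word weights
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # one vector per feature word

W_prime = row_normalize(W)
text_vectors = fuse(W_prime, X)
```

Under these assumptions each short text is represented by one vector whose direction is dominated by its most heavily weighted feature words.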

Abstract

The invention discloses a short-text similarity calculation method based on multi-feature fusion. The method comprises the following steps: designing an HTI (Hybrid TF-IDF) method to extract word frequency features of short texts; using an existing word2vec Skip_gram training model to extract grammar features of the short texts; designing an HSBM (HTI-Skip_gram-BTM fusion Model) to organically fuse the word frequency and grammar features on the semantic dimension; and designing an MFSM (Multi-Feature based Similarity-calculation Model) to vectorize the fusion result and calculate the similarity between short texts. The method extracts the features of short texts from multiple dimensions and can thus effectively improve the precision of short-text similarity calculation.
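The abstract does not state which similarity measure MFSM applies to the vectorized fusion result. As a hedged illustration, the sketch below uses cosine similarity, a common choice for comparing text vectors; the function name and the sample vectors are hypothetical, and cosine similarity is an assumption here rather than the patent's confirmed measure.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical fused vectors for two short texts.
text_a = [0.5, 0.5]
text_b = [0.75, 1.0]
similarity = cosine_similarity(text_a, text_b)
```

Cosine similarity ignores vector length, so two short texts with proportionally similar feature weights score close to 1 even when one text is longer than the other.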

Description

Technical field

[0001] The invention relates to natural language processing technology, in particular to a short text similarity calculation method based on multi-feature fusion.

Background technique

[0002] The vector space model (VSM) converts the feature terms in the short text into a digital form that can be recognized by the computer, and reflects the importance of the feature terms in the short text to a certain extent.

[0003] Feature extraction based on word frequency refers to the process of selecting a set of feature terms that can best reflect the characteristics of short texts according to a specific feature evaluation function in the original set of terms. Term frequency-inverse document frequency (TF-IDF) and mutual information (MI) are two commonly used word frequency feature extraction methods. The concept of information entropy (IE) comes from statistical thermodynamics and is used to measure the degree of chaos of the system. It is not directly used for ...
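As a concrete reference for the TF-IDF weighting mentioned in the background, here is a minimal sketch of the plain, unsmoothed variant: term frequency within a text multiplied by the logarithm of the inverse document frequency. It is not the patent's HTI (Hybrid TF-IDF) method, whose details are not given in this excerpt; the function name and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def tf_idf(corpus):
    """Return one {term: weight} dict per text, using tf * log(N / df)."""
    n_docs = len(corpus)
    # Document frequency: number of texts in which each term appears.
    df = Counter()
    for tokens in corpus:
        df.update(set(tokens))
    weights = []
    for tokens in corpus:
        tf = Counter(tokens)
        weights.append(
            {
                term: (count / len(tokens)) * math.log(n_docs / df[term])
                for term, count in tf.items()
            }
        )
    return weights


# Toy corpus (hypothetical) of tokenized short texts.
corpus = [["cheap", "flight", "deal"], ["cheap", "hotel"]]
weights = tf_idf(corpus)
```

A term that occurs in every text, such as "cheap" in this corpus, receives weight zero, while terms unique to one text receive the largest weights. This is how the weighting reflects the importance of a feature term within a short text.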

Application Information

IPC(8): G06K9/62, G06F17/27
CPC: G06F40/216, G06F18/22, G06F18/253
Inventor: 高曙, 周润, 王讷, 龚磊
Owner WUHAN UNIV OF TECH