
A short text similarity calculation method based on multi-feature fusion

A similarity calculation and multi-feature fusion technology, applied in computing, computer components, instruments and related fields. It addresses two problems: the lack of deep integration among features and the amplification of noise features.

Status: Inactive. Publication date: 2020-04-21
WUHAN UNIV OF TECH
Cites: 2; cited by: 0

AI Technical Summary

Problems solved by technology

In addition, when combining word-frequency features, most current research merges them into a feature pool or a two-dimensional feature space, which lacks deep integration. For semantic feature extraction, current research usually applies BTM (biterm topic model) directly to the original short-text collection; extracting features directly from the rich word information of the raw short texts may amplify the adverse effects of noise features.
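To make the noise argument concrete: in the standard BTM formulation, each short text is turned into biterms, the unordered pairs of words that co-occur in it. The minimal sketch below (plain Python; the function names and toy tokens are illustrative, not taken from the patent) shows that adding one noise word to a text of n words adds n - 1 new biterms, which is one plausible way noise gets amplified when BTM is run on the raw collection.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(tokens):
    """Return every unordered word pair (biterm) that co-occurs in one short text.

    Each pair is sorted so that (a, b) and (b, a) are treated as the same biterm.
    """
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def count_biterms(corpus):
    """Count biterms over a collection of tokenized short texts."""
    counts = Counter()
    for tokens in corpus:
        counts.update(extract_biterms(tokens))
    return counts


if __name__ == "__main__":
    clean = ["short", "text", "similarity"]
    noisy = clean + ["lol"]  # hypothetical noise token
    print(len(extract_biterms(clean)))  # 3 biterms
    print(len(extract_biterms(noisy)))  # 6 biterms: the noise word adds 3
```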



Examples

Experimental scheme
Comparison scheme
Effect test

Detailed Description of Embodiments

[0072] To make the objectives, technical solutions and advantages of the present invention clearer, the invention is described in further detail below with reference to the examples. It should be understood that the specific embodiments described here serve only to explain the present invention, not to limit it.

[0073] As shown in Figure 1, Figure 1 is the model structure diagram of HSBM (HTI-Skip_gram-BTM fusion Model), whose elements are described as follows:

[0074] In process (I), a rounded rectangle (such as "HTI") represents a feature extraction method or model, a hexagon represents a short text collection, and a circle represents a weight matrix: W is the HTI weight matrix obtained by the HTI method, X is the set of feature word vectors obtained by training the Skip_gram model, and W' is the short text-feature word fusion weight matrix obtained by normalizing the HTI wei...
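The passage above is cut off before it says how W is normalized or how W' is combined with X. As a minimal sketch under two labeled assumptions, (1) each row of W (one short text over the feature words) is scaled to sum to 1, and (2) a text's fused vector is the product W'X, i.e. a weighted average of its feature-word vectors, the step could look like this in plain Python. The function names are hypothetical and this is not the patent's code.

```python
def normalize_rows(weights):
    """ASSUMPTION: row-wise sum-to-one normalization of a (texts x feature words)
    weight matrix. Rows whose weights are all zero are left as zeros."""
    normalized = []
    for row in weights:
        total = sum(row)
        normalized.append([w / total if total else 0.0 for w in row])
    return normalized


def fuse(weights_norm, word_vectors):
    """ASSUMPTION: fused text vectors = W' (texts x words) times X (words x dim).

    Each output row is the weighted sum of the feature-word vectors.
    """
    dim = len(word_vectors[0])
    fused = []
    for row in weights_norm:
        if len(row) != len(word_vectors):
            raise ValueError("each row needs one weight per feature word")
        vec = [0.0] * dim
        for w, word_vec in zip(row, word_vectors):
            for k in range(dim):
                vec[k] += w * word_vec[k]
        fused.append(vec)
    return fused


if __name__ == "__main__":
    W = [[1, 3], [0, 0]]            # toy HTI-style weights, 2 texts x 2 words
    X = [[1.0, 0.0], [0.0, 1.0]]    # toy 2-dimensional word vectors
    print(fuse(normalize_rows(W), X))  # [[0.25, 0.75], [0.0, 0.0]]
```

Normalizing first keeps long texts with many weighted words from producing larger fused vectors than short ones, which is one common reason for this kind of step; whether the patent normalizes for that reason is not stated in the visible text.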



Abstract

The present invention discloses a short text similarity calculation method based on multi-feature fusion, comprising the following steps: first, an HTI method is designed to extract the word-frequency and grammatical features of short texts; then, an HSBM model is designed to organically fuse the word-frequency and grammatical features in the semantic dimension; finally, an MFSM model is designed to vertically fuse these features and calculate the similarity between short texts. Because the invention extracts short-text features from multiple dimensions, it can effectively improve the accuracy of short text similarity calculation.
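The abstract does not say how MFSM fuses the features. The sketch below assumes, purely for illustration, that each feature view yields one vector per short text, that similarity within a view is cosine similarity, and that the views are combined by a weighted average with hypothetical weights. It is not the patent's MFSM model.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def fused_similarity(feature_pairs, weights):
    """Hypothetical fusion: weighted average of per-view cosine similarities.

    feature_pairs: one (vector_of_text_a, vector_of_text_b) pair per feature view.
    weights: one non-negative weight per view (illustrative, not from the patent).
    """
    if len(feature_pairs) != len(weights):
        raise ValueError("need exactly one weight per feature view")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    score = sum(w * cosine(a, b) for (a, b), w in zip(feature_pairs, weights))
    return score / total


if __name__ == "__main__":
    views = [([1.0, 0.0], [1.0, 0.0]), ([1.0, 0.0], [0.0, 1.0])]
    print(fused_similarity(views, [1.0, 1.0]))  # 0.5
```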

Description

Technical field

[0001] The invention relates to natural language processing technology, in particular to a short text similarity calculation method based on multi-feature fusion.

Background art

[0002] The vector space model (VSM) converts the feature terms in a short text into a numerical form that a computer can process, and to some extent reflects the importance of each feature term in the short text.

[0003] Word-frequency-based feature extraction selects, from the original term set and according to a specific feature evaluation function, the set of feature terms that best reflects the characteristics of the short texts. Term frequency-inverse document frequency (TF-IDF) and mutual information (MI) are two commonly used word-frequency feature extraction methods. The concept of information entropy (IE) comes from statistical thermodynamics and is used to measure the degree of disorder of a system. It is not directly used for ...
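For reference, the background terms above can be illustrated with their textbook forms. This sketch builds TF-IDF vectors over a shared vocabulary (one common VSM weighting; it assumes tf = count / text length and idf = log(N / df), and other variants exist) and computes the Shannon entropy of a term's distribution across texts. How the patent actually uses IE is truncated in the source, so the entropy function is only an assumption about the general idea.

```python
import math
from collections import Counter


def tfidf_vectors(corpus):
    """Build TF-IDF vectors in a shared, sorted vocabulary order (a basic VSM).

    corpus: list of tokenized short texts.
    Uses tf = count / len(text) and idf = log(N / df); other variants exist.
    """
    n_docs = len(corpus)
    vocab = sorted({w for doc in corpus for w in doc})
    df = Counter(w for doc in corpus for w in set(doc))  # texts containing each word
    vectors = []
    for doc in corpus:
        counts = Counter(doc)
        length = len(doc) or 1
        vectors.append(
            [(counts[w] / length) * math.log(n_docs / df[w]) for w in vocab]
        )
    return vocab, vectors


def term_entropy(term, corpus):
    """Shannon entropy (bits) of how a term's occurrences are spread over texts."""
    per_doc = [doc.count(term) for doc in corpus]
    total = sum(per_doc)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in per_doc if c)


if __name__ == "__main__":
    texts = [["a", "b"], ["a", "c"]]
    print(tfidf_vectors(texts))
    print(term_entropy("a", texts))  # 1.0: "a" is spread evenly over both texts
```

A term spread evenly across many texts gets high entropy, which is often read as a sign that it discriminates poorly between texts.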


Application Information

Patent type & authority: Patents (China)
IPC (8): G06K9/62; G06F40/216
CPC: G06F40/216; G06F18/22; G06F18/253
Inventors: 高曙, 周润, 王讷, 龚磊
Owner: WUHAN UNIV OF TECH