Method for calculating short text semantic similarity

A technique for computing the semantic similarity of short texts. It addresses problems such as sparse word segmentation information, ambiguous words, and chaotic word order.

Status: Inactive
Publication date: 2017-06-13
广州索答信息科技有限公司
Cites: 0; Cited by: 16

AI Technical Summary

Problems solved by technology

[0004] In view of this, the present invention provides a method for calculating the semantic similarity of short texts, to solve the prior-art problems of sparse word segmentation information, ambiguous words, and chaotic word order.


Examples

Embodiment Construction

[0017] Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numerals in different drawings refer to the same or similar elements unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the present invention. Rather, they are merely examples of apparatuses and methods consistent with aspects of the invention as recited in the appended claims.

[0018] The terminology used in the present invention is for the purpose of describing particular embodiments only and is not intended to limit the invention. As used herein and in the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly dictates otherwise. It should also be understood that the term "and/or" as use...


Abstract

The invention provides a method for calculating the semantic similarity of short texts. The method comprises: segmenting the short texts to be compared into words; expanding the segmented words using a continuous bag-of-words model; disambiguating the expanded words through machine translation; calculating the importance of the disambiguated words and weighting them by word order; and calculating the semantic distance between the order-weighted words, then calculating text similarity from that semantic distance. With this method, the similarity of short texts can be calculated quickly and accurately.
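The five steps in the abstract can be sketched as a toy pipeline. This is only an illustration under stated assumptions, not the patented implementation: the vectors in `TOY_VECTORS`, the expansion threshold, the position-decay weights, and all function names are hypothetical, and the machine-translation disambiguation step is left as a pass-through placeholder because the abstract does not describe how it works.

```python
import math

# Hypothetical 3-d vectors standing in for embeddings from a trained
# continuous bag-of-words model (assumption, not from the patent).
TOY_VECTORS = {
    "cheap": (0.9, 0.1, 0.0),
    "inexpensive": (0.85, 0.15, 0.0),
    "phone": (0.1, 0.9, 0.1),
    "mobile": (0.15, 0.85, 0.1),
    "buy": (0.0, 0.2, 0.9),
    "purchase": (0.05, 0.2, 0.88),
}


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def segment(text):
    # Step 1: whitespace split as a stand-in for a real word segmenter.
    return text.lower().split()


def expand(words, threshold=0.95):
    # Step 2: append vocabulary words whose vectors lie close to an input word.
    expanded = list(words)
    for word in words:
        if word not in TOY_VECTORS:
            continue
        for candidate, vector in TOY_VECTORS.items():
            close = cosine(TOY_VECTORS[word], vector) >= threshold
            if candidate not in expanded and close:
                expanded.append(candidate)
    return expanded


def disambiguate(words):
    # Step 3: placeholder only. The abstract names machine translation here
    # but gives no detail, so the words are returned unchanged.
    return words


def weights(words):
    # Step 4: assumed weighting, uniform importance times 1 / (1 + position).
    return [1.0 / (1 + i) for i in range(len(words))]


def directed_distance(src, dst):
    # Step 5a: weighted mean of each source word's smallest cosine distance
    # to any destination word; words without a vector are skipped.
    src = [w for w in src if w in TOY_VECTORS]
    dst = [w for w in dst if w in TOY_VECTORS]
    if not src or not dst:
        return 1.0
    ws = weights(src)
    total = 0.0
    for weight, word in zip(ws, src):
        best = min(1.0 - cosine(TOY_VECTORS[word], TOY_VECTORS[o]) for o in dst)
        total += weight * best
    return total / sum(ws)


def similarity(text_a, text_b):
    # Step 5b: similarity is one minus the symmetrised semantic distance.
    a = disambiguate(expand(segment(text_a)))
    b = disambiguate(expand(segment(text_b)))
    return 1.0 - (directed_distance(a, b) + directed_distance(b, a)) / 2


print(similarity("buy cheap phone", "purchase inexpensive mobile"))
print(similarity("cheap", "buy"))
```

In this sketch the two paraphrases share no surface words, yet expansion pulls each text's near neighbours into the other's word list, so the first score comes out close to 1 while the unrelated pair scores low.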

Description

Technical field

[0001] The invention relates to the field of electrical data processing, in particular to a method for calculating the semantic similarity of short texts.

Background technique

[0002] Text similarity calculation mainly studies and computes the similarity between multiple texts, and it is widely applied in fields such as question answering systems and copyright detection. Common machine learning algorithms such as classification and clustering also involve comparing the similarity between texts. There are many methods for calculating text similarity. The traditional method is based on the vector space model, and there are also improved similarity calculation methods based on semantics.

[0003] No matter which calculation method is used, the following problems exist for short texts: (1) The content of short texts is usually relatively short, and after word segmentation, there is less information that can be used for similarity cal...
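Problem (1) can be illustrated with the traditional vector space model mentioned in paragraph [0002]. The following is a minimal sketch, assuming whitespace tokenisation and raw term counts; the function name `bow_cosine` is hypothetical and not taken from the patent.

```python
import math
from collections import Counter


def bow_cosine(text_a, text_b):
    # Represent each text as a bag-of-words count vector.
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    # Only terms present in both texts contribute to the dot product.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


# Two short paraphrases with no shared surface words.
print(bow_cosine("buy cheap phone", "purchase inexpensive mobile"))
```

Because the two short texts have no terms in common, the count-based cosine is zero even though they mean nearly the same thing, which is the kind of information sparsity the background section describes.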


Application Information

IPC(8): G06F17/27
CPC: G06F40/284, G06F40/30
Inventors: 石忠民, 徐叶强, 林嘉亮, 唐海涛
Owner: 广州索答信息科技有限公司