Short text similarity computing method based on searched result quantity

A short text similarity calculation technology, applied in computing, electrical digital data processing, special data processing applications, etc., that addresses problems of short texts such as irregular terms and insufficient features.

Inactive Publication Date: 2012-07-11
WUHAN UNIV OF TECH
Cites: 2; Cited by: 11

AI Technical Summary

Problems solved by technology

[0005] The purpose of the present invention is to provide a short text similarity calculation method based on the number of retrieval results. The method overcomes the insufficient sample features and irregular terms of short texts, and improves the accuracy of similarity calculation through semantic analysis.

Examples

Detailed Description of Embodiments

[0021] Embodiments of the present invention are now described with reference to the accompanying drawings. As shown in Figure 1, this embodiment takes two short texts, S1 and S2, as an example to illustrate the short text similarity calculation method based on the number of retrieval results. The method includes the following steps:

[0022] Step S1: preprocess the short texts, where a short text has a length of at most 200 characters. The specific steps are:

[0023] Step S1-1: filter the short text with a list of common stop words. The common stop words are modal particles, adverbs, prepositions, and conjunctions.

[0024] Step S1-2: strip the inflectional endings from each word of the short text to extract the word stems, and calculate the frequency of each stem.
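Steps S1-1 and S1-2 can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the stop word list `STOP_WORDS`, the suffix table `SUFFIXES`, and the function names `crude_stem` and `preprocess` are all assumptions, and the suffix stripper is a crude stand-in for a real stemmer.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop word list; the patent does not publish its list.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "at", "to",
              "and", "or", "but", "very", "is", "are"}

# Assumed suffix table: a crude stand-in for a real stemming algorithm.
SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text, max_len=200):
    """Step S1: drop stop words, stem the rest, and count stem frequencies."""
    if len(text) > max_len:
        raise ValueError("text exceeds the 200-character short text limit")
    tokens = re.findall(r"[a-z]+", text.lower())
    kept = [t for t in tokens if t not in STOP_WORDS]   # step S1-1
    return Counter(crude_stem(t) for t in kept)         # step S1-2
```

For example, `preprocess("The cats are chasing the cat")` counts the stem `cat` twice and `chas` once. A production system would likely replace `crude_stem` with an established stemmer and use a fuller stop word list.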

[0025] In step S2, each single short text and each pairwise combination of short texts are submitted as search queries to a large-scale corpus, and the corpus u...
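The querying in step S2 can be illustrated with a small in-memory stand-in for the large-scale corpus. Everything here is an assumption made for the sketch: `TOY_CORPUS`, `hit_count`, and `collect_counts` are hypothetical names, and a pairwise query is interpreted as requiring the terms of both short texts to appear in the same document.

```python
from itertools import combinations

# Hypothetical toy corpus standing in for the large-scale corpus.
TOY_CORPUS = [
    "apple releases new iphone model",
    "iphone sales lift apple revenue",
    "banana prices rise in europe",
    "apple and banana smoothie recipe",
]


def hit_count(terms, corpus):
    """Number of documents that contain every term in `terms`."""
    wanted = set(terms)
    return sum(1 for doc in corpus if wanted <= set(doc.split()))


def collect_counts(short_texts, corpus):
    """Query each short text alone and each pair of short texts together.

    `short_texts` maps a label to its list of query terms.
    Returns (single_counts, pair_counts).
    """
    single = {name: hit_count(terms, corpus)
              for name, terms in short_texts.items()}
    pair = {
        (a, b): hit_count(short_texts[a] + short_texts[b], corpus)
        for a, b in combinations(short_texts, 2)
    }
    return single, pair
```

With `{"S1": ["apple"], "S2": ["iphone"]}` as input, the single counts are 3 and 2 and the pair count is 2, because two documents mention both terms. Against a real search service the counts would come from its reported number of results instead of a scan over a list.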

Abstract

The invention discloses a short text similarity calculation method based on the number of search results, comprising three steps: (1) preprocessing the short texts; (2) submitting each single short text and each pairwise combination of short texts as search queries to a large-scale corpus; and (3) calculating the similarity between each pair of short texts from the number of search results returned. The method does not depend on traditional text processing and obtains results quickly and effectively. When a short text is used as a query, the large-scale corpus returns results that contain the short text, and the content of those results includes textual interpretations of it. The number of search results can therefore be regarded as a compressed representation that implies the semantic interpretation of the short text in the corpus.
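Step 3 can be illustrated by turning three result counts into a score. The excerpt available here does not state the patent's similarity formula, so the sketch below substitutes a Jaccard-style ratio over result counts as a labeled assumption; `hit_similarity` and its parameter names are hypothetical.

```python
def hit_similarity(count_a, count_b, count_ab):
    """Jaccard-style similarity from search result counts.

    count_a, count_b: results for each short text queried alone.
    count_ab: results for the two short texts queried together.
    This ratio is an assumed stand-in, not the formula from the patent.
    """
    if min(count_a, count_b, count_ab) < 0:
        raise ValueError("result counts must be non-negative")
    if count_ab > min(count_a, count_b):
        # Under a conjunctive query the joint count cannot exceed either single count.
        raise ValueError("joint count exceeds a single count")
    union = count_a + count_b - count_ab
    if union == 0:
        return 0.0  # neither text matched anything; treat as dissimilar
    return count_ab / union
```

Under this assumed ratio, counts of 3, 2, and 1 give a similarity of 0.25, and two texts that never co-occur score 0.0. Result counts reported by real search engines are often estimates, so a practical version might clamp inconsistent counts instead of raising an error.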

Description

Technical field

[0001] The present invention relates to short text similarity calculation, specifically to a short text similarity calculation method based on the number of retrieval results, and belongs to the field of text mining.

Background

[0002] Short text refers to text of brief length. The category is broad, and communication platforms use short text increasingly often, for example mobile phone short messages, instant messages, BBS titles, Weibo posts, online chat records, blogs, and news comments. The volume of short text data is growing daily, and short text mining has broad application prospects in topic tracking and discovery, buzzword analysis, public opinion early warning, and image retrieval.

[0003] However, because short texts are brief, their sample features are very sparse, which hinders retrieval and analysis. In addition, short texts are...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
Inventor: 李琳, 钟珞, 袁景凌, 夏红霞, 刘东飞
Owner: WUHAN UNIV OF TECH