Computer-assisted computing method of semantic distance between short texts

A computer-aided semantic distance technique, applied in computing, special data processing applications, instruments, etc., that addresses problems such as existing methods considering only text structure and ignoring the meaning of words.

Active Publication Date: 2012-08-01
BEIJING UNIV OF TECH

AI Technical Summary

Problems solved by technology

Calculation methods based on unit semantics consider only the words of a text and ignore its organizational structure, whereas methods based on edit distance consider only the structure and ignore the meaning of words. In addition, large errors arise when the texts being compared differ in length.
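
To illustrate the limitation described above, the following is a minimal sketch of the classical word-level edit (Levenshtein) distance. The function name `edit_distance` and the English sample sentences are illustrative assumptions and do not come from the patent. Because a word pair is scored only as equal or unequal, replacing a word with a synonym costs exactly as much as replacing it with an antonym.

```python
def edit_distance(a, b):
    """Classical Levenshtein distance over two token sequences."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1  # words are either identical or not
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


# Illustrative sentences (assumed for this example only).
s1 = ["the", "film", "is", "great"]
s2 = ["the", "movie", "is", "great"]   # synonym swapped in
s3 = ["the", "film", "is", "awful"]    # meaning reversed

# Both comparisons yield the same distance, so meaning is not reflected.
same_score = edit_distance(s1, s2) == edit_distance(s1, s3)
```
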

Detailed Description of the Embodiments

[0062] Syntactic structure refers to the relationship between words in the text; unit semantics refers to the smallest semantic unit in the text, that is, the semantics of words.

[0063] The present invention comprises the following steps:

[0064] First, text preprocessing is performed to standardize the format of the text data. Online comments extracted directly from the Internet contain a large number of web page tags and many variant short-text forms; this noise has a great impact on the text distance calculation results. The present invention combines commonly used data preprocessing operations into a text preprocessing module. These operations include removing web page tags, processing variant short text, and word segmentation. By removing web page tags and processing variant short text, the online comment text is norma...
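
The three preprocessing operations named above could be chained roughly as in the sketch below. Everything specific in it is an assumption made for illustration: the regular expression for tags, the tiny `VARIANTS` lookup table, and the whitespace-based `segment` function, which only stands in for a real Chinese word segmenter. The patent's actual rules for variant short text are not given in this excerpt.

```python
import html
import re

# Matches anything that looks like an HTML/XML tag (simplifying assumption).
TAG_RE = re.compile(r"<[^>]+>")

# Hypothetical table mapping variant spellings to standard forms.
VARIANTS = {"gr8": "great", "u": "you"}


def strip_tags(text):
    """Remove web page tags and decode HTML entities."""
    return html.unescape(TAG_RE.sub(" ", text))


def segment(text):
    """Placeholder segmenter: lowercases and splits on whitespace.

    A real pipeline for Chinese text would call a word segmenter here.
    """
    return text.lower().split()


def normalize_variants(tokens):
    """Replace known variant tokens with their standard form."""
    return [VARIANTS.get(tok, tok) for tok in tokens]


def preprocess(raw):
    """Tag removal, then segmentation, then variant normalization."""
    return normalize_variants(segment(strip_tags(raw)))


tokens = preprocess("<p>This film is <b>gr8</b></p>")
```

In this sketch variant normalization runs after segmentation so that the lookup works on whole tokens; the order used by the invention may differ.
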

Abstract

A computer-assisted method for computing the semantic distance between short texts belongs to the technical field of Chinese text information processing. It is characterized in that the semantic distance between two short texts is defined as the sum of a syntactic structure distance and a unit semantic distance. Web page tag removal, variant short text processing, and word segmentation are applied to the texts to obtain a series of word strings. Semantic alignment is then performed on the corresponding word strings of the two short texts according to a word similarity matrix, and the syntactic structure distance is obtained from the number of word adjustments made during this alignment. Using the five-level word structure of the "Extended Synonym Thesaurus", and introducing the concepts of Chinese keywords and near-synonyms, five kinds of operations, including insertion, deletion, and replacement, are performed word by word on the semantically aligned strings; the weighted sum of these operations represents the unit semantic distance between the word strings. The resulting semantic distance between texts is more accurate than that of the classical edit distance algorithm.
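
The decomposition described in the abstract (semantic distance = syntactic structure distance + unit semantic distance) can be sketched as follows. This is a simplified illustration under several assumptions, not the patented procedure: `similarity` uses a toy synonym list in place of the thesaurus-based word similarity, alignment is greedy, the number of word adjustments is approximated as the number of matched words outside a longest increasing subsequence, and only three of the five operations (insertion, deletion, replacement) are modeled, with assumed weights.

```python
from bisect import bisect_left

# Toy synonym groups standing in for a thesaurus (assumption).
SYNONYMS = [{"film", "movie"}, {"great", "excellent"}]


def similarity(a, b):
    """Word similarity in [0, 1]; the values are illustrative only."""
    if a == b:
        return 1.0
    if any(a in group and b in group for group in SYNONYMS):
        return 0.8
    return 0.0


def align(s1, s2, threshold=0.5):
    """Greedily match each word of s1 to its most similar unused word in s2.

    Returns s2 with the matched words permuted into s1's order, plus the
    matched positions of s2 listed in s1 order.
    """
    used, matched = set(), []
    for w in s1:
        best_j, best_sim = None, 0.0
        for j, v in enumerate(s2):
            sim = similarity(w, v)
            if j not in used and sim > best_sim:
                best_j, best_sim = j, sim
        if best_j is not None and best_sim >= threshold:
            used.add(best_j)
            matched.append(best_j)
    reordered = list(s2)
    for slot, j in zip(sorted(matched), matched):
        reordered[slot] = s2[j]
    return reordered, matched


def structure_distance(matched):
    """Words that must move: matched count minus the longest increasing run."""
    tails = []
    for pos in matched:
        k = bisect_left(tails, pos)
        if k == len(tails):
            tails.append(pos)
        else:
            tails[k] = pos
    return len(matched) - len(tails)


def unit_semantic_distance(a, b, w_ins=1.0, w_del=1.0):
    """Weighted edit distance; replacement cost is 1 - similarity."""
    prev = [j * w_ins for j in range(len(b) + 1)]
    for i, x in enumerate(a, start=1):
        curr = [i * w_del]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + w_del,
                            curr[j - 1] + w_ins,
                            prev[j - 1] + (1.0 - similarity(x, y))))
        prev = curr
    return prev[-1]


def semantic_distance(s1, s2):
    """Sum of the structure distance and the unit semantic distance."""
    reordered, matched = align(s1, s2)
    return structure_distance(matched) + unit_semantic_distance(s1, reordered)
```

With these toy definitions, a synonym swap such as "film" for "movie" adds only a small replacement cost, while a reordered sentence with identical words is penalized through the structure term alone.
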

Description

Technical field

[0001] The invention relates to a novel method and system for calculating the semantic distance between short texts, belonging to the field of text information processing.

Background

[0002] At present, with the rise of independent media and the development of a participatory media environment, changes in the content and mode of network communication have brought about social change. The production of information has become a model centered on netizens. Netizens not only have the ability to produce and publish information, but can also conveniently interact with information users and readers, making information not only "readable" but also "writable" and "interactive". Therefore, by analyzing the information on the Internet, especially the content published by users, we can understand the hot topics in current society and people's views and positions on various social phenomena.

[0003] Online comments usually start with a public event or hot topic, and the c...

Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/27
Inventor: 杨震, 王来涛, 赖英旭, 高凯明, 张龙伯, 段立娟, 范科峰
Owner: BEIJING UNIV OF TECH