Short text characteristic expanding method based on semantic atlas

A semantic-map-based extension technique applied in the field of short text feature extension. It addresses problems such as data sparsity by alleviating both the sparsity and semantic sensitivity problems, thereby improving classification performance.

Active Publication Date: 2015-03-04
INST OF AUTOMATION CHINESE ACAD OF SCI
Cites: 4; Cited by: 49

AI Technical Summary

Problems solved by technology

[0006] Aiming at the above two main problems, the present invention proposes a short text feature extension method based on a semantic map. The method solves the problems of data sparsity and semantic sensitivity that arise when short text features are represented with the traditional bag-of-words model, and ultimately improves short text classification performance.

Examples

Embodiment Construction

[0042] In order to make the object, technical solution and advantages of the present invention clearer, the present invention will be described in further detail below in conjunction with specific embodiments and with reference to the accompanying drawings.

[0043] The present invention proposes a short text feature extension method based on a semantic map, specifically one based on a topic-keyword semantic map and link analysis. The method can mine the semantic relationships between topic words fairly thoroughly, quickly and accurately extract the information most relevant to a seed keyword, and thereby complete the extension of the target short text's feature representation. The basic features of the present invention mainly include the following six aspects. First, it does not rely on an external large-scale auxiliary training corpus; it performs topic modeling directly on the short text data set, which improves modeling efficiency and ensures seman...
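
The excerpt does not say which topic model is used. As a minimal sketch, assuming standard LDA trained by collapsed Gibbs sampling (a swapped-in choice, not confirmed by the source), topic modeling directly on a tokenized short text set and extraction of the topic-word distributions could look like this. The function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, and the toy corpus are all hypothetical.

```python
import random
from typing import List, Tuple


def lda_gibbs(docs: List[List[str]], num_topics: int, alpha: float = 0.1,
              beta: float = 0.01, iterations: int = 200,
              seed: int = 0) -> Tuple[List[str], List[List[float]]]:
    """Collapsed Gibbs sampling for LDA (assumed model, not the patent's).

    Returns (vocab, phi), where phi[k][w] estimates p(vocab[w] | topic k).
    """
    rng = random.Random(seed)
    vocab = sorted({word for doc in docs for word in doc})
    index = {word: i for i, word in enumerate(vocab)}
    V, K = len(vocab), num_topics

    ndk = [[0] * K for _ in docs]       # tokens of document d assigned to topic k
    nkw = [[0] * V for _ in range(K)]   # tokens of word w assigned to topic k
    nk = [0] * K                        # total tokens assigned to topic k
    z = []                              # current topic of every token

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z.append([])
        for word in doc:
            k = rng.randrange(K)
            ndk[d][k] += 1
            nkw[k][index[word]] += 1
            nk[k] += 1
            z[d].append(k)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                w, k = index[word], z[d][i]
                # Take the token out of the counts before resampling it.
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Full conditional p(z_i = t | all other assignments), unnormalized.
                weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + V * beta)
                           for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1
                z[d][i] = k

    # Smoothed point estimate of the topic-word distributions.
    phi = [[(nkw[k][w] + beta) / (nk[k] + V * beta) for w in range(V)]
           for k in range(K)]
    return vocab, phi


# Hypothetical toy "short texts", already tokenized.
docs = [["apple", "iphone", "ios"], ["iphone", "ios", "app"],
        ["football", "goal", "league"], ["league", "goal", "coach"]]
vocab, phi = lda_gibbs(docs, num_topics=2)
for k, row in enumerate(phi):
    top = sorted(range(len(vocab)), key=lambda w: row[w], reverse=True)[:3]
    print(f"topic {k}:", [vocab[w] for w in top])
```

Each row of `phi` sums to 1 and plays the role of the extracted topic-word distribution that the later reordering and semantic-map steps would consume.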

Abstract

The invention discloses a short text feature extension method based on a semantic map. The method includes the following steps: performing topic modeling on a training data set of short texts and extracting the topic-word distributions; reordering the topic-word distributions; building a candidate keyword dictionary and a topic-keyword semantic map; computing a comprehensive similarity score between the candidate keywords and the seed keywords using a link analysis method; and selecting the most similar candidate keywords to complete the extension of the short text. Compared with short text feature representation methods based on language models, the method is simple to operate, executes efficiently, and makes full use of the semantic correlation information between keywords. Compared with traditional short text feature representation methods based on the bag-of-words model, it effectively alleviates the problems of data sparsity and semantic sensitivity, and it does not depend on an external large-scale auxiliary training corpus or a search engine.
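
The reordering, semantic-map, link-analysis, and selection steps listed above can be sketched as follows. The excerpt gives neither the reordering formula nor the exact "comprehensive similarity" measure, so both are assumptions here: each topic's words are reordered by p(w|k)^2 divided by the sum over topics of p(w|k) (a hypothetical exclusivity weighting), and link analysis is stood in for by personalized PageRank (a random walk that restarts at the seed keywords) over a topic-keyword graph. All function names, the parameters `top_n`, `num_expand`, and `damping`, and the toy `phi` are hypothetical.

```python
from typing import Dict, List, Set

Graph = Dict[str, Dict[str, float]]


def rerank(phi: List[List[float]], vocab: List[str], top_n: int) -> List[List[str]]:
    """Hypothetical reordering: weight p(w|k) by the share of w's mass in topic k,
    so words spread evenly over many topics fall down the list."""
    K, V = len(phi), len(vocab)
    totals = [sum(phi[k][w] for k in range(K)) for w in range(V)]
    ranked = []
    for k in range(K):
        score = [phi[k][w] ** 2 / totals[w] if totals[w] > 0 else 0.0 for w in range(V)]
        order = sorted(range(V), key=lambda w: score[w], reverse=True)[:top_n]
        ranked.append([vocab[w] for w in order])
    return ranked


def build_semantic_map(phi: List[List[float]], vocab: List[str],
                       ranked: List[List[str]]) -> Graph:
    """Undirected topic-keyword graph. The candidate dictionary is the union of
    the words in `ranked`; each word links to its topics with weight p(w|k)."""
    index = {word: i for i, word in enumerate(vocab)}
    graph: Graph = {}
    for k, words in enumerate(ranked):
        topic = f"topic:{k}"
        for word in words:
            weight = phi[k][index[word]]
            graph.setdefault(topic, {})[word] = weight
            graph.setdefault(word, {})[topic] = weight
    return graph


def personalized_pagerank(graph: Graph, seeds: Set[str], damping: float = 0.85,
                          iterations: int = 50) -> Dict[str, float]:
    """Random walk with restart to the seed keywords (link-analysis stand-in)."""
    seeds = {s for s in seeds if s in graph}
    if not seeds:
        return {}
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in graph}
    rank = dict(restart)
    for _ in range(iterations):
        nxt = {n: (1 - damping) * restart[n] for n in graph}
        for n, edges in graph.items():
            out = sum(edges.values())
            if out == 0:
                continue
            # Spread this node's rank over its neighbours in proportion to edge weight.
            for m, w in edges.items():
                nxt[m] += damping * rank[n] * w / out
        rank = nxt
    return rank


def expand(short_text: List[str], phi: List[List[float]], vocab: List[str],
           top_n: int = 4, num_expand: int = 2) -> List[str]:
    """Append the candidate keywords scoring highest against the seed keywords."""
    graph = build_semantic_map(phi, vocab, rerank(phi, vocab, top_n))
    seeds = {w for w in short_text if w in graph}
    scores = personalized_pagerank(graph, seeds)
    candidates = [n for n in scores if not n.startswith("topic:") and n not in seeds]
    candidates.sort(key=lambda n: scores[n], reverse=True)
    return short_text + candidates[:num_expand]


vocab = ["apple", "iphone", "ios", "app", "football", "goal", "league", "coach"]
phi = [[0.30, 0.25, 0.20, 0.15, 0.02, 0.03, 0.02, 0.03],   # toy topic 0
       [0.02, 0.03, 0.02, 0.03, 0.25, 0.25, 0.20, 0.20]]   # toy topic 1
print(expand(["iphone"], phi, vocab))  # → ['iphone', 'apple', 'ios']
```

With these toy distributions the seed `iphone` is linked only to topic 0, so the walk reaches that topic's other keywords in proportion to their weights, while keywords of the unrelated topic score zero. Topic nodes act as bridges between keywords that never co-occur in the short text itself, which is how this kind of extension counters data sparsity.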

Description

Technical field

[0001] The present invention relates to the technical field of text mining, and specifically to a short text feature extension method based on a topic-keyword semantic map and link analysis. The method can be applied to feature representation in short text classification and clustering tasks, and ultimately to subfields such as knowledge question answering, user intent understanding, and intelligent retrieval.

Background art

[0002] With the advent of the big data era, the Internet and various mobile terminals have generated a large amount of short text information, such as web search snippets, Weibo posts, product reviews, news headlines, and various other micro-messages, and useful information is increasingly buried in this vast amount of resources. Enabling systems to intelligently manage and make better use of these massive data resources poses a huge challenge. Therefore, a high-precision short text classification method can help the system to deepen the understa...

Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
CPC: G06F40/216, G06F40/30, G06F18/24
Inventors: 徐博 (Xu Bo), 王鹏 (Wang Peng), 王方圆 (Wang Fangyuan), 张恒 (Zhang Heng), 郝红卫 (Hao Hongwei)
Owner: INST OF AUTOMATION CHINESE ACAD OF SCI