Text message extracting method and system

A text information extraction technology, applied in the information field, that addresses problems such as inaccurate microblog summaries.

Inactive Publication Date: 2015-03-11
NAT UNIV OF DEFENSE TECH
Cites: 3; Cited by: 17

AI Technical Summary

Problems solved by technology

[0005] The technical problem to be solved by this application is to provide a method and system for extracting text information that solve the problem that microblog summaries extracted in the prior art are not accurate enough.

Examples

Embodiment 1

[0150] Corresponding to Embodiment 1 of the method for extracting text information provided in the present application, and referring to Figure 6, the present application also provides Embodiment 1 of a system for extracting text information. In this embodiment, the system includes:

[0151] The first determining unit 601 is configured to determine a target object.

[0152] The preprocessing unit 602 is configured to preprocess the target object.

[0153] The first construction unit 603 is configured to construct a latent semantic analysis (LSA) model according to the preprocessing result, and to digitize the target object.

[0154] The clustering unit 604 is configured to use a k-means clustering algorithm to cluster the digitized target objects to obtain at least one cluster.

[0155] The first extraction unit 605 is configured to perform information extraction on information in each of the clusters by using an algorithm based on LSA, and combine the extracted information together.
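The clustering and extraction units can be illustrated with a minimal sketch, assuming the texts have already been digitized into latent-space vectors. The source only states that extraction uses "an algorithm based on LSA"; picking the text nearest each cluster centroid is an assumption made here for illustration, as are the names `kmeans` and `summarize`, the seeding with the first k vectors, and the sample data.

```python
import math

def kmeans(vectors, k, iters=100):
    """Plain k-means. Seeding with the first k vectors is an assumption."""
    centroids = [list(v) for v in vectors[:k]]
    labels = None
    for _ in range(iters):
        # Assign every vector to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def summarize(posts, vectors, k):
    """Pick one representative post per cluster and combine them.

    Stand-in for the LSA-based extraction step: the post closest to the
    cluster centroid is taken as that cluster's representative.
    """
    labels, centroids = kmeans(vectors, k)
    summary = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            best = min(members, key=lambda i: math.dist(vectors[i], centroids[c]))
            summary.append(posts[best])
    return summary

# Hypothetical posts and hypothetical 2-D latent-space coordinates.
posts = [
    "flood warning issued", "flood waters rising", "team wins final",
    "fans celebrate win", "flood warning extended", "final won by home team",
]
vectors = [(0.9, 0.1), (0.8, 0.2), (0.1, 0.9), (0.3, 0.7), (1.0, 0.0), (0.2, 0.9)]
print(summarize(posts, vectors, k=2))
```

Clustering first and then extracting per cluster means each topic in the collection contributes to the combined summary, rather than only the dominant one.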

Embodiment 2

[0156] Referring to Figure 7, the present application also provides Embodiment 2 of a system for extracting text information. In this embodiment, the preprocessing unit 602 includes:

[0157] The word segmentation unit 701 is configured to use a preset word segmentation tool to segment the target object.

[0158] The removing unit 702 is configured to remove a segmented word when the word is judged to be a stop word.

[0159] The second determining unit 703 is configured to determine that the word is a feature word when it is judged that the frequency of occurrence of the word exceeds a preset threshold.
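The three preprocessing units can be sketched as follows. This is an illustration under stated assumptions, not the applicants' implementation: a regular-expression tokenizer stands in for the preset word segmentation tool, and the stop-word list, the threshold value, and the names `segment` and `preprocess` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the source does not specify one.
STOP_WORDS = {"the", "a", "is", "of", "and", "to", "in"}

def segment(text):
    """Stand-in for the preset word segmentation tool (unit 701)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def preprocess(posts, threshold=1):
    """Return (tokenized posts, feature words).

    Stop words are removed (unit 702). A word is kept as a feature word
    when its total frequency exceeds the threshold (unit 703).
    """
    tokenized = [[w for w in segment(p) if w not in STOP_WORDS] for p in posts]
    counts = Counter(w for words in tokenized for w in words)
    features = sorted(w for w, c in counts.items() if c > threshold)
    return tokenized, features

posts = [
    "The flood warning is in effect",
    "Flood waters rising in the city",
    "City issues a new flood warning",
]
tokenized, features = preprocess(posts)
print(features)  # ['city', 'flood', 'warning']
```

For Chinese microblog text, the tokenizer would be replaced by a dedicated word segmentation tool; the rest of the flow stays the same.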

Embodiment 3

[0160] Referring to Figure 8, the present application also provides Embodiment 3 of a system for extracting text information. In this embodiment, the first construction unit 603 includes:

[0161] The second construction unit 801 is configured to construct a feature word-text matrix according to the preprocessing result.

[0162] The decomposition unit 802 is configured to perform singular value decomposition on the matrix by using a preset method to obtain the latent semantic space.
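A minimal sketch of these two units, assuming raw term counts as matrix entries. The source does not name the "preset method" for the decomposition; power iteration with deflation is swapped in here only so the example runs on the standard library, and a production system would more likely call a linear algebra library's SVD routine. The names `build_matrix`, `truncated_svd`, and `document_vectors` are hypothetical.

```python
import math

def build_matrix(tokenized, features):
    """Feature word-text matrix: rows are feature words, columns are texts."""
    return [[doc.count(w) for doc in tokenized] for w in features]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def truncated_svd(A, k, iters=500, tol=1e-12):
    """Approximate the k largest singular triplets (sigma, u, v) of A."""
    n = len(A[0])
    cols = list(zip(*A))
    # Gram matrix G = A^T A; its eigenvectors are the right singular vectors.
    G = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    triplets = []
    for _ in range(min(k, n)):
        v = [1.0 / (i + 1) for i in range(n)]  # arbitrary non-uniform start
        for _ in range(iters):
            w = matvec(G, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < tol:
                break
            v = [x / norm for x in w]
        eigenvalue = sum(x * y for x, y in zip(v, matvec(G, v)))
        if eigenvalue < tol:
            break  # remaining singular values are numerically zero
        sigma = math.sqrt(eigenvalue)
        u = [x / sigma for x in matvec(A, v)]
        triplets.append((sigma, u, v))
        # Deflate: remove the component just found so the next one dominates.
        G = [[G[i][j] - eigenvalue * v[i] * v[j] for j in range(n)]
             for i in range(n)]
    return triplets

def document_vectors(triplets, n_docs):
    """Coordinates of each text in the latent semantic space."""
    return [[s * v[j] for s, _, v in triplets] for j in range(n_docs)]

tokenized = [["flood", "warning"], ["flood", "city"], ["city", "flood", "warning"]]
features = ["city", "flood", "warning"]
A = build_matrix(tokenized, features)
triplets = truncated_svd(A, k=2)
print([round(s, 3) for s, _, _ in triplets])
print(document_vectors(triplets, len(tokenized)))
```

Keeping only the largest singular values gives each text a short numeric vector, which is the digitized form that the clustering step consumes.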

Abstract

The invention provides a method for extracting text information. The method comprises: determining a target object; preprocessing the target object; constructing a latent semantic analysis (LSA) model according to the preprocessing result and digitizing the target object; clustering the digitized target object with the k-means clustering algorithm to obtain at least one cluster; extracting information from each cluster with an LSA-based algorithm; and combining the extracted information, so that a microblog summary can be extracted accurately. The invention also provides a system for extracting text information that can accurately extract microblog summaries.

Description

Technical field

[0001] This application relates to the field of information, in particular to a method and system for extracting text information.

Background technique

[0002] With the development of technology, people pay more and more attention to methods for extracting microblog information.

[0003] Most existing methods for summarizing microblog information represent microblog text with the vector space model (VSM) and extract summaries from that representation; the summaries extracted this way are not accurate enough.

[0004] Therefore, how to accurately extract microblog summaries is a technical problem to be solved by those skilled in the art.

Contents of the invention

[0005] The technical problem to be solved by this application is to provide a method and system for extracting text information that solve the problem of inaccurate microblog summaries extracted in the prior art.

[0006] The specific plan is as follows: [00...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27, G06F17/30
Inventor: 杨树强, 束阳雪, 黄鸿杰, 金松昌, 陈志坤, 尹洪, 薛竹君, 蒋千越, 贾焰, 周斌, 韩伟红, 李爱平
Owner: NAT UNIV OF DEFENSE TECH