Text information extraction method and system
A text information extraction technology, applied in the information field, that addresses problems such as inaccurate microblog summaries.
Examples
Embodiment 1
[0150] Corresponding to Embodiment 1 of the method for extracting text information in the present application, and referring to Figure 6, the present application also provides Embodiment 1 of a system for extracting text information. In this embodiment, the system includes:
[0151] The first determining unit 601 is configured to determine a target object.
[0152] The preprocessing unit 602 is configured to preprocess the target object.
[0153] The first construction unit 603 is configured to construct a latent semantic analysis (LSA) model according to the preprocessing result, and to convert the target object into a numerical representation.
[0154] The clustering unit 604 is configured to use a k-means clustering algorithm to cluster the digitized target objects to obtain at least one cluster.
[0155] The first extraction unit 605 is configured to extract information from each of the clusters by using an LSA-based algorithm, and to combine the extracted information.
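The flow through units 603 to 605 can be sketched as follows. This is only an illustration under several assumptions that the text does not state: the feature word-text matrix holds raw counts, texts are projected into a rank-2 latent space, k-means is a plain Lloyd iteration with a seeded random start, and the "algorithm based on LSA" is taken to be "pick the text with the largest weight on the first right singular vector of each cluster". The function names and sample texts are hypothetical.

```python
import numpy as np


def text_vectors(matrix, rank):
    """Project each text (column) into a rank-dimensional latent space via SVD."""
    _, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return (s[:rank, None] * vt[:rank, :]).T


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per row of points."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    labels = np.zeros(len(points), dtype=int)
    for _ in range(iterations):
        # Assign each point to its nearest center (Euclidean distance).
        distances = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Move each center to the mean of its members; leave empty clusters alone.
        for c in range(k):
            members = points[labels == c]
            if len(members):
                centers[c] = members.mean(axis=0)
    return labels


def pick_representative(matrix, columns):
    """Assumed LSA-based selection: the text with the largest absolute weight
    on the first right singular vector of the cluster's own sub-matrix."""
    _, _, vt = np.linalg.svd(matrix[:, columns], full_matrices=False)
    return int(columns[int(np.abs(vt[0]).argmax())])


def extract(texts, matrix, k, rank=2):
    """Cluster the texts, pick one per cluster, and combine the picks."""
    labels = kmeans(text_vectors(matrix, rank), k)
    picked = []
    for c in range(k):
        columns = np.flatnonzero(labels == c)
        if len(columns):
            picked.append(pick_representative(matrix, columns))
    # Combine the per-cluster picks in original text order.
    return [texts[i] for i in sorted(picked)]


texts = [
    "cat cat dog",
    "cat dog dog dog",
    "stock stock market",
    "stock stock stock market market",
]
# Rows: cat, dog, stock, market. Columns: the four texts above.
matrix = np.array([
    [2, 1, 0, 0],
    [1, 3, 0, 0],
    [0, 0, 2, 3],
    [0, 0, 1, 2],
], dtype=float)

summary = extract(texts, matrix, k=2)
print(summary)
```

With raw counts, this selection rule tends to favour longer texts; a weighting scheme such as TF-IDF or length normalization could be substituted without changing the overall structure.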
Embodiment 2
[0156] Referring to Figure 7, the present application also provides Embodiment 2 of a system for extracting text information. In this embodiment, the preprocessing unit 602 includes:
[0157] The word segmentation unit 701 is configured to use a preset word segmentation tool to segment the target object.
[0158] The removing unit 702 is configured to judge whether a segmented word is a stop word and, if so, to remove it.
[0159] The second determining unit 703 is configured to determine that the word is a feature word when it is judged that the frequency of occurrence of the word exceeds a preset threshold.
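A minimal sketch of the three preprocessing steps is given below. The preset word segmentation tool, the stop-word list, and the threshold are not specified in the text, so a whitespace split, a small hypothetical stop-word set, and a threshold of 1 stand in for them. Counting frequency over the whole collection rather than per text is also an assumption.

```python
from collections import Counter

# Hypothetical stop-word list and threshold; the text only says both are preset.
STOP_WORDS = {"the", "a", "is", "of", "and", "to"}
MIN_FREQUENCY = 1  # a word is kept only if its count exceeds this value


def segment(text):
    """Stand-in for the preset word segmentation tool: lowercase whitespace split."""
    return text.lower().split()


def preprocess(texts, stop_words=STOP_WORDS, min_frequency=MIN_FREQUENCY):
    """Return (feature_words, tokenized_texts) for a list of raw texts."""
    # Units 701 and 702: segment each text, then drop stop words.
    tokenized = [[w for w in segment(t) if w not in stop_words] for t in texts]
    # Unit 703: count occurrences over the collection and keep frequent words.
    counts = Counter(w for tokens in tokenized for w in tokens)
    feature_words = sorted(w for w, c in counts.items() if c > min_frequency)
    return feature_words, tokenized


texts = [
    "the cat sat on the mat",
    "a cat and a dog",
    "the dog chased the cat",
]
feature_words, tokenized = preprocess(texts)
print(feature_words)
```

For Chinese microblog text, `segment` would be replaced by a real segmentation tool, since whitespace splitting does not apply there.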
Embodiment 3
[0160] Referring to Figure 8, the present application also provides Embodiment 3 of a system for extracting text information. In this embodiment, the first construction unit 603 includes:
[0161] The second construction unit 801 is configured to construct a feature word-text matrix according to the preprocessing result.
[0162] The decomposition unit 802 is configured to perform singular value decomposition on the matrix by using a preset method to obtain the latent semantic space.
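The two units above can be illustrated with NumPy as follows. This sketch assumes that matrix entries are raw occurrence counts and that the "preset method" is a truncated SVD keeping the top `rank` singular values. Neither detail is given in the text, and the helper names and sample data are hypothetical.

```python
import numpy as np


def build_term_text_matrix(feature_words, tokenized_texts):
    """Rows are feature words, columns are texts, entries are raw counts."""
    index = {w: i for i, w in enumerate(feature_words)}
    matrix = np.zeros((len(feature_words), len(tokenized_texts)))
    for j, tokens in enumerate(tokenized_texts):
        for w in tokens:
            if w in index:
                matrix[index[w], j] += 1.0
    return matrix


def latent_space(matrix, rank):
    """Truncated SVD: return one rank-dimensional vector per text (as rows)."""
    _, s, vt = np.linalg.svd(matrix, full_matrices=False)
    rank = min(rank, len(s))
    # Scale each text's coordinates by the retained singular values.
    return (s[:rank, None] * vt[:rank, :]).T


feature_words = ["cat", "dog", "stock", "market"]
tokenized_texts = [
    ["cat", "cat", "dog"],
    ["cat", "dog", "dog", "dog"],
    ["stock", "stock", "market"],
    ["stock", "stock", "stock", "market", "market"],
]
matrix = build_term_text_matrix(feature_words, tokenized_texts)
vectors = latent_space(matrix, rank=2)
print(matrix.shape, vectors.shape)
```

Each row of `vectors` is the numerical form of one text; texts that share no feature words end up on orthogonal directions in this toy example.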