Junk manuscript classification method based on distributed feature representation of text
A junk manuscript classification method based on distributed feature representation of text, applicable to text database clustering/classification, unstructured text data retrieval, and special data processing applications. It addresses problems such as word order not being considered and a high misjudgment rate.
Embodiment Construction
[0019] The present invention will be further described below in conjunction with the accompanying drawings.
[0020] Collect a manuscript text data set (including both junk manuscripts and valid manuscripts) and label the category of each manuscript: for example, junk manuscripts are recorded as class y = -1 and valid manuscripts as class y = 1. A support vector machine text classification model is then trained on these categories.
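The labeling and training step can be illustrated with a minimal sketch. It assumes each manuscript has already been turned into a numeric feature vector; the two-dimensional toy vectors, the function names `train_linear_svm` and `predict`, and the hyperparameters are all hypothetical and not taken from the patent. It trains a linear support vector machine by subgradient descent on the regularized hinge loss, which is one common way to fit an SVM and not necessarily the solver used in the embodiment.

```python
def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Fit a linear SVM (weights w, bias b) by subgradient descent
    on the L2-regularized hinge loss. Labels must be +1 or -1."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score < 1:
                # Margin violated: step along the hinge-loss subgradient.
                w = [wj - lr * (lam * wj - yi * xj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:
                # Margin satisfied: only the regularizer shrinks w.
                w = [wj - lr * lam * wj for wj in w]
    return w, b


def predict(w, b, x):
    """Return 1 (valid manuscript) or -1 (junk manuscript)."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Hypothetical toy features: valid manuscripts (y = 1) and junk (y = -1).
X = [[2.0, 0.1], [1.8, 0.3], [0.2, 1.9], [0.1, 2.2]]
y = [1, 1, -1, -1]

w, b = train_linear_svm(X, y)
predictions = [predict(w, b, x) for x in X]
```

In this sketch the sign of the decision score maps directly onto the y = 1 / y = -1 labeling convention described above.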
[0021] Segment the manuscript text corpus into words. The word segmentation method used in this embodiment is a Chinese word segmentation algorithm that combines a dictionary-based reverse maximum matching algorithm with a statistical word segmentation strategy.
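As an illustration of the dictionary half of this method only, the following is a minimal sketch of reverse maximum matching. The function name `reverse_maximum_match`, the `max_len` window, and the tiny dictionary are hypothetical; the statistical strategy that the embodiment combines with it is not shown here.

```python
def reverse_maximum_match(text, dictionary, max_len=4):
    """Segment text by scanning from the end and, at each position,
    taking the longest dictionary word that ends there.
    Characters with no dictionary match become single-character tokens."""
    tokens = []
    end = len(text)
    while end > 0:
        start = max(0, end - max_len)
        # Shrink the candidate from the left until it is a dictionary
        # word or only one character remains.
        while start < end - 1 and text[start:end] not in dictionary:
            start += 1
        tokens.append(text[start:end])
        end = start
    tokens.reverse()  # tokens were collected from right to left
    return tokens


# Hypothetical dictionary for a classic ambiguous sentence.
dictionary = {"研究", "研究生", "生命", "的", "起源"}
tokens = reverse_maximum_match("研究生命的起源", dictionary)
print(tokens)
```

Scanning from the right keeps "生命" together in this sentence, whereas a left-to-right longest match would first consume "研究生" and strand "命".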
[0022] First, the manuscript text to be segmented is preprocessed to normalize the non-Chinese-character information it contains. Separators (such as the space " ") can be used to replace non-Chinese-character content such as punctuation and English letters in the manuscript text.
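A minimal sketch of this preprocessing step follows. It assumes that "Chinese character" means a code point in the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) and that a single space is the separator; the function name `normalize_non_chinese` and the trimming of leading and trailing separators are illustrative choices, not details from the patent.

```python
import re

# Assumption: anything outside U+4E00..U+9FFF counts as non-Chinese.
NON_CHINESE = re.compile(r"[^\u4e00-\u9fff]+")


def normalize_non_chinese(text, separator=" "):
    """Replace each run of non-Chinese characters (punctuation,
    English letters, digits, whitespace) with one separator."""
    return NON_CHINESE.sub(separator, text).strip(separator)


cleaned = normalize_non_chinese("你好,world!今天天气不错。")
segments = cleaned.split(" ")
print(segments)
```

Each resulting segment is a run of consecutive Chinese characters, which can then be handed to the word segmentation step.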
[00...