Method and system for automatically recognizing new Chinese words in single-word-string mode and affix mode
A technology for automatically recognizing new words, applied in fields such as electrical digital data processing, instruments, and computing. It addresses the problems of data sparseness, extraction difficulty, and low extraction accuracy, and improves the accuracy rate.
Embodiment Construction
[0014] The technical scheme of the present invention is now described in detail with reference to the accompanying drawings:
[0015] Figure 1 is a flowchart of a method for automatically recognizing new Chinese words according to a specific embodiment of the present invention.
[0016] First, step S101 is executed to segment large-scale short texts. The present invention uses short texts as the corpus for new word recognition. In this embodiment, web news is analyzed, so the news headlines on the web pages are captured and word-segmented using ICTCLAS.
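Step S101 can be illustrated with a minimal Python sketch. It is not the patent's implementation: the embodiment segments with ICTCLAS, whereas the `forward_max_match` function below is a toy greedy longest-match segmenter standing in for it, and the names `Segmenter`, `forward_max_match`, `segment_headlines`, and the `lexicon` contents are all hypothetical. Any callable that maps a headline string to a list of tokens could be passed in its place.

```python
from typing import Callable, Iterable, List, Set, Tuple

# A segmenter maps one headline to its list of tokens.
# Assumption: in the patent's embodiment this role is played by ICTCLAS.
Segmenter = Callable[[str], List[str]]


def forward_max_match(text: str, lexicon: Set[str], max_len: int = 4) -> List[str]:
    """Toy stand-in segmenter: greedy longest match against a small lexicon.

    Anything not found in the lexicon falls back to single characters.
    """
    tokens: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


def segment_headlines(
    headlines: Iterable[str], segmenter: Segmenter
) -> List[Tuple[str, List[str]]]:
    """Pair each captured headline with its segmented form (step S101)."""
    return [(headline, segmenter(headline)) for headline in headlines]
```

A caller would bind the lexicon once, for example `segment_headlines(titles, lambda t: forward_max_match(t, lexicon))`, and could replace the lambda with a real ICTCLAS binding without changing the rest of the pipeline.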
[0017] Next, step S102 is executed to store the news headlines and the word-segmented news headlines in a local database. Those skilled in the art will understand that, specifically, after Chinese word segmentation has been performed on the large-scale short texts, the resulting word segments are first written to the database on physical disk storage. In the ...
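The storage in step S102 might look roughly like the following sketch. The source only says that the headlines and their segmented forms go into a local database on disk; the choice of SQLite, the table name `headlines`, its columns, and the function names `open_store`, `save_headlines`, and `load_segmented` are assumptions made for illustration. Tokens are joined with a single space, which assumes no token itself contains a space.

```python
import sqlite3
from typing import Iterable, List, Tuple


def open_store(path: str = "headlines.db") -> sqlite3.Connection:
    """Open (or create) the local database; the schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS headlines ("
        " id INTEGER PRIMARY KEY,"
        " raw TEXT NOT NULL,"        # headline as captured
        " segmented TEXT NOT NULL)"  # tokens joined by single spaces
    )
    return conn


def save_headlines(
    conn: sqlite3.Connection, rows: Iterable[Tuple[str, List[str]]]
) -> int:
    """Store each headline with its segmented form; return the row count."""
    data = [(raw, " ".join(tokens)) for raw, tokens in rows]
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO headlines (raw, segmented) VALUES (?, ?)", data
        )
    return len(data)


def load_segmented(conn: sqlite3.Connection) -> List[List[str]]:
    """Read the segmented headlines back in insertion order."""
    cur = conn.execute("SELECT segmented FROM headlines ORDER BY id")
    return [row[0].split(" ") for row in cur]
```

Passing a file path to `open_store` keeps the data on disk, as the paragraph describes; passing `":memory:"` gives a throwaway database that is convenient for experiments.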