Text word segmentation method and text word segmentation device
A text word segmentation method in the field of Chinese word segmentation technology, applicable to instruments, digital data processing, computing, etc. It addresses the high time cost of labeling a large number of long texts.
Examples
Embodiment 1
[0087] As shown in Figure 1, the flowchart of the text word segmentation method provided in Embodiment 1 of the present application includes the following steps:
[0088] S101: Acquire the Chinese text to be processed.
[0089] In a specific implementation, the Chinese text to be segmented is obtained first.
[0090] It should be noted that in various scenarios of Chinese natural language processing, words are usually the smallest basic unit of study. However, Chinese text is written as a sequence of characters, with no spaces or similar markers between words to indicate word boundaries, so word segmentation is the foundational step of Chinese text processing. The quality of word segmentation plays a critical role in subsequent Chinese information processing.
[0091] S102: Divide the Chinese text into a plurality of Chinese short texts, wherein each Chinese short text includes a plurality of consecutive Chinese characters that express one semantic meaning.
[0...
Embodiment approach 1
[0096] Approach 1: Input the multiple Chinese short texts into a pre-trained Chinese word segmentation model to obtain the segmented Chinese short texts, and then concatenate all the segmented Chinese short texts to output the segmented Chinese text.
Embodiment approach 2
[0097] Approach 2: All the Chinese short texts can be input in parallel into a pre-trained Chinese word segmentation model to output the segmented Chinese text.
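The two approaches above can be compared in a small sketch. It is a hedged illustration only: `segment_short_text` is a hypothetical stand-in for the pre-trained model (it simply emits one token per character), and a thread pool is just one possible reading of "input in parallel"; batched model inference would be another.

```python
from concurrent.futures import ThreadPoolExecutor


def segment_short_text(short_text: str) -> list[str]:
    """Hypothetical stand-in for the pre-trained model: one token per character."""
    return list(short_text)


def segment_sequential(short_texts: list[str]) -> list[str]:
    # Approach 1: segment each short text in turn, then stitch in input order.
    words: list[str] = []
    for short_text in short_texts:
        words.extend(segment_short_text(short_text))
    return words


def segment_parallel(short_texts: list[str], max_workers: int = 4) -> list[str]:
    # Approach 2: hand all short texts to a worker pool at once.
    # Executor.map yields results in input order, so stitching stays correct.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(segment_short_text, short_texts))
    return [word for words in results for word in words]


print(segment_sequential(["今天", "天气好"]))
print(segment_parallel(["今天", "天气好"]))
```

Both functions return the same word sequence; they differ only in whether the short texts are processed one after another or concurrently.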
[0098] Here, the Chinese word segmentation model is trained before the Chinese text is segmented and can be used directly for word segmentation. The Chinese word segmentation model can be a word segmentation model based on string matching, a word segmentation model based on understanding, a word segmentation model based on statistics, and so on.
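As an illustration of the string-matching family mentioned above, the following is a minimal sketch of forward maximum matching, a classic dictionary-based technique. The source does not say which string-matching algorithm is used, so this is an example of the category and not the patent's model; the function name, the toy dictionary, and the `max_len` parameter are all hypothetical.

```python
def forward_max_match(text: str, dictionary: set[str], max_len: int = 4) -> list[str]:
    """Greedy left-to-right segmentation against a word dictionary."""
    words: list[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until the dictionary
        # matches; a single character is always accepted as a fallback.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


# Toy dictionary for illustration only.
toy_dictionary = {"今天", "天气", "很好"}
print(forward_max_match("今天天气很好", toy_dictionary))
```

Since the candidate length is bounded by `max_len`, the running time of this kind of matcher depends on the length of the input, which is one reason shorter inputs are cheaper to process.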
[0099] In the embodiment of the present application, the Chinese text to be processed is acquired and divided into multiple Chinese short texts, where each Chinese short text includes multiple consecutive Chinese characters representing a semantic meaning. This not only reduces the length of the Chinese text but also filters out the interference of non-Chinese characters. Further, based on the segmented...