A Chinese word segmentation method and device
A Chinese word segmentation technology, applied in fields such as instruments, computing, and electrical digital data processing. It addresses the low efficiency and insufficient accuracy of existing approaches, and achieves a small amount of calculation, high accuracy, and unsupervised extraction of candidate words.
Examples
Embodiment 1
[0043] Figure 3 is a schematic flowchart of a Chinese word segmentation method provided by Embodiment 1 of the present invention. The method can be executed by a Chinese word segmentation device. As shown in Figure 3, the method includes:
[0044] Step 301. Divide the text set into multiple short sentences, and number the multiple short sentences.
[0045] Wherein, the text set includes at least one text.
[0046] For example, the device that executes the method of this embodiment may be implemented in software and/or hardware, and may be integrated into a server that provides services such as word segmentation or retrieval.
[0047] In this embodiment, the text set can be divided into n short sentences, and the short sentences can be numbered 1, 2, ..., n in sequence.
[0048] Preferably, the text set can be divided into multiple short sentences according to Chinese punctuation marks, and the multiple short sentences are numbered.
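Step 301 can be illustrated with a minimal Python sketch. It is not the patent's implementation: the function name `split_into_short_sentences`, the delimiter set in `PUNCTUATION`, and the sample texts are assumptions made for illustration, since the text above says only that the split follows Chinese punctuation marks and that the short sentences are numbered 1 to n.

```python
import re

# Assumed delimiter set; the source only says "Chinese punctuation marks".
PUNCTUATION = "，。！？；：、"


def split_into_short_sentences(text_set):
    """Split every text in text_set at punctuation and number the pieces.

    Returns a dict mapping short sentence number (1..n) to the short sentence.
    """
    pattern = "[" + re.escape(PUNCTUATION) + "]"
    numbered = {}
    number = 0
    for text in text_set:
        for piece in re.split(pattern, text):
            piece = piece.strip()
            if piece:  # skip empty pieces left by consecutive or trailing marks
                number += 1
                numbered[number] = piece
    return numbered


# Hypothetical text set containing two texts.
short_sentences = split_into_short_sentences(
    ["今天天气很好，我们去公园。", "公园很大！"]
)
for number, sentence in short_sentences.items():
    print(number, sentence)
```

Numbering runs across the whole text set rather than restarting for each text, so a single number identifies a short sentence unambiguously.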
[0049] Preferably, when t...
Embodiment 2
[0061] Figure 4 is a schematic flowchart of a Chinese word segmentation method provided by Embodiment 2 of the present invention. This embodiment is optimized on the basis of the above embodiment. In this embodiment, before the step of obtaining, for each Chinese character in the text set, the first short sentence number list corresponding to the current Chinese character, a step is added: determine the short sentence number lists and adjacent character sets corresponding to all the distinct Chinese characters in the text set. The advantage is that, when each Chinese character is processed, the short sentence number list and adjacent character set corresponding to the current Chinese character can be obtained directly from the lists and sets already determined, as can the short sentence number lists corresponding to its adjacent Chinese characters, which improves the processing speed.
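The precomputation described in this paragraph might look like the following Python sketch. It rests on stated assumptions: the excerpt does not define "adjacent character set", so the sketch takes it to mean the characters that immediately follow the current character within the same short sentence. The function name `build_character_index` and the sample data are hypothetical.

```python
from collections import defaultdict


def build_character_index(short_sentences):
    """Precompute per-character lookup tables in one pass over the text set.

    short_sentences maps short sentence number -> short sentence.
    Returns (number_lists, adjacent_sets):
      number_lists[c]  - numbers of the short sentences containing c, ascending
      adjacent_sets[c] - characters that directly follow c (assumed definition)
    """
    number_lists = defaultdict(list)
    adjacent_sets = defaultdict(set)
    for number in sorted(short_sentences):
        sentence = short_sentences[number]
        for position, char in enumerate(sentence):
            # Record each short sentence number once per character.
            if not number_lists[char] or number_lists[char][-1] != number:
                number_lists[char].append(number)
            # Record the next character, if there is one in this short sentence.
            if position + 1 < len(sentence):
                adjacent_sets[char].add(sentence[position + 1])
    return dict(number_lists), dict(adjacent_sets)


# Hypothetical numbered short sentences.
example = {1: "今天天气", 2: "天气好"}
number_lists, adjacent_sets = build_character_index(example)
print(number_lists["天"])
print(sorted(adjacent_sets["天"]))
```

With both tables built up front, processing one character reduces to dictionary lookups for the character and its neighbors, which is the speed-up the paragraph describes. A character that only ever ends a short sentence has no entry in `adjacent_sets`, so callers may prefer `adjacent_sets.get(char, set())`.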
[0062] Further, this embodiment also optimizes the calculation proces...
Embodiment 3
[0094] Figure 5 is a structural block diagram of a Chinese word segmentation device provided by Embodiment 3 of the present invention. The device can be implemented in software and/or hardware, and can perform word segmentation on Chinese text by executing the Chinese word segmentation method of the embodiments of the present invention. Typically, the device can be integrated into a server that provides services such as word segmentation or retrieval. As shown in Figure 5, the device includes a text set segmentation module 501, a first short sentence number list acquisition module 502, a second short sentence number list acquisition module 503, a co-occurrence degree calculation module 504, an adjacent character set acquisition module 505, an adjacent correlation degree calculation module 506, a candidate word set adding module 507, and a word segmentation module 508.
[0095] Wherein, the text set segmentation module 501 is used to divide the text set into mult...