A Word Segmentation Method for Historical Classics Based on Word Alignment
A word segmentation method based on word alignment technology. It applies to instruments, computing, and electric digital data processing, addresses problems such as poor effectiveness, and improves accuracy.
Examples
Embodiment 1
[0025] This embodiment uses Eclipse as the development platform and Java as the development language. It is run on 4,145 sentence pairs of classical Chinese and modern vernacular Chinese taken from the "Historical Records" chapters "Benji of Qin Shihuang", "Benji of Qin", "Benji of Xiang Yu", "Benji of Gaozu" and "Benji of Lühou". The specific procedure is as follows:
[0026] Step 1: Segment the modern Chinese side of the parallel corpus into words, and split the ancient Chinese side character by character. Then use IBM Model 3 to align the ancient Chinese text with the modern Chinese text.
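Step 1 can be sketched in Java as follows. This is a minimal sketch under stated assumptions, not the patent's implementation. The ancient side is split into single characters, while the modern side is assumed to be already word-segmented by an external segmenter (the source does not name one). IBM Model 1 EM training is swapped in for IBM Model 3: Model 3 adds fertility and distortion models that do not fit in a short example, and Model 3 training is commonly initialised from Model 1 translation tables. The NULL word is also omitted. The class `AlignmentSketch`, its methods, and the toy sentence pairs are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: IBM Model 1 stands in for IBM Model 3 (no fertility, distortion or NULL word).
class AlignmentSketch {

    /** Splits an ancient Chinese sentence into single characters (Unicode code points). */
    static List<String> splitByCharacter(String ancient) {
        List<String> chars = new ArrayList<>();
        ancient.codePoints().forEach(cp -> chars.add(new String(Character.toChars(cp))));
        return chars;
    }

    /** EM training of t(modern | ancient); keys are "ancient\tmodern". */
    static Map<String, Double> trainModel1(List<List<String>> ancientSents,
                                           List<List<String>> modernSents,
                                           int iterations) {
        Map<String, Double> t = new HashMap<>();
        // Uniform start over co-occurring pairs (the E-step normalises, so 1.0 suffices).
        for (int s = 0; s < ancientSents.size(); s++) {
            for (String a : ancientSents.get(s)) {
                for (String m : modernSents.get(s)) {
                    t.put(a + "\t" + m, 1.0);
                }
            }
        }
        for (int it = 0; it < iterations; it++) {
            Map<String, Double> count = new HashMap<>();
            Map<String, Double> total = new HashMap<>();
            for (int s = 0; s < ancientSents.size(); s++) {
                List<String> src = ancientSents.get(s);
                for (String m : modernSents.get(s)) {
                    // E-step: share each modern token among the ancient tokens of its sentence.
                    double norm = 0.0;
                    for (String a : src) norm += t.get(a + "\t" + m);
                    for (String a : src) {
                        double frac = t.get(a + "\t" + m) / norm;
                        count.merge(a + "\t" + m, frac, Double::sum);
                        total.merge(a, frac, Double::sum);
                    }
                }
            }
            // M-step: renormalise expected counts per ancient token.
            for (Map.Entry<String, Double> e : count.entrySet()) {
                String a = e.getKey().substring(0, e.getKey().indexOf('\t'));
                t.put(e.getKey(), e.getValue() / total.get(a));
            }
        }
        return t;
    }

    /** For each modern token, links the ancient token with the highest t(modern | ancient). */
    static List<String[]> bestAlignment(List<String> ancient, List<String> modern,
                                        Map<String, Double> t) {
        List<String[]> links = new ArrayList<>();
        for (String m : modern) {
            String best = null;
            double bestP = -1.0;
            for (String a : ancient) {
                double p = t.getOrDefault(a + "\t" + m, 0.0);
                if (p > bestP) { bestP = p; best = a; }
            }
            links.add(new String[] {best, m});
        }
        return links;
    }

    public static void main(String[] args) {
        List<List<String>> ancient = List.of(
                splitByCharacter("王怒"), splitByCharacter("王笑"), splitByCharacter("民怒"));
        List<List<String>> modern = List.of(
                List.of("大王", "发怒"), List.of("大王", "笑了"), List.of("百姓", "发怒"));
        Map<String, Double> t = trainModel1(ancient, modern, 10);
        for (String[] link : bestAlignment(ancient.get(0), modern.get(0), t)) {
            System.out.printf("%s -> %s (t = %.3f)%n", link[0], link[1], t.get(link[0] + "\t" + link[1]));
        }
    }
}
```

The printed value of t for each link plays the role of the alignment probability that Step 2 inspects.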
[0027] Step 2: Preprocess the alignment results from Step 1 to remove interference from punctuation and adverbs:
[0028] (1) Check each alignment result from Step 1, and delete those whose alignment probability is less than or equal to zero, as well as those in which the ancient Chinese word or its corresponding modern Chinese word is a non-Chinese character;
[0029] (2) Check the part of speech of the two words or characters in each...
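Sub-step (1) of Step 2 can be sketched as a simple filter. This is a minimal Java sketch, not the patent's code. It assumes each alignment link carries an ancient token, a modern token and a probability, and it treats "non-Chinese character" as any code point outside the Unicode Han script, so full-width punctuation such as "，" is removed. It drops a link if either side fails that test. The class `AlignmentFilter`, the nested `Link` type and the method names are hypothetical. The part-of-speech check in sub-step (2) is truncated in the source and is not sketched.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of Step 2, sub-step (1): discard invalid alignment links.
class AlignmentFilter {

    /** One alignment link: an ancient token, its modern Chinese token, and a probability. */
    static final class Link {
        final String ancient;
        final String modern;
        final double probability;

        Link(String ancient, String modern, double probability) {
            this.ancient = ancient;
            this.modern = modern;
            this.probability = probability;
        }
    }

    /** True if the token is non-empty and every code point belongs to the Han script. */
    static boolean isAllHan(String token) {
        return !token.isEmpty() && token.codePoints()
                .allMatch(cp -> Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN);
    }

    /** Keeps links with positive probability whose two sides are both Chinese characters. */
    static List<Link> filter(List<Link> links) {
        List<Link> kept = new ArrayList<>();
        for (Link l : links) {
            if (l.probability <= 0.0) continue;                          // probability <= 0
            if (!isAllHan(l.ancient) || !isAllHan(l.modern)) continue;   // punctuation, digits, Latin, ...
            kept.add(l);
        }
        return kept;
    }
}
```

For example, filtering the links ("王", "大王", 0.8), ("，", "，", 0.9) and ("怒", "发怒", 0.0) keeps only the first one.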