Word segmentation method and device for ancient traditional Chinese medicine documents
The technology relates to ancient books and documents of traditional Chinese medicine and is applied to text database querying, unstructured text data retrieval, and special data processing applications. It addresses the lack of a word segmentation device for the field of traditional Chinese medicine.
Embodiment Construction
[0069] Embodiments of the present invention will be described in detail below in conjunction with the accompanying drawings.
[0070] As shown in Figure 1, a word segmentation method for ancient traditional Chinese medicine documents according to the present invention includes:
[0071] Step 101, preprocessing the ancient documents in the field of traditional Chinese medicine to generate a corpus for training a language model. The preprocessing includes: obtaining the original text of the ancient documents; deleting the table of contents of the ancient documents from the original text; deleting the sentences that contain characters which cannot be represented in UTF-8, thereby generating the cleaned text; and adding a space after each character in the cleaned text, the result serving as the corpus for training the language model.
[0072] Step 102, training on the corpus to generate a language model;
[0073] Step 103, using the language model to perform unsupervised word s...
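The preprocessing described in Step 101 can be illustrated with a minimal Python sketch. It is not the patent's implementation. The function names, the `is_catalog_line` predicate (the text does not say how the table of contents is recognized), the sentence delimiters, and the treatment of the U+FFFD replacement character as "not representable" are all assumptions made for illustration.

```python
import re
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    # Assumption: sentences end at 。！？ or at a line break.
    parts = re.findall(r"[^。！？\n]+[。！？]?", text)
    return [p.strip() for p in parts if p.strip()]


def is_utf8_representable(sentence: str) -> bool:
    # A Python str fails to encode as UTF-8 when it holds lone surrogates.
    # Assumption: U+FFFD (left behind by a lossy decode) is also rejected.
    if "\ufffd" in sentence:
        return False
    try:
        sentence.encode("utf-8")
    except UnicodeEncodeError:
        return False
    return True


def space_characters(sentence: str) -> str:
    # Add a space after each character, skipping existing whitespace.
    return "".join(ch + " " for ch in sentence if not ch.isspace())


def build_corpus(raw_text: str,
                 is_catalog_line: Callable[[str], bool]) -> List[str]:
    # 1. Drop table-of-contents lines (hypothetical caller-supplied predicate).
    body = [ln for ln in raw_text.splitlines() if not is_catalog_line(ln)]
    # 2. Split into sentences and drop those with unrepresentable characters.
    sentences = split_sentences("\n".join(body))
    cleaned = [s for s in sentences if is_utf8_representable(s)]
    # 3. Space-separate the characters of each remaining sentence.
    return [space_characters(s) for s in cleaned]
```

Each returned string is one space-separated sentence. Whether the language model in Step 102 expects one sentence per line or a single stream is not stated in the text, so the list is left for the caller to join as needed.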