Traditional Chinese medicine literature content analysis method and device

A content analysis technology for literature, applied in the field of natural language sequence labeling and retrieval. It addresses two problems: sequence labeling tasks are cumbersome, and training data for neural networks is difficult to obtain. It improves labeling effectiveness and readability, optimizes dependencies, and improves the final result.

Pending Publication Date: 2022-05-06
PEKING UNIV +1
Cites: 0; cited by: 1

AI Technical Summary

Problems solved by technology

Sequence labeling tasks are usually cumbersome and require domain expertise, so it is difficult to obtain the large amount of training data that neural network training needs.

Method used

A large-scale pre-trained language model (BERT) is pre-trained on unsupervised classical Chinese text and combined with a conditional random field (CRF) to form a sequence labeling model, which is then trained on labeled traditional Chinese medicine literature content analysis data.


Examples


Embodiment Construction

[0029] The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on these embodiments without creative effort fall within the protection scope of the present invention.

[0030] This example of the present invention uses a dataset built from several years of expert content analysis and labeling of texts in the Huangdi Neijing. Those skilled in the art should understand that other candidate information sets and question sets may also be used in a specific implementation.

[0031] Specifically, this example comes from four chapters of Jiazi Sui, Yi Chou Sui, Bing Yin Sui an...



Abstract

The invention discloses a method and device for analyzing the content of traditional Chinese medicine (TCM) literature. The method comprises the following steps: preprocessing acquired classical Chinese text to obtain unsupervised pre-training data, which is used to pre-train a selected large-scale language model, BERT; combining the pre-trained BERT model with a conditional random field (CRF) model to obtain a sequence labeling model; training the sequence labeling model on labeled TCM literature content analysis data; splitting each paragraph of the TCM literature to be analyzed into clauses, inputting the clauses into the sequence labeling model to obtain an encoding sequence for each clause, and generating from that encoding sequence the probability distribution over the tags the clause may belong to; inputting the sequence of per-clause probability distributions into the CRF model to obtain the probability of each possible tag sequence for the clauses; and selecting the tag sequence with the highest probability as the prediction result, merging adjacent clauses predicted to have the same tag, and joining the paragraphs of the literature to obtain the content analysis result of the TCM literature.
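The CRF step in the abstract, choosing the most probable tag sequence over a paragraph's clauses, can be illustrated with standard linear-chain CRF decoding. This is a minimal sketch under stated assumptions, not the patent's implementation: it assumes each clause already has a vector of per-tag log-scores (for example, logs of the tag probability distribution derived from BERT's clause encoding), plus a learned tag-transition matrix and start scores. The names `viterbi_decode`, `sequence_score`, `log_partition`, and `Matrix` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical alias: rows are clauses (or source tags), columns are tags.
Matrix = Sequence[Sequence[float]]


def sequence_score(emissions: Matrix, transitions: Matrix,
                   start: Sequence[float], tags: Sequence[int]) -> float:
    """Unnormalized log-score of one tag sequence over a paragraph's clauses."""
    score = start[tags[0]] + emissions[0][tags[0]]
    for t in range(1, len(tags)):
        score += transitions[tags[t - 1]][tags[t]] + emissions[t][tags[t]]
    return score


def _logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v))) over a list of log-scores."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_partition(emissions: Matrix, transitions: Matrix,
                  start: Sequence[float]) -> float:
    """Forward algorithm: log of the summed exp(score) over all tag sequences."""
    n_tags = len(start)
    alpha = [start[j] + emissions[0][j] for j in range(n_tags)]
    for t in range(1, len(emissions)):
        alpha = [
            _logsumexp([alpha[i] + transitions[i][j] for i in range(n_tags)])
            + emissions[t][j]
            for j in range(n_tags)
        ]
    return _logsumexp(alpha)


def viterbi_decode(emissions: Matrix, transitions: Matrix,
                   start: Sequence[float]) -> Tuple[List[int], float]:
    """Return the highest-scoring tag sequence and its log-probability."""
    n_tags = len(start)
    best = [start[j] + emissions[0][j] for j in range(n_tags)]
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        prev = best
        # For each current tag j, remember the best previous tag i.
        pointers = [max(range(n_tags), key=lambda i: prev[i] + transitions[i][j])
                    for j in range(n_tags)]
        best = [prev[pointers[j]] + transitions[pointers[j]][j] + emissions[t][j]
                for j in range(n_tags)]
        backpointers.append(pointers)
    # Trace the best path backwards from the best final tag.
    last = max(range(n_tags), key=lambda j: best[j])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    # Normalizing by the partition function turns the score into a probability.
    return path, best[last] - log_partition(emissions, transitions, start)


if __name__ == "__main__":
    probs = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]]  # illustrative values only
    emissions = [[math.log(p) for p in row] for row in probs]
    transitions = [[0.0, -2.0], [-2.0, 0.0]]  # penalize switching tags
    path, log_p = viterbi_decode(emissions, transitions, [0.0, 0.0])
    print(path, math.exp(log_p))
```

With the illustrative switching penalty of -2.0, the decoder keeps all three clauses on tag 0 even though the third clause alone favors tag 1. This is the kind of dependency between neighboring clauses that a CRF layer adds on top of independent per-clause predictions.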

Description

Technical field

[0001] The invention belongs to the field of natural language sequence labeling and retrieval and relates to a method for analyzing the content of traditional Chinese medicine (TCM) literature based on a large-scale pre-trained classical Chinese language model. The method uses a large-scale pre-trained language model and a conditional random field to analyze the content of TCM-related texts: it segments the text into passages and labels each passage with an attribute tag, such as the five movements and six qi of the year, related functions, disease manifestations, symptoms, or diagnosis and treatment methods.

Background art

[0002] With the development of machine learning and artificial intelligence, machines have achieved excellent performance on many natural language processing tasks. In repetitive work that requires a great deal of manpower in particular, machines have achieved excellent results and saved people a lot of time. The machine sequence ...
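The segmentation described here, splitting text into passages and tagging each passage, corresponds to the abstract's pre- and post-processing: split each paragraph into clauses, tag the clauses, merge adjacent clauses that share a tag, and join the paragraphs. The following is a minimal sketch under assumptions. The clause delimiter set, the function names `split_clauses`, `merge_segments`, and `analyze_document`, and the `tagger` callable (a stand-in for the trained BERT+CRF model) are all hypothetical. The sketch also assumes segments are not merged across paragraph boundaries, which the text shown does not specify.

```python
import re
from typing import Callable, List, Tuple

# Hypothetical clause delimiters: full-width punctuation common in punctuated
# classical Chinese editions. The patent text shown does not list them.
_CLAUSE_BOUNDARY = re.compile(r"(?<=[，。；：！？])")


def split_clauses(paragraph: str) -> List[str]:
    """Split after each delimiter, keeping the punctuation on its clause."""
    return [c for c in _CLAUSE_BOUNDARY.split(paragraph) if c.strip()]


def merge_segments(clauses: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Join runs of adjacent clauses that share a predicted tag."""
    if len(clauses) != len(tags):
        raise ValueError("need exactly one tag per clause")
    segments: List[Tuple[str, str]] = []
    for clause, tag in zip(clauses, tags):
        if segments and segments[-1][0] == tag:
            segments[-1] = (tag, segments[-1][1] + clause)
        else:
            segments.append((tag, clause))
    return segments


def analyze_document(paragraphs: List[str],
                     tagger: Callable[[List[str]], List[str]]) -> List[Tuple[str, str]]:
    """Tag each paragraph's clauses, merge runs, then concatenate paragraphs."""
    result: List[Tuple[str, str]] = []
    for paragraph in paragraphs:
        clauses = split_clauses(paragraph)
        if clauses:
            result.extend(merge_segments(clauses, tagger(clauses)))
    return result


if __name__ == "__main__":
    # Toy tagger with hypothetical tag names; a real system would run BERT+CRF.
    toy = lambda cs: ["year_qi" if i == 0 else "symptom" for i in range(len(cs))]
    print(analyze_document(["甲，乙。丙；", "丁。"], toy))
```

Keeping the punctuation attached to each clause means the merged segments reproduce the original text exactly when concatenated, which keeps the labeled output readable.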


Application Information

Patent type and authority: Application (China)
IPC(8): G06F40/211; G06F40/237
CPC: G06F40/211; G06F40/237
Inventors: 冯岩松 (Feng Yansong), 杨威 (Yang Wei), 胡楠 (Hu Nan), 贾爱霞 (Jia Aixia)
Owner: PEKING UNIV