Method for classifying medical terms through text classification of web search results

The technology combines web search with medical terminology classification. It addresses problems such as the difficulty of research on non-standard terms, complex text analysis, and Chinese spelling errors, and is described as providing a stable and efficient research method with good classification accuracy.

Pending Publication Date: 2021-11-23
上海基绪康生物科技有限公司

AI Technical Summary

Problems solved by technology

[0002] Research based on real-world evidence (RWE) has received continuous attention in recent years. It is considered a powerful research method that can provide researchers with real background information. At the same time, widely established electronic medical record systems make it possible for researchers to obtain a large amount of real case information. In most cases, however, important clinical information in real medical records is contained in free text, and most of the vocabulary used is non-standard or inconsistent with the researchers' design requirements. This inconsistency makes research based on such real-world information difficult. For example, from the perspective of clinical data analysis, it is necessary to know not only which drugs a patient is using but, more importantly, the indications of those drugs and their classification. Such problems arise often in RWE-based research.
[0003] In addition, in many non-English-speaking regions, medical vocabularies and terminology systems are not yet well developed or well organized, and natural language processing (NLP) technology for the local languages lags behind. Chinese clinical terms pose particular challenges. On the one hand, Chinese is a character-based language with no explicit boundaries between words. On the other hand, Chinese terms are easily misspelled or written as variants of the formal vocabulary, and enumerating all variants is almost impossible: a medical term generally contains three or more Chinese characters, and each character has ten or more variants. This further highlights the problem of classifying and recognizing medical terms in research on Chinese clinical information.
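
The enumeration argument above can be made concrete with a small calculation. This is only an illustrative sketch: the function name `min_variant_count` is hypothetical, and it assumes that variants of each character combine independently, which the source does not state explicitly.

```python
def min_variant_count(num_chars: int = 3, variants_per_char: int = 10) -> int:
    """Lower bound on written forms of a term, assuming each character
    varies independently (an assumption made for illustration)."""
    return variants_per_char ** num_chars


# A three-character term with ten variants per character
print(min_variant_count())      # 1000
# A five-character term under the same assumption
print(min_variant_count(5))     # 100000
```

Even at the lower bounds quoted in the text, a single short term already has about a thousand possible written forms, which is why a lookup table of variants does not scale.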
[0004] Although parsing non-English text such as Chinese is complex, some online search engines, such as Google and Baidu, handle it very well. In addition to traditional NLP methods, they adopt newer strategies, such as creating large databases of named entities, building semantic networks, and using deep learning to continuously improve search performance and learn from user input. These technologies allow a search engine to infer the searcher's intended meaning and return the required results even in the presence of misspellings or variants. Although these methods represent a more systematic and comprehensive approach to NLP problems, they require substantial resources. Most research institutions and hospitals do not have the technology and resources to meet all RWE research needs, so research barriers remain.
[0005] In response to this problem, an effective and flexible framework is proposed here: use online search engines to search for the terms to be classified, and then dynamically classify these terms at a given level according to the search results, because terms with similar search results are likely to belong to the same category. How can this classification be automated? Previous studies have pointed out that machine learning algorithms classify text well, so they are also suitable for classifying texts such as search results. However, most previously studied classification methods were designed for English text and focus on a limited number of preset categories, so they are not well suited to Chinese RWE text. In addition, a classification method for RWE text needs to be highly flexible, mainly in the following three aspects:

Examples

Embodiment Construction

[0045] The technical solutions in the embodiments of the present invention are described clearly and completely below. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by persons of ordinary skill in the art on the basis of these embodiments, without creative effort, fall within the protection scope of the present invention.

[0046] To address the shortcomings of the prior art in related fields, the present invention proposes a method for classifying medical terms through text classification of web search results. The method is a software pipeline comprising two major components:

[0047] 1) A feature generator, which collects descriptive words related to the term by performing text segmentation on the search results of the term to be classified in common search engines;

[0048] 2) A learning mechanism that uses the...
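
The feature generator in component 1) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the search results are hard-coded stand-in snippets rather than output from a real search engine, the names `segment` and `generate_features` are hypothetical, and overlapping character bigrams are used as a naive substitute for a proper Chinese word segmenter.

```python
import re
from collections import Counter

# Runs of common CJK ideographs and runs of Latin letters
CJK_RUN = re.compile(r"[\u4e00-\u9fff]+")
LATIN_WORD = re.compile(r"[A-Za-z]+")


def segment(text: str) -> list[str]:
    """Naive stand-in for text segmentation (an assumption): overlapping
    bigrams for each CJK run, then lowercased Latin words."""
    tokens = []
    for run in CJK_RUN.findall(text):
        if len(run) == 1:
            tokens.append(run)
        else:
            tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
    tokens.extend(word.lower() for word in LATIN_WORD.findall(text))
    return tokens


def generate_features(snippets: list[str]) -> Counter:
    """Count descriptive tokens across all result snippets for one term."""
    counts = Counter()
    for snippet in snippets:
        counts.update(segment(snippet))
    return counts


# Hypothetical result snippets for one term; a real pipeline would fetch
# these from a search engine.
snippets = ["阿司匹林 是 解热镇痛药", "Aspirin 用于 镇痛"]
features = generate_features(snippets)
print(features.most_common(3))
```

The resulting token counts are the descriptive vocabulary that a downstream classifier would consume. In practice a dedicated segmenter would likely replace the bigram heuristic.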

Abstract

The invention relates to a method for classifying medical terms through text classification of web search results. The method comprises the following steps: 1) web search and feature generation: performing text segmentation on the search results of the terms to be classified in a common search engine, so as to collect descriptive vocabulary related to the terms; and 2) model training and term classification: building a model for term classification from the collected features using a machine learning algorithm. Researchers can readily control the implementation process so that it meets the requirements of their research design, and the granularity and scope of classification are not limited by predefined categories. Irregular terms in real-world research evidence can be classified in real time with little prior knowledge. The method gives clinical researchers a time-saving and reliable way to identify important classification information, and gives clinical research institutions an economical and efficient research approach.
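
Step 2) of the abstract can be illustrated with a small sketch. The source only says "a machine learning algorithm", so the multinomial Naive Bayes classifier below is an assumption chosen for brevity. The class name `NaiveBayesTermClassifier`, the category labels, and the tiny training set are all hypothetical, and the token lists stand in for features collected from search results.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesTermClassifier:
    """Multinomial Naive Bayes over token lists with Laplace smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha
        self.class_docs = Counter()               # training terms per class
        self.token_counts = defaultdict(Counter)  # token counts per class
        self.vocab = set()

    def fit(self, feature_lists, labels):
        for tokens, label in zip(feature_lists, labels):
            self.class_docs[label] += 1
            self.token_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_docs.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_docs.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for tok in tokens:
                if tok in self.vocab:              # ignore unseen tokens
                    score += math.log((counts[tok] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training data: search-result tokens for terms with known classes
train_features = [
    ["pain", "fever", "analgesic", "tablet"],
    ["pain", "relief", "analgesic", "inflammation"],
    ["infection", "bacteria", "antibiotic", "capsule"],
    ["bacteria", "antibiotic", "resistance", "infection"],
]
train_labels = ["analgesic", "analgesic", "antibiotic", "antibiotic"]

clf = NaiveBayesTermClassifier().fit(train_features, train_labels)
# Tokens gathered for a new, possibly misspelled term
print(clf.predict(["fever", "pain", "tablet"]))
print(clf.predict(["bacteria", "capsule"]))
```

Because the categories come from the training labels rather than a fixed taxonomy, a researcher can change the classification level by relabeling a small training set and refitting, which matches the flexibility the abstract describes.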

Description

Technical field

[0001] The invention relates to the technical field of medical classification and recognition, in particular to a method for classifying medical terms through text classification of web search results.

Background

[0002] Research based on real-world evidence (RWE) has received continuous attention in recent years. It is considered a powerful research method that can provide researchers with real background information. At the same time, widely established electronic medical record systems make it possible for researchers to obtain a large amount of real case information. In most cases, however, important clinical information in real medical records is contained in free text, and most of the vocabulary used is non-standard or inconsistent with the researchers' design requirements. This inconsistency makes research based on such real-world information difficult. For example, from the perspective of clinical data analysis, it...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/35, G06F16/953, G06F40/216
CPC: G06F16/353, G06F16/953, G06F40/216
Inventor: 韦嘉付宁
Owner: 上海基绪康生物科技有限公司