Natural language processing method and system for clinical phenotype information of infertility

A technology combining natural language processing and infertility, applied in the field of natural language processing methods and systems for infertility clinical phenotype information. It addresses the difficulty of rapidly matching infertility clinical phenotype information to a phenotype ontology when that information has complex and diverse formats, irregular grammar, and similar problems.

Pending Publication Date: 2021-05-07
CARRIER GENE TECH SUZHOU CO LTD
Cites: 5; cited by: 2

AI Technical Summary

Problems solved by technology

However, most of the infertility clinical phenotype information that medical practitioners enter into medical information platforms is written in non-standardized language: formats are complex and diverse, multiple languages are often mixed, grammar is non-standard, abbreviations or common names replace standard terms, erroneous information is entered, symbols and other noise are mixed into the text, and there is no unified standard.
This makes it difficult to rapidly match infertility clinical phenotype information to the phenotype ontology.



Examples


Embodiment 1

[0020] As shown in Figure 1, the present invention provides a Chinese segmentation and matching method for infertility clinical phenotype information.

[0021] Step 101: perform natural language preprocessing on the original Chinese clinical phenotype string to obtain the preprocessed initial Chinese clinical phenotype string.

[0022] Most infertility clinical phenotype information entered by medical practitioners is written in non-standardized language. It contains complex formats (for example, "OR 35, 18 MII"), mixed languages (for example, "Full external detection + Microdeletion / Microduplication" and "day3 a 7C2 transplantation failed to conceive"), irregular grammar (for example, "prostate ca"), abbreviations or common names in place of standard terms (for example, "RSA", "IVF", "PCOS"), erroneous entries (for example, "CVAVD", "CUAVD"), and stray symbols in the text (for example, "No sperm?.", "FSH 55↑", "<1mL"), all of which increase the difficulty of matchi...
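Two of the problems listed above, abbreviations standing in for standard terms and stray symbols mixed into the text, can be illustrated with a minimal Python sketch. The abbreviation table (`ABBREVIATIONS`), the symbol set (`NOISE_SYMBOLS`), and the function name `normalize_phenotype` are hypothetical and are not taken from the patent; the actual substitution rules are not disclosed in this excerpt.

```python
import re

# Hypothetical abbreviation table (illustrative only, not the patent's mapping).
ABBREVIATIONS = {
    "RSA": "recurrent spontaneous abortion",
    "IVF": "in vitro fertilization",
    "PCOS": "polycystic ovary syndrome",
}

# Hypothetical set of stray symbols: ASCII and full-width question marks, arrows.
NOISE_SYMBOLS = re.compile(r"[?？↑↓]+")


def normalize_phenotype(text: str) -> str:
    """Strip stray symbols and expand known abbreviations in one raw entry."""
    text = NOISE_SYMBOLS.sub("", text)
    for abbr, term in ABBREVIATIONS.items():
        # Match the abbreviation only when it is not part of a longer ASCII word.
        # A plain \b would fail next to Chinese characters, because Python's
        # Unicode regex treats them as word characters.
        pattern = rf"(?<![A-Za-z]){re.escape(abbr)}(?![A-Za-z])"
        text = re.sub(pattern, term, text)
    # Collapse whitespace left behind by the removals.
    return re.sub(r"\s+", " ", text).strip()
```

With this lookaround design, "PCOS" is expanded even when it is directly followed by Chinese characters, while a token such as "IVFX" is left unchanged.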

Embodiment 2

[0060] As shown in Figure 2, the present invention provides an English segmentation and matching method for infertility clinical phenotype information.

[0061] Step 201: perform natural language preprocessing on the original Chinese clinical phenotype string to obtain the preprocessed initial English clinical phenotype string.

[0062] Natural language preprocessing of the original Chinese clinical phenotype string, which generates the preprocessed initial English clinical phenotype string, can be implemented as follows: convert the encoding of all original Chinese clinical phenotype strings to UTF-8; convert all full-width symbols to half-width symbols; convert Arabic numerals to English numerals; eliminate meaningless strings such as "focus on", "none", "unchecked", "normal", "past medical history", "specific manifestations", "require inspection", and "see attachment for details"; replace irregular cl...
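The first preprocessing operations in [0062] (unified Unicode/UTF-8 text, full-width to half-width conversion, and removal of meaningless strings) might be sketched as follows. This assumes the raw input arrives as bytes in a known source encoding and that comma-separated segments are the unit at which meaningless phrases are dropped. The stop-phrase set `MEANINGLESS`, the function `preprocess`, and the segment-level filtering are illustrative assumptions; the numeral conversion and the truncated replacement step are not modeled. Note that NFKC also folds other compatibility characters, which a real system may or may not want.

```python
import unicodedata

# Illustrative stop phrases based on the examples in [0062]; the full list is not given here.
MEANINGLESS = {"focus on", "none", "unchecked", "normal", "past medical history",
               "see attachment for details"}


def preprocess(raw: bytes, encoding: str = "utf-8") -> str:
    # Decode from the source encoding (e.g. GBK) so later steps work on Unicode text;
    # the result can then be written back out as UTF-8.
    text = raw.decode(encoding, errors="replace")
    # NFKC folds full-width letters, digits, punctuation and the ideographic space
    # to half-width forms; Chinese marks such as "。" and "、" are left unchanged.
    text = unicodedata.normalize("NFKC", text)
    # Drop comma-separated segments that consist only of a meaningless phrase.
    segments = [s.strip() for s in text.split(",")]
    kept = [s for s in segments if s and s.lower() not in MEANINGLESS]
    return ", ".join(kept)
```

Filtering whole segments rather than deleting substrings avoids corrupting terms such as "abnormal" that happen to contain a stop phrase.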

Embodiment 3

[0086] As shown in Figure 3, an embodiment of the present invention provides the overall flow and weighting rules of the natural language processing method for infertility clinical phenotypes.

[0087] In the overall flow shown in Figure 3, the original Chinese clinical phenotype string undergoes natural language processing, splitting, exact matching, and fuzzy matching, and the following are output: Chinese independent strings that exactly match the Chinese ontology dictionary (step 304); English independent strings that exactly match the English ontology dictionary (step 304); Chinese split strings that exactly match the Chinese ontology dictionary (step 306); English split strings that exactly match the English ontology dictionary (step 306); one or more ontologies in the Chinese ontology dictionary that best match the Chinese independent string (step 307); and one or more...
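The exact-matching branch of this flow (steps 304 and 306) together with a weighting rule can be sketched in Python as below. The ontology entries, their placeholder IDs, the weight values, and the reading of "field splitting" as splitting on "+", "/", and "and" are all hypothetical; the patent's actual weighting rules are not reproduced in this excerpt, and fuzzy matching (step 307) is omitted.

```python
import re

# Hypothetical ontology dictionary: normalized surface form -> placeholder ontology ID.
ONTOLOGY = {
    "azoospermia": "ONT:0001",
    "oligospermia": "ONT:0002",
    "polycystic ovary syndrome": "ONT:0003",
}

# Hypothetical weights: an exact hit on a whole independent string outranks
# an exact hit on a fragment produced by field splitting.
WEIGHTS = {"independent": 1.0, "split": 0.8}


def independent_strings(text: str) -> list[str]:
    """Punctuation splitting on ASCII and Chinese separators. ASCII '.' is not a
    separator, so decimal values such as '5.5' are not broken apart."""
    return [s.strip() for s in re.split(r"[,;，；。、]+", text) if s.strip()]


def split_fields(s: str) -> list[str]:
    """Assumed field splitting: break an independent string on '+', '/', or 'and'."""
    return [f.strip() for f in re.split(r"\s*[+/]\s*|\s+and\s+", s) if f.strip()]


def match(text: str) -> list[tuple[str, float]]:
    """Return (ontology ID, weight) pairs, highest weight first."""
    scores: dict[str, float] = {}
    for ind in independent_strings(text.lower()):
        if ind in ONTOLOGY:  # exact match on the whole independent string
            hits = [(ONTOLOGY[ind], WEIGHTS["independent"])]
        else:  # exact match on the fragments produced by field splitting
            hits = [(ONTOLOGY[f], WEIGHTS["split"])
                    for f in split_fields(ind) if f in ONTOLOGY]
        for oid, weight in hits:
            # Assumed weighting rule: keep the best weight seen per ontology ID.
            scores[oid] = max(scores.get(oid, 0.0), weight)
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

For example, `match("Azoospermia; oligospermia + polycystic ovary syndrome")` returns `[("ONT:0001", 1.0), ("ONT:0002", 0.8), ("ONT:0003", 0.8)]`: the first independent string hits the dictionary directly, while the second matches only after field splitting.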



Abstract

The invention provides a natural language processing method and system for infertility clinical phenotype information. An original Chinese clinical phenotype string is converted into initial, independent, and split clinical phenotype strings in Chinese and English through natural language preprocessing, punctuation-based splitting, and field splitting. Based on pre-established Chinese and English ontology dictionaries, exact matching and fuzzy matching are performed on the initial, independent, and split strings, and a weighting rule finally outputs one or more ontologies matched in the Chinese and English ontology dictionaries. Fuzzy matching is computed through semantic similarity. The invention further provides a natural language processing system and a medium. The system comprises a reading module, a conversion module, a splitting module, a matching module, and an output module. The method solves the problem of rapidly matching Chinese clinical phenotype information to the ontology dictionary and facilitates whole-exome sequencing analysis of diseases such as infertility.
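The abstract states that fuzzy matching is computed through semantic similarity but does not describe the similarity model. As a purely lexical stand-in (not semantic, and not the patent's method), the Python sketch below ranks dictionary terms by cosine similarity of character bigrams, which works for both Chinese and English strings. The function names, the threshold of 0.5, and the top-k cutoff are arbitrary assumptions; a real system would replace `cosine(bigrams(...))` with an embedding-based similarity.

```python
import math
from collections import Counter


def bigrams(s: str) -> Counter:
    """Character-bigram counts, ignoring case and spaces."""
    s = s.lower().replace(" ", "")
    if len(s) < 2:
        return Counter([s]) if s else Counter()
    return Counter(s[i:i + 2] for i in range(len(s) - 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def fuzzy_match(query: str, terms: list[str], k: int = 3,
                threshold: float = 0.5) -> list[tuple[str, float]]:
    """Return up to k dictionary terms scoring at least `threshold`, best first."""
    q = bigrams(query)
    scored = [(t, cosine(q, bigrams(t))) for t in terms]
    hits = [(t, s) for t, s in scored if s >= threshold]
    return sorted(hits, key=lambda ts: (-ts[1], ts[0]))[:k]
```

For the query "azoospermia", this ranks "azoospermia" first (1.0), then "asthenozoospermia" (about 0.71) ahead of "oligospermia" (about 0.67), and it returns nothing for "polycystic ovary syndrome", which shares no bigrams with the query.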

Description

Technical Field

[0001] The invention belongs to the field of computer processing of clinical phenotype information, and in particular relates to a natural language processing method and system for infertility clinical phenotype information.

Background

[0002] China has more than 40 million infertility patients, making infertility the third major disease category after tumors and cardiovascular diseases. With rising social pressure and worsening air and food pollution, the incidence of infertility has risen from 3.5% 20 years ago to 12.5% in 2016, exceeding 15% in some areas; that is, roughly one in every eight couples is affected by infertility. Research indicates that, in addition to physical, chemical, microbial, and other environmental factors, an individual's own genetic factors also have an important and profound influence on the occurrence of infertility.

[0003] With the wide application of high-throughput sequencing technology...


Application Information

Patent type & authority: Applications (China)
IPC(8): G06F16/33; G06F16/36; G06F40/242; G06F40/247; G06F40/289; G06F40/30
CPC: G06F16/3344; G06F16/367; G06F16/374; G06F40/242; G06F40/247; G06F40/289; G06F40/30
Inventors: 张晶 (Zhang Jing), 罗俊峰 (Luo Junfeng)
Owner: CARRIER GENE TECH SUZHOU CO LTD