Text recognition method and system suitable for life science

A text recognition method and system for the life sciences. It addresses the problems that general-purpose models are not tailored to the corpus and applications being processed and analyzed, achieve low recognition rates, and therefore cannot be applied in practice. The method achieves fast and accurate matching.

Pending Publication Date: 2022-01-28
迪普佰奥生物科技(上海)股份有限公司

AI Technical Summary

Problems solved by technology

General-purpose models are often not tailored to the corpus and applications to be processed and analyzed. This results in low recognition rates, so such models cannot be applied in real scenarios.


Examples

Embodiment

[0037] In the present invention, a team of biology professionals first annotates the specialist texts. Natural language processing is then used to perform deep learning on the annotated life science experimental methods so that context and semantics are understood, and the method is trained with a self-supervised model and automatic data cleaning. This resolves the dilemma that conventional technology cannot be applied to the life sciences, and it also addresses the difficult manual searching, heavy workload, high cost, and low efficiency of traditional approaches. The invention can quickly and accurately identify the experimental techniques in a text, with an accuracy rate of up to 95%.

[0038] As shown in Figure 1, the recognition steps are:

[0039] Step 1. Use the fine-tuned BERT model to identify the passages in the literature that describe the implementation method;

[0040] Step 2. Perform a series of preprocessing on the recognized paragraphs, such as sent...


Abstract

The invention provides a text recognition method and system suitable for the life sciences. The method comprises five steps: (1) performing semantic training on all documents in a life science literature database using the BERT pre-training method to obtain a pre-trained model for life science literature, and using that model to recognize the paragraphs related to life science in the documents; (2) preprocessing the recognized paragraphs to obtain the text to be recognized; (3) representing each word in the text to be recognized as a vector using a Word2vec model; (4) applying a weighted average and principal component analysis to the resulting word vectors, sentence by sentence, to obtain the corresponding target vectors; and (5) comparing the vectorized texts by cosine similarity to obtain the texts that meet preset requirements. The method addresses the difficult searching, heavy workload, high cost, and low efficiency of identifying experimental methods manually.
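Steps 3 to 5 of the abstract can be illustrated with a minimal sketch in plain Python. Several things here are assumptions, not details from the patent: the toy 3-dimensional vectors stand in for real Word2vec output, the weight a / (a + p(w)) is one common frequency-based weighting, and the principal component step is interpreted as removing each sentence vector's projection onto the dominant direction of the (uncentered) sentence matrix. The names `WORD_VECTORS`, `WORD_PROB`, `weighted_average`, `remove_first_component`, and `match` are hypothetical.

```python
import math

# Hypothetical 3-d vectors standing in for Word2vec output, and hypothetical
# unigram probabilities p(w); a real system would derive both from its corpus.
WORD_VECTORS = {
    "pcr": [0.9, 0.1, 0.0],
    "amplification": [0.8, 0.2, 0.1],
    "western": [0.1, 0.9, 0.0],
    "blot": [0.0, 0.8, 0.2],
    "the": [0.3, 0.3, 0.3],
}
WORD_PROB = {"pcr": 0.01, "amplification": 0.01, "western": 0.01,
             "blot": 0.01, "the": 0.5}
DIM = 3


def weighted_average(tokens, a=1e-3):
    """Step 4a: average word vectors with weight a / (a + p(w)) (assumed scheme)."""
    total, weight_sum = [0.0] * DIM, 0.0
    for tok in tokens:
        vec = WORD_VECTORS.get(tok)
        if vec is None:  # skip out-of-vocabulary words
            continue
        w = a / (a + WORD_PROB.get(tok, 0.0))
        total = [t + w * x for t, x in zip(total, vec)]
        weight_sum += w
    return [t / weight_sum for t in total] if weight_sum else total


def first_principal_direction(rows, iterations=100):
    """Dominant direction of the uncentered rows, found by power iteration."""
    dim = len(rows[0])
    direction = [1.0 / (j + 1) for j in range(dim)]  # arbitrary non-zero start
    for _ in range(iterations):
        proj = [sum(x * d for x, d in zip(row, direction)) for row in rows]
        nxt = [sum(p * row[j] for p, row in zip(proj, rows)) for j in range(dim)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            return [0.0] * dim
        direction = [x / norm for x in nxt]
    return direction


def remove_first_component(vectors):
    """Step 4b: subtract each vector's projection onto the dominant direction."""
    u = first_principal_direction(vectors)
    cleaned = []
    for v in vectors:
        p = sum(x * y for x, y in zip(v, u))
        cleaned.append([x - p * y for x, y in zip(v, u)])
    return cleaned


def cosine_similarity(u, v):
    """Step 5: cosine of the angle between two vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def match(query, candidates, threshold=0.8):
    """Return (candidate index, score) pairs scoring at least `threshold`."""
    vectors = remove_first_component(
        [weighted_average(s) for s in [query] + candidates])
    scores = [(i, cosine_similarity(vectors[0], v))
              for i, v in enumerate(vectors[1:])]
    return sorted((s for s in scores if s[1] >= threshold),
                  key=lambda s: s[1], reverse=True)


ranked = match(["pcr", "amplification"],
               [["pcr", "amplification"], ["western", "blot"], ["the", "pcr"]])
```

The threshold of 0.8 is only a placeholder for the "preset requirements" mentioned in the abstract; the patent text shown here does not state a value.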

Description

Technical field

[0001] The invention relates to the technical field of text recognition, in particular to a text recognition method and system suitable for the life sciences.

Background

[0002] In the life sciences, most useful information exists as text in monographs, papers, conference proceedings, and journals. Effectively extracting this information and turning it into practical applications for researchers in basic research is of great value and significance.

[0003] In reality, however, the diversity and complexity of living organisms make the life sciences highly specialized. Experimental methods are buried in massive amounts of text, and the large volume of specialist vocabulary and terminology in biological experimental methods prevents conventional recognition technology from working effectively in the life sciences.

[0004] Experimental methods are usually composed of multi...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/295, G06F40/211, G06F40/216, G06F40/30, G06K9/62
CPC: G06F40/295, G06F40/211, G06F40/216, G06F40/30, G06F18/2135, G06F18/22
Inventor: 谢伟
Owner: 迪普佰奥生物科技(上海)股份有限公司