475 results about "Sequence labeling" patented technology

In machine learning, sequence labeling is a type of pattern recognition task that involves the algorithmic assignment of a categorical label to each member of a sequence of observed values. A common example of a sequence labeling task is part of speech tagging, which seeks to assign a part of speech to each word in an input sentence or document. Sequence labeling can be treated as a set of independent classification tasks, one per member of the sequence. However, accuracy is generally improved by making the optimal label for a given element dependent on the choices of nearby elements, using special algorithms to choose the globally best set of labels for the entire sequence at once.
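
The contrast between independent per-element classification and joint decoding can be sketched with a toy part-of-speech example. This is a minimal illustration using Viterbi decoding, one common algorithm for finding the globally best label sequence. The tag set, all scores, and the names `independent`, `viterbi`, and `UNK` are assumptions made up for this sketch and are not taken from any patent listed here.

```python
UNK = -10.0  # assumed score for a word a tag has never emitted

def independent(obs, tags, emit):
    """Label each word on its own, ignoring the labels of its neighbours."""
    return [max(tags, key=lambda t: emit[t].get(w, UNK)) for w in obs]

def viterbi(obs, tags, start, trans, emit):
    """Return the highest-scoring tag sequence under additive (log-style) scores."""
    # best[i][t] is the best score of any path that ends in tag t at position i
    best = [{t: start[t] + emit[t].get(obs[0], UNK) for t in tags}]
    back = []  # back[i][t] is the previous tag on that best path
    for word in obs[1:]:
        scores, pointers = {}, {}
        for t in tags:
            prev = max(tags, key=lambda p: best[-1][p] + trans[p][t])
            scores[t] = best[-1][prev] + trans[prev][t] + emit[t].get(word, UNK)
            pointers[t] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag
    path = [max(tags, key=lambda t: best[-1][t])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]

# Toy model (all numbers are illustrative assumptions)
TAGS = ["DET", "NOUN", "VERB"]
START = {"DET": -0.5, "NOUN": -1.5, "VERB": -2.0}
TRANS = {
    "DET": {"DET": -5.0, "NOUN": -0.2, "VERB": -4.0},
    "NOUN": {"DET": -3.0, "NOUN": -2.0, "VERB": -0.5},
    "VERB": {"DET": -1.0, "NOUN": -1.5, "VERB": -3.0},
}
EMIT = {
    "DET": {"the": -0.1},
    "NOUN": {"can": -2.0, "rusts": -3.0},
    "VERB": {"can": -1.0, "rusts": -1.0},
}
SENTENCE = ["the", "can", "rusts"]

print(independent(SENTENCE, TAGS, EMIT))            # ['DET', 'VERB', 'VERB']
print(viterbi(SENTENCE, TAGS, START, TRANS, EMIT))  # ['DET', 'NOUN', 'VERB']
```

In this toy model the independent classifier tags "can" as a verb because that is its most likely tag in isolation, while joint decoding prefers a noun after the determiner.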

Methods for detecting genome-wide sequence variations associated with a phenotype

Inactive | US20040002090A1 | Microbiological testing/measurement | Fermentation | Sub populations | Genetic risk factor
The invention provides methods for determining genome-wide sequence variations associated with a phenotype of a species in a hypothesis-free manner. In the methods of the invention, a set of restriction fragments for each of a sub-population of individuals having the phenotype is generated by digesting nucleic acids from the individual using one or more different restriction enzymes. A set of restriction sequence tags for the individual is then determined from the set of restriction fragments. The restriction sequence tags for the sub-population of organisms are compared and grouped into one or more groups, each of which comprises restriction sequence tags that comprise homologous sequences. The obtained one or more groups of restriction sequence tags identify the sequence variations associated with the phenotype. The methods of the invention can be used for, e.g., analysis of large numbers of sequence variants in many patient samples to identify subtle genetic risk factors.
Owner:SOLEXA

Noninvasive Diagnosis of Fetal Aneuploidy by Sequencing

Disclosed is a method to achieve digital quantification of DNA (i.e., counting differences between identical sequences) using direct shotgun sequencing followed by mapping to the chromosome of origin and enumeration of fragments per chromosome. The preferred method uses massively parallel sequencing, which can produce tens of millions of short sequence tags in a single run, enabling a sampling that can be statistically evaluated. By counting the number of sequence tags mapped to a predefined window in each chromosome, the over- or under-representation of any chromosome in maternal plasma DNA contributed by an aneuploid fetus can be detected. This method does not require the differentiation of fetal versus maternal DNA. The median count of autosomal values is used as a normalization constant to account for differences in the total number of sequence tags, and is used for comparison between samples and between chromosomes.
Owner:THE BOARD OF TRUSTEES OF THE LELAND STANFORD JUNIOR UNIV
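
The counting and normalization step described in this abstract can be sketched as follows. This is a simplified illustration, not the patented procedure: it treats each whole chromosome as a single window, and the function name `normalized_tag_counts`, the chromosome labels, and the tag counts are all assumptions invented for the example.

```python
from collections import Counter
from statistics import median

SEX_CHROMOSOMES = ("chrX", "chrY")  # assumed naming convention

def normalized_tag_counts(tag_chromosomes):
    """Count mapped tags per chromosome, then divide by the median autosomal count."""
    counts = Counter(tag_chromosomes)
    autosomal = [n for chrom, n in counts.items() if chrom not in SEX_CHROMOSOMES]
    norm = median(autosomal)  # normalization constant for this sample
    return {chrom: n / norm for chrom, n in counts.items()}

# Each list element stands for one sequence tag and names the chromosome it mapped to.
sample = ["chr1"] * 100 + ["chr2"] * 100 + ["chr21"] * 150 + ["chrX"] * 50
ratios = normalized_tag_counts(sample)
print(ratios["chr21"])  # 1.5, i.e. over-represented relative to the median autosome
```

Dividing by a per-sample median makes the ratios comparable across samples that were sequenced to different depths.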

Selective terminal tagging of nucleic acids

A method is provided for adding a terminal sequence tag to nucleic acid molecules for use in RNA or DNA amplification. The method involves contacting the nucleic acid molecules with a mixture of oligonucleotides, each having a sequence tag template, a random sequence and a blocked 3′ terminus, under conditions such that the random sequence anneals with the nucleic acid molecules and the nucleic acid molecules are extended using the sequence tag template as template. For synthesis of RNA from DNA molecules having terminal sequence tags, the method includes forming DNA templates having a double stranded promoter sequence and synthesizing RNA from the DNA templates. For amplification of sequences from DNA molecules having terminal sequence tags, the method includes forming DNA templates by extension of one primer having a sequence that is complementary to the terminal sequence tag and another primer having a sequence that is derived from one of the DNA molecules.
Owner:EPICENT BIOTECH

Chinese medical knowledge atlas construction method based on deep learning

Active | CN106776711A | Easy to handle | Relationship Accurate and Comprehensive | Web data indexing | Semantic analysis | Knowledge unit | Healthcare associated
The invention relates to the technology of a knowledge atlas, and aims to provide a Chinese medical knowledge atlas construction method based on deep learning. The Chinese medical knowledge atlas construction method comprises the following steps: obtaining relevant data of a medical field from a data source; using a word segmentation tool to carry out word segmentation on unstructured data, and using an RNN (Recurrent Neural Network) to finish a sequence labeling task to identify entities related to medical care, so as to realize the extraction of knowledge units; carrying out feature vector construction on the entity, and utilizing the RNN to carry out sequence labeling and finish the identification of a relationship among the knowledge units; carrying out entity alignment, and then utilizing the extracted entities and the relationship between the entities to construct the knowledge atlas. According to the Chinese medical knowledge atlas construction method, a recurrent neural network is artfully used for extracting the knowledge units and identifying the relationship among the knowledge units so as to favorably finish the processing of the unstructured data. According to the Chinese medical knowledge atlas construction method, features suitable for the medical care field are put forward to carry out a training task of a network. Compared with general features, the features put forward by the method can better represent a medical entity, and therefore, the relationship among the extracted knowledge units can be more accurate and comprehensive.
Owner:ZHEJIANG UNIV

Single cell analysis using sequence tags

The invention provides a method of making measurements on individual cells of a population by forming reactors containing single cells and a predetermined number, usually one, of homogeneous sequence tags. In one aspect, the invention provides a method of making multiparameter measurements on individual cells of such a population by carrying out a polymerase cycling assembly (PCA) reaction to link their identifying nucleic acid sequences, such as sequence tag copies derived from a homogeneous sequence tag, to other cellular nucleic acids of interest, thereby forming fusion products. The fusion products of such PCA reactions are then sequenced and tabulated to generate multiparameter data for cells of the population.
Owner:ADAPTIVE BIOTECH

Method and system for automatically constructing knowledge maps for mass unstructured texts

The invention belongs to the technical field of computer software, and discloses a method and a system for automatically constructing knowledge maps for mass unstructured texts. The method comprises the steps of: abstracting a named entity recognition problem into a sequence labeling problem by giving a sentence and labeling each word in the sequence of sentences; designing effective features according to the training data, learning various classification models, and using trained classifiers to predict relationships; linking multiple existing knowledge to create a large-scale and unified knowledge network from the top; and capturing and integrating entity information from three online encyclopedias, open websites, related knowledge bases, or search engine logs. According to the method and the system for automatically constructing knowledge maps for mass unstructured texts, the construction speed of the knowledge maps can be greatly improved, the time efficiency is improved, and the human resource cost is reduced by more than 30%. In addition, the method and the system have better domain portability, and the construction of the knowledge map can be quickly implemented by only optimizing the entities and relationship extraction algorithms in the invention.
Owner:GLOBAL TONE COMM TECH

Detection and quantification of sample contamination in immune repertoire analysis

The invention is directed to methods for detecting and quantifying nucleic acid contamination in a tissue sample of an individual containing T cells and/or B cells, which is used for generating a sequence-based clonotype profile. In one aspect, the invention is implemented by measuring the presence and/or level of an endogenous or exogenous nucleic acid tag by which nucleic acid from an intended individual can be distinguished from that of unintended individuals. Endogenous tags include genetic identity markers, such as short tandem repeats, rare clonotypes or the like, and exogenous tags include sequence tags employed to determine clonotype sequences from sequence reads.
Owner:ADAPTIVE BIOTECH

System and method for generating subphrase queries

A system for generating subphrase queries. The system includes a sequence label modeling engine and a regression modeling engine. The sequence label modeling engine generates a plurality of subphrase queries by indexing through each token in a search phrase and labeling each token based on an association to other tokens in the search phrase. The regression modeling engine scores each subphrase query at least partially on the association according to a scoring model. The regression modeling engine identifies the subphrase query with the highest score which may then be used for identifying a sponsored search list or a web search item.
Owner:OATH INC

Mosaic tags for labeling templates in large-scale amplifications

The invention relates to methods of labeling nucleic acids, such as fragments of genomic DNA, with unique sequence tags, referred to herein as “mosaic tags,” prior to amplification and/or sequencing. Such sequence tags are useful for identifying amplification and sequencing errors. Mosaic tags minimize sequencing and amplification artifacts due to inappropriate annealing, priming, hairpin formation, or the like, that may occur with completely random sequence tags of the prior art. In one aspect, mosaic tags are sequence tags that comprise alternating constant regions and variable regions, wherein each constant region has a position in the mosaic tag and comprises a predetermined sequence of nucleotides and each variable region has a position in the mosaic tag and comprises a predetermined number of randomly selected nucleotides.
Owner:ADAPTIVE BIOTECH
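
The alternating layout of constant and variable regions can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `make_mosaic_tag`, the example constant sequences, and the region lengths are hypothetical, and the sketch starts with a constant region purely for simplicity.

```python
import random

def make_mosaic_tag(constant_regions, variable_lengths, rng):
    """Interleave predetermined constant regions with random variable regions."""
    if len(constant_regions) != len(variable_lengths):
        raise ValueError("need one variable-region length per constant region")
    parts = []
    for const, n in zip(constant_regions, variable_lengths):
        parts.append(const)  # predetermined nucleotide sequence
        # predetermined number of randomly selected nucleotides
        parts.append("".join(rng.choice("ACGT") for _ in range(n)))
    return "".join(parts)

rng = random.Random(0)  # seeded only so the example is repeatable
tag = make_mosaic_tag(["AC", "GT"], [3, 4], rng)
print(tag, len(tag))  # an 11-nucleotide tag: AC + 3 random + GT + 4 random
```

Because the constant regions sit at known positions, a read whose constant positions do not match can be flagged as a likely amplification or sequencing error.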

Text key information identification method, electronic apparatus and readable storage medium

The invention relates to a text key information identification method, an electronic apparatus and a readable storage medium. The method comprises the steps of, after a to-be-identified text is received, performing segmentation processing on the received to-be-identified text by using a predetermined word segmentation model to obtain segmented words of the to-be-identified text, wherein the predetermined word segmentation model is a long short-term memory (LSTM) recurrent neural network model obtained by training a preset quantity of sample statements labeled by adopting a sequence labeling method; and based on word frequencies, positions and word spans of the segmented words in the to-be-identified text, and according to a preset scoring formula, performing calculation to obtain scores of the segmented words, sorting the segmented words in the to-be-identified text according to a sequence of the scores from high to low, extracting the segmented words sorted in front to serve as keywords, and according to the extracted keywords, obtaining key information of the to-be-identified text. A user can be enabled to quickly and accurately obtain the key information in the to-be-identified text.
Owner:PING AN TECH (SHENZHEN) CO LTD

Electronic medical record text named entity recognition method based on pre-trained language model

The invention belongs to the technical field of medical information data processing, and particularly relates to an electronic medical record text named entity recognition method based on a pre-training language model, which comprises the following steps: collecting an electronic medical record text from a public data set as an original text, and preprocessing the original text; labeling the preprocessed original text entity based on the standard medical term set to obtain a labeled text; inputting the annotation text into a pre-training language model to obtain a training text represented by a word vector; constructing a BiLSTM-CRF sequence labeling model, and learning the training text to obtain a trained labeling model; and taking the trained labeling model as an entity recognition model, and inputting a test text to output a labeled category label sequence. According to the method, text features and semantic information in the deep language model are obtained through training in the super-large-scale Chinese corpus, a better semantic compression effect can be provided, the problem that manual annotation is tedious and complex is avoided, the method does not depend on dictionaries and rules, and the recall ratio and accuracy of named entity recognition are improved.
Owner:SUZHOU INST OF BIOMEDICAL ENG & TECH CHINESE ACADEMY OF SCI

Event trigger word extraction method based on document level attention mechanism

The invention relates to an event trigger word extraction method, in particular to an event trigger word extraction method based on a document level attention mechanism, comprising the following steps: (1) preprocessing training corpus; (2) performing word vector training by using PubMed database corpus; (3) constructing a distributed representation way of a sample; (4) constructing a characteristic representation way based on BiLSTM-Attention; (5) adopting CRF learning, and acquiring an optimal sequence labeling result of the current document sequence; and (6) extracting event trigger words. The method provided by the invention has the advantages that firstly a BIO tag labeling way is adopted, and recognition including multi-word trigger word recognition is realized; secondly a corresponding simple word and characteristic distributed representation way is constructed for a trigger word recognition task; and thirdly, a BiLSTM-Attention model is proposed, a distributed representation structure specific to the currently input document level information is realized by introducing an Attention mechanism, and trigger word recognition effect is improved.
Owner:DALIAN UNIV OF TECH
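
The BIO labeling scheme mentioned in this abstract is what allows multi-word triggers to be recovered from per-token labels. The following is a minimal sketch of that decoding step only. The function name `bio_to_spans`, the `TRIGGER` label, and the example sentence are assumptions for illustration, and an `I-` tag with no matching preceding `B-` tag is simply treated as outside any span.

```python
def bio_to_spans(tokens, tags):
    """Collect (label, text) spans from BIO tags: B- opens a span, I- continues it."""
    spans, label, words = [], None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if words:  # close the span that was open
                spans.append((label, " ".join(words)))
            label, words = tag[2:], [token]
        elif tag.startswith("I-") and label == tag[2:]:
            words.append(token)  # continue the current span
        else:  # "O", or an I- tag that does not continue the open span
            if words:
                spans.append((label, " ".join(words)))
            label, words = None, []
    if words:  # close a span that runs to the end of the sentence
        spans.append((label, " ".join(words)))
    return spans

tokens = ["Drug", "X", "gives", "rise", "to", "apoptosis"]
tags = ["O", "O", "B-TRIGGER", "I-TRIGGER", "I-TRIGGER", "B-TRIGGER"]
print(bio_to_spans(tokens, tags))
# [('TRIGGER', 'gives rise to'), ('TRIGGER', 'apoptosis')]
```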

Fine-grained word representation model-based sequence labeling model

Active | CN108460013A | Boundary Judgment Improvement | Improve entity recognition | Semantic analysis | Character and pattern recognition | Data set | Algorithm
The invention provides a fine-grained word representation model-based sequence labeling model, which is used for performing a sequence labeling task, and belongs to the field of computer application and natural language processing. The structure of the model is mainly composed of three parts including a feature representation layer, a BiLSTM layer and a CRF layer. When the sequence labeling task is performed by utilizing the model, firstly an attention mechanism-based character level word representation model Finger is proposed for fusing morphological information and character information of words; secondly the Finger and a BiLSTM-CRF model finish the sequence labeling task jointly; and finally the method obtains an F1 of 91.09% on the CoNLL 2003 data set in an end-to-end manner without any feature engineering. An experiment shows that the designed Finger model remarkably improves the recall rate of a sequence labeling system, so that the model identification capability is remarkably improved.
Owner:DALIAN UNIV OF TECH

A clinical medical entity and an attribute extraction method thereof

The invention relates to a clinical medical entity and an attribute extraction method thereof. The method comprises the following three modules: (1) preprocessing; (2) comprehensive expression of information of the sentences; and (3) joint learning of the clinical medical entity and attribute extraction thereof. The joint learning method mainly comprises two modes: (1) a serial combination mode; and (2) a parallel combination mode. The serial combination mode is divided into three sub-modules: (1) identification of 'entities/attributes' of clinical medical treatment; (2) extracting an entity-attribute relation of clinical medical treatment; (3) joint learning. The parallel combination mode is that a sequence labeling method is adopted to carry out clinical medical entity and attribute combination extraction. The method is of great significance to clinical medical aid decision making, clinical medical research and the like.
Owner:HARBIN INST OF TECH SHENZHEN GRADUATE SCHOOL

Picture information extraction method and device, computer device, and storage medium

The invention discloses a picture information extraction method and device, a computer device and a storage medium. The method comprises the following steps: obtaining a bill picture to be recognized and adjusting the skew and the illumination to obtain the pre-processed bill picture; identifying a plurality of text regions included in the obtained pre-processed bill picture; acquiring spatial coordinates of each text region in the plurality of text regions, and serially connecting corresponding vectors according to a stitching order to obtain a text box sequence; taking the text box sequence as the input of the sequence labeling model to obtain the sub-sequence corresponding to the region to be extracted; and recognizing the text region corresponding to the sub-sequence of the region to be extracted to obtain the text information corresponding to the region to be extracted. The method does not need to extract and recognize all the text information frames of a complex bill, and does not need to calculate the association between the characters in turn, which reduces the computational workload. Moreover, the image recognition technology can be used to train the labeling data of various angles and distorted bill pictures, and has good robustness.
Owner:CHINA PING AN PROPERTY INSURANCE CO LTD

Method and device for automatically identifying statement relationship and entity

The invention belongs to the technical field of intelligent identification and provides a method and a device for automatically identifying a statement relationship and an entity. The method for automatically identifying the statement relationship and the entity comprises the steps of projecting an input statement of a user to a fixed dimension space to obtain a sentence vector of the input statement in the fixed dimension space; inputting the sentence vector to a pretrained deep learning classifier to obtain a relationship category of the input statement; and identifying the entity in the input statement if the relationship category is identified. According to the method and the system provided by the invention, the input statement of the user is semantically judged by use of deep learning, and the relationship can be accurately identified; the entity identification is modeled as a sequence labeling problem, an optimal labeling is solved by use of a conditional random field and thus the entity is precisely identified; and the automation extraction of the relationship and the entity is realized in combination with the deep learning and the conditional random field.
Owner:EMOTIBOT TECH LTD

Methods for nucleic acid mapping and identification of fine-structural-variations in nucleic acids

A method of juxtaposing sequence tags (GVTs) that are unique positional markers along the length of a population of target nucleic acid molecules is provided, the method comprising: fragmenting the target nucleic acid molecule to form target DNA insert; ligating the target DNA insert to a DNA vector or backbone to create a circular molecule; digesting the target DNA insert with endonuclease to cleave the target DNA insert at a distance from each end of the target DNA insert yielding two GVTs comprising terminal sequences of the target DNA insert attached to an undigested linear backbone; recircularizing the linear backbone with the attached GVTs to obtain a circular DNA containing a GVT-pair having two juxtaposed GVTs; and recovering the GVT-pair DNA by nucleic acid amplification or digestion with endonuclease having sites flanking the GVT-pair. Cosmid vectors are provided for creating GVT-pairs of ~45- to 50-kb separation sequencable by next-generation DNA sequencers.
Owner:VERSITECH LTD

Method for generating clonotype profiles using sequence tags

The invention is directed to sequence-based profiling of populations of nucleic acids by multiplex amplification and attachment of one or more sequence tags to target nucleic acids and/or copies thereof followed by high-throughput sequencing of the amplification product. In some embodiments, the invention includes successive steps of primer extension, removal of unextended primers and addition of new primers either for amplification (for example by PCR) or for additional primer extension. Some embodiments of the invention are directed to minimal residual disease (MRD) analysis of patients being treated for cancer. Sequence tags incorporated into sequence reads provide an efficient means for determining clonotypes and at the same time provide a convenient means for detecting carry-over contamination from other samples of the same patient or from samples of a different patient which were tested in the same laboratory.
Owner:ADAPTIVE BIOTECH

Entity and entity relationship recognition method and device based on deep learning

The invention discloses an entity and entity relationship recognition method and device based on deep learning. The method comprises the following steps of inputting a text, and converting the text into a word vector, wherein the entity position, entity relationship and relationship position marking mode is adopted; performing sequence labeling on the word vector in a coding and decoding mode so as to obtain a sequence labeling word vector; performing secondary sorting on model output, wherein labels of the preset number with the highest probability of each word are selected as candidates, and label pairing is performed so as to obtain a correct label after pairing is successful. According to the method, the deep learning method is combined with the natural language processing technology, the multi-label and entity stacking phenomena are considered, and the brand new relationship extraction solution is put forward, so that the relationship extraction result precision is improved, and various complex conditions can be handled.
Owner:BEIJING UNIV OF POSTS & TELECOMM

Electronic official document entity extraction method

Pending | CN110297913A | Generalization ability | Solve the time-consuming and labor-intensive problem of manually labeling a large amount of corpus | Semantic analysis | Neural architectures | Part of speech | Algorithm
The invention provides an electronic official document entity extraction method. The electronic official document entity extraction method comprises the following steps: A, preprocessing; B, constructing features; C, training an entity extraction model; D, obtaining a corpus; E, obtaining a word vector; F, training an algorithm model. According to the method, a traditional sequence labeling algorithm and a deep learning algorithm are combined, the advantage that a traditional sequence algorithm needs less corpus labeling is utilized, a semi-supervised method is adopted to expand corpuses, and the problem that time and labor are wasted when a large number of corpuses need to be manually labeled in the deep learning algorithm is solved. Maximum forward and reverse dictionaries, syntax and semantic features are added into the CRF model, and front and rear boundary word features of entity words are fully considered, so that the algorithm has generalization ability. A dilated CNN and BiLSTM-CRF are combined, the dilated CNN takes a character-level vector and a character-level position feature as external features, and the external features and a part-of-speech vector are spliced into a word vector, so that more semantics and up-and-down related information can be expressed to a certain extent.
Owner:CETC BIGDATA RES INST CO LTD
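
The abstract does not say how its maximum forward and reverse dictionary features are computed. One common reading is dictionary-based maximum matching, and the sketch below shows the forward direction only, under that assumption. The function name `forward_max_match`, the `max_len` parameter, and the three-word dictionary are hypothetical.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Scan left to right, always taking the longest dictionary word that fits."""
    words, i = [], 0
    while i < len(text):
        # Try the longest candidate first and shrink until something matches
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:  # a single character always matches
                words.append(piece)
                i += size
                break
    return words

dictionary = {"国务院", "办公厅", "通知"}
print(forward_max_match("国务院办公厅通知", dictionary))
# ['国务院', '办公厅', '通知']
```

A reverse variant would scan from the end of the text instead. Whether a character falls inside a matched dictionary word could then be supplied to a CRF as one feature among others.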

Sequence labelling method and device

The invention discloses a sequence labelling method and device. The method comprises the following steps: a sequence labeling model acquires word vector expression corresponding to each word of an input text sequence through a word vector table; and then the sequence labeling model acquires character vector expression of each character through the character vector table for each character in the text sequence, and splices the character-level feature vector expression of the word and the word vector expression for each word, thereby obtaining the spliced word vector expression of the word; and finally, the sequence labeling model performs sequence labeling processing on the spliced word vector expression of each word of the text sequence to obtain a label sequence of the text sequence. The method is used for providing the sequence labeling model with high generalization and accuracy.
Owner:UNION MOBILE PAY
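
The splicing step in this abstract amounts to concatenating a word's embedding with a character-level feature vector. The sketch below shows that data flow only. The abstract does not specify how the character-level feature is derived, so averaging the character vectors is an assumption made here for brevity; the function name `spliced_word_vector` and the tiny lookup tables are also hypothetical.

```python
def spliced_word_vector(word, word_table, char_table):
    """Concatenate a word's vector with a character-level feature vector."""
    char_vecs = [char_table[c] for c in word]  # one vector per character
    dim = len(char_vecs[0])
    # Assumed pooling: element-wise mean over the word's character vectors
    char_feature = [sum(v[d] for v in char_vecs) / len(char_vecs) for d in range(dim)]
    return word_table[word] + char_feature  # list concatenation is the splice

word_table = {"cat": [1.0, 2.0]}
char_table = {"c": [0.0, 3.0], "a": [3.0, 0.0], "t": [3.0, 3.0]}
print(spliced_word_vector("cat", word_table, char_table))  # [1.0, 2.0, 2.0, 2.0]
```

The spliced vector is what a downstream sequence labeling layer would consume for each word.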

Academic-literature semantic restructuring method based on image processing and sequence labeling

The invention discloses an academic-literature semantic restructuring method based on image processing and sequence labeling. The method is characterized by through carrying out correlation processing, converting an academic literature into an image form and carrying out format analysis on the image form; using an OCR (optical character recognition) technology to identify each text block according with an academic literature logical structure and converting an image and the like into a plain text which can be read by a machine; using a sequence labeling model in natural language processing to carry out label sequence conversion on a processed literature content; through a literature logic structure result obtained by a contrast format analysis and sequence labeling, carrying out optimization so as to acquire a final literature logic structure. A semantic label is automatically added for the literature so as to assist reading. The literature is converted into a structural content to a certain degree and utilization efficiency of the academic literature is improved.
Owner:WUHAN UNIV

Chinese electronic medical record named entity recognition method and system based on attention mechanism

The invention discloses a Chinese electronic medical record named entity recognition method and system based on an attention mechanism, and belongs to the field of text information mining. The technical problem to be solved by the invention is how to identify named entities in an electronic medical record more accurately and conveniently based on a neural network and an attention mechanism. According to the technical scheme, the method comprises the following steps: S1, obtaining word vector and part-of-speech vector representation of Chinese word part-of-speech and splicing the word vector and the part-of-speech vector; S2, splicing the word vector and the part-of-speech vector, and inputting the spliced word vector and part-of-speech vector into a Double-LSTMs neural network model for feature extraction to obtain more accurate implicit strata vector representation; S3, adding an attention layer, and endowing relatively important information in the text with a higher weight; S4, endowing the weight with a hidden layer vector obtained by corresponding forward encoding and a hidden layer vector obtained by reverse encoding, and respectively splicing the hidden layer vectors to serve as feature vectors; and S5, carrying out sequence labeling based on the conditional random field model to realize an identification task of the named entity.
Owner:山东健康医疗大数据有限公司

Sequence labeling model training method, electronic medical record processing method and related device

The embodiment of the invention relates to the technical field of natural language processing, and provides a sequence labeling model training method, an electronic medical record processing method and a related device. The method comprises the steps: obtaining a sample sequence and a standard label sequence of the sample sequence; inputting the sample sequence into a pre-established sequence labeling model, and obtaining an initial vector sequence of the sample sequence by utilizing an initial feature network of the sequence labeling model; inputting the initial vector sequence into a feature extraction network of a sequence labeling model, and obtaining a feature sequence by adopting an attention mechanism; inputting the feature sequence into a label prediction network of a sequence labeling model to obtain a training label result of the sample sequence; and based on the training label result and the standard label sequence, performing iterative correction on the sequence labeling model to obtain a trained sequence labeling model. According to the embodiment of the invention, an attention mechanism is introduced to better learn long-distance feature information in the sequence, so that the accuracy of sequence labeling is effectively improved.
Owner:NEW H3C BIG DATA TECH CO LTD

Subject term extraction method and system based on sequence labeling model

Active | CN104794169A | Achieve preliminary extraction | Special data processing applications | Data mining | Machine learning
The invention discloses a subject term extraction method and system based on a sequence labeling model, and belongs to the technical field of data extraction. The method includes the steps that firstly, labeling and class label setting are performed on subject terms in training linguistic data to obtain a labeling sequence, a subject term extraction model is obtained through training with the training linguistic data serving as an observation sequence and the labeling sequence serving as a state sequence, and the subject terms in the linguistic data to be extracted are preliminarily extracted with the model serving as an extractor; then, preliminary extraction results are screened according to the similarity between the subject terms to obtain the true subject terms belonging to corresponding subject fields. According to the extraction method and system, when the subject terms are extracted, by performing labeling on the subject terms in a small quantity of training linguistic data, rapid and accurate extraction of the subject terms in the linguistic data is achieved, meanwhile, existing knowledge hierarchy structures of the subject fields can be gradually improved, and the defects of a traditional subject term extraction method are overcome.
Owner:明博教育科技股份有限公司 +1

Uyghur named entity recognition method based on deep learning

Pending | CN109117472A | Abandon manual acquisition of features | Solving the Named Entity Labeling Problem | Natural language data processing | Special data processing applications | Conditional random field | Syllable
The invention discloses a Uyghur named entity recognition method based on deep learning. The method comprises the following steps: (1) segmenting Uyghur text, respectively extracting characters and segmenting syllables; (2) obtaining forward and reverse character vectors from extracted characters by bi-directional LSTM network, and splicing them together to form character vector representations of words; (3) obtaining forward and reverse syllable vectors from segmented syllables by bi-directional LSTM network, and splicing them together to form syllable vector representations of words; (4) splicing the character vector, syllable vector and word vector and modeling the context information of each word as a bi-directional LSTM neural network; (5) at the output of LSTM neural network, the whole sentence is labeled with named entity by using conditional random field. The invention extracts the abundant structure information of words by taking the splicing of characters, syllables and word vectors as the input of the neural network, so that the invention can be widely applied in the sequence labeling of the morphologically rich languages.
Owner:XINJIANG UNIVERSITY