T cell receptor sequence classification method based on semi-supervised learning framework

The method applies a semi-supervised learning framework to T cell receptor sequence classification. It addresses the problem that obtaining labeled data requires complex experiments, and it helps with feature extraction.

Active Publication Date: 2020-08-04
XI AN JIAOTONG UNIV +1
8 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

[0005] 1. Obtaining the epitope corresponding to the T cell receptor requires complex experiments;

Embodiment Construction

[0046] The present invention provides a T cell receptor sequence classification method based on a semi-supervised learning framework. It uses three supervised learning algorithms (a support vector machine, a random forest, and a decision tree) to construct a triple learning framework that addresses the aforementioned problem of scarce data. Each of the three algorithms has its own advantages and disadvantages. First, the support vector machine performs well on small sample sets (dividing the epitope data set in different proportions reduces the amount of data), so it can improve the prediction accuracy of the initial classifier, which in turn helps the iterations improve the prediction accuracy of the final model. The random forest is not prone to overfitting, tolerates outliers and noise well, and is robust to unbalanced data. Decision trees are suitable for high-dimensional data and for processing samples wi...
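The triple learning framework described above resembles tri-training, and one round of it can be sketched as follows. This is a simplified illustration, not the patented procedure: `NearestCentroid` is a hypothetical stand-in for the support vector machine, random forest, and decision tree, the function names and toy data are assumptions, and the error-rate checks that tri-training variants usually apply before accepting pseudo-labels are omitted.

```python
import random
from collections import Counter


class NearestCentroid:
    """Tiny stand-in classifier (hypothetical); swap in an SVM,
    a random forest, or a decision tree with the same fit/predict interface."""

    def fit(self, X, y):
        sums, counts = {}, Counter(y)
        for xi, yi in zip(X, y):
            acc = sums.setdefault(yi, [0.0] * len(xi))
            for j, v in enumerate(xi):
                acc[j] += v
        self.centroids = {c: [v / counts[c] for v in s] for c, s in sums.items()}
        return self

    def predict(self, X):
        def nearest(x):
            return min(
                self.centroids,
                key=lambda c: sum((a - b) ** 2 for a, b in zip(x, self.centroids[c])),
            )
        return [nearest(x) for x in X]


def bootstrap(X, y, rng):
    """Draw a bootstrap sample (sampling with replacement) of the labeled set."""
    idx = [rng.randrange(len(X)) for _ in X]
    return [X[i] for i in idx], [y[i] for i in idx]


def tri_training_round(classifiers, X_lab, y_lab, X_unlab):
    """One simplified round: retrain each classifier on the labeled data plus
    the unlabeled samples on which the other two classifiers agree."""
    preds = [clf.predict(X_unlab) for clf in classifiers]  # predictions before any refit
    for i, clf in enumerate(classifiers):
        j, k = [m for m in range(3) if m != i]
        extra = [(x, pj) for x, pj, pk in zip(X_unlab, preds[j], preds[k]) if pj == pk]
        clf.fit(list(X_lab) + [x for x, _ in extra],
                list(y_lab) + [p for _, p in extra])
    return classifiers


def majority_vote(classifiers, X):
    """Combine the classifiers' predictions by majority vote."""
    votes = zip(*(clf.predict(X) for clf in classifiers))
    return [Counter(v).most_common(1)[0][0] for v in votes]


# Toy two-dimensional data, purely illustrative.
rng = random.Random(0)
X_lab = [[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.9, 1.1]]
y_lab = [0, 0, 1, 1]
X_unlab = [[0.1, 0.2], [1.1, 1.0], [0.05, 0.0], [0.95, 0.9]]

clfs = [NearestCentroid().fit(*bootstrap(X_lab, y_lab, rng)) for _ in range(3)]
for _ in range(3):
    tri_training_round(clfs, X_lab, y_lab, X_unlab)
print(majority_vote(clfs, [[0.0, 0.0], [1.0, 1.0]]))
```

Using three different learners, as the text proposes, would give the agreement step more diversity than three copies of one model trained on bootstrap samples.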

Abstract

The invention discloses a T cell receptor sequence classification method based on a semi-supervised learning framework. The method comprises the following steps: selecting the CDR3 beta region as input data and feature-encoding the T cell receptor data; using the encoded data with three supervised learning algorithms (a support vector machine, a random forest, and a decision tree) to construct initial classifiers C1, C2, and C3 respectively; training the initial classifiers C1, C2, and C3 to obtain expanded new training sets, repeatedly sampling the generated training sets to obtain three labeled training sets, generating a classifier from each new training set, and iteratively updating the classifiers; and, after training is finished, combining the three classifiers C1, C2, and C3 into an ensemble through a voting mechanism. The method is suited to situations where T cell receptor sequence data is difficult to obtain, and its performance is reported to be significantly better than that of existing methods.
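The feature-encoding step could look like the sketch below. The excerpt does not say which encoding the invention uses, so the one-hot scheme, the `one_hot_encode` name, the `max_len` padding length, and the example sequence are all assumptions chosen for illustration.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(cdr3, max_len=20):
    """Encode a CDR3 beta sequence as a flat one-hot vector of length max_len * 20.

    Shorter sequences are zero-padded on the right; longer ones are truncated.
    (Assumed scheme; the source does not specify the encoding.)
    """
    vec = [0] * (max_len * len(AMINO_ACIDS))
    for pos, aa in enumerate(cdr3[:max_len].upper()):
        if aa not in AA_INDEX:
            raise ValueError(f"unknown residue {aa!r} at position {pos}")
        vec[pos * len(AMINO_ACIDS) + AA_INDEX[aa]] = 1
    return vec


# Illustrative sequence, not taken from the patent.
features = one_hot_encode("CASSLGQAYEQYF")
print(len(features), sum(features))
```

A fixed-length vector like this can be passed to any of the three classifiers; encodings based on physicochemical properties or k-mer counts are common alternatives.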

Description

Technical field

[0001] The invention belongs to the field of data science technology, and specifically relates to a T cell receptor sequence classification method based on a semi-supervised learning framework.

Background technique

[0002] The T cell receptor (TCR) is a protein complex carried on the surface of T cells. It binds to the antigenic epitope presented by major histocompatibility complex (MHC) molecules on the host cell, that is, the antigenic peptide-MHC complex (pMHC), and transmits the recognition signal from the T cell surface to the T cell nucleus, thereby activating the T cell. In most cases, the affinity and binding specificity of a T cell receptor for a given epitope can be determined using only the β chain. The main region where T cell receptors bind to the antigen peptid...

Application Information

IPC(8): G16B40/20, G06K9/62
CPC: G16B40/20, G06F18/2411, G06F18/24323, G06F18/241, G06F18/214
Inventor: 王嘉寅, 边浩东, 易鑫, 张选平, 王科, 刘涛
Owner: XI AN JIAOTONG UNIV