Missing marker filling method based on a sliding window sparse convolution denoising autoencoder

An autoencoder-based filling method, applied in the fields of computing and bioinformatics. It addresses the curse of dimensionality, overfitting, and the inability to make effective use of the close linkage between nearby markers, and achieves high filling accuracy and high training efficiency.

Pending Publication Date: 2022-08-05
YANGZHOU UNIV

AI Technical Summary

Problems solved by technology

If only a general denoising autoencoder is used to encode and model the large number of markers on a chromosome, problems such as the curse of dimensionality and overfitting arise, and the close linkage between markers in close proximity cannot be used effectively.



Examples


Embodiment Construction

[0078] The technical solutions provided by the present invention will be described in detail below with reference to specific embodiments. It should be understood that the following specific embodiments are only used to illustrate the present invention and not to limit its scope. Additionally, the steps shown in the flowcharts of the figures may be performed in a computer system, for example as a set of computer-executable instructions, and although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in an order different from the one shown herein.

[0079] The present invention provides a missing marker filling method based on a sliding window sparse convolution denoising autoencoder. The overall process is shown in Figure 1 and includes the following steps:

[0080] Step 1: First, select the first 10,000 loci of the first chromosome of rice and maize respectively, and preprocess the selected gen...
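The abstract describes the preprocessing as numerical conversion followed by one-hot coding and a split into training and validation sets. A minimal sketch of that idea is given below. It is not the patent's implementation: the integer genotype codes (0, 1, 2), the `MISSING` sentinel of -1, the all-zero encoding for missing calls, the 20% validation fraction, and both function names are assumptions made for illustration.

```python
import numpy as np

MISSING = -1  # assumed sentinel for a missing genotype call


def one_hot_genotypes(genotypes, n_classes=3):
    """One-hot encode integer genotype codes.

    genotypes: int array of shape (n_samples, n_loci) whose entries are
    in {0, ..., n_classes - 1} or equal to MISSING.
    Returns a float32 array of shape (n_samples, n_loci, n_classes);
    missing calls are left as all-zero vectors (an assumption).
    """
    genotypes = np.asarray(genotypes)
    encoded = np.zeros(genotypes.shape + (n_classes,), dtype=np.float32)
    rows, cols = np.nonzero(genotypes != MISSING)
    encoded[rows, cols, genotypes[rows, cols]] = 1.0
    return encoded


def split_train_validation(data, validation_fraction=0.2, seed=0):
    """Shuffle samples and split them into training and validation sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(data))
    n_val = int(round(len(data) * validation_fraction))
    return data[order[n_val:]], data[order[:n_val]]
```

Encoding a missing call as an all-zero vector keeps it distinct from every observed genotype, which is one common way to present corrupted input to a denoising model. Other encodings, such as a dedicated extra class, would work equally well in this sketch.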


Abstract

The invention provides a missing marker filling method based on a sliding window sparse convolution denoising autoencoder. The method comprises the following steps: first, known gene data are converted to numerical values and one-hot encoded, and divided into a training set and a validation set; next, a sparse convolutional neural network model is built, and missing markers in a gene sequence are filled segment by segment using a sliding window. Because the windows overlap, the prediction results for the central area of each window, where sufficient data features are available, are kept and spliced together, while the filling results for the edge areas are discarded. The filling accuracy for the missing markers is then calculated, and the hyperparameters of the neural network are adjusted according to feedback from earlier training. In practical application, gene filling results for multiple species such as maize and rice show that the filling accuracy of the method is markedly higher than that of traditional algorithms such as KNN and SVD. The method has a simple model structure and high training efficiency, and has broad application prospects in the field of gene sequence analysis.
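The overlapping-window step can be sketched as follows. This is only an illustration under stated assumptions: `predict_window` is a hypothetical stand-in for the trained model, the `window` and `margin` sizes and the `MISSING` sentinel are invented for the example, and keeping edge predictions at the two ends of the sequence (where no neighbouring window exists) is a choice made here, not something the abstract specifies.

```python
import numpy as np

MISSING = -1  # assumed sentinel for a missing marker


def window_starts(n_loci, window, margin):
    """Start indices of overlapping windows whose kept regions cover the sequence."""
    if n_loci <= window:
        return [0]
    step = window - 2 * margin  # width of the central area kept per window
    if step <= 0:
        raise ValueError("window must be larger than twice the margin")
    starts = list(range(0, n_loci - window, step))
    starts.append(n_loci - window)  # final window sits flush with the end
    return starts


def fill_by_sliding_window(sequence, predict_window, window=8, margin=2):
    """Fill missing markers window by window, keeping central predictions.

    predict_window: callable taking a 1-D window and returning predicted
    codes of the same length (hypothetical stand-in for the model).
    """
    sequence = np.asarray(sequence)
    n = len(sequence)
    filled = sequence.copy()
    for start in window_starts(n, window, margin):
        stop = min(start + window, n)
        pred = np.asarray(predict_window(sequence[start:stop]))
        # Discard the edge areas, except at the two ends of the sequence.
        lo = start if start == 0 else start + margin
        hi = stop if stop == n else stop - margin
        idx = np.arange(lo, hi)[sequence[lo:hi] == MISSING]
        filled[idx] = pred[idx - start]
    return filled


def filling_accuracy(truth, filled, masked):
    """Fraction of masked positions whose filled value matches the truth."""
    truth, filled, masked = (np.asarray(a) for a in (truth, filled, masked))
    return float(np.mean(truth[masked] == filled[masked]))
```

Accuracy is measured only at positions that were deliberately masked, since those are the only positions where the true value is known.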

Description

Technical field

[0001] The invention belongs to the technical field of computing and bioinformatics, and in particular relates to a missing marker filling method based on a sliding window sparse convolution denoising autoencoder.

Background technique

[0002] Genotype filling is a key step in the analysis of human, animal and plant genome sequences and can be used for genome-wide association analysis and genome-wide prediction. Missing values are prevalent in genetic sequencing data for a variety of reasons, including low call rates, deviations from Hardy-Weinberg equilibrium, and the presence of rare or low-frequency variants in the samples. Genotype imputation works by computationally inferring the missing values in genotype data, typically using correlation or linkage information between untyped variants and nearby genotyped markers. The genetic variation for which genotypes are filled is mainly single nucleotide polymorphism (SNP), and other types o...


Application Information

IPC(8): G16B30/00, G16B40/00, G06N3/04, G06N3/08
CPC: G16B30/00, G16B40/00, G06N3/08, G06N3/047, G06N3/048
Inventor: 刘毅, 王欣
Owner: YANGZHOU UNIV