Base calling method, base calling model training method, and related device
By using a base recognition model to perform feature mapping and multi-head attention encoding on fluorescence images, the problem of poor base recognition accuracy in gene sequencing was solved, and accurate classification of nucleic acid sequence clusters to be tested was achieved.
Patent Information
- Authority / Receiving Office
- WO · WO
- Patent Type
- Applications
- Current Assignee / Owner
- MGI TECH CO LTD
- Filing Date
- 2024-12-18
- Publication Date
- 2026-06-25
AI Technical Summary
In existing technologies, it is difficult to accurately identify the correlation between multiple fluorescence images from the same sequencing cycle during gene sequencing, resulting in poor accuracy in base identification.
A base recognition method is adopted, which performs feature mapping on fluorescence images through the embedding module of a preset base recognition model, encodes the feature tensors using a multi-head attention mechanism, and classifies them using a classification module to identify the base categories of the nucleic acid sequence clusters to be tested.
It improves the accuracy of base identification by capturing intensity data features and correlations in fluorescence images through feature mapping and multi-head attention mechanism, thus achieving accurate classification of nucleic acid sequence clusters to be tested.
Description
Base recognition methods, base recognition model training methods and related equipment
Technical Field
[0001] This application relates to the field of gene sequencing technology, and in particular to a base recognition method, a base recognition model training method, and related equipment.
Background Technology
[0002] Gene sequencing technology is being used more and more widely in various fields such as health, agriculture, and energy, promoting the development of life sciences, biomedicine and related industries.
[0003] Base recognition of deoxyribonucleic acid (DNA) nanoballs is a crucial step in gene sequencer algorithms, and the accuracy of base recognition directly affects sequencing quality. Related technologies often struggle to learn the correlation between fluorescence intensities in multiple fluorescence images from the same sequencing cycle during gene sequencing, leading to poor base recognition accuracy.
Summary of the Invention
[0004] In view of the above, it is necessary to provide a base identification method, a base identification model training method, and related equipment to solve the technical problem of poor accuracy in base identification.
[0005] On one hand, embodiments of this application provide a base identification method, the base identification method comprising: acquiring multiple fluorescence images of multiple nucleic acid sequence clusters to be tested during a sequencing cycle of gene sequencing; performing feature mapping on the fluorescence intensity sequences corresponding to the multiple fluorescence images through an embedding module of a preset base identification model to obtain a feature tensor; encoding the feature tensor through an encoding module based on a multi-head attention mechanism in the base identification model to obtain an encoding tensor; and classifying the multiple nucleic acid sequence clusters to be tested according to the encoding tensor through a classification module of the base identification model to obtain the base category corresponding to the multiple nucleic acid sequence clusters to be tested in the sequencing cycle.
[0006] In some embodiments of this application, the generation of the fluorescence intensity sequence includes: extracting the target nucleic acid sequence clusters and the coordinates of the target nucleic acid sequence clusters corresponding to effective pixels based on the preprocessed image corresponding to each fluorescence image, and generating the fluorescence intensity sequence according to the intensity data of the coordinates of the multiple target nucleic acid sequence clusters in multiple preprocessed images.
[0007] In some embodiments of this application, the fluorescence intensity sequence has a first dimension and a second dimension. The dimension of the fluorescence intensity sequence in the first dimension represents the number of nucleic acid sequence clusters to be tested, and the dimension in the second dimension represents the number of multiple fluorescence images. The step of performing feature mapping on the fluorescence intensity sequence corresponding to the multiple fluorescence images through the embedding module of the preset base recognition model to obtain a feature tensor includes: performing a lookup operation on the dense matrix of the embedding module according to each intensity data in the fluorescence intensity sequence to obtain a vector corresponding to each intensity data. The number of element values in each vector is the dimension of the dense matrix in the second dimension. The feature tensor is generated according to the vector corresponding to each intensity data. The feature tensor has a first dimension, a second dimension, and a third dimension. The dimension of the feature tensor in the first dimension represents the number of nucleic acid sequence clusters to be tested, the dimension in the second dimension represents the number of multiple fluorescence images, and the dimension in the third dimension is the dimension of the dense matrix in the second dimension.
[0008] In some embodiments of this application, the encoding module includes a multi-head attention layer corresponding to the multi-head attention mechanism.
[0009] In some embodiments of this application, the encoding module further includes a residual network layer and a feedforward neural network layer. Encoding the feature tensor using the multi-head attention-based encoding module in the base recognition model to obtain the encoded tensor includes: performing fully connected operations on the feature tensor using multiple fully connected layers in the multi-head attention layer to obtain multiple operational tensors corresponding to the feature tensor; splitting each operational tensor along the third dimension according to the number of attention mechanisms (heads) in the multi-head attention layer to obtain multiple split tensors corresponding to each operational tensor, each split tensor corresponding to one attention mechanism; performing self-attention calculation on the split tensors corresponding to each attention mechanism among the multiple operational tensors to obtain a self-attention tensor corresponding to each attention mechanism; concatenating the self-attention tensors corresponding to the multi-head attention mechanisms to obtain a concatenated tensor; performing residual connections on the feature tensor and the concatenated tensor using the residual network layer; and inputting the residually connected tensor into the feedforward neural network layer to obtain the encoded tensor.
[0010] In some embodiments of this application, the encoding module further includes a residual network layer and a feedforward neural network layer. The step of encoding the feature tensor using the multi-head attention-based encoding module in the base recognition model to obtain the encoded tensor includes: performing fully connected operations on the feature tensor using multiple fully connected layers in the multi-head attention layer to obtain multiple operational tensors corresponding to the feature tensor; performing self-attention calculations on the multiple operational tensors using a per-head attention mechanism to obtain a self-attention tensor corresponding to the per-head attention mechanism; concatenating the self-attention tensors corresponding to the multi-head attention mechanism to obtain a concatenated tensor; performing a linear transformation on the concatenated tensor to obtain a linear tensor; performing a residual connection on the feature tensor and the linear tensor using the residual network layer; and inputting the residually connected tensor into the feedforward neural network layer to obtain the encoded tensor.
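The encoding steps described above can be sketched for a single nucleic acid sequence cluster as follows. This is only an illustrative sketch in plain Python, not the applicant's implementation: the function names, the random weights, the scaling by the square root of the head width, the ReLU activation, and the omission of biases and normalization are all assumptions made for the example.

```python
import math
import random

def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def encode_cluster(x, wq, wk, wv, wo, w1, w2, num_heads):
    """Encode one cluster's (M x D) feature matrix and return an (M x D) matrix."""
    d_head = len(x[0]) // num_heads
    # Fully connected layers yield the query, key and value operational tensors.
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    heads = []
    for h in range(num_heads):
        lo, hi = h * d_head, (h + 1) * d_head
        # Split the feature (third) dimension: one slice per attention head.
        qh = [r[lo:hi] for r in q]
        kh = [r[lo:hi] for r in k]
        vh = [r[lo:hi] for r in v]
        # Scaled dot-product self-attention across the M fluorescence images.
        kt = [list(col) for col in zip(*kh)]
        scores = [[s / math.sqrt(d_head) for s in row] for row in matmul(qh, kt)]
        weights = [softmax(row) for row in scores]
        heads.append(matmul(weights, vh))
    # Concatenate the per-head outputs back to width D, then apply a linear map.
    concat = [sum((head[i] for head in heads), []) for i in range(len(x))]
    linear = matmul(concat, wo)
    # Residual connection between the input features and the attention output.
    residual = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(x, linear)]
    # Two-layer feedforward network with ReLU (assumed activation).
    hidden = [[max(0.0, val) for val in row] for row in matmul(residual, w1)]
    return matmul(hidden, w2)

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
M, D, HEADS = 4, 8, 2  # 4 fluorescence images, embedding width 8, 2 heads
x = rand_matrix(M, D, rng)
wq, wk, wv, wo = (rand_matrix(D, D, rng) for _ in range(4))
w1, w2 = rand_matrix(D, 16, rng), rand_matrix(16, D, rng)
encoded = encode_cluster(x, wq, wk, wv, wo, w1, w2, HEADS)
```

In this sketch the attention runs over the second dimension (the fluorescence images of one cluster), so each image's representation can draw on the intensities of the other images from the same sequencing cycle.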
[0011] In some embodiments of this application, the classification module of the base recognition model classifies the plurality of nucleic acid sequence clusters to be tested according to the coding tensor to obtain the base category corresponding to the plurality of nucleic acid sequence clusters to be tested in the sequencing cycle, which includes: using the classification module to map the coding tensor to a plurality of preset base categories, obtaining the confidence level of the coding tensor corresponding to each preset base category, and determining the base category from the plurality of preset base categories according to the confidence level.
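The mapping from an encoded representation to a base category with a confidence level could look like the sketch below. It assumes, purely for illustration, that each cluster's encoded tensor has already been reduced to a single vector, that the classification module is one linear layer followed by softmax, and that the preset base categories are A, C, G and T; the weights shown are made up.

```python
import math

BASES = ("A", "C", "G", "T")  # assumed preset base categories

def softmax(logits):
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(encoded_vector, weights, bias):
    """Return the most confident base and the confidence of every base."""
    # One linear score per preset base category (hypothetical linear layer).
    logits = [sum(w * x for w, x in zip(row, encoded_vector)) + b
              for row, b in zip(weights, bias)]
    confidences = softmax(logits)
    best = max(range(len(BASES)), key=lambda i: confidences[i])
    return BASES[best], confidences

# Made-up weights: one row per base category.
weights = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
bias = [0.0, 0.0, 0.0, 0.0]
base, confidences = classify([0.2, -1.0, 0.5], weights, bias)
```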
[0012] The base identification method in this embodiment improves the dimensionality of the feature tensor through feature mapping, providing a foundation for encoding using a multi-head attention mechanism. Since the multi-head attention mechanism possesses powerful feature extraction and data association capabilities, it can comprehensively capture data features and the relationships between data. Therefore, by encoding the feature tensor using the multi-head attention mechanism, it is possible to accurately extract the features of the intensity data related to base categories of the nucleic acid sequence clusters to be tested in multiple fluorescence images, as well as the relationships between the intensity data corresponding to the nucleic acid sequence clusters in multiple fluorescence images, such as the distribution patterns and positional relationships of the intensity data, thereby improving the accuracy of the encoded tensor. Through the classification module, the nucleic acid sequence clusters to be tested can be classified according to the encoded tensor, thus accurately identifying the base category corresponding to each nucleic acid sequence cluster.
[0013] On the other hand, this application provides a base recognition model training method, which includes: acquiring multiple training samples and base category labels corresponding to each training sample, wherein each training sample includes multiple fluorescence images of multiple nucleic acid sequence clusters to be tested during a sequencing cycle of gene sequencing; and performing supervised classification training on a pre-trained classification model based on the multiple training samples and the base category labels corresponding to each training sample to obtain a base recognition model.
[0014] In some embodiments of this application, the generation of the pre-trained classification model includes: performing feature mapping on the sample intensity sequence corresponding to each training sample through the embedding module in a preset classification network to obtain a first feature tensor corresponding to each training sample; performing masking processing on the intensity data of a preset dimension in the sample feature tensor corresponding to each training sample to obtain a first masked tensor and a first unmasked tensor corresponding to each training sample; encoding the first unmasked tensor corresponding to each training sample through the encoding module based on a multi-head attention mechanism in the classification network to obtain a first encoded tensor corresponding to each training sample; decoding the first encoded tensor and the first masked tensor corresponding to each training sample through the decoding module in the classification network to obtain a decoded tensor; calculating a reconstruction loss based on the first feature tensor and the decoded tensor corresponding to each training sample; and training the classification network based on the reconstruction loss to obtain the pre-trained classification model.
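The application does not state which reconstruction loss is used. As one possible reading, the sketch below computes a mean squared error between the first feature tensor and the decoded tensor, both represented as nested lists of shape N x M x D; the choice of mean squared error and the function name are assumptions.

```python
def reconstruction_loss(feature, decoded):
    """Mean squared error between two same-shaped N x M x D nested lists."""
    total, count = 0.0, 0
    for f_cluster, d_cluster in zip(feature, decoded):
        for f_vec, d_vec in zip(f_cluster, d_cluster):
            for f, d in zip(f_vec, d_vec):
                total += (f - d) ** 2
                count += 1
    return total / count

# One cluster, two images, embedding width 2 (toy values).
feature = [[[1.0, 2.0], [3.0, 4.0]]]
decoded = [[[1.0, 2.0], [3.0, 2.0]]]
loss = reconstruction_loss(feature, decoded)
```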
[0015] In some embodiments of this application, the sample intensity sequence has a first dimension and a second dimension. The dimension of the sample intensity sequence in the first dimension is the number of the nucleic acid sequence clusters to be tested, and the dimension in the second dimension is the number of fluorescence images in each training sample. The step of masking the intensity data of the preset dimension in the sample feature tensor corresponding to each training sample to obtain the first masked tensor and the first unmasked tensor corresponding to each training sample includes: for the intensity data corresponding to all indices of the sample feature tensor in the second dimension, masking the intensity data corresponding to the preset index among all indices to obtain the first masked tensor and the first unmasked tensor, wherein the dimension of the first masked tensor in the second dimension is the number of the preset indexes, and the dimension of the first unmasked tensor in the second dimension is the number of the remaining indices other than the preset indexes among all indices.
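The split into a masked tensor and an unmasked tensor along the second dimension can be illustrated as follows. The sketch assumes the feature tensor is a nested list of shape N x M x D and that the preset indices are given as a list; the helper name is hypothetical.

```python
def split_by_mask(feature, masked_indices):
    """Split an N x M x D tensor along its second dimension.

    Returns (masked, unmasked): the masked tensor keeps only the preset
    indices, the unmasked tensor keeps all remaining indices.
    """
    masked_set = set(masked_indices)
    num_images = len(feature[0])
    drop = [j for j in range(num_images) if j in masked_set]
    keep = [j for j in range(num_images) if j not in masked_set]
    masked = [[cluster[j] for j in drop] for cluster in feature]
    unmasked = [[cluster[j] for j in keep] for cluster in feature]
    return masked, unmasked

# Two clusters, four fluorescence images, embedding width 1 (toy values).
feature = [[[c * 10 + j] for j in range(4)] for c in range(2)]
masked, unmasked = split_by_mask(feature, [1, 3])
```

With four images and two preset indices, the masked tensor has size 2 in the second dimension and the unmasked tensor has size 2 as well, matching the counts described in the paragraph above.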
[0016] In some embodiments of this application, the step of performing supervised classification training on a pre-trained classification model based on the multiple training samples and the base category label corresponding to each training sample to obtain a base recognition model includes: generating a second encoded tensor and a second masked tensor corresponding to each training sample based on the pre-trained classification model, wherein the second encoded tensor is obtained by encoding a second unmasked tensor corresponding to each training sample, the second masked tensor and the second unmasked tensor are obtained by masking a second feature tensor corresponding to each training sample, and the second feature tensor is obtained by feature mapping of each training sample; classifying each training sample according to the second encoded tensor and the second masked tensor corresponding to each training sample through a preset classification module to obtain the base category corresponding to each training sample; calculating a classification loss according to the base category and base category label corresponding to each training sample; and training the classification module and the pre-trained classification model according to the classification loss to obtain the base recognition model.
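The form of the classification loss is not specified in the application. A common choice for supervised classification is cross-entropy, and the sketch below uses it as an assumption; the confidence values and labels are toy data, and the label indices follow an assumed category order.

```python
import math

def cross_entropy(confidences, label_index):
    """Negative log-confidence of the labelled base category."""
    return -math.log(max(confidences[label_index], 1e-12))

def mean_classification_loss(batch_confidences, labels):
    losses = [cross_entropy(c, y) for c, y in zip(batch_confidences, labels)]
    return sum(losses) / len(losses)

# Two clusters with predicted confidences over four base categories.
batch_confidences = [[0.5, 0.25, 0.125, 0.125], [0.25, 0.25, 0.25, 0.25]]
labels = [0, 2]  # index of the labelled base category for each cluster
loss = mean_classification_loss(batch_confidences, labels)
```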
[0017] On the other hand, this application provides a base recognition device, which includes: an acquisition unit for acquiring multiple fluorescence images of multiple nucleic acid sequence clusters to be tested during a sequencing cycle; a feature mapping unit for performing feature mapping on the fluorescence intensity sequences corresponding to the multiple fluorescence images through an embedding module of a preset base recognition model to obtain a feature tensor; an encoding unit for encoding the feature tensor through an encoding module based on a multi-head attention mechanism in the base recognition model to obtain an encoding tensor; and a classification unit for classifying the multiple nucleic acid sequence clusters to be tested according to the encoding tensor through a classification module of the base recognition model to obtain the base category corresponding to the multiple nucleic acid sequence clusters to be tested in the sequencing cycle.
[0018] On the other hand, this application provides a base recognition model training device, which includes: an acquisition unit for acquiring multiple training samples and base category labels corresponding to each training sample, wherein each training sample includes multiple fluorescent sample images of the nucleic acid sequence cluster to be tested; and a training unit for performing supervised classification training on a pre-trained classification model based on the multiple training samples and the base category labels corresponding to each training sample to obtain a base recognition model.
[0019] The base recognition model training method in this application trains the pre-trained classification model based on training samples with corresponding base category labels, which can improve the training accuracy of the base recognition model and thus improve the accuracy of the trained base recognition model in recognizing the base category of the nucleic acid sequence cluster to be tested.
[0020] On the other hand, this application provides an electronic device, the electronic device comprising: a memory storing at least one instruction; and a processor executing the at least one instruction to implement the base recognition method, or to implement the base recognition model training method.
[0021] On the other hand, this application provides a computer-readable storage medium storing a computer program, which, when executed by a processor in an electronic device, implements the base recognition method or the base recognition model training method.
Attached Figure Description
[0022] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on the provided drawings without creative effort.
[0023] Figure 1 is a flowchart of a base recognition method provided in an embodiment of this application.
[0024] Figure 2 is a schematic diagram of the nucleic acid sequence clusters to be tested on a sequencing chip provided in an embodiment of this application.
[0025] Figure 3 is a schematic diagram of the fluorescence image of the first sequencing cycle of the gene sequencing process provided in an embodiment of this application.
[0026] Figure 4 is a schematic diagram of the fluorescence image of the second sequencing cycle of the gene sequencing process provided in an embodiment of this application.
[0027] Figure 5 is a schematic diagram of the fluorescence image of the third sequencing cycle of the gene sequencing process provided in an embodiment of this application.
[0028] Figure 6 is a schematic diagram of the fluorescence image of the fourth sequencing cycle of the gene sequencing process provided in an embodiment of this application.
[0029] Figure 7 is a schematic diagram of the generation of feature tensors provided in an embodiment of this application.
[0030] Figure 8 is a schematic diagram of the structure of an encoding module provided in an embodiment of this application.
[0031] Figure 9 is a flowchart of a base recognition model training method provided in an embodiment of this application.
[0032] Figure 10 is a flowchart of a training method for a classification network provided in an embodiment of this application.
[0033] Figure 11 is a schematic diagram of masking processing of sample feature tensors provided in an embodiment of this application.
[0034] Figure 12 is a flowchart of a training method for a classification model provided in an embodiment of this application.
[0035] Figure 13 is a functional block diagram of a base recognition device provided in an embodiment of this application.
[0036] Figure 14 is a functional block diagram of a base recognition model training device provided in an embodiment of this application.
[0037] Figure 15 is a schematic diagram of the structure of an electronic device provided in an embodiment of this application.
[0038] The following detailed description, in conjunction with the accompanying drawings, will further illustrate the present invention.
Detailed Implementation
[0039] To make the objectives, technical solutions, and advantages of this application clearer, the application will be described in detail below with reference to the accompanying drawings and specific embodiments.
[0040] It should be noted that in this application, "at least one" means one or more, and "more than one" means two or more. "And / or" describes the relationship between related objects, indicating that three relationships can exist. For example, A and / or B can represent: A alone, A and B simultaneously, or B alone, where A and B can be singular or plural. The terms "first," "second," "third," "fourth," etc. (if present) in the specification, claims, and drawings of this application are used to distinguish similar objects, not to describe a specific order or sequence.
[0041] In the embodiments of this application, the terms "exemplary" or "for example" are used to indicate that something is an example, illustration, or description. Any embodiment or design that is described as "exemplary" or "for example" in the embodiments of this application should not be construed as being more preferred or advantageous than other embodiments or design. Specifically, the use of the terms "exemplary" or "for example" is intended to present the relevant concepts in a specific manner.
[0042] This application provides a base recognition method and a base recognition model training method. By training the base recognition model, the accuracy of identifying the base categories corresponding to the nucleic acid sequence clusters to be tested can be improved.
[0043] The base recognition method and base recognition model training method provided in this application can be applied to electronic devices.
[0044] In some embodiments of this application, the electronic device may be an embedded device or an embedded system composed of one or more embedded devices. For example, the electronic device may be a Jetson device. Jetson devices have the advantages of small size, low power consumption, and high performance, making them easy to integrate into space-constrained environments, such as gene sequencers. Therefore, Jetson devices have the ability to perform edge computing, bringing the base recognition process closer to the gene sequencing process.
[0045] In other embodiments of this application, the electronic device may be a gene sequencer, mobile phone, tablet computer, laptop computer, computer, etc. This application does not limit the electronic device.
[0046] Figure 1 shows a flowchart of a base recognition method provided in an embodiment of this application. The order of the steps in this flowchart can be adjusted according to different needs, and some steps can be omitted. The base recognition method is applied to electronic devices.
[0047] S11: Acquire multiple fluorescence images of multiple nucleic acid sequence clusters to be tested during one sequencing cycle of gene sequencing.
[0048] In some embodiments of this application, the nucleic acid sequence cluster to be tested is a large-scale molecular cluster obtained by nucleic acid amplification of ribonucleic acid (or deoxyribonucleic acid) sequences during gene sequencing. It can consist of groups of similar or identical nucleotide chains or deoxyribonucleic acid (DNA) chains. For example, the nucleic acid sequence cluster to be tested can be amplified oligonucleotides or polynucleotides with the same or similar sequences.
[0049] During gene sequencing, the nucleic acid sequence clusters to be tested may include DNA nanoballs (DNBs) and the like, which may be immobilized at reaction sites and / or in reaction chambers of the sequencing chip. Gene sequencing methods may include DNA nanoball (DNB) sequencing. In other embodiments, gene sequencing methods may also include bridge sequencing. This application does not limit the sequencing method.
[0050] In some embodiments of this application, during one sequencing cycle of gene sequencing, sequencing equipment or platforms such as gene sequencers use a micro-imaging optical system to take a single image of the sequencing chip, obtaining a field of view (FOV) image, which can be used as the aforementioned fluorescence image. This application does not limit the type of sequencing equipment and sequencing chip. For example, the sequencing equipment can be a gene sequencer based on DNBSEQ technology. Specifically, the fluorescence image can be an image obtained by the micro-imaging optical system capturing the sequencing chip when the fluorescent groups of the adenine (A), guanine (G), cytosine (C), or thymine (T) bases of the nucleic acid sequence cluster to be tested are excited to emit fluorescence signals. Each fluorescence image can correspond to multiple nucleic acid sequence clusters to be tested.
[0051] In this application, multiple fluorescence images belong to the same sequencing cycle, have the same image size, and can be customized in terms of the shooting interval and the number of fluorescence images. For example, Figure 2 shows a schematic diagram of the nucleic acid sequence cluster to be tested on a sequencing chip according to an embodiment of this application. Exemplarily, the shooting interval between the multiple fluorescence images can be 1 second, the number of fluorescence images can be 4, 12, or 64, and the width and height of each fluorescence image can be 5664 pixels and 8496 pixels, respectively.
[0052] Since the multiple fluorescence images belong to the same sequencing cycle, and the nucleic acid sequence clusters to be tested are relatively fixed in spatial position within the same sequencing cycle, the same coordinates among the multiple fluorescence images correspond to the same nucleic acid sequence clusters to be tested. Therefore, the effective pixels with the same coordinates in the multiple fluorescence images correspond to the fluorescence signals emitted by the same nucleic acid sequence clusters to be tested.
[0053] The acquisition of multiple fluorescence images from the same sequencing cycle described above does not constitute a limitation on the acquired fluorescence images. In practical applications, multiple fluorescence images from different sequencing cycles can be acquired, and base identification can be performed separately on the fluorescence images of each sequencing cycle. Different sequencing cycles are performed independently, so the fluorescence images corresponding to different sequencing cycles can correspond to different nucleic acid sequence clusters to be tested. For example, Figures 3 to 6 are schematic diagrams of the fluorescence images of the first, second, third, and fourth sequencing cycles, respectively, of the gene sequencing process provided in an embodiment of this application. In Figures 3-6, circular patterns of different shades can represent the intensity data, pixels, or coordinates corresponding to different nucleic acid sequence clusters to be tested. Figures 3-6 show the intensity data, pixels, or coordinates of a total of 9 nucleic acid sequence clusters to be tested.
[0054] In some embodiments of this application, if the electronic device is a sequencing device such as a gene sequencer, the electronic device can take pictures at preset shooting intervals using a built-in micro-imaging optical system to obtain the multiple fluorescence images. If the electronic device is not a sequencing device such as a gene sequencer, the electronic device can establish a communication connection with the sequencing device such as a gene sequencer and receive the multiple fluorescence images sent from the sequencing device such as a gene sequencer.
[0055] The method for acquiring multiple fluorescence images described above is merely an example and is not limited to this in practical applications. For instance, the electronic device can also read multiple image data from a folder under a preset path as the multiple fluorescence images. The preset path can be customized.
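Reading the image files from a preset folder might be sketched as below. The file suffixes and the convention that sorting by file name reproduces the shooting order are assumptions for illustration; decoding the image data itself would need an imaging library and is left out.

```python
from pathlib import Path

def list_image_files(folder, suffixes=(".tif", ".tiff", ".png")):
    """Return the image files in `folder`, sorted by name.

    The suffix list is hypothetical; adjust it to the sequencer's output format.
    """
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in suffixes
    )
```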
[0056] S12: Perform feature mapping on the fluorescence intensity sequence corresponding to the multiple fluorescence images through the embedding module of the preset base recognition model to obtain the feature tensor.
[0057] In some embodiments of this application, in addition to the embedding module, the base recognition model may also include an encoding module and a classification module, the functions and roles of which will be described in detail below. The above description of the base recognition model does not constitute a limitation on the model. For example, the base recognition model may also include other types of network layers, such as pooling layers and sampling layers.
[0058] In some embodiments of this application, the electronic device can perform preprocessing operations such as denoising, removing chemical borders, and image opening operations on each fluorescence image to obtain the preprocessed image corresponding to that fluorescence image. From the preprocessed image, effective pixels are extracted as nucleic acid sequence clusters to be tested, and the coordinates of the effective pixels are determined as the coordinates of the nucleic acid sequence clusters to be tested. Based on the intensity data of the effective pixels corresponding to each coordinate in multiple preprocessed images, a fluorescence intensity sequence is generated.
[0059] In this context, each nucleic acid sequence cluster to be tested can correspond to the same coordinates in multiple preprocessed images, and each nucleic acid sequence cluster to be tested has multiple corresponding intensity data on multiple preprocessed images.
[0060] For example, an electronic device can construct a fluorescence intensity sequence by using multiple intensity data corresponding to each nucleic acid sequence cluster to be tested as row data.
[0061] The intensity data for each nucleic acid sequence cluster to be tested can be the pixel value of the corresponding effective pixel point in the corresponding preprocessed image.
[0062] For example, an electronic device can filter out background noise from multiple fluorescence images based on a background image, perform Gaussian filtering on the multiple fluorescence images after background noise removal, divide the multiple fluorescence images after Gaussian filtering into multiple parts, select a preset number of fluorescence images from each part, and perform calculations on the pixel values between the selected fluorescence images to obtain a corresponding operation image for each part. The multiple corresponding operation images are binarized to obtain multiple binary images. The features of the multiple binary images are fused, and morphological opening and closing operations are performed on the feature-fused binary images to obtain a mask image. The pixels with pixel values in the mask image are determined as valid pixels.
[0063] The background image can be an image captured on the sequencing chip before the biochemical reaction of gene sequencing. Background noise can be filtered out by subtracting the pixel values of the background image from the pixel values of the corresponding pixels in each fluorescence image. The method for dividing the multiple fluorescence images after Gaussian filtering into multiple parts can be customized, and the parts can overlap. For example, if there are 12 fluorescence images after Gaussian filtering, the first 7 fluorescence images can be grouped into the first part, and the last 8 into the second part, so that 3 fluorescence images belong to both parts. The preset number can be customized. For example, the electronic device can calculate the sum of pixel values of each fluorescence image in each part and select the two fluorescence images with the highest sums. The pixel values of the two selected fluorescence images are averaged to obtain the corresponding operation image. This application does not limit the method of feature fusion.
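The background removal, overlapping split, and per-part averaging from this example can be sketched as follows. Images are represented as nested lists of pixel values; Gaussian filtering is omitted, clamping negative differences to zero is an assumption, and all function names are hypothetical.

```python
def subtract_background(image, background):
    """Remove background pixel-wise, clamping at zero (assumed behaviour)."""
    return [[max(0, p - b) for p, b in zip(row, bg_row)]
            for row, bg_row in zip(image, background)]

def split_into_parts(images, first_len=7, last_len=8):
    """Split into two possibly overlapping parts: first 7 and last 8 images."""
    return [images[:first_len], images[-last_len:]]

def operation_image(part, top_k=2):
    """Average the top_k images of a part, ranked by total pixel sum."""
    ranked = sorted(part, key=lambda img: sum(map(sum, img)), reverse=True)[:top_k]
    height, width = len(ranked[0]), len(ranked[0][0])
    return [[sum(img[r][c] for img in ranked) / len(ranked) for c in range(width)]
            for r in range(height)]

# Twelve toy 1 x 2 images whose brightness grows with the image index.
background = [[1, 1]]
raw = [[[i + 1, i + 1]] for i in range(12)]
cleaned = [subtract_background(img, background) for img in raw]
parts = split_into_parts(cleaned)
op_images = [operation_image(p) for p in parts]
```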
[0064] For example, the fluorescence intensity sequence can be in the form of a matrix. The electronic device can use multiple intensity data corresponding to each nucleic acid sequence cluster to be tested in multiple preprocessed images as row data, and multiple rows of data can construct a fluorescence intensity sequence in the form of a matrix.
[0065] The fluorescence intensity sequence has a first dimension and a second dimension. The intensity data in both dimensions have corresponding indices. Each index in the first dimension corresponds to a cluster of nucleic acid sequences to be tested, and each index in the second dimension corresponds to a fluorescence image. The number of indices in the first dimension represents the dimension of the fluorescence intensity sequence in that dimension, and the number of indices in the second dimension represents the dimension of the fluorescence intensity sequence in that dimension. Therefore, the dimension of the fluorescence intensity sequence in the first dimension represents the number of clusters of nucleic acid sequences to be tested, and the dimension in the second dimension represents the number of fluorescence images.
[0066] The shape of the fluorescence intensity sequence can be determined by the number of dimensions in the first dimension and the second dimension. The number of dimensions in the first dimension can be called the number of rows in the fluorescence intensity sequence, and the number of dimensions in the second dimension can be called the number of columns in the fluorescence intensity sequence. For example, if the number of nucleic acid sequence clusters to be tested is N, and the number of fluorescence images is 12, then the shape of the fluorescence intensity sequence can be Nx12, where N represents the number of rows in the fluorescence intensity sequence and 12 represents the number of columns in the fluorescence intensity sequence.
[0067] In some embodiments of this application, the electronic device performs a lookup operation on the dense matrix of the embedding module according to the index corresponding to each intensity data in the fluorescence intensity sequence to obtain a vector corresponding to each intensity data. The number of element values in each vector is the dimension of the dense matrix in the second dimension. Based on the vector corresponding to each intensity data, a feature tensor is determined. The feature tensor has a first dimension, a second dimension, and a third dimension. The dimension of the feature tensor in the first dimension represents the number of nucleic acid sequence clusters to be tested, the dimension in the second dimension represents the number of multiple fluorescence images, and the dimension in the third dimension is the dimension of the dense matrix in the second dimension.
[0068] A dense matrix can also be called an embedding matrix or a lookup table. The shape of a dense matrix can be determined by its first dimension and its second dimension. Each intensity data point in the fluorescence intensity sequence can correspond to a row index or column index in the dense matrix, and the vector corresponding to each intensity data point is the vector corresponding to the row index or column index in the dense matrix. For example, if the dense matrix has a first dimension of M and a second dimension of d, then the shape of the dense matrix can be M x d. M can be 2000, and d can be 64. The element values in the dense matrix can be preset.
[0069] The electronic device can combine or concatenate the vectors corresponding to each intensity data point based on their position or arrangement order within the fluorescence intensity sequence to obtain a feature tensor. The shape of the feature tensor can be determined by the number of its first, second, and third dimensions.
[0070] For example, continuing with the above embodiments, if the shape of the fluorescence intensity sequence is Nx12 and the shape of the dense matrix is Mxd, looking up the dense matrix based on each intensity data in the fluorescence intensity sequence yields a d-dimensional vector corresponding to each intensity data. Concatenating the d-dimensional vectors of the Nx12 intensity data yields a feature tensor X_e of shape Nx12xd. In this context, N represents the batch size of the feature tensor, 12 represents the sequence length of the feature tensor, and d represents the dimension of the feature tensor in the third dimension, that is, the dimension of the dense matrix in the second dimension. A d-dimensional vector is a vector with d elements.
[0071] Figure 7 illustrates the generation of a feature tensor according to an embodiment of this application. In Figure 7, a 2x12 fluorescence intensity sequence is input to the embedding module. The module looks up, in the dense matrix, the row index corresponding to each intensity data point of the fluorescence intensity sequence. The shape of the dense matrix is 2000xd. The d-dimensional vector stored at that row index of the dense matrix is then taken as the vector corresponding to that intensity data point. Since there are a total of 2x12 intensity data points in the fluorescence intensity sequence, 2x12 d-dimensional vectors are obtained. Based on the order of each intensity data point in the fluorescence intensity sequence, these 2x12 d-dimensional vectors are concatenated to obtain a feature tensor X_e of shape 2x12xd.
[0072] In this embodiment, by performing feature mapping on the fluorescence intensity sequence, each intensity data in the fluorescence intensity sequence can be mapped to a vector in a dense matrix, thereby increasing the dimension of the feature tensor and providing a foundation for encoding using a multi-head attention mechanism as described below.
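The lookup in the dense matrix can be sketched in NumPy as follows. This is a hedged illustration, not the method of this application: how an intensity value is converted to a row index is not specified in the text, so the helper `to_indices` (linear scaling and rounding), the maximum intensity of 1000, and the random values of the dense matrix are assumptions introduced only for the example.

```python
import numpy as np

M, d = 2000, 64                      # dense matrix shape M x d, as in the example
rng = np.random.default_rng(0)
# The element values "can be preset"; random values are used here for illustration.
dense_matrix = rng.normal(size=(M, d))

def to_indices(intensity: np.ndarray, max_intensity: float, num_rows: int) -> np.ndarray:
    """Hypothetical quantization of intensities to row indices in [0, num_rows)."""
    scaled = np.clip(intensity / max_intensity, 0.0, 1.0) * (num_rows - 1)
    return np.rint(scaled).astype(np.int64)

# Toy 2x12 fluorescence intensity sequence: 2 clusters, 12 fluorescence images.
intensity_seq = rng.uniform(0.0, 1000.0, size=(2, 12))
row_idx = to_indices(intensity_seq, max_intensity=1000.0, num_rows=M)

# Integer-array indexing returns one d-dimensional row per intensity data point
# and keeps the 2x12 arrangement, so the result has shape (2, 12, d).
X_e = dense_matrix[row_idx]
```

Because the lookup preserves the position of each intensity data point, no explicit concatenation step is needed in this sketch.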
[0073] S13, encode the feature tensor using the multi-head attention-based encoding module in the base recognition model to obtain the encoded tensor.
[0074] In some embodiments of this application, the encoding module includes a multi-head attention layer corresponding to a multi-head attention mechanism.
[0075] In other embodiments of this application, in addition to the multi-head attention layer, the encoding module may also include a residual network layer and a feedforward neural network layer. For example, Figure 8 shows a schematic diagram of the structure of an encoding module provided in an embodiment of this application. In Figure 8, the encoding module includes a multi-head attention mechanism, a first residual network layer, a regularization layer, a feedforward neural network layer, and a second residual network layer. The first residual network layer performs a residual connection between the input data and output data of the multi-head attention mechanism, and the second residual network layer performs a residual connection between the input data and output data of the feedforward neural network layer to obtain an encoded tensor.
[0076] The above description of the encoding module's structure does not constitute a limitation on the encoding module. In practical applications, the encoding module may also include other network layers such as linear layers.
[0077] For example, the electronic device encodes the feature tensor using the multi-head attention-based encoding module in the base recognition model to obtain an encoded tensor, including: performing fully connected operations on the feature tensor using multiple fully connected layers in the multi-head attention layer to obtain multiple operational tensors corresponding to the feature tensor; splitting the dimension of each operational tensor in the third dimension according to the number of multi-head attention mechanisms in the multi-head attention layer to obtain multiple split tensors corresponding to each operational tensor, each split tensor corresponding to one attention mechanism; performing self-attention calculation on the split tensors corresponding to each attention mechanism in the multiple operational tensors to obtain the self-attention tensor corresponding to each attention mechanism, obtaining the self-attention tensor corresponding to the multi-head attention mechanism; concatenating the self-attention tensors corresponding to the multi-head attention mechanism to obtain a concatenated tensor; performing residual connections on the feature tensor and the concatenated tensor using a residual network layer; and inputting the tensor after residual connection to a feedforward neural network layer to obtain the encoded tensor.
[0078] The multiple operational tensors can include a query tensor, a key tensor, and a value tensor. The query tensor, key tensor, and value tensor have the same dimensions and number of features as the feature tensor. The number of features in the third dimension can be a multiple of the number of multi-head attention mechanisms, so that the third dimension can be split evenly among the attention mechanisms.
[0079] For example, if the shape of the feature tensor X_e is Nx12xd, then the methods for obtaining the query tensor, key tensor, and value tensor can be referred to the following formulas (1) to (3):
$$Q = \mathrm{FC}(X_e) \tag{1}$$
$$K = \mathrm{FC}(X_e) \tag{2}$$
$$V = \mathrm{FC}(X_e) \tag{3}$$
[0080] Where Q represents the query tensor, K represents the key tensor, V represents the value tensor, and FC represents the fully connected operation, each of the three formulas using its own fully connected layer. The shape of the query tensor, key tensor, and value tensor is Nx12xd.
[0081] The multiple split tensors Q_n_heads obtained after splitting the query tensor, the multiple split tensors K_n_heads obtained after splitting the key tensor, and the multiple split tensors V_n_heads obtained after splitting the value tensor all have the same shape, Nx12x(d/n_heads) for each attention mechanism. Here, n_heads represents the number of multi-head attention mechanisms.
[0082] This application does not limit the number of multi-head attention mechanisms. For example, the number of multi-head attention mechanisms can be 8. The calculation method for the self-attention tensor corresponding to the multi-head attention mechanism can be found in formula (4):
$$\mathrm{Attention}_{\mathrm{n\_heads}} = \mathrm{softmax}\!\left(\frac{Q_{\mathrm{n\_heads}} K_{\mathrm{n\_heads}}^{T}}{\sqrt{d / \mathrm{n\_heads}}}\right) V_{\mathrm{n\_heads}} \tag{4}$$
[0083] Where Attention_n_heads represents the self-attention tensor corresponding to the multi-head attention mechanism, with shape Nx12x(d/n_heads) for each attention mechanism, Q_n_heads represents the multiple split tensors obtained after splitting the query tensor, K_n_heads^T represents the transpose of the multiple split tensors obtained after splitting the key tensor, n_heads represents the number of multi-head attention mechanisms, d represents the dimension of the feature tensor in the third dimension, and V_n_heads represents the multiple split tensors obtained after splitting the value tensor.
[0084] In some embodiments of this application, the self-attention tensors corresponding to the multi-head attention mechanism can be concatenated along the third dimension to obtain a concatenated tensor. For example, the concatenation method of the concatenated tensor can refer to formula (5):
$$\mathrm{Attention} = \mathrm{Concat}(\mathrm{Attention}_{\mathrm{n\_heads}}) \tag{5}$$
[0085] Where Attention represents the concatenated tensor with shape Nx12xd, Concat represents the concatenation operation, and Attention_n_heads represents the self-attention tensor corresponding to the multi-head attention mechanism.
[0086] In some embodiments of this application, the feedforward neural network layer may include multiple fully connected layers, which are connected by activation functions. For example, the feedforward neural network layer may include two fully connected layers, which are connected by the activation function GELU.
[0087] In some embodiments of this application, using a residual network layer to perform a residual connection on the feature tensor and the concatenated tensor, and then inputting the residually connected tensor into a feedforward neural network layer to obtain an encoded tensor, includes: performing a residual connection on the feature tensor and the concatenated tensor using a first residual network layer to obtain a first residual tensor; inputting the first residual tensor into the feedforward neural network layer; and performing a residual connection between the output of the feedforward neural network layer and the first residual tensor using a second residual network layer to obtain an encoded tensor X_encoder with shape Nx12xd.
[0088] In this embodiment, because the multi-head attention mechanism has powerful feature extraction and data association capabilities, it can comprehensively capture the features of the data and the relationships between the data. Therefore, by using the multi-head attention mechanism to encode the feature tensor, it is possible to accurately extract the features of the nucleic acid sequence cluster to be tested related to the base category in multiple fluorescence images, as well as the hidden information such as the relationship between the intensity data of the nucleic acid sequence cluster to be tested in multiple fluorescence images, such as the distribution pattern and positional relationship between the intensity data, thereby improving the accuracy of the encoded tensor.
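The encoding flow described in the preceding paragraphs (fully connected projections, splitting along the third dimension, per-head self-attention, concatenation, first residual connection, feedforward layer with GELU, second residual connection) can be sketched in NumPy. This is a minimal sketch under assumptions, not the implementation of this application: standard scaled dot-product attention is assumed, the "regularization layer" of Figure 8 is assumed to be a parameter-free layer normalization placed after the first residual connection, GELU uses the common tanh approximation, and all weights, the hidden size `d_ff`, and the helper names are hypothetical.

```python
import numpy as np

def gelu(x):
    # tanh approximation of the GELU activation
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(x, eps=1e-5):
    # Assumed form of the "regularization layer" (no learned scale or shift).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def init_params(d, d_ff, rng):
    def fc(n_in, n_out):
        return rng.normal(scale=n_in ** -0.5, size=(n_in, n_out)), np.zeros(n_out)
    return {"q": fc(d, d), "k": fc(d, d), "v": fc(d, d),
            "ff1": fc(d, d_ff), "ff2": fc(d_ff, d)}

def encode(X_e, params, n_heads):
    N, L, d = X_e.shape
    assert d % n_heads == 0, "d must be a multiple of n_heads"
    d_head = d // n_heads

    def project_and_split(name):
        W, b = params[name]
        out = X_e @ W + b                                   # (N, L, d)
        # Split the third dimension into n_heads contiguous chunks.
        return out.reshape(N, L, n_heads, d_head).transpose(0, 2, 1, 3)

    Q, K, V = (project_and_split(n) for n in ("q", "k", "v"))
    scores = Q @ K.transpose(0, 1, 3, 2) / np.sqrt(d_head)  # (N, n_heads, L, L)
    heads = softmax(scores) @ V                             # (N, n_heads, L, d_head)
    # Concatenate the per-head results along the third dimension.
    attention = heads.transpose(0, 2, 1, 3).reshape(N, L, d)

    first_residual = layer_norm(X_e + attention)            # first residual connection
    (W1, b1), (W2, b2) = params["ff1"], params["ff2"]
    ff_out = gelu(first_residual @ W1 + b1) @ W2 + b2       # two FC layers joined by GELU
    return first_residual + ff_out                          # second residual connection

rng = np.random.default_rng(0)
X_e = rng.normal(size=(3, 12, 64))          # N=3 clusters, 12 images, d=64
params = init_params(d=64, d_ff=128, rng=rng)
X_encoder = encode(X_e, params, n_heads=8)  # shape (3, 12, 64)
```

In this sketch each cluster is encoded independently of the others in the batch; attention only mixes information across the 12 fluorescence images of the same cluster.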
[0089] In other embodiments, the electronic device encodes a feature tensor using a multi-head attention-based encoding module in a base recognition model to obtain an encoded tensor. This process includes: performing fully connected operations on the feature tensor using multiple fully connected layers in the multi-head attention layer to obtain multiple operational tensors corresponding to the feature tensor; performing self-attention calculations on the multiple operational tensors using a per-head attention mechanism to obtain a self-attention tensor corresponding to the per-head attention mechanism; concatenating the self-attention tensors corresponding to the multi-head attention mechanism to obtain a concatenated tensor; performing a linear transformation on the concatenated tensor to obtain a linear tensor; performing a residual connection on the feature tensor and the linear tensor using a residual network layer; and inputting the residually connected tensor into a feedforward neural network layer to obtain the encoded tensor.
[0090] For the method of performing self-attention computation on the multiple operational tensors using a per-head attention mechanism, reference can be made to the related art. A linear layer (e.g., a fully connected layer) can be used to perform a linear transformation on the concatenated tensor to obtain the linear tensor. The parameters of the linear layer can be obtained through training.
[0091] S14, classify the multiple nucleic acid sequence clusters to be tested according to the encoded tensor using the classification module of the base recognition model, to obtain the base category corresponding to each of the multiple nucleic acid sequence clusters to be tested in the sequencing cycle.
[0092] In some embodiments of this application, the electronic device may use a classification module to map an encoded tensor to multiple preset base categories, obtain the confidence level of the encoded tensor corresponding to each preset base category, and determine the base category from the multiple preset base categories based on the confidence level.
[0093] The classification module can include fully connected layers and classification functions such as softmax. When there are multiple nucleic acid sequence clusters to be tested, the weight matrix in the fully connected layer is multiplied by the encoding tensor to obtain a multiplication vector. This multiplication vector is then added to the bias vector in the fully connected layer to obtain a feature vector. Each column element in the weight matrix can correspond to a preset base category, so that the feature vector includes the original score (logits) of each nucleic acid sequence cluster belonging to each preset base category. The original scores of each nucleic acid sequence cluster belonging to each preset base category are calculated using classification functions such as softmax to obtain the confidence / probability of each nucleic acid sequence cluster corresponding to each preset base category.
[0094] Multiple preset base categories can be four base categories: A, T, C, and G. The base category corresponding to each nucleic acid sequence cluster to be tested can be the preset base category corresponding to the highest confidence level, thus obtaining the base category corresponding to each nucleic acid sequence cluster to be tested. The base categories can be output in the form of a vector. For example, continuing with the above embodiment, if the shape of the encoding tensor is Nx12xd, the shape of the vector corresponding to the base category can be Nx1.
[0095] For example, the confidence / probability of the nucleic acid sequence cluster to be tested corresponding to each preset base category can be calculated using formula (6):
$$P_i = \frac{e^{z_i}}{\sum_{j=1}^{C} e^{z_j}} \tag{6}$$
[0096] Where P_i represents the confidence level of the nucleic acid sequence cluster to be tested corresponding to the i-th preset base category, z_i represents the original score of the nucleic acid sequence cluster to be tested in the feature vector corresponding to the i-th preset base category, C represents the number of preset base categories, and z_j represents the original score of the nucleic acid sequence cluster to be tested in the feature vector corresponding to the j-th preset base category.
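The classification step (fully connected layer, softmax confidence, selection of the category with the highest confidence) can be sketched as follows. The sketch rests on assumptions that are not stated in the text: the encoded tensor of shape Nx12xd is flattened to one vector of length 12·d per cluster before the fully connected layer, the column order A, T, C, G is arbitrary, and the weights are random placeholders rather than trained values.

```python
import numpy as np

BASES = np.array(["A", "T", "C", "G"])    # assumed column order of the weight matrix

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def classify(X_encoder, W, b):
    N = X_encoder.shape[0]
    # Assumption: flatten (12, d) into one feature vector per cluster.
    flat = X_encoder.reshape(N, -1)
    logits = flat @ W + b                  # original scores, shape (N, C)
    probs = softmax(logits)                # confidence per preset base category
    calls = BASES[probs.argmax(axis=-1)].reshape(N, 1)
    return calls, probs

rng = np.random.default_rng(0)
N, L, d = 5, 12, 64
X_encoder = rng.normal(size=(N, L, d))
W = rng.normal(scale=0.01, size=(L * d, len(BASES)))  # placeholder weight matrix
b = np.zeros(len(BASES))                               # placeholder bias vector
calls, probs = classify(X_encoder, W, b)               # calls has shape (N, 1)
```

Other reductions of the 12xd encoding, such as averaging over the 12 positions before the fully connected layer, would fit the description equally well.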
[0097] Experiments showed that using DBSCAN for base identification of nucleic acid sequence clusters yielded a mapping rate of 84.77% and an average error rate of 1.65% for the base categories of the target nucleic acid sequence clusters on the reference genome. Using the base identification method provided in this application, the mapping rate of the base categories of the target nucleic acid sequence clusters on the reference genome was 88.83%, with an average error rate of 0.89%. Therefore, compared with DBSCAN, the base identification method provided in this application raises the mapping rate from 84.77% to 88.83% and reduces the average error rate from 1.65% to 0.89%.
[0098] In the base identification method provided in this application embodiment, feature mapping can increase the dimensionality of the feature tensor, providing a foundation for encoding using a multi-head attention mechanism. Since the multi-head attention mechanism has powerful feature extraction and data association capabilities, it can comprehensively capture the features of the data and the relationships between them. Therefore, by encoding the feature tensor using the multi-head attention mechanism, the features of the intensity data related to the base category of the nucleic acid sequence cluster to be tested in multiple fluorescence images, as well as the relationships between the intensity data corresponding to the nucleic acid sequence cluster to be tested in multiple fluorescence images, such as the distribution pattern and positional relationships between the intensity data, can be accurately extracted, thereby improving the accuracy of the encoded tensor. Through the classification module, the nucleic acid sequence cluster to be tested can be classified according to the encoded tensor, thereby accurately identifying the base category corresponding to each nucleic acid sequence cluster to be tested.
[0099] Figure 9 shows a flowchart of a base recognition model training method provided in one embodiment of this application. The order of the steps in this flowchart can be adjusted according to different needs, and some steps can be omitted. The base recognition model training method is applied to an electronic device.
[0100] S901, acquire multiple training samples and the base category label corresponding to each training sample, wherein each training sample includes multiple fluorescence images of multiple nucleic acid sequence clusters to be tested during one sequencing cycle of gene sequencing.
[0101] In some embodiments of this application, the same coordinates among multiple fluorescent sample images in each training sample correspond to the same cluster of nucleic acid sequences to be tested. Multiple training samples can correspond to different sequencing cycles. For example, fluorescent sample images from 15 sequencing cycles of a gene sequencing process can be obtained as training samples.
[0102] In some embodiments of this application, the description of the nucleic acid sequence clusters to be tested and the fluorescence sample images can be referred to step S11. The base category label can indicate the actual base category corresponding to the nucleic acid sequence cluster to be tested. When there are multiple nucleic acid sequence clusters to be tested, each nucleic acid sequence cluster to be tested has a corresponding base category label.
[0103] S902, based on the multiple training samples and the base category label corresponding to each training sample, supervised classification training is performed on the pre-trained classification model to obtain the base recognition model.
[0104] In some embodiments of this application, during the pre-training phase, a preset classification network is trained using training samples to obtain a pre-trained classification model. During the supervised classification training phase, the classification module and the pre-trained classification model are trained using training samples with corresponding base category labels to obtain a base recognition model.
[0105] The classification network can include an embedding module, an encoding module, and a decoding module. During the pre-training phase, the encoding module and the decoding module are trained. The encoding module and the decoding module have the same structure. The pre-trained classification model can include the embedding module and the trained encoding module.
[0106] In other embodiments, during the pre-training phase, while training the encoding and decoding modules, the embedding module can also be trained. The pre-trained classification model can include the pre-trained embedding module and the pre-trained encoding module.
[0107] In one example, during the supervised classification training phase, the classification module and the encoding module can be trained to obtain a base recognition model. The base recognition model may include an embedding module and the encoding module and classification module trained under supervised classification.
[0108] In another example, during the supervised classification training phase, the embedding module, the classification module, and the encoding module can be trained to obtain a base recognition model. The base recognition model can include the embedding module, the encoding module, and the classification module after supervised classification training.
[0109] For a description of the structure of the embedding module, encoding module, and classification module, please refer to steps S12, S13, and S14, respectively. The above description of the structure of the classification network and base recognition model does not constitute a limitation on the classification network. For example, the classification network and base recognition model may also include other types of network layers, such as pooling layers and sampling layers.
[0110] In some embodiments of this application, the training samples used in the pre-training stage and the supervised classification training stage may be the same or different. To clearly illustrate the training process in the pre-training stage and the supervised classification training stage, the following description will use the same training samples in both stages as an example.
[0111] The base recognition model training method in this application trains the pre-trained classification model based on training samples with corresponding base category labels, which can improve the training accuracy of the base recognition model and thus improve the accuracy of the trained base recognition model in recognizing the base category of the nucleic acid sequence cluster to be tested.
[0112] Figure 10 shows a flowchart of a training method for a classification network provided in an embodiment of this application, including the following steps:
[0113] S101, through the embedding module in the preset classification network, feature mapping is performed on the sample intensity sequence corresponding to each training sample to obtain the first feature tensor corresponding to each training sample.
[0114] In some embodiments of this application, the sample intensity sequence has a first dimension and a second dimension. The dimension of the sample intensity sequence in the first dimension is the number of nucleic acid sequence clusters to be tested, and the dimension in the second dimension is the number of fluorescence images in each training sample.
[0115] For a description of the sample intensity sequence, please refer to the description of the fluorescence intensity sequence in step S12. For a description of the method for obtaining the first feature tensor corresponding to each training sample, please refer to the method for obtaining the feature tensor in step S12. This application will not repeat the description.
[0116] S102, perform occlusion processing on the intensity data of the preset dimension in the sample feature tensor corresponding to each training sample to obtain the first occluded tensor and the first unoccluded tensor corresponding to each training sample.
[0117] In some embodiments of this application, the preset dimension can be a second dimension. If the sample feature tensor has multiple dimensions in the second dimension, intensity data in some dimensions of the second dimension can be masked.
[0118] The electronic device performs occlusion processing on the intensity data of a preset dimension in the sample feature tensor corresponding to each training sample, obtaining a first occluded tensor and a first unoccluded tensor corresponding to each training sample, including:
[0119] For the intensity data corresponding to all indices in the second dimension of the sample feature tensor, the intensity data corresponding to the preset index among all the indices are masked to obtain a first masked tensor and a first unmasked tensor, wherein the dimension of the first masked tensor in the second dimension is the number of the preset indexes, and the dimension of the first unmasked tensor in the second dimension is the number of the remaining indices among all the indices other than the preset indexes.
[0120] The preset indices can be customized, and this application does not impose any restrictions on them. For example, if the sample feature tensor has 12 indices in the second dimension, namely the 1st to the 12th indices, the preset indices can be the 1st, 3rd, 4th, 6th, 8th, 9th, 10th and 12th indices, and the intensity data corresponding to the preset indices is masked. Since the number of preset indices is 8, the number of other indices besides the preset indices among the 12 indices is 12 - 8 = 4, resulting in a first masked tensor with shape Nx8xd and a first unmasked tensor with shape Nx4xd.
[0121] Figure 11 is a schematic diagram of masking processing of sample feature tensors according to an embodiment of this application. In Figure 11, the black parts of the sample feature tensor represent the masked intensity data corresponding to the 1st, 3rd, 4th, 6th, 8th, 9th, 10th and 12th indices.
[0122] The masking process can involve replacing the intensity data corresponding to a preset index in the second dimension of the sample feature tensor with a preset threshold, or masking the intensity data corresponding to the preset index in the second dimension of the sample feature tensor, so that the intensity data corresponding to the preset index in the second dimension does not participate in subsequent encoding operations. The preset threshold can be customized, and this application does not impose any restrictions on it. For example, the preset threshold can be zero.
[0123] In this embodiment, by masking the intensity data corresponding to the preset index of the second dimension of the sample feature tensor, the intensity data in the first unmasked tensor can be reduced, so that when encoding the first unmasked tensor data in the following process, the amount of data to be encoded and the encoding complexity can be reduced, thereby improving the encoding speed.
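The two masking variants described above can be sketched in NumPy. The function names `split_masked` and `replace_masked` and the random test tensor are hypothetical; the preset indices are the ones from the example, converted from the 1-based numbering of the text to 0-based array indices.

```python
import numpy as np

def split_masked(X, preset_indices):
    """Split X (N, L, d) along the second dimension into masked and unmasked parts."""
    masked_idx = np.asarray(preset_indices)
    keep = np.ones(X.shape[1], dtype=bool)
    keep[masked_idx] = False
    # The masked part keeps the preset indices; the unmasked part keeps the rest.
    return X[:, masked_idx, :], X[:, keep, :]

def replace_masked(X, preset_indices, threshold=0.0):
    """Alternative variant: overwrite the preset indices with a preset threshold."""
    out = X.copy()
    out[:, np.asarray(preset_indices), :] = threshold
    return out

rng = np.random.default_rng(0)
X_e = rng.normal(size=(5, 12, 64))                    # N=5, 12 indices, d=64

# 1st, 3rd, 4th, 6th, 8th, 9th, 10th and 12th indices, converted to 0-based.
preset = np.array([1, 3, 4, 6, 8, 9, 10, 12]) - 1

masked, unmasked = split_masked(X_e, preset)          # (5, 8, 64) and (5, 4, 64)
zeroed = replace_masked(X_e, preset, threshold=0.0)   # same shape as X_e
```

Only the Nx4xd unmasked tensor is passed to the encoding module in the first variant, which is where the reduction in encoding cost comes from.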
[0124] S103, the first unmasked tensor corresponding to each training sample is encoded through the encoding module based on the multi-head attention mechanism in the classification network to obtain the first encoded tensor corresponding to each training sample.
[0125] In some embodiments of this application, the method for obtaining the first encoded tensor corresponding to each training sample can refer to the method for obtaining the encoded tensor in step S13, and will not be described again in this application.
[0126] S104, the first encoding tensor and the first occlusion tensor corresponding to each training sample are decoded through the decoding module in the classification network to obtain the decoded tensor.
[0127] In some embodiments of this application, the decoding module and the encoding module have the same structure. The electronic device can concatenate the first encoded tensor and the first occlusion tensor corresponding to each training sample to obtain the first target tensor corresponding to each training sample, and use the multi-head attention mechanism in the decoding module to decode the first target tensor corresponding to each training sample to obtain the decoded tensor corresponding to each training sample.
[0128] The electronic device can use various methods to concatenate the first encoding tensor and the first occlusion tensor corresponding to each training sample to obtain the first target tensor corresponding to each training sample. This application does not limit the concatenation method. For example, the concat function can be used to concatenate the first encoding tensor and the first occlusion tensor corresponding to each training sample to obtain the first target tensor corresponding to each training sample.
[0129] The number of multi-head attention mechanisms in the decoding module can be customized, and this application does not impose any restrictions on this. The method for decoding the first target tensor corresponding to each training sample using the multi-head attention mechanism in the decoding module to obtain the decoded tensor corresponding to each training sample can refer to the method for obtaining the encoded tensor in step S13.
[0130] S105, calculate the reconstruction loss based on the first feature tensor and the decoding tensor corresponding to each training sample.
[0131] In some embodiments of this application, the reconstruction loss can be the error between the feature tensor corresponding to each training sample and the decoding tensor. The error can be the average error, mean absolute error, mean square error, root mean square error, etc., and this application does not impose any limitations. For example, the reconstruction loss can be calculated using the following formula (7):
[0132] Here, the left-hand side of formula (7) represents the reconstruction loss, X_e represents the feature tensor, and X_decoder represents the decoded tensor.
[0133] In this embodiment, the reconstruction loss can reflect the error or gap between the first feature tensor and the decoding tensor corresponding to each training sample.
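As an illustration of the error measures listed above, the following NumPy sketch computes the mean absolute error, mean square error, and root mean square error between a feature tensor and a decoded tensor. Which of these corresponds to formula (7) is not recoverable from the text, so the function `reconstruction_loss`, its `kind` parameter, and the toy tensors are assumptions made for the example.

```python
import numpy as np

def reconstruction_loss(X_e, X_decoder, kind="mse"):
    """Error between the feature tensor X_e and the decoded tensor X_decoder."""
    diff = X_decoder - X_e
    if kind == "mae":                       # mean absolute error
        return float(np.mean(np.abs(diff)))
    if kind == "mse":                       # mean square error
        return float(np.mean(diff ** 2))
    if kind == "rmse":                      # root mean square error
        return float(np.sqrt(np.mean(diff ** 2)))
    raise ValueError(f"unknown loss kind: {kind}")

# Toy tensors of shape (N, 12, d): every element differs by exactly 2.
X_e = np.zeros((2, 12, 4))
X_decoder = np.full((2, 12, 4), 2.0)

loss_mae = reconstruction_loss(X_e, X_decoder, "mae")
loss_mse = reconstruction_loss(X_e, X_decoder, "mse")
loss_rmse = reconstruction_loss(X_e, X_decoder, "rmse")
```

During pre-training, a value of this kind would be compared against the first convergence condition after each parameter adjustment.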
[0134] S106, Train the classification network based on the reconstruction loss to obtain a pre-trained classification model.
[0135] In one embodiment, the electronic device can adjust the network parameters of the encoding module and the decoding module until the reconstruction loss meets the first convergence condition, then stop adjusting, and determine the embedding module and the trained encoding module as the pre-trained classification model.
[0136] In another embodiment, the electronic device can adjust the network parameters of the embedding module, the encoding module, and the decoding module until the reconstruction loss meets the first convergence condition, then stop adjusting, and determine the trained embedding module and the trained encoding module as the pre-trained classification model.
[0137] The first convergence condition can be customized, and this application does not impose any restrictions on it. For example, the first convergence condition can be that the reconstruction loss falls within a first preset interval. The first preset interval can also be customized, and this application does not impose any restrictions on it. For example, the first preset interval can be 0.1 to 0.2.
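The stopping rule described above can be sketched as a simple interval check. The helper names `meets_convergence` and `train_until_converged`, the inclusive bounds, the `max_steps` guard, and the sequence of loss values are all hypothetical; `step_fn` stands in for one parameter update that returns the current loss.

```python
def meets_convergence(loss, interval=(0.1, 0.2)):
    """True when the loss lies inside the preset interval (bounds assumed inclusive)."""
    low, high = interval
    return low <= loss <= high

def train_until_converged(step_fn, interval=(0.1, 0.2), max_steps=10_000):
    """Call step_fn() repeatedly and stop once the returned loss meets the condition."""
    for step in range(1, max_steps + 1):
        loss = step_fn()
        if meets_convergence(loss, interval):
            return step, loss
    raise RuntimeError("loss did not reach the preset interval")

# Stand-in for real training: a fixed, made-up sequence of loss values.
losses = iter([0.9, 0.5, 0.3, 0.15, 0.05])
result = train_until_converged(lambda: next(losses))
```

Here adjustment stops at the fourth update, the first whose loss (0.15) falls within 0.1 to 0.2.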
[0138] In this embodiment, since the reconstruction loss reflects the error or gap between the first feature tensor and the decoded tensor corresponding to each training sample, training the classification network with the reconstruction loss can improve the encoding capability of the encoding module in the pre-trained classification model.
[0139] Figure 12 shows a flowchart of a training method for a classification model provided in an embodiment of this application, including the following steps:
[0140] S121, Based on the pre-trained classification model, generate the second encoded tensor and the second masked tensor corresponding to each training sample.
[0141] In some embodiments of this application, the second encoded tensor corresponding to each training sample is obtained by encoding the second unmasked tensor corresponding to each training sample; the second masked tensor and the second unmasked tensor corresponding to each training sample are obtained by masking the second feature tensor corresponding to each training sample; and the second feature tensor corresponding to each training sample is obtained by feature mapping of each training sample.
[0142] The method for obtaining the second feature tensor corresponding to each training sample can refer to the method for obtaining the feature tensor in step S12. The method for obtaining the second masked tensor and the second unmasked tensor corresponding to each training sample can refer to the method for obtaining the first masked tensor and the first unmasked tensor in step S112. The method for obtaining the second encoded tensor corresponding to each training sample can refer to the method for obtaining the encoded tensor in step S13. These details are not repeated here.
[0143] S122, using a preset classification module, classify each training sample according to the second encoded tensor and the second masked tensor corresponding to each training sample to obtain the base category corresponding to each training sample.
[0144] In some embodiments of this application, the electronic device can splice the second encoded tensor and the second masked tensor corresponding to each training sample to obtain the second target tensor corresponding to each training sample, and use a classification module to classify the second target tensor corresponding to each training sample to obtain the base category corresponding to each training sample.
[0145] The method of classifying the second target tensor corresponding to each training sample using the classification module to obtain the base category corresponding to each training sample can refer to the method of obtaining the base category corresponding to each nucleic acid sequence cluster to be tested in step S14, and will not be repeated in this application.
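The splicing in step S122 can be illustrated with a short sketch. It assumes that the two tensors are concatenated along the second dimension (the fluorescence-image axis) with the encoded part first; the function name `splice` and the shapes are illustrative only, and any reordering back to the original image positions is not shown.

```python
import numpy as np

def splice(encoded, masked, axis=1):
    """Concatenate the encoded and masked tensors into one target tensor.

    The concatenation axis and the order of the two parts are assumptions.
    """
    return np.concatenate([encoded, masked], axis=axis)

# 5 clusters, 8 embedding features; 3 unmasked positions and 1 masked position.
encoded = np.ones((5, 3, 8))
masked = np.zeros((5, 1, 8))
target = splice(encoded, masked)
```

The resulting target tensor has shape (5, 4, 8), that is, one row per fluorescence image again.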
[0146] S123, calculate the classification loss based on the base category and base category label corresponding to each training sample.
[0147] In some embodiments of this application, the electronic device can vectorize the base category and base category label corresponding to each training sample to obtain a first vector and a second vector, and calculate the classification loss based on the first vector and the second vector using a preset loss function. This application does not limit the type of the preset loss function. For example, the preset loss function can be the cross-entropy loss function.
[0148] In this embodiment, the base category label can indicate the actual base category corresponding to the nucleic acid sequence cluster to be tested, and the classification loss can reflect the degree of difference or similarity between the actual base category and the identified base category.
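A minimal sketch of the classification loss in step S123, using the cross-entropy example mentioned above. It assumes four base categories A, C, G, and T, takes the first vector to be the predicted probability distribution, and takes the second vector to be a one-hot encoding of the label; the names `one_hot` and `cross_entropy` are illustrative.

```python
import numpy as np

BASES = "ACGT"  # assumed set of preset base categories

def one_hot(labels):
    """Vectorize base category labels, e.g. 'C' -> [0, 1, 0, 0]."""
    idx = np.array([BASES.index(b) for b in labels])
    return np.eye(len(BASES))[idx]

def cross_entropy(probs, labels, eps=1e-12):
    """Mean cross-entropy between predicted probabilities and one-hot labels."""
    target = one_hot(labels)
    return float(-np.mean(np.sum(target * np.log(probs + eps), axis=1)))

# Two training samples: a confident correct call and a uniform guess.
probs = np.array([[0.70, 0.10, 0.10, 0.10],
                  [0.25, 0.25, 0.25, 0.25]])
loss = cross_entropy(probs, "AC")
```

The loss shrinks toward 0 as the probability assigned to the labeled base approaches 1, which is the sense in which it measures the gap between the actual and identified base categories.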
[0149] S124, Based on the classification loss, train the classification module and the pre-trained classification model to obtain the base recognition model.
[0150] In one embodiment, the electronic device can adjust the network parameters of the encoding module and the classification module until the classification loss meets the second convergence condition, then stop adjusting, and determine the embedding module and the trained encoding module and classification module as the base recognition model.
[0151] In another embodiment, the electronic device can adjust the network parameters of the embedding module, the encoding module, and the classification module until the classification loss meets the second convergence condition, then stop adjusting and determine the trained embedding module, encoding module, and classification module as the base recognition model.
[0152] The second convergence condition can be customized, and this application does not impose any restrictions on it. For example, the second convergence condition can be that the classification loss is within a second preset interval. The second preset interval can also be customized. For example, the second preset interval can be 0 to 0.1.
[0153] In this embodiment, since the classification loss reflects the degree of difference or similarity between the actual base category and the identified base category, training the pre-trained classification model using the classification loss can improve the training accuracy of the trained base recognition model, thereby improving the accuracy of the base recognition model in identifying the base category of the nucleic acid sequence cluster to be tested. In other embodiments of this application, other parameters in the training process during the pre-training stage and the supervised classification training stage, such as the learning rate, batch size, and optimizer, can be customized, and this application does not impose any restrictions on them. For example, the learning rate can be 5e-3, the optimizer can be Adam, and the batch size can be 1000.
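The example hyperparameters above can be collected in a small configuration object. This is only an illustrative container: the class name `TrainingConfig` and the helper `batches_per_epoch` are hypothetical, and the defaults simply restate the example values from the text.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainingConfig:
    """Example training parameters; every field is customizable."""
    learning_rate: float = 5e-3
    optimizer: str = "Adam"
    batch_size: int = 1000

    def batches_per_epoch(self, num_samples):
        """Number of batches needed to cover all samples (ceiling division)."""
        return -(-num_samples // self.batch_size)

config = TrainingConfig()
num_batches = config.batches_per_epoch(2500)
```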
[0154] Figure 13 shows a functional block diagram of a base recognition device provided in an embodiment of this application. The base recognition device 13 includes an acquisition unit 130, a feature mapping unit 131, an encoding unit 132, and a classification unit 133. A module / unit in this application refers to a series of computer-readable instruction segments that are stored in the memory 152 in Figure 15, can be acquired by the processor 153 in Figure 15, and perform a fixed function. In this embodiment, the functions of each module / unit will be described in detail in subsequent embodiments.
[0155] The acquisition unit 130 is used to acquire multiple fluorescence images of multiple nucleic acid sequence clusters to be tested during a sequencing cycle.
[0156] In some embodiments of this application, the acquisition unit 130 is further configured to: extract the target nucleic acid sequence clusters corresponding to effective pixels and the coordinates of the target nucleic acid sequence clusters based on the preprocessed image corresponding to each fluorescence image, and generate the fluorescence intensity sequence based on the intensity data of the coordinates of the multiple target nucleic acid sequence clusters in the multiple preprocessed images.
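The second half of this step, building the fluorescence intensity sequence from cluster coordinates, can be sketched as follows. The sketch assumes the cluster coordinates have already been extracted as (row, column) pairs and that the preprocessed images are stacked into one array; the function name `intensity_sequence` and the toy data are illustrative.

```python
import numpy as np

def intensity_sequence(images, coords):
    """Read each cluster's intensity from every preprocessed image.

    images: array of shape (M, H, W), one preprocessed image per channel.
    coords: N (row, col) pairs, one per target cluster (assumed given).
    Returns an (N, M) array: clusters along the first dimension,
    fluorescence images along the second.
    """
    images = np.asarray(images)
    rows, cols = np.asarray(coords).T
    return images[:, rows, cols].T

# Toy data: 4 images of 3x3 pixels, 2 clusters.
images = np.arange(4 * 3 * 3).reshape(4, 3, 3)
coords = [(0, 0), (2, 1)]
seq = intensity_sequence(images, coords)
```

Each row of `seq` holds one cluster's intensities across the four images, matching the two-dimensional layout of the fluorescence intensity sequence.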
[0157] The feature mapping unit 131 is used to perform feature mapping on the fluorescence intensity sequence corresponding to the multiple fluorescence images through the embedding module of the preset base recognition model to obtain a feature tensor.
[0158] In some embodiments of this application, the fluorescence intensity sequence has a first dimension and a second dimension. The dimension of the fluorescence intensity sequence in the first dimension represents the number of nucleic acid sequence clusters to be tested, and the dimension in the second dimension represents the number of multiple fluorescence images. The feature mapping unit 131 is further configured to: perform a lookup operation on the dense matrix of the embedding module according to each intensity data in the fluorescence intensity sequence to obtain a vector corresponding to each intensity data. The number of element values in each vector is the dimension of the dense matrix in the second dimension. Based on the vector corresponding to each intensity data, generate the feature tensor. The feature tensor has a first dimension, a second dimension, and a third dimension. The dimension of the feature tensor in the first dimension represents the number of nucleic acid sequence clusters to be tested, the dimension in the second dimension represents the number of multiple fluorescence images, and the dimension in the third dimension is the dimension of the dense matrix in the second dimension.
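The lookup operation can be sketched with integer indexing into the dense matrix. The sketch assumes each intensity value has been quantized to an integer that indexes a row of the dense matrix; the matrix size (256 rows, 8 columns), the random initialization, and the name `embed` are illustrative assumptions.

```python
import numpy as np

def embed(intensity_seq, dense_matrix):
    """Look up one row of the dense matrix for every intensity value.

    intensity_seq: (N, M) integer intensities (quantization is assumed).
    dense_matrix:  (V, D) embedding table.
    Returns a feature tensor of shape (N, M, D).
    """
    idx = np.asarray(intensity_seq, dtype=np.int64)
    return dense_matrix[idx]

rng = np.random.default_rng(0)
dense_matrix = rng.normal(size=(256, 8))   # V = 256 intensity levels, D = 8
seq = np.array([[0, 9, 18, 27],
                [7, 16, 25, 34]])          # 2 clusters, 4 fluorescence images
features = embed(seq, dense_matrix)
```

The third dimension of `features` equals the second dimension of the dense matrix, as described above.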
[0159] The encoding unit 132 is used to encode the feature tensor through the encoding module based on the multi-head attention mechanism in the base recognition model to obtain the encoded tensor.
[0160] In some embodiments of this application, the encoding module includes a multi-head attention layer corresponding to the multi-head attention mechanism.
[0161] In some embodiments of this application, the encoding module further includes a residual network layer and a feedforward neural network layer. The encoding unit 132 is further configured to: perform fully connected operations on the feature tensor using multiple fully connected layers in the multi-head attention layer to obtain multiple operational tensors corresponding to the feature tensor; split the dimension of each operational tensor in the third dimension according to the number of multi-head attention mechanisms in the multi-head attention layer to obtain multiple split tensors corresponding to each operational tensor, each split tensor corresponding to one head attention mechanism; perform self-attention calculation on the split tensors corresponding to each head attention mechanism in the multiple operational tensors to obtain the self-attention tensor corresponding to each head attention mechanism; concatenate the self-attention tensors corresponding to the multi-head attention mechanisms to obtain a concatenated tensor; perform residual connection on the feature tensor and the concatenated tensor using the residual network layer; and input the tensor after residual connection to the feedforward neural network layer to obtain the encoded tensor.
[0162] In some embodiments of this application, the encoding module further includes a residual network layer and a feedforward neural network layer. The encoding unit 132 is further configured to: perform fully connected operations on the feature tensor using multiple fully connected layers in the multi-head attention layer to obtain multiple operational tensors corresponding to the feature tensor; perform self-attention calculation on the multiple operational tensors using each head attention mechanism to obtain the self-attention tensor corresponding to each head attention mechanism; concatenate the self-attention tensors corresponding to the multi-head attention mechanisms to obtain a concatenated tensor; perform a linear transformation on the concatenated tensor to obtain a linear tensor; perform residual connection on the feature tensor and the linear tensor using the residual network layer; and input the tensor after residual connection to the feedforward neural network layer to obtain the encoded tensor.
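The two encoding variants above can be sketched in one function. The sketch assumes scaled dot-product self-attention, bias-free fully connected layers, splitting along the third dimension in both variants, and a single-layer ReLU feedforward network; passing `w_o` adds the linear transformation of the second variant. All names and weight shapes are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def encode(x, w_q, w_k, w_v, w_ff, num_heads, w_o=None):
    """Encode an (N, M, D) feature tensor into an (N, M, D) encoded tensor."""
    n, m, d = x.shape
    head_dim = d // num_heads  # D is assumed divisible by num_heads

    def split(t):
        # Split the third dimension across heads: (N, M, D) -> (N, heads, M, head_dim).
        return t.reshape(n, m, num_heads, head_dim).transpose(0, 2, 1, 3)

    # Fully connected operations produce the operational tensors.
    q, k, v = split(x @ w_q), split(x @ w_k), split(x @ w_v)
    # Self-attention inside each head over the M fluorescence images.
    weights = softmax(q @ k.transpose(0, 1, 3, 2) / np.sqrt(head_dim))
    heads = weights @ v
    # Concatenate the per-head results back to (N, M, D).
    concat = heads.transpose(0, 2, 1, 3).reshape(n, m, d)
    if w_o is not None:
        concat = concat @ w_o          # linear transformation (second variant)
    residual = x + concat              # residual connection
    return np.maximum(residual @ w_ff, 0.0)  # assumed one-layer ReLU feedforward

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 4, 8))
w_q, w_k, w_v, w_o, w_ff = (rng.normal(size=(8, 8)) for _ in range(5))
encoded = encode(x, w_q, w_k, w_v, w_ff, num_heads=2)
encoded_lin = encode(x, w_q, w_k, w_v, w_ff, num_heads=2, w_o=w_o)
```

Because attention runs across the second dimension, each image's representation is updated using the other images of the same cluster, which is how correlations between the fluorescence images of one sequencing cycle can be captured.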
[0163] The classification unit 133 is used to classify the plurality of nucleic acid sequence clusters to be tested according to the encoded tensor through the classification module of the base recognition model, and obtain the base category corresponding to the plurality of nucleic acid sequence clusters to be tested in the sequencing cycle.
[0164] In some embodiments of this application, the classification unit 133 is further configured to: map the encoded tensor to multiple preset base categories using the classification module, obtain the confidence level of the encoded tensor corresponding to each preset base category, and determine the base category from the multiple preset base categories based on the confidence level.
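A minimal sketch of this confidence-based classification. It assumes four preset base categories A, C, G, and T, a single linear layer applied to the flattened encoded tensor, softmax confidences, and selection of the highest-confidence category; the name `classify` and the weights are illustrative.

```python
import numpy as np

BASES = np.array(list("ACGT"))  # assumed preset base categories

def classify(encoded, w_cls):
    """Map an (N, M, D) encoded tensor to one base call per cluster.

    Returns the called bases and the (N, 4) confidence matrix.
    """
    n = encoded.shape[0]
    logits = encoded.reshape(n, -1) @ w_cls            # (N, M*D) @ (M*D, 4)
    logits = logits - logits.max(axis=1, keepdims=True)
    conf = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return BASES[conf.argmax(axis=1)], conf

# Toy encoded tensor: 2 clusters, 2 images, 2 features.
encoded = np.array([[[1.0, 0.0], [0.0, 0.0]],
                    [[0.0, 0.0], [0.0, 1.0]]])
w_cls = np.eye(4)
calls, conf = classify(encoded, w_cls)
```

With these toy weights the first cluster is called as A and the second as T, and each row of `conf` sums to 1.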
[0165] Figure 14 shows a functional block diagram of a base recognition model training device provided in an embodiment of this application. The base recognition model training device 14 includes an acquisition unit 140 and a training unit 141. A module / unit in this application refers to a series of computer-readable instruction segments that are stored in the memory 152 in Figure 15, can be acquired by the processor 153 in Figure 15, and perform a fixed function. In this embodiment, the functions of each module / unit will be described in detail in subsequent embodiments.
[0166] The acquisition unit 140 is used to acquire multiple training samples and the base category label corresponding to each training sample. Each training sample includes multiple fluorescent sample images of the nucleic acid sequence cluster to be tested.
[0167] Training unit 141 is used to perform supervised classification training on the pre-trained classification model based on the multiple training samples and the base category label corresponding to each training sample, so as to obtain a base recognition model.
[0168] In some embodiments of this application, the training unit 141 is further configured to perform feature mapping on the sample intensity sequence corresponding to each training sample through the embedding module in the preset classification network to obtain a first feature tensor corresponding to each training sample; perform masking processing on the intensity data of a preset dimension in the sample feature tensor corresponding to each training sample to obtain a first masked tensor and a first unmasked tensor corresponding to each training sample; encode the first unmasked tensor corresponding to each training sample through the encoding module based on the multi-head attention mechanism in the classification network to obtain a first encoded tensor corresponding to each training sample; decode the first encoded tensor and the first masked tensor corresponding to each training sample through the decoding module in the classification network to obtain a decoded tensor; calculate the reconstruction loss based on the first feature tensor and the decoded tensor corresponding to each training sample; and train the classification network based on the reconstruction loss to obtain the pre-trained classification model.
[0169] In some embodiments of this application, the sample intensity sequence has a first dimension and a second dimension. The dimension of the sample intensity sequence in the first dimension is the number of nucleic acid sequence clusters to be tested, and the dimension in the second dimension is the number of fluorescence images in each training sample. The training unit 141 is further configured to: perform masking processing on the intensity data corresponding to the preset indices among all the indices of the sample feature tensor in the second dimension, to obtain a first masked tensor and a first unmasked tensor, wherein the dimension of the first masked tensor in the second dimension is the number of the preset indices, and the dimension of the first unmasked tensor in the second dimension is the number of the remaining indices other than the preset indices among all the indices.
[0170] In some embodiments of this application, the training unit 141 is further configured to: generate a second encoded tensor and a second masked tensor corresponding to each training sample based on the pre-trained classification model, wherein the second encoded tensor is obtained by encoding the second unmasked tensor corresponding to each training sample, the second masked tensor and the second unmasked tensor are obtained by masking the second feature tensor corresponding to each training sample, and the second feature tensor is obtained by feature mapping of each training sample; classify each training sample according to the second encoded tensor and the second masked tensor corresponding to each training sample through a preset classification module to obtain the base category corresponding to each training sample; calculate the classification loss according to the base category and base category label corresponding to each training sample; and train the classification module and the pre-trained classification model according to the classification loss to obtain the base recognition model.
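The index-based masking can be sketched as a split of the feature tensor along its second dimension. The sketch simply separates the slices at the preset indices from the rest; whether the masked positions are additionally replaced by a placeholder value is not shown and would be a further assumption. The name `mask_split` is illustrative.

```python
import numpy as np

def mask_split(features, masked_idx):
    """Split an (N, M, D) feature tensor along the second dimension.

    Returns (masked, unmasked): the slices at the preset indices and the
    slices at all remaining indices.
    """
    m = features.shape[1]
    masked_idx = np.asarray(masked_idx)
    keep_idx = np.setdiff1d(np.arange(m), masked_idx)
    return features[:, masked_idx, :], features[:, keep_idx, :]

# 2 clusters, 4 fluorescence images, 3 features; mask the image at index 1.
features = np.arange(2 * 4 * 3).reshape(2, 4, 3)
masked, unmasked = mask_split(features, [1])
```

The second dimension of `masked` equals the number of preset indices (1), and that of `unmasked` equals the number of remaining indices (3).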
[0171] Figure 15 is a schematic diagram of the structure of an electronic device provided in an embodiment of this application. In Figure 15, the electronic device 15 may include a communication module 151, a memory 152, a processor 153, an input / output (I / O) interface 154, and a bus 155. The processor 153 is coupled to the communication module 151, the memory 152, and the input / output interface 154 via the bus 155.
[0172] Communication module 151 may include a wired communication module and / or a wireless communication module. The wired communication module may provide one or more wired communication solutions such as Universal Serial Bus (USB) and Controller Area Network (CAN). The wireless communication module may provide one or more wireless communication solutions such as Wireless Fidelity (Wi-Fi), Bluetooth (BT), mobile communication networks, frequency modulation (FM), near field communication (NFC), and infrared (IR).
[0173] Memory 152 may include one or more random access memory (RAM) and one or more non-volatile memory (NVM). The RAM can be directly read and written by the processor 153, and can be used to store executable programs (e.g., machine instructions) of other running programs, as well as user and application data. The RAM may include static random-access memory (SRAM), dynamic random-access memory (DRAM), synchronous dynamic random-access memory (SDRAM), double data rate synchronous dynamic random-access memory (DDR SDRAM), etc.
[0174] Non-volatile memory can also store executable programs and user and application data, and can be pre-loaded into random access memory for direct reading and writing by the processor 153. Non-volatile memory can include disk storage devices and flash memory. For example, flash memory can be Nand Flash.
[0175] The memory 152 is used to store one or more computer programs. The one or more computer programs are configured to be executed by the processor 153. The one or more computer programs include multiple instructions, which, when executed by the processor 153, can implement a base recognition method and a base recognition model training method that are executed on the electronic device 15.
[0176] In other embodiments, the electronic device 15 shown in Figure 15 further includes an external memory interface for connecting to an external memory to expand the storage capacity of the electronic device 15.
[0177] Processor 153 may include one or more processing units, such as application processor (AP), modem processor, graphics processing unit (GPU), image signal processor (ISP), controller, video codec, digital signal processor (DSP), and / or neural network processing unit (NPU). These different processing units may be independent devices or integrated into one or more processors.
[0178] The processor 153 provides computing and control capabilities. For example, the processor 153 is used to execute computer programs stored in the memory 152 to implement the base recognition method and the base recognition model training method described above.
[0179] Input / output interface 154 is used to provide a channel for user input or output. For example, input / output interface 154 can be used to connect various input / output devices, such as mouse, keyboard, touch device, display screen, etc., so that users can enter information or visualize information.
[0180] Bus 155 is used at least to provide a channel for communication between the communication module 151, memory 152, processor 153, and input / output interface 154 in electronic device 15.
[0181] It is understood that the structures illustrated in the embodiments of this application do not constitute a specific limitation on the electronic device 15. In other embodiments of this application, the electronic device 15 may include more or fewer components than illustrated, or combine some components, or split some components, or have different component arrangements. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
[0182] This application also provides a computer-readable storage medium storing a computer program, which includes program instructions. When the program instructions are executed, the method implemented can refer to the methods in the above embodiments of this application.
[0183] The computer-readable storage medium can be the internal memory of the electronic device described in the above embodiments, such as the hard disk or memory of the electronic device. Alternatively, the computer-readable storage medium can be an external storage device of the electronic device, such as a plug-in hard disk, smart media card (SMC), secure digital (SD) card, flash card, etc., provided on the electronic device.
[0184] In some embodiments, a computer-readable storage medium may include a stored program area and a stored data area, wherein the stored program area may store an operating system, an application program required for at least one function, etc.; and the stored data area may store data created based on the use of the electronic device, etc.
[0185] It is understood that the structures illustrated in the embodiments of this application do not constitute a specific limitation on the electronic device. In other embodiments of this application, the electronic device may include more or fewer components than illustrated, or combine some components, or split some components, or have different component arrangements. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
[0186] In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for instance, the division of modules is only a logical functional division, and other division methods may be used in actual implementation.
[0187] The modules described as separate components may or may not be physically separate. The components shown as modules may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs.
[0188] Furthermore, the functional modules in the various embodiments of this application can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or in the form of hardware plus software functional modules.
[0189] Therefore, the embodiments should be considered exemplary and non-limiting in all respects, and the scope of this application is defined by the appended claims rather than the foregoing description. Thus, all variations falling within the meaning and scope of equivalents of the claims are intended to be embraced within this application. No reference signs in the claims should be construed as limiting the scope of the claims.
[0190] Furthermore, it is clear that the word "comprising" does not exclude other units or steps, and the singular does not exclude the plural. Multiple units or devices described in this application may also be implemented by a single unit or device through software or hardware. The terms "first," "second," etc., are used to indicate names and do not indicate any specific order.
[0191] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of this application and are not intended to limit it. Although this application has been described in detail with reference to preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions can be made to the technical solutions of this application without departing from the spirit and scope of the technical solutions of this application.
Claims
1. A base recognition method, characterized in that the base recognition method includes: acquiring multiple fluorescence images of multiple nucleic acid sequence clusters to be tested during one sequencing cycle of gene sequencing; performing feature mapping on the fluorescence intensity sequence corresponding to the multiple fluorescence images through the embedding module of a preset base recognition model to obtain a feature tensor; encoding the feature tensor through the encoding module based on the multi-head attention mechanism in the base recognition model to obtain an encoded tensor; and classifying the plurality of nucleic acid sequence clusters to be tested according to the encoded tensor through the classification module of the base recognition model to obtain the base category corresponding to the plurality of nucleic acid sequence clusters to be tested in the sequencing cycle.
2. The base recognition method as described in claim 1, characterized in that, The generation of the fluorescence intensity sequence includes: Based on the preprocessed image corresponding to each fluorescence image, extract the nucleic acid sequence clusters to be tested corresponding to the effective pixels and the coordinates of the nucleic acid sequence clusters to be tested; The fluorescence intensity sequence is generated based on the intensity data of the coordinates of the multiple nucleic acid sequence clusters to be tested in multiple preprocessed images.
3. The base recognition method as described in claim 1, characterized in that, The fluorescence intensity sequence has a first dimension and a second dimension. The dimension of the fluorescence intensity sequence in the first dimension represents the number of nucleic acid sequence clusters to be tested, and the dimension in the second dimension represents the number of multiple fluorescence images. The feature tensor obtained by feature mapping of the fluorescence intensity sequence corresponding to the multiple fluorescence images through the embedding module of the preset base recognition model includes: Based on each intensity data in the fluorescence intensity sequence, a lookup table operation is performed on the dense matrix of the embedding module to obtain a vector corresponding to each intensity data. The number of element values in each vector is the dimension of the dense matrix in the second dimension. Based on the vector corresponding to each intensity data, a feature tensor is generated. The feature tensor has a first dimension, a second dimension, and a third dimension. The dimension of the feature tensor in the first dimension represents the number of nucleic acid sequence clusters to be tested, the dimension in the second dimension represents the number of multiple fluorescence images, and the dimension in the third dimension is the dimension of the dense matrix in the second dimension.
4. The base recognition method as described in claim 3, characterized in that, The encoding module includes a multi-head attention layer corresponding to the multi-head attention mechanism.
5. The base recognition method as described in claim 4, characterized in that, The encoding module further includes a residual network layer and a feedforward neural network layer. The feature tensor is encoded using the multi-head attention-based encoding module in the base recognition model to obtain an encoded tensor including: The feature tensor is subjected to fully connected operations by multiple fully connected layers in the multi-head attention layer to obtain multiple operational tensors corresponding to the feature tensor. Based on the number of multi-head attention mechanisms in the multi-head attention layer, each operational tensor is split in the dimension of the third dimension to obtain multiple split tensors corresponding to each operational tensor, and each split tensor corresponds to one attention mechanism. Self-attention calculation is performed on the split tensor corresponding to each head attention mechanism in the plurality of operational tensors to obtain the self-attention tensor corresponding to each head attention mechanism; By concatenating the self-attention tensors corresponding to the multi-head attention mechanism, a concatenated tensor is obtained; The feature tensor and the concatenated tensor are residually connected using the residual network layer, and the residually connected tensor is input into the feedforward neural network layer to obtain the encoded tensor.
6. The base recognition method as described in claim 4, characterized in that, The encoding module further includes a residual network layer and a feedforward neural network layer. The feature tensor is encoded using the multi-head attention-based encoding module in the base recognition model to obtain an encoded tensor including: The feature tensor is subjected to fully connected operations by multiple fully connected layers in the multi-head attention layer to obtain multiple operational tensors corresponding to the feature tensor. Self-attention computation is performed on the multiple operational tensors using each head attention mechanism to obtain the self-attention tensor corresponding to each head attention mechanism; By concatenating the self-attention tensors corresponding to the multi-head attention mechanism, a concatenated tensor is obtained; A linear transformation is performed on the concatenated tensor to obtain a linear tensor; The feature tensor and the linear tensor are residually connected using the residual network layer, and the residually connected tensor is input into the feedforward neural network layer to obtain the encoded tensor.
7. The base recognition method as described in claim 1, characterized in that, The classification module of the base recognition model classifies the plurality of nucleic acid sequence clusters to be tested according to the coding tensor, and obtains the base categories corresponding to the plurality of nucleic acid sequence clusters to be tested in the sequencing cycle, including: The classification module is used to map the encoded tensor to multiple preset base categories to obtain the confidence level of the encoded tensor for each preset base category; The base category is determined from the plurality of preset base categories based on the confidence level.
8. A method for training a base recognition model, characterized in that, The training method for the base recognition model includes: Multiple training samples and base category labels corresponding to each training sample are obtained. Each training sample includes multiple fluorescence images of multiple nucleic acid sequence clusters to be tested during one sequencing cycle of gene sequencing. Based on the multiple training samples and the base category label corresponding to each training sample, the pre-trained classification model is subjected to supervised classification training to obtain a base recognition model.
9. The base recognition model training method as described in claim 8, characterized in that, The generation of the pre-trained classification model includes: By using the embedding module in the preset classification network, feature mapping is performed on the sample intensity sequence corresponding to each training sample to obtain the first feature tensor corresponding to each training sample. The intensity data of a preset dimension in the sample feature tensor corresponding to each training sample is masked to obtain the first masked tensor and the first unmasked tensor corresponding to each training sample. The first unmasked tensor corresponding to each training sample is encoded through the multi-head attention-based encoding module in the classification network to obtain the first encoded tensor corresponding to each training sample. The decoding module in the classification network decodes the first encoded tensor and the first masked tensor corresponding to each training sample to obtain the decoded tensor. The reconstruction loss is calculated based on the first feature tensor and the decoded tensor corresponding to each training sample. The classification network is trained based on the reconstruction loss to obtain the pre-trained classification model.
10. The base recognition model training method as described in claim 9, characterized in that the sample intensity sequence has a first dimension and a second dimension, wherein the size of the sample intensity sequence in the first dimension is the number of nucleic acid sequence clusters to be tested, and its size in the second dimension is the number of fluorescence images in each training sample; and the step of masking the intensity data of the preset dimension in the first feature tensor corresponding to each training sample to obtain the first masked tensor and the first unmasked tensor corresponding to each training sample includes: for the intensity data corresponding to all indices of the first feature tensor in the second dimension, masking the intensity data corresponding to the preset indices among all the indices to obtain the first masked tensor and the first unmasked tensor, wherein the size of the first masked tensor in the second dimension is the number of the preset indices, and the size of the first unmasked tensor in the second dimension is the number of the remaining indices among all the indices other than the preset indices.
11. The base recognition model training method as described in claim 8, characterized in that the step of performing supervised classification training on the pre-trained classification model based on the multiple training samples and the base category label corresponding to each training sample to obtain the base recognition model includes: generating, based on the pre-trained classification model, a second encoded tensor and a second masked tensor for each training sample, wherein the second encoded tensor is obtained by encoding a second unmasked tensor corresponding to each training sample, the second masked tensor and the second unmasked tensor are obtained by masking a second feature tensor corresponding to each training sample, and the second feature tensor is obtained by performing feature mapping on each training sample; classifying each training sample through a preset classification module based on the second encoded tensor and the second masked tensor corresponding to each training sample to obtain the base category corresponding to each training sample; calculating a classification loss based on the base category and the base category label corresponding to each training sample; and training the classification module and the pre-trained classification model based on the classification loss to obtain the base recognition model.
12. A base recognition device, characterized in that the base recognition device includes: an acquisition unit, used to acquire multiple fluorescence images of multiple nucleic acid sequence clusters to be tested during a single sequencing cycle; a feature mapping unit, used to perform feature mapping on the fluorescence intensity sequence corresponding to the multiple fluorescence images through the embedding module of a preset base recognition model to obtain a feature tensor; an encoding unit, used to encode the feature tensor through the multi-head attention-based encoding module in the base recognition model to obtain an encoded tensor; and a classification unit, used to classify the plurality of nucleic acid sequence clusters to be tested according to the encoded tensor through the classification module of the base recognition model, and obtain the base categories corresponding to the plurality of nucleic acid sequence clusters to be tested in the sequencing cycle.
13. A base recognition model training device, characterized in that the base recognition model training device includes: an acquisition unit, used to acquire multiple training samples and the base category label corresponding to each training sample, wherein each training sample includes multiple fluorescent sample images of the nucleic acid sequence cluster to be tested; and a training unit, used to perform supervised classification training on a pre-trained classification model based on the multiple training samples and the base category label corresponding to each training sample, so as to obtain a base recognition model.
14. An electronic device, characterized in that the electronic device includes: a memory storing at least one instruction; and a processor that executes the at least one instruction to implement the base recognition method as described in any one of claims 1 to 7, or to implement the base recognition model training method as described in any one of claims 8 to 11.
15. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program that, when executed by a processor in an electronic device, implements the base recognition method as described in any one of claims 1 to 7, or implements the base recognition model training method as described in any one of claims 8 to 11.
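The confidence-based selection recited in claim 7 can be illustrated with a minimal sketch. It assumes that the classification module emits one raw score per preset base category for each cluster, that confidences are obtained with a softmax, and that the category with the highest confidence is chosen. The claims do not name the confidence function or the label set, so `softmax`, `call_base`, and `BASE_CATEGORIES` are hypothetical names used only for illustration.

```python
import math

# Hypothetical label set; the claims only say "multiple preset base categories".
BASE_CATEGORIES = ("A", "C", "G", "T")


def softmax(scores):
    """Turn raw scores into confidences that sum to 1 (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def call_base(scores, categories=BASE_CATEGORIES):
    """Return (category, confidence) for one cluster's per-category scores."""
    confidences = softmax(scores)
    # Assumption: the category with the highest confidence is selected.
    best = max(range(len(categories)), key=lambda i: confidences[i])
    return categories[best], confidences[best]


# Made-up scores for two clusters, standing in for the classification
# module's output on the encoded tensor.
cluster_scores = [[2.0, 0.1, -1.0, 0.3], [0.0, 0.0, 3.0, 0.0]]
calls = [call_base(scores) for scores in cluster_scores]
```

Returning the confidence alongside the category would let a downstream step filter low-confidence calls, although the claims do not describe such a step.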
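The masking step of claims 9 and 10 can be sketched as a split along the second (fluorescence image) dimension. The sketch assumes the feature tensor is laid out as [clusters][images][features] and is held in plain nested lists. The function name `split_by_mask`, the tensor sizes, and the choice of which index to mask are assumptions made for illustration and do not come from the application.

```python
def split_by_mask(feature_tensor, masked_indices):
    """Split a [clusters][images][features] tensor along the image dimension.

    Returns (masked, unmasked): `masked` keeps only the image indices listed
    in `masked_indices`; `unmasked` keeps all remaining image indices.
    """
    num_images = len(feature_tensor[0])
    hidden = sorted(set(masked_indices))
    if not all(0 <= i < num_images for i in hidden):
        raise ValueError("masked index out of range")
    kept = [i for i in range(num_images) if i not in hidden]
    masked = [[cluster[i] for i in hidden] for cluster in feature_tensor]
    unmasked = [[cluster[i] for i in kept] for cluster in feature_tensor]
    return masked, unmasked


# Toy feature tensor: 3 clusters, 4 fluorescence images, 2 features each.
feature_tensor = [
    [[c * 10 + i, c * 10 + i + 0.5] for i in range(4)] for c in range(3)
]

# Mask image index 2: the masked tensor has size 1 in the second dimension,
# and the unmasked tensor has size 3 (the remaining indices 0, 1, 3).
masked, unmasked = split_by_mask(feature_tensor, masked_indices=[2])
```

In the pre-training described in claim 9, only the unmasked part would be passed to the encoding module, and the decoded tensor would be compared against the full feature tensor to compute the reconstruction loss.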
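The classification loss of claim 11 is not tied to a specific formula in the claims. One common choice for supervised classification is cross-entropy between the softmax of the per-category scores and the base category label, and the sketch below assumes that choice. The function names and the integer label encoding are hypothetical.

```python
import math


def cross_entropy(scores, label_index):
    """Negative log-probability of the labelled category under softmax(scores)."""
    peak = max(scores)
    # log-sum-exp computed stably by factoring out the largest score
    log_total = peak + math.log(sum(math.exp(s - peak) for s in scores))
    return log_total - scores[label_index]


def mean_classification_loss(batch_scores, label_indices):
    """Average cross-entropy over a batch of training samples."""
    losses = [cross_entropy(s, y) for s, y in zip(batch_scores, label_indices)]
    return sum(losses) / len(losses)


# Made-up scores for two training samples over four preset base categories,
# with labels given as category indices (an assumed encoding).
batch_scores = [[2.0, 0.1, -1.0, 0.3], [0.0, 0.0, 3.0, 0.0]]
label_indices = [0, 2]
loss = mean_classification_loss(batch_scores, label_indices)
```

A loss of this kind decreases as the score of the labelled category grows relative to the others, which is the signal that would be used to update both the classification module and the pre-trained classification model.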