Mass spectrometry data neural network model and training method

By processing feature data through the input layer, convolutional layer, and SE attention layer of the mass spectrometry data neural network model, and combining data augmentation and validation processes, the problems of low training efficiency and insufficient feature channel recognition in existing technologies are solved, achieving higher classification accuracy and training efficiency.

CN122287705A · Pending · Publication Date: 2026-06-26 · AFFILIATED YONGCHUAN HOSPITAL OF CHONGQING MEDICAL UNIV

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications (China)
Current Assignee / Owner: AFFILIATED YONGCHUAN HOSPITAL OF CHONGQING MEDICAL UNIV
Filing Date: 2025-12-17
Publication Date: 2026-06-26

AI Technical Summary

Technical Problem

Existing mass spectrometry data analysis methods suffer from low training efficiency when processing high-dimensional sequence data, and traditional CNN models cannot automatically identify important feature channels, resulting in poor performance in classification tasks.

Method used

A neural network model for mass spectrometry data is adopted, which processes feature data through an input layer, multiple convolutional layers and SE attention layers, and updates model parameters using backpropagation. Combined with data augmentation and validation processes, it automatically identifies important channels to improve classification accuracy and training efficiency.

Benefits of technology

It improves the classification accuracy and training efficiency of the neural network model for mass spectrometry data, effectively extracts key features, automatically identifies important channels, and enhances the model's generalization ability in drug-resistant bacteria classification tasks.

✦ Generated by Eureka AI based on patent content.


Abstract

This application provides a mass spectrometry data neural network model and training method. The training method includes: inputting first feature data into the input layer of the mass spectrometry data neural network model to be trained to obtain second feature data; inputting the second feature data into multiple convolutional layers and corresponding SE attention layers to output seventh feature data; flattening the seventh feature data into a one-dimensional feature vector to obtain eighth feature data; inputting the eighth feature data into a classifier to obtain a first prediction result; calculating the loss between the first prediction result and the true class label, and updating the model parameters through backpropagation; inputting preset validation data into the trained mass spectrometry data neural network model and outputting validation results; determining whether the trained mass spectrometry data neural network model is the optimal model based on the validation results; if so, saving the currently trained mass spectrometry data neural network model as the optimal model. This application can improve the classification accuracy and training efficiency of the model.

Description

Technical Field

[0001] This application relates to the field of deep learning technology, and more specifically, to a mass spectrometry data neural network model and training method.

Background Technology

[0002] Currently, existing mass spectrometry data analysis methods often face the following problems when processing high-dimensional sequence data: on the one hand, they rely on a large amount of training data and a complex training process, resulting in low training efficiency of the model; on the other hand, traditional CNN (Convolutional Neural Networks) models cannot automatically identify which feature channels are more important, resulting in less than ideal performance of the model in classification tasks.

[0003] Therefore, how to solve the above problems is an urgent issue that needs to be addressed.

Summary of the Invention

[0004] This application provides a mass spectrometry data neural network model and training method, which can improve the classification accuracy and training efficiency of the model.

[0005] In a first aspect, this application provides a method for training a neural network model for mass spectrometry data, the method comprising:
inputting the first feature data into the input layer of the mass spectrometry data neural network model to be trained to obtain the second feature data;
inputting the second feature data into multiple convolutional layers and corresponding SE attention layers of the mass spectrometry data neural network model to be trained, and outputting the seventh feature data;
flattening the seventh feature data into a one-dimensional feature vector to obtain the eighth feature data;
inputting the eighth feature data into the classifier of the mass spectrometry data neural network model to be trained to obtain the first prediction result;
calculating the loss between the first prediction result and the true category label, and updating the model parameters of the mass spectrometry data neural network model through backpropagation;
inputting preset validation data into the trained mass spectrometry data neural network model, and outputting the validation results;
determining, based on the validation results, whether the trained mass spectrometry data neural network model is the optimal mass spectrometry data neural network model;
if so, saving the currently trained mass spectrometry data neural network model as the optimal mass spectrometry data neural network model.
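The validate-then-save step at the end of the first aspect can be sketched as a small loop. This is a minimal illustration, not the applicant's implementation: `train_one_epoch` and `validate` are hypothetical callables, and the assumption that "optimal" means "highest validation score so far" is ours, since the text does not name the criterion.

```python
def train_with_best_checkpoint(train_one_epoch, validate, num_epochs):
    """Run training epochs and keep the state with the best validation score.

    train_one_epoch(epoch) -> model state after that epoch (hypothetical callable)
    validate(state)        -> validation score, higher is better (assumption)
    """
    best_score = float("-inf")
    best_state = None
    for epoch in range(num_epochs):
        state = train_one_epoch(epoch)  # forward pass, loss, backpropagation
        score = validate(state)         # evaluate on the preset validation data
        if score > best_score:          # is the current model the best so far?
            best_score = score
            best_state = state          # keep the current model as the optimal one
    return best_state, best_score
```

In a real training script the "keep" step would typically write the model parameters to disk instead of holding them in a variable.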

[0006] In one possible embodiment, before inputting the first feature data into the input layer of the mass spectrometry data neural network model to be trained to obtain the second feature data, the method further includes: normalizing the training samples and applying data augmentation to obtain the augmented first feature data.

[0007] In one possible embodiment, the number of convolutional layers is 5, and the step of inputting the second feature data into multiple convolutional layers and corresponding SE attention layers of the mass spectrometry data neural network model to be trained, and outputting the seventh feature data, includes:
inputting the second feature data into the first convolutional layer and the first SE attention block of the mass spectrometry data neural network model to be trained, and outputting the third feature data;
inputting the third feature data into the second convolutional layer and the second SE attention block of the mass spectrometry data neural network model to be trained, and outputting the fourth feature data;
inputting the fourth feature data into the third convolutional layer and the third SE attention block of the mass spectrometry data neural network model to be trained, and outputting the fifth feature data;
inputting the fifth feature data into the fourth convolutional layer and the fourth SE attention block of the mass spectrometry data neural network model to be trained, and outputting the sixth feature data;
inputting the sixth feature data into the fifth convolutional layer and the fifth SE attention block of the mass spectrometry data neural network model to be trained, and outputting the seventh feature data.

[0008] In one possible embodiment, inputting the eighth feature data into the classifier of the mass spectrometry data neural network model to be trained to obtain a first prediction result includes:
inputting the eighth feature data into the first layer of the classifier of the mass spectrometry data neural network model to be trained, and outputting the first result;
inputting the first result into the second layer of the classifier, and outputting the second result;
inputting the second result into the output layer of the classifier to output the first prediction result.

[0009] In one possible embodiment, calculating the loss between the first prediction result and the true class label, and updating the model parameters of the mass spectrometry data neural network model through backpropagation, includes:
converting the first prediction result into a probability distribution;
calculating the weighted cross-entropy loss based on the true category label and the probability distribution, and outputting the loss value;
updating the model parameters of the mass spectrometry data neural network model through backpropagation based on the loss value.

[0010] In one possible embodiment, the probability distribution satisfies: $p_i = \dfrac{e^{z_i}}{\sum_{j} e^{z_j}}$; where $p_i$ represents the predicted probability of the $i$-th category, and $z_i$ represents the $i$-th category value of the first prediction result.

[0011] In one possible embodiment, the loss value satisfies: $L = -\,w_y \log(p_y)$; where $y$ represents the actual category label, $w_y$ is the weight of category $y$, and $p_y$ is the probability of being predicted as category $y$.
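The conversion to a probability distribution and the weighted cross-entropy loss described in [0009] to [0011] can be sketched for a single sample as follows. This is a minimal sketch under stated assumptions: the conversion is taken to be a softmax, the function names are illustrative, and the max-subtraction is a standard numerical-stability step that is not mentioned in the source.

```python
import math

def softmax(logits):
    """Convert raw class scores into a probability distribution."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_cross_entropy(logits, true_label, class_weights):
    """Weighted cross-entropy for one sample: -w_y * log(p_y)."""
    probs = softmax(logits)
    return -class_weights[true_label] * math.log(probs[true_label])
```

For example, with three equal scores every class receives probability 1/3, so a sample of class 1 with weight 2.0 contributes a loss of 2·ln 3. A larger class weight makes errors on that class cost more, which is how under-represented classes are balanced.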

[0012] In one possible embodiment, updating the model parameters of the mass spectrometry data neural network model based on the loss value through backpropagation includes:
calculating the gradient of the output layer based on the predicted probability and the loss value;
propagating the gradient backward layer by layer;
calculating the gradients of the convolutional layer weights, fully connected layer weights, batch normalization parameters, and SE attention weights respectively;
if the total gradient of the convolutional layer weights, the fully connected layer weights, the batch normalization parameters, and the SE attention weights is greater than a threshold, scaling the gradients of the convolutional layer weights, the fully connected layer weights, the batch normalization parameters, and the SE attention weights proportionally.
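The proportional scaling in the last step of [0012] resembles gradient clipping by global norm. The sketch below assumes that the "total gradient" is the L2 norm over all parameter gradients taken together, which the text does not state explicitly; the function name and the dictionary layout are illustrative.

```python
import math

def clip_gradients_by_global_norm(grads, max_norm):
    """Scale all gradients by one common factor if their joint L2 norm exceeds max_norm.

    grads: dict mapping a parameter-group name to a flat list of gradient values.
    Returns (clipped_grads, total_norm).
    """
    total_norm = math.sqrt(sum(g * g for values in grads.values() for g in values))
    if total_norm <= max_norm:
        # Below the threshold: gradients are returned unchanged (copied).
        return {name: list(values) for name, values in grads.items()}, total_norm
    scale = max_norm / total_norm  # same factor for every group keeps the direction
    clipped = {name: [g * scale for g in values] for name, values in grads.items()}
    return clipped, total_norm
```

Because every group is multiplied by the same factor, the direction of the overall update is preserved and only its length is limited. In PyTorch, `torch.nn.utils.clip_grad_norm_` performs this kind of clipping in place.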

[0013] In one possible embodiment, the method further includes: validating the optimal mass spectrometry data neural network model using K-fold cross-validation.
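As an illustration of K-fold cross-validation, the sketch below splits sample indices into K contiguous folds, each serving once as the validation set. It is a minimal sketch: the function name is hypothetical, and the source does not say whether the folds are shuffled or stratified by class.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, val_indices) for each of k contiguous folds."""
    # The first (n_samples % k) folds receive one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size
```

With imbalanced classes such as the three bacterial categories described in this application, a stratified split (for example scikit-learn's `StratifiedKFold`) is a common alternative.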

[0014] In a second aspect, embodiments of this application also provide a mass spectrometry data neural network model, which is trained using the mass spectrometry data neural network model training method described in any implementation of the first aspect.

Beneficial effects: The mass spectrometry data neural network model and training method provided in this application involve inputting first feature data into the input layer of the mass spectrometry data neural network model to be trained to obtain second feature data; inputting the second feature data into multiple convolutional layers and corresponding SE attention layers of the mass spectrometry data neural network model to be trained to output seventh feature data; flattening the seventh feature data into a one-dimensional feature vector to obtain eighth feature data; inputting the eighth feature data into the classifier of the mass spectrometry data neural network model to be trained to obtain a first prediction result; calculating the loss between the first prediction result and the true class label, and updating the model parameters of the mass spectrometry data neural network model through backpropagation; inputting preset validation data into the trained mass spectrometry data neural network model and outputting validation results; determining whether the trained mass spectrometry data neural network model is the optimal mass spectrometry data neural network model based on the validation results; if so, saving the currently trained mass spectrometry data neural network model as the optimal mass spectrometry data neural network model. This enables the trained mass spectrometry data neural network model to effectively extract key features from the mass spectrometry data and to identify important feature channels automatically, which improves the model's classification accuracy and training efficiency and thereby enhances its accuracy and generalization ability in drug-resistant bacteria classification tasks.
Attached Figure Description

[0015] To more clearly illustrate the technical solutions of the embodiments of this application, the accompanying drawings used in the embodiments will be briefly introduced below. It should be understood that the following drawings only show some embodiments of this application and should not be regarded as a limitation of the scope. For those skilled in the art, other related drawings can be obtained based on these drawings without creative effort.

[0016] Figure 1 is a schematic diagram of the structure of an electronic device provided in the first embodiment of this application; Figure 2 is a flowchart of a method for training a mass spectrometry data neural network model, provided in the second embodiment of this application.

Detailed Implementation

[0017] To make the objectives, technical solutions, and advantages of the embodiments of this application clearer, the technical solutions of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.

[0018] First Embodiment: Figure 1 is a schematic diagram of the structure of an electronic device provided in an embodiment of this application. The electronic device 100 shown in Figure 1 can be used to implement an example of the mass spectrometry data neural network model and training method according to embodiments of this application. As shown in Figure 1, the electronic device 100 includes one or more processors 102 and one or more storage devices 104. These components are interconnected via a bus system and/or other forms of connection mechanisms (not shown). It should be noted that the components and structure of the electronic device 100 shown in Figure 1 are merely exemplary and not limiting; in addition to the components shown in Figure 1, the electronic device may also have other components and structures not shown in Figure 1, as needed.

[0019] The processor 102 may be a central processing unit (CPU) or other form of processing unit with data processing capabilities and / or instruction execution capabilities, and may control other components in the electronic device 100 to perform desired functions.

[0020] It should be understood that the processor 102 in the embodiments of this application can be a central processing unit (CPU), or it can be other general-purpose processors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. The general-purpose processor can be a microprocessor or any conventional processor, etc.

[0021] The storage device 104 may include one or more computer program products, which may include various forms of computer-readable storage media.

[0022] It should be understood that the storage device 104 in the embodiments of this application may be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory. The non-volatile memory may be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), or flash memory. The volatile memory may be random access memory (RAM), which is used as an external cache. By way of example, but not limitation, many forms of random access memory (RAM) are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate synchronous DRAM (DDR SDRAM), enhanced synchronous DRAM (ESDRAM), synchronous linked DRAM (SLDRAM), and direct rambus RAM (DR RAM).

[0023] The computer-readable storage medium may store one or more computer program instructions, which the processor 102 may execute to implement the client functions (implemented by the processor) in the embodiments of this application described below, and / or other desired functions. Various applications and various data may also be stored in the computer-readable storage medium, such as various data used and / or generated by the applications.

[0024] Second Embodiment: Referring to Figure 2, the flowchart shown illustrates a method for training a neural network model for mass spectrometry data. This method specifically includes the following steps:
Step S201: Input the first feature data into the input layer of the mass spectrometry data neural network model to be trained to obtain the second feature data.

[0025] It is understandable that the second feature data is the feature data after the first feature data has undergone data shape transformation.

[0026] For example, the implementation process of data shape transformation is as follows:

    def forward(self, x):
        # Data shape conversion
        if x.dim() == 3:           # [batch, length, channels]
            x = x.transpose(1, 2)  # [batch, channels, length]
        elif x.dim() == 2:         # [batch, length]
            x = x.unsqueeze(1)     # [batch, 1, length]

Conversion algorithm:
1. If the input is 3-dimensional: [batch, length, channels] → [batch, channels, length]. Use transpose(1, 2) to swap dimension 1 and dimension 2;
2. If the input is 2-dimensional: [batch, length] → [batch, 1, length]. Use unsqueeze(1) to insert a new dimension at position 1, giving single-channel data an explicit channel dimension.

[0027] Understandably, the purpose of the conversion is to ensure that the data format meets the input requirements [batch_size, in_channels, sequence_length] of a 1D convolutional layer.

[0028] For example, the second feature data is as follows: Shape: [batch_size, 1, 3001]; Content: Standardized mass spectrometry data, with the first dimension being batch size, the second dimension being the number of channels (1 channel), and the third dimension being the sequence length (3001 data points).

[0029] As one implementation method, before step S201, the mass spectrometry data neural network model training method further includes: normalizing and data augmenting the training samples to obtain augmented first feature data.

[0030] The training samples are mass spectrometry data loaded from the preprocessed file. The data format is as follows:

    import pickle

    # Load preprocessed data
    with open('5cnn network data preprocessing / cnn_preprocessed_data.pkl', 'rb') as f:
        data_dict = pickle.load(f)

    # Extract data
    X_train_cnn = data_dict['X_train']            # Training set features [N_train, 3001]
    y_train = data_dict['y_train']                # Training set labels [N_train]
    X_val_cnn = data_dict['X_val']                # Validation set features [N_val, 3001]
    y_val = data_dict['y_val']                    # Validation set labels [N_val]
    class_weights = data_dict['class_weights']    # Class weights

where:
• X_train_cnn and X_val_cnn have the shape [number of samples, 3001], indicating that each sample has 3001 feature dimensions (mass spectrometry data points);
• y_train and y_val contain 3 categories: 0 (susceptible bacteria), 1 (drug-resistant bacteria), and 2 (heterogeneous drug-resistant bacteria);
• class_weights is used to balance the number of samples from different classes.
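The application loads `class_weights` from the preprocessed file and does not say how they were computed. One common choice, shown here purely as an assumption, is the inverse-frequency ("balanced") heuristic n_samples / (n_classes × class_count); the function name is hypothetical.

```python
from collections import Counter

def inverse_frequency_class_weights(labels, num_classes):
    """Weight each class by n_samples / (num_classes * count); rarer classes get larger weights."""
    counts = Counter(labels)
    n_samples = len(labels)
    return [
        n_samples / (num_classes * counts[c]) if counts[c] else 0.0
        for c in range(num_classes)
    ]
```

With labels such as 4 susceptible, 2 drug-resistant, and 2 heterogeneous drug-resistant samples, the majority class gets a weight below 1 and the two minority classes get equal weights above 1.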

[0031] Optionally, the data augmentation processing may employ one or more of the following methods, including but not limited to: intensity shift augmentation, random scaling augmentation, left-right shift augmentation, random noise augmentation, combined augmentation, and pre-computed data augmentation.

[0032] For example, the following is a method to apply a small intensity shift augmentation to mass spectrometry data:

    import numpy as np

    def augment_spectrum_intensity(spectrum, intensity_shift_range=0.05):
        """
        Small intensity shift augmentation of mass spectrometry data.

        Algorithm:
        1. Generate a random intensity shift factor
           shift_factor ∈ [1 - intensity_shift_range, 1 + intensity_shift_range]
        2. Apply intensity shift: augmented_spectrum = spectrum × shift_factor
        """
        # Generate a random intensity shift factor.
        # uniform(-0.05, 0.05) generates random numbers in the range [-0.05, 0.05];
        # adding 1.0 gives a scaling factor in the range [0.95, 1.05].
        shift_factor = 1.0 + np.random.uniform(-intensity_shift_range, intensity_shift_range)
        # Apply intensity shift: multiply all intensity values in the spectrum
        # by the factor. This is equivalent to globally amplifying or reducing the signal.
        augmented_spectrum = spectrum * shift_factor
        return augmented_spectrum

Mathematical formula: $x_{\text{aug}} = \alpha \cdot x$.

[0033] where $x$ is the original spectrum, $x_{\text{aug}}$ is the augmented spectrum, and $\alpha$ is the random scaling coefficient.

[0034] For example, the following is an example of how to perform a small amount of random scaling enhancement on mass spectrometry data: def augment_spectrum_scale(spectrum, scale_range=0.05): """ algorithm: 1. Generate a random scaling factor _scale_factor ∈ [1-scale_range, 1+scale_range] 2. Apply scaling: augmented_spectrum = spectrum × scale_factor """ # Generate a random scaling factor (0.95-1.05) # Same as the `intensity` function, but semantically represents "scaling" rather than "translation". scale_factor = 1.0 + np.random.uniform(-scale_range, scale_range) # Apply scaling: Multiply the entire spectrum by a scaling factor augmented_spectrum = spectrum * scale_factor return augmented_spectrum For example, the following is a method for enhancing mass spectrometry data by left-right shifting: def augment_spectrum_shift(spectrum, shift_range=1): """ algorithm: 1. Randomly select a translation amount shift ∈ [-shift_range, shift_range] 2. If shift > 0 (shift right): - Preserve the original data: shifted_spectrum[shift:] = spectrum[:-shift] - Fill the blank space on the left with the average of the nearest surrounding values. 3. If shift < 0 (shift left): - Preserve the original data: shifted_spectrum[:-shift] = spectrum[shift:] - Fill the blank space on the right with the mean of the nearest surrounding values. """ length = len(spectrum) # Get the length of the data # Randomly select translation amount `randint(-shift_range, shift_range + 1)` generates an integer between [-shift_range, shift_range]. For example, when shift_range=1, possible values ​​are: -1, 0, 1. shift = np.random.randint(-shift_range, shift_range + 1) # If the translation is 0, return the original data directly. if shift == 0: return spectrum # No translation, returns the original data. 
# Create a new array and initialize it to 0 shifted_spectrum = np.zeros_like(spectrum) if shift>0: # Shift right (data moves to the right, leaving blank space on the left) # Preserve data: Copy the [0:length-shift] portion of the original data to the [shift:length] position of the new array. For example: when shift=2, the original data [0:length-2] is copied to the new array [2:length]. shifted_spectrum[shift:] = spectrum[:-shift] # Fill the blank space on the left with the nearest surrounding value. # Use the average of the three most recent values ​​from the right to fill the blank area on the left. if shift <length: # Fill each blank space on the left for i in range(shift): # Calculate the reference area to use for filling (the 3 most recent values ​​from the right). start_idx = max(0, shift - 3) # Starting index, ensure it is not less than 0 end_idx = min(length, shift + 3) # End index, ensuring it does not exceed length if end_idx > start_idx: # Fill with the mean of the reference area shifted_spectrum[i] = np.mean(spectrum[start_idx:end_idx]) else: # If the reference range is invalid, use the first value. shifted_spectrum[i] = spectrum[0] else: # Shift left (shift<0, data shifts left, blank space appears on the right) shift = abs(shift) # Convert to a positive number for easier processing # Retained data: Copy the [shift:length] portion of the original data to the [0:length-shift] position of the new array. # For example: when shift=2, the original data [2:length] is copied to the new array [0:length-2]. shifted_spectrum[:-shift] = spectrum[shift:] # Fill the blank space on the right with the nearest surrounding value. # Use the average of the three most recent values ​​from the left to fill the blank area on the right. if shift <length: # Fill each blank space on the right for i in range(length - shift, length): # Calculate the reference area to use for filling (the 3 most recent values ​​from the left). 
start_idx = max(0, length - shift - 3) # Starting index end_idx = min(length, length - shift + 3) # End index if end_idx > start_idx: # Fill with the mean of the reference area shifted_spectrum[i] = np.mean(spectrum[start_idx:end_idx]) else: # If the reference range is invalid, use the last value. shifted_spectrum[i] = spectrum[-1] return shifted_spectrum For example, the random noise enhancement of mass spectrometry data can be implemented as follows: def augment_spectrum_noise(spectrum, noise_std=0.01): """ algorithm: 1. Calculate the standard deviation of the data: data_std = std(spectrum) 2. Generate Gaussian noise: noise ~ N(0, (data_std × noise_std) 2 ) 3. Add noise: noise_spectrum = spectrum + noise 4. Ensure non-negativity: noisy_spectrum = max(noisy_spectrum, 0) """ # Calculate the standard deviation of the data # Used to determine the intensity of noise, making the noise level proportional to the signal strength. data_std = np.std(spectrum) # Generate Gaussian noise (normal distribution) `normal(0, data_std * noise_std)` generates random numbers with a mean of 0 and a standard deviation of `data_std * noise_std`. # size=spectrum.shape Ensure the noise array has the same shape as the input data. noise = np.random.normal(0, data_std * noise_std, size=spectrum.shape) # Add noise: Add noise to the original data noise_spectrum = spectrum + noise # Ensure there are no negative values # Mass spectrum intensity values ​​must be non-negative; use the maximum value to ensure all values ​​are >= 0. noisy_spectrum = np.maximum(noisy_spectrum, 0.0) return noisy_spectrum For example, randomly select one enhancement method or combine two enhancement methods: def augment_spectrum_combined(spectrum, ...): """ algorithm: 1. Randomly select enhancement type: ['intensity', 'scale', 'shift', 'noise', 'combined'] 2. If it is 'combined', randomly select two enhancement methods and apply them in sequence. 3. Ensure the output is non-negative and within a reasonable range. 
""" # Randomly select enhancement type The `choice` function randomly selects an enhancement type based on a probability distribution. # p=[0.2, 0.2, 0.2, 0.2, 0.2] means that the probability of each type is 20%. augmentation_type = np.random.choice( ['intensity', 'scale', 'shift', 'noise', 'combined'], p=[0.2, 0.2, 0.2, 0.2, 0.2] # Equal probability selection ) # Apply the appropriate enhancement method based on the selected enhancement type. if augmentation_type == 'intensity': # Application of intensity translation enhancement augmented_spectrum = augment_spectrum_intensity(spectrum,intensity_shift_range) elif augmentation_type == 'scale': # Application scaling enhancement augmented_spectrum = augment_spectrum_scale(spectrum, scale_range) elif augmentation_type == 'shift': # Apply translation enhancement augmented_spectrum = augment_spectrum_shift(spectrum, shift_range) elif augmentation_type == 'noise': # Application of noise enhancement augmented_spectrum = augment_spectrum_noise(spectrum, noise_std) else: # combined - Combine two enhancement methods # Randomly select two different enhancement methods (replace=False to ensure no duplication) # For example: you might choose ['intensity', 'noise'] or ['scale', 'shift'], etc. 
aug1, aug2 = np.random.choice(['intensity', 'scale', 'shift','noise'], 2, replace=False) # Apply the first enhancement first temp_spectrum = spectrum # Temporary variable to store intermediate results if aug1 == 'intensity': temp_spectrum = augment_spectrum_intensity(temp_spectrum,intensity_shift_range) elif aug1 == 'scale': temp_spectrum = augment_spectrum_scale(temp_spectrum,scale_range) elif aug1 == 'shift': temp_spectrum = augment_spectrum_shift(temp_spectrum,shift_range) elif aug1 == 'noise': temp_spectrum = augment_spectrum_noise(temp_spectrum,noise_std) # Apply the second enhancement (on the result of the first enhancement) if aug2 == 'intensity': augmented_spectrum = augment_spectrum_intensity(temp_spectrum, intensity_shift_range) elif aug2 == 'scale': augmented_spectrum = augment_spectrum_scale(temp_spectrum, scale_range) elif aug2 == 'shift': augmented_spectrum = augment_spectrum_shift(temp_spectrum, shift_range) elif aug2 == 'noise': augmented_spectrum augment_spectrum_noise(temp_spectrum, noise_std) # Data Post-processing: Ensuring Data Quality # 1. Ensure there are no negative values ​​(mass spectrometry intensity must be non-negative) augmented_spectrum = np.maximum(augmented_spectrum, 0.0) # 2. Avoid outliers: If the maximum value of the augmented data is too large, normalize it. # This can prevent unreasonable extreme values ​​from being generated during the enhancement process. max_val = np.max(augmented_spectrum) if max_val>1e6: # If the maximum value exceeds 1e6, it is considered abnormal. # Scale the enhanced data to the maximum value range of the original data. # Maintain the relative proportion of the data, but limit the maximum value. 
    augmented_spectrum = augmented_spectrum / max_val * np.max(spectrum)
    return augmented_spectrum

For example, augmentation is performed on all training samples:

def precompute_augmented_data(data, labels, augmentation_factor=1, *,
                              intensity_shift_range, scale_range,
                              shift_range, noise_std):
    """
    Pre-compute augmented data.

    Function: generates several augmented versions of each original sample,
    expanding the training dataset.

    Workflow:
        1. Retain each original sample.
        2. Generate augmentation_factor augmented versions of it.
        3. Give all augmented samples the label of the original sample
           (they are variants of the same class).
        4. Run data quality checks.

    Advantages:
        - Pre-computation avoids repeating the augmentation during training,
          which improves training efficiency.
        - The quality of the augmented data can be checked in advance.
        - Suitable for scenarios with a fixed augmentation strategy.

    Note:
        - Augmented dataset size = original dataset size × (1 + augmentation_factor).
        - For example, 100 samples with augmentation_factor=10 give 1100 samples.

    Args:
        data: NumPy array of original data, shape (n_samples, length).
        labels: NumPy array of original labels, shape (n_samples, ...).
        augmentation_factor: int, number of augmented versions generated per
            sample (default 1; for example, 10 generates 10 versions per sample).
        intensity_shift_range: float, intensity shift range (passed to augment_spectrum_combined).
        scale_range: float, scaling range (passed to augment_spectrum_combined).
        shift_range: int, translation range (passed to augment_spectrum_combined).
        noise_std: float, noise standard deviation (passed to augment_spectrum_combined).

    Returns:
        augmented_data: NumPy array, shape (n_samples*(1+augmentation_factor), length).
        augmented_labels: NumPy array, shape (n_samples*(1+augmentation_factor),).
    """
    # Print the augmentation parameters for easier debugging and monitoring.
    print(f"Starting pre-computation of data augmentation, augmentation factor: {augmentation_factor}")
    print(f"Intensity shift range: ±{intensity_shift_range*100:.1f}%")
    print(f"Scaling range: ±{scale_range*100:.1f}%")
    print(f"Horizontal translation range: ±{shift_range} positions")
    print(f"Noise standard deviation: {noise_std*100:.1f}%")

    augmented_data = []    # Stores all original and augmented samples
    augmented_labels = []  # Stores the matching labels

    # Iterate through each original sample
    for i in range(len(data)):
        # Step 1: keep the original sample so the model also learns
        # the features of the unaugmented data.
        augmented_data.append(data[i])
        augmented_labels.append(labels[i])

        # Step 2: generate augmentation_factor augmented versions of the sample.
        for _ in range(augmentation_factor):
            # Each call randomly selects an augmentation strategy,
            # so every variant is different.
            aug_spectrum = augment_spectrum_combined(
                data[i],  # Original sample
                intensity_shift_range=intensity_shift_range,
                scale_range=scale_range,
                shift_range=shift_range,
                noise_std=noise_std
            )
            augmented_data.append(aug_spectrum)
            augmented_labels.append(labels[i])  # Label remains unchanged

    # Convert the lists to NumPy arrays for easier subsequent processing.
    augmented_data = np.array(augmented_data)
    augmented_labels = np.array(augmented_labels)

    # Print augmentation statistics
    print(f"Original data: {len(data)} samples")
    print(f"Augmented data: {len(augmented_data)} samples")
    print(f"Expansion ratio: {len(augmented_data) / len(data):.1f}x")

    # Data quality check: ensure the augmented data meets expectations
    print("Data range check:")
    print(f"Minimum value: {np.min(augmented_data):.6f}")  # Should be >= 0
    print(f"Maximum value: {np.max(augmented_data):.6f}")  # Should be within a reasonable range
    print(f"Number of negative values: {np.sum(augmented_data<0)}")  # Should be 0 (mass spectrometry intensity cannot be negative)
    print(f"Number of outliers: {np.sum(augmented_data>1e6)}")  # Should be 0 or very few
    return augmented_data, augmented_labels

It should be noted that the augmented first feature data is as follows: Shape: [N_augmented, 3001], where N_augmented = N_original × (1 + augmentation_factor); Contents: original and augmented mass spectrometry data.
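The size relation N_augmented = N_original × (1 + augmentation_factor) can be checked with a minimal pure-Python sketch. The `jitter` function below is a hypothetical stand-in for `augment_spectrum_combined` (whose body is not reproduced here), and the two three-point "spectra" are made-up illustration data.

```python
import random

def jitter(spectrum, rng, noise_std=0.01):
    """Hypothetical stand-in augmenter: multiplicative noise, clipped at zero."""
    return [max(0.0, v * (1.0 + rng.gauss(0.0, noise_std))) for v in spectrum]

def precompute(data, labels, augmentation_factor, rng):
    out_data, out_labels = [], []
    for spectrum, label in zip(data, labels):
        out_data.append(list(spectrum))        # keep the original sample
        out_labels.append(label)
        for _ in range(augmentation_factor):   # add the augmented variants
            out_data.append(jitter(spectrum, rng))
            out_labels.append(label)           # the label is unchanged
    return out_data, out_labels

rng = random.Random(0)
data = [[0.0, 1.0, 0.5], [0.2, 0.0, 0.9]]      # illustration data only
labels = [0, 2]
aug_data, aug_labels = precompute(data, labels, augmentation_factor=10, rng=rng)
print(len(aug_data))  # 2 * (1 + 10) = 22
```

Each original sample is followed directly by its variants, so variants of one sample stay contiguous and share its label.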

[0035] Step S202: Input the second feature data into multiple convolutional layers and corresponding SE attention layers of the mass spectrometry data neural network model to be trained, and output the seventh feature data.

[0036] Optionally, the number of convolutional layers is 5 to 8, for example 5, 6, 7, or 8; optionally, it is 5. As one implementation, step S202 includes: inputting the second feature data into the first convolutional layer and the first SE attention block of the mass spectrometry data neural network model to be trained, and outputting the third feature data; inputting the third feature data into the second convolutional layer and the second SE attention block, and outputting the fourth feature data; inputting the fourth feature data into the third convolutional layer and the third SE attention block, and outputting the fifth feature data; inputting the fifth feature data into the fourth convolutional layer and the fourth SE attention block, and outputting the sixth feature data; and inputting the sixth feature data into the fifth convolutional layer and the fifth SE attention block, and outputting the seventh feature data.

[0037] As one implementation, step S202 is carried out as follows. Step S2021: Input the second feature data into the first convolutional layer:

    self.conv1 = nn.Sequential(
        nn.Conv1d(1, 8, kernel_size=5, padding=2, bias=False),  # 1 -> 8 channels
        nn.BatchNorm1d(8),      # Batch normalization to accelerate convergence
        nn.ReLU(inplace=True)   # ReLU activation (inplace=True saves memory)
    )

Convolution operation algorithm: Step S20211: One-dimensional convolution calculation. Convolution formula: $\mathrm{output}[i, c, j] = \sum_{k=0}^{K-1} \mathrm{input}[i, 0, j + k] \cdot \mathrm{weight}[c, 0, k]$, with $K = 5$. Here input is indexed after zero-padding with padding=2, so the output length equals the input length.

[0038] where: i: Batch index. Indicates that this is the i-th sample in the current batch.

[0039] c: Output channel index. Given the context, there are 8 convolutional kernels here, so c ranges from 0 to 7, representing the feature channel calculated by the c-th convolutional kernel.

[0040] j: Position index. Represents the value at the j-th position in the output sequence.

[0041] input[i, 0, j + k]: This is a specific value in the input tensor. Where i: Batch index, consistent with i in output. 0: Input channel index. j + k: Position index in the input sequence. This is the core of the convolution operation; j is the starting point of the computation, and k slides on the convolution kernel. j + k together constitute the position covered by the "sliding window" on the input data.

[0042] `weight[c, 0, k]`: This is a specific value in the convolution kernel weight tensor. `c`: Output channel index; it determines which of the 8 convolution kernels (the c-th one) is being used. `0`: Input channel index; since the input has 1 channel, it is 0 here. `k`: Position index within the convolution kernel; it is the iteration variable of the summation and runs from 0 to K-1 (i.e., 0, 1, 2, 3, 4), visiting each weight value of the kernel. `K`: The kernel size (K = 5).

[0043] $\sum$ (summation symbol): This means adding up all the calculation results for k from 0 to 4. This "weighted summation" is the essence of the convolution calculation: a local region of the input data (determined by j and k) is multiplied element-wise with a convolution kernel (determined by c), and the products are summed to obtain the value of channel c at position j of the output sequence.

[0044] The specific calculation process is as follows: 1. Convolve the input using 8 different convolution kernels (each kernel size is 5); 2. Each convolutional kernel learns different feature patterns (e.g., different peak patterns, frequency features, etc.); 3. Output 8 feature channels, each channel capturing different local features.
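The weighted summation above can be sketched in a few lines of pure Python for a single input channel. The two 5-tap kernels and the six-point signal are made-up illustration values, not learned weights; `conv1d_same` is a hypothetical helper name.

```python
def conv1d_same(signal, kernels, padding):
    """Apply each kernel to the zero-padded signal: out[c][j] = sum_k padded[j+k] * w_c[k]."""
    padded = [0.0] * padding + list(signal) + [0.0] * padding
    k_size = len(kernels[0])
    out_len = len(padded) - k_size + 1
    return [
        [sum(padded[j + k] * w[k] for k in range(k_size)) for j in range(out_len)]
        for w in kernels
    ]

signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
kernels = [
    [0.0, 0.0, 1.0, 0.0, 0.0],  # identity kernel: copies the centre value
    [0.2, 0.2, 0.2, 0.2, 0.2],  # moving-average kernel
]
features = conv1d_same(signal, kernels, padding=2)
print(len(features), len(features[0]))  # 2 channels, length preserved (6)
```

With kernel size 5 and padding 2 the output has the same length as the input, which is why conv1 maps 3001 positions to 3001 positions.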

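[0045] Step S20212: Batch normalization (BatchNorm1d). Normalize each channel: $\hat{x}_i = \frac{x_i - \mu}{\sqrt{\sigma^2 + \epsilon}}$, $y_i = \gamma \hat{x}_i + \beta$

[0046] where: $\mu$ is the mean of the i-th channel over the current batch; $\sigma^2$ is the variance of the i-th channel over the current batch; $\epsilon$ is a small constant for numerical stability; $\gamma$ and $\beta$ are the learnable scale and shift parameters; $\hat{x}_i$ represents the normalized value of the i-th channel; $x_i$ represents the specific value of the i-th channel.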

[0047] Understandably, batch normalization can accelerate training convergence and improve model stability.

[0048] Step S20213: ReLU activation function. Apply a non-linear activation function: $\mathrm{ReLU}(x) = \max(0, x)$

[0049] Function: Introducing nonlinearity enables the model to learn complex feature representations.

[0050] Output results: The output of conv1 is as follows: • Shape: [batch, 8, 3001] • Content: 8 different feature channels, each containing 3001 activation values.
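Batch normalization followed by ReLU, as applied inside conv1, can be illustrated with a minimal pure-Python sketch. It assumes training-mode statistics (mean and biased variance taken per channel over all samples and positions) and the initial values gamma = 1 and beta = 0; `batchnorm_relu` is a hypothetical helper name.

```python
import math

def batchnorm_relu(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """batch[i][c][j]: sample i, channel c, position j. Returns the same layout."""
    n_channels = len(batch[0])
    out = [[None] * n_channels for _ in batch]
    for c in range(n_channels):
        # Statistics are shared by every sample and position of channel c.
        values = [v for sample in batch for v in sample[c]]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)  # biased variance
        for i, sample in enumerate(batch):
            out[i][c] = [
                max(0.0, gamma * (v - mean) / math.sqrt(var + eps) + beta)  # ReLU
                for v in sample[c]
            ]
    return out

batch = [
    [[1.0, 2.0], [10.0, 10.0]],  # sample 0: two channels, two positions
    [[3.0, 4.0], [10.0, 10.0]],  # sample 1
]
activated = batchnorm_relu(batch)
print(activated[1][0])
```

Values below the channel mean become zero after ReLU, while a constant channel normalizes to zero everywhere.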

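[0051] Step S2022: Channel weighting by the first SE attention block (se1). Input the output of conv1 into the first SE attention block: self.se1 = SEBlock(8, reduction=4). SE attention algorithm: Step S20221: Squeeze (global average pooling).

[0052] Perform global average pooling over all positions of each channel: $z_c = \frac{1}{L} \sum_{i=1}^{L} x_{c,i}$

[0053] where $z_c$ is the global feature descriptor of channel c, $L = 3001$ is the number of positions in the channel, and $x_{c,i}$ is the activation value of channel c at the i-th position.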

[0054] Its code implementation is as follows: y = self.global_avg_pool(x).view(b, c) # [batch, 8]. Output shape: [batch, 8], compressing the 3001 values of each channel into a single scalar. Step S20222: Excitation (generate channel weights). Channel weights are generated using two fully connected layers:

    self.fc = nn.Sequential(
        nn.Linear(8, 2, bias=False),  # Compression: 8 -> 2 (reduction=4)
        nn.ReLU(inplace=True),
        nn.Linear(2, 8, bias=False),  # Restore: 2 -> 8
        nn.Sigmoid()
    )

The calculation process is as follows: 1. First linear transformation (compression): $s_c = \mathrm{ReLU}(z W_1 + b_1)$; Input: [batch, 8]; Output: [batch, 2] (compressed to 1/4 of the original); z represents the input feature vector, which comes from the global average pooling output of the Squeeze stage, with shape [batch, channels] (e.g., [batch, 8]); $W_1$ represents the weight matrix of the first fully connected layer, with shape [channels, channels // reduction] (e.g., [8, 2]), used for dimensionality reduction; $b_1$ represents the bias vector of the first fully connected layer, with shape [channels // reduction] (e.g., [2]), where bias=False in the code, so $b_1 = 0$; $s_c$ represents the compressed feature vector, with shape [batch, channels // reduction] (e.g., [batch, 2]). 2. Second linear transformation (recovery): $\hat{s}_c = s_c W_2 + b_2$; Input: [batch, 2]; Output: [batch, 8].

[0055] $s_c$ is the output of the first step, a compressed feature vector with shape [batch, channels // reduction] (e.g., [batch, 2]); $W_2$ represents the weight matrix of the second fully connected layer, with shape [channels // reduction, channels] (e.g., [2, 8]), used to recover the dimension; $b_2$ represents the bias vector of the second fully connected layer, with shape [channels] (e.g., [8]), where bias=False in the code, so $b_2 = 0$; $\hat{s}_c$ represents the recovered feature vector, with shape [batch, channels] (e.g., [batch, 8]), restored to the original dimension. 3. Sigmoid activation: $\mathrm{weight}_c = \sigma(\hat{s}_c)$

[0056] Output range: (0, 1); where $\hat{s}_c$ represents the recovered feature vector output by the second step, with shape [batch, channels] (e.g., [batch, 8]); $\sigma$ represents the Sigmoid activation function, which maps any real number to the interval (0, 1); $\mathrm{weight}_c$ represents the final channel attention weights, with shape [batch, channels] (e.g., [batch, 8]), where each value lies between 0 and 1 and represents the importance of the corresponding channel. Output: Channel weights • Shape: [batch, 8, 1]; • Content: Weight values for 8 channels, each between 0 and 1, representing the importance of each channel. Step S20223: Scale (apply weights). Apply the weights to the original features:

[0057] $\tilde{x}_c = \mathrm{weight}_c \cdot x_c$, where $\tilde{x}_c$ represents the weighted feature map of channel c, i.e. the final output of the SE module for channel c, with shape [batch, channels, length]; $x_c$ represents channel c of the original input feature map, i.e. the original input of the SE module, with shape [batch, channels, length]; $\mathrm{weight}_c$ represents the attention weight scalar of channel c, with shape [batch, channels, 1] and values between 0 and 1, indicating the importance of the channel. Code implementation: return x * y.expand_as(x) # [batch, 8, 3001] × [batch, 8, 1]. Output: Third feature data (weighted features). Shape: [batch, 8, 3001]; Content: Features after channel attention weighting; important channels are enhanced, while unimportant channels are suppressed.
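The squeeze, excitation, and scale steps can be put together in a minimal pure-Python sketch for a single sample. It is not the SEBlock class referred to above: `se_block` is a hypothetical function, the weight matrices are hand-picked illustration values, and 4 channels with a hidden size of 1 are used instead of 8 and 2 to keep the arithmetic readable.

```python
import math

def se_block(x, w1, w2):
    """x[c][j]: channel c, position j; w1: [channels][hidden]; w2: [hidden][channels]; no biases."""
    channels, hidden = len(x), len(w1[0])
    z = [sum(row) / len(row) for row in x]                          # squeeze: mean per channel
    s = [max(0.0, sum(z[c] * w1[c][h] for c in range(channels)))    # compress + ReLU
         for h in range(hidden)]
    r = [sum(s[h] * w2[h][c] for h in range(hidden)) for c in range(channels)]  # restore
    weight = [1.0 / (1.0 + math.exp(-v)) for v in r]                # sigmoid
    scaled = [[weight[c] * v for v in x[c]] for c in range(channels)]  # scale
    return weight, scaled

x = [[1.0, 1.0, 1.0], [2.0, 2.0, 2.0], [0.0, 3.0, 0.0], [4.0, 4.0, 4.0]]
w1 = [[1.0], [0.0], [0.0], [0.0]]   # illustration weights, 4 -> 1
w2 = [[1.0, -1.0, 0.0, 0.0]]        # illustration weights, 1 -> 4
weight, scaled = se_block(x, w1, w2)
print([round(w, 3) for w in weight])
```

Channels whose pre-sigmoid score is zero receive a weight of exactly 0.5, so their activations are halved; a positive score pushes the weight towards 1 and a negative score towards 0.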

[0058] Optionally, the second convolutional layer and its corresponding SE attention layer are processed as follows. First, the third feature data is input into the second convolutional layer; the code implementation is as follows:

    self.conv2 = nn.Sequential(
        nn.Conv1d(8, 16, kernel_size=3, padding=1, bias=False),  # 8 -> 16 channels
        nn.BatchNorm1d(16),
        nn.ReLU(inplace=True),
        nn.Dropout1d(0.1),  # 10% dropout, slight regularization
        nn.MaxPool1d(2)     # Max pooling, halving the sequence length (dimensionality reduction)
    )

Next, one-dimensional convolution is performed. Convolution formula: $\mathrm{output}[i, c, j] = \sum_{c'=0}^{7} \sum_{k=0}^{K-1} \mathrm{input}[i, c', j + k] \cdot \mathrm{weight}[c, c', k]$, with $K = 3$ and input indexed after zero-padding with padding=1.

[0059] where: Input shape: [batch, 8, 3001]; Weight shape: [16, 8, 3] (16 convolutional kernels, each with 8 input channels and kernel size 3); Output shape: [batch, 16, 3001] (padding=1 maintains the length); output[i, c, j] represents the output value for batch i, output channel c, and spatial position j, with shape [batch, 16, 3001]; i: batch index, range [0, batch-1]; c: output channel index, range [0, 15] (16 output channels); j: output spatial position index, range [0, 3000] (3001 positions); k: convolution kernel position index, range [0, K-1]; K: convolution kernel size, K = 3 (kernel_size=3); c': input channel index, range [0, 7] (8 input channels); input[i, c', j + k] represents the value for batch i, input channel c', and spatial position j + k; weight[c, c', k] represents the convolution kernel weight connecting output channel c, input channel c', and kernel position k, with shape [16, 8, 3].

[0060] Understandably, the second convolutional layer differs from the first convolutional layer as follows: the number of input channels increases from 1 to 8; the number of output channels increases from 8 to 16; and the kernel size is reduced from 5 to 3 (to capture finer-grained features).

[0061] Next, batch normalization is performed on the 16 channels, the process of which is the same as step S20212, and will not be repeated here.

[0062] Next, the ReLU activation function is applied, and the process is the same as step S20213, which will not be repeated here.

[0063] Next, Dropout regularization is performed, and the specific process is as follows: Dropout1d algorithm: $\mathrm{output}[i, c, j] = \mathrm{input}[i, c, j] \cdot \mathrm{mask}[i, c] \,/\, (1 - p)$

[0064] where: $\mathrm{mask}[i, c] = 1$ if $r[i, c] \ge p$, and 0 otherwise; $r[i, c]$ is sampled from the uniform distribution Uniform(0, 1), independently for each sample i and channel c (nn.Dropout1d zeroes whole channels rather than individual positions); output[i, c, j] is the output value after Dropout, with shape [batch, 16, 3001]; input[i, c, j] is the input value before Dropout, with shape [batch, 16, 3001]; Dropout probability: p = 0.1 (10% dropout rate); 1 / (1 - p) is the scaling factor: to keep the expected value unchanged, retained values are scaled up to input / 0.9 when p = 0.1. Understandably, this regularization zeroes each channel with a probability of 10% during training to prevent overfitting.
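Channel-wise dropout with rescaling can be sketched in pure Python for one sample. This is an illustrative re-implementation of the behaviour described above, not PyTorch's code; `dropout1d` is a hypothetical helper, and the 1000 constant channels are made-up data chosen so that the drop rate is easy to observe.

```python
import random

def dropout1d(x, p, rng):
    """x[c][j]: zero each whole channel with probability p, rescale kept channels by 1/(1-p)."""
    out = []
    for channel in x:
        keep = rng.random() >= p            # one draw per channel, not per position
        out.append([v / (1.0 - p) if keep else 0.0 for v in channel])
    return out

rng = random.Random(0)
x = [[1.0] * 4 for _ in range(1000)]        # 1000 channels of length 4
y = dropout1d(x, p=0.1, rng=rng)
dropped = sum(1 for channel in y if channel[0] == 0.0)
print(dropped)                              # roughly 10% of the channels
```

Because the kept channels are divided by 0.9, the mean activation over many channels stays close to the input mean, which is what lets the layer be switched off at evaluation time without rescaling.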

[0065] Finally, max pooling (MaxPool1d) is performed, and the process is as follows: Max pooling algorithm: $\mathrm{output}[i, c, j] = \max_{k=0}^{1} \mathrm{input}[i, c, 2j + k]$

[0066] where output[i, c, j] represents the pooled output value, with shape [batch, 16, 1500] (length halved from 3001 to 1500); input[i, c, 2j + k] represents the input value within the pooling window, with shape [batch, 16, 3001]; i represents the batch index; c represents the channel index (16 channels); j represents the output spatial position index, with range [0, 1499] (1500 positions); k represents the position index within the pooling window, with range [0, 1] (window size 2); $\max_{k=0}^{1}$ represents taking the maximum value within the pooling window.

[0067] Optionally, the pooling window size is 2; the stride is 2 (non-overlapping pooling).

[0068] The specific calculation results are as follows: Input shape: [batch, 16, 3001]; Output shape: [batch, 16, 1500] (length halved).
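The halving from 3001 to 1500 positions follows from the output-length rule floor((L - window) / stride) + 1, so the last, unpaired position is dropped. A minimal pure-Python sketch for one channel, using a hypothetical `maxpool1d` helper and made-up input values:

```python
def maxpool1d(channel, window=2, stride=2):
    """Non-overlapping max pooling over one channel; a trailing partial window is discarded."""
    out_len = (len(channel) - window) // stride + 1
    return [max(channel[stride * j: stride * j + window]) for j in range(out_len)]

pooled = maxpool1d([float(v % 7) for v in range(3001)])  # illustration values
print(len(pooled))                                       # 1500
print(maxpool1d([1.0, 3.0, 2.0, 0.0, 9.0]))              # [3.0, 2.0]; the final 9.0 is dropped
```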

[0069] Understandably, max pooling can effectively reduce computation, increase the receptive field, and extract more abstract features.

[0070] Output: The output of conv2 has the following shape: Shape: [batch, 16, 1500]; Contents: 16 feature channels, each containing 1500 activation values.

[0071] Correspondingly, the second SE attention block (se2), which follows the second convolutional layer, is processed as follows: first, the output of conv2 is fed into the second SE attention block, part of the code being: self.se2 = SEBlock(16, reduction=4).

[0072] Then, the following formula is used for Squeeze (compression): $z_c = \frac{1}{L} \sum_{i=1}^{L} x_{c,i}$

[0073] where $z_c$ is a scalar value representing the global spatial feature descriptor of channel c, i.e. the average activation intensity over all spatial locations of channel c, with shape [batch, channels]; L represents the total number of spatial locations on the feature map, for example L = 3001 for the first SE block and L = 1500 for the second SE block; $x_{c,i}$ represents the activation value of channel c at the i-th spatial position, the input feature map having shape [batch, channels, length]; $\sum_{i=1}^{L}$ sums the activation values of all L spatial locations of channel c; and 1 / L is the normalization factor, which makes $z_c$ the average of all spatial activation values on channel c.

[0074] Next, excitation is performed, and the algorithm is as follows: • Compression: [batch, 16] → [batch, 4] (16 // 4 = 4); • Restore: [batch, 4] → [batch, 16]; • The weights are obtained by Sigmoid activation.

[0075] Next, the data obtained in the previous step is scaled. The scaling algorithm is as follows: $\tilde{x}_c = \mathrm{weight}_c \cdot x_c$; Output: Fourth feature data, the structure of which is as follows: Shape: [batch, 16, 1500]; Content: Features after channel attention weighting.

[0076] The third convolutional layer (conv3) is processed as follows: first, the fourth feature data is input into the third convolutional layer, part of the code being:

    self.conv3 = nn.Sequential(
        nn.Conv1d(16, 32, kernel_size=3, padding=1, bias=False),  # 16 -> 32
        nn.BatchNorm1d(32),
        nn.ReLU(inplace=True),
        nn.Dropout1d(0.1),
        nn.MaxPool1d(2)  # Further dimensionality reduction
    )

Convolution operation: • Input shape: [batch, 16, 1500]; • Output shape: [batch, 32, 750] (the number of channels doubles and the length halves).

[0077] It should be noted that the third convolutional layer uses the same convolution algorithm as conv2, but with different parameters: the convolution kernel weight has shape [32, 16, 3]; batch normalization covers 32 channels; the dropout rate is 0.1; and the max-pooling window size is 2.

[0078] The processing procedure for the third SE attention block is as follows: The output of conv3 is input into the third SE attention block, and part of its code is as follows: self.se3 = SEBlock(32, reduction=4).

[0079] The SE attention algorithm is as follows: • Compression: [batch, 32] → [batch, 8] (32 // 4 = 8); • Restore: [batch, 8] → [batch, 32].

[0080] Output: Fifth feature data: • Shape: [batch, 32, 750].

[0081] The fourth convolutional layer (conv4) is processed as follows: the fifth feature data is input into the fourth convolutional layer, part of the code being:

    self.conv4 = nn.Sequential(
        nn.Conv1d(32, 48, kernel_size=3, padding=1, bias=False),  # 32 -> 48
        nn.BatchNorm1d(48),
        nn.ReLU(inplace=True),
        nn.Dropout1d(0.1),
        nn.MaxPool1d(2),          # Further dimensionality reduction
        nn.AdaptiveAvgPool1d(1)   # Global pooling
    )

Convolution operation: • Input shape: [batch, 32, 750]; • After MaxPool1d(2): [batch, 48, 375].

[0082] Next, adaptive global average pooling is performed, and the process is as follows: $\mathrm{output}[i, c, 0] = \frac{1}{L} \sum_{j=0}^{L-1} \mathrm{input}[i, c, j]$

[0083] where: L = 375 is the length of the input sequence; Output fixed length: 1. Its function is to compress feature maps of different lengths into a fixed length of 1, which facilitates subsequent processing by fully connected layers.

[0084] Its output shape is: [batch, 48, 1].

[0085] The fourth SE attention block (se4) is processed as follows: first, the output of conv4 is input into the fourth SE attention block, part of the code being: self.se4 = SEBlock(48, reduction=4). Then, SE attention processing is performed, and the algorithm is as follows: • Compression: [batch, 48] → [batch, 12] (48 // 4 = 12); • Restore: [batch, 12] → [batch, 48].

[0086] Output: Sixth feature data: Shape: [batch, 48, 1].

[0087] The fifth convolutional layer (conv5) is processed as follows: the sixth feature data is input into the fifth convolutional layer, part of the code being:

    self.conv5 = nn.Sequential(
        nn.Conv1d(48, 64, kernel_size=3, padding=1, bias=False),  # 48 -> 64
        nn.BatchNorm1d(64),
        nn.ReLU(inplace=True),
        nn.Dropout1d(0.1)
    )

Then the convolution operation is performed: • Input shape: [batch, 48, 1]; • Output shape: [batch, 64, 1].

[0088] It should be noted that the sequence length is already 1 at this point, so with padding=1 only the centre tap of each size-3 kernel multiplies a non-padded value, and the convolution effectively mixes information across the channel dimension.

[0089] The fifth SE attention block (se5) is processed as follows: first, the output of conv5 is input into the fifth SE attention block, part of the code being: self.se5 = SEBlock(64, reduction=4). Then, SE attention processing is performed, and the algorithm is as follows: • Compression: [batch, 64] → [batch, 16] (64 // 4 = 16); • Restore: [batch, 16] → [batch, 64]. Output: Seventh feature data: • Shape: [batch, 64, 1]; • Content: Final feature representation of 64 channels, each channel containing 1 global feature value.
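The shapes quoted for the five stages can be traced with a short pure-Python sketch. It only applies the standard output-length rules for stride-1 convolution and non-overlapping pooling to the layer settings listed above; the function names and dictionary keys are illustrative labels, and the SE blocks are omitted because they do not change shapes.

```python
def conv_len(length, kernel, padding):
    """Output length of a stride-1 one-dimensional convolution."""
    return length + 2 * padding - kernel + 1

def pool_len(length, window=2):
    """Output length of max pooling whose stride equals its window."""
    return (length - window) // window + 1

def trace_shapes(length=3001):
    shapes = {}
    length = conv_len(length, kernel=5, padding=2)
    shapes["conv1/se1"] = (8, length)
    length = pool_len(conv_len(length, kernel=3, padding=1))
    shapes["conv2/se2"] = (16, length)
    length = pool_len(conv_len(length, kernel=3, padding=1))
    shapes["conv3/se3"] = (32, length)
    length = pool_len(conv_len(length, kernel=3, padding=1))
    shapes["conv4 before global pooling"] = (48, length)
    length = 1                                   # AdaptiveAvgPool1d(1)
    shapes["conv4/se4"] = (48, length)
    length = conv_len(length, kernel=3, padding=1)
    shapes["conv5/se5"] = (64, length)
    return shapes

for name, (channels, length) in trace_shapes().items():
    print(f"{name}: [batch, {channels}, {length}]")
```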

[0090] Step S203: Flatten the seventh feature data into a one-dimensional feature vector to obtain the eighth feature data.

[0091] Optionally, the flattening algorithm is: $y[i, c] = x[i, c, 0]$; The specific process is as follows: • Input shape: [batch, 64, 1]; • Remove the last dimension (of length 1); • Output shape: [batch, 64].

[0092] Understandably, the eighth feature data is as follows: • Shape: [batch, 64]; • Content: A 64-dimensional feature vector, with each dimension corresponding to the global features of one channel.

[0093] Step S204: Input the eighth feature data into the classifier of the mass spectrometry data neural network model to be trained to obtain the first prediction result.

[0094] Optionally, the classifier can have 3 to 5 layers, for example, 3, 4 or 5 layers.

[0095] Optionally, in this embodiment, the classifier has 3 layers.

[0096] For example, in this embodiment, the classifier has 3 layers. Step S204 includes: inputting the eighth feature data into the first layer of the classifier of the mass spectrometry data neural network model to be trained, and outputting a first result; inputting the first result into the second layer of the classifier, and outputting a second result; inputting the second result into the output layer of the classifier, and outputting a first prediction result.

[0097] For example, the specific implementation process of step S204 is as follows. The eighth feature data is input into the first layer of the classifier (i.e., the first fully connected layer):

    nn.Dropout(dropout_rate),   # 30% dropout, stronger regularization
    nn.Linear(64, 32),          # First fully connected layer: 64-dimensional -> 32-dimensional
    nn.BatchNorm1d(32),         # Batch normalization
    nn.ReLU(inplace=True),      # Non-linear activation

[0098] Fully connected layer algorithm: $y = x W^{T} + b$, where: $W \in \mathbb{R}^{32 \times 64}$ is the weight matrix; $b \in \mathbb{R}^{32}$ is the bias vector; Input: [batch, 64]; Output: [batch, 32]; Normalization: $\hat{y} = \gamma \frac{y - \mu}{\sqrt{\sigma^2 + \epsilon}} + \beta$ (batch normalization, the same process as step S20212)

[0099] ReLU activation: $\mathrm{ReLU}(\hat{y}) = \max(0, \hat{y})$

[0100] Output shape: [batch, 32].

[0101] The output of the first layer is input to the second layer (i.e., the second fully connected layer):

    nn.Dropout(dropout_rate),   # dropout_rate = 0.3
    nn.Linear(32, 16),          # 32 -> 16
    nn.BatchNorm1d(16),
    nn.ReLU(inplace=True)

Fully connected layer algorithm: $y = x W^{T} + b$

[0102] where: $W \in \mathbb{R}^{16 \times 32}$ is the weight matrix; Input: [batch, 32]; Output: [batch, 16]; Output shape: [batch, 16].

[0103] Input the second layer output to the output layer:

    nn.Dropout(dropout_rate),      # dropout_rate = 0.3
    nn.Linear(16, num_classes)     # 16 -> 3

Fully connected layer algorithm: $y = x W^{T} + b$

[0104] where: $W \in \mathbb{R}^{3 \times 16}$ is the weight matrix (3 categories); Input: [batch, 16]; Output: [batch, 3] (unnormalized logits).

[0105] Output: First prediction result (logits), which includes: Shape: [batch, 3]; Content: Unnormalized scores (logits) for three categories.

[0106] Step S205: Calculate the loss between the first prediction result and the true category label, and update the model parameters of the mass spectrometry data neural network model through backpropagation.

[0107] As one implementation, step S205 includes: converting the first prediction result into a probability distribution; calculating a weighted cross-entropy loss based on the true category label and the probability distribution, and outputting a loss value; and updating the model parameters of the mass spectrometry data neural network model through backpropagation based on the loss value.

[0108] Optionally, the probability distribution satisfies: $p_i = \frac{e^{z_i}}{\sum_{j} e^{z_j}}$; where $p_i$ represents the predicted probability of the i-th category and $z_i$ represents the logit value of the i-th category.

[0109] Optionally, the loss value satisfies: $L = -w_y \log(p_y)$; where $y$ represents the true category label, $w_y$ is the weight of category $y$, and $p_y$ is the predicted probability of category $y$.

[0110] Optionally, updating the model parameters of the mass spectrometry data neural network model through backpropagation based on the loss value includes: calculating the gradient of the output layer based on the predicted probability and the loss value; backpropagating the gradient layer by layer; calculating the gradients of the convolutional layer weights, fully connected layer weights, batch normalization parameters, and SE attention weights respectively; if the total gradient of the convolutional layer weights, fully connected layer weights, batch normalization parameters, and SE attention weights is greater than a threshold, then scaling the gradients of the convolutional layer weights, fully connected layer weights, batch normalization parameters, and SE attention weights proportionally.

[0111] Optionally, the threshold is 1. It should be noted that the total gradient is the value obtained by summing the squared gradients of the convolutional layer weights, the fully connected layer weights, the batch normalization parameters, and the SE attention weights, and then taking the square root.

[0112] It should be understood that the total gradient is the overall norm of the gradients of all parameters in the entire model, that is, the square root of the sum of the squares of the gradients of thousands of parameters in the model.

[0113] For example, the specific implementation process of step S205 is as follows. Calculate the loss between the predicted result and the true label:

    criterion = nn.CrossEntropyLoss(weight=class_weights_tensor)
    loss = criterion(output, target)

[0114] Softmax normalization. First, convert the logits into a probability distribution: $p_i = \frac{e^{z_i}}{\sum_{j} e^{z_j}}$

[0115] where: $z_i$ is the logit value of the i-th category; $p_i$ is the predicted probability of the i-th category.

[0116] Calculate the weighted cross-entropy loss: $L = -w_y \log(p_y)$

[0117] where: $y$ is the true category label; $w_y$ is the weight of category $y$ (from class_weights); $p_y$ is the predicted probability of category $y$.

[0118] Understandably, the role of class weights is to balance the number of samples from different classes. For example, if there are fewer samples in class 0, then a larger weight is given, making the model pay more attention to the classification accuracy of that class.

[0119] Output: Loss value, which includes: Shape: Scalar (single numerical value); Content: The average loss value of the current batch.
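The softmax and weighted cross-entropy steps can be sketched in pure Python. The batch reduction used here, dividing the weighted sum by the sum of the target weights, mirrors the default mean reduction of nn.CrossEntropyLoss when class weights are supplied; the logits, targets, and class weights are made-up illustration values.

```python
import math

def softmax(logits):
    m = max(logits)                               # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_cross_entropy(batch_logits, targets, class_weights):
    weighted_sum, weight_total = 0.0, 0.0
    for logits, y in zip(batch_logits, targets):
        p = softmax(logits)
        weighted_sum += -class_weights[y] * math.log(p[y])   # L = -w_y * log(p_y)
        weight_total += class_weights[y]
    return weighted_sum / weight_total            # weighted mean over the batch

batch_logits = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]   # illustration logits for 3 classes
targets = [0, 2]
class_weights = [2.0, 1.0, 1.0]                       # class 0 assumed to be under-represented
loss = weighted_cross_entropy(batch_logits, targets, class_weights)
print(round(loss, 4))
```

With uniform logits every class gets probability 1/3, so the loss equals log 3 whatever the class weights are, which is a convenient sanity check.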

[0120] Calculate the gradient using backpropagation: loss.backward().

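[0121] Backpropagation algorithm (chain rule): Calculate the gradient of the output layer: $\frac{\partial L}{\partial z_i} = w_y \,(p_i - \delta_{iy})$

[0122] where: $\delta_{iy}$ is the Kronecker delta (1 if i = y, otherwise 0); $\frac{\partial L}{\partial z_i}$ represents the partial derivative of the loss function with respect to the i-th logit; $p_i$ represents the predicted probability of the i-th class. Gradients are backpropagated layer by layer: $\frac{\partial L}{\partial W} = \sum_{j} \frac{\partial L}{\partial a_j} \frac{\partial a_j}{\partial W}$; $\frac{\partial L}{\partial x} = \sum_{j} \frac{\partial L}{\partial a_j} \frac{\partial a_j}{\partial x}$; where $W$ represents the weight parameters of the fully connected layer, $a_j$ represents the j-th output value of the current layer, and $x$ represents the input of the current layer.

[0123] Calculate the gradient for all parameters: Convolutional layer weights: ($\partial L / \partial W_{\mathrm{conv}}$); Fully connected layer weights: ($\partial L / \partial W_{\mathrm{fc}}$); Batch normalization parameters: ($\partial L / \partial \gamma$, $\partial L / \partial \beta$); SE attention weights: ($\partial L / \partial W_{\mathrm{SE}}$).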

[0124] Output: Gradients of all parameters.

[0125] Gradient clipping: Gradient clipping is used to prevent gradient explosion. Part of the code implementing this is shown below: torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0).

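[0126] The gradient clipping algorithm is as follows:

[0127] $\mathrm{total\_norm} = \sqrt{\sum_{p} \lVert \nabla p \rVert^2}$; if $\mathrm{total\_norm} > \mathrm{max\_norm}$, then $\nabla p \leftarrow \nabla p \cdot \frac{\mathrm{max\_norm}}{\mathrm{total\_norm}}$ for every $p$

[0128] where: max_norm represents the maximum gradient norm (1.0 here); if the total gradient norm exceeds 1.0, all gradients are scaled proportionally. $p$ represents each trainable parameter in the model, referring to all weights and biases in the neural network. $\sum_{p}$ sums the squared gradients of all parameters across all layers in the model. In the code, the parameters $p$ correspond to `model.parameters()`, which iterates through all parameters of the model.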

[0129] Function: To prevent gradient explosion during training and improve training stability.
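Global-norm clipping can be sketched in pure Python on a toy set of gradients. This is an illustrative re-implementation of the rule described above rather than PyTorch's code (which additionally adds a small constant to the denominator); the parameter names and gradient values are made up.

```python
import math

def clip_grad_norm(grads, max_norm=1.0):
    """grads: name -> list of gradient values. Returns (total_norm, clipped gradients)."""
    total_norm = math.sqrt(sum(g * g for values in grads.values() for g in values))
    # Scale only when the global norm exceeds the threshold.
    scale = min(1.0, max_norm / total_norm) if total_norm > 0 else 1.0
    clipped = {name: [g * scale for g in values] for name, values in grads.items()}
    return total_norm, clipped

grads = {"conv1.weight": [3.0], "classifier.weight": [4.0]}   # illustration values
total_norm, clipped = clip_grad_norm(grads, max_norm=1.0)
print(total_norm)   # 5.0
print(clipped)      # values scaled by 1/5, so the new global norm is 1.0
```

All gradients are multiplied by the same factor, so the update direction is preserved and only its length is limited.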

[0130] The specific implementation process of the parameter update is as follows. Update the model parameters using the optimizer:

    optimizer = optim.AdamW(
        model.parameters(),
        lr=0.001,
        weight_decay=5e-3
    )
    optimizer.step()

[0131] For example, the AdamW optimizer algorithm is used to update the model parameters, and its specific calculation process is as follows. First, calculate the first-moment estimate (momentum): $m_t = \beta_1 m_{t-1} + (1 - \beta_1)\, g_t$

[0132] where: $g_t$ is the current gradient; $\beta_1$ is the momentum decay rate (0.9, the PyTorch default, since the code above does not override it); $m_t$ is the first-moment estimate (the first moment of the gradient).

[0133] Then, calculate the second-moment estimate (adaptive learning rate): $v_t = \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2$

[0134] where: $\beta_2$ is the second-moment decay rate (0.999, the PyTorch default); $v_t$ is the second-moment estimate (the second moment of the gradient).

[0135] Next, bias correction is performed:

[0136] $\hat{m}_t = \frac{m_t}{1 - \beta_1^{t}}$, $\hat{v}_t = \frac{v_t}{1 - \beta_2^{t}}$

[0137] where $t$ is the current iteration number.

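[0138] Finally, perform the parameter update (with weight decay): $\theta_t = \theta_{t-1} - \eta \left( \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon} + \lambda\, \theta_{t-1} \right)$

[0139] where: $\theta_t$ represents the updated model parameters obtained after the t-th iteration; $\hat{m}_t$ represents the bias-corrected first-moment estimate in the Adam optimizer; $\hat{v}_t$ represents the bias-corrected second-moment estimate in the Adam optimizer; $\theta_{t-1}$ represents the model parameters at the (t-1)-th iteration; $\eta = 0.001$ is the learning rate; $\lambda = 5 \times 10^{-3}$ is the weight decay coefficient; $\epsilon = 10^{-8}$ is a small constant for numerical stability (the PyTorch default).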
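A single AdamW step for one scalar parameter can be sketched in pure Python. The learning rate and weight decay come from the optimizer call in this embodiment; the values of beta1, beta2, and eps are assumed to be the PyTorch defaults, and the starting parameter and gradient are made-up illustration values.

```python
import math

def adamw_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=5e-3):
    """One decoupled-weight-decay Adam update for a scalar parameter."""
    m = beta1 * m + (1 - beta1) * grad            # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad     # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                  # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * (m_hat / (math.sqrt(v_hat) + eps) + weight_decay * theta)
    return theta, m, v

theta, m, v = adamw_step(theta=1.0, grad=0.5, m=0.0, v=0.0, t=1)
print(theta)  # about 0.998995
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so the parameter moves by roughly the learning rate plus the small weight-decay term.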

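[0140] In one possible embodiment, after step S205, the method further includes learning rate scheduling. Specifically, a cosine annealing learning rate scheduler with warm restarts is used:

    scheduler = CosineAnnealingWarmRestarts(
        optimizer,
        T_0=50,        # Initial restart cycle: 50 epochs
        T_mult=2,      # Restart cycle multiplication factor: cycle lengths 50, 100, 200, 400...
        eta_min=1e-7   # Minimum learning rate
    )
    scheduler.step()

The cosine annealing restart algorithm is as follows: $\eta_t = \eta_{\min} + \frac{1}{2} (\eta_{\max} - \eta_{\min}) \left( 1 + \cos\left( \pi \frac{T_{cur}}{T_i} \right) \right)$

[0141] where: $\eta_{\min} = 10^{-7}$ is the minimum learning rate; $\eta_{\max} = 0.001$ is the initial learning rate; $T_{cur}$ is the number of epochs since the last restart; $T_i$ is the length of the current cycle.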

[0142] The learning rate changes as follows: • Epoch 0-50: the learning rate follows a cosine curve from 0.001 down to near its minimum, then restarts at 0.001; • Epoch 50-150: the cycle length becomes 100, and the same decay and restart are repeated; • Epoch 150-350: the cycle length becomes 200, and the above process is repeated; and so on. Understandably, the periodic restarts of the learning rate help the model escape local optima, and the cosine decay makes the learning rate change smoothly, avoiding training oscillations.
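The restart pattern can be reproduced with a minimal pure-Python sketch of the cosine formula, assuming one scheduler step per epoch and T_0=50, T_mult=2, eta_min=1e-7 as in the code of this embodiment. `lr_at_epoch` is a hypothetical helper, not part of PyTorch.

```python
import math

def lr_at_epoch(epoch, eta_max=0.001, eta_min=1e-7, t_0=50, t_mult=2):
    """Learning rate at an integer epoch under cosine annealing with warm restarts."""
    t_i, t_cur = t_0, epoch
    while t_cur >= t_i:        # skip the cycles that are already complete
        t_cur -= t_i
        t_i *= t_mult          # each new cycle is t_mult times longer
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * t_cur / t_i))

# A restart is any epoch where the learning rate jumps back up.
restarts = [e for e in range(1, 400) if lr_at_epoch(e) > lr_at_epoch(e - 1)]
print(restarts)  # [50, 150, 350]
```

Within a cycle the learning rate only decreases, so the upward jumps mark exactly the cycle boundaries.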

[0143] Step S206: Input preset verification data into the trained mass spectrometry data neural network model and output the verification results.

[0144] The validation results include: validation loss, validation accuracy, second prediction result, and true label.

[0145] For example, the implementation process of step S206 is as follows. Evaluate model performance on the validation set:

    def validate_epoch(model, val_loader, criterion, device):
        model.eval()
        with torch.no_grad():
            for data, target in val_loader:
                data, target = data.to(device), target.to(device)
                output = model(data)
                loss = criterion(output, target)
                # ... Calculate accuracy ...

Validation process: 1. Set the model to evaluation mode with model.eval(): this disables Dropout and switches BatchNorm from batch statistics to its running statistics; 2. Disable gradient computation with torch.no_grad(): this saves memory and computing resources; 3. Forward propagation: input the validation data into the model to obtain the prediction results (the same forward propagation process as during training); 4. Calculate the loss: use the same loss function; 5. Calculate the accuracy: $\mathrm{val\_acc} = \frac{N_{\mathrm{correct}}}{N_{\mathrm{total}}}$, where $N_{\mathrm{correct}}$ is the number of validation samples whose predicted class equals the true label and $N_{\mathrm{total}}$ is the total number of validation samples.

[0146] Output result: • Validation loss: val_loss; • Validation accuracy: val_acc; • Prediction results: val_preds; • True labels: val_targets.

[0147] Save model checkpoints:

    checkpoint_manager = ModelCheckpoint(
        save_dir=checkpoint_dir,
        patience=200,
        min_delta=0.001,
        save_every=10
    )

Specifically, the checkpoint saving strategy is: 1. Optimal model saving: whenever the validation loss decreases by more than min_delta=0.001, save the current model as the best model; 2. Periodic saving: save a checkpoint every 10 epochs (save_every=10); 3. Early stopping mechanism: if the validation loss does not improve for 200 consecutive epochs (patience=200), early termination is triggered.
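The three rules of the saving strategy can be expressed as a small decision class. The sketch below is a hypothetical stand-in for the ModelCheckpoint helper named above, whose implementation is not given; it only decides what to do and leaves the actual saving to the caller. It assumes epochs are counted from 0, and the short loss history and patience=3 are illustration values.

```python
class CheckpointPolicy:
    """Hypothetical decision logic: best-model saving, periodic saving, early stopping."""

    def __init__(self, patience=200, min_delta=0.001, save_every=10):
        self.patience = patience
        self.min_delta = min_delta
        self.save_every = save_every
        self.best_loss = float("inf")
        self.epochs_without_improvement = 0

    def update(self, epoch, val_loss):
        """Return (save_best, save_periodic, stop_early) for this epoch."""
        save_best = val_loss < self.best_loss - self.min_delta
        if save_best:
            self.best_loss = val_loss
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        save_periodic = (epoch + 1) % self.save_every == 0   # assumes 0-based epochs
        stop_early = self.epochs_without_improvement >= self.patience
        return save_best, save_periodic, stop_early

policy = CheckpointPolicy(patience=3, min_delta=0.001, save_every=10)
history = [1.0, 0.9, 0.8995, 0.9, 0.95]
decisions = [policy.update(epoch, loss) for epoch, loss in enumerate(history)]
print(decisions[-1])  # (False, False, True): three epochs without sufficient improvement
```

Note that the drop from 0.9 to 0.8995 is smaller than min_delta, so it does not count as an improvement and does not reset the patience counter.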

[0148] Saved content: torch.save({ 'epoch': epoch, 'model_state_dict': model.state_dict(), 'val_loss': val_loss }, checkpoint_path).

[0149] Step S207: Determine whether the trained mass spectrometry data neural network model is the optimal mass spectrometry data neural network model based on the verification results.

[0150] For example, whenever the validation loss decreases by more than min_delta=0.001, the current model is saved as the best model.

[0151] Step S208: If yes, save the currently trained mass spectrometry data neural network model as the optimal mass spectrometry data neural network model.

[0152] In one possible embodiment, after step S208, the method further includes: validating the optimal mass spectrometry data neural network model using K-fold cross-validation.

[0153] For example, K is 5, and the specific calculation process is as follows: GroupKFold is used to group samples by ID to prevent data leakage. from sklearn.model_selection import GroupKFold sample_groups = [info['sample_id'] for info in all_info] gkf = GroupKFold(n_splits=5) Grouping algorithm: • Ensure that different augmented versions of the same sample are not assigned to the training and validation sets; • Ensure that all data for each sample are either in the training set or the validation set.

[0154] Perform steps S201 to S207 for each fold:

    for fold, (train_idx, val_idx) in enumerate(gkf.split(...)):
        # Split data
        fold_train_data = all_data[train_idx]
        fold_val_data = all_data[val_idx]
        # Train the model (execute steps S201 to S207)
        trained_model, history, val_acc = train_fold(...)
        # Save model
        torch.save(trained_model.state_dict(), f"model_fold_{fold+1}.pth")

5-fold cross-validation process:
• Fold 1: train on folds 2-5, validate on fold 1;
• Fold 2: train on folds 1 and 3-5, validate on fold 2;
• Fold 3: train on folds 1-2 and 4-5, validate on fold 3;
• Fold 4: train on folds 1-3 and 5, validate on fold 4;
• Fold 5: train on folds 1-4, validate on fold 5.

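[0155] Use the models from all folds for ensemble prediction:

    import numpy as np
    import torch

    all_test_probs = []
    for fold in range(5):
        model.load_state_dict(torch.load(f"model_fold_{fold+1}.pth"))
        model.eval()
        with torch.no_grad():
            # Get probabilities (assumes the model returns raw class scores)
            fold_probs = torch.softmax(model(test_data), dim=1)
        all_test_probs.append(fold_probs.cpu().numpy())
    # Ensemble prediction (average probability)
    ensemble_probs = np.mean(all_test_probs, axis=0)
    ensemble_preds = np.argmax(ensemble_probs, axis=1)

Ensemble algorithm:

$$P_{\text{ensemble}}(c \mid x) = \frac{1}{K} \sum_{k=1}^{K} P_k(c \mid x), \qquad \hat{y} = \arg\max_{c} P_{\text{ensemble}}(c \mid x)$$

[0156] where: $K$ is the number of folds (here $K = 5$); $P_k(c \mid x)$ is the probability of class $c$ predicted for sample $x$ by the model trained in fold $k$; and $\hat{y}$ is the final predicted class.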
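As a worked illustration of probability averaging, the following plain-Python sketch applies the same two operations (mean over folds, then arg max over classes) to a small hand-made example. The function name `ensemble_predict` and all numbers are assumptions chosen for readability, not results from this application.

```python
def ensemble_predict(all_fold_probs):
    """Average class probabilities across folds, then take the arg max.

    all_fold_probs[k][n][c] is the probability that the model of fold k
    assigns class c to test sample n.
    """
    n_folds = len(all_fold_probs)
    n_samples = len(all_fold_probs[0])
    n_classes = len(all_fold_probs[0][0])
    ensemble_probs = [
        [sum(fold[n][c] for fold in all_fold_probs) / n_folds
         for c in range(n_classes)]
        for n in range(n_samples)
    ]
    ensemble_preds = [max(range(n_classes), key=row.__getitem__)
                      for row in ensemble_probs]
    return ensemble_probs, ensemble_preds


# Three folds, two test samples, two classes (made-up numbers).
fold_probs = [
    [[0.9, 0.1], [0.4, 0.6]],
    [[0.6, 0.4], [0.7, 0.3]],
    [[0.3, 0.7], [0.1, 0.9]],
]
ensemble_probs, ensemble_preds = ensemble_predict(fold_probs)
```

For the second sample the folds disagree (one fold prefers class 0, two prefer class 1); the averaged probabilities are 0.4 and 0.6, so the ensemble selects class 1.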

[0157] It can be understood that the advantages of ensemble prediction include: reducing model variance, improving generalization ability, and making comprehensive use of the features learned by multiple models.

[0158] It should be noted that the code snippets provided in this embodiment are merely examples and not limitations.

[0159] Based on the same inventive concept, this application also provides a mass spectrometry data neural network model, which is trained using the method described in the above method embodiments. Furthermore, this embodiment also provides a computer-readable storage medium storing a computer program which, when run by a processing device, executes the steps of the mass spectrometry data neural network model training method provided in the above embodiments.

[0160] The computer program product of the mass spectrometry data neural network model training method provided in this application includes a computer-readable storage medium storing program code. The instructions included in the program code can be used to execute the methods described in the preceding method embodiments. For specific implementation, please refer to the method embodiments, which will not be repeated here.

[0161] It should be noted that the above embodiments can be implemented, in whole or in part, by software, hardware (such as circuits), firmware, or any combination thereof. When implemented using software, the above embodiments can be implemented, in whole or in part, in the form of a computer program product. The computer program product includes one or more computer instructions or computer programs. When the computer instructions or computer programs are loaded or executed on a computer, all or part of the processes or functions described in the embodiments of this application are generated. The computer can be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions can be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions can be transmitted from one website, computer, server, or data center to another website, computer, server, or data center via wired or wireless (e.g., infrared, microwave, etc.) means. The computer-readable storage medium can be any available medium that a computer can access, or a data storage device, such as a server or data center, that includes one or more sets of available media. The available medium can be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium. A semiconductor medium can be a solid-state drive.

[0162] It should be understood that the term "and / or" in this document merely describes an association between related objects and indicates that three relationships can exist. For example, A and / or B can represent: A existing alone, A and B existing simultaneously, or B existing alone, where A and B can be singular or plural. Additionally, the character " / " in this document generally indicates an "or" relationship between the preceding and following related objects, but it may also indicate an "and / or" relationship; please refer to the context for the specific interpretation. In this application, "at least one" refers to one or more, and "multiple" refers to two or more. "At least one of the following" or similar expressions refer to any combination of these items, including any combination of single or plural items. For example, at least one of a, b, or c can represent: a, b, c, a and b, a and c, b and c, or a, b, and c, where a, b, and c can each be a single item or multiple items.

[0163] It should be understood that in the various embodiments of this application, the sequence numbers of the above-mentioned processes do not imply an order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of this application.

[0164] Those skilled in the art will recognize that the units and algorithm steps of the various examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this application.

[0165] Those skilled in the art will understand that, for convenience and brevity of description, the specific working processes of the systems, devices, and units described above may be understood by reference to the corresponding processes in the foregoing method embodiments, and are not repeated here.

[0166] In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for instance, the division of units is only a logical functional division, and in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.

[0167] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs. Furthermore, the functional units in the various embodiments of this application may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit. The above descriptions are merely preferred embodiments of this application and are not intended to limit this application. For those skilled in the art, this application can have various modifications and variations. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of this application should be included within the protection scope of this application. It should be noted that similar reference numerals and letters in the following figures indicate similar items; therefore, once an item is defined in one figure, it does not need to be further defined and explained in subsequent figures.

Claims

1. A method for training a mass spectrometry data neural network model, characterized in that the method includes:
inputting first feature data into the input layer of the mass spectrometry data neural network model to be trained to obtain second feature data;
inputting the second feature data into multiple convolutional layers and corresponding SE attention layers of the mass spectrometry data neural network model to be trained, and outputting seventh feature data;
flattening the seventh feature data into a one-dimensional feature vector to obtain eighth feature data;
inputting the eighth feature data into the classifier of the mass spectrometry data neural network model to be trained to obtain a first prediction result;
calculating the loss between the first prediction result and the true category label, and updating the model parameters of the mass spectrometry data neural network model through backpropagation;
inputting preset validation data into the trained mass spectrometry data neural network model, and outputting validation results;
determining, based on the validation results, whether the trained mass spectrometry data neural network model is the optimal mass spectrometry data neural network model; and
if so, saving the currently trained mass spectrometry data neural network model as the optimal mass spectrometry data neural network model.

2. The method according to claim 1, characterized in that, before inputting the first feature data into the input layer of the mass spectrometry data neural network model to be trained to obtain the second feature data, the method further includes:
normalizing the training samples and applying data augmentation to obtain the augmented first feature data.

3. The method according to claim 1 or 2, characterized in that the number of convolutional layers is 5, and inputting the second feature data into multiple convolutional layers and corresponding SE attention layers of the mass spectrometry data neural network model to be trained, and outputting the seventh feature data, includes:
inputting the second feature data into the first convolutional layer and the first SE attention block of the mass spectrometry data neural network model to be trained, and outputting third feature data;
inputting the third feature data into the second convolutional layer and the second SE attention block of the mass spectrometry data neural network model to be trained, and outputting fourth feature data;
inputting the fourth feature data into the third convolutional layer and the third SE attention block of the mass spectrometry data neural network model to be trained, and outputting fifth feature data;
inputting the fifth feature data into the fourth convolutional layer and the fourth SE attention block of the mass spectrometry data neural network model to be trained, and outputting sixth feature data; and
inputting the sixth feature data into the fifth convolutional layer and the fifth SE attention block of the mass spectrometry data neural network model to be trained, and outputting the seventh feature data.

4. The method according to claim 3, characterized in that inputting the eighth feature data into the classifier of the mass spectrometry data neural network model to be trained to obtain the first prediction result includes:
inputting the eighth feature data into the first layer of the classifier of the mass spectrometry data neural network model to be trained, and outputting a first result;
inputting the first result into the second layer of the classifier, and outputting a second result; and
inputting the second result into the output layer of the classifier to output the first prediction result.

5. The method according to claim 1, 2, 3, or 4, characterized in that calculating the loss between the first prediction result and the true category label, and updating the model parameters of the mass spectrometry data neural network model through backpropagation, includes:
converting the first prediction result into a probability distribution;
calculating the weighted cross-entropy loss based on the true category label and the probability distribution, and outputting a loss value; and
updating the model parameters of the mass spectrometry data neural network model through backpropagation based on the loss value.

6. The method according to claim 5, characterized in that the probability distribution satisfies:

$$p_i = \frac{e^{z_i}}{\sum_{j} e^{z_j}}$$

where $p_i$ represents the predicted probability of the $i$-th category, and $z_i$ represents the $i$-th category value.

7. The method according to claim 6, characterized in that the loss value satisfies:

$$L = -w_y \log(p_y)$$

where $y$ represents the true category label, $w_y$ is the weight of category $y$, and $p_y$ is the predicted probability of category $y$.

8. The method according to claim 7, characterized in that updating the model parameters of the mass spectrometry data neural network model through backpropagation based on the loss value includes:
calculating the gradient of the output layer based on the predicted probability and the loss value;
propagating the gradient backward layer by layer;
calculating the gradients of the convolutional layer weights, the fully connected layer weights, the batch normalization parameters, and the SE attention weights respectively; and
if the total gradient of the convolutional layer weights, the fully connected layer weights, the batch normalization parameters, and the SE attention weights is greater than a threshold, scaling the gradients of the convolutional layer weights, the fully connected layer weights, the batch normalization parameters, and the SE attention weights proportionally.

9. The method according to claim 1, characterized in that the method further includes:
validating the optimal mass spectrometry data neural network model using K-fold cross-validation.

10. A mass spectrometry data neural network model, characterized in that the mass spectrometry data neural network model is trained using the mass spectrometry data neural network model training method according to any one of claims 1-9.
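For illustration only, the computations recited in claims 5 to 8 (conversion to a probability distribution, weighted cross-entropy, and proportional gradient scaling) can be sketched in plain Python. The function names, the class weights, and the threshold below are hypothetical. Reading the "total gradient" as the joint L2 norm over all parameter groups is an assumption of this sketch, not a statement of the claimed method.

```python
import math


def softmax(logits):
    """Convert raw class scores into a probability distribution."""
    shift = max(logits)  # subtracting the max avoids overflow in exp
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_cross_entropy(logits, label, class_weights):
    """Loss for one sample: -w_y * log(p_y), with y the true label."""
    probs = softmax(logits)
    return -class_weights[label] * math.log(probs[label])


def clip_by_total_norm(grads, max_norm):
    """Scale all gradient groups by one factor if their joint L2 norm
    exceeds max_norm (assumed interpretation of the threshold test)."""
    total_norm = math.sqrt(sum(g * g for group in grads for g in group))
    if total_norm <= max_norm:
        return [list(group) for group in grads], total_norm
    scale = max_norm / total_norm
    return [[g * scale for g in group] for group in grads], total_norm


# Made-up scores, weights, and gradients for a three-class example.
probs = softmax([2.0, 1.0, 0.1])
loss = weighted_cross_entropy([2.0, 1.0, 0.1], label=0, class_weights=[1.0, 2.0, 2.0])
clipped, norm_before = clip_by_total_norm([[3.0, 0.0], [0.0, 4.0]], max_norm=1.0)
```

Because a single scale factor is applied to every group, the direction of the overall gradient is preserved and only its magnitude is reduced.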