Sleep staging method and apparatus, and computer-readable storage medium
By combining Transformer encoding with bidirectional gated recurrent units (BiGRU) in a multi-level encoding structure, the method addresses the inability of existing sleep staging methods to fully exploit temporal and channel information and thereby achieves higher sleep staging accuracy.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications (China)
- Current Assignee / Owner
- BOE TECHNOLOGY GROUP CO LTD
- Filing Date
- 2026-03-11
- Publication Date
- 2026-06-12
AI Technical Summary
Existing automatic sleep staging methods cannot fully utilize the temporal and channel information of polysomnography, so their classification performance (e.g., accuracy and efficiency) needs improvement.
A multi-level encoding structure combining Transformer encoding and a bidirectional gated recurrent unit (BiGRU) is adopted. The first-level encoding module extracts intra-segment waveform temporal features, the second-level encoding module extracts inter-segment temporal transition features, and inter-channel topological correlation encoding is also incorporated, improving sleep staging accuracy.
The method captures the intra-segment waveform temporal features and inter-segment temporal transition features of polysomnography segments more comprehensively, improving the accuracy of automatic sleep staging and mirroring the analysis process used by clinicians.
Figure CN122201771A_ABST
Abstract
Description
Technical Field
[0001] This disclosure relates to the field of sleep staging technology, and more specifically to a sleep staging method and apparatus and a computer-readable storage medium.
Background Technology
[0002] Sleep is a vital physiological process for maintaining physical and mental health, occupying about one-third of human life. Polysomnography (PSG) is the gold standard for clinically assessing sleep quality and diagnosing sleep disorders, as it contains multi-channel physiological temporal signals collected from different parts of the body.
[0003] PSG-based sleep staging is an important prerequisite for analyzing sleep patterns and sleep disorders. Automatic sleep staging methods based on machine learning or deep learning already exist, but their staging performance needs further improvement.
Summary of the Invention
[0004] The purpose of this disclosure is to provide a sleep staging method and apparatus and a computer-readable storage medium that solve at least one of the technical problems described above.
[0005] To achieve the above objectives, the present disclosure adopts the following technical solutions. A first aspect of this disclosure provides a sleep staging method including the following steps: obtaining a polysomnography segment sequence to be staged, the sequence including multiple temporally consecutive polysomnography segments; and inputting the polysomnography segment sequence into a pre-trained sleep staging model, which is configured to perform sleep staging on the input sequence and output the staging result. The sleep staging model includes a first-level encoding module, a second-level encoding module, and a classification output module. The first-level encoding module extracts the intra-segment waveform temporal features of each polysomnography segment in the sequence; the intra-segment waveform temporal features of all segments constitute a waveform temporal feature sequence. The second-level encoding module extracts inter-segment temporal transition features from the waveform temporal feature sequence. The classification output module performs sleep staging on a target segment in the polysomnography segment sequence according to the inter-segment temporal transition features and outputs the sleep staging result. The first-level encoding module and/or the second-level encoding module adopt an encoding structure that combines Transformer encoding with a bidirectional gated recurrent unit (BiGRU).
[0006] Optionally, the first-level encoding module adopts the encoding structure combining Transformer encoding and a BiGRU, and is configured to: for each channel of each polysomnography segment in the polysomnography segment sequence, apply a time-frequency transform to the single-channel signal to generate a two-dimensional time-frequency spectrum; input the two-dimensional time-frequency spectrum into the Transformer encoder to obtain a first context feature sequence corresponding to the single-channel signal; input the first context feature sequence into the BiGRU to obtain a temporally enhanced second context feature sequence; and fuse the second context feature sequence to obtain the single-channel waveform temporal feature of the single-channel signal. The single-channel waveform temporal features of all channels within a polysomnography segment constitute the intra-segment waveform temporal features of that segment, and the intra-segment waveform temporal features of all segments constitute the waveform temporal feature sequence of the polysomnography segment sequence.
[0007] Optionally, the second-level encoding module adopts the encoding structure combining Transformer encoding and a BiGRU, and is configured to: add position encoding to the waveform temporal feature sequence; input the position-encoded sequence into the Transformer encoder to obtain a third context feature sequence; input the third context feature sequence into the BiGRU to obtain a temporally enhanced fourth context feature sequence; and fuse the fourth context feature sequence to obtain the inter-segment temporal transition features corresponding to the waveform temporal feature sequence.
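The disclosure does not say which position encoding is added. The sketch below assumes the fixed sinusoidal encoding of the original Transformer, added element-wise to a waveform temporal feature sequence of shape (L, d); the function names and the feature dimension are illustrative assumptions, and a learnable position embedding would be an equally plausible choice.

```python
import numpy as np


def sinusoidal_position_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Return a (seq_len, d_model) table of sine/cosine position codes (assumed scheme)."""
    positions = np.arange(seq_len)[:, None]  # (L, 1) segment indices
    # Geometrically decreasing frequencies, one per pair of feature dimensions.
    freqs = np.exp(np.arange(0, d_model, 2) * (-np.log(10000.0) / d_model))
    table = np.zeros((seq_len, d_model))
    table[:, 0::2] = np.sin(positions * freqs)                  # even dims: sine
    table[:, 1::2] = np.cos(positions * freqs[: d_model // 2])  # odd dims: cosine
    return table


def add_position_encoding(feature_seq: np.ndarray) -> np.ndarray:
    """feature_seq: (L, d) waveform temporal features, one row per PSG segment."""
    seq_len, d_model = feature_seq.shape
    return feature_seq + sinusoidal_position_encoding(seq_len, d_model)


# Illustrative sequence of L = 15 segments with d = 64 features each.
features = np.random.default_rng(0).normal(size=(15, 64))
encoded = add_position_encoding(features)
```

Because the encoding is fixed, it adds no trainable parameters; it only lets the order-agnostic self-attention distinguish earlier from later segments.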
[0008] Optionally, the sleep staging model further includes an inter-channel encoding module, which performs inter-channel topological correlation encoding on the waveform temporal feature sequence output by the first-level encoding module to obtain a channel topological correlation feature sequence and outputs it to the second-level encoding module. In this case, the second-level encoding module extracts the inter-segment temporal transition features from the channel topological correlation feature sequence.
[0009] Optionally, the inter-channel encoding module performs inter-channel topological correlation encoding on the waveform temporal feature sequence output by the first-level encoding module as follows. For each polysomnography segment in the sequence, the intra-segment waveform temporal features of the segment are used as a graph node matrix, and graph-structured data are constructed from the graph node matrix and a preset adjacency matrix. The graph node matrix contains multiple graph nodes, each corresponding to the single-channel waveform temporal feature of one channel of the segment, and the preset adjacency matrix represents the topological connections between channels. A graph convolutional network encodes the inter-channel topological correlation of the graph-structured data to obtain intermediate graph-structured data, and a pooling layer fuses the graph nodes of the intermediate graph-structured data to obtain the channel topological correlation feature of the segment. The channel topological correlation features of all segments in the sequence constitute the channel topological correlation feature sequence.
[0010] Optionally, the preset adjacency matrix is a fully connected adjacency matrix without node self-connections, in which every off-diagonal element takes a first value (the diagonal elements, which would represent self-connections, are zero).
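Paragraphs [0009] and [0010] can be illustrated with a minimal sketch. It assumes a single graph-convolution layer with the common symmetric normalization D^-1/2 A D^-1/2, a ReLU activation, and mean pooling over channel nodes; the disclosure does not fix the number of layers, the normalization, or the pooling type, and the function names are hypothetical. With a fully connected adjacency matrix without self-loops, this normalization makes each node aggregate the average of the other channels' features, whatever first value is chosen.

```python
import numpy as np


def fully_connected_adjacency(num_channels: int, first_value: float = 1.0) -> np.ndarray:
    """Every pair of distinct channels is connected with weight `first_value`; no self-loops."""
    adj = np.full((num_channels, num_channels), first_value, dtype=float)
    np.fill_diagonal(adj, 0.0)
    return adj


def gcn_layer(nodes: np.ndarray, adj: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """One graph convolution: ReLU(D^-1/2 A D^-1/2 @ H @ W). Requires >= 2 channels."""
    d_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    norm_adj = d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    return np.maximum(norm_adj @ nodes @ weight, 0.0)


def channel_topology_sequence(feature_seq, adj, weight):
    """feature_seq: (L, C, d) node matrices, one per PSG segment -> (L, d_out)."""
    pooled = []
    for nodes in feature_seq:                   # nodes: (C, d) graph node matrix
        hidden = gcn_layer(nodes, adj, weight)  # intermediate graph-structured data
        pooled.append(hidden.mean(axis=0))      # mean pooling over channel nodes
    return np.stack(pooled)
```

For three channels (e.g., EEG, EOG, ECG) the graph is a triangle, so each channel's updated feature mixes the two other channels equally before pooling.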
[0011] Optionally, the classification output module includes a fully connected layer and a classifier connected in sequence, and classifying the target segment in the polysomnography segment sequence into a sleep stage based on the inter-segment temporal transition features includes: inputting the inter-segment temporal transition features into the fully connected layer for linear mapping to obtain the output features of the fully connected layer; inputting these output features into the classifier, which calculates the probability distribution of the target segment over the sleep stages; and determining the sleep staging result of the target segment from the probability distribution.
[0012] Optionally, the target segment is the polysomnography segment located at the middle position of the polysomnography segment sequence.
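A minimal sketch of the classification output module in [0011], assuming a softmax classifier over the five stages listed in [0015]; the weight shapes, stage abbreviations, and function names are illustrative assumptions rather than the disclosed implementation.

```python
import numpy as np

SLEEP_STAGES = ("W", "N1", "N2", "N3", "REM")


def softmax(logits: np.ndarray) -> np.ndarray:
    shifted = logits - logits.max()  # subtract max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()


def classify_target_segment(transition_feature, weight, bias):
    """transition_feature: (d,), weight: (d, 5), bias: (5,) -> (stage label, probabilities)."""
    fc_output = transition_feature @ weight + bias  # fully connected layer (linear mapping)
    probs = softmax(fc_output)                      # probability of each sleep stage
    return SLEEP_STAGES[int(np.argmax(probs))], probs
```

Taking the arg-max of the distribution is one simple decision rule; the probabilities themselves can also be kept for confidence reporting.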
[0013] Optionally, obtaining the polysomnography segment sequence to be staged includes: obtaining the polysomnogram to be staged and dividing it into multiple temporally consecutive polysomnography segments according to a preset segment length; and sequentially selecting a first number of segments from these segments using a sliding window, each selection forming one polysomnography segment sequence to be staged, where the first number is a natural number greater than 2.
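The segmentation and sliding-window selection in [0012] and [0013] can be sketched as follows. The sketch assumes a 30-second segment length (per [0023]), a window step of one segment, that a trailing partial segment is dropped, and an odd window length so that a single middle segment serves as the target; beyond the 30-second length and the requirement that the first number exceed 2, these details are assumptions, and the function names are hypothetical.

```python
import numpy as np


def split_into_segments(recording: np.ndarray, fs: int, seconds: int = 30) -> np.ndarray:
    """recording: (C, total_samples) -> (num_segments, C, N) with N = fs * seconds."""
    n = fs * seconds
    num_segments = recording.shape[1] // n  # trailing partial segment is dropped
    trimmed = recording[:, : num_segments * n]
    return trimmed.reshape(recording.shape[0], num_segments, n).transpose(1, 0, 2)


def sliding_window_sequences(segments: np.ndarray, first_number: int, step: int = 1):
    """Yield (segment sequence, index of its middle target segment)."""
    if first_number <= 2:
        raise ValueError("the first number must be greater than 2")
    for start in range(0, len(segments) - first_number + 1, step):
        yield segments[start : start + first_number], start + first_number // 2


# Two channels, 150 s at an illustrative 10 Hz -> five 30-second segments.
recording = np.arange(2 * 1500, dtype=float).reshape(2, 1500)
segments = split_into_segments(recording, fs=10)
windows = list(sliding_window_sequences(segments, first_number=3))
```

With a step of one segment, every segment except the first and last few becomes the target of exactly one sequence.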
[0014] Optionally, before the polysomnography segment sequence is input into the pre-trained sleep staging model, the method further includes: obtaining a publicly available dataset that includes multiple raw polysomnograms; dividing each raw polysomnogram into multiple temporally consecutive polysomnography segments according to the preset segment length; sequentially selecting a first number of segments as training samples using a sliding window, each training sample corresponding to a ground-truth sleep stage label; and training an initial sleep staging model with the training samples to obtain the pre-trained sleep staging model.
[0015] Optionally, the polysomnogram includes at least two of the following simultaneously monitored signals: electroencephalogram (EEG), electrooculogram (EOG), electromyogram (EMG), electrocardiogram (ECG), and respiratory recordings. The sleep stage of the target segment is one of wakefulness, non-rapid eye movement (NREM) stage 1, NREM stage 2, NREM stage 3, and rapid eye movement (REM) sleep.
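A sketch of how the training samples in [0014] might be assembled, assuming each sample is labeled with the ground-truth stage of its middle (target) segment, consistent with [0012], and assuming a cross-entropy loss for pre-training; the disclosure does not name the loss function, and the helper names are hypothetical.

```python
import numpy as np


def make_training_samples(segments, labels, first_number):
    """Pair each window of consecutive segments with the label of its middle segment."""
    samples = []
    for start in range(len(segments) - first_number + 1):
        middle = start + first_number // 2
        samples.append((segments[start : start + first_number], labels[middle]))
    return samples


def cross_entropy(probs: np.ndarray, label: int) -> float:
    """Negative log-likelihood of the true stage under the predicted distribution."""
    return float(-np.log(probs[label] + 1e-12))


segments = np.zeros((5, 2, 300))  # five placeholder 30-second, 2-channel segments
labels = [0, 1, 2, 2, 4]          # stage indices: W, N1, N2, N2, REM
samples = make_training_samples(segments, labels, first_number=3)
```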
[0016] A second aspect of this disclosure provides a sleep staging apparatus including a memory, a processor, and a computer program stored in the memory and executable on the processor; when executing the program, the processor implements the steps of the sleep staging method described above.
[0017] A third aspect of this disclosure provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of the sleep staging method described above.
[0018] The beneficial effects of this disclosure are as follows. In the sleep staging method of this disclosure, the sleep staging model encodes intra-segment waveform temporal features with a first-level encoding module and inter-segment temporal transition features with a second-level encoding module, and at least one of these modules uses a multi-level encoding structure combining Transformer encoding and a BiGRU. When encoding either kind of feature, the self-attention mechanism of the Transformer encoder captures contextual features of the input data, and the BiGRU then temporally enhances these contextual features. Compared with a single-level encoding structure, this combined structure captures the intra-segment waveform temporal features and inter-segment temporal transition features of polysomnography segments more comprehensively, better matches the clinical analysis process, and improves the accuracy of automatic sleep staging.
Description of the Drawings
[0019] The specific embodiments of this disclosure will be described in further detail below with reference to the accompanying drawings.
[0020] Figure 1 is a flowchart of an embodiment of the sleep staging method provided in this disclosure;
Figure 2 is a schematic structural diagram of an embodiment of the sleep staging model provided in this disclosure;
Figure 3 is a signal processing flowchart of an embodiment of the first-level encoding module provided in this disclosure;
Figure 4 is a schematic diagram of the network architecture of an embodiment of the Transformer encoder provided in this disclosure;
Figure 5 is a signal processing flowchart of an embodiment of the inter-channel encoding module provided in this disclosure;
Figure 6 is a schematic structural diagram of another embodiment of the sleep staging model provided in this disclosure;
Figure 7 is a signal processing flowchart of the sleep staging model shown in Figure 6;
Figure 8 is a flowchart for obtaining a polysomnography segment sequence to be staged;
Figure 9 is a flowchart for pre-training the sleep staging model.
Detailed Description
[0021] To make the objectives, technical solutions, and advantages of the embodiments of this disclosure clearer, the technical solutions of the embodiments of this disclosure will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of this disclosure. All other embodiments obtained by those skilled in the art based on the described embodiments of this disclosure without creative effort are within the scope of protection of this disclosure.
[0022] Unless otherwise defined, the technical or scientific terms used in this disclosure shall have the ordinary meaning understood by one of ordinary skill in the art to which this disclosure pertains. The terms “first,” “second,” and similar terms used in this disclosure do not indicate any order, quantity, or importance, but are merely used to distinguish different components. Similarly, the terms “an,” “a,” or “the,” and similar terms do not indicate a quantity limitation, but rather indicate the presence of at least one. The terms “including,” “comprising,” or “containing,” and similar terms mean that the element or object preceding the word encompasses the elements or objects listed following the word and their equivalents, without excluding other elements or objects. The terms “connected,” “linked,” or similar terms are not limited to physical or mechanical connections, but can include electrical connections, whether direct or indirect. The terms “upper,” “lower,” “left,” and “right,” etc., are used only to indicate relative positional relationships, and these relative positional relationships may change accordingly when the absolute position of the described objects changes.
[0023] Polysomnography (PSG) contains physiological time-series signals from multiple channels, including electroencephalography (EEG), electrooculography (EOG), electromyography (EMG), electrocardiography (ECG), and respiratory-related signals such as nasal and oral airflow and blood oxygen saturation, totaling more than ten physiological parameters. In the field of sleep staging, according to the sleep staging guidelines published by the American Academy of Sleep Medicine, physicians divide the entire night's PSG into 30-second segments. Each 30-second segment can be labeled as one of five sleep stages: wakefulness (W), non-rapid eye movement (NREM) stages 1-3, or rapid eye movement (REM). Typically, manual sleep staging takes about 2 hours to process a single night's PSG recording (7-8 hours). When the data volume increases significantly, this method becomes extremely time-consuming and labor-intensive for physicians. Therefore, automated sleep staging using computer algorithms that mirror the physician's visual analysis process has gained widespread attention due to its efficiency and accuracy, and the improvement in diagnostic efficiency is of great significance to the development of sleep medicine.
[0024] During clinical staging, physicians focus on the temporal and channel information of the PSG to classify sleep stages accurately. Temporal information includes intra-segment waveform temporal characteristics and inter-segment temporal transition characteristics. Intra-segment waveform temporal characteristics are the stage-related waveform information contained in each PSG segment (e.g., 30 seconds long), such as the location, proportion, and duration of key patterns like alpha waves, theta waves, K-complexes, and sleep spindles. Inter-segment temporal transition characteristics are the stage transition patterns observed across consecutive PSG segments, such as the persistence of the same sleep stage over long periods or brief transitions between different stages. Channel information mainly refers to inter-channel correlation, that is, the correlation between the waveforms that the signal channels exhibit in a given sleep stage. For example, during REM sleep, low-amplitude mixed-frequency activity in the EEG channel and rapid eye movements in the EOG channel may appear simultaneously.
[0025] The inventors' study of existing automatic sleep staging methods found that they fall mainly into the following research directions.
1. Traditional machine-learning methods capture intra-segment waveform temporal features by extracting numerous handcrafted feature parameters from the target PSG segment, such as the mean, standard deviation, and skewness in the time domain; band energy and power spectral density in the frequency domain; and nonlinear parameters such as sample entropy, Shannon entropy, and the Hurst exponent. A typical example is the paper "Do not sleep on traditional machine learning: Simple and interpretable techniques are competitive to deep learning for sleep scoring" by Van Der Donckt et al. These algorithms characterize waveform temporal information through feature parameters and combine them with classifiers such as random forests and support vector machines to perform sleep staging. Although this relieves the burden of manual staging, the handcrafted features are usually taken directly from the raw signal or computed through simple transformations (such as the Fourier transform). They describe only superficial, local statistical properties of the signal and carry little task-relevant semantic information, so they cannot characterize the deeper temporal characteristics of the signal, which limits both classification performance and generalization.
[0026] 2. Deep learning methods capture intra-segment waveform temporal features by using neural networks to directly encode complex deep representations of the target segment, for example convolutional neural networks (CNNs) to extract local waveform features, gated recurrent unit (GRU) networks to extract temporal features, and Transformer architectures to extract waveform context features. Typical research includes Supratak et al.'s "DeepSleepNet: A model for automatic sleep stage scoring based on raw single-channel EEG," Phan et al.'s "SeqSleepNet: End-to-end hierarchical recurrent neural network for sequence-to-sequence automatic sleep staging," and "SleepTransformer: Automatic sleep staging with interpretability and uncertainty quantification." These methods typically use only a single-level architecture to capture waveform temporal information, so the model attends to only one type of waveform feature pattern and does not fully account for the temporal characteristics of the signal waveform within a segment.
[0027] 3. Deep learning methods encode inter-segment temporal transition features, commonly with temporal convolutional networks (TCNs), long short-term memory (LSTM) networks, and Transformer architectures. Typical research includes Khalili et al.'s "Automatic sleep stage classification using temporal convolutional neural network and new data augmentation technique from raw single-channel EEG," Dong et al.'s "Mixed neural network approach for temporal sleep stage classification," and Zhang et al.'s "CTCNet: A CNN Transformer capsule network for sleep stage classification." These methods likewise use only a single-level architecture to encode temporal transition features, so the model attends to only one type of transition pattern and does not fully account for the temporal characteristics of sleep stage transitions between segments.
[0028] 4. Regarding inter-channel correlation information, some researchers use only a single-channel physiological signal as model input to reduce model complexity and improve efficiency, so the model cannot take this necessary information into account. A typical example is "SleepContextNet: A temporal context network for automatic sleep staging based single-channel EEG" by Zhao et al.
[0029] Other models that use multi-channel signals as input typically employ simple feature concatenation or attention mechanisms to fuse multi-channel features. Typical research includes "A deep learning architecture for temporal sleep stage classification using multivariate and multimodal time series" by Chambon et al., and "A hybrid self-attention deep learning framework for multivariate sleep stage classification" by Yuan et al. While these methods consider the differences in information content between different signal channels, they primarily rely on grid data representation to fuse channel correlation information, making it difficult to capture the topological information of interconnected signal channels.
[0030] In summary, current automatic sleep staging methods cannot fully utilize the temporal and channel information of PSG, and the classification performance of the model, such as accuracy and efficiency, needs further improvement.
[0031] To address at least one of the technical problems described above, this disclosure provides a sleep staging method and apparatus and a computer-readable storage medium.
[0032] Referring to Figure 1, which is a flowchart of an embodiment of the sleep staging method provided in this disclosure, the method includes the following steps.
Step S101: Obtain the polysomnography segment sequence to be staged, the sequence including multiple temporally consecutive polysomnography segments.
[0033] A polysomnogram (PSG) is typically a full-night recording, e.g., 6-8 hours. A polysomnography segment is a short fragment extracted from the PSG recording. For example, if the segment length is 30 seconds, a 6-8 hour PSG recording is divided into multiple 30-second segments. A chronologically ordered sequence of multiple (e.g., L) consecutive segments is called a polysomnography segment sequence.
[0034] For example, the i-th polysomnography (PSG) segment in the sequence (hereinafter simply "PSG segment") can be represented as a C × N matrix, and the polysomnography segment sequence to be staged as an ordered set of L such matrices. Here C is the number of PSG channels (for example, C = 3 when the PSG includes EEG, EOG, and ECG channels); N is the number of data points per channel within a PSG segment, N = fs × 30, where fs is the signal sampling rate; and L is the length of the polysomnography segment sequence.
[0035] Step S102: Input the polysomnography segment sequence into a pre-trained sleep staging model, which is configured to perform sleep staging on the input sequence and output the staging result.
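A worked illustration of the sizes in [0033] and [0034], assuming a sampling rate of 100 Hz and an 8-hour recording (both values chosen only for illustration):

```latex
\begin{aligned}
N &= f_s \times 30 = 100 \times 30 = 3000 \ \text{samples per channel per segment},\\
\text{segments per night} &= \frac{8 \times 3600}{30} = 960.
\end{aligned}
```

Under these assumptions, a sequence of L segments with C = 3 channels is an L × 3 × 3000 array.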
[0036] Optionally, as shown in Figure 2, the sleep staging model 100 includes a first-level encoding module 11, a second-level encoding module 12, and a classification output module 13. The first-level encoding module 11 extracts the intra-segment waveform temporal features of each polysomnography segment in the sequence; the intra-segment waveform temporal features of all segments constitute a waveform temporal feature sequence. The second-level encoding module 12 extracts inter-segment temporal transition features from the waveform temporal feature sequence. The classification output module 13 performs sleep staging on the target segment in the sequence according to the inter-segment temporal transition features and outputs the sleep staging result. The first-level encoding module 11 and/or the second-level encoding module 12 adopt an encoding structure combining Transformer encoding and a bidirectional gated recurrent unit (BiGRU).
[0037] In this embodiment, Transformer encoding refers to encoding performed by a Transformer encoder, an encoding architecture that uses a self-attention mechanism to capture the temporal features of a time series. The bidirectional gated recurrent unit (BiGRU) is an encoding architecture that uses gated recurrent units to capture the temporal features of a time series; it models temporal information in both the forward and reverse directions and thus expresses temporal features more comprehensively, i.e., it achieves temporal enhancement. The encoding structure combining Transformer encoding and a BiGRU refers specifically to a multi-level encoding structure in which a Transformer encoder and a BiGRU are cascaded: the PSG segment sequence or the waveform temporal feature sequence is first encoded by the Transformer encoder, and the Transformer encoder's output is then passed to the BiGRU for a second encoding.
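The BiGRU stage of the cascade can be illustrated with a small NumPy sketch. It assumes the standard GRU cell of Cho et al. (update gate, reset gate, candidate state), runs one GRU forward and an independently parameterized GRU backward over the first context feature sequence, and concatenates the two hidden-state sequences. The hidden size, initialization, and function names are illustrative assumptions; in practice a framework's built-in bidirectional GRU would normally be used.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_gru(rng, d_in, d_h):
    """Gate order along axis 0: update (z), reset (r), candidate (n)."""
    return {"W": rng.normal(0.0, 0.1, (3, d_in, d_h)),
            "U": rng.normal(0.0, 0.1, (3, d_h, d_h)),
            "b": np.zeros((3, d_h))}


def gru_pass(x, p):
    """Run one GRU over x: (T, d_in) -> hidden states (T, d_h)."""
    h = np.zeros(p["U"].shape[1])
    states = []
    for x_t in x:
        z = sigmoid(x_t @ p["W"][0] + h @ p["U"][0] + p["b"][0])        # update gate
        r = sigmoid(x_t @ p["W"][1] + h @ p["U"][1] + p["b"][1])        # reset gate
        n = np.tanh(x_t @ p["W"][2] + (r * h) @ p["U"][2] + p["b"][2])  # candidate state
        h = (1.0 - z) * n + z * h                                       # blend old and new
        states.append(h)
    return np.stack(states)


def bigru(x, p_fwd, p_bwd):
    """Concatenate forward states with time-realigned backward states: (T, 2 * d_h)."""
    forward = gru_pass(x, p_fwd)
    backward = gru_pass(x[::-1], p_bwd)[::-1]
    return np.concatenate([forward, backward], axis=-1)


rng = np.random.default_rng(0)
first_context = rng.normal(size=(29, 128))  # e.g. Transformer output, T = 29 frames
p_fwd, p_bwd = init_gru(rng, 128, 64), init_gru(rng, 128, 64)
second_context = bigru(first_context, p_fwd, p_bwd)
```

Each output row combines a summary of everything before that time step (forward half) with everything after it (backward half), which is the temporal enhancement described in [0037].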
[0038] Compared with the related art, in the sleep staging method of this disclosure, the sleep staging model encodes intra-segment waveform temporal features with a first-level encoding module and inter-segment temporal transition features with a second-level encoding module, and at least one of these modules uses a multi-level encoding structure combining Transformer encoding and a BiGRU. When encoding either kind of feature, the self-attention mechanism of the Transformer encoder captures contextual features of the input data, and the BiGRU then temporally enhances these contextual features. Compared with a single-level encoding structure, this combined structure captures the intra-segment waveform temporal features and inter-segment temporal transition features of polysomnography segments more comprehensively, better matches the clinical analysis process, and improves the accuracy of automatic sleep staging.
[0039] In some embodiments, the first-level encoding module 11 uses the encoding structure combining Transformer encoding and a BiGRU and is configured to: for each channel of each polysomnography segment in the sequence, apply a time-frequency transform to the single-channel signal to generate a two-dimensional time-frequency spectrum; input the spectrum into the Transformer encoder to obtain the first context feature sequence corresponding to the single-channel signal; input the first context feature sequence into the BiGRU to obtain a temporally enhanced second context feature sequence; and fuse the second context feature sequence through an attention mechanism to obtain the single-channel waveform temporal feature of the single-channel signal. The single-channel waveform temporal features of all channels of a polysomnography segment constitute its intra-segment waveform temporal features, and the intra-segment waveform temporal features of all segments constitute the waveform temporal feature sequence of the polysomnography segment sequence.
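The attention-based fusion at the end of [0039] collapses the second context feature sequence (T time steps) into one vector per channel. The disclosure does not specify the attention form; the sketch below assumes simple additive attention scoring with hypothetical parameters `w` and `v`.

```python
import numpy as np


def attention_pool(seq: np.ndarray, w: np.ndarray, v: np.ndarray):
    """seq: (T, d) BiGRU outputs; w: (d, d_a); v: (d_a,) -> pooled (d,), weights (T,)."""
    scores = np.tanh(seq @ w) @ v   # one relevance score per time step
    scores = scores - scores.max()  # numerical stability before exponentiating
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights @ seq, weights   # attention-weighted sum over time steps


rng = np.random.default_rng(0)
second_context = rng.normal(size=(29, 128))
pooled, weights = attention_pool(second_context, rng.normal(size=(128, 32)), rng.normal(size=32))
```

If all scores are equal, the pooling reduces to a plain average over time; learned scores let the module emphasize frames that contain stage-relevant waveforms such as spindles or K-complexes.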
[0040] In a specific example, for a polysomnography segment sequence to be staged, the first-level encoding module 11 applies a time-frequency transform to the one-dimensional signal (i.e., the single-channel signal) of each channel of each PSG segment to obtain a two-dimensional time-frequency spectrum. Taking the single-channel signal of the i-th PSG segment and the j-th channel as an example, the encoding process of the first-level encoding module 11, shown in Figure 3, includes steps S201 to S205.
[0041] Step S201: Apply a short-time Fourier transform (STFT) to the single-channel signal to obtain a two-dimensional time-frequency spectrum of size T × F, where T is the number of frequency column vectors (the number of time indices) and F is the number of frequency bands.
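Step S201 can be sketched with NumPy alone. The sketch assumes a 2-second Hann window with a 1-second hop and log-power magnitudes; the disclosure does not state the window, hop, or scaling, so these parameters and the function name are assumptions. The result has T rows (time frames) and F columns (frequency bands).

```python
import numpy as np


def stft_spectrogram(signal: np.ndarray, fs: int, win_s: float = 2.0, hop_s: float = 1.0):
    """Single-channel signal (N,) -> log-power time-frequency spectrum (T, F)."""
    win, hop = int(win_s * fs), int(hop_s * fs)
    window = np.hanning(win)
    num_frames = 1 + (len(signal) - win) // hop
    frames = np.stack([signal[t * hop : t * hop + win] * window for t in range(num_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2  # F = win // 2 + 1 frequency bins
    return np.log(power + 1e-10)                       # small offset avoids log(0)


fs = 100                                                     # illustrative sampling rate
t = np.arange(30 * fs) / fs                                  # one 30-second segment, N = 3000
spectrum = stft_spectrogram(np.sin(2 * np.pi * 10 * t), fs)  # 10 Hz test tone
```

With these settings a 30-second segment yields T = 29 frames and F = 101 bins at 0.5 Hz resolution, so the 10 Hz tone peaks in bin 20.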
[0042] In some embodiments, after the two-dimensional time-frequency spectrum is obtained, it may be further filtered to obtain a filtered two-dimensional time-frequency spectrum. For example, learnable band-pass filtering is applied through a linear mapping that maps the F frequency bands to m bands; the learnable linear-mapping weight matrix therefore has dimension F × m, and the filtered T × m spectrum is the product of the original spectrum and this weight matrix. The number of mapped bands m can be set to 128, which controls the feature dimension while preserving frequency resolution, balancing the model's expressive power against computational efficiency.
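The learnable band-pass filtering in [0042] amounts to right-multiplying the T × F spectrum by a trainable F × m matrix. The sketch below shows only the forward mapping with m = 128; the class name and random initialization are assumptions, and during training the matrix would be updated by backpropagation together with the rest of the model.

```python
import numpy as np


class LearnableBandFilter:
    """Linear map from F frequency bins to m learned bands (forward pass only)."""

    def __init__(self, num_bins: int, num_bands: int = 128, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Trainable weight matrix of shape (F, m), scaled to keep output variance moderate.
        self.weight = rng.normal(0.0, 1.0 / np.sqrt(num_bins), size=(num_bins, num_bands))

    def __call__(self, spectrum: np.ndarray) -> np.ndarray:
        return spectrum @ self.weight  # (T, F) @ (F, m) -> (T, m)


band_filter = LearnableBandFilter(num_bins=101)
filtered = band_filter(np.ones((29, 101)))
```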
[0043] Step S202: Input the two-dimensional time spectrum into the converter encoder for encoding to obtain the first context feature sequence.
[0044] When step S201 also examines the two-dimensional time spectrum... After filtering, specifically, the filtered two-dimensional time-frequency spectrum is... The input is given to the converter encoder, and in this embodiment of the disclosure, the filtered two-dimensional time spectrum is used. The following description uses the input to the converter encoder as an example.
[0045] In this embodiment of the disclosure, each filtered two-dimensional time-frequency spectrum is treated as a frequency-feature time series composed of consecutive column vectors, each containing the energies of the different frequency bands; the sequence length is T. The Transformer encoder learns the waveform context features of this frequency-feature time series.
[0046] Optionally, the network architecture of the Transformer encoder is shown in Figure 4(a); its basic components are a multi-head attention module, a feed-forward network, layer normalization modules, and residual connections. In some embodiments, the specific structure of the multi-head attention module is shown in Figure 4(b); its function is to capture the contextual dependencies of the input time series in different representation subspaces.
[0047] Specifically, the multi-head attention module first maps the input time series into H different representation subspaces through H parallel linear layers, with each linear mapping corresponding to one attention head. The calculation for the u-th attention head is as follows:
[0048] where the input time series has length l and each time step has feature dimension d; the query matrix, key matrix, and value matrix of the u-th attention head are obtained by applying the three learnable weight parameter matrices of that head to the input time series.
[0049] Then, for each attention head, the query matrix, key matrix, and value matrix of that head are input into a scaled dot-product attention layer. Its network structure is shown in Figure 4(c) and consists of matrix multiplication, scaling, masking, and softmax operations, with masking being optional. The scaled dot-product attention layer computes attention weights from the query and key matrices and uses these weights to form a weighted sum of the value matrix, yielding the output features of the attention head. For the u-th attention head, the calculation is expressed as follows:
[0050] where the result is the output feature of the u-th attention head, and Attention(·) denotes the scaled dot-product attention function computed by the scaled dot-product attention layer. Next, the output features of the H attention heads are concatenated, which can be represented as:
[0051] where the result is the output feature vector of the multi-head attention module, i.e., its final output; Concat(·) denotes feature concatenation; and the concatenated features are multiplied by a learnable weight parameter matrix.
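The three steps of Equations 1-3 (per-head projection, scaled dot-product attention, concatenation followed by an output projection) can be sketched in NumPy as follows. This follows the standard Transformer formulation; the weight names, the initialisation scaling, and the sizes (l = 29, d = 128, H = 4) are illustrative assumptions rather than details from the disclosure.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V, mask=None):
    """Attention weights come from Q and K; they weight the rows of V."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])   # matrix multiplication + scaling
    if mask is not None:                      # optional masking step (True = keep)
        scores = np.where(mask, scores, -1e9)
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ V, weights

def multi_head_attention(Z, Wq, Wk, Wv, Wo):
    """Z: (l, d); Wq/Wk/Wv: H matrices of shape (d, d_k); Wo: (H * d_k, d)."""
    heads = [scaled_dot_product_attention(Z @ q, Z @ k, Z @ v)[0]
             for q, k, v in zip(Wq, Wk, Wv)]
    return np.concatenate(heads, axis=-1) @ Wo  # Concat(head_1..head_H), then project

rng = np.random.default_rng(0)
l, d, H = 29, 128, 4
d_k = d // H

def random_proj():
    return [rng.standard_normal((d, d_k)) / np.sqrt(d) for _ in range(H)]

Z = rng.standard_normal((l, d))
Wq, Wk, Wv = random_proj(), random_proj(), random_proj()
Wo = rng.standard_normal((H * d_k, d)) / np.sqrt(d)
out = multi_head_attention(Z, Wq, Wk, Wv, Wo)
print(out.shape)  # (29, 128)
```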
[0052] Optionally, the feed-forward network mainly consists of two fully connected (FC) layers and a ReLU activation function, and applies a nonlinear transformation to the final output of the multi-head attention module to enhance the model's expressive power; the residual connections and layer normalization keep training stable and help the network converge quickly. The overall computation of a complete Transformer encoding layer can be represented as:
[0053]
[0054]
[0055]
[0056] where the multi-head attention computation of (Equation 1)-(Equation 3) is written in simplified form; a layer normalization operation produces the normalized feature vector sequence; the feed-forward network module produces its output feature vector sequence using the learnable weight parameters of its fully connected layers; and the final result is the output feature vector sequence of the complete Transformer encoding layer.
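For orientation, the standard post-layer-normalisation encoder layer that matches the components listed above (multi-head attention, residual connections, layer normalisation, and a two-layer ReLU feed-forward network) is written out below. The symbols are chosen here for illustration and may not match the notation of Equations 4-7; the original may also place the normalisation differently.

```latex
\begin{aligned}
A &= \mathrm{MHA}(Z) \\
\tilde{Z} &= \mathrm{LN}\left(Z + A\right) \\
Y &= \mathrm{ReLU}\left(\tilde{Z} W_1 + b_1\right) W_2 + b_2 \\
\hat{Z} &= \mathrm{LN}\left(\tilde{Z} + Y\right)
\end{aligned}
```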
[0057] It should be noted that the self-attention mechanism of the Transformer encoder does not by itself encode positional information, so positional encoding (PE) is needed to represent the temporal order of the input time series.
[0058] For example, for the Transformer encoder in the first-level encoding module 11, the input time series Z is the filtered two-dimensional time-frequency spectrum. Adding positional encoding to the spectrum can be represented as:
[0059] where the result is the two-dimensional time-frequency spectrum with positional information added, and PE is the positional encoding matrix, whose elements are calculated with sine and cosine functions as follows:
[0060]
[0061] where the first expression gives the element in the f-th row and the corresponding column of the PE matrix, and the second gives the element in the f-th row and the next (+1) column.
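A sketch of a sinusoidal PE matrix follows. It assumes the standard Transformer formulation (positions along the time axis as rows, sine on even and cosine on odd feature columns, base 10000); the disclosure's own indexing in Equations 9-10 (the f-th row and paired columns) may lay the matrix out differently, and `sinusoidal_pe` is a hypothetical helper.

```python
import numpy as np

def sinusoidal_pe(T, m, base=10000.0):
    """(T, m) positional encoding: row = time position, columns alternate sin/cos."""
    pos = np.arange(T)[:, None]                # time positions 0..T-1
    two_k = np.arange(0, m, 2)[None, :]        # even feature indices 2k
    angle = pos / base ** (two_k / m)
    pe = np.zeros((T, m))
    pe[:, 0::2] = np.sin(angle)                # column 2k
    pe[:, 1::2] = np.cos(angle)                # column 2k + 1
    return pe

pe = sinusoidal_pe(29, 128)
print(pe.shape)  # (29, 128)
```

Usage: adding `pe` element-wise to a (29, 128) filtered spectrum gives the position-encoded input described above.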
[0062] Next, the position-encoded two-dimensional time-frequency spectrum is split into a time series along the time axis and input into the stacked Transformer encoder, which learns waveform context features. The calculation is represented as follows:
[0063] where the result is the context feature sequence output by the last (N-th) Transformer encoder layer; the encoding process of Equations 4 to 7 is written in simplified form; and the N Transformer encoder layers are applied in a stacked cascade.
[0064] For the first-level encoding module 11, the input data is the PSG segment sequence, and the output of the Transformer encoder in step S202 is...; to distinguish it from other outputs, it is referred to as the first context feature sequence.
[0065] The Transformer encoder can use a single layer or a multi-layer stacked structure, with every layer having the same structure (see the embodiment shown in Figure 4). In one specific embodiment, the Transformer encoder uses a 2-layer stacked structure (i.e., N = 2), each layer has 4 attention heads (i.e., H = 4), and the hidden layer dimension of the feed-forward network is set to 128. This configuration balances model complexity and computational efficiency: stacking two layers lets the model extract deeper feature representations, the 4 attention heads capture contextual dependencies in different representation subspaces, and the 128-dimensional hidden layer keeps the number of parameters in check while retaining sufficient expressive power, which helps avoid overfitting.
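A rough parameter count per encoder layer makes the efficiency argument concrete. The count assumes a model width of m = 128, a bias in every linear layer, and a gain and shift vector in each of the two LayerNorms; the disclosure does not state these details, and `encoder_layer_params` is a hypothetical helper.

```python
def encoder_layer_params(d=128, ffn_hidden=128):
    """Parameters of one post-LN Transformer encoder layer under the stated assumptions."""
    attn = 4 * (d * d + d)                                    # Q, K, V and output projections
    ffn = (d * ffn_hidden + ffn_hidden) + (ffn_hidden * d + d)  # two FC layers
    norms = 2 * (2 * d)                                       # two LayerNorms: gain + shift
    return attn + ffn + norms

per_layer = encoder_layer_params()
print(per_layer, 2 * per_layer)  # 99584 199168
```

Under these assumptions the head count H does not change the total: the 4 heads split the 128-dimensional projections into 32-dimensional slices rather than adding new weights.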
[0066] Step S203: Input the first context feature sequence into the bidirectional gated recurrent unit for encoding to obtain the temporally enhanced second context feature sequence.
[0067] The bidirectional gated recurrent unit (BiGRU) can be a single-layer architecture or a multi-layer stacked cascade. This embodiment takes a multi-layer (e.g., N-layer) stacked BiGRU as an example. The process in which the BiGRU encodes the first context feature sequence can be represented as follows:
[0068] where the hidden vector of each GRU unit has feature dimension m / 2, and the output is the temporally enhanced feature sequence produced by the last BiGRU layer, also called the second context feature sequence.
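A plain NumPy sketch of a stacked BiGRU is given below. It assumes the original GRU gate equations without bias terms (frameworks such as PyTorch use a slightly different candidate-state formula and add biases). Each direction has m / 2 = 64 hidden units, so concatenating the two directions restores width m = 128. `init_gru`, `gru_pass`, and `bigru` are hypothetical helpers, and the weights are random rather than trained.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def init_gru(d_in, d_h, rng):
    """Random weights for the reset (r), update (z) and candidate (n) gates; biases omitted."""
    return {g: rng.standard_normal((d_in + d_h, d_h)) * 0.1 for g in "rzn"}

def gru_pass(X, p):
    """Run one GRU direction over X of shape (T, d_in); return hidden states (T, d_h)."""
    h = np.zeros(p["r"].shape[1])
    states = []
    for x in X:
        xh = np.concatenate([x, h])
        r = sigmoid(xh @ p["r"])                              # reset gate
        z = sigmoid(xh @ p["z"])                              # update gate
        n = np.tanh(np.concatenate([x, r * h]) @ p["n"])      # candidate state
        h = (1.0 - z) * n + z * h                             # blend old and candidate state
        states.append(h)
    return np.stack(states)

def bigru(X, p_fwd, p_bwd):
    fwd = gru_pass(X, p_fwd)                    # forward in time
    bwd = gru_pass(X[::-1], p_bwd)[::-1]        # backward in time, re-aligned to forward order
    return np.concatenate([fwd, bwd], axis=-1)  # (T, 2 * d_h)

rng = np.random.default_rng(0)
T, m = 29, 128
seq = rng.standard_normal((T, m))               # stand-in first context feature sequence
for _ in range(2):                              # N = 2 stacked layers, m / 2 units per direction
    seq = bigru(seq, init_gru(m, m // 2, rng), init_gru(m, m // 2, rng))
print(seq.shape)  # (29, 128)
```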
[0069] Step S204: Fuse the second context feature sequence to obtain the single-channel waveform temporal features of the single-channel signal. This process can be represented as follows:
[0070]
[0071] where the result is the waveform temporal feature learned from the filtered two-dimensional time-frequency spectrum, i.e., the single-channel waveform temporal feature corresponding to the single-channel signal of the i-th PSG segment and the j-th channel; each temporally enhanced feature in the second context feature sequence is assigned a corresponding attention weight score; a Sigmoid activation function is applied together with two learnable weight parameters; and exp denotes the exponential function with base e. The exp function converts real-valued inputs into positive numbers, and normalizing these yields attention weight scores in the form of a probability distribution.
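One reading of the fusion in step S204 (Equations 13-14, per [0107]) is sketched below: a Sigmoid-activated linear score for each time step, exp-normalised into weights that sum to one, followed by a weighted sum over the sequence. The exact score form (a single weight vector `w` and a scalar bias `b`) is an assumption, and `attention_fuse` is a hypothetical helper.

```python
import numpy as np

def attention_fuse(Hs, w, b):
    """Hs: (T, m) temporally enhanced features; w: (m,), b: scalar (assumed shapes)."""
    scores = 1.0 / (1.0 + np.exp(-(Hs @ w + b)))   # Sigmoid score per time step
    alpha = np.exp(scores) / np.exp(scores).sum()  # exp + normalisation -> distribution
    return alpha @ Hs, alpha                       # weighted sum over time -> (m,)

rng = np.random.default_rng(0)
Hs = rng.standard_normal((29, 128))                # stand-in second context feature sequence
fused, alpha = attention_fuse(Hs, rng.standard_normal(128) * 0.1, 0.0)
print(fused.shape)  # (128,)
```

Because the Sigmoid bounds every score in (0, 1), the ratio between any two weights under this reading stays below e (about 2.72), so no single time step can dominate the fused vector.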
[0072] In one specific embodiment, the BiGRU uses a 2-layer stacked structure (i.e., N = 2), and the hidden vector of each GRU unit has dimension m / 2 (i.e., 64). The 2-layer stacked BiGRU captures higher-level temporal dependencies in both the forward and backward directions, further enhancing the representation of temporal information.
[0073] After steps S201 to S204, the single-channel waveform temporal features of every channel of every PSG segment are obtained. For example, a single-channel signal... is encoded by the first-level encoding module 11 to obtain the corresponding single-channel waveform temporal feature.
[0074] Step S205: Construct the intra-segment waveform temporal features of each PSG segment from the single-channel waveform temporal features of its channels; then construct the waveform temporal feature sequence of the PSG segment sequence from the intra-segment waveform temporal features of all PSG segments.
[0075] For example, for the i-th PSG segment, its intra-segment waveform temporal features are obtained from the single-channel waveform temporal features corresponding to the single-channel signal of each of its channels. Furthermore, for a PSG segment sequence, the intra-segment waveform temporal features of its PSG segments... together form the waveform temporal feature sequence of that PSG segment sequence.
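The bookkeeping of steps S204-S205 is plain stacking. The sketch below assumes L = 11 segments, C = 3 channels, and m = 128; `encode_channel` is a hypothetical placeholder for steps S201-S204 that simply returns a random vector.

```python
import numpy as np

L, C, m = 11, 3, 128                       # segments, channels (e.g. EEG/EOG/EMG), feature width
rng = np.random.default_rng(0)

def encode_channel(signal):
    """Hypothetical placeholder for steps S201-S204: one channel signal -> (m,) feature."""
    return rng.standard_normal(m)

psg = rng.standard_normal((L, C, 3000))    # assumed: 30-s segments sampled at 100 Hz
intra = [np.stack([encode_channel(psg[i, j]) for j in range(C)]) for i in range(L)]  # each (C, m)
waveform_seq = np.stack(intra)             # waveform temporal feature sequence, (L, C, m)
print(waveform_seq.shape)  # (11, 3, 128)
```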
[0076] In some embodiments, the second-level encoding module 12 uses an encoding structure combining a Transformer encoder and a bidirectional gated recurrent unit, and is configured to: add positional encoding to the waveform temporal feature sequence; input the position-encoded waveform temporal feature sequence into the Transformer encoder to obtain the third context feature sequence corresponding to the waveform temporal feature sequence; input the third context feature sequence into the bidirectional gated recurrent unit to obtain the temporally enhanced fourth context feature sequence; and fuse the fourth context feature sequence to obtain the inter-segment temporal transition features corresponding to the waveform temporal feature sequence.
[0077] In this embodiment, the second-level encoding module 12 includes a Transformer encoder and a bidirectional gated recurrent unit cascaded in sequence. Their network architectures can follow those of the corresponding components in the first-level encoding module 11; only the network parameters and the input data differ, so they are not described again here. Likewise, because the self-attention mechanism of the Transformer encoder does not itself encode positional information, positional encoding is also required for the input time series of the second-level encoding module 12.
[0078] The main difference between the second-level encoding module 12 and the first-level encoding module 11 lies in their input and output data. The input of the first-level encoding module 11 is the PSG segment sequence to be staged, and its output is the waveform temporal feature sequence, which contains the intra-segment waveform temporal features of each PSG segment in the sequence and reflects the stage-related waveform information within each segment. The input of the second-level encoding module 12 is the output of the first-level encoding module 11, i.e., the waveform temporal feature sequence described above, and its output is the inter-segment temporal transition features, which reflect the sleep stage transition patterns across multiple consecutive PSG segments. In addition, unlike the first-level encoding module 11, the second-level encoding module 12 does not perform a time-frequency transformation on its input data.
[0079] In some embodiments, as shown in Figure 2, the sleep staging model 100 further includes an inter-channel encoding module 14, which performs inter-channel topological correlation encoding on the waveform temporal feature sequence output by the first-level encoding module 11 to obtain a channel topological correlation feature sequence, and outputs that sequence to the second-level encoding module 12. The second-level encoding module 12 then extracts the inter-segment temporal transition features from the channel topological correlation feature sequence.
[0080] In this embodiment of the present disclosure, an inter-channel encoding module 14 is provided between the first-level encoding module 11 and the second-level encoding module 12. The output data of the first-level encoding module 11 is encoded by the inter-channel encoding module 14 and then input to the second-level encoding module 12.
[0081] Optionally, as shown in Figure 5, the inter-channel encoding module 14 performs inter-channel topological correlation encoding on the waveform temporal feature sequence output by the first-level encoding module 11 through steps S301 to S304.
[0082] Step S301: For each PSG segment in the PSG segment sequence, take the intra-segment waveform temporal features of that segment as a graph node matrix, and construct graph-structured data from the graph node matrix and a preset adjacency matrix. The graph node matrix contains multiple graph nodes, each corresponding to the single-channel waveform temporal feature of one channel of the segment; the preset adjacency matrix represents the topological connections between channels.
[0083] Taking the i-th PSG segment as an example, in a specific implementation the intra-segment waveform temporal features of the segment are... used as the graph node matrix, where C is the number of channels of the polysomnogram and hence the number of graph nodes in the graph node matrix. In other words, the C graph nodes correspond one-to-one to the single-channel waveform temporal features, each node representing one single-channel waveform temporal feature.
[0084] For the i-th PSG segment, graph-structured data is constructed from the graph node matrix and the preset adjacency matrix, where the preset adjacency matrix reflects the topological connections between the channels.
[0085] Step S302: Use a graph convolutional neural network (GCN) to encode the inter-channel topological correlation of the graph structure data to obtain intermediate graph structure data.
[0086] A graph convolutional network (GCN) is a deep learning model designed specifically for graph-structured data; it learns features directly on graph data in non-Euclidean space.
[0087] In practice, a graph convolutional neural network may include one or more cascaded and stacked graph convolutional layers. For any graph convolutional layer, its encoding process can be represented as follows:
[0088] where the input data contains the C graph nodes; the preset adjacency matrix is augmented with self-loops to form the self-loop adjacency matrix; the corresponding degree matrix is diagonal, with each main-diagonal element equal to the sum of the corresponding row of the self-loop adjacency matrix, the channel indices ranging over [1, C]; the filtering and convolution parameter matrix contains the parameters of the graph convolution kernels; and the result is the output feature matrix of the graph convolutional layer.
[0089] When a graph convolutional neural network includes two graph convolutional layers, the process of encoding graph-structured data by the graph convolutional neural network can be represented as follows:
[0090] where the graph convolutional layer encoding of Equation 15 is written in simplified form; the input is the graph-structured data of the i-th PSG segment; and the output is the high-dimensional graph convolutional feature learned by the GCN, i.e., the intermediate graph-structured data.
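A NumPy sketch of Equations 15-16 follows. It assumes the common Kipf-Welling form (self-loops added, then symmetric degree normalisation) and a ReLU activation, which the text does not name. The chain-shaped adjacency, the output width of 64, and `gcn_layer` are illustrative choices only.

```python
import numpy as np

def gcn_layer(X, A, W, relu=True):
    """One graph convolution: D^-1/2 (A + I) D^-1/2 X W, with D the degree matrix of A + I."""
    A_self = A + np.eye(A.shape[0])             # adjacency with self-loops
    deg = A_self.sum(axis=1)                    # main diagonal of the degree matrix
    D_inv_sqrt = np.diag(deg ** -0.5)
    out = D_inv_sqrt @ A_self @ D_inv_sqrt @ X @ W
    return np.maximum(out, 0.0) if relu else out

rng = np.random.default_rng(0)
C, m = 3, 128
X = rng.standard_normal((C, m))                 # graph node matrix: one row per channel
A = np.array([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])  # illustrative chain topology
W1 = rng.standard_normal((m, 64)) / np.sqrt(m)
W2 = rng.standard_normal((64, 64)) / np.sqrt(64)
G = gcn_layer(gcn_layer(X, A, W1), A, W2)       # two stacked layers, as in [0089]
print(G.shape)  # (3, 64)
```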
[0091] Step S303: Use a pooling layer to fuse the graph nodes of the intermediate graph-structured data to obtain the channel topological correlation feature of the PSG segment.
[0092] Optionally, the pooling layer can be a max pooling layer, an average pooling layer, or the like. For example, if an average pooling layer is used, step S303 can be represented as follows:
[0093] where the average pooling layer averages over the graph nodes at the level of graph convolutional features, and the result is the channel topological correlation feature corresponding to the i-th PSG segment.
[0094] Step S304: The channel topological correlation features of all PSG segments in the PSG segment sequence form the channel topological correlation feature sequence.
[0095] The channel topological correlation feature sequence R is obtained by arranging the channel topological correlation features of the PSG segments in order.
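When the per-segment GCN outputs are stored as an (L, C, k) array, average pooling over graph nodes and the ordering into R reduce to a single reduction over the channel axis. That array layout and the widths below (L = 11, C = 3, k = 64) are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
L, C, k = 11, 3, 64
G_seq = rng.standard_normal((L, C, k))  # stand-in GCN outputs: one (C, k) node matrix per segment

R = G_seq.mean(axis=1)                  # average pooling over the C graph nodes -> (L, k)
R_max = G_seq.max(axis=1)               # max-pooling alternative mentioned in [0092]
print(R.shape)  # (11, 64)
```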
[0096] In one possible implementation, the preset adjacency matrix is a fully connected adjacency matrix that contains no node self-connections, and each of its elements connecting two different channels takes a first value.
[0097] For the i-th PSG segment, the preset adjacency matrix is defined element-wise (Equation 18): the element in row p and column q is 0 when p = q, meaning that the preset adjacency matrix contains no node self-connections, and equals the first value, 1, when p ≠ q.
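The preset adjacency matrix of Equation 18 is one line in NumPy, and the same function serves any channel count. The second half of the sketch applies the self-loop normalisation assumed for the GCN layer (A + I followed by symmetric degree normalisation) to see what propagation matrix results; `preset_adjacency` is a hypothetical helper.

```python
import numpy as np

def preset_adjacency(C):
    """Fully connected adjacency without self-connections: 0 on the diagonal, 1 elsewhere."""
    return np.ones((C, C)) - np.eye(C)

A = preset_adjacency(3)                   # e.g. EEG + EOG + ECG
print(A)
# [[0. 1. 1.]
#  [1. 0. 1.]
#  [1. 1. 0.]]

A_self = A + np.eye(3)                    # self-loop form assumed for the GCN layer
D_inv_sqrt = np.diag(A_self.sum(axis=1) ** -0.5)
P = D_inv_sqrt @ A_self @ D_inv_sqrt      # normalised propagation matrix
```

If the self-loop step in [0088] is the usual A + I, this preset matrix turns A + I into the all-ones matrix, so every entry of P equals 1/C and each node aggregates the plain mean of all channels, its own included.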
[0098] During the pre-training phase of the sleep staging model 100, the PSG samples in the training dataset may involve different numbers of channels and signal modalities. For example, some PSG samples include two channels, electroencephalography (EEG) and electrooculography (EOG); some include three channels, EEG, EOG, and electrocardiography (ECG); and some include two channels, EEG and electromyography (EMG). Because of this variation there is no universal, prior channel topology, which makes modeling methods based on a fixed topology difficult to apply.
[0099] To address this issue, the embodiments of this disclosure do not rely on any prior knowledge. Instead, the preset adjacency matrix is directly set to a fully connected adjacency matrix that does not contain node self-connections. This setting has two advantages: First, regardless of the number of channels in the training dataset or the signal modes involved, the preset adjacency matrix can be directly applied without redesigning the topology for different training samples. Second, the fully connected adjacency matrix can robustly represent the topological connection relationship between any two different channels while eliminating the influence of node self-connections. This allows the graph convolutional neural network to focus more on learning the mutual influence between channels and adaptively learn the actual correlation strength between channels during training, resulting in strong robustness.
[0100] When the sleep staging model 100 includes the inter-channel encoding module 14 described above, the input data of the second-level encoding module 12 is the channel topological correlation feature sequence R, and the processing of the second-level encoding module 12 includes the following steps. Step A1: Add positional encoding to the channel topological correlation feature sequence, which can be represented as follows:
[0101] where the result is the channel topological correlation feature sequence with positional information added, and the PE matrix is calculated as in Equations 9-10.
[0102] Step A2: Input the position-encoded channel topological correlation feature sequence into the Transformer encoder to obtain the third context feature sequence corresponding to the channel topological correlation feature sequence, which can be represented as follows:
[0103] where the result is the context feature sequence output by the last Transformer encoder layer, also called the third context feature sequence.
[0104] Step A3: Input the third context feature sequence into the bidirectional gated recurrent unit to obtain the temporally enhanced fourth context feature sequence, which can be represented as follows:
[0105] where the result is the temporally enhanced feature sequence output by the last BiGRU layer, also called the fourth context feature sequence.
[0106] Step A4: Fuse the fourth context feature sequence to obtain the inter-segment temporal transition features corresponding to the waveform temporal feature sequence, which is represented as follows:
[0107] where the attention-based feature fusion of (Equation 13)-(Equation 14) is applied to the fourth context feature sequence, and the result is the learned inter-segment temporal transition feature.
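Steps A1-A4 compose into the chain below. The symbols are illustrative rather than the disclosure's own notation; PE, the N-layer Transformer encoder stack, the N-layer BiGRU stack, and the attention fusion are the same operations defined for the first-level encoding module.

```latex
\begin{aligned}
R' &= R + \mathrm{PE} && \text{(A1: positional encoding)}\\
U &= \mathrm{TransformerEnc}^{(N)}(R') && \text{(A2: third context feature sequence)}\\
V &= \mathrm{BiGRU}^{(N)}(U) && \text{(A3: fourth context feature sequence)}\\
F &= \mathrm{AttnFuse}(V) && \text{(A4: inter-segment temporal transition feature)}
\end{aligned}
```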
[0108] In this embodiment, the sleep staging model 100 includes a first-level encoding module 11, an inter-channel encoding module 14, a second-level encoding module 12, and a classification output module 13 connected in sequence. This configuration can significantly improve the performance of automatic sleep staging in the following respects. First, hierarchical feature extraction better matches the clinical analysis process. Through the four sequentially connected modules, this embodiment extracts features hierarchically, from the raw signal to the final sleep staging result, in a way that closely mirrors how clinicians actually analyze recordings. Specifically, the first-level encoding module 11 extracts intra-segment waveform temporal features, equivalent to a doctor analyzing the specific waveforms within a single 30-second PSG segment, such as the position, proportion, and duration of alpha rhythm waves, theta rhythm waves, K-complexes, and sleep spindles; the inter-channel encoding module 14 fuses inter-channel topological correlation information, equivalent to a doctor jointly judging the coordinated changes of different signal channels (such as EEG, EOG, and EMG) within the same sleep stage, for example the simultaneous appearance of low-amplitude mixed-frequency waves in the EEG channel and rapid eye movements in the EOG channel during REM sleep; the second-level encoding module 12 extracts inter-segment temporal transition features, equivalent to a doctor analyzing stage transitions across multiple consecutive segments, such as long runs of the same stage or brief transitions between different stages; and the classification output module 13 outputs the final sleep staging result based on these multi-level features.
This hierarchical feature extraction makes the model's decision process interpretable, because the function of each module maps to a specific step of clinical analysis, and thus aligns better with the clinical staging and annotation process.
[0109] Second, multi-level temporal encoding fully captures the temporal information of PSG segments. In the embodiments of this disclosure, the first-level encoding module 11 uses a progressive structure combining a Transformer encoder and a BiGRU. First, the one-dimensional signal of each channel of a PSG segment is converted into a two-dimensional time-frequency spectrum by the short-time Fourier transform. Next, the self-attention mechanism of the Transformer encoder captures long-range dependencies in the spectrum and learns waveform context features. Finally, the BiGRU models the context feature sequence in both the forward and backward directions to strengthen the waveform temporal representation. Compared with a single-level encoding architecture, this Transformer→BiGRU progressive encoding captures the multi-dimensional temporal characteristics of the intra-segment waveform more comprehensively: it attends to the contextual correlations of the waveform in the frequency domain while also reinforcing how the waveform evolves along the time axis. The second-level encoding module 12 likewise uses a Transformer encoder plus BiGRU progressive structure to encode the sequence of intra-segment waveform temporal features, fully learning the sleep stage transition patterns across multiple consecutive PSG segments, such as long runs of the same stage or brief transitions between different stages. With this design, the model can draw on the context of the preceding and following segments when judging a target segment, avoiding misjudgments caused by analyzing a single PSG segment in isolation and significantly improving staging accuracy.
[0110] Third, graph convolutional network (GCN) encoding effectively fuses channel topology information. Specifically, in this embodiment an inter-channel encoding module 14 is placed between the first-level encoding module 11 and the second-level encoding module 12, and a GCN encodes the multi-channel waveform representation, addressing the difficulty that prior approaches have in capturing inter-channel topology information. Because the number of channels and the signal modalities of PSG recordings vary across application scenarios and no prior topological connectivity is available, this disclosure proposes a fully connected adjacency matrix without self-connections; since it does not depend on any prior knowledge, it is broadly applicable and robust. By constructing the multi-channel waveform representation of each PSG segment as graph-structured data, the GCN can perform convolution on the graph, aggregating information from adjacent nodes along graph edges. This explicitly models the topological connections between channels and extracts discriminative cross-channel association patterns, such as the coordinated changes between EEG and EOG during REM sleep. Furthermore, excluding self-connections makes the GCN's information aggregation focus solely on information transfer between different channels, preventing a node's own features from dominating the aggregation and allowing the model to concentrate on learning meaningful cross-channel associations.
[0111] Furthermore, the sleep staging model of this embodiment uses an end-to-end neural network architecture that takes PSG segment sequences as input and outputs sleep staging results, with no manual intervention required anywhere in the process. By processing an overnight PSG recording with a sliding window, the sleep staging task for a single patient can be completed in seconds, far faster than the 2 hours required for manual analysis. For large-scale datasets (for example, research projects containing hundreds of PSG records), this embodiment can complete all staging within hours, whereas manual annotation would take weeks or even months.
[0112] In some embodiments, the classification output module 13 includes a fully connected layer and a classifier connected in sequence. The fully connected (FC) layer can be a single layer or multiple layers, and the classifier can be a softmax classifier or another type of classifier.
[0113] In this case, performing sleep staging on the target segment of the PSG segment sequence based on the inter-segment temporal transition features includes: Step B1: Input the inter-segment temporal transition features into the fully connected layer for linear mapping to obtain the output features of the fully connected layer.
[0114] Step B2: Input the output features of the fully connected layer into the classifier, and use the classifier to calculate the probability distribution of the target segment belonging to each sleep stage.
[0115] In some embodiments, the target segment is the PSG segment located at the middle time point of the PSG segment sequence, i.e., the middle segment of the sequence. Continuing with the example of a PSG segment sequence consisting of L consecutive PSG segments, with L = 11 the target segment is the 6th PSG segment in the sequence.
[0116] This arrangement makes full use of contextual information and matches the clinical analysis process. In clinical sleep staging, when doctors determine the sleep stage of a 30-second PSG segment, they do not analyze it in isolation but consult the PSG segments before and after it. For example, if the waveform characteristics of a PSG segment are ambiguous but the segments before and after it are both REM sleep, that segment is likely to belong to the REM sleep stage as well. In addition, sleep stage transitions usually unfold over several consecutive PSG segments, such as a gradual transition from non-rapid eye movement (NREM) stage 1 to NREM stage 2, and such gradual changes require contextual information to be identified accurately. By taking L consecutive PSG segments as input, this embodiment lets the sleep staging model 100 see both the preceding and following context of the target segment, which better matches how clinicians actually analyze recordings. Placing the target segment in the middle of the PSG segment sequence also gives it an equal number of reference segments on each side (5 before and 5 after), so the context is symmetric and complete. This avoids losing context on one side, as would happen if the target segment were at the edge of the sequence, and effectively improves the accuracy and reliability of sleep staging.
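The sliding-window processing of [0111], combined with the centred target segment of [0115], can be sketched as index bookkeeping. Epochs are numbered from 0 here, and clamping indices at the start and end of the night is an assumption: the disclosure does not say how edge epochs are padded. `context_windows` is a hypothetical helper.

```python
def context_windows(n_epochs, L=11):
    """Epoch indices of an L-long window centred on every epoch t of a recording.

    Indices are clamped at the recording boundaries (an assumed edge policy).
    """
    half = L // 2
    return [[min(max(k, 0), n_epochs - 1) for k in range(t - half, t + half + 1)]
            for t in range(n_epochs)]

windows = context_windows(20)
print(windows[10])           # [5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
print(windows[10][11 // 2])  # 10: the target is the 6th (1-based) entry
```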
[0117] In a specific example, when the classification output module 13 includes two fully connected layers (FC) and a softmax classifier, the processing procedure of the classification output module 13 can be represented as follows:
[0118] where the weight terms are the learnable parameters of the fully connected (FC) layers, the output is the probability distribution of the target segment over the sleep stages, ReLU is the activation function, and F is the inter-segment temporal transition feature.
[0119] Step B3: Determine the sleep staging result of the target segment from the probability distribution; specifically, the sleep stage with the highest probability in the classifier output is taken as the final sleep stage.
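Steps B1-B3 with the two-FC-layer head of [0117] reduce to a few lines. The five-stage label set (W, N1, N2, N3, REM), the hidden width of 64, and the hand-set weights below are illustrative assumptions, chosen so that the result is easy to check by hand.

```python
import numpy as np

STAGES = ["W", "N1", "N2", "N3", "REM"]        # assumed 5-class label set

def classify(f, W1, b1, W2, b2):
    """Two FC layers with ReLU, softmax over stages, then argmax (steps B1-B3)."""
    h = np.maximum(f @ W1 + b1, 0.0)           # FC layer 1 + ReLU
    logits = h @ W2 + b2                       # FC layer 2: one logit per stage
    p = np.exp(logits - logits.max())          # numerically stable softmax
    p = p / p.sum()
    return p, STAGES[int(np.argmax(p))]        # most probable stage is the result

f = np.ones(128)                               # stand-in inter-segment feature F
W1, b1 = np.zeros((128, 64)), np.zeros(64)
W2, b2 = np.zeros((64, 5)), np.array([0.0, 0.0, 0.0, 0.0, 1.0])
p, stage = classify(f, W1, b1, W2, b2)
print(stage)  # REM
```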
[0120] In a specific example, as shown in Figure 6, the sleep staging model includes a first-level encoding module 11, an inter-channel encoding module 14, a second-level encoding module 12, and a classification output module 13. The input data of the model is the PSG segment sequence to be staged; taking L = 11 and time t as an example, Figure 6 shows the input data x6, x5, and x7 at times t, (t-1), and (t+1), respectively. The first-level encoding module 11 includes L first-level encoding units, each containing time-frequency transformation, positional encoding, a Transformer encoder, a BiGRU, and attention fusion. The inter-channel encoding module 14 includes L graph convolutional networks (GCNs). The second-level encoding module 12 includes positional encoding, a Transformer encoder, a BiGRU, and attention fusion. The classification output module 13 includes fully connected layers and a softmax classifier, and finally outputs the sleep staging result of the target segment.
[0121] Correspondingly, as shown in Figure 7, the sleep staging model predicts the stage of an input PSG segment sequence through the following steps: Step S401: Perform a time-frequency transformation on the single-channel signal of each channel of every PSG segment in the PSG segment sequence to generate a two-dimensional time-frequency spectrum.
[0122] Step S402: Input the two-dimensional time-frequency spectrum into the Transformer encoder of the first-level encoding module for encoding to obtain the first context feature sequence corresponding to the single-channel signal.
[0123] Step S403: Input the first context feature sequence into the BiGRU for encoding to obtain the temporally enhanced second context feature sequence.
[0124] Step S404: Fuse the second context feature sequence to obtain the single-channel waveform temporal features of the single-channel signal. The single-channel waveform temporal features of all channels of a PSG segment constitute the intra-segment waveform temporal features of that PSG segment, and the intra-segment waveform temporal features of all PSG segments constitute the waveform temporal feature sequence.
[0125] Step S405: For each PSG segment, construct graph-structured data from its intra-segment waveform temporal features in the waveform temporal feature sequence.
[0126] Step S406: Input the graph-structured data into the graph convolutional network for encoding to obtain intermediate graph-structured data.
[0127] Step S407: Use a pooling layer to fuse the graph nodes of the intermediate graph-structured data to obtain the channel topological correlation features of the PSG segment. The channel topological correlation features of all PSG segments constitute the channel topological correlation feature sequence.
[0128] Step S408: Input the channel topological correlation feature sequence into the Transformer encoder of the second-level encoding module to obtain the third context feature sequence.
[0129] Step S409: Input the third context feature sequence into the bidirectional gated recurrent unit (BiGRU) for encoding to obtain the temporally enhanced fourth context feature sequence.
[0130] Step S410: Fuse the fourth context feature sequence to obtain the inter-segment temporal transition features corresponding to the channel topological correlation feature sequence.
[0131] Step S411: Input the inter-segment temporal transition features into the classification output module to obtain the sleep stage of the target segment.
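Steps S405 to S407 can be sketched as a single graph-convolution layer over the channels of one PSG segment followed by pooling. Several details here are assumptions, not statements of the disclosure: the preset adjacency (a fully connected matrix without self-connections, as in claim 6) uses a first value of 1.0, self-loops are added inside the layer with symmetric degree normalization as in the Kipf and Welling GCN, and the pooling layer is taken to be mean pooling. The helper names are hypothetical.

```python
import math

def preset_adjacency(num_channels, first_value=1.0):
    """Fully connected channel graph without self-connections (cf. claim 6)."""
    return [[0.0 if i == j else first_value for j in range(num_channels)]
            for i in range(num_channels)]

def gcn_layer(A, H, W):
    """One GCN layer: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(A)
    A_hat = [[A[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in A_hat]
    norm = [[A_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    # Aggregate neighbour features, then project with W.
    agg = [[sum(norm[i][j] * H[j][k] for j in range(n)) for k in range(len(H[0]))]
           for i in range(n)]
    out = [[sum(row[k] * W[k][c] for k in range(len(row))) for c in range(len(W[0]))]
           for row in agg]
    return [[max(0.0, x) for x in row] for row in out]

def mean_pool(H):
    """Fuse graph nodes (channels) into one segment-level vector."""
    return [sum(node[c] for node in H) / len(H) for c in range(len(H[0]))]

# Three channels (e.g. EEG, EOG, EMG), each with a 2-dim single-channel feature.
H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W = [[1.0, 0.0], [0.0, 1.0]]  # identity projection for readability
segment_feature = mean_pool(gcn_layer(preset_adjacency(3), H, W))
```

In this toy setup every node ends up with the channel mean [2/3, 2/3], because a fully connected graph with equal weights plus self-loops averages all channels; learned weights and nonlinear features in a real model would make the nodes differ.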
[0132] In some embodiments, as shown in Figure 8, obtaining the PSG segment sequence to be staged includes the following steps: Step S501: Obtain the PSG recording to be staged, and divide it into multiple temporally consecutive PSG segments according to a preset segment length.
[0133] The PSG recording to be staged typically covers a whole night, for example 6-8 hours, and the preset segment length is 30 seconds as described above. In practical applications, the preset segment length may be set to other values according to the actual scenario, and this disclosure does not limit it.
[0134] Step S502: Using a sliding window, select a first number of consecutive PSG segments at a time from the plurality of PSG segments; each selection of the first number of PSG segments forms one PSG segment sequence to be staged. The first number is a natural number greater than 2.
[0135] For example, with a preset segment length of 30 seconds and a first number L = 11, the sliding window extracts 11 consecutive PSG segments at a time as a PSG segment sequence to be staged. Based on this sequence, the sleep staging model 100 predicts the sleep stage of the 6th PSG segment, which lies in the middle. The window then moves forward by one PSG segment and the sleep stage of the next PSG segment is predicted, until the whole overnight recording has been traversed.
[0136] With the sliding window, L consecutive segments are extracted in turn from the overnight PSG recording, and the window slides by one PSG segment at a time, so that almost every PSG segment of the night can be predicted. For example, the first window covers PSG segments 1-11 and predicts the sleep stage of segment 6; the second covers segments 2-12 and predicts segment 7; the third covers segments 3-13 and predicts segment 8; and so on, until the whole overnight recording is covered. With this setup, every PSG segment can be predicted except the first and last (L-1)/2 segments, which cannot sit at the center of a complete window.
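The segmentation of Step S501 and the window arithmetic of Step S502 can be checked with a short sketch. The function names and the 1 Hz sampling rate are hypothetical choices that keep the example small. Requiring an odd L is an added assumption: the text only requires L > 2, but a unique middle segment needs L to be odd. The 0-based center index (L-1)/2 corresponds to t = (L+1)/2 in 1-based numbering, the same position used for training labels.

```python
def split_into_segments(signal, fs, seg_seconds=30):
    """Cut a recording into non-overlapping segments; a trailing partial segment is dropped."""
    n = int(fs * seg_seconds)
    return [signal[i:i + n] for i in range(0, len(signal) - n + 1, n)]

def sliding_windows(num_segments, L):
    """Return (start, end, center) 0-based indices for every full window of L segments."""
    if L <= 2 or L % 2 == 0:
        raise ValueError("L must be an odd number greater than 2")
    half = (L - 1) // 2
    return [(s, s + L, s + half) for s in range(num_segments - L + 1)]

# An 8-hour recording at a hypothetical 1 Hz gives 960 segments of 30 s.
segments = split_into_segments(list(range(8 * 3600)), fs=1)
windows = sliding_windows(len(segments), L=11)
```

For 960 segments and L = 11 this yields 950 windows; the first five and last five segments are never at a window center, matching the (L-1)/2 edge loss described above.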
[0137] In some embodiments, the polysomnography includes at least two of the following: electroencephalogram (EEG), electrooculogram (EOG), electromyogram (EMG), electrocardiogram (ECG) and respiratory recording. The sleep stages to which the target segment may belong include wakefulness, non-rapid eye movement (NREM) stage 1, NREM stage 2, NREM stage 3 and rapid eye movement (REM) stage.
[0138] It is understood that the polysomnography may also include physiological signals from other channels and is not limited to those listed above; likewise, the sleep stages may be defined according to the specific application scenario, including stages under other classification standards, and are not limited to the five stages in the example above.
[0139] In one possible implementation, as shown in Figure 9, before the step of inputting the PSG segment sequence into the pre-trained sleep staging model, the method further includes: Step S601: Obtain a public dataset that includes multiple raw PSG recordings.
[0140] For example, the public datasets include the SleepEDF dataset and the HMC dataset. The SleepEDF dataset contains 197 overnight PSG records, and the HMC dataset contains 151 overnight PSG records. During the pre-training phase, the public dataset may be divided into a training set and a test set in a certain ratio, such as 9:1, denoted as […]. The training set consists of raw overnight PSG records together with their overnight labels, and likewise for the test set; the overnight labels give the sleep stage at every position throughout the night.
[0141] Step S602: Divide each raw PSG recording into multiple temporally consecutive PSG segments according to the preset segment length, and use a sliding window to select a first number of PSG segments at a time from them as training samples. Each training sample corresponds to a true sleep stage label.
[0142] Step S602 is the preprocessing of the public dataset. Specifically, each overnight PSG record in the training and test sets is divided at 30-second intervals into an overnight sequence of 30-second PSG segments. Each overnight segment sequence is then traversed with a sliding window of the first number L, so that the training and test sets are preprocessed into multiple PSG segment sequences of length L. Each PSG segment sequence in the training set serves as a training sample, and each PSG segment sequence in the test set serves as a test sample.
[0143] The true sleep stage label of a training sample is the sleep stage of the PSG segment at the middle time of its PSG segment sequence, that is, the PSG segment at t = (L+1)/2.
[0144] Step S603: Pre-train the initial sleep staging model with the training samples to obtain the pre-trained sleep staging model.
[0145] The training process is as follows: a training sample is input into the initial sleep staging model, and the sleep stage of its target segment (the predicted sleep stage) is computed by forward propagation; the loss value is computed from the predicted sleep stage and the true sleep stage label; the network parameters of the model are updated by backpropagation based on the loss value; and forward and backward propagation are repeated until the model converges, yielding the trained sleep staging model.
[0146] Optionally, the cross-entropy loss function is used. Suppose that in the pre-training phase, for the z-th training sample, the sleep staging model predicts the probability distribution $\hat{y}_z$ for the PSG segment at the middle time t, and the true sleep stage label is $y_z$. The loss function $\mathcal{L}$ can then be expressed as:
$$\mathcal{L} = -\frac{1}{M}\sum_{z=1}^{M}\sum_{e=1}^{5} y_{z,e}\,\log \hat{y}_{z,e}$$
[0147] where M is the number of training samples, z indexes the training samples (z = 1, ..., M), and e is the sleep stage category index, ranging from 1 to 5 and corresponding to the five sleep stages. $y_{z,e}$ is the true value of the z-th training sample for the e-th sleep stage: it is 1 if the sample truly belongs to class e and 0 otherwise. $\hat{y}_{z,e}$ is the predicted probability that the z-th training sample belongs to the e-th sleep stage, with a value between 0 and 1.
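The cross-entropy loss described in [0146] and [0147] averages, over M samples, the negative log-probability assigned to each sample's true stage. A minimal pure-Python sketch, assuming one-hot labels and already-normalized predicted distributions; the `eps` floor against log(0) is an added numerical safeguard, not part of the disclosure, and the function name is hypothetical.

```python
import math

def cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean categorical cross-entropy over M samples.

    y_true: one-hot labels, y_prob: predicted distributions (both M x 5).
    eps guards against log(0) if a predicted probability underflows.
    """
    M = len(y_true)
    total = 0.0
    for y_z, p_z in zip(y_true, y_prob):
        total += sum(y_ze * math.log(max(p_ze, eps)) for y_ze, p_ze in zip(y_z, p_z))
    return -total / M

y_true = [[1, 0, 0, 0, 0], [0, 0, 0, 0, 1]]
y_prob = [[0.5, 0.125, 0.125, 0.125, 0.125], [0.2, 0.2, 0.2, 0.2, 0.2]]
loss = cross_entropy(y_true, y_prob)
```

Only the true-class term survives for each sample, so the loss here is (ln 2 + ln 5) / 2 = ln(10) / 2, roughly 1.151.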
[0148] It should be noted that pre-training the initial sleep staging model with the training samples also includes testing the pre-trained sleep staging model with the test samples and fine-tuning its model parameters based on the test results.
[0149] In one specific embodiment, the structural parameters of the sleep staging model are as follows: the number of frequency band mappings m is 128, the Transformer encoder and the BiGRU each have 2 layers, the Transformer encoder has 4 attention heads, and the hidden dimension of the feed-forward network is 128.
[0150] Based on this sleep staging model, training used the following parameters: 50 epochs over the training set, a batch size of 64 training samples, the Adam optimizer, and a learning rate of $10^{-4}$. Experiments show that this configuration lets the model converge stably during training and generalize well on the test set. Specifically, the Adam optimizer adapts the effective step size of each parameter, which suits high-dimensional sparse gradients; a learning rate of $10^{-4}$ keeps convergence reasonably fast while avoiding oscillation; 50 epochs allow the model to learn the features of the training data fully; and a batch size of 64 balances memory efficiency against the accuracy of gradient estimates.
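The "adaptive" behaviour of Adam mentioned above can be seen in a single update step. This sketch uses the learning rate of 1e-4 stated in [0150]; the values beta1 = 0.9, beta2 = 0.999 and eps = 1e-8 are the common defaults from the original Adam paper and are assumptions here, since the disclosure does not state them. The function name is hypothetical.

```python
import math

def adam_step(theta, grad, m, v, t, lr=1e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a list of parameters; returns new (theta, m, v)."""
    m = [beta1 * mi + (1 - beta1) * g for mi, g in zip(m, grad)]      # first moment
    v = [beta2 * vi + (1 - beta2) * g * g for vi, g in zip(v, grad)]  # second moment
    m_hat = [mi / (1 - beta1 ** t) for mi in m]  # bias correction, t starts at 1
    v_hat = [vi / (1 - beta2 ** t) for vi in v]
    theta = [p - lr * mh / (math.sqrt(vh) + eps)
             for p, mh, vh in zip(theta, m_hat, v_hat)]
    return theta, m, v

theta, m, v = [1.0, -2.0], [0.0, 0.0], [0.0, 0.0]
theta, m, v = adam_step(theta, [0.5, -3.0], m, v, t=1)
```

On the first step both parameters move by almost exactly the learning rate, even though their gradients differ by a factor of six, because Adam divides each gradient by a running estimate of its own magnitude.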
[0151] Sleep staging performance can be measured with classification metrics such as overall accuracy, macro-averaged F1 score and Cohen's kappa coefficient. To verify the staging performance of the sleep staging model in this embodiment, the applicant performed 10-fold cross-validation on the aforementioned public datasets. The results showed that, compared with traditional sleep staging methods, this disclosure achieves higher accuracy, macro-averaged F1 score and Cohen's kappa coefficient: accuracy improved by 1-2%, and the F1 score and Cohen's kappa coefficient improved by 2-3%.
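The three metrics named above can be computed from a confusion matrix. The following pure-Python sketch uses standard definitions; the function names and the toy two-class labels are illustrative only, and in practice scikit-learn's `accuracy_score`, `f1_score(..., average="macro")` and `cohen_kappa_score` provide the same quantities.

```python
def confusion_matrix(y_true, y_pred, k):
    """k x k counts: rows are true classes, columns are predicted classes."""
    C = [[0] * k for _ in range(k)]
    for t, p in zip(y_true, y_pred):
        C[t][p] += 1
    return C

def accuracy(C):
    n = sum(map(sum, C))
    return sum(C[i][i] for i in range(len(C))) / n

def macro_f1(C):
    k = len(C)
    scores = []
    for c in range(k):
        tp = C[c][c]
        fp = sum(C[r][c] for r in range(k)) - tp
        fn = sum(C[c]) - tp
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)  # F1 = 2TP / (2TP + FP + FN)
    return sum(scores) / k

def cohen_kappa(C):
    k = len(C)
    n = sum(map(sum, C))
    p_o = sum(C[i][i] for i in range(k)) / n  # observed agreement
    # Chance agreement from row (true) and column (predicted) marginals.
    p_e = sum(sum(C[i]) * sum(C[r][i] for r in range(k)) for i in range(k)) / (n * n)
    return (p_o - p_e) / (1 - p_e)  # undefined if p_e == 1

y_true = [0, 0, 1, 1]
y_pred = [0, 1, 1, 1]
C = confusion_matrix(y_true, y_pred, k=2)
```

For the five-stage task, labels 0 to 4 and k = 5 would be used; the toy example gives accuracy 0.75, macro-F1 11/15 and kappa 0.5, showing how kappa discounts agreement expected by chance.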
[0152] Based on the same inventive concept, a second aspect of this disclosure provides a sleep staging apparatus, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the sleep staging method described above.
[0153] Based on the same inventive concept, a third aspect of this disclosure provides a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the steps of the sleep staging method described above.
[0154] In specific implementations, the computer-readable storage medium may include a USB flash drive, a portable hard drive, read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disc, or any other medium that can store program code.
[0155] Obviously, the above embodiments of this disclosure are merely examples for clearly illustrating this disclosure, and are not intended to limit the implementation of this disclosure. For those skilled in the art, other variations or modifications can be made based on the above description. It is impossible to exhaustively list all implementation methods here. Any obvious variations or modifications derived from the technical solutions of this disclosure are still within the protection scope of this disclosure.
Claims
1. A sleep staging method, characterized by comprising the following steps: obtaining a polysomnography segment sequence to be staged, wherein the polysomnography segment sequence includes multiple temporally consecutive polysomnography segments; and inputting the polysomnography segment sequence into a pre-trained sleep staging model, the sleep staging model being configured to perform sleep staging on the input polysomnography segment sequence and output a staging result; wherein the sleep staging model includes a first-level encoding module, a second-level encoding module and a classification output module; the first-level encoding module is used to extract the intra-segment waveform temporal features of each polysomnography segment in the polysomnography segment sequence, the intra-segment waveform temporal features of the polysomnography segments constituting a waveform temporal feature sequence; the second-level encoding module is used to extract inter-segment temporal transition features based on the waveform temporal feature sequence; the classification output module is used to perform sleep staging on a target segment in the polysomnography segment sequence according to the inter-segment temporal transition features and output the sleep staging result; and the first-level encoding module and/or the second-level encoding module adopt an encoding structure combining Transformer encoding and bidirectional gated recurrent units.
2. The sleep staging method according to claim 1, characterized in that the first-level encoding module adopts the encoding structure combining Transformer encoding and bidirectional gated recurrent units, and the first-level encoding module is used for: for each channel of each polysomnography segment in the polysomnography segment sequence, performing a time-frequency transformation on the single-channel signal to generate a two-dimensional time-frequency spectrum; inputting the two-dimensional time-frequency spectrum into a Transformer encoder for encoding to obtain a first context feature sequence corresponding to the single-channel signal; inputting the first context feature sequence into a bidirectional gated recurrent unit for encoding to obtain a temporally enhanced second context feature sequence; and fusing the second context feature sequence to obtain the single-channel waveform temporal features of the single-channel signal; wherein the single-channel waveform temporal features of all channels within each polysomnography segment constitute the intra-segment waveform temporal features of that polysomnography segment, and the intra-segment waveform temporal features of all polysomnography segments constitute the waveform temporal feature sequence of the polysomnography segment sequence.
3. The sleep staging method according to claim 1, characterized in that the second-level encoding module adopts the encoding structure combining Transformer encoding and bidirectional gated recurrent units, and the second-level encoding module is used for: adding position encoding to the waveform temporal feature sequence; inputting the position-encoded waveform temporal feature sequence into a Transformer encoder for encoding to obtain a third context feature sequence corresponding to the waveform temporal feature sequence; inputting the third context feature sequence into a bidirectional gated recurrent unit for encoding to obtain a temporally enhanced fourth context feature sequence; and fusing the fourth context feature sequence to obtain the inter-segment temporal transition features corresponding to the waveform temporal feature sequence.
4. The sleep staging method according to any one of claims 1 to 3, characterized in that the sleep staging model further includes an inter-channel encoding module, which is used to perform inter-channel topological correlation encoding on the waveform temporal feature sequence output by the first-level encoding module to obtain a channel topological correlation feature sequence and output it to the second-level encoding module; and the second-level encoding module is used to extract the inter-segment temporal transition features based on the channel topological correlation feature sequence.
5. The sleep staging method according to claim 4, characterized in that the step of the inter-channel encoding module performing inter-channel topological correlation encoding on the waveform temporal feature sequence output by the first-level encoding module includes: for each polysomnography segment in the polysomnography segment sequence, taking the intra-segment waveform temporal features of the polysomnography segment as a graph node matrix, and constructing graph-structured data based on the graph node matrix and a preset adjacency matrix, wherein the graph node matrix includes multiple graph nodes, each graph node corresponds to the single-channel waveform temporal features of one channel of the polysomnography segment, and the preset adjacency matrix represents the topological connection relationship between channels; performing inter-channel topological correlation encoding on the graph-structured data with a graph convolutional neural network to obtain intermediate graph-structured data; and fusing the graph nodes of the intermediate graph-structured data with a pooling layer to obtain the channel topological correlation features of the polysomnography segment; wherein the channel topological correlation features of the polysomnography segments in the polysomnography segment sequence constitute the channel topological correlation feature sequence.
6. The sleep staging method according to claim 5, characterized in that the preset adjacency matrix is a fully connected adjacency matrix without node self-connections, and the value of each matrix element in the preset adjacency matrix is a first value.
7. The sleep staging method according to claim 1, characterized in that the classification output module includes a fully connected layer and a classifier connected in sequence, and the step of performing sleep staging on the target segment in the polysomnography segment sequence according to the inter-segment temporal transition features includes: inputting the inter-segment temporal transition features into the fully connected layer for linear mapping to obtain output features of the fully connected layer; inputting the output features of the fully connected layer into the classifier, which calculates the probability distribution of the target segment over the sleep stages; and determining the sleep staging result of the target segment based on the probability distribution.
8. The sleep staging method according to claim 7, characterized in that the target segment is the polysomnography segment located at the middle time point of the polysomnography segment sequence.
9. The sleep staging method according to claim 1, characterized in that the step of obtaining the polysomnography segment sequence to be staged includes: obtaining the polysomnography to be staged, and dividing it into multiple temporally consecutive polysomnography segments according to a preset segment length; and selecting a first number of polysomnography segments at a time from the multiple polysomnography segments using a sliding window, each selection of the first number of polysomnography segments forming one polysomnography segment sequence to be staged, wherein the first number is a natural number greater than 2.
10. The sleep staging method according to claim 1, characterized in that, before the step of inputting the polysomnography segment sequence into the pre-trained sleep staging model, the method further includes: obtaining a public dataset that includes multiple raw polysomnography recordings; dividing each raw polysomnography recording into multiple temporally consecutive polysomnography segments according to the preset segment length, and selecting a first number of polysomnography segments at a time from them as training samples using a sliding window, each training sample corresponding to a true sleep stage label; and pre-training an initial sleep staging model with the training samples to obtain the pre-trained sleep staging model.
11. The sleep staging method according to claim 1, characterized in that the polysomnography includes at least two of the following simultaneously monitored signals: electroencephalogram (EEG), electrooculogram (EOG), electromyogram (EMG), electrocardiogram (ECG) and respiratory recording; and the sleep stage to which the target segment belongs includes wakefulness, non-rapid eye movement (NREM) stage 1, NREM stage 2, NREM stage 3 and rapid eye movement (REM) stage.
12. A sleep staging device, characterized by including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the sleep staging method according to any one of claims 1 to 11.
13. A computer-readable storage medium having a computer program stored thereon, characterized in that the program, when executed by a processor, implements the steps of the sleep staging method according to any one of claims 1 to 11.