A method and device for training and applying a classification model of an abdominal audio signal

By using a multi-point signal fusion and feature adaptive optimization mechanism, the abdominal audio signal is optimized using distance and correlation weights, and a high-precision model is trained. This solves the problems of relying on doctors' subjective experience and ignoring spatial distribution in existing technologies, and realizes a refined and objective assessment of abdominal health status.

CN122245352A · Pending · Publication Date: 2026-06-19 · THE FIFTH MEDICAL CENT OF CHINESE PLA GENERAL HOSPITAL

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
THE FIFTH MEDICAL CENT OF CHINESE PLA GENERAL HOSPITAL
Filing Date
2026-02-06
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing technologies rely on doctors' subjective experience when assessing gastrointestinal function, lack objective quantitative standards, and ignore the spatial distribution differences of different anatomical regions of the abdomen, resulting in poor consistency, low repeatability, and insufficient classification accuracy in diagnostic results.

Method used

By constructing a multi-point signal fusion and feature adaptive optimization mechanism, and utilizing distance weight, correlation weight, and internal weight parameters, intelligent purification and dynamic weighting of abdominal audio signals are achieved. The spatial distribution information of multi-channel signals is fused to train a high-precision machine learning model.

Benefits of technology

It enables a refined and objective assessment of abdominal health, improves the accuracy and consistency of diagnosis, provides repeatable objective assessment results, and eliminates subjective differences among doctors.


Figure: CN122245352A_ABST (abstract figure)

Abstract

A method and apparatus for training and applying a classification model of abdominal audio signals are disclosed. The method includes: acquiring a training dataset; preprocessing the audio signals of each location in each training dataset to obtain audio segments of each location, and extracting multidimensional acoustic features from the audio segments of each location to construct an initial feature vector for each location; calculating the correlation metric between each feature dimension and the category label of each location based on the category label of the training dataset, and assigning an initial value of the correlation weight to each feature dimension according to the correlation metric; performing an iterative process starting from the initial value of the correlation weight until a preset convergence condition is reached; outputting the machine learning model corresponding to the convergence condition as the completed classification model, and outputting the correlation weight at this point as the final correlation weight, thereby improving the accuracy and objectivity of the classification model used for abdominal health status assessment.

Description

Technical Field

[0001] This application relates to medical signal processing technology, and in particular to a method and apparatus for training and applying a classification model of abdominal audio signals.

Background Technology

[0002] In clinical medicine, collecting abdominal bowel sounds with a stethoscope is an important means of assessing gastrointestinal function. In recent years, with the development of electronic stethoscopes and signal processing technology, digital analysis and feature extraction of bowel sound audio to achieve objective evaluation of gastrointestinal function, postoperative recovery, or the effects of specific interventions (such as acupuncture) has become an important research direction.

[0003] In existing technologies, using abdominal bowel sounds for health status assessment faces two main problems. First, the assessment process relies heavily on the physician's subjective experience and auditory judgment and lacks objective, quantitative standards, resulting in poor consistency and low reproducibility of diagnostic results. Second, traditional methods typically analyze audio signals acquired from a single point, neglecting the potential spatial distribution differences of bowel sounds across different anatomical regions of the abdomen. This spatial information is crucial for comprehensively assessing intestinal function (such as the direction of peristalsis and the strength of regional activity). Existing technologies struggle to effectively integrate complementary and differential information from multi-point audio signals, leading to limited model representation capabilities and insufficient classification accuracy for complex abdominal physiological states.

Summary of the Invention

[0004] This application provides a method and apparatus for training and applying a classification model of abdominal audio signals.

[0005] A training method for a classification model of abdominal audio signals includes:
Step A1: Obtain a training dataset, wherein the sets of training data in the dataset record abdominal bowel sounds collected from different subjects under the same collection conditions, and each set is labeled with a category label representing the health status of the abdomen; the audio signals in each set of training data are audio signals collected from at least two different points on the same abdomen.
Step A2: Preprocess the audio signals of each point in each set of training data to obtain audio segments of each point, and extract multidimensional acoustic features from the audio segments of each point to construct the initial feature vector of each point.
Step A3: Based on the category labels of the training dataset, calculate the correlation metric between each feature dimension of each point and the category label, and assign an initial value of the correlation weight to each feature dimension according to the correlation metric.
Step A4: Starting from the initial values of the relevance weights, execute an iterative process including sub-steps A41 to A43 until a preset convergence condition is reached:
Step A41: Using the relevance weights of the current iteration round, weight the initial feature vectors of each point to obtain weighted feature vectors for the different points; for each feature dimension in the multidimensional acoustic features, arrange the feature values of the weighted feature vectors of each point in that feature dimension according to the spatial order of the points, combine them into a fusion sub-vector corresponding to that feature dimension, and then concatenate the fusion sub-vectors corresponding to all feature dimensions in sequence to form a fusion feature vector;
Step A42: Using the fused feature vector and the corresponding category label, train a machine learning model to optimize its internal weight parameters, and obtain, from the trained model, scoring information reflecting the importance of each fused feature dimension;
Step A43: Based on the scoring information, adjust the value of the relevance weight of the current iteration round and use it as the relevance weight of the next iteration round.
Step A5: Output the machine learning model that reaches the convergence condition as the completed classification model, and output the relevance weight at that point as the final relevance weight.

[0006] An application method for a classification model of abdominal audio signals includes:
Step B1: Under the preset acquisition conditions, collect bowel sound signals from at least two acquisition points on the abdomen of the subject to be tested, and obtain the collected signal at each point.
Step B2: Preprocess the collected signals from each point to obtain audio segments for each point, and extract multidimensional acoustic features from the audio segments of each point to construct the initial feature vector of each point.
Step B3: Using preset relevance weights, weight the initial feature vectors of each point to obtain weighted feature vectors for the different points, wherein the relevance weights characterize the relevance measure between each feature dimension of each point and the category label.
Step B4: For each feature dimension in the multidimensional acoustic features, arrange the feature values of the weighted feature vector of each point in that feature dimension according to the spatial order of the points, combine them into a fusion sub-vector corresponding to that feature dimension, and then concatenate the fusion sub-vectors corresponding to all feature dimensions in sequence to form a fusion feature vector.
Step B5: Input the fused feature vector into a preset classification model and obtain the classification result output by the classification model, so as to obtain information on the abdomen of the subject to be tested; wherein the classification model is obtained by training on the fused feature vectors corresponding to the training data in the training dataset that corresponds to the acquisition conditions; wherein the sets of training data record abdominal bowel sounds collected from different subjects under the acquisition conditions, and each set is labeled with a category label representing the health status of the abdomen; the audio signals in each set of training data are audio signals collected from at least two different points on the same abdomen.

[0007] An electronic device includes a memory and a processor, characterized in that the memory stores a computer program, and the processor is configured to run the computer program to perform the methods described above.

[0008] In this embodiment, during model training, by introducing correlation-based feature weights and dynamically weighting and iteratively optimizing the features at each location, the system can automatically strengthen features that contribute significantly to classification and suppress secondary or interfering features, achieving intelligent purification of multi-channel signals and effectively enhancing the discriminative information of the input model. By splicing the same feature at different locations into a fusion sub-vector in spatial order, this method explicitly encodes the distribution information of acoustic characteristics in the abdomen within the features, enabling the model to learn and utilize the spatial patterns of intestinal activity for more refined comprehensive state discrimination beyond single-point analysis. Through the closed-loop iteration of "feature weight optimization - model training feedback," the front-end feature fusion strategy and the back-end classification model training are strongly coupled and synergistically optimized, ensuring optimal overall system performance from feature construction to classification decision-making, ultimately driving a systematic improvement in the accuracy and objectivity of abdominal health status assessment.

[0009] During model application, the introduction of correlation weights enables intelligent purification of feature signals, effectively enhancing the discriminative information of the input model. Through structured fusion, spatial distribution patterns are encoded in the features, enabling the model to make more refined comprehensive judgments. Furthermore, by adopting a dedicated model that is strongly coupled with the front-end process, the optimal performance of the overall system is ensured. Thus, by providing higher-quality and more structured input signals, the dedicated model drives a systematic improvement in the accuracy and objectivity of abdominal health status assessment.

[0010] Furthermore, by collecting standardized audio signals, extracting quantitative acoustic features, and using a trained high-precision model for discrimination, subjective differences among doctors are completely eliminated, providing repeatable and comparable objective evaluation results, and offering a reliable tool for clinical diagnosis and research.

[0011] Other features and advantages of this application will be set forth in the following description, and will be apparent in part from the description, or may be learned by practicing the application. Other advantages of this application can be realized and obtained by means of the embodiments described in the description and the accompanying drawings.

Attached Figure Description

[0012] The accompanying drawings are used to provide an understanding of the technical solutions of this application and constitute a part of the specification. They are used together with the embodiments of this application to explain the technical solutions of this application and do not constitute a limitation on the technical solutions of this application.

[0013] Figure 1 is a flowchart illustrating the training method for the classification model of abdominal audio signals provided in this embodiment of the application; Figure 2 is a flowchart illustrating the application method of the classification model for abdominal audio signals provided in this embodiment of the application.

Detailed Implementation

[0014] This application describes several embodiments, but these descriptions are exemplary and not limiting, and it will be apparent to those skilled in the art that many more embodiments and implementations are possible within the scope of the embodiments described herein. Although many possible combinations of features are shown in the drawings and discussed in the detailed description, many other combinations of the disclosed features are also possible. Unless specifically limited, any feature or element of any embodiment may be used in combination with, or may replace, any feature or element of any other embodiment.

[0015] This application includes and contemplates combinations of features and elements known to those skilled in the art. The embodiments, features, and elements disclosed in this application can also be combined with any conventional features or elements to form unique inventive solutions. Any feature or element of any embodiment can also be combined with features or elements from other inventive solutions to form another unique inventive solution. Therefore, it should be understood that any feature shown and / or discussed in this application can be implemented individually or in any suitable combination. Therefore, the embodiments are not limited except by the limitations imposed by the appended claims and their equivalents. Furthermore, various modifications and changes can be made within the scope of the appended claims.

[0016] Furthermore, in describing representative embodiments, the specification may have presented methods and / or processes as a specific sequence of steps. However, the method or process should not be limited to the specific order of steps described herein, to the extent that it does not depend on such a specific order. As will be understood by those skilled in the art, other sequences of steps are also possible. Therefore, the specific order of steps set forth in the specification should not be construed as a limitation of the claims. Moreover, the claims concerning the method and / or process should not be limited to the steps performed in the written order, and those skilled in the art will readily understand that these orders can be varied and still remain within the spirit and scope of the embodiments of this application.

[0017] One of the core innovations of this solution lies in the construction of an adaptive, multi-level weighted fusion framework. This framework achieves fine-grained control over the entire process of abdominal bowel sound signal processing, from spatial perception to feature optimization, through the coordinated application of multiple weights. These weights act at different stages of signal processing and model training, and their specific definitions and explanations are as follows:

1. Distance weight. Symbol: $q_i$, the distance weight of the i-th collection point, where i is a positive integer.

[0018] Target and Purpose: The distance weight is applied to the raw audio signals from each acquisition point. Its purpose is to pre-weight signals from different spatial locations, before preprocessing, based on prior knowledge of anatomical structures: signal components closer to the core area of interest (such as acupuncture stimulation targets) are amplified, and distant signals that may contain more noise or irrelevant physiological activity are attenuated, thus achieving signal-level spatial filtering.

[0019] Calculation basis and method: The weight is calculated based on the physical distance $d_i$ between the i-th sampling point and the preset center point (usually the navel or a major acupuncture point). The closer the distance, the greater the weight; the farther the distance, the smaller the weight.

[0020] 2. Relevance weights (also referred to in this text as correlation weights). Symbol: $r_{ij}$, the relevance weight of the j-th dimension in the feature vector of the i-th collection point, where j is a positive integer.

[0021] Target and Purpose: The correlation weights are defined as a set of adjustable coefficients used to scale the initial feature vector dimension by dimension to characterize the contribution of different features to the classification task. The purpose is to, at the feature level, weight different feature dimensions based on the statistical regularities of the data itself, enhance the influence of features strongly correlated with the acupuncture intervention time series state, and suppress weakly correlated or noisy features.

[0022] Calculation basis: The initial value or update basis of this weight is the statistical correlation between the feature dimension and the time-series category label. At each collection point, a feature dimension with a higher correlation is assigned a larger weight.

[0023] Preferably, for features that are linearly related to the label, the Pearson correlation coefficient can be used as a measure; for non-linear dependencies, mutual information entropy can be used as a measure.

[0024] In addition, during model training, the relevance weights $r_{ij}$ are also iteratively optimized: the feature importance assessment is dynamically adjusted based on feedback from model training, so that the feature weighting strategy is adaptively optimized.

[0025] 3. Internal weight parameters. Internal weight parameters are variables that machine learning models (such as logistic regression, support vector machines, neural networks, etc.) optimize themselves during their learning process to establish a mapping relationship between the input fused feature vector and the output class label.

[0026] During the model training phase, constraints can also be imposed on the internal weight parameters, causing the values of some internal weight parameters that contribute little to the final classification decision in the machine learning model to approach or equal zero, thereby achieving sparsity filtering of the feature space at the model level.
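The text does not name the constraint used to drive small internal weights to zero. As a minimal sketch, assuming an L1 penalty applied through a proximal-gradient step (a common way to obtain exact zeros, and an assumption here rather than the method of the application), the shrinkage can be illustrated as follows. The names `soft_threshold`, `l1_proximal_step`, and `lam` are hypothetical.

```python
def soft_threshold(w, lam):
    """Proximal operator of the L1 penalty: shrink w toward zero by lam."""
    if w > lam:
        return w - lam
    if w < -lam:
        return w + lam
    return 0.0  # weights with magnitude <= lam become exactly zero


def l1_proximal_step(weights, gradients, lr=0.1, lam=0.05):
    """One proximal-gradient update: gradient step, then shrink by lr * lam.

    Assumption: the sparsity constraint is an L1 penalty of strength lam.
    """
    return [soft_threshold(w - lr * g, lr * lam)
            for w, g in zip(weights, gradients)]


# Toy internal weights with zero gradient, to isolate the shrinkage effect.
shrunk = l1_proximal_step([0.5, 0.001, -0.3], [0.0, 0.0, 0.0])
```

In this toy call the middle weight is smaller than the threshold `lr * lam` and is set to exactly zero, while the larger weights are only reduced slightly.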

[0027] This scheme constructs a complete analysis chain through the sequential and synergistic application of the three weighting methods mentioned above: First, distance weighting is used to enhance spatial signals based on anatomical priors; then, correlation weighting is used for data-driven initial screening and weighting of feature importance; next, the model completes the classification task by optimizing its internal weight parameters, while generating feature importance feedback; finally, this feedback is used to continuously optimize the correlation weights through a dynamic weight update mechanism. This multi-layered framework ensures that the system can simultaneously utilize prior knowledge, statistical data patterns, and model feedback to adaptively focus on bowel sound features that best characterize the temporal effects of acupuncture intervention, thereby achieving high-precision and highly robust intelligent analysis.
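The description of the distance weight states only that $q_i$ grows as the distance $d_i$ to the preset center point shrinks; no formula is given. The sketch below assumes an inverse-distance form, $q_i = 1 / (1 + d_i / s)$, normalized to sum to 1. The functional form, the scale `scale_cm`, and the example distances are all assumptions for illustration.

```python
def distance_weights(distances_cm, scale_cm=5.0):
    """Map each point's distance d_i to a weight q_i that shrinks as d_i grows.

    Assumed form (not given in the source): q_i = 1 / (1 + d_i / scale_cm),
    normalized so that the weights sum to 1.
    """
    if scale_cm <= 0:
        raise ValueError("scale_cm must be positive")
    raw = [1.0 / (1.0 + d / scale_cm) for d in distances_cm]
    total = sum(raw)
    return [w / total for w in raw]


def apply_distance_weight(signal, q_i):
    """Scale one point's raw samples by its distance weight (before preprocessing)."""
    return [q_i * s for s in signal]


# Hypothetical distances of five collection points from the navel, in cm.
q = distance_weights([0.0, 4.0, 4.0, 8.0, 8.0])
```

Any monotonically decreasing function of $d_i$ (for example a Gaussian or exponential decay) would satisfy the stated rule equally well; the inverse form is chosen here only because it is simple.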

Example 1

[0028] Current technologies typically analyze audio signals acquired from a single point, neglecting the potential spatial distribution differences of bowel sounds across different abdominal regions. This spatial information is crucial for comprehensively assessing intestinal function (e.g., peristaltic direction and regional activity intensity). Existing technologies struggle to effectively integrate complementary and differential information from multi-point audio signals, resulting in limited model representation of complex abdominal physiological states and insufficient classification accuracy.

[0029] Therefore, this embodiment provides a training method for a classification model of abdominal audio signals, aiming to construct a classification model that can accurately identify the health status of the abdomen through an innovative multi-point signal fusion and feature adaptive optimization mechanism.

[0030] The core of this method lies in proposing a multi-site signal fusion mechanism based on iterative optimization of correlation weights. This method first assigns initial importance weights to features by calculating the correlation between each site, each feature dimension, and the health status category label. Based on this, it creatively introduces an iterative optimization closed loop encompassing "feature weighting - model training - weight adjustment." The principle is as follows: initial weights guide the model's initial learning; during training, the model provides importance scores for each fused feature dimension; these scores are used to dynamically adjust the correlation weights, thereby strengthening important features and weakening secondary or interfering features in the next iteration. In this way, the system can adaptively learn how to optimally fuse acoustic feature information from different sites in the abdomen, achieving refined and high-precision automatic classification of intestinal functional health status.

[0031] As shown in Figure 1, the method includes steps A1 to A5.

[0032] Step A1: Obtain a training dataset, wherein the sets of training data in the dataset record abdominal bowel sounds collected from different subjects under the same acquisition conditions, and each set is labeled with a category label representing the health status of the abdomen; the audio signals in each set of training data are audio signals collected from at least two different locations on the same abdomen.

[0033] For example, suppose M acquisition points are set on the abdomen (5 in the standard configuration), and the original audio signal acquired at the i-th acquisition point is $x_i = [x_i(1), x_i(2), \ldots, x_i(L)]^T$, where $L = f_s \times T$ is the signal length (for example, with $f_s = 44.1\,\text{kHz}$ and $T = 120\,\text{s}$, $L \approx 5.3 \times 10^6$).

[0034] This step is used to acquire training data containing abdominal bowel sound audio from multiple locations and their corresponding health status labels, providing a high-quality, information-rich sample foundation for model training. Specifically, each data set in the training dataset corresponds to one abdominal examination of a subject (e.g., a patient). Each data set must contain bowel sound audio signals collected simultaneously or sequentially from at least two different acquisition points on the same abdomen (e.g., the umbilicus, left and right quadrants) to ensure the capture of potential spatial differences. Simultaneously, each data set is labeled with a clear category label, which characterizes the subject's abdominal health status at the time of acquisition (e.g., paralytic ileus, initial functional recovery, normal function, etc.).

[0035] This step uses audio signals from at least two locations as a set of training samples, providing an essential data foundation for subsequent spatial information fusion. This allows the model to potentially learn spatially distributed pathophysiological patterns that go beyond single-point auditory features. By constructing this structured, multi-location labeled dataset, a solid data foundation is laid for training a classification model with spatial awareness capabilities.
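The structure of one set of training data described in step A1 can be sketched as a small record type. This is an illustrative layout only; the class name `TrainingSample`, its field names, and the point names are hypothetical and not defined by the application.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TrainingSample:
    """One abdominal examination: audio from several points plus one label."""
    subject_id: str
    signals: Dict[str, List[float]]  # point name -> raw samples
    label: str                       # e.g. "normal function"

    def __post_init__(self):
        # Step A1 requires audio from at least two different points.
        if len(self.signals) < 2:
            raise ValueError("each sample needs audio from at least two points")


# Hypothetical sample with two acquisition points.
sample = TrainingSample(
    subject_id="s01",
    signals={"umbilicus": [0.0, 0.1], "left_lower": [0.0, -0.1]},
    label="normal function",
)
```

Enforcing the two-point minimum at construction time keeps single-point recordings out of the dataset before any fusion is attempted.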

[0036] Step A2: Preprocess the audio signals of each point in each training data set to obtain audio segments of each point, and extract multidimensional acoustic features from the audio segments of each point to construct the initial feature vector of each point.

[0037] For example, D-dimensional features are extracted from the k-th frame at the i-th position to form the feature vector of that frame: $f_{i,k} = [f_{i,k,1}, f_{i,k,2}, \ldots, f_{i,k,j}, \ldots, f_{i,k,D}]^T$, where D is the total dimension of the features (e.g., 28 dimensions), and D is an integer greater than or equal to 2.

[0038] This step transforms the raw time-domain audio signal into a fixed-dimensional numerical feature vector that characterizes its acoustic properties for processing by the machine learning model. Specifically, the audio signal at each location is first preprocessed, including amplitude normalization to eliminate acquisition gain differences, pre-emphasis filtering to enhance high-frequency components, and frame-by-frame windowing to obtain short-term stationary audio segments. Then, multi-dimensional acoustic features are extracted from each audio segment, encompassing characteristics in the time domain (e.g., energy, zero-crossing rate) and frequency domain (e.g., Mel-frequency cepstral coefficients, spectral centroid). Finally, an initial feature vector is constructed for each location by calculating the statistical values (e.g., mean) of all audio segments across each feature dimension.

[0039] This step systematically extracts a multi-dimensional acoustic feature set covering the time domain, frequency domain, and dynamic characteristics from short audio clips. The aim is to more comprehensively and precisely characterize the physical properties of bowel sounds, providing a rich feature base for subsequent models to distinguish subtle changes in physiological state.
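The preprocessing chain of step A2 (amplitude normalization, pre-emphasis, framing with a window, per-frame features, then averaging) can be sketched in plain Python. This is a minimal illustration, not the application's implementation: it assumes a Hamming window and a pre-emphasis coefficient of 0.97, and it computes only two time-domain features (short-time energy and zero-crossing rate) rather than the full feature set. All function names and default parameters are hypothetical.

```python
import math


def preprocess(signal, frame_len=1024, hop=512, pre_emphasis=0.97):
    """Normalize amplitude, apply pre-emphasis, split into Hamming-windowed frames."""
    peak = max(abs(s) for s in signal) or 1.0  # avoid division by zero on silence
    x = [s / peak for s in signal]
    # Pre-emphasis: y[n] = x[n] - a * x[n-1] boosts high-frequency content.
    y = [x[0]] + [x[n] - pre_emphasis * x[n - 1] for n in range(1, len(x))]
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(y) - frame_len + 1, hop):
        frames.append([y[start + n] * window[n] for n in range(frame_len)])
    return frames


def frame_features(frame):
    """Return [short-time energy, zero-crossing rate] for one frame."""
    energy = sum(v * v for v in frame) / len(frame)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return [energy, crossings / (len(frame) - 1)]


def point_feature_vector(signal, **kwargs):
    """Average per-frame features into one initial feature vector for a point."""
    feats = [frame_features(f) for f in preprocess(signal, **kwargs)]
    if not feats:
        raise ValueError("signal is shorter than one frame")
    dims = len(feats[0])
    return [sum(f[j] for f in feats) / len(feats) for j in range(dims)]


# Toy input: a 50 Hz sine sampled at 1 kHz (not real bowel-sound data).
toy = [math.sin(2 * math.pi * 50 * n / 1000) for n in range(4000)]
vec = point_feature_vector(toy, frame_len=256, hop=128)
```

Frequency-domain features such as MFCCs or the spectral centroid would be appended to the per-frame list in the same way, typically with a dedicated signal-processing library.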

[0040] Step A3: Based on the category labels of the training dataset, calculate the correlation metric between each feature dimension of each point and the category label, and assign initial values for the correlation weights according to the correlation metric.

[0041] The purpose of this step is to provide a reasonable starting point for weights based on data statistics for subsequent iterative optimization processes. Specifically, for each feature dimension of the data sequence at each point (e.g., the values of all training samples in the first dimension of MFCC at point P1), a correlation metric is calculated between the data sequence and the entire training set category label sequence.

[0042] This step introduces a data-driven weight initialization strategy, which uses prior statistical correlation information to set the optimization starting point. This avoids starting the training process from a completely uninformed state, guides the iterations to converge more quickly toward a more promising direction, and provides a statistically significant initial estimate for the feature fusion weights. This accelerates the convergence speed of subsequent iterative optimization processes and may increase the likelihood of finally converging to a better solution.
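The initialization in step A3 can be sketched with the Pearson correlation coefficient, which the text names as the preferred measure for linearly related features. The sketch assumes binary labels encoded as 0/1 and takes the absolute value of the coefficient as the initial weight; both choices, and the names `pearson` and `initial_relevance_weights`, are assumptions for illustration. For non-linear dependencies the text suggests mutual information instead, which is not shown here.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def initial_relevance_weights(features, labels):
    """features[n][i][j]: sample n, point i, feature dimension j.

    Returns r[i][j] = |Pearson correlation| between that feature column
    and the label sequence (taking the absolute value is an assumption).
    """
    M, D = len(features[0]), len(features[0][0])
    return [[abs(pearson([s[i][j] for s in features], labels))
             for j in range(D)]
            for i in range(M)]


# Toy data: 4 samples, 2 points, 2 feature dimensions, binary labels.
toy_features = [
    [[1.0, 5.0], [0.2, 3.0]],
    [[2.0, 5.0], [0.1, 1.0]],
    [[3.0, 5.0], [0.9, 3.0]],
    [[4.0, 5.0], [0.8, 1.0]],
]
toy_labels = [0, 0, 1, 1]
r0 = initial_relevance_weights(toy_features, toy_labels)
```

In the toy data, the constant feature and the feature that alternates independently of the label both receive a weight of zero, while the two features that rise with the label receive weights close to 1.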

[0043] Step A4: Starting with the correlation weight as the initial value, execute the iterative process including sub-steps A41 to A43 until the preset convergence condition is reached.

[0044] This step aims to construct a closed-loop optimization framework that dynamically adjusts the feature fusion strategy to maximize model classification performance. It achieves this by continuously executing three sub-steps: first, fusing features with the current weights and inputting them into the model; second, training the model and evaluating the importance of each feature under the current feature fusion strategy; and finally, updating the weights based on the importance information provided by the model. This process is repeated until the changes in weights and / or model performance are less than a preset threshold, reaching convergence.

[0045] This step establishes a dynamic, feedback-driven iterative optimization loop, ensuring that the weights for feature fusion are no longer fixed in advance or based solely on simple rules. Instead, they can adaptively adjust based on the actual learning performance of the classification model, achieving synergistic optimization between the fusion strategy and model training. The model not only learns how to classify based on given features but also, in turn, guides how features should be fused to make them easier to classify, thereby significantly improving the quality of feature representation and the final performance of the model.
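The outer loop of step A4 can be sketched as a driver that repeats one A41 to A43 round until the weights stop changing. The sketch uses only the weight-change criterion (the text also allows a model-performance criterion), and the names `iterate_until_converged`, `step`, `tol`, and `max_rounds` are hypothetical. The `step` callable stands in for the whole fuse, train, and update round.

```python
def iterate_until_converged(r0, step, tol=1e-3, max_rounds=50):
    """Repeat `step` (one A41-A43 round) until the largest weight change < tol.

    `step(r)` must return updated weights as a nested list shaped like r.
    Returns (final_weights, rounds_executed).
    """
    r = r0
    for round_idx in range(1, max_rounds + 1):
        r_next = step(r)
        change = max(abs(a - b)
                     for row_a, row_b in zip(r_next, r)
                     for a, b in zip(row_a, row_b))
        r = r_next
        if change < tol:
            return r, round_idx
    return r, max_rounds


def toy_step(r):
    """Stand-in for a real round: move every weight halfway toward 1.0."""
    return [[w + 0.5 * (1.0 - w) for w in row] for row in r]


final_r, rounds = iterate_until_converged([[0.0, 0.5]], toy_step)
```

A `max_rounds` cap is included so that the loop terminates even if the preset threshold is never reached.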

[0046] Step A41: Using the relevance weights of the current iteration round, weight the initial feature vectors of each point to obtain weighted feature vectors for different points; for each feature dimension in the multidimensional acoustic features, arrange the feature values of the weighted feature vectors of each point in that feature dimension according to the spatial order of the points, combine them into a fusion sub-vector corresponding to that feature dimension, and then concatenate the fusion sub-vectors corresponding to all feature dimensions in sequence to form a fusion feature vector.

[0047] For example, the correlation weights are applied to the feature vector of each point. In the t-th iteration, the weighted feature vector of the k-th frame at the i-th point is $f'_{i,k} = [r_{i1}^{(t)} f_{i,k,1}, r_{i2}^{(t)} f_{i,k,2}, \ldots, r_{ij}^{(t)} f_{i,k,j}, \ldots, r_{iD}^{(t)} f_{i,k,D}]^T$, where $r_{ij}^{(t)}$ is the relevance weight of the j-th feature dimension at the i-th point in the t-th iteration, and t is a positive integer.

[0048] The weighted feature vectors of all M points are structurally concatenated according to their feature dimensions. In the t-th iteration, the fused sub-vector of the j-th feature dimension in the k-th frame is $F_{j,k} = [r_{1j}^{(t)} f_{1,k,j}, r_{2j}^{(t)} f_{2,k,j}, \ldots, r_{Mj}^{(t)} f_{M,k,j}]^T$. The fused feature vector is the concatenation of the fused sub-vectors of all feature dimensions: in the t-th iteration, the fused feature vector of the k-th frame is $F_k = [F_{1,k}, F_{2,k}, \ldots, F_{j,k}, \ldots, F_{D,k}]^T$.

[0049] The purpose of this step is to fuse feature information from multiple points into a unified feature representation based on the current relevance weights in each iteration, for model training. Specifically, the initial feature vector of each point is multiplied by the current weights corresponding to each dimension of that point to achieve feature-level weighting. Next, for each acoustic feature dimension (e.g., zero-crossing rate), the weighted feature values of all points in that dimension are arranged according to the actual spatial order of the points in the abdomen (e.g., from left to right, from top to bottom) to form a "fusion sub-vector". This sub-vector reflects the distribution pattern of the feature in the abdominal space. Finally, the fusion sub-vectors of all feature dimensions are concatenated in order to form the final fused feature vector.

[0050] This step achieves "feature-level weighting" and "structured spatial fusion". It not only adjusts each feature numerically according to its importance, but also preserves the spatial arrangement information of the features, enabling the model to learn valuable spatial distribution patterns. This generates a high-quality fused feature vector that contains both importance-weighted information and spatial structure information, providing optimized input for effective training of the model in each iteration.
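The weighting and structured concatenation of step A41 can be sketched as follows. The layout is feature-major: all points for feature 1 in spatial order, then all points for feature 2, and so on, giving a fused vector of length M × D. The function name `fuse` and the toy values are hypothetical.

```python
def fuse(point_vectors, weights):
    """Weight each point's features and concatenate them feature by feature.

    point_vectors[i][j]: feature j at point i, with points in spatial order.
    weights[i][j]: relevance weight r_ij for that point and feature.
    Returns [F_1, F_2, ..., F_D] flattened, where F_j holds the weighted
    values of feature j at points 1..M.
    """
    M, D = len(point_vectors), len(point_vectors[0])
    weighted = [[weights[i][j] * point_vectors[i][j] for j in range(D)]
                for i in range(M)]
    fused = []
    for j in range(D):
        fused.extend(weighted[i][j] for i in range(M))
    return fused


# Toy example: 3 points in spatial order, 2 feature dimensions.
vectors = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
r = [[1.0, 0.5], [1.0, 0.5], [2.0, 0.5]]
fused = fuse(vectors, r)
```

With this layout, the fused dimension at index `j * M + i` corresponds to feature j at point i, which is the mapping needed later to route importance scores back to individual relevance weights.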

[0051] Step A42: Using the fused feature vector and the corresponding category label, train a machine learning model to optimize its internal weight parameters, and obtain scoring information reflecting the importance of each fused feature dimension based on the trained model.

[0052] This step aims to evaluate the model's classification ability under the current fused feature vector and to deduce the contribution of each feature dimension in the current fusion strategy. Specifically, a classification model (such as logistic regression or a gradient boosting tree) is trained using the fused feature vectors generated in the current iteration and their labels. After the model is trained, the importance score of each fused feature dimension (i.e., each position in each fused sub-vector) for the classification decision is obtained by analyzing the model's internal parameters (such as the coefficients of a linear model or the feature importance attributes of a tree model).

[0053] This step treats model training as an evaluation and feedback process. It not only pursues the model's classification accuracy but also proactively mines the features upon which the model's decisions rely, providing crucial feedback signals for the next step of optimizing the fusion strategy. This achieves interpretable evaluation from "model classification performance" to "feature fusion quality." The obtained feature importance scores quantify the contribution of each spatial-feature unit under the current fusion strategy, providing direct and objective data for dynamically adjusting fusion weights.

[0054] Step A43: Based on the scoring information, adjust the value of the relevance weight of the current iteration round as the relevance weight of the next iteration round.

[0055] The purpose of this step is to complete the optimization loop by using the importance information fed back from the model to improve the feature fusion strategy. Specifically, based on the importance score obtained in step A42, the relevance weights corresponding to each feature dimension at each original point are adjusted. For example, the gradient ascent approach can be used, adding the change in importance score to the existing weights according to a certain proportion (learning rate), thereby increasing the weights of features that contribute significantly to classification and decreasing or keeping the weights of features that contribute less significantly unchanged.

[0056] This is a key step in achieving adaptive optimization, making the relevance weights a learnable variable that can continuously evolve in response to model feedback. This ensures that the fusion strategy is always optimized in the direction of improving the model's discriminative ability. Each iteration makes the feature fusion more in line with the current data distribution and the needs of the classification task, thereby gradually approaching the optimal fusion state.

[0057] Step A5: The machine learning model that reaches the convergence condition is output as the completed classification model, and the relevance weight at this time is output as the final relevance weight.

[0058] The purpose of this step is to solidify the final result of the iterative optimization, generating a complete tool that can be directly used for evaluation of new data. Specifically, the iteration stops when the process meets the convergence condition (such as the change in weights or the improvement in model performance being less than a threshold). The machine learning model obtained at this point, which has undergone sufficient co-optimization, is the final classification model that has been trained. At the same time, the relevance weights at this point are the optimized feature fusion parameters that match the model.
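
One possible form of the convergence condition mentioned above is sketched below. The function name `has_converged`, the two tolerance values, and the use of the largest absolute weight change are assumptions chosen for illustration; the embodiment only requires that the change in weights or the improvement in model performance fall below a threshold.

```python
def has_converged(old_weights, new_weights, old_score, new_score,
                  weight_tol=1e-3, score_tol=1e-3):
    """Return True when the iteration may stop.

    old_weights / new_weights: relevance weights r_ij of two consecutive
    rounds, as nested lists indexed [point][feature dimension].
    old_score / new_score: model performance of the two rounds.
    Both tolerances are hypothetical defaults.
    """
    max_change = max(
        abs(new - old)
        for old_row, new_row in zip(old_weights, new_weights)
        for old, new in zip(old_row, new_row)
    )
    improvement = new_score - old_score
    return max_change < weight_tol or improvement < score_tol


stable = has_converged([[0.5, 0.5]], [[0.5, 0.5]], 0.80, 0.90)
still_moving = has_converged([[0.5, 0.5]], [[0.6, 0.4]], 0.80, 0.90)
```

When the function returns True, the current model and the current relevance weights are the pair that step A5 outputs together.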

[0059] In the specific model implementation, the machine learning classification model can employ at least one of several classic algorithms. To achieve better classification performance, its key parameters can be configured; the following are exemplary settings: Logistic Regression: L2 regularization is used to prevent overfitting, and the regularization strength parameter C is set to 1.0; Support Vector Machine (SVM): Uses radial basis function (RBF) as kernel function, with penalty parameter C set to 10 and kernel function coefficient γ set to 0.01; K-Nearest Neighbors (KNN): Set the number of nearest neighbors K=5, and use Euclidean distance as the similarity metric; Adaptive boosting algorithm (AdaBoost): Uses decision trees as base classifiers and sets the number of iterations to 50; Extreme Gradient Boosting (XGBoost): The learning rate is configured to be 0.1 and the maximum depth of a single tree (max_depth) is 3, in order to balance model complexity and generalization ability.

[0060] These algorithms and their parameter configurations provide a reliable and reproducible foundation for model training, and can be adapted and optimized according to specific data and task requirements in practical applications.
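
For reference, the exemplary settings listed above can be collected in a single configuration mapping, as in the sketch below. The dictionary name and its keys are hypothetical. The parameter names are assumed to follow the keyword arguments of the scikit-learn and XGBoost classifiers, and no library is imported here.

```python
# Exemplary hyperparameters from the description, grouped per algorithm.
# Parameter names are assumed to match scikit-learn / XGBoost keyword arguments.
EXAMPLE_CONFIGS = {
    "logistic_regression": {"penalty": "l2", "C": 1.0},
    "svm": {"kernel": "rbf", "C": 10, "gamma": 0.01},
    "knn": {"n_neighbors": 5, "metric": "euclidean"},
    "adaboost": {"n_estimators": 50},  # decision trees as base classifiers
    "xgboost": {"learning_rate": 0.1, "max_depth": 3},
}

for name, params in EXAMPLE_CONFIGS.items():
    print(name, params)
```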

[0061] Through the above steps, a high-performance and highly stable abdominal health status assessment system is obtained. The system includes both a powerful classifier and a data-validated feature fusion scheme, and can be applied directly to evaluate unknown samples reliably.

[0062] Through the above-described classification model training method for abdominal audio signals, this embodiment addresses the problems of mismatch between training data and evaluation objectives, and the single information dimension by constructing a temporally and spatially standardized training dataset (step A1); it solves the problems of insufficient and non-specific feature representation by extracting multi-dimensional acoustic features to construct initial feature vectors (step A2); it solves the problems of blind optimization starting point and low efficiency by initializing fusion weights based on statistical correlation (step A3); and it fundamentally solves the core technical problem of static and rigid feature fusion strategies in existing technologies, which cannot adaptively optimize to accurately capture subtle changes in health status, by designing and implementing an iterative adaptive feature fusion and weight optimization mechanism (step A4), and finally outputting a collaboratively optimized model and fusion parameters (step A5). This effectively achieves a significant improvement in the accuracy and generalization ability of the classification model for assessing abdominal health status.

[0063] In one specific embodiment, the method for obtaining the audio segment is described.

[0064] Because abdominal bowel sound signals attenuate during spatial propagation, and the physiological correlation between different acquisition points and the target observation area (such as suspected diseased intestinal segments or the core area for functional assessment) varies, the audio signals acquired from points closer to the target area typically contain more effective information directly related to the target state, resulting in a higher signal-to-noise ratio. Conversely, signals from points farther away may be subject to interference from other intestinal segment activity, abdominal wall friction sounds, or environmental noise. Treating all raw signals from all points equally would introduce unnecessary noise, dilute key information, and affect the purity of subsequent feature extraction and the accuracy of model training.

[0065] To incorporate spatial prior knowledge during the preprocessing stage and optimize the quality of the input signal, this embodiment proposes a signal-level weighted preprocessing scheme based on physical distance. The specific implementation is as follows. First, for each audio signal at each location in the training data set, a distance weight is calculated based on the physical distance between that location and a preset center point. This center point is typically determined based on the clinically relevant area. The calculated distance weight reflects the "initial confidence" or "spatial relevance strength" of the signal at that location. Subsequently, this distance weight is used to weight the raw audio signal at each location; that is, each sample point of the signal is multiplied by the weight value. This operation is performed before the signal enters the conventional preprocessing steps (such as filtering and framing).

[0066] Specifically, this step aims to achieve signal-level spatial information fusion and noise suppression. It optimizes the data processing at its source by simulating the clinical experience of doctors focusing on sounds near lesions or target areas during auscultation. In detailed implementation, the physical distance (in centimeters) between each acquisition point and the center point needs to be pre-measured or set. The weighting operation is equivalent to applying a distance-related gain coefficient to the original signal; closer distances result in higher gain (signal enhancement), while greater distances result in lower gain (signal attenuation).

[0067] Optionally, the distance weights are calculated using a Gaussian kernel function. For the i-th point, the distance weight is $q_i = \exp\!\left(-\dfrac{d_i^{2}}{2\sigma^{2}}\right)$, where $d_i$ is the distance between that point and the center point, and $\sigma \in [5\,\text{cm}, 15\,\text{cm}]$ is an adjustable attenuation coefficient.

[0068] Five sampling points (P0 to P4) are selected with the umbilicus as the central origin, and the distance $d_i$ between each point and the center is accurately measured. After signals are acquired from each location using a multi-channel device, the data from each channel (location) is multiplied by the calculated distance weight $q_i$ before preprocessing. For example, the center point P0 has a weight of 1 (full amplitude preservation), and the signal $x_1(n)$ of point P1, which is 3 cm away from the center, is multiplied by a weight less than 1 (such as 0.91), thereby reducing its amplitude contribution at the signal level. The weighted signal is then fed into subsequent standard preprocessing procedures such as amplitude normalization, pre-emphasis filtering, and frame windowing, ultimately yielding short audio segments for feature extraction at each point.
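
The Gaussian distance weighting can be sketched as follows. The function names and the default σ = 7 cm are assumptions made for illustration; this value lies inside the stated range of 5 cm to 15 cm and yields a weight of about 0.91 for a point 3 cm from the center, consistent with the example above.

```python
import math


def distance_weight(distance_cm, sigma_cm=7.0):
    """Gaussian kernel weight exp(-d^2 / (2 * sigma^2)); sigma_cm is an assumed default."""
    return math.exp(-(distance_cm ** 2) / (2.0 * sigma_cm ** 2))


def weight_signal(samples, distance_cm, sigma_cm=7.0):
    """Multiply every sample of a raw channel by that channel's distance weight."""
    q = distance_weight(distance_cm, sigma_cm)
    return [q * s for s in samples]


# P0 is the center point; P1 is 3 cm away, as in the example above.
distances_cm = {"P0": 0.0, "P1": 3.0}
weights = {name: distance_weight(d) for name, d in distances_cm.items()}
print(weights["P0"], round(weights["P1"], 2))  # 1.0 0.91
```

The weighting is applied to the raw samples, so it precedes amplitude normalization, pre-emphasis, and framing.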

[0069] In the above embodiments, weighting based on spatial priors enhances the physiological signal components of the region most relevant to the evaluation target (near the center point), while suppressing the irrelevant noise and interference that may be carried by the more distant points. This improves the overall signal-to-noise ratio of the input signal, making the subsequently extracted acoustic features (such as event rate and spectral features) more reflective of the true state of the target region. It also reduces the feature ambiguity caused by spatial clutter, thereby improving the quality and discriminative power of the constructed initial feature vector. High-quality input features help the machine learning model capture patterns that are truly relevant to the classification of abdominal health status more quickly and accurately, and relieve the model of the burden of learning how to "filter" spatial noise, thus improving training efficiency as well as the accuracy and robustness of the final classification model. The Gaussian kernel function used to calculate the weights decays continuously, which conforms to the general law that sound wave intensity attenuates with distance in biological tissues, making the weighting process more physically meaningful and physiologically reasonable. The adjustable parameter σ provides flexibility and can be adapted to the actual abdominal size or the extent of the area of interest.

[0070] In one specific embodiment, the multidimensional acoustic features extracted from the audio segment are specifically defined and described.

[0071] Abdominal bowel sounds are a complex bioacoustic signal, and their clinical assessment typically focuses on two core dimensions: first, the temporal rhythmic pattern of their occurrence (e.g., frequency, duration, interval), which directly reflects the dynamics and regularity of intestinal peristalsis; and second, the timbre characteristics of the emitted sound (e.g., pitch, sharpness or dullness, purity or noise), which indirectly reflects the state of the intestinal contents (gas-to-fluid ratio) and the tension of the intestinal wall during peristalsis. Traditional general-purpose audio feature sets (e.g., those containing only energy and fundamental frequency) cannot comprehensively and accurately characterize these features closely related to the physiological and pathological states of the intestine. This leads to a semantic gap between the features learned by the model and the clinical diagnostic goals, limiting the accuracy and interpretability of classification.

[0072] Therefore, in order to construct a feature representation that is highly correlated with abdominal health status, this solution has designed and combined a set of dedicated multidimensional acoustic features, aiming to systematically quantify the rhythm and sound quality of bowel sounds from both time and frequency domains.

[0073] Specific implementation methods include: The multidimensional acoustic features mentioned above include time-domain features and frequency-domain features.

[0074] Implementation of time-domain features: The time-domain features include at least the event rate, average duration, and average interval time obtained from bowel sound event detection. Specifically, an adaptive threshold detection algorithm is first used to identify individual bowel sound events (i.e., single audible bowel sounds) from the preprocessed audio segments. The following statistics are then calculated: 1) Event rate: the number of events detected per unit time, directly quantifying the frequency of intestinal peristalsis; 2) Average duration: the average length of all detected events, reflecting the duration of a single bowel sound; 3) Average interval time: the average time interval between the start points of consecutive events, reflecting the rhythmicity of intestinal peristalsis. These features correspond directly to the key indicators that clinicians obtain by counting and timing during auscultation.
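
A minimal sketch of the event detection and the three rhythm statistics is given below. The description does not specify the adaptive threshold rule, so the sketch assumes a threshold equal to the mean plus k standard deviations of a short-time energy envelope. The function names, the parameter `k`, and the toy envelope are hypothetical.

```python
import math


def detect_events(envelope, frame_s, k=1.0):
    """Return (start_s, end_s) pairs where the envelope exceeds mean + k * std.

    envelope: short-time energy per frame; frame_s: frame hop in seconds.
    The threshold adapts to each recording through its own mean and std.
    """
    n = len(envelope)
    mean = sum(envelope) / n
    std = math.sqrt(sum((e - mean) ** 2 for e in envelope) / n)
    threshold = mean + k * std
    events, start = [], None
    for idx, value in enumerate(envelope):
        if value > threshold and start is None:
            start = idx                                   # event begins
        elif value <= threshold and start is not None:
            events.append((start * frame_s, idx * frame_s))  # event ends
            start = None
    if start is not None:                                 # event runs to the end
        events.append((start * frame_s, n * frame_s))
    return events


def rhythm_features(events, total_s):
    """Event rate (per second), average duration, average start-to-start interval."""
    count = len(events)
    rate = count / total_s
    mean_duration = sum(end - begin for begin, end in events) / count if count else 0.0
    starts = [begin for begin, _ in events]
    gaps = [b - a for a, b in zip(starts, starts[1:])]
    mean_interval = sum(gaps) / len(gaps) if gaps else 0.0
    return rate, mean_duration, mean_interval


# Toy envelope: 12 frames of 0.5 s with two bursts (hypothetical values).
envelope = [0, 0, 5, 5, 0, 0, 0, 0, 5, 0, 0, 0]
events = detect_events(envelope, frame_s=0.5)
rate, mean_duration, mean_interval = rhythm_features(events, total_s=6.0)
print(events)  # [(1.0, 2.0), (4.0, 4.5)]
```

For this toy input the average duration is 0.75 s and the average interval between event start points is 3.0 s.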

[0075] Implementation of frequency domain features: The frequency domain features include at least Mel-frequency cepstral coefficients (MFCC), spectral centroid, spectral bandwidth, and the power ratio of frequency bands calculated by dividing the low, mid, and high frequency bands. Specifically: 1) MFCC: Extracting its coefficients can effectively characterize the short-time power spectrum envelope of the audio signal, simulate the characteristics of human hearing, and encode the "timbre" of bowel sounds. 2) Spectral centroid: Describes the central position of the spectral energy distribution, which can reflect the overall pitch of the sound (e.g., high-pitched bowel sounds may indicate intestinal obstruction or spasm). 3) Spectral bandwidth: Describes the degree of diffusion of the spectrum around the centroid, which can reflect the concentration or "noise" of the sound. 4) Power ratio of frequency bands: Dividing the spectrum into physiologically meaningful sub-bands (e.g., 0-1kHz for low frequency, 1-2kHz for mid frequency, and 2-4kHz for high frequency), and calculating the proportion of energy in each frequency band to the total energy and its ratio (e.g., high / low frequency power ratio). This classification helps to distinguish between bowel sounds that are predominantly low-frequency rumbling and those that are predominantly high-frequency bubbling or friction sounds, the latter of which may have different clinical significance.
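
The spectral centroid, spectral bandwidth, and band power ratios can be illustrated with the sketch below, which uses a direct discrete Fourier transform so that it needs only the standard library. The 8 kHz sampling rate, the power weighting of the centroid, and the function names are assumptions. MFCC extraction is omitted.

```python
import cmath
import math


def power_spectrum(frame, sample_rate):
    """Naive DFT of one frame; returns bin frequencies and power up to Nyquist."""
    n = len(frame)
    freqs, power = [], []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))
        freqs.append(k * sample_rate / n)
        power.append(abs(coeff) ** 2)
    return freqs, power


def spectral_features(frame, sample_rate, bands):
    """Power-weighted centroid, bandwidth, and per-band share of total power.

    Assumes the frame is not silent (total power > 0).
    """
    freqs, power = power_spectrum(frame, sample_rate)
    total = sum(power)
    centroid = sum(f * p for f, p in zip(freqs, power)) / total
    bandwidth = math.sqrt(
        sum((f - centroid) ** 2 * p for f, p in zip(freqs, power)) / total)
    ratios = {name: sum(p for f, p in zip(freqs, power) if lo <= f < hi) / total
              for name, (lo, hi) in bands.items()}
    return centroid, bandwidth, ratios


# Sub-bands from the description: 0-1 kHz, 1-2 kHz, 2-4 kHz (upper edges exclusive here).
BANDS = {"low": (0.0, 1000.0), "mid": (1000.0, 2000.0), "high": (2000.0, 4000.0)}
sample_rate = 8000  # assumed sampling rate
frame = [math.sin(2 * math.pi * 500 * t / sample_rate) for t in range(64)]
centroid, bandwidth, ratios = spectral_features(frame, sample_rate, BANDS)
```

For this 500 Hz test tone the centroid sits at about 500 Hz and almost all power falls in the low band. A high-to-low power ratio can then be formed as `ratios["high"] / ratios["low"]`.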

[0076] For example, a 2-minute audio clip, after being segmented into frames, results in approximately 240 short segments. A 28-dimensional feature vector is calculated for each segment, where: The frequency domain characteristics (18 dimensions) include the static coefficients of the 13-dimensional MFCC, as well as the spectral centroid, bandwidth, flatness, roll-off point, and power ratios of the low, mid, and high frequency bands (defined as 0-1kHz, 1-2kHz, and 2-4kHz).

[0077] The temporal features (10 dimensions) include statistical features such as root mean square energy, zero-crossing rate, kurtosis, and skewness, as well as core rhythmic features such as event rate, average duration, and average interval obtained through adaptive threshold detection.

[0078] To further capture dynamic characteristics, the first and second order difference coefficients of MFCC can also be calculated as supplementary features, so that the MFCC features include 13-dimensional MFCC static coefficients, 13-dimensional first order difference coefficients (ΔMFCC), and 13-dimensional second order difference coefficients (ΔΔMFCC).

[0079] Subsequently, by calculating the statistical mean of each feature of all segments in a complete audio recording, a fixed-dimensional feature vector for this acquisition is formed, providing input for model training.

[0080] In the above embodiments, the extracted feature set is not a set of general acoustic parameters, but is designed closely around the key clinical auscultation points of the specific signal "bowel sounds". The time-domain event features directly quantify the peristaltic rhythm of clinical concern, while the frequency-domain features systematically describe the sound quality characteristics. As a result, each dimension of the feature vector carries a clear physiological or pathological meaning, which greatly enhances the interpretability of the model. In addition, by combining the time-domain rhythm features with multi-dimensional frequency-domain descriptors (including MFCC, spectral shape parameters, and specific frequency band energy), a digital "portrait" of bowel sounds is created from multiple complementary perspectives. This makes it possible to distinguish different intestinal functional states (such as paralysis, normal, and hyperactivity) more comprehensively and finely, overcoming the insufficient representation ability of single-type features. Because these features are sensitive to health status and highly discriminative, the correlation-based weight allocation and fusion mechanism can be fully effective, guiding the model to focus on the truly critical information.

[0081] In one specific embodiment, the method for obtaining the initial value of the correlation weight is described in detail.

[0082] In training classification models based on multi-point feature fusion, directly assigning initial weights to the massive feature dimensions (number of points × feature dimension) is a crucial starting point. Simply using uniform or random initialization fails to reflect the inherent, varying degrees of discriminative contribution of different features to the classification task of "abdominal health status." In the training data, some feature dimensions (such as the specific frequency energy of a point) may have a clear linear positive or negative correlation with the class label; while other features (such as complex spectral shape changes) may have a non-linear, but equally important, dependency on the label. Using a single correlation metric cannot accurately capture these two different association patterns, potentially leading to inaccurate initial weight allocation, which in turn affects the convergence speed and final performance of subsequent iterative optimizations.

[0083] Therefore, in order to scientifically and accurately assess the initial importance of each feature dimension, this scheme proposes a method that adaptively selects the Pearson correlation coefficient or mutual information entropy as the correlation metric based on the type of relationship between the feature and the label, and allocates the initial weight accordingly.

[0084] Specific implementation methods include: First, for each feature dimension at each location, calculate the correlation metric between its value and the category label. This step requires distinguishing the type of relationship between features and labels: For feature dimensions that are linearly related to the label, the Pearson correlation coefficient is used as the measure of correlation.

[0086] Calculation expression: $\rho_{ij} = \dfrac{\operatorname{cov}(f_{ij}, y)}{\sigma_{f_{ij}}\,\sigma_{y}}$, where $\rho_{ij}$ is the Pearson correlation coefficient between the j-th feature dimension at the i-th point and the category label y; $\operatorname{cov}(f_{ij}, y)$ is the covariance between the feature $f_{ij}$ and the label y; $\sigma_{f_{ij}}$ is the standard deviation of the feature $f_{ij}$; and $\sigma_{y}$ is the standard deviation of the label y.

[0087] Physical meaning: $\rho_{ij}$ takes values in the range [-1, 1]. The closer its absolute value is to 1, the stronger the linear correlation between the feature and the label. Positive values indicate positive correlation, and negative values indicate negative correlation.

[0088] For feature dimensions that have a non-linear dependency on labels, mutual information entropy is used as the correlation metric.

[0089] Calculation expression: $I(f_{ij}; y) = \sum_{f_{ij} \in F} \sum_{y \in Y} p(f_{ij}, y)\, \log \dfrac{p(f_{ij}, y)}{p(f_{ij})\, p(y)}$, where $I(f_{ij}; y)$ is the mutual information entropy between the j-th feature dimension at the i-th point and the category label y; $p(f_{ij}, y)$ is the joint probability distribution of the feature $f_{ij}$ and the label y; $p(f_{ij})$ and $p(y)$ are the marginal probability distributions of the feature $f_{ij}$ and the label y, respectively; and F and Y are the sets of feature values and label values, respectively.

[0090] Physical meaning: $I(f_{ij}; y)$ is always non-negative. It quantifies how much the uncertainty about the label y decreases once the feature $f_{ij}$ is known. The larger the value, the more information about the label the feature contains, regardless of whether the relationship is linear or non-linear.
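
The two correlation metrics can be compared on a small example, as sketched below. The function names and toy data are assumptions. The mutual information estimate uses empirical frequencies and therefore assumes the feature has already been discretized (a continuous feature would first have to be binned).

```python
import math
from collections import Counter


def pearson(xs, ys):
    """Pearson correlation coefficient; assumes neither sequence is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (std_x * std_y)


def mutual_information(xs, ys):
    """Mutual information in nats between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    count_x, count_y = Counter(xs), Counter(ys)
    return sum(
        (c / n) * math.log((c / n) / ((count_x[x] / n) * (count_y[y] / n)))
        for (x, y), c in joint.items()
    )


# Linear case: the label rises with the feature value.
linear_feature, linear_labels = [1.0, 2.0, 3.0, 4.0], [0, 0, 1, 1]
# Non-linear case: the label is 1 only at the extremes (U-shaped dependency).
u_feature, u_labels = [-2, -1, 1, 2], [1, 0, 0, 1]

r_linear = pearson(linear_feature, linear_labels)
r_u = pearson(u_feature, u_labels)
mi_u = mutual_information(u_feature, u_labels)
```

In the U-shaped case the Pearson coefficient is 0 although the feature fully determines the label, while the mutual information equals log 2 (about 0.693 nats), which is why a non-linear dependency calls for the second metric.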

[0091] Secondly, based on the calculated correlation metric, initial values of the correlation weights are assigned to each feature dimension, with feature dimensions having higher correlation being assigned greater weights.

[0092] Typically, the absolute value of the Pearson correlation coefficient $|\rho_{ij}|$ or the mutual information entropy value $I(f_{ij}; y)$ of each feature dimension is normalized (e.g., using the Softmax function) into a probability distribution that sums to 1, which is used as the initial weight $r_{ij}^{(0)}$. Thus, the higher the correlation metric value, the greater the initial weight.

[0093] Example explanation: For the "MFCC mean" feature at a certain location, if it shows a roughly linear trend with the "postoperative recovery status" label, then calculate its Pearson correlation coefficient. If the relationship between the other feature, "spectral flatness," and the label is complex and non-linear, then calculate its mutual information entropy.

[0094] After obtaining the correlation metrics for all feature dimensions, they are converted into initial weights using the Softmax normalization function. Assume the correlation metric calculated for a certain feature dimension (after appropriate scaling for comparability) is $s_{ij}$. Then its initial weight $r_{ij}^{(0)}$ can be calculated as $r_{ij}^{(0)} = \dfrac{\exp(s_{ij})}{\sum_{i'=1}^{M} \sum_{j'=1}^{D} \exp(s_{i'j'})}$, where the denominator is the sum of the exponentiated metrics over all feature dimensions of all points, i.e., the normalization term; $i'$ is the summation index over points, ranging from 1 to M (the total number of points, such as M = 5); $j'$ is the summation index over feature dimensions, ranging from 1 to D (the total feature dimension, such as D = 28); and $s_{ij}$ is the correlation metric of the j-th feature dimension at the i-th point.
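
The Softmax normalization over all point and feature combinations can be sketched as follows. The function name and the toy scores are assumptions. Subtracting the maximum score before exponentiation is a standard numerical safeguard that does not change the result.

```python
import math


def softmax_initial_weights(scores):
    """Convert correlation metrics s_ij into initial weights r_ij^(0).

    scores: nested list indexed [point][feature dimension].
    The normalization runs over ALL entries, so the weights of all
    points and dimensions together sum to 1.
    """
    largest = max(max(row) for row in scores)
    exps = [[math.exp(s - largest) for s in row] for row in scores]
    total = sum(sum(row) for row in exps)
    return [[e / total for e in row] for row in exps]


# Toy metrics for M = 2 points and D = 2 feature dimensions (hypothetical values).
scores = [[0.9, 0.1], [0.5, 0.5]]
weights = softmax_initial_weights(scores)
```

Entries with a larger correlation metric receive a larger initial weight, and equal metrics receive equal weights.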

[0095] In this embodiment, by distinguishing between linear and nonlinear relationships and selecting the most appropriate statistical measure, the initial importance of each feature dimension can be assessed more realistically and comprehensively. This provides a high-quality starting point that closely approximates the real data structure for subsequent iterative optimization, avoiding the detours that may result from blind initialization and effectively accelerating the convergence process of model training and weight updates. Combining Pearson correlation coefficient and mutual information entropy allows the scheme to capture both obvious linear trend features (such as a monotonically increasing energy in a certain frequency band with recovery) and hidden nonlinear discriminative features (such as a complex spectral pattern appearing only under specific conditions), ensuring that no potentially discriminative information is overlooked. The calculated initial correlation measure provides insights for clinical experts or researchers, revealing which acoustic characteristics at which points are most closely related to abdominal health at the data level, enhancing the transparency and credibility of the entire model. Furthermore, accurate initial weight allocation means that the model already tends to focus on more discriminative features before the iterative optimization cycle begins. This establishes favorable initial conditions for the subsequent "feature weighting-model training-weight adjustment" closed loop, enabling the entire adaptive fusion mechanism to operate on a better track, thereby ultimately improving the overall accuracy and robustness of the classification model.

[0096] In one specific embodiment, the model training method in step A42 is specifically defined and explained.

[0097] After generating high-dimensional fused feature vectors through the fusion mechanism (e.g., 5 points × 28-dimensional features = 140 dimensions), the feature space inevitably contains a large number of redundant features, noisy features, and features that contribute little to the classification task. Directly inputting all features into the model for training would lead to an excessive number of model parameters, increasing computational complexity and training time. Furthermore, the model might overemphasize noise and random patterns in the training data, resulting in overfitting and reduced generalization ability on new data. Therefore, a mechanism is urgently needed to automatically identify and select the truly important subset of core features for the classification task during model training.

[0098] Therefore, in order to simultaneously achieve feature selection and model optimization during the training phase and improve the efficiency and generalization of the model, this solution specifically proposes to use a machine learning model with L1 regularization constraints for training in step A42.

[0099] Specific implementation methods include: In step A42, a machine learning model with an L1 regularization constraint is used for training. Its core operation is to sparsify the feature weights inside the model so as to select the top K fused feature dimensions (point-feature combinations) with the highest contribution for final classification, where K takes a value in the range [5, 15].

[0100] Model construction: The loss function of the trained machine learning model (such as logistic regression, linear support vector machine, or linear model as a base learner of an ensemble model) is constructed to include two terms: an empirical loss term (such as mean squared error, log loss) and an L1 regularization term (the L1 norm of the model weight vector).

[0101] General loss function form: $L(w) = \dfrac{1}{N} \sum_{n=1}^{N} \ell(\hat{y}_n, y_n) + \lambda \lVert w \rVert_1$, where L(w) is the total loss function of the model, which is the objective to be minimized; w is the model's weight vector, whose dimension equals the dimension of the fused feature vector (e.g., 140 dimensions), with each value being the weight of one fused feature dimension; $\ell(\hat{y}_n, y_n)$ is the empirical loss function measuring the difference between the model prediction $\hat{y}_n$ and the true label $y_n$; N is the number of training samples; and λ is the regularization strength coefficient, a hyperparameter greater than 0 that controls the influence of the L1 regularization term on the total loss. The larger the value of λ, the stronger the pressure toward weight sparsity.

[0102] $\lVert w \rVert_1 = \sum_{j=1}^{D} \lvert w_j \rvert$ is the L1 norm of the weight vector w, i.e., the sum of the absolute values of all weights, with D here denoting the dimension of w.

[0103] Sparsity and feature selection mechanism: During model training (i.e., minimizing the above loss function), the L1 regularization term drives the weights $w_j$ of many unimportant features exactly to zero. Ultimately, only the features that contribute significantly to predicting the target y retain non-zero weights. By setting an appropriate value of λ, the number of non-zero weights can be controlled, thereby selecting the top K (e.g., K = 12) most important feature dimensions, which correspond to the point-feature combinations that contribute the most in the original fused feature vector.

[0104] For example, a linear model with an L1 penalty term is constructed, where X denotes the feature matrix composed of the fused feature vectors of all training samples, w is the weight vector to be learned by the model, y is the corresponding class label vector, and λ is the regularization strength coefficient, with its value range set to [0.01, 0.1]. The model is solved using optimization algorithms such as coordinate descent, which zeroes out the weights of many unimportant features. Experiments show that after this process, the original 140-dimensional fused features can be effectively sparsified, retaining only approximately the top K = 12 effective feature dimensions as input to the final classifier (such as XGBoost) for training and prediction. This step is completed automatically inside model training, without the need for an additional feature selection algorithm.
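
The sparsifying effect of the L1 penalty under coordinate descent can be illustrated with the sketch below. It assumes a least-squares empirical loss with the objective (1/(2N))·‖y − Xw‖² + λ‖w‖₁, which is one common scaling and is not confirmed by the description. The function names and the toy data are hypothetical.

```python
def soft_threshold(value, lam):
    """Shrink value toward zero by lam; values within [-lam, lam] become exactly 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, sweeps=100):
    """Minimize (1/(2N)) * ||y - Xw||^2 + lam * ||w||_1 by cyclic coordinate descent."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(sweeps):
        for j in range(d):
            rho, z = 0.0, 0.0
            for row, target in zip(X, y):
                # Prediction from all other coordinates, excluding feature j.
                partial = sum(row[k] * w[k] for k in range(d) if k != j)
                rho += row[j] * (target - partial)
                z += row[j] ** 2
            w[j] = soft_threshold(rho / n, lam) / (z / n) if z > 0 else 0.0
    return w


def top_k_features(w, k):
    """Indices of at most k non-zero weights, largest magnitude first."""
    nonzero = [j for j in range(len(w)) if w[j] != 0.0]
    return sorted(nonzero, key=lambda j: abs(w[j]), reverse=True)[:k]


# Toy data: y depends strongly on feature 0, weakly on feature 2, not on feature 1.
X = [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
y = [2.05, 1.95, -2.05, -1.95]   # equals 2.0 * feature0 + 0.05 * feature2
w = lasso_coordinate_descent(X, y, lam=0.1)
selected = top_k_features(w, k=2)
```

With λ = 0.1 the weak and irrelevant features are driven exactly to zero and only feature 0 is selected, with its weight shrunk from 2.0 to about 1.9. A larger λ removes more features and a smaller λ keeps more, which is how the number K of retained dimensions is controlled.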

[0105] In this embodiment, L1 regularization imposes constraints on model complexity, forcing the model to learn simpler and more essential feature patterns. This effectively avoids the model over-memorizing noise and irrelevant details in the training data, thereby significantly improving the model's prediction accuracy and stability on unseen test data or new patient data. The feature selection process is seamlessly embedded into the objective function of model training. The model automatically completes the screening of massive fused features while optimizing weights. This is more efficient and direct than the traditional two-stage process of first extracting features and then selecting features independently, ensuring the optimal match between the selected features and the model. In addition, by sparsifying, the feature dimension is reduced from high dimension (e.g., 140 dimensions) to a very small core subset (e.g., 12 dimensions). This not only greatly reduces the computational load during model inference but also reduces the overhead of model storage and transmission, making this technology easier to deploy on edge devices or mobile terminals with limited computing resources. Finally, the model relies on only a few non-zero weight features for decision-making. This allows clinicians or researchers to visually examine which acoustic features at which locations (e.g., the third dimension of MFCC at location P2, the event rate at location P0) are driving classification decisions, greatly enhancing the transparency and credibility of the model's decision-making process and helping to gain clinical trust.

[0106] In one specific embodiment, the specific method for dynamically adjusting the correlation weight in step A43 is specifically defined and explained.

[0107] During the iterative optimization process in step A4, the "feature importance score information" returned by the model trained in step A42 reflects the model's assessment of the importance of each fused feature dimension under the feature weight settings of the current iteration. However, this assessment is a phased, local perception based on the current model state. If this score information is directly used as a simple replacement for the weights in the next round, the weight updates may fluctuate drastically due to the randomness, insufficient convergence, or local optima during model training, thus disrupting the stability of the optimization process. Furthermore, the true importance of features often needs to be gradually revealed through multiple iterations, requiring a smooth, cumulative learning process to approach the optimal value.

[0108] Therefore, in order to achieve robust and incremental optimization of relevance weights, and to ensure that the weight update process can respond to the latest evaluation of the model while maintaining the continuity of historical information, this solution proposes a dynamic weight adjustment mechanism based on incremental updates.

[0109] Specific implementation: In step A43, the weights are dynamically adjusted as follows. Based on the scoring information, determine the change in feature importance ΔI_ij of the j-th feature dimension at the i-th point. Then update the relevance weight r_ij(t) of the j-th feature dimension at the i-th point in the t-th iteration according to the expression r_ij(t+1) = r_ij(t) + η·ΔI_ij, obtaining the relevance weight r_ij(t+1) of the j-th feature dimension at the i-th point in the (t+1)-th iteration, where η is the learning rate, and i, j and t are all positive integers.

[0110] Specifically, the change in feature importance ΔI_ij of the j-th feature dimension at the i-th point quantifies how much the current model considers the importance of a feature to have changed relative to the previous round (or the initial state). This can be calculated based on the direct feature importance score provided by the model, or by comparing the importance scores between the current round and the previous round.

[0111] η is a constant hyperparameter greater than 0. It controls the magnitude by which the weights are adjusted based on the current evaluation information. A smaller η makes the updates more moderate, which is beneficial for stable convergence but may be slower; a larger η allows for rapid updates but may introduce oscillations.

[0112] This update mechanism is essentially a simplified application of the gradient ascent / descent concept, designed to fine-tune feature weights in a direction that enhances their "model-aware importance." In each iteration, the system does not completely discard the old relevance weights, but adjusts them based on the new evidence ΔI_ij from the model feedback, with a controllable step size η. If the model considers a certain feature to be more important at the moment (ΔI_ij > 0), its weight increases; otherwise, it decreases. Through multiple iterations, the weight values will gradually converge to a relatively stable state, at which point the model's evaluation of features and the weight allocation reach a dynamic balance.

[0113] For example, suppose that in round t the relevance weight of the "spectral centroid" feature at a certain point is r_ij(t) = 0.8. After the current round of model training, the importance score of this feature has improved significantly, and ΔI_ij = 0.1 is calculated. Taking η = 0.5, the updated weight is r_ij(t+1) = 0.8 + 0.5 × 0.1 = 0.85. This feature will account for a slightly higher proportion in the next round of feature fusion.
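The incremental update and the numerical example above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the function names are hypothetical, and computing ΔI as the difference between the current and previous round's importance scores is only one of the two options mentioned in the text.

```python
def importance_change(current_scores, previous_scores):
    """Delta I per feature, taken here as current score minus previous score."""
    return [c - p for c, p in zip(current_scores, previous_scores)]


def update_relevance_weights(weights, deltas, eta):
    """Apply r(t+1) = r(t) + eta * delta_I to every relevance weight."""
    if eta <= 0:
        raise ValueError("eta must be greater than 0")
    return [r + eta * d for r, d in zip(weights, deltas)]


# Worked example from the text: r = 0.8, delta_I = 0.1, eta = 0.5.
updated = update_relevance_weights([0.8], [0.1], eta=0.5)
print(round(updated[0], 6))  # 0.85
```

Because the old weight is kept and only shifted by η·ΔI, a single noisy importance score moves the weight by a bounded amount instead of replacing it outright.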

[0114] In this embodiment, incremental updates are used instead of direct replacement, effectively avoiding drastic jumps in weight values caused by accidental biases or noise in a single round of model evaluation. This ensures the numerical stability of the entire iterative optimization process, allowing the weights to smoothly approach the optimal configuration. The model training results (importance scores) are fed back to the feature fusion weights in real time, forming a closed-loop control system of "evaluation-adjustment-re-evaluation." This makes the feature fusion strategy no longer static or based solely on initial statistics, but rather capable of dynamic evolution and self-improvement based on the model's deepening understanding during training.

[0115] Example 2 In existing technologies, the accuracy of automatic abdominal health status assessment methods based on bowel sounds is limited by two key aspects: First, at the feature construction level, features extracted from different anatomical locations are typically simply concatenated or averaged, failing to differentiate based on the objective correlation between each feature and health status, resulting in insufficient discriminative power of the input model's feature vectors. Second, at the model application level, there is a disconnect between the general model and the front-end feature processing workflow; the model is not optimized for the feature structure generated by a specific workflow, preventing the full realization of its performance potential.

[0116] In view of this, this embodiment provides an application method for a classification model of abdominal audio signals. The aim is to provide a highly discriminative input signal for the classification model through a standardized application process that integrates feature optimization weighting and structured fusion, and to ensure that the model is highly matched with it, thereby significantly improving the accuracy and objectivity of the classification.

[0117] Specifically, in the application phase, this process utilizes quantified correlation weights learned from historical data to intelligently weight and spatially restructure multidimensional features from different locations on the abdomen, generating a fusion feature vector rich in discriminative information. This vector is then input into a machine learning model specifically trained to process such vectors, thereby achieving accurate classification of abdominal health status.

[0118] Referring to Figure 2, the method includes steps B1 to B5.

[0119] Step B1: Under preset acquisition conditions, collect bowel sound signals from at least two different points on the abdomen of the subject to be tested, and obtain the acquisition signals at each point.

[0120] The purpose of this step is to acquire raw audio data containing spatial distribution information, laying the foundation for comprehensive analysis. In practice, a standardized acquisition protocol is used to ensure data consistency. For example, the patient can be asked to lie supine, and audio can be acquired using a medical electronic stethoscope at at least two predefined points on the abdomen (such as the umbilicus and the left and right lower quadrants). Each acquisition session is fixed in duration (e.g., 2 minutes), and the sampling rate is set to 44.1 kHz to ensure signal quality and cover sufficient physiological activity cycles. The acquired signals are stored in a structured format (such as a .wav file containing the patient ID and timestamps).

[0121] Step B2: Preprocess the acquired signals from each location to obtain audio segments from each location, and extract multidimensional acoustic features from the audio segments from each location to construct the initial feature vector for each location.

[0122] This step aims to transform the raw audio into standardized, computable numerical features. First, consistent preprocessing operations are performed on the signal at each point: amplitude normalization (linearly mapping the signal amplitude to the [-1, 1] interval); pre-emphasis filtering (a first-order FIR filter that enhances high-frequency components, with the difference equation x′(n) = x(n) - a·x(n-1), where the pre-emphasis coefficient a ∈ [0.9, 1.0)); and frame-by-frame windowing (e.g., a frame length N of 20 ms to 40 ms, a frame shift H of 1/4 to 1/2 of the frame length, and a Hanning window w[n] = 0.5 - 0.5·cos(2πn / (N-1)), with N counted in samples in the window formula). This yields a series of short-time stationary audio segments. Subsequently, predefined multidimensional acoustic features are extracted from each audio segment. These features need to comprehensively cover the key characteristics of bowel sounds. From the time domain, root mean square energy and zero-crossing rate are extracted, and event rate and average duration are calculated using event detection algorithms. From the frequency domain, Mel frequency cepstral coefficients and their differences, spectral centroid, spectral bandwidth, and the power ratios of the low, mid, and high frequency bands are extracted. Finally, by calculating the statistical mean of each feature over all audio segments in a complete acquisition, an initial feature vector of fixed dimension is constructed for each location.
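The preprocessing chain described above can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions: peak scaling is used as one way to map the amplitude into [-1, 1], the coefficient a = 0.97 is one value within the stated range, and the test tone, the 8 kHz sample rate, and all function names are hypothetical.

```python
import math


def normalize_amplitude(signal):
    """Scale the signal linearly so that its peak absolute value is 1."""
    peak = max(abs(x) for x in signal)
    return [x / peak for x in signal] if peak > 0 else list(signal)


def pre_emphasis(signal, a=0.97):
    """First-order FIR filter x'(n) = x(n) - a * x(n-1); first sample is kept."""
    return [signal[0]] + [signal[n] - a * signal[n - 1] for n in range(1, len(signal))]


def hann_window(length):
    """w[n] = 0.5 - 0.5 * cos(2*pi*n / (N - 1)) for n = 0 .. N-1."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (length - 1)) for n in range(length)]


def frame_signal(signal, frame_len, hop):
    """Split the signal into overlapping frames and window each frame."""
    window = hann_window(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frames.append([signal[start + n] * window[n] for n in range(frame_len)])
    return frames


# Hypothetical 1 s, 200 Hz test tone at a low sample rate to keep this fast.
sample_rate = 8000
signal = [0.3 * math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(sample_rate)]
frame_len = sample_rate * 25 // 1000   # 25 ms frame -> 200 samples
hop = frame_len // 2                   # frame shift of 1/2 of the frame length
frames = frame_signal(pre_emphasis(normalize_amplitude(signal)), frame_len, hop)
print(len(frames), len(frames[0]))     # 79 200
```

Per-segment features such as RMS energy or zero-crossing rate would then be computed on each entry of `frames` and averaged over the acquisition.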

[0123] Step B3: Using preset relevance weights, weight the initial feature vectors of each point to obtain weighted feature vectors for different points.

[0124] The core improvement in this step lies in introducing data-driven prior knowledge to intelligently filter and enhance features. The "relevance weight" is a quantified value of the importance of each feature dimension in distinguishing different abdominal health states, calculated during the model training phase by analyzing a large-scale training dataset. In application, this weight vector is directly multiplied element-wise with the initial feature vectors obtained in step B2. For example, if the "spectral centroid" feature at a certain point is assigned a high relevance weight, that feature value will be significantly enhanced after weighting; conversely, features with weak discriminative power will be relatively weakened. This operation significantly improves the signal-to-noise ratio and discriminative power of the feature vectors, enabling subsequent models to more clearly capture key patterns.
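The element-wise weighting described above can be sketched as follows. It assumes one weight per point and per feature dimension, stored as nested lists indexed [point][feature]; the function name and the toy values are hypothetical.

```python
def apply_relevance_weights(initial_vectors, relevance_weights):
    """Multiply each point's feature vector element-wise by its weight vector.

    Both arguments are lists indexed [point][feature].
    """
    weighted = []
    for features, weights in zip(initial_vectors, relevance_weights):
        if len(features) != len(weights):
            raise ValueError("feature and weight vectors must have equal length")
        weighted.append([x * w for x, w in zip(features, weights)])
    return weighted


# Two points, two features: a weight above 1 enhances a feature,
# a weight of 0 removes it from the fused representation.
initial = [[2.0, 4.0], [1.0, 3.0]]
weights = [[0.5, 1.5], [1.0, 0.0]]
weighted = apply_relevance_weights(initial, weights)
print(weighted)  # [[1.0, 6.0], [1.0, 0.0]]
```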

[0125] Step B4: For each feature dimension in the multidimensional acoustic features, arrange the feature values of the weighted feature vectors of each point in that feature dimension according to the spatial order of the points, combine them into a fusion sub-vector corresponding to that feature dimension, and then concatenate the fusion sub-vectors corresponding to all feature dimensions in sequence to form a fusion feature vector.

[0126] The key improvement in this step lies in creating a feature representation method that preserves spatial structural information. Specifically, for each dimension of the acoustic feature (e.g., event rate), the weighted feature values of all points in that dimension are extracted and arranged strictly according to the actual anatomical spatial order of the points (e.g., by point numbers P0, P1, P2, etc.), forming a "fusion sub-vector" describing the spatial distribution pattern of that feature in the abdomen. Subsequently, the fusion sub-vectors corresponding to all feature dimensions are concatenated according to the order of the feature dimensions themselves, ultimately forming a long, structured fusion feature vector. For example, if there are 5 points and 28 features, the final vector dimension is 140, but its internal structure is clear, encoding the spatial distribution information of the features.
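The structured fusion described above can be sketched as follows, using the 5-point, 28-feature example. The function name `fuse_features` and the synthetic values (100·i + j for point i and feature j, chosen so the layout is easy to read) are hypothetical.

```python
def fuse_features(weighted_vectors):
    """Build the fusion vector from weighted vectors indexed [point][feature].

    For each feature dimension, the values of all points are taken in point
    order (P0, P1, ...) to form a fusion sub-vector; the sub-vectors are then
    concatenated in feature order.
    """
    n_features = len(weighted_vectors[0])
    if any(len(v) != n_features for v in weighted_vectors):
        raise ValueError("all points must have the same number of features")
    fused = []
    for j in range(n_features):
        fused.extend(point_vector[j] for point_vector in weighted_vectors)
    return fused


# 5 points x 28 features; value 100 * i + j marks point i and feature j.
weighted = [[100 * i + j for j in range(28)] for i in range(5)]
fused = fuse_features(weighted)
print(len(fused))   # 140
print(fused[:5])    # feature 0 at P0..P4: [0, 100, 200, 300, 400]
```

With this layout, the value of feature j at point i sits at index j · 5 + i, so each run of five consecutive entries is the spatial profile of one feature.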

[0127] Step B5: Input the fused feature vector into a preset classification model and obtain the classification result output by the classification model, thereby obtaining the health status information of the abdomen of the subject to be tested. The classification model is obtained by training on the fused feature vectors corresponding to the training data in the training dataset that corresponds to the acquisition conditions. Each set of training data records the abdominal bowel sounds collected from different subjects under the acquisition conditions and is labeled with a category label representing the health status of the abdomen. The audio signals in each set of training data are audio signals collected from at least two different points on the same abdomen.

[0128] This step performs the final classification decision, and its improvement lies in ensuring optimal matching between the model and the front-end processing flow. The classification model used is not a general-purpose model, but rather one specifically trained using the same fused feature vectors generated through steps identical to those in this application (i.e., B1-B4) as training samples. This means that the decision boundary learned by the model is specifically suited for interpreting this weighted and spatially structured feature input. Inputting the fused feature vectors generated in step B4 into this dedicated model allows it to output the corresponding abdominal health status classification result (e.g., category labels such as "postoperative ileus," "preliminary functional recovery," and "normal function"). This end-to-end collaborative design ensures full utilization of the model's performance.

[0129] This embodiment achieves intelligent purification of feature signals by introducing correlation weights (step B3), effectively enhancing the discriminative information of the input model. Through structured fusion (step B4), spatial distribution patterns are encoded in the features, enabling the model to make more refined comprehensive judgments. Furthermore, by adopting a dedicated model that is strongly coupled with the front-end process (step B5), the optimal performance of the overall system is ensured. Thus, by providing higher-quality and more structured input signals, the dedicated model is driven to achieve a systematic improvement in the accuracy and objectivity of abdominal health status assessment.

[0130] Optionally, the classification model described above can be trained using the method described in Embodiment 1 above, in order to improve training efficiency.

[0131] In one specific embodiment, the method for obtaining the classification result output by the classification model is described.

[0132] When inputting abdominal audio signals into a classification model for prediction, directly outputting a single category label for the entire audio file (e.g., a 2-minute audio file) has inherent flaws. Bowel sounds are inherently non-stationary physiological signals; their activity fluctuates within a single acquisition period, potentially including quiet periods, active periods, and segments that may be interfered with by transient noises (such as clothing friction or brief bowel sounds). If a final judgment is made based solely on global features or a single forward propagation of the entire file by the model, it is impossible to effectively mitigate the random errors caused by these internal fluctuations and transient interferences, leading to unstable evaluation results and poor repeatability. This does not align with the decision-making logic in clinical practice, where doctors form an overall judgment by comprehensively listening to multiple bowel sound events over a period of time.

[0133] Therefore, to improve the robustness, reliability, and clinical consistency of automated assessment results, this proposal suggests a two-tiered assessment mechanism. The core of this mechanism lies in using a complete audio acquisition file as the basic unit of clinical assessment. First, each short audio segment constituting the file is finely classified; then, the overall decision for this acquisition is made by combining the classification results of all segments.

[0134] The specific implementation involves two logical levels: Segment-level classification: All short segments (e.g., about 240) obtained after preprocessing a complete audio signal are input one by one into the trained classification model to obtain an independent classification result for each segment, i.e., segment-level classification result.

[0135] File-level decision (majority voting): Statistically analyze all segment-level classification results generated in the same acquisition, and determine the final and unique category label of the audio file acquired in that acquisition based on the preset majority rule (e.g., the category with the most votes).

[0136] In this way, the model simulates the overall thought process of a clinician after listening to a bowel sound and making a comprehensive judgment. It effectively aggregates detailed information from the segment analysis and smooths out random errors, making the final output highly consistent with the clinical decision-making scenario.

[0137] The specific implementation of segment-level classification is as follows: In step B2, the original acquired signal has been preprocessed and segmented into multiple short audio segments (for example, 2 minutes of audio is divided into approximately 240 46ms frames). In step B5, after processing by steps B2-B4, each segment generates an independent fused feature vector. This vector is input into the classification model, which then outputs a predicted category for that segment (segment-level classification result). In this way, a complete acquisition will yield hundreds of independent, fine-grained state judgments at various time points.

[0138] The specific implementation of file-level voting involves: after all segments have been predicted, the frequency of each category appearing in all segments is counted. Based on the "majority pass" principle, the category with the highest frequency is determined as the final category label for this acquisition (i.e., the audio file). The "preset majority pass condition" can be further refined; for example, it can require the highest number of votes to exceed a certain percentage of the total number of segments (e.g., >50%), otherwise it can be marked as "insufficient confidence" or a review mechanism can be triggered.

[0139] Example explanation: A majority vote is conducted on the prediction results of approximately 240 audio segments from the same file (one collection), and the category with the most votes is used as the final classification result for that collection.

[0140] For example, an abdominal audio recording was divided into 240 segments. After preliminary model prediction, the statistical results were: "Recovery status on postoperative day 1" received 150 votes, "Recovery status on postoperative day 2" received 70 votes, and other categories received a total of 20 votes. According to the majority voting principle, the final judgment for this recording was "Recovery status on postoperative day 1". If the confidence threshold is set to 50%, then 150 / 240 = 62.5% > 50%, and the result is valid.
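The file-level majority vote with a confidence threshold, applied to the 150 / 70 / 20 example above, can be sketched as follows. The function name and the shortened label strings are hypothetical. Returning `None` stands in for the "insufficient confidence" outcome, and ties are resolved by whichever category `Counter` encountered first, which is an assumption the text does not address.

```python
from collections import Counter


def file_level_decision(segment_labels, min_share=0.5):
    """Return (label, share) for one acquisition.

    label is None when the most frequent category does not exceed
    min_share of all segments ("insufficient confidence").
    """
    if not segment_labels:
        raise ValueError("at least one segment-level result is required")
    label, votes = Counter(segment_labels).most_common(1)[0]
    share = votes / len(segment_labels)
    return (label if share > min_share else None), share


labels = ["post-op day 1"] * 150 + ["post-op day 2"] * 70 + ["other"] * 20
decision, share = file_level_decision(labels)
print(decision, share)  # post-op day 1 0.625
```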

[0141] In this embodiment, by basing a single judgment on the statistical prediction of hundreds of independent segments, this mechanism can effectively smooth the physiological fluctuations of bowel sounds and resist misjudgments caused by occasional noise contamination of a few segments, thereby enhancing the fault tolerance of the system and the stability of the output results.

[0142] In one specific embodiment, the application of the classification model in the evaluation of the effect of a specific abdominal intervention event is described.

[0143] When assessing the immediate physiological effects of specific abdominal interventions (such as acupuncture, massage, transcutaneous electrical stimulation, or specific pharmacological treatments), the core lies in accurately quantifying the dynamic changes in abdominal condition (especially bowel function) before and after the intervention. However, if only a general classification model is used to categorize audio signals before and after the intervention, the model may struggle to sensitively capture subtle changes in the intervention relative to an individual's baseline. This is because the model needs to learn both the diverse baseline state before the intervention and the complex response patterns afterward, which increases the learning burden on the model and may dilute its sensitivity to the "change" itself. Furthermore, clinical assessments often focus on the "degree of improvement" relative to the pre-intervention baseline state, requiring an analytical framework that can isolate intervention factors and perform paired comparisons.

[0144] Therefore, in order to accurately and sensitively quantify the immediate effects of abdominal intervention events, this proposal specifically proposes a strategy of using independently trained "pre-intervention models" and "post-intervention models," and performing paired analysis and comparison of pre- and post-intervention signals obtained from the same subject under the same observation conditions.

[0145] Specific implementation: the classification model is applied under specific data collection conditions as follows. Model construction conditions: the classification model includes a pre-intervention model and a post-intervention model that are trained independently. The pre-intervention model is trained using a training dataset collected a preset duration after a preset triggering event (such as the end of surgery or the start of the intervention course) but before the abdominal intervention event. This model learns the typical abdominal state pattern at a specific time point (such as the morning of the first day after surgery) before the intervention.

[0146] The post-intervention model is trained using a training dataset collected the same preset duration after the same triggering event, but after the abdominal intervention event (e.g., 30 minutes after the intervention). This model learns the various possible state patterns of the abdomen after the intervention within the same time context.

[0147] Application evaluation process: When evaluating a test object, the process is as follows: First, signals are acquired before and after the same abdominal intervention event is performed on the same subject, after a set time interval from the same pre-set trigger event. These are referred to as the pre-intervention acquisition signal and the post-intervention acquisition signal, respectively. This ensures that the background conditions (such as postoperative days and diurnal rhythm) of the two acquisitions are completely consistent, with the only variable being the intervention itself.

[0148] Then, the pre-intervention model is used to classify the pre-intervention acquisition signal, yielding a classification result C_before that characterizes the pre-intervention abdominal state (such as "inhibited intestinal motility state").

[0149] Likewise, the post-intervention model is used to classify the post-intervention acquisition signal, yielding a classification result C_after that characterizes the abdominal state after the intervention (such as "initial activation of intestinal peristalsis").

[0150] Finally, based on the difference or change relationship between C_before and C_after, the immediate intervention effect of the abdominal intervention event performed on the test subject at the preset duration is obtained. For example, the effect can be quantified as the improvement level of the state (such as from "inhibition" to "initial activation"), or a certain "improvement score" can be calculated through the probability vectors output by the two models.

[0151] This application corresponds to an extension of the method from simple "state classification" to "efficacy assessment." For example, to assess the immediate effect of acupuncture on postoperative day 0: Model training: The "pre-intervention model" was trained using data from all patients at the "before" time point, and the "post-intervention model" was trained using data from all patients at the "after" time point.

[0152] Effect evaluation: For new patients, audio was collected on day 0 before acupuncture ("before 0" condition), and entered into the "pre-intervention model" to obtain state S1; audio was collected within 30 minutes after acupuncture ("after 0" condition), and entered into the "post-intervention model" to obtain state S2.

[0153] Effect Quantification: If S1 is state A and S2 is state B, then according to a predefined "state transition-effect" mapping table (e.g., A→B represents "moderate activation effect"), an immediate and personalized effect assessment of the acupuncture session can be obtained. This is more clinically interpretable than simply comparing the absolute numerical changes in audio features.
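The two quantification options mentioned above, a "state transition-effect" mapping table and an improvement score derived from the models' probability vectors, can be sketched as follows. Everything specific here is an assumption for illustration: the state names and their ordering, the table entries other than the A→B style of mapping, and the definition of the score as the difference of probability-weighted state levels.

```python
# Hypothetical "state transition -> effect" table; real entries would be
# defined clinically. Only the A -> B style of mapping comes from the text.
TRANSITION_EFFECTS = {
    ("inhibited", "initial activation"): "moderate activation effect",
    ("inhibited", "inhibited"): "no observable change",
}

# Assumed ordering of states from least (level 0) to most recovered.
STATE_LEVELS = ["inhibited", "initial activation", "normal function"]


def transition_effect(c_before, c_after, table=TRANSITION_EFFECTS):
    """Look up the effect label for the pair (C_before, C_after)."""
    return table.get((c_before, c_after), "unmapped transition")


def expected_level(probabilities):
    """Probability-weighted mean of the ordinal state levels."""
    if len(probabilities) != len(STATE_LEVELS):
        raise ValueError("one probability per state is required")
    return sum(level * p for level, p in enumerate(probabilities))


def improvement_score(p_before, p_after):
    """Assumed score: expected level after minus expected level before."""
    return expected_level(p_after) - expected_level(p_before)


print(transition_effect("inhibited", "initial activation"))
print(improvement_score([0.5, 0.5, 0.0], [0.0, 0.5, 0.5]))  # 1.0
```

A table lookup keeps the result in clinical vocabulary, while the score gives a graded value when the two predicted categories happen to coincide.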

[0154] In this embodiment, by modeling the "pre-intervention" and "post-intervention" states as two independent but conditionally corresponding classification tasks and comparing paired pre- and post-intervention signals, confounding factors such as time and individual differences can be controlled to the greatest extent possible. This allows for more reliable attribution of observed state changes to the intervention itself, improving the internal validity of the assessment. The independent post-intervention model focuses on learning all possible state patterns after the intervention and is more sensitive to feature changes caused by the intervention. Compared to using a general model, this method can more keenly capture sometimes subtle feature pattern changes brought about by the intervention, thereby improving the ability to detect weak but effective physiological responses. It can assess the immediate effects of the intervention on the same individual at the same time baseline (e.g., day N post-operation). This helps physicians determine whether a specific intervention is effective for a specific patient at the current time point, providing an objective basis for adjusting treatment plans (e.g., continuing, strengthening, or changing the intervention method), and promoting the personalization of treatment plans.

[0155] In one specific embodiment, the specific method for generating temporal intervention effects is described.

[0156] The physiological effects of abdominal interventions (such as acupuncture, massage, and electrical stimulation) are often not static but dynamically evolve with repeated interventions, the body's recovery process, and the passage of time. A single, immediate assessment can only reflect the local state shortly after the intervention and cannot reveal the persistence, cumulative effect, or trend of the effect over time. For example, a single intervention may immediately improve intestinal motility, but the effect may diminish after a few hours; or, interventions over several days may produce a progressively increasing cumulative effect. A single assessment cannot distinguish these important dynamic patterns, thus limiting its guiding value for optimizing treatment plans (such as adjusting the frequency and timing of interventions).

[0157] Therefore, in order to comprehensively evaluate the overall effect and dynamic pattern of intervention measures over the entire observation period, this solution proposes a method that generates a time-series intervention effect curve or report by integrating the immediate intervention effects obtained at multiple different time points (preset durations).

[0158] Specific implementation, first part (acquiring immediate effects at multiple time points): For the same subject, acquire the immediate intervention effects produced when the same abdominal intervention event is performed at each of at least two different preset durations after the same preset trigger event (such as surgery). Here, "different preset durations" typically correspond to different observation time points within an observation period (e.g., postoperative day 1, day 2, and day 3). At each time point, acquire signals before and after the intervention, and use the corresponding pre- and post-intervention models to calculate the immediate intervention effect (E1, E2, ..., En) at that time point.

[0159] Second part (generating the dynamic effect by time-series integration): Based on the observation time points corresponding to the at least two different preset durations, the immediate intervention effects calculated at each time point are arranged, correlated, and comprehensively analyzed in chronological order, thereby generating a time-series intervention effect that reflects the abdominal intervention event performed on the subject throughout the entire observation period.

[0160] Example description: The constructed time-series labeling system (such as "0 before, 0 after, 1 before, 1 after, 2 before, 2 after, 3 before, 3 after") provides a perfect data foundation for this application.

[0161] Data Acquisition and Modeling: The focus is on the effect of acupuncture intervention once a day for three consecutive days after surgery (days 0, 1, and 2). Six specialized models need to be trained, one for each day before intervention ("before 0", "before 1", and "before 2") and one for each day after intervention ("after 0", "after 1", and "after 2").

[0162] Effect calculation: For a postoperative patient, audio recordings were collected before and after acupuncture on days 0, 1, and 2. For day t, the immediate intervention effect E_t (e.g., quantified as "state improvement score" or "state transition category") was calculated using the "before t" model and the "after t" model.

[0163] Time-series effect generation: The calculated E_0 (effect on the day of surgery), E_1 (effect on day 1 post-surgery), and E_2 (effect on day 2 post-surgery) are arranged in chronological order. The system can generate an effect-time curve or a time-series effect report. The report can not only display the daily effects but also analyze trends, such as: "The effect is significant on day 0, maintains itself on day 1, and slightly decreases on day 2," or "The effect shows a cumulative trend of increasing day by day."
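Arranging E_0, E_1, and E_2 chronologically and summarizing the trend can be sketched as follows. This is only an illustration: the effects are assumed to be numeric scores, and the tolerance `tol`, the four trend labels, and the function name are hypothetical choices, not part of the described method.

```python
def effect_trend(effects, tol=0.05):
    """Order day-indexed effects {day: E_t} in time and label the trend.

    Returns (curve, trend), where curve is a list of (day, effect) pairs.
    """
    if len(effects) < 2:
        raise ValueError("at least two time points are required")
    days = sorted(effects)
    values = [effects[d] for d in days]
    steps = [later - earlier for earlier, later in zip(values, values[1:])]
    if all(s > tol for s in steps):
        trend = "increasing"
    elif all(s < -tol for s in steps):
        trend = "decreasing"
    elif all(abs(s) <= tol for s in steps):
        trend = "stable"
    else:
        trend = "mixed"
    return list(zip(days, values)), trend


# Hypothetical scores for days 0, 1 and 2, supplied out of order.
curve, trend = effect_trend({2: 0.9, 0: 0.4, 1: 0.6})
print(curve)   # [(0, 0.4), (1, 0.6), (2, 0.9)]
print(trend)   # increasing
```

A pattern such as "maintained on day 1, slightly lower on day 2" falls under the "mixed" label in this sketch; a real report would likely describe each step separately.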

[0164] In this embodiment, the evaluation perspective is expanded from a static "point" to a dynamic "line," clearly and intuitively displaying the trajectory of intervention effects over time. This allows clinicians or researchers to identify whether the effect is immediate and transient, or delayed, persistent, or cumulative, thus gaining a deeper understanding of the dynamics of the intervention. By observing the temporal effects, it is possible to determine whether the current intervention plan (such as frequency and intensity) is appropriate. For example, if the effect continuously weakens after several days, it may indicate that the body has developed tolerance or that the plan needs to be adjusted; if the effect increases day by day, it supports the effectiveness of the current plan. This provides an objective and quantitative basis for real-time adjustment and optimization of personalized treatment plans. In addition, analyzing the differences in intervention effects at different postoperative days or different stages of the disease may help identify the "optimal time window" for intervention (i.e., on which day or at which stage the intervention is most effective), and the effect trend at early time points may have predictive value for subsequent overall efficacy. The generated temporal intervention effect is a comprehensive summary of the patient's response throughout the entire treatment or observation period. It contains richer information dimensions than a single evaluation report, and can more comprehensively support clinical efficacy summaries, research data analysis, and doctor-patient communication.

[0165] It will be understood by those skilled in the art that all or some of the steps, systems, or apparatuses disclosed above, and their functional modules / units, can be implemented as software, firmware, hardware, or suitable combinations thereof. In hardware implementations, the division between functional modules / units mentioned above does not necessarily correspond to the division of physical components; for example, a physical component may have multiple functions, or a function or step may be performed collaboratively by several physical components. Some or all components may be implemented as software executed by a processor, such as a digital signal processor or microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit (ASIC). Such software may be distributed on a computer-readable medium, which may include computer storage media (or non-transitory media) and communication media (or transient media). As is known to those skilled in the art, the term "computer storage medium" includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information (such as computer-readable instructions, data structures, program modules, or other data). Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technologies, CD-ROM, digital versatile disc (DVD) or other optical disc storage, magnetic cartridges, magnetic tape, disk storage or other magnetic storage devices, or any other medium that can be used to store desired information and can be accessed by a computer. Furthermore, it is well known to those skilled in the art that communication media typically contain computer-readable instructions, data structures, program modules, or other data in modulated data signals such as carrier waves or other transmission mechanisms, and may include any information delivery medium.

Claims

1. A method for training a classification model for abdominal audio signals, characterized in that the method comprises:
Step A1: Obtain training data sets, wherein each training data set contains abdominal bowel sounds collected from different subjects under the same collection conditions and is labeled with a category label representing the health status of the abdomen; the audio signals in each training data set are audio signals collected from at least two different points on the same abdomen.
Step A2: Preprocess the audio signals of each point in each training data set to obtain audio segments of each point, and extract multidimensional acoustic features from the audio segments of each point to construct the initial feature vector of each point.
Step A3: Based on the category labels of the training data sets, calculate the correlation metric between each feature dimension and the category label for each point, and assign an initial value of the relevance weight to each feature dimension according to the correlation metric.
Step A4: Taking the initial values of the relevance weights as the starting point, execute an iterative process comprising sub-steps A41 to A43 until a preset convergence condition is reached:
Step A41: Using the relevance weights of the current iteration round, weight the initial feature vectors of each point to obtain weighted feature vectors for the different points; for each feature dimension in the multidimensional acoustic features, arrange the feature values of the weighted feature vectors of each point in that feature dimension according to the spatial order of the points, combine them into a fusion sub-vector corresponding to that feature dimension, and then concatenate the fusion sub-vectors corresponding to all feature dimensions in sequence to form a fused feature vector;
Step A42: Using the fused feature vector and the corresponding category label, train a machine learning model to optimize its internal weight parameters, and obtain, based on the trained model, scoring information reflecting the importance of each fused feature dimension;
Step A43: Based on the scoring information, adjust the relevance weights of the current iteration round to obtain the relevance weights of the next iteration round.
Step A5: Output the machine learning model that has reached the convergence condition as the trained classification model, and output the relevance weights at that time as the final relevance weights.
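The fusion described in Step A41 can be sketched as follows. This is a minimal illustration, assuming that the initial feature vectors and relevance weights are held as nested lists indexed `[point][dimension]` with the points already in spatial order; the function name `fuse_features` and this data layout are hypothetical and not specified by the claim.

```python
from typing import List


def fuse_features(features: List[List[float]], weights: List[List[float]]) -> List[float]:
    """Build the fused feature vector of Step A41.

    features[i][j] -- value of feature dimension j at point i (points in spatial order)
    weights[i][j]  -- relevance weight for the same point and dimension
    The result is grouped by feature dimension first, then by point, so that
    each run of len(features) entries is one fusion sub-vector.
    """
    n_points = len(features)
    n_dims = len(features[0])
    if len(weights) != n_points or any(len(row) != n_dims for row in features + weights):
        raise ValueError("features and weights must share the same rectangular shape")
    fused: List[float] = []
    for j in range(n_dims):        # one fusion sub-vector per feature dimension
        for i in range(n_points):  # points kept in their spatial order
            fused.append(weights[i][j] * features[i][j])
    return fused


# Two points, two feature dimensions.
example = fuse_features([[1.0, 2.0], [3.0, 4.0]], [[0.5, 1.0], [1.0, 0.5]])
```

Grouping by feature dimension keeps the values of the same feature at neighboring points adjacent in the fused vector, which is how the spatial order of the points is carried into the model input.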

2. The method of claim 1, wherein obtaining the audio segments comprises: for the audio signal of each point in each training data set, calculating a distance weight based on the physical distance between the point and a preset center point, and weighting the audio signal of the point using the distance weight; and preprocessing the distance-weighted signals of each point to obtain the audio segments of each point.

3. The method of claim 2, wherein the distance weight is calculated by a Gaussian kernel function: for the i-th point, the distance weight is $\exp\left(-d_i^2 / (2\sigma^2)\right)$, where $d_i$ is the distance between the point and the center point, and $\sigma$ is an adjustable attenuation coefficient.
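A minimal sketch of the Gaussian distance weight and its application to a signal is given below. The function names `distance_weight` and `weight_signal`, and the representation of a signal as a list of samples, are assumptions made for illustration only.

```python
import math
from typing import List


def distance_weight(d: float, sigma: float) -> float:
    """Gaussian kernel weight exp(-d^2 / (2 * sigma^2)) for a point at distance d."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return math.exp(-(d ** 2) / (2.0 * sigma ** 2))


def weight_signal(samples: List[float], d: float, sigma: float) -> List[float]:
    """Scale every sample of one point's signal by that point's distance weight."""
    w = distance_weight(d, sigma)
    return [w * s for s in samples]
```

A point located at the center (d = 0) keeps its full amplitude, and the weight decays smoothly with distance; a larger σ makes the decay slower.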

4. The method of claim 1, wherein, in step A2, the extracted multidimensional acoustic features include time-domain features and frequency-domain features; the time-domain features include at least the event rate, average duration, and average interval time obtained from bowel sound event detection; and the frequency-domain features include at least the Mel-frequency cepstral coefficients, the spectral centroid, the spectral bandwidth, and the band power ratios calculated over the low, medium, and high frequency bands.
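The time-domain features named above can be sketched as follows, assuming a separate bowel sound event detector (not shown) has already produced a list of `(start, end)` times in seconds. The function name `event_features`, the per-minute unit of the event rate, and the definition of the interval as the gap between the end of one event and the start of the next are all assumptions for illustration.

```python
from typing import Dict, List, Tuple


def event_features(events: List[Tuple[float, float]], duration_s: float) -> Dict[str, float]:
    """Compute event rate, average duration, and average interval from detected events.

    events     -- (start, end) times in seconds from a hypothetical event detector
    duration_s -- total length of the analyzed recording in seconds
    """
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    ordered = sorted(events)
    n = len(ordered)
    # Event rate expressed per minute (assumed unit).
    rate = n / duration_s * 60.0
    mean_duration = sum(end - start for start, end in ordered) / n if n else 0.0
    # Interval = silence between the end of one event and the start of the next (assumed).
    gaps = [ordered[k + 1][0] - ordered[k][1] for k in range(n - 1)]
    mean_interval = sum(gaps) / len(gaps) if gaps else 0.0
    return {
        "event_rate_per_min": rate,
        "mean_duration_s": mean_duration,
        "mean_interval_s": mean_interval,
    }


feats = event_features([(0.0, 1.0), (3.0, 4.0), (6.0, 8.0)], duration_s=60.0)
```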

5. The method of claim 1, wherein, in step A3, the initial values of the relevance weights are obtained as follows: for each feature dimension of each point, calculate the correlation metric between its values and the category label; for feature dimensions that are linearly correlated with the label, use the Pearson correlation coefficient as the correlation metric; for feature dimensions that are non-linearly correlated with the label, use mutual information as the correlation metric; and, based on the calculated correlation metric, assign an initial value of the relevance weight to each feature dimension, wherein a feature dimension with a higher correlation is assigned a larger weight.
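The Pearson branch of this initialization can be sketched as below. The sketch assumes numerically encoded category labels and uses the absolute value of the correlation coefficient directly as the initial weight; that mapping, the function names, and the column-wise data layout are illustrative assumptions, and the mutual-information branch for non-linear dependence is not covered.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def initial_weights(columns: List[Sequence[float]], labels: Sequence[float]) -> List[float]:
    """One initial relevance weight per feature dimension.

    columns[j] holds the values of feature dimension j across all training samples.
    The weight is |r| (assumed mapping), so stronger correlation gives a larger weight.
    """
    return [abs(pearson(col, labels)) for col in columns]


w = initial_weights(
    [[1.0, 2.0, 3.0, 4.0], [1.0, 0.0, 1.0, 0.0], [5.0, 5.0, 5.0, 5.0]],
    [0, 0, 1, 1],
)
```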

6. The method of claim 1, wherein, in step A42, a machine learning model with an L1 regularization constraint is used for training; by sparsifying the feature weights of the model, the top K feature points with the highest contribution are selected for final classification, where the value of K is in the range [5, 15].
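The selection step can be sketched as follows, assuming the L1-regularized model has already been trained elsewhere and exposes one coefficient per fused feature dimension. The function name `top_k_features` and the use of the absolute coefficient as the contribution measure are assumptions for illustration; the training itself is not shown.

```python
from typing import List, Sequence


def top_k_features(coefs: Sequence[float], k: int) -> List[int]:
    """Return indices of the k non-zero coefficients with the largest magnitude.

    coefs -- coefficients of an already trained L1-regularized model (assumed input)
    k     -- number of features to keep; must lie in [5, 15]
    """
    if not 5 <= k <= 15:
        raise ValueError("k must be in the range [5, 15]")
    # L1 regularization drives many coefficients to exactly zero; skip those.
    nonzero = [(abs(c), idx) for idx, c in enumerate(coefs) if c != 0.0]
    # Sort by descending magnitude, breaking ties by the lower index.
    nonzero.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in nonzero[:k]]


selected = top_k_features([0.0, 0.9, -1.2, 0.0, 0.3, 0.05, -0.4, 0.7, 0.0, 0.1], k=5)
```

If fewer than K coefficients are non-zero, the sketch simply returns all of them.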

7. The method of claim 1, wherein, in step A43, a feature importance change amount $\Delta I_{ij}$ of the j-th feature dimension at the i-th point is determined according to the scoring information, and the relevance weight is updated by the expression $r_{ij}^{(t+1)} = r_{ij}^{(t)} + \eta \cdot \Delta I_{ij}$ to obtain the relevance weight $r_{ij}^{(t+1)}$ of the j-th feature dimension at the i-th point in the (t+1)-th iteration, where $\eta$ is a learning rate, and i, j, and t are all positive integers.
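The update expression can be sketched as a single function. The sketch takes the importance change amounts as given, since the claim does not spell out how they are derived from the scoring information; the function name `update_weights` and the nested-list layout `[point][dimension]` are illustrative assumptions.

```python
from typing import List


def update_weights(r: List[List[float]], delta_i: List[List[float]], eta: float) -> List[List[float]]:
    """Apply r_ij(t+1) = r_ij(t) + eta * delta_I_ij to every point i and dimension j.

    r       -- relevance weights of the current iteration, indexed [point][dimension]
    delta_i -- feature importance change amounts with the same shape (assumed given)
    eta     -- learning rate
    """
    return [
        [r_ij + eta * d_ij for r_ij, d_ij in zip(r_row, d_row)]
        for r_row, d_row in zip(r, delta_i)
    ]


next_r = update_weights([[0.5, 1.0]], [[0.5, -1.0]], eta=0.5)
```

A positive change amount raises the weight of that feature at that point for the next round, and a negative one lowers it, with η controlling the step size.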

8. A method for applying a classification model for abdominal audio signals, characterized in that the method comprises:
Step B1: Under preset acquisition conditions, collect bowel sound signals from at least two acquisition points on the abdomen of the subject under test to obtain the acquired signal of each point;
Step B2: Preprocess the acquired signal of each point to obtain audio segments of each point, and extract multidimensional acoustic features from the audio segments of each point to construct the initial feature vector of each point.
Step B3: Using preset relevance weights, weight the initial feature vectors of each point to obtain weighted feature vectors for the different points, wherein the relevance weights characterize the correlation metric between each feature dimension of each point and the category label.
Step B4: For each feature dimension in the multidimensional acoustic features, arrange the feature values of the weighted feature vectors of each point in that feature dimension according to the spatial order of the points, combine them into a fusion sub-vector corresponding to that feature dimension, and then concatenate the fusion sub-vectors corresponding to all feature dimensions in sequence to form a fused feature vector.
Step B5: Input the fused feature vector into a preset classification model to obtain the classification result output by the classification model, thereby obtaining information on the abdomen of the subject under test;
wherein the classification model is obtained by training on the fused feature vectors corresponding to the training data sets for the acquisition conditions; each training data set records abdominal bowel sounds collected from different subjects under the aforementioned acquisition conditions and is labeled with a category label representing the abdominal health status; and the audio signals in each training data set are audio signals collected from at least two different points on the same abdomen.

9. The method according to claim 8, characterized in that each point has at least two audio segments, and obtaining the classification result output by the classification model comprises: obtaining the classification result of the classification model for each audio segment in the acquired signal as the segment-level classification results; and determining, based on a preset majority-pass condition, the category label of the acquired signal according to the segment-level classification results of the audio segments in the acquired signal.
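One possible reading of the majority-pass condition is sketched below: the most frequent segment-level label is accepted only if its share of the segments strictly exceeds a preset threshold. The function name `majority_label`, the default threshold of 0.5, and returning `None` when no label passes are assumptions for illustration; the claim does not fix these details.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def majority_label(segment_labels: Sequence[Hashable], threshold: float = 0.5) -> Optional[Hashable]:
    """Aggregate segment-level classification results into one signal-level label.

    Returns the most frequent label if its share of segments is strictly greater
    than `threshold` (assumed majority-pass condition), otherwise None.
    """
    if not segment_labels:
        raise ValueError("at least one segment-level result is required")
    label, count = Counter(segment_labels).most_common(1)[0]
    if count / len(segment_labels) > threshold:
        return label
    return None
```

Raising the threshold (for example to 2/3) makes the signal-level decision more conservative, at the cost of more undecided signals.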

10. The method of claim 8 or 9, wherein the classification model is obtained using the method described in any one of claims 1 to 7.

11. The method according to claim 10, characterized in that the acquisition condition is that an abdominal intervention event is performed after a preset duration has elapsed since a preset trigger event, and the classification model includes a pre-intervention model and a post-intervention model, wherein: the pre-intervention model is obtained based on the training data sets whose collection time is before the abdominal intervention event is performed, after the preset duration has elapsed since the preset trigger event; and the post-intervention model is obtained based on the training data sets whose collection time is after the abdominal intervention event is performed, after the preset duration has elapsed since the preset trigger event. The method further comprises: acquiring the signals collected from the subject under test before and after the same abdominal intervention event is performed, after the same preset duration has elapsed since the same preset trigger event, to obtain a pre-intervention acquired signal and a post-intervention acquired signal; obtaining the classification result corresponding to the pre-intervention acquired signal using the pre-intervention model, and obtaining the classification result corresponding to the post-intervention acquired signal using the post-intervention model; and obtaining, based on the classification results corresponding to the pre-intervention acquired signal and the post-intervention acquired signal, the immediate intervention effect of the abdominal intervention event performed on the subject under test at the preset duration.
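The pairing of the two models can be sketched as follows. The claim does not state how the two classification results are combined, so this sketch assumes, purely for illustration, that category labels can be mapped onto an ordinal grade and that the effect is the grade difference. The label names in `GRADE`, the function name `immediate_effect`, and the representation of each model as a callable are all hypothetical.

```python
from typing import Callable, Dict, Sequence

# Hypothetical ordinal mapping of category labels; not specified by the claim.
GRADE: Dict[str, int] = {"abnormal": 0, "borderline": 1, "normal": 2}

Model = Callable[[Sequence[float]], str]


def immediate_effect(
    pre_model: Model,
    post_model: Model,
    pre_vector: Sequence[float],
    post_vector: Sequence[float],
) -> int:
    """Compare the pre- and post-intervention classification results.

    Each model classifies only its own fused feature vector. A positive return
    value means the post-intervention grade is higher than the pre-intervention grade.
    """
    pre_label = pre_model(pre_vector)
    post_label = post_model(post_vector)
    return GRADE[post_label] - GRADE[pre_label]


# Stub models standing in for the trained pre- and post-intervention classifiers.
effect = immediate_effect(lambda v: "abnormal", lambda v: "normal", [0.1, 0.2], [0.3, 0.4])
```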

12. The method of claim 11, wherein the method further comprises: obtaining the immediate intervention effect when the same abdominal intervention event is performed on the subject under test after each of at least two different preset durations from the same preset trigger event; and generating, based on the observation period corresponding to the at least two different preset durations, the temporal intervention effect of the abdominal intervention event performed on the subject under test within the observation period.

13. An electronic device comprising a memory and a processor, characterized in that the memory stores a computer program, and the processor is configured to run the computer program to perform the method as described in any one of claims 1 to 12.