A medical complaint risk early warning method based on natural language processing

By conducting multimodal analysis of real-time audio streams and operation logs, a speech interaction tension function is constructed, and monitoring parameters are dynamically adjusted. This solves the problems of lag and dimensional isolation in complaint risk warning in existing technologies, and realizes real-time accurate profiling and forward-looking warning of the doctor-patient interaction status.

CN122245346A · Pending · Publication Date: 2026-06-19 · HUADU DISTRICT GUANGZHOU CITY PEOPLES HOSPITAL

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Application (China)
Current Assignee / Owner: HUADU DISTRICT GUANGZHOU CITY PEOPLES HOSPITAL
Filing Date: 2026-03-27
Publication Date: 2026-06-19

AI Technical Summary

Technical Problem

Existing technologies cannot effectively capture the dynamic evolution of tension in doctor-patient interactions in medical clinics, resulting in delayed and isolated early warning of complaint risks, and making it impossible to accurately assess the transformation of risks from latent to eruptive.

Method used

By collecting real-time audio streams and operation logs, sound source separation and temporal analysis are performed. Natural language processing is used to identify speech and text fragments of doctors and patients, construct a speech interaction tension function, calculate the system stability index using fundamental frequency coupling deviation and operational rhythm disorder, and dynamically adjust monitoring parameters to generate risk warning signals.

Benefits of technology

It enables accurate identification of doctor-patient emotional tensions and communication barriers before risks materialize, providing proactive early warnings, enhancing the ability to manage complaint risks in advance, and solving the problems of delayed early warnings and isolated data dimensions.

✦ Generated by Eureka AI based on patent content.


  • Figure CN122245346A_ABST

Abstract

This invention discloses a medical complaint risk early warning method based on natural language processing, belonging to the field of medical information analysis technology. This method collects real-time audio streams and real-time operation logs from the target clinic, obtains audio signal sequences from both the doctor and patient through sound source separation, and calculates a system stability index. When the index is below a threshold, retrospective speech recognition is triggered to generate text fragments, and a natural language model is used to identify interrogative and explanatory sentences, calculating the average response delay. Based on this, a speech interaction tension function is constructed, and a risk outbreak probability early warning signal is generated by combining the current value and rate of change of this function. This invention further refines the stability evaluation through fundamental frequency coupling deviation and operational rhythm disorder, and introduces an environmental signal-to-noise ratio correction mechanism and a dynamic sampling strategy for nonlinear outbreak periods. Through cross-modal coupling of acoustic low-level features and high-level semantic features, it achieves real-time quantitative monitoring and accurate early warning of the entire process of medical complaint risk.

Description

Technical Field

[0001] This invention relates to the field of medical information analysis technology, and more specifically to a medical complaint risk warning method based on natural language processing.

Background Technology

[0002] In the current field of voice risk monitoring in medical clinics, traditional technologies mainly rely on sound pressure level threshold alarms or static text analysis based on keyword retrieval. These methods typically only respond when a dispute has already occurred, the volume at the scene has significantly increased, or specific provocative words have appeared, exhibiting a significant lag.

[0003] Existing technologies exhibit severe dimensional isolation when processing doctor-patient interaction data. They fail to coordinate the analysis of doctors' operational rhythms with the acoustic characteristics of the consultation room, and ignore the temporal logic characteristics in the interaction process. This lack of perception of the evolutionary laws of interaction logic makes it difficult for early warning methods to accurately assess the nonlinear probability of risk transformation from latent to outbreak, and thus cannot provide managers with forward-looking early warning references.

[0004] In summary, the core technical problem of existing technologies in medical complaint risk warning scenarios is that they cannot capture the dynamic evolution of tension.

Summary of the Invention

[0005] To address the aforementioned technical problems, a medical complaint risk warning method based on natural language processing is provided. This technical solution resolves the issues raised in the background section.

[0006] In a first aspect, embodiments of this application provide a medical complaint risk warning method based on natural language processing, comprising the following steps: collecting real-time audio streams and real-time operation logs from a target clinic; performing source separation processing on the real-time audio streams to obtain doctor audio signal sequences and patient audio signal sequences, and performing time-series analysis accordingly to obtain a system stability index; when the system stability index is lower than a first preset threshold, performing retrospective speech recognition on the doctor audio signal sequences and patient audio signal sequences within a preset sliding time window to generate doctor audio text fragment sequences and patient audio text fragment sequences; in the doctor audio text fragment sequences and patient audio text fragment sequences, identifying a first text fragment sequence representing a preset interrogative sentence and a second text fragment sequence representing a preset explanatory sentence through a preset natural language model, and calculating the average response delay of the first text fragment sequence and the second text fragment sequence within the preset sliding time window; based on the average response delay, constructing a speech interaction tension function; if the current value of the speech interaction tension function exceeds a second preset threshold, generating and outputting a warning signal representing the probability of risk outbreak based on the rate of change of the speech interaction tension function within the preset sliding time window.

[0007] In a second aspect, this application provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the aforementioned medical complaint risk warning method based on natural language processing.

[0008] One or more technical solutions provided in the embodiments of this application have at least the following technical effects or advantages:

[0009] 1. By extracting the first fundamental frequency fluctuation sequence from the doctor's audio signal sequence and the second fundamental frequency fluctuation sequence from the patient's audio signal sequence, the fundamental frequency coupling deviation between the two can be calculated, capturing the emotional synchronicity between doctor and patient at the acoustic level. Combined with an analysis of the rhythm disorder of the operation perturbation rhythm sequence relative to the preset historical baseline model, abnormalities in the doctor's business operations and in speech are considered together. This improves the accuracy of risk identification, realizes a real-time, accurate profile of the doctor-patient interaction status, and solves the problem of risk identification along a single dimension.

[0010] 2. Instead of relying entirely on energy-intensive speech recognition, the system first performs initial screening using a less computationally expensive system stability index. Retrospective speech recognition is only initiated when this index falls below a first preset threshold. This logic of routine acoustic monitoring combined with in-depth semantic analysis during high-risk periods ensures the real-time nature and cost-effectiveness of the early warning method. Furthermore, the tiered triggering mechanism optimizes the allocation of computing resources, alleviating the load pressure of real-time big data processing.

[0011] 3. The verbal interaction tension function is updated using an exponentially weighted moving average algorithm, and the probability of risk outbreak is calculated based on its rate of change within a sliding window. Specifically, when the continuous growth rate of the risk outbreak probability exceeds a preset threshold, the method automatically shortens the preset sliding window by a preset step size and increases the sampling rate. This mechanism enables the method to identify the nonlinear transition of complaint risks from the incubation period to the outbreak period, giving relevant personnel time to intervene, dynamically capturing the nonlinear characteristics of risk evolution, and addressing the failure of early warning for sudden conflicts.

Attached Figure Description

[0012] Figure 1 is a schematic diagram illustrating the steps of a medical complaint risk warning method based on natural language processing, provided in an embodiment of this application.

Detailed Implementation

[0013] By using a medical complaint risk early warning method based on natural language processing, the embodiments of this application solve the core technical problem of the prior art, namely the difficulty of capturing the dynamic evolution of tension.

[0014] This solution first establishes a foundation for collaborative analysis of doctor-patient dialogue and medical operations by separating the sound sources from the real-time audio stream in the consultation room and aligning the timing with real-time operation logs. The initial aim of this step is to overcome the limitations of traditional methods that rely solely on volume or isolated text. By extracting the fundamental frequency fluctuation sequence from the doctor and patient audio and calculating the fundamental frequency coupling deviation, the synchronicity and antagonism of emotions between the two parties can be captured at the physiological acoustic level. Simultaneously, by analyzing the rhythmic disorder generated from the operation logs, the doctor's operational status is correlated with their verbal performance. This system stability index calculation based on underlying signals enables the provision of a triggering condition for early warning through subtle perturbations in multimodal data, before the risk manifests as intense verbal conflict.

[0015] Upon detecting a decrease in stability, this solution enters a deep quantification phase targeting the interaction logic to address the inability of existing technologies to capture the evolutionary patterns of communication tension. By triggering retrospective speech recognition, this solution no longer blindly processes the entire text but focuses on semantic interactions within high-risk windows. It utilizes natural language models to identify pre-defined interrogative and explanatory sentence structures and further calculates the average response delay between them. The core design of this logic lies in the fact that the tension between doctors and patients is often directly reflected in the efficiency and rhythm of information exchange. By calculating the average response delay and constructing a verbal interaction tension function using an exponentially weighted moving average algorithm, the abstract communication atmosphere can be transformed into a dynamically evolving numerical curve. This not only quantifies the current level of antagonism but also preserves the historical effects of accumulated emotions through a smoothing factor.
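The exponentially weighted moving average update described above can be sketched as follows. This is a minimal illustration, not the claimed implementation: the function names, the smoothing factor of 0.3, and the choice to seed the curve with the first delay are assumptions that do not appear in the text.

```python
def update_tension(previous_tension, avg_response_delay_s, smoothing=0.3):
    """One EWMA step: blend the newest average response delay with history.

    `smoothing` is a hypothetical smoothing factor; a smaller value keeps
    more of the accumulated history.
    """
    if not 0.0 < smoothing <= 1.0:
        raise ValueError("smoothing must be in (0, 1]")
    return smoothing * avg_response_delay_s + (1.0 - smoothing) * previous_tension


def tension_curve(delays_s, smoothing=0.3):
    """Return the tension value after each window's average response delay."""
    curve = []
    tension = None
    for delay in delays_s:
        # Assumption: the first observed delay seeds the curve.
        tension = delay if tension is None else update_tension(tension, delay, smoothing)
        curve.append(tension)
    return curve
```

With a smoothing factor of 0.5, delays of 2 s and then 4 s give a curve of 2.0 and 3.0, so a single slow response raises the tension without replacing the history.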

[0016] To address the nonlinear characteristics of risk evolution, this scheme generates a risk outbreak probability warning by collaboratively analyzing the current value of the verbal interaction tension function and its rate of change within a sliding window. This two-dimensional assessment mechanism can sensitively capture the critical point where risk transitions from slow accumulation to an explosive leap. When the continuous growth rate of the risk outbreak probability exceeds the leap threshold, the scheme automatically shortens the sliding window length and increases the sampling rate of retrospective speech recognition, thereby enhancing monitoring of the nonlinear outbreak phase of risk. This dynamic adjustment mechanism ensures low-energy operation in calm conditions and extremely high sensitivity in high-risk conditions.
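The dynamic adjustment in this paragraph might look like the sketch below. All names and numbers are illustrative assumptions: the text does not define "continuous growth", so it is read here as growth above the leap threshold in each of the last two steps, and the sampling rate of retrospective speech recognition is modeled as a hypothetical number of recognition passes per minute.

```python
from dataclasses import dataclass


@dataclass
class MonitorParams:
    window_s: float = 30.0                    # typical sliding window length
    recognition_passes_per_min: float = 2.0   # hypothetical sampling rate


def sustained_growth(probabilities, leap_threshold, runs=2):
    """True when each of the last `runs` steps grew by more than the threshold."""
    if len(probabilities) < runs + 1:
        return False
    recent = probabilities[-(runs + 1):]
    return all(b - a > leap_threshold for a, b in zip(recent, recent[1:]))


def adapt_params(params, probabilities, leap_threshold=0.1,
                 step_s=5.0, min_window_s=5.0, rate_factor=2.0):
    """Shorten the window by one step and raise the sampling rate on a leap."""
    if not sustained_growth(probabilities, leap_threshold):
        return params  # calm period: keep the low-cost settings
    return MonitorParams(
        window_s=max(min_window_s, params.window_s - step_s),
        recognition_passes_per_min=params.recognition_passes_per_min * rate_factor,
    )
```

The lower bound of 5 s keeps the window inside the 5 s to 60 s range given elsewhere in the description.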

[0017] The problem of data silos was solved through the coupled analysis of multimodal signals; the problem of missing interaction logic was solved through the semantic quantification of response latency; and the problem of insufficient capture of evolutionary trends was solved through real-time monitoring of the rate of change of tension. The ultimate technical effect is that, in complex medical treatment scenarios, it can accurately identify emotional tension and communication barriers hidden beneath routine discourse, and provide highly forward-looking early warning signals before complaint risks truly escalate into behavioral conflicts, effectively mitigating regulatory blind spots caused by single-dimensional and insufficiently timely early warnings.

[0018] To better understand the above technical solutions, the following will provide a detailed explanation of the technical solutions in conjunction with the accompanying drawings and specific implementation methods.

[0019] Real-time audio streams can include: (1) real-time diagnostic and treatment audio streams from the target consultation room, collected in real time through an omnidirectional microphone in the consultation room and a standardized interface of the hospital's HIS / EMR system, serving as the core real-time monitoring data source of this invention; (2) on-site complaint audio data, collected in real time through omnidirectional microphones deployed in the hospital's complaint reception room and doctor-patient communication room, with collection parameters identical to those used in the consultation room (collection frequency range of 80 Hz to 8 kHz, sampling rate of 16 kHz, sampling precision of 16 bit, transmission delay of no more than 100 ms), together with the complaint system operation logs of reception staff, collected synchronously and aligned with the audio stream timestamps; and (3) telephone complaint audio data, collected in full and in real time through the standardized recording interface of the hospital's complaint hotline control switch, synchronously acquiring the call time, call duration, called receptionist ID, and call start and end timestamps, and generating an original audio file with a unique time anchor point, with the audio parameters uniformly set to a sampling rate of 16 kHz, mono, and 16-bit quantization precision.

[0020] In addition to real-time audio streams, raw data can also be text data. For example: (1) complaint text data from the hospital's internal complaint system can be collected in real time through the standardized data interface of the hospital's complaint management system, polled every 100 ms, with the complaint submission timestamp, the identifier of the department or medical staff complained about, the complaint type tag, the complainant's anonymity identifier, and the main content of the complaint text extracted at the same time; (2) complaint text data transferred from the 12345 government hotline can be collected on a scheduled, targeted basis through the standardized API of the local government data sharing and open platform, at least once per hour; and (3) complaint text data from the online message board of the hospital's official website can be obtained from the publicly accessible content of the official complaint message board, with the message posting timestamp, message board identifier, anonymous user identifier, and the complete text of the complaint extracted at the same time.

[0021] Text data can be extracted and processed to obtain audio text fragment sequences of doctors and audio text fragment sequences of patients.

[0022] As shown in Figure 1, which is a schematic diagram of the medical complaint risk warning method based on natural language processing provided in this application embodiment, the method includes the following steps: collecting real-time audio streams and real-time operation logs from a target clinic; performing source separation processing on the real-time audio streams to obtain doctor audio signal sequences and patient audio signal sequences, and performing time-series analysis to obtain a system stability index; when the system stability index is lower than a first preset threshold, performing retrospective speech recognition on the doctor audio signal sequences and patient audio signal sequences within a preset sliding time window to generate doctor audio text fragment sequences and patient audio text fragment sequences; in the doctor audio text fragment sequences and patient audio text fragment sequences, identifying a first text fragment sequence representing a preset interrogative sentence and a second text fragment sequence representing a preset explanatory sentence using a preset natural language model, and calculating the average response delay of the first text fragment sequence and the second text fragment sequence within the preset sliding time window; based on the average response delay, constructing a speech interaction tension function; if the current value of the speech interaction tension function exceeds a second preset threshold, generating and outputting a warning signal representing the probability of risk outbreak based on the rate of change of the speech interaction tension function within the preset sliding time window.

[0023] Real-time audio stream acquisition uses an omnidirectional microphone deployed in the target clinic, with a sampling frequency range of 80Hz to 8kHz, an initial sampling rate of 16kHz, a sampling accuracy of 16bit, and a transmission delay of no more than 100ms.

[0024] Real-time operation log collection uses the standardized interfaces of the hospital information systems to collect the real-time operation logs of the HIS, EMR, and laboratory test ordering systems used by the corresponding doctors in the target clinic. Logs are collected every 100 ms. The log content includes structured data such as operation type, timestamp, operator, and operation object. The data transmission complies with the requirements of the "Hospital Information Interconnection and Interoperability Standardization Maturity Assessment Scheme".

[0025] The sound source separation process employs a speaker separation algorithm based on deep clustering. The input is the real-time audio stream from the target clinic, and the output is a separated, timestamp-sorted sequence of doctor's audio signals and patient's audio signals. The algorithm's pre-trained model is fine-tuned based on 1,000 hours of labeled audio data from domestic doctor-patient dialogue scenarios, achieving a speaker separation accuracy of no less than 95%. After separation, the two audio streams are processed into frames to generate a continuous audio signal sequence.

[0026] The first preset threshold is based on the statistical value of the system stability index corresponding to the complaint events in 1000 sets of doctor-patient dialogue samples, with a value range of 0.3 to 0.6 and a typical engineering value of 0.5.

[0027] Retrospective speech recognition adopts a standard end-to-end speech recognition model. The model is fine-tuned based on medical scenario dialogue data, and the accuracy of medical terminology recognition is no less than 98%. The input is two audio signal sequences within a preset sliding time window, and the output is text content with timestamps. It generates doctor audio text segment sequences and patient audio text segment sequences, and each text segment is bound with start and end millisecond-level timestamps.

[0028] The pre-set natural language model adopts a medical text classification model based on the BERT architecture. The model is pre-trained and fine-tuned based on at least 500,000 doctor-patient dialogue annotation data, and the sentence classification accuracy is no less than 96%. The input is a sequence of text fragments, and the output is the sentence classification result of the text fragments.

[0029] The preset interrogative sentence type consists of questions raised by patients regarding their condition, treatment plan, medication, examinations, costs, and other treatment-related matters. The recognition result is the first text segment sequence. The preset explanatory sentence type consists of declarative explanatory sentences given by doctors in response to patients' questions, such as explanations of the condition, treatment instructions, and medication guidance. The recognition result is the second text segment sequence.
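A minimal sketch of the average response delay calculation over the two classified fragment sequences follows. The `Fragment` type, the label strings, and the pairing rule (each unanswered patient question is paired with the next doctor explanation, and later questions asked before that explanation are ignored) are assumptions made for illustration; the text does not specify how questions and explanations are paired.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    speaker: str   # "patient" or "doctor"
    label: str     # "question", "explanation", or "other" (classifier output)
    start_ms: int  # fragment start timestamp, milliseconds
    end_ms: int    # fragment end timestamp, milliseconds


def average_response_delay(fragments, window_start_ms, window_end_ms):
    """Mean gap in seconds between the end of a patient question and the
    start of the next doctor explanation; None if the window has no pair."""
    in_window = sorted(
        (f for f in fragments if window_start_ms <= f.start_ms < window_end_ms),
        key=lambda f: f.start_ms,
    )
    delays_s = []
    pending_question_end = None
    for frag in in_window:
        if frag.speaker == "patient" and frag.label == "question":
            if pending_question_end is None:
                pending_question_end = frag.end_ms
        elif frag.speaker == "doctor" and frag.label == "explanation":
            if pending_question_end is not None:
                # Overlapping speech is clamped to a zero delay.
                gap_ms = max(0, frag.start_ms - pending_question_end)
                delays_s.append(gap_ms / 1000.0)
                pending_question_end = None
    if not delays_s:
        return None
    return sum(delays_s) / len(delays_s)
```

Pairing with the earliest unanswered question means that a patient who has to repeat a question produces a longer delay, which matches the intent of treating slow or missing answers as a sign of tension.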

[0030] The second preset threshold is set based on the 90th percentile of the verbal interaction tension function value corresponding to the doctor-patient dialogue sample where a complaint occurred, with a value range of 2s to 10s and a typical engineering value of 5s.
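The two-stage warning decision (current value against the second preset threshold, then rate of change within the window) can be illustrated as below. The text does not state how the rate of change is converted into a probability, so the logistic mapping and its `gain` parameter are purely hypothetical, as are the function names. The 3 s step and 5 s threshold are the typical values quoted in the description.

```python
import math


def rate_of_change(tension_values, step_s):
    """Average slope of the tension curve across the window, per second."""
    if len(tension_values) < 2:
        return 0.0
    span_s = step_s * (len(tension_values) - 1)
    return (tension_values[-1] - tension_values[0]) / span_s


def risk_warning(tension_values, step_s=3.0, second_threshold=5.0, gain=4.0):
    """Return a risk outbreak probability in (0, 1), or None if no warning.

    A warning is only produced when the current tension value exceeds the
    second preset threshold. The logistic mapping is an assumption.
    """
    current = tension_values[-1]
    if current <= second_threshold:
        return None
    slope = rate_of_change(tension_values, step_s)
    return 1.0 / (1.0 + math.exp(-gain * slope))
```

Under this mapping a flat curve above the threshold yields 0.5, and a rising curve yields a higher value, so the output reflects both dimensions named in the text.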

[0031] This embodiment combines five elements: acquisition of multimodal data from real-time audio streams and operation logs in the clinic; pre-assessment through sound source separation and the system stability index; trigger-based retrospective speech recognition and semantic sentence recognition; quantification of verbal interaction tension based on response delay; and dual-dimensional early warning of the risk outbreak probability. Together they solve the core technical problems of existing technologies, namely severe early warning lag, isolated data dimensions, and the inability to capture the dynamic evolution of tension. The embodiment identifies doctor-patient emotional tension and communication barriers hidden under routine discourse in complex outpatient clinic scenarios, and outputs forward-looking early warnings before complaint risks turn into behavioral conflicts. This fills the regulatory blind spots of existing technologies and improves the hospital's ability to pre-control outpatient complaint risks.

[0032] Furthermore, the specific analysis process of the system stability index is as follows: The doctor's audio signal sequence and the patient's audio signal sequence are time-aligned to generate a joint speech activity time axis. On this joint speech activity time axis, the first fundamental frequency fluctuation sequence of the doctor's audio signal sequence and the second fundamental frequency fluctuation sequence of the patient's audio signal sequence are extracted. Event density analysis is performed on the real-time operation log to generate an operation event sequence, which is then mapped to the joint speech activity time axis to generate an operation perturbation rhythm sequence. Within a preset sliding time window, the fundamental frequency coupling deviation between the first and second fundamental frequency fluctuation sequences is calculated, and the rhythm disorder degree of the operation perturbation rhythm sequence relative to a preset historical baseline model is calculated. The fundamental frequency coupling deviation and rhythm disorder degree are weighted and fused to generate the system stability index.

[0033] In this embodiment, the timing alignment process uses a speech activity detection algorithm. The input is a doctor's audio signal sequence and a patient's audio signal sequence, and the output is joint timing data in which the timestamps of the two signals are aligned, with an alignment error of no more than 10 ms. Based on the effective start and end timestamps of the two audio signals, a joint speech activity timeline is generated, which uses Unix millisecond-level timestamps and covers the time intervals of all effective speech segments.

[0034] The fundamental frequency fluctuation sequences are extracted with a fundamental frequency detection algorithm, which extracts the fundamental frequency value in each effective speech frame on the joint speech activity time axis and generates the first and second fundamental frequency fluctuation sequences in timestamp order. The fundamental frequency value ranges from 80 Hz to 450 Hz.
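The text does not name a specific fundamental frequency detection algorithm. As one possible illustration, the sketch below uses a plain autocorrelation peak search restricted to lags that correspond to 80 Hz to 450 Hz. The function names, the 40 ms frame length used in the test helper, and the rule that a frame with no positive autocorrelation peak is treated as unvoiced are all assumptions.

```python
import math


def estimate_f0(frame, sample_rate=16000, f0_min=80.0, f0_max=450.0):
    """Estimate the fundamental frequency (Hz) of one frame, or None.

    Picks the lag with the largest autocorrelation among lags that map to
    the 80-450 Hz range given in the description.
    """
    lag_min = int(sample_rate / f0_max)
    lag_max = int(sample_rate / f0_min)
    if len(frame) <= lag_max:
        raise ValueError("frame too short for the lowest allowed pitch")
    mean = sum(frame) / len(frame)
    x = [s - mean for s in frame]  # remove DC offset before correlating
    best_lag, best_score = None, 0.0
    for lag in range(lag_min, lag_max + 1):
        score = sum(x[i] * x[i + lag] for i in range(len(x) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else None


def sine_frame(freq_hz, n_samples=640, sample_rate=16000):
    """Synthetic test frame (hypothetical helper, 40 ms at 16 kHz)."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n_samples)]
```

Calling `estimate_f0` on each effective speech frame of the doctor and patient signals, in timestamp order, would produce the two fluctuation sequences. A production system would more likely use a robust pitch tracker, since plain autocorrelation is sensitive to noise and octave errors.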

[0035] Real-time operation logs can be the real-time operation logs of the hospital information system, electronic medical record system, and test and examination ordering system used by doctors in the target clinic. They include structured data such as operation type, operation timestamp, and operation object.

[0036] Event density analysis can use 100 ms as the smallest time granularity and count the number of valid operation events within each time granularity. This includes filtering out automatically triggered non-human operation events, retaining operation events manually triggered by doctors, and generating the operation event sequence in ascending order of timestamp.
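A minimal sketch of the 100 ms event density count is shown below. It assumes that automatically triggered events are discarded and only doctor-triggered events are counted, in line with the baseline-model construction described later. The tuple layout of `events` and the function name are hypothetical.

```python
from collections import Counter


def event_density(events, bin_ms=100):
    """Count manual operation events per time bin.

    `events` is an iterable of (timestamp_ms, triggered_by_doctor) pairs.
    Returns a list of (bin_start_ms, count) sorted by ascending timestamp.
    """
    counts = Counter()
    for timestamp_ms, triggered_by_doctor in events:
        if not triggered_by_doctor:
            continue  # drop automatically triggered, non-human events
        counts[(timestamp_ms // bin_ms) * bin_ms] += 1
    return sorted(counts.items())
```

For example, manual events at 1010 ms, 1050 ms, and 1230 ms plus an automatic event at 1120 ms give two events in the bin starting at 1000 ms and one in the bin starting at 1200 ms.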

[0037] The preset sliding time window can be based on the statistical data of the cumulative emotional cycle of large-sample doctor-patient dialogues, with a value range of 5s to 60s, a typical engineering value of 30s, and a sliding step size of 1 / 10 of the window length, with a typical value of 3s.

[0038] Through the above technical solutions, this embodiment aligns the doctor and patient audio sequences and extracts fundamental frequency features, maps the sequence of medical operation events and analyzes its rhythm, and fuses the multi-dimensional features by weighting. This solves the technical problems that existing technologies cannot collaboratively analyze the rhythm of doctors' business operations together with acoustic features in the clinic, and that data dimensions are isolated. It realizes a quantitative assessment of the state of the doctor-patient interaction system in the clinic from both the physiological acoustic and business operation dimensions. It can perceive system imbalance through small perturbations of multimodal data before the risk manifests as intense verbal conflict, and provides a forward-looking triggering premise for subsequent early warning, unlike the static analysis methods of existing technologies that rely only on volume or isolated keywords.

[0039] Furthermore, the specific calculation process of the fundamental frequency coupling deviation is as follows: within a preset sliding time window, the first fundamental frequency fluctuation sequence and the second fundamental frequency fluctuation sequence are normalized respectively to obtain the first normalized fundamental frequency sequence and the second normalized fundamental frequency sequence; the dynamic time warping path distance between the first normalized fundamental frequency sequence and the second normalized fundamental frequency sequence is calculated, and the dynamic time warping path distance is used as the fundamental frequency coupling deviation.

[0040] In this embodiment, the normalization process adopts the min-max normalization method. The input is the first fundamental frequency fluctuation sequence and the second fundamental frequency fluctuation sequence within a preset sliding time window, and the output is the normalized sequence with values in the range [0, 1].

[0041] The Dynamic Time Warping (DTW) algorithm employs a symmetric path constraint. The input consists of a first normalized fundamental frequency sequence and a second normalized fundamental frequency sequence of equal length, and the output is the optimal warping path distance between the two sequences.

[0042] An example execution process is as follows: construct a distance matrix between the two sequences, where matrix element (i, j) is the absolute difference between the i-th element of one sequence and the j-th element of the other; use dynamic programming to find the optimal path with the minimum cumulative cost from the starting point to the ending point of the matrix, with the path constraint that movement is only allowed to the right, up, or diagonally; take the cumulative cost of the optimal path as the dynamic time warping path distance.
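The min-max normalization and the dynamic programming procedure described above can be sketched as follows. The function names are illustrative, and mapping a constant sequence to all zeros is an assumption added to avoid division by zero.

```python
def min_max_normalize(seq):
    """Scale a sequence into [0, 1]; a constant sequence maps to zeros."""
    lo, hi = min(seq), max(seq)
    if hi == lo:
        return [0.0 for _ in seq]
    return [(v - lo) / (hi - lo) for v in seq]


def dtw_distance(a, b):
    """Cumulative cost of the optimal warping path between two sequences.

    The local cost is the absolute difference, and each step may move
    right, up, or diagonally in the cost matrix.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # up
                                     cost[i][j - 1],      # right
                                     cost[i - 1][j - 1])  # diagonal
    return cost[n][m]


def f0_coupling_deviation(doctor_f0, patient_f0):
    """DTW distance between the two min-max normalized F0 sequences."""
    return dtw_distance(min_max_normalize(doctor_f0), min_max_normalize(patient_f0))
```

Two speakers whose pitch contours have the same shape at different absolute pitch, for example 100, 150, 200 Hz and 200, 300, 400 Hz, yield a deviation of 0, which is the effect the normalization step is meant to achieve.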

[0043] The preset maximum fundamental frequency coupling deviation constant is set based on the 99th percentile of the DTW distance of 1000 normal doctor-patient dialogue samples, with a value range of 10 to 100 and a typical engineering value of 50.

[0044] Through the above technical solution, this embodiment eliminates individual fundamental frequency differences by normalizing the fundamental frequency sequence by min-max, and uses a dynamic time warping algorithm to calculate the coupling deviation of the fundamental frequency sequences of doctors and patients. This solves the technical problem that existing technologies cannot capture the synchronicity and antagonism of emotions between doctors and patients from a physiological acoustic level, and can only identify extreme volume changes. It achieves refined quantification of the emotional coupling state of doctor and patient voices, and can accurately identify the emotional antagonism characteristics masked under normal volume, which is different from the coarse-grained alarm methods of existing technologies that only rely on sound pressure level thresholds.

[0045] Furthermore, the specific process for obtaining the rhythm disorder degree is as follows: obtain the historical operation data of the doctor corresponding to the target clinic, and construct a preset historical baseline model that includes the average time interval and standard deviation of the historical operation interval; extract the timestamps of each operation event in the operation disturbance rhythm sequence, and calculate the operation interval sequence of adjacent operation events; calculate the Mahalanobis distance between each operation interval in the operation interval sequence and the average time interval of the historical operation interval, and use the cumulative mean of the Mahalanobis distance within the preset sliding time window as the rhythm disorder degree.

[0046] In this embodiment, the historical operation data can be obtained from outpatient operation data of the same department and clinic within the past 3 months, with a data volume of no less than 200 complete outpatient treatment cycles to ensure the statistical significance of the baseline data.

[0047] Baseline model construction can be achieved by cleaning historical operation data, filtering operation data from non-outpatient clinic hours and automatically triggered non-manual operation data, retaining manual operation events during outpatient clinic hours, statistically analyzing the time intervals of all adjacent manual operation events, and calculating the average time interval and standard deviation of the time intervals. This allows for the construction of a pre-set historical baseline model specific to each doctor, ensuring its compatibility with the doctor's current operating habits.

[0048] The preset maximum rhythm disorder constant can be determined based on the 90th percentile of the Mahalanobis distance from the target doctor's historical operation data, with a value range of 2 to 20 and a typical engineering value of 10.
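The baseline model and the rhythm disorder degree can be illustrated with the sketch below. Because the baseline holds only a mean and a standard deviation of one variable, the Mahalanobis distance reduces to the absolute deviation divided by the standard deviation. The function names, the use of the population standard deviation, and returning 0.0 for a window with fewer than two events are assumptions.

```python
from statistics import mean, pstdev


def build_baseline(historical_intervals_s):
    """Return (mean, standard deviation) of historical manual-operation intervals."""
    return mean(historical_intervals_s), pstdev(historical_intervals_s)


def rhythm_disorder(event_timestamps_s, baseline_mean, baseline_std):
    """Mean one-dimensional Mahalanobis distance of the window's intervals.

    `event_timestamps_s` holds the operation event timestamps that fall in
    the current sliding window, in seconds.
    """
    if baseline_std <= 0:
        raise ValueError("baseline standard deviation must be positive")
    ts = sorted(event_timestamps_s)
    intervals = [later - earlier for earlier, later in zip(ts, ts[1:])]
    if not intervals:
        return 0.0  # assumption: too few events to measure any disorder
    distances = [abs(x - baseline_mean) / baseline_std for x in intervals]
    return mean(distances)
```

With a baseline mean of 2 s and standard deviation of 1 s, events at 0 s, 2 s, and 6 s produce intervals of 2 s and 4 s, distances of 0 and 2, and a rhythm disorder degree of 1.0.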

[0049] Through the above technical solution, this embodiment solves the technical problems of existing technologies that ignore the correlation between doctors' operational state and verbal interaction and cannot capture abnormal disturbances in operational behavior by constructing a doctor-specific operational rhythm baseline model, using Mahalanobis distance to quantify the degree of abnormality in operational intervals, and generating rhythm disorder degree by using sliding window cumulative mean. It achieves accurate quantification of the degree of rhythm disorder in doctors' diagnosis and treatment operations, and can predict doctors' emotional fluctuations and imbalances in diagnosis and treatment state through subtle changes in operational behavior. It provides a reliable dimension for stability assessment that is independent of voice data, which is different from the single-dimensional analysis methods of existing technologies that only focus on voice content.

[0050] Furthermore, the system stability index is calculated by a specific formula [formula not reproduced in this text] in terms of the following quantities: the system stability index; the fundamental frequency coupling deviation; the rhythm disorder degree; the preset maximum fundamental frequency coupling deviation constant; the preset maximum rhythm disorder constant; the preset coupling weight coefficient; and the preset disorder weight coefficient.

[0051] In this embodiment, through the above technical solution, the fundamental frequency coupling deviation and rhythm disorder are integrated into a unified system stability index by a normalized weighted fusion formula. This solves the technical problems of existing technologies, such as the inability to coordinate the quantification of multimodal data and the lack of unified state evaluation indicators. It realizes a single-valued and comparable real-time evaluation of the doctor-patient interaction state in the clinic. It can quickly identify the imbalance state through a single indicator, providing a clear quantitative threshold basis for triggering subsequent retrospective speech recognition. This is different from the shortcomings of existing technologies, such as the lack of unified evaluation standards and isolated analysis of multi-dimensional data.
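One way such a normalized weighted fusion could look is sketched below. The exact formula is not legible in this text, so the form used here is an assumption: each input is divided by its preset maximum constant and clipped to [0, 1], the weights are required to sum to 1, and the weighted sum is subtracted from 1 so that a lower index means a less stable state (consistent with retrospective recognition being triggered when the index falls below a threshold). All names are hypothetical.

```python
def system_stability_index(
    coupling_deviation: float,
    rhythm_disorder: float,
    max_coupling_deviation: float,
    max_rhythm_disorder: float,
    w_coupling: float,
    w_disorder: float,
) -> float:
    """Assumed form: 1 - (w_c * D / D_max + w_d * R / R_max), inputs clipped to [0, 1]."""
    if max_coupling_deviation <= 0 or max_rhythm_disorder <= 0:
        raise ValueError("maximum constants must be positive")
    if abs(w_coupling + w_disorder - 1.0) > 1e-9:
        raise ValueError("weights are assumed to sum to 1")
    d_norm = min(max(coupling_deviation / max_coupling_deviation, 0.0), 1.0)
    r_norm = min(max(rhythm_disorder / max_rhythm_disorder, 0.0), 1.0)
    return 1.0 - (w_coupling * d_norm + w_disorder * r_norm)
```

With equal weights, inputs at half of their maxima give an index of 0.5, and inputs at or beyond their maxima give 0.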

[0052] Furthermore, before generating the system stability index, a signal-to-noise ratio (SNR) judgment and correction are performed. Specifically, this includes: acquiring the environmental noise power spectral density of the target clinic in real time and calculating the environmental SNR; if the environmental SNR is not lower than the preset SNR threshold, no adjustment is made; if the environmental SNR is lower than the preset SNR threshold, the preset coupling weight coefficient is reduced by the preset compensation coefficient and the preset disorder weight coefficient is increased simultaneously, and then normalized again to obtain the normalized coupling weight coefficient and the normalized disorder weight coefficient, which are then used in the calculation formula of the system stability index.

[0053] In this embodiment, the preset signal-to-noise ratio threshold is set according to the minimum signal-to-noise ratio requirement for speech recognition, with a value range of 20dB to 40dB and a typical engineering value of 30dB.

[0054] The preset compensation coefficient is set based on large sample test data of the attenuation curve of the speech fundamental frequency extraction accuracy under different signal-to-noise ratios, with a value range of 0.1 to 0.5 and a typical engineering value of 0.2; the compensation coefficient is a fixed step size adjustment value.
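The weight correction described above can be sketched as follows, using the typical engineering values from the text as defaults (30 dB threshold, compensation coefficient 0.2). Treating the compensation coefficient as a single fixed subtraction and addition, clamping the coupling weight at zero, and the function name itself are assumptions for illustration.

```python
from typing import Tuple


def snr_corrected_weights(
    w_coupling: float,
    w_disorder: float,
    snr_db: float,
    snr_threshold_db: float = 30.0,
    compensation: float = 0.2,
) -> Tuple[float, float]:
    """Shift weight from the voice channel to the operation channel in noisy rooms."""
    if snr_db >= snr_threshold_db:
        return w_coupling, w_disorder  # acoustic conditions acceptable: no adjustment
    w_c = max(w_coupling - compensation, 0.0)  # clamp is an assumption
    w_d = w_disorder + compensation
    total = w_c + w_d
    return w_c / total, w_d / total  # renormalize so the weights sum to 1
```

Starting from equal weights, an environmental SNR of 20 dB would yield roughly 0.3 for the coupling weight and 0.7 for the disorder weight under these assumptions.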

[0055] This embodiment solves the technical problem of decreased reliability of voice data and distortion of state assessment caused by existing technologies in high-noise environments by real-time monitoring of environmental signal-to-noise ratio and dynamic correction and normalization of weights based on noise level. It realizes adaptive adjustment of weight coefficients under different acoustic environments, ensuring the accuracy and robustness of the system stability index calculation in complex clinic environments, which is different from the defects of existing technologies with fixed weights and inability to adapt to environmental changes.

[0056] Furthermore, the speech interaction tension function is constructed as follows: calculate the average response delay between the first text segment sequence and the corresponding second text segment sequence at the current moment; obtain the value of the speech interaction tension function at the previous moment; obtain a preset smoothing factor; and update the current value of the speech interaction tension function using an exponentially weighted moving average algorithm, as shown in the following formula: $T_t = \alpha \, \bar{d}_t + (1 - \alpha) \, T_{t-1}$, where $\bar{d}_t$ denotes the average response delay, $T_t$ denotes the current value of the speech interaction tension function, $\alpha$ is the preset smoothing factor, and $T_{t-1}$ denotes the value of the speech interaction tension function at the previous moment.

[0057] In this embodiment, the response delay of a question-answer pair is measured from the end timestamp of the first text segment sequence to the start timestamp of the matching second text segment sequence. The time difference between the two timestamps, in seconds, is the average response delay corresponding to that question-answer pair.

[0058] The preset smoothing factor is set based on the temporal fluctuation characteristics of the average response delay in doctor-patient dialogue and large sample statistical data on the cumulative effect of emotions. The value range is [0.1, 0.9], and the typical engineering value is 0.3. The larger the smoothing factor, the higher the weight of the current average response delay on the tension, and the lower the influence of historical values.

[0059] This formula is applicable to real-time updates after each set of valid question-answer pairs is matched. At the end of each preset sliding time window, a global update is performed based on the average response delay of all question-answer pairs within the window.
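The exponentially weighted moving average update can be sketched as below. The sketch assumes the standard form in which the smoothing factor weights the current average response delay and its complement weights the previous tension value, which matches the statement that a larger smoothing factor increases the influence of the current delay. The initial tension value of 0.0 and the function names are assumptions.

```python
from typing import Iterable, List


def response_delay(question_end_s: float, answer_start_s: float) -> float:
    """Delay in seconds from the end of a question to the start of its answer."""
    return answer_start_s - question_end_s


def update_tension(prev_tension: float, avg_delay: float, alpha: float = 0.3) -> float:
    """One EWMA step: alpha weights the current delay, (1 - alpha) the history."""
    if not 0.1 <= alpha <= 0.9:
        raise ValueError("smoothing factor is expected in [0.1, 0.9]")
    return alpha * avg_delay + (1.0 - alpha) * prev_tension


def tension_curve(delays: Iterable[float], alpha: float = 0.3, initial: float = 0.0) -> List[float]:
    """Tension value after each matched question-answer pair (initial value assumed)."""
    values, current = [], initial
    for d in delays:
        current = update_tension(current, d, alpha)
        values.append(current)
    return values
```

For instance, with a smoothing factor of 0.5 and delays of 2 s and then 3 s, the tension moves from 0 to 1.0 and then to 2.0.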

[0060] Through the above technical solution, this embodiment solves the technical problems of existing technologies that cannot capture the dynamic evolution of doctor-patient communication tension and ignore the historical effect of accumulated emotions by accurately calculating the average response delay of question-and-answer pairs and constructing a dynamically updated verbal interaction tension function using an exponentially weighted moving average algorithm. It realizes the transformation of the abstract doctor-patient communication atmosphere into a quantifiable and traceable dynamic numerical curve, which can reflect the current degree of interaction antagonism and retain the historical effect of accumulated emotions through a smoothing factor. This provides a core quantitative basis for calculating the probability of risk outbreak, which is different from the analysis methods of existing technologies that rely solely on static keyword retrieval and have no time sequence or accumulated effect.

[0061] Furthermore, the probability of risk outbreak is obtained as follows: based on the rate of change of the verbal interaction tension function within a preset sliding time window, a warning signal representing the probability of risk outbreak is generated. The probability of risk outbreak is calculated by a specific formula [formula not reproduced in this text] in terms of the following quantities: the probability of risk outbreak; the rate of change of the verbal interaction tension function within the preset sliding time window; the current value of the verbal interaction tension function; the preset change weight coefficient; the preset function weight coefficient; and the preset zero-prevention parameter.

[0062] In this embodiment, the rate of change is calculated using the first-order difference method. This formula is applicable to the calculation of the probability of risk outbreak at the end of each preset sliding time window.

[0063] Through the above technical solution, this embodiment solves the technical problems of existing technologies being unable to capture the critical point of risk from slow accumulation to explosive leap and the serious lag in early warning by collaboratively judging the current value and rate of change of the verbal interaction tension function and constructing a two-dimensional risk outbreak probability calculation formula. It achieves accurate quantification and forward-looking prediction of the probability of medical complaint risk outbreak, can keenly identify the non-linear growth trend of risk, and output graded early warning signals before the dispute actually breaks out, which is different from the post-event alarm methods of existing technologies that can only respond after the conflict breaks out.

[0064] Furthermore, the numerical fluctuation of the risk outbreak probability over a preset number of consecutive preset sliding time windows is monitored to obtain the continuous growth rate of the risk outbreak probability; if the continuous growth rate exceeds the preset jump threshold, the risk is determined to be in a non-linear outbreak period, the length of the current preset sliding time window is shortened to a preset step size, and the sampling rate of retrospective speech recognition is increased according to the continuous growth rate.

[0065] In this embodiment, the continuous growth rate is calculated by a formula [formula not reproduced in this text] in terms of the following quantities: the continuous growth rate; the preset number of consecutive sliding time windows; the risk outbreak probability within an indexed window of the consecutive preset sliding time windows; and the risk outbreak probability in the first window of the sequence of consecutive preset sliding time windows. Under a specified condition, a substitute value is taken to avoid division-by-zero errors.

[0066] The preset jump threshold is set based on the 95th percentile of the probability growth rate preceding risk outbreaks in historical complaint events, with a value range of 20% to 100% and a typical engineering value of 50%.

[0067] The preset step size is set based on the high-frequency monitoring requirements during the nonlinear outbreak period of risk, with a value range of 1s to 10s and a typical engineering value of 5s; the sliding step size is adjusted synchronously to 1 / 10 of the shortened window length, with a typical value of 0.5s.
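The adaptive adjustment of monitoring parameters can be sketched as follows. Because the growth-rate formula is not legible in this text, the sketch assumes a relative growth from the first to the last window of the sequence, with a small substitute value when the first probability is zero. The defaults follow the typical engineering values in the text (jump threshold 50%, shortened window 5 s, sliding step one tenth of the window); the `MonitorParams` type, function names, and the substitute value are hypothetical. The increase of the speech recognition sampling rate is omitted because its mapping is not specified here.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class MonitorParams:
    window_s: float  # sliding time window length, seconds
    step_s: float    # sliding step, seconds


def continuous_growth_rate(probabilities: Sequence[float], eps: float = 1e-6) -> float:
    """Assumed form: (last - first) / first over the consecutive windows."""
    first = probabilities[0] if probabilities[0] != 0 else eps  # avoid division by zero
    return (probabilities[-1] - first) / first


def adapt_params(
    params: MonitorParams,
    probabilities: Sequence[float],
    jump_threshold: float = 0.5,
    preset_step_s: float = 5.0,
) -> MonitorParams:
    """Shorten the window during a non-linear outbreak period; otherwise keep it."""
    if continuous_growth_rate(probabilities) > jump_threshold:
        return MonitorParams(window_s=preset_step_s, step_s=preset_step_s / 10)
    return params
```

Under these assumptions, probabilities rising from 0.2 to 0.4 across the monitored windows give a growth rate of 100%, which exceeds the 50% threshold and switches monitoring to a 5 s window with a 0.5 s step.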

[0068] This embodiment solves the technical problems of existing technologies, such as the inability to adapt to the nonlinear characteristics of risk evolution, insufficient monitoring sensitivity in high-risk states, and waste of computing resources in low-risk states, by real-time monitoring of the continuous growth rate of the probability of risk outbreaks, automatic determination of nonlinear outbreak periods, and dynamic adaptive adjustment of the sliding window length and speech recognition sampling rate. It achieves a dynamic balance between monitoring accuracy and computing resource consumption, ensuring operational efficiency in low-risk states while enabling high-frequency and high-precision enhanced monitoring during high-risk outbreak periods. This further improves the foresight and reliability of the early warning method, and is different from the shortcomings of existing technologies that use fixed monitoring parameters and cannot adapt to dynamic changes in risks.

[0069] This application also provides a computer-readable storage medium for storing a computer program, which, when executed by a processor, implements a medical complaint risk warning method based on natural language processing.

[0070] Those skilled in the art will understand that embodiments of the present invention can be provided as methods, approaches, or computer program products. Therefore, the present invention can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention can take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.

[0071] This invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, as well as combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

[0072] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

[0073] These computer program instructions may also be loaded onto a computer or other programmable data processing equipment to cause a series of operational steps to be performed on the computer or other programmable equipment to produce a computer-implemented process, such that the instructions that execute on the computer or other programmable equipment provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

[0074] Although preferred embodiments of the invention have been described, those skilled in the art, upon learning the basic inventive concept, can make other changes and modifications to these embodiments. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments as well as all changes and modifications falling within the scope of the invention.

[0075] Obviously, those skilled in the art can make various modifications and variations to this invention without departing from its spirit and scope. Therefore, if these modifications and variations fall within the scope of the claims of this invention and their equivalents, this invention also intends to include these modifications and variations.

Claims

1. A medical complaint risk early warning method based on natural language processing, characterized in that, Includes the following steps: The system collects real-time audio streams and real-time operation logs from the target clinic, performs sound source separation processing on the real-time audio streams to obtain the doctor's audio signal sequence and the patient's audio signal sequence, and performs time series analysis based on these to obtain the system stability index. When the system stability index is lower than the first preset threshold, retrospective speech recognition is performed on the doctor's audio signal sequence and the patient's audio signal sequence within the preset sliding time window to generate doctor's audio text segment sequence and patient's audio text segment sequence; In the doctor's audio text segment sequence and the patient's audio text segment sequence, a first text segment sequence representing a preset interrogative sentence and a second text segment sequence representing a preset explanatory sentence are identified by a preset natural language model, and the average response delay of the first text segment sequence and the second text segment sequence within a preset sliding time window is calculated. Based on the average response delay, a speech interaction tension function is constructed. If the current value of the speech interaction tension function exceeds the second preset threshold, an early warning signal representing the probability of risk outbreak is generated and output according to the rate of change of the speech interaction tension function within a preset sliding time window.

2. The medical complaint risk early warning method based on natural language processing according to claim 1, characterized in that, The specific analysis process for the system stability index is as follows: The doctor's audio signal sequence and the patient's audio signal sequence are time-aligned to generate a joint speech activity time axis. The first fundamental frequency fluctuation sequence of the doctor's audio signal sequence and the second fundamental frequency fluctuation sequence of the patient's audio signal sequence are extracted from the joint speech activity time axis. Perform event density analysis on real-time operation logs to generate operation event sequences, and map the operation event sequences to the joint speech activity time axis to generate operation perturbation rhythm sequences; Within a preset sliding time window, the fundamental frequency coupling deviation between the first fundamental frequency fluctuation sequence and the second fundamental frequency fluctuation sequence is calculated, and the rhythm disorder of the operational perturbation rhythm sequence relative to the preset historical baseline model is calculated. The system stability index is generated by weighted fusion of fundamental frequency coupling deviation and rhythm disorder.

3. The medical complaint risk early warning method based on natural language processing according to claim 2, characterized in that, The specific calculation process for the fundamental frequency coupling deviation is as follows: Within the preset sliding time window, the first fundamental frequency fluctuation sequence and the second fundamental frequency fluctuation sequence are normalized respectively to obtain the first normalized fundamental frequency sequence and the second normalized fundamental frequency sequence. Calculate the dynamic time warping path distance between the first normalized fundamental frequency sequence and the second normalized fundamental frequency sequence, and use the dynamic time warping path distance as the fundamental frequency coupling deviation.

4. The medical complaint risk early warning method based on natural language processing according to claim 2, characterized in that, The specific process for obtaining the rhythm disorder level is as follows: Obtain historical operation data of the doctor corresponding to the target clinic and construct a preset historical baseline model that includes the average time interval and standard deviation of the time interval of historical operations; Extract the timestamps of each operation event from the operation perturbation rhythm sequence, and calculate the operation interval sequence of adjacent operation events; Calculate the Mahalanobis distance between each operation interval in the operation interval sequence and the historical average time interval, and use the cumulative mean of the Mahalanobis distance within a preset sliding time window as the rhythm disorder degree.

5. The medical complaint risk early warning method based on natural language processing according to claim 2, characterized in that the system stability index is calculated by a specific formula [formula not reproduced in this text] in terms of the following quantities: the system stability index; the fundamental frequency coupling deviation; the rhythm disorder degree; the preset maximum fundamental frequency coupling deviation constant; the preset maximum rhythm disorder constant; the preset coupling weight coefficient; and the preset disorder weight coefficient.

6. The medical complaint risk early warning method based on natural language processing according to claim 5, characterized in that, Before the system stability index is generated, a signal-to-noise ratio (SNR) correction is performed, which includes: Real-time acquisition of the environmental noise power spectral density of the target clinic, and calculation of the environmental signal-to-noise ratio; If the environmental signal-to-noise ratio is not lower than the preset signal-to-noise ratio threshold, no adjustment will be made; If the environmental signal-to-noise ratio is lower than the preset signal-to-noise ratio threshold, the preset coupling weight coefficient is reduced by the preset compensation coefficient and the preset disorder weight coefficient is increased simultaneously. Then, it is normalized again to obtain the normalized coupling weight coefficient and the normalized disorder weight coefficient, which are used in the calculation formula of the system stability index.

7. The medical complaint risk early warning method based on natural language processing according to claim 1, characterized in that the construction of the speech interaction tension function specifically includes: statistically analyzing the average response delay of the first and second text segment sequences within a preset sliding time window; obtaining the value of the speech interaction tension function at the previous moment; obtaining the preset smoothing factor; and updating the current value of the speech interaction tension function using an exponentially weighted moving average algorithm, as shown in the following formula: $T_t = \alpha \, \bar{d}_t + (1 - \alpha) \, T_{t-1}$, where $\bar{d}_t$ denotes the average response delay, $T_t$ denotes the current value of the speech interaction tension function, $\alpha$ is the preset smoothing factor, and $T_{t-1}$ is the value of the speech interaction tension function at the previous moment.

8. The medical complaint risk early warning method based on natural language processing according to claim 1, characterized in that the probability of risk outbreak is obtained as follows: based on the rate of change of the speech interaction tension function within a preset sliding time window, an early warning signal representing the probability of risk outbreak is generated; the probability of risk outbreak is calculated by a specific formula [formula not reproduced in this text] in terms of the following quantities: the probability of risk outbreak; the rate of change of the speech interaction tension function within the preset sliding time window; the current value of the speech interaction tension function; the preset change weight coefficient; the preset function weight coefficient; and the preset zero-prevention parameter.

9. The medical complaint risk early warning method based on natural language processing according to claim 1, characterized in that the fluctuation of the probability of risk outbreak over a preset number of consecutive preset sliding time windows is monitored to obtain the continuous growth rate of the probability of risk outbreak; if the continuous growth rate of the probability of risk outbreak exceeds the preset jump threshold, it is determined that the risk is in a non-linear outbreak period, the length of the current preset sliding time window is shortened to the preset step size, and the sampling rate of the retrospective speech recognition is increased according to the continuous growth rate.

10. A computer-readable storage medium storing a computer program, the computer program comprising instructions that, when executed by a processor, implement the method according to any one of claims 1 to 9.