Human-machine voice interaction method and device for intelligent device

By collecting packet loss rate and network status data in the human-computer voice interaction system and dynamically adjusting the redundancy ratio, the problem of lagging redundancy ratio adjustment in the existing technology is solved, enabling early prediction and immediate response to data loss, and improving the quality and efficiency of voice interaction.

CN122245313A · Pending · Publication Date: 2026-06-19 · SHENZHEN NINGRUI ELECTRONICSAL TECH CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications (China)
Current Assignee / Owner: SHENZHEN NINGRUI ELECTRONICSAL TECH CO LTD
Filing Date: 2026-04-02
Publication Date: 2026-06-19


Abstract

This invention relates to the field of human-computer voice interaction technology and specifically discloses a human-computer voice interaction method and apparatus for intelligent devices. The apparatus includes an information acquisition module, which acquires human-computer interaction voice data from the intelligent device and converts user voice into text commands through voice recognition technology. The packet loss rate during the text command conversion process of a single human-computer voice interaction is compared with a preset packet loss rate threshold. When the packet loss rate of voice data transmission is determined to be low, a risk prediction module combines the packet loss rate data with network status data from the interaction to predict the packet loss risk during subsequent use of the human-computer voice interaction device. This enables early prediction of data loss risk, so the redundancy ratio can be adjusted in advance instead of "remediating after packet loss occurs", thereby optimizing the redundancy ratio adjustment method.

Description

Technical Field

[0001] This invention relates to the field of human-computer voice interaction technology, specifically to a human-computer voice interaction method and apparatus for intelligent devices.

Background Technology

[0002] Human-computer voice interaction for smart devices is a current research hotspot in the field of artificial intelligence. Its core lies in achieving natural and efficient interaction between humans and devices through technologies such as speech recognition, natural language understanding, and speech synthesis.

[0003] The technical implementation of a human-computer voice interaction system involves converting user speech into text commands using speech recognition technology, then parsing the intent and executing the corresponding operation through natural language processing. During this process, the microphone collects the user's voice and converts it into a data signal, which is then encoded to generate voice data packets. These packets are then sent to the receiving end via a wireless network. Packet loss often occurs during this transmission process. To avoid data loss, the redundancy ratio is typically adjusted to increase redundant data, thereby covering the expected packet loss and enabling the receiving end to recover the original data, thus preventing data loss.
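To illustrate how redundant data lets the receiving end recover a lost packet, the following is a minimal sketch using a single XOR parity packet per group. The source does not say which redundancy coding scheme is used, so XOR parity and the names `make_parity` and `recover` are assumptions chosen only for illustration.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length packets."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(packets: List[bytes]) -> bytes:
    """Build one redundant packet as the XOR of all packets in the group."""
    return reduce(xor_bytes, packets)


def recover(received: List[Optional[bytes]], parity: bytes) -> List[bytes]:
    """Restore at most one lost packet (marked None) using the parity packet."""
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) > 1:
        raise ValueError("one parity packet can cover only one lost packet")
    restored = list(received)
    if missing:
        present = [p for p in received if p is not None]
        # XOR of the parity with every surviving packet yields the lost one.
        restored[missing[0]] = reduce(xor_bytes, present, parity)
    return restored
```

One parity packet per group covers exactly one loss in that group, which is why a fixed redundancy ratio stops being sufficient once the loss rate rises.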

[0004] In existing technologies, traditional human-computer voice interaction methods typically adjust the redundancy ratio dynamically based on the packet loss rate during a single human-computer voice interaction process to avoid voice data loss. In this case, there is a risk of lag, meaning that data transmission loss cannot be prevented in time, which leads to a decrease in voice quality and a reduction in the communication efficiency of the human-computer voice interaction system, thus affecting the user experience.

Summary of the Invention

[0005] The purpose of this invention is to provide a human-computer voice interaction method and apparatus for smart devices that solves the following technical problem: how to optimize the redundancy ratio adjustment method.

[0006] The objective of this invention can be achieved through the following technical solution: a human-computer voice interaction method and apparatus for intelligent devices, the apparatus comprising:
The information acquisition module, used to acquire human-computer interaction voice data based on the human-computer voice interaction intelligent device, and to convert user voice into text commands through voice recognition technology;
The data acquisition module, used to collect the packet loss rate and network status data of the human-computer voice interaction smart device during a single human-computer voice interaction process;
The data evaluation module, used to compare the packet loss rate during the text command conversion process of a user's human-computer voice interaction with a preset packet loss rate threshold, and to evaluate the packet loss rate of voice data transmission during the voice conversion process based on the comparison result;
The first decision module, used to decide, based on the evaluation result of the packet loss rate of voice data transmission during the voice conversion process, whether the redundancy ratio needs to be adjusted;
The risk prediction module, used to predict the risk of packet loss during subsequent use of the human-computer voice interaction device by introducing network status data from the human-computer voice interaction process when the packet loss rate of voice data transmission is determined to be low;
The second decision module, used to decide whether to adjust the redundancy ratio based on the predicted packet loss risk during subsequent use of the human-computer voice interaction device;
The redundancy ratio adjustment module, used to dynamically adjust the redundancy ratio of the human-computer voice interaction device based on the decision results of the first decision module and the second decision module.

[0007] Furthermore, the network status data collected by the data acquisition module includes: the number of voice data packets transmitted by the intelligent device during any human-computer voice interaction, the size of the voice data packets, the network transmission latency, and the bandwidth utilization rate.

[0008] Furthermore, the evaluation process of the data evaluation module includes: comparing the packet loss rate during the voice data transmission phase of the user's i-th human-computer voice interaction with the preset packet loss rate threshold; if the packet loss rate exceeds the threshold, it is judged that the packet loss rate of voice data transmission during the voice conversion process of the user's i-th human-computer voice interaction is high; if the packet loss rate does not exceed the threshold, it is judged that the packet loss rate of voice data transmission during the voice conversion process of the user's i-th human-computer voice interaction is low; where i represents any human-computer voice interaction process initiated by the user.

[0009] Furthermore, the decision-making process of the first decision module includes: When it is determined that the packet loss rate of voice data transmission is high during the user's i-th human-computer voice interaction, the redundancy ratio of the human-computer voice interaction device is dynamically adjusted through the redundancy ratio adjustment module. When it is determined that the packet loss rate of voice data transmission is low during the user's i-th human-computer voice interaction, network status data during the human-computer voice interaction process is introduced to predict the packet loss risk during subsequent use of the human-computer voice interaction device.

[0010] Furthermore, the prediction process of the risk prediction module includes: combining the packet loss rate of voice data transmission during the user's i-th human-computer voice interaction with the number of voice data packets transmitted by the smart device during that interaction, the size of the voice data packets, and the network transmission latency data, a calculation model is established to calculate the packet loss risk index of the user's i-th human-computer voice interaction.

[0011] Furthermore, the prediction process of the risk prediction module also includes: using the packet loss risk index calculated in real time for the user's i-th human-computer voice interaction, a curve of the change in the packet loss risk index is established, and, based on an integral formula, the change in the packet loss risk index from the user's first human-computer voice interaction to the i-th human-computer voice interaction is calculated.

[0012] Furthermore, the decision-making process of the second decision module also includes: comparing the change in the packet loss risk index from the user's first human-computer voice interaction to the i-th human-computer voice interaction with the preset change threshold of the packet loss risk index; if the change exceeds the threshold, it is determined that the packet loss rate during voice packet data transmission in the i-th human-computer voice interaction shows an upward trend, and the redundancy ratio needs to be adjusted in advance; if the change does not exceed the threshold, it is determined that the packet loss rate during the i-th human-computer voice interaction shows no significant upward trend, and there is no need to adjust the redundancy ratio.

[0013] Furthermore, the adjustment process of the redundancy ratio adjustment module includes: when the corresponding adjustment condition is met, the redundancy ratio is adjusted according to the magnitude of the packet loss rate during the voice data transmission phase of the user's i-th human-computer voice interaction, and the voice data is retransmitted after the redundancy ratio adjustment is completed.

[0014] Furthermore, the adjustment process of the redundancy ratio adjustment module also includes: when the corresponding adjustment condition is met, a calculation model is established by introducing the bandwidth utilization rate and data transmission delay of the human-computer voice interaction intelligent device during the user's i-th human-computer voice interaction, and the redundancy ratio to be used after the user's i-th human-computer voice interaction is calculated.

[0015] A human-computer voice interaction method for smart devices, the method comprising:
S1: The information acquisition module acquires human-computer interaction voice data based on the human-computer voice interaction smart device, and converts the user's voice into text commands through voice recognition technology;
S2: During a single human-computer voice interaction process, the data acquisition module collects the packet loss rate and network status data of the human-computer voice interaction smart device;
S3: The data evaluation module compares the packet loss rate during the text command conversion process of a user's human-computer voice interaction with the preset packet loss rate threshold, and evaluates the packet loss rate of voice data transmission during the voice conversion process based on the comparison result;
S4: Based on the evaluation result of the packet loss rate of voice data transmission during the voice conversion process, the first decision module decides whether the redundancy ratio needs to be adjusted. If yes, proceed to step S7; otherwise, proceed to step S5;
S5: When the packet loss rate of voice data transmission is determined to be low, the risk prediction module introduces network status data from the human-computer voice interaction process to predict the packet loss risk during subsequent use of the human-computer voice interaction device;
S6: Based on the predicted packet loss risk during subsequent use of the human-computer voice interaction device, the second decision module decides whether the redundancy ratio needs to be adjusted. If yes, proceed to step S7; otherwise, no action is taken;
S7: The redundancy ratio adjustment module dynamically adjusts the redundancy ratio of the human-computer voice interaction device based on the decision results of the first decision module and the second decision module.
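The branching in steps S3 to S7 can be sketched as a small decision function. This is only an illustration of the control flow described above; the function name, the returned labels, and the use of a callback for the risk prediction step are assumptions, and the source does not state how the boundary case (loss rate equal to the threshold) is handled.

```python
from typing import Callable


def decide_redundancy_action(
    loss_rate: float,
    loss_threshold: float,
    predict_risk_change: Callable[[], float],
    risk_change_threshold: float,
) -> str:
    """Return which branch of S4/S6 leads to the adjustment step S7, if any."""
    # S3/S4: a high measured loss rate triggers an immediate adjustment.
    if loss_rate > loss_threshold:
        return "adjust_now"
    # S5: only when the loss rate is low is the risk prediction performed.
    risk_change = predict_risk_change()
    # S6: a rising risk triggers an adjustment in advance of actual loss.
    if risk_change > risk_change_threshold:
        return "adjust_in_advance"
    return "no_action"
```

Passing the prediction step as a callback keeps the sketch faithful to the order of the steps: the prediction runs only on the low-loss branch.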

[0016] The beneficial effects of this invention are: (1) This invention compares the packet loss rate during the text command conversion process of a user’s human-computer voice interaction with a preset packet loss rate threshold. When it is determined that the packet loss rate of voice data transmission is low, the risk prediction module introduces network status data during the human-computer voice interaction process based on the packet loss rate data to predict the packet loss risk during subsequent use of the human-computer voice interaction device. This can achieve advance prediction of data loss risk. On this basis, the redundancy ratio can be adjusted in advance to avoid the situation of “compensation after packet loss occurs”, thereby optimizing the redundancy ratio adjustment method.

[0017] (2) This invention compares the packet loss rate during the voice data transmission stage of the user's i-th human-computer voice interaction with the preset packet loss rate threshold, and thereby analyzes the packet loss rate of voice data transmission during the voice conversion process of that interaction. Since a fixed redundancy ratio may not be able to cover all packet loss once the packet loss rate increases, the analysis result serves as a factual basis for the subsequent decision on whether to adjust the redundancy ratio. This makes it possible to increase redundancy quickly in the event of sudden packet loss, avoiding voice interruption and ensuring the communication quality of human-computer voice interaction.

[0018] (3) This invention compares the change in the packet loss risk index from the user's first human-computer voice interaction to the i-th human-computer voice interaction with the preset change threshold of the packet loss risk index, and thereby analyzes the trend of the packet loss rate during voice packet data transmission in the user's i-th human-computer voice interaction. Based on the analysis result, it decides whether to adjust the redundancy ratio, turning passive response into active defense. That is, redundant data is increased in advance, before packet loss actually occurs, to avoid a decline in voice quality and optimize the adjustment of the redundancy ratio.

[0019] (4) By comparing the real-time packet loss rate with the preset packet loss rate threshold, this invention can decide in real time whether to adjust the redundancy ratio to deal with emergencies. On this basis, the risk prediction module further introduces network status data from the human-computer voice interaction process to predict the packet loss risk during future use of the human-computer voice interaction device. This allows the packet loss risk to be predicted, and the redundancy ratio to be adjusted, in advance. The result is a hybrid strategy of immediate response and forward defense that improves adaptability to network fluctuations and covers both "sudden" and "gradual" scenarios, thereby optimizing the redundancy ratio adjustment method.

Attached Figure Description

[0020] The invention will now be further described with reference to the accompanying drawings.

[0021] Figure 1 is a structural block diagram of the human-computer voice interaction device for smart devices in this invention; Figure 2 is a flowchart of the human-computer voice interaction method for smart devices in this invention.

Detailed Implementation

[0022] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0023] As shown in Figure 1, in one embodiment, this application provides a human-computer voice interaction method and apparatus for smart devices, the apparatus comprising:
The information acquisition module, used to acquire human-computer interaction voice data based on the human-computer voice interaction intelligent device, and to convert user voice into text commands through voice recognition technology;
The data acquisition module, used to collect the packet loss rate and network status data of the human-computer voice interaction smart device during a single human-computer voice interaction process;
The data evaluation module, used to compare the packet loss rate during the text command conversion process of a user's human-computer voice interaction with a preset packet loss rate threshold, and to evaluate the packet loss rate of voice data transmission during the voice conversion process based on the comparison result;
The first decision module, used to decide, based on the evaluation result of the packet loss rate of voice data transmission during the voice conversion process, whether the redundancy ratio needs to be adjusted;
The risk prediction module, used to predict the risk of packet loss during subsequent use of the human-computer voice interaction device by introducing network status data from the human-computer voice interaction process when the packet loss rate of voice data transmission is determined to be low;
The second decision module, used to decide whether to adjust the redundancy ratio based on the predicted packet loss risk during subsequent use of the human-computer voice interaction device;
The redundancy ratio adjustment module, used to dynamically adjust the redundancy ratio of the human-computer voice interaction device based on the decision results of the first decision module and the second decision module.
Through the above technical solution, this embodiment provides an information acquisition module that acquires human-computer interaction voice data based on a human-computer voice interaction intelligent device and converts user voice into text commands through voice recognition technology. On this basis, when the user performs human-computer voice interaction, the data acquisition module collects the packet loss rate and network status data of the device, and the data evaluation module compares the packet loss rate during the text command conversion process with a preset packet loss rate threshold and evaluates the packet loss rate of voice data transmission during the voice conversion process based on the comparison result. The first decision module then decides whether to adjust the redundancy ratio based on that evaluation result. When the packet loss rate of voice data transmission is determined to be low, the risk prediction module introduces the network status data from the interaction to predict the packet loss risk during subsequent use of the device. The second decision module decides whether to adjust the redundancy ratio based on the predicted packet loss risk. Finally, the redundancy ratio adjustment module dynamically adjusts the redundancy ratio of the device based on the decision results of the first decision module and the second decision module.
Because the packet loss rate of a single human-computer voice interaction is compared with a preset threshold, and the risk prediction module introduces network status data to predict the packet loss risk of subsequent use whenever that packet loss rate is low, the risk of data loss can be predicted early. The redundancy ratio can then be adjusted in advance instead of "remediating after packet loss occurs", which optimizes the redundancy ratio adjustment method. Furthermore, since increasing the redundancy ratio also increases the network load, this embodiment introduces network status data when adjusting the redundancy ratio. The adjusted redundancy ratio is thereby balanced against the network load, which prevents the added redundancy from raising the load, further increasing the packet loss rate, and creating a vicious cycle. This further improves the communication efficiency of the human-computer voice interaction system.
Specifically, the information acquisition process of the information acquisition module includes acquiring the raw audio stream through the microphone array of the human-computer voice interaction intelligent device (such as a circular 4-microphone array), with a sampling rate ≥16 kHz (supporting the voice frequency range of 300 Hz to 3.4 kHz) and a bit depth of 16 bits, and implementing acoustic echo cancellation, noise suppression, and beamforming in the hardware noise reduction unit integrated with a DSP chip, outputting clean audio with an SNR ≥35 dB.
Then, the speech recognition engine in the software layer of the human-computer voice interaction intelligent device establishes a long-lived connection through WebSocket, divides the audio into multiple speech packets, and sends them to the receiving end over a wireless network. At the receiving end, NLP intent recognition based on BERT-tiny (parameter count ≤3M) parses and classifies the converted text commands, completing the conversion of human-computer interaction voice data into text commands.
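As a rough illustration of dividing the audio into speech packets, the sketch below splits 16 kHz, 16-bit PCM into fixed-duration packets with sequence numbers. The 20 ms packet duration, the single-channel (mono) stream, and the function name `packetize` are assumptions; the source specifies only the sampling rate and bit depth.

```python
from typing import List, Tuple

SAMPLE_RATE_HZ = 16_000  # from the text: sampling rate of at least 16 kHz
BYTES_PER_SAMPLE = 2     # from the text: 16-bit depth
FRAME_MS = 20            # assumption: packet duration is not given in the source


def packetize(pcm: bytes, frame_ms: int = FRAME_MS) -> List[Tuple[int, bytes]]:
    """Split mono PCM audio into (sequence_number, payload) packets."""
    frame_bytes = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * frame_ms // 1000
    return [
        (seq, pcm[start:start + frame_bytes])
        for seq, start in enumerate(range(0, len(pcm), frame_bytes))
    ]
```

Under these assumptions one second of audio is 32,000 bytes and yields 50 packets of 640 bytes each; the sequence numbers are what would let a receiver detect which packets were lost.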

[0024] The network status data collected by the data acquisition module includes: the number of voice data packets transmitted by the intelligent device during any human-computer voice interaction, the size of the voice data packets, the network transmission latency, and the bandwidth utilization rate.
Through the above technical solution, the number and size of the voice data packets transmitted during any human-computer voice interaction directly reflect changes in the packet loss risk of that interaction. Specifically, the more voice data packets are transmitted, the more packets are lost at the same packet loss rate; and the smaller a voice data packet is, the lower its processing priority when the network is congested, so the more easily it is lost. These measurements provide diversified and reliable data support for the subsequent analysis and prediction of packet loss risk, ensuring that the prediction results are reasonable.
The network status data also includes network transmission latency and bandwidth utilization. Higher network transmission latency indicates longer data transmission time, requiring an increase in redundancy to offset potential timeout risks (i.e., the loss of untransmitted data packets due to transmission timeout). Conversely, lower bandwidth utilization indicates that the network is not congested, so redundancy can be added as appropriate to reduce data transmission risks.
This configuration provides network status data support for subsequent adjustments to the redundancy ratio, ensuring that the adjustment results are reasonable.

[0025] The evaluation process of the data evaluation module includes: comparing the packet loss rate during the voice data transmission phase of the user's i-th human-computer voice interaction with the preset packet loss rate threshold; if the packet loss rate exceeds the threshold, it is judged that the packet loss rate of voice data transmission during the voice conversion process of the user's i-th human-computer voice interaction is high; if the packet loss rate does not exceed the threshold, it is judged that the packet loss rate of voice data transmission during the voice conversion process of the user's i-th human-computer voice interaction is low; where i represents any human-computer voice interaction process initiated by the user.
Using the above technical solution, this embodiment compares the packet loss rate during the voice data transmission phase of the user's i-th human-computer voice interaction with the preset packet loss rate threshold, and thereby analyzes the packet loss rate of voice data transmission during the voice conversion process of that interaction. Since a fixed redundancy ratio may not be able to cover all packet loss once the packet loss rate increases, the analysis result serves as a factual basis for the subsequent decision on whether to adjust the redundancy ratio. This makes it possible to increase redundancy quickly in the event of sudden packet loss, avoiding voice interruption and ensuring the communication quality of human-computer voice interaction.
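A minimal sketch of this evaluation step follows. The loss rate is taken as the fraction of sent packets that did not arrive, which is the usual definition; the function names and the example threshold of 0.05 in the usage note are assumptions, since the source does not give a threshold value.

```python
def packet_loss_rate(sent: int, received: int) -> float:
    """Fraction of the packets sent in one interaction that were lost."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    if not 0 <= received <= sent:
        raise ValueError("received must be between 0 and sent")
    return (sent - received) / sent


def classify_loss(loss_rate: float, threshold: float) -> str:
    """Label the interaction's loss rate as 'high' or 'low' against a preset threshold."""
    return "high" if loss_rate > threshold else "low"
```

For example, 194 packets received out of 200 sent gives a loss rate of 0.03, which `classify_loss` labels "low" against a hypothetical threshold of 0.05.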

[0026] The decision-making process of the first decision module includes: when it is determined that the packet loss rate of voice data transmission is high during the user's i-th human-computer voice interaction, the redundancy ratio of the human-computer voice interaction device is dynamically adjusted through the redundancy ratio adjustment module; when it is determined that the packet loss rate of voice data transmission is low during the user's i-th human-computer voice interaction, network status data from the human-computer voice interaction process is introduced to predict the packet loss risk during subsequent use of the human-computer voice interaction device.
With this configuration, since a fixed redundancy ratio may not be able to cover all packet loss as the packet loss rate increases, the result of comparing the packet loss rate during the voice data transmission phase of the user's i-th human-computer voice interaction with the preset packet loss rate threshold serves as a factual basis for the subsequent decision on whether to adjust the redundancy ratio. This makes it possible to increase redundancy quickly in the event of sudden packet loss, avoiding voice interruption and ensuring the quality of human-computer voice interaction.

[0027] The prediction process of the risk prediction module includes: combining the packet loss rate of voice data transmission during the user's i-th human-computer voice interaction with the number of voice data packets transmitted by the smart device during that interaction, the size of the voice data packets, and the network transmission latency data, a calculation model is established to calculate the packet loss risk index of the user's i-th human-computer voice interaction.
Specifically, the packet loss risk index of the user's i-th human-computer voice interaction is calculated by a formula whose terms are: a, representing any voice data packet in a single human-computer voice interaction; the total number of voice data packets in the user's i-th human-computer voice interaction; the size of the a-th voice data packet in the i-th human-computer voice interaction; the preset voice data packet size, a standard value that can be selected and set based on the allowable error in empirical data; the data transmission delay during the user's i-th human-computer voice interaction; and the preset transmission delay, likewise a standard value that can be selected and set based on the allowable error in empirical data.
According to this formula, the smaller the size of the a-th voice data packet in the i-th human-computer voice interaction, and the higher the packet loss rate and data transmission delay during that interaction, the higher the packet loss risk index of the user's i-th human-computer voice interaction, and the higher the risk of packet loss.
This indicates that during the user's i-th human-computer voice interaction, although the packet loss rate did not exceed the threshold, the network status and the size of the voice packets could still increase the risk of packet loss. Specifically, the smaller the voice data packet, the lower its processing priority during network congestion, making packet loss more likely; and higher network transmission latency indicates longer data transmission time, requiring an increased redundancy ratio to offset potential timeout risks (i.e., transmission timeouts leading to the loss of untransmitted data packets). Therefore, by combining network status and voice packet size data, the packet loss risk during the user's i-th human-computer voice interaction can be analyzed, providing reliable data support for subsequent predictions of data loss risk.
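The formula itself is not preserved in this text, so the sketch below is not the patent's formula. It is one hypothetical function, built only from the inputs named above, that has the stated qualitative behavior: the index rises as packets get smaller and as the loss rate and delay get higher. The multiplicative form and all names are assumptions.

```python
from typing import Sequence


def packet_loss_risk_index(
    loss_rate: float,
    packet_sizes: Sequence[float],
    preset_size: float,
    delay: float,
    preset_delay: float,
) -> float:
    """Hypothetical risk index; NOT the formula from the source, which is missing."""
    if not packet_sizes or min(packet_sizes) <= 0:
        raise ValueError("packet_sizes must be non-empty and positive")
    if preset_size <= 0 or preset_delay <= 0:
        raise ValueError("preset values must be positive")
    # Assumed size term: mean of preset_size / size, larger for smaller packets.
    size_term = sum(preset_size / s for s in packet_sizes) / len(packet_sizes)
    # Assumed delay term: measured delay relative to the preset delay.
    delay_term = delay / preset_delay
    return loss_rate * size_term * delay_term
```

Any other function with the same monotonic behavior would match the description equally well; the choice here is purely for illustration.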

[0028] The prediction process of the risk prediction module also includes: using the packet loss risk index calculated in real time for the user's i-th human-computer voice interaction, a curve of the packet loss risk index over time is established, and the change in the packet loss risk index from the user's first human-computer voice interaction to the i-th human-computer voice interaction is calculated with an integral formula. The formula is evaluated between the time point of the user's first human-computer voice interaction and the time point of the user's i-th human-computer voice interaction.
With this technical solution, the resulting value reflects the trend and magnitude of the change in the packet loss risk index from the user's first human-computer voice interaction to the i-th human-computer voice interaction. The higher the value, the more the packet loss risk has risen over that period, which indicates that the packet loss rate will gradually increase in subsequent use; to avoid data loss caused by the rising packet loss rate, the redundancy ratio must be adjusted in advance to reduce the data transmission risk. Conversely, the lower the value, the less the packet loss risk has changed over that period; in that case, as long as the current packet loss rate does not exceed the threshold, the redundancy ratio need not be adjusted. Calculating this change provides reliable data support for predicting future data loss risk and for deciding whether to adjust the redundancy ratio.
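The integral formula itself is not legible in this text, so the following is only a minimal sketch of one possible reading. It assumes the risk index curve is piecewise linear between interactions and integrates the slope of that curve from the first time point to the i-th. The function name `risk_index_change` and the sample values are hypothetical.

```python
from typing import Sequence


def risk_index_change(times: Sequence[float], risks: Sequence[float]) -> float:
    """Integrate the slope of a piecewise-linear risk curve over [times[0], times[-1]].

    Assumption (not from the source): the curve is linear between consecutive
    interactions, so the integral of dR/dt is summed segment by segment.
    """
    if len(times) != len(risks) or len(times) < 2:
        raise ValueError("need at least two (time, risk) samples of equal length")
    total = 0.0
    for k in range(1, len(times)):
        dt = times[k] - times[k - 1]
        if dt <= 0:
            raise ValueError("time points must be strictly increasing")
        slope = (risks[k] - risks[k - 1]) / dt  # dR/dt on this segment
        total += slope * dt                     # this segment's share of the integral
    return total


# Hypothetical samples: three interactions at t = 0 s, 10 s, 25 s.
change = risk_index_change([0.0, 10.0, 25.0], [0.2, 0.3, 0.5])
```

Under this reading the result equals the last risk index minus the first, so a positive value means the risk has risen since the first interaction. A different integrand, such as the area under the curve, would give a different quantity.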

[0029] The decision-making process of the second decision module also includes: comparing the change in the packet loss risk index from the user's first human-computer voice interaction to the i-th human-computer voice interaction with the preset change threshold of the packet loss risk index. If the change exceeds the threshold, it is determined that the packet loss rate of voice data packet transmission shows an upward trend at the i-th human-computer voice interaction, and the redundancy ratio needs to be adjusted in advance. If the change does not exceed the threshold, it is determined that the packet loss rate shows no significant upward trend at the i-th human-computer voice interaction, and the redundancy ratio need not be adjusted. With this technical solution, comparing the change in the packet loss risk index with its preset change threshold reveals how the packet loss rate of voice data packet transmission is trending at the user's i-th human-computer voice interaction. The decision on whether to adjust the redundancy ratio is based on this analysis, which turns passive response into active defense: redundant data is added before packet loss actually occurs, avoiding a decline in voice quality and optimizing the adjustment of the redundancy ratio.

[0030] The adjustment process of the redundancy ratio adjustment module includes: when the packet loss rate of voice data transmission is determined to be high, the redundancy ratio is adjusted according to the magnitude of the packet loss rate during the voice data transmission phase of the user's i-th human-computer voice interaction, and the voice data is retransmitted after the redundancy ratio adjustment is completed.

[0031] Specifically, the scheme for adjusting the redundancy ratio according to the magnitude of the packet loss rate during the voice data transmission phase of the user's i-th human-computer voice interaction is given in Table 1 below:

[0032] Table 1
Through the above technical solution, this example provides the adjustment process of the redundancy ratio adjustment module: the redundancy ratio is adjusted according to the magnitude of the packet loss rate during the voice data transmission phase of the user's i-th human-computer voice interaction, following the scheme in Table 1. The redundancy ratio can therefore be adjusted immediately according to the real-time packet loss rate. When packet loss occurs suddenly (for example, because of network congestion or signal interference), redundancy can be increased quickly to reduce voice interruptions and improve the user experience. It should be noted that the method for obtaining the redundancy ratios in Table 1 is existing technology and is not elaborated here.
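A tiered lookup of this kind can be sketched as follows. The band boundaries and redundancy ratios below are hypothetical placeholders, not the values of Table 1, which are not legible in this text. The names `LOSS_BOUNDS`, `REDUNDANCY` and `redundancy_for_loss` are likewise assumptions made for illustration.

```python
from bisect import bisect_right

# Hypothetical bands, NOT the contents of Table 1.
# LOSS_BOUNDS holds the upper (exclusive) loss-rate bound of each band except the last.
LOSS_BOUNDS = [0.05, 0.10, 0.20]
# REDUNDANCY holds one redundancy ratio per band (one more entry than LOSS_BOUNDS).
REDUNDANCY = [0.10, 0.20, 0.35, 0.50]


def redundancy_for_loss(loss_rate: float) -> float:
    """Return the redundancy ratio of the band that contains loss_rate."""
    if not 0.0 <= loss_rate <= 1.0:
        raise ValueError("loss_rate must be a fraction between 0 and 1")
    band = bisect_right(LOSS_BOUNDS, loss_rate)  # index of the first bound above loss_rate
    return REDUNDANCY[band]
```

A sorted boundary list with `bisect_right` keeps the lookup in one place, so replacing the placeholder bands with the real table values would not change the function.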

[0033] The adjustment process of the redundancy ratio adjustment module also includes: when the change in the packet loss risk index is determined to show an upward trend, a calculation model is established by introducing the bandwidth utilization rate and data transmission delay of the human-computer voice interaction intelligent device during the user's i-th human-computer voice interaction, and the redundancy ratio after the user's i-th human-computer voice interaction is calculated with a formula. The terms of the formula are: the preset redundancy ratio; the bandwidth utilization rate of the human-computer voice interaction intelligent device during the user's i-th human-computer voice interaction; the preset bandwidth utilization rate; two weighting coefficients, whose values are taken from empirical data by testing how their value ranges affect the redundancy ratio; and an adjustment coefficient lookup table function, likewise obtained from empirical data by testing how the value range of its input affects the redundancy ratio. The mapping rule of this function is obtained by a deep learning module trained on a large amount of historical data based on the Pearson correlation coefficient formula. The preset parameters in the formula are set by those skilled in the art according to the actual situation.
Through the above technical solution, this example provides the redundancy ratio after the user's i-th human-computer voice interaction. This setting allows the redundancy ratio to be adjusted in advance, based on the comparison between the change in the packet loss risk index from the user's first human-computer voice interaction to the i-th human-computer voice interaction and the preset change threshold of the packet loss risk index. This avoids "remediation after packet loss occurs" and optimizes the redundancy ratio adjustment method. In addition, because network status data is introduced into the adjustment, the adjusted redundancy ratio can balance the network load. This prevents an increase in the redundancy ratio from raising the network load, which would further raise the packet loss rate and create a vicious cycle, and thereby further improves the communication efficiency of the human-computer voice interaction system.
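The formula for this redundancy ratio is not legible in this text. The sketch below therefore uses an assumed functional form, chosen only to illustrate the load-balancing idea: higher delay raises redundancy through a lookup coefficient, while bandwidth utilization above the preset level pulls it back down. Every name, default weight and lookup value here is hypothetical.

```python
from bisect import bisect_right

# Hypothetical delay bands (ms) and adjustment coefficients; placeholder values only.
DELAY_BOUNDS_MS = [50.0, 150.0, 300.0]
DELAY_COEFF = [0.0, 0.25, 0.5, 1.0]


def delay_coefficient(delay_ms: float) -> float:
    """Stand-in for the adjustment coefficient lookup table function."""
    return DELAY_COEFF[bisect_right(DELAY_BOUNDS_MS, delay_ms)]


def predicted_redundancy(
    preset_ratio: float,
    utilization: float,
    preset_utilization: float,
    delay_ms: float,
    w_delay: float = 0.5,  # hypothetical weighting coefficient
    w_util: float = 0.5,   # hypothetical weighting coefficient
) -> float:
    """Assumed form: preset_ratio * (1 + delay boost - utilization damping)."""
    boost = w_delay * delay_coefficient(delay_ms)
    # Damping applies only when utilization exceeds the preset level.
    damping = w_util * max(0.0, utilization - preset_utilization)
    ratio = preset_ratio * (1.0 + boost - damping)
    return min(1.0, max(0.0, ratio))  # keep the result a valid fraction


relaxed = predicted_redundancy(0.2, 0.5, 0.7, 200.0)
congested = predicted_redundancy(0.2, 0.9, 0.7, 200.0)
```

With these placeholder values, the congested link receives a smaller ratio than the relaxed one at the same delay, which is the behaviour the paragraph describes as avoiding a vicious cycle.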

[0034] As shown in Figure 2, a human-computer voice interaction method for intelligent devices includes:
S1: The information acquisition module acquires human-computer interaction voice data based on the human-computer voice interaction intelligent device, and converts the user's voice into text commands through voice recognition technology;
S2: During a single human-computer voice interaction process, the data acquisition module collects the packet loss rate and network status data of the human-computer voice interaction intelligent device;
S3: The data evaluation module compares the packet loss rate during the text command conversion process of a user's human-computer voice interaction with the preset packet loss rate threshold, and evaluates the packet loss rate of voice data transmission during the voice conversion process based on the comparison result;
S4: Based on the evaluation result of the packet loss rate of voice data transmission during the voice conversion process, the first decision module decides whether the redundancy ratio needs to be adjusted. If yes, proceed to step S7; otherwise, proceed to step S5;
S5: When the packet loss rate of voice data transmission is determined to be low, the risk prediction module introduces network status data from the human-computer voice interaction process to predict the packet loss risk during subsequent use of the human-computer voice interaction device;
S6: Based on the prediction result of the packet loss risk during subsequent use of the human-computer voice interaction device, the second decision module decides whether the redundancy ratio needs to be adjusted. If yes, proceed to step S7; otherwise, no action is taken;
S7: The redundancy ratio adjustment module dynamically adjusts the redundancy ratio of the human-computer voice interaction device based on the decision results of the first decision module and the second decision module.
Through the above technical solution, this example provides a human-computer voice interaction method for intelligent devices. First, the information acquisition module acquires human-computer interaction voice data from the intelligent device and converts the user's voice into text commands using voice recognition technology. Next, the data acquisition module collects the packet loss rate and network status data of the intelligent device during a single human-computer voice interaction. The data evaluation module then compares the packet loss rate during the text command conversion process with the preset packet loss rate threshold and evaluates the packet loss rate of voice data transmission during the voice conversion process based on the comparison result. The first decision module uses this evaluation result to decide whether the redundancy ratio needs to be adjusted. When the packet loss rate of voice data transmission is high, the redundancy ratio adjustment module dynamically adjusts the redundancy ratio of the human-computer voice interaction device. When the packet loss rate of voice data transmission is low, the risk prediction module introduces network status data from the human-computer voice interaction process to predict the packet loss risk during subsequent use of the human-computer voice interaction device. The second decision module uses the predicted packet loss risk to decide whether the redundancy ratio needs to be adjusted. Finally, the redundancy ratio adjustment module dynamically adjusts the redundancy ratio of the human-computer voice interaction device based on the decision results of the first and second decision modules.
Comparing the real-time packet loss rate with the preset packet loss rate threshold allows a real-time decision on whether to adjust the redundancy ratio, so that sudden events can be handled. On this basis, the risk prediction module introduces network status data from the human-computer voice interaction process to predict the packet loss risk during subsequent use of the human-computer voice interaction device, so that the risk is predicted and the redundancy ratio is adjusted in advance. This yields a hybrid strategy of immediate response and forward-looking defense, which improves adaptability to network fluctuations, covers both "sudden" and "gradual" packet loss scenarios, and optimizes the redundancy ratio adjustment method.
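The two-stage decision in steps S3 to S7 can be sketched as below. This is an illustrative outline, not the device's implementation: the names `Action`, `Thresholds` and `decide` are hypothetical, and the strict "greater than" comparisons are an assumption, since the exact inequalities are not legible in this text.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    ADJUST_NOW = "adjust for measured packet loss"      # S4 -> S7
    ADJUST_EARLY = "adjust ahead of predicted loss"     # S6 -> S7
    NO_ACTION = "keep the current redundancy ratio"     # S6, no adjustment


@dataclass(frozen=True)
class Thresholds:
    loss_rate: float    # preset packet loss rate threshold
    risk_change: float  # preset change threshold of the packet loss risk index


def decide(loss_rate: float, risk_change: float, th: Thresholds) -> Action:
    """Return which branch of steps S3 to S7 applies to one interaction."""
    # S3/S4: first decision, based on the measured packet loss rate.
    if loss_rate > th.loss_rate:
        return Action.ADJUST_NOW
    # S5/S6: loss is low, so the predicted risk trend decides.
    if risk_change > th.risk_change:
        return Action.ADJUST_EARLY
    return Action.NO_ACTION


# Hypothetical threshold values for illustration.
th = Thresholds(loss_rate=0.05, risk_change=0.2)
action = decide(loss_rate=0.02, risk_change=0.3, th=th)
```

Here the measured loss is below its threshold but the risk trend is above its own, so the sketch selects the early adjustment branch, the "gradual" scenario in the text.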

[0035] The foregoing has provided a detailed description of one embodiment of the present invention, but this description is merely a preferred embodiment and should not be construed as limiting the scope of the invention. All equivalent variations and modifications made within the scope of the claims of this invention should still fall within the patent coverage of this invention.

Claims

1. A human-computer voice interaction device for intelligent devices, characterized in that the device includes:
an information acquisition module, used to acquire human-computer interaction voice data based on the human-computer voice interaction intelligent device, and to convert the user's voice into text commands through voice recognition technology;
a data acquisition module, used to collect the packet loss rate and network status data of the human-computer voice interaction intelligent device during a single human-computer voice interaction process;
a data evaluation module, used to compare the packet loss rate during the text command conversion process of a user's human-computer voice interaction with a preset packet loss rate threshold, and to evaluate the packet loss rate of voice data transmission during the voice conversion process based on the comparison result;
a first decision module, used to decide whether the redundancy ratio needs to be adjusted based on the evaluation result of the packet loss rate of voice data transmission during the voice conversion process;
a risk prediction module, used to predict the packet loss risk during subsequent use of the human-computer voice interaction device by introducing network status data from the human-computer voice interaction process when the packet loss rate of voice data transmission is determined to be low;
a second decision module, used to decide whether the redundancy ratio needs to be adjusted based on the predicted packet loss risk during subsequent use of the human-computer voice interaction device;
a redundancy ratio adjustment module, used to dynamically adjust the redundancy ratio of the human-computer voice interaction device based on the decision results of the first decision module and the second decision module.

2. The human-computer voice interaction device for intelligent devices according to claim 1, characterized in that the network status data collected by the data acquisition module includes: the number of voice data packets transmitted by the intelligent device during any human-computer voice interaction, the size of the voice data packets, the network transmission latency, and the bandwidth utilization rate.

3. The human-computer voice interaction device for intelligent devices according to claim 1, characterized in that the evaluation process of the data evaluation module includes: comparing the packet loss rate during the voice data transmission phase of the user's i-th human-computer voice interaction with the preset packet loss rate threshold; if the packet loss rate exceeds the threshold, determining that the packet loss rate of voice data transmission during the voice conversion process of the user's i-th human-computer voice interaction is high; if the packet loss rate does not exceed the threshold, determining that the packet loss rate of voice data transmission during the voice conversion process of the user's i-th human-computer voice interaction is low; where i denotes any human-computer voice interaction process initiated by the user.

4. The human-computer voice interaction device for intelligent devices according to claim 3, characterized in that the decision-making process of the first decision module includes: when it is determined that the packet loss rate of voice data transmission is high during the user's i-th human-computer voice interaction, the redundancy ratio of the human-computer voice interaction device is dynamically adjusted through the redundancy ratio adjustment module; when it is determined that the packet loss rate of voice data transmission is low during the user's i-th human-computer voice interaction, network status data from the human-computer voice interaction process is introduced to predict the packet loss risk during subsequent use of the human-computer voice interaction device.

5. The human-computer voice interaction device for intelligent devices according to claim 4, characterized in that the prediction process of the risk prediction module includes: establishing a calculation model that combines the packet loss rate of voice data transmission during the user's i-th human-computer voice interaction with the number of voice data packets transmitted by the intelligent device, the size of the voice data packets, and the network transmission latency during that interaction, and calculating the packet loss risk index of the user's i-th human-computer voice interaction.

6. The human-computer voice interaction device for intelligent devices according to claim 5, characterized in that the prediction process of the risk prediction module also includes: using the packet loss risk index calculated in real time for the user's i-th human-computer voice interaction, establishing a curve of the packet loss risk index over time, and calculating, based on an integral formula, the change in the packet loss risk index from the user's first human-computer voice interaction to the i-th human-computer voice interaction.

7. The human-computer voice interaction device for intelligent devices according to claim 6, characterized in that the decision-making process of the second decision module also includes: comparing the change in the packet loss risk index from the user's first human-computer voice interaction to the i-th human-computer voice interaction with the preset change threshold of the packet loss risk index; if the change exceeds the threshold, determining that the packet loss rate of voice data packet transmission shows an upward trend at the i-th human-computer voice interaction, and that the redundancy ratio needs to be adjusted in advance; if the change does not exceed the threshold, determining that the packet loss rate shows no significant upward trend at the i-th human-computer voice interaction, and that the redundancy ratio need not be adjusted.

8. The human-computer voice interaction device for intelligent devices according to claim 7, characterized in that the adjustment process of the redundancy ratio adjustment module includes: when the packet loss rate of voice data transmission is determined to be high, adjusting the redundancy ratio according to the magnitude of the packet loss rate during the voice data transmission phase of the user's i-th human-computer voice interaction, and retransmitting the voice data after the redundancy ratio adjustment is completed.

9. The human-computer voice interaction device for intelligent devices according to claim 8, characterized in that the adjustment process of the redundancy ratio adjustment module also includes: when the change in the packet loss risk index is determined to show an upward trend, establishing a calculation model by introducing the bandwidth utilization rate and data transmission delay of the human-computer voice interaction intelligent device during the user's i-th human-computer voice interaction, and calculating the redundancy ratio after the user's i-th human-computer voice interaction.

10. A human-computer voice interaction method for intelligent devices, applied to the human-computer voice interaction device for intelligent devices according to claims 1-9, characterized in that the method includes:
S1: The information acquisition module acquires human-computer interaction voice data based on the human-computer voice interaction intelligent device, and converts the user's voice into text commands through voice recognition technology;
S2: During a single human-computer voice interaction process, the data acquisition module collects the packet loss rate and network status data of the human-computer voice interaction intelligent device;
S3: The data evaluation module compares the packet loss rate during the text command conversion process of a user's human-computer voice interaction with the preset packet loss rate threshold, and evaluates the packet loss rate of voice data transmission during the voice conversion process based on the comparison result;
S4: Based on the evaluation result of the packet loss rate of voice data transmission during the voice conversion process, the first decision module decides whether the redundancy ratio needs to be adjusted. If yes, proceed to step S7; otherwise, proceed to step S5;
S5: When the packet loss rate of voice data transmission is determined to be low, the risk prediction module introduces network status data from the human-computer voice interaction process to predict the packet loss risk during subsequent use of the human-computer voice interaction device;
S6: Based on the prediction result of the packet loss risk during subsequent use of the human-computer voice interaction device, the second decision module decides whether the redundancy ratio needs to be adjusted. If yes, proceed to step S7; otherwise, no action is taken;
S7: The redundancy ratio adjustment module dynamically adjusts the redundancy ratio of the human-computer voice interaction device based on the decision results of the first decision module and the second decision module.