A vehicle-mounted voice offline / online switching system, method, medium and device

By monitoring network quality in real time and dynamically adjusting switching thresholds, combined with audio smooth transition technology, the problem of switching stutters and interruptions in in-vehicle voice systems during network fluctuations has been solved, achieving seamless offline/online mode switching and improving user experience.

CN122201310A · Pending · Publication Date: 2026-06-12 · DONGFENG MOTOR GRP

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: DONGFENG MOTOR GRP
Filing Date: 2026-02-05
Publication Date: 2026-06-12

Application Information

IPC: G10L15/30; G10L15/22; H04L43/0852; H04L43/0829; H04L43/16

AI Tagging

Application Domain

Speech recognition; Transmission

AI Technical Summary

Technical Problem

Existing in-vehicle voice systems suffer from issues such as delayed network quality monitoring, fixed and poorly adaptable switching thresholds, and audio interruptions during switching when network quality fluctuates, leading to stuttering and interruptions in the user experience.

Method used

A network quality monitoring module is used to monitor network latency and packet loss rate in real time. Combined with a vehicle status acquisition module and a scene adaptation module, the switching trigger threshold is dynamically adjusted. An audio smooth transition module is used to achieve a smooth audio transition during the switching process.

Benefits of technology

It enables seamless switching of the in-vehicle voice system in scenarios with network fluctuations, improves the user interaction experience, reduces switching lag and interruptions, and improves the accuracy and adaptability of switching.

✦ Generated by Eureka AI based on patent content.

Figure CN122201310A_ABST

Abstract

The application provides a vehicle-mounted voice offline / online switching system, method, medium and device, and belongs to the technical field of vehicle-mounted voice interaction systems. The system comprises a switching trigger decision module, which dynamically adjusts a switching trigger threshold according to an initial switching trigger threshold, a scene adaptation coefficient and historical network quality parameters, and generates a switching control signal according to real-time network quality parameters and the dynamically adjusted switching trigger threshold; and an audio smooth transition module, which, during state switching, performs smooth transition processing on the offline voice audio data and online voice audio data output by an offline voice processing module and an online voice processing module. The system can solve the problem of offline / online switching lag and interrupted experience in existing vehicle-mounted voice systems when the network fluctuates, achieve seamless, lag-free switching between the offline and online modes of vehicle-mounted voice processing, and significantly improve the interactive experience of the vehicle-mounted voice system in network fluctuation scenarios.

Description

Technical Field

[0001] This invention relates to the field of in-vehicle voice interaction system technology, and in particular to an in-vehicle voice offline / online switching system, method, medium and device.

Background Technology

[0002] With the rapid development of vehicle networking technology and artificial intelligence, in-vehicle voice interaction systems have become one of the core configurations of modern automobiles. Users can achieve various functions such as navigation, vehicle control, media playback, and information query through voice commands. In-vehicle voice processing is generally divided into offline mode and online mode: online mode relies on the powerful computing power and rich knowledge base of cloud-based large models, which can provide more accurate semantic understanding and complex question answering capabilities (such as real-time traffic query and personalized recommendations); offline mode does not rely on the network and implements basic voice processing based on models stored locally in the vehicle (such as basic vehicle control commands "turn on the air conditioner" and "close the windows"), ensuring the availability of basic voice functions in scenarios with no network or poor network signal (such as underground parking garages and remote mountainous areas).

[0003] However, network quality in vehicular environments is greatly affected by driving scenarios, exhibiting significant fluctuations: in bustling urban areas, 5G / 4G network signals are stable, and online voice processing can consistently perform well; but when vehicles enter underground parking garages, tunnels, or remote sections of highways, network signal latency increases, packet loss rates rise, and even network outages may occur; even within the same area, network quality can plummet due to factors such as changes in base station load and electromagnetic interference. Existing in-vehicle voice systems generally suffer from the following technical problems during offline / online switching:

[0004] Delayed network quality monitoring and untimely switchover triggering: Existing systems mostly use a "request timeout trigger" mechanism, meaning that switching to offline mode is only triggered when an online voice request does not receive a response for more than a preset time (e.g., 1 second). This method cannot detect changes in network quality in real time. When network latency gradually increases but does not reach the timeout threshold, users will experience sluggish voice responses (e.g., a response takes 1.5 seconds after issuing a command). When the network suddenly drops, voice commands may become unresponsive, severely impacting the user experience. For example, if a user issues a command to "check the next service area" while driving at high speed, and the network latency has risen to 800ms, far exceeding the normal response time of 300ms, but has not yet reached the 1-second timeout threshold, the system will still attempt to process the request online, causing the user to wait too long and even miss the best opportunity to find service area information.

[0005] Fixed switching thresholds lead to poor adaptability: existing systems often use fixed offline / online switching thresholds (such as a latency threshold of 500ms or a packet loss rate threshold of 20%), which cannot be dynamically adjusted to actual scenarios. For example, in underground parking garages, network signals are generally poor and packet loss rates often exceed 30%. If a 20% packet loss rate threshold is still used, the system will switch frequently (online → offline → online → offline), causing frequent interruptions in voice processing. In high-speed scenarios, network signals are relatively stable; if the latency threshold is set too loosely (such as 500ms), the system may not trigger a switch even when network latency has risen to 400ms and the user already perceives lag, failing to switch to offline mode in time to ensure response speed.

[0006] Audio interruptions and disjointed experiences during switching: The existing system lacks a smooth audio transition mechanism when triggering offline / online switching. When switching from online mode to offline mode, the ending audio of the online voice message has not finished playing while the initial audio of the offline voice message has already started playing, causing audio "jumps" (e.g., the online voice message "Okay, checking for you" suddenly jumps to the offline voice message "The air conditioner has been turned on for you"). When switching from offline mode to online mode, due to the delay in the return of online voice results, there will be brief "silent gaps" (e.g., after the offline voice message "Poor network, switched to offline mode," the supplementary online voice message "There is a gas station 3km ahead" plays after a 500ms interval). Such audio interruptions or jumps will make users feel "interaction stutters" and disrupt the continuity of voice interaction.

[0007] In existing technologies, some solutions attempt to optimize the switching mechanism. For example, Chinese patent CN114553768A discloses "A method and device for offline / online switching of in-vehicle voice," which determines the switching mode by judging whether the network is connected. However, this solution is only based on the binary judgment of "network connectivity" and does not consider key QoS parameters such as network latency and packet loss rate, and cannot cope with scenarios where network quality fluctuates but the network is not disconnected. Chinese patent CN115274769A discloses "A method for switching in-vehicle voice interaction based on network status." Although it introduces network latency parameters, it uses a fixed latency threshold (400ms), which cannot be dynamically adjusted according to the scenario. Moreover, it does not mention audio smoothing during the switching process, and there are still problems with switching stuttering and experience interruption.

[0008] In summary, the existing offline / online switching technology of in-vehicle voice systems suffers from a lack of real-time and accurate network quality monitoring, fixed switching thresholds, and a lack of audio smooth transition mechanisms. This results in stuttering and interrupted experience during switching in scenarios with fluctuating network conditions, failing to meet users' demands for "smooth and continuous" in-vehicle voice interaction.

Summary of the Invention

[0009] The present invention aims to solve at least one of the technical problems existing in the prior art, and proposes an in-vehicle voice offline / online switching system, method, medium and device.

[0010] In a first aspect, embodiments of the present invention provide an in-vehicle voice offline / online switching system, comprising:

[0011] The network quality monitoring module is used to monitor network quality parameters;

[0012] The vehicle status acquisition module is used to collect vehicle scene data;

[0013] The scene adaptation module is used to output scene adaptation coefficients based on vehicle scene data;

[0014] The switching trigger decision module is used to dynamically adjust the switching trigger threshold based on the initial switching trigger threshold, the scene adaptation coefficient, and historical network quality parameters, and to generate a switching control signal based on real-time network quality parameters and the dynamically adjusted switching trigger threshold;

[0015] The offline voice processing module is used to switch between on and off states according to the switching control signal and to output offline voice audio data when on;

[0016] The online voice processing module is used to switch between on and off states according to the switching control signal and to output online voice audio data when on;

[0017] The audio smooth transition module is used to perform, during state switching, smooth transition processing on the offline voice audio data and online voice audio data output by the offline voice processing module and the online voice processing module;

[0018] The vehicle audio interface module is used to receive the audio data that has undergone smooth transition processing and output it to the vehicle audio system for voice playback.

[0019] Furthermore, network quality parameters include network latency and packet loss rate.

[0020] Furthermore, the system also includes a threshold storage unit for storing the initial switching trigger threshold and the dynamically adjusted switching trigger threshold.

[0021] Furthermore, the system also includes a monitoring data storage unit, which is used to receive and store historical network quality parameters sent by the network quality monitoring module and to send the historical network quality parameters to the switching trigger decision module.

[0022] Furthermore, the network quality monitoring module monitors network quality parameters through the following steps:

[0023] S1100: Initialize monitoring parameters and start the timed monitoring task;

[0024] S1200: Send and receive test data packets;

[0025] S1300: Calculate network latency and packet loss rate;

[0026] S1400: Assess network status;

[0027] S1500: Store and output data.

[0028] Furthermore, dynamically adjusting the handover trigger threshold based on the initial handover trigger threshold, scene adaptation coefficient, and historical network quality parameters includes the following steps:

[0029] S2100: Obtain the initially adjusted switching trigger threshold by multiplying the initial switching trigger threshold by the scene adaptation coefficient;

[0030] S2200: Read the average value of the most recent historical network quality parameters. If this average value is greater than the initially adjusted switching trigger threshold, multiply the initially adjusted switching trigger threshold by a proportional coefficient less than 1 to obtain the dynamically adjusted switching trigger threshold. If the average value of the historical network quality parameters has not been consecutively greater than the initially adjusted switching trigger threshold, use the initially adjusted switching trigger threshold as the dynamically adjusted switching trigger threshold.
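A minimal Python sketch of steps S2100 and S2200 follows. The source does not quantify how many consecutive averages must exceed the threshold or give the value of the proportional coefficient, so `consecutive_required = 3` and `shrink_factor = 0.8` are hypothetical placeholders, and `history_averages` is assumed to be the list of recent averages read from the monitoring data storage unit, oldest first. The same function would be applied separately to the latency threshold and the packet loss rate threshold.

```python
from typing import Sequence


def adjust_threshold(
    initial_threshold: float,
    scene_coefficient: float,
    history_averages: Sequence[float],
    consecutive_required: int = 3,  # hypothetical: the source only says "consecutive"
    shrink_factor: float = 0.8,     # hypothetical proportional coefficient (< 1)
) -> float:
    """Return the dynamically adjusted switching trigger threshold (S2100 + S2200)."""
    if not 0 < shrink_factor < 1:
        raise ValueError("shrink_factor must be between 0 and 1")
    # S2100: initial adjustment by the scene adaptation coefficient.
    preliminary = initial_threshold * scene_coefficient
    # S2200: tighten only if the most recent averages all exceeded the preliminary threshold.
    recent = list(history_averages[-consecutive_required:])
    if len(recent) == consecutive_required and all(avg > preliminary for avg in recent):
        return preliminary * shrink_factor
    return preliminary


# Example: a 400 ms latency threshold in a scene with coefficient 1.5 (preliminary 600 ms).
print(adjust_threshold(400, 1.5, [700, 650, 620]))  # all recent averages above 600: tightened
print(adjust_threshold(400, 1.5, [700, 500, 620]))  # streak broken: stays at 600.0
```

Tightening the threshold after a run of poor averages makes the system fall back to offline mode earlier while the network is known to be degraded.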

[0031] Furthermore, generating handover control signals based on real-time network quality parameters and dynamically adjusted handover trigger thresholds includes:

[0032] A switching control signal that shuts down the online voice processing module and turns on the offline voice processing module is generated when either of the following conditions is met: the real-time network quality parameter is greater than the dynamically adjusted switching trigger threshold; or the network quality monitoring module determines that the network status is abnormal.

[0033] A switching control signal that shuts down the offline voice processing module and turns on the online voice processing module is generated when the following condition is met: the real-time network quality parameter is less than 0.7 times the dynamically adjusted switching trigger threshold in several consecutive measurements.
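The two conditions form a hysteresis band: the system leaves online mode as soon as a parameter exceeds its threshold, but returns only after the parameters have stayed below 0.7 times the threshold for several cycles, which suppresses the online / offline oscillation described in [0005]. The sketch below is illustrative, not the patented logic: it assumes both RTT and PLR are checked, requires both to be inside the 0.7× band before returning online, and uses a hypothetical `recover_count = 5` for "several consecutive" measurements.

```python
from dataclasses import dataclass, field
from typing import Optional

TO_OFFLINE = "to_offline"
TO_ONLINE = "to_online"


@dataclass
class SwitchDecider:
    """Hysteresis switching decision between online and offline voice processing."""

    recover_ratio: float = 0.7   # from the source: return online below 0.7 x threshold
    recover_count: int = 5       # hypothetical value for "several consecutive" measurements
    online: bool = True
    _good_streak: int = field(default=0, init=False, repr=False)

    def update(self, rtt: float, plr: float, rtt_th: float, plr_th: float,
               network_abnormal: bool) -> Optional[str]:
        """Feed one monitoring cycle; return a switching command or None."""
        if self.online:
            # Leave online mode immediately if either parameter exceeds its threshold
            # or the monitoring module has flagged the network as abnormal.
            if rtt > rtt_th or plr > plr_th or network_abnormal:
                self.online = False
                self._good_streak = 0
                return TO_OFFLINE
            return None
        # Offline: count consecutive cycles with both parameters inside the recovery band.
        if rtt < self.recover_ratio * rtt_th and plr < self.recover_ratio * plr_th:
            self._good_streak += 1
        else:
            self._good_streak = 0
        if self._good_streak >= self.recover_count:
            self.online = True
            self._good_streak = 0
            return TO_ONLINE
        return None


decider = SwitchDecider(recover_count=3)
for rtt, plr in [(350, 10), (600, 10), (300, 5), (300, 5), (300, 5)]:
    print(rtt, plr, decider.update(rtt, plr, rtt_th=500, plr_th=20, network_abnormal=False))
```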

[0034] Furthermore, the smooth transition processing of offline and online voice audio data output by the offline and online voice processing modules during state switching includes the following steps when switching from the online voice processing module being active to the offline voice processing module being active:

[0035] S3100, real-time caching of audio frames output by the online voice processing module;

[0036] S3200: After detecting the switching control signal, mark the currently playing audio frame;

[0037] S3300: After the offline voice processing module starts, it generates the first offline audio frame; audio feature analysis is performed on the last of the cached online audio frames and the first offline audio frame to generate a transition frame;

[0038] S3400: Output the audio frames concatenated in the following order: the currently playing audio frame, the cached online audio frames, the transition frame, and the offline audio frames.
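The source does not specify the audio feature analysis used to build the transition frame. As a stand-in, the sketch below uses a plain linear crossfade between the last cached online frame and the first offline frame, then splices the stream in the S3400 order. Frames are assumed to be equal-length lists of 16-bit mono PCM samples (the I2S link in [0079] runs at 48 kHz, 16-bit); the frame length is left to the caller, and all function names are hypothetical.

```python
from typing import List, Sequence

Frame = List[int]  # one frame of 16-bit mono PCM samples (frame length is an assumption)


def clip16(x: float) -> int:
    """Round and clamp a sample to the signed 16-bit range."""
    return max(-32768, min(32767, round(x)))


def make_transition_frame(last_online: Sequence[int], first_offline: Sequence[int]) -> Frame:
    """Linear crossfade from the last online frame to the first offline frame.

    Stand-in for the unspecified "audio feature analysis" of step S3300.
    """
    n = len(last_online)
    if n == 0 or n != len(first_offline):
        raise ValueError("frames must be non-empty and of equal length")
    out: Frame = []
    for i, (a, b) in enumerate(zip(last_online, first_offline)):
        w = i / (n - 1) if n > 1 else 1.0  # weight of the offline frame, rising from 0 to 1
        out.append(clip16((1.0 - w) * a + w * b))
    return out


def splice_online_to_offline(current: Frame, cached_online: List[Frame],
                             offline: List[Frame]) -> Frame:
    """S3400 order: current frame, cached online frames, transition frame, offline frames."""
    if not cached_online or not offline:
        raise ValueError("need at least one cached online frame and one offline frame")
    transition = make_transition_frame(cached_online[-1], offline[0])
    stream: Frame = list(current)
    for frame in cached_online:
        stream.extend(frame)
    stream.extend(transition)
    for frame in offline:
        stream.extend(frame)
    return stream


print(make_transition_frame([1000, 1000, 1000], [0, 0, 0]))
```

A production system would more likely match loudness or pitch between the two sources before fading; the crossfade only illustrates where the transition frame sits in the output stream.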

[0039] Furthermore, the smooth transition processing of offline and online voice audio data output by the offline and online voice processing modules during state switching includes the following steps when switching from the offline voice processing module being active to the online voice processing module being active:

[0040] S4100: Real-time buffering of audio frames output by the offline voice processing module; upon detecting a switching control signal, marking the currently playing audio frame;

[0041] S4200: After the online voice processing module starts, it generates online audio frames, and linear amplitude attenuation is applied to the cached offline audio frames to generate attenuated frames;

[0042] S4300: Output the audio frames concatenated in the following order: the currently playing audio frame, the attenuated frames, and the online audio frames.
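Step S4200 applies linear amplitude attenuation to the cached offline frames. A minimal sketch, again assuming lists of 16-bit mono PCM samples: the gain ramps linearly across all cached samples and reaches zero on the last one, so the offline audio fades out before the online frames begin. The ramp shape, its endpoint, and the function names are illustrative assumptions.

```python
from typing import List

Frame = List[int]  # one frame of 16-bit mono PCM samples


def clip16(x: float) -> int:
    """Round and clamp a sample to the signed 16-bit range."""
    return max(-32768, min(32767, round(x)))


def attenuate_linear(cached_offline: List[Frame]) -> Frame:
    """Linearly fade the cached offline samples to silence."""
    samples = [s for frame in cached_offline for s in frame]
    n = len(samples)
    # Gain falls from (n - 1) / n on the first sample to exactly 0 on the last one.
    return [clip16(s * (1.0 - (i + 1) / n)) for i, s in enumerate(samples)]


def splice_offline_to_online(current: Frame, cached_offline: List[Frame],
                             online: List[Frame]) -> Frame:
    """S4300 order: current frame, attenuated frames, online frames."""
    stream: Frame = list(current)
    stream.extend(attenuate_linear(cached_offline))
    for frame in online:
        stream.extend(frame)
    return stream


print(attenuate_linear([[1000, 1000], [1000, 1000]]))
```

Because the attenuated frames are played while the online result is still arriving, they also fill the "silent gap" described in [0006].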

[0043] In a second aspect, embodiments of the present invention provide an in-vehicle voice offline / online switching method, comprising the following steps:

[0044] S100: Acquire monitored network quality parameters and collected vehicle scene data;

[0045] S200: Determine the scene adaptation coefficient based on vehicle scene data, dynamically adjust the switching trigger threshold based on the initial switching trigger threshold, scene adaptation coefficient and historical network quality parameters, and generate a switching control signal based on real-time network quality parameters and the dynamically adjusted switching trigger threshold. The switching control signal is used to control the offline voice processing module to switch between on and off states and the online voice processing module to switch between on and off states.

[0046] S300: Perform smooth transition processing on the offline voice audio data and online voice audio data output by the offline voice processing module and the online voice processing module when switching states, and then output them.

[0047] In a third aspect, embodiments of the present invention provide an electronic device, including:

[0048] One or more processors;

[0049] Memory, used to store one or more programs;

[0050] When the one or more programs are executed by the one or more processors, the one or more processors perform the method as described above.

[0051] In a fourth aspect, embodiments of the present invention provide a computer-readable medium storing a computer program, which, when executed by a processor, implements the steps of the method described above.

[0052] The in-vehicle voice offline / online switching system, method, medium, and device provided by this invention monitor network quality in real time, dynamically adjust the switching trigger threshold, and smooth the audio transition during switching, thereby solving the problem of stuttering and interrupted experience when existing in-vehicle voice systems switch between offline and online modes during network fluctuations. They realize seamless, stutter-free switching of in-vehicle voice processing between offline and online modes and significantly improve the interactive experience of in-vehicle voice systems in network fluctuation scenarios.

Attached Figure Description

[0053] Figure 1 is an overall architecture diagram of an in-vehicle voice offline / online switching system provided in an embodiment of the present invention;

[0054] Figure 2 is a flowchart of real-time network quality parameter monitoring provided in an embodiment of the present invention;

[0055] Figure 3 is a flowchart of the smooth audio transition for switching between online and offline audio provided in an embodiment of the present invention;

[0056] Figure 4 is a flowchart of the smooth audio transition for switching between offline and online audio provided in an embodiment of the present invention;

[0057] Figure 5 is a flowchart of an in-vehicle voice offline / online switching method provided in an embodiment of the present invention;

[0058] Figure 6 is a structural block diagram of an electronic device provided in an embodiment of the present invention.

Detailed Implementation

[0059] To enable those skilled in the art to better understand the technical solutions of the present invention, exemplary embodiments of the present invention are described below in conjunction with the accompanying drawings, including various details of the embodiments of the present invention to aid understanding. These should be considered merely exemplary. Therefore, those skilled in the art should recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present invention. Similarly, for clarity and brevity, descriptions of well-known functions and structures are omitted in the following description.

[0060] Where there is no conflict, the various embodiments of the present invention and the features thereof may be combined with each other.

[0061] As used herein, the term “and / or” includes any and all combinations of one or more related enumerated entries.

[0062] The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the invention. As used herein, the singular forms "a" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will also be understood that when the terms "comprising" and / or "made of" are used in this specification, they specify the presence of the stated features, integers, steps, operations, elements, and / or components, but do not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and / or groups thereof. Terms such as "connected" or "linked" are not limited to physical or mechanical connections but can include electrical connections, whether direct or indirect.

[0063] Unless otherwise specified, all terms used herein (including technical and scientific terms) have the same meaning as commonly understood by one of ordinary skill in the art. It will also be understood that terms such as those defined in commonly used dictionaries should be interpreted as having the meaning consistent with their meaning in the context of the relevant art and the invention, and will not be interpreted as having an idealized or overly formal meaning unless expressly so defined herein.

[0064] This invention provides an in-vehicle voice offline / online switching system which, as shown in Figure 1, includes:

[0065] The network quality monitoring module is used to monitor network quality parameters;

[0066] The vehicle status acquisition module is used to collect vehicle scene data;

[0067] The scene adaptation module is used to output scene adaptation coefficients based on vehicle scene data;

[0068] The switching trigger decision module is used to dynamically adjust the switching trigger threshold based on the initial switching trigger threshold, the scene adaptation coefficient, and historical network quality parameters, and to generate a switching control signal based on real-time network quality parameters and the dynamically adjusted switching trigger threshold;

[0069] The offline voice processing module is used to switch between on and off states according to the switching control signal and to output offline voice audio data when on;

[0070] The online voice processing module is used to switch between on and off states according to the switching control signal and to output online voice audio data when on;

[0071] The audio smooth transition module is used to perform, during state switching, smooth transition processing on the offline voice audio data and online voice audio data output by the offline voice processing module and the online voice processing module;

[0072] The vehicle audio interface module is used to receive the audio data that has undergone smooth transition processing and output it to the vehicle audio system for voice playback.

[0073] In one embodiment, network quality parameters include network latency and packet loss rate.

[0074] In one embodiment, a threshold storage unit is further included for storing an initial switching trigger threshold and a dynamically adjusted switching trigger threshold.

[0075] In one embodiment, a monitoring data storage unit is further included, for receiving and storing historical network quality parameters sent by the network quality monitoring module, and for sending the historical network quality parameters to the switching trigger decision module.

[0076] Specifically, the vehicle networking module provides underlying network data to the network quality monitoring module; the vehicle status acquisition module provides vehicle driving scene information to the scene adaptation module (for example, determining via GPS positioning whether the vehicle is on an urban road, a highway, or in an underground parking garage); the network quality monitoring module calculates network latency and packet loss rate in real time and outputs them to the switching trigger decision module; the scene adaptation module outputs adaptation coefficients based on the vehicle scene to assist the switching trigger decision module in dynamically adjusting thresholds; the switching trigger decision module combines the network quality parameters, the scene adaptation coefficients, and the threshold data from the threshold storage unit to generate a switching control signal that controls the starting and stopping of the offline / online voice processing modules; the offline / online voice processing modules output the corresponding audio data to the audio smooth transition module, which processes the data and outputs it to the vehicle audio system through the vehicle audio interface module for voice playback; and the monitoring data storage unit stores historical network quality data, providing data support for dynamic threshold adjustment.

[0077] Network quality monitoring module: it uses an ARM Cortex-A53 processor (such as the Rockchip RK3328) and connects to the vehicle networking module (such as the Qualcomm SDX55 automotive-grade 5G module) via an RJ45 Ethernet interface to collect network data packets; it connects to the monitoring data storage unit (such as Samsung K9F1G08U0E NAND Flash) via an SPI interface to store historical monitoring data; and it connects to the switching trigger decision module via a UART serial port (115200 bps baud rate) to output real-time network quality parameters. Data are transmitted once every 100 ms to ensure real-time performance.

[0078] Switching trigger decision module: it uses an automotive-grade MCU (such as the Infineon AURIX TC275), which is connected to the scene adaptation module via a CAN bus (500 kbps baud rate) to obtain the scene adaptation coefficient; it is connected to the threshold storage unit (such as Micron MT25QL02G Flash) via an SPI interface to read the initial threshold and write the adjusted threshold; and it is connected to the offline voice processing module and the online voice processing module via separate GPIO interfaces, outputting a high / low level switching control signal (high level to start, low level to stop). Each control level is held for 50 ms to ensure that the modules recognize it reliably.

[0079] Audio smooth transition module: It adopts a dedicated DSP chip (such as Texas Instruments TMS320C6748) and connects to the offline voice processing module (such as iFlytek XF3060 offline voice chip) and the online voice processing module (receiving cloud audio data via Ethernet) through the I2S interface (48kHz sampling rate, 16-bit quantization) to receive audio data; it also connects to the car audio interface module (such as NXP TFA9897 audio amplifier) through the I2S interface to output the smoothly transitioned audio data in PCM format to ensure audio quality.

[0080] Scene adaptation module: it uses an embedded processor (such as the STM32H743) and connects to the vehicle status acquisition module (such as a GPS module, gyroscope module, and vehicle speed sensor) via a CAN bus to acquire data such as vehicle position, speed, and driving direction. An algorithm determines the driving scene (for example, if the vehicle speed is > 60 km/h and the GPS location is in a highway area, the scene is determined to be a highway scene). The scene adaptation coefficient is output to the switching trigger decision module via the CAN bus once every 500 ms.
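The highway rule above can be expressed as a small classifier. In the sketch below, `AreaType` stands for the result of matching the GPS position against map data (an assumed upstream step), and the coefficient table is hypothetical: the source gives no values, so the numbers merely follow the direction suggested in [0005], loosening thresholds in underground garages (coefficient > 1) and tightening them on highways (coefficient < 1).

```python
from enum import Enum


class AreaType(Enum):
    """Hypothetical result of matching the GPS position against map data."""
    HIGHWAY = "highway"
    URBAN = "urban"
    UNDERGROUND_PARKING = "underground_parking"


class Scene(Enum):
    HIGHWAY = "highway"
    URBAN = "urban"
    UNDERGROUND_PARKING = "underground_parking"


# Hypothetical coefficients (the source gives none): > 1 loosens thresholds, < 1 tightens them.
SCENE_COEFFICIENTS = {
    Scene.URBAN: 1.0,
    Scene.HIGHWAY: 0.8,
    Scene.UNDERGROUND_PARKING: 1.5,
}

HIGHWAY_SPEED_KMH = 60  # from the source: speed > 60 km/h in a highway area


def classify_scene(speed_kmh: float, area: AreaType) -> Scene:
    if area is AreaType.HIGHWAY and speed_kmh > HIGHWAY_SPEED_KMH:
        return Scene.HIGHWAY
    if area is AreaType.UNDERGROUND_PARKING:
        return Scene.UNDERGROUND_PARKING
    # Assumption: slow highway traffic and everything else fall back to the urban profile.
    return Scene.URBAN


def scene_coefficient(speed_kmh: float, area: AreaType) -> float:
    """Scene adaptation coefficient sent to the switching trigger decision module."""
    return SCENE_COEFFICIENTS[classify_scene(speed_kmh, area)]


print(classify_scene(100, AreaType.HIGHWAY), scene_coefficient(100, AreaType.HIGHWAY))
```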

[0081] The core function of the network quality monitoring module is to collect and calculate network latency (RTT) and packet loss rate (PLR) in real time, providing accurate network quality data for switching trigger decisions and avoiding the lag of traditional "timeout triggering". In one embodiment, as shown in Figure 2, the network quality monitoring module monitors network quality parameters through the following steps:

[0082] S1100: Initialize monitoring parameters and start the timed monitoring task;

[0083] After the module is powered on, the monitoring parameters are initialized first: the monitoring period T is set to 100ms to ensure that network quality data is acquired every 100ms, meeting the "real-time" requirements in the vehicle scenario (the user's perception threshold for voice lag is about 150ms, and the 100ms monitoring period can capture network quality changes in a timely manner); the test data packet size S is set to 128 bytes to simulate the average data volume of vehicle voice commands (such as the voice data for "searching for nearby restaurants" is about 100-150 bytes after compression), ensuring that the network transmission characteristics of the test data are consistent with those of the actual voice data; the timeout T0 is set to 500ms. If no acknowledgment is received 500ms after the test data packet is sent, it is considered a packet loss.

[0084] S1200: Send and receive test data packets;

[0085] The module periodically sends UDP test data packets to a preset cloud-based voice server (such as Baidu Smart Cloud or Alibaba Cloud vehicle voice server) according to the monitoring cycle T. The data packets contain the unique identifier of the vehicle device (VIN code) and the sending timestamp T_send (accurate to milliseconds). After receiving the test data packets, the cloud server immediately returns an acknowledgment data packet, which carries the original sending timestamp T_send. This facilitates the network quality monitoring module in matching the sent and received data packets and avoids confusion between data packets from different cycles.
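The probe packet described above can be sketched with Python's `struct` module. The field layout (a 17-byte ASCII VIN followed by an 8-byte big-endian millisecond timestamp, zero-padded to the 128-byte size S from S1100) is an assumption, as is the server echoing the header unchanged; the source only states that the packet carries the VIN and T_send and that the acknowledgment returns T_send. The helper names are hypothetical.

```python
import struct
from typing import Optional, Set, Tuple

PACKET_SIZE = 128                # test packet size S from step S1100
HEADER = struct.Struct("!17sQ")  # hypothetical layout: VIN (17 ASCII bytes) + T_send (uint64, ms)


def build_probe(vin: str, t_send_ms: int) -> bytes:
    """Build one UDP test packet carrying the VIN and the send timestamp."""
    vin_bytes = vin.encode("ascii")
    if len(vin_bytes) != 17:
        raise ValueError("a VIN has exactly 17 characters")
    header = HEADER.pack(vin_bytes, t_send_ms)
    return header + bytes(PACKET_SIZE - HEADER.size)  # zero padding up to S bytes


def parse_ack(payload: bytes) -> Tuple[str, int]:
    """Extract (VIN, T_send) from an acknowledgment that echoes the probe header."""
    vin_bytes, t_send_ms = HEADER.unpack_from(payload)
    return vin_bytes.decode("ascii"), t_send_ms


def match_ack(pending: Set[int], payload: bytes, vin: str) -> Optional[int]:
    """Return T_send if the ack answers an outstanding probe from this vehicle, else None."""
    ack_vin, t_send_ms = parse_ack(payload)
    if ack_vin == vin and t_send_ms in pending:
        pending.discard(t_send_ms)  # each probe is matched at most once
        return t_send_ms
    return None


packet = build_probe("L6T7952D1NN015829", 1717245678901)
print(len(packet), parse_ack(packet))
```

On a real link the packet would be sent with `socket.socket(socket.AF_INET, socket.SOCK_DGRAM).sendto(packet, (host, port))`, where the server address is deployment-specific. Keying the pending set on T_send is what lets late acknowledgments from earlier cycles be told apart from the current one.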

[0086] S1300: Calculate network latency and packet loss rate;

[0087] If an acknowledgment packet is received within the timeout period T0, T_send is extracted from it and combined with the receive timestamp T_recv to calculate the network delay RTT = T_recv - T_send; in this case, the packet loss count PLC for the cycle remains 0. If no acknowledgment packet is received within T0, the test packet is considered lost, PLC is incremented by 1, and the current RTT defaults to T0 (500ms), so that no delay sample is missing because of packet loss, which could otherwise affect subsequent switching decisions. The packet loss rate PLR is calculated with a sliding monitoring window: the window covers N = 10 transmissions (10 monitoring cycles, 1 second in total), and PLR = (total PLC within the window / N) × 100%. For example, if 2 packets are lost within the window, PLR = 20%. This mechanism avoids a sudden spike in PLR caused by a single packet loss, keeping the packet loss rate calculation stable.
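Step S1300 can be sketched as a fixed-size window of loss flags. The class below follows the stated rules (T0 = 500 ms, N = 10, a lost probe records RTT as T0, PLR = total PLC in the window / N × 100%); the class and method names are illustrative. Because the formula divides by N, PLR is under-reported during the first nine cycles after start-up, before the window is full.

```python
from collections import deque
from typing import Deque, Optional, Tuple

T0_MS = 500  # acknowledgment timeout T0 from step S1100
N = 10       # sliding-window size (10 cycles of 100 ms = 1 s)


class QualityWindow:
    """Per-cycle RTT and sliding-window PLR as described in step S1300."""

    def __init__(self, window: int = N, timeout_ms: int = T0_MS) -> None:
        self.window = window
        self.timeout_ms = timeout_ms
        self.loss_flags: Deque[int] = deque(maxlen=window)  # 1 = lost, 0 = acknowledged

    def record(self, t_send_ms: int, t_recv_ms: Optional[int]) -> Tuple[int, int]:
        """Record one probe; t_recv_ms is None if no acknowledgment arrived."""
        if t_recv_ms is not None and t_recv_ms - t_send_ms <= self.timeout_ms:
            rtt = t_recv_ms - t_send_ms  # RTT = T_recv - T_send
            self.loss_flags.append(0)    # PLC stays 0 for this cycle
        else:
            rtt = self.timeout_ms        # lost probe: RTT defaults to T0
            self.loss_flags.append(1)    # PLC incremented by 1
        # PLR = (total PLC in window / N) x 100%, in whole percent.
        plr = sum(self.loss_flags) * 100 // self.window
        return rtt, plr


w = QualityWindow()
print(w.record(0, 350))    # acknowledged after 350 ms
print(w.record(100, None)) # no acknowledgment
print(w.record(200, 800))  # 600 ms exceeds T0, counted as lost
```

The `deque(maxlen=N)` drops the oldest flag automatically, so each cycle costs O(N) for the sum and no explicit window bookkeeping is needed.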

[0088] S1400, Network Status Assessment;

[0089] When RTT > 2000ms (severe network latency) or PLR > 50% (severe network packet loss), the network status is marked as "abnormal" and an alarm signal (GPIO low level) is output to the switch trigger decision module, indicating that an emergency switch to offline mode is required; otherwise, it is marked as "normal" and no alarm is required.
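The status rule of S1400 reduces to a two-limit comparison. The sketch below uses a hypothetical function name and returns the same "normal" / "abnormal" strings that appear in the JSON output; driving the GPIO alarm line is outside its scope.

```python
RTT_ALARM_MS = 2000    # severe latency limit from S1400
PLR_ALARM_PCT = 50.0   # severe packet loss limit from S1400


def assess_network(rtt_ms: float, plr_pct: float) -> str:
    """Return "abnormal" if either alarm limit is exceeded, otherwise "normal"."""
    if rtt_ms > RTT_ALARM_MS or plr_pct > PLR_ALARM_PCT:
        return "abnormal"
    return "normal"
```

Both comparisons are strict, so values exactly at 2000ms or 50% are still classified as normal, matching the ">" in the text.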

[0090] S1500, data storage and output;

[0091] At the end of each monitoring cycle, the current RTT, PLR, and network status (normal / abnormal) are stored in the monitoring data storage unit. The stored data retains records from the most recent 24 hours for historical data reference in subsequent dynamic threshold adjustments. Simultaneously, real-time monitoring data is output to the switching trigger decision module via the UART serial port, with the output data format being JSON.

[0092] The JSON output format is illustrated by the following record:

```json
{
  "device_id": "L6T7952D1NN015829",
  "timestamp": 1717245678901,
  "rtt": 350,
  "plr": 10,
  "network_status": "normal"
}
```

where "device_id" is the VIN code of the vehicle equipment, "timestamp" is the data generation timestamp (milliseconds), "rtt" is the network latency (milliseconds), "plr" is the packet loss rate (%), and "network_status" is the network status ("normal" / "abnormal").
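One monitoring record in this format can be produced with the standard `json` module, as sketched below. The function name is hypothetical, and the compact separators and trailing newline used as a UART frame delimiter are assumptions; the text only states that the output is JSON over a UART serial port.

```python
import json


def build_report(device_id: str, timestamp_ms: int, rtt_ms: int,
                 plr_pct: int, network_status: str) -> str:
    """Serialize one monitoring record using the field names of the example record."""
    record = {
        "device_id": device_id,            # vehicle VIN code
        "timestamp": timestamp_ms,         # data generation time, ms
        "rtt": rtt_ms,                     # network latency, ms
        "plr": plr_pct,                    # packet loss rate, %
        "network_status": network_status,  # "normal" or "abnormal"
    }
    # Compact separators keep the serial frame short; "\n" marks the frame end (assumption)
    return json.dumps(record, separators=(",", ":")) + "\n"
```

Strict JSON has no comment syntax, so field descriptions belong in the interface documentation rather than in the transmitted message.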

[0100] Through the above process, the network quality monitoring module can achieve real-time network quality monitoring with a 100ms cycle, latency calculation accuracy of 1ms, and packet loss rate calculation accuracy of 1%. Compared with the traditional "1s timeout trigger" mechanism, the network quality perception lag time is reduced by 900ms, which can capture the trend of declining network quality in advance and provide timely and accurate data support for switching trigger decisions.

[0101] The handover triggering decision module dynamically adjusts the handover triggering thresholds (network latency threshold RTT_th and packet loss rate threshold PLR_th) based on the historical RTT and PLR output by the network quality monitoring module and the scenario adaptation coefficient output by the scenario adaptation module, avoiding the poor adaptability of traditional fixed thresholds. In one embodiment, dynamically adjusting the handover triggering threshold based on the initial handover triggering threshold, scenario adaptation coefficient, and historical network quality parameters includes the following steps:

[0102] S2100: Multiply the initial switching trigger threshold by the scene adaptation coefficient to obtain the preliminarily adjusted switching trigger threshold;

[0103] S2200: Read the average value of the most recent historical network quality parameters. If this average exceeds the preliminarily adjusted switching trigger threshold for a preset number of consecutive times, multiply the preliminarily adjusted threshold by a proportional coefficient less than 1 to obtain the dynamically adjusted switching trigger threshold; otherwise, use the preliminarily adjusted switching trigger threshold as the dynamically adjusted switching trigger threshold.
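Steps S2100 and S2200 can be summarized as follows, writing X for either monitored parameter (RTT or PLR). The symbols X, α and m are introduced here only for brevity; the embodiment in [0111] uses α = 0.9 and m = 3.

```latex
\begin{aligned}
X_{th}' &= X_{th0} \times K, \\
X_{th}'' &=
\begin{cases}
\alpha \, X_{th}', & \text{if } X_{avg} > X_{th}' \text{ for } m \text{ consecutive evaluations},\\
X_{th}', & \text{otherwise},
\end{cases}
\qquad X \in \{RTT, PLR\},\; 0 < \alpha < 1.
\end{aligned}
```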

[0104] Specifically, the switching trigger threshold initialization is as follows: The threshold storage unit pre-stores factory initial thresholds, determined based on extensive in-vehicle scenario test data: the initial latency threshold RTT_th0 = 300ms (the user's perceived threshold for voice response stuttering is approximately 300ms; exceeding this value will result in a noticeable delay), and the initial packet loss rate threshold PLR_th0 = 15% (when the packet loss rate exceeds 15%, the integrity of online voice data transmission cannot be guaranteed, and audio frame breaks are likely to occur). After the switching trigger decision module powers on, it first reads the initial thresholds from the threshold storage unit as a benchmark for subsequent dynamic adjustments.

[0105] Dynamic threshold adjustment (period 500ms): The core of dynamic threshold adjustment is to adjust the baseline threshold according to the changes in vehicle driving scenarios through the scenario adaptation coefficient K, and further optimize it by combining historical network quality data to ensure that the threshold matches the actual scenario.

[0106] Acquisition and application of scene adaptation coefficient K: The scene adaptation module analyzes the driving scene based on the data from the vehicle status acquisition module and outputs the corresponding scene adaptation coefficient K. The K values for different scenes are shown in Table 1.

[0107] Table 1. K values for different scenarios

[0108]

[0109] After the switching trigger decision module receives the scene adaptation coefficient K, it first calculates the preliminary adjustment thresholds RTT_th'=RTT_th0×K and PLR_th'=PLR_th0×K to achieve scene adaptation of the thresholds.

[0110] Threshold optimization based on historical network quality parameters:

[0111] To further adapt to long-term network quality trends (e.g., persistently high base station load in a certain area, with network latency consistently around 300ms), the switching trigger decision module reads the most recent 10 records (1 second) from the monitoring data storage unit, computes the average network latency RTT_avg and the average packet loss rate PLR_avg, and performs the following optimization. If RTT_avg exceeds the preliminary adjustment threshold RTT_th' for 3 consecutive adjustment periods (1.5 seconds), the actual network latency in the current scenario is higher than expected, and the latency threshold is further reduced to RTT_th'' = RTT_th' × 0.9. For example, in a high-speed scenario, if RTT_avg = 260ms exceeds RTT_th' = 240ms for 3 consecutive periods, then RTT_th'' = 240ms × 0.9 = 216ms, ensuring that the switch is triggered in time. Likewise, if PLR_avg exceeds the preliminary adjustment threshold PLR_th' for 3 consecutive periods, the packet loss rate threshold is reduced to PLR_th'' = PLR_th' × 0.9. For any parameter that does not exceed its threshold for 3 consecutive periods, the preliminary adjustment threshold (RTT_th' or PLR_th') is retained.
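The threshold adjustment of [0109]-[0111] can be sketched as a small stateful object called once per 500ms adjustment period. This is an illustrative reading, not the module's implementation: the class names are hypothetical, the caller is assumed to supply RTT_avg and PLR_avg averaged over the last 10 monitoring records, and a streak counter resets whenever a period does not exceed the threshold. The K values used in the usage check (0.8 for the high-speed scenario, 1.5 for the underground parking garage) are inferred from the worked examples, where 240ms = 300ms × 0.8 and 450ms = 300ms × 1.5.

```python
from dataclasses import dataclass

RTT_TH0_MS = 300.0  # factory initial latency threshold RTT_th0
PLR_TH0_PCT = 15.0  # factory initial packet loss threshold PLR_th0
SHRINK = 0.9        # proportional coefficient < 1 from [0111]
CONSECUTIVE = 3     # consecutive adjustment periods required before tightening


@dataclass
class Thresholds:
    rtt_ms: float
    plr_pct: float


class ThresholdAdjuster:
    """Scene scaling (S2100) followed by history-based tightening (S2200)."""

    def __init__(self) -> None:
        self._rtt_streak = 0
        self._plr_streak = 0

    def update(self, k: float, rtt_avg: float, plr_avg: float) -> Thresholds:
        # S2100: preliminary thresholds RTT_th' and PLR_th'
        rtt_p = RTT_TH0_MS * k
        plr_p = PLR_TH0_PCT * k
        # S2200: count consecutive periods whose 1 s average exceeds the preliminary threshold
        self._rtt_streak = self._rtt_streak + 1 if rtt_avg > rtt_p else 0
        self._plr_streak = self._plr_streak + 1 if plr_avg > plr_p else 0
        rtt_th = rtt_p * SHRINK if self._rtt_streak >= CONSECUTIVE else rtt_p
        plr_th = plr_p * SHRINK if self._plr_streak >= CONSECUTIVE else plr_p
        return Thresholds(rtt_th, plr_th)
```

With K = 0.8 and RTT_avg = 260ms on three successive calls, the third call returns RTT_th'' = 216ms while PLR_th stays at 12%, reproducing the high-speed example.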

[0112] The adjusted threshold is written to the threshold storage unit to update the current threshold, thereby achieving dynamic optimization of the threshold and avoiding the problems of "over-switching" or "untimely switching" caused by fixed thresholds.

[0113] In one embodiment, generating a handover control signal based on real-time network quality parameters and a dynamically adjusted handover trigger threshold includes:

[0114] A switching control signal for shutting down the online voice processing module and turning on the offline voice processing module is generated when any of the following conditions are met: the real-time network quality parameter is greater than the dynamically adjusted switching trigger threshold; or the network quality monitoring module determines that the network status is abnormal.

[0115] The switching control signal for shutting down the offline voice processing module and turning on the online voice processing module is generated when the real-time network quality parameter remains below 0.7 times the dynamically adjusted switching trigger threshold for several consecutive monitoring cycles.

[0116] Specifically, the switching trigger decision module and the network quality monitoring module have a synchronized monitoring cycle (100ms) to determine in real time whether to trigger a switch, ensuring the timeliness and accuracy of the switch.

[0117] Conditions for switching between online and offline:

[0118] The online mode will switch to offline mode if any of the following conditions are met:

[0119] Condition 1: The current network latency RTT_curr is greater than the current dynamically adjusted network latency threshold RTT_th, and the current packet loss rate PLR_curr is greater than the current dynamically adjusted packet loss rate threshold PLR_th (two parameters trigger to avoid false triggering caused by fluctuation of a single parameter);

[0120] Condition 2: The network quality monitoring module outputs network_status=abnormal (network abnormal alarm, such as RTT>2000ms or PLR>50%). At this time, there is no need to wait for the two parameters to be judged, and the switch is triggered directly to avoid the complete interruption of the network and the resulting lack of voice response.

[0121] For example, in a high-speed scenario, if the current thresholds RTT_th = 216ms and PLR_th = 10.8%, and the current RTT_curr = 230ms and PLR_curr = 12%, condition 1 is met, triggering the online to offline switch; if the current RTT_curr = 2500ms, condition 2 is met, directly triggering the switch.

[0122] Offline → Online switching conditions:

[0123] To avoid frequent handovers (offline → online → offline) caused by network quality fluctuations, a "conservative triggering" strategy is adopted for offline → online handover, which requires the following conditions to be met simultaneously:

[0124] The current network latency RTT_curr < RTT_th × 0.7 (network latency well below the threshold; e.g., when RTT_th = 240ms, RTT_curr < 168ms);

[0125] The current PLR_curr < PLR_th × 0.7 (the packet loss rate is much lower than the threshold, such as when PLR_th = 12%, PLR_curr < 8.4%).

[0126] If both conditions above are met for five consecutive monitoring cycles (500ms), the network quality is deemed to have recovered stably, and the offline → online switch is triggered.

[0127] For example, when the vehicle drives out of an underground parking garage onto a city road, if RTT_curr = 200ms for 5 consecutive cycles, which is below RTT_th × 0.7 = 450ms × 0.7 = 315ms, and PLR_curr = 8% for 5 consecutive cycles, which is below PLR_th × 0.7 = 22.5% × 0.7 = 15.75%, the offline → online switch is triggered.
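The switching conditions of [0118]-[0127] form a small two-state machine with hysteresis, evaluated once per 100ms monitoring cycle. The sketch below is an illustrative model with hypothetical names (`Mode`, `SwitchDecider`); it takes the dynamically adjusted thresholds and the monitor's "normal" / "abnormal" status as inputs and leaves out the GPIO output of [0129].

```python
from enum import Enum

RECOVERY_FACTOR = 0.7  # offline -> online requires both values below 0.7 x threshold
RECOVERY_CYCLES = 5    # ... for 5 consecutive 100 ms cycles


class Mode(Enum):
    ONLINE = "online"
    OFFLINE = "offline"


class SwitchDecider:
    """Returns the voice mode to use after each monitoring cycle."""

    def __init__(self) -> None:
        self.mode = Mode.ONLINE
        self._good_cycles = 0

    def step(self, rtt: float, plr: float, rtt_th: float, plr_th: float,
             status: str) -> Mode:
        if self.mode is Mode.ONLINE:
            both_exceed = rtt > rtt_th and plr > plr_th   # condition 1
            if both_exceed or status == "abnormal":        # condition 2
                self.mode = Mode.OFFLINE
                self._good_cycles = 0
        else:
            recovered = (rtt < rtt_th * RECOVERY_FACTOR
                         and plr < plr_th * RECOVERY_FACTOR)
            # Any cycle that fails the recovery test restarts the count
            self._good_cycles = self._good_cycles + 1 if recovered else 0
            if self._good_cycles >= RECOVERY_CYCLES:
                self.mode = Mode.ONLINE
                self._good_cycles = 0
        return self.mode
```

The asymmetry is deliberate: leaving online mode takes one bad cycle, while returning requires five good ones at a stricter margin, which suppresses offline → online → offline oscillation.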

[0128] Switch signal output:

[0129] The switching trigger decision module outputs switching control signals via the GPIO interface: a high level indicates activation, and a low level indicates deactivation. For example, when triggering an online → offline switch, it outputs "Online module GPIO low level (deactivation), offline module GPIO high level (activation)"; when triggering an offline → online switch, it outputs "Offline module GPIO low level (deactivation), online module GPIO high level (activation)". The duration of the control signal is 50ms to ensure that the offline / online voice processing module stably recognizes the switching command and avoids module non-response due to a short duration of the signal.

[0130] With the dynamic threshold adjustment mechanism, compared with traditional fixed thresholds, the false trigger rate of switching falls from 18% to 13.5% (a 25% relative reduction), and the number of unnecessary switches falls from an average of 8 to 5.6 per hour (a 30% reduction), significantly improving switching adaptability in network fluctuation scenarios.

[0131] When the switching trigger decision module triggers the online → offline switch, the online voice processing module may still be outputting audio data. The offline voice processing module needs to start and generate subsequent audio data. The audio smooth transition module needs to achieve a seamless connection between "the end of online audio and the beginning of offline audio".

[0132] In one embodiment, as shown in Figure 3, when switching from the state in which the online voice processing module is active to the state in which the offline voice processing module is active, the smooth transition processing of the voice audio data output by the offline and online voice processing modules includes the following steps:

[0133] S3100, real-time caching of audio frames output by the online voice processing module;

[0134] The buffer unit of the audio smooth transition module uses a dual-port RAM (such as an IDT71V416 SRAM) to cache the audio frames output by the online voice processing module in real time, with a buffer depth of 10 frames. In-vehicle voice audio uses 16-bit quantized PCM at a 48kHz sampling rate, i.e., 48,000 samples per second; each audio frame typically contains 1024 samples, so one frame lasts 1024 / 48000 ≈ 21.3ms and the 10 buffered frames cover about 213ms, ensuring that enough online audio frames are available for the transition during switching. The buffer unit updates the cached frames in real time on a first-in, first-out (FIFO) basis, always holding the latest 10 frames of online audio.
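The frame-duration arithmetic and the FIFO behaviour can be illustrated in software as below. A bounded `collections.deque` stands in for the dual-port RAM here; that substitution and the names `frame_duration_ms` and `FrameFifo` are assumptions for illustration, not the hardware design in the text.

```python
from collections import deque

SAMPLE_RATE_HZ = 48_000
SAMPLES_PER_FRAME = 1024
ONLINE_DEPTH = 10  # buffer depth for online frames


def frame_duration_ms(samples: int = SAMPLES_PER_FRAME,
                      rate: int = SAMPLE_RATE_HZ) -> float:
    """Duration of one PCM frame in milliseconds."""
    return 1000.0 * samples / rate


class FrameFifo:
    """Keeps only the newest `depth` frames; the oldest is evicted on overflow."""

    def __init__(self, depth: int = ONLINE_DEPTH) -> None:
        self._frames: deque = deque(maxlen=depth)

    def push(self, frame) -> None:
        self._frames.append(frame)

    def snapshot(self) -> list:
        """Buffered frames, oldest first (Fk ... Fn at the moment of switching)."""
        return list(self._frames)
```

`frame_duration_ms()` evaluates to about 21.3ms, so ten frames hold roughly 213ms of audio and five frames roughly 106.7ms.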

[0135] S3200: After detecting the switching control signal, mark the currently playing audio frame;

[0136] The audio smooth transition module receives the "online → offline" switching signal from the switching trigger decision module through the GPIO interface. After detecting the switching signal, it immediately marks the currently playing audio frame as Fk (e.g., the 5th frame). At this time, the audio frames stored in the buffer unit are Fk, Fk+1, ..., Fn (a total of 10 frames).

[0137] S3300: After the offline voice processing module starts, it generates the first offline audio frame; audio feature analysis is performed on the last cached online audio frame and the first offline audio frame to generate transition frames.

[0138] The audio smooth transition module sends a start signal to the offline voice processing module through the GPIO interface. After starting (which takes about 50ms), the offline module first generates the first offline audio frame O1. During the offline module startup, the DSP chip of the audio smooth transition module performs audio feature analysis on the cached online audio frame Fn (the last online audio frame) and the offline audio frame O1 about to be generated: it extracts the amplitude (e.g., 0.8V), fundamental frequency (e.g., 300Hz, corresponding to the mid-low frequency components of speech), and phase (e.g., π / 2) of Fn, and the amplitude (e.g., 0.7V), fundamental frequency (e.g., 280Hz), and phase (e.g., π / 3) of O1. Three transition frames T1-T3 are then generated by linear interpolation at 1/4, 2/4, and 3/4 of the way from Fn to O1: T1 amplitude = 0.775V, fundamental frequency = 295Hz, phase = 11π / 24; T2 amplitude = 0.75V, fundamental frequency = 290Hz, phase = 5π / 12; T3 amplitude = 0.725V, fundamental frequency = 285Hz, phase = 3π / 8. The audio features thus change smoothly from Fn to O1, avoiding the "skipping" caused by abrupt changes in amplitude and frequency.
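The feature interpolation can be sketched as follows. This covers only the per-frame features (amplitude, fundamental frequency, phase); how the DSP synthesizes PCM samples from them is not described in the text. `Features` and `transition_frames` are hypothetical names, and the phase is interpolated naively without handling wrap-around at 2π.

```python
import math
from typing import NamedTuple


class Features(NamedTuple):
    amplitude: float  # volts
    f0_hz: float      # fundamental frequency
    phase: float      # radians


def transition_frames(last_online: Features, first_offline: Features,
                      count: int = 3) -> list[Features]:
    """Linearly interpolate `count` frames strictly between the two endpoint frames."""
    frames = []
    for i in range(1, count + 1):
        t = i / (count + 1)  # 1/4, 2/4, 3/4 when count == 3
        frames.append(Features(*(a + t * (b - a)
                                 for a, b in zip(last_online, first_offline))))
    return frames


if __name__ == "__main__":
    fn = Features(0.8, 300.0, math.pi / 2)  # last online frame Fn
    o1 = Features(0.7, 280.0, math.pi / 3)  # first offline frame O1
    for frame in transition_frames(fn, o1):
        print(frame)
```

For the example features of Fn and O1, this yields amplitudes of 0.775, 0.75 and 0.725V, fundamental frequencies of 295, 290 and 285Hz, and phases of 11π/24, 5π/12 and 3π/8.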

[0139] S3400: Concatenate and output the audio frames in the order of the currently playing audio frame, the cached online audio frames, the transition frames, and the offline audio frames.

[0140] The audio smooth transition module splices the audio frames in the following order: "Current playback frame Fk → Subsequent online frame Fk+1 → ... → Fn → Transition frame T1 → T2 → T3 → Offline frame O1 → O2 → ...". The spliced audio stream is output to the car audio interface module via the I2S interface and finally played by the car audio system. Throughout the process, the gap in audio playback stays below 5ms, so the user cannot perceive the switch, whereas traditional switching methods typically interrupt audio for more than 500ms.

[0141] When the switching trigger decision module triggers the offline → online switch, the offline voice processing module needs to gradually stop outputting, the online voice processing module starts and generates subsequent audio data, and the audio smooth transition module needs to achieve a seamless connection between "the end of offline audio and the beginning of online audio".

[0142] In one embodiment, as shown in Figure 4, when switching from the state in which the offline voice processing module is active to the state in which the online voice processing module is active, the smooth transition processing of the voice audio data output by the offline and online voice processing modules includes the following steps:

[0143] S4100: Real-time buffering of audio frames output by the offline voice processing module; upon detecting a switching control signal, marking the currently playing audio frame;

[0144] The buffer unit of the audio smooth transition module caches the audio frames output by the offline voice processing module in real time, with a buffer depth of 5 frames (approximately 106.7ms), and updates them on a FIFO basis. When an "offline → online" switching signal is detected, the currently playing audio frame is marked as Ok, and an "offline module is ready to stop" signal is sent to the offline voice processing module to notify it that output is about to stop.

[0145] S4200: After the online voice processing module starts, it generates online audio frames, and linear amplitude attenuation is applied to the cached offline audio frames to generate attenuated frames.

[0146] After receiving the stop signal, the offline voice processing module stops generating new offline audio frames and only outputs the last 5 buffered audio frames (Om-4,...,Om). At the same time, the audio smooth transition module starts the online voice processing module. After the online module is initialized, it generates the first online audio frame F1 (startup time is about 50ms). During the startup of the online module, the last 5 frames of audio from the offline module are still playing to avoid "silent gaps".

[0147] To avoid "volume jumps" caused by abrupt amplitude changes between offline audio (e.g., amplitude 0.8V) and online audio (e.g., amplitude 0.9V), the DSP chip of the audio smooth transition module applies linear amplitude attenuation to the last 5 frames of audio (Om-4,...,Om) output by the offline module:

[0148] Om-4 (1st attenuated frame): amplitude = 0.8V (original amplitude, 0% attenuation);

[0149] Om-3 (2nd attenuated frame): amplitude = 0.6V (25% attenuation);

[0150] Om-2 (3rd attenuated frame): amplitude = 0.4V (50% attenuation);

[0151] Om-1 (4th attenuated frame): amplitude = 0.2V (75% attenuation);

[0152] Om (5th attenuated frame): amplitude = 0.1V (87.5% attenuation).

[0153] The amplitude difference between the attenuated offline audio frames and the online audio frame F1 (amplitude 0.9V) is no more than 0.8V, so the user cannot perceive a sudden change in volume.
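Applied to 16-bit PCM samples, the attenuation schedule above amounts to one gain per frame. In the sketch below, the gains 1, 0.75, 0.5, 0.25 and 0.125 are derived from the listed amplitudes (0.8, 0.6, 0.4, 0.2 and 0.1V relative to 0.8V); representing frames as `array("h")` and the name `attenuate_frames` are assumptions for illustration.

```python
from array import array

# Per-frame gains for Om-4 ... Om, derived from 0.8 / 0.6 / 0.4 / 0.2 / 0.1 V relative to 0.8 V
FADE_GAINS = (1.0, 0.75, 0.5, 0.25, 0.125)


def attenuate_frames(frames: list[array]) -> list[array]:
    """Scale each 16-bit PCM frame by its constant per-frame gain."""
    if len(frames) != len(FADE_GAINS):
        raise ValueError("expected exactly 5 buffered offline frames")
    out = []
    for frame, gain in zip(frames, FADE_GAINS):
        # Gains are <= 1, so scaled samples stay within the int16 range
        out.append(array("h", (round(s * gain) for s in frame)))
    return out
```

A constant gain per frame still leaves a small step at each frame boundary. Ramping the gain sample by sample across the five frames is a common alternative, but it is not what the text specifies.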

[0154] S4300: Concatenate and output the audio frames in the order of the currently playing audio frame, the attenuated frames, and the online audio frames.

[0155] The audio smooth transition module splices audio frames in the following order: "Current playback frame Ok → Subsequent offline frame Ok+1 → ... → Om-4 (attenuation) → Om-3 (attenuation) → Om-2 (attenuation) → Om-1 (attenuation) → Om (attenuation) → Online frame F1 → F2 → ...". The spliced audio stream is output to the car audio system, enabling continuous audio playback with "offline → online" switching. The silence interval is less than 3ms, which is much lower than the silence interval of more than 500ms in traditional switching methods.

[0156] Through the audio smooth transition mechanism described above, the audio interruption during offline / online switching is reduced from more than 500ms to less than 5ms, a reduction of about 99% in perceptible interruption time, eliminating experience problems such as "stuttering," "pitch skipping," and "muting."

[0157] Table 2 shows a comparison of the key performance indicators of this invention and existing technologies in terms of offline / online switching for in-vehicle voice systems:

[0158] Table 2 Comparison of Key Performance Indicators

[0159]

[0160] The quantitative data above shows that the present invention is significantly superior to the existing technology in terms of real-time network quality monitoring, timely switching triggering, and audio continuity, effectively solving the problems of stuttering and interruption of experience when switching between online and offline networks during network fluctuations.

[0161] This invention also provides an in-vehicle voice offline / online switching method. As shown in Figure 5, it includes the following steps:

[0162] S100: Acquire monitored network quality parameters and collected vehicle scene data;

[0163] S200: Determine the scene adaptation coefficient based on vehicle scene data, dynamically adjust the switching trigger threshold based on the initial switching trigger threshold, scene adaptation coefficient and historical network quality parameters, and generate a switching control signal based on real-time network quality parameters and the dynamically adjusted switching trigger threshold. The switching control signal is used to control the offline voice processing module to switch between on and off states and the online voice processing module to switch between on and off states.

[0164] S300: Perform smooth transition processing on the offline voice audio data and online voice audio data output by the offline voice processing module and the online voice processing module when switching states, and then output the processed audio data.

[0165] This invention also provides an electronic device. As shown in Figure 6, an embodiment of the present invention provides an electronic device including one or more processors 101, a memory 102, and one or more I/O interfaces 103. The memory 102 stores one or more programs which, when executed by the one or more processors, cause the one or more processors to implement any of the in-vehicle voice offline / online switching methods described in the above embodiments; the one or more I/O interfaces 103 are connected between the processor and the memory and are configured to enable information interaction between them.

[0166] The processor 101 is a device with data processing capabilities, including but not limited to a central processing unit (CPU); the memory 102 is a device with data storage capabilities, including but not limited to random access memory (RAM, more specifically SDRAM, DDR, etc.), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), and flash memory (FLASH); the I/O interface (read/write interface) 103 is connected between the processor 101 and the memory 102 and enables information interaction between them, including but not limited to a data bus (Bus).

[0167] In some embodiments, the processor 101, memory 102, and I / O interface 103 are interconnected via bus 104, and thus connected to other components of the computing device.

[0168] In some embodiments, the one or more processors 101 include a field-programmable gate array.

[0169] This invention also provides a computer-readable medium. The computer-readable medium stores a computer program, which, when executed by a processor, implements the steps of any of the in-vehicle voice offline / online switching methods described in the above embodiments. The computer-readable storage medium can be volatile or non-volatile.

[0170] Those skilled in the art will understand that all or some of the steps, systems, and apparatuses disclosed above, and their functional modules / units, can be implemented as software, firmware, hardware, or suitable combinations thereof. In hardware implementations, the division between functional modules / units mentioned above does not necessarily correspond to the division of physical components; for example, a physical component may have multiple functions, or a function or step may be performed collaboratively by several physical components. Some or all physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit (ASIC). Such software can be distributed on a computer-readable storage medium, which may include computer storage media (or non-transitory media) and communication media (or transient media).

[0171] As is known to those skilled in the art, the term computer storage medium includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information, such as computer-readable program instructions, data structures, program modules, or other data. Computer storage media includes, but is not limited to, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), static random access memory (SRAM), flash memory or other memory technologies, portable compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical disc storage, magnetic cartridges, magnetic tape, disk storage or other magnetic storage devices, or any other medium that can be used to store desired information and is accessible to a computer. Furthermore, it is known to those skilled in the art that communication media typically contain computer-readable program instructions, data structures, program modules, or other data in modulated data signals such as carrier waves or other transmission mechanisms, and may include any information delivery medium.

[0172] The computer-readable program instructions described herein can be downloaded from computer-readable storage media to various computing / processing devices, or downloaded via a network, such as the Internet, local area network, wide area network, and / or wireless network, to an external computer or external storage device. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers, and / or edge servers. A network adapter card or network interface in each computing / processing device receives the computer-readable program instructions from the network and forwards them to the computer-readable storage media in the respective computing / processing device.

[0173] The computer program instructions used to perform the operations of this invention may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk, C++, etc., and conventional procedural programming languages such as the "C" language or similar programming languages. The computer-readable program instructions may be executed entirely on the user's computer, partially on the user's computer, as a standalone software package, partially on the user's computer and partially on a remote computer, or entirely on a remote computer or server. In cases involving a remote computer, the remote computer may be connected to the user's computer via any type of network—including a local area network (LAN) or a wide area network (WAN)—or may be connected to an external computer (e.g., via the Internet using an Internet service provider). In some embodiments, electronic circuitry, such as programmable logic circuitry, field-programmable gate arrays (FPGAs), or programmable logic arrays (PLAs), is personalized by utilizing state information from the computer-readable program instructions. This electronic circuitry can execute the computer-readable program instructions to implement various aspects of the invention.

[0174] The computer program product described herein can be implemented specifically through hardware, software, or a combination thereof. In one alternative embodiment, the computer program product is specifically embodied in a computer storage medium; in another alternative embodiment, the computer program product is specifically embodied in a software product, such as a software development kit (SDK), etc.

[0175] Various aspects of the present invention are described herein with reference to flowchart illustrations and / or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It should be understood that each block of the flowchart illustrations and / or block diagrams, and combinations of blocks in the flowchart illustrations and / or block diagrams, can be implemented by computer-readable program instructions.

[0176] These computer-readable program instructions can be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus to produce a machine such that, when executed by the processor of the computer or other programmable data processing apparatus, they create means for implementing the functions / actions specified in one or more blocks of the flowchart and / or block diagram. These computer-readable program instructions can also be stored in a computer-readable storage medium that causes a computer, programmable data processing apparatus, and / or other device to operate in a particular manner; thus, the computer-readable medium storing the instructions comprises an article of manufacture that includes instructions for implementing aspects of the functions / actions specified in one or more blocks of the flowchart and / or block diagram.

[0177] Computer-readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable data processing apparatus, or other device to produce a computer-implemented process, thereby causing the instructions executed on the computer, other programmable data processing apparatus, or other device to implement the functions / actions specified in one or more blocks of the flowchart and / or block diagram.

[0178] The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of an instruction, which contains one or more executable instructions for implementing a specified logical function. In some alternative implementations, the functions marked in the blocks may occur in a different order than those shown in the drawings. For example, two consecutive blocks may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and / or flowcharts, and combinations of blocks in the block diagrams and / or flowcharts, may be implemented using a dedicated hardware-based system that performs the specified function or action, or using a combination of dedicated hardware and computer instructions.

[0179] Example embodiments have been disclosed herein, and while specific terminology has been used, it is for illustrative purposes only and should be construed as such, and is not intended to be limiting. In some instances, it will be apparent to those skilled in the art that features, characteristics, and / or elements described in conjunction with particular embodiments may be used alone, or in combination with features, characteristics, and / or elements described in conjunction with other embodiments, unless otherwise expressly indicated. Therefore, those skilled in the art will understand that various changes in form and detail may be made without departing from the scope of the invention as set forth in the appended claims.

Claims

1. A vehicle-mounted voice offline/online switching system, characterized by comprising: a network quality monitoring module, configured to monitor network quality parameters; a vehicle status acquisition module, configured to collect vehicle scene data; a scene adaptation module, configured to output a scene adaptation coefficient based on the vehicle scene data; a switching trigger decision module, configured to dynamically adjust a switching trigger threshold based on an initial switching trigger threshold, the scene adaptation coefficient, and historical network quality parameters, and to generate a switching control signal based on real-time network quality parameters and the dynamically adjusted switching trigger threshold; an offline voice processing module, configured to switch between an on state and an off state according to the switching control signal, and to output offline voice audio data when on; an online voice processing module, configured to switch between an off state and an on state according to the switching control signal, and to output online voice audio data when on; an audio smooth transition module, configured to perform smooth transition processing on the offline voice audio data and the online voice audio data output by the offline voice processing module and the online voice processing module when they switch states; and a vehicle audio interface module, configured to receive the audio data that has undergone smooth transition processing and output it to the vehicle audio system for voice playback.

2. The system according to claim 1, characterized in that the network quality parameters include network latency and packet loss rate.

3. The system according to claim 1, characterized in that the system further comprises a threshold storage unit configured to store the initial switching trigger threshold and the dynamically adjusted switching trigger threshold.

4. The system according to claim 1, characterized in that the system further comprises a monitoring data storage unit configured to receive and store the historical network quality parameters sent by the network quality monitoring module, and to send the historical network quality parameters to the switching trigger decision module.

5. The system according to claim 1, characterized in that the network quality monitoring module monitors the network quality parameters through the following steps: S1100: initializing monitoring parameters and starting a timed monitoring task; S1200: sending and receiving test data packets; S1300: calculating the network latency and packet loss rate; S1400: assessing the network status; S1500: storing and outputting the data.
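The claim names the S1300 calculation but not its formulas. A minimal sketch, assuming each test packet from S1200 yields either a round-trip time or nothing (lost), computes packet loss rate as the fraction of unanswered packets and latency as the mean round-trip time of answered ones. The `QualitySample` type, `compute_quality` function, and the S1400 rule "abnormal when every packet is lost" are hypothetical illustrations, not the patent's definitions.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional, Sequence


@dataclass
class QualitySample:
    latency_ms: Optional[float]  # mean round-trip time of answered packets; None if none answered
    loss_rate: float             # fraction of test packets with no reply, 0.0 to 1.0
    abnormal: bool               # hypothetical S1400 rule: abnormal when every packet was lost


def compute_quality(rtts_ms: Sequence[Optional[float]]) -> QualitySample:
    """S1300/S1400 sketch.

    rtts_ms holds one entry per test packet sent in S1200: the measured
    round-trip time in milliseconds, or None if no reply arrived before
    the (hypothetical) probe timeout.
    """
    if not rtts_ms:
        raise ValueError("at least one test packet must be sent")
    answered = [rtt for rtt in rtts_ms if rtt is not None]
    loss_rate = 1 - len(answered) / len(rtts_ms)
    latency = mean(answered) if answered else None
    return QualitySample(latency_ms=latency, loss_rate=loss_rate, abnormal=not answered)


# Usage: four test packets, two of which were lost.
print(compute_quality([20.0, None, 40.0, None]))
```

In S1500 such a sample would be what the module stores (for example in the monitoring data storage unit of claim 4) and passes to the switching trigger decision module.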

6. The system according to claim 1, characterized in that dynamically adjusting the switching trigger threshold based on the initial switching trigger threshold, the scene adaptation coefficient, and the historical network quality parameters comprises the following steps: S2100: multiplying the initial switching trigger threshold by the scene adaptation coefficient to obtain an initially adjusted switching trigger threshold; S2200: reading the average value of the most recent consecutive historical network quality parameters; if this average value is greater than the initially adjusted switching trigger threshold, multiplying the initially adjusted switching trigger threshold by a proportional coefficient less than 1 to obtain the dynamically adjusted switching trigger threshold; otherwise, using the initially adjusted switching trigger threshold as the dynamically adjusted switching trigger threshold.
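Steps S2100 and S2200 can be sketched for a single quality parameter (for example latency in milliseconds). The window of five recent samples, the requirement that the window be full before it counts as "consecutive", and the proportional coefficient 0.8 are hypothetical values chosen for illustration; the claim only requires a coefficient less than 1. Lowering the threshold when the recent average already exceeds it is read here as making the system switch to offline earlier on a persistently poor network.

```python
from collections import deque
from statistics import mean
from typing import Iterable


def adjust_threshold(initial_threshold: float,
                     scene_coefficient: float,
                     history: Iterable[float],
                     window: int = 5,          # hypothetical number of recent samples
                     shrink_factor: float = 0.8  # hypothetical proportional coefficient < 1
                     ) -> float:
    """Sketch of claim 6 (S2100 + S2200) for one network quality parameter."""
    if not 0 < shrink_factor < 1:
        raise ValueError("shrink_factor must be between 0 and 1")
    # S2100: initially adjusted threshold = initial threshold x scene adaptation coefficient.
    preliminary = initial_threshold * scene_coefficient
    # S2200: keep only the most recent `window` historical samples (oldest are dropped).
    recent = deque(history, maxlen=window)
    if len(recent) == window and mean(recent) > preliminary:
        # Recent average exceeds the threshold: scale it down by the coefficient.
        return preliminary * shrink_factor
    # Otherwise the initially adjusted threshold is used unchanged.
    return preliminary


# Usage: initial threshold 200 ms, hypothetical scene coefficient 1.5.
print(adjust_threshold(200.0, 1.5, [350, 320, 330, 310, 340]))
```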

7. The system according to claim 1, characterized in that generating the switching control signal based on the real-time network quality parameters and the dynamically adjusted switching trigger threshold comprises: generating a switching control signal for turning off the online voice processing module and turning on the offline voice processing module when either of the following conditions is met: the real-time network quality parameter is greater than the dynamically adjusted switching trigger threshold, or the network quality monitoring module determines that the network status is abnormal; and generating a switching control signal for turning off the offline voice processing module and turning on the online voice processing module when the real-time network quality parameter is less than 0.7 times the dynamically adjusted switching trigger threshold for several consecutive measurements.
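The two rules of claim 7 form a hysteresis: one bad reading (or an abnormal network status) switches to offline immediately, while returning online needs several consecutive readings below 0.7 times the threshold. The sketch below models this as a small state machine for one parameter; the class names and the value of 3 for "several consecutive" readings are hypothetical, and the 0.7 ratio is taken from the claim.

```python
from enum import Enum
from typing import Optional


class Mode(Enum):
    ONLINE = "online"
    OFFLINE = "offline"


class SwitchDecider:
    """Sketch of claim 7 for one network quality parameter (e.g. latency)."""

    RECOVERY_RATIO = 0.7  # from claim 7

    def __init__(self, recovery_count: int = 3) -> None:  # hypothetical "several"
        self.mode = Mode.ONLINE
        self.recovery_count = recovery_count
        self._good_streak = 0  # consecutive readings below 0.7 x threshold

    def update(self, value: float, threshold: float,
               abnormal: bool = False) -> Optional[Mode]:
        """Return the newly selected mode when a switching signal is generated, else None."""
        if self.mode is Mode.ONLINE:
            # Online -> offline: any reading above the threshold, or abnormal status.
            if abnormal or value > threshold:
                self.mode = Mode.OFFLINE
                self._good_streak = 0
                return Mode.OFFLINE
            return None
        # Offline -> online: count consecutive readings below 0.7 x threshold.
        if value < self.RECOVERY_RATIO * threshold:
            self._good_streak += 1
        else:
            self._good_streak = 0  # a single poorer reading restarts the count
        if self._good_streak >= self.recovery_count:
            self.mode = Mode.ONLINE
            self._good_streak = 0
            return Mode.ONLINE
        return None
```

The gap between the switch-off point (the threshold) and the switch-back point (0.7 times the threshold), together with the consecutive-reading requirement, is what prevents the modes from toggling rapidly when the network hovers near the threshold.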

8. The system according to claim 1, characterized in that, when switching from the online voice processing module being active to the offline voice processing module being active, the smooth transition processing performed during state switching on the offline and online voice audio data output by the offline and online voice processing modules comprises the following steps: S3100: caching, in real time, the audio frames output by the online voice processing module; S3200: upon detecting the switching control signal, marking the currently playing audio frame; S3300: after the offline voice processing module starts and generates the first offline audio frame, performing audio feature analysis on the last cached online audio frame and the first offline audio frame to generate a transition frame; S3400: outputting the audio frames concatenated in the order of the currently playing audio frame, the cached online audio frames, the transition frame, and the offline audio frames.

9. The system according to claim 1, characterized in that, when switching from the offline voice processing module being active to the online voice processing module being active, the smooth transition processing performed during state switching on the offline and online voice audio data output by the offline and online voice processing modules comprises the following steps: S4100: buffering, in real time, the audio frames output by the offline voice processing module, and upon detecting the switching control signal, marking the currently playing audio frame; S4200: after the online voice processing module starts and generates online audio frames, applying linear amplitude attenuation to the cached offline audio frames to generate attenuated frames; S4300: outputting the audio frames concatenated in the order of the currently playing audio frame, the attenuated frames, and the online audio frames.
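Linear amplitude attenuation in S4200 can be sketched as a gain ramp applied across all cached offline samples, with S4300 concatenating the result. The specific ramp, gain = 1 - k/N for the k-th of N cached samples (so the last cached sample reaches silence), is an assumption; the claim says only that the attenuation is linear. Function names are hypothetical.

```python
from typing import List, Sequence

Frame = List[float]  # one frame of PCM samples, normalized to [-1.0, 1.0]


def linear_attenuate(frames: Sequence[Frame]) -> List[Frame]:
    """S4200 sketch: ramp the gain linearly from near 1 down to 0 across all
    cached offline samples, continuing across frame boundaries."""
    total = sum(len(f) for f in frames)
    out: List[Frame] = []
    k = 0  # running sample index over all cached frames
    for frame in frames:
        faded: Frame = []
        for sample in frame:
            k += 1
            gain = 1.0 - k / total  # hypothetical ramp; last sample gets gain 0
            faded.append(sample * gain)
        out.append(faded)
    return out


def offline_to_online(current: Frame,
                      cached_offline: Sequence[Frame],
                      online: Sequence[Frame]) -> List[Frame]:
    """S4300 order: current frame, attenuated frames, then online frames."""
    return [current, *linear_attenuate(cached_offline), *online]
```

Running the ramp over the whole buffer rather than per frame avoids a gain jump at each frame boundary.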

10. A vehicle-mounted voice offline/online switching method, characterized by comprising the following steps: S100: acquiring monitored network quality parameters and collected vehicle scene data; S200: determining a scene adaptation coefficient based on the vehicle scene data, dynamically adjusting a switching trigger threshold based on an initial switching trigger threshold, the scene adaptation coefficient, and historical network quality parameters, and generating a switching control signal based on real-time network quality parameters and the dynamically adjusted switching trigger threshold, wherein the switching control signal is used to switch an offline voice processing module between on and off states and to switch an online voice processing module between on and off states; S300: performing smooth transition processing on the offline voice audio data and the online voice audio data output by the offline voice processing module and the online voice processing module when they switch states, and then outputting the processed audio data.

11. An electronic device, characterized by comprising: one or more processors; and a memory configured to store one or more programs, wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method according to claim 10.

12. A computer-readable medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, the steps of the method according to claim 10 are implemented.