Cloud platform-based vehicle navigation information feedback processing system and method

By compressing the multimodal input of in-vehicle navigation into spatiotemporal intent vectors and processing them in a local asynchronous queue, the problems of navigation feedback delay and interface blocking in weak network environments are solved, and the continuity and real-time performance of interactive feedback under network fluctuation conditions are achieved.

CN122268902A · Pending · Publication Date: 2026-06-23 · DONGGUAN YILING ELECTRONICS CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
DONGGUAN YILING ELECTRONICS CO LTD
Filing Date
2026-03-27
Publication Date
2026-06-23

AI Technical Summary

Technical Problem

In environments with weak or intermittent network connectivity, the real-time performance and stability of in-vehicle navigation systems cannot meet preset requirements, resulting in delays, interruptions, or interface waiting issues.

Method used

Multimodal inputs are compressed into lightweight spatiotemporal intent vectors. When the network is weak, synchronous upload of the raw data is blocked, and the vectors are pushed into a local asynchronous queue and uploaded to the cloud asynchronously once the network is restored. At the same time, an initial visual feedback marker is rendered locally, and its visual display attributes are adjusted through a confidence decay model.

Benefits of technology

The approach maintains the continuity and smoothness of interactive feedback under network fluctuation conditions, reduces uplink load, avoids interface blocking, and improves the reliability of feature extraction and the real-time nature of feedback.

✦ Generated by Eureka AI based on patent content.


Abstract

The present application relates to the technical field of vehicle-mounted intelligent navigation and cloud information processing, and in particular to a cloud platform-based vehicle-mounted navigation information feedback processing system and method. The method comprises: obtaining a touch coordinate data stream and an audio data stream, each with its own timestamp; blocking synchronous uploading of the two streams when network quality is detected to be lower than a preset network quality threshold; extracting geometric features and semantic features and combining them with the time offset to form a spatiotemporal intent vector; pushing the vector into a local asynchronous queue and rendering an initial visual feedback marker in an asynchronous visual feedback layer; and monitoring the vector upload status and the confirmation or rejection instruction returned by the cloud, dynamically adjusting a confidence level according to a preset decay model, and mapping the confidence level to visual display attributes to update the marker. The present application can maintain low-latency interactive feedback under weak-network or no-network conditions, reduce uplink load, and avoid interface blocking.

Description

Technical Field

[0001] This invention relates to the field of vehicle-mounted intelligent navigation and cloud-based information processing technology, specifically to a vehicle-mounted navigation information feedback processing system and method based on a cloud platform.

Background Technology

[0002] In daily use of in-vehicle navigation applications, the terminal needs to report abnormal information such as road closures, construction, and congestion. Drivers usually interact by touching the map location and describing the situation by voice. However, the efficiency of feedback processing is directly limited by the stability of data transmission between the in-vehicle terminal and the cloud. In traditional methods, in-vehicle navigation feedback often relies on synchronously uploading touch data and voice data to the cloud and waiting for the processing results to be returned. When the vehicle enters tunnels, elevated ramps, or sections with weak network coverage, the upload of the original multimodal data is prone to delays, interruptions, or interface waiting, so that the real-time performance and stability of the interactive feedback fail to meet the preset interaction requirements.

Summary of the Invention

[0003] The purpose of this invention is to provide a cloud-based vehicle navigation information feedback processing system and method that solves the following technical problems. It maintains low-latency display of interactive feedback in weak-network or momentary network-outage environments. By compressing multimodal input into lightweight spatiotemporal intent vectors instead of uploading raw data directly, it significantly reduces uplink load and processing overhead. At the same time, by decoupling real-time visual feedback from remote data consistency confirmation and rendering that feedback in an independent asynchronous visual feedback layer, it avoids blocking the rendering pipeline of the main navigation interface, thus ensuring the continuity and smoothness of in-vehicle interaction under network fluctuation conditions.

[0004] The objective of this invention can be achieved through the following technical solution. A cloud-based in-vehicle navigation information feedback processing method includes:
acquiring a touch coordinate data stream with a first timestamp from a touch input device and an audio data stream with a second timestamp from an audio acquisition device;
when network quality is detected to be lower than a preset network quality threshold, in response to acquiring the touch coordinate data stream and the audio data stream, blocking the synchronous transmission link of the two streams to the cloud server;
extracting the geometric features of the touch coordinate data stream and the semantic features of the audio data stream;
calculating the time offset between the first timestamp and the second timestamp, and fusing the geometric features, the semantic features, and the time offset into a spatiotemporal intent vector;
pushing the spatiotemporal intent vector into a local asynchronous queue and, when the network connection is restored, uploading the spatiotemporal intent vector in the local asynchronous queue to the cloud server;
when the synchronous transmission link is blocked, rendering an initial visual feedback marker in the asynchronous visual feedback layer of the graphical user interface of the vehicle terminal;
assigning an initial confidence level to the initial visual feedback marker;
monitoring the upload status of the spatiotemporal intent vector in the local asynchronous queue to the cloud server, and the reception status of feedback instructions returned by the cloud server, the feedback instructions including confirmation instructions or rejection instructions;
based on the upload status and the reception status of the feedback instruction, dynamically adjusting the initial confidence level according to a preset decay model with time as the independent variable to generate the current confidence level; and
mapping the current confidence level to the visual display attributes of the initial visual feedback marker to update the display of the initial visual feedback marker in the graphical user interface.

[0005] Optionally, extracting the geometric features of the touch coordinate data stream and the semantic features of the audio data stream includes: smoothing and denoising the touch coordinate data stream, and extracting its curvature features and dwell time features as the geometric features; and extracting keywords from the audio data stream using a preset acoustic model to generate the semantic features.

[0006] Optionally, monitoring the upload status of the spatiotemporal intent vector in the local asynchronous queue to the cloud server, and the reception status of feedback instructions returned by the cloud server, includes: detecting the network connection status with the cloud server; when the network connection status is connected, uploading the spatiotemporal intent vector in the local asynchronous queue to the cloud server; when the network connection status is disconnected, retaining the spatiotemporal intent vector in the local asynchronous queue to await uploading; and, after the spatiotemporal intent vector is successfully uploaded, listening for the feedback instruction returned by the cloud server.
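The upload monitoring described in [0006] can be sketched roughly as follows. This is a minimal illustration and not the claimed implementation: the class name `LocalAsyncQueue`, the `send` callback, and the dictionary used as a stand-in for an intent vector are all hypothetical, and first-in-first-out ordering is assumed as one possible queue discipline.

```python
from collections import deque
from typing import Callable, Deque, Dict, List

Vector = Dict[str, object]  # hypothetical stand-in for a spatiotemporal intent vector


class LocalAsyncQueue:
    """FIFO buffer of intent vectors that have not been uploaded yet (assumed design)."""

    def __init__(self) -> None:
        self._pending: Deque[Vector] = deque()

    def push(self, vector: Vector) -> None:
        self._pending.append(vector)

    def __len__(self) -> int:
        return len(self._pending)

    def flush(self, connected: bool, send: Callable[[Vector], bool]) -> List[Vector]:
        """Upload pending vectors in order; return those now awaiting cloud feedback."""
        awaiting_feedback: List[Vector] = []
        if not connected:
            return awaiting_feedback  # disconnected: everything stays queued
        while self._pending:
            if not send(self._pending[0]):
                break  # upload failed: keep this vector and the rest for a later attempt
            awaiting_feedback.append(self._pending.popleft())
        return awaiting_feedback
```

A caller would invoke `flush` whenever the connection status changes and then listen for a confirmation or rejection instruction for each returned vector.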

[0007] Optionally, dynamically adjusting the initial confidence level, based on the upload status and the reception status of the feedback instruction, according to a preset decay model with time as the independent variable to generate the current confidence level includes: if the feedback instruction has not been received, calculating the elapsed time from the moment the initial visual feedback marker was rendered, and inputting the elapsed time into the preset decay model to calculate a current confidence level that decreases as the elapsed time increases; upon receiving the confirmation instruction, setting the current confidence level to the highest confidence threshold; and upon receiving the rejection instruction, setting the current confidence level to the lowest confidence threshold.
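As a rough sketch of the adjustment in [0007], the function below uses an exponential decay as one possible choice of preset decay model. The function name, the string values `"confirm"` and `"reject"`, the thresholds 1.0 and 0.0, and the 3-second half-life are assumptions made only for illustration.

```python
from typing import Optional

MAX_CONFIDENCE = 1.0  # assumed highest confidence threshold
MIN_CONFIDENCE = 0.0  # assumed lowest confidence threshold


def current_confidence(
    elapsed_s: float,
    feedback: Optional[str],
    half_life_s: float = 3.0,  # hypothetical decay rate parameter
) -> float:
    """Return the current confidence of a marker rendered `elapsed_s` seconds ago."""
    if feedback == "confirm":
        return MAX_CONFIDENCE
    if feedback == "reject":
        return MIN_CONFIDENCE
    # No feedback yet: confidence halves every `half_life_s` seconds.
    return MAX_CONFIDENCE * 0.5 ** (max(elapsed_s, 0.0) / half_life_s)
```

A linear or discrete step model could be substituted by replacing only the last line, since the confirmation and rejection branches do not depend on the decay shape.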

[0008] Optionally, mapping the current confidence level to the visual display attributes of the initial visual feedback marker, in order to update the display of the initial visual feedback marker in the graphical user interface, includes: converting the current confidence level into the transparency value or color saturation value of the initial visual feedback marker; if the current confidence level decreases, reducing the transparency value or the color saturation value; when the current confidence level reaches the lowest confidence threshold, setting the transparency value or the color saturation value to zero to remove the initial visual feedback marker from the graphical user interface; and when the current confidence level reaches the highest confidence threshold, setting the transparency value or the color saturation value to the maximum display value to display the initial visual feedback marker stably in the graphical user interface.
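The mapping in [0008] could look something like the sketch below. It assumes a confidence level normalized to the range 0 to 1 and a linear mapping, and it reads the "transparency value" as a display value where zero means the marker is no longer visible. The function name and the default scale of 100 are hypothetical.

```python
def confidence_to_display_value(confidence: float, max_display: int = 100) -> int:
    """Map a confidence in [0, 1] linearly onto a display attribute scale.

    0 removes the marker; `max_display` shows it at full strength.
    """
    clamped = min(max(confidence, 0.0), 1.0)  # guard against out-of-range input
    return round(clamped * max_display)
```

Any monotonic mapping would satisfy the description equally well; the linear form is simply the easiest to reason about.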

[0009] A cloud-based in-vehicle navigation information feedback processing system includes:
a data acquisition module, used to acquire the touch coordinate data stream with a first timestamp from the touch input device and the audio data stream with a second timestamp from the audio acquisition device;
a synchronization blocking module, used to block the synchronous transmission link of the touch coordinate data stream and the audio data stream to the cloud server when the network quality is detected to be lower than a preset network quality threshold;
a feature extraction module, used to extract the geometric features of the touch coordinate data stream and the semantic features of the audio data stream;
a vector fusion module, used to calculate the time offset between the first timestamp and the second timestamp, and to fuse the geometric features, the semantic features, and the time offset into a spatiotemporal intent vector;
a queue management module, used to push the spatiotemporal intent vector into a local asynchronous queue;
a rendering module, used to render the initial visual feedback marker in the asynchronous visual feedback layer of the graphical user interface when the synchronous transmission link is blocked;
a confidence initialization module, used to assign an initial confidence level to the initial visual feedback marker;
a status monitoring module, used to monitor the upload status of the spatiotemporal intent vector in the local asynchronous queue to the cloud server, and the reception status of feedback instructions returned by the cloud server, the feedback instructions including confirmation instructions or rejection instructions;
a confidence adjustment module, used to dynamically adjust the initial confidence level, based on the upload status and the reception status of the feedback instruction, according to a preset decay model with time as the independent variable, so as to generate the current confidence level; and
an attribute mapping module, used to map the current confidence level to the visual display attributes of the initial visual feedback marker, so as to update the display of the initial visual feedback marker in the graphical user interface.

[0010] Optionally, the system is deployed in a terminal device; the touch input device is a touch screen; the audio acquisition device is a microphone; the graphical user interface is the application interaction interface; and the initial visual feedback marker is an interactive feedback status icon.

[0011] Optionally, the queue management module is also used to delete the touch coordinate data stream and the audio data stream before pushing the spatiotemporal intent vector into the local asynchronous queue, so that only the spatiotemporal intent vector is retained for network transmission.

[0012] Optionally, the graphical user interface is rendered by the main rendering pipeline, and the asynchronous visual feedback layer is established independently of the main rendering pipeline. The rendering module is specifically used to render the initial visual feedback marker in the asynchronous visual feedback layer within a time shorter than a preset delay threshold, so as to avoid blocking the thread of the main rendering pipeline.

[0013] The beneficial effects of this invention are:
1) When a weak network is detected, the present invention blocks the synchronous upload of multimodal raw data, extracts and fuses the data into a lightweight spatiotemporal intent vector, pushes the vector into a local asynchronous queue, and immediately renders a local visual marker. This breaks with the traditional mechanism that relies heavily on the cloud response, reduces the uplink transmission load, avoids interface waiting or interruption due to network latency, and ensures the real-time performance of feedback.
2) This invention introduces a time-based confidence decay model, which dynamically maps the upload status of the asynchronous queue and the cloud feedback results to the transparency or color saturation of the visual feedback marker. This mechanism turns the asynchronous process of local feedback and remote confirmation into a smooth and continuous evolution of visual state, avoiding abrupt visual changes caused by prolonged network unresponsiveness and preventing the driver from forming incorrect expectations.
3) This invention smooths and denoises the touch data stream to extract curvature and dwell time, and uses an acoustic model to extract semantic keywords from the audio stream. This mechanism overcomes the interference of physical vibration and cabin noise during vehicle operation, filters out jitter-induced mis-touches and voice misrecognition, makes the generated intent vector more refined and noise-resistant, and improves the reliability of feature extraction and of the final reporting results.
4) Before pushing the spatiotemporal intent vector into the local asynchronous queue, this invention actively deletes the original touch coordinate stream and audio stream, retaining only the lightweight intent vector for network transmission. This mechanism not only reduces the cache pressure on the vehicle terminal, enabling the system to complete the report quickly even during brief windows of connectivity such as when the vehicle exits a tunnel, but also reduces the problem of long-term retention of original data.
5) This invention constructs an asynchronous visual feedback layer independent of the main rendering pipeline, enabling visual feedback markers to be rendered with extremely low latency without blocking the main thread. This design logically decouples the interactive feedback layer from high-load rendering tasks such as the main map base map and route refresh, ensuring that dragging and zooming of the map screen continue to render at the preset frame rate under complex road conditions or network fluctuations.

Attached Figure Description

[0014] The invention will now be further described with reference to the accompanying drawings.

[0015] Figure 1 is a flowchart illustrating the cloud-based vehicle navigation information feedback processing method provided in an embodiment of this application; Figure 2 is a schematic diagram of a cloud-based vehicle navigation information feedback processing system in an embodiment of this application.

Detailed Implementation

[0016] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0017] Referring to Figure 1, a cloud-based vehicle navigation information feedback processing method includes:
acquiring a touch coordinate data stream with a first timestamp from a touch input device and an audio data stream with a second timestamp from an audio acquisition device;
when the network quality is detected to be lower than a preset network quality threshold, in response to acquiring the touch coordinate data stream and the audio data stream, blocking the synchronous transmission link of the two streams to the cloud server;
extracting the geometric features of the touch coordinate data stream and the semantic features of the audio data stream;
calculating the time offset between the two streams based on the first timestamp and the second timestamp, and fusing the geometric features, the semantic features, and the time offset into a spatiotemporal intent vector;
pushing the spatiotemporal intent vector into a local asynchronous queue and, when the network connection is restored, uploading the spatiotemporal intent vector in the local asynchronous queue to the cloud server;
when the synchronous transmission link is blocked, rendering an initial visual feedback marker in the asynchronous visual feedback layer of the graphical user interface of the vehicle terminal, and assigning an initial confidence level to the initial visual feedback marker;
monitoring the upload status of the spatiotemporal intent vector in the local asynchronous queue to the cloud server, and the reception status of feedback instructions returned by the cloud server, the feedback instructions including confirmation instructions or rejection instructions;
based on the upload status and the reception status of the feedback instruction, dynamically adjusting the initial confidence level according to a preset decay model with time as the independent variable to generate the current confidence level; and
mapping the current confidence level to the visual display attributes of the initial visual feedback marker to update the display of the initial visual feedback marker in the graphical user interface.

[0018] This embodiment provides a cloud platform-based vehicle navigation information feedback processing mechanism. The scenario is set as follows: a ride-hailing vehicle is driving on a stretch of urban expressway that includes a tunnel, and the driver finds that the ramp ahead is temporarily closed and needs to submit map error feedback through the navigation application interface of the in-vehicle system. During this process, the driver first selects the abnormal location on the map and then states verbally that the road is closed. As the vehicle is about to enter the tunnel, the uplink quality of the network drops rapidly. The key point of this embodiment is to maintain low-latency display of interactive feedback even under a weak network or a momentary network outage, while compressing the multimodal input into a lightweight result that can be uploaded later.
The data acquisition thread on the in-vehicle infotainment system continuously monitors the touchscreen and the microphone. The touchscreen outputs a sequence of coordinates that changes over time. For example, in one interaction three touch points are collected in sequence: point A at 1000 milliseconds with coordinates (520, 310), point B at 1035 milliseconds with coordinates (522, 312), and point C at 1090 milliseconds with coordinates (521, 311). The microphone outputs a voice segment in a separate acquisition thread, with a start timestamp of, for example, 1120 milliseconds and an end timestamp of, for example, 1780 milliseconds. The first and second timestamps are not required to be the same; each marks the time of occurrence at its own acquisition source. The labels A, B, and C only identify different sampling points in this touch coordinate sequence, distinguishing multiple time points within the same event, and do not themselves represent additional calculation parameters.
When the network monitoring module detects that the current uplink network quality is lower than the preset network quality threshold, for example a packet loss rate higher than 15% or an instantaneous uplink rate lower than 20 kilobits per second, it no longer sends the original touch stream and the original audio stream into the synchronous upload channel, and instead immediately cuts off the corresponding synchronous transmission link. Blocking here means that the real-time remote submission that was originally executed in the main interaction thread is suspended and processing continues locally, so that the main rendering thread does not wait for the remote end to return. The synchronous transmission link refers specifically to the real-time cloud upload path for the original touch coordinate data stream and the original audio data stream. After it is blocked, the asynchronous upload path based on the spatiotemporal intent vector is still retained, but sending is triggered by the asynchronous queue after the network is restored or the upload conditions are met. Blocking synchronous transmission therefore does not conflict with the subsequent asynchronous upload of vectors.
The two correspond to different data forms and different transmission timings. Likewise, the condition that network quality falls below a preset network quality threshold is used to trigger the blocking of the synchronous upload link for the original data, whereas whether the asynchronous queue subsequently sends can be determined from the network connection status with the cloud server. The former concerns whether the uplink quality is sufficient to carry the original multimodal stream, while the latter concerns whether the link has recovered to the point where vector-level data transmission can be completed. The two conditions therefore serve different stages and do not conflict.
During the local processing stage, geometric features are extracted from the touch coordinate data stream, and semantic features are extracted from the audio data stream. For ease of explanation, assume that processing the click action yields two geometric quantities: the center coordinates of the clicked area, close to (521, 311), and a dwell time of 90 milliseconds. Assume further that speech recognition yields two semantic quantities: the keyword "road closure" and the location pronoun "here". A stable touch point is a touch point in a continuous touch coordinate data stream at which the rate of change of the coordinate position is lower than a preset fluctuation threshold and the contact duration in the corresponding area exceeds a preset dwell time. The time offset is calculated from the timestamps of the two types of input. For example, comparing the moment at which the stable touch point is formed, 1090 milliseconds, with the moment the voice starts, 1120 milliseconds, gives an offset of 30 milliseconds. The system fuses the above geometric features, semantic features, and time offset into a spatiotemporal intent vector.
For ease of understanding, the vector can be pictured as a lightweight structure such as [location area number R15, dwell time 90, keyword code K2, place word code L1, time offset 30]. The vector is not limited to a specific encoding and can be a fixed-length array, a set of structured fields, or a compact binary packet. Here, R15 is the region number corresponding to the clicked location after regional discretization, K2 is the code of the "road closure" keyword in the local keyword coding table, and L1 is the code of the location pronoun in the location word coding table. R15, K2, and L1 are all exemplary coding values, used only to illustrate how the vector fields are organized, and do not introduce any new algorithm variables.
After the spatiotemporal intent vector is generated, it is pushed into a local asynchronous queue. This queue can use a first-in-first-out (FIFO) approach to store feedback events that have not yet been uploaded. For example, if two vectors Q1 and Q2 are waiting to be uploaded in the current queue, then after the newly generated vector Q3 is enqueued the order is Q1→Q2→Q3. After the vector is enqueued, the main interface does not wait for any remote response, even if the network has not yet recovered. Furthermore, to ensure that the subsequent state is written back to the correct visual marker, the system can generate a local event identifier E103 at the moment the vector is enqueued, and establish a one-to-one association between the vector Q3, the event number E103, and the visual marker M103. Subsequent upload success statuses, confirmation instructions, and rejection instructions are all matched using this event identifier to avoid status crosstalk between multiple concurrent feedback events.
Q1, Q2, and Q3 denote different pending spatiotemporal intent vectors in the asynchronous queue, E103 denotes the local event number corresponding to this feedback event, and M103 denotes the visual feedback marker in the interface bound to this event number. These codes only illustrate the correspondence between queue items, event items, and display items.
In parallel with the enqueue action, at the moment the interruption of synchronous transmission is detected, the asynchronous visual feedback layer of the graphical user interface immediately renders an initial visual feedback marker. For example, a semi-transparent circular cue marker is overlaid on the map location just selected by the driver, and a simplified status graphic is displayed next to it. This marker is located in an independent feedback layer; it does not change the navigation base map and does not block zooming of the main map or route refresh. The independent feedback layer is the aforementioned asynchronous visual feedback layer; both terms refer to the same graphics layer object, and the wording only indicates that the layer is independent of the main rendering content.
After rendering is completed, the system assigns an initial confidence level to this marker, for example 1.0. The initial confidence level is the normalized starting point for the subsequent visual attribute mapping and does not fix the initial display at a particular transparency. That is, depending on the interface style, the marker can initially be semi-transparent, outlined and highlighted, or in another visual style, as long as the current confidence level and the selected visual display attribute maintain a monotonic correspondence. Afterwards, the status monitoring logic continuously checks two states: the upload status of the vector in the local asynchronous queue, and the reception status of the feedback instruction returned by the cloud.
Taking a specific application scenario as an example: after the vehicle enters the tunnel, the network is disconnected for the first 3 seconds and Q3 is kept in the queue; the network is restored in the 4th second and Q3 is uploaded. If the cloud returns a confirmation instruction at 4.2 seconds, the marker switches to a high-certainty state; if the cloud returns a rejection instruction at 4.2 seconds, the marker switches to a negative state; if no result is returned, the confidence level continues to decrease over time.
To illustrate the confidence adjustment process, assume that the decay model uses time as the independent variable and that the initial confidence level is 1.0. The preset decay model is specifically any one of a linear decay model, an exponential decay model, or a discrete step decay model. The system can dynamically configure different decay rate parameters for the preset decay model depending on whether the vector has not yet been uploaded or has been uploaded and is awaiting feedback. When no result is received from the cloud, the confidence level can drop to 0.8 at the first second, 0.5 at the third second, and 0.1 at the sixth second. The current confidence level is mapped to visual display attributes, such as transparency or color saturation. Assuming transparency is chosen as the mapping attribute and the upper limit of the transparency scale is 100%, then when the current confidence level becomes 0.5 the transparency can be reduced to 50% accordingly, and when the current confidence level is close to 0 the marker is almost invisible. If a confirmation instruction is received, the current confidence level is set directly to the highest threshold, such as 1.0; if a rejection instruction is received, it can be set directly to the lowest threshold, such as 0.
In one alternative implementation, when touch input is present but no valid voice input is detected, the system can still construct a spatiotemporal intent vector from the touch geometric features alone together with a default empty semantic field. When voice is present but no explicit touch location is detected, the location field can be set to the center area of the current view or to the area the user most recently focused on, and the vector can be marked locally as being in a low position-confidence state. If the time offset exceeds the preset limit, for example more than 3 seconds, the two types of input are considered not to belong to the same feedback event; in this case they are split into two candidate events and enqueued separately to avoid incorrect binding. If the local asynchronous queue reaches its capacity limit, for example a maximum of 100 cached events, earlier entries that have decayed to the lowest confidence level and have not yet been uploaded are evicted first to prevent the storage from filling up.
For example, before the aforementioned ride-hailing vehicle enters the tunnel, the driver clicks on the elevated ramp entrance on the map and says that the road is closed here. The in-vehicle system detects that the network quality drops from good to weak within a short period, so it terminates the original synchronous upload action and locally extracts the dwell of about 90 milliseconds at the clicked location in the ramp entrance area, the keyword, and the time offset of about 30 milliseconds, forming a spatiotemporal intent vector that is added to the asynchronous queue. Meanwhile, a semi-transparent feedback icon appears on the map interface in real time. Within seconds after the vehicle enters the tunnel, the icon gradually fades, without blocking the main rendering pipeline of the main navigation screen. After the vehicle exits the tunnel and the signal is restored, the vector is uploaded; the cloud confirms that multiple vehicles have reported on this section of road and returns a confirmation instruction, and the icon changes from a semi-transparent state to a clear confirmed state. The purpose of this step is to decouple real-time visual feedback from remote data consistency confirmation, so as to maintain the continuity of interactive response under network fluctuation conditions, and to upload lightweight intent vectors in place of the original multimodal data, thereby reducing uplink load and avoiding interface blocking.
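The worked numbers in this embodiment can be traced with a small sketch. It is only an illustration under stated assumptions: the names `IntentVector`, `network_is_weak`, and `build_intent_vector` are hypothetical, the center is taken as the plain mean of the samples, the last touch sample is treated as the stabilization moment, and region and keyword coding (R15, K2, L1) is left out.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

TouchSample = Tuple[int, float, float]  # (timestamp_ms, x, y)


@dataclass(frozen=True)
class IntentVector:
    center: Tuple[float, float]  # approximate center of the touched area
    dwell_ms: int                # contact duration
    keyword: Optional[str]       # None models the default empty semantic field
    offset_ms: int               # voice start minus touch stabilization time


def network_is_weak(packet_loss: float, uplink_kbps: float) -> bool:
    """Example thresholds from the description: loss above 15% or uplink below 20 kbit/s."""
    return packet_loss > 0.15 or uplink_kbps < 20.0


def build_intent_vector(
    touch: List[TouchSample],
    voice_start_ms: int,
    keyword: Optional[str],
    max_offset_ms: int = 3000,  # example limit of 3 seconds from the description
) -> Optional[IntentVector]:
    """Fuse touch geometry, keyword, and time offset into one vector.

    Returns None when the offset exceeds the limit; a caller would then
    split the inputs into two candidate events (not shown here).
    """
    if not touch:
        raise ValueError("at least one touch sample is required")
    first_ms, last_ms = touch[0][0], touch[-1][0]
    n = len(touch)
    center = (sum(x for _, x, _ in touch) / n, sum(y for _, _, y in touch) / n)
    offset_ms = voice_start_ms - last_ms
    if abs(offset_ms) > max_offset_ms:
        return None
    return IntentVector(center, last_ms - first_ms, keyword, offset_ms)
```

With points A, B, and C from the example and a voice start of 1120 milliseconds, this yields a center of (521.0, 311.0), a dwell time of 90 milliseconds, and an offset of 30 milliseconds.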

[0019] In a preferred embodiment of the present invention, extracting the geometric features of the touch coordinate data stream and the semantic features of the audio data stream includes: performing smoothing and noise reduction processing on the touch coordinate data stream, and extracting the curvature features and dwell time features of the touch coordinate data stream as the geometric features; and extracting keywords from the audio data stream using a preset acoustic model to generate the semantic features. This embodiment provides a mechanism for refining feature extraction; specifically, in the aforementioned continuous scenario, simply obtaining the original touch points and audio segments is not enough, because vibrations during vehicle movement, slight tremors of the driver's fingers, and cabin noise can cause data jitter. If the original trajectory and original audio are used directly, it is easy to encode mis-touch or mis-identified words into the spatiotemporal intent vector, which will affect the subsequent upload results; therefore, this embodiment further introduces smoothing and noise reduction and keyword extraction processing. For the touch coordinate data stream, adjacent sampling points can be smoothed and denoised first. For ease of understanding, assume that the actual target should be concentrated around (520, 310) in a certain point selection, but due to road bumps, five points are obtained by continuous sampling: (518, 309), (523, 315), (520, 311), (521, 310), (519, 309). The system can perform local smoothing on these points, for example, by reducing the weight of the second point with abnormal jumps to obtain a more stable trajectory center (520.2, 310.1). The smoothing and denoising process can employ any of the following algorithms: moving average, Gaussian filtering, or Kalman filtering. 
By dynamically assigning different weights to multiple continuously sampled touch coordinate points, abrupt noise caused by vehicle vibration or slight finger tremors can be eliminated. Based on this, curvature and dwell time features are extracted. If the finger path consists primarily of short-distance stops, the curvature is close to 0, indicating that the user is primarily confirming the location at a specific point. If a user swipes slightly along the edge of the road and then stops, the curvature will be higher than the threshold, which can reflect their intention to indicate the direction along the road; the dwell time is calculated by the start and end of the contact, for example, from 1000 milliseconds to 1090 milliseconds, 90 milliseconds can be obtained; For audio data streams, a pre-defined acoustic model is used for keyword extraction; the pre-defined acoustic model includes, but is not limited to, Hidden Markov Model, Recurrent Neural Network, or an end-to-end speech recognition model based on Transformer architecture; the model is pre-trained to map the input audio data frame sequence into a text sequence containing candidate keywords and their corresponding confidence scores; For technical explanation, it is assumed that the vocabulary of the acoustic model is pre-configured with high-frequency navigation feedback words such as road closures, congestion, accidents, construction, and traffic restrictions; for a certain segment of voice recording that "the road is closed here", the model can output a confidence score of 0.92 for road closure, 0.21 for construction, and 0.18 for congestion. 
The system selects keywords that exceed a preset threshold, such as retaining only road closures as semantic features; if location words such as "here" or "the intersection ahead" are also identified in the audio, they can be saved as auxiliary semantic fields; furthermore, the keyword extraction can be performed according to the following rules: first, the audio data stream is framed and feature-encoded, then the acoustic model outputs candidate words and their corresponding scores, and target keywords are selected according to the preset threshold; When the scores of multiple candidate words are close, multiple keywords and their ranking information can be retained for use in the subsequent vector fusion stage, thereby avoiding semantic loss caused by retaining only a single word; Furthermore, geometric and semantic features can be organized into a unified structure; for example, after this processing, the following results are obtained: curvature is 0.05, dwell time is 90 milliseconds, and the keyword is "road closure"; if the curvature is lower than a first preset curvature threshold and the dwell time exceeds a preset duration threshold, it can be presumed that it is accompanied by a voice explanation; if the curvature is higher than a second preset curvature threshold and is accompanied by [other factors], it can be presumed that it is a report along a road segment; this feature extraction method facilitates subsequent vector fusion. Furthermore, in abnormal situations, if the touch trajectory is still obviously unstable after smoothing, such as the span between adjacent points exceeds the preset pixel threshold and the duration is less than the minimum touch duration, it can be determined as a false touch and will not be included in subsequent geometric features. 
If the audio signal-to-noise ratio is lower than the preset quality threshold, causing multiple keywords to score below the threshold, the semantic features can be set to empty or unrecognized, and subsequent processes can be allowed to complete local feedback display based solely on geometric features. If two high-scoring words are recognized in the audio at the same time, such as "road closure" and "construction" with similar scores, both can be retained together, and multiple keyword identifiers can be added to the vector to avoid losing information prematurely. For example, before the aforementioned ride-hailing vehicle enters the tunnel, the driver's finger vibrates as the vehicle passes over a speed bump, and the original sample shows a touch point that deviates from the center of the trajectory by more than a preset displacement threshold; after smoothing and denoising, this point is downweighted, and the system confirms that the target area is still located near the ramp entrance. At the same time, the microphone picks up audio segments indicating that the road is closed, and the acoustic model extracts the core word "road closure" from them; the system no longer saves all the details of the entire audio segment, but only retains a set of features that can be used to express intent, such as low curvature, 90 milliseconds of dwell time, and the keyword "road closure"; The purpose of this step is to improve feature stability in vehicle vibration and noise environments, making the data entering the vector fusion stage more refined and noise-resistant, and reducing misjudgments caused by fluctuations in the original sampling.
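As a hedged illustration of the feature extraction described in this embodiment, the sketch below computes a down-weighted trajectory center, a dwell time, and a thresholded keyword list. The weighting rule (inverse distance from the coordinate-wise median) is one possible choice and is not stated in the text, so it is not expected to reproduce the (520.2, 310.1) figure exactly; the function names and the 0.5 keyword threshold are likewise assumptions.

```python
import math
from statistics import median
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def smoothed_center(points: Sequence[Point]) -> Point:
    """Weighted centroid in which points far from the median count less."""
    mx = median(p[0] for p in points)
    my = median(p[1] for p in points)
    # Assumed weighting: 1 / (1 + distance to the median point).
    weights = [1.0 / (1.0 + math.hypot(x - mx, y - my)) for x, y in points]
    total = sum(weights)
    cx = sum(w * x for w, (x, _) in zip(weights, points)) / total
    cy = sum(w * y for w, (_, y) in zip(weights, points)) / total
    return cx, cy


def dwell_ms(contact_start_ms: int, contact_end_ms: int) -> int:
    """Dwell time is the span between the start and end of the contact."""
    return contact_end_ms - contact_start_ms


def select_keywords(scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Keep candidates at or above the threshold, highest score first.

    An empty list corresponds to empty or unrecognized semantic features.
    """
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [word for word, score in ranked if score >= threshold]
```

Applied to the five sampled points in the example, the jumping second point receives the smallest weight, so the resulting center stays close to (520, 310); with the example scores, only "road closure" survives the threshold.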

[0020] In a preferred embodiment of the present invention, monitoring the upload status of the spatiotemporal intent vector in the local asynchronous queue to the cloud server, and the status of receiving feedback instructions returned by the cloud server, includes: detecting the network connection status with the cloud server. When the network connection is active, the spatiotemporal intent vector in the local asynchronous queue is uploaded to the cloud server; when the network connection is disconnected, the spatiotemporal intent vector is kept in the local asynchronous queue awaiting upload; after the spatiotemporal intent vector is successfully uploaded, the system listens for the feedback instruction returned by the cloud server.

[0021] This embodiment provides a mechanism for asynchronous queue uploading and status monitoring. Specifically, the aforementioned scheme can save feedback events locally and render markers immediately when the network is weak. However, if the network connection status and upload timing are not finely managed, two problems may occur: first, repeated upload attempts while the network is disconnected consume resources to no effect; second, the upload order may become disordered after the network is restored, causing feedback events generated earlier to be confirmed later. Based on this, this embodiment further refines the queue monitoring process. The system periodically checks the network connection status between itself and the cloud server. The detection results can be divided into at least two states, connected and disconnected, and can be further subdivided into stably connected, weakly connected, and completely disconnected. For ease of explanation, it is assumed that the system performs a check every 500 milliseconds and assigns a value to the current state: 1 indicates connected and 0 indicates disconnected. If the results of three consecutive checks are all 0, the system is determined to be in a disconnected state. If at least two of the values are 1 and the handshake is successful, the system is considered connected. The network connection status detection here concerns whether the asynchronous upload link is able to send the spatiotemporal intent vector. It is not the same criterion as the aforementioned one, under which synchronous transmission of the original touch stream and the original audio stream is blocked when the network quality falls below the preset network quality threshold. 
In other words, the original multimodal stream may be prohibited from synchronous uploading due to insufficient uplink quality, but the lightweight spatiotemporal intent vector can still be sent asynchronously when the connection is restored in the future. Therefore, the connectivity state in this embodiment is a sending condition oriented towards asynchronous queue scheduling. When connectivity is detected, the queue manager retrieves the spatiotemporal intent vectors from the head of the local asynchronous queue in sequence and performs the upload. For example, if there are three events Q1, Q2, and Q3 in the queue in sequence, if the network is restored, Q1 will be uploaded first, Q2 will be processed after receiving the upload success flag, and then Q3 will be processed. Each vector can be accompanied by a local event number, such as E101, E102, E103, so that when a cloud feedback instruction is received later, it can be accurately matched with the corresponding visual mark on the interface. When a disconnection is detected, no action is taken to send data to the cloud. Instead, all unuploaded vectors are kept in a local asynchronous queue to wait. At this time, the queue element status can be marked as pending transmission. For example, if Q3 encounters a network disconnection immediately after generation, its status remains pending transmission; once the network is restored and the data has been sent, it is updated to sent pending feedback; upon receiving confirmation or rejection, it is updated to completed; after successful upload, the status monitoring logic enters the feedback listening stage; the feedback instructions returned by the cloud may include confirmation instructions or rejection instructions. For ease of understanding, let's assume that after the E103 corresponding to Q3 is successfully uploaded, the cloud returns a confirmation because it is consistent with existing multi-source events. 
Another event, Q2, may return a rejection because the location is invalid or the semantics are unclear. After receiving these feedbacks, the vehicle's infotainment system writes the results back to the local status table and drives the corresponding visual markers to switch display states. The confirmation and rejection here correspond to the aforementioned confirmation and rejection commands, respectively. The reason field attached is only for optional explanation and does not change the command type itself. In one alternative implementation, if the network status is detected to fluctuate frequently in a short period of time, such as switching between connected and disconnected more than 4 times in 2 seconds, the jitter protection mode can be entered. In this mode, the queue transmission is temporarily suspended, and only the backlog of events continues to be accumulated. The upload will be resumed after the network stabilizes for several detection cycles to avoid repeatedly establishing connections. If a vector is interrupted during the upload process, it will be put back to the head of the queue or marked as pending retransmission. If the number of retransmissions of the same vector exceeds the limit, such as more than 5 times, its sending priority can be reduced and a local record can be kept to prevent it from blocking the subsequent queue for a long time. If the upload is successful but no feedback instruction is received for a long time, such as exceeding the preset listening time, its status remains as sent and awaiting feedback, and the visual performance will continue to be processed by the subsequent confidence decay logic. 
For example, after the aforementioned vehicle enters the tunnel, Q3 is stored in the local asynchronous queue with a status of pending transmission; after the vehicle exits the tunnel and the network is detected to have been restored, the queue manager first checks the earlier backlogged Q1 and Q2, and then uploads Q3; Q1 receives an acknowledgment instruction within a preset response period, Q2 is rejected, and Q3 also receives acknowledgment within several hundred milliseconds after successful upload. The three feedback markers on the interface were thus switched to the corresponding display results. Throughout the process, the driver did not experience any interface thread blocking or screen lag caused by upload waiting. The purpose of this step is to ensure that upload actions in a weak network environment are recoverable and traceable through clear connection status determination and queue order control, and to establish a stable correlation between local visual feedback and remote processing results.
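The connection check and in-order upload described in this embodiment might be sketched as follows. This is only an illustration under stated assumptions: `link_state`, `UploadQueue`, and the `send` callback are hypothetical names, the three-sample window follows the example in the text, and keeping the previous state when neither rule fires is an added assumption. Jitter protection and retransmission limits are omitted.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Sequence, Tuple

PENDING = "pending transmission"
AWAITING = "sent pending feedback"
COMPLETED = "completed"


def link_state(samples: Sequence[int], handshake_ok: bool, previous: str) -> str:
    """Classify the link from the last three 0/1 checks (one per 500 ms)."""
    last3 = list(samples[-3:])
    if len(last3) == 3 and sum(last3) == 0:
        return "disconnected"
    if len(last3) == 3 and sum(last3) >= 2 and handshake_ok:
        return "connected"
    return previous  # assumption: otherwise keep the previous state


class UploadQueue:
    """FIFO queue of (event number, vector) pairs with a per-event status."""

    def __init__(self) -> None:
        self.pending: Deque[Tuple[str, object]] = deque()
        self.status: Dict[str, str] = {}

    def push(self, event_id: str, vector: object) -> None:
        self.pending.append((event_id, vector))
        self.status[event_id] = PENDING

    def drain(self, send: Callable[[str, object], bool]) -> List[str]:
        """Upload from the head in order; stop at the first failed upload."""
        sent = []
        while self.pending:
            event_id, vector = self.pending[0]
            if not send(event_id, vector):
                break  # the vector stays at the head for a later retry
            self.pending.popleft()
            self.status[event_id] = AWAITING
            sent.append(event_id)
        return sent

    def on_feedback(self, event_id: str, instruction: str) -> bool:
        """Accept a confirm/reject only for an event that is awaiting feedback."""
        if self.status.get(event_id) != AWAITING:
            return False
        if instruction not in ("confirm", "reject"):
            return False
        self.status[event_id] = COMPLETED
        return True
```

Because `drain` only advances after a successful upload of the head element, events E101, E102, and E103 reach the cloud in the order in which they were generated, and feedback carrying an unknown event number is ignored rather than applied to the wrong marker.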

[0022] In a preferred embodiment of the present invention, based on the upload status and the status of the feedback instruction, the initial confidence level is dynamically adjusted according to a preset decay model with time as the independent variable to generate the current confidence level, including: calculating the elapsed time from the rendering time of the initial visual feedback mark when the feedback instruction is not received; The elapsed time is input into the preset decay model to calculate the current confidence level that decreases as the elapsed time increases; upon receiving the confirmation instruction, the current confidence level is set to the highest confidence level threshold; upon receiving the rejection instruction, the current confidence level is set to the lowest confidence level threshold. Mapping the current confidence level to the visual display attributes of the initial visual feedback marker in order to update the display of the initial visual feedback marker in the graphical user interface includes: converting the current confidence level into the transparency value or color saturation value of the initial visual feedback marker. If the current confidence level decreases, reduce the transparency value or the color saturation value; if the current confidence level reaches the minimum confidence threshold, set the transparency value or the color saturation value to zero to eliminate the initial visual feedback mark in the graphical user interface; if the current confidence level reaches the maximum confidence threshold, set the transparency value or the color saturation value to the maximum display value to stably display the initial visual feedback mark in the graphical user interface.

[0023] This embodiment provides a visual feedback adjustment mechanism based on confidence decay. Specifically, in the aforementioned continuous scenario, if a feedback marker is rendered locally but the network does not recover for a long time, or the cloud does not return a feedback result for more than a preset time, the feedback marker in the initial state will remain on the interface for a long time, which may lead to inaccurate interaction state indication. Therefore, this embodiment further introduces decay logic in the time dimension and maps this logic directly to visual display attributes, so that the feedback state changes in a gradual way rather than abruptly. The system records the start time when the initial visual feedback marker is rendered and assigns an initial confidence level; assuming the initial confidence level is 1.0; to avoid ambiguity in terminology, the initial confidence level here is an internally calculated value, while descriptions such as semi-transparent, clear, and faded are descriptions of the interface display effect, and the two do not necessarily correspond one-to-one to a fixed initial transparency value. In other words, if the system chooses to map the current confidence level to transparency, then 1.0 can be mapped to the maximum display transparency under that mapping scale; if the system chooses to map the current confidence level to color saturation, then the initial transparency can be kept unchanged while only the color intensity is adjusted; therefore, the numerical examples of transparency and saturation in the following text are examples of optional mapping schemes, rather than fixed interface parameters that must be satisfied simultaneously. 
If no feedback instruction is received from the cloud afterward, the elapsed time is continuously calculated; for example, if the marker is rendered at 0 seconds, the current confidence level is calculated at 1 second, 3 seconds, and 6 seconds respectively. For ease of explanation, a simple discrete decay table can be used: 1.0 at 0 seconds, 0.8 at 1 second, 0.5 at 3 seconds, 0.2 at 6 seconds, and 0 at 8 seconds; a continuous decay form can also be used, as long as it satisfies the constraint that the longer the elapsed time, the lower the confidence level. After the current confidence level is generated, the system maps it to a visual display attribute. The most direct approach is to map it to a transparency value: assuming a transparency mapping scheme is adopted and the maximum display transparency under this scheme is defined as 100, then when the current confidence level drops to 0.5 the transparency is 50, when it drops to 0.2 the transparency is 20, and as the confidence level approaches 0 the transparency value of the icon approaches zero. Another approach is to map to color saturation, for example, initially a highly saturated blue that gradually turns into a pale gray as the confidence decreases; transparency and saturation can also be controlled simultaneously so that the marker presents a more natural decay effect. If the initial interface style adopts a semi-transparent dot plus stroke design, the semi-transparency can be a fixed basic style, and when the confidence decays only its transparency or saturation is further reduced, which does not affect the validity of the numerical mapping relationship in this embodiment. When a confirmation instruction is received, the confidence level is no longer decayed, but is directly raised to or locked at the highest threshold, such as 1.0. 
At this time, the visual marker can switch from a gradually fading state to a stable and clear confirmed state, such as restoring to the maximum transparency or maximum saturation under the selected mapping scale, and adding a solid outline to the edge of the icon. Conversely, when a rejection instruction is received, the current confidence level is directly set to the lowest threshold, such as 0, and the transparency or color saturation is simultaneously set to zero, causing the marker to be removed from the interface; if it is necessary to notify the driver that the feedback has not been accepted, a low-saturation negative graphic can be displayed briefly and then disappear automatically after a short time. To illustrate the process, suppose event E103 renders its initial marker at 0 seconds, has not yet been uploaded successfully at 1 second, and has a current confidence level of 0.8. If this event uses a transparency mapping scheme, the transparency can be set to 80. At 4 seconds, the event has been uploaded but no feedback has been received, and the current confidence level is 0.4, so the transparency can be set to 40. If a confirmation command is received after 5 seconds, the current confidence level immediately returns to 1.0, corresponding to the maximum display value under this mapping scheme; if another event E102 receives a rejection command after 2 seconds, the current confidence level is directly set to 0, the transparency is reduced to zero, and its mark is no longer displayed on the interface; in this way, the user sees a continuous, perceptible, and smoothly changing state evolution; Furthermore, to make the processing logic based on upload status and feedback instruction status clearer, the system can distinguish at least two intermediate states when no feedback instruction is received: the first is the state of not uploading or upload failure pending retry, and the second is the state of uploading pending feedback. 
For the first type of state, the marker indicates that the event still exists only locally, so it can be decayed according to the first preset decay rate; for the second type of state, the marker indicates that the event has been successfully sent to the cloud and is only waiting for the server's processing result, so it can be decayed according to the second preset decay rate, or held for a short time near the preset intermediate confidence threshold. In this way, the upload status not only determines whether to continue sending, but also directly participates in the adjustment of the current confidence level. Taking the same initial confidence level of 1.0 as an example, if the event has not been uploaded within the first 3 seconds, the confidence level can fall to 0.8 at 1 second and 0.5 at 3 seconds. If the event was successfully uploaded in the 2nd second but no feedback arrives before the 5th second, the decay can switch to a slower rate after the 2nd second; for example, the confidence level can be held at around 0.6 in the 3rd second and fall to around 0.4 in the 5th second, to reflect the visual difference between the locally pending-send status and the cloud pending-judgment status. Furthermore, to avoid unclear processing rules, the aforementioned faster and slower rhythms can be achieved by different parameter ranges under the same preset decay model, or by two pre-configured decay tables, as long as, for the same elapsed time, the current confidence level corresponding to the not-uploaded state is not higher than that corresponding to the uploaded-and-pending-feedback state. Furthermore, the mapping of the visual display attributes can also be linked to the above states, rather than being a single linear fade; for example, in the not-uploaded state, the icon can use a faster transparency decay, supplemented with a slightly flickering edge to indicate that the event is still held in the local queue. 
In the uploaded-and-pending-feedback state, the flickering can stop and only the transparency or saturation is reduced, indicating that the event has entered the cloud processing stage; when a confirmation instruction is received, the icon is displayed clearly and stably; when a rejection instruction is received, the icon is cleared and disappears. The above differentiated mapping still falls within the adjustment of transparency or color saturation values, but it additionally gives the different upload states distinguishable visual semantics. Furthermore, in abnormal situations, if the upload succeeds but the content of the feedback instruction is abnormal, such as missing fields, a mismatched event number, or an unrecognizable instruction type, the system does not directly use the returned result, but keeps the event in the pending-confirmation state and continues to reduce the confidence level according to the decay model. If multiple duplicate instructions arrive, the instruction with the latest time and a matching event number prevails; if a delayed confirmation instruction is received after the confidence level has decayed to the minimum threshold, the confirmed marker may be redisplayed or only the background status may be recorded, depending on the actual interface strategy; to avoid visual flicker, one feasible approach is not to restore the display once the marker has completely disappeared and the re-display threshold time has been exceeded, and only to update the background log. 
For example, when the vehicle enters a blind spot where the network disconnection lasts for more than a preset time, the ramp closure feedback marker reported by the driver is relatively clear when it first appears; due to the continued network unavailability, the icon fades significantly after 3 seconds, indicating that the event is still pending verification; after the vehicle exits the tunnel, if the cloud confirms that the feedback is valid, the icon returns to a clear state and is displayed stably; if the cloud rejects it, the icon disappears directly; the entire change process is continuous and has minimal interference for the driver, and will not cause the main navigation information to be forcibly obscured; The purpose of this mechanism is to transform the asynchronous process of local feedback followed by remote confirmation into a visual, continuous state change, thereby reducing the misunderstanding caused by state inconsistencies and achieving a natural interface degradation presentation through visual attribute mapping.
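The decay and mapping logic of this embodiment can be sketched as below. The table values are the example figures from the text; linear interpolation between table entries is an assumption (the text only requires that confidence falls as elapsed time grows), and the names `DECAY_TABLE`, `current_confidence`, and `to_transparency` are hypothetical. The separate decay rates for the not-uploaded and uploaded states are left out for brevity.

```python
from typing import Optional

# (elapsed seconds, confidence) pairs taken from the example decay table.
DECAY_TABLE = [(0.0, 1.0), (1.0, 0.8), (3.0, 0.5), (6.0, 0.2), (8.0, 0.0)]
MAX_CONFIDENCE = 1.0   # highest confidence threshold in the example
MIN_CONFIDENCE = 0.0   # lowest confidence threshold in the example


def current_confidence(elapsed_s: float, feedback: Optional[str] = None) -> float:
    """Confidence after elapsed_s seconds, overridden by cloud feedback."""
    if feedback == "confirm":
        return MAX_CONFIDENCE
    if feedback == "reject":
        return MIN_CONFIDENCE
    if elapsed_s <= DECAY_TABLE[0][0]:
        return DECAY_TABLE[0][1]
    # Assumed: interpolate linearly between neighbouring table entries.
    for (t0, c0), (t1, c1) in zip(DECAY_TABLE, DECAY_TABLE[1:]):
        if elapsed_s <= t1:
            return c0 + (c1 - c0) * (elapsed_s - t0) / (t1 - t0)
    return DECAY_TABLE[-1][1]


def to_transparency(confidence: float, max_display: int = 100) -> int:
    """Map confidence onto the display scale whose maximum is max_display."""
    return round(confidence * max_display)
```

Under this interpolation the confidence at 4 seconds works out to about 0.4, which agrees with the E103 example (transparency 40), and a confirmation at 5 seconds returns the value to 1.0 regardless of the elapsed time.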

[0024] Referring to Figure 2, the cloud-based vehicle navigation information feedback processing system includes: a data acquisition module for acquiring a touch coordinate data stream with a first timestamp from a touch input device and an audio data stream with a second timestamp from an audio acquisition device; a synchronization blocking module for blocking the synchronous transmission link that carries the touch coordinate data stream and the audio data stream to the cloud server when the network quality is detected to be lower than a preset network quality threshold; a feature extraction module for extracting the geometric features of the touch coordinate data stream and the semantic features of the audio data stream; a vector fusion module for calculating the time offset between the first timestamp and the second timestamp, and fusing the geometric features, the semantic features, and the time offset into a spatiotemporal intent vector; a queue management module for pushing the spatiotemporal intent vector into a local asynchronous queue; a rendering module for rendering the initial visual feedback marker in the asynchronous visual feedback layer of the graphical user interface when the synchronous transmission link is blocked; a confidence initialization module for assigning an initial confidence level to the initial visual feedback marker; a status monitoring module for monitoring the upload status of the spatiotemporal intent vector in the local asynchronous queue to the cloud server, and the status of receiving feedback instructions returned by the cloud server, the feedback instructions including confirmation instructions or rejection instructions; a confidence adjustment module for dynamically adjusting the initial confidence level, based on the upload status and the status of the feedback instruction and according to a preset decay model with time as the independent variable, so as to generate the current confidence level; and an attribute mapping module for mapping the current confidence level to the visual display attributes of the initial visual feedback marker, so as to update the display of the initial visual feedback marker in the graphical user interface.

[0025] This embodiment provides an in-vehicle navigation information feedback processing system for implementing the aforementioned method; specifically, in the same operating vehicle, the system can be deployed as a set of collaborative functional modules in an in-vehicle navigation application, or it can be deployed as an architecture in which the operating system middleware and the navigation application work together; compared with only a method description, this embodiment further illustrates the collaborative relationship and data flow direction between the modules; The data acquisition module is responsible for receiving raw input from the touch screen and microphone, and adding timestamps to each; the synchronization blocking module is connected to the network quality detection logic, and when a weak network is detected, it blocks the synchronization call that would normally submit the raw touch and audio streams directly to the cloud. After receiving the raw stream, the feature extraction module performs coordinate smoothing, curvature extraction, dwell time calculation, and keyword extraction locally; the vector fusion module integrates the temporal relationship between the two feature streams to generate a spatiotemporal intent vector. After receiving the vector, the queue management module writes it to the local asynchronous queue and maintains the status of each event; at the same moment that the synchronous link is blocked, the rendering module drives the asynchronous visual feedback layer to generate the initial visual feedback flag; the confidence initialization module sets an initial value for the flag, such as 1.0. The status monitoring module continuously observes the upload status of each vector in the queue and the processing results returned from the cloud, and hands the results over to the confidence adjustment module. 
The confidence adjustment module calculates the current confidence based on the elapsed time and the return status, and the attribute mapping module converts the value into transparency, color saturation or other visual representations, and writes it back to the graphical user interface. To illustrate the collaborative relationship between modules more specifically, a simplified data flow can be used: Touch coordinate data stream D1 and audio data stream D2 first enter the data acquisition module; after a weak network is established, the synchronization blocking module blocks the direct cloud upload path P1 of the original multimodal data; the feature extraction module outputs geometric features G and semantic features S; the vector fusion module outputs the spatiotemporal intent vector V; and the queue management module writes V into the local asynchronous queue Q. The rendering module generates a visual marker M; the confidence initialization module assigns an initial confidence level C0 to M; the status monitoring module subsequently obtains the upload status U and the feedback status F; the confidence adjustment module outputs the current confidence level Ct; the attribute mapping module converts Ct into a display attribute A and updates M; the above letter codes are only used to indicate the data flow between modules and are not required to be used in the implementation. 
In one alternative implementation, if a module is temporarily unavailable, for example, if the voice keyword extraction submodule times out due to excessive resource consumption, the system may allow the feature extraction module to output a downgraded result containing only geometric features; if the rendering module is temporarily unable to create a visual feedback layer, it will at least record local events and provide a minimized prompt through lightweight status text; if the status monitoring module detects that the cloud feedback and the local event number are inconsistent, it will not propagate the result to the attribute mapping module to avoid incorrect interface updates. For example, in the aforementioned operating vehicle, when the driver reports a ramp closure, the touch screen and microphone first collect data through the data acquisition module; after entering the tunnel, the synchronous blocking module prevents the raw data from being uploaded directly; the feature extraction module and the vector fusion module transform the multimodal input into a lightweight vector; The queue management module caches the vector; the rendering module immediately generates a semi-transparent feedback icon on the map; after the network is restored, the status monitoring module receives confirmation from the cloud, the confidence adjustment module raises the current confidence to the highest value, and the attribute mapping module updates the icon to a stable confirmation state. The purpose of this system is to enable multimodal input processing, weak network scheduling, and interface feedback updates to be completed in parallel on the vehicle terminal through modular collaboration with functional layers, thereby improving the stability and maintainability of the engineering implementation.
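To make the data flow from features G and S to vector V more concrete, the following sketch shows one possible shape for the fused structure, including the downgraded result that carries only geometric features. The field names, the `IntentVector` and `fuse` identifiers, and the use of an absolute (unsigned) time offset are all assumptions made for illustration; the text does not fix a concrete layout.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class IntentVector:
    curvature: float
    dwell_ms: int
    keywords: List[str]
    time_offset_ms: Optional[int]  # None when no semantic features exist


def fuse(geometric: Tuple[float, int],
         semantic: Optional[Sequence[str]],
         touch_ts_ms: int,
         voice_ts_ms: Optional[int]) -> IntentVector:
    """Fuse geometric features G and semantic features S into a vector V."""
    curvature, dwell_ms = geometric
    if semantic is None or voice_ts_ms is None:
        # Downgraded result: keyword extraction timed out or found nothing.
        return IntentVector(curvature, dwell_ms, [], None)
    # Assumed: the offset is stored as an absolute difference of timestamps.
    offset = abs(voice_ts_ms - touch_ts_ms)
    return IntentVector(curvature, dwell_ms, list(semantic), offset)
```

For the ramp closure example, `fuse((0.05, 90), ["road closure"], 1000, 1030)` produces a vector with a 30-millisecond offset; passing `None` for the semantic features yields the geometric-only fallback.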

[0026] In a preferred embodiment of the present invention, the system is deployed in a terminal device; the touch input device is a touch screen; the audio acquisition device is a microphone; the graphical user interface is an application interaction interface; and the initial visual feedback marker is an interaction feedback status icon.

[0027] This embodiment provides a specific implementation mechanism for the deployment form of the system. Specifically, if the aforementioned system is described only abstractly as a number of functional modules, its implementation boundaries may remain unclear, for example whether a module runs in the cloud or on the terminal, and what type of input and output devices it interacts with. To bring the overall solution closer to an actual vehicle deployment, this embodiment clarifies the deployment target and the interface carrier. The system is deployed in the vehicle terminal equipment, which can be the vehicle's main control unit, a central control computing platform integrating navigation applications, or another vehicle terminal with display and networking capabilities; the touch input device is preferably an in-vehicle touch screen, which receives the driver's selections or short swipes on the map interface; the audio acquisition device is preferably an in-vehicle microphone, which collects road abnormality information spoken by the driver. The graphical user interface is preferably the interactive interface of a navigation application, including a map display area, a search area, and a status prompt area; the initial visual feedback marker is specifically represented as an interactive feedback status icon, such as a semi-transparent location point, a road segment highlight mark, or a small status circle. In a simplified example, after the driver clicks on the map, the touch screen outputs the coordinates; the microphone captures the driver's spoken report that the road is closed; the processing system inside the terminal device then generates a semi-transparent status icon in the application interface. Since all of this processing happens locally on the terminal, the interface can still provide real-time feedback even if the cloud is unreachable.
In one optional implementation, if the terminal device has multiple microphone channels, in-vehicle near-field voice selection can be performed first, with only the driver's side channel being the priority input; if the touch screen has multi-touch input, the system can prioritize retaining the channel that is triggered first and has the longest dwell time as the main input for this feedback. If the map is not displayed temporarily due to page switching, the interaction feedback status icon can be displayed in the status bar area, and then displayed in the corresponding location when the user returns to the map interface. For example, on the central control screen of the aforementioned operating vehicle, the navigation application interface is displaying the elevated road entrance that is about to be entered; the driver selects the entrance location through the touch screen and says through the overhead microphone that the road is closed, and the system overlays a semi-transparent feedback icon on the navigation application interface; this icon is a status element within the application interaction interface and does not require the addition of new physical instruments or other vehicle mechanical structures. The purpose of this step is to limit the implementation boundaries of the system to the input, output, and interface interaction of the vehicle terminal, so that the solution can be directly implemented in the existing vehicle-machine interaction environment.
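The multi-touch rule mentioned above can be sketched as a simple selection function. The interpretation used here (earliest trigger first, longest dwell time as the tie-breaker) is one reading of the text, and `TouchChannel` and `select_main_channel` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class TouchChannel:
    pointer_id: int
    start_ms: int   # time at which the touch was first triggered
    dwell_ms: int   # how long the touch stayed on the screen


def select_main_channel(channels):
    """Pick the earliest-triggered channel; break ties by longest dwell time."""
    if not channels:
        return None
    return min(channels, key=lambda c: (c.start_ms, -c.dwell_ms))
```

A different weighting between trigger order and dwell time would only require changing the sort key.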

[0028] In a preferred embodiment of the present invention, the queue management module is further configured to: delete the touch coordinate data stream and the audio data stream before pushing the spatiotemporal intent vector into the local asynchronous queue, and retain only the spatiotemporal intent vector for network transmission.

[0029] This embodiment provides a mechanism for uplink load compression and local data trimming. Specifically, in the aforementioned scheme, although the original touch stream and audio stream have been converted into spatiotemporal intent vectors, if the system still retains the original streams for a long time and attempts to upload them, it reintroduces the problems of large data packet transmission, storage occupation, and privacy exposure. Therefore, this embodiment further limits the process by deleting the original data before the vector enters the asynchronous queue, retaining only the lightweight result. After receiving the spatiotemporal intent vector output by the vector fusion module, the queue management module first performs a data pruning operation; the pruning targets the original touch coordinate data stream and the original audio data stream used for this event; after pruning, only the vector that expresses the interaction intent is retained. For example, the original touch stream may contain 20 sampling points, and the original audio clip may be as long as 2 seconds. After fusion, only a simplified set of information is retained, such as target area R15, curvature 0.05, dwell time 90 milliseconds, the keyword "blocking", and time offset 30 milliseconds. For quantitative purposes, assume that the combined touch and audio data occupy approximately 32 kilobytes, while the encoded spatiotemporal intent vector occupies only 180 bytes. After the original streams are deleted, the local asynchronous queue stores only these 180 bytes along with the necessary event number and status fields, and subsequent network transmissions send only this content. This makes it easier to complete the upload quickly under weak-network conditions or during a brief network recovery.
Furthermore, the deletion preferably occurs after the spatiotemporal intent vector has passed its legality verification and before it is written to the local asynchronous queue. The legality verification includes at least a field integrity check, an event number binding check, and a vector encoding success check. After these verification conditions are met, the original touch coordinate data stream and the original audio data stream are released from the temporary processing cache and no longer enter the asynchronous upload data structure. The deletion here can mean releasing the original data from the temporary buffer, or keeping it only in a volatile cache during the short processing period without writing it to the long-term upload queue. If the system needs to retain a very small amount of debugging information for local diagnostics, it can retain only summary information that excludes the original speech content and the full trajectory, such as the length, quality score, and hash fingerprint, without retaining the original signal itself. In abnormal situations, if vector generation fails, the original stream should not be deleted immediately but should first enter a short-term error buffer and wait for a retry; if the retry still fails, it should be discarded according to the preset strategy, and the event should be recorded as a processing failure. If the system needs to retain local evidence under user authorization requirements, it can temporarily retain only a small encrypted digest in the local security area and prohibit it from entering the network transmission path. If the queue is about to be full, the original stream should be deleted first rather than the generated spatiotemporal intent vector, so that the core semantics of the feedback event are preserved. For example, in the aforementioned ramp closure feedback scenario, after the vehicle-mounted system extracts the location, dwell time, keywords, and time offset, it immediately clears the original voice and click trajectory cache and puts only the lightweight vector into the local asynchronous queue; in this way, even if only a brief signal window appears when the vehicle exits the tunnel, it is enough to transmit the feedback to the cloud without waiting for the larger original packet to be sent completely. The purpose of this mechanism is to significantly reduce network upload load and terminal storage pressure, and to reduce the processing overhead caused by retaining the original multimodal data, making the asynchronous feedback link more suitable for vehicle-mounted weak-network scenarios.
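The ordering described above (verify the vector, release the raw streams, then enqueue only the encoded vector) can be sketched as follows. The field list, the dictionary-based caches, and the names `is_valid` and `prune_and_enqueue` are assumptions for illustration; the retry and discard policy for the error buffer is not shown. With the example figures, and taking 1 kilobyte as 1024 bytes, the 180-byte payload is roughly half a percent of the 32-kilobyte raw data.

```python
from collections import deque

# Hypothetical field set for the encoded intent vector.
REQUIRED_FIELDS = ("event_id", "region", "curvature", "dwell_ms",
                   "keyword", "time_offset_ms")


def is_valid(event_id, vector, encoded):
    """Legality check: field integrity, event number binding, encoding success."""
    if vector is None:
        return False
    fields_ok = all(vector.get(f) is not None for f in REQUIRED_FIELDS)
    bound_ok = vector.get("event_id") == event_id
    return fields_ok and bound_ok and bool(encoded)


def prune_and_enqueue(event_id, vector, encoded, raw_cache, error_buffer, queue):
    """Release raw streams only after validation, then queue the vector alone."""
    if not is_valid(event_id, vector, encoded):
        # Generation or validation failed: keep the raw streams for a retry.
        if event_id in raw_cache:
            error_buffer[event_id] = raw_cache.pop(event_id)
        return False
    raw_cache.pop(event_id, None)  # raw touch and audio data leave the cache
    queue.append({"event_id": event_id, "status": "pending", "payload": encoded})
    return True
```

Because the raw data is removed before the append, nothing larger than the encoded vector and its bookkeeping fields ever reaches the upload queue.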

[0030] In a preferred embodiment of the present invention, the graphical user interface is rendered based on the main rendering pipeline; the asynchronous visual feedback layer is established independently of the main rendering pipeline; the rendering module is specifically used to: in the asynchronous visual feedback layer, complete the rendering of the initial visual feedback marker within a time shorter than a preset delay threshold, so as to avoid thread blocking of the main rendering pipeline.

[0031] This embodiment provides a mechanism for decoupling graphical user interface rendering. Specifically, the aforementioned solution has logically separated local feedback from cloud confirmation. However, if the initial visual feedback marker still shares the same blocking pipeline with the rendering of the main navigation screen, problems such as map frame drops and dragging stutters may still occur when the network state changes, data processing, or interface refresh load is high. Therefore, this embodiment further limits the asynchronous visual feedback layer to be established independently of the main rendering pipeline and requires the initial marker to complete rendering within a time of less than a preset delay threshold. The main rendering pipeline is responsible for the main elements in the graphical user interface, such as the map base, routes, and vehicle locations. The asynchronous visual feedback layer exists independently as an overlay layer, and its responsibility is only to display lightweight visual elements related to user feedback, such as semi-transparent icons, short-term cue rings, or local highlights. After receiving a signal that the initial visual feedback marker needs to be generated, the rendering module does not initiate a synchronous wait with the main rendering pipeline, but directly creates the corresponding primitives on the asynchronous visual feedback layer. For ease of explanation, it is assumed that the main rendering pipeline is currently updating the map at a rate of 16 milliseconds per frame, while the default delay threshold for the asynchronous visual feedback layer in this embodiment is set to 20 milliseconds. After the driver completes the click and voice trigger, the rendering module only needs to generate a simple icon and overlay it onto the specified map coordinates. Therefore, its actual drawing time is, for example, 8 milliseconds, which is less than the 20 millisecond threshold. 
Since this drawing occurs in an independent feedback layer, even if the main rendering pipeline is refreshing the route arrows or zooming the map at this time, it will not be blocked by waiting for the feedback icon to be created. From a micro data perspective, the output of the main rendering pipeline can be regarded as the main screen frames F1, F2, F3, ..., and the output of the asynchronous visual feedback layer can be regarded as the overlay frames L1, L2, L3, ...; when the main screen is outputting F20 at a certain moment, the feedback icon only needs to generate the corresponding overlay frame L20 and cover F20, without modifying the drawing order inside F20. In this way, the main screen and the feedback layer can be updated independently; if subsequent changes in confidence lead to adjustments in transparency, only the properties of the overlay frame are updated, without triggering a redraw of the entire map. Here, F1, F2, F3 up to F20 all represent the main screen frames output by the main rendering pipeline at different times, and L1, L2, L3 up to L20 all represent the asynchronous visual feedback layer frames overlaid on the main screen at the corresponding times; F and L are only illustrative labels used to distinguish main screen frames from overlay layer frames. In an alternative implementation, if the asynchronous visual feedback layer fails to be created, for example due to limited graphics resources or an abnormal overlay context, the system can fall back to displaying a lightweight prompt in the non-blocking status area of the main interface, but waiting for network results should still be avoided in this case.
If the rendering time of the asynchronous visual feedback layer occasionally exceeds the threshold, the complexity of the feedback icon should be reduced in subsequent frames, such as by removing shadows, reducing gradients, or simplifying animations, to ensure that the low-latency display requirements are continuously met. If the map is undergoing a large-scale switch, causing the screen position corresponding to the target coordinates to change briefly, the feedback icon can be anchored to the geographic coordinates first, and then updated to the correct display position after the coordinate conversion is completed in the next frame of the main screen, in order to avoid jitter. For example, before the aforementioned operating vehicle enters the tunnel, the main screen of the navigation application continues to refresh the route and vehicle icons; after the driver completes the feedback on the ramp closure, the rendering module does not wait for the main screen to finish refreshing, but instead draws a semi-transparent icon on an independent feedback layer within 8 milliseconds; even if the main navigation screen is busy rendering due to map scrolling at the same time, the feedback icon can appear immediately without causing map lag; its transparency gradually changes thereafter, and this only happens in the feedback layer without interfering with the main navigation screen; The purpose of this mechanism is to avoid blocking the main interface thread by decoupling the rendering layer, so that the appearance and update of the feedback marker have deterministic low-latency characteristics, thereby ensuring that the in-vehicle interaction remains smooth under weak network and complex interface refresh conditions.
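The delay-budget rule for the feedback layer (draw within the threshold, and simplify the icon in later frames if a draw runs over) can be sketched as follows. This is a single-threaded illustration of the bookkeeping only; the real separation from the main rendering pipeline (a separate thread or overlay surface) is not modeled. The class `FeedbackLayer`, the three detail levels, and the `draw_fn` callback are hypothetical.

```python
import time


class FeedbackLayer:
    """Overlay layer that keeps its own primitives, separate from the main frames."""

    def __init__(self, threshold_ms=20.0):
        self.threshold_ms = threshold_ms  # preset delay threshold
        self.detail_level = 2             # 2 = shadow + gradient, 1 = gradient, 0 = flat
        self.primitives = {}

    def draw_marker(self, event_id, geo_coord, draw_fn, clock=time.perf_counter):
        """Draw one marker, measure the time taken, and simplify later draws if slow."""
        start = clock()
        # Anchor the marker to geographic coordinates, as the text suggests.
        self.primitives[event_id] = draw_fn(geo_coord, self.detail_level)
        elapsed_ms = (clock() - start) * 1000.0
        if elapsed_ms > self.threshold_ms and self.detail_level > 0:
            self.detail_level -= 1  # reduce icon complexity for subsequent frames
        return elapsed_ms
```

The injected `clock` parameter exists only so that the timing logic can be exercised deterministically; a production renderer would rely on its own frame timing.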

[0032] The foregoing has provided a detailed description of one embodiment of the present invention, but this description is merely a preferred embodiment and should not be construed as limiting the scope of the invention. All equivalent variations and modifications made within the scope of the claims of this invention should still fall within the patent coverage of this invention.

Claims

1. A cloud-based method for processing vehicle navigation information feedback, characterized in that it comprises: acquiring a touch coordinate data stream of a touch input device with a first timestamp, and an audio data stream of an audio acquisition device with a second timestamp; when the network quality is detected to be lower than a preset network quality threshold, blocking, in response to acquiring the touch coordinate data stream and the audio data stream, the synchronous transmission link of the two to the cloud server; extracting the geometric features of the touch coordinate data stream and the semantic features of the audio data stream; calculating the time offset between the first timestamp and the second timestamp, and fusing the geometric features, the semantic features, and the time offset into a spatiotemporal intent vector; pushing the spatiotemporal intent vector into a local asynchronous queue; when the network connection is restored, uploading the spatiotemporal intent vector in the local asynchronous queue to the cloud server; when the synchronous transmission link is blocked, rendering an initial visual feedback marker in the asynchronous visual feedback layer of the graphical user interface of the vehicle terminal; assigning an initial confidence level to the initial visual feedback marker; monitoring the upload status of the spatiotemporal intent vector in the local asynchronous queue to the cloud server, and the status of receiving feedback instructions returned by the cloud server, the feedback instructions including confirmation instructions or rejection instructions; based on the upload status and the status of the feedback instruction, dynamically adjusting the initial confidence level according to a preset decay model with time as the independent variable to generate the current confidence level; and mapping the current confidence level to the visual display attributes of the initial visual feedback marker, so as to update the display of the initial visual feedback marker in the graphical user interface.

2. The vehicle navigation information feedback processing method based on a cloud platform according to claim 1, characterized in that the extraction of geometric features from the touch coordinate data stream and semantic features from the audio data stream includes: smoothing and denoising the touch coordinate data stream, and extracting the curvature features and dwell time features of the touch coordinate data stream as the geometric features; and extracting keywords from the audio data stream using a preset acoustic model to generate the semantic features.

3. The vehicle navigation information feedback processing method based on a cloud platform according to claim 1, characterized in that the monitoring of the upload status of the spatiotemporal intent vector in the local asynchronous queue to the cloud server, and of the status of receiving feedback instructions returned by the cloud server, includes: detecting the network connection status with the cloud server; when the network connection status is connected, uploading the spatiotemporal intent vector in the local asynchronous queue to the cloud server; when the network connection status is disconnected, retaining the spatiotemporal intent vector in the local asynchronous queue to await uploading; and after the spatiotemporal intent vector is successfully uploaded, listening for the feedback instruction returned by the cloud server.

4. The vehicle navigation information feedback processing method based on a cloud platform according to claim 1, characterized in that the process of dynamically adjusting the initial confidence level based on the upload status and the status of the feedback instruction, according to a preset decay model with time as the independent variable, to generate the current confidence level, includes: if the feedback instruction is not received, calculating the elapsed time from the moment the initial visual feedback marker is rendered; inputting the elapsed time into the preset decay model to calculate a current confidence level that decreases as the elapsed time increases; upon receiving the confirmation instruction, setting the current confidence level to the highest confidence threshold; and upon receiving the rejection instruction, setting the current confidence level to the lowest confidence threshold.

5. The vehicle navigation information feedback processing method based on a cloud platform according to claim 4, characterized in that the step of mapping the current confidence level to the visual display attributes of the initial visual feedback marker, so as to update the display of the initial visual feedback marker in the graphical user interface, includes: converting the current confidence level into the transparency value or color saturation value of the initial visual feedback marker; if the current confidence level decreases, reducing the transparency value or the color saturation value; when the current confidence level reaches the lowest confidence threshold, setting the transparency value or the color saturation value to zero to eliminate the initial visual feedback marker from the graphical user interface; and when the current confidence level reaches the highest confidence threshold, setting the transparency value or the color saturation value to the maximum display value to stably display the initial visual feedback marker in the graphical user interface.

6. A cloud-based vehicle navigation information feedback processing system, characterized in that it comprises: a data acquisition module, used to acquire the touch coordinate data stream of the touch input device with a first timestamp and the audio data stream of the audio acquisition device with a second timestamp; a synchronization blocking module, used to block the synchronous transmission link of the touch coordinate data stream and the audio data stream to the cloud server when the network quality is detected to be lower than a preset network quality threshold; a feature extraction module, used to extract the geometric features of the touch coordinate data stream and the semantic features of the audio data stream; a vector fusion module, used to calculate the time offset between the first timestamp and the second timestamp, and to fuse the geometric features, the semantic features, and the time offset into a spatiotemporal intent vector; a queue management module, used to push the spatiotemporal intent vector into a local asynchronous queue; a rendering module, used to render the initial visual feedback marker in the asynchronous visual feedback layer of the graphical user interface when the synchronous transmission link is blocked; a confidence initialization module, used to assign an initial confidence level to the initial visual feedback marker; a status monitoring module, used to monitor the upload status of the spatiotemporal intent vector in the local asynchronous queue to the cloud server, and the status of receiving feedback instructions returned by the cloud server, the feedback instructions including confirmation instructions or rejection instructions; a confidence adjustment module, used to dynamically adjust the initial confidence level based on the upload status and the status of the feedback instruction, according to a preset decay model with time as the independent variable, so as to generate the current confidence level; and an attribute mapping module, used to map the current confidence level to the visual display attributes of the initial visual feedback marker, so as to update the display of the initial visual feedback marker in the graphical user interface.

7. The cloud-based vehicle navigation information feedback processing system according to claim 6, characterized in that the system is deployed in a terminal device; the touch input device is a touch screen; the audio acquisition device is a microphone; the graphical user interface is an application interaction interface; and the initial visual feedback marker is an interaction feedback status icon.

8. The vehicle navigation information feedback processing system based on a cloud platform according to claim 6, characterized in that the queue management module is further used to: before pushing the spatiotemporal intent vector into the local asynchronous queue, delete the touch coordinate data stream and the audio data stream, and retain only the spatiotemporal intent vector for network transmission.

9. The vehicle navigation information feedback processing system based on a cloud platform according to claim 6, characterized in that the graphical user interface is rendered based on the main rendering pipeline; the asynchronous visual feedback layer is established independently of the main rendering pipeline; and the rendering module is specifically used to: in the asynchronous visual feedback layer, complete the rendering of the initial visual feedback marker within a time shorter than a preset delay threshold, so as to avoid thread blocking of the main rendering pipeline.