AI-powered voice query system for cultural and tourism information in two-wheeled vehicle rental scenarios

An in-vehicle AI voice query system addresses recall robustness in noisy environments and recommendation reachability in two-wheeled vehicle rental scenarios, enabling actionable information broadcasts under rental constraints and improving user experience.

CN122309701A · Pending · Publication Date: 2026-06-30 · SHENZHEN HOT WHEELS TECHNOLOGY CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
SHENZHEN HOT WHEELS TECHNOLOGY CO LTD
Filing Date
2026-03-19
Publication Date
2026-06-30

AI Technical Summary

Technical Problem

In two-wheeled vehicle rental scenarios, existing voice query systems for cultural and tourism information have poor recall robustness in noisy environments, and recommended results may be unreachable or may not allow the vehicle to be parked or returned. Furthermore, the broadcast information is not actionable enough, which degrades the user experience.

Method used

The AI voice query system, built using in-vehicle voice terminals and servers, improves recall robustness in noisy environments through modules such as voice recognition, query graph construction, fuzzy search, corridor clipping, accessibility assessment, and broadcast generation. It ensures that the recommended results are accessible under rental constraints and outputs actionable information within the broadcast budget.

Benefits of technology

It improves recall robustness for speech queries in noisy environments, reduces multi-turn clarification interactions, reduces recommendations that are unreachable or where the vehicle cannot be parked, ensures that the broadcast information is actionable and concise, and enhances the user experience.

✦ Generated by Eureka AI based on patent content.

Abstract

This invention discloses an AI voice query system for cultural and tourism information in a two-wheeled vehicle rental scenario, belonging to the field of intelligent transportation and voice interaction technology. The system collects user voice data via an in-vehicle voice terminal and obtains GNSS latitude, longitude, heading, speed, and timestamps, which are sent to a server. The server recognizes the voice, outputs the best transcription together with uncertainty results in N-best or confusion network / word lattice form, constructs a weighted query graph, and recalls a set of candidate POIs based on an approximate-matching index of POI names. A forward cycling corridor is constructed from heading and speed to spatially prune the candidates. Cycling accessibility costs are calculated on the road network and combined with rental fences and return points for filtering or correction, yielding composite costs that are ranked to determine the target POIs. Prefetching and caching based on corridors and high-frequency intents are used to adapt to weak network environments. Together these measures improve the stability of recall and ranking under noisy conditions and reduce irrelevant computational load.

Description

Technical Field

[0001] This invention belongs to the field of intelligent transportation and voice interaction technology, and specifically relates to an AI voice query system for cultural and tourism information in two-wheeled vehicle rental scenarios.

Background Technology

[0002] Current voice-based queries for cultural and tourism information typically follow this process: collect the user's voice, obtain the best transcription through speech recognition, search a point of interest (POI) database with that transcription, and broadcast the search results. In two-wheeled vehicle rental scenarios this approach has the following shortcomings:
(1) Riding takes place in an open outdoor environment, where wind noise, road noise, and traffic noise degrade the stability of speech recognition. Proper nouns such as scenic spot names and place names are easily misrecognized. If only the best transcription is used for retrieval, recall failures or false recalls are likely, which trigger multiple rounds of clarification, increase response time, and degrade the user experience.
(2) Existing retrieval and ranking methods rely mainly on straight-line distance, popularity, and similar signals, without accounting for road network accessibility, differences in road grade, or rental-service restrictions such as no-entry, parking, and return areas. This easily leads to infeasible recommendations, such as destinations that appear nearby but cannot be reached by cycling, or destinations where the vehicle cannot be parked or returned after arrival.
(3) Existing solutions often ignore the relationship between arrival time and business hours or activity duration, and may recommend targets that will already be closed or ended on arrival, leading to follow-up queries or even wasted trips.
(4) During a ride, the user's attention and the available broadcast time are limited, and outputting long text descriptions directly imposes an information burden. In addition, cloud retrieval latency fluctuates widely in weak network environments, which affects availability.

[0003] Therefore, there is an urgent need for a technical solution that improves recall robustness under noisy conditions, ensures that recommended results are reachable and available upon arrival under rental constraints, and outputs actionable short answers within the broadcast budget.

Summary of the Invention

[0004] The purpose of this invention is to provide an AI voice query system for cultural and tourism information in two-wheeled vehicle rental scenarios, addressing the prior-art problem of how to improve recall robustness under noisy conditions while ensuring that recommended results are reachable and available upon arrival under rental constraints, and how to output actionable short answers within the broadcast budget.

[0005] To achieve the above objective, the present invention adopts the following technical solution. The invention provides an AI voice query system for cultural and tourism information in two-wheeled vehicle rental scenarios, including an in-vehicle voice terminal, a server, and a road network and rental business data interface. The in-vehicle voice terminal is used to collect user voice and obtain the vehicle's GNSS heading and GNSS speed. The server includes:
a speech recognition module, used to perform speech recognition on the user's voice and output an uncertainty recognition result including the best transcription and an N-best candidate sequence or a confusion network / lattice;
a query construction module, used to construct a query graph for point of interest (POI) retrieval based on the uncertainty recognition result;
a fuzzy retrieval module, used to perform candidate recall on the query graph based on an approximate matching index of POI names to obtain a candidate POI set;
a corridor clipping module, used to construct a forward cycling corridor area based on the GNSS heading and GNSS speed, and to spatially clip the candidate POI set to obtain an in-corridor candidate set;
an accessibility assessment module, used to calculate the cycling accessibility cost of the in-corridor candidate POIs based on the road network, and to filter or correct the cycling accessibility cost based on the rental fence and return points obtained from the rental business data interface, so as to obtain a composite cost and rank the target POIs accordingly;
a time-series feasibility module, used to determine the estimated time of arrival (ETA) based on the composite cost, and to check the ETA against the business hours or activity duration of the target POI, so as to eliminate or downgrade infeasible POIs;
a broadcast generation module, used to output a structured result package containing the target POI, generate voice broadcast content under the broadcast budget constraint, and send it to the in-vehicle voice terminal for broadcasting.

[0006] Furthermore, the uncertainty identification result includes the N-best candidate sequence and the corresponding confidence level, and the number N of the N-best candidate sequence is 3 to 10.

[0007] Furthermore, the approximate matching index is a pinyin index or syllable index based on the POI name, and the fuzzy retrieval module uses syllable edit distance, pinyin similarity, or an approximate matching method based on BK-tree / Trie for candidate recall.

[0008] Furthermore, the query graph includes a weighted token set generated by the N-best candidate sequence or the lattice, the weight of the weighted token is determined by the confidence level or the lattice edge weight, and the fuzzy retrieval module performs a weighted scoring on the candidate recall results according to the weight.

[0009] Furthermore, the forward cycling corridor area is a fan-shaped or strip-shaped corridor, including a corridor angle θ and a corridor length D, and the corridor length D is a monotonic function of the GNSS speed.

[0010] Furthermore, the accessibility assessment module calculates the shortest cycling path cost for the target POI based on the cyclable road segments and road grades in the road network. The shortest cycling path cost includes cycling time cost, cycling distance cost, or a combination thereof.

[0011] Furthermore, the rental fence includes at least one of a no-entry fence, a parking fence, or a return fence, and the accessibility assessment module eliminates or imposes penalties on candidate POIs that are located within a no-entry fence or whose path crosses one.

[0012] Furthermore, the composite cost includes a first accessibility cost from the current location to the candidate POI and a second accessibility cost from the candidate POI to the nearest available parking spot. The accessibility assessment module sorts the candidate POIs based on a weighted sum of the first accessibility cost and the second accessibility cost.

[0013] Furthermore, the time-series feasibility module determines the ETA as a time interval [t_min, t_max] based on road grade or historical speed fluctuations, and removes the corresponding POI when t_min is later than the closing time or the end time of the event, or reduces the weight of the corresponding POI and generates a prompt information field when t_max is later than the closing time or the end time of the event.

[0014] Furthermore, the structured result package includes at least the target POI identifier, name, direction information or distance information, ETA, and business status field; the broadcast budget is the upper limit of voice broadcast duration or word count; the broadcast generation module trims the voice broadcast content according to field priority and generates a breakpoint resume identifier for the trimmed content.

[0015] Furthermore, the system also includes a prefetching and caching module, which prefetches the structured result packages of candidate POIs corresponding to high-frequency intents, based on the forward cycling corridor area, and caches them on the in-vehicle voice terminal. The high-frequency intents include at least one of toilet search, supply point search, return point search, or popular tourist attraction search.

[0016] Furthermore, the prefetch cache module triggers prefetch updates according to a vehicle movement distance threshold or a time threshold, and sets a time-to-live (TTL) for the cached data; when the server is unavailable or the network quality is below the threshold, the system completes at least a part of the processing for candidate recall and broadcast generation based on the cached data.
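
The prefetch trigger and the time-to-live behaviour described above can be illustrated with a minimal Python sketch. The class name `PrefetchCache`, the function `should_prefetch`, and all numeric defaults (300 s TTL, 200 m and 60 s thresholds) are assumptions chosen for illustration and do not come from the source.

```python
import time


class PrefetchCache:
    """Terminal-side cache of structured result packages, keyed by intent (hypothetical)."""

    def __init__(self, ttl_s=300.0, clock=time.monotonic):
        self.ttl_s = ttl_s      # assumed time-to-live for each cached entry
        self.clock = clock      # injectable clock so expiry can be tested
        self._store = {}        # intent -> (stored_at, packages)

    def put(self, intent, packages):
        self._store[intent] = (self.clock(), packages)

    def get(self, intent):
        entry = self._store.get(intent)
        if entry is None:
            return None
        stored_at, packages = entry
        if self.clock() - stored_at > self.ttl_s:
            del self._store[intent]   # entry is older than the TTL: drop it
            return None
        return packages


def should_prefetch(moved_m, elapsed_s, dist_threshold_m=200.0, time_threshold_s=60.0):
    """Trigger a prefetch update when either the distance or the time threshold is reached."""
    return moved_m >= dist_threshold_m or elapsed_s >= time_threshold_s
```

When the server is unavailable or network quality is below the threshold, a terminal could call `get("toilet")` and fall back to the cached packages if the call returns a value.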

[0017] This invention also provides an AI voice query method for cultural and tourism information in two-wheeled vehicle rental scenarios, including:
S1, collect user voice data and obtain the vehicle's GNSS heading and GNSS speed;
S2, perform speech recognition on the user's voice, and output an uncertainty recognition result including the best transcription and an N-best candidate sequence or a confusion network / lattice;
S3, construct a query graph for POI retrieval based on the uncertainty recognition result, and perform candidate recall on the query graph based on an approximate matching index of POI names to obtain a candidate POI set;
S4, construct a forward cycling corridor area based on the GNSS heading and GNSS speed, and perform in-corridor pruning on the candidate POI set to obtain an in-corridor candidate set;
S5, calculate the cycling accessibility cost of the in-corridor candidate POIs based on the road network, and filter or correct the cycling accessibility cost based on the rental fence and return points to obtain the composite cost and determine the target POI accordingly;
S6, determine the estimated time of arrival (ETA) based on the composite cost, and check the ETA against the business hours or activity duration of the target POI, so as to eliminate or downgrade infeasible POIs;
S7, generate a structured result package containing the target POI and output the voice broadcast content under the broadcast budget constraint.

[0018] Furthermore, the number of candidates N in the N-best candidate sequence is 3 to 10, and tokens with confidence levels below the threshold are either weighted less or excluded from the query graph.

[0019] Furthermore, the approximate matching index is a pinyin index or a syllable index, and the candidate recall satisfies that the syllable edit distance is less than or equal to a preset threshold or the pinyin similarity is greater than or equal to a preset threshold.

[0020] Furthermore, the forward cycling corridor area is a fan-shaped corridor, the corridor length D increases with the increase of the GNSS speed, and the corridor angle θ is 30° to 90°.

[0021] Furthermore, the cycling accessibility cost is obtained by performing a shortest path search on the road network, wherein the shortest path search uses Dijkstra's algorithm, the A* algorithm, or an equivalent algorithm.

[0022] Furthermore, the car rental fence includes a no-entry fence. If a candidate POI is located within the no-entry fence or the shortest path to the candidate POI crosses the no-entry fence, the corresponding candidate POI is eliminated or a penalty is imposed on it.

[0023] Furthermore, the composite cost includes a weighted sum of the first accessibility cost from the current location to the candidate POI and the second accessibility cost from the candidate POI to the nearest available parking spot, and the target POI is selected based on the weighted sum.
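
As a minimal sketch of the weighted-sum ranking, assuming both accessibility costs are expressed in seconds of riding time: the weights `alpha` and `beta` and the `Candidate` record are illustrative assumptions, since the source does not specify weight values or a data layout.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    poi_id: str
    cost_to_poi_s: float      # first accessibility cost: current location -> POI
    cost_to_parking_s: float  # second accessibility cost: POI -> nearest available parking spot


def composite_cost(c: Candidate, alpha: float = 1.0, beta: float = 0.5) -> float:
    """Weighted sum of the two accessibility costs (alpha and beta are assumed weights)."""
    return alpha * c.cost_to_poi_s + beta * c.cost_to_parking_s


def rank(candidates: List[Candidate], alpha: float = 1.0, beta: float = 0.5) -> List[Candidate]:
    """Return candidates sorted by ascending composite cost."""
    return sorted(candidates, key=lambda c: composite_cost(c, alpha, beta))
```

With these weights, a POI that is slightly farther away but close to a parking spot can outrank a nearer POI that requires a long ride to park.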

[0024] Furthermore, the ETA is a time interval [t_min, t_max]; when t_min is later than the business closing time or the event end time, the corresponding POI is removed; when t_max is later than the business closing time or the event end time, the corresponding POI is downgraded and a timeliness prompt field is added to the structured result package.
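
The interval rule above can be sketched as a three-way decision. The enum `Feasibility` and the function name `judge_feasibility` are hypothetical; the comparison logic follows the rule as stated, with "later than" read as a strict comparison.

```python
from datetime import datetime
from enum import Enum


class Feasibility(Enum):
    KEEP = "keep"
    DOWNGRADE = "downgrade"   # lower the ranking and attach a timeliness prompt field
    REMOVE = "remove"


def judge_feasibility(t_min: datetime, t_max: datetime, closing_time: datetime) -> Feasibility:
    """Compare the ETA interval [t_min, t_max] with the closing or event end time."""
    if t_min > closing_time:
        return Feasibility.REMOVE      # even the earliest arrival is too late
    if t_max > closing_time:
        return Feasibility.DOWNGRADE   # arrival may or may not be in time
    return Feasibility.KEEP
```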

[0025] Furthermore, the broadcast budget is the upper limit of the voice broadcast duration or the upper limit of the number of words. The voice broadcast content is trimmed according to field priority and a breakpoint resume broadcast identifier is generated. When the user triggers resume broadcast, the trimmed content is output again based on the breakpoint resume broadcast identifier.
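
One possible reading of budget trimming with breakpoint resume is sketched below, using a word-count budget. The priority order in `FIELD_PRIORITY`, the field names, and the shape of the resume token are assumptions for illustration; the source only requires that trimming follow field priority and that a resume identifier be generated.

```python
from typing import Dict, List, Tuple

# Assumed priority order, highest first; the source does not fix one.
FIELD_PRIORITY = ["name", "direction", "eta", "business_status", "description"]


def trim_broadcast(package: Dict[str, str], max_words: int) -> Tuple[str, Dict[str, object]]:
    """Speak fields in priority order until the word budget is used up."""
    spoken: List[str] = []
    deferred: List[str] = []
    used = 0
    for key in FIELD_PRIORITY:
        text = package.get(key, "")
        if not text:
            continue
        n_words = len(text.split())
        # Once one field is deferred, all lower-priority fields are deferred too,
        # so the resumed broadcast keeps the original order.
        if not deferred and used + n_words <= max_words:
            spoken.append(text)
            used += n_words
        else:
            deferred.append(key)
    resume_token = {"poi_id": package.get("poi_id"), "pending_fields": deferred}
    return ", ".join(spoken), resume_token


def resume_broadcast(package: Dict[str, str], resume_token: Dict[str, object]) -> str:
    """Output the content that was trimmed from the first broadcast."""
    return ", ".join(package[k] for k in resume_token["pending_fields"])
```

A duration budget could be handled the same way by replacing the word count with an estimated speaking time per field.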

[0026] Furthermore, the method also includes: prefetching and caching the structured result packages of candidate POIs corresponding to high-frequency intents based on the forward cycling corridor area; and prioritizing the use of the prefetched cache to generate voice broadcast content when the network quality is below a threshold.

[0027] The present invention also provides an electronic device, including a processor, a memory, and a communication interface, wherein the memory stores a computer program, and the computer program, when executed by the processor, implements the method described herein.

[0028] The present invention also provides a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the method described herein.

[0029] In summary, by adopting the above technical solution, the present invention has the following beneficial effects: (1) The invention uses uncertainty recognition results such as N-best or lattice to construct a query graph and perform approximate-matching recall, improving the recall rate and robustness of proper noun retrieval in noisy environments and reducing multi-round clarification; it builds a forward cycling corridor from GNSS heading / speed to prune candidates, which reduces the number of irrelevant candidates and the retrieval and ranking computation, and improves response speed.

[0030] (2) The invention introduces road network accessibility and rental fence / return point constraints into filtering and ranking, reducing infeasible recommendations such as POIs that are unreachable, lie in restricted areas, or do not allow the vehicle to be parked or returned; by checking the ETA against business hours / event duration, it reduces the proportion of targets that are unavailable upon arrival and can output prompt information.

[0031] (3) The present invention adopts structured result packages and broadcast budget trimming to output actionable key information within a limited broadcast duration, and can reduce repeated retrieval and repeated broadcasting through breakpoint resume; the optional prefetch cache and weak-network degradation mechanism can reduce weak-network latency fluctuations and improve availability.

Attached Figure Description

[0032] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0033] Figure 1 is a schematic diagram of the overall system structure provided in an embodiment of the present invention.

[0034] Figure 2 is a schematic diagram of the method flow provided in an embodiment of the present invention.

[0035] Figure 3 is a schematic diagram of the speech uncertainty recognition result and query graph construction provided in an embodiment of the present invention.

[0036] Figure 4 is a schematic diagram of the construction of the forward cycling corridor and candidate clipping provided in an embodiment of the present invention.

[0037] Figure 5 is a schematic diagram of the road network accessibility assessment and the composite cost calculation with rental fences / return points provided in an embodiment of the present invention.

[0038] Figure 6 is a schematic diagram of the feasibility determination between the ETA interval and business hours / event duration provided in an embodiment of the present invention.

[0039] Figure 7 is a schematic diagram of structured result package generation, budget trimming, and breakpoint resume broadcast provided in an embodiment of the present invention.

Detailed Implementation

[0040] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0041] To ensure consistency in spatial calculations, in the following implementation methods, geographic data such as GNSS, POI, car rental fences, and road networks are preferably represented using a unified coordinate system (such as WGS-84 or GCJ-02), and coordinate transformation is completed during the data access phase.

[0042] Example 1: As shown in Figure 1, in this embodiment the AI voice query system for cultural and tourism information in a two-wheeled vehicle rental scenario includes an in-vehicle voice terminal and a server. The in-vehicle voice terminal collects user voice and obtains location information, and sends both to the server. The location information includes at least GNSS latitude and longitude, GNSS heading, GNSS speed, and a timestamp. Latitude and longitude are represented using a unified geographic coordinate reference and map coordinate system. When the GNSS coordinate system of the in-vehicle terminal is inconsistent with the coordinate system of the server's road network / fence data, coordinate transformation and consistency processing are preferably performed on the server side or the in-vehicle side to keep corridor clipping, fence verification, and road network matching consistent.

[0043] The location information may further include location accuracy, location mode, or number of satellites, which are used to weight the reliability of subsequent corridor construction and accessibility assessment. When the location accuracy is lower than a preset threshold, the system may reduce the corridor pruning intensity or expand the corridor angle to improve the robustness of candidate recall.

[0044] The server executes the following workflow sequentially: speech recognition outputs uncertainty recognition results, a query graph is constructed, a candidate POI set is recalled based on an approximate matching index, a forward cycling corridor is constructed and candidates are pruned to obtain a candidate set within the corridor, the composite cost is calculated and sorted under the constraints of road network and bike rental fence / return point, feasibility is judged based on ETA and business hours / event duration, a structured result package is generated and the speech broadcast content is generated under the broadcast budget constraint, and finally the content is returned to the in-vehicle voice terminal for broadcast.

[0045] As shown in Figure 3, the speech recognition module recognizes the audio stream uploaded by the in-vehicle voice terminal. In addition to the best transcription best_text, it outputs an uncertainty recognition result that characterizes the multiple candidate hypotheses arising in a noisy environment.

[0046] The uncertainty identification result is preferably an N-best candidate sequence or a confusion network / lattice: where the N-best candidate sequence contains candidate texts and their confidence scores, and N is preferably 3 to 10; the lattice can be represented as a directed graph composed of nodes and edges, where each edge contains a token and its probability weight. Preferably, tokens with confidence scores below a threshold are downweighted or used only for candidate recall and not for final ranking, in order to balance recall rate and ranking stability. The threshold is a configurable parameter, and the downweighting method can be to multiply the token weight by a discount factor or directly mark the token as a soft constraint token used only for recall, so that the subsequent ranking stage ignores or weakens the impact of the token on ranking beyond the composite cost.

[0047] As shown in Figure 3, the query construction module constructs a query graph Q for retrieval based on the uncertainty recognition result. The query graph Q includes at least a weighted set of nodes, where each node contains a token or syllable and its weight w, and optionally includes a set of edges representing the adjacency order of the tokens; the query graph Q can be a directed acyclic graph or a general directed graph, and the specific form is not limited.

[0048] Preferably, when using the N-best candidate sequence, each candidate text is segmented or tokenized, and the token is converted into a pinyin / syllable sequence; a weight is assigned to each token, which can be determined by the candidate confidence, the frequency of token occurrence, and the weight of the candidate sequence; the same tokens from different candidates are merged and accumulated to obtain a weighted token set, thereby forming the query graph Q.

[0049] For example, the confidence of the i-th candidate text can be denoted as c_i, and the occurrence indicator function of token t can be denoted as I(t∈candidate_i). The token weight can be calculated as w(t)=Σ_i c_i·I(t∈candidate_i), and can be further multiplied by a position decay factor or an occurrence frequency factor to enhance the contribution to stable tokens. When a token contains polyphonic characters or homographs, it is preferable to retain multiple pinyin / syllable candidates and enter them into the graph separately, or retain the original character token and the pinyin token in parallel to enter them into the graph to improve the recall rate.
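
The weight formula w(t)=Σ_i c_i·I(t∈candidate_i) can be sketched in a few lines of Python. The function name `token_weights`, the input layout (a list of token-list and confidence pairs), and the sample pinyin syllables are illustrative assumptions; the optional position decay and frequency factors are omitted.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def token_weights(nbest: List[Tuple[List[str], float]]) -> Dict[str, float]:
    """Accumulate w(t) = sum_i c_i * I(t in candidate_i) over the N-best list."""
    weights: Dict[str, float] = defaultdict(float)
    for tokens, confidence in nbest:
        # set() implements the indicator function: a token counts once per candidate
        for tok in set(tokens):
            weights[tok] += confidence
    return dict(weights)


# Hypothetical 3-best output for one noisy utterance, already converted to syllables.
nbest = [
    (["lian", "hua", "shan"], 0.6),
    (["lian", "hua", "san"], 0.3),
    (["nian", "hua", "shan"], 0.1),
]
weights = token_weights(nbest)
```

In this example the syllable shared by all three candidates receives the highest weight, while syllables that appear in only one low-confidence candidate receive low weights, which is the behaviour the merged token set is meant to capture.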

[0050] When using lattice, tokens can be extracted along high-probability paths or selected from edges with higher weights to form a candidate token set. The edge weights are then normalized to obtain token weights, thus forming a query graph Q. Query graph Q can contain metadata such as construction patterns and timestamps, used for subsequent corridor pruning, reachability assessment, session alignment, and cache reuse. For example, the probability or weight of each edge in the lattice can be normalized, and the probabilities of edges corresponding to the same token can be summed to obtain w(t). Alternatively, the K paths with the highest cumulative probabilities can be selected, and the tokens on those paths can be weighted and accumulated according to their path probabilities to form w(t), where K is a configurable parameter.
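
A minimal sketch of the first lattice option (normalize edge weights, then sum per token) follows. The source does not say how the normalization is done; normalizing over the outgoing edges of each node is an assumption made here, as are the function name and the edge tuple layout.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[int, int, str, float]  # (from_node, to_node, token, raw_weight), hypothetical layout


def lattice_token_weights(edges: List[Edge]) -> Dict[str, float]:
    """Normalize edge weights per source node, then sum the results per token."""
    out_total: Dict[int, float] = defaultdict(float)
    for src, _dst, _tok, w in edges:
        out_total[src] += w

    weights: Dict[str, float] = defaultdict(float)
    for src, _dst, tok, w in edges:
        if out_total[src] > 0:
            weights[tok] += w / out_total[src]
    return dict(weights)


# Hypothetical three-slot lattice with competing hypotheses in the first and last slot.
edges = [
    (0, 1, "lian", 3.0), (0, 1, "nian", 1.0),
    (1, 2, "hua", 2.0),
    (2, 3, "shan", 3.0), (2, 3, "san", 1.0),
]
lattice_weights = lattice_token_weights(edges)
```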

[0051] As shown in Figure 3, the fuzzy search module performs candidate recall for the query graph Q based on the approximate matching index of POI names. Each POI record in the POI database includes at least: the POI identifier poi_id, name, alias, latitude and longitude, category, and open_hours / activity duration fields. In one implementation, the POI database can be updated incrementally or in full according to data version or update timestamp, and the approximate matching index can be maintained incrementally or rebuilt per version as the POI database is updated, so that the candidate recall results stay consistent with the POI data.

[0052] For approximate matching, a Trie index or a BK-tree index is preferred: Trie index nodes store syllable prefixes and are associated with a set of candidate poi_ids; BK-tree indexes organize name keys by edit distance to support approximate retrieval. Matching metrics can use syllable edit distance thresholds or similarity thresholds to cover mishearing, homophones, and near-homophones.
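
The Trie variant can be sketched as follows: each node corresponds to a syllable prefix and keeps the set of poi_ids whose names pass through it. The class name `SyllableTrie` and the sample names are hypothetical, and this sketch covers exact prefix lookup only; approximate lookup would add an edit-distance budget during traversal.

```python
from typing import Dict, List, Set


class SyllableTrie:
    """Trie over syllable sequences; every prefix node stores the matching poi_ids."""

    def __init__(self) -> None:
        self.children: Dict[str, "SyllableTrie"] = {}
        self.poi_ids: Set[str] = set()

    def insert(self, syllables: List[str], poi_id: str) -> None:
        node = self
        for syllable in syllables:
            node = node.children.setdefault(syllable, SyllableTrie())
            node.poi_ids.add(poi_id)   # register the POI on each prefix node

    def lookup_prefix(self, syllables: List[str]) -> Set[str]:
        """Return the poi_ids whose name starts with the given syllable prefix."""
        node = self
        for syllable in syllables:
            child = node.children.get(syllable)
            if child is None:
                return set()
            node = child
        return set(node.poi_ids)


trie = SyllableTrie()
trie.insert(["lian", "hua", "shan"], "poi_1")
trie.insert(["lian", "hua", "chi"], "poi_2")
```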

[0053] Syllable edit distance can be calculated using the classic Levenshtein dynamic programming algorithm: Let the query syllable sequence be A and the candidate name syllable sequence be B, with lengths |A| and |B| respectively. Define the DP matrix d(i,j) as the minimum number of edits required to convert the first i syllables of A into the first j syllables of B. Initialize d(0,0)=0, d(i,0)=i, d(0,j)=j; the recurrence is d(i,j)=min{ d(i−1,j)+1 (delete), d(i,j−1)+1 (insert), d(i−1,j−1)+cost (replace)}, where cost is 0 when A_i=B_j, and 1 otherwise. The edit distance is then dist_edit=d(|A|,|B|), and a normalized similarity sim=1−dist_edit / max(|A|,|B|) can be further constructed for similarity threshold determination. The calculation method is not limited to this.
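
The recurrence and the normalized similarity translate directly into Python. The function names are illustrative, and returning a similarity of 1.0 for two empty sequences is an assumption made here to avoid division by zero.

```python
from typing import List, Sequence


def syllable_edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two syllable sequences (unit edit costs)."""
    rows, cols = len(a) + 1, len(b) + 1
    d: List[List[int]] = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i                      # delete the first i syllables of a
    for j in range(cols):
        d[0][j] = j                      # insert the first j syllables of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # delete
                d[i][j - 1] + 1,         # insert
                d[i - 1][j - 1] + cost,  # replace (or match when cost is 0)
            )
    return d[len(a)][len(b)]


def syllable_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """sim = 1 - dist_edit / max(|A|, |B|); defined as 1.0 for two empty sequences."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - syllable_edit_distance(a, b) / longest
```

For instance, two three-syllable names that differ in one syllable have a distance of 1 and a similarity of about 0.67.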

[0054] For example, when using syllable edit distance, stricter distance constraints can be imposed on high-weight tokens in the query graph Q, while looser distance constraints can be imposed on low-weight tokens; when using similarity threshold, normalization can be performed according to the candidate name length to avoid bias caused by comparing short words with long words.

[0055] The fuzzy search module outputs a set of candidate POIs, where each candidate contains at least a poi_id, a matching score (match_score_asr) based on the query graph Q, and a hit token.

[0056] Approximate matching can be measured by the edit distance of pinyin / syllable sequences, n-gram similarity, or a combination thereof. Candidate recall can use distance threshold filtering or a Top-K most-similar recall strategy. To control computational load, it is preferable to first use the index for coarse recall to obtain Top-M candidates, and then perform fine matching and scoring on the Top-M candidates to output the candidate POI set. For example, M can be 200 in Top-M coarse recall, and K can be 5 in Top-K output. Both M and K are configurable parameters that can be adjusted according to the size of the POI database, the computational budget, and the expected response latency.

[0057] In one implementation, in-corridor clipping can be achieved using the following geometric rule: first, calculate the spherical distance dist and the azimuth bearing from the current position (the origin) to the candidate POI, and calculate the angle difference Δ between bearing and heading (normalized to ±180°); when dist≤D and |Δ|≤θ / 2, the candidate POI is determined to fall inside the corridor area; otherwise, it is treated as a candidate outside the corridor and is eliminated or downweighted.

[0058] The spherical distance dist and the azimuth bearing can be calculated from the classical great-circle distance and azimuth formulas of spherical trigonometry: Let the Earth's radius be R (e.g., R = 6371000 m), the current position be (lat1, lon1), and the candidate point be (lat2, lon2), so that Δlat = lat2 − lat1 and Δlon = lon2 − lon1. The distance can then be calculated using the haversine form: dist = 2R·arcsin( sqrt( sin²(Δlat / 2) + cos(lat1)·cos(lat2)·sin²(Δlon / 2) ) ). The azimuth can be calculated as bearing = atan2( sin(Δlon)·cos(lat2), cos(lat1)·sin(lat2)−sin(lat1)·cos(lat2)·cos(Δlon) ), with the result normalized to [0°, 360°). The angle difference can be implemented as Δ = wrap_to_180(bearing−heading), where wrap_to_180 normalizes the angle difference to (−180°, 180°] so that the corridor test can be applied directly.
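
The distance, bearing, and corridor test can be put together in a short Python sketch. The function names other than `wrap_to_180` and the sample coordinates are illustrative assumptions; inputs are in degrees and are converted to radians before the trigonometric calls.

```python
import math

EARTH_RADIUS_M = 6371000.0


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters (haversine form)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    h = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, h)))  # clamp guards rounding


def bearing_deg(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Initial bearing from point 1 to point 2, in degrees clockwise from north."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    y = math.sin(dlon) * math.cos(p2)
    x = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dlon)
    return math.degrees(math.atan2(y, x)) % 360.0


def wrap_to_180(angle_deg: float) -> float:
    """Normalize an angle difference to (-180, 180]."""
    wrapped = (angle_deg + 180.0) % 360.0 - 180.0   # lands in [-180, 180)
    return 180.0 if wrapped == -180.0 else wrapped


def in_corridor(origin, heading_deg, poi, theta_deg, length_m) -> bool:
    """Fan-shaped corridor test: dist <= D and |delta| <= theta / 2."""
    dist = haversine_m(origin[0], origin[1], poi[0], poi[1])
    delta = wrap_to_180(bearing_deg(origin[0], origin[1], poi[0], poi[1]) - heading_deg)
    return dist <= length_m and abs(delta) <= theta_deg / 2.0
```

With a rider heading due north, a 60° corridor angle, and a 500 m corridor length, a POI about 330 m straight ahead passes the test, while a POI at the same distance due east does not.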

[0059] As shown in Figure 4, the corridor clipping module constructs a forward cycling corridor area based on GNSS heading and GNSS speed to reflect the user's short-term reachable direction and reduce irrelevant candidates. The forward cycling corridor is preferably a fan-shaped corridor, defined by the origin (current latitude and longitude), heading, angle θ, and length D; θ is preferably 30° to 90°, and D is preferably a monotonic function of speed, such as D = D0 + k·v or a piecewise function.

[0060] In one implementation, the corridor length D can be constructed from the classical kinematic relationship distance = velocity × time, i.e., the short-term reachable distance is predicted using a look-ahead time window T_lookahead, giving D = D0 + v·T_lookahead; letting k = T_lookahead yields D = D0 + k·v, where D0 is the base length, v is the GNSS speed, and k is the look-ahead time parameter. The formula is dimensionally consistent: D0 has the dimension of length (e.g., m), v has the dimension of length / time (e.g., m / s), and k has the dimension of time (e.g., s). Therefore k·v has the same dimension as D0, and D has the dimension of length.
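
As a worked example of D = D0 + k·v, the sketch below uses assumed values D0 = 200 m and k = 60 s, and adds an upper cap as one possible piecewise form. None of these numbers appear in the source.

```python
def corridor_length_m(speed_mps: float,
                      base_m: float = 200.0,       # D0, assumed base length
                      lookahead_s: float = 60.0,   # k = T_lookahead, assumed
                      max_m: float = 1500.0) -> float:  # assumed cap (piecewise form)
    """D = D0 + k * v, clamped to max_m; non-decreasing in speed."""
    v = max(0.0, speed_mps)   # ignore negative or noisy speed readings
    return min(base_m + lookahead_s * v, max_m)
```

At a speed of 5 m/s this gives D = 200 + 60 × 5 = 500 m, and at standstill the corridor falls back to the base length of 200 m.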

[0061] Candidate pruning is achieved by determining whether candidate POIs fall within the corridor region, outputting a set of candidates within the corridor. Corridor updates can be triggered by a time period or a displacement threshold to balance computational complexity and orientation adaptability.
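As an illustration of the corridor construction and pruning described in paragraphs [0059] to [0061], the following Python sketch tests whether a candidate lies inside a fan-shaped corridor. The class SectorCorridor and the functions corridor_length_m and in_corridor are hypothetical names. The default values D0 = 500 m and T_lookahead = 240 s are taken from the worked example in Example 4 and are not prescribed values.

```python
from dataclasses import dataclass


@dataclass
class SectorCorridor:
    heading_deg: float  # GNSS heading at build time
    theta_deg: float    # full corridor angle (preferably 30 to 90 degrees)
    length_m: float     # corridor length D


def corridor_length_m(speed_mps, base_length_m=500.0, lookahead_s=240.0):
    """D = D0 + k*v, with k equal to the lookahead time window."""
    return base_length_m + lookahead_s * max(0.0, speed_mps)


def in_corridor(dist_m, bearing_deg, corridor):
    """A candidate is kept when dist <= D and |delta| <= theta / 2."""
    # Signed angle difference in [-180, 180); only its magnitude is used.
    delta = (bearing_deg - corridor.heading_deg + 180.0) % 360.0 - 180.0
    return (dist_m <= corridor.length_m
            and abs(delta) <= corridor.theta_deg / 2.0)
```

For a heading of 70°, a speed of 5.2 m / s and θ = 60°, the corridor length is 1748 m, and a candidate at 1600 m with a bearing of 82° (|Δ| = 12°) is retained.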

[0062] As shown in Figure 5, the accessibility assessment module calculates the cycling accessibility cost of candidate POIs within the corridor based on the road network. Using a spatial index, the current location and the candidate POIs can be matched to the nearest cyclable road edge or road node on the map (e.g., the nearest edge projection point or the nearest node), and the matched points serve as the start and end points for the shortest path search.

[0063] In one implementation, road network data is acquired by the road network data interface or synchronized to the local server. The road network can be updated according to data version number, update timestamp, or regional sharding. Accessibility assessment preferably includes road network version information for session alignment and result reuse. The road network is preferably represented as a graph structure, including nodes and edges. Edges include at least the start and end nodes, length, road class, whether cycling is permitted, one-way / two-way, and reference speed. A spatial index is established to match the current location with candidate POIs to the nearest cyclable road edge / node. Cycling accessibility cost is obtained through shortest path search, preferably using A... Alternatively, the Dijkstra algorithm can be used; the cost function can be riding time, riding distance, or a combination thereof, and the speed can be adjusted according to road grade. The module outputs the first reachability cost (e.g., ETA_to_poi) from the current location to the candidate POI and the corresponding path information.

[0064] For example, when cycling time is used as the cost, an edge cost edge_cost = length_m / speed(edge) can be set for each road edge, where speed(edge) can be determined by attributes such as road level, whether cycling is allowed, and one-way / two-way; when the road network is unreachable or the search fails, the cost of the corresponding candidate can be set to infinity and the candidate can be removed before sorting.

[0065] The edge cost is constructed based on the classical physical relation t=s / v, where length_m is the edge length (m) and speed(edge) is the edge velocity (m / s). Therefore, the dimension of edge_cost is time (s) and consistent with the dimension of ETA. When the road speed is given as base_speed_km / h, it can be converted to m / s by speed(edge)=base_speed_km / h×1000 / 3600. The speed can be reduced or constrained by attributes such as road grade, whether cycling is allowed, and one-way / two-way to ensure that the calculation result is reasonable.
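The edge cost of paragraphs [0064] and [0065] can be sketched in Python together with a shortest path search. The sketch uses the Dijkstra algorithm, which the specification names as one option. The function names and the tuple layout of an edge are assumptions made for this illustration, and map matching is assumed to have already produced the source and target node identifiers.

```python
import heapq
import math


def edge_cost_s(length_m, base_speed_kmh):
    """t = s / v, with the speed converted from km/h to m/s."""
    if base_speed_kmh <= 0:
        return math.inf
    return length_m / (base_speed_kmh * 1000.0 / 3600.0)


def shortest_eta_s(edges, source, target):
    """Dijkstra over cyclable edges; returns the ETA in seconds or math.inf.

    edges: iterable of (from, to, length_m, base_speed_kmh, bike_allowed, one_way).
    """
    graph = {}
    for u, v, length_m, speed_kmh, bike_allowed, one_way in edges:
        if not bike_allowed:
            continue  # non-cyclable edges are excluded from the search
        cost = edge_cost_s(length_m, speed_kmh)
        graph.setdefault(u, []).append((v, cost))
        if not one_way:
            graph.setdefault(v, []).append((u, cost))

    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        eta, node = heapq.heappop(heap)
        if node == target:
            return eta
        if eta > best.get(node, math.inf):
            continue  # stale heap entry
        for nxt, cost in graph.get(node, []):
            cand = eta + cost
            if cand < best.get(nxt, math.inf):
                best[nxt] = cand
                heapq.heappush(heap, (cand, nxt))
    return math.inf  # unreachable: the candidate can be removed before sorting
```

Returning infinity for an unreachable target matches the handling described in paragraph [0064], where such candidates are removed before sorting.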

[0066] As shown in Figure 5, the AI voice query system in this embodiment obtains car rental fence and return point data through the car rental business data interface. In one implementation, the car rental fence and return point data may carry a data version, a generation timestamp, or effective time period fields. The server can perform periodic or event-driven updates according to the version / timestamp, and select the data that is valid at the current timestamp when calculating the composite cost, to avoid misjudgments caused by fence changes or return point status changes. The car rental fence is preferably represented by a polygon, and its fields include at least the fence identifier fence_id, the fence type fence_type (no entry, parking allowed, return allowed, time-limited, etc.), and geometric information; the return point includes at least the return point identifier return_point_id and its latitude and longitude, and may optionally include fields such as capacity or open status.

[0067] The system performs fence constraint verification on the shortest path of candidate POIs: if a candidate POI is located within a restricted area fence or the shortest path crosses the restricted area fence, it is preferable to directly eliminate the candidate or impose a penalty cost to reduce its ranking. A candidate POI being located within a fence can be determined by determining if a point is inside a polygon; a path crossing a fence can be determined by detecting the intersection of the shortest path's polyline segment with the fence polygon boundary, or by detecting if the path's sampled points are inside a polygon. The detection method is not limited.

[0068] The determination of whether a point is inside a polygon can be achieved using classic computational geometry algorithms such as ray casting or the winding number method; the determination of path crossing can be achieved by representing the shortest path as a polyline composed of several line segments, and performing line segment intersection detection on each line segment and the boundary of the fence polygon. If there is an intersection or the path sampling point falls into the no-entry fence, it is determined as crossing.
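The two geometric checks of paragraphs [0067] and [0068] can be sketched as follows, using ray casting for the point-in-polygon test and an orientation test for segment intersection. The function names are hypothetical. As a simplifying assumption, coordinates are treated as planar (x, y) pairs, which is only a reasonable approximation over the small extent of a single fence; points exactly on a boundary and collinear overlaps are not treated specially.

```python
def point_in_polygon(point, polygon):
    """Ray casting: count crossings of a ray cast to the right of the point."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def _orient(a, b, c):
    """Signed area of triangle abc; the sign gives the turn direction."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def segments_cross(p1, p2, q1, q2):
    """True when the two segments properly cross each other."""
    d1, d2 = _orient(q1, q2, p1), _orient(q1, q2, p2)
    d3, d4 = _orient(p1, p2, q1), _orient(p1, p2, q2)
    return d1 * d2 < 0 and d3 * d4 < 0


def path_crosses_fence(path, fence):
    """A path crosses when a vertex is inside or a segment cuts the boundary."""
    if any(point_in_polygon(p, fence) for p in path):
        return True
    n = len(fence)
    for a, b in zip(path, path[1:]):
        for i in range(n):
            if segments_cross(a, b, fence[i], fence[(i + 1) % n]):
                return True
    return False
```

Combining the vertex test with the segment test catches a path that passes straight through a fence without any path vertex falling inside it.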

[0069] For business scenarios that require the vehicle to be returnable upon arrival, a second reachability cost (e.g., ETA_poi_to_return) is further calculated from the candidate POI to the nearest returnable point. The second reachability cost is preferably obtained by searching the shortest path based on the road network to avoid misjudgment caused by using only straight-line distance; the nearest returnable point can be determined by minimizing the composite cost or minimizing the ETA_poi_to_return.

[0070] Preferably, the composite cost can be calculated using the following formula: CompositeCost = α×ETA_to_poi + β×ETA_poi_to_return + γ×Penalty, where α, β, and γ are non-negative weights, with α and β preferably satisfying α + β = 1. Penalty is assigned according to the fence rules; for example, when the path crosses a no-entry fence, Penalty is set to infinity or to a constant much larger than a normal ETA, and when parking / return conditions are not met, a tiered penalty is applied. CompositeCost thus merges the reachability of the POI, the feasibility of reaching a return point, and the business constraint penalty into a single ranking value, and the system sorts the candidate POIs by composite cost to obtain the target POI or the Top-K candidates. ETA_to_poi and ETA_poi_to_return are time quantities (e.g., seconds or minutes). To keep the units consistent, Penalty can be defined as a time-equivalent penalty (e.g., an additional cost in seconds) or mapped to a constant with the same units as ETA, so that CompositeCost and ETA share the same units and can be compared and sorted directly.
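The composite cost and ranking of paragraph [0070] can be sketched as follows. The function names and the dictionary keys of a candidate are assumptions for this illustration. The default weights α = 0.8, β = 0.2 and γ = 1 are the values used in the worked example of Example 4, not required values.

```python
import math


def composite_cost(eta_to_poi_s, eta_poi_to_return_s, penalty_s=0.0,
                   alpha=0.8, beta=0.2, gamma=1.0):
    """CompositeCost = alpha*ETA_to_poi + beta*ETA_poi_to_return + gamma*Penalty."""
    return alpha * eta_to_poi_s + beta * eta_poi_to_return_s + gamma * penalty_s


def rank_candidates(candidates, top_k=3):
    """Sort candidates by composite cost; non-finite costs are dropped."""
    scored = []
    for cand in candidates:
        cost = composite_cost(cand["eta_to_poi"], cand["eta_poi_to_return"],
                              cand.get("penalty", 0.0))
        if math.isfinite(cost):  # an infinite penalty removes the candidate
            scored.append((cost, cand["poi_id"]))
    scored.sort()
    return scored[:top_k]
```

All three terms are in seconds here, so the resulting cost can be compared directly with an ETA, as the paragraph above requires.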

[0071] As shown in Figure 6, the temporal feasibility module estimates the estimated time of arrival (ETA) based on the composite cost and road grade / speed information, and makes a feasibility judgment against the business hours or activity duration of the target POI. To improve robustness, the ETA can be represented by a time interval [t_min, t_max], where t_min can be obtained by summing the travel times of the path edges computed at each edge's upper-bound speed, and t_max by summing the travel times computed at each edge's lower-bound speed, thus reflecting the uncertainty caused by fluctuations in traffic and cycling speed. The POI business hours open_hours are preferably represented by structured time periods and may include temporary closure / shutdown markers. For example, open_hours can be stored as one or more start and end time periods per day; for cross-day operation (e.g., 22:00 to 02:00 the next day), the period can be split into a current-day segment and a next-day segment, or the end time can be allowed to be earlier than the start time with a cross-day conversion performed during the judgment. The specific implementation is not limited.
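A minimal sketch of the ETA interval and of the cross-day business hours check in paragraph [0071] is given below. The function names and the per-edge tuple layout (length_m, v_min_mps, v_max_mps) are assumptions. The cross-day case follows the second option in the text, in which the end time may be earlier than the start time.

```python
from datetime import time


def eta_interval_s(path_edges):
    """Return (t_min, t_max) in seconds for a path.

    path_edges: list of (length_m, v_min_mps, v_max_mps) per edge.
    t_min uses each edge's upper-bound speed, t_max its lower-bound speed.
    """
    t_min = sum(length / v_max for length, _, v_max in path_edges)
    t_max = sum(length / v_min for length, v_min, _ in path_edges)
    return t_min, t_max


def is_open(at, start, end):
    """True if the time-of-day `at` falls in [start, end), allowing cross-day periods."""
    if start <= end:
        return start <= at < end
    # Cross-day period such as 22:00 to 02:00: open late evening or early morning.
    return at >= start or at < end
```

For a 22:00 to 02:00 period, is_open(time(1, 30), time(22, 0), time(2, 0)) evaluates to True, while 03:00 falls outside the period.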

[0072] The preferred feasibility rule is to remove the POI when t_min is later than the closing time or the end time of the activity; and to reduce the weight of the POI when t_max is later than the closing time or the end time of the activity, and to generate a prompt information field in the structured result package.
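The feasibility rule of paragraph [0072] can be expressed as a small decision function. The function name and the returned labels "remove", "downweight" and "keep" are hypothetical and chosen only for this sketch.

```python
from datetime import datetime, timedelta


def feasibility(now, t_min_s, t_max_s, closing):
    """Apply the rule: remove if even the earliest arrival is after closing,
    down-weight (and emit a prompt) if only the latest arrival is after closing."""
    earliest = now + timedelta(seconds=t_min_s)
    latest = now + timedelta(seconds=t_max_s)
    if earliest > closing:
        return "remove"
    if latest > closing:
        return "downweight"
    return "keep"
```

With a current time of 17:50, a closing time of 18:00, t_min = 650 s and t_max = 900 s, the earliest arrival is 18:00:50, so the function returns "remove".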

[0073] As shown in Figure 7, the broadcast generation module preferably generates a structured result package first, and then generates the voice broadcast content under the broadcast budget constraint. The broadcast budget can be defined as the maximum number of characters (max_chars) or the maximum broadcast duration (max_seconds); these are different expressions of the same budget constraint, and the choice between character count and duration as the budget metric can be made according to the implementation conditions. When the maximum broadcast duration is used, the text length can be converted into a duration based on the average TTS speech rate for budget determination, or the number of characters can be used directly as an approximate budget indicator. The structured result package includes at least the target POI identifier, name, distance or direction information, ETA or ETA range, and business status field, and may optionally include prompt information and highlight fields. The broadcast generation module generates the voice broadcast content according to field priority; when the budget is exceeded, lower-priority fields are pruned or the expression is compressed according to priority. For example, field priorities can be set as follows: name, distance / direction, and ETA (or ETA range) take precedence over the business status, which in turn takes precedence over the prompt and highlight fields. When the budget is exceeded, the highlight fields and secondary prompt fields are pruned first, and then the text is compressed to meet the budget constraint. For pruned content, a resume token can be generated. When the user triggers a resume command, the remaining content is played based on the resume token to reduce duplicate retrieval and playback. The resume token contains at least the poi_id, the set of fields already played or the segment number segment_id, and verification information (such as a hash or signature), so that the vehicle terminal can determine the remaining playback segments and avoid duplicate playback when resuming.
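The priority-based pruning and resume token of paragraph [0073] can be sketched as follows, using a character budget (max_chars). The names FIELD_PRIORITY and build_broadcast, the assumption that each field already holds its rendered sentence, and the use of a truncated SHA-256 digest as the verification information are all illustrative choices, not requirements of the specification. Text compression is omitted; once one field does not fit, all lower-priority fields are also deferred.

```python
import hashlib
import json

# Highest priority first, following the example ordering in the text.
FIELD_PRIORITY = ["name", "distance", "eta", "open_state", "notice", "highlights"]


def build_broadcast(package, max_chars):
    """Return (broadcast_text, resume_token); the token is None if nothing was pruned."""
    spoken, played, remaining = [], [], []
    used = 0
    for field_name in FIELD_PRIORITY:
        text = package.get(field_name)
        if not text:
            continue
        cost = len(text) + (1 if spoken else 0)  # +1 for the joining space
        if not remaining and used + cost <= max_chars:
            spoken.append(text)
            played.append(field_name)
            used += cost
        else:
            remaining.append(field_name)

    token = None
    if remaining:
        payload = {"poi_id": package["poi_id"], "played": played}
        digest = hashlib.sha256(
            json.dumps(payload, sort_keys=True).encode("utf-8")
        ).hexdigest()[:16]
        token = {**payload, "hash": digest}
    return " ".join(spoken), token
```

On a resume command, the terminal would compare the "played" list in the token against the field priority list and broadcast only the fields that are not yet marked as played.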

[0074] Example 2: As shown in Figure 1, in an optional implementation, the AI voice query system for cultural and tourism information in a two-wheeled car rental scenario also includes a prefetching and caching module. This module prefetches and caches structured result packages to the in-vehicle voice terminal or an edge node based on the forward cycling corridor and high-frequency intents. The cache key preferably consists of a grid number (H3 or GeoHash), a heading bucket, an intent type, and a data version. The cache value is a set of structured result packages or their necessary fields, and a time-to-live (TTL) is set. Prefetching can be triggered by a time threshold or a displacement threshold. When the network quality is below the threshold or the server is unavailable, the system prioritizes using the cache to complete at least a portion of the candidate recall and broadcast generation processing.
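The cache key and TTL behavior of paragraph [0074] can be sketched as follows. To keep the sketch self-contained, a plain latitude / longitude grid stands in for the H3 or GeoHash grid number; the cell size of 0.01°, the 45° heading bucket, and the names cache_key and TTLCache are assumptions made for illustration.

```python
import math
import time


def cache_key(lat, lon, heading_deg, intent, data_version,
              cell_deg=0.01, bucket_deg=45.0):
    """Key = (grid cell, heading bucket, intent type, data version)."""
    # Simple square grid used here as a stand-in for an H3 / GeoHash cell id.
    cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
    bucket = int((heading_deg % 360.0) // bucket_deg)
    return (cell, bucket, intent, data_version)


class TTLCache:
    """Minimal in-memory cache whose entries expire after ttl_s seconds."""

    def __init__(self, ttl_s, clock=time.monotonic):
        self._ttl_s = ttl_s
        self._clock = clock  # injectable clock, convenient for testing
        self._items = {}

    def put(self, key, value):
        self._items[key] = (self._clock() + self._ttl_s, value)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._items[key]  # expired entries are evicted lazily on read
            return None
        return value
```

Including the data version in the key means that a fence or road network update produces new keys, so stale result packages are never served under the new version.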

[0075] Network quality can be determined by one or a combination of request timeouts, round-trip time (RTT), packet loss rate, or throughput. When network quality does not meet the preset conditions, the vehicle or edge side can prioritize using the cache to perform at least a portion of the processing for candidate recall and broadcast generation, and update or correct the results with the server after the network recovers.

[0076] Example 3: As shown in Figure 2, in this embodiment, the AI voice query method for cultural and tourism information in a two-wheeled car rental scenario includes the following steps:
S1: Collect the user's voice and obtain the vehicle's GNSS heading and GNSS speed;
S2: Perform speech recognition on the user's voice and output an uncertainty recognition result, which includes the best transcription result and an N-best candidate sequence or a confusion network / lattice;
S3: Construct a query graph for POI retrieval based on the uncertainty recognition result, and perform candidate recall on the query graph based on the approximate matching index of POI names to obtain a candidate POI set;
S4: Construct a forward cycling corridor area based on the GNSS heading and GNSS speed, and spatially prune the candidate POI set to obtain a candidate set within the corridor;
S5: Calculate the cycling accessibility cost of the candidate POIs within the corridor based on the road network, and filter or correct the cycling accessibility cost based on the rental fences and return points to obtain the composite cost and determine the target POI accordingly;
S6: Determine the estimated time of arrival (ETA) based on the composite cost, and make a feasibility judgment between the ETA and the business hours or activity duration of the target POI, so as to eliminate or down-weight infeasible POIs;
S7: Generate a structured result package containing the target POI, and generate voice broadcast content for output under the broadcast budget constraint.

[0077] The above steps can be executed by the server, or a portion of the processing can be executed by the vehicle or edge side under weak network conditions, both of which fall within the protection scope of this invention.

[0078] Example 4: As shown in Figures 1 to 7, this embodiment provides a specific end-to-end data flow example for the AI voice query method for cultural and tourism information in a two-wheeled car rental scenario. During the ride, the user says: "Take me to the Hangzhou West Lake Museum." The in-vehicle voice terminal collects the audio and reports a GNSS snapshot (including latitude and longitude, GNSS heading = 70°, and GNSS speed = 5.2 m / s). The speech recognition module outputs the best_text "West Lake Museum" together with the N-best candidate sequence and confidence scores, such as West Lake Museum (0.62), West Lake Museum (0.58), West Lake Botanical Garden (0.41), etc. The query construction module builds a query graph based on the N-best candidates and their confidence, converting the candidates into pinyin / syllable tokens and assigning weights. The fuzzy search module uses an approximate matching index to recall a set of candidate POIs (such as poi_A West Lake Museum, poi_B West Lake Museum, etc.) under a preset threshold condition, and gives a matching score match_score_asr for each candidate. The corridor trimming module constructs a forward cycling corridor based on the GNSS heading and GNSS speed (e.g., θ = 60°, with the length D determined by the speed function) and eliminates candidates outside the corridor. For example, taking a lookahead time window T_lookahead = 240 s and a base length D0 = 500 m, the corridor length D = 500 + 5.2 × 240 = 1748 m is obtained from D = D0 + v·T_lookahead; with θ = 60°, the corridor half-angle is 30°. If the spherical distance between a candidate POI and the current position is dist = 1600 m, and the angle difference between its azimuth (bearing) and the heading satisfies |Δ| = 12°, then dist ≤ D and |Δ| ≤ θ / 2 both hold, and the candidate is determined to fall within the corridor.
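The recall step in the example above, in which N-best candidates are converted to syllable tokens and matched approximately against POI names, can be sketched as follows. This is a simplified illustration under several assumptions: a linear scan replaces the BK-tree / Trie index, the score formula confidence × (1 − distance / length) is only one possible way to derive match_score_asr, and the pinyin syllables and the identifiers poi_C and poi_X are hypothetical test data.

```python
def syllable_edit_distance(a, b):
    """Levenshtein distance computed over syllable tokens instead of characters."""
    prev = list(range(len(b) + 1))
    for i, syl_a in enumerate(a, 1):
        cur = [i]
        for j, syl_b in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                      # deletion
                           cur[j - 1] + 1,                   # insertion
                           prev[j - 1] + (syl_a != syl_b)))  # substitution
        prev = cur
    return prev[-1]


def recall(nbest, poi_index, max_dist=1):
    """Recall POIs within max_dist syllable edits of any N-best hypothesis.

    nbest: list of (syllable_tuple, confidence).
    poi_index: dict mapping poi_id to the syllable tuple of its name.
    Returns [(poi_id, score)] sorted by descending score.
    """
    scores = {}
    for syllables, confidence in nbest:
        for poi_id, poi_syllables in poi_index.items():
            dist = syllable_edit_distance(syllables, poi_syllables)
            if dist <= max_dist:
                longest = max(len(syllables), len(poi_syllables), 1)
                score = confidence * (1.0 - dist / longest)
                scores[poi_id] = max(scores.get(poi_id, 0.0), score)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Keeping the maximum score over all hypotheses lets a lower-ranked but correct transcription still recall its POI, which is the purpose of carrying the N-best list into retrieval.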

[0079] The accessibility assessment module performs a shortest path search on the road network to obtain ETA_to_poi, and checks the path against the fences to determine whether it crosses any no-entry fence. For candidates that meet the constraints, it further calculates ETA_poi_to_return and applies the composite cost formula CompositeCost = α·ETA_to_poi + β·ETA_poi_to_return + γ·Penalty to rank the candidates by composite cost and obtain the target POI. For example, the shortest path is decomposed into several road edges and the edge costs are accumulated: suppose the path contains three edges e1, e2, and e3, with lengths of 800 m, 900 m, and 700 m and speeds of 15 km / h, 12 km / h, and 10 km / h respectively. Converted to m / s, the speeds are 4.167, 3.333, and 2.778, resulting in ETA_to_poi = 800 / 4.167 + 900 / 3.333 + 700 / 2.778 ≈ 192 + 270 + 252 = 714 s (approximately 11.9 min). If the path length from this POI to the nearest return point is 600 m and the speed is 12 km / h (3.333 m / s), then ETA_poi_to_return = 600 / 3.333 ≈ 180 s (approximately 3 min). With α = 0.8, β = 0.2, γ = 1, and Penalty = 0, CompositeCost = 0.8 × 714 + 0.2 × 180 + 1 × 0 = 571.2 + 36 = 607.2 (in the same unit as the ETA, here seconds). The candidates are then sorted on this basis.

[0080] The temporal feasibility module represents the ETA as an interval [t_min, t_max] and compares it with the target POI's business hours. If the arrival is close to closing time, the weight is reduced and a prompt information field is generated. For example, for the above ETA_to_poi = 714 s, suppose the upper and lower speed bounds given by road grade yield t_min = 650 s and t_max = 900 s, the target POI closes at 18:00 that day, and the current time is 17:50. The estimated arrival time interval is then [17:50 + 650 s, 17:50 + 900 s] = [18:00:50, 18:05:00]. Since the arrival time corresponding to t_min is later than the closing time, the POI is removed directly under the rule 'remove if t_min is later than the closing time'. If only t_max were later than the closing time, the POI would instead be reduced in weight and a prompt field indicating that it is close to closing time would be generated.

[0081] The broadcast generation module generates a structured result package and generates the voice broadcast content within the broadcast budget (e.g., a maximum broadcast duration of 12 seconds). For example, the broadcast may be: "West Lake Museum is about 1.6 kilometers ahead, with an estimated arrival in 12 minutes. It is close to closing time, so it is recommended to go there as soon as possible." If the budget is exceeded, low-priority fields are pruned and a resume token (breakpoint resume identifier) is generated for resuming the broadcast.

[0082] To facilitate implementation and reproduction, the main data structure fields in this invention are defined as follows (fields can be added or removed according to actual projects without affecting the implementation of the core solution of this invention): (i) Speech Recognition Result (asr) best_text: optimal transcribed text; nbest: N-best candidate sequence, whose elements include {text, confidence, start_time, end_time}; lattice: confusion network / word lattice, containing nodes and edges, where each edge contains {from, to, token, weight / probability}; language, timestamp: optional metadata.

[0083] (ii) QueryGraph (Q) nodes: A collection of nodes, each containing {token_or_syllable, weight}; edges: A collection of optional edges, each containing {from, to, weight}; meta: {build_mode(nbest / lattice), timestamp, language}.

[0084] (iii) Forward cycling corridor shape_type: sector (fan-shaped) or band (band-shaped); origin: {lat, lon}; heading: heading; theta: corridor angle; length_D: corridor length; timestamp: build time.

[0085] (iv) POI Record (POIRecord) poi_id: POI identifier; name: standard name; alias: list of aliases; lat, lon: geographic coordinates; category: category; open_hours: business hours (structured time period); activity_window: optional activity duration; temporary_closure: temporary closure / park closure flag; tags, popularity_score: optional fields.

[0086] (v) Car Rental Fence fence_id: Fence identifier; fence_type: Fence type (no entry / parking allowed / vehicle return allowed / time-limited, etc.); geometry: A sequence of coordinate points for a Polygon or MultiPolygon; effective_time: Optional effective time; priority: Optional priority; spatial_index: Spatial index at the implementation level (e.g., R-tree / bbox index).

[0087] (vi) Return Point return_point_id: Return point identifier; lat, lon: Coordinates; capacity, open_state, open_hours: Optional fields; binding_fence_id: Optional associated fence identifier.

[0088] (vii) Road Graph nodes: A collection of nodes, each containing {node_id, lat, lon}; edges: A collection of edges, each containing {edge_id, from, to, length_m, road_class, bike_allowed, direction, base_speed_km / h}; spatial_index: The index used for matching points to the road network; map_match: The map matching strategy (implementation methods such as nearest edge / nearest node / projection point, etc.).

[0089] (viii) Reachability assessment results poi_id: Candidate POI identifier; eta_to_poi or cost_to_poi: Cost from the current location to the POI; eta_poi_to_return or cost_to_return: Cost from the POI to the nearest return point; penalty: Penalty for fence / parking / return constraints; composite_cost: Composite cost; eta_interval: Optional ETA interval [t_min, t_max].

[0090] (ix) Structured ResultPackage poi_id, name; distance_m or direction / bearing; eta or eta_interval; open_state (open / closed / nearly closed / unknown); notice: optional notification information (e.g., nearing closing); highlights: optional highlight field; resume_token: resume play identifier (e.g., {poi_id, segment_id, hash}).

[0091] (x) Broadcast budget max_seconds or max_chars; field_priority: field priority configuration; language / voice_style: optional broadcast configuration.
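Some of the records listed in items (i) to (x) can be written down as typed structures. The sketch below uses Python dataclasses for three of them; the class names, the Python types, and the default values are assumptions for illustration, while the field names follow the lists above.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ReachabilityResult:
    """Item (viii): reachability assessment result for one candidate POI."""
    poi_id: str
    eta_to_poi: float                  # seconds, current location to POI
    eta_poi_to_return: float           # seconds, POI to nearest return point
    penalty: float = 0.0               # time-equivalent fence / return penalty
    composite_cost: Optional[float] = None
    eta_interval: Optional[Tuple[float, float]] = None  # (t_min, t_max)


@dataclass
class ResultPackage:
    """Item (ix): structured result package handed to broadcast generation."""
    poi_id: str
    name: str
    distance_m: Optional[float] = None
    eta: Optional[float] = None
    eta_interval: Optional[Tuple[float, float]] = None
    open_state: str = "unknown"        # open / closed / nearly closed / unknown
    notice: Optional[str] = None
    highlights: List[str] = field(default_factory=list)
    resume_token: Optional[dict] = None  # e.g. {poi_id, segment_id, hash}


@dataclass
class BroadcastBudget:
    """Item (x): either limit may be set; field_priority is highest first."""
    max_seconds: Optional[float] = None
    max_chars: Optional[int] = None
    field_priority: List[str] = field(default_factory=list)
```

Optional fields default to None or an empty list, reflecting the statement that fields can be added or removed according to the actual project.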

[0092] The above description is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any equivalent substitutions or modifications made by those skilled in the art within the scope of the technology disclosed in the present invention, based on the technical solution and inventive concept of the present invention, should be covered within the scope of protection of the present invention.

[0093] The preferred embodiments of the present invention disclosed above are merely illustrative of the invention. These preferred embodiments do not exhaustively describe all details, nor do they limit the invention to specific implementations. Clearly, many modifications and variations can be made based on the content of this specification. This specification selects and specifically describes these embodiments to better explain the principles and practical applications of the invention, thereby enabling those skilled in the art to better understand and utilize the invention. The invention is limited only by the claims and their full scope and equivalents.

Claims

1. An AI voice query system for cultural and tourism information in a two-wheeled car rental scenario, characterized in that it includes an in-vehicle voice terminal, a server, a road network data interface, and a car rental business data interface; the in-vehicle voice terminal is used to collect the user's voice and obtain the vehicle's GNSS heading and GNSS speed; the server includes: a speech recognition module, used to perform speech recognition on the user's voice and output an uncertainty recognition result including the best transcription result and an N-best candidate sequence or a confusion network / lattice; a query construction module, used to construct a query graph for point of interest (POI) retrieval based on the uncertainty recognition result; a fuzzy retrieval module, used to perform candidate recall on the query graph based on the approximate matching index of POI names to obtain a set of candidate POIs; a corridor clipping module, used to construct a forward cycling corridor area based on the GNSS heading and GNSS speed, and to spatially clip the candidate POI set to obtain a candidate set within the corridor; an accessibility assessment module, used to calculate the cycling accessibility cost of the candidate POIs within the corridor based on the road network, and to filter or correct the cycling accessibility cost based on the rental fences and return points obtained from the car rental business data interface, so as to obtain the composite cost and determine the target POI accordingly; a temporal feasibility module, used to determine the estimated time of arrival (ETA) based on the composite cost, and to make a feasibility judgment between the ETA and the business hours or activity duration of the target POI, so as to eliminate or down-weight infeasible POIs; and a broadcast generation module, used to output a structured result package containing the target POI, and to generate voice broadcast content under the broadcast budget constraint and send it to the in-vehicle voice terminal for broadcasting.

2. The AI voice query system for cultural and tourism information in a two-wheeled car rental scenario according to claim 1, characterized in that the uncertainty recognition result includes the N-best candidate sequence and the corresponding confidence levels, and the number N of candidates in the N-best candidate sequence is 3 to 10.

3. The AI voice query system for cultural and tourism information in a two-wheeled car rental scenario according to claim 1, characterized in that the approximate matching index is a pinyin index or syllable index based on the POI name, and the fuzzy retrieval module uses syllable edit distance, pinyin similarity, or an approximate matching method based on a BK-tree / Trie for candidate recall.

4. The AI voice query system for cultural and tourism information in a two-wheeled car rental scenario according to claim 1, characterized in that the query graph includes a set of weighted tokens generated from the N-best candidate sequence or the lattice; the weight of each weighted token is determined by the confidence level or the lattice edge weight; and the fuzzy retrieval module performs weighted scoring on the candidate recall results according to the weights.

5. The AI voice query system for cultural and tourism information in a two-wheeled car rental scenario according to claim 1, characterized in that the forward cycling corridor area is a fan-shaped or strip-shaped corridor having a corridor angle θ and a corridor length D, and the corridor length D is a monotonic function of the GNSS speed.

6. The AI voice query system for cultural and tourism information in a two-wheeled car rental scenario according to claim 1, characterized in that the accessibility assessment module calculates the shortest cycling path cost for the target POI based on the cyclable road segments and road grades in the road network, and the shortest cycling path cost includes cycling time cost, cycling distance cost, or a combination thereof.

7. The AI voice query system for cultural and tourism information in a two-wheeled car rental scenario according to claim 1, characterized in that the car rental fence includes at least one of a no-entry fence, a parking fence, or a vehicle-return fence, and the accessibility assessment module eliminates or imposes penalties on candidate POIs that are located within the no-entry fence or whose paths cross it.

8. The AI voice query system for cultural and tourism information in a two-wheeled car rental scenario according to claim 1, characterized in that the composite cost includes a first accessibility cost from the current location to the candidate POI and a second accessibility cost from the candidate POI to the nearest returnable point, and the accessibility assessment module sorts the candidate POIs based on a weighted sum of the first and second accessibility costs.

9. The AI voice query system for cultural and tourism information in a two-wheeled car rental scenario according to claim 1, characterized in that the structured result package includes at least the target POI identifier, name, direction information or distance information, ETA, and business status field; the broadcast budget is the upper limit of the voice broadcast duration or character count; and the broadcast generation module trims the voice broadcast content according to field priority and generates a breakpoint resume identifier for the trimmed content.

10. An AI voice query method for cultural and tourism information in a two-wheeled car rental scenario, characterized in that it includes: S1, collecting the user's voice and obtaining the vehicle's GNSS heading and GNSS speed; S2, performing speech recognition on the user's voice and outputting an uncertainty recognition result including the best transcription result and an N-best candidate sequence or a confusion network / lattice; S3, constructing a query graph for POI retrieval based on the uncertainty recognition result, and performing candidate recall on the query graph based on the approximate matching index of POI names to obtain a candidate POI set; S4, constructing a forward cycling corridor area based on the GNSS heading and GNSS speed, and spatially pruning the candidate POI set to obtain a candidate set within the corridor; S5, calculating the cycling accessibility cost of the candidate POIs within the corridor based on the road network, and filtering or correcting the cycling accessibility cost based on the rental fences and return points to obtain the composite cost and determine the target POI accordingly; S6, determining the estimated time of arrival (ETA) based on the composite cost, and making a feasibility judgment between the ETA and the business hours or activity duration of the target POI, so as to eliminate or down-weight infeasible POIs; S7, generating a structured result package containing the target POI and outputting the voice broadcast content under the broadcast budget constraint.