Global intelligent navigation and cultural explanation system and method for multi-modal AI interaction

By integrating multi-source positioning and interaction modules into a travel converter, the invention addresses insufficient positioning accuracy and language barriers in cross-border navigation. It provides continuous, stable navigation and personalized cultural explanations across cross-border scenarios, improving navigation reliability and interactive intelligence.

CN122281920A · Pending · Publication Date: 2026-06-26 · ZHUHAI TESSAN POWER TECHNOLOGY CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
ZHUHAI TESSAN POWER TECHNOLOGY CO LTD
Filing Date
2026-04-30
Publication Date
2026-06-26

AI Technical Summary

Technical Problem

Existing global navigation technologies suffer from a sharp drop in positioning accuracy and interruption of positioning trajectory in weak signal environments such as cross-border tunnels, densely built-up areas, and remote overseas areas. Furthermore, their reliance on cross-border roaming networks leads to high usage costs, and their ability to fuse and adapt multi-source positioning data is insufficient, making it impossible to guarantee continuous and stable navigation output across all cross-border scenarios.

Method used

By integrating a multi-source positioning module through a travel converter, the system dynamically weights and processes satellite, base station, and local wireless LAN scanning signals. Combined with Kalman filtering and smoothing algorithms, it generates continuous and stable navigation guidance. By combining image acquisition and voice interaction modules, it identifies scenic spot information and generates personalized cultural explanations. A wake word trigger mechanism is used to enable multilingual natural interaction.

Benefits of technology

It achieves continuous and stable navigation output across all cross-border scenarios, improves navigation reliability and scenario adaptability, provides personalized cultural interpretation services, breaks down language barriers in cross-border travel, and enhances the continuity of interaction and the level of intelligence.

✦ Generated by Eureka AI based on patent content.

Abstract

This application relates to the field of intelligent navigation technology, providing a multimodal AI interactive global intelligent navigation and cultural interpretation system and method. The method is applied to a travel converter and includes: acquiring multi-source positioning data, including fused satellite positioning data, base station positioning information, and local wireless LAN scanning signals; dynamically weighting the multi-source positioning data to generate and output navigation guidance; acquiring image data and voice requests corresponding to the target area to identify attraction-related information in the target area, generating multilingual cultural interpretation content based on the attraction-related information, and playing the multilingual cultural interpretation content; after playing the multilingual cultural interpretation content, if a preset wake-up word is received, a preset continuous dialogue mechanism is triggered to recognize and semantically analyze the received real-time voice, and generate corresponding navigation information or interpretation information. This solves the core pain points of existing technologies, such as scattered equipment and fragmented functions.

Description

Technical Field

[0001] This application relates to the field of intelligent navigation technology, and in particular to a global intelligent navigation and cultural interpretation system and method with multimodal AI interaction.

Background Technology

[0002] With the rapid development of the cross-border travel industry, users are increasingly demanding continuous and stable global navigation services, cultural explanations of destination attractions, and natural cross-language interaction during their cross-border travels. Currently, existing technological solutions for these needs all suffer from shortcomings such as fragmented functions, poor scenario adaptability, and low hardware resource reuse rates.

[0003] Existing global navigation technologies mostly rely on independent terminals such as mobile phones and dedicated navigators. In weak signal environments such as cross-border tunnels, areas densely built up with high-rises, and remote overseas areas, these terminals generally suffer from a sharp drop in positioning accuracy and interruptions in the positioning trajectory. They also rely on cross-border roaming networks to provide services, which is costly. Furthermore, their ability to fuse and adapt multi-source positioning data is insufficient, so continuous and stable navigation output cannot be guaranteed across all cross-border scenarios.

Summary of the Invention

[0004] This application provides a multimodal AI interactive global intelligent navigation and cultural interpretation method. It aims to solve the following problems of existing global navigation technologies, which mostly rely on independent terminals such as mobile phones and dedicated navigation devices: these terminals generally suffer from a sharp drop in positioning accuracy and interruptions in the positioning trajectory in weak signal environments such as cross-border tunnels, densely built-up areas, and remote overseas areas; they rely on cross-border roaming networks to provide services, resulting in high usage costs; and their ability to fuse and adapt multi-source positioning data is insufficient, so continuous and stable navigation output cannot be guaranteed across all cross-border scenarios.

[0005] In a first aspect, embodiments of this application provide a multimodal AI-interactive global intelligent navigation and cultural interpretation method, applied to a travel converter. The method includes: acquiring multi-source positioning data, including fused satellite positioning data, base station positioning information, and local wireless LAN scanning signals; performing dynamic weighting processing on the multi-source positioning data to generate and output navigation guidance; acquiring image data and voice requests corresponding to the target area to identify attraction-related information in the target area, generating multilingual cultural explanation content based on that attraction-related information, and playing the multilingual cultural explanation content; and, after the multilingual cultural explanation content has been played, if a preset wake-up word is received, triggering a preset continuous dialogue mechanism to recognize and semantically analyze the received real-time voice and generate corresponding navigation or explanation information.

[0006] In some embodiments, acquiring multi-source positioning data includes: collecting satellite positioning data, base station positioning information, and local wireless LAN scanning signals corresponding to the current location of the travel converter; verifying the validity of the satellite positioning data, base station positioning information, and local wireless LAN scanning signals respectively; removing invalid data that exceeds a preset error range; and retaining multi-source positioning data that meets the accuracy requirements.

[0007] In some embodiments, the dynamic weighting processing of multi-source positioning data to generate and output navigation guidance includes: assigning dynamically changing weight values to the fused satellite positioning data, base station positioning information, and local wireless LAN scanning signals based on the signal strength corresponding to the multi-source positioning data and the current environmental scene; completing the fusion processing of multi-source positioning data based on the assigned weight values; smoothing and predictive compensation of the positioning trajectory obtained after fusion processing; outputting continuous and stable multi-source positioning data in weak signal environments including tunnels and densely built-up areas; generating navigation guidance based on the multi-source positioning data and user-preset destination information; and outputting the navigation guidance through the terminal speaker or the user-bound mobile terminal.

[0008] In some embodiments, identifying attraction-related information in the target area includes: extracting and matching features of target objects in the image data corresponding to the target area to identify attraction identity information corresponding to the target objects; parsing user demand information contained in the voice request to match attraction-related information corresponding to the attraction identity information and user demand information.

[0009] In some embodiments, generating multilingual cultural explanation content based on attraction-related information of the target area and playing the multilingual cultural explanation content includes: generating personalized cultural explanation content in the corresponding language based on the identified attraction-related information, combined with pre-set language preferences and explanation depth requirements, using a natural language processing model, sending the generated cultural explanation content back to the travel converter, playing the cultural explanation content through a speaker, and simultaneously pushing the cultural explanation content to the bound mobile terminal for display and local storage.

[0010] In some embodiments, the step of triggering a preset continuous dialogue mechanism after receiving a preset wake-up word after playing the multilingual cultural explanation content includes: after completing the playback of the multilingual cultural explanation content, collecting audio data in the environment where the terminal is located, performing wake-up word matching detection on the collected audio data, and when audio content matching the preset wake-up word is detected, triggering the continuous dialogue mechanism, opening a real-time voice acquisition and interaction channel of preset duration, and maintaining the association state of the dialogue context.

[0011] In some embodiments, the step of recognizing and semantically parsing the received real-time speech and generating corresponding navigation or explanatory information includes: performing speech recognition and semantic parsing on the real-time speech collected under the continuous dialogue mechanism, using the contextual understanding model in the cloud to complete reference resolution and continuous semantic association, and completing cross-language real-time translation according to the pre-set language; and / or, generating corresponding navigation or supplementary explanatory information based on the user needs obtained from semantic parsing, playing it through the terminal, and synchronously pushing it to the user's bound mobile terminal.

[0012] In some embodiments, the method further includes: collecting corresponding historical travel trajectory data, historical interaction data, and stay duration data at each attraction, so as to extract corresponding travel route preferences, explanation content preferences, and itinerary rhythm preferences through a preset user preference analysis model; generating personalized recommended routes and pre-explanatory content for corresponding attractions based on the user's current location, the real-time opening status of attractions, and cross-border travel arrangements; and triggering the playback of pre-explanatory content when entering the preset trigger range of the corresponding attraction.

[0013] In some embodiments, the method further includes: detecting the current network communication status, positioning signal strength, and power grid compatibility status of the travel converter; when the network communication status is detected to be lower than a preset normal operating threshold, loading pre-downloaded offline navigation map data, attraction explanation data package, and offline speech recognition model for the corresponding area; and completing the positioning and navigation processing, speech recognition parsing, and playback of attraction explanation content locally to maintain the continuity of navigation and explanation services in environments without or with weak networks.

[0014] In a second aspect, this application provides a multimodal AI interactive global intelligent navigation and cultural interpretation system, the system comprising: a data acquisition unit, used to acquire multi-source positioning data, including fused satellite positioning data, base station positioning information, and local wireless LAN scanning signals, and to perform dynamic weighting processing on the multi-source positioning data to generate and output navigation guidance; a request acquisition unit, used to acquire image data and voice requests corresponding to the target area, to identify attraction-related information in the target area, to generate multilingual cultural explanation content based on that attraction-related information, and to play the multilingual cultural explanation content; and an explanation and playback unit, used to trigger a preset continuous dialogue mechanism if a preset wake-up word is received after the multilingual cultural explanation content has been played, so as to recognize and semantically analyze the received real-time speech and generate corresponding navigation or explanation information.

[0015] This application integrates a complete navigation and explanation methodology into the travel converter terminal, reusing the terminal's processing core, communication unit, power supply module, and other hardware resources, significantly reducing the amount of equipment users carry when traveling across borders, and completely solving the core pain points of existing technologies such as scattered equipment and fragmented functions.

[0016] By dynamically weighting and fusing multi-source positioning data, and with the smoothing and predictive compensation of positioning trajectories, the problem of positioning interruption and insufficient accuracy in weak signal environments such as tunnels and densely built-up areas during cross-border travel is effectively solved. This achieves continuous and stable navigation output in all scenarios worldwide, greatly improving the reliability and scenario adaptability of cross-border navigation.

[0017] By collaboratively collecting target object image data and user voice requests, and through cloud-based intelligent recognition and multilingual narration content generation, a personalized cultural narration service that delivers a WYSIWYG experience during navigation is achieved. This deeply integrates navigation guidance with cultural narration, significantly improving the efficiency and personalized experience of obtaining cultural information during cross-border travel.

[0018] By using a wake word-triggered continuous dialogue mechanism, combined with real-time speech recognition, semantic analysis, and cross-language translation, multi-round natural interaction is achieved in navigation and explanation scenarios, effectively breaking down language barriers in cross-border travel and improving the intelligence level and interaction continuity of services.

[0019] It should be understood that the above general description and the following detailed description are exemplary and explanatory only, and do not limit this application.

Attached Figure Description

[0020] To more clearly illustrate the technical solutions of the embodiments of this application, the drawings used in the description of the embodiments will be briefly introduced below. Obviously, the drawings described below are some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0021] Figure 1 is a schematic flowchart of the steps of a multimodal AI interactive global intelligent navigation and cultural explanation method provided in one embodiment of this application; Figure 2 is a schematic diagram of the structure of a travel converter provided in an embodiment of this application; Figure 3 is a schematic block diagram of the structure of a multimodal AI interactive global intelligent navigation and cultural interpretation system provided in one embodiment of this application; Figure 4 is a schematic block diagram of the structure of a travel converter provided in an embodiment of this application.

[0022] It should be understood that the above general description and the following detailed description are exemplary and explanatory only, and do not limit this application.

Detailed Implementation

[0023] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.

[0024] The flowchart shown in the attached diagram is for illustrative purposes only and does not necessarily include all content and operations / steps, nor does it necessarily have to be performed in the described order. For example, some operations / steps can be broken down, combined, or partially merged, so the actual execution order may change depending on the actual situation.

[0025] It should be understood that, in order to clearly describe the technical solutions of the embodiments of the present invention, the terms "first" and "second" are used in the embodiments of the present invention to distinguish identical or similar items with essentially the same function and effect. Those skilled in the art will understand that the terms "first" and "second" do not limit quantity or execution order, and that items labeled "first" and "second" are not necessarily different.

[0026] It should be understood that the terminology used in this specification is for the purpose of describing particular embodiments only and is not intended to limit the scope of the application. As used in this specification and the appended claims, the singular forms “a,” “an,” and “the” are intended to include the plural forms unless the context clearly indicates otherwise.

[0027] It should also be understood that the term “and / or” as used in this application specification and the appended claims means any combination of one or more of the associated listed items and all possible combinations, and includes such combinations.

[0028] With the rapid development of the cross-border travel industry, users are increasingly demanding continuous and stable global navigation services, cultural explanations of destination attractions, and natural cross-language interaction during their cross-border travels. Currently, existing technological solutions for these needs all suffer from shortcomings such as fragmented functions, poor scenario adaptability, and low hardware resource reuse rates.

[0029] Existing global navigation technologies mostly rely on independent terminals such as mobile phones and dedicated navigators. In weak signal environments such as cross-border tunnels, areas densely built up with high-rises, and remote overseas areas, these terminals generally suffer from a sharp drop in positioning accuracy and interruptions in the positioning trajectory. They also rely on cross-border roaming networks to provide services, which is costly. Furthermore, their ability to fuse and adapt multi-source positioning data is insufficient, so continuous and stable navigation output cannot be guaranteed across all cross-border scenarios.

[0030] To solve the above problems, referring to Figure 1, this application provides a multimodal AI interactive global intelligent navigation and cultural explanation method, applicable to, for example, the travel converter shown in Figure 2. It should also be noted that all information involved in the method provided in this application is collected with the authorization of the relevant user and in accordance with relevant regulations, and does not infringe upon user privacy.

[0031] The provided multimodal AI-interactive global intelligent navigation and cultural interpretation method includes steps S101 to S103. Details are as follows: Step S101. Acquire multi-source positioning data, including fused satellite positioning data, base station positioning information and local wireless LAN scanning signals; perform dynamic weighting processing on the multi-source positioning data, generate and output navigation guidance.

[0032] Specifically, the core of step S101 is to simultaneously collect three positioning data sources (satellites, base stations, and local wireless LAN) through the multi-source positioning module integrated in the travel converter. A dynamic weighted fusion algorithm compensates for the insufficient positioning accuracy or signal interruption that any single data source suffers in particular scenarios, thereby achieving continuous and stable navigation across cross-border scenarios.

[0033] After the travel converter is powered on, it automatically starts the multi-source positioning module and simultaneously activates the satellite positioning receiver, cellular network base station scanning, and local wireless LAN scanning functions.

[0034] The satellite positioning receiver collects satellite signals from the four major systems (BeiDou-3, GPS, GLONASS, and Galileo) at a frequency of 1 Hz. It measures the pseudorange, carrier phase, and Doppler shift of each visible satellite to obtain the first positioning result. At the same time, it outputs the satellite signal strength (carrier-to-noise ratio, C/N0) and the position dilution of precision (PDOP).

[0035] The cellular network module scans the surrounding base station signals to obtain the base station's cell ID, signal strength (RSRP), and timing advance (TA) information, and calculates the second positioning result through the base station location database.

[0036] The Wi-Fi module scans the surrounding wireless LAN access points, obtains the SSID, MAC address and signal strength (RSSI) information of the access points, and calculates the third location result through the global Wi-Fi location database.

[0037] The travel converter's local computing module assigns dynamically changing weight values to the three positioning results based on the current environment and the signal quality of each data source.

[0038] Based on the assigned weight values, the Kalman filter algorithm is used to fuse the three localization results to obtain the fused localization result.
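One way the assigned weights could enter a Kalman update is sketched below. This is an illustration under stated assumptions, not the filter used in this application: the state is a static 2-D position in a local metric frame, each source's measurement variance is modelled as `base_var / weight`, and the names `Fix`, `kalman_fuse`, and `base_var` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Fix:
    x: float       # east offset in metres (assumed local frame)
    y: float       # north offset in metres (assumed local frame)
    weight: float  # dynamic weight assigned to this source, 0..1


def kalman_fuse(prior_xy, prior_var, fixes, base_var=25.0):
    """Apply one scalar-gain Kalman update per positioning source.

    Assumption: measurement variance = base_var / weight, so a source with
    a higher weight pulls the estimate harder. Sources with weight 0 are
    skipped, which matches excluding them from the fusion.
    """
    x, y = prior_xy
    var = prior_var
    for fix in fixes:
        if fix.weight <= 0.0:
            continue
        r = base_var / fix.weight   # measurement noise variance
        gain = var / (var + r)      # Kalman gain for a scalar variance
        x += gain * (fix.x - x)
        y += gain * (fix.y - y)
        var *= (1.0 - gain)         # posterior variance shrinks
    return (x, y), var
```

Applying the updates sequentially keeps the sketch short; a production filter would normally also carry velocity in the state and a process noise term between epochs.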

[0039] The fused positioning trajectory is then smoothed and compensated by prediction. A sliding window average is applied to consecutive positioning points to suppress positioning noise. A linear prediction algorithm estimates positioning points in weak signal environments from the motion state (velocity, acceleration) of the previous N positioning points, and the predicted point is output whenever valid positioning data cannot be obtained at a given moment.
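The smoothing and prediction step can be sketched as follows. This is a simplified illustration: it uses a constant-velocity extrapolation over the window and omits the acceleration term mentioned above, and the class name `TrajectorySmoother` and the window size of 5 are assumptions rather than values from this application.

```python
from collections import deque


class TrajectorySmoother:
    """Sliding-window average plus constant-velocity prediction."""

    def __init__(self, window=5):
        # Holds the last `window` fused fixes as (t, x, y) tuples.
        self.points = deque(maxlen=window)

    def add(self, t, x, y):
        """Store a fused fix and return the window-averaged position."""
        self.points.append((t, x, y))
        n = len(self.points)
        return (sum(p[1] for p in self.points) / n,
                sum(p[2] for p in self.points) / n)

    def predict(self, t):
        """Extrapolate to time t when no valid fix is available."""
        if len(self.points) < 2:
            raise ValueError("need at least two fixes to predict")
        t0, x0, y0 = self.points[0]
        t1, x1, y1 = self.points[-1]
        dt = t1 - t0
        if dt == 0:
            raise ValueError("fixes must have distinct timestamps")
        vx, vy = (x1 - x0) / dt, (y1 - y0) / dt  # mean velocity over window
        return (x1 + vx * (t - t1), y1 + vy * (t - t1))
```

In use, `add` would be called for every valid fused fix and `predict` only for epochs in which all sources fail validation, for example inside a tunnel.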

[0040] The local computing module loads pre-downloaded global offline navigation map data, matches the fused positioning results with the map data, and combines the user's preset destination information to plan the optimal navigation route using the A* algorithm.
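For reference, a compact A* search over a road graph is sketched below. The graph layout (an adjacency dictionary plus node coordinates), the straight-line heuristic, and the function name `a_star` are illustrative assumptions; the application does not specify how the offline map is represented.

```python
import heapq
import math


def a_star(graph, coords, start, goal):
    """Return (path, cost) from start to goal, or (None, inf) if unreachable.

    graph:  {node: [(neighbour, edge_cost), ...]}
    coords: {node: (x, y)} used for the straight-line heuristic, which is
            admissible as long as no edge is cheaper than its straight-line
            length.
    """
    def h(node):
        return math.dist(coords[node], coords[goal])

    open_heap = [(h(start), 0.0, start)]   # entries are (f = g + h, g, node)
    best = {start: 0.0}                    # cheapest known cost to each node
    parent = {start: None}
    while open_heap:
        _, g, node = heapq.heappop(open_heap)
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1], g
        if g > best.get(node, math.inf):
            continue                       # stale heap entry
        for nxt, cost in graph.get(node, ()):
            ng = g + cost
            if ng < best.get(nxt, math.inf):
                best[nxt] = ng
                parent[nxt] = node
                heapq.heappush(open_heap, (ng + h(nxt), ng, nxt))
    return None, math.inf
```

The start node would be the map-matched fused position and the goal the node nearest the user's preset destination.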

[0041] Navigation guidance is output in voice form through the travel converter's speaker, while the navigation route map and real-time location information are displayed on the touch screen, and can be pushed to the user's paired smartphone, smartwatch and other mobile terminals via Bluetooth.

[0042] Step S102. Obtain image data and voice requests corresponding to the target area, which are used to identify scenic spot information in the target area, generate multilingual cultural explanation content based on the scenic spot information in the target area, and play the multilingual cultural explanation content.

[0043] Specifically, the core of step S102 is to use the image acquisition module and voice interaction module of the travel converter to achieve automatic identification of the target attraction and analysis of user needs. Combined with the large language model in the cloud, personalized multilingual cultural explanation content is generated, which solves the problems of single attraction explanation content, limited languages, and inability to meet personalized needs in the existing technology.

[0044] Users can trigger the attraction explanation function in two ways. Automatic trigger: when the travel converter detects that the user has entered the preset attraction trigger range (default radius 50 meters), the image acquisition module starts automatically. Manual trigger: the user starts the image acquisition module by pressing the function key on the travel converter or by giving the voice command "Start Explanation".
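The automatic trigger amounts to a geofence test against the attraction's coordinates. A minimal sketch, assuming positions are given as latitude and longitude in degrees and using the haversine great-circle distance; the function names and the `manual` flag are hypothetical:

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def should_start_capture(user, attraction, radius_m=50.0, manual=False):
    """True if image capture should start.

    manual=True stands for the function key or the voice command;
    otherwise the user must be within radius_m of the attraction.
    """
    return manual or haversine_m(*user, *attraction) <= radius_m
```

The 50 m default mirrors the trigger radius stated above; everything else is illustrative.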

[0045] The image acquisition module collects real-time image data of the target area and transmits the image data to the local computing module for preprocessing, including image scaling, grayscale conversion, denoising, and normalization. The local computing module runs a pre-trained object detection model (such as YOLOv8) to detect objects in the preprocessed image and extract the target areas of the scenic spots. Feature extraction is performed on the extracted target areas of the scenic spots, and a convolutional neural network (CNN) is used to generate feature vectors for the scenic spots. The generated feature vectors are then matched with a global scenic spot feature database in the cloud to identify the identity information of the target scenic spot (including scenic spot name, ID, geographical location, category, etc.).
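The final matching step can be illustrated with a nearest-neighbour lookup. The application does not state which similarity measure is used; the sketch below assumes cosine similarity with a rejection threshold, and the names `match_attraction` and `threshold` as well as the toy three-dimensional vectors are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def match_attraction(query, database, threshold=0.8):
    """Return the ID of the best-matching attraction, or None.

    database maps attraction ID -> reference feature vector. A match is
    accepted only if its similarity reaches the threshold.
    """
    best_id, best_score = None, threshold
    for attraction_id, reference in database.items():
        score = cosine(query, reference)
        if score >= best_score:
            best_id, best_score = attraction_id, score
    return best_id
```

A real feature database would hold vectors with hundreds of dimensions and would typically be searched with an approximate nearest-neighbour index rather than a linear scan.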

[0046] Meanwhile, the voice interaction module collects the user's voice requests, performs voice recognition and semantic analysis on the voice requests, and extracts the user's needs (such as "What is the history of this building?", "Explain in English", "Explain in more detail", etc.).

[0047] The identified attraction identity information and user demand information are sent to a large natural language processing model (such as GPT-4 or Claude 3) in the cloud.

[0048] The large model combines a pre-stored knowledge base of tourist attractions (including the historical background, cultural connotations, architectural features, anecdotes of famous people, and visiting suggestions) to generate personalized multilingual cultural explanations based on the user's language preferences and the depth of explanation needed.

[0049] The cloud-based system transmits the generated cultural explanations back to the travel converter, which then plays the explanations through a speaker and displays the text on a touchscreen. The explanations are also pushed to the user's linked mobile device for display and local storage.

[0050] Step S103. After playing the multilingual cultural explanation content, if a preset wake-up word is received, a preset continuous dialogue mechanism is triggered to recognize and semantically analyze the received real-time speech and generate corresponding navigation information or explanation information.

[0051] Specifically, the core of step S103 is to trigger a continuous dialogue mechanism by using a preset wake word after the explanation of the scenic spot, maintain the connection state of the dialogue context, realize real-time natural interaction across languages, and solve the problem of single interaction mode and inability to conduct continuous dialogue in the existing technology.

[0052] After the travel converter finishes playing the multilingual cultural explanation content, it automatically enters the wake word listening state, and the voice interaction module continuously collects audio data in the environment where the terminal is located. The local computing module runs a pre-trained wake word detection model (such as HeySnips) to perform real-time wake word matching detection on the collected audio data. When audio content matching the preset wake word (such as "Xiaolv Xiaolv") is detected, the continuous conversation mechanism is triggered, and a real-time voice collection and interaction channel with a preset duration (default 30 seconds) is opened. During the opening of the continuous conversation mechanism, the voice interaction module continuously collects the user's real-time voice data and transmits the voice data to the local computing module for preprocessing (including noise reduction, echo cancellation, and endpoint detection). The preprocessed voice data is sent to the voice recognition and semantic understanding service in the cloud for voice recognition and semantic parsing. The context understanding model in the cloud combines the previous conversation history to complete reference resolution and continuous semantic association, and accurately understands the user's continuous questions.

[0053] Meanwhile, the translation service in the cloud completes cross-lingual real-time translation according to the language set by the user in advance, supporting mutual translation between multiple languages such as Chinese, English, French, German, Spanish, Japanese, and Korean.

[0054] Based on the user's needs obtained from semantic parsing, the large model in the cloud generates corresponding navigation information or supplementary explanation information. The generated information is played in voice form through the speaker of the travel converter, and at the same time, the text content is displayed on the touch display screen and pushed to the mobile terminal bound by the user synchronously. If no voice input from the user is detected within the preset duration, the continuous conversation mechanism automatically closes, and the travel converter returns to the normal navigation and wake word listening state.
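The open and close behaviour of the continuous dialogue mechanism can be modelled as a small state machine. The sketch below is an assumption-laden illustration: it treats each accepted utterance as renewing the 30-second window (the text above does not say whether the window is renewed), clears the context when the window lapses, and uses the hypothetical names `DialogueSession`, `on_wake_word`, and `on_utterance`.

```python
class DialogueSession:
    """Tracks whether the continuous dialogue channel is open."""

    def __init__(self, window_s=30.0):
        self.window_s = window_s
        self.deadline = None   # time at which the channel closes
        self.history = []      # dialogue context kept while open

    def is_open(self, now):
        return self.deadline is not None and now < self.deadline

    def on_wake_word(self, now):
        """Called when the wake word detector fires."""
        self.deadline = now + self.window_s

    def on_utterance(self, text, now):
        """Return True if the utterance was accepted into the dialogue."""
        if not self.is_open(now):
            # Window lapsed: drop the context and fall back to listening
            # for the wake word only.
            self.deadline = None
            self.history.clear()
            return False
        self.history.append(text)
        self.deadline = now + self.window_s  # assumed: speech renews window
        return True
```

Timestamps are passed in explicitly so the logic can be exercised without a real clock; on a device they would come from a monotonic timer.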

[0055] In some embodiments, the obtaining of multi-source positioning data includes: collecting satellite positioning data, base station positioning information, and local wireless local area network scan signals corresponding to the current position of the travel converter, respectively performing validity verification on the satellite positioning data, base station positioning information, and local wireless local area network scan signals, removing invalid data beyond the preset error range, and retaining multi-source positioning data that meets the accuracy requirements.

[0056] After the travel converter collects the satellite positioning data, base station positioning information, and local wireless LAN scanning signals, it first verifies the validity of each of the three data sources. For satellite positioning data, the validity criteria are: the number of visible satellites is not less than 4; the position dilution of precision (PDOP) is not greater than 6.0; the average carrier-to-noise ratio (C/N0) of the satellite signal is not lower than 35 dB-Hz; and the distance between consecutive positioning points does not exceed the product of the preset maximum movement speed threshold (default 120 km/h) and the time interval.

[0057] For base station positioning information, the validity criteria are: at least 3 visible base stations; average base station signal strength (RSRP) not lower than -110 dBm; and a deviation between the positioning result and the fused positioning result from the previous moment of no more than 500 meters. For local wireless LAN scanning signals, the validity criteria are: at least 3 visible wireless LAN access points; average access point signal strength (RSSI) not lower than -70 dBm; and a deviation between the positioning result and the fused positioning result from the previous moment of no more than 200 meters. Positioning data that does not meet the above criteria is deemed invalid and discarded. All valid multi-source positioning data that meets the accuracy requirements is retained and passed to the subsequent dynamic weighted fusion step.
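The validity criteria for the three sources translate directly into threshold checks. In the sketch below the thresholds are the ones stated above; the function names and argument layout are illustrative assumptions.

```python
MAX_SPEED_MPS = 120 / 3.6  # default 120 km/h expressed in metres per second


def satellite_valid(num_sats, pdop, mean_cn0_dbhz, jump_m, dt_s):
    """Satellite fix: >= 4 satellites, PDOP <= 6.0, C/N0 >= 35 dB-Hz,
    and no jump larger than max speed times the time interval."""
    return (num_sats >= 4
            and pdop <= 6.0
            and mean_cn0_dbhz >= 35.0
            and jump_m <= MAX_SPEED_MPS * dt_s)


def base_station_valid(num_cells, mean_rsrp_dbm, deviation_m):
    """Base station fix: >= 3 cells, RSRP >= -110 dBm, and within 500 m
    of the previous fused position."""
    return num_cells >= 3 and mean_rsrp_dbm >= -110.0 and deviation_m <= 500.0


def wifi_valid(num_aps, mean_rssi_dbm, deviation_m):
    """Wi-Fi fix: >= 3 access points, RSSI >= -70 dBm, and within 200 m
    of the previous fused position."""
    return num_aps >= 3 and mean_rssi_dbm >= -70.0 and deviation_m <= 200.0
```

For example, at a 1 Hz update rate a satellite fix that jumps 40 m in one second is rejected, because 120 km/h allows only about 33.3 m per second.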

[0058] If all data from a certain data source is deemed invalid at a certain moment, the weight of that data source in the current fusion calculation will be automatically set to 0, and only the remaining valid data sources will be used for location fusion.
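The validity criteria in the preceding paragraphs can be sketched as simple predicate functions. This is a minimal illustration, not the implementation of the travel converter: the names `SatelliteFix`, `satellite_valid`, `base_station_valid`, and `wifi_valid` are hypothetical, and the assumption that all criteria for a source must hold at once is inferred from the wording.

```python
from dataclasses import dataclass

MAX_SPEED_KMH = 120.0  # default maximum movement speed threshold from the text


@dataclass
class SatelliteFix:
    visible_satellites: int
    pdop: float
    avg_cn0_dbhz: float
    jump_m: float  # distance from the previous positioning point, in meters
    dt_s: float    # time elapsed since the previous positioning point, in seconds


def satellite_valid(fix):
    # Largest plausible displacement: speed threshold (converted to m/s) times interval.
    max_jump_m = MAX_SPEED_KMH / 3.6 * fix.dt_s
    return (fix.visible_satellites >= 4
            and fix.pdop <= 6.0
            and fix.avg_cn0_dbhz >= 35.0
            and fix.jump_m <= max_jump_m)


def _terrestrial_valid(visible_nodes, avg_signal_dbm, deviation_m,
                       min_signal_dbm, max_deviation_m):
    # Shared shape of the base station and Wi-Fi criteria: node count,
    # average signal strength, deviation from the previous fused result.
    return (visible_nodes >= 3
            and avg_signal_dbm >= min_signal_dbm
            and deviation_m <= max_deviation_m)


def base_station_valid(visible_cells, avg_rsrp_dbm, deviation_m):
    return _terrestrial_valid(visible_cells, avg_rsrp_dbm, deviation_m, -110.0, 500.0)


def wifi_valid(visible_aps, avg_rssi_dbm, deviation_m):
    return _terrestrial_valid(visible_aps, avg_rssi_dbm, deviation_m, -70.0, 200.0)
```

A source for which the predicate returns `False` would be discarded for the current moment, which corresponds to its fusion weight being set to 0.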

[0059] In some embodiments, the dynamic weighting processing of multi-source positioning data to generate and output navigation guidance includes: assigning dynamically changing weight values to the fused satellite positioning data, base station positioning information, and local wireless LAN scanning signals based on the signal strength corresponding to the multi-source positioning data and the current environmental scene; completing the fusion processing of multi-source positioning data based on the assigned weight values; smoothing and predictive compensation of the positioning trajectory obtained after fusion processing; outputting continuous and stable multi-source positioning data in weak signal environments including tunnels and densely built-up areas; generating navigation guidance based on the multi-source positioning data and user-preset destination information; and outputting the navigation guidance through the terminal speaker or the user-bound mobile terminal.

[0060] The travel converter pre-stores weight configuration tables for various typical environmental scenarios, including: open outdoor scenarios: satellite positioning weight 0.8, base station positioning weight 0.15, Wi-Fi positioning weight 0.05; densely populated urban high-rise areas: satellite positioning weight 0.3, base station positioning weight 0.4, Wi-Fi positioning weight 0.3; underground tunnel/indoor scenarios: satellite positioning weight 0, base station positioning weight 0.3, Wi-Fi positioning weight 0.7; and remote rural scenarios: satellite positioning weight 0.7, base station positioning weight 0.3, Wi-Fi positioning weight 0.

[0061] The travel converter detects the current environment in real time using methods including: determining whether it is an open outdoor area or a densely built-up area based on satellite signal strength and PDOP value; determining whether it has entered a tunnel based on accelerometer and gyroscope data; determining whether it is an indoor scene based on the number and density of Wi-Fi access points; and using road type and geographic location information from map data to assist in the judgment. Based on the detected environment, initial weight values are obtained from a weight configuration table.

[0062] The initial weight values are further dynamically adjusted based on the real-time signal quality of each data source: for satellite positioning, the weight value is directly proportional to the average carrier-to-noise ratio of the satellite signal and inversely proportional to the PDOP value; for base station positioning, the weight value is directly proportional to the average signal strength of the base station; for Wi-Fi positioning, the weight value is directly proportional to the average signal strength of the access point. The adjusted weight values are then normalized to ensure that the sum of the three weight values is 1. Based on the normalized weight values, an extended Kalman filter algorithm is used to fuse the three valid positioning results to obtain the fused positioning result. The fused positioning trajectory is then smoothed: a sliding window of length 5 is used, and the positioning points within the window are weighted and averaged, with the weight value inversely proportional to the age of each point, so that newer positioning points have higher weights. Predictive compensation is performed on the positioning trajectory: a user motion state model is established, including three state variables: position, velocity, and acceleration, and a Kalman filter algorithm is used to predict the motion state. When the satellite signal is completely lost (e.g., when entering a long tunnel) and valid base station and Wi-Fi positioning data cannot be obtained, the pure inertial navigation mode is activated. Using the accelerometer and gyroscope data built into the travel converter, combined with the predicted motion state, the positioning results are continuously output. The effective duration of pure inertial navigation is no less than 5 minutes, and the positioning error drift does not exceed 100 meters per minute.
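The weight adjustment, normalization, and sliding-window smoothing described above can be sketched as follows. Several points are assumptions made only for illustration: the per-source `quality` factor stands in for the unspecified scaling of C/N0, PDOP, RSRP, and RSSI; a plain weighted average replaces the extended Kalman filter named in the text; and the smoothing weight is taken as 1/age, with the newest point having age 1. All function and key names are hypothetical.

```python
# Scene weight table transcribed from the description.
SCENE_WEIGHTS = {
    "open_outdoor":    {"gnss": 0.8, "cell": 0.15, "wifi": 0.05},
    "urban_high_rise": {"gnss": 0.3, "cell": 0.4,  "wifi": 0.3},
    "tunnel_indoor":   {"gnss": 0.0, "cell": 0.3,  "wifi": 0.7},
    "remote_rural":    {"gnss": 0.7, "cell": 0.3,  "wifi": 0.0},
}


def normalized_weights(scene, quality, valid):
    """Scale the scene weights by signal quality, zero invalid sources, normalize to 1."""
    raw = {}
    for source, base in SCENE_WEIGHTS[scene].items():
        raw[source] = base * quality[source] if valid[source] else 0.0
    total = sum(raw.values())
    if total == 0.0:
        return None  # no usable source: the caller would fall back to inertial navigation
    return {source: w / total for source, w in raw.items()}


def fuse(positions, weights):
    """Weighted average of (x, y) positions; a simplification of the EKF fusion."""
    x = sum(w * positions[s][0] for s, w in weights.items() if w > 0.0)
    y = sum(w * positions[s][1] for s, w in weights.items() if w > 0.0)
    return (x, y)


def smooth(track, window=5):
    """Weighted average over the last `window` points of a non-empty track."""
    recent = track[-window:]
    n = len(recent)
    weights = [1.0 / (n - i) for i in range(n)]  # oldest has age n, newest has age 1
    total = sum(weights)
    x = sum(w * p[0] for w, p in zip(weights, recent)) / total
    y = sum(w * p[1] for w, p in zip(weights, recent)) / total
    return (x, y)
```

For example, in the open outdoor scene with the satellite source invalid and equal quality factors, the remaining weights 0.15 and 0.05 renormalize to 0.75 and 0.25.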

[0063] When valid multi-source positioning data is reacquired, the system automatically switches back to multi-source fusion navigation mode and uses the newly acquired positioning data to correct the accumulated errors of inertial navigation.
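The prediction and correction behavior can be illustrated in one dimension with a constant-acceleration motion model over the three state variables named in the text. This is only a sketch: the fixed blending `gain` is an assumed stand-in for the Kalman gain, which a real filter would compute from its covariance estimates, and the function names are hypothetical.

```python
def predict(state, dt):
    """Propagate (position, velocity, acceleration) forward by dt seconds."""
    p, v, a = state
    return (p + v * dt + 0.5 * a * dt * dt, v + a * dt, a)


def correct(state, measured_position, gain=0.5):
    """Pull the predicted position toward a newly reacquired position fix."""
    p, v, a = state
    return (p + gain * (measured_position - p), v, a)


# Dead reckoning for one second, then correction with a reacquired fix.
state = predict((0.0, 10.0, 2.0), 1.0)
state = correct(state, 15.0)
```

With a gain of 1.0 the accumulated inertial error would be discarded entirely in favor of the new fix; smaller values blend the two estimates.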

[0064] In some embodiments, identifying attraction-related information in the target area includes: extracting and matching features of target objects in the image data corresponding to the target area to identify attraction identity information corresponding to the target objects; parsing user demand information contained in the voice request to match attraction-related information corresponding to the attraction identity information and user demand information.

[0065] After the image acquisition module of the travel converter acquires image data of the target area, the local computing module runs the YOLOv8 object detection model to detect objects in the image and outputs the bounding boxes and confidence scores of all possible scenic objects in the image.

[0066] Target bounding boxes with a confidence score greater than 0.7 are selected. Image regions within each bounding box are cropped and normalized to obtain uniformly sized (224×224) scenic spot image patches. A pre-trained ResNet-50 convolutional neural network is used as a feature extractor to extract features from each scenic spot image patch, generating a 1024-dimensional feature vector. The generated feature vectors are sent to a global scenic spot feature database in the cloud for feature matching, using cosine similarity as the matching metric. The cosine similarity between the input feature vector and the feature vectors of all scenic spots in the database is calculated, and the top 5 scenic spots with the highest similarity are selected as candidate results. Combining the current location information of the travel converter, the distance between each candidate scenic spot and the current location is calculated, and scenic spots within 100 meters and with a similarity greater than 0.8 are selected as the final recognition results. Simultaneously, the voice interaction module collects the user's voice requests and uses an end-to-end speech recognition model (such as Whisper) to convert the speech into text.
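The matching stage (cosine similarity, top 5 candidates, then the 100-meter and 0.8 similarity filters) can be sketched in plain Python. The database layout with `id`, `vec`, `lat`, and `lon` keys is an assumption, as is the use of the haversine formula for distance; the description does not state how distance is computed.

```python
from math import asin, cos, radians, sin, sqrt


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters on a spherical Earth (radius 6,371 km)."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    h = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371000.0 * asin(sqrt(h))


def match_attractions(query_vec, here, database,
                      top_k=5, min_similarity=0.8, max_distance_m=100.0):
    """Return ids of the top-k most similar entries that also pass both filters."""
    scored = [(cosine_similarity(query_vec, entry["vec"]), entry) for entry in database]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    results = []
    for similarity, entry in scored[:top_k]:
        distance = haversine_m(here[0], here[1], entry["lat"], entry["lon"])
        if similarity > min_similarity and distance <= max_distance_m:
            results.append(entry["id"])
    return results
```

A linear scan is used here for clarity; a cloud-scale database would more plausibly rely on an approximate nearest-neighbor index, which the description does not specify.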

[0067] A pre-trained large-scale language model is used to perform semantic parsing on the converted text, extracting user demand information, including: content demands (such as history, architecture, culture, tickets, opening hours, transportation, etc.); language demands (such as Chinese, English, French, etc.); and depth demands (such as brief, detailed, professional, etc.). The finally identified attraction identity information is then associated with the extracted user demand information to generate a complete query request, which is then sent to the cloud-based large-scale natural language processing model.

[0068] In some embodiments, generating multilingual cultural explanation content based on attraction-related information of the target area and playing the multilingual cultural explanation content includes: generating personalized cultural explanation content in the corresponding language based on the identified attraction-related information, combined with pre-set language preferences and explanation depth requirements, using a natural language processing model, sending the generated cultural explanation content back to the travel converter, playing the cultural explanation content through a speaker, and simultaneously pushing the cultural explanation content to the bound mobile terminal for display and local storage.

[0069] After the cloud-based natural language processing model receives the query request, the system first retrieves all information related to the attraction from a global attraction knowledge base, including historical background, cultural significance, architectural features, anecdotes about famous people, visitor suggestions, ticket prices, opening hours, and surrounding facilities.

[0070] Based on the user's desired level of in-depth explanation, the retrieved information is filtered and organized as follows: Brief Mode: Includes only the core historical background and main features of the attraction, with a duration of approximately 1-2 minutes; Detailed Mode: Includes the complete history, cultural connotations, architectural details, and anecdotes of notable figures, with a duration of approximately 5-10 minutes; Professional Mode: Includes the academic research value, architectural technical analysis, and historical verification of the attraction, with a duration of approximately 15-20 minutes. According to the user's language preference, the organized explanations are translated into the corresponding target language, paying attention to cultural differences during the translation process to ensure accuracy, naturalness, and ease of understanding.

[0071] By combining users' historical interaction data and preference information, the narration content is personalized: if the user is interested in history, more historical details and stories are added; if the user is interested in photography, suggestions for the best shooting locations and times are added; if the user is traveling with children, interactive content suitable for children and safety tips are added. The generated narration content is converted into a structured JSON format, containing text content, segmented information, key points, and relevant image links. The structured narration content is then sent back from the cloud to the travel converter and simultaneously sent to all of the user's linked mobile devices.

[0072] After receiving the narration, the travel converter uses its local TTS (Text-to-Speech) engine to convert the text into speech, which is then played through the speaker. The TTS engine supports various timbres and speech rates, allowing users to customize settings to their liking. Simultaneously, the travel converter's touchscreen displays the narration text, scrolling through segments and highlighting the currently playing content. When the user's linked mobile device receives the narration, the complete text and related images are displayed in the corresponding app, supporting zooming, panning, and sharing functions. The travel converter and the user's linked mobile device automatically store the narration content locally in "/storage/emulated/0/TravelConverter/Guides/", with file names in the format "attraction ID_language_timestamp.json". Users can view and play the stored narration at any time offline.
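The storage convention can be sketched as below, reading the quoted directory without the spaces around its slashes. The JSON key names (`text`, `segments`, `key_points`, `image_links`) and the use of Unix seconds for the timestamp are assumptions; the description lists the kinds of fields but not their names or the timestamp format.

```python
import json
from pathlib import PurePosixPath

# Directory quoted in the description.
GUIDE_DIR = PurePosixPath("/storage/emulated/0/TravelConverter/Guides")


def guide_path(attraction_id, language, timestamp):
    """Build 'attraction ID_language_timestamp.json' under the guide directory."""
    return GUIDE_DIR / f"{attraction_id}_{language}_{timestamp}.json"


def build_guide(text, segments, key_points, image_links):
    """Serialize narration content; ensure_ascii=False keeps non-Latin text readable."""
    payload = {
        "text": text,                # hypothetical key names
        "segments": segments,
        "key_points": key_points,
        "image_links": image_links,
    }
    return json.dumps(payload, ensure_ascii=False)
```

`PurePosixPath` is used so that the path is formatted with forward slashes regardless of the platform on which the sketch runs.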

[0073] In some embodiments, the step of triggering a preset continuous dialogue mechanism after receiving a preset wake-up word after playing the multilingual cultural explanation content includes: after completing the playback of the multilingual cultural explanation content, collecting audio data in the environment where the terminal is located, performing wake-up word matching detection on the collected audio data, and when audio content matching the preset wake-up word is detected, triggering the continuous dialogue mechanism, opening a real-time voice acquisition and interaction channel of preset duration, and maintaining the association state of the dialogue context.

[0074] After the travel converter finishes playing the multilingual cultural explanation content, it automatically enters the low-power wake word listening state. In this state, the voice interaction module turns on only 1 microphone to collect audio data at a sampling rate of 16 kHz, and the local computing module runs only a lightweight wake word detection model with a power consumption not exceeding 10 mW.

[0075] The wake word detection model uses a keyword detection algorithm based on a deep neural network, and "Xiaolv Xiaolv" is pre-trained as the default wake word. Users can customize the wake word in the APP. The wake word detection model processes the collected audio data in real time and calculates the matching degree between the audio segment and the wake word template. When the matching degree exceeds the preset threshold (default 0.9), it is determined that the wake word is detected. To reduce the false wake-up rate, a secondary confirmation mechanism is adopted: when the wake word is detected for the first time, the travel converter plays a prompt sound of "I'm here" through the speaker and turns on all 4 microphones to collect audio data in the next 3 seconds for secondary verification. If the secondary verification also passes, the continuous conversation mechanism is officially triggered; otherwise, it returns to the low-power wake word listening state.
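The two-stage confirmation can be summarized as a small decision function. This sketch assumes that the secondary verification reuses the same 0.9 threshold as the first detection, which the description does not state; the function name and state labels are hypothetical.

```python
def wake_word_outcome(first_score, second_score, threshold=0.9):
    """Return the state the device ends in after the two-stage check."""
    if first_score <= threshold:
        return "listening"  # first stage failed: stay in low-power listening
    # First stage passed: the device would play "I'm here", enable all
    # 4 microphones, and score the next 3 seconds of audio (second_score).
    if second_score > threshold:
        return "conversation"  # continuous conversation mechanism triggered
    return "listening"  # secondary verification failed: treated as a false wake-up
```

The second score is only consulted when the first stage passes, which mirrors the power-saving intent of keeping a single microphone active until a likely wake word is heard.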

[0076] After the continuous conversation mechanism is triggered, a real-time voice collection and interaction channel with a preset duration (default 30 seconds) is opened. During the opening of this channel, the voice interaction module continuously collects audio data at a sampling rate of 16 kHz and a quantization precision of 16 bits.

[0077] At the same time, the local computing module establishes a long connection with the cloud to maintain the associated state of the conversation context. The cloud assigns a unique session ID to each conversation session, and all voice recognition results, semantic parsing results, and reply contents related to this session are associated and stored with this session ID.

[0078] If the user's voice input is detected within the preset duration, the timer is reset and the 30-second timing starts again. If the user's voice input is not detected within the preset duration, the continuous conversation mechanism is automatically closed, the cloud ends this session and clears the context data, and the travel converter returns to the low-power wake word listening state.
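The timer behavior of the interaction channel can be sketched with an explicit deadline. Time is passed in as a number of seconds so that the logic is easy to test; on a device it might come from `time.monotonic()`. The class and method names are hypothetical.

```python
class ConversationWindow:
    """Tracks whether the voice interaction channel is still open."""

    def __init__(self, timeout_s=30.0):
        self.timeout_s = timeout_s
        self.deadline = None

    def open(self, now):
        self.deadline = now + self.timeout_s

    def is_open(self, now):
        return self.deadline is not None and now < self.deadline

    def on_voice(self, now):
        """Reset the 30-second timer if voice arrives while the channel is open."""
        if not self.is_open(now):
            return False
        self.deadline = now + self.timeout_s
        return True

    def close(self):
        # Timeout or manual close: the session would end and its context be cleared.
        self.deadline = None
```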

[0079] The user can manually close the continuous conversation mechanism by pressing the function key or saying the command "End conversation".

[0081] During the continuous dialogue mechanism, after the voice interaction module collects the user's real-time voice data, it first performs preprocessing: a beamforming algorithm is used to suppress environmental noise and echo; an endpoint detection algorithm is used to accurately identify the start and end positions of the speech; and the voice data is normalized to ensure its amplitude is between -1 and 1. The preprocessed voice data is then sent to the cloud-based speech recognition service via an encrypted HTTPS connection. The cloud-based speech recognition service uses an end-to-end large-model speech recognition algorithm to convert the speech into text and outputs a confidence score for each character. The cloud-based semantic understanding service performs semantic parsing on the recognized text, using a large language model combined with contextual information to complete reference resolution, intent recognition, and entity extraction.
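The amplitude normalization step can be illustrated for signed 16-bit samples, the quantization precision stated for the interaction channel. Dividing by 32768 is one common convention and is an assumption here; the description does not say whether fixed scaling or peak normalization is used.

```python
def normalize_pcm16(samples):
    """Map signed 16-bit integer samples into the range [-1.0, 1.0)."""
    return [s / 32768.0 for s in samples]
```

With this scaling the most negative sample maps to exactly -1.0 and the most positive sample (32767) maps to slightly below 1.0.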

[0082] Reference resolution: accurately understand the specific meaning of referential words such as "it," "this," and "there"; Intent recognition: determine whether the user's intent is a navigation request, an explanation request, or another question; Entity extraction: extract entity information such as location, time, and people mentioned by the user.

[0083] Meanwhile, the cloud-based translation service performs real-time cross-language translation based on the user's pre-set source and target languages. The translation process uses a neural machine translation model, supports mutual translation between multiple languages, and has a translation latency of no more than 1 second.

[0084] Based on user needs derived from semantic parsing, a large cloud-based model generates corresponding responses: for navigation requests, detailed navigation guidance information is generated, including route, distance, estimated time, and turn prompts; for explanation requests, supplementary cultural explanations are generated; and for other questions, corresponding answers are generated. The generated responses are converted into both voice and text formats and simultaneously sent to the travel converter and the user's linked mobile terminal. The travel converter plays the voice response through its speaker and displays the text response on its touchscreen.

[0085] The user's linked mobile device displays the complete conversation history in the app, including the user's questions and the system's replies, and supports copying, sharing, and exporting functions.

[0086] All conversation data is encrypted and stored in the cloud. Users can view, manage, and delete their conversation history within the app.

[0087] In some embodiments, the method further includes: collecting corresponding historical travel trajectory data, historical interaction data, and dwell time data at each attraction, so as to extract corresponding travel route preferences, explanation content preferences, and itinerary pace preferences through a preset user preference analysis model; generating personalized recommended routes and pre-explanatory content for corresponding attractions based on the user's current location, the real-time opening status of attractions, and cross-border travel arrangements; and triggering the playback of pre-explanatory content when entering a preset trigger range of the corresponding attraction. During user use, the travel converter automatically collects the following user behavior data: historical travel trajectory data: including all locations visited by the user, dwell time, and movement routes; historical interaction data: including the user's voice requests, questions, and explanation preferences; and dwell time data at each attraction: recording the user's dwell time at each attraction.

[0088] The collected user behavior data is periodically uploaded in encrypted form to a cloud-based user data analysis platform. The cloud-based user preference analysis model employs a combination of collaborative filtering and deep learning to analyze the user behavior data and extract user preferences. Travel route preferences: such as whether the user prefers natural scenery or historical sites, independent travel or group tours, a slow pace or a fast pace, etc.; content preferences: such as whether the user prefers history, culture, architecture, or food, and brief explanations or detailed explanations, etc.; itinerary pace preferences: such as the number of attractions visited per day, average stay time, frequency of rest, etc.

[0089] By combining the user's current location information, the real-time opening status of attractions (including ticket prices, queue status, temporary park closure notices, etc.) with the user's cross-border travel arrangements (such as flight times, hotel locations, etc.), a reinforcement learning algorithm is used to generate personalized recommended routes that suit the user's needs.

[0090] The recommended routes include daily itineraries, the order of attractions visited, transportation options, dining recommendations, and accommodation suggestions. Users can view and modify these routes within the app. Simultaneously, personalized pre-explanatory content is generated in advance for each attraction on the recommended route based on the user's explanation content preferences. When the travel converter detects that the user is about to enter the preset trigger range of a recommended attraction (default radius 200 meters), it automatically triggers the download and playback of the pre-explanatory content. The pre-explanatory content includes a basic introduction to the attraction, important notes for visitors, the best tour route, and highlighted areas, helping users understand the attraction in advance and plan their visit time. Users can configure whether to enable the personalized recommendation function, as well as the frequency and type of recommendations, within the app.
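The trigger-range check can be sketched as a simple geofence. The equirectangular distance approximation is an assumption chosen because it is adequate at a 200-meter scale, and firing at most once per attraction is also an assumption; the description does not say how repeated entries are handled. The names `distance_m` and `PreGuideTrigger` are hypothetical.

```python
from math import cos, hypot, radians

EARTH_RADIUS_M = 6371000.0


def distance_m(lat1, lon1, lat2, lon2):
    """Equirectangular approximation of distance in meters, fine for short ranges."""
    x = radians(lon2 - lon1) * cos(radians((lat1 + lat2) / 2.0))
    y = radians(lat2 - lat1)
    return EARTH_RADIUS_M * hypot(x, y)


class PreGuideTrigger:
    def __init__(self, attractions, radius_m=200.0):
        self.attractions = attractions  # {attraction_id: (lat, lon)}
        self.radius_m = radius_m
        self.fired = set()  # attractions whose content was already triggered

    def update(self, lat, lon):
        """Return ids of attractions newly entered at this position."""
        newly_entered = []
        for attraction_id, (alat, alon) in self.attractions.items():
            inside = distance_m(lat, lon, alat, alon) <= self.radius_m
            if inside and attraction_id not in self.fired:
                self.fired.add(attraction_id)
                newly_entered.append(attraction_id)
        return newly_entered
```

Each id returned by `update` would correspond to one download-and-playback event for that attraction's pre-explanatory content.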

[0091] In some embodiments, the method further includes: detecting the current network communication status, positioning signal strength, and power grid compatibility status of the travel converter; when the network communication status is detected to be lower than a preset normal operating threshold, loading pre-downloaded offline navigation map data, attraction explanation data package, and offline speech recognition model for the corresponding area; and completing the positioning and navigation processing, speech recognition parsing, and playback of attraction explanation content locally to maintain the continuity of navigation and explanation services in environments without or with weak networks.

[0092] The travel converter monitors the current network communication status, location signal strength, and power grid compatibility in real time. Network communication status indicators include cellular signal strength, Wi-Fi signal strength, and network latency. When the detected network communication status falls below a preset normal operating threshold (e.g., cellular signal strength below -120dBm, network latency greater than 500ms), the offline mode switching process is automatically triggered. The offline mode switching process first checks whether offline data for the current area has been downloaded to local storage: offline navigation map data (including road network, POI information, terrain data, etc.); attraction explanation data package (including basic information, explanation content, images, and audio); offline speech recognition model (a lightweight end-to-end speech recognition model); and offline TTS engine (a text-to-speech engine supporting multiple languages).

[0093] If offline data for the current region has already been downloaded to local storage, the system will automatically switch to offline mode. In offline mode, all location and navigation processing, voice recognition and parsing, and playback of attraction descriptions are completed locally on the travel converter, without relying on cloud services.

[0094] The positioning and navigation system uses a local multi-source fusion positioning algorithm and offline navigation map data to generate and output navigation guidance. Attraction recognition employs a locally pre-trained attraction recognition model to process the collected image data and identify the target attraction's identity information. Voice interaction uses a local offline speech recognition model to recognize the user's voice requests and a local offline TTS engine to play the response content. Attraction narration extracts corresponding narration content from locally stored attraction narration data packages for playback and display.

[0095] If offline data for the current region is not available in local storage, the travel converter will prompt the user via speaker and display: "Current network conditions are poor; it is recommended to download offline data in advance," and provide a one-click download function. Users can download offline data packages for their destination country or region in advance when they have a network connection.

[0096] When the network communication status is detected to have returned to normal, it automatically switches back to online mode, synchronizes user data generated during the offline period, and updates the local offline data packets.
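The mode-switching decision can be sketched as follows. The thresholds are the example values given in the description; treating either degraded indicator as sufficient (a logical OR) and the package names in `REQUIRED_PACKAGES` are assumptions.

```python
# Offline resources listed in the description; the identifiers are hypothetical.
REQUIRED_PACKAGES = {"map", "guides", "asr_model", "tts_engine"}


def select_mode(cell_dbm, latency_ms, local_packages):
    """Choose between online mode, offline mode, and prompting for a download."""
    degraded = cell_dbm < -120.0 or latency_ms > 500.0
    if not degraded:
        return "online"
    if REQUIRED_PACKAGES <= set(local_packages):
        return "offline"  # everything needed is already in local storage
    return "prompt_download"  # warn the user and offer the one-click download
```

Calling the function again once the network recovers returns `"online"`, at which point data generated during the offline period would be synchronized.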

[0097] In some embodiments, referring to Figure 2, which is a three-dimensional structural diagram of the travel converter described in this embodiment: the upper part of the front (right side) is provided with 3 USB-C ports, the middle part is provided with a universal compatible socket, and the lower part is provided with a rectangular touch screen; the left side is provided with 3 retractable plugs, which are, from top to bottom, a US standard two-prong plug, a European standard two-prong round plug, and a British standard three-prong plug. The overall design features a rectangular, rounded-corner shape, with the outer shell made of flame-retardant ABS engineering plastic. The travel converter uses a snap-fit design with top and bottom covers, and all electronic components are secured internally with screws. The surface of the shell has a frosted finish to increase grip and prevent slippage. The three retractable plugs on the left side use an independent sliding structure, each equipped with a sliding push button and a self-locking mechanism. Users can push the corresponding push button to extend the plug according to the socket standard of the destination country, and the self-locking mechanism automatically locks the plug in place. After use, pressing the unlock button on the side of the push button automatically retracts the plug into the shell. The three plugs are interlocked, allowing only one plug to extend at a time to prevent short circuits. The universal compatible socket in the center of the front uses a flexible copper plate design, compatible with plugs of all global standards, including US, European, British, Australian, and Chinese standards. A safety shutter is installed inside the socket; it only opens when two or three prongs of the plug are inserted simultaneously to prevent accidental electric shock to children.

[0098] The power conversion module is integrated inside the casing and adopts high-frequency switching power supply technology. The input voltage range is 100-240 VAC, 50/60 Hz, and the output voltage is 5 V/9 V/12 V/15 V/20 V DC. It supports mainstream fast charging protocols such as PD3.0, QC4.0, and SCP. The maximum output power of a single port is 65 W, and the total output power of the three ports does not exceed 100 W.

[0099] The power conversion module is electrically connected to three retractable plugs. When any one of the plugs extends and is inserted into an AC outlet, the power conversion module automatically starts, powering the entire travel converter and charging connected external devices.

[0100] The travel converter has a built-in 10000 mAh lithium-ion polymer battery that is electrically connected to the power conversion module. When the travel converter is plugged into AC power, it automatically charges the built-in battery; when it is unplugged from AC power, it automatically switches to battery power mode to power the positioning, communication, voice interaction, and other functional modules, with a battery life of no less than 24 hours.

[0101] The multi-source positioning module is integrated into the main board inside the casing, located behind the touch display screen. This module integrates receivers for four major global satellite navigation systems: BeiDou-3, GPS, GLONASS, and Galileo, supporting L1+L5 dual-frequency positioning, with a positioning accuracy of ±1 meter in open outdoor environments.

[0102] The module integrates a 2G/3G/4G/5G multi-mode baseband chip and an eSIM card, supporting cellular network communication in over 150 countries and regions worldwide and enabling it to obtain base station location information. It integrates Wi-Fi 6E and Bluetooth 5.3 modules, with built-in FPC antennas located at the top and bottom of the casing, respectively, enabling it to scan surrounding Wi-Fi access points and Bluetooth beacons for Wi-Fi and Bluetooth positioning. It also integrates a 6-axis IMU sensor (accelerometer + gyroscope), a barometer, and a geomagnetic sensor, enabling inertial navigation and altitude and direction detection.

[0103] The high-definition camera for the image acquisition module is integrated into the top center of the casing, with the lens facing the same direction as the universal socket. It has a 120-degree field of view and supports autofocus and optical image stabilization. A transparent protective lens is placed in front of the camera to prevent scratches. The microphone array for the voice interaction module is integrated into the top edge of the casing, arranged in a straight line, enabling 360-degree omnidirectional voice acquisition and a far-field wake-up distance of up to 5 meters. The high-fidelity speaker is integrated into the bottom of the casing, employing a sound cavity design to deliver clear and loud sound quality, allowing for clear playback of navigation guidance and explanations even in noisy outdoor environments.

[0104] The local computing module integrates an 8-core ARM processor and an NPU neural network acceleration unit on the motherboard, equipped with 8GB of LPDDR5 RAM and 128GB of UFS 3.1 high-speed storage. It can run local positioning algorithms, voice recognition models, scenic spot recognition models, and a TTS engine. The rectangular touchscreen display at the bottom front is a 2.4-inch IPS color touchscreen with a resolution of 320×240 and a brightness of up to 500 nits, ensuring clear visibility even in sunlight. The display supports multi-touch, allowing users to view navigation routes, browse audio guides, and access device settings via touch.

[0105] The right side of the casing (not shown in the attached image) features a power button, a volume-up button, a volume-down button, and an SOS emergency call button. Press and hold the power button for 3 seconds to turn the device on/off, and press briefly to turn the screen on/off. The volume buttons adjust the speaker volume. Press and hold the SOS button for 5 seconds to activate the emergency call function. The three USB-C ports at the top of the front panel support both input and output, and can be used to charge external devices or connect to a computer for data transfer and firmware upgrades.

[0106] The multi-source positioning module implements the multi-source positioning data acquisition function in step S101; the local computing module implements the dynamic weighted fusion, trajectory smoothing and prediction compensation in step S101, as well as the image preprocessing, feature extraction in step S102 and the wake word detection function in step S103; the image acquisition module implements the target area image data acquisition function in step S102; the voice interaction module implements the voice request acquisition and explanation content playback in step S102, as well as the real-time voice acquisition and response playback function in step S103; the touch screen implements the visualization display function of navigation information and explanation content; the communication module implements the data interaction function with the cloud; and the power conversion and battery module provides stable power supply support for all functional modules.

[0107] In some embodiments, real-time detection and adaptive adjustment functions for cross-border multi-standard power grids, together with multiple equipment safety protection mechanisms, are added on the basis of the above hardware structure, and the power grid status is linked with the power consumption management of the navigation and explanation functions. This reduces the risk of equipment damage caused by large differences in power grid parameters between countries, while ensuring the availability of core functions in extreme power grid environments.

[0108] By integrating a power grid parameter detection circuit at the input of the power conversion module, parameters such as input voltage, input current, frequency, harmonic distortion, and neutral-to-ground voltage can be detected in real time, with a sampling frequency of 1kHz.

[0109] When the travel converter is plugged into a power outlet in any country, the power grid parameter detection circuit is activated immediately, completing the detection of all power grid parameters within 100 ms and sending the detection results to the local computing module.

[0110] The local computing module compares the detected power grid parameters with a pre-stored database of standard parameters for power grids in various countries around the world to identify the power grid system of the current country.

[0111] The local computing module dynamically adjusts the operating parameters of the power conversion module, including switching frequency, duty cycle, and output voltage ripple, based on the identified power grid type and real-time power grid parameters, to adapt to different power grid environments.

[0112] For example, when a large fluctuation in the grid voltage is detected (such as in rural areas of some developing countries), the voltage regulation range of the power conversion module is automatically increased to reduce output ripple; when the grid frequency is detected to be 60Hz, the operating frequency of the switching power supply is automatically adjusted to improve conversion efficiency.

[0113] At the same time, the output power of the three USB-C ports is dynamically allocated according to the charging needs of the connected external devices, giving priority to charging essential travel devices such as mobile phones and cameras.
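One possible reading of this priority-based allocation is a greedy pass over the ports in priority order. The sketch below assumes a fixed total power budget, a two-level priority scheme, and the device-type labels shown; none of these details are given in this application.

```python
# Device types treated as essential travel devices (from the text above);
# the numeric priority levels themselves are an assumption.
PRIORITY = {"phone": 0, "camera": 0}


def allocate_power(requests, budget_w):
    """Greedily grant power to ports, essential devices first.

    requests: list of (port, device_type, requested_watts) tuples.
    Returns a dict mapping port -> granted watts.
    """
    # sorted() is stable, so ports with equal priority keep their input order.
    ordered = sorted(requests, key=lambda r: PRIORITY.get(r[1], 1))
    grants = {}
    remaining = budget_w
    for port, _device_type, wanted in ordered:
        granted = min(wanted, remaining)
        grants[port] = granted
        remaining -= granted
    return grants
```

With a hypothetical 60 W budget, a phone requesting 20 W and a camera requesting 10 W are served in full, and a laptop requesting 45 W receives the remaining 30 W.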

[0114] The device integrates multiple safety protection mechanisms, including overvoltage protection, overcurrent protection, overheat protection, short-circuit protection, leakage protection, and lightning protection. Overvoltage protection: When the input voltage exceeds 265VAC, the power input is immediately cut off to prevent high-voltage damage to the equipment. Overcurrent protection: When the output current exceeds the maximum rated current of a single port, the output of that port is immediately cut off. Overheat protection: When the internal temperature exceeds 85℃, the output power is automatically reduced; when the temperature exceeds 95℃, all outputs are completely cut off. Leakage protection: When a leakage current exceeding 30mA is detected, the power input is cut off within 10ms to prevent electric shock. Lightning protection: Integrating a varistor and gas discharge tube, it can withstand surge voltages up to 10kV. When any protection mechanism is triggered, the travel converter plays a voice prompt through the speaker and displays the corresponding fault code and troubleshooting suggestions on the touchscreen display.
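The numeric thresholds above can be expressed as a simple decision function. This sketch covers only the input overvoltage, overheat, and leakage checks, whose thresholds are stated in the text; the `Action` names and the order in which the checks are evaluated are assumptions.

```python
from enum import Enum


class Action(Enum):
    NORMAL = "normal operation"
    DERATE = "reduce output power"
    CUT_OUTPUTS = "cut all outputs"
    CUT_INPUT = "cut power input"


def check_protection(input_vac, temp_c, leakage_ma):
    """Map measured values to a protective action using the stated thresholds."""
    # Input-side faults are checked first (assumed precedence).
    if input_vac > 265 or leakage_ma > 30:
        return Action.CUT_INPUT
    if temp_c > 95:
        return Action.CUT_OUTPUTS
    if temp_c > 85:
        return Action.DERATE
    return Action.NORMAL
```

The timing requirement for leakage protection (cut-off within 10ms) would normally be met in hardware or an interrupt handler; a polled function like this one only illustrates the decision logic.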

[0115] The local computing module dynamically adjusts the power consumption of navigation, narration, and interactive functions based on real-time power grid status and battery level to ensure the availability of core functions. When unstable power grid voltage is detected and battery level is below 20%, it automatically enters low-power mode: closing unnecessary background processes, reducing screen brightness, shortening the duration of voice interaction, and prioritizing the normal operation of positioning and navigation functions. When a complete power grid interruption is detected and battery level is below 5%, it automatically enters emergency mode: retaining only satellite positioning and SOS emergency distress functions, disabling all other functions, and maintaining positioning and distress capabilities for at least 12 hours.
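The mode selection described above can be sketched as follows. The state labels are hypothetical, and treating an interrupted grid as at least as severe as an unstable one (so that an interruption with a battery level between 5% and 20% selects low-power mode) is an assumption, since the text does not cover that case.

```python
def select_mode(grid_state, battery_pct):
    """Choose an operating mode from grid state and battery level.

    grid_state is one of "stable", "unstable", "interrupted" (assumed labels).
    """
    if grid_state == "interrupted" and battery_pct < 5:
        return "emergency"   # satellite positioning and SOS only
    if grid_state in ("unstable", "interrupted") and battery_pct < 20:
        return "low_power"   # dim screen, stop background work, keep navigation
    return "normal"
```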

[0116] In some embodiments, the problem of existing technologies being unable to provide accurate navigation and automatic explanations indoors (such as museums, art galleries, shopping malls, airports, etc.) is solved by integrating Bluetooth beacons, IMU, barometers, geomagnetic sensors and visual SLAM technology to achieve meter-level accuracy navigation and automatic explanation of exhibits indoors.

[0117] The travel converter simultaneously collects the following indoor positioning data sources: Bluetooth beacon signals: scanning surrounding Bluetooth beacons to obtain their UUID, Major, Minor values, and signal strength (RSSI); IMU data: acquiring accelerometer and gyroscope data to calculate the user's stride length, number of steps, and walking direction; barometer data: detecting air pressure changes to calculate the user's floor level; geomagnetic sensor data: detecting the strength and direction of the geomagnetic field to help determine the user's walking direction; and visual SLAM data: acquiring images of the surrounding environment through a camera to build a local map and perform positioning. The validity of each data source is verified: Bluetooth beacon signal strength is not lower than -80dBm, IMU data noise does not exceed a preset threshold, and the rate of change of barometer data does not exceed a preset threshold.

[0118] The multi-sensor fusion positioning algorithm uses the extended Kalman filter algorithm to fuse multiple data sources and generate indoor positioning results.

[0119] Based on Bluetooth beacon positioning, it provides absolute position information; supplemented by IMU dead reckoning, it provides continuous relative position information; barometer data is used to determine the floor level; geomagnetic sensor data is used to correct orientation errors; and visual SLAM data is used to correct long-term accumulated errors.

[0120] When the Bluetooth beacon signal is good, the weight of Bluetooth beacon positioning is 0.7, the weight of IMU dead reckoning is 0.2, and the weight of visual SLAM is 0.1. When the Bluetooth beacon signal is lost, it automatically switches to IMU+visual SLAM fusion positioning mode, which can maintain continuous positioning for at least 10 minutes with a positioning error of no more than 5 meters.
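To make the weighting concrete, the sketch below combines the three position estimates with the stated weights and renormalizes the remaining weights when a source is missing. It is a simplification: a plain weighted average stands in for the extended Kalman filter, and renormalizing the IMU and visual SLAM weights after beacon loss is an assumption rather than a behavior stated in this application.

```python
# Weights stated in the text for the case of a good beacon signal.
WEIGHTS = {"beacon": 0.7, "imu": 0.2, "slam": 0.1}


def fuse_position(beacon=None, imu=None, slam=None):
    """Weighted average of available (x, y) estimates in metres.

    A source passed as None is treated as unavailable and the remaining
    weights are renormalized (assumed behavior).
    """
    sources = {"beacon": beacon, "imu": imu, "slam": slam}
    available = {name: pos for name, pos in sources.items() if pos is not None}
    if not available:
        raise ValueError("no positioning source available")
    total = sum(WEIGHTS[name] for name in available)
    x = sum(WEIGHTS[name] * pos[0] for name, pos in available.items()) / total
    y = sum(WEIGHTS[name] * pos[1] for name, pos in available.items()) / total
    return x, y
```

With the beacon lost, the IMU and visual SLAM estimates are combined at a 2:1 ratio under this assumption.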

[0121] Indoor navigation route planning utilizes pre-downloaded high-precision map data of the target indoor venue, including floor plans, passageways, staircases, elevators, exhibit locations, and service facility locations.

[0122] Users select their destination on the touchscreen display (such as "3rd floor Van Gogh exhibition hall" or "2nd floor restroom"). The local computing module uses the A* algorithm to plan the optimal indoor navigation route, avoiding obstacles and crowded areas.
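As an illustration of A* route planning, the sketch below searches a small occupancy grid with 4-connected moves and a Manhattan-distance heuristic. The grid encoding (`#` for blocked cells) and unit step cost are assumptions; crowd avoidance could be modeled by raising the cost of crowded cells, which this sketch does not do.

```python
import heapq


def a_star(grid, start, goal):
    """Shortest 4-connected path on a grid of strings; '#' marks an obstacle.

    Returns a list of (row, col) cells from start to goal, or None if the
    goal is unreachable.
    """
    rows, cols = len(grid), len(grid[0])

    def heuristic(cell):
        # Manhattan distance is admissible for unit-cost 4-connected moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(heuristic(start), 0, start)]
    g_cost = {start: 0}
    came_from = {}
    while open_heap:
        _, g, cell = heapq.heappop(open_heap)
        if g > g_cost[cell]:
            continue  # stale heap entry; a cheaper route was found already
        if cell == goal:
            path = [cell]
            while cell in came_from:
                cell = came_from[cell]
                path.append(cell)
            return path[::-1]
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != "#":
                new_g = g + 1
                if new_g < g_cost.get(nxt, float("inf")):
                    g_cost[nxt] = new_g
                    came_from[nxt] = cell
                    heapq.heappush(open_heap, (new_g + heuristic(nxt), new_g, nxt))
    return None
```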

[0123] Navigation guidance is provided through voice and visual means: the voice prompt is "Turn left in 50 meters ahead and take the elevator to the 3rd floor"; the touch screen displays the floor plan and real-time location, and uses arrows to indicate the direction of travel.

[0124] The automatic explanation function for indoor exhibits is activated when the travel converter detects that a user has entered the trigger range (default radius of 3 meters) of an exhibit.

[0125] First, the exhibit's identity information is determined by the Bluetooth beacon's ID. Then, the exhibit's image captured by the camera is used for secondary confirmation to ensure accurate identification.

[0126] The corresponding exhibit explanation content is extracted from the locally stored indoor explanation data package, played through the speaker, and displayed on the touch screen with pictures and text descriptions of the exhibits.

[0127] Users can ask questions via voice interaction, such as "Who is the author of this exhibit?" or "In what year was it created?" The travel converter will then generate corresponding supplementary explanations.

[0128] When a user leaves one indoor venue and enters another, the travel converter automatically detects the change in venue and switches to the corresponding indoor map and explanatory data package.

[0129] The system determines whether a user has left an indoor location based on the recovery of satellite positioning signals; it determines whether a user has entered a new indoor location based on the scanned Bluetooth beacon ID; and it matches the user's movement trajectory with map data.

[0130] In some embodiments, to address the problems of location sharing, finding lost individuals, and collaborative navigation when multiple people travel together across borders, multiple travel converters form a Bluetooth Mesh network. This achieves distributed positioning and information interaction in network-free environments, while also supporting the synchronization and sharing of narration content.

[0131] Multiple travel converters automatically form a network via Bluetooth 5.3 Mesh technology, without relying on cellular or Wi-Fi networks. The network range reaches up to 100 meters and supports up to 20 devices simultaneously. When a user activates "Multi-person Travel Mode," the travel converter automatically broadcasts a network signal to search for other nearby travel converters. When other devices in "Multi-person Travel Mode" are detected, a Mesh connection is automatically established, forming a travel group. Each travel group is assigned a unique group ID, and all group member devices can receive broadcast messages within the group.

[0132] Within the Mesh network, each travel converter periodically broadcasts its location information (including latitude, longitude, altitude, speed, and direction) at a frequency of 1 Hz. When a device is in an environment without satellite or base station signals (such as an underground parking lot, tunnel, or deep mountain), it can receive location information and signal strength from other surrounding devices and calculate its relative position using trilateration, achieving distributed positioning. The real-time locations of all group members are displayed on the touchscreen of each travel converter, represented by icons of different colors. Users can click on any member's icon to view that member's detailed location information and battery status.
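A minimal two-dimensional sketch of the trilateration step is shown below. It assumes that neighbor positions have already been projected into a local metric frame, and that distances are estimated from signal strength with a log-distance path-loss model; the reference power at 1 m (`tx_power_dbm`) and the path-loss exponent are assumed values, not figures from this application.

```python
def rssi_to_distance(rssi_dbm, tx_power_dbm=-59.0, path_loss_exp=2.0):
    """Estimate distance in metres from RSSI using a log-distance model.

    tx_power_dbm is the expected RSSI at 1 m; both defaults are assumptions.
    """
    return 10 ** ((tx_power_dbm - rssi_dbm) / (10 * path_loss_exp))


def trilaterate(anchors, distances):
    """Solve for (x, y) from three anchor positions and three distances.

    Subtracting the first circle equation from the other two yields a
    2x2 linear system, solved here with Cramer's rule.
    """
    (x1, y1), (x2, y2), (x3, y3) = anchors
    d1, d2, d3 = distances
    a11, a12 = 2 * (x2 - x1), 2 * (y2 - y1)
    a21, a22 = 2 * (x3 - x1), 2 * (y3 - y1)
    b1 = d1**2 - d2**2 + x2**2 - x1**2 + y2**2 - y1**2
    b2 = d1**2 - d3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-9:
        raise ValueError("anchors are collinear; position is ambiguous")
    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - a21 * b1) / det
    return x, y
```

In practice RSSI-derived distances are noisy, so a real implementation would likely use more than three neighbors and a least-squares fit; this sketch shows only the geometric core.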

[0133] When a member's distance from the group exceeds a preset threshold (default 50 meters), that member's device and all other devices in the group will issue a voice prompt "You have strayed from the group," and display the member's location and navigation route to that location on the screen. If a member gets lost, they can press the "Call Group" button on their device. All other devices in the group will sound a loud alarm and display the member's real-time location and navigation route. A one-click "Meet" command is supported; all devices in the group will display the meeting point's location and navigation route, and issue a voice prompt "Please proceed to the meeting point."
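The straying check can be sketched with a great-circle distance between broadcast positions. The text does not define "distance from the group", so this example assumes it means the distance to the nearest other member; the function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def strayed_members(positions, threshold_m=50.0):
    """Return members whose nearest other member is farther than threshold_m.

    positions: dict mapping member name -> (latitude, longitude).
    """
    strayed = []
    for name, (lat, lon) in positions.items():
        others = [p for other, p in positions.items() if other != name]
        if not others:
            continue  # a single-member group cannot stray from itself
        nearest = min(haversine_m(lat, lon, o_lat, o_lon) for o_lat, o_lon in others)
        if nearest > threshold_m:
            strayed.append(name)
    return strayed
```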

[0134] Any member in the group can create a shared itinerary, including the destination, route, order of attractions visited, and time schedule. Once created, the shared itinerary automatically syncs to the devices of all other members in the group. When the creator modifies the itinerary, all members' devices will automatically update the itinerary information. All members' devices display the same navigation route, and all members' devices will receive a notification if the creator deviates from the route.

[0135] When a member's device is playing a narration for a particular attraction, they can share it to the group. All other members' devices in the group will automatically start playing the same narration, enabling synchronized multi-user narration. Members can transfer offline navigation map data and attraction narration data packages via Bluetooth Mesh network, without consuming mobile data.

[0136] In some embodiments, by utilizing the travel converter's camera and touchscreen display, AR augmented reality navigation and attraction explanation functions are realized, overlaying virtual navigation arrows and attraction information with the real environment to provide users with a more intuitive and immersive travel experience.

[0137] AR navigation works as follows. When the user activates "AR navigation" mode on the travel converter, the camera automatically starts, capturing real-time images of the surrounding environment and displaying them on the touchscreen. The local computing module matches the fused positioning results, the user's orientation, and offline navigation map data to calculate the position and direction of the virtual navigation arrow in the real environment. This virtual arrow is then overlaid on the real-time image captured by the camera. The arrow is blue, its size is proportional to the distance, and its direction indicates the direction of travel. When a turn is needed, the arrow turns yellow and flashes, accompanied by a voice prompt, "Turn right in 10 meters." Upon reaching the destination, the arrow turns green and displays "Destination reached."

[0138] It supports AR road sign display by overlaying important surrounding road signs, such as hotels, restaurants, gas stations, and restrooms, onto the display screen. These are represented by different icons, and clicking on the icon can view detailed information and navigation routes.

[0139] AR attraction guides work by automatically recognizing an attraction when the user points their camera at it, and then displaying an AR information card overlaid on the screen. The AR information card includes basic information such as the attraction's name, rating, opening hours, and ticket price. Clicking on the card expands to view detailed information and play a guided tour.

[0140] It supports AR annotation, which overlays important architectural structures, historical sites, and statues of famous people onto images of scenic spots. Clicking on an annotation will display a detailed explanation of the corresponding information.

[0141] For example, when a user points their camera at the Eiffel Tower in Paris, the display screen will overlay information such as the Eiffel Tower's construction date, height, and designer, while also labeling different floors of the tower with terms like "observation deck," "restaurant," and "souvenir shop." Clicking on a label will show the corresponding detailed information.

[0142] It supports simple gesture control of AR functions. Users can pause the AR display by making a "fist" gesture in front of the camera; resume the AR display by making an "open palm" gesture; and switch between different information cards by making a "swipe left or right" gesture.

[0143] Users can control the display of AR information using voice commands such as "zoom in," "zoom out," "previous," and "next." All data required for AR navigation and explanation (including map data, 3D models of attractions, and annotation information) can be pre-downloaded to local storage. The AR function still works normally in environments without or with weak network connectivity, without relying on cloud services. When the network is restored, the local AR data is automatically updated to ensure the accuracy of the information.

[0144] In some embodiments, the problem of being unable to make emergency calls in remote areas (such as deep mountains, deserts, and oceans) without cellular network coverage during cross-border travel is addressed. By integrating a low-orbit satellite communication module, the travel converter enables one-click emergency calls, location sharing, and SMS communication functions globally, ensuring the safety of users.

[0145] By integrating a low-Earth orbit (LEO) satellite communication module within the travel converter, it supports mainstream LEO satellite communication systems such as Iridium and Starlink, enabling two-way communication globally. The satellite communication antenna features a built-in high-gain design, located on the top of the casing, achieving stable satellite communication without the need for an external antenna. The satellite communication module boasts extremely low power consumption, not exceeding 1mW in standby mode and not exceeding 5W in communication mode. The built-in battery supports at least 100 emergency distress calls.

[0146] The travel converter features a dedicated SOS emergency call button on its side. This button is designed to prevent accidental activation and requires a 5-second press to trigger. In case of an emergency, pressing and holding the SOS button for 5 seconds will automatically activate the travel converter's satellite communication module and send an emergency distress signal.
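The long-press requirement can be sketched as a small state holder that is fed timestamps. The class and method names are hypothetical, and passing the current time in explicitly (rather than reading a clock) is a design choice made here so the logic is easy to test.

```python
class SosButton:
    """Tracks a press-and-hold gesture; triggers after hold_s seconds."""

    def __init__(self, hold_s=5.0):
        self.hold_s = hold_s        # 5 seconds, as stated in the text
        self._pressed_at = None     # time of the current press, if any

    def press(self, now):
        # Ignore repeated press events while the button is already held.
        if self._pressed_at is None:
            self._pressed_at = now

    def release(self, now):
        # Releasing early cancels the gesture and prevents accidental triggers.
        self._pressed_at = None

    def should_trigger(self, now):
        return self._pressed_at is not None and now - self._pressed_at >= self.hold_s
```

A caller would poll `should_trigger` while the button is held and, once it returns true, activate the satellite communication module and send the distress signal.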

[0147] The distress signal includes the following information: the user's real-time location information (latitude, longitude, altitude, speed); the device's unique identifier; pre-set emergency contact information; and the user's basic information (name, age, blood type, allergies, etc.). The distress signal is simultaneously sent to the global emergency response center and the mobile phones of three pre-set emergency contacts.

[0148] After sending a distress signal, the travel converter automatically enters two-way communication mode, allowing it to receive responses from global emergency response centers and emergency contacts. Users can input short messages via the touchscreen display to send to response centers and emergency contacts, informing them of the specific emergency and needs. Response centers and emergency contacts can also send short messages to users, informing them of rescue progress and precautions. Short messages are limited to 140 characters and support multiple languages including Chinese and English.

[0149] Please refer to Figure 3, which is a schematic diagram of the structure of a multimodal AI-interactive global intelligent navigation and cultural explanation system 200 provided in an embodiment of this application. The multimodal AI-interactive global intelligent navigation and cultural explanation system 200 is used to execute the steps of the multimodal AI-interactive global intelligent navigation and cultural explanation method shown in the above embodiments. The multimodal AI-interactive global intelligent navigation and cultural explanation system 200 can be a single server or a server cluster, or it can be a terminal, such as a handheld terminal, a laptop computer, a wearable device, or a robot.

[0150] As shown in Figure 3, the multimodal AI-interactive global intelligent navigation and cultural explanation system 200 includes: the data acquisition unit 201, which is used to acquire multi-source positioning data, including fused satellite positioning data, base station positioning information and local wireless local area network scanning signals, to perform dynamic weighting processing on the multi-source positioning data, and to generate and output navigation guidance; the request acquisition unit 202, which is used to acquire image data and voice requests corresponding to the target area, to identify scenic spot-related information in the target area, to generate multilingual cultural explanation content based on the scenic spot-related information in the target area, and to play the multilingual cultural explanation content; and the explanation and playback unit 203, which is used to trigger a preset continuous dialogue mechanism when a preset wake-up word is received after the multilingual cultural explanation content has been played, to recognize and semantically analyze the received real-time voice, and to generate corresponding navigation information or explanation information.

[0151] In some embodiments, acquiring multi-source positioning data includes: collecting satellite positioning data, base station positioning information, and local wireless LAN scanning signals corresponding to the current location of the travel converter; verifying the validity of the satellite positioning data, base station positioning information, and local wireless LAN scanning signals respectively; removing invalid data that exceeds a preset error range; and retaining multi-source positioning data that meets the accuracy requirements.

[0152] In some embodiments, the dynamic weighting processing of multi-source positioning data to generate and output navigation guidance includes: assigning dynamically changing weight values to the fused satellite positioning data, base station positioning information, and local wireless LAN scanning signals based on the signal strength corresponding to the multi-source positioning data and the current environmental scene; completing the fusion processing of multi-source positioning data based on the assigned weight values; smoothing and predictive compensation of the positioning trajectory obtained after fusion processing; outputting continuous and stable multi-source positioning data in weak signal environments including tunnels and densely built-up areas; generating navigation guidance based on the multi-source positioning data and user-preset destination information; and outputting the navigation guidance through the terminal speaker or the user-bound mobile terminal.
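The smoothing and predictive compensation step can be illustrated with a one-dimensional constant-velocity Kalman filter: `update` smooths a fused position measurement, and `predict` alone carries the trajectory forward while measurements are unavailable, for example in a tunnel. The class name, the noise parameters `q` and `r`, and the reduction to one dimension are assumptions made for this sketch.

```python
class Kalman1D:
    """Constant-velocity Kalman filter over state [position, velocity]."""

    def __init__(self, pos, vel=0.0, q=0.1, r=4.0):
        self.x = [pos, vel]                  # state estimate
        self.P = [[1.0, 0.0], [0.0, 1.0]]    # state covariance
        self.q = q                           # process noise (assumed value)
        self.r = r                           # measurement noise variance (assumed)

    def predict(self, dt):
        """Advance the state by dt seconds; used alone during signal outages."""
        p, v = self.x
        (a, b), (c, d) = self.P
        self.x = [p + v * dt, v]
        # P = F P F^T + Q with F = [[1, dt], [0, 1]] and Q = q * I.
        self.P = [
            [a + dt * (b + c) + dt * dt * d + self.q, b + dt * d],
            [c + dt * d, d + self.q],
        ]
        return self.x[0]

    def update(self, z):
        """Correct the state with a position measurement z."""
        (a, b), (c, d) = self.P
        s = a + self.r                  # innovation variance
        k0, k1 = a / s, c / s           # Kalman gain
        y = z - self.x[0]               # innovation
        self.x = [self.x[0] + k0 * y, self.x[1] + k1 * y]
        # P = (I - K H) P with H = [1, 0].
        self.P = [[(1 - k0) * a, (1 - k0) * b], [c - k1 * a, d - k1 * b]]
        return self.x[0]
```

A typical loop calls `predict(dt)` every cycle and calls `update(z)` only when a valid fused measurement exists, so the output stays continuous across short outages while the covariance grows to reflect the added uncertainty.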

[0153] In some embodiments, identifying attraction-related information in the target area includes: extracting and matching features of target objects in the image data corresponding to the target area to identify attraction identity information corresponding to the target objects; parsing user demand information contained in the voice request to match attraction-related information corresponding to the attraction identity information and user demand information.

[0154] In some embodiments, generating multilingual cultural explanation content based on attraction-related information of the target area and playing the multilingual cultural explanation content includes: generating personalized cultural explanation content in the corresponding language based on the identified attraction-related information, combined with pre-set language preferences and explanation depth requirements, using a natural language processing model, sending the generated cultural explanation content back to the travel converter, playing the cultural explanation content through a speaker, and simultaneously pushing the cultural explanation content to the bound mobile terminal for display and local storage.

[0155] In some embodiments, the step of triggering a preset continuous dialogue mechanism after receiving a preset wake-up word after playing the multilingual cultural explanation content includes: after completing the playback of the multilingual cultural explanation content, collecting audio data in the environment where the terminal is located, performing wake-up word matching detection on the collected audio data, and when audio content matching the preset wake-up word is detected, triggering the continuous dialogue mechanism, opening a real-time voice acquisition and interaction channel of preset duration, and maintaining the association state of the dialogue context.
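The wake-word-triggered dialogue window can be sketched as a small session object. For simplicity this example matches the wake word against text transcripts rather than raw audio, and it refreshes the window on every utterance handled inside it; the class name, return labels, and 30-second default are all assumptions.

```python
class DialogueSession:
    """Opens a time-limited dialogue window after the wake word is heard."""

    def __init__(self, wake_word, window_s=30.0):
        self.wake_word = wake_word.lower()
        self.window_s = window_s     # preset channel duration (assumed value)
        self.expires_at = None       # None means no window is open
        self.context = []            # utterances kept for context association

    def on_transcript(self, text, now):
        """Process one utterance; returns 'opened', 'handled', or 'ignored'."""
        # Close an expired window and drop its context.
        if self.expires_at is not None and now >= self.expires_at:
            self.expires_at = None
            self.context = []
        if self.expires_at is not None:
            self.context.append(text)
            self.expires_at = now + self.window_s  # refresh window (assumption)
            return "handled"
        if self.wake_word in text.lower():
            self.expires_at = now + self.window_s
            return "opened"
        return "ignored"
```

Keeping the utterances of the open window in `context` is what allows follow-up questions such as "In what year was it created?" to be resolved against the exhibit discussed earlier.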

[0156] In some embodiments, the step of recognizing and semantically parsing the received real-time speech and generating corresponding navigation or explanatory information includes: performing speech recognition and semantic parsing on the real-time speech collected under the continuous dialogue mechanism, using the contextual understanding model in the cloud to complete coreference resolution and continuous semantic association, and completing cross-language real-time translation according to the pre-set language; and/or generating corresponding navigation or supplementary explanatory information based on the user needs obtained from semantic parsing, playing it through the terminal and synchronously pushing it to the user's bound mobile terminal.

[0157] In some embodiments, the method further includes: collecting corresponding historical travel trajectory data, historical interaction data, and stay duration data at each attraction, so as to extract corresponding travel route preferences, explanation content preferences, and itinerary rhythm preferences through a preset user preference analysis model; generating personalized recommended routes and pre-explanatory content for corresponding attractions based on the user's current location, the real-time opening status of attractions, and cross-border travel arrangements; and triggering the playback of pre-explanatory content when entering the preset trigger range of the corresponding attraction.

[0158] In some embodiments, the method further includes: detecting the current network communication status, positioning signal strength, and power grid compatibility status of the travel converter; when the network communication status is detected to be lower than a preset normal operating threshold, loading pre-downloaded offline navigation map data, attraction explanation data package, and offline speech recognition model for the corresponding area; and completing the positioning and navigation processing, speech recognition parsing, and playback of attraction explanation content locally to maintain the continuity of navigation and explanation services in environments without or with weak networks.

[0159] It should be noted that those skilled in the art will understand that, for the sake of convenience and brevity, the specific working processes of the multimodal AI interactive global intelligent navigation and cultural interpretation system and its modules described above can be found in the corresponding contents of the various embodiments of the multimodal AI interactive global intelligent navigation and cultural interpretation method, and will not be repeated here.

[0160] The aforementioned multimodal AI-interactive global intelligent navigation and cultural interpretation method can be implemented as a computer program, which can run on, for example, the device shown in Figure 3.

[0161] Please refer to Figure 4, which is a schematic block diagram of the structure of a travel converter provided in an embodiment of this application. The travel converter includes a processor, a memory, and a network interface connected via a system bus, wherein the memory may include a storage medium and internal memory.

[0162] The storage medium can store an operating system and a computer program. The computer program includes program instructions that, when executed, cause the processor to perform any multimodal AI interactive global intelligent navigation and cultural interpretation method.

[0163] The processor provides computing and control capabilities to support the operation of the entire travel converter.

[0164] The internal memory provides an environment for running the computer program stored in the non-volatile storage medium. When executed by the processor, the computer program enables the processor to perform any multimodal AI interactive global intelligent navigation and cultural interpretation method.

[0165] The network interface is used for network communication, such as sending assigned tasks. Those skilled in the art will understand that the structure shown in Figure 4 is merely a block diagram of a portion of the structure related to the present application and does not constitute a limitation on the terminal to which the present application is applied. Specific travel converters may include more or fewer components than those shown in the figure, or combine certain components, or have different component arrangements.

[0166] It should be understood that the processor can be a Central Processing Unit (CPU), but it can also be other general-purpose processors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. Among these, a general-purpose processor can be a microprocessor or any conventional processor.

[0167] In one embodiment, the processor is configured to run a computer program stored in memory to perform the following steps: acquiring multi-source positioning data, including fused satellite positioning data, base station positioning information, and local wireless LAN scanning signals; performing dynamic weighting processing on the multi-source positioning data to generate and output navigation guidance; acquiring image data and voice requests corresponding to the target area, identifying scenic spot information in the target area, generating multilingual cultural explanation content based on the scenic spot information in the target area, and playing the multilingual cultural explanation content; and, after playing the multilingual cultural explanation content, if a preset wake-up word is received, triggering a preset continuous dialogue mechanism to recognize and semantically analyze the received real-time voice and generate corresponding navigation or explanation information.

[0168] In some embodiments, acquiring multi-source positioning data includes: collecting satellite positioning data, base station positioning information, and local wireless LAN scanning signals corresponding to the current location of the travel converter; verifying the validity of the satellite positioning data, base station positioning information, and local wireless LAN scanning signals respectively; removing invalid data that exceeds a preset error range; and retaining multi-source positioning data that meets the accuracy requirements.

[0169] In some embodiments, the dynamic weighting processing of multi-source positioning data to generate and output navigation guidance includes: assigning dynamically changing weight values to the fused satellite positioning data, base station positioning information, and local wireless LAN scanning signals based on the signal strength corresponding to the multi-source positioning data and the current environmental scene; completing the fusion processing of multi-source positioning data based on the assigned weight values; smoothing and predictive compensation of the positioning trajectory obtained after fusion processing; outputting continuous and stable multi-source positioning data in weak signal environments including tunnels and densely built-up areas; generating navigation guidance based on the multi-source positioning data and user-preset destination information; and outputting the navigation guidance through the terminal speaker or the user-bound mobile terminal.

[0170] In some embodiments, identifying attraction-related information in the target area includes: extracting and matching features of target objects in the image data corresponding to the target area to identify attraction identity information corresponding to the target objects; parsing user demand information contained in the voice request to match attraction-related information corresponding to the attraction identity information and user demand information.

[0171] In some embodiments, generating multilingual cultural explanation content based on attraction-related information of the target area and playing the multilingual cultural explanation content includes: generating personalized cultural explanation content in the corresponding language based on the identified attraction-related information, combined with pre-set language preferences and explanation depth requirements, using a natural language processing model, sending the generated cultural explanation content back to the travel converter, playing the cultural explanation content through a speaker, and simultaneously pushing the cultural explanation content to the bound mobile terminal for display and local storage.

[0172] In some embodiments, the step of triggering a preset continuous dialogue mechanism after receiving a preset wake-up word after playing the multilingual cultural explanation content includes: after completing the playback of the multilingual cultural explanation content, collecting audio data in the environment where the terminal is located, performing wake-up word matching detection on the collected audio data, and when audio content matching the preset wake-up word is detected, triggering the continuous dialogue mechanism, opening a real-time voice acquisition and interaction channel of preset duration, and maintaining the association state of the dialogue context.

[0173] In some embodiments, the step of recognizing and semantically parsing the received real-time speech and generating corresponding navigation or explanatory information includes: performing speech recognition and semantic parsing on the real-time speech collected under the continuous dialogue mechanism, using the contextual understanding model in the cloud to complete coreference resolution and continuous semantic association, and completing cross-language real-time translation according to the pre-set language; and/or generating corresponding navigation or supplementary explanatory information based on the user needs obtained from semantic parsing, playing it through the terminal and synchronously pushing it to the user's bound mobile terminal.

[0174] In some embodiments, the method further includes: collecting corresponding historical travel trajectory data, historical interaction data, and stay duration data at each attraction, so as to extract corresponding travel route preferences, explanation content preferences, and itinerary rhythm preferences through a preset user preference analysis model; generating personalized recommended routes and pre-explanatory content for corresponding attractions based on the user's current location, the real-time opening status of attractions, and cross-border travel arrangements; and triggering the playback of pre-explanatory content when entering the preset trigger range of the corresponding attraction.
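
The trigger-range check at the end of this paragraph can be illustrated with a simple geofence test. The sketch below is an assumption about one way to do it, using the standard haversine great-circle distance; the function names, the dictionary keys (`id`, `lat`, `lon`, `radius_m`), and the idea of tracking already-played attractions are hypothetical and do not come from the application.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def attractions_to_trigger(position, attractions, already_played):
    """Return ids of attractions whose trigger range contains `position`.

    `attractions` is a list of dicts with hypothetical keys
    id, lat, lon and radius_m (the preset trigger range).
    Attractions whose id is in `already_played` are skipped.
    """
    lat, lon = position
    hits = []
    for spot in attractions:
        if spot["id"] in already_played:
            continue
        if haversine_m(lat, lon, spot["lat"], spot["lon"]) <= spot["radius_m"]:
            hits.append(spot["id"])
    return hits
```

A caller would run `attractions_to_trigger` on each new fused position fix and start playback of the pre-explanatory content for every returned id.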

[0175] In some embodiments, the method further includes: detecting the current network communication status, positioning signal strength, and power grid compatibility status of the travel converter; when the network communication status is detected to be lower than a preset normal operating threshold, loading pre-downloaded offline navigation map data, attraction explanation data package, and offline speech recognition model for the corresponding area; and completing the positioning and navigation processing, speech recognition parsing, and playback of attraction explanation content locally to maintain the continuity of navigation and explanation services in environments without or with weak networks.
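
The offline fallback decision can be sketched as follows. This is a minimal illustration under stated assumptions: the network status is reduced to a single quality score compared against a threshold, and the names `plan_services`, `ServicePlan`, and `OFFLINE_RESOURCES`, as well as the "degraded" mode for regions with incomplete downloads, are hypothetical additions that the application does not describe.

```python
from dataclasses import dataclass, field

# The three pre-downloaded resources named in the text (identifiers are hypothetical).
OFFLINE_RESOURCES = ("navigation_map", "explanation_package", "speech_model")


@dataclass
class ServicePlan:
    mode: str                                     # "online", "offline" or "degraded"
    load: list = field(default_factory=list)      # offline resources to load locally
    missing: list = field(default_factory=list)   # resources not pre-downloaded


def plan_services(network_quality, threshold, downloaded, region):
    """Decide how to serve navigation and explanations for `region`.

    `network_quality` and `threshold` share an arbitrary scale (assumed here);
    `downloaded` maps a region code to the set of resources already stored.
    """
    if network_quality >= threshold:
        return ServicePlan(mode="online")
    available = downloaded.get(region, set())
    load = [r for r in OFFLINE_RESOURCES if r in available]
    missing = [r for r in OFFLINE_RESOURCES if r not in available]
    # Fully offline operation needs all three resources; otherwise report what is missing.
    mode = "offline" if not missing else "degraded"
    return ServicePlan(mode=mode, load=load, missing=missing)
```

Returning the missing resources separately lets the device prompt the user to download them before entering an area with weak or no network coverage.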

[0176] This application also provides a computer-readable storage medium storing a computer program that, when executed by a processor, causes the processor to implement the steps of the multimodal AI interactive global intelligent navigation and cultural interpretation method provided in any embodiment of this application.

[0177] The computer-readable storage medium can be an internal storage unit of the travel converter described in the foregoing embodiments, such as the hard drive or memory of the travel converter. Alternatively, the computer-readable storage medium can be an external storage device of the travel converter, such as a plug-in hard drive, Smart Media Card (SMC), Secure Digital (SD) card, or flash card.

[0178] The above description is merely a specific embodiment of this application, but the scope of protection of this application is not limited thereto. Any person skilled in the art can easily conceive of various equivalent modifications or substitutions within the technical scope disclosed in this application, and these modifications or substitutions should all be covered within the scope of protection of this application. Therefore, the scope of protection of this application should be determined by the scope of the claims.

Claims

1. A multimodal AI-interactive global intelligent navigation and cultural interpretation method, characterized in that the method is applied to a travel converter and includes: acquiring multi-source positioning data, including fused satellite positioning data, base station positioning information, and local wireless LAN scanning signals; performing dynamic weighting processing on the multi-source positioning data to generate and output navigation guidance; acquiring image data and a voice request corresponding to a target area, identifying scenic spot information in the target area, generating multilingual cultural explanation content based on the scenic spot information in the target area, and playing the multilingual cultural explanation content; and after playing the multilingual cultural explanation content, if a preset wake-up word is received, triggering a preset continuous dialogue mechanism to recognize and semantically parse the received real-time voice and generate corresponding navigation or explanation information.

2. The method according to claim 1, characterized in that acquiring the multi-source positioning data includes: collecting satellite positioning data, base station positioning information, and local wireless LAN scanning signals corresponding to the current location of the travel converter; and verifying the validity of the satellite positioning data, base station positioning information, and local wireless LAN scanning signals, discarding invalid data that exceeds a preset error range and retaining multi-source positioning data that meets the accuracy requirements.

3. The method according to claim 2, characterized in that performing dynamic weighting processing on the multi-source positioning data to generate and output navigation guidance includes: based on the signal strength corresponding to the multi-source positioning data and the current environmental scenario, assigning dynamically changing weight values to the satellite positioning data, base station positioning information, and local wireless LAN scanning signals, and completing the fusion processing of the multi-source positioning data based on the assigned weight values; performing smoothing and prediction compensation on the fused positioning trajectory, and outputting continuous and stable multi-source positioning data in weak signal environments including tunnels and densely built-up areas; and generating navigation guidance based on the multi-source positioning data and the user's preset destination information, and outputting the navigation guidance through the terminal speaker or the user's bound mobile terminal.

4. The method according to claim 3, characterized in that identifying the scenic spot information in the target area includes: performing feature extraction and feature matching on the target objects in the collected image data corresponding to the target area to identify the scenic spot identity information corresponding to the target objects; and parsing the user request information contained in the voice request, and matching on the scenic spot identity information and the user request information to obtain the scenic spot information.

5. The method according to claim 4, characterized in that generating multilingual cultural explanation content based on the scenic spot information in the target area and playing the multilingual cultural explanation content includes: based on the identified scenic spot information, combined with pre-set language preferences and the required depth of explanation, generating personalized cultural explanation content in the corresponding language using a natural language processing model; and sending the generated cultural explanation content back to the travel converter, playing it through a speaker, and simultaneously pushing it to the bound mobile terminal for display and local storage.

6. The method according to claim 1, characterized in that triggering a preset continuous dialogue mechanism if a preset wake-up word is received after playing the multilingual cultural explanation content includes: after playing the multilingual cultural explanation content, collecting audio data from the environment in which the terminal is located; performing wake-up word matching detection on the collected audio data; and when audio content that matches the preset wake-up word is detected, triggering the continuous dialogue mechanism, opening a real-time voice acquisition and interaction channel for a preset duration, and maintaining the contextual association of the dialogue.

7. The method according to claim 1, characterized in that recognizing and semantically parsing the received real-time speech and generating corresponding navigation or explanatory information includes: performing speech recognition and semantic parsing on real-time speech collected under the continuous dialogue mechanism, using a cloud-based contextual understanding model to complete coreference resolution and continuous semantic association, and performing real-time cross-language translation based on the pre-set language; and / or, based on the user needs obtained from semantic parsing, generating corresponding navigation information or supplementary explanatory information, playing it through the terminal, and simultaneously pushing it to the user's bound mobile terminal.

8. The method according to claim 1, characterized in that the method further includes: collecting corresponding historical travel trajectory data, historical interaction data, and stay duration data at each attraction, and extracting corresponding travel route preferences, explanation content preferences, and itinerary pace preferences through a preset user preference analysis model; based on the user's current location, the real-time opening status of attractions, and cross-border travel arrangements, generating a personalized recommended route and corresponding pre-explanatory content for each attraction to meet the user's needs; and when the user enters the preset trigger range of the corresponding attraction, triggering playback of the pre-explanatory content.

9. The method according to claim 1, characterized in that the method further includes: detecting the current network communication status, positioning signal strength, and power grid compatibility status of the travel converter; when the network communication status is detected to be lower than the preset normal operating threshold, loading the pre-downloaded offline navigation map data, attraction explanation data package, and offline speech recognition model for the corresponding area; and completing positioning and navigation processing, speech recognition and parsing, and playback of attraction explanation content locally, maintaining the continuity of navigation and explanation services in environments with no network or a weak network.

10. A multimodal AI-interactive global intelligent navigation and cultural interpretation system, applied to a travel converter, characterized in that the system is used for implementing the method according to any one of claims 1-9 and comprises: a data acquisition unit, used to acquire multi-source positioning data, including fused satellite positioning data, base station positioning information, and local wireless LAN scanning signals, to perform dynamic weighting processing on the multi-source positioning data, and to generate and output navigation guidance; a request acquisition unit, used to acquire image data and voice requests corresponding to the target area, to identify scenic spot-related information in the target area, to generate multilingual cultural explanation content based on the scenic spot-related information in the target area, and to play the multilingual cultural explanation content; and an explanation and playback unit, used to trigger a preset continuous dialogue mechanism if a preset wake-up word is received after the multilingual cultural explanation content is played, and to recognize and semantically parse the received real-time speech and generate corresponding navigation or explanation information.