Tourism user demand mining system based on multi-modal data
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- SHEXIANGJIA (SICHUAN) TOURISM CO LTD
- Filing Date
- 2026-04-03
- Publication Date
- 2026-06-30
Figure CN122309712A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the fields of natural language processing and multimodal intelligent analysis technology, and specifically to a tourism user demand mining system based on multimodal data.
Background Technology
[0002] With the continuous upgrading of mass tourism consumption levels, tourists' travel patterns have gradually shifted from traditional standardized group tours to highly personalized customized tours. Under this trend, intelligent tourism recommendation and itinerary planning systems have emerged. Currently, existing tourism demand mining and recommendation technologies mainly rely on users manually selecting structured tags (such as destination, number of days, and budget range) on the front-end interface, or entering query terms through a simple single text search box. The system then performs rigid keyword matching in a relational database to return a fixed route template.
[0003] However, this traditional processing model reveals significant limitations when dealing with the increasingly complex and ambiguous personalized demands of modern tourists. At the front-end input stage, users' true travel intentions are often multidimensional and unstructured. For example, a user might upload a picture of a "peaceful beach they long for," along with a voice message expressing, "I want to take my elderly relative with mobility issues to a less crowded place to relax for a few days." Existing systems, lacking multimodal intelligent analysis capabilities, struggle to effectively align and deeply integrate pixel-level semantic features in visual space with implicit emotional tendencies and entity associations in natural language. This results in demand mining remaining at a superficial literal matching level, easily overlooking crucial contextual intentions.
[0004] Traditional technologies for resource retrieval and profile building typically employ isolated data table queries, lacking a knowledge graph-based understanding of the potential topological relationships and subordinate attributes among massive tourism resources (such as attractions, hotels, and transportation facilities). Therefore, they cannot perform deep resource reasoning and generalized recommendations through semantic similarity or subgraph matching. Simultaneously, existing systems often present static and fixed user demand profiles. When faced with users' long-term historical consumption habits and specific, sudden vacation demands, they fail to introduce a dynamic decay mechanism based on time series data for weight rebalancing. This results in recommendations that are often rigid and lack timeliness and true personalized depth.
[0005] Furthermore, in the final trip scheduling stage, traditional solutions mostly employ simple rule concatenation or basic greedy algorithms, neglecting the multi-objective strategic considerations between trip matching, transportation transfer costs, and time utilization efficiency. This often results in poor user experiences due to route reversals or excessively strenuous trips. More importantly, these "black box" trip schedules generated by underlying algorithms are often presented directly and rigidly to users. The system cannot trace the generation process behind the schedule, nor can it provide users with logically coherent and persuasive natural language recommendations. This significantly reduces users' trust in customized trips and their willingness to ultimately adopt them.
[0006] Therefore, there is an urgent need in this field for a system or method that deeply integrates multimodal features, performs intelligent graph-based retrieval, dynamically evolves user profiles, and generates highly interpretable multi-objective itinerary schedules.
Summary of the Invention
[0007] To address the shortcomings of existing technologies, this invention provides a tourism user demand mining system based on multimodal data. Existing intelligent tourism recommendation systems cannot deeply integrate and analyze unstructured multimodal demand data, and they lack dynamic evolution mechanisms for user profiles as well as multi-objective scheduling capabilities. As a result, the customized itinerary intentions they mine suffer from low matching accuracy, poor timeliness, and low interpretability. The present system solves these technical problems. To achieve the above objectives, the present invention provides a tourism user demand mining system based on multimodal data, comprising:
a multimodal data acquisition module, used to acquire multimodal tourism data, including text, voice, and images, input by the user;
a semantic analysis and multimodal parsing module, used to segment the text in the multimodal tourism data, perform semantic analysis on each modality of data to extract the semantic features corresponding to each modality, and then obtain the user's tourism demand features through semantic fusion;
a knowledge graph retrieval and matching module, used to map the user's tourism demand features to a pre-constructed tourism knowledge graph for semantic retrieval and to obtain candidate tourism resource entities; and
a demand mining and itinerary generation module, used to construct a user demand profile based on the candidate tourism resource entities and the user's tourism demand features, and to generate a target customized itinerary.
[0008] When performing semantic analysis on the speech and text in the multimodal tourism data, the semantic analysis and multimodal parsing module converts the speech in the multimodal tourism data into speech-text; it then uses machine translation to convert the speech-text in the non-target language and the text in the multimodal tourism data into a unified language text; it performs word segmentation and grammatical analysis on the unified language text to extract demand triples containing user, action, and object information; and it uses a pre-trained language model to perform semantic classification and sentiment analysis on the demand triples to obtain text semantic features.
[0009] When performing semantic analysis on images in the multimodal tourism data, the semantic analysis and multimodal parsing module uses a multimodal visual model to perform semantic segmentation and semantic recognition on the images, extracting visual features corresponding to scenes and objects in the images; and uses optical character recognition technology to perform character recognition on the text in the images to supplement the text semantic features.
[0010] When extracting the user's travel demand features and performing semantic fusion, the semantic analysis and multimodal parsing module calculates the association weight between the text semantic features and the visual features through semantic processing; determines the demand priority in the text semantic features based on word frequency statistics; and fuses the text semantic features with the visual features according to the association weight and the demand priority, outputting structured user travel demand features. Specifically, let the text semantic features be represented as the feature vector $F_t$, the visual features as the feature vector $F_v$, the association weight between the text and visual features as $\alpha$, and the demand priority weight matrix constructed from word frequency statistics as $W_p$. The semantic fusion process calculates the structured user travel demand feature vector $F_u$ using the following formula:
$$F_u = W_p \left( \alpha F_t + (1 - \alpha) F_v \right)$$
[0011] This fusion mechanism achieves dynamic balancing of data from different modalities by associating weights and strengthens users’ high-frequency core needs using a priority matrix.
[0012] The system also includes a graph construction module for pre-constructing and obtaining the pre-constructed tourism knowledge graph. The graph construction module defines tourism resource entity types and their attributes, and constructs the relationship structure between various entities; it integrates external tourism data for knowledge extraction and alignment, generates relationship triples between tourism entities, and stores them in a graph database to form the pre-constructed tourism knowledge graph.
[0013] The knowledge graph retrieval and matching module uses a knowledge graph embedding model to vectorize the entities and relationships in the pre-built tourism knowledge graph; it vectorizes the user's tourism demand features and calculates the semantic similarity between the vectorized user tourism demand features and the entities in the pre-built tourism knowledge graph for semantic retrieval; and it extracts candidate tourism resource entities associated with the user's tourism demand features through subgraph matching. In the semantic similarity calculation step, let the vectorized user tourism demand features be $V_u$ and the vector representation of a tourism resource entity in the knowledge graph be $V_e$. The semantic similarity $\mathrm{Sim}(V_u, V_e)$ is then calculated using the cosine metric:
$$\mathrm{Sim}(V_u, V_e) = \frac{V_u \cdot V_e}{\lVert V_u \rVert \, \lVert V_e \rVert}$$
[0014] where $\lVert V_u \rVert$ and $\lVert V_e \rVert$ are the magnitudes of the corresponding feature vectors. The system sets a similarity threshold, and entities whose similarity is greater than this preset threshold are selected as candidate tourism resource entities, thereby completing the mapping from cross-modal features to graph entities.
[0015] The system also includes a natural language interaction module, which receives natural language queries from users, uses an intent recognition model to fill in missing tourism demand information, and, based on a retrieval-augmented generation architecture, inputs the obtained candidate tourism resource entities as context into a large language model. The large language model then performs machine question answering and outputs recommendation guidance and explanatory text in response to the user's natural language query.
[0016] The demand mining and itinerary generation module integrates the user's travel demand features after semantic fusion to construct a user demand profile including destination preferences, attraction type preferences, and itinerary compactness preferences. When the user's travel demand features are updated based on new multimodal travel data, the user demand profile is recalculated to dynamically update it. During this dynamic update, the system employs a time-series-based decay update mechanism. Let the profile feature vector at the historical time node be $P_{t-1}$, the latest profile feature vector extracted from the newly added multimodal tourism data be $P_{\text{new}}$, and the time decay factor be $\lambda$, with $\lambda$ between 0 and 1. The updated user demand profile vector $P_t$ is:
$$P_t = \lambda P_{t-1} + (1 - \lambda) P_{\text{new}}$$
[0017] The demand mining and itinerary generation module uses the candidate tourism resource entities as decision variables and the destination preferences, attraction type preferences, and schedule compactness preferences from the user demand profile as constraints to construct a constraint satisfaction problem-solving model. A multi-objective optimization algorithm is introduced to balance the three optimization objectives of demand matching degree, transportation cost, and time efficiency, generating the customized itinerary. During the solution process of the multi-objective optimization algorithm, a comprehensive objective evaluation function is constructed to evaluate each generated candidate route $R$ quantitatively. Let the demand matching degree evaluation function be $M(R)$, the transportation cost evaluation function be $C(R)$, and the time efficiency evaluation function be $E(R)$, with corresponding non-negative weight coefficients $w_1$, $w_2$, and $w_3$. The comprehensive objective evaluation function $J(R)$, which the optimization seeks to maximize, is:
$$J(R) = w_1 M(R) - w_2 C(R) + w_3 E(R)$$
[0018] The system searches the solution space that satisfies the constructed constraints, and the candidate route that maximizes $J(R)$ is selected as the final target customized itinerary.
[0019] The demand mining and itinerary generation module outputs a structured itinerary table containing the target customized itinerary, and calls a pre-built large language model that uses the chain-of-thought technique to annotate each candidate tourism resource entity in the target customized itinerary with the recommendation reasoning process and reasons.
[0020] This invention provides a tourism user demand mining system based on multimodal data. It has the following beneficial effects: 1. This solution completely breaks through the limitations of traditional tourism systems that rely on single text retrieval by employing a multimodal semantic fusion mechanism at the underlying feature level. The system can analyze users' voice requests and inspirational images in parallel, and mathematically align visual scene features with textual request priorities using dynamically calculated association weights. This cross-modal information capture method not only fills the semantic gaps that are easily caused by single expression methods, but also accurately extracts the sensory and aesthetic preferences that users find difficult to concretize in words.
[0021] 2. By leveraging the deep coupling of a pre-trained language model and a tourism knowledge graph, this system achieves a technological leap from shallow "keyword matching" to deep "real intent insight." After extracting the demand triples containing user, action, and object information, the system transforms them into high-dimensional vectors and performs cosine similarity detection within the entity relationship network of the knowledge graph. This process accurately maps scattered explicit user statements (e.g., "taking elderly people to see the beach") to high-quality entities with implicit associations in the knowledge graph (e.g., "a beach with a gentle slope and complete medical facilities"), significantly improving the accuracy of resource mining.
[0022] 3. To address the pain point that users often have vague and disjointed needs in the early stages of travel planning, the system introduces a natural language interaction architecture based on retrieval-augmented generation (RAG). When core demand information is detected to be missing, the system fills the missing slots using an intent recognition model and proactively provides explanations and guidance through machine question answering, much like a professional guide. This mechanism reduces the user's cognitive burden and the effort needed to express their needs, and completes the closed-loop collection of demand information efficiently.
[0023] 4. In the core route generation stage, this invention constructs a constraint satisfaction problem-solving model with entities as decision variables and the user profile as the boundary. The scheduling process is no longer a crude accumulation of attractions; instead, it introduces a multi-objective optimization algorithm to find the global optimal solution among three often interdependent parameters: demand matching degree, transportation transfer cost, and schedule compactness. This ensures that the final itinerary aligns as closely as possible with the user's personalized interests while remaining time-efficient and physically manageable in practice.
[0024] 5. When outputting structured itineraries, this system uses the chain-of-thought (CoT) technique of a large language model to trace the reasoning behind the scheduling results. While delivering the route, the system annotates the contextual basis and reasoning process for selecting each candidate resource entity. This highly interpretable "white-box" recommendation presentation removes users' concerns about blind scheduling by pure machine algorithms and establishes a high level of trust in the decision-making process.
Attached Figure Description
[0025] Figure 1 is a system structure diagram of the present invention;
Figure 2 is an overall flowchart of the method of the present invention;
Figure 3 is a schematic diagram illustrating the principle of multimodal semantic parsing and feature fusion in the present invention;
Figure 4 is a schematic diagram of the tourism knowledge graph architecture and vector retrieval matching of the present invention;
Figure 5 is a flowchart illustrating the natural language interaction and dynamic update of the demand profile in the present invention;
Figure 6 is a schematic diagram illustrating the multi-objective optimization scheduling and chain-of-thought rendering of the present invention.
Detailed Implementation
[0026] The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0027] Referring to Figures 1-6, this invention provides a tourism user demand mining system based on multimodal data, including a multimodal data acquisition module, a semantic analysis and multimodal parsing module, a knowledge graph retrieval and matching module, and a demand mining and itinerary generation module.
[0028] The processing flow of this tourism user demand mining system based on multimodal data can include the following steps during actual operation:
acquiring multimodal tourism data from user input, including text, voice, and images;
segmenting the text in the multimodal tourism data into words, performing semantic analysis on each modality to extract the semantic features corresponding to each modality, and then obtaining the user's tourism demand features through semantic fusion;
mapping the user's tourism demand features to a pre-constructed tourism knowledge graph for semantic retrieval to obtain candidate tourism resource entities; and
constructing a user demand profile based on the candidate tourism resource entities and the user's tourism demand features, and generating a target customized itinerary.
[0029] During the multimodal data acquisition phase, the multimodal data acquisition module receives heterogeneous data streams uploaded by the client through a pre-defined application programming interface (API). The system simultaneously listens for and records text commands typed by the user in the interactive interface, recorded audio clips, and uploaded visual images.
[0030] These heterogeneous data are uniformly packaged into a multimodal tourism data stream, with timestamps and user identification added, and then sent to the downstream semantic analysis and multimodal parsing modules for deep feature extraction.
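The packaging step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the class name `MultimodalPacket`, the function `package_inputs`, and all field names are hypothetical.

```python
import time
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MultimodalPacket:
    """One unit of the multimodal tourism data stream (hypothetical layout)."""
    user_id: str
    text: Optional[str] = None
    audio: Optional[bytes] = None
    image: Optional[bytes] = None
    # Timestamp is attached at packaging time, as the description requires.
    timestamp: float = field(default_factory=time.time)

    def modalities(self) -> List[str]:
        """Return the names of the modalities actually present."""
        present = []
        if self.text:
            present.append("text")
        if self.audio:
            present.append("audio")
        if self.image:
            present.append("image")
        return present


def package_inputs(user_id: str, text: Optional[str] = None,
                   audio: Optional[bytes] = None,
                   image: Optional[bytes] = None) -> MultimodalPacket:
    """Bundle heterogeneous inputs into one packet for downstream parsing."""
    packet = MultimodalPacket(user_id=user_id, text=text, audio=audio, image=image)
    if not packet.modalities():
        raise ValueError("at least one modality is required")
    return packet
```

A downstream parser could branch on `packet.modalities()` to decide which processing branches to start.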
[0031] When performing semantic analysis on the speech and text in the multimodal tourism data, the system first calls the speech recognition engine to convert the captured dynamic speech segments into discrete speech-text sequences.
[0032] Subsequently, the system introduces a machine translation component to convert the non-target-language speech-text and the initial input text into a single target language, eliminating the semantic barriers caused by multilingual input and outputting unified language text.
[0033] The system performs word segmentation and grammatical analysis on the unified language text. During this process, the system uses natural language processing techniques to construct a dependency syntax tree, strips redundant modifiers from complex natural language expressions, and extracts demand triples containing user, action, and object information.
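As a rough illustration of triple extraction from a dependency parse, the sketch below works on tokens that have already been parsed. The `Token` structure and the dependency labels (`nsubj`, `ROOT`, `dobj`) are assumptions borrowed from common parser conventions; the patent does not name a parser or label set.

```python
from typing import List, NamedTuple, Tuple


class Token(NamedTuple):
    text: str
    dep: str   # dependency label (assumed convention: nsubj, ROOT, dobj, ...)
    head: int  # index of the head token in the sentence


def extract_triples(tokens: List[Token]) -> List[Tuple[str, str, str]]:
    """Collect (user, action, object) triples around each root verb.

    Modifiers such as determiners and adjectives are simply not copied,
    which mirrors the 'strip redundant modifiers' step.
    """
    triples = []
    for i, tok in enumerate(tokens):
        if tok.dep != "ROOT":
            continue
        subj = next((t.text for t in tokens if t.dep == "nsubj" and t.head == i), None)
        obj = next((t.text for t in tokens if t.dep == "dobj" and t.head == i), None)
        if subj and obj:
            triples.append((subj, tok.text, obj))
    return triples


# Hand-written parse of "I want a nice beach" (illustrative only).
parsed = [
    Token("I", "nsubj", 1),
    Token("want", "ROOT", 1),
    Token("a", "det", 4),
    Token("nice", "amod", 4),
    Token("beach", "dobj", 1),
]
print(extract_triples(parsed))  # [('I', 'want', 'beach')]
```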
[0034] After extracting the demand triples, the system uses a pre-trained language model to vectorize them, simultaneously performing semantic classification and sentiment analysis to assess users' preferences for specific tourism elements, and then outputs fixed-dimensional text semantic features. Let this text semantic feature be represented as a feature vector $F_t$ in a multi-dimensional vector space.
[0035] For visual modal data input, the semantic analysis and multimodal parsing modules initiate image processing branches in parallel. The system utilizes a pre-built multimodal visual model to perform pixel-level semantic segmentation and object detection on the image, accurately identifying specific scene types and entity objects contained within the image.
[0036] Based on the above visual recognition results, the system extracts the visual features corresponding to the image, and these visual features are represented as a feature vector $F_v$ in the same vector space. Simultaneously, the system scans the text regions in the image using optical character recognition technology, converting the extracted road sign or storefront characters in the background into supplementary word embedding vectors to enrich and correct the aforementioned text semantic features $F_t$.
[0037] After extracting the features of each independent modality, the semantic analysis and multimodal parsing module performs cross-modal semantic fusion to obtain comprehensive and structured user travel demand features.
[0038] Through feature cross-referencing and semantic processing algorithms, the system calculates the correlation in spatial distribution between the text semantic features $F_t$ and the aforementioned visual features $F_v$, and outputs the association weight $\alpha$ between them. This weight quantifies the relative importance of text descriptions and image content in expressing the current user's true intent.
[0039] Simultaneously, the system performs word frequency statistics on all structured words parsed from the multimodal input. For core words whose frequency exceeds a preset benchmark threshold, the system increases their numerical response in the feature vector, thereby determining the demand priority in the text semantic features, and constructs a demand priority weight matrix $W_p$ accordingly.
[0040] Based on the aforementioned association weight $\alpha$ and the aforementioned demand priority weight matrix $W_p$, the system performs semantic fusion on the text semantic features and the visual features. The fusion follows the formula below and outputs the structured user travel demand feature vector $F_u$:
$$F_u = W_p \left( \alpha F_t + (1 - \alpha) F_v \right)$$
[0041] In the above formula, the symbols are defined as follows: $F_u$ is the structured user tourism demand feature vector output after cross-modal alignment and fusion; $W_p$ is a diagonal weight matrix constructed from word frequency statistics and demand priority, whose diagonal elements are positively correlated with the demand priority of the corresponding feature dimension; $\alpha$ is the association weight between the text semantic features and the visual features, restricted to the range from 0 to 1; $F_t$ is the text semantic feature vector after sentiment analysis and optical character recognition assistance; and $F_v$ is the visual feature vector output after image semantic segmentation and object extraction, whose dimension is consistent with that of $F_t$.
[0042] Through the processing of the above formulas, the system achieves numerical unification of unstructured multimodal information in the underlying data space. The introduction of association weights ensures that visual and textual information can be dynamically weighted according to their actual signal-to-noise ratio, while the priority matrix amplifies the core demands repeatedly emphasized by the user, providing accurate input basis without omissions or ambiguities for subsequent graph matching.
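The fusion step can be sketched in pure Python as follows. This is an illustration under assumptions rather than the patented formula: it takes the fusion to be a priority-scaled convex blend of the two vectors, it assumes each feature dimension is tied to one keyword so that word counts can set the diagonal, and the `boost` value is arbitrary.

```python
from collections import Counter
from typing import List, Sequence


def priority_diagonal(words: Sequence[str], dim_keywords: Sequence[str],
                      threshold: int = 1, boost: float = 1.5) -> List[float]:
    """Diagonal of the priority matrix: boost dimensions whose keyword
    occurs more often than `threshold` (hypothetical mapping)."""
    counts = Counter(words)
    return [boost if counts[k] > threshold else 1.0 for k in dim_keywords]


def fuse(text_vec: Sequence[float], visual_vec: Sequence[float],
         alpha: float, diag: Sequence[float]) -> List[float]:
    """Blend text and visual features with weight alpha, then scale each
    dimension by its priority (a diagonal matrix acts element-wise)."""
    if not (len(text_vec) == len(visual_vec) == len(diag)):
        raise ValueError("all vectors must share one dimension")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [d * (alpha * t + (1.0 - alpha) * v)
            for d, t, v in zip(diag, text_vec, visual_vec)]


words = ["beach", "quiet", "beach", "elderly"]
diag = priority_diagonal(words, ["beach", "quiet", "elderly"])
fused = fuse([0.8, 0.6, 0.4], [0.4, 0.2, 0.0], alpha=0.5, diag=diag)
print(diag, fused)
```

Here "beach" appears twice, so only the first dimension is amplified; the other two pass through the blend unchanged.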
[0043] In this embodiment, the system builds the pre-constructed tourism knowledge graph through a graph construction module; this graph serves as the data foundation for the underlying semantic retrieval. The system first defines the types of tourism resource entities and their attributes, and constructs the relationship structure between the various entities.
[0044] Specifically, the types of tourism resources include natural landscapes, historical sites, accommodation facilities, and transportation hubs. The system sets specific attribute fields for each type of entity, such as the entity's geographical coordinates, opening hours, visit duration, and average cost per person. Simultaneously, the system defines the topological connectivity relationships between entities, such as "spatial adjacency," "suitable groups," or "subordinate inclusion."
[0045] After establishing the basic architecture, the system integrates external tourism data for knowledge extraction and alignment. The system utilizes named entity recognition and relation extraction models to extract entity nodes from unstructured external travelogue texts and structured geographic information systems, and eliminates ambiguity based on the unique identifiers of the entities.
[0046] After the above extraction and alignment operations, the system generates relational triples between tourism entities. Each triple consists of a head entity, a relational edge, and a tail entity, objectively describing the inherent connections between tourism resources.
[0047] The system persistently stores all generated relation triples in a graph database, thereby forming the pre-built tourism knowledge graph. This graph transforms fragmented tourism information into a structured semantic network, completely breaking through the performance bottleneck of traditional relational databases during multi-hop queries.
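To make the triple structure concrete, the sketch below keeps head-relation-tail triples in an in-memory adjacency map. It stands in for a graph database only for illustration; the class `TripleStore`, the relation names, and the entity names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


class TripleStore:
    """Tiny in-memory stand-in for a graph database of relation triples."""

    def __init__(self) -> None:
        # head entity -> list of (relation, tail entity)
        self._out: Dict[str, List[Tuple[str, str]]] = defaultdict(list)

    def add(self, head: str, relation: str, tail: str) -> None:
        self._out[head].append((relation, tail))

    def neighbors(self, head: str, relation: Optional[str] = None) -> List[str]:
        """Tails reachable from `head`, optionally filtered by relation."""
        return [t for r, t in self._out.get(head, [])
                if relation is None or r == relation]


store = TripleStore()
store.add("Silver Bay Beach", "suitable_for", "elderly travelers")
store.add("Silver Bay Beach", "adjacent_to", "Harbor Hotel")
store.add("Harbor Hotel", "adjacent_to", "Central Station")

# Two-hop query: what is adjacent to the places adjacent to the beach?
second_hop = [n for first in store.neighbors("Silver Bay Beach", "adjacent_to")
              for n in store.neighbors(first, "adjacent_to")]
print(second_hop)  # ['Central Station']
```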
[0048] In this invention, when the knowledge graph retrieval and matching module obtains the candidate tourism resource entities, it uses a knowledge graph embedding model to vectorize the entities and relationships in the pre-constructed tourism knowledge graph.
[0049] The system maps high-dimensional sparse entity nodes and relation edges to a low-dimensional continuous semantic vector space through a translation distance model or an embedding framework based on graph neural networks, so that entities with similar attributes or similar topological structures in the graph will cluster together in the vector space.
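One well-known translation distance model is TransE, which scores a triple by how closely the head vector plus the relation vector lands on the tail vector. The patent does not name a specific model, so the sketch below is only an assumed example of the family, with hand-picked toy vectors rather than trained embeddings.

```python
import math
from typing import Sequence


def transe_score(head: Sequence[float], relation: Sequence[float],
                 tail: Sequence[float]) -> float:
    """TransE-style plausibility: negative Euclidean distance between
    (head + relation) and tail. Values closer to 0 mean a better fit."""
    return -math.sqrt(sum((h + r - t) ** 2
                          for h, r, t in zip(head, relation, tail)))


beach = [1.0, 0.0]
adjacent_to = [0.0, 1.0]
hotel = [1.0, 1.0]    # lies exactly at beach + adjacent_to
airport = [3.0, 1.0]  # two units away from beach + adjacent_to

print(transe_score(beach, adjacent_to, hotel))    # best possible score
print(transe_score(beach, adjacent_to, airport))  # lower (more negative)
```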
[0050] Simultaneously, the system performs vectorization and feature space projection on the user travel demand features output by the preceding module to generate a unified dimension query vector for retrieval.
[0051] Subsequently, the system calculates the semantic similarity between the vectorized user travel demand features and the entities in the pre-constructed travel knowledge graph to perform semantic retrieval. In this step, the system uses a cosine similarity metric, directly calculating the cosine of the angle between the query vector and the entity vector.
[0052] The specific semantic similarity calculation process follows the formula below:
$$\mathrm{Sim}(V_u, V_e) = \frac{V_u \cdot V_e}{\lVert V_u \rVert \, \lVert V_e \rVert}$$
[0053] In the above formula, the symbols are defined as follows: $\mathrm{Sim}(V_u, V_e)$ is the semantic similarity score between the user's tourism demand features and a specific tourism resource entity; $V_u$ is the user's tourism demand feature vector after vectorization and spatial projection; $V_e$ is the vector representation of a specific entity in the pre-constructed tourism knowledge graph; and $\lVert V_u \rVert$ and $\lVert V_e \rVert$ are the Euclidean norms of the corresponding feature vectors, that is, their magnitudes in multidimensional space.
[0054] By traversing and calculating the similarity scores of a large number of potential entities, the system filters the results based on a preset similarity threshold. Entities with scores below the threshold are removed, and high-scoring entities are locked as the initial core anchor nodes.
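The scoring and threshold filtering described above can be sketched as follows. The function names, the entity names, and the threshold value of 0.7 are illustrative assumptions; the patent only states that a preset threshold is used.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_anchors(query: Sequence[float],
                   entities: Dict[str, Sequence[float]],
                   threshold: float) -> List[str]:
    """Keep entities scoring above the threshold, best match first."""
    scores = {name: cosine_similarity(query, vec) for name, vec in entities.items()}
    kept = [name for name, score in scores.items() if score > threshold]
    return sorted(kept, key=lambda name: scores[name], reverse=True)


entities = {
    "museum": [0.0, 1.0],
    "lake": [1.0, 1.0],
    "beach": [2.0, 0.0],
}
print(select_anchors([1.0, 0.0], entities, threshold=0.7))  # ['beach', 'lake']
```

Because cosine similarity ignores vector length, the "beach" vector scores 1.0 even though it is twice as long as the query.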
[0055] After identifying the core anchor nodes, the system initiates subgraph matching within the pre-constructed tourism knowledge graph. Starting from the core anchor nodes, the system performs multi-level traversals along the graph's relational edges to extract a set of associated entities that satisfy user needs and meet the principle of geographical proximity.
[0056] Finally, the set of associated entities extracted through subgraph matching, along with the aforementioned core anchor nodes, is uniformly archived by the system and output as candidate tourism resource entities associated with the user's tourism demand characteristics. These entities establish precise resource boundaries for downstream modules to perform itinerary combination and route deduction.
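A simple way to picture the expansion from anchor nodes is a breadth-first traversal with a hop limit, as sketched below. The hop limit, the optional `accept` predicate (standing in for the geographical proximity check), and the graph layout are assumptions made for illustration; real subgraph matching may be considerably more involved.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List, Optional, Set, Tuple

Graph = Dict[str, List[Tuple[str, str]]]  # node -> [(relation, neighbor)]


def expand_anchors(graph: Graph, anchors: Iterable[str], max_hops: int = 2,
                   accept: Optional[Callable[[str], bool]] = None) -> Set[str]:
    """Return anchors plus every accepted node within `max_hops` edges."""
    seen: Set[str] = set(anchors)
    queue = deque((a, 0) for a in seen)
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not walk past the hop limit
        for _relation, neighbor in graph.get(node, []):
            if neighbor in seen:
                continue
            if accept is not None and not accept(neighbor):
                continue  # e.g. too far away geographically
            seen.add(neighbor)
            queue.append((neighbor, depth + 1))
    return seen


graph = {
    "Beach": [("adjacent_to", "Hotel")],
    "Hotel": [("adjacent_to", "Station")],
    "Station": [("adjacent_to", "Airport")],
}
print(sorted(expand_anchors(graph, ["Beach"], max_hops=2)))
# ['Beach', 'Hotel', 'Station']
```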
[0057] In this embodiment, for situations where the user's initial input requirements are vague or the constraints are incomplete, the system is equipped with a natural language interaction module. The system receives the user's natural language query and calls a pre-trained intent recognition model to perform sequence labeling and entity extraction on the input natural language statement.
[0058] During intent recognition, the system compares its data with a pre-defined dictionary of travel demand slots to accurately locate missing key information nodes. Subsequently, the system performs slot filling, proactively providing the user with structured follow-up questions to iteratively complete the missing travel demand information, such as specific budget ranges, types of travel companions, or particular transportation preferences.
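The slot comparison and follow-up questioning can be sketched as a dictionary lookup. The slot names and question wording below are hypothetical; the patent mentions budget range, travel companions, and transportation preferences only as examples.

```python
from typing import Dict, List, Optional

# Hypothetical slot dictionary: slot name -> follow-up question.
REQUIRED_SLOTS: Dict[str, str] = {
    "destination": "Where would you like to go?",
    "days": "How many days do you plan to travel?",
    "budget": "What is your approximate budget range?",
    "companions": "Who will be traveling with you?",
}


def missing_slots(filled: Dict[str, Optional[str]]) -> List[str]:
    """Slots that are absent or empty in the information gathered so far."""
    return [slot for slot in REQUIRED_SLOTS if not filled.get(slot)]


def follow_up_questions(filled: Dict[str, Optional[str]]) -> List[str]:
    """One structured question per missing slot, in dictionary order."""
    return [REQUIRED_SLOTS[slot] for slot in missing_slots(filled)]


state = {"destination": "a quiet beach", "companions": "elderly parent"}
print(missing_slots(state))        # ['days', 'budget']
print(follow_up_questions(state))
```

Calling `follow_up_questions` again after each user reply gives the iterative completion loop: it returns an empty list once every slot is filled.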
[0059] After completing the closed-loop collection of demand information, the system processes the above information based on a retrieval-augmented generation architecture. The system serializes the candidate tourism resource entities obtained by the knowledge graph retrieval and matching module in the previous stage, using their objective attribute data as high-quality external knowledge context.
[0060] The system inputs the aforementioned context into a large language model and integrates the user's natural language query to construct combined prompts. Through machine question answering using the large language model, the system anchors the generation boundaries based on externally injected entity knowledge and outputs recommended guidance and explanatory text to the user for the natural language query.
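The prompt assembly can be sketched as plain string construction. The entity layout, the instruction wording, and the function name `build_prompt` are assumptions for illustration; no particular large language model API is implied, and the resulting string would be passed to whichever model the system uses.

```python
from typing import Any, Dict, List


def build_prompt(query: str, entities: List[Dict[str, Any]]) -> str:
    """Serialize candidate entities as context and append the user query."""
    lines = []
    for entity in entities:
        # Sort attributes so the serialized context is deterministic.
        attrs = ", ".join(f"{k}={v}" for k, v in sorted(entity["attributes"].items()))
        lines.append(f"- {entity['name']} ({attrs})")
    context = "\n".join(lines)
    return (
        "Answer using only the tourism resources listed below.\n"
        f"Resources:\n{context}\n"
        f"User query: {query}\n"
    )


candidates = [
    {"name": "Silver Bay Beach", "attributes": {"slope": "gentle", "crowd": "low"}},
    {"name": "Harbor Hotel", "attributes": {"elevator": "yes"}},
]
print(build_prompt("A quiet beach trip with my elderly father", candidates))
```

Restricting the model to the listed resources is what anchors the generation boundary to the retrieved entities.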
[0061] In this invention, to achieve deep personalization of the itinerary, the demand mining and itinerary generation modules simultaneously initiate the construction of long-term representations of user behavior. The system integrates the user's travel demand features after semantic fusion, and performs multi-dimensional vector alignment of the user's historical interaction records and immediate expressed needs.
[0062] The system constructs a user demand profile in the feature space, including destination preferences, attraction type preferences, and schedule compactness preferences. The schedule compactness preference parameter directly maps to the upper limit of a user's acceptable node coverage density and physical transition frequency within a unit of travel period.
[0063] Because user preferences evolve over time, the system establishes an adaptive iterative mechanism for user profiles. When the user's travel demand characteristics are updated based on newly added multimodal travel data, the system triggers a reconstruction command to recalculate the features of the user demand profile in order to dynamically update the user demand profile.
[0064] When performing feature recalculation, the system employs a time-series-based decay update mechanism. This mechanism ensures that the system absorbs the latest preferences while retaining the underlying memory of long-term stable tendencies by adjusting the fusion weights of the new and old feature vectors.
[0065] The specific dynamic update process follows the formula below:
$$P_{\text{new}} = \alpha \, P_{\text{old}} + (1 - \alpha) \, P_{\text{cur}}$$
[0066] In the above formula, the symbols are defined as follows: $P_{\text{new}}$ represents the updated user demand profile vector output after feature recalculation; $P_{\text{old}}$ represents the historical profile feature vector stored internally by the system before receiving new data; $P_{\text{cur}}$ represents the latest profile feature vector obtained by extracting features from the newly added multimodal tourism data and performing the semantic fusion calculation; $\alpha$ represents the time decay factor, whose value is strictly limited to between 0 and 1. The system dynamically sets the value of this factor based on the time interval between the two feature extraction actions.
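As a minimal sketch of the time-series decay update, the following assumes the update is a convex combination of the historical and the latest profile vectors, with the decay factor weighting the historical vector. The mapping from time interval to decay factor is not given in the description; the exponential half-life form, the cap `alpha_max`, and all function and parameter names below are assumptions made for illustration only.

```python
def decay_factor(interval_days, half_life_days=180.0, alpha_max=0.9):
    """Assumed mapping from the time interval to the decay factor.

    The weight kept by the historical profile starts at alpha_max and halves
    every half_life_days, so older history contributes less.
    """
    if interval_days < 0:
        raise ValueError("interval_days must be non-negative")
    return alpha_max * 0.5 ** (interval_days / half_life_days)


def update_profile(p_old, p_cur, alpha):
    """Blend the historical vector p_old with the latest vector p_cur.

    alpha weights the historical vector and (1 - alpha) weights the new one.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie between 0 and 1")
    if len(p_old) != len(p_cur):
        raise ValueError("profile vectors must have the same length")
    return [alpha * old + (1.0 - alpha) * cur for old, cur in zip(p_old, p_cur)]
```

With these assumptions, a short interval keeps most of the historical profile, which damps the effect of a single unusual query, while a long interval lets the newly extracted features dominate.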
[0067] By introducing the aforementioned time-series decay update formula, the system achieves a smooth transition and real-time correction of demand profiles. This feature recalculation logic effectively filters out the drastic fluctuations in the global preference distribution caused by a single sudden or random query input, providing a stable and accurate user preference constraint boundary for the downstream route scheduling algorithm.
[0068] In this embodiment, after completing the initial selection of resource entities and the dynamic construction of user profiles, the demand mining and itinerary generation module initiates a route scheduling mechanism based on operations research when generating the target customized itinerary. The system treats the candidate tourism resource entities as discrete decision variables and extracts the destination preference, attraction type preference, and schedule compactness preference from the user demand profile. The system transforms these preference features into rigid and flexible constraints with clear physical and temporal boundaries, thereby constructing a constraint satisfaction problem-solving model.
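The conversion of profile preferences into rigid and flexible constraints can be sketched as follows. The field names, the choice of which preference is treated as rigid, and the per-day limits derived from the schedule compactness preference are all assumptions for illustration; the description does not fix these details.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical candidate tourism resource entity (a decision variable)."""
    name: str
    destination: str
    kind: str


@dataclass(frozen=True)
class Profile:
    """Hypothetical user demand profile reduced to constraint parameters."""
    destination: str
    preferred_kinds: frozenset
    max_stops_per_day: int      # assumed to derive from schedule compactness
    max_transfers_per_day: int  # assumed to derive from schedule compactness


def satisfies_hard_constraints(day_plan, profile):
    """Rigid constraints: stop count, transfer count, and destination."""
    if len(day_plan) > profile.max_stops_per_day:
        return False
    transfers = max(len(day_plan) - 1, 0)
    if transfers > profile.max_transfers_per_day:
        return False
    return all(c.destination == profile.destination for c in day_plan)


def soft_violations(day_plan, profile):
    """Flexible constraint: count stops outside the preferred attraction types."""
    return sum(1 for c in day_plan if c.kind not in profile.preferred_kinds)
```

A solver would discard any plan that fails `satisfies_hard_constraints` and treat `soft_violations` as a penalty to minimize rather than a reason for rejection.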
[0069] After defining the constraint space of the above solution model, the system introduces a multi-objective optimization algorithm to avoid trip imbalance caused by a single indicator. For any generated candidate route, the system simultaneously balances the three optimization objectives of demand matching degree, transportation cost, and time efficiency.
[0070] To quantify and evaluate the aforementioned optimization objectives, the system constructs a comprehensive objective evaluation function. For a specific candidate route, the system calculates its demand matching degree, transportation cost, and time efficiency. Demand matching degree represents the sum of the overall similarity scores between the entities contained in the route and the user profile vector; transportation cost aggregates the physical spatial transfer distance between each entity node in the route and the estimated commuting expenditure; time efficiency measures the proportion of effective tour time in the total trip cycle.
[0071] The system incorporates the evaluation metrics from the above three dimensions into a unified scalar solution framework, and the comprehensive objective evaluation function is maximized according to the following optimization objective:
$$\max_{R} \; F(R) = w_1 \, M(R) - w_2 \, C(R) + w_3 \, T(R)$$
[0072] In the above formula, the symbols are defined as follows: $F(R)$ represents the comprehensive objective evaluation function value for the candidate travel route; $R$ represents a candidate itinerary route generated by arranging multiple discrete candidate tourism resource entities in a specific time sequence; $M(R)$ represents the demand matching evaluation function value of the candidate route in the feature space; $C(R)$ represents the transportation cost evaluation function value of the candidate route in the physical execution dimension; $T(R)$ represents the time efficiency evaluation function value of the candidate route; $w_1$, $w_2$, and $w_3$ represent the non-negative weight coefficients of the corresponding evaluation functions. The system dynamically adjusts the relative proportions of these weight coefficients based on the preferences in the user demand profile.
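The effect of the weight coefficients can be illustrated with a small sketch. It assumes the comprehensive objective takes the scalar form F = w1·M − w2·C + w3·T, with transportation cost entering as a deduction, and that the three metrics are already normalized to comparable ranges. The route scores and weight values are made-up illustrative numbers, not data from the description.

```python
def objective(match, cost, efficiency, w1, w2, w3):
    """Assumed scalar objective: reward matching and efficiency, deduct cost."""
    if min(w1, w2, w3) < 0:
        raise ValueError("weights must be non-negative")
    return w1 * match - w2 * cost + w3 * efficiency


def best_route(routes, weights):
    """Return the name of the route with the highest objective value.

    routes maps a route name to a (match, cost, efficiency) tuple.
    """
    return max(routes, key=lambda name: objective(*routes[name], *weights))


# Illustrative, normalized scores: (demand matching, transportation cost, time efficiency)
ROUTES = {
    "A": (0.9, 0.7, 0.5),  # better match, but long transfers
    "B": (0.7, 0.2, 0.8),  # slightly weaker match, short transfers
}

MATCH_FOCUSED = (1.0, 0.2, 0.2)  # hypothetical weights for a match-driven profile
RELAXED_TRIP = (1.0, 1.0, 1.0)   # hypothetical weights penalizing transfers more
```

Under the match-focused weights route A scores higher, while under the relaxed-trip weights route B wins, which shows how adjusting the weights from the profile changes the selected route without changing the candidate set.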
[0073] In this invention, the system uses a heuristic search algorithm to iteratively optimize combinations of discrete nodes within the solution space that satisfies the constructed constraints. The system searches for the node sequence that maximizes the comprehensive objective evaluation function and uses it as the final generated customized itinerary. Because this formulation combines explicit positive reward terms with a negative penalty term, routes whose transportation cost outweighs their benefit are suppressed, so that the output route balances experience against execution cost.
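The description does not name a specific heuristic. As one possible illustration, the sketch below uses steepest-ascent hill climbing over pairwise swaps of stops; the function name `hill_climb`, the one-dimensional stop positions, and the toy objective (negative travel distance) are assumptions introduced for the example.

```python
from itertools import combinations


def hill_climb(route, objective):
    """Steepest-ascent hill climbing over pairwise swaps of stops.

    Repeatedly applies the single swap that most improves the objective and
    stops at a local optimum, which is not guaranteed to be the global one.
    """
    current = list(route)
    current_score = objective(current)
    while True:
        best, best_score = None, current_score
        for i, j in combinations(range(len(current)), 2):
            neighbor = current[:]
            neighbor[i], neighbor[j] = neighbor[j], neighbor[i]
            score = objective(neighbor)
            if score > best_score:
                best, best_score = neighbor, score
        if best is None:
            return current, current_score
        current, current_score = best, best_score


# Toy setting: four stops on a line; the objective penalizes total travel distance.
POSITIONS = {"A": 0.0, "B": 1.0, "C": 2.0, "D": 3.0}


def negative_distance(route):
    """Toy objective: the negated sum of distances between consecutive stops."""
    return -sum(abs(POSITIONS[a] - POSITIONS[b]) for a, b in zip(route, route[1:]))
```

Calling `hill_climb(["A", "C", "B", "D"], negative_distance)` reorders the stops so that no backtracking remains. In a fuller system the objective passed in would be the comprehensive evaluation function, and only neighbors that satisfy the constraints would be considered.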
[0074] Finally, the demand mining and itinerary generation module renders the scheduling results and outputs a structured itinerary containing the target customized itinerary. Simultaneously, the system calls a pre-built large language model to perform semantic-level interpretability reconstruction of the candidate tourism resource entities included in the target customized itinerary.
[0075] The system employs a chain-of-thought technique to guide the large language model to trace back, step by step, the inherent decision-making logic of the aforementioned multi-objective optimization and entity matching. Through step-by-step reasoning, the large language model explicitly labels the recommendation reasoning process and rationale for each entity node. This mechanism transforms the output of the underlying machine scheduling algorithm into natural language expressions that follow human logical connections, clearly demonstrating to users the objective basis for resource selection and thereby achieving a white-box presentation of itinerary planning.
[0076] In this embodiment, the end-to-end execution logic of the underlying computing architecture of the present invention is illustrated with an example in which the system receives a multimodal travel input stream from a specific user. Specifically, the input stream consists of a natural language voice message and a real-world image showing a gentle beach and sunset scenery. The speech-to-text transcription of the voice message is: "I hope to take my elderly parents on a vacation to a warm seaside resort, with a relaxed itinerary."
[0077] In this invention, the semantic analysis and multimodal parsing module first uses dependency parsing to extract demand triples covering elements such as "bringing the elderly", "beach", "vacation", and "relaxed itinerary" from the transcribed text. Then, it maps these triples into text semantic feature vectors using a pre-trained language model. Simultaneously, the system runs the visual model in parallel to perform pixel-level segmentation on the real-world image, extracting visual feature vectors such as "flat beach" and "warm lighting environment".
[0078] Subsequently, the system assigns higher feature response values to core words such as "elderly" and "relaxed" based on word frequency statistics and constructs a demand priority weight matrix. The system then calculates the association weight between the text and image content and performs the semantic fusion computation.
[0079] The output of this computation is a structured user tourism demand feature vector that characterizes this specific user's needs. The aforementioned feature-level fusion captures the core underlying demand: "low-intensity beach trips suitable for seniors and featuring sunset views."
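The exact fusion formula is not recoverable from the text, so the following is only one plausible sketch of the steps described: priority weights from word frequency, an association weight between text and image features, and a weighted combination. Using normalized term counts as priorities, cosine similarity as the association weight, and an additive combination are all assumptions, as are the function names.

```python
import math
from collections import Counter


def priority_weights(terms):
    """Assumed priority: each term's share of the total term count."""
    counts = Counter(terms)
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def fuse(text_vectors, weights, visual_vector):
    """Fuse priority-weighted text features with the visual feature vector.

    Returns the fused vector and the association weight that was applied.
    Text and visual vectors are assumed to share one embedding space.
    """
    dim = len(visual_vector)
    text = [
        sum(weights[term] * vec[k] for term, vec in text_vectors.items())
        for k in range(dim)
    ]
    association = max(0.0, cosine(text, visual_vector))
    fused = [t + association * v for t, v in zip(text, visual_vector)]
    return fused, association
```

In this sketch a term that appears more often, such as a repeated mention of "elderly", pulls the text vector toward its embedding, and the image contributes more when it agrees with the text.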
[0080] The system transforms the generated structured user travel demand feature vector into a query vector $q$, which is then mapped to the pre-built tourism knowledge graph. The system traverses the destination and attraction nodes in the graph database and calculates the semantic similarity between the query vector and the vector $e_i$ of each entity:
$$\operatorname{sim}(q, e_i) = \frac{q \cdot e_i}{\lVert q \rVert \, \lVert e_i \rVert}$$
[0081] where $\lVert q \rVert$ and $\lVert e_i \rVert$ represent the magnitudes of the corresponding vectors. In this step, the system identifies candidate tourism resource entities that are "gentle in slope," "have complete accessibility facilities," and have a "mild climate" by calculating the cosine similarity. The system further uses subgraph matching to extract age-friendly hotels and short-distance connecting transportation around the core entity along the topological edges of the graph.
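The similarity-based retrieval step can be sketched as a cosine-similarity ranking over entity embeddings. The entity names and three-dimensional vectors below are illustrative placeholders, and an in-memory dictionary stands in for the graph database; a deployed system would use embeddings produced by the knowledge graph embedding model.

```python
import math


def cosine_similarity(q, e):
    """Cosine similarity between a query vector and an entity vector."""
    dot = sum(a * b for a, b in zip(q, e))
    norm_q = math.sqrt(sum(a * a for a in q))
    norm_e = math.sqrt(sum(b * b for b in e))
    if norm_q == 0.0 or norm_e == 0.0:
        return 0.0
    return dot / (norm_q * norm_e)


def top_k(query, entity_vectors, k=2):
    """Rank entities by similarity to the query and return the k best."""
    scored = [
        (name, cosine_similarity(query, vector))
        for name, vector in entity_vectors.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Placeholder embeddings, for illustration only
ENTITY_VECTORS = {
    "gentle_beach": [1.0, 1.0, 0.0],
    "mountain_trail": [0.0, 0.0, 1.0],
    "city_park": [1.0, 0.0, 0.0],
}
```

For the query `[1.0, 1.0, 0.0]`, the ranking places `gentle_beach` first and `city_park` second, while `mountain_trail` has a similarity of zero because it shares no direction with the query.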
[0082] When the system detects that the input stream lacks specific travel days or budget constraints, the natural language interaction module invokes the intent recognition model to identify the missing demand information and, combined with the initially identified candidate tourism resources, sends a follow-up question to the user to obtain supplementary information. After receiving feedback, the system uses the time-series decay update mechanism to reconstruct the user's underlying demand profile:
$$P_{\text{new}} = \alpha \, P_{\text{old}} + (1 - \alpha) \, P_{\text{cur}}$$
[0083] where $P_{\text{old}}$ is the user's historical profile feature vector pre-stored by the system, $P_{\text{cur}}$ is the newly extracted demand feature vector, and $\alpha$ is the time decay factor. The updated target profile vector $P_{\text{new}}$ takes into account both long-term consumption habits and the specific vacation needs of the current period.
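The detection of missing demand information in paragraph [0082] can be illustrated with a simple rule-based stand-in. The description assigns this step to an intent recognition model; the sketch below replaces that model with a fixed list of required slots, and the slot names and question templates are hypothetical.

```python
REQUIRED_SLOTS = ("travel_days", "budget")  # hypothetical slot names

QUESTION_TEMPLATES = {
    "travel_days": "How many days do you plan to travel?",
    "budget": "What is your approximate budget for the trip?",
}


def missing_slots(demand):
    """Return the required slots that are absent or empty in the parsed demand."""
    return [slot for slot in REQUIRED_SLOTS if demand.get(slot) is None]


def follow_up_questions(demand, candidate_names):
    """Build one follow-up question per missing slot.

    The initially identified candidates are appended as a hint so the user
    can answer in the context of concrete options.
    """
    hint = ""
    if candidate_names:
        hint = " (so far we are considering: " + ", ".join(candidate_names) + ")"
    return [QUESTION_TEMPLATES[slot] + hint for slot in missing_slots(demand)]
```

Once the user answers, the filled slots would be merged into the demand features before the profile update is applied.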
[0084] In the itinerary generation phase, the system uses the selected age-friendly attractions and hotels as decision variables to construct a constraint satisfaction problem-solving model based on the aforementioned profile. Several candidate itinerary routes are generated from the spatial combinations. For each candidate route $R$, the system calculates the demand matching degree $M(R)$, the transportation cost $C(R)$, and the time efficiency $T(R)$, and performs the comprehensive objective optimization:
$$\max_{R} \; F(R) = w_1 \, M(R) - w_2 \, C(R) + w_3 \, T(R)$$
[0085] Because the profile clearly includes hard boundary features such as "relaxed itinerary" and "carrying elderly people," the system significantly increases the weight coefficient $w_3$ representing time efficiency and the weight coefficient $w_1$ representing matching degree, and strictly amplifies the weight $w_2$ of the transportation cost deduction, which represents the degree of physical transfer fatigue. The node sequence that maximizes the value of the comprehensive objective evaluation function $F(R)$ is thereby established as the final target customized itinerary.
[0086] Ultimately, the system outputs a structured itinerary and triggers the reasoning mechanism of the large language model. The system adds natural language explanation text to each stop in the itinerary; for example, the annotation for a specific beach attraction is: "Based on the real-world photo you uploaded and your need to travel with elderly people, this beach has a gentle slope and is equipped with a complete medical first-aid station, perfectly meeting the standards for a low-intensity vacation." This white-box recommendation output not only verifies the cross-dimensional feature capture accuracy of the multimodal algorithm but also closes the loop from the underlying tensor computations to the user-facing experience.
[0087] Although embodiments of the invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made to these embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the appended claims and their equivalents.
Claims
1. A tourism user demand mining system based on multimodal data, characterized in that it comprises: a multimodal data acquisition module, used to acquire multimodal tourism data, including text, voice, and images, input by the user; a semantic analysis and multimodal parsing module, used to segment the text in the multimodal tourism data, perform semantic analysis on each modality of data to extract the semantic features corresponding to each modality, and then obtain the user's tourism demand features through semantic fusion; a knowledge graph retrieval and matching module, used to map the user's tourism demand features to a pre-constructed tourism knowledge graph for semantic retrieval and to obtain candidate tourism resource entities; and a demand mining and itinerary generation module, used to construct a user demand profile based on the candidate tourism resource entities and the user's tourism demand features, and to generate a target customized itinerary.
2. The tourism user demand mining system based on multimodal data according to claim 1, characterized in that, when performing semantic analysis on the speech and text in the multimodal tourism data, the semantic analysis and multimodal parsing module is specifically used for: converting the speech in the multimodal tourism data into transcribed text; using machine translation to convert the transcribed text in a non-target language and the text in the multimodal tourism data into unified-language text; performing word segmentation and grammatical analysis on the unified-language text to extract demand triples each containing a user, an action, and an object; and using a pre-trained language model to perform semantic classification and sentiment analysis on the demand triples to obtain text semantic features.
3. The tourism user demand mining system based on multimodal data according to claim 2, characterized in that, When performing semantic analysis on images in the multimodal tourism data, the semantic analysis and multimodal parsing module is specifically used for: The images are semantically segmented and semantically recognized using a multimodal visual model to extract visual features corresponding to scenes and objects in the images. The text in the image is identified using optical character recognition technology to supplement the semantic features of the text.
4. The tourism user demand mining system based on multimodal data according to claim 3, characterized in that, When the semantic analysis and multimodal parsing module extracts the user's travel demand features and performs semantic fusion, it is specifically used for: The association weights between the text semantic features and the visual features are calculated through semantic processing; Based on word frequency statistics, the demand priority in the text semantic features is determined, and based on the association weight and the demand priority, the text semantic features and the visual features are semantically fused to output the structured user travel demand features.
5. The tourism user demand mining system based on multimodal data according to claim 1, characterized in that, The system also includes a graph construction module, used to pre-build and obtain the pre-built tourism knowledge graph, specifically for: Define the types of tourism resource entities and their attributes, and construct the relationship structure between various entities; External tourism data is integrated for knowledge extraction and alignment to generate relational triples between tourism entities, which are then stored in a graph database to form the pre-constructed tourism knowledge graph.
6. The tourism user demand mining system based on multimodal data according to claim 1, characterized in that, When the knowledge graph retrieval and matching module obtains the candidate tourism resource entities, it is specifically used for: The entities and relationships in the pre-constructed tourism knowledge graph are vectorized using a knowledge graph embedding model. The user's travel demand features are vectorized, and the semantic similarity between the vectorized user's travel demand features and entities in the pre-constructed tourism knowledge graph is calculated for semantic retrieval. Candidate tourism resource entities associated with the user's travel demand features are extracted through subgraph matching.
7. The tourism user demand mining system based on multimodal data according to claim 1, characterized in that the system also includes a natural language interaction module, which is specifically used for: receiving the user's natural language query and using an intent recognition model to fill in the gaps in the travel demand information; and, based on a retrieval-augmented generation architecture, inputting the candidate tourism resource entities obtained by the knowledge graph retrieval and matching module as context into a large language model, which performs machine question answering and outputs recommended guidance and explanatory text to the user in response to the natural language query.
8. The tourism user demand mining system based on multimodal data according to claim 1, characterized in that, when constructing and updating the user demand profile, the demand mining and itinerary generation module is specifically used for: integrating the user's travel demand features after semantic fusion to construct a user demand profile that includes a destination preference, an attraction type preference, and a schedule compactness preference; and, when the user's travel demand features are updated based on the user's newly added multimodal travel data, recalculating the features of the user demand profile to dynamically update the user demand profile.
9. The tourism user demand mining system based on multimodal data according to claim 8, characterized in that, when generating the target customized itinerary, the demand mining and itinerary generation module is specifically used for: constructing a constraint satisfaction problem-solving model that uses the candidate tourism resource entities as decision variables and the destination preference, attraction type preference, and schedule compactness preference in the user demand profile as constraints; and introducing a multi-objective optimization algorithm to balance the three optimization objectives of demand matching degree, transportation cost, and time efficiency, thereby generating the target customized itinerary.
10. The tourism user demand mining system based on multimodal data according to claim 9, characterized in that the demand mining and itinerary generation module is also used for: outputting a structured itinerary table containing the target customized itinerary, and invoking a pre-built large language model that uses a chain-of-thought technique to label the candidate tourism resource entities contained in the target customized itinerary with the recommendation reasoning process and reasons.