Hotel intelligent service terminal based on multi-round dialogue and interaction method

By employing a multi-turn dialogue mechanism and voice acquisition and noise reduction technology, the problem of voice interaction in the complex acoustic environment of hotels has been solved, enabling accurate intent recognition and work order generation in different scenarios, thereby improving the intelligence of hotel services and management efficiency.

CN122309674A · Pending · Publication Date: 2026-06-30 · HANGZHOU MEISU ZAITU NETWORK TECH CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: HANGZHOU MEISU ZAITU NETWORK TECH CO LTD
Filing Date: 2026-04-03
Publication Date: 2026-06-30

AI Technical Summary

Technical Problem

Existing hotel service systems have weak anti-interference capabilities in voice interaction under complex acoustic environments, low intelligence in multi-turn dialogues, and are unable to adapt to user needs in different scenarios. Furthermore, they lack a systematic user intent recognition and information completion mechanism, resulting in deviations in demand recognition and inaccurate information delivery.

Method used

It adopts a multi-turn dialogue mechanism combined with voice acquisition and noise reduction technology. The voice interaction module collects and processes voice signals in real time, uses a pre-set hotel industry corpus for intent matching and semantic understanding, generates standardized intent data, and automatically generates a work order framework through the work order management module. Combined with the linkage interaction module, it realizes the linkage execution of smart hardware and third-party systems.

Benefits of technology

It achieves accurate voice acquisition and intent recognition in complex acoustic environments, generates standardized work orders and prioritizes them, improves service collaboration efficiency, reduces manual operation costs, and optimizes user experience and service management efficiency.


Abstract

This invention discloses a hotel intelligent service terminal and interaction method based on multi-turn dialogue, belonging to the field of intelligent hotel service technology. This invention achieves multi-turn targeted clarification and follow-up questions through dual-mode triggering, multi-array sound pickup and noise reduction processing, combined with a dialogue context memory mechanism. This accurately identifies hotel-specific intentions and completes business entity information, improving the accuracy of intention recognition. It automatically generates standardized work orders based on a work order rule base and prioritizes them through weighted scoring, enabling accurate work order push and real-time status updates, reducing information transmission errors and improving the efficiency of service collaboration among departments. Through a linked interaction module, it achieves rapid response to query-type needs and linkage between intelligent hardware and third-party systems for work order execution needs. Combined with key node audio and video collection and automatic satisfaction feedback, it constructs a closed-loop hotel service across the entire chain. Simultaneously, it can optimize the corpus and semantic graph based on interaction data, continuously improving the level of service intelligence.

Description

Technical Field

[0001] This invention relates to the field of intelligent hotel services, and in particular to a hotel intelligent service terminal and interaction method based on multi-turn dialogue.

Background Technology

[0002] With the intelligent development of the hotel industry, voice interaction and intelligent work order management have become important directions for improving hotel service efficiency. However, existing hotel service models still have many technical pain points. For example, a hotel room intelligent management method (publication number CN113852849A) allows customers to communicate with an intelligent robot via voice throughout the entire process, controlling the system itself, its applications, and smart devices in the room. It supports both a basic menu navigation interaction mode and a fully intelligent robot mode, allowing users to switch between modes at will by issuing commands. This is user-friendly for all ages, reducing the learning curve. Innovative approximate sound analysis and command analysis modes help the intelligent robot better understand user intentions, achieving accurate voice recognition and intent identification. The intelligent robot replaces hotel staff in responding to user needs promptly, improving hotel service efficiency and saving labor costs.

[0003] However, while the aforementioned patent enables voice interaction and basic intelligent control in hotel rooms, the following problems remain:

[0004] 1. In the existing technology, voice interaction is limited to the single physical scene of the guest room. It relies on the TV terminal and remote control microphone array to achieve local voice collection. The voice collection optimization design is not suitable for complex acoustic environments such as hotel lobby, corridor and floor service desk. The voice anti-interference ability is weak and cannot adapt to the interaction needs of hotel guests or staff in different scenarios. The scenario coverage and usage flexibility of voice interaction are insufficient.

[0005] 2. Existing contextual memory technology is limited to extracting basic instruction information from a single dialogue. It lacks a systematic standard for judging user intent confidence and rules for identifying core business entities. When faced with user needs that are ambiguous in intent or incomplete in information, it lacks a standardized mechanism for targeted clarification and hierarchical completion. It can only perform simple information push, which is prone to demand identification deviations and cannot achieve accurate intent mining and information completion. The level of intelligence and standardization in multi-turn dialogues is low.

Summary of the Invention

[0006] The purpose of this invention is to provide a hotel intelligent service terminal and interaction method based on multi-turn dialogue that solve the problems described in the background technology. Through specialized voice acquisition and noise reduction together with a multi-turn dialogue mechanism, the invention adapts to the complex acoustic environment of hotels, accurately identifies hotel-specific service intents, and completes core information. It automatically generates, prioritizes, and accurately pushes service work orders, improving the service collaboration and response efficiency of the various departments. It also combines audio and video capture and traceability of work order execution with automatic satisfaction feedback, improving both user experience and hotel service management efficiency while reducing manual operation costs.

[0007] To achieve the above objectives, the present invention provides the following technical solution:

[0008] A hotel smart service terminal based on multi-turn dialogue includes:

[0009] The voice interaction module is used to collect voice signals in the hotel scene in real time, extract and process the voice signals, convert the processed voice features into digital voice data, match them with the preset hotel industry corpus, perform semantic understanding and intent classification based on the matching results, and generate standardized intent data.

[0010] The work order management module is used to match standardized intent data with templates based on the work order rule base, generate a work order framework corresponding to the standardized hotel service work order, fill the target entity of the standardized intent data into the target field of the work order framework, generate a standardized hotel service work order, sort it by priority, and push the standardized hotel service work order to the corresponding department service terminal based on the sorting result. After the work order is completed, it automatically triggers a user satisfaction voice feedback and records the feedback result simultaneously.

[0011] The linkage interaction module is used to identify the type of the acquired standardized intent data, distinguish between query-type and work order execution-type intent data based on the identification results, and obtain the work order linkage execution instructions corresponding to the standardized hotel service work orders. Based on the work order linkage execution instructions, standardized linkage control instructions are issued to the corresponding hotel smart hardware and third-party business systems. At the same time, real-time feedback data is transmitted to the voice interaction module, and the progress information of linkage execution is synchronized to the work order management module.

[0012] Furthermore, the voice interaction module includes:

[0013] The voice acquisition and wake-up unit is used to acquire voice signals in real time, perform spatial orientation discrimination and beamforming processing on the acquired voice signals from multiple channels, locate the incident angle and spatial position of the target voice signal, perform real-time discrimination of the target voice signal based on a preset signal-to-noise ratio threshold, filter out and attenuate interference signals below the signal-to-noise ratio threshold, distinguish between voice segments and non-voice segments, determine the start and end points of the valid voice signal, and perform digital encoding and feature extraction on the valid voice signal to output digital voice data with scene identification.

[0014] The multi-turn dialogue interaction unit is used to perform speech conversion processing based on digital speech data. It performs intent matching and semantic understanding with the converted speech data and a preset hotel industry corpus. Based on the dialogue context memory mechanism, it conducts multi-turn dialogues on speech interaction commands with unclear intent or incomplete information, and continuously tracks the state of the multi-turn dialogues to output dialogue text data with complete context.

[0015] The semantic understanding decision unit is used to perform entity recognition, intent classification and demand parsing based on dialogue text data, and to determine risk stratification, generate standardized intent data, and feed back content with risks in the risk stratification to the multi-round dialogue interaction unit to initiate secondary follow-up questions.
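
As a rough illustration of the endpoint detection performed by the voice acquisition and wake-up unit, the sketch below marks frames as speech using short-time energy and zero-crossing rate, the joint detection that the detailed implementation names for this unit. The frame length, both thresholds, the decision rule, and the function names are assumptions chosen for illustration; the patent text does not specify them.

```python
import math


def frame_features(samples, frame_len):
    """Return (short-time energy, zero-crossing rate) for each full frame."""
    feats = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
        )
        feats.append((energy, crossings / (frame_len - 1)))
    return feats


def detect_endpoints(samples, frame_len=160, energy_thresh=0.01, zcr_thresh=0.5):
    """Return (start, end) sample indices spanning the first to the last
    speech frame, or None if no frame qualifies.

    Assumed rule: a frame counts as speech when its energy is above
    energy_thresh and its zero-crossing rate is below zcr_thresh.
    """
    flags = [
        energy > energy_thresh and zcr < zcr_thresh
        for energy, zcr in frame_features(samples, frame_len)
    ]
    if True not in flags:
        return None
    first = flags.index(True)
    last = len(flags) - 1 - flags[::-1].index(True)
    return first * frame_len, (last + 1) * frame_len


# Synthetic input: silence, a 200 Hz tone sampled at 8 kHz, then silence.
tone = [0.5 * math.sin(2 * math.pi * 200 * n / 8000) for n in range(480)]
signal = [0.0] * 320 + tone + [0.0] * 320
print(detect_endpoints(signal))
```

A production detector would typically add hangover smoothing and adaptive thresholds; this sketch only shows how the two features can be combined into a start and end point.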

[0016] Furthermore, the voice acquisition and wake-up unit performs digital encoding and feature value extraction on the valid voice signal, and outputs digital voice data with scene identifiers, including:

[0017] Based on spatial location, the scene where the effective speech signal is located is determined. Based on the reverberation characteristics of the scene, the effective speech signal is inversely compensated by a combination of blind deconvolution and multi-frame superposition smoothing to obtain the target effective speech signal.

[0018] Pre-establish corresponding encoding patterns based on scene characteristics, associate the encoding patterns with scene characteristics, and establish an adaptive encoding pattern selection mechanism based on the association results;

[0019] A basic coding layer based on pulse code modulation for standardized coding is established; an enhanced coding layer for lossless compression coding of key segments in effective speech signals and lossy compression coding of non-key segments is established; and an adaptive coding layer for coding based on scene feature embedding of adaptive coding parameters is established. The basic coding layer, enhanced coding layer and adaptive coding layer are connected sequentially to obtain a layered coding method.

[0020] The target effective speech signal is digitally encoded based on the adaptive coding mode selection mechanism and hierarchical coding method to obtain digital encoded data.

[0021] Extract time-domain features, frequency-domain features, and business-related domain features from digital coded data, and perform multi-dimensional feature weighted fusion to obtain a multi-dimensional feature vector;

[0022] Based on the matching relationship between scene attributes and multi-dimensional features, the scene attributes corresponding to the multi-dimensional feature vectors are determined, and the scene attributes are matched with the scene identifier library to obtain the target scene identifier.

[0023] Data is encapsulated in the order of scene identifier, multi-dimensional feature vector, and digital encoded data to obtain digital voice data with scene identifier.
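
The encapsulation step fixes only the field order: scene identifier, multi-dimensional feature vector, then digital encoded data. A minimal sketch of such a container follows. The byte layout (little-endian 16-bit scene identifier and feature count, 32-bit floats, 32-bit payload length) and the function names are assumptions for illustration and are not specified in the source.

```python
import struct


def pack_voice_frame(scene_id, features, encoded):
    """Serialize in the stated order: scene id, feature vector, encoded data."""
    header = struct.pack("<HH", scene_id, len(features))
    feature_bytes = struct.pack(f"<{len(features)}f", *features)
    payload_len = struct.pack("<I", len(encoded))
    return header + feature_bytes + payload_len + encoded


def unpack_voice_frame(blob):
    """Inverse of pack_voice_frame; returns (scene_id, features, encoded)."""
    scene_id, n_features = struct.unpack_from("<HH", blob, 0)
    offset = 4
    features = list(struct.unpack_from(f"<{n_features}f", blob, offset))
    offset += 4 * n_features
    (size,) = struct.unpack_from("<I", blob, offset)
    offset += 4
    return scene_id, features, blob[offset:offset + size]


# Hypothetical scene id 3 (for example, a lobby entry in the scene identifier library).
blob = pack_voice_frame(3, [0.5, 0.25, 1.0], b"\x01\x02\x03")
print(unpack_voice_frame(blob))
```

Placing the scene identifier first lets a downstream consumer select scene-specific handling before it reads the feature vector or decodes the audio payload.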

[0024] Furthermore, the specific process of the multi-turn dialogue interaction unit conducting multi-turn dialogue includes:

[0025] Speech-to-text conversion is performed based on digitized voice data to obtain initial interactive text;

[0026] Based on a pre-set hotel industry corpus and semantic similarity algorithm, global semantic matching is performed on the initial interactive text to obtain the initial intent category and the corresponding intent confidence score.

[0027] Based on the initial intent category, intent confidence score, and preset confidence threshold, determine whether the user intent is clear and whether the demand information is complete.

[0028] Based on the named entity recognition rules, target business entities are identified in the initial interactive text, and the target business entity recognition results are obtained.

[0029] When the intent confidence score is lower than the preset confidence threshold, or when no target business entity such as room number, user type, or demand content is identified in the initial interaction text, it is judged as an unclear intent or incomplete information voice interaction command.

[0030] Based on the dialogue context memory mechanism, the current interaction scenario, historical question and answer text and the missing state of the target business entity are obtained and maintained to form a context state vector.

[0031] Based on the context state vector, the initial intent category, and the missing type of the target business entity, generate targeted clarification and follow-up questions for the corresponding scenario;

[0032] The targeted clarification and follow-up questions are broadcast by voice, and the user's feedback voice signals are obtained and converted into feedback text.

[0033] Based on the feedback text and context state vector, the initial intent category, intent confidence, and target business entity identification results are updated and completed.

[0034] The process continues until the intent confidence score reaches the preset confidence threshold and all target business entities are fully identified, at which point dialogue text data with complete context is generated and output.
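
The clarification loop described in the steps above can be sketched as follows. This is a simplified illustration, not the claimed implementation: the `understand` and `ask` callables stand in for the corpus matching and voice broadcast steps, and the threshold value, the question wording, and the `max_rounds` cap are assumptions that do not appear in the source.

```python
from dataclasses import dataclass, field
from typing import Optional

# Target business entities named in the text; treated here as all required.
REQUIRED_ENTITIES = ("room_number", "user_type", "demand_content")
CONFIDENCE_THRESHOLD = 0.8  # hypothetical preset confidence threshold


@dataclass
class DialogueState:
    """Context state: intent, confidence, entities, and question/answer history."""
    intent: Optional[str] = None
    confidence: float = 0.0
    entities: dict = field(default_factory=dict)
    history: list = field(default_factory=list)

    def missing(self):
        return [name for name in REQUIRED_ENTITIES if name not in self.entities]

    def complete(self):
        return self.confidence >= CONFIDENCE_THRESHOLD and not self.missing()


def run_dialogue(first_text, understand, ask, max_rounds=5):
    """Loop until intent is clear and all entities are present.

    understand(text, state) -> (intent, confidence, entities)
    ask(question) -> the user's reply as text
    """
    state = DialogueState()
    text = first_text
    for round_no in range(max_rounds):
        intent, confidence, entities = understand(text, state)
        state.intent = intent or state.intent
        state.confidence = confidence
        state.entities.update(entities)
        state.history.append(text)
        if state.complete() or round_no == max_rounds - 1:
            break
        missing = state.missing()
        question = (
            f"Please provide: {missing[0].replace('_', ' ')}"
            if missing
            else "Could you describe your request in more detail?"
        )
        text = ask(question)
    return state
```

Passing `understand` and `ask` as callables keeps the loop independent of any particular speech or language-understanding component, so the same control flow can be exercised with simple stubs.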

[0035] This invention provides another technical solution, an interaction method for the hotel smart service terminal based on multi-turn dialogue, comprising the following steps:

[0036] Terminal interaction trigger: The voice signal acquisition function of the hotel smart service terminal is started either actively, by a wake word, or passively, by telephone access or a physical button. Voice signals in the hotel scene are then acquired in real time and undergo noise reduction, noise removal, and digital conversion, and voice feature information is extracted to obtain digital voice data with scene identification;

[0037] Multi-turn dialogue interaction: Based on digital voice data, speech-to-text and initial semantic understanding are performed sequentially. Combined with a pre-set hotel industry corpus, preliminary intent matching is performed to identify voice interaction commands with unclear intent or incomplete target business entity information. Based on the dialogue context memory mechanism, multiple rounds of targeted clarification and follow-up questions are conducted to complete the intent and target business entity information, and dialogue text data with complete context is generated.

[0038] Semantic parsing and classification: Identify target business entities, classify hotel-specific intents, analyze service needs and determine risk stratification for dialogue text data, verify the legality of extracted target business entities, and generate standardized intent data containing intent type, risk level and target business entity;

[0039] Execution plan determination: Based on standardized intent data, distinguish between query-type requirements and work order execution-type requirements. For query-type requirements, match the corresponding hotel business management system to determine the data query execution path. For work order execution requirements, generate standardized hotel service work orders. At the same time, determine the work order priority ranking results and link the corresponding smart hardware and third-party business systems to execute instructions.

[0040] Business execution feedback: Standardized hotel service work orders are pushed to the corresponding department terminals in sequence based on priority ranking. According to the linkage execution instructions, the corresponding smart hardware and third-party business systems are matched, standardized control instructions are issued and the corresponding smart hardware and third-party business systems are driven to perform operations, and the work order status and execution progress are updated in real time. After the work order is completed, the voice follow-up process is automatically carried out, and the user follow-up results are collected and recorded.

[0041] Furthermore, the multi-turn dialogue interaction specifically includes:

[0042] Based on a pre-set hotel industry corpus, a semantic association graph containing hotel-specific intents, target business entities, and scene-related relationships is constructed. Target keywords in the initial interaction text are obtained. Based on the semantic association graph and the cosine similarity algorithm, the semantic matching score between the target keywords and each hotel-specific intent in the semantic association graph is calculated.

[0043] Obtain the intent category and corresponding intent confidence score of the hotel-specific intent, and construct a dual judgment index by combining the semantic matching score. Compare the intent confidence score and semantic matching score with the preset confidence threshold and semantic matching threshold respectively to determine whether the user's intent is clear.

[0044] Based on the named entity recognition rules, target business entities are identified in the initial interactive text to obtain the target business entity recognition results, and a list of missing target business entities is generated from those results.

[0045] Based on the list of missing target business entities, a hierarchical follow-up questioning strategy is generated. The voice signal of the user's follow-up questioning feedback in each round is obtained, converted into feedback text, and the target business entity information is extracted. At the same time, the missing entity identifier and historical question and answer records in the context state vector are updated.

[0046] Based on the updated context state vector, the intent confidence score and semantic matching score are recalculated to determine whether the intent is clear and whether the target business entity is complete, until both judgment indicators reach the corresponding thresholds, and dialogue text data with complete context is generated and output.

[0047] Based on the data from the entire multi-turn dialogue process, the association weights of the semantic association graph are updated and the intent matching templates of the hotel industry corpus are updated.
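
The semantic matching score and the dual judgment index can be illustrated with a small sketch. It assumes a bag-of-words keyword table as a stand-in for the semantic association graph, and the intent names, keyword lists, and threshold values are hypothetical; only the use of cosine similarity and the comparison of two scores against two thresholds come from the text.

```python
import math
from collections import Counter

# Hypothetical stand-in for the semantic association graph:
# each hotel-specific intent is represented by a list of keywords.
INTENT_KEYWORDS = {
    "consultation": ["breakfast", "time", "checkout", "wifi"],
    "maintenance": ["broken", "repair", "air", "conditioner", "leak"],
    "complaint": ["noisy", "dirty", "rude"],
}


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def semantic_match(keywords):
    """Return (best intent, semantic matching score) for the target keywords."""
    query = Counter(keywords)
    scores = {
        intent: cosine_similarity(query, Counter(words))
        for intent, words in INTENT_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


def intent_is_clear(confidence, match_score, conf_thresh=0.8, match_thresh=0.5):
    """Dual judgment: both scores must reach their own threshold."""
    return confidence >= conf_thresh and match_score >= match_thresh


intent, score = semantic_match(["broken", "air", "conditioner"])
print(intent, round(score, 3), intent_is_clear(0.9, score))
```

Requiring both thresholds means a high classifier confidence cannot on its own mark the intent as clear when the keywords match the intent only weakly.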

[0048] Furthermore, the multi-turn dialogue interaction also includes:

[0049] Obtain the initial interactive text after the digitized voice data is converted, and extract intent-related keywords and candidate target business entities from the initial interactive text based on a preset hotel industry corpus;

[0050] Based on semantic similarity algorithm, the intent-related keywords are used to conduct a preliminary exploration of hotel-specific intent, and the preliminary intent category and corresponding preliminary exploration confidence score are obtained;

[0051] Based on the initial intent category, the preset slot template library is called to generate a set of target business entity slots corresponding to the intent category, and the required and optional items of the slots are specified.

[0052] Based on the named entity recognition rules, the legality of candidate target business entities is verified. The candidate target business entities that pass the verification are filled into the corresponding slots, and the unfilled mandatory slots and optional slots are marked.

[0053] Based on the initial confidence score and the completion status of slot filling, determine whether the initial exploration results are credible and whether the required slots are completely filled.

[0054] If the initial confidence score is lower than the preset confidence threshold or the required slots are not fully filled, the initial results and slot filling status will be synchronized to the multi-round dialogue clarification process as key evidence for targeted follow-up questions.

[0055] If the initial confidence score reaches the preset confidence threshold and the required slots are filled completely, the slot filling results are combined with the initial intent category to generate an initial standardized intent fragment.

[0056] Based on the feedback text from each round of user feedback, the slot filling status and initial exploration confidence score are continuously updated, and the initial intent exploration results are dynamically adjusted until a complete multi-round dialogue interaction process is completed.
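
The slot-filling steps above can be sketched as follows, assuming a slot template library held as a plain dictionary and validation rules supplied as simple predicates. The template contents, slot names, validator, and threshold are hypothetical examples; the source does not list them.

```python
# Hypothetical slot template library: required and optional slots per intent.
SLOT_TEMPLATES = {
    "maintenance": {
        "required": ["room_number", "demand_content"],
        "optional": ["preferred_time"],
    },
}


def always_valid(value):
    """Default validator used when a slot has no specific rule."""
    return True


def fill_slots(intent, candidates, validators):
    """Fill slots with candidates that pass validation.

    Returns (filled, missing_required, missing_optional).
    """
    template = SLOT_TEMPLATES[intent]
    filled = {}
    for name in template["required"] + template["optional"]:
        value = candidates.get(name)
        if value is not None and validators.get(name, always_valid)(value):
            filled[name] = value
    missing_required = [n for n in template["required"] if n not in filled]
    missing_optional = [n for n in template["optional"] if n not in filled]
    return filled, missing_required, missing_optional


def needs_clarification(confidence, missing_required, threshold=0.8):
    """True when confidence is low or a required slot is still unfilled."""
    return confidence < threshold or bool(missing_required)


# Assumed legality rule: a room number is 3 or 4 digits.
validators = {"room_number": lambda v: v.isdigit() and len(v) in (3, 4)}
candidates = {"room_number": "12A", "demand_content": "leaking tap"}
print(fill_slots("maintenance", candidates, validators))
```

In this example the malformed room number fails validation, so the required slot stays unfilled and the result would be handed to the clarification process as the basis for a targeted follow-up question.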

[0057] Furthermore, the execution plan is determined, specifically including:

[0058] Obtain standardized intent data generated by the semantic parsing and classification steps, and extract intent type, target business entity information, risk level, and urgency of demand from the standardized intent data;

[0059] The standardized intent data is classified and judged according to the intent type, distinguishing between query-type requirements and work order execution-type requirements;

[0060] Fill the key information from the target business entity information into the corresponding target fields of the work order framework to generate a complete standardized hotel service work order.

[0061] Based on the risk level and urgency of demand in the standardized intent data, combined with the work order priority rule base, the priority score of each standardized hotel service work order is calculated, and the work orders are sorted according to the priority score to determine the work order priority ranking result, and marked with three levels of work order identification: urgent, regular, and ordinary.

[0062] Based on the demand type in the target business entity information, the corresponding hotel smart hardware and third-party business systems are matched to generate linkage execution instructions corresponding to the work order execution type demand, and the work order priority sorting results, linkage execution instructions and standardized hotel service work orders are associated and bound.
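
The priority calculation can be sketched as a weighted score over risk level and urgency, followed by sorting and labeling with the three work order levels named in the text. The weights, the 0 to 1 input scale, and the label cut-offs below are assumptions for illustration; in the described system they would come from the work order priority rule base.

```python
# Hypothetical weights and cut-offs standing in for the priority rule base.
RISK_WEIGHT = 0.6
URGENCY_WEIGHT = 0.4


def priority_score(risk, urgency):
    """Weighted score; risk and urgency are assumed to lie in [0, 1]."""
    return RISK_WEIGHT * risk + URGENCY_WEIGHT * urgency


def priority_label(score):
    """Map a score to one of the three work order levels."""
    if score >= 0.7:
        return "urgent"
    if score >= 0.4:
        return "regular"
    return "ordinary"


def rank_work_orders(orders):
    """Return copies of the orders with score and label, highest score first."""
    scored = []
    for order in orders:
        score = priority_score(order["risk"], order["urgency"])
        scored.append(dict(order, score=score, label=priority_label(score)))
    return sorted(scored, key=lambda o: o["score"], reverse=True)


orders = [
    {"id": "C", "risk": 0.1, "urgency": 0.2},
    {"id": "A", "risk": 0.9, "urgency": 0.9},
    {"id": "B", "risk": 0.5, "urgency": 0.5},
]
for order in rank_work_orders(orders):
    print(order["id"], order["label"])
```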

[0063] Furthermore, the business execution feedback also includes:

[0064] Obtain standardized hotel service work orders, priority ranking results, and linked execution instructions pushed by the execution plan determination steps;

[0065] When a standardized hotel service work order requires on-site handling, the audio and video capture functions of the hotel's smart terminal and the mobile terminal carried by the on-site service personnel are triggered.

[0066] Based on the work order execution progress, real-time audio and video are collected at three key nodes: after work order dispatch, during on-site handling, and after handling is completed. This acquires audio and video data of the entire work order execution process, and encodes and compresses the collected audio and video data to generate standardized audio and video files.

[0067] Obtain the work order number of the standardized hotel service work order, uniquely associate and bind the standardized audio and video files with the corresponding work order number, establish an association mapping relationship between the work order and the audio and video, and synchronously store the audio and video files and the standardized hotel service work order in the terminal database based on the association mapping relationship.

[0068] After a work order is completed, the audio and video data will be used as supporting documentation for the work order execution and simultaneously pushed to the work order management module of the hotel's smart service terminal for work order execution verification and handling of user objections.

[0069] Furthermore, based on the list of missing target business entities, a tiered follow-up inquiry strategy is generated, including:

[0070] Obtain the business feature vector of the target business entity and combine it with the scene feature vector of the target business entity to determine the entity-scene matching coefficient; pre-match the initial interaction text with the hotel industry corpus to obtain the initial intent confidence; and then calculate the missing quantification value of the target business entity.

[0071] Based on the list of missing target business entities, determine the type of each target business entity, including mandatory entities and optional entities;

[0072] Based on the missing quantification value of the target business entity and the type of the target business entity, the follow-up priority coefficient of the target business entity is calculated.

[0073] The list of missing target business entities is sorted in descending order according to the priority coefficient of follow-up questions to obtain the follow-up question priority sequence.

[0074] The target business entities in the first 1/3 of the follow-up question priority sequence are designated as the priority follow-up batch, the target business entities in the last 1/3 of the sequence are designated as the final follow-up batch, and the remaining target business entities are designated as the secondary follow-up batch, thus generating the tiered follow-up questioning strategy.
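
The batching rule can be sketched as follows. The text does not give the formula for the follow-up priority coefficient or say how to round when the list length is not divisible by three, so the sketch assumes a coefficient equal to the missing quantification value multiplied by a type weight, rounds the first third up, and rounds the last third down. These choices and the example entity names are hypothetical.

```python
import math

# Hypothetical type weights: mandatory entities outrank optional ones.
TYPE_WEIGHT = {"mandatory": 1.0, "optional": 0.5}


def tiered_follow_up(missing):
    """Split missing entities into priority, secondary, and final batches.

    missing: list of (name, missing_value, entity_type) tuples.
    """
    ranked = sorted(
        missing, key=lambda m: m[1] * TYPE_WEIGHT[m[2]], reverse=True
    )
    names = [m[0] for m in ranked]
    n = len(names)
    first = math.ceil(n / 3)  # assumed rounding: first third rounds up
    last = n // 3             # assumed rounding: last third rounds down
    return {
        "priority": names[:first],
        "secondary": names[first:n - last],
        "final": names[n - last:],
    }


missing = [
    ("room_number", 0.9, "mandatory"),
    ("preferred_time", 0.8, "optional"),
    ("demand_content", 0.7, "mandatory"),
    ("contact_phone", 0.2, "optional"),
    ("user_type", 0.5, "mandatory"),
    ("note", 0.1, "optional"),
]
print(tiered_follow_up(missing))
```

Rounding the first third up keeps the priority batch non-empty whenever at least one entity is missing, which seems consistent with the intent of asking the most important question first.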

[0075] Compared with the prior art, the beneficial effects of the present invention are:

[0076] 1. This invention solves the challenge of voice acquisition in complex acoustic environments like hotels by employing dual-mode triggering, multi-array sound pickup, and noise reduction processing. Combined with a dialogue context memory mechanism, it enables multi-round targeted clarification and follow-up questioning, accurately identifying hotel-specific intents and supplementing business entity information. This avoids information loss and intent misjudgment issues in single-round interactions, optimizing the voice interaction experience in hotel scenarios and improving intent recognition accuracy. Furthermore, it automatically generates standardized work orders based on a work order rule base and prioritizes them through weighted scoring, enabling precise work order delivery and real-time status updates. This achieves standardized and intelligent management of hotel service work orders, reducing information transmission errors and improving the efficiency of service collaboration across departments.

[0077] 2. This invention enables rapid response to query-type needs and linkage between intelligent hardware and third-party systems for work order execution needs through a linkage and interaction module. Combined with key node audio and video collection and automatic satisfaction feedback, it constructs a closed loop for the entire hotel service chain. At the same time, it can optimize the corpus and semantic graph based on interactive data to continuously improve the level of service intelligence, ensure that service quality is traceable and optimizable, and balance the improvement of user experience with the optimization of hotel service management efficiency, thereby reducing manual operation costs.

Attached Figure Description

[0078] Figure 1 is a schematic diagram of the hotel intelligent service terminal module based on multi-turn dialogue according to the present invention;

[0079] Figure 2 is a flowchart of the interaction method of the hotel intelligent service terminal based on multi-turn dialogue according to the present invention.

Detailed Implementation

[0080] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0081] Referring to Figure 1, the present invention provides the following technical solution:

[0082] A hotel smart service terminal based on multi-turn dialogue includes:

[0083] The voice interaction module is used to collect voice signals in the hotel scene in real time, and to extract and process the voice signals to achieve dual modes of active wake-up triggered by wake words and passive triggering by telephone access / physical button. It completes noise reduction, noise removal, and digital conversion of voice signals, converts the processed voice features into digital voice data, and performs intent matching with a preset hotel industry corpus. Based on the matching results, it performs semantic understanding and intent classification, accurately identifies hotel-specific intents such as consultation, customer needs, maintenance, complaints, and order dispatch, and converts voice text into standardized structured intent information to generate standardized intent data.

[0084] The work order management module is used to match standardized intent data with templates based on the work order rule base, generate a work order framework corresponding to the standardized hotel service work order, populate the target fields of the work order framework with the target entities of the standardized intent data, and generate a standardized hotel service work order, including core fields such as work order number, type, room number, demand content, priority, initiation time, and user type, and sort them by priority. Based on the sorting results, the standardized hotel service work orders are pushed to the corresponding department service terminals, and the status of each standardized hotel service work order is updated in real time. After the work order is completed, a user satisfaction voice follow-up is automatically triggered, and the follow-up results are recorded synchronously.

[0085] The linkage interaction module is used to identify the type of the acquired standardized intent data, distinguish between query-type and work order execution-type intent data based on the identification results, call the corresponding hotel business management system to retrieve relevant data for query-type standardized intent data, and match the corresponding hotel smart hardware and third-party business systems for work order execution-type standardized intent data. At the same time, it obtains the work order linkage execution instructions corresponding to the standardized hotel service work orders, and issues standardized linkage control instructions to the corresponding hotel smart hardware and third-party business systems based on the work order linkage execution instructions. Simultaneously, it transmits real-time feedback data such as data query results and linkage execution status of hardware and systems to the voice interaction module, and synchronizes the linkage execution progress information to the work order management module, providing real-time data basis for the work order management module to update the status of standardized hotel service work orders.
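
To make the division of work between the linkage interaction module and the work order management module concrete, the sketch below routes standardized intent data either to a query path or to a work order carrying the core fields listed above. Which intents count as query-type, the system name, the work order number format, and the dictionary shape of the intent data are all assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from itertools import count

# Hypothetical classification: only consultation is treated as query-type.
QUERY_INTENTS = {"consultation"}
QUERY_SYSTEMS = {"consultation": "hotel_information_system"}  # hypothetical name


@dataclass
class WorkOrder:
    """Core work order fields named in the embodiment."""
    number: str
    order_type: str
    room_number: str
    demand_content: str
    priority: str
    initiated_at: datetime
    user_type: str


_sequence = count(1)  # simple in-process counter for work order numbers


def dispatch(intent_data, now=None):
    """Route standardized intent data to a query path or a new work order."""
    intent = intent_data["intent_type"]
    if intent in QUERY_INTENTS:
        return {"kind": "query", "system": QUERY_SYSTEMS[intent]}
    now = now or datetime.now()
    entities = intent_data["entities"]
    order = WorkOrder(
        number=f"WO-{now:%Y%m%d}-{next(_sequence):04d}",  # assumed format
        order_type=intent,
        room_number=entities["room_number"],
        demand_content=entities["demand_content"],
        priority=intent_data.get("priority", "ordinary"),
        initiated_at=now,
        user_type=entities["user_type"],
    )
    return {"kind": "work_order", "order": order}
```

A caller would pass the output of the semantic understanding step to `dispatch` and then either run the query against the named system or push the returned work order to the matching department terminal.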

[0086] In this embodiment, intelligent management of user needs—from capture and interpretation to execution and feedback—is achieved through a hotel smart service terminal. This allows for efficient service flow without excessive manual intervention, effectively simplifying the cumbersome manual processes in traditional hotel services, avoiding deviations and omissions in the demand transmission process, and ensuring that all types of hotel-related user needs are accurately captured, quickly responded to, and executed in a standardized manner. It also achieves efficient integration and coordinated scheduling of hotel service resources, ensuring the continuity and consistency of service processes, while standardizing service execution standards, enabling traceability and controllability of service links, and providing reliable data support for hotel service optimization. This not only effectively reduces the labor costs and management difficulty of hotel services and improves the efficiency and level of hotel operation and management, but also simplifies the user demand submission process, optimizes the user service experience in the hotel setting, enhances user satisfaction and recognition, and helps hotels improve core service quality and market competitiveness, achieving improved service process efficiency and refined management.

[0087] In this embodiment, the voice interaction module includes:

[0088] The voice acquisition and wake-up unit is equipped with a multi-array high-sensitivity pickup component to achieve omnidirectional acquisition of voice signals in the complex acoustic environment of a hotel. It is configured with a wake-up word library specific to the hotel scene and supports custom addition of wake-up words and sensitivity adjustment. It is used to acquire voice signals in real time. Through multi-channel synchronous acquisition and spatial domain filtering, it performs spatial orientation discrimination and beamforming processing on the acquired voice signals, eliminates interference sources in non-target directions, locates the incident angle and spatial position of the target voice signal, and achieves directional enhancement of the target sound source. Based on a preset signal-to-noise ratio threshold, it performs real-time discrimination of the target voice signal and filters and attenuates interference signals such as ambient noise, air conditioning noise, corridor reverberation, and TV noise below the signal-to-noise ratio threshold. Through short-time energy and zero-crossing rate joint detection, it distinguishes between voice segments and non-voice segments, determines the start and end points of the effective voice signal, completes accurate detection of voice endpoints, eliminates invalid signals other than human voices, and performs digital encoding and feature value extraction on the effective voice signal, outputting digital voice data with scene identification.

[0089] The multi-turn dialogue interaction unit is used to perform speech-to-text conversion on the digital voice data and to perform intent matching and semantic understanding on the converted text against a preset hotel industry corpus. Based on the dialogue context memory mechanism, it conducts multi-turn dialogue for voice interaction commands with unclear intent or incomplete information, continuously tracks the state of the multi-turn dialogue, and outputs dialogue text data with complete context.

[0090] The semantic understanding decision unit is used to perform entity recognition, intent classification, demand parsing, and risk stratification determination on the dialogue text data. It uses a named entity recognition algorithm to automatically extract target business entities such as room number, user type, and demand content, and to verify their legality. Room number extraction supports multiple expression forms such as "X building X room" and "XX". Content that passes verification is encapsulated in a structured way to generate standardized intent data containing the intent type, risk level, and target business entity. Content found to carry risk in the risk stratification determination is fed back to the multi-turn dialogue interaction unit to initiate secondary follow-up questions.
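The joint short-time energy and zero-crossing-rate endpoint detection described for the voice acquisition and wake-up unit can be illustrated with a minimal sketch. The frame length, hop size, both thresholds, and the decision rule (a frame counts as speech when its energy is high and its zero-crossing rate is low) are assumptions chosen for illustration and are not specified in the text; the 16 kHz rate follows the PCM format mentioned elsewhere in this embodiment.

```python
import math


def frame_signal(samples, frame_len, hop):
    """Split a sample sequence into overlapping frames of frame_len samples."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def detect_endpoints(samples, frame_len=400, hop=160,
                     energy_thresh=0.01, zcr_thresh=0.25):
    """Return (start_sample, end_sample) of the detected speech, or None.

    Simplified joint rule (an assumption): a frame is speech when its
    short-time energy exceeds energy_thresh AND its zero-crossing rate
    stays below zcr_thresh, which rejects both silence and hiss-like noise.
    """
    frames = frame_signal(samples, frame_len, hop)
    voiced = [i for i, f in enumerate(frames)
              if short_time_energy(f) > energy_thresh
              and zero_crossing_rate(f) < zcr_thresh]
    if not voiced:
        return None
    return voiced[0] * hop, voiced[-1] * hop + frame_len


def demo_signal(rate=16000):
    """Synthetic test input: 0.5 s silence, 0.5 s 200 Hz tone, 0.5 s silence."""
    silence = [0.0] * (rate // 2)
    tone = [0.5 * math.sin(2 * math.pi * 200 * n / rate) for n in range(rate // 2)]
    return silence + tone + silence


if __name__ == "__main__":
    print(detect_endpoints(demo_signal()))
```

With 25 ms frames and a 10 ms hop at 16 kHz, the detected boundaries fall within one frame of the true onset and offset of the tone. A production detector would typically add hangover smoothing and a second, lower threshold to recover unvoiced consonants.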

[0091] In one embodiment, the voice acquisition and wake-up unit performs digital encoding and feature extraction on valid voice signals, and outputs digital voice data with scene identifiers, including:

[0092] Based on spatial location, the scene where the effective speech signal is located is determined. Based on the reverberation characteristics of the scene, the effective speech signal is inversely compensated by a combination of blind deconvolution and multi-frame superposition smoothing to obtain the target effective speech signal.

[0093] Pre-establish corresponding encoding patterns based on scene characteristics, associate the encoding patterns with scene characteristics, and establish an adaptive encoding pattern selection mechanism based on the association results;

[0094] A basic coding layer based on pulse code modulation for standardized coding is established; an enhanced coding layer for lossless compression coding of key segments in effective speech signals and lossy compression coding of non-key segments is established; and an adaptive coding layer for coding based on scene feature embedding of adaptive coding parameters is established. The basic coding layer, enhanced coding layer and adaptive coding layer are connected sequentially to obtain a layered coding method.

[0095] The target effective speech signal is digitally encoded based on the adaptive coding mode selection mechanism and hierarchical coding method to obtain digital encoded data.

[0096] Extract time-domain features, frequency-domain features, and business-related domain features from digital coded data, and perform multi-dimensional feature weighted fusion to obtain a multi-dimensional feature vector;

[0097] Based on the matching relationship between scene attributes and multi-dimensional features, the scene attributes corresponding to the multi-dimensional feature vectors are determined, and the scene attributes are matched with the scene identifier library to obtain the target scene identifier.

[0098] Data is encapsulated in the order of scene identifier, multi-dimensional feature vector, and digital encoded data to obtain digital voice data with scene identifier.
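The encapsulation order stated above (scene identifier, then multi-dimensional feature vector, then digital encoded data) can be sketched as a simple binary packet. The numeric scene codes, the little-endian layout, and the two length fields in the header are assumptions added so that the packet can be parsed again; the text does not define a wire format.

```python
import struct

# Hypothetical numeric codes for the scene identifiers.
SCENE_IDS = {"guest_room": 1, "lobby": 2, "floor": 3}
HEADER = "<BHI"  # scene code, feature count, encoded-data length (assumed layout)


def encapsulate(scene, features, encoded):
    """Pack scene identifier, feature vector, and encoded data, in that order."""
    header = struct.pack(HEADER, SCENE_IDS[scene], len(features), len(encoded))
    body = struct.pack(f"<{len(features)}f", *features)
    return header + body + encoded


def decapsulate(packet):
    """Inverse of encapsulate: recover (scene, features, encoded)."""
    scene_code, n_feat, n_data = struct.unpack_from(HEADER, packet, 0)
    offset = struct.calcsize(HEADER)
    features = list(struct.unpack_from(f"<{n_feat}f", packet, offset))
    offset += 4 * n_feat  # each feature is a 32-bit float
    encoded = packet[offset:offset + n_data]
    scene = next(name for name, code in SCENE_IDS.items() if code == scene_code)
    return scene, features, encoded


if __name__ == "__main__":
    pkt = encapsulate("lobby", [0.5, 0.25, 1.0], b"\x01\x02\x03\x04")
    print(len(pkt), decapsulate(pkt))
```

Placing the scene identifier first lets a downstream consumer route or filter a packet after reading a single byte, without decoding the feature vector or the audio payload.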

[0099] In this embodiment, scene attributes include, for example, voice interaction distance, user identity verification, and acoustic environment type.

[0100] In this embodiment, the matching relationship between scene attributes and multi-dimensional features is obtained in advance through analysis and processing of historical data.

[0101] In this embodiment, the weighted fusion of time-domain features, frequency-domain features, and service-related domain features is performed in a 3:3:4 ratio.

[0102] In this embodiment, time-domain features include, for example, short-time average energy, short-time average amplitude, short-time autocorrelation coefficient, peak factor, and kurtosis, reflecting the temporal amplitude variation and periodicity of the speech signal; frequency-domain features include, for example, spectral centroid, spectral bandwidth, spectral slope, and spectral roll-off point, reflecting the frequency distribution centroid and energy diffusion range of the speech signal; and business-related domain features include, for example, data features extracted from hotel industry corpora corresponding to keywords related to hotel business (such as room number, maintenance, check-out, and food delivery).
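The 3:3:4 weighted fusion of the three feature groups can be sketched as follows. The text gives the ratio but not the fusion operator, so the sketch assumes one plausible reading: each group is L2-normalized, scaled by its weight, and the scaled groups are concatenated into one multi-dimensional feature vector. The function names are illustrative.

```python
import math

# Weights taken from the 3:3:4 ratio stated in this embodiment.
WEIGHTS = {"time": 0.3, "freq": 0.3, "business": 0.4}


def l2_normalize(vec):
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def fuse_features(time_feats, freq_feats, business_feats):
    """Normalize each group, apply its weight, and concatenate (assumed operator)."""
    groups = (("time", time_feats), ("freq", freq_feats), ("business", business_feats))
    fused = []
    for name, feats in groups:
        fused.extend(WEIGHTS[name] * x for x in l2_normalize(feats))
    return fused


if __name__ == "__main__":
    print(fuse_features([3.0, 4.0], [1.0, 0.0], [0.0, 2.0]))
```

Normalizing before weighting keeps a group with large raw magnitudes (for example, energy values) from dominating the fused vector regardless of its assigned weight.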

[0103] In this embodiment, for example, a high-definition linear encoding mode is used in the guest room scenario (pure signal, coherent voice) to retain more voice details; a hybrid encoding mode is used in the lobby and corridor scenarios (signal contains slight interference, voice is fragmented) to balance sound quality and anti-interference.

[0104] In this embodiment, key segments include, for example, core terms related to user requests and descriptions related to business entities.

[0105] In this embodiment, adaptive coding parameters are embedded based on scene features. For example, intelligent compression of silent frames is added to the guest room scene, and adaptive adjustment of signal amplitude is added to the lobby scene (the coding gain is dynamically adjusted according to the background noise intensity).

[0106] In this embodiment, the basic coding layer reduces the amount of data while ensuring recognizability; the enhanced coding layer reduces the amount of data while ensuring that key information is not lost; and the adaptive coding layer further optimizes the scene adaptability of the encoded data.

[0107] In this embodiment, for example, the reverberation characteristics of the guest rooms are short reverberation, while the reverberation characteristics of the lobby are long reverberation.

[0108] In this embodiment, the purpose of performing reverse compensation on the effective speech signal based on the combination of blind deconvolution and multi-frame superposition smoothing is to shorten the reverberation tail, restore the original temporal contour of the speech signal, and avoid feature distortion caused by reverberation.

[0109] The beneficial effects of the above design scheme are as follows. By determining the scene from the spatial location and applying reverberation inverse compensation to the effective speech signal through a combination of blind deconvolution and multi-frame superposition smoothing, the scheme addresses the signal distortion caused by reverberation differences between hotel scenes. By pre-establishing the association between scene characteristics and coding modes, an adaptive selection mechanism is constructed that dynamically matches the coding mode to the scene and signal characteristics. The fixed association between scene characteristics and coding modes also establishes standardized selection rules, which avoids incorrect coding mode selection caused by manual intervention and improves the standardization and automation of the coding process. By constructing a layered coding architecture consisting of a basic coding layer, an enhanced coding layer, and an adaptive coding layer, the core quality of the speech signal is guaranteed while data efficiency and scene adaptability are optimized; compared with a single coding method, its overall performance is better suited to the high-quality and high-efficiency requirements of hotel smart service terminals. Combined with the adaptive coding mode selection mechanism, the target effective speech signal is encoded using the hierarchical coding method, yielding encoded data that is personalized, high in quality, and highly adaptable. Features are extracted from the time domain, frequency domain, and business-related domain and fused by weighting, providing a high-quality basis for scene identifier matching. By matching the multi-dimensional feature vector with scene attributes and combining the result with the scene identifier library, the target scene identifier is obtained, achieving a deep association between features and scenes and improving the accuracy of intent recognition. Data encapsulation makes the data structured, traceable, and efficient to retrieve. Ultimately, the scheme solves the problems of speech signal distortion and insufficient coding adaptability caused by acoustic differences between hotel scenarios, and achieves a deep association between speech data and scene information. Through hierarchical coding and adaptive selection mechanisms, it balances sound quality, data efficiency, and scene adaptability. Through multi-dimensional feature fusion and scene identifier binding, it provides high-quality, highly relevant input data for subsequent semantic understanding and work order generation. The overall solution improves the accuracy and reliability of voice acquisition by hotel smart service terminals, optimizes data processing efficiency and the collaboration of the modules, provides solid data support for core functions such as multi-turn dialogue interaction and smart hardware linkage, and takes both user experience and hotel service management efficiency into account.

[0110] In this embodiment, the hotel-specific wake-up word library is pre-set with basic wake-up words such as "Xiao X Hotel Butler" and "Hotel Service Assistant". Hotel administrators can add custom wake-up words through the central control room service platform. The wake-up sensitivity is adjustable from 1 to 5 levels to adapt to different acoustic environments such as guest rooms, corridors, and lobbies. The system performs directional discrimination on multiple voice signals, with beamforming angle accuracy of ±5°, accurately detects voice endpoints, and eliminates invalid signals such as knocking sounds and footsteps. The valid voice signals are converted into 16bit / 16kHz digital voice data through PCM encoding and "guest room / lobby / floor" scene identifiers are added.

[0111] In this embodiment, the specific process of the multi-turn dialogue interaction unit performing multi-turn dialogue includes:

[0112] Speech-to-text conversion is performed based on digitized voice data to obtain initial interactive text;

[0113] Based on a pre-set hotel industry corpus and semantic similarity algorithm, global semantic matching is performed on the initial interactive text to obtain the initial intent category and the corresponding intent confidence score.

[0114] Based on the initial intent category, intent confidence score, and preset confidence threshold, determine whether the user intent is clear and whether the demand information is complete.

[0115] Based on the named entity recognition rules, target business entities are identified in the initial interactive text to obtain the target business entity recognition results.

[0116] When the intent confidence score is lower than the preset confidence threshold, or when a target business entity such as room number, user type, or demand content is not identified in the initial interactive text, the command is judged to be a voice interaction command with unclear intent or incomplete information.

[0117] Based on the dialogue context memory mechanism, the current interaction scenario, historical question and answer text and the missing state of the target business entity are obtained and maintained to form a context state vector.

[0118] Based on the context state vector, the initial intent category, and the missing type of the target business entity, generate targeted clarification and follow-up questions for the corresponding scenario;

[0119] The targeted clarification and follow-up questions are broadcast by voice, and the user's feedback voice signals are acquired and converted into feedback text.

[0120] Based on the feedback text and context state vector, the initial intent category, intent confidence, and target business entity identification results are updated and completed.

[0121] The above judgment, generation, acquisition, and update steps are executed repeatedly until the intent confidence score reaches the preset confidence threshold and all target business entities are fully identified, generating and outputting dialogue text data with complete context.
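The repeated judge, ask, acquire, and update cycle described in the steps above can be sketched as a small loop over a context state. The callables `understand` and `ask_user` are hypothetical stand-ins for the speech recognition, semantic matching, and voice broadcast components; the threshold value and the `max_rounds` safety limit are assumptions that do not appear in the text.

```python
from dataclasses import dataclass, field

REQUIRED_ENTITIES = ("room_number", "user_type", "demand_content")
CONFIDENCE_THRESHOLD = 0.8  # hypothetical preset confidence threshold


@dataclass
class ContextState:
    """Context state: scenario, question/answer history, entities, confidence."""
    scenario: str
    history: list = field(default_factory=list)
    entities: dict = field(default_factory=dict)
    confidence: float = 0.0

    def missing(self):
        return [e for e in REQUIRED_ENTITIES if e not in self.entities]


def run_dialogue(state, understand, ask_user, max_rounds=5):
    """Repeat judge -> ask -> update until intent is clear and entities complete."""
    for _ in range(max_rounds):  # max_rounds is an added safety limit
        missing = state.missing()
        if state.confidence >= CONFIDENCE_THRESHOLD and not missing:
            break
        target = missing[0] if missing else "intent"
        question = f"Could you tell me your {target.replace('_', ' ')}?"
        answer = ask_user(question)
        state.history.append((question, answer))
        confidence, entities = understand(answer, state)
        state.confidence = max(state.confidence, confidence)
        state.entities.update(entities)
    return state


if __name__ == "__main__":
    replies = iter(["room 1208", "I am a guest", "the air conditioner is broken"])

    def demo_understand(text, state):
        if "room" in text:
            return 0.6, {"room_number": "1208"}
        if "guest" in text:
            return 0.7, {"user_type": "guest"}
        return 0.9, {"demand_content": "air conditioner repair"}

    final = run_dialogue(ContextState("guest_room"), demo_understand,
                         lambda q: next(replies))
    print(final.entities, final.confidence)
```

Keeping the history and entity state in one object mirrors the role of the context state vector: each follow-up question is generated from what is still missing rather than from the latest utterance alone.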

[0122] In this embodiment, based on hardware configuration and signal processing mechanisms adapted to hotel scenarios, the system can accurately capture target speech signals in complex acoustic environments, effectively filter out various environmental interferences and invalid signals, achieve directional enhancement of target sound sources and accurate identification of speech endpoints, and ensure that the acquired speech signals have high purity and effectiveness. Combined with a contextual memory mechanism, it can effectively handle voice interaction commands with unclear intentions and incomplete information, ensuring the continuity and effectiveness of voice interaction and improving the user's voice interaction experience. It achieves full-process optimization from voice acquisition and dialogue interaction to semantic understanding, improving the adaptability, accuracy, and reliability of voice interaction in hotel scenarios, simplifying the user's voice interaction operation process, ensuring that user needs can be accurately captured and interpreted, and providing high-quality data support for the efficient operation of subsequent links of hotel intelligent service terminals, thus promoting the transformation of hotel voice interaction services towards refinement and intelligence.

[0123] To better illustrate the hotel smart service terminal based on multi-turn dialogue, and referring to Figure 2, the present invention provides an interaction method for a hotel smart service terminal based on multi-turn dialogue, comprising the following steps:

[0124] Terminal interaction trigger: Actively wake up via wake word or passively trigger via telephone access / physical button to start the voice signal acquisition function of the hotel smart service terminal, and perform real-time acquisition, noise reduction, noise removal and digital conversion of voice signals in the hotel scene, extract voice feature information, and obtain digital voice data with scene identification;

[0125] Multi-turn dialogue interaction: Based on digital voice data, speech-to-text and initial semantic understanding are performed sequentially. Combined with a pre-set hotel industry corpus, preliminary intent matching is performed to identify voice interaction commands with unclear intent or incomplete target business entity information. Based on the dialogue context memory mechanism, multiple rounds of targeted clarification and follow-up questions are conducted to complete the intent and target business entity information, and dialogue text data with complete context is generated.

[0126] Semantic parsing and classification: Identify target business entities, classify hotel-specific intents, analyze service needs and determine risk stratification for dialogue text data, verify the legality of extracted target business entities, and generate standardized intent data containing intent type, risk level and target business entity;

[0127] Execution plan determination: Based on standardized intent data, distinguish between query-type requirements and work order execution-type requirements. For query-type requirements, match the corresponding hotel business management system to determine the data query execution path. For work order execution-type requirements, generate standardized hotel service work orders, determine the work order priority ranking results, and determine the linkage execution instructions for the corresponding smart hardware and third-party business systems.

[0128] Business execution feedback: Standardized hotel service work orders are pushed to the corresponding department terminals in sequence based on priority ranking. According to the linkage execution instructions, the corresponding smart hardware and third-party business systems are matched, standardized control instructions are issued and the corresponding smart hardware and third-party business systems are driven to perform operations, and the work order status and execution progress are updated in real time. After the work order is completed, the voice follow-up process is automatically carried out, and the user follow-up results are collected and recorded.

[0129] In this embodiment, standardized semantic parsing and multi-turn dialogue completion structure ensure that user needs are accurately interpreted and transformed into standardized and executable service instructions. The demand classification and matching and linkage execution structure enables accurate differentiation and targeted processing of query-type and work order execution-type demands, improving service resource utilization and execution efficiency. The full-process traceability and automatic follow-up structure enables controllable service execution process and verifiable service results, effectively improving service standardization and user satisfaction. Through systematic optimization and intelligent upgrade of the entire service chain, the reliability, standardization, and efficiency of hotel intelligent services are significantly improved.

[0130] In this embodiment, the multi-turn dialogue interaction specifically includes:

[0131] Based on a pre-set hotel industry corpus, a semantic association graph containing hotel-specific intents, target business entities, and scene-related relationships is constructed. Target keywords in the initial interaction text are obtained. Based on the semantic association graph and the cosine similarity algorithm, the semantic matching score between the target keywords and each hotel-specific intent in the semantic association graph is calculated.

[0132] Obtain the intent category and corresponding intent confidence score of the hotel-specific intent, and construct a dual judgment index by combining the semantic matching score. Compare the intent confidence score and semantic matching score with the preset confidence threshold and semantic matching threshold respectively to determine whether the user's intent is clear.

[0133] Based on the named entity recognition rules, target business entities are identified in the initial interactive text to obtain the target business entity recognition results, and a list of missing target business entities is generated from those results, clarifying the missing status of core required entities and optional entities;

[0134] Based on the missing list of target business entities, a hierarchical follow-up questioning strategy is generated. Priority is given to asking about core mandatory entities such as room number and demand type, followed by optional entities such as service time and special requirements. The voice signals of the user's feedback in each round of follow-up questions are obtained, converted into feedback text, and the target business entity information is extracted. At the same time, the missing entity identifier and historical question and answer records in the context state vector are updated to ensure the targeting and efficiency of multi-round dialogue.

[0135] Based on the updated context state vector, the intent confidence score and semantic matching score are recalculated to determine whether the intent is clear and whether the target business entity is complete, until both judgment indicators reach the corresponding thresholds and all core required entities are completed, generating and outputting dialogue text data with complete context.

[0136] Based on the data from the entire multi-turn dialogue process, the association weights of the semantic association graph and the intent matching templates of the hotel industry corpus are updated to optimize the accuracy and efficiency of subsequent multi-turn dialogues.

[0137] In this embodiment, the core mandatory entities are room number and demand type, while the optional entities are service time and special requirements. The hierarchical follow-up questioning strategy adopts the principle of "mandatory first, then optional," and the follow-up questions are concise and clear, such as "What is your room number?" or "What time do you need the service?".
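The dual judgment index (intent confidence plus cosine-similarity matching score) and the "mandatory first, then optional" questioning order can be sketched as below. The keyword profiles stand in for nodes of the semantic association graph and are hypothetical, as are both threshold values; the entity names follow the core and optional entities named in this embodiment.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b.get(term, 0) for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical keyword profiles, one per hotel-specific intent.
INTENT_KEYWORDS = {
    "maintenance": Counter(["repair", "broken", "air", "conditioner", "leak"]),
    "check_out": Counter(["check", "out", "bill", "invoice"]),
    "food_delivery": Counter(["food", "meal", "breakfast", "deliver"]),
}


def match_intent(keywords):
    """Return the best-matching intent and its semantic matching score."""
    query = Counter(keywords)
    scores = {intent: cosine_similarity(query, profile)
              for intent, profile in INTENT_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


def intent_is_clear(confidence, match_score,
                    conf_threshold=0.8, match_threshold=0.5):
    """Dual judgment: both indicators must reach their (assumed) thresholds."""
    return confidence >= conf_threshold and match_score >= match_threshold


CORE_ENTITIES = ("room_number", "demand_type")
OPTIONAL_ENTITIES = ("service_time", "special_requirements")


def follow_up_order(known_entities):
    """Missing entities in questioning order: core mandatory first, then optional."""
    return [e for e in CORE_ENTITIES + OPTIONAL_ENTITIES if e not in known_entities]


if __name__ == "__main__":
    intent, score = match_intent(["air", "conditioner", "broken"])
    print(intent, round(score, 3), intent_is_clear(0.9, score))
    print(follow_up_order({"room_number": "1208"}))
```

Requiring both indicators guards against the case where a classifier is confident but the keywords overlap weakly with the matched intent, which would otherwise skip a needed clarification round.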

[0138] In this embodiment, the multi-turn dialogue interaction further includes:

[0139] Obtain the initial interactive text after the digitized voice data is converted, and extract intent-related keywords and candidate target business entities from the initial interactive text based on a preset hotel industry corpus;

[0140] Based on semantic similarity algorithm, the intent-related keywords are used to conduct a preliminary exploration of hotel-specific intent, and the preliminary intent category and corresponding preliminary exploration confidence score are obtained;

[0141] Based on the initial intent category, the preset slot template library is called to generate a set of target business entity slots corresponding to the intent category, and the required and optional fields of the slots are specified. For example, the required slots for maintenance intents are room number, maintenance equipment, and fault description, while the optional slot is maintenance time. Unfilled required slots are marked in red.

[0142] Based on the named entity recognition rules, the legality of candidate target business entities is verified. The candidate target business entities that pass the verification are filled into the corresponding slots, and the unfilled mandatory slots and optional slots are marked.

[0143] Based on the initial confidence score and the completion status of slot filling, determine whether the initial exploration results are credible and whether the required slots are completely filled.

[0144] If the initial confidence score is lower than the preset confidence threshold or the required slots are not fully filled, the initial results and slot filling status will be synchronized to the multi-round dialogue clarification process as key evidence for targeted follow-up questions.

[0145] If the initial confidence score reaches the preset confidence threshold and the required slots are filled completely, the slot filling results are combined with the preliminary intent category to generate a preliminary standardized intent fragment, providing data support for subsequent semantic parsing and classification steps.

[0146] Based on the feedback text from each round of user feedback, the slot filling status and initial exploration confidence score are continuously updated, and the initial intent exploration results are dynamically adjusted to ensure the accuracy of the initial intent exploration and slot filling, until a complete multi-round dialogue interaction process is completed.
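The slot template lookup, legality verification, and required/optional marking described in the steps above can be sketched as follows. The maintenance template follows the example given in the text; the room number pattern, the fallback validator, and the threshold value are assumptions for illustration.

```python
import re

# Slot template library; the maintenance entry follows the example in the text.
SLOT_TEMPLATES = {
    "maintenance": {
        "required": ["room_number", "maintenance_equipment", "fault_description"],
        "optional": ["maintenance_time"],
    },
}

# Hypothetical legality rules; slots without a rule only need a non-empty value.
VALIDATORS = {
    "room_number": lambda v: re.fullmatch(r"\d{3,4}", v) is not None,
}


def fill_slots(intent, candidates):
    """Fill verified candidate entities into the slots of the intent's template.

    Returns (slots, missing_required, missing_optional).
    """
    template = SLOT_TEMPLATES[intent]
    slots = {name: None for name in template["required"] + template["optional"]}
    for name, value in candidates.items():
        is_legal = VALIDATORS.get(name, lambda v: bool(v))
        if name in slots and is_legal(value):
            slots[name] = value
    missing_required = [n for n in template["required"] if slots[n] is None]
    missing_optional = [n for n in template["optional"] if slots[n] is None]
    return slots, missing_required, missing_optional


def needs_clarification(confidence, missing_required, threshold=0.8):
    """True when confidence is below threshold or a required slot is unfilled."""
    return confidence < threshold or bool(missing_required)


if __name__ == "__main__":
    slots, req, opt = fill_slots(
        "maintenance",
        {"room_number": "12A", "maintenance_equipment": "air conditioner"},
    )
    print(slots, req, opt, needs_clarification(0.95, req))
```

In the usage shown, the room number "12A" fails verification, so the slot stays empty and is reported as a missing required slot, which is the evidence the clarification process uses for its next follow-up question.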

[0147] In this embodiment, a dual-judgment structure combining semantic association graphs and cosine similarity algorithms, coupled with deep adaptation to a hotel industry corpus, ensures the reliability of user intent recognition. A structure combining a tiered follow-up questioning strategy with a list of missing target business entities clarifies the priority of core mandatory and optional entities, significantly improving the targeting and interaction efficiency of multi-turn dialogues. A linked structure of slot template library calls, mandatory / optional slot marking, and legality verification enables accurate filling and standardized control of target business entities, avoiding interference from invalid entity information. The structure of the semantic association graph and corpus is continuously optimized for dialogue accuracy and efficiency through reverse iteration updating of data throughout the multi-turn dialogue process. Furthermore, the concise and clear follow-up question design and visual slot marking method further optimize the user interaction experience and reduce user communication costs. This provides high-quality dialogue data support for the efficient operation of hotel smart service terminals, while simultaneously improving both user experience and service efficiency.

[0148] In this embodiment, the execution plan determination specifically includes:

[0149] Obtain standardized intent data generated by the semantic parsing and classification steps, and extract intent type, target business entity information, risk level, and urgency of demand from the standardized intent data;

[0150] The standardized intent data is classified and judged according to the intent type, distinguishing between query-type requirements and work order execution-type requirements;

[0151] Fill the key information such as room number, demand content, and user type from the target business entity information into the corresponding target fields of the work order framework to generate a complete standardized hotel service work order.

[0152] Based on the risk level and urgency of demand in standardized intent data, combined with the work order priority rule base, a weighted scoring algorithm is used to calculate the priority score of each standardized hotel service work order. The risk level accounts for 60% and the urgency of demand accounts for 40%, with a score range of 0-100. Work orders are sorted according to their priority scores to determine the priority ranking results, and are marked with three levels of identification: urgent, regular, and ordinary. 90-100 points are urgent work orders, 70-89 points are regular work orders, and 0-69 points are ordinary work orders, which are marked as red, yellow, and green, respectively.

[0153] Based on the demand type in the target business entity information, the corresponding hotel smart hardware and third-party business systems are matched to generate linkage execution instructions corresponding to the work order execution type demand. The work order priority sorting results, linkage execution instructions and standardized hotel service work orders are associated and bound, and pushed synchronously to the business execution feedback steps to provide a basis for subsequent business execution.

[0154] In this embodiment, for query-type requests, based on the request content in the target business entity, the corresponding hotel business management system is matched to determine the data query execution path and data return format for each query-type request; for work order execution-type requests, a preset work order rule library is called, and a standardized hotel service work order framework is generated based on the target entity information of the standardized intent data.
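The weighted priority scoring and three-level marking described above can be sketched as follows. The 60/40 weights, the 0-100 range, the score bands, and the colors come from the text; the assumption that risk level and urgency are each supplied on a 0-100 scale, and the treatment of fractional scores between bands by lower bound, are added for illustration.

```python
def priority_score(risk_level, urgency):
    """Weighted score in 0-100: risk level 60%, urgency of demand 40%.

    Both inputs are assumed to be on a 0-100 scale.
    """
    if not (0 <= risk_level <= 100 and 0 <= urgency <= 100):
        raise ValueError("risk_level and urgency must be within 0-100")
    return (60 * risk_level + 40 * urgency) / 100


def priority_label(score):
    """Map a score to (level, color): 90+ urgent, 70+ regular, else ordinary."""
    if score >= 90:
        return "urgent", "red"
    if score >= 70:
        return "regular", "yellow"
    return "ordinary", "green"


def rank_work_orders(orders):
    """Score, label, and sort work orders by descending priority score."""
    ranked = []
    for order in orders:
        score = priority_score(order["risk_level"], order["urgency"])
        level, color = priority_label(score)
        ranked.append({**order, "score": score, "level": level, "color": color})
    return sorted(ranked, key=lambda o: o["score"], reverse=True)


if __name__ == "__main__":
    demo = [
        {"id": "A", "risk_level": 50, "urgency": 50},
        {"id": "B", "risk_level": 100, "urgency": 80},
        {"id": "C", "risk_level": 80, "urgency": 70},
    ]
    for o in rank_work_orders(demo):
        print(o["id"], o["score"], o["level"], o["color"])
```

For the demo input, order B scores 92 (urgent), C scores 76 (regular), and A scores 50 (ordinary), so the push order to department terminals would be B, C, A.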

[0155] In this embodiment, the service execution feedback further includes:

[0156] Obtain standardized hotel service work orders, priority ranking results, and linked execution instructions pushed by the execution plan determination steps;

[0157] When a standardized hotel service work order is a repair or complaint type that requires on-site handling, the audio and video capture functions of the hotel's smart terminal and the mobile terminal carried by the on-site service personnel are triggered.

[0158] Based on the work order execution progress, real-time audio and video are collected at three key nodes: after work order dispatch, during on-site handling, and after handling is completed. This process acquires audio and video data of the entire work order execution process and encodes and compresses the collected audio and video data to generate standardized audio and video files with timestamps, work order numbers, and collection node identifiers.

[0159] Obtain the work order number of the standardized hotel service work order, uniquely associate and bind the standardized audio and video files with the corresponding work order number, establish an association mapping relationship between work orders and audio and video, and synchronously store the audio and video files and standardized hotel service work orders in the terminal database based on the association mapping relationship, so as to realize the linkage query and traceability of audio and video data and work order data. The query methods include work order number, room number, initiation time, etc.

[0160] After a work order is completed, the audio and video data will be used as supporting documentation for the work order execution and simultaneously pushed to the work order management module of the hotel's smart service terminal for work order execution verification and user objection handling. If a user raises an objection, hotel management personnel can retrieve the audio and video data for verification and further improve the work order execution feedback.
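The binding of audio and video files to work order numbers and the linked query by work order number or room number can be sketched with an in-memory archive. The class and field names, the node labels, and the use of in-memory dictionaries in place of the terminal database are assumptions for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass

# The three capture nodes named in the text, under hypothetical labels.
CAPTURE_NODES = ("dispatched", "on_site", "completed")


@dataclass(frozen=True)
class MediaFile:
    """A standardized audio/video file tagged with order number, node, timestamp."""
    work_order_no: str
    node: str
    timestamp: str
    path: str


class WorkOrderArchive:
    """In-memory stand-in for the terminal database and its association mapping."""

    def __init__(self):
        self._orders = {}                # work order number -> work order record
        self._media = defaultdict(list)  # work order number -> media files

    def add_order(self, order):
        self._orders[order["work_order_no"]] = order

    def attach(self, media):
        """Bind a media file to its work order; the order must already exist."""
        if media.work_order_no not in self._orders:
            raise KeyError(media.work_order_no)
        if media.node not in CAPTURE_NODES:
            raise ValueError(media.node)
        self._media[media.work_order_no].append(media)

    def by_work_order(self, work_order_no):
        return self._orders[work_order_no], list(self._media[work_order_no])

    def by_room(self, room_number):
        return [self.by_work_order(no) for no, order in self._orders.items()
                if order["room_number"] == room_number]


if __name__ == "__main__":
    archive = WorkOrderArchive()
    archive.add_order({"work_order_no": "WO-001", "room_number": "1208",
                       "type": "repair"})
    archive.attach(MediaFile("WO-001", "dispatched", "2026-01-01T10:00:00", "a.mp4"))
    archive.attach(MediaFile("WO-001", "completed", "2026-01-01T10:40:00", "b.mp4"))
    print(archive.by_room("1208"))
```

Keying media by work order number gives the unique association the text calls for, so that an objection raised against one work order retrieves exactly the recordings captured during its execution.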

[0161] In the above embodiments, by classifying intent types and tailoring them accordingly, query-type and work order execution-type requirements are accurately distinguished and matched with corresponding execution paths. This ensures that all types of requirements receive appropriate technical and resource support, improving the accuracy and efficiency of requirement processing. Combined with weighted judgment rules based on risk level and urgency, standardized hotel service work orders are automatically generated and precisely prioritized at three levels, achieving intelligent, automated, and standardized hotel services. A multi-turn dialogue interaction mechanism, combined with contextual memory and self-learning optimization, improves the accuracy of intent recognition. Work order priority ranking and audio / video collection and traceability mechanisms enhance the efficiency and quality of hotel services. A user satisfaction voice feedback mechanism effectively solves problems such as low efficiency of manual interaction, untimely information transmission, and non-standardized service processes in traditional hotel services. This effectively reduces the labor costs and control difficulties of hotel operation and management, improves the standardization and intelligence of service execution, enhances user satisfaction, and provides solid execution and feedback support for the efficient and orderly operation of hotel intelligent service terminals.

[0162] In one embodiment, a tiered follow-up strategy is generated based on a list of missing target business entities, including:

[0163] Obtain the business feature vector of the target business entity and combine it with the scene feature vector of the target business entity to determine the entity-scene matching coefficient; pre-match the initial interaction text against the hotel industry corpus to obtain the initial intent confidence; then calculate the missing quantification value D of the target business entity:

[0164] ;

[0165] where the symbols denote, in order: the business feature vector; the scene feature vector; the cosine similarity between the business feature vector and the scene feature vector; e, the natural constant, taken as 2.72; the preset intent confidence threshold; and the initial intent confidence;

[0166] Based on the list of missing target business entities, determine the type of each target business entity, including mandatory entities and optional entities;

[0167] Based on the missing quantification value of the target business entity and the type of the target business entity, the follow-up priority coefficient of the target business entity is calculated.

[0168] When the target business entity is a mandatory entity, the follow-up priority coefficient for the mandatory entity is calculated as follows:

[0169] ;

[0170] where the symbols denote, in order, the total number of business entities in the list of missing target business entities and the number of mandatory entities;

[0171] When the target business entity is an optional entity, the follow-up priority coefficient for the optional entity is calculated as follows:

[0172] ;

[0173] where the symbol denotes the number of mandatory entities that have already been completed;

[0174] The list of missing target business entities is sorted in descending order according to the priority coefficient of follow-up questions to obtain the follow-up question priority sequence.

[0175] The target business entities in the first 1/3 of the follow-up priority sequence are designated as the priority follow-up batch, those in the last 1/3 as the final follow-up batch, and the remaining target business entities as the secondary follow-up batch, thereby generating the tiered follow-up strategy.
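The sorting and batching steps above can be sketched as follows. This is a minimal illustration, not the claimed implementation: the function name, the batch labels, and the coefficient values are hypothetical, and the source does not say how to round when the list length is not divisible by 3, so floor division is assumed here (any remainder falls into the secondary batch).

```python
def tiered_follow_up(entities_with_coeff):
    """Split missing entities into three follow-up batches.

    entities_with_coeff: list of (entity_name, follow_up_priority_coefficient).
    Assumption: each outer batch holds floor(n / 3) entities.
    """
    # Sort in descending order of the follow-up priority coefficient.
    ordered = sorted(entities_with_coeff, key=lambda item: item[1], reverse=True)
    names = [name for name, _ in ordered]

    third = len(names) // 3
    return {
        "priority": names[:third],                      # first 1/3 of the sequence
        "secondary": names[third:len(names) - third],   # everything in between
        "final": names[len(names) - third:] if third else [],  # last 1/3
    }


# Hypothetical coefficients, for illustration only.
missing = [
    ("service_time", 0.3),
    ("room_number", 0.9),
    ("contact_phone", 0.1),
    ("need_type", 0.8),
    ("user_type", 0.6),
    ("service_object", 0.7),
]
batches = tiered_follow_up(missing)
```

With fewer than three missing entities, this sketch places all of them in the secondary batch; a real terminal would need an explicit rule for that case.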

[0176] In this embodiment, the exponential correction term represents the deviation of the initial intent confidence from the threshold: the larger the deviation between the initial intent confidence and the threshold, the larger the correction term and the higher the missing quantification value. The formula also includes a bias smoothing term, which prevents extreme values from distorting the quantification result.

[0177] In this embodiment, the business feature vector characterizes the business attributes, expression characteristics, and execution relevance of the target business entity. Its dimensions cover core attributes such as entity necessity, expression diversity, business relevance strength, and verification complexity. The scene feature vector characterizes, for example, the acoustic environment, business interaction habits, and equipment deployment of the hotel interaction scene. Its dimensions cover core attributes such as background noise intensity, reverberation time, entity usage frequency, and interaction triggering method.
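The entity-scene matching coefficient is based on the cosine similarity between the business feature vector and the scene feature vector. The sketch below shows only that similarity step; it does not attempt the remaining terms of the missing quantification value D. The four-dimensional vectors and their values are hypothetical, and the sketch assumes both vectors have been mapped to the same length.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length numeric vectors.

    Returns 0.0 if either vector is all zeros (an assumption made here to
    avoid division by zero; the source does not define this case).
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical feature values, for illustration only.
# Business vector: necessity, expression diversity, relevance strength, verification complexity.
business_vec = [1.0, 0.4, 0.8, 0.2]
# Scene vector: noise intensity, reverberation time, entity usage frequency, trigger method.
scene_vec = [0.6, 0.3, 0.9, 0.1]

matching_similarity = cosine_similarity(business_vec, scene_vec)
```

Because all feature values in this example are non-negative, the similarity lies between 0 and 1, with values near 1 indicating that the entity fits the scene well.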

[0178] In this embodiment, in the follow-up priority coefficient for mandatory entities, the term based on the deviation between the initial intent confidence and the threshold grows as that deviation grows, which amplifies the follow-up priority coefficient more strongly. This implements the business logic of further raising the follow-up priority of core mandatory entities when the intent is ambiguous. In the correction term, the more mandatory entities are missing, the smaller the denominator and the larger the correction term.

[0179] In this embodiment, in the follow-up priority coefficient for optional entities, the greater the deviation between the initial intent confidence and the threshold, the higher the follow-up priority of optional entities, but the amplification is far smaller than for mandatory entities. The lower the completion rate of the mandatory entities, the smaller the correction term and the more strongly the follow-up priority of optional entities is restricted. This implements the business logic that the follow-up priority of optional entities is suppressed before the core mandatory entities are completed and is gradually released only once they are.

[0180] In this embodiment, the mandatory entities are the basic prerequisites and core basis for the generation of hotel service work orders and the execution of business. If they are missing, the service object, service scenario or core needs cannot be clearly defined, resulting in the inability to create work orders or advance services. They are the core information that is prioritized to be completed in multi-turn dialogues, such as room number, need type, service object and user type. Optional entities are the optimized supplementary information of hotel services, which do not affect the generation of work orders and the execution of basic services. They are only used to improve the accuracy of services and adapt to the personalized needs of users. If they are missing, services can be advanced according to the hotel's default rules, such as contact person's phone number, service time and additional needs.
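The distinction between mandatory and optional entities can be sketched as a simple completeness check. The entity names come from the examples in the paragraph above; the identifiers and the helper functions are hypothetical, and treating an empty value as missing is an assumption of this sketch.

```python
# Entity names taken from the examples in the description; identifiers are illustrative.
MANDATORY = ("room_number", "need_type", "service_object", "user_type")
OPTIONAL = ("contact_phone", "service_time", "additional_needs")


def missing_entities(recognized):
    """Return (missing mandatory, missing optional) for a dict of recognized entities.

    An entity counts as missing if it is absent or has an empty value.
    """
    missing_mandatory = [name for name in MANDATORY if not recognized.get(name)]
    missing_optional = [name for name in OPTIONAL if not recognized.get(name)]
    return missing_mandatory, missing_optional


def can_create_work_order(recognized):
    """A work order can be created only when no mandatory entity is missing.

    Missing optional entities do not block creation; the service can proceed
    according to the hotel's default rules.
    """
    missing_mandatory, _ = missing_entities(recognized)
    return not missing_mandatory


recognized = {"room_number": "1208", "need_type": "extra towels"}
missing_mandatory, missing_optional = missing_entities(recognized)
```

In this example the terminal would still need to ask about the service object and user type before a work order could be generated.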

[0181] The beneficial effects of the above design scheme are as follows. The matching coefficient is determined by the cosine similarity between the business feature vector and the scene feature vector, which objectively reflects how well the business attributes of the entity fit the acoustic and interaction attributes of the scene, so that follow-up questions better match the requirements of the current scene. Entity levels are distinguished by type value (mandatory entity = 1, optional entity = 0), which provides a clear logical division for the subsequent tiered calculation of priority coefficients. For mandatory entities, the more items are missing, the larger the correction term in the follow-up priority coefficient, so that the follow-up priority adapts dynamically: the more urgent the missing item, the higher its priority. For optional entities, the follow-up priority coefficient increases only as mandatory entities are gradually completed, which ensures that optional entities are not asked about first while mandatory entities are still missing and thus guarantees the basic premise for advancing the service. With the three-tier follow-up strategy, the first batch focuses on the most urgent missing mandatory entities, the second batch covers the remaining mandatory entities and some well-matched optional entities, and the last batch supplements the low-priority optional entities. This implements a completion logic in which core entities come first and the rest are filled in gradually, balancing efficiency and user experience. Asking follow-up questions in stages improves the user interaction experience and provides efficient and reliable core support for multi-turn dialogue interaction on hotel smart service terminals.

[0182] The above description is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any equivalent substitutions or modifications made by those skilled in the art within the scope of the technology disclosed in the present invention, based on the technical solution and inventive concept of the present invention, should be covered within the scope of protection of the present invention.

Claims

1. A hotel smart service terminal based on multi-turn dialogue, characterized in that it comprises: a voice interaction module, used to collect voice signals in the hotel scene in real time, extract and process the voice signals, convert the processed voice features into digital voice data, match the digital voice data against a preset hotel industry corpus, perform semantic understanding and intent classification based on the matching results, and generate standardized intent data; a work order management module, used to match the standardized intent data with templates based on a work order rule base, generate the work order framework corresponding to a standardized hotel service work order, fill the target entities of the standardized intent data into the target fields of the work order framework to generate the standardized hotel service work order, sort the work orders by priority, push each standardized hotel service work order to the corresponding department service terminal based on the sorting result, and, after the work order is completed, automatically trigger user satisfaction voice feedback and synchronously record the feedback result; and a linkage interaction module, used to identify the type of the acquired standardized intent data, distinguish between query-type and work order execution-type intent data based on the identification result, obtain the work order linkage execution instructions corresponding to the standardized hotel service work order, issue standardized linkage control instructions to the corresponding hotel smart hardware and third-party business systems based on the work order linkage execution instructions, transmit real-time feedback data to the voice interaction module, and synchronize the linkage execution progress information to the work order management module.

2. The hotel smart service terminal based on multi-turn dialogue as described in claim 1, characterized in that the voice interaction module includes: a voice acquisition and wake-up unit, used to acquire voice signals in real time, perform spatial orientation discrimination and beamforming processing on the voice signals acquired from multiple channels, locate the incident angle and spatial position of the target voice signal, discriminate the target voice signal in real time based on a preset signal-to-noise ratio threshold, filter out and attenuate interference signals below the signal-to-noise ratio threshold, distinguish between voice segments and non-voice segments, determine the start and end points of the valid voice signal, and perform digital encoding and feature extraction on the valid voice signal to output digital voice data with scene identification; a multi-turn dialogue interaction unit, used to perform speech conversion processing based on the digital voice data, perform intent matching and semantic understanding between the converted speech data and the preset hotel industry corpus, conduct multi-turn dialogue, based on a dialogue context memory mechanism, for voice interaction commands with unclear intent or incomplete information, and continuously track the state of the multi-turn dialogue to output dialogue text data with complete context; and a semantic understanding decision unit, used to perform entity recognition, intent classification, and demand parsing based on the dialogue text data, determine the risk stratification, generate standardized intent data, and feed back content identified as risky in the risk stratification to the multi-turn dialogue interaction unit to initiate secondary follow-up questions.

3. The hotel intelligent service terminal based on multi-turn dialogue according to claim 2, characterized in that, The voice acquisition and wake-up unit performs digital encoding and feature extraction on valid voice signals, and outputs digital voice data with scene identifiers, including: Based on spatial location, the scene where the effective speech signal is located is determined. Based on the reverberation characteristics of the scene, the effective speech signal is inversely compensated by a combination of blind deconvolution and multi-frame superposition smoothing to obtain the target effective speech signal. Pre-establish corresponding encoding patterns based on scene characteristics, associate the encoding patterns with scene characteristics, and establish an adaptive encoding pattern selection mechanism based on the association results; A basic coding layer based on pulse code modulation for standardized coding is established; an enhanced coding layer for lossless compression coding of key segments in effective speech signals and lossy compression coding of non-key segments is established; and an adaptive coding layer for coding based on scene feature embedding of adaptive coding parameters is established. The basic coding layer, enhanced coding layer and adaptive coding layer are connected sequentially to obtain a layered coding method. The target effective speech signal is digitally encoded based on the adaptive coding mode selection mechanism and hierarchical coding method to obtain digital encoded data. 
Extract time-domain features, frequency-domain features, and business-related domain features from digital coded data, and perform multi-dimensional feature weighted fusion to obtain a multi-dimensional feature vector; Based on the matching relationship between scene attributes and multi-dimensional features, the scene attributes corresponding to the multi-dimensional feature vectors are determined, and the scene attributes are matched with the scene identifier library to obtain the target scene identifier. Data is encapsulated in the order of scene identifier, multi-dimensional feature vector, and digital encoded data to obtain digital voice data with scene identifier.

4. The hotel smart service terminal based on multi-turn dialogue as described in claim 2, characterized in that, The specific process of the multi-turn dialogue interaction unit conducting multi-turn dialogue includes: Speech-to-text conversion is performed based on digitized voice data to obtain initial interactive text; Based on a pre-set hotel industry corpus and semantic similarity algorithm, global semantic matching is performed on the initial interactive text to obtain the initial intent category and the corresponding intent confidence score. Based on the initial intent category, intent confidence score, and preset confidence threshold, determine whether the user intent is clear and whether the demand information is complete. Based on the named entity recognition rules, the initial interactive text is used to identify target business entities, and the target business entity recognition results are obtained. When the intent confidence score is lower than the preset confidence threshold, or when no target business entity is identified in the initial interaction text, it is judged as an unclear intent or incomplete information voice interaction command. Based on the dialogue context memory mechanism, the current interaction scenario, historical question and answer text and the missing state of the target business entity are obtained and maintained to form a context state vector. Based on the context state vector, the initial intent category, and the missing type of the target business entity, generate targeted clarification and follow-up questions for the corresponding scenario; Voice broadcast is based on targeted clarification and follow-up questions to obtain user feedback voice signals and convert them into feedback text. Based on the feedback text and context state vector, the initial intent category, intent confidence, and target business entity identification results are updated and completed. 
The process continues until the intent confidence score reaches the preset confidence threshold and all target business entities are fully identified, at which point dialogue text data with complete context is generated and output.

5. An interaction method for a hotel smart service terminal based on multi-turn dialogue, using the hotel smart service terminal based on multi-turn dialogue as described in any one of claims 1-4, characterized in that it includes the following steps: Terminal interaction trigger: the voice signal acquisition function of the hotel smart service terminal is started either actively by a wake word or passively by telephone access or a physical button; voice signals in the hotel scene are collected in real time, denoised, and digitally converted, and voice feature information is extracted to obtain digital voice data with scene identification; Multi-turn dialogue interaction: based on the digital voice data, speech-to-text conversion and initial semantic understanding are performed in sequence; preliminary intent matching is performed against the preset hotel industry corpus to identify voice interaction commands with unclear intent or incomplete target business entity information; based on the dialogue context memory mechanism, multiple rounds of targeted clarification and follow-up questions are conducted to complete the intent and the target business entity information, and dialogue text data with complete context is generated; Semantic parsing and classification: based on the dialogue text data, target business entities are identified, hotel-specific intents are classified, service needs are analyzed, risk levels are determined, and the legality of the extracted target business entities is verified, to generate standardized intent data containing the intent type, risk level, and target business entities; Execution plan determination: based on the standardized intent data, query-type requirements are distinguished from work order execution-type requirements; query-type requirements are matched with the corresponding hotel business management system, and standardized hotel service work orders are generated for work order execution-type requirements; at the same time, the work order priority ranking result and the linkage execution instructions for the corresponding smart hardware and third-party business systems are determined; Business execution feedback: the standardized hotel service work orders are pushed to the corresponding department terminals in sequence according to the priority ranking; according to the linkage execution instructions, the corresponding smart hardware and third-party business systems are matched, standardized control instructions are issued to drive them to perform the operations, and the work order status and execution progress are updated in real time; after the work order is completed, the voice follow-up process is carried out automatically, and the user follow-up results are collected and recorded.

6. The interaction method based on a multi-turn dialogue hotel smart service terminal as described in claim 5, characterized in that, The multi-turn dialogue interaction specifically includes: A semantic association graph is constructed based on a pre-set hotel industry corpus. Target keywords are obtained from the initial interactive text, and the semantic matching score between the target keywords and the specific intents of each hotel in the semantic association graph is calculated. Obtain the intent category and corresponding intent confidence score of the hotel-specific intent, and construct a dual judgment index by combining the semantic matching score. Compare the intent confidence score and semantic matching score with the preset confidence threshold and semantic matching threshold respectively to determine whether the user's intent is clear. Based on the named entity recognition rules, the initial interactive text is used to identify target business entities, obtain the target business entity recognition results, and generate a list of missing target business entities based on the target business entity recognition results. Based on the list of missing target business entities, a hierarchical follow-up questioning strategy is generated. The voice signal of the user's follow-up questioning feedback in each round is obtained, converted into feedback text, and the target business entity information is extracted. At the same time, the missing entity identifier and historical question and answer records in the context state vector are updated. Based on the updated context state vector, the intent confidence score and semantic matching score are recalculated to determine whether the intent is clear and whether the target business entity is complete, until both judgment indicators reach the corresponding thresholds, and dialogue text data with complete context is generated and output. 
Based on the data from the entire multi-turn dialogue process, the association weights of the semantic association graph are updated and the intent matching templates of the hotel industry corpus are updated.

7. The interaction method based on a multi-turn dialogue hotel smart service terminal as described in claim 5, characterized in that, The multi-turn dialogue interaction also includes: Obtain the initial interactive text after the digitized voice data is converted, and extract intent-related keywords and candidate target business entities from the initial interactive text based on a preset hotel industry corpus; Based on semantic similarity algorithm, the intent-related keywords are used to conduct a preliminary exploration of hotel-specific intent, and the preliminary intent category and corresponding preliminary exploration confidence score are obtained; Based on the initial intent category, the preset slot template library is invoked to generate a set of target business entity slots corresponding to the intent category; Based on the named entity recognition rules, the legality of candidate target business entities is verified. The candidate target business entities that pass the verification are filled into the corresponding slots, and the unfilled mandatory slots and optional slots are marked. Based on the initial confidence score and the completion status of slot filling, determine whether the initial exploration results are credible and whether the required slots are completely filled. If the initial confidence score is lower than the preset confidence threshold or the required slots are not fully filled, the initial results and slot filling status will be synchronized to the multi-round dialogue clarification process. If the initial confidence score reaches the preset confidence threshold and the required slots are filled completely, the slot filling results are combined with the initial intent category to generate an initial standardized intent fragment. 
Based on the feedback text from each round of user feedback, the slot filling status and the initial exploration confidence score are continuously updated, and the initial intent exploration results are dynamically adjusted until the multi-turn dialogue interaction process is complete.

8. The interaction method based on a multi-turn dialogue hotel smart service terminal as described in claim 5, characterized in that, The execution plan is determined, specifically including: Obtain standardized intent data generated by the semantic parsing and classification steps, and extract intent type, target business entity information, risk level, and urgency of demand from the standardized intent data; The standardized intent data is classified and judged according to the intent type, distinguishing between query-type requirements and work order execution-type requirements; Fill the key information from the target business entity information into the corresponding target fields of the work order framework to generate a complete standardized hotel service work order. Based on the risk level and urgency of demand in the standardized intent data, combined with the work order priority rule base, the priority score of each standardized hotel service work order is calculated, and the work orders are sorted according to the priority score to determine the work order priority ranking result, and marked with three levels of work order identification: urgent, regular, and ordinary. Based on the demand type in the target business entity information, the corresponding hotel smart hardware and third-party business systems are matched to generate linkage execution instructions corresponding to the work order execution type demand, and the work order priority sorting results, linkage execution instructions and standardized hotel service work orders are associated and bound.

9. The interaction method based on a multi-turn dialogue hotel smart service terminal as described in claim 5, characterized in that, The business execution feedback also includes: Obtain standardized hotel service work orders, priority ranking results, and linked execution instructions pushed by the execution plan determination steps; When a standardized hotel service work order is an on-site handling work order, the audio and video capture functions of the hotel's smart terminal and the mobile terminal carried by the on-site service personnel are triggered. Based on the work order execution progress, real-time audio and video are collected at three key nodes: after work order dispatch, during on-site handling, and after handling is completed. This acquires audio and video data of the entire work order execution process, and encodes and compresses the collected audio and video data to generate standardized audio and video files. Obtain the work order number of the standardized hotel service work order, uniquely associate and bind the standardized audio and video files with the corresponding work order number, establish an association mapping relationship between the work order and the audio and video, and synchronously store the audio and video files and the standardized hotel service work order in the terminal database based on the association mapping relationship. After a work order is completed, the audio and video data will be used as supporting documentation for the work order execution and simultaneously pushed to the work order management module of the hotel's smart service terminal for work order execution verification and handling of user objections.

10. The interaction method based on a multi-turn dialogue hotel smart service terminal as described in claim 6, characterized in that generating the tiered follow-up strategy based on the list of missing target business entities includes: obtaining the business feature vector of the target business entity and combining it with the scene feature vector of the target business entity to determine the entity-scene matching coefficient, pre-matching the initial interaction text against the hotel industry corpus to obtain the initial intent confidence, and then calculating the missing quantification value of the target business entity; based on the list of missing target business entities, determining the type of each target business entity, the types including mandatory entities and optional entities; based on the missing quantification value of the target business entity and the type of the target business entity, calculating the follow-up priority coefficient of the target business entity; sorting the list of missing target business entities in descending order of the follow-up priority coefficient to obtain the follow-up priority sequence; and designating the target business entities in the first 1/3 of the follow-up priority sequence as the priority follow-up batch, those in the last 1/3 as the final follow-up batch, and the remaining target business entities as the secondary follow-up batch, thereby generating the tiered follow-up strategy.