Method and apparatus for domain classification of language recognition
By analyzing users' speech history and learning domain classification rules, the domain score and part-of-speech score of voice commands are calculated, and the domain that best matches the user's intent is selected. This solves the problem of inaccurate domain separation in multilingual recognition services in autonomous vehicles, and improves the accuracy of voice command processing and user experience.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- HYUNDAI MOTOR CO LTD
- Filing Date
- 2025-06-04
- Publication Date
- 2026-06-19
AI Technical Summary
In autonomous vehicles, inaccurate domain separation of multiple language recognition services can lead to incorrect processing of user voice commands, causing inconvenience to users.
By analyzing users' speech history, domain classification rules are learned, domain scores and part-of-speech scores for voice commands are calculated, the domain that best matches the user's intent is selected, and the domain classification rules are adjusted based on feedback to reduce errors.
It improves the accuracy of domain classification for voice commands, reduces domain classification errors, and enhances the user experience.
Figure CN122245295A_ABST
Description
[0001] Cross-reference to related applications
[0002] This application claims priority to Korean Patent Application No. 10-2024-0189704, filed with the Korean Intellectual Property Office on December 18, 2024, the entire contents of which are incorporated herein by reference.
Technical Field
[0003] This disclosure relates to methods and apparatus for language recognition domain classification. More specifically, this disclosure relates to methods and apparatus for selecting a domain of user intent from multiple domains in response to a user's voice command.
Background Technology
[0004] The descriptions in this Background section are intended only to enhance the understanding of the background of this disclosure and should not be construed as an admission that they correspond to prior art known to those skilled in the art.
[0005] Language recognition services are technologies used to recognize a user's language and execute commands or provide information, thereby improving driver convenience in autonomous vehicles.
[0006] Autonomous vehicles may be equipped with one or more language recognition services provided by separate programs. For example, language recognition services for navigation devices (or programs) and language recognition services for generative AI applications used for information retrieval may be provided together.
[0007] If the user's voice commands are not accurately separated into domains, navigation domain questions may be answered using results from the generative AI domain, or vice versa, which can be inconvenient for the user.
[0008] Therefore, when providing multiple language recognition services simultaneously, it is necessary to clearly identify the domain to which the user's voice command belongs.
Summary of the Invention
[0009] The purpose of this disclosure is to provide a method and apparatus that can eliminate or reduce domain classification errors by analyzing users' discourse patterns based on their discourse history and learning domain classification rules.
[0010] The technical objectives to be achieved by this disclosure are not limited to those described above, and other technical objectives not mentioned above will be clearly understood by those skilled in the art from the detailed description given below.
[0011] Embodiments of this disclosure provide a method for determining a domain for processing a user's voice command from a plurality of domains. The method includes the steps of: obtaining a plurality of words contained in the voice command; calculating a score for the voice command for each domain in the plurality of domains based on a domain score and a part-of-speech score of each word in the plurality of words; and selecting a domain for processing the voice command based on the score of the voice command for each domain in the plurality of domains, wherein the domain score and part-of-speech score of each word in the plurality of words are updated based on whether the selected domain matches the user's intent.
[0012] Another embodiment of this disclosure provides an apparatus for determining a domain for processing a user's voice command from a plurality of domains. The apparatus includes: at least one memory storing instructions; and at least one processor, wherein the at least one processor is configured to perform the following processes by executing instructions: obtaining a plurality of words included in the voice command; calculating a score for the voice command for each of the plurality of domains based on domain scores and part-of-speech scores of each of the plurality of words; and selecting a domain for processing the voice command based on the scores of the voice command for each of the plurality of domains, wherein the domain scores and part-of-speech scores of each of the plurality of words are updated based on whether the selected domain matches the user's intent.
[0013] According to embodiments of this disclosure, it is possible to select the domain to be used by the user from a plurality of domains capable of processing the user's voice commands.
[0014] According to embodiments of this disclosure, even when commands used for multiple domains are similar, domain classification errors can be eliminated or reduced by analyzing user speech patterns based on user speech history and learning domain classification rules.
[0015] According to embodiments of this disclosure, domain classification rules can be specialized to adapt to user characteristics by analyzing user discourse patterns based on user discourse history and learning domain classification rules.
[0016] The advantages of this disclosure are not limited to those described above; other advantages not mentioned above will be clearly understood by those skilled in the art from the description given below.
Attached Figure Description
[0017] Figure 1 is a block diagram illustrating a domain classification system according to an embodiment of the present disclosure.
[0018] Figure 2 is a flowchart illustrating the process of learning domain judgment rules in a learning module according to an embodiment of the present disclosure.
[0019] Figure 3 is a flowchart illustrating the process by which a domain classification system according to an embodiment of the present disclosure selects the domain corresponding to a user's voice command and corrects domain classification errors.
[0020] Figure 4 is a block diagram illustrating an exemplary computing device that may be used to implement the methods or apparatus according to this disclosure.
Detailed Implementation
[0021] In the following description, some exemplary embodiments of the present disclosure will be described in detail with reference to the accompanying drawings. In the following description, the same reference numerals preferably denote the same elements, although these elements are shown in different drawings. Additionally, in the following description of some embodiments, detailed descriptions of known functions and configurations therein will be omitted for clarity and brevity.
[0022] Additionally, terms such as first, second, A, B, (a), (b), etc., are used only to distinguish one component from another and do not imply or suggest the substance, order, or sequence of the components. Throughout the specification, when a part 'includes' or 'contains' a component, this means that the part may further include other components, not that other components are excluded, unless specifically stated to the contrary. Terms such as 'unit', 'module', etc., refer to one or more units for performing at least one function or operation, which may be implemented by hardware, software, or a combination thereof.
[0023] The following detailed description, together with the accompanying drawings, is intended to describe exemplary embodiments of the invention and is not intended to represent the only embodiments in which the invention may be practiced.
[0024] Figure 1 is a block diagram illustrating a domain classification system according to an embodiment of the present disclosure.
[0025] The domain classification system 100 includes a domain determination module 110, a command processing module 120, an error detection module 130, a domain suggestion module 140, a classification history storage module 150, a learning module 160, a first database, and a second database. The domain classification system 100 can be implemented as an embedded device, a server, an electronic device in an autonomous driving system, or the like. Not all modules shown in Figure 1 are essential components, and some modules included in the domain classification system 100 may be added, modified, or removed in other embodiments. The components shown in Figure 1 represent functionally distinct elements, and one or more components may be integrated with each other in an actual physical environment.
[0026] The domain determination module 110 receives voice commands from the user. The domain determination module 110 can receive the waveform of the voice command or the voice command converted into text data.
[0027] The domain determination module 110 can select (determine) a domain from multiple domains for processing the user's voice commands. For example, the multiple domains can be a navigation domain and a generative AI domain, and the domain determination module 110 can select the navigation domain in response to the user's voice command "guide me home," and select the generative AI domain in response to the user's voice command "find the latest movies." The process by which the domain determination module 110 selects a domain for processing the user's voice commands may be referred to hereinafter as domain classification.
[0028] The domain determination module 110 can select one of multiple domains based on domain classification rules stored in the first database.
[0029] The first database can store default domain classification rules and / or custom domain classification rules based on users' utterance patterns. Default domain classification rules refer to the default domain classification rules initially designed when designing the language recognition system, while custom domain classification rules refer to domain classification rules that reflect the utterance patterns of specific users by analyzing their utterance patterns.
[0030] In one implementation, the domain determination module 110 initially classifies domains based on default domain classification rules, but after accumulating training data based on user usage, it classifies domains based on domain classification rules customized for the corresponding user's speech patterns.
[0031] Domain classification rules include domain score and part-of-speech score for each different word.
[0032] The domain determination module 110 can calculate the scores of voice commands for each of the multiple domains based on domain classification rules.
[0033] In one embodiment, the score of a voice command for each of the multiple domains can be calculated as the sum, over the words included in the user's utterance, of the product of each word's domain score and part-of-speech score ($\sum_i D_i \cdot P_i$). Here, $D_i$ is the domain score of the $i$-th word for a specific domain, and $P_i$ is the part-of-speech score of the $i$-th word. For example, if the user's voice command is "guide the way home", the domain determination module 110 performs part-of-speech tagging (POST) on the voice command string to obtain the parts of speech of the words ("guide: verb", "route: noun", "to: particle", "home: noun"), and calculates the score of the voice command for each of the multiple domains. With the terms listed in the order guide (verb), route (noun), to (particle), home (noun), and each term written as domain score times part-of-speech score: $\mathrm{score}_{\mathrm{navigation}} = 1.5 \cdot 1.0 + 2.0 \cdot 1.5 + 1.5 \cdot 0.5 + 1.2 \cdot 1.5 = 7.05$ and $\mathrm{score}_{\mathrm{generative\ AI}} = 1.5 \cdot 1.0 + 1.2 \cdot 1.5 + 0.8 \cdot 0.5 + 1.1 \cdot 1.5 = 5.35$.
[0034] The domain determination module 110 can select the domain with the highest score from multiple domains as the domain for processing the user's voice commands. In the example above, since the navigation domain has a score of 7.05 and the generative AI domain has a score of 5.35, the domain determination module 110 is able to select the navigation domain as the domain for processing the user's voice commands.
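The scoring and selection described in paragraphs [0033] and [0034] can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the dictionary layout and the names `DOMAIN_SCORES`, `POS_SCORES`, `score_command`, and `select_domain` are hypothetical, and treating an unknown word or part of speech as contributing 0 is an assumption. The numeric values are those of the worked example.

```python
# Hypothetical rule tables: the layout is an assumption; the values are the
# ones used in the worked example ("guide the way home").
DOMAIN_SCORES = {
    "navigation": {"guide": 1.5, "route": 2.0, "to": 1.5, "home": 1.2},
    "generative_ai": {"guide": 1.5, "route": 1.2, "to": 0.8, "home": 1.1},
}
POS_SCORES = {"verb": 1.0, "noun": 1.5, "particle": 0.5}


def score_command(tagged_words, domain):
    """Return the sum of D_i * P_i over the tagged words for one domain."""
    table = DOMAIN_SCORES[domain]
    # Assumption: words or parts of speech missing from the tables contribute 0.
    return sum(table.get(word, 0.0) * POS_SCORES.get(pos, 0.0)
               for word, pos in tagged_words)


def select_domain(tagged_words):
    """Score the command for every domain and pick the highest-scoring one."""
    scores = {d: score_command(tagged_words, d) for d in DOMAIN_SCORES}
    return max(scores, key=scores.get), scores


# Part-of-speech tagging result for the example command.
tagged = [("guide", "verb"), ("route", "noun"), ("to", "particle"), ("home", "noun")]
best, scores = select_domain(tagged)
```

With these tables, the navigation domain scores about 7.05 and the generative AI domain about 5.35, so `best` is the navigation domain, matching the example.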
[0035] Command processing module 120 inputs the user's voice commands into the domain selected by domain determination module 110 and executes the processing results (e.g., performing the operation instructed by the user, displaying the results, or providing voice guidance (text-to-speech (TTS))). In the example above, since domain determination module 110 selects the navigation domain, command processing module 120 inputs the user's voice commands into the navigation domain, sets the vehicle's destination to the user's home based on the processing results of the navigation domain, and then begins driving control.
[0036] Error detection module 130 checks whether the domain classification of domain determination module 110 matches the user's utterance intent. If error detection module 130 determines that the currently selected domain is not the domain the user wants, then error detection module 130 can determine that an error has occurred in domain classification. Error detection module 130 can determine the domain classification error based on user feedback.
[0037] In one implementation, the error detection module 130 may determine that a domain classification error has occurred based on the following judgment rules.
[0038] First, the error detection module 130 can determine a domain classification error based on the user's attempt to perform an operation of another language recognition service.
[0039] Typically, because different execution screens (or trigger screens) are displayed for each domain, users can determine whether the current domain screen is the expected domain screen by viewing the execution screen. Therefore, when a user performs an operation to perform another language recognition service after language recognition has been triggered and the user can identify the currently selected domain (e.g., when the execution screen is displayed), the error detection module 130 can determine that the domain classification of the domain determination module 110 is incorrect. Performing another language recognition service operation could be, for example, the user pressing the push-to-talk (PTT) button again and / or the user issuing the wake word again.
[0040] Second, the error detection module 130 can determine that the domain classification is incorrect based on the user's negative statement about the currently selected domain.
[0041] Even while the autonomous driving system processes the user's voice command and delivers the processing results to the user (e.g., displaying the results on a screen, providing voice guidance, or performing vehicle control), the speech recognition microphone remains active to receive the next command. Therefore, if an utterance indicating an incorrect domain classification (e.g., "What is this?", "What went wrong?", "This happened again", "Stupid", etc.) is input to the speech recognition microphone while the autonomous driving system is executing a voice command, the error detection module 130 can determine that the domain classification of the domain determination module 110 is incorrect.
[0042] In an additional embodiment, the error detection module 130 can use a second database containing negative word information to determine whether the user's utterance indicates dissatisfaction with the domain classification. The second database stores general negative expressions and negative expressions spoken by the user. The error detection module 130 can detect negative words in the user's utterance based on the information stored in the second database and quickly determine that the domain classification is incorrect.
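A negative-expression lookup of the kind described in paragraphs [0041] and [0042] might look like the sketch below. The names `GENERAL_NEGATIVES`, `normalize`, and `is_negative_utterance` are hypothetical, and simple substring matching on normalized text is an assumption; the disclosure does not specify how the second database is queried.

```python
# Hypothetical stand-in for the second database. The general expressions are
# the examples given in the text; user-specific ones are passed in by the caller.
GENERAL_NEGATIVES = {"what is this", "what went wrong", "this happened again", "stupid"}


def normalize(utterance):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    kept = "".join(ch if ch.isalnum() or ch.isspace() else " "
                   for ch in utterance.lower())
    return " ".join(kept.split())


def is_negative_utterance(utterance, user_negatives=()):
    """Return True if the utterance contains a stored negative expression.

    Assumption: matching is plain substring search on normalized text.
    """
    text = normalize(utterance)
    expressions = GENERAL_NEGATIVES | {normalize(e) for e in user_negatives}
    # Skip empty expressions, which would otherwise match every utterance.
    return any(expr and expr in text for expr in expressions)
```

For example, `is_negative_utterance("What is this?")` is true, while an ordinary command such as "Guide the way home" is not flagged.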
[0043] The error detection module 130 can further determine whether the domain classification is valid (i.e., determine that no domain classification error has occurred) based on the following judgment rules. If all or some of the following judgment rules are met, the error detection module 130 can improve the accuracy of domain classification by determining that the domain classification is valid.
[0044] First, if the user selects a domain suggested by the domain suggestion module 140 (e.g., responds "yes" to the suggested domain), then the error detection module 130 can determine that the domain currently selected for the user's current voice command is invalid, and the suggested domain is valid.
[0045] Second, if the user does not retry the same or similar command when the command processing module 120 executes the command using the domain suggested by the domain suggestion module 140, or if the user does not manually input the same or similar command within a predetermined time, then the error detection module 130 can determine that the currently selected domain is valid for the user's current voice command. Since the user tends to perform manual operation within a short period of time when they believe that the speech recognition system has failed to recognize the voice command, the predetermined time can be, for example, 5 seconds.
[0046] Third, if the user does not input another command within a predetermined time (e.g., 10 seconds, or 50% of the total output time (or output information)) when the command processing module 120 outputs the processing result of the user's voice command (e.g., voice guidance), then the error detection module 130 can determine that the currently selected domain is valid for the user's current voice command.
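The three validity rules in paragraphs [0044] to [0046] can be combined roughly as follows. This is a sketch under assumptions: the function and parameter names are hypothetical, the windows use the example values of 5 and 10 seconds, the alternative window of 50% of the output time is not modeled, and requiring both the second and third rules to hold is one possible reading of "all or some of the following judgment rules".

```python
from typing import Optional

RETRY_WINDOW_S = 5.0          # example value given for the second rule
NEXT_COMMAND_WINDOW_S = 10.0  # example value given for the third rule


def no_event_within(delay_s: Optional[float], window_s: float) -> bool:
    """True if the event never happened (None) or happened after the window."""
    return delay_s is None or delay_s > window_s


def valid_domain(current: str,
                 suggested: Optional[str],
                 accepted_suggestion: bool,
                 retry_delay_s: Optional[float],
                 next_command_delay_s: Optional[float]) -> Optional[str]:
    """Return the domain judged valid, or None if no rule confirms validity."""
    # First rule: the user accepted the suggested domain.
    if suggested is not None and accepted_suggestion:
        return suggested
    # Second and third rules (assumption: both must hold): no retry or manual
    # input within 5 s, and no further command within 10 s of the output.
    if (no_event_within(retry_delay_s, RETRY_WINDOW_S)
            and no_event_within(next_command_delay_s, NEXT_COMMAND_WINDOW_S)):
        return current
    return None
```

A retry after 2 seconds, for instance, leaves the classification unconfirmed, while an accepted suggestion marks the suggested domain as the valid one.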
[0047] The domain suggestion module 140 solicits and processes user feedback when the domain classification of the current voice command is incorrect.
[0048] The domain suggestion module 140 receives error information about domain classification from the error detection module 130.
[0049] The domain suggestion module 140 suggests domain selections to the user based on keywords included in the user's voice commands and services provided by the domain. The currently selected domain can be excluded from the candidate domains suggested to the user.
[0050] Preferably, the domain suggestion module 140 can improve user convenience by suggesting service content when suggesting a domain, rather than directly suggesting a specific domain. For example, when a user says "I want to go to a nearby gas station with low gas prices" for the purpose of using the navigation service, and the domain determination module 110 classifies the user's utterance as a generative AI domain, the domain suggestion module 140 can suggest to the user, "Do you want route guidance to a nearby gas station with low gas prices? [Yes] or [No]". Furthermore, for example, when a user says "Find the latest movies" for the purpose of using the generative AI service, and the domain determination module 110 determines that the user's utterance is a navigation domain, the domain suggestion module 140 can suggest to the user, "Can I tell you about the latest movies showing? [Yes] or [No]".
[0051] The classification history storage module 150 stores the domain classification history of the domain determination module 110 and / or the domain suggestion module 140. The data stored in the classification history storage module 150 is used as training data for the learning module 160, which will be described below.
[0052] The data stored in the classification history storage module 150 can be divided into data that is valid in domain classification and data that is invalid in domain classification.
[0053] The first category data represents cases where the domain classification of the domain determination module 110 matches the user's intent. The first category data includes information such as the content of the user's utterance, the domain selected by the domain determination module 110, and the validity of the domain determination. For example, the first category data may be stored in a format such as [dm = navigation domain, cmd = "I want to go to a nearby gas station with low gas prices", val = true]. Here, dm represents the type of domain selected by the domain determination module 110, cmd represents the voice command issued by the user, and val represents whether the domain classification is valid or invalid. "val = true" indicates that the domain classification is valid, and "val = false" indicates that the domain classification is invalid.
[0054] The second category data represents cases where the domain classification of the domain determination module 110 does not match the user's intent. The second category data includes information such as the content of the user's utterance, the domain selected by the domain determination module 110, and the validity of the domain determination. For example, the second category data may be stored in a format such as [dm = generative AI domain, cmd = "I want to go to a nearby gas station with low gas prices", val = false].
[0055] The third category data represents the situation where a user selects a new domain through the domain suggestion module 140. The third category data includes information such as the content of the user's utterance, the domain selected by the domain determination module 110, and the validity of the domain determination. For example, the third category data may be stored in a format such as [dm = generative AI domain, cmd = "I want to go to a nearby gas station with low gas prices", val = true].
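The [dm, cmd, val] record format used in paragraphs [0053] to [0055] maps naturally onto a small record type. The sketch below is illustrative only: the class name `ClassificationRecord` and the helper `split_by_validity` are hypothetical, and the two sample records reuse the first- and second-category examples from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassificationRecord:
    dm: str    # domain selected for the command
    cmd: str   # voice command issued by the user
    val: bool  # True if the domain classification was valid


def split_by_validity(records):
    """Partition the stored history into valid and invalid classifications."""
    valid = [r for r in records if r.val]
    invalid = [r for r in records if not r.val]
    return valid, invalid


GAS_STATION_CMD = "I want to go to a nearby gas station with low gas prices"
history = [
    ClassificationRecord("navigation domain", GAS_STATION_CMD, True),      # first category
    ClassificationRecord("generative AI domain", GAS_STATION_CMD, False),  # second category
]
valid_records, invalid_records = split_by_validity(history)
```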
[0056] The learning module 160 is trained, using one or more pieces of classification data stored in the classification history storage module 150, to customize the domain classification rules for the user, and updates the domain classification rules stored in the first database to the learned domain classification rules.
[0057] The first training dataset includes first-class data, the second training dataset includes second-class data, and the third training dataset includes third-class data.
[0058] Using the first training dataset, in which the domain classification matched the user's utterance intent, the learning module 160 is trained to increase the score of the current domain for the corresponding utterance pattern, so that the same domain is favored when a subsequent input is the same as or similar to the user's utterance.
[0059] Using the second training dataset, in which the domain classification differed from the user's utterance intent, the learning module 160 is trained to reduce the score of the current domain for the corresponding utterance pattern, so that this domain is less likely to be selected when a subsequent input is the same as or similar to the user's utterance.
[0060] Using the third training dataset, in which the domain classification was modified according to the user's utterance intent, the learning module 160 is trained to increase the score of the modified domain for the corresponding utterance pattern, so that the modified domain is favored when a subsequent input is the same as or similar to the user's utterance.
[0061] In this way, the learning module 160 is trained to increase the score of the selected domain for valid domain classification and to decrease the score of the selected domain for invalid domain classification, and thus can prevent domain classification errors when a voice command that is the same as or similar to the user's previous voice command is input later.
[0062] The learning process of the learning module 160, which is used to adjust the scores of voice commands for each of the multiple domains, will be described in more detail below.
[0063] Figure 2 is a flowchart illustrating the process of learning domain judgment rules in the learning module 160 according to an embodiment of the present disclosure.
[0064] The learning module 160 performs part-of-speech tagging (POST) on the voice command string (cmd) included in each piece of training data (i.e., classification data) (S210). Part-of-speech tagging is the process of extracting words included in a sentence, identifying the part of speech of each word, and assigning a label to each word. For example, part-of-speech tagging is performed on the user's voice command (cmd = "guide the way home") to obtain a POST list in the form of ["guide: verb", "route: noun", "to: particle", "home: noun"].
[0065] For each piece of training data, the learning module 160 checks the domain (dm) selected in response to the user's voice command (cmd) and the validity (val) of the domain classification (S220).
[0066] If the domain classification is valid (val = true) (S220 - Yes), then the learning module 160 assigns a higher domain score than before to the selected domain (dm) for words included in the POST list, such that the score of the voice command (cmd) for the selected domain (dm) increases (S230). In the example above, when the user's voice command (cmd = "guide the way home") is classified as a navigation domain (dm = navigation domain), the domain score of the words included in the POST list for the navigation domain increases by a predetermined value (e.g., 0.1), for example $D_{\mathrm{guide},\mathrm{navigation}}$: 1.5→1.6, $D_{\mathrm{route},\mathrm{navigation}}$: 2.0→2.1, $D_{\mathrm{to},\mathrm{navigation}}$: 1.5→1.6, $D_{\mathrm{home},\mathrm{navigation}}$: 1.2→1.3.
[0067] If the domain classification is invalid (val = false) (S220 - No), then the learning module 160 assigns a lower domain score than before to the selected domain (dm) for the words included in the POST list, resulting in a decrease in the score of the voice command (cmd) for the selected domain (dm) (S240). In the example above, when the user's voice command (cmd = "guide the way home") is classified as a generative AI domain (dm = generative AI domain), the domain score of the words included in the POST list for the generative AI domain is reduced by a predetermined value (e.g., 0.1), for example $D_{\mathrm{guide},\mathrm{generative\ AI}}$: 1.5→1.4, $D_{\mathrm{route},\mathrm{generative\ AI}}$: 2.0→1.9, $D_{\mathrm{to},\mathrm{generative\ AI}}$: 1.5→1.4, $D_{\mathrm{home},\mathrm{generative\ AI}}$: 1.2→1.1.
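Steps S220 to S240 amount to a fixed-step adjustment of per-word domain scores. The sketch below illustrates that update under assumptions: the function name `update_domain_scores`, the nested-dictionary layout, and the starting score of 0 for unseen words are hypothetical, and the step of 0.1 is the example value from the text. Rounding is applied only to keep floating-point drift out of the stored values.

```python
STEP = 0.1  # predetermined adjustment, using the example value from the text


def update_domain_scores(domain_scores, dm, words, val, step=STEP):
    """Raise (val=True) or lower (val=False) the scores of `words` for domain `dm`."""
    table = domain_scores.setdefault(dm, {})
    delta = step if val else -step
    for word in words:
        # Assumption: a word not yet in the table starts from 0.0.
        table[word] = round(table.get(word, 0.0) + delta, 10)
    return domain_scores


# Worked example: a valid navigation classification (S220 - Yes, S230).
rules = {"navigation": {"guide": 1.5, "route": 2.0, "to": 1.5, "home": 1.2}}
post_words = ["guide", "route", "to", "home"]
update_domain_scores(rules, "navigation", post_words, val=True)
```

After the call, the navigation scores for guide, route, to, and home are approximately 1.6, 2.1, 1.6, and 1.3; calling it again with `val=False` would lower each by the same step.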
[0068] In one implementation, the learning module 160 may assign higher part-of-speech scores to nouns and verbs in the discourse than to other parts of speech. Furthermore, in one implementation, the part-of-speech score of nouns may be set higher than that of verbs. For example, the learning module 160 may set the part-of-speech score of words that are nouns to 1.5 and the part-of-speech score of words that are verbs to 1.0. This is because nouns generally have a higher relevance to the domain than verbs. For example, a user issuing a command containing "my home" might be using a navigation domain, such as setting a destination, while a user issuing a command containing "movies" might be using a generative AI domain, such as requesting information about movies.
[0069] By adjusting the domain scores and part-of-speech scores of words included in the user's utterance through the learning module 160, the desired domain classification can be achieved when the user speaks utterances similar to previous utterances in the future.
[0070] When the learning is complete, the learning module 160 updates the domain judgment rules stored in the first database to the modified domain judgment rules (S250).
[0071] Figure 3 is a flowchart illustrating the process by which a domain classification system 100, according to one embodiment of the present disclosure, selects a domain corresponding to a user's voice command and updates the domain determination rules.
[0072] The domain classification system 100 receives voice commands from the user (S305).
[0073] The domain classification system 100 selects the domain for processing the user's voice command from multiple domains (S310). At this time, the domain classification system 100 can select the domain corresponding to the user's voice command based on the domain judgment rules stored in the first database.
[0074] Domain classification system 100 determines whether the selected domain is the domain the user wants to use (S315). At this time, domain classification system 100 may determine a domain classification error based on the user performing an operation on another domain. If a domain classification error is detected (S315 - Yes), domain classification system 100 suggests to the user whether to change the domain (S350). If no domain classification error is detected (S315 - No), domain classification system 100 displays the execution screen (e.g., trigger screen, prompt screen, etc.) of the selected domain to the user (S320).
[0075] Even when the execution screen is displayed, the domain classification system 100 determines whether the selected domain is the domain the user wants to use (S325). At this time, the domain classification system 100 may determine a domain classification error based on the user performing an operation in another domain or the user's negative utterance regarding the currently selected domain. If a domain classification error is detected (S325 - Yes), the domain classification system 100 suggests to the user whether to change the domain (S350). If no domain classification error is detected (S325 - No), the domain classification system 100 outputs the result obtained by processing the user's voice command according to the selected domain (S330).
[0076] Even when outputting results, the domain classification system 100 determines whether the selected domain is the domain the user wants to use (S335). At this time, the domain classification system 100 can determine a domain classification error based on the user's negative utterance regarding the currently selected domain. If a domain classification error is detected (S335 - Yes), the domain classification system 100 suggests to the user whether to change the domain (S350). If no domain classification error is detected (S335 - No), the domain classification system 100 stores classification history data including information about the user's voice commands and the corresponding domains (S340).
[0077] Domain classification system 100 determines whether the user has responded to the domain change suggestion and selected a new domain (S355). If the user has already selected a new domain (S355 - Yes), then domain classification system 100 changes the current domain to the domain selected by the user (S360) and stores classification history data including information about the user's voice command, the domain before the change, and the domain after the change (S340). If the user has not selected a new domain (S355 - No), then domain classification system 100 stores classification history data including information about the user's voice command and the currently selected domain (S340).
[0078] As described with reference to Figure 2, the domain classification system 100 uses the stored domain classification history as training data to learn the domain classification rules (S345).
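The control flow of steps S315 to S360 can be summarized in a short sketch. Everything here is hypothetical scaffolding: `process_command`, the `stages` list of (error check, action) pairs, and the `suggest_change` callback are illustrative names, and the returned dictionary merely stands in for the classification history data stored in S340.

```python
def process_command(cmd, selected, stages, suggest_change):
    """Run the three error checks in order and return the history record to store.

    stages: list of (detect_error, on_pass) callables, one pair per check
            (S315/S320, S325/S330, S335/no further action).
    suggest_change: callable returning the user's new domain, or None (S350, S355).
    """
    for detect_error, on_pass in stages:
        if detect_error(cmd, selected):              # S315 / S325 / S335 - Yes
            chosen = suggest_change(cmd, selected)   # S350, S355
            if chosen is not None:                   # S355 - Yes, S360
                return {"cmd": cmd, "before": selected, "after": chosen}
            return {"cmd": cmd, "dm": selected}      # S355 - No
        on_pass(cmd, selected)                       # S320 / S330
    return {"cmd": cmd, "dm": selected}              # no error detected


# Example: the error is detected at the second check (S325), after the
# execution screen (S320) has been shown but before results are output.
log = []
stages = [
    (lambda c, d: False, lambda c, d: log.append("S320")),
    (lambda c, d: True, lambda c, d: log.append("S330")),
    (lambda c, d: False, lambda c, d: None),
]
record = process_command("find the latest movies", "navigation", stages,
                         lambda c, d: "generative_ai")
```

In this run the record holds the domain before and after the change, which is the information the text says is stored when the user selects a new domain.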
[0079] Figure 4 is a block diagram illustrating an exemplary computing device that can be used to implement the methods or apparatus according to this disclosure.
[0080] The computing device 400 may include some or all of memory 410, processor 420, storage device 430, input / output interface 440, and communication interface 450. The computing device 400 may structurally and / or functionally include at least a portion of the apparatus according to this disclosure. The computing device 400 may be a fixed computing device, such as a desktop computer or server, or a mobile computing device, such as a laptop computer, smartphone, or in-vehicle electronic system. The computing device 400 may be implemented as any dedicated hardware accelerator capable of efficiently processing computations of artificial intelligence models. For example, the computing device 400 may include a graphics processing unit (GPU), a tensor processing unit (TPU), or a neural network processing unit (NPU).
[0081] Memory 410 may store programs that cause processor 420 to perform methods or operations according to various embodiments of the present disclosure. For example, a program may include multiple instructions executable by processor 420, and the methods or operations described above can be performed when processor 420 executes those instructions. Memory 410 may be a single memory or multiple memories. Accordingly, information required to perform the methods or operations according to various embodiments of the present disclosure may be stored in a single memory, or may be partitioned and stored across multiple memories. When memory 410 consists of multiple memories, the multiple memories may be physically separated. Memory 410 may include at least one of volatile memory and non-volatile memory. Volatile memory includes static random access memory (SRAM) or dynamic random access memory (DRAM), and non-volatile memory includes flash memory.
[0082] Processor 420 may include at least one core capable of executing at least one instruction. Processor 420 may execute instructions stored in memory 410. Processor 420 may be a single processor or multiple processors.
[0083] Even when the power supply to the computing device 400 is cut off, the storage device 430 is able to maintain the stored data. For example, the storage device 430 may include non-volatile memory and may include storage media such as magnetic tape, optical disc, and magnetic disk. Programs stored in the storage device 430 may be loaded into the memory 410 before being executed by the processor 420. The storage device 430 may store files written in a programming language, and programs generated from these files by a compiler or similar means may be loaded into the memory 410. The storage device 430 may store data to be processed by the processor 420 and / or data processed by the processor 420.
[0084] Input / output interface 440 provides an interface for input devices such as a keyboard and mouse and / or output devices such as a display device and a printer. Users can trigger programs executed by processor 420 via input devices and / or check the processing results of processor 420 via output devices.
[0085] The communication interface 450 provides access to an external network. The computing device 400 can communicate with other devices through the communication interface 450.
[0086] The various elements of the apparatus or method according to the invention can be implemented in hardware or software, or a combination of hardware and software. The functions of each element can be implemented in software, and a microprocessor can be used to execute the software functions corresponding to each element.
[0087] Various embodiments of the systems and techniques described herein can be implemented using digital electronic circuits, integrated circuits, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), computer hardware, firmware, software, and / or combinations thereof. Various embodiments can include implementations utilizing one or more computer programs executable on a programmable system. The programmable system includes at least one programmable processor configured to receive and send data and instructions from and to a storage system, at least one input device, and at least one output device; the programmable processor can be a dedicated processor or a general-purpose processor. The computer program (also referred to as a program, software, software application, or code) includes instructions for the programmable processor and is stored in a computer-readable recording medium.
[0088] Computer-readable recording media can include all types of storage devices capable of storing computer-readable data. Computer-readable recording media can be non-volatile or non-transitory media, such as read-only memory (ROM), random access memory (RAM), compact disc ROM (CD-ROM), magnetic tape, floppy disk, or optical data storage devices. Furthermore, computer-readable recording media can also include transient media such as data transmission media. Moreover, computer-readable recording media can be distributed across computer systems connected via a network, and computer-readable program code can be stored and executed in a distributed manner.
[0089] Although the operations in the flowcharts / timing diagrams of this specification are depicted as sequential, this is merely an exemplary description of the technical concept of one embodiment of this disclosure. In other words, those skilled in the art to which one embodiment of this disclosure pertains will understand that various modifications and changes can be made without departing from the essential characteristics of the embodiments of this disclosure; that is, the order depicted in the flowcharts / timing diagrams can be changed, and one or more operations can be performed in parallel. Therefore, the flowcharts / timing diagrams are not limited to a temporal order.
[0090] Although exemplary embodiments of this disclosure have been described for illustrative purposes, those skilled in the art will understand that various modifications, additions, and substitutions can be made without departing from the spirit and scope of the claimed invention. Therefore, exemplary embodiments of this disclosure have been described for the sake of brevity and clarity. The scope of the technical concept of these embodiments is not limited by the description. Therefore, those skilled in the art will understand that the scope of the claimed invention is not limited to the embodiments explicitly described above, but rather to the scope of the claims and their equivalents.
Claims
1. A method for determining a domain from a plurality of domains for processing a user's voice command, the method comprising the steps of: obtaining a plurality of words contained in the voice command; calculating a score of the voice command for each of the plurality of domains based on a domain score and a part-of-speech score of each word among the plurality of words; and selecting the domain for processing the voice command based on the score of the voice command for each of the plurality of domains, wherein the domain score and the part-of-speech score of each word among the plurality of words are updated based on whether the selected domain matches the user's intent.
2. The method according to claim 1, wherein, when processing the voice command according to the selected domain, whether the selected domain matches the user's intent is determined based on at least one additional input from the user to change the domain.
3. The method according to claim 2, wherein the at least one additional input from the user to change the domain includes at least one of pressing a push-to-talk (PTT) button and uttering a wake word.
4. The method according to claim 1, wherein, when processing the voice command according to the selected domain, whether the selected domain matches the user's intent is determined based on at least one additional utterance of the user that includes at least one negative word.
5. The method according to claim 1, wherein, if the selected domain matches the user's intent, the domain score of each word among the plurality of words is updated by increasing the domain score of that word for the selected domain.
6. The method according to claim 1, wherein, if the selected domain does not match the user's intent, the domain score of each word among the plurality of words is updated by reducing the domain score of that word for the selected domain.
7. The method according to claim 1, wherein the part-of-speech scores for nouns and verbs are higher than those for parts of speech other than nouns and verbs.
8. The method according to claim 7, wherein the part-of-speech score of a noun is higher than that of a verb.
9. An apparatus for determining a domain from a plurality of domains for processing a user's voice command, the apparatus comprising: at least one memory for storing instructions; and at least one processor, wherein the at least one processor is configured to perform the following processing by executing the instructions: obtaining a plurality of words included in the voice command; calculating a score of the voice command for each of the plurality of domains based on a domain score and a part-of-speech score of each word among the plurality of words; and selecting the domain for processing the voice command based on the score of the voice command for each of the plurality of domains, wherein the domain score and the part-of-speech score of each word among the plurality of words are updated based on whether the selected domain matches the user's intent.