Method and apparatus for segmenting multi-intent sentence, storage medium and electronic device
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Patents (China)
- Current Assignee / Owner: HAIER YOUJIA INTELLIGENT TECH (BEIJING) CO LTD
- Filing Date: 2022-08-31
- Publication Date: 2026-06-23
AI Technical Summary
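The matching process described above is repeated for each device domain in order to segment the intents of a statement containing multiple instruction intents. With this approach, however, every such statement requires multi-level matching. This is computationally expensive, and the matching speed at each level directly affects the efficiency of intent recognition, so recognition tends to be slow and responses untimely.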
Figure CN115482378B_ABST
Description
Technical Field
[0001] This application relates to the field of smart home technology, and more specifically, to a method and apparatus for segmenting multi-intent statements, a storage medium, and an electronic device.
Background Technology
[0002] Human-computer interaction technology occupies an important position in the development of smart devices today. Intelligent and convenient human-computer interaction methods can not only greatly improve the efficiency of device control, but also improve the user's interactive experience. At present, most human-computer interaction technologies only support single-intent recognition. As a result, statements containing multiple instruction intents, such as "turn on the water heater to 40℃ and set the air conditioner to post-bath mode", are difficult to recognize.
[0003] For statements containing multiple instruction intents, to ensure the accuracy of single-intent recognition, the statement can first be classified by domain to determine the device type that each instruction corresponds to. For example, the statement above corresponds to two device domains: "water heater" and "air conditioner." Then, precise intent matching is performed within each identified device domain. For example, an air conditioner has four sub-intents: turn on, turn off, temperature, and mode setting. The intent corresponding to the air conditioner in the statement above can be matched to "mode setting." This matching process is repeated for each device domain to segment the intents of a statement containing multiple instruction intents. However, this approach requires multi-level matching for every such statement. It is computationally intensive, and the matching speed at each level directly affects the efficiency of intent recognition, which easily leads to low recognition efficiency and untimely responses.
[0004] There is still no effective solution to the problem of low segmentation efficiency in the process of segmenting multi-intent statements in related technologies.
Summary of the Invention
[0005] This application provides a method, apparatus, storage medium, and electronic device for segmenting multi-intent statements, in order to at least solve the problem of low segmentation efficiency in the process of segmenting multi-intent statements in related technologies.
[0006] According to one embodiment of the present application, a method for segmenting multi-intent statements is provided, including:
[0007] extracting text features of the text included in a statement to be segmented that carries multiple execution intents;
[0008] converting the statement to be segmented into a feature image based on the text features, wherein the feature image represents the text features through image pixels;
[0009] performing semantic segmentation on the feature image to generate multiple semantic images, wherein each semantic image represents one target execution intent among the multiple execution intents;
[0010] identifying the target sub-statement corresponding to each of the multiple semantic images, to obtain multiple target sub-statements corresponding to the statement to be segmented.
[0011] Optionally, converting the statement to be segmented into a feature image based on the text features includes:
[0012] constructing an initial matrix diagram corresponding to the statement to be segmented, wherein the matrix rows of the initial matrix diagram correspond in sequence to the characters in the statement to be segmented, and the matrix columns of the initial matrix diagram likewise correspond in sequence to the characters in the statement to be segmented;
[0013] for each pixel position on the initial matrix diagram, performing a target operation on the first text feature of the row text corresponding to the pixel position and the second text feature of the column text corresponding to the pixel position, to obtain the target pixel value corresponding to the pixel position;
[0014] adding the target pixel value to the pixel position on the initial matrix diagram to obtain a target matrix diagram as the feature image.
[0015] Optionally, performing the target operation on the first text feature of the row text corresponding to the pixel position and the second text feature of the column text corresponding to the pixel position to obtain the target pixel value corresponding to the pixel position includes:
[0016] performing N of multiple operation types on the first text feature and the second text feature to obtain N operation results, where N is an integer greater than or equal to 2;
[0017] constructing the N operation results into an N-channel pixel value corresponding to the pixel position, which is used as the target pixel value.
[0018] Optionally, performing N of multiple operation types on the first text feature and the second text feature to obtain N operation results includes:
[0019] multiplying the first text feature and the second text feature to obtain a first operation result;
[0020] performing a similarity operation on the first text feature and the second text feature to obtain a second operation result;
[0021] performing a fully connected operation on the first text feature and the second text feature to obtain a third operation result, wherein the N operation results include the first operation result, the second operation result, and the third operation result.
[0022] Optionally, constructing the N operation results into the N-channel pixel value corresponding to the pixel position as the target pixel value includes:
[0023] obtaining a target splicing order and a target splicing format;
[0024] splicing the first operation result, the second operation result, and the third operation result into the target splicing format according to the target splicing order, to obtain the target pixel value.
[0025] Optionally, the step of semantically segmenting the feature image to generate multiple semantic images includes:
[0026] The feature image is input into a target semantic segmentation model multiple times to obtain multiple segmentation results output by the target semantic segmentation model. The target semantic segmentation model is obtained by training an initial semantic segmentation model with feature image samples labeled with multiple standard semantic images. Each feature image sample represents a standard statement carrying multiple intents, and each of the multiple standard semantic images represents the target statement corresponding to one intent in the standard statement.
[0027] The multiple segmentation results are determined as multiple semantic images.
[0028] Optionally, the step of inputting the feature image multiple times into the target semantic segmentation model to obtain multiple segmentation results output by the target semantic segmentation model includes:
[0029] Each time, the feature image and the reference image are input into the target semantic segmentation model, wherein, in the case of the first input, the reference image is the initial image, and in the case of subsequent inputs, the reference image is the segmentation result output by the target semantic segmentation model in the previous input;
[0030] Obtain one segmentation result output by the target semantic segmentation model, until multiple segmentation results are obtained.
[0031] Optionally, extracting the text features of the text included in the statement to be segmented that carries multiple execution intents includes:
[0032] converting each character in the statement to be segmented into a character vector, to obtain multiple character vectors;
[0033] performing bidirectional feature extraction on the multiple character vectors to obtain the forward features and backward features corresponding to each character vector;
[0034] concatenating the forward features and the backward features to obtain the text features corresponding to each character.
[0035] Optionally, identifying the target sub-statement corresponding to each of the plurality of semantic images includes:
[0036] Extract the target text corresponding to the target image pixels that belong to the target pixel type from each of the semantic images, wherein the image pixels in each of the semantic images are divided into pixel types according to the execution intent expressed;
[0037] The target text is constructed into the target sub-statement corresponding to each semantic image.
[0038] According to another aspect of the embodiments of this application, a computer-readable storage medium is also provided, wherein a computer program is stored in the storage medium, and the computer program is configured to execute the above-described multi-intent statement segmentation method at runtime.
[0039] According to another aspect of the embodiments of this application, an electronic device is also provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the above-described multi-intent statement segmentation method through the computer program.
[0040] In this embodiment, textual features of the text included in the statement to be segmented, which carries multiple execution intentions, are extracted; the statement to be segmented is converted into a feature image based on the textual features, wherein the feature image is used to represent textual features through image pixels; semantic segmentation is performed on the feature image to generate multiple semantic images, wherein each semantic image is used to represent a target execution intention among multiple execution intentions; target sub-statements corresponding to each semantic image in the multiple semantic images are identified to obtain multiple target sub-statements corresponding to the statement to be segmented. That is, firstly, textual features of the text included in the statement to be segmented are extracted, and the statement to be segmented is converted into a feature image using the textual features. Since the image pixels in the feature image can represent textual features, multiple semantic images can be obtained by performing semantic segmentation on the feature image to segment the multiple execution intentions included in the statement to be segmented into multiple corresponding semantic images. Finally, target sub-statements corresponding to each semantic image in the multiple semantic images are identified to obtain multiple target sub-statements corresponding to the statement to be segmented. In other words, during the semantic segmentation of the statement to be segmented, there is no need for multi-level matching. Instead, the statement to be segmented is converted into a feature image, and semantic segmentation is performed directly on the feature image. This achieves the segmentation of the statement to be segmented into multiple target sub-statements. 
By adopting the above technical solution, the problem of low segmentation efficiency in the process of segmenting multi-intent statements in related technologies is solved, and the technical effect of improving the segmentation efficiency in the process of segmenting multi-intent statements is achieved.
Attached Figure Description
[0041] The accompanying drawings, which are incorporated in and form part of this specification, illustrate embodiments consistent with this application and, together with the description, serve to explain the principles of this application.
[0042] To more clearly illustrate the technical solutions in the embodiments of this application or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, for those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0043] Figure 1 is a schematic diagram of the hardware environment for a multi-intent statement segmentation method according to an embodiment of this application;
[0044] Figure 2 is a flowchart of a method for segmenting multi-intent statements according to an embodiment of this application;
[0045] Figure 3 is a schematic diagram of a multi-intent statement segmentation method according to an embodiment of this application;
[0046] Figure 4 is a schematic diagram illustrating the generation of a feature image according to an embodiment of this application;
[0047] Figure 5 is a schematic diagram of semantic segmentation according to an embodiment of this application;
[0048] Figure 6 is a schematic diagram illustrating the generation of a semantic image according to an embodiment of this application;
[0049] Figure 7 is a schematic diagram of a multi-intent statement segmentation process according to an embodiment of this application;
[0050] Figure 8 is a structural block diagram of a multi-intent statement segmentation device according to an embodiment of this application.
Detailed Implementation
[0051] To enable those skilled in the art to better understand the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present application, and not all embodiments. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative effort should fall within the scope of protection of the present application.
[0052] It should be noted that the terms "first," "second," etc., in the specification, claims, and accompanying drawings of this application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that the embodiments of this application described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms "comprising" and "having," and any variations thereof, are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or apparatus that comprises a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such processes, methods, products, or apparatus.
[0053] According to one aspect of the embodiments of this application, a method for segmenting multi-intent statements is provided. This method is widely applicable to whole-house intelligent digital control scenarios such as smart homes, smart home ecosystems, and intelligent house ecosystems. Optionally, in this embodiment, Figure 1 is a schematic diagram of a hardware environment for a multi-intent statement segmentation method according to an embodiment of this application. The above-described method can be applied to the hardware environment shown in Figure 1, which consists of terminal device 102 and server 104. As shown in Figure 1, server 104 is connected to terminal device 102 via a network and can be used to provide services (such as application services) to the terminal or to clients installed on the terminal. A database can be set up on the server or independently of the server to provide data storage services for server 104. Cloud computing and/or edge computing services can be configured on the server or independently of the server to provide data processing services for server 104.
[0054] The aforementioned network may include, but is not limited to, at least one of the following: wired network, wireless network. The aforementioned wired network may include, but is not limited to, at least one of the following: wide area network, metropolitan area network, local area network. The aforementioned wireless network may include, but is not limited to, at least one of the following: Wi-Fi (Wireless Fidelity), Bluetooth. The terminal device 102 may be, but is not limited to, a PC, mobile phone, tablet computer, smart air conditioner, smart range hood, smart refrigerator, smart oven, smart stove, smart washing machine, smart water heater, smart washing equipment, smart dishwasher, smart projector, smart TV, smart clothes rack, smart curtains, smart audio-visual equipment, smart socket, smart speaker, smart speaker box, smart fresh air equipment, smart kitchen and bathroom equipment, smart bathroom equipment, smart robot vacuum cleaner, smart window cleaning robot, smart mopping robot, smart air purifier, smart steam oven, smart microwave oven, smart water dispenser, smart door lock, etc.
[0055] This embodiment provides a method for segmenting multi-intent statements, applied to the aforementioned terminal device. Figure 2 is a flowchart of a multi-intent statement segmentation method according to an embodiment of this application. As shown in Figure 2, the process includes the following steps:
[0056] Step S202: Extract the text features of the text included in the statement to be segmented that carries multiple execution intents;
[0057] Step S204: Convert the statement to be segmented into a feature image based on the text features, wherein the feature image represents the text features through image pixels;
[0058] Step S206: Perform semantic segmentation on the feature image to generate multiple semantic images, wherein each semantic image represents one target execution intent among the multiple execution intents;
[0059] Step S208: Identify the target sub-statement corresponding to each of the multiple semantic images, to obtain the multiple target sub-statements corresponding to the statement to be segmented.
[0060] Through the above steps, the text features of the text included in the statement to be segmented are first extracted and then used to convert the statement into a feature image. Since the image pixels in the feature image represent the text features, semantic segmentation can be performed on the feature image to obtain multiple semantic images, so that the multiple execution intents included in the statement are separated into corresponding semantic images. Finally, the target sub-statement corresponding to each semantic image is identified, yielding the multiple target sub-statements corresponding to the statement to be segmented. In other words, segmenting the statement does not require multi-level matching. Instead, the statement is converted into a feature image and semantic segmentation is performed directly on that image, which splits the statement into multiple target sub-statements. This technical solution addresses the problem of low segmentation efficiency when segmenting multi-intent statements in related technologies and thereby improves the efficiency of multi-intent statement segmentation.
[0061] In the technical solution provided in step S202 above, the statement to be segmented can be, but is not limited to, any statement containing multiple intentions. In the scenario of controlling a smart device, the statement to be segmented can be, but is not limited to, a voice command containing multiple execution intentions. For example, the command "turn on the air conditioner and dishwasher" spoken by the user actually includes two execution intentions, namely "turn on the air conditioner" and "turn on the dishwasher".
[0062] Optionally, in this embodiment, the statement to be segmented may be, but is not limited to, a preprocessed statement. Preprocessing may include, but is not limited to, methods such as: converting full-width characters to half-width characters, converting numerals written in capital (character) form to Arabic numerals, converting uppercase letters to lowercase letters, removing emojis, word segmentation, and stop word filtering. Word segmentation is the process of recombining a continuous sequence of characters into a sequence of words according to certain rules. In English writing, words are naturally separated by spaces; this application can instead treat each character as a separate unit of the sequence.
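The preprocessing options listed in [0062] can be illustrated with a minimal Python sketch. It covers only full-width to half-width conversion, lowercasing, emoji removal, and character-level splitting. The function names `to_half_width` and `preprocess` and the emoji code point ranges are assumptions made for illustration; numeral conversion and stop word filtering are omitted, and the source does not specify how any of these steps are implemented.

```python
import re

# Assumed, rough emoji ranges for illustration only; not an exhaustive list.
EMOJI = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")


def to_half_width(text: str) -> str:
    """Map full-width ASCII variants and the ideographic space to half-width."""
    out = []
    for ch in text:
        code = ord(ch)
        if code == 0x3000:                # ideographic (full-width) space
            out.append(" ")
        elif 0xFF01 <= code <= 0xFF5E:    # full-width '!' .. '~'
            out.append(chr(code - 0xFEE0))
        else:
            out.append(ch)
    return "".join(out)


def preprocess(text: str) -> list[str]:
    """Normalize a statement and split it into individual characters."""
    text = to_half_width(text)
    text = text.lower()
    text = EMOJI.sub("", text)
    # Character-level splitting: each non-space character is one unit.
    return [ch for ch in text if not ch.isspace()]


chars = preprocess("打开ＡＣ　空调😀")
print(chars)
```

Splitting at the character level mirrors the statement in [0062] that each character can be treated as a separate unit.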
[0063] Optionally, in this embodiment, the text included in the statement to be segmented may be, but is not limited to, the individual characters or the words in the statement, wherein each word expresses one meaning. For example, taking "turn on the air conditioner and dishwasher" (打开空调和洗碗机) as the statement to be segmented, the text may be the eight individual characters "打", "开", "空", "调", "和", "洗", "碗", and "机". The text may also be the four words "打开" (turn on), "空调" (air conditioner), "和" (and), and "洗碗机" (dishwasher).
[0064] In an exemplary embodiment, the text features of the text included in a statement to be segmented carrying multiple execution intents can be extracted in the following manner, but not limited thereto: converting each character included in the statement to be segmented into a character vector to obtain multiple character vectors; performing bidirectional feature extraction on the multiple character vectors to obtain the forward features and backward features corresponding to each character vector; and concatenating the forward features and the backward features to obtain the text features corresponding to each character.
[0065] Optionally, in this embodiment, Figure 3 is a schematic diagram of a multi-intent statement segmentation method according to an embodiment of this application. As shown in Figure 3, taking "turn on the air conditioner and dishwasher" as the statement to be segmented, each character in the statement can be converted into a character vector in the following way, but not limited thereto: features are extracted from each character in "turn on the air conditioner and dishwasher", with feature extraction divided into an embedding layer and an extraction layer. The embedding layer can use the ALBERT model to obtain the character vector of each character in the whole statement. That is, each character in "turn on the air conditioner and dishwasher" can be input into the ALBERT model to obtain multiple character vectors.
[0066] Optionally, in this embodiment, as shown in Figure 3, after the multiple character vectors are obtained, they can be input into a bidirectional LSTM model (BiLSTM, Bi-directional Long Short-Term Memory) for bidirectional feature extraction, yielding forward features (fw_cell) and backward features (bw_cell) for each character vector. The fw_cell and bw_cell can be concatenated to obtain a vector representing the text features of each character.
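The data flow of [0064] to [0066] (character vectors, a forward pass, a backward pass, then concatenation) can be sketched in plain Python. This is not the ALBERT plus BiLSTM pipeline itself: `char_vector` is a hypothetical deterministic stand-in for an ALBERT embedding, and `recurrent_pass` is a deliberately simplified recurrent cell rather than an LSTM. The example statement 打开空调和洗碗机 is assumed to be the Chinese original of "turn on the air conditioner and dishwasher".

```python
import math


def char_vector(ch: str, dim: int = 4) -> list[float]:
    """Hypothetical stand-in for an ALBERT character embedding."""
    code = ord(ch)
    return [math.sin(code * (k + 1)) for k in range(dim)]


def recurrent_pass(vectors: list[list[float]], decay: float = 0.5) -> list[list[float]]:
    """Simplified recurrent cell (not an LSTM): h_t = tanh(decay * h_{t-1} + x_t)."""
    if not vectors:
        return []
    state = [0.0] * len(vectors[0])
    outputs = []
    for x in vectors:
        state = [math.tanh(decay * h + xi) for h, xi in zip(state, x)]
        outputs.append(state)
    return outputs


def bidirectional_features(sentence: str, dim: int = 4) -> list[list[float]]:
    """Concatenate forward and backward features for every character."""
    vectors = [char_vector(ch, dim) for ch in sentence]
    forward = recurrent_pass(vectors)                 # left-to-right context
    backward = recurrent_pass(vectors[::-1])[::-1]    # right-to-left context
    return [fw + bw for fw, bw in zip(forward, backward)]


features = bidirectional_features("打开空调和洗碗机")
print(len(features), len(features[0]))
```

Each character ends up with a feature vector twice the embedding width, because the forward and backward halves are concatenated rather than summed.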
[0067] In the technical solution provided in step S204 above, semantic segmentation technology in the field of image processing can be used as a reference. First, the statement to be segmented is converted into a feature image. Then, the image pixels in the feature image are processed by semantic segmentation technology, and image pixels belonging to the same type are divided into the same region. This achieves the segmentation of multiple execution intentions included in the statement to be segmented into multiple target sub-statements. For example, "turn on the air conditioner and dishwasher" can be converted into a feature image in the above way. Subsequently, semantic segmentation is performed on the feature image to obtain the target sub-statements corresponding to "turn on the air conditioner and dishwasher": "turn on the air conditioner" and "turn on the dishwasher".
[0068] In an exemplary embodiment, the statement to be segmented can be converted into a feature image according to the text features in the following ways, but not limited thereto: constructing an initial matrix diagram corresponding to the statement to be segmented, where the matrix rows in the initial matrix diagram correspond to each character in the statement to be segmented in sequence, and the matrix columns in the initial matrix diagram correspond to each character in the statement to be segmented in sequence; for each pixel position on the initial matrix diagram, performing a target operation on the first text feature of the row text corresponding to the pixel position and the second text feature of the column text corresponding to the pixel position to obtain a target pixel value corresponding to the pixel position; adding the target pixel value to the pixel position on the initial matrix diagram to obtain a target matrix diagram as the feature image.
[0069] Optionally, in this embodiment, taking "turn on the air conditioner and the dishwasher" as the statement to be segmented, the characters of the statement are mapped in sequence onto a matrix diagram to construct the initial matrix diagram corresponding to the statement. Figure 4 is a schematic diagram of the generation of a feature image according to an embodiment of the present application. As shown in Figure 4, the matrix rows and the matrix columns of the initial matrix diagram each correspond in sequence to the characters in "turn on the air conditioner and the dishwasher". A target operation is then performed at each pixel position in the initial matrix diagram to obtain a target pixel value. Taking pixel position 1 as an example: the row text corresponding to pixel position 1 is "打", whose first text feature is F1, and the column text corresponding to pixel position 1 is "开", whose second text feature is F2. Performing the target operation on F1 and F2 yields the target pixel value F21 of pixel position 1, which is added to pixel position 1 on the initial matrix diagram. Repeating this process for each pixel position in turn yields the target matrix diagram, which serves as the feature image.
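The construction in [0068] and [0069] amounts to filling an n × n grid in which the pixel at (row, column) is computed from the feature of the row character and the feature of the column character. A minimal sketch follows; the function name `build_feature_image`, the toy two-dimensional features, and the use of a dot product as the target operation are all assumptions chosen for illustration.

```python
def build_feature_image(features, pixel_op):
    """Fill an n x n grid: pixel (row, col) = pixel_op(features[row], features[col])."""
    n = len(features)
    image = [[None] * n for _ in range(n)]
    for row in range(n):
        for col in range(n):
            image[row][col] = pixel_op(features[row], features[col])
    return image


def dot(a, b):
    """Example target operation (an assumption): dot product of two features."""
    return sum(x * y for x, y in zip(a, b))


# Toy features for three characters; real features would come from step S202.
toy = {"打": [1.0, 0.0], "开": [0.5, 0.5], "空": [0.0, 1.0]}
sentence = "打开空"
image = build_feature_image([toy[ch] for ch in sentence], dot)
print(image[0][1])  # pixel for row text "打" and column text "开"
```

Because rows and columns index the same character sequence, the grid is always square, and a symmetric operation such as the dot product produces a symmetric image.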
[0070] In an exemplary embodiment, the first text feature of the row text corresponding to the pixel position and the second text feature of the column text corresponding to the pixel position can be subjected to a target operation in the following ways, but not limited thereto, to obtain a target pixel value corresponding to the pixel position: performing N operation types among multiple operation types on the first text feature and the second text feature to obtain N operation results, where N is an integer greater than or equal to 2; constructing the N operation results into an N-channel pixel value corresponding to the pixel position as the target pixel value.
[0071] Optionally, in this embodiment, as shown in Figure 3, target operations can be performed on the first and second text features. Operation types such as "multiplication", "cos similarity", and "fully connected" can be used as the target operations. Each operation type produces one operation result, so three operation results are obtained here. Each operation result constitutes one channel, so the target operation yields a three-channel image.
[0072] In one exemplary embodiment, N of multiple operation types can be performed on the first text feature and the second text feature in the following ways to obtain N operation results: multiplying the first text feature and the second text feature to obtain a first operation result; performing a similarity operation on the first text feature and the second text feature to obtain a second operation result; and performing a fully connected operation on the first text feature and the second text feature to obtain a third operation result, wherein the N operation results include the first operation result, the second operation result, and the third operation result.
[0073] Optionally, in this embodiment, the multiplication, similarity, and fully connected operations may be performed in any order.
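The three operation types named in [0071] and [0072] can be sketched as follows. Several choices here are assumptions rather than details given in the source: multiplication is taken to be a dot product so that each channel is a scalar, similarity is taken to be cosine similarity (matching the "cos similarity" label), and the fully connected operation is a single linear unit over the concatenated features with hypothetical fixed weights instead of trained ones.

```python
import math


def multiply(f1, f2):
    """Assumed form of 'multiplication': dot product, giving one scalar."""
    return sum(a * b for a, b in zip(f1, f2))


def cosine_similarity(f1, f2):
    """Cosine of the angle between the two features (0.0 if either is zero)."""
    norm = math.sqrt(sum(a * a for a in f1)) * math.sqrt(sum(b * b for b in f2))
    return multiply(f1, f2) / norm if norm else 0.0


def fully_connected(f1, f2, weights, bias):
    """Single linear unit over the concatenation [f1; f2]; weights are hypothetical."""
    joined = f1 + f2
    return sum(w * x for w, x in zip(weights, joined)) + bias


f1, f2 = [1.0, 0.0], [1.0, 1.0]
results = {
    "multiply": multiply(f1, f2),
    "similarity": cosine_similarity(f1, f2),
    "fully_connected": fully_connected(f1, f2, [0.1, 0.2, 0.3, 0.4], 0.5),
}
print(results)
```

Because the three functions are independent of one another, they can be evaluated in any order, which is consistent with [0073].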
[0074] In one exemplary embodiment, the N operation results can be constructed into the pixel value of the N channels corresponding to the pixel position as the target pixel value by means of, but not limited to, the following: obtaining the target splicing order and the target splicing format; splicing the first operation result, the second operation result and the third operation result into the target splicing format according to the target splicing order to obtain the target pixel value.
[0075] Optionally, in this embodiment, obtaining the target splicing order and the target splicing format unifies the way the first operation result, the second operation result, and the third operation result are spliced; that is, the same target splicing order and target splicing format are applied at every pixel position.
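One way to read the "target splicing order" and "target splicing format" of [0074] and [0075] is as a fixed channel order plus a fixed container type, applied identically at every pixel position. The sketch below takes that reading as an assumption; the names `SPLICE_ORDER` and `splice`, and the interpretation of "format" as a container type, are not from the source.

```python
# Assumed channel order; the source only requires that one order be used consistently.
SPLICE_ORDER = ("multiply", "similarity", "fully_connected")


def splice(results, order=SPLICE_ORDER, as_type=tuple):
    """Arrange named operation results into one N-channel pixel value."""
    missing = [name for name in order if name not in results]
    if missing:
        raise KeyError(f"missing operation results: {missing}")
    # Every channel is coerced to float so all pixels share one format.
    return as_type(float(results[name]) for name in order)


pixel = splice({"similarity": 0.7, "fully_connected": 1.3, "multiply": 1})
print(pixel)
```

Fixing the order once means channel k carries the same operation type at every pixel, which is what a downstream segmentation model would rely on.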
[0076] In the technical solution provided in step S206 above, Figure 5 is a schematic diagram of semantic segmentation according to an embodiment of this application. As shown in Figure 5, semantic segmentation is an important branch of image processing and machine vision. It aims to determine the category of each pixel so that an image can be segmented precisely; for example, the pixels representing cars, traffic lights, and road signs are segmented into different regions.
[0077] Optionally, in this embodiment, semantic segmentation technology from image processing is applied to recognizing the execution intents of multi-intent statements. Since each character in the statement to be segmented corresponds to a pixel position, the feature image can be semantically segmented to obtain multiple semantic images. Figure 6 is a schematic diagram illustrating the generation of a semantic image according to an embodiment of this application. As shown in Figure 6, each pixel position in the feature image corresponds to a target pixel value. Natural language processing has an analogous segmentation task, in which each character in a sentence can be treated as a pixel. The pixel value in a semantic image can be 1 or 0, where 0 indicates that the character does not belong to the target sentence and 1 indicates that it does. Therefore, by performing semantic segmentation on the feature image, multiple semantic images (semantic image 1 and semantic image 2) can be obtained. Among them, semantic image 1 is used to represent the execution intention "turn on the air conditioner"; semantic image 2 is used to represent the execution intention "turn on the dishwasher".
[0078] In one exemplary embodiment, the feature image can be semantically segmented to generate multiple semantic images in the following manner, without limitation: inputting the feature image multiple times into a target semantic segmentation model to obtain multiple segmentation results output by the target semantic segmentation model, wherein the target semantic segmentation model is obtained by training an initial semantic segmentation model using feature image samples labeled with multiple standard semantic images, the feature image samples represent standard statements carrying multiple intentions, and each of the multiple standard semantic images represents a target statement corresponding to one intention in the standard statement; and determining the multiple segmentation results as the multiple semantic images.
[0079] Optionally, in this embodiment, the target semantic segmentation model may be, but is not limited to, the UNET model. Before training the semantic segmentation model, the corpus format must first be defined; it consists mainly of standard statements and target statements. A standard statement can be a statement input by the user, and each target statement represents one intent of the corresponding standard statement. For example, the target statements corresponding to "set the dishwasher to quick wash and then start" can be "set the dishwasher to quick wash" and "start the dishwasher".
[0080] Optionally, in this embodiment, the corpus used for training during the semantic segmentation model training process may include the following types:
[0081] Single device with multiple intent types, for example: standard statement: dishwasher set to quick wash and then start, the corresponding target statements can be "dishwasher set to quick wash" and "dishwasher start";
[0082] Multi-device single intent type, for example: standard statement: turn on the air conditioner and dishwasher, the corresponding target statements can be "turn on the dishwasher" and "turn on the air conditioner";
[0083] Multiple devices and multiple intent types, for example: standard statement: turn on the water heater and then set the air conditioner to post-bath mode, the corresponding target statements can be "turn on the water heater" and "set the air conditioner to post-bath mode".
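The corpus format above (one standard statement paired with several target statements) can be sketched as a small data structure. The sketch below is illustrative only: `CorpusSample` and `label_mask` are hypothetical names, whitespace tokens are used instead of single characters for readability in English, and the greedy subsequence matching used to derive a 0/1 label per token is an assumption, not a labeling procedure stated in this application.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class CorpusSample:
    standard: str        # statement carrying several intents
    targets: List[str]   # one target statement per intent

def label_mask(standard_tokens, target_tokens):
    """Greedy left-to-right match; returns one 0/1 label per standard token."""
    mask = [0] * len(standard_tokens)
    pos = 0
    for tok in target_tokens:
        # Advance until the next occurrence of the target token.
        while pos < len(standard_tokens) and standard_tokens[pos] != tok:
            pos += 1
        if pos == len(standard_tokens):
            raise ValueError(f"token {tok!r} not found in order")
        mask[pos] = 1
        pos += 1
    return mask

# Multi-device, single-intent example taken from the corpus types above.
sample = CorpusSample(
    standard="turn on the air conditioner and dishwasher",
    targets=["turn on the air conditioner", "turn on the dishwasher"],
)
tokens = sample.standard.split()
masks = [label_mask(tokens, target.split()) for target in sample.targets]
```

Each mask marks which tokens of the standard statement belong to one target statement, which is the information a standard semantic image is described as carrying.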
[0084] Optionally, in this embodiment, before training the semantic segmentation model, the standard statements can first undergo data preprocessing, mainly including full-width to half-width conversion, uppercase numerals to Arabic numerals, uppercase letters to lowercase letters, emoji removal, word segmentation, and stop word filtering. Word segmentation is the process of recombining a continuous sequence of characters into a sequence of words according to certain rules. In English writing, words are naturally separated by spaces; in this application, each character can be treated as one element of the sequence.
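Part of this preprocessing can be sketched with the Python standard library. The sketch is a simplification under stated assumptions: `preprocess` and `to_character_sequence` are hypothetical names, emoji are approximated by the Unicode category "So" (which also matches some non-emoji symbols), and numeral conversion and stop word filtering are omitted.

```python
import unicodedata

def preprocess(text):
    # NFKC normalization folds full-width forms to their half-width forms.
    text = unicodedata.normalize("NFKC", text)
    # Uppercase letters to lowercase letters.
    text = text.lower()
    # Assumption: emoji are approximated by Unicode category "So".
    text = "".join(ch for ch in text if unicodedata.category(ch) != "So")
    # Collapse runs of whitespace left behind by the removals.
    return " ".join(text.split())

def to_character_sequence(text):
    # Treat each non-space character as one element of the sequence.
    return [ch for ch in text if not ch.isspace()]

cleaned = preprocess("ＴＵＲＮ　ＯＮ 😀")
characters = to_character_sequence("打开空调")
```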
[0085] In one exemplary embodiment, the feature image may be input into the target semantic segmentation model multiple times in the following manner to obtain multiple segmentation results output by the target semantic segmentation model: each time the feature image and a reference image are input into the target semantic segmentation model, wherein, in the case of the first input, the reference image is the initial image, and in the case of subsequent inputs, the reference image is the segmentation result output by the target semantic segmentation model in the previous input; one segmentation result output by the target semantic segmentation model is obtained, until multiple segmentation results are obtained.
[0086] Optionally, in this embodiment, as shown in Figure 3, after the sentence "turn on the air conditioner and dishwasher" is segmented by the UNET model, result 1 "turn on the air conditioner" and result 2 "turn on the dishwasher" are related. Therefore, before result 2 is segmented out, result 1 from the previous pass needs to be added as a channel to the 3-channel image, and the combined image is used as the input of the UNET model to obtain result 2. In particular, since there is no previous result in the first pass, an all-zero image is initialized and added to the 3-channel image to make it a 4-channel image.
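The repeated input with a reference image can be sketched as a loop. This is a minimal illustration, not the trained UNET: `toy_model` is a stand-in that returns fixed masks, and the stopping rule (stop when the model returns an all-zero mask, or after `max_steps` passes) is an assumption, since the text above does not state when the passes end.

```python
def add_reference_channel(feature_image, reference):
    # Append the reference pixel as an extra channel: 3 channels become 4.
    return [
        [tuple(pixel) + (reference[i][j],) for j, pixel in enumerate(row)]
        for i, row in enumerate(feature_image)
    ]

def segment_iteratively(feature_image, model, max_steps):
    size = len(feature_image)
    # First pass: no previous result exists, so start from an all-zero image.
    reference = [[0] * size for _ in range(size)]
    results = []
    for _ in range(max_steps):
        result = model(add_reference_channel(feature_image, reference))
        if not any(any(row) for row in result):
            break  # assumed stopping rule: an empty mask means no intent is left
        results.append(result)
        reference = result  # the previous result becomes the next reference
    return results

def toy_model(four_channel_image):
    # Stand-in for a trained model: decides from the reference channel only.
    ref_total = sum(pixel[3] for row in four_channel_image for pixel in row)
    if ref_total == 0:
        return [[1, 0], [0, 0]]
    if ref_total == 1:
        return [[0, 0], [1, 1]]
    return [[0, 0], [0, 0]]

# Hypothetical 2 x 2 feature image with three channels per pixel.
feature_image = [[(0.1, 0.2, 0.3), (0.0, 0.0, 0.0)],
                 [(0.0, 0.0, 0.0), (0.4, 0.5, 0.6)]]
results = segment_iteratively(feature_image, toy_model, max_steps=5)
```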
[0087] In the technical solution provided in step S208 above, each semantic image corresponds to a target sub-statement, and the target sub-statement is used to represent an execution intent in the corresponding statement to be segmented. Therefore, by identifying the target sub-statement corresponding to each semantic image in multiple semantic images, multiple execution intents corresponding to the statement to be segmented can be obtained.
[0088] In an exemplary embodiment, the target sub-statement corresponding to each semantic image among the multiple semantic images can be identified in the following manner, without limitation: extracting, from each semantic image, the target text corresponding to the target image pixels that belong to the target pixel type, wherein the image pixels in each semantic image are divided into pixel types according to the execution intention they express; and constructing the target text into the target sub-statement corresponding to each semantic image.
[0089] In an exemplary embodiment, the multiple execution intentions included in the statement to be segmented can be segmented out, that is, the multiple execution intentions indicated in the statement to be segmented are identified. The multiple execution intentions can then be transmitted to the corresponding devices to be controlled, so that different devices execute their corresponding intentions. For example, for the above "turn on the air conditioner and the dishwasher", after the corresponding execution intentions are identified as "turn on the air conditioner" and "turn on the dishwasher", "turn on the air conditioner" can be input to the air conditioner device, which executes the operation of "turn on the air conditioner". At the same time, "turn on the dishwasher" can be input to the dishwasher device, which executes the operation of "turn on the dishwasher".
[0090] Optionally, in this embodiment, as shown in Figure 6, in semantic image 1, the target texts corresponding to the target image pixels belonging to the target type (value 1) are "turn", "on", "wash", "dish", and "washer", and the target sub-statement constructed from these target texts can be "turn on the dishwasher".
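Reading a target sub-statement back from a semantic image can be sketched as follows. The sketch assumes each semantic image has already been reduced to one 0/1 label per token, which is a simplification of the pixel layout; `extract_sub_statement` is a hypothetical name, and whitespace tokens are used in place of single characters for readability in English.

```python
def extract_sub_statement(tokens, mask, target_type=1, joiner=" "):
    """Keep the tokens whose label equals the target pixel type."""
    if len(tokens) != len(mask):
        raise ValueError("tokens and mask must have the same length")
    return joiner.join(tok for tok, label in zip(tokens, mask) if label == target_type)

tokens = "turn on the air conditioner and dishwasher".split()
# Hypothetical per-token labels for two semantic images.
semantic_images = [
    [1, 1, 1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0, 0, 1],
]
sub_statements = [extract_sub_statement(tokens, mask) for mask in semantic_images]
```

For character-level text such as Chinese, an empty `joiner` would concatenate the selected characters directly.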
[0091] To better understand the above process of segmenting multi-intention statements, the segmentation process is further described below in combination with optional embodiments, which are not intended to limit the technical solutions of the embodiments of the present application.
[0092] In this embodiment, a method for segmenting multi-intention statements is provided. Figure 7 is a schematic diagram of a segmentation process of multi-intention statements according to an embodiment of the present application. As shown in Figure 7, the process mainly includes the following steps:
[0093] Step S701: In a production environment, the user's utterance first passes through a small binary classification model that determines whether the utterance is multi-intention. If it is, the utterance passes through the semantic segmentation model; if not, it directly enters the recognition module. The semantic segmentation model is obtained after training and optimization with a corpus of screened data.
[0094] Step S702: If the utterance is multi-intention, multiple target sub-statements are segmented out after it enters the semantic segmentation model.
[0095] Step S703: Finally, each target sub-statement is entered into the recognition module in parallel, and the results are summarized. The target sub-statements can enter recognition module a, recognition module b, and recognition module c respectively, or they can enter the same recognition module.
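Steps S701 to S703 can be sketched as a small routing function. Everything below is illustrative: the three callables are hypothetical stand-ins for the binary classification model, the semantic segmentation model, and the recognition module, and a thread pool is only one possible way to run recognition in parallel.

```python
from concurrent.futures import ThreadPoolExecutor

def handle_utterance(utterance, is_multi_intent, segment, recognize):
    # S701: a small binary classifier decides whether segmentation is needed.
    if not is_multi_intent(utterance):
        return [recognize(utterance)]
    # S702: the segmentation model yields the target sub-statements.
    sub_statements = segment(utterance)
    # S703: recognize the sub-statements in parallel and collect the results.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(recognize, sub_statements))

# Toy stand-ins so that the sketch runs end to end.
def toy_is_multi_intent(utterance):
    return " and " in utterance

def toy_segment(utterance):
    return [part.strip() for part in utterance.split(" and ")]

def toy_recognize(statement):
    return {"text": statement, "action": statement.split()[0]}

outcomes = handle_utterance(
    "turn on the water heater and set the air conditioner to post-bath mode",
    toy_is_multi_intent, toy_segment, toy_recognize,
)
```

`pool.map` returns results in the order of its inputs, so the summarized results line up with the order of the target sub-statements.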
[0096] By drawing on semantic segmentation techniques from image processing, the above implementation splits a user's multi-intent utterance into atomic intents. For example, "turn on the air conditioner and dishwasher" is split into two sentences: "turn on the air conditioner" and "turn on the dishwasher." These split sentences are then fed into a single-intent recognition system for identification. In other words, natural language is converted into an image, achieving a language-to-image transformation. Semantic segmentation techniques from image processing and machine vision are thereby applied to natural language processing, realizing knowledge transfer and innovation.
[0097] Through the above description of the embodiments, those skilled in the art can clearly understand that the methods according to the above embodiments can be implemented by means of software plus necessary general-purpose hardware platforms. Of course, they can also be implemented by hardware, but in many cases the former is a better implementation method. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product is stored in a storage medium (such as ROM / RAM, magnetic disk, optical disk) and includes several instructions to cause a terminal device (which may be a mobile phone, computer, server, or network device, etc.) to execute the methods of the various embodiments of this application.
[0098] Figure 8 is a structural block diagram of a multi-intent statement segmentation device according to an embodiment of this application. As shown in Figure 8, the device includes:
[0099] Extraction module 802 is used to extract the text features of the text included in the sentence to be segmented that carries multiple execution intentions;
[0100] The conversion module 804 is used to convert the sentence to be segmented into a feature image based on the text features, wherein the feature image is used to represent the text features through image pixels;
[0101] The segmentation module 806 is used to perform semantic segmentation on the feature image to generate multiple semantic images, wherein each semantic image is used to represent a target execution intent among the multiple execution intents;
[0102] The recognition module 808 is used to recognize the target sub-statement corresponding to each semantic image in the plurality of semantic images, and to obtain the plurality of target sub-statements corresponding to the statement to be segmented.
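The four modules can be pictured as interchangeable stages composed in sequence. The sketch below is only a structural illustration: `SegmentationDevice` and its field names are hypothetical, and the stand-in functions return fixed values rather than performing real feature extraction or segmentation.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

@dataclass
class SegmentationDevice:
    extract: Callable[[str], Sequence]          # extraction module 802
    convert: Callable[[Sequence], Sequence]     # conversion module 804
    segment: Callable[[Sequence], List]         # segmentation module 806
    recognize: Callable[[str, Sequence], str]   # recognition module 808

    def run(self, statement):
        features = self.extract(statement)
        feature_image = self.convert(features)
        semantic_images = self.segment(feature_image)
        return [self.recognize(statement, image) for image in semantic_images]

# Stand-in stages (assumptions) so that the composition can be exercised.
def toy_extract(statement):
    return statement.split()

def toy_convert(features):
    return features  # placeholder: no real feature image is built here

def toy_segment(feature_image):
    return [[1, 1, 1, 1, 1, 0, 0], [1, 1, 1, 0, 0, 0, 1]]

def toy_recognize(statement, mask):
    return " ".join(tok for tok, keep in zip(statement.split(), mask) if keep)

device = SegmentationDevice(toy_extract, toy_convert, toy_segment, toy_recognize)
sub_statements = device.run("turn on the air conditioner and dishwasher")
```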
[0103] Through the above embodiments, the textual features of the text included in the statement to be segmented are first extracted and then used to convert the statement into a feature image. Since the image pixels in the feature image can represent textual features, semantic segmentation can be performed on the feature image to obtain multiple semantic images, so that the multiple execution intentions included in the statement to be segmented are separated into corresponding semantic images. Finally, the target sub-statement corresponding to each semantic image is identified, thus obtaining the multiple target sub-statements corresponding to the statement to be segmented. In other words, multi-level matching is not required during the semantic segmentation of the statement to be segmented. Instead, the statement is converted into a feature image, and semantic segmentation is performed directly on the feature image, thereby segmenting the statement into multiple target sub-statements. This technical solution solves the problem of low segmentation efficiency for multi-intent statements in the related art and improves the efficiency of segmenting multi-intent statements.
[0104] In one exemplary embodiment, the conversion module includes:
[0105] The first construction unit is used to construct an initial matrix diagram corresponding to the statement to be segmented, wherein the matrix rows in the initial matrix diagram correspond to each character in the statement to be segmented, and the matrix columns in the initial matrix diagram correspond to each character in the statement to be segmented.
[0106] The calculation unit is used to perform, for each pixel position on the initial matrix diagram, a target operation on the first text feature of the row text corresponding to the pixel position and the second text feature of the column text corresponding to the pixel position, so as to obtain the target pixel value corresponding to the pixel position;
[0107] The adding unit is used to add the target pixel value to the pixel position on the initial matrix diagram to obtain a target matrix diagram as the feature image.
[0108] In one exemplary embodiment, the calculation unit is further configured to:
[0109] Perform N operation types from multiple operation types on the first text feature and the second text feature to obtain N operation results, where N is an integer greater than or equal to 2;
[0110] Construct the N operation results into the N-channel pixel value corresponding to the pixel position, which is then used as the target pixel value.
[0111] In one exemplary embodiment, the calculation unit is further configured to:
[0112] Multiply the first text feature and the second text feature to obtain a first operation result;
[0113] Perform a similarity operation on the first text feature and the second text feature to obtain a second operation result;
[0114] Perform a fully connected operation on the first text feature and the second text feature to obtain a third operation result, wherein the N operation results include the first operation result, the second operation result, and the third operation result.
[0115] In one exemplary embodiment, the calculation unit is further configured to:
[0116] Obtain the target splicing order and the target splicing format;
[0117] Splice the first operation result, the second operation result, and the third operation result into the target splicing format according to the target splicing order to obtain the target pixel value.
[0118] In one exemplary embodiment, the segmentation module includes:
[0119] The input unit is used to input the feature image multiple times into the target semantic segmentation model to obtain multiple segmentation results output by the target semantic segmentation model. The target semantic segmentation model is obtained by training an initial semantic segmentation model using feature image samples labeled with multiple standard semantic images. The feature image samples are used to represent standard sentences carrying multiple intentions. Each of the multiple standard semantic images is used to represent a target sentence corresponding to one intention in the standard sentence.
[0120] A determining unit is used to determine the multiple segmentation results as the multiple semantic images.
[0121] In one exemplary embodiment, the input unit is further configured to:
[0122] Each time, the feature image and the reference image are input into the target semantic segmentation model, wherein, in the case of the first input, the reference image is the initial image, and in the case of subsequent inputs, the reference image is the segmentation result output by the target semantic segmentation model in the previous input;
[0123] Obtain one segmentation result output by the target semantic segmentation model, until multiple segmentation results are obtained.
[0124] In one exemplary embodiment, the extraction module includes:
[0125] A conversion unit is used to convert each character included in the sentence to be segmented into a character vector, resulting in multiple character vectors;
[0126] The first extraction unit is used to perform bidirectional feature extraction on multiple character vectors to obtain the forward and backward features corresponding to each character vector;
[0127] The splicing unit is used to splice the forward features and the backward features to obtain the character features corresponding to each character.
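The three units above (character vectors, bidirectional extraction, splicing) can be illustrated with a toy recurrent pass. This is a deliberately simple stand-in and not the extractor of this application: the recurrence `tanh(x + decay * h)` has no learned weights, and the lookup table of character vectors is hypothetical.

```python
import math

def recurrent_pass(vectors, decay=0.5):
    # Toy recurrence: each state mixes the current vector with the previous state.
    if not vectors:
        return []
    state = [0.0] * len(vectors[0])
    outputs = []
    for vec in vectors:
        state = [math.tanh(x + decay * h) for x, h in zip(vec, state)]
        outputs.append(state)
    return outputs

def bidirectional_features(char_vectors):
    forward = recurrent_pass(char_vectors)
    # Run over the reversed sequence, then restore the original order.
    backward = recurrent_pass(char_vectors[::-1])[::-1]
    # Splice the forward and backward features for each character.
    return [f + b for f, b in zip(forward, backward)]

# Hypothetical character vectors for a three-character statement.
table = {"开": [1.0, 0.0], "空": [0.0, 1.0], "调": [1.0, 1.0]}
char_vectors = [table[ch] for ch in "开空调"]
features = bidirectional_features(char_vectors)
```

Each resulting feature has twice the dimension of a character vector, because the forward half summarizes the characters up to that position and the backward half summarizes the characters from that position onward.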
[0128] In one exemplary embodiment, the recognition module includes:
[0129] The second extraction unit is used to extract the target text corresponding to the target image pixel that belongs to the target pixel type from each of the semantic images, wherein the image pixels in each of the semantic images are divided into pixel types according to the execution intent expressed;
[0130] The second construction unit is used to construct the target text into the target sub-statement corresponding to each semantic image.
[0131] Optionally, in this embodiment, the storage medium may be configured to store program code for performing the following steps:
[0132] S1, extract the text features of the text included in the sentence to be segmented that carries multiple execution intentions;
[0133] S2, the sentence to be segmented is converted into a feature image based on the text features, wherein the feature image is used to represent the text features through image pixels;
[0134] S3, perform semantic segmentation on the feature image to generate multiple semantic images, wherein each semantic image is used to represent a target execution intent among the multiple execution intents;
[0135] S4, identify the target sub-statement corresponding to each semantic image in the plurality of semantic images, and obtain the plurality of target sub-statements corresponding to the statement to be segmented.
[0136] Embodiments of this application also provide an electronic device including a memory and a processor, wherein the memory stores a computer program and the processor is configured to run the computer program to perform the steps in any of the above method embodiments.
[0137] Optionally, the electronic device may further include a transmission device and an input / output device, wherein the transmission device is connected to the processor and the input / output device is connected to the processor.
[0138] Optionally, in this embodiment, the processor can be configured to perform the following steps via a computer program:
[0139] S1, extract the text features of the text included in the sentence to be segmented that carries multiple execution intentions;
[0140] S2, the sentence to be segmented is converted into a feature image based on the text features, wherein the feature image is used to represent the text features through image pixels;
[0141] S3, perform semantic segmentation on the feature image to generate multiple semantic images, wherein each semantic image is used to represent a target execution intent among the multiple execution intents;
[0142] S4, identify the target sub-statement corresponding to each semantic image in the plurality of semantic images, and obtain the plurality of target sub-statements corresponding to the statement to be segmented.
[0143] Optionally, in this embodiment, the storage medium may include, but is not limited to, various media capable of storing program code, such as USB flash drives, read-only memory (ROM), random access memory (RAM), portable hard drives, magnetic disks, or optical disks.
[0144] Optionally, specific examples in this embodiment can refer to the examples described in the above embodiments and optional implementations, and will not be repeated here.
[0145] Obviously, those skilled in the art should understand that the modules or steps of this application described above can be implemented using general-purpose computing devices. They can be centralized on a single computing device or distributed across a network of multiple computing devices. Optionally, they can be implemented using computer-executable program code, thereby storing them in a storage device for execution by a computing device. In some cases, the steps shown or described can be performed in a different order than those presented here, or they can be fabricated as separate integrated circuit modules, or multiple modules or steps can be fabricated as a single integrated circuit module. Thus, this application is not limited to any particular combination of hardware and software.
[0146] The above description is only a preferred embodiment of this application. It should be noted that for those skilled in the art, several improvements and modifications can be made without departing from the principle of this application, and these improvements and modifications should also be considered within the scope of protection of this application.
Claims
1. A method for segmenting multi-intent statements, characterized in that the method comprises: extracting text features of the text included in a statement to be segmented that carries multiple execution intents; converting the statement to be segmented into a feature image based on the text features, wherein the feature image is used to represent the text features through image pixels; performing semantic segmentation on the feature image to generate multiple semantic images, wherein each semantic image is used to represent a target execution intent among the multiple execution intents; and identifying the target sub-statement corresponding to each semantic image in the multiple semantic images to obtain multiple target sub-statements corresponding to the statement to be segmented; wherein converting the statement to be segmented into the feature image based on the text features comprises: constructing an initial matrix diagram corresponding to the statement to be segmented, wherein the matrix rows in the initial matrix diagram correspond sequentially to each character in the statement to be segmented, and the matrix columns in the initial matrix diagram correspond sequentially to each character in the statement to be segmented; for each pixel position on the initial matrix diagram, performing a target operation on the first text feature of the row text corresponding to the pixel position and the second text feature of the column text corresponding to the pixel position to obtain the target pixel value corresponding to the pixel position; and adding the target pixel value to the pixel position on the initial matrix diagram to obtain a target matrix diagram as the feature image.
2. The method according to claim 1, characterized in that performing the target operation on the first text feature of the row text corresponding to the pixel position and the second text feature of the column text corresponding to the pixel position to obtain the target pixel value corresponding to the pixel position comprises: performing N operation types from multiple operation types on the first text feature and the second text feature to obtain N operation results, where N is an integer greater than or equal to 2; and constructing the N operation results into the N-channel pixel value corresponding to the pixel position as the target pixel value.
3. The method according to claim 2, characterized in that performing the N operation types from the multiple operation types on the first text feature and the second text feature to obtain the N operation results comprises: multiplying the first text feature and the second text feature to obtain a first operation result; performing a similarity operation on the first text feature and the second text feature to obtain a second operation result; and performing a fully connected operation on the first text feature and the second text feature to obtain a third operation result, wherein the N operation results include the first operation result, the second operation result, and the third operation result.
4. The method according to claim 3, characterized in that constructing the N operation results into the N-channel pixel value corresponding to the pixel position as the target pixel value comprises: obtaining a target splicing order and a target splicing format; and splicing the first operation result, the second operation result, and the third operation result into the target splicing format according to the target splicing order to obtain the target pixel value.
5. The method according to claim 1, characterized in that performing semantic segmentation on the feature image to generate multiple semantic images comprises: inputting the feature image into a target semantic segmentation model multiple times to obtain multiple segmentation results output by the target semantic segmentation model, wherein the target semantic segmentation model is obtained by training an initial semantic segmentation model using feature image samples labeled with multiple standard semantic images, the feature image samples are used to represent standard statements carrying multiple intents, and each of the multiple standard semantic images is used to represent a target statement corresponding to one intent in the standard statement; and determining the multiple segmentation results as the multiple semantic images.
6. The method according to claim 5, characterized in that inputting the feature image into the target semantic segmentation model multiple times to obtain the multiple segmentation results output by the target semantic segmentation model comprises: inputting the feature image and a reference image into the target semantic segmentation model each time, wherein for the first input the reference image is an initial image, and for each subsequent input the reference image is the segmentation result output by the target semantic segmentation model for the previous input; and obtaining one segmentation result output by the target semantic segmentation model each time, until the multiple segmentation results are obtained.
7. The method according to claim 1, characterized in that extracting the text features of the text included in the statement to be segmented that carries multiple execution intents comprises: converting each character included in the statement to be segmented into a character vector to obtain multiple character vectors; performing bidirectional feature extraction on the multiple character vectors to obtain the forward features and the backward features corresponding to each character vector; and splicing the forward features and the backward features to obtain the character features corresponding to each character.
8. The method according to claim 1, characterized in that identifying the target sub-statement corresponding to each semantic image in the multiple semantic images comprises: extracting, from each semantic image, the target text corresponding to the target image pixels that belong to the target pixel type, wherein the image pixels in each semantic image are divided into pixel types according to the execution intent expressed; and constructing the target text into the target sub-statement corresponding to each semantic image.
9. A computer-readable storage medium, characterized in that the computer-readable storage medium includes a stored program, wherein the program, when executed, performs the method of any one of claims 1 to 8.
10. An electronic device comprising a memory and a processor, characterized in that the memory stores a computer program, and the processor is configured to execute the method of any one of claims 1 to 8 through the computer program.