Idiom recognition method and device, storage medium and electronic equipment

By extracting text fragments of four characters and using an approximate pinyin library and finite automata for matching, the problem of unsatisfactory idiom recognition results was solved, and the accuracy and efficiency of idiom recognition were improved.

CN116013302B · Active · Publication Date: 2026-06-19 · SHANGHAI ZHENGDA XIMALAYA NETWORK TECH CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: SHANGHAI ZHENGDA XIMALAYA NETWORK TECH CO LTD
Filing Date: 2022-12-14
Publication Date: 2026-06-19

AI Technical Summary

Technical Problem

The recognition of idioms in speech recognition is not ideal, mainly because the training data of idioms is insufficient and the recognition model needs to be retrained, resulting in low recognition efficiency and poor accuracy.

Method used

The text sequence of the speech to be recognized is acquired, and a target text segment of 4 characters is extracted from it. Multiple pronunciation combinations are generated using an approximate pinyin library, and a finite automaton is used to match them against idioms and determine the target idiom that is successfully matched.

Benefits of technology

It improved the accuracy of idiom recognition, enhanced the efficiency and precision of speech recognition models in recognizing idioms, and reduced misrecognition.

✦ Generated by Eureka AI based on patent content.

Abstract figure: CN116013302B_ABST

Abstract

This application provides a method, apparatus, storage medium, and electronic device for idiom recognition, relating to the field of speech recognition. The electronic device acquires a text sequence of speech to be recognized; extracts a target text segment from the text sequence, wherein the target text segment includes four characters; acquires multiple pronunciation combinations that are similar in pronunciation to the target text segment; and matches these multiple pronunciation combinations with the pronunciation combinations of multiple idioms to obtain the successfully matched target idiom. Thus, by generating multiple pronunciation combinations that are similar in pronunciation to the target text segment and matching them with existing pronunciation combinations of multiple idioms, the successfully matched target idiom is determined from multiple idioms, thereby improving the accuracy of idiom recognition based on existing speech recognition models.

Description

Technical Field

[0001] This application relates to the field of speech recognition, and more specifically, to a method, apparatus, storage medium, and electronic device for recognizing idioms.

Background Technology

[0002] Idioms are a major feature of traditional Chinese culture. They have fixed structures and expressions, conveying specific meanings, and are used as a whole in sentences. Because a large portion of idioms have been passed down from ancient times, they represent a story or allusion, leading to significant differences between idioms and common expressions.

[0003] Therefore, in speech recognition applications, the recognition of idioms is often not ideal.

Summary of the Invention

[0004] To overcome at least one deficiency in the prior art, this application provides an idiom recognition method, apparatus, storage medium, and electronic device to improve the recognition rate of idioms, specifically including:

[0005] Firstly, this application provides a method for recognizing idioms, the method comprising:

[0006] Obtain the text sequence of the speech to be recognized;

[0007] Extract a target text segment from the text sequence, wherein the target text segment includes 4 characters;

[0008] Obtain multiple pronunciation combinations that are close to the pronunciation of the target text segment;

[0009] The multiple pronunciation combinations are matched with the pronunciation combinations of multiple idioms to obtain the target idioms that have been successfully matched.

[0010] Secondly, this application provides an idiom recognition device, the device comprising:

[0011] The text sequence module is used to obtain the text sequence of the speech to be recognized;

[0012] A text fragment module is used to extract a target text fragment from the text sequence, wherein the target text fragment includes 4 characters;

[0013] The idiom recognition module is used to obtain multiple pronunciation combinations that are close to the pronunciation of the target text segment;

[0014] The idiom recognition module is also used to match the multiple pronunciation combinations with the pronunciation combinations of multiple idioms to obtain the target idiom that has been successfully matched.

[0015] Thirdly, this application provides a storage medium storing a computer program, which, when executed by a processor, implements the idiom recognition method.

[0016] Fourthly, this application provides an electronic device, which includes a processor and a memory. The memory stores a computer program, and when the computer program is executed by the processor, it implements the idiom recognition method.

[0017] Compared with the prior art, this application has the following beneficial effects:

[0018] The idiom recognition method, apparatus, storage medium, and electronic device provided in this application involve the electronic device acquiring a text sequence of speech to be recognized; extracting a target text segment from the text sequence, wherein the target text segment includes four characters; acquiring multiple pronunciation combinations that are close to the pronunciation of the target text segment; and matching the multiple pronunciation combinations with the pronunciation combinations of multiple idioms to obtain the successfully matched target idiom. Thus, by generating multiple pronunciation combinations that are close to the pronunciation of the target text segment and matching them with existing pronunciation combinations of multiple idioms, the successfully matched target idiom is determined from multiple idioms, thereby improving the accuracy of idiom recognition based on existing speech recognition models.

Attached Figure Description

[0019] To more clearly illustrate the technical solutions of the embodiments of this application, the accompanying drawings used in the embodiments will be briefly introduced below. It should be understood that the following drawings only show some embodiments of this application and should not be regarded as a limitation of the scope. For those skilled in the art, other related drawings can be obtained based on these drawings without creative effort.

[0020] Figure 1 is a flowchart of the idiom recognition method provided in an embodiment of this application;

[0021] Figure 2 is the first schematic diagram of the sliding window principle provided in an embodiment of this application;

[0022] Figure 3 is the second schematic diagram of the sliding window principle provided in an embodiment of this application;

[0023] Figure 4 is the first schematic diagram of a finite state machine provided in an embodiment of this application;

[0024] Figure 5 is the second schematic diagram of a finite state machine provided in an embodiment of this application;

[0025] Figure 6 is a schematic diagram of the result of the compose operation provided in an embodiment of this application;

[0026] Figure 7 is a schematic structural diagram of the idiom recognition device provided in an embodiment of this application;

[0027] Figure 8 is a schematic structural diagram of the electronic device provided in an embodiment of this application.

[0028] Reference numerals: 101-Text sequence module; 102-Text fragment module; 103-Idiom recognition module; 201-Memory; 202-Processor; 203-Communication unit; 204-System bus.

Detailed Implementation

[0029] To make the objectives, technical solutions, and advantages of the embodiments of this application clearer, the technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments. The components of the embodiments of this application described and shown in the accompanying drawings can generally be arranged and designed in various different configurations.

[0030] Therefore, the following detailed description of the embodiments of this application provided in the accompanying drawings is not intended to limit the scope of the claimed application, but merely to illustrate selected embodiments of the application. All other embodiments obtained by those skilled in the art based on the embodiments of this application without inventive effort are within the scope of protection of this application.

[0031] It should be noted that similar labels and letters in the following figures indicate similar items. Therefore, once an item is defined in one figure, it does not need to be further defined and explained in subsequent figures.

[0032] In the description of this application, it should be noted that the terms "first," "second," "third," etc., are used only for distinguishing descriptions and should not be construed as indicating or implying relative importance. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitations, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes said element.

[0033] Since a large part of idioms have been passed down from ancient times, they represent a story or an allusion, enabling idioms to convey rich semantic information in just a few Chinese characters. This also leads to a huge difference between the expression ways of idioms and conventional expressions, resulting in less-than-ideal recognition effects of idioms in the application scenarios of speech recognition.

[0034] Take the idiom "in a melon patch or under a plum tree" (瓜田李下) as an example. Its original meaning is that when passing a melon patch, one should not bend down to tie one's shoes, to avoid being suspected of picking melons; and when walking under a plum tree, one should not raise a hand to adjust one's hat, to avoid being suspected of picking plums. It is now used as a metaphor for situations that are likely to arouse suspicion.

[0035] Currently, neural-network-based speech recognition models must be trained on a large corpus. However, training corpora containing idioms are scarce, which results in poor recognition of idioms. Such corpora are also difficult to collect because few sources exist. Even if audio is generated through speech synthesis, the model still has to be retrained; the training cycle is relatively long, and the result may not be good. For example, the idiom "今生今世" ("in this life") may be recognized as wrongly written text pronounced "jin sheng jin shi", "jin sheng jing shi", "jin shen jing shi", etc.

[0036] In other related technologies, the recognition accuracy of idioms can be improved by means of hot words. However, there are more than 20,000 current idioms. Constructing more than 20,000 hot words will cause a decline in the overall recognition efficiency.

[0037] It should be noted that the defects existing in the above solutions of the prior art are all the results obtained by the inventors after practice and careful research. Therefore, the process of discovering the above problems and the solutions proposed by the embodiments of the present application below for the above problems should all be the contributions made by the inventors to the present application during the invention and creation process, rather than being understood as the technical content known to those skilled in the art.

[0038] In view of this, this application provides an idiom recognition method that improves the recognition accuracy of idioms while using an existing speech recognition model. This idiom recognition method can be applied to electronic devices such as mobile terminals, tablet computers, laptops, and desktop computers. In some embodiments, mobile terminals may include smart home devices, wearable devices, smart mobile devices, virtual reality devices, augmented reality devices, etc. In some embodiments, smart home devices may include smart lighting devices, control devices for smart appliances, smart monitoring devices, smart TVs, smart cameras, smart speakers, etc. In some embodiments, wearable devices may include smart bracelets, smart shoelaces, smart glasses, smart helmets, smartwatches, smart clothing, smart backpacks, smart accessories, etc. In some embodiments, smart mobile devices may include smartphones, personal digital assistants (PDAs), gaming devices, navigation devices, etc.

[0039] Of course, this idiom recognition method can also be applied to servers, which can be a single server or a group of servers. Server groups can be centralized or distributed (e.g., servers can be a distributed system). In some embodiments, the server can be local or remote relative to the user terminal. In some embodiments, the server can be implemented on a cloud platform; by way of example only, a cloud platform can include private cloud, public cloud, hybrid cloud, community cloud, distributed cloud, inter-cloud, multi-cloud, etc., or any combination thereof. In some embodiments, the server can be implemented on an electronic device having one or more components.

[0040] Based on the above introduction, the steps of this method are described in detail below with reference to Figure 1. It should be understood that the operations in the flowchart need not be performed in sequence, and steps without a logical dependency on one another may be reversed in order or performed simultaneously. Furthermore, those skilled in the art, guided by the content of this application, may add one or more other operations to the flowchart, or remove one or more operations from the flowchart. As shown in Figure 1, the method includes:

[0041] S101, Obtain the text sequence of the speech to be recognized.

[0042] The text sequence can be obtained by a speech recognition model recognizing the speech to be recognized. That is, the electronic device receives the speech to be recognized and inputs it into the speech recognition model to obtain the text sequence of the speech to be recognized. The speech recognition model can be any recognition model that has already been trained. In other words, this embodiment can use an existing speech recognition model without the need for retraining using corpus.

[0043] For example, assuming the electronic device is a child-assisted reading device used to assess whether a child's pronunciation is standard when reading poems, stories, etc., the speech to be recognized can be audio data collected by the child-assisted reading device during the child's reading.

[0044] S102, extract the target text segment from the text sequence.

[0045] The target text segment consists of four characters. It's worth noting that after converting the speech to be recognized into a text sequence, it's unknown whether it contains idioms, or even if it does, its position within the text sequence. Since most idioms consist of four characters, this embodiment uses four characters as the standard to extract the target text segment from the text sequence and further identifies whether the target text segment is a misidentified idiom; if it is, it is corrected to the correct idiom.

[0046] Research has found that performing idiom recognition on every text sequence obtained from speech to be recognized reduces the efficiency of speech recognition. Therefore, this embodiment also provides an idiom prediction model for initially determining whether a text sequence contains idioms. Thus, an optional implementation of step S102 includes:

[0047] S102-1, Input the text sequence into the idiom prediction model to obtain the prediction result of the text sequence, namely the probability that the text sequence contains an idiom.

[0048] S102-2, If the probability is greater than the probability threshold, then extract the target text segment from the text sequence.

[0049] The idiom prediction model can be a binary classification model based on LSTM. After inputting the word vectors of the text sequence, it outputs the probability that the text sequence contains idioms. If the probability is greater than a preset probability threshold, it further identifies the possible idioms in the text sequence, thereby improving the efficiency of speech recognition.

[0050] To avoid missing any possible idioms, this embodiment uses a sliding window approach to extract target text segments from the text sequence. Specifically, the implementation of step S102-2 includes:

[0051] S102-2-1, if the previous text segment cannot be combined into an idiom, move a preset step length based on the current position of the sliding window in the text sequence, and take the intercepted text segment as the target text segment.

[0052] S102-2-2, if the previous text segment can be combined into an idiom, move a step length of 4 characters based on the current position of the sliding window in the text sequence, and take the intercepted text segment as the target text segment.

[0053] For example, assume that the text sequence is "Wasted this life and now, one will never get anything in the next life", the size of the sliding window is 4 characters, and the step length is 1 character.

[0054] As shown in Figure 2, first intercept the target text segment "Wasted this" and determine whether it can form an idiom; if not, move the sliding window 1 character along the text sequence, intercept the target text segment "Spent this life", and determine whether it can form an idiom; if not, move a further 1 character along the text sequence and intercept the target text segment "This life and now".

[0055] As shown in Figure 3, target text segments are intercepted from the text sequence in the manner shown in Figure 2. When the intercepted target text segment is "This life and now", since it actually corresponds to the idiom "This life and this world", move the sliding window 4 characters along the text sequence and intercept the target text segment "One will never again". The above process is repeated until the text sequence has been traversed.
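The sliding-window traversal of steps S102-2-1 and S102-2-2 can be sketched roughly as follows. This is a minimal illustration, not the implementation of this embodiment: the function name scan_segments, the lookup table TOY_CORRECTIONS (which stands in for steps S103-S104), and the sample sentence are all assumptions introduced here.

```python
from typing import Callable, List, Optional, Tuple

WINDOW = 4  # the target text segment is four characters long


def scan_segments(
    text: str,
    match_idiom: Callable[[str], Optional[str]],
    step: int = 1,
) -> List[Tuple[int, str, Optional[str]]]:
    """Slide a 4-character window over text.

    Returns (start index, segment, matched idiom or None) for each window visited.
    """
    visited = []
    pos = 0
    while pos + WINDOW <= len(text):
        segment = text[pos:pos + WINDOW]
        idiom = match_idiom(segment)
        visited.append((pos, segment, idiom))
        # S102-2-2: skip past a segment that forms an idiom (4 characters);
        # S102-2-1: otherwise advance by the preset step length.
        pos += WINDOW if idiom is not None else step
    return visited


# Hypothetical matcher: a plain lookup table standing in for steps S103-S104.
TOY_CORRECTIONS = {"今生今是": "今生今世"}

# Hypothetical sample sentence containing the misrecognized segment.
result = scan_segments("虚度今生今是就不再有来生", TOY_CORRECTIONS.get)
for start, segment, idiom in result:
    print(start, segment, idiom)
```

With a step length of 1, the window visits start positions 0, 1, and 2, finds the idiom at position 2, and then resumes at position 6 rather than position 3.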

[0056] S103, obtain multiple pronunciation combinations that are close to the pronunciation of the target text segment.

[0057] In this embodiment, an approximate pinyin library is provided, which records the pinyin of a large number of characters together with the pinyin of characters whose pronunciations are similar. For example, if the pinyin of "今" is represented as "jin1", its similar pronunciations include "jin1", "jin4", "jing1", and "jing4". Here, it should be understood that the number "1" in "jin1" represents the first tone; similarly, the number "2" represents the second tone, the number "3" represents the third tone, and the number "4" represents the fourth tone.

[0058] S103-1, obtain the pinyin of each target character in the target text segment.

[0059] S103-2, respectively obtain the approximate pinyin of the target characters from the approximate pinyin library according to the pinyin of each target character.

[0060] S103-3. Combine the approximate pinyins of multiple target characters to obtain multiple pronunciation combinations.

[0061] Continuing with the example of "今生今是", and restricting the number of approximate pronunciations of each target character to within 5, we obtain 5 approximate pinyins for the first "今", 5 approximate pinyins for "生", 5 approximate pronunciations for the second "今", and 5 approximate pinyins for "是". Then, through permutation and combination, we can obtain 5×5×5×5 = 625 pronunciation combinations.
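Steps S103-1 to S103-3 amount to a Cartesian product over the per-character lists of approximate pinyins. The following minimal sketch illustrates the arithmetic; the tables CHAR_PINYIN and APPROX_PINYIN and their entries are hypothetical placeholders, not the contents of the approximate pinyin library of this embodiment.

```python
from itertools import product
from typing import Dict, List, Tuple

# Hypothetical character -> pinyin table (tone written as a trailing digit).
CHAR_PINYIN: Dict[str, str] = {"今": "jin1", "生": "sheng1", "是": "shi4"}

# Hypothetical excerpt of an approximate pinyin library: pinyin -> similar pinyins.
APPROX_PINYIN: Dict[str, List[str]] = {
    "jin1": ["jin1", "jin4", "jing1", "jing4", "jin3"],
    "sheng1": ["sheng1", "shen1", "sheng4", "shen2", "seng1"],
    "shi4": ["shi4", "si4", "shi1", "shi2", "shi3"],
}


def pronunciation_combinations(segment: str, limit: int = 5) -> List[Tuple[str, ...]]:
    """Return every combination of approximate pinyins for the segment."""
    per_char = []
    for ch in segment:
        base = CHAR_PINYIN[ch]                         # S103-1: pinyin of the character
        per_char.append(APPROX_PINYIN[base][:limit])   # S103-2: its approximate pinyins
    return list(product(*per_char))                    # S103-3: combine across characters


combos = pronunciation_combinations("今生今是")
print(len(combos))  # 5 * 5 * 5 * 5 combinations
```

Because the number of combinations grows as the product of the list lengths, capping each list at 5 keeps the total for a four-character segment at no more than 625.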

[0062] S104. Match the multiple pronunciation combinations with the pronunciation combinations of multiple idioms to obtain the target idioms that match successfully.

[0063] In an optional implementation, we can adopt a method based on a finite state transducer (FST) to match the multiple pronunciation combinations with the pronunciation combinations of multiple idioms. In a specific implementation, step S104 includes:

[0064] S104-1. Generate a finite state transducer file to be matched based on the multiple pronunciation combinations.

[0065] S104-2. Match the finite state transducer file to be matched with the finite state transducer file of idioms to obtain the target idioms that match successfully.

[0066] In a specific implementation, we can use the tool OpenFST to generate the finite state transducer file to be matched and the finite state transducer file of idioms. Exemplarily, assume that the finite state transducer file to be matched is called T.fst, and the finite state transducer file of idioms is called L.fst. Then, perform a compose operation on T.fst and L.fst. If the match is successful, output the idiom that matches successfully; otherwise, output a return result indicating a failed match.

[0067] Exemplarily, continuing with the example of "今生今是", generate a finite state transducer file to be matched based on the multiple pronunciation combinations of "今生今是", and perform a compose operation on it and the finite state transducer file of idioms. If the operation is successful, output "今生今世".
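The matching in step S104 can be illustrated without OpenFST by a plain-Python stand-in. In the sketch below, a trie over idiom pinyin sequences plays the role of L.fst, and a list of candidate pinyins per character plays the role of T.fst; walking both together keeps only the paths they share. The names build_lexicon and match, the END marker, and the two-idiom lexicon are assumptions made for illustration, and this is not the OpenFST-based implementation described above.

```python
from typing import Dict, List, Sequence

END = "<idiom>"  # hypothetical marker key that stores idioms at the end of a path


def build_lexicon(idioms: Dict[str, Sequence[str]]) -> dict:
    """Build a trie over pinyin sequences (stand-in for the idiom file L.fst)."""
    root: dict = {}
    for idiom, pinyins in idioms.items():
        node = root
        for p in pinyins:
            node = node.setdefault(p, {})
        node.setdefault(END, []).append(idiom)
    return root


def match(lattice: Sequence[Sequence[str]], lexicon: dict) -> List[str]:
    """Advance through the candidate pinyins and the trie in lockstep."""
    frontier = [lexicon]
    for candidates in lattice:
        unique = list(dict.fromkeys(candidates))  # drop repeated candidates, keep order
        frontier = [node[p] for node in frontier for p in unique if p in node]
        if not frontier:
            return []  # failed match
    found: List[str] = []
    for node in frontier:
        found.extend(node.get(END, []))
    return found


lexicon = build_lexicon({
    "今生今世": ["jin1", "sheng1", "jin1", "shi4"],
    "瓜田李下": ["gua1", "tian2", "li3", "xia4"],
})

# Candidate pinyins for each character of the misrecognized segment "今生今是".
lattice = [
    ["jin1", "jin4", "jing1", "jing4"],
    ["sheng1", "shen1"],
    ["jin1", "jin4", "jing1", "jing4"],
    ["shi4", "si4"],
]
print(match(lattice, lexicon))
```

Walking the lattice position by position avoids enumerating all combinations explicitly, since any prefix that no idiom shares is discarded immediately.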

[0068] The compose operation is a function provided in OpenFST for combining information at different levels. The low-level (fine-grained) a.fst can be composed with the high-level (coarse-grained) b.fst to generate a c.fst with the low-level (fine-grained) input and the high-level (coarse-grained) output. For ease of understanding, the principle of the compose operation in OpenFST is introduced below by way of example with reference to Figure 4, Figure 5, and Figure 6.

[0069] Figure 4 shows a finite automaton file, which is called a.fst; Figure 5 shows another finite automaton file, which is called b.fst. As the figures show, a finite automaton file contains three elements: the initial state, the final state (the final state carries a weight), and the transition edges. The transition edges have input symbols, output symbols, and edge weights.

[0070] For the finite automaton shown in Figure 5, when the input is ac, the output is xz, with a weight sum of 6.5.

[0071] When the input is bc, the output is yz, with a weight sum of 7.5.

[0072] This embodiment composes the finite automata shown in Figure 4 and Figure 5 to obtain the c.fst shown in Figure 6:

[0073] c.fst = a.fst compose b.fst

[0074] As can be seen, performing the `compose` operation on the contents of the two finite automata files a.fst and b.fst effectively preserves the parts of a.fst's output that are identical to b.fst's input, and regenerates c.fst. This embodiment only uses the existing OpenFST. For a more detailed introduction to the `compose` operation in OpenFST, please refer to the relevant documentation.
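The principle of composition can also be sketched in plain Python for transducers without empty (epsilon) transitions, with weights added along a path. The arcs below are assumptions: b is built so that it reproduces the totals stated above (ac gives xz with 6.5, bc gives yz with 7.5), and a is a hypothetical automaton that accepts only the string ac. The actual contents of Figure 4 and Figure 5 may differ, and this is not the OpenFST implementation.

```python
from typing import Dict, List, Tuple

# An arc is (source state, input symbol, output symbol, weight, destination state).
Arc = Tuple[int, str, str, float, int]


class Fst:
    def __init__(self, start: int, finals: Dict[int, float], arcs: List[Arc]):
        self.start = start
        self.finals = finals  # final state -> final weight
        self.arcs = arcs


def compose(a: Fst, b: Fst) -> Fst:
    """Pair up states of a and b, keeping arcs where a's output equals b's input."""
    ids = {(a.start, b.start): 0}
    pending = [(a.start, b.start)]
    arcs: List[Arc] = []
    finals: Dict[int, float] = {}
    while pending:
        qa, qb = pending.pop()
        src = ids[(qa, qb)]
        if qa in a.finals and qb in b.finals:
            finals[src] = a.finals[qa] + b.finals[qb]
        for sa, ia, oa, wa, da in a.arcs:
            if sa != qa:
                continue
            for sb, ib, ob, wb, db in b.arcs:
                if sb != qb or ib != oa:
                    continue
                if (da, db) not in ids:
                    ids[(da, db)] = len(ids)
                    pending.append((da, db))
                arcs.append((src, ia, ob, wa + wb, ids[(da, db)]))
    return Fst(0, finals, arcs)


def transduce(fst: Fst, symbols: str) -> List[Tuple[str, float]]:
    """Return (output string, total weight) for every accepting path over symbols."""
    paths = [(fst.start, "", 0.0)]
    for sym in symbols:
        paths = [
            (dst, out + o, w + aw)
            for state, out, w in paths
            for src, i, o, aw, dst in fst.arcs
            if src == state and i == sym
        ]
    return [(out, w + fst.finals[s]) for s, out, w in paths if s in fst.finals]


# Assumed arcs, chosen to reproduce the weight sums 6.5 and 7.5 stated in the text.
b = Fst(0, {2: 3.5}, [(0, "a", "x", 0.5, 1), (0, "b", "y", 1.5, 1), (1, "c", "z", 2.5, 2)])
# Hypothetical a.fst: accepts only "ac" and copies it to its output with zero weight.
a = Fst(0, {2: 0.0}, [(0, "a", "a", 0.0, 1), (1, "c", "c", 0.0, 2)])
c = compose(a, b)

print(transduce(b, "ac"), transduce(b, "bc"))
print(transduce(c, "ac"), transduce(c, "bc"))
```

In this sketch, c keeps only the path of b whose input matches the output of a, so the input bc, which b alone accepts, produces no result after composition.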

[0075] Thus, this embodiment generates multiple pronunciation combinations that are close to the pronunciation of the target text fragment, matches them with existing pronunciation combinations of multiple idioms, and determines the target idiom that is successfully matched from multiple idioms, thereby improving the accuracy of idiom recognition based on existing speech recognition models.

[0076] In addition, to avoid introducing subjective human error, a large number of sample statistics were used to obtain the approximate pinyin for each target character.

[0077] In a specific implementation, the electronic device can also acquire multiple sample voices; input the multiple sample voices into the speech recognition model to obtain the text sequences of each sample voice.

[0078] Then, the electronic device counts the homophonic characters that are close to the pronunciation of each sample text according to the text sequences of multiple sample voices, where the sample text refers to the characters that appear in the actual text sequences of multiple sample voices;

[0079] Finally, the electronic device generates an approximate pinyin library based on the homophonic characters of each sample text.

[0080] Exemplarily, 10 million sample voices are randomly selected as the corpus, and the recognition results are obtained by using a trained speech recognition model. By comparing the recognition results with the labeled results, the homophonic characters of Chinese characters can be obtained.

[0081] For example, the labeled text of a sample voice is "Today is a nice day", and the recognition result is "Jin Tian is a Hao Tian Qi". It can be concluded that the recognized "Jin" is a homophonic character of "今", "Tian" is a homophonic character of "天", and "Hao" is a homophonic character of "好". Of course, "今" is also regarded as a homophonic character of itself. In this way, after statistics over a large number of samples, the 5 most frequently occurring homophonic characters are selected for each character, and the pinyins of these 5 homophonic characters are used as the approximate pinyins of the pinyin of this character.

[0082] In this way, compared with manually specifying the approximate pinyins of each pinyin, the approximate pinyins obtained statistically from the outputs of the speech recognition model are better adapted to the recognition behavior of that model.
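The statistics described in paragraphs [0077] to [0081] can be sketched as a confusion count between labeled and recognized characters. The following is a simplified illustration under stated assumptions: labeled and recognized texts are compared position by position, and pairs of unequal length are skipped instead of being aligned; the function name build_approx_library, the character-to-pinyin table, and the sample pairs are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def build_approx_library(
    pairs: Iterable[Tuple[str, str]],
    char_pinyin: Dict[str, str],
    top_k: int = 5,
) -> Dict[str, List[str]]:
    """Map each pinyin to the pinyins of its most frequent recognized counterparts.

    pairs holds (labeled text, recognized text) for each sample voice.
    """
    confusions: Dict[str, Counter] = defaultdict(Counter)
    for labeled, recognized in pairs:
        if len(labeled) != len(recognized):
            continue  # simplification: a real system would align the two texts
        for ref_ch, hyp_ch in zip(labeled, recognized):
            # A correctly recognized character counts as its own homophone.
            confusions[ref_ch][hyp_ch] += 1

    library: Dict[str, List[str]] = defaultdict(list)
    for ref_ch, counts in confusions.items():
        ref_py = char_pinyin.get(ref_ch)
        if ref_py is None:
            continue
        for hyp_ch, _ in counts.most_common(top_k):
            py = char_pinyin.get(hyp_ch)
            if py is not None and py not in library[ref_py]:
                library[ref_py].append(py)
    return dict(library)


# Hypothetical pinyin table and (labeled, recognized) sample pairs.
CHAR_PINYIN = {"今": "jin1", "金": "jin1", "京": "jing1", "天": "tian1"}
SAMPLES = [("今天", "金天"), ("今天", "今天"), ("今天", "今天"), ("今天", "京天")]

library = build_approx_library(SAMPLES, CHAR_PINYIN)
print(library)
```

Keying the library by pinyin rather than by character means that characters sharing a pronunciation pool their statistics, which matches how step S103-2 looks up approximate pinyins by the pinyin of each target character.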

[0083] Based on the same inventive concept as the idiom recognition method provided in this embodiment, this embodiment also provides an idiom recognition device. The idiom recognition device includes at least one software function module that can be stored in the memory in software form or embedded in the operating system (OS) of the electronic device. The processor in the electronic device is used to execute the executable modules stored in the memory, for example, the software function modules and computer programs included in the idiom recognition device. As shown in Figure 7, divided by function, the idiom recognition device may include:

[0084] The text sequence module 101 is used to obtain the text sequence of the voice to be recognized.

[0085] In this embodiment, the text sequence module 101 is used to implement step S101 in Figure 1. For a detailed description of the text sequence module 101, please refer to the detailed introduction of step S101.

[0086] The text fragment module 102 is used to intercept the target text fragment from the text sequence, where the target text fragment includes 4 characters.

[0087] In this embodiment, the text fragment module 102 is used to implement step S102 in Figure 1. For a detailed description of the text fragment module 102, see the detailed introduction of step S102.

[0088] The idiom recognition module 103 is used to obtain multiple pronunciation combinations that are close to the pronunciation of the target text fragment.

[0089] The idiom recognition module 103 is also used to match multiple pronunciation combinations with the pronunciation combinations of multiple idioms to obtain the target idiom that has been successfully matched.

[0090] In this embodiment, the idiom recognition module 103 is used to implement steps S103-S104 in Figure 1. For a detailed description of the idiom recognition module 103, see the detailed introduction of steps S103-S104.

[0091] It is worth noting that, since they share the same inventive concept as the idiom recognition method, the above-mentioned text sequence module 101, text fragment module 102, and idiom recognition module 103 can also be used to implement other steps or sub-steps of the idiom recognition method. This embodiment does not specifically limit these steps.

[0092] In addition, the functional modules in the various embodiments of this application can be integrated together to form an independent part, or each module can exist independently, or two or more modules can be integrated to form an independent part.

[0093] It should also be understood that if the above embodiments are implemented as software functional modules and sold or used as independent products, they can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this application.

[0094] Therefore, this embodiment also provides a storage medium storing a computer program. When the computer program is executed by a processor, it implements the idiom recognition method provided in this embodiment. The storage medium can be any medium capable of storing program code, such as a USB flash drive, external hard drive, read-only memory (ROM), random access memory (RAM), magnetic disk, or optical disk.

[0095] Referring to Figure 8, this embodiment provides an electronic device that may include a processor 202 and a memory 201. The memory 201 stores a computer program, and the processor 202 reads and executes the computer program in the memory 201 corresponding to the above-described embodiments to implement the idiom recognition method provided in this embodiment.

[0096] Still referring to Figure 8, the electronic device may also include a communication unit 203. The memory 201, the processor 202, and the communication unit 203 are electrically connected to each other, directly or indirectly, through the system bus 204 to realize data transmission or interaction.

[0097] The memory 201 can be an information recording device based on any electronic, magnetic, optical, or other physical principles, used to record execution instructions, data, etc. In some embodiments, the memory 201 can be, but is not limited to, volatile memory, non-volatile memory, a storage drive, etc.

[0098] In some embodiments, the volatile memory may be random access memory (RAM); in some embodiments, the non-volatile memory may be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), flash memory, etc.; in some embodiments, the storage drive may be a disk drive, solid-state drive, any type of storage disk (such as optical disc, DVD, etc.), or similar storage media, or a combination thereof.

[0099] The communication unit 203 is used to send and receive data over a network. In some embodiments, the network may include a wired network, a wireless network, a fiber optic network, a telecommunications network, an intranet, the Internet, a local area network (LAN), a wide area network (WAN), a wireless local area network (WLAN), a metropolitan area network (MAN), a public switched telephone network (PSTN), a Bluetooth network, a ZigBee network, or a near field communication (NFC) network, or any combination thereof. In some embodiments, the network may include one or more network access points. For example, the network may include wired or wireless network access points, such as base stations and / or network switching nodes, through which one or more components of the service request processing system can connect to the network to exchange data and / or information.

[0100] The processor 202 may be an integrated circuit chip with signal processing capabilities, and may include one or more processing cores (e.g., a single-core processor or a multi-core processor). By way of example only, the processor 202 may include a Central Processing Unit (CPU), an Application-Specific Integrated Circuit (ASIC), an Application-Specific Instruction-set Processor (ASIP), a Graphics Processing Unit (GPU), a Physics Processing Unit (PPU), a Digital Signal Processor (DSP), a Field Programmable Gate Array (FPGA), a Programmable Logic Device (PLD), a controller, a microcontroller unit, a Reduced Instruction Set Computer (RISC), or a microprocessor, or any combination thereof.

[0101] It should be understood that the apparatus and methods disclosed in the above embodiments can also be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the flowcharts and block diagrams in the accompanying drawings show the architecture, functionality, and operation of possible implementations of apparatus, methods, and computer program products according to various embodiments of this application. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of code containing one or more executable instructions for implementing a specified logical function. It should also be noted that in some alternative implementations, the functions marked in the blocks may occur in a different order than those marked in the drawings. For example, two consecutive blocks may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved. It should also be noted that each block in a block diagram and/or flowchart, and combinations of blocks in block diagrams and/or flowcharts, can be implemented using a dedicated hardware-based system that performs the specified function or action, or using a combination of dedicated hardware and computer instructions.

[0102] The above descriptions are merely various embodiments of this application, but the scope of protection of this application is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the technical scope disclosed in this application should be included within the scope of protection of this application. Therefore, the scope of protection of this application should be determined by the scope of the claims.

Claims

1. An idiom recognition method, characterized in that the method comprises: obtaining a text sequence of speech to be recognized; extracting a target text segment from the text sequence, wherein the target text segment includes 4 characters; obtaining the pinyin of each target character in the target text segment; obtaining, based on the pinyin of each target character, the approximate pinyin of that target character from an approximate pinyin library; combining the approximate pinyin of the multiple target characters to obtain multiple pronunciation combinations; and matching the multiple pronunciation combinations with the pronunciation combinations of multiple idioms to obtain a successfully matched target idiom.

2. The idiom recognition method of claim 1, wherein the method further comprises: obtaining multiple sample speeches; inputting the multiple sample speeches into a speech recognition model to obtain a text sequence of each sample speech; counting, based on the text sequences of the multiple sample speeches, near-homophones whose pronunciation is close to that of each sample character, wherein a sample character is a character that appears in the actual text sequences of the multiple sample speeches; and generating the approximate pinyin library based on the near-homophones of each sample character.

3. The idiom recognition method of claim 1, wherein obtaining the text sequence of the speech to be recognized comprises: receiving the speech to be recognized; and inputting the speech to be recognized into a speech recognition model to obtain the text sequence of the speech to be recognized.

4. The idiom recognition method of claim 1, wherein matching the multiple pronunciation combinations with the pronunciation combinations of multiple idioms to obtain the successfully matched target idiom comprises: generating, based on the multiple pronunciation combinations, a finite automaton file to be matched; and matching the finite automaton file to be matched against a finite automaton file of the idioms to obtain the successfully matched target idiom.

5. The idiom recognition method of claim 1, wherein extracting the target text segment from the text sequence comprises: inputting the text sequence into an idiom prediction model to obtain the probability that an idiom exists in the text sequence; and, if the probability is greater than a probability threshold, extracting the target text segment from the text sequence.

6. The idiom recognition method of claim 5, wherein extracting the target text segment from the text sequence comprises: if the previous text segment cannot form an idiom, moving the sliding window by a preset step size from its current position in the text sequence and taking the extracted text segment as the target text segment; and, if the previous text segment can form an idiom, moving the sliding window by a step of 4 characters from its current position in the text sequence and taking the extracted text segment as the target text segment.

7. An idiom recognition apparatus, characterized in that the apparatus comprises: a text sequence module, used to obtain a text sequence of speech to be recognized; a text segment module, used to extract a target text segment from the text sequence, wherein the target text segment includes 4 characters; and an idiom recognition module, used to obtain the pinyin of each target character in the target text segment, obtain the approximate pinyin of each target character from an approximate pinyin library according to the pinyin of that target character, and combine the approximate pinyin of the multiple target characters to obtain multiple pronunciation combinations; wherein the idiom recognition module is also used to match the multiple pronunciation combinations with the pronunciation combinations of multiple idioms to obtain a successfully matched target idiom.

8. A storage medium, characterized in that the storage medium stores a computer program which, when executed by a processor, implements the idiom recognition method according to any one of claims 1-6.

9. An electronic device, characterized in that the electronic device comprises a processor and a memory, the memory storing a computer program which, when executed by the processor, implements the idiom recognition method according to any one of claims 1-6.
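
For illustration only, the steps recited in claims 1, 2, 4, and 6 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation described in this application. All function names, the `END` marker, and the toy dictionaries are hypothetical. The sample pairs are assumed to be already aligned syllable by syllable. A simple trie over pinyin syllables stands in for the "finite automaton file" of claim 4, whose actual format is not specified here.

```python
from collections import Counter

END = "<end>"  # hypothetical key marking an accepting state in the trie


def build_approx_library(pairs, min_count=2):
    """Claim 2 sketch: count pinyin confusions between actual and recognized text.

    pairs: (actual_syllables, recognized_syllables) lists of equal length.
    Returns recognized syllable -> sorted candidate syllables (itself included).
    """
    counts = Counter()
    for actual, recognized in pairs:
        if len(actual) != len(recognized):
            continue  # a real system would align the two sequences first
        counts.update((r, a) for a, r in zip(actual, recognized) if a != r)
    library = {}
    for (recognized, actual), n in counts.items():
        if n >= min_count:
            library.setdefault(recognized, {recognized}).add(actual)
    return {k: sorted(v) for k, v in library.items()}


def build_idiom_automaton(idioms):
    """Claim 4 sketch: a trie over syllables acts as a deterministic automaton."""
    root = {}
    for text, syllables in idioms.items():
        node = root
        for syllable in syllables:
            node = node.setdefault(syllable, {})
        node[END] = text
    return root


def match_fragment(fragment, char_pinyin, approx, automaton):
    """Claim 1 sketch: expand each character to approximate pinyin, then match."""
    states = [automaton]
    for ch in fragment:
        pinyin = char_pinyin.get(ch)
        if pinyin is None:
            return None  # character missing from the toy dictionary
        alternatives = approx.get(pinyin, [pinyin])
        # Follow every alternative syllable that the idiom automaton accepts.
        states = [s[alt] for s in states for alt in alternatives if alt in s]
        if not states:
            return None
    for state in states:
        if END in state:
            return state[END]
    return None


def recognize_idioms(text, char_pinyin, approx, automaton, step=1, window=4):
    """Claim 6 sketch: slide a 4-character window, jumping 4 after a match."""
    found, pos = [], 0
    while pos + window <= len(text):
        idiom = match_fragment(text[pos:pos + window], char_pinyin, approx, automaton)
        if idiom is None:
            pos += step
        else:
            found.append((pos, idiom))
            pos += window
    return found


# Toy data, invented for this example only.
CHAR_PINYIN = {"他": "ta", "一": "yi", "星": "xing", "心": "xin", "意": "yi"}
SAMPLES = [(["yi", "xin"], ["yi", "xing"]), (["xin", "li"], ["xing", "li"])]
APPROX = build_approx_library(SAMPLES)
AUTOMATON = build_idiom_automaton({"一心一意": ("yi", "xin", "yi", "yi")})
RESULT = recognize_idioms("他一星一意", CHAR_PINYIN, APPROX, AUTOMATON)
```

In this toy run, the recognized text contains "星" (xing) where the speaker said "心" (xin). Because the sample statistics list "xin" as a near-homophone of "xing", the window starting at position 1 matches the idiom "一心一意". Walking the per-character alternatives through the trie checks every pronunciation combination without enumerating the full Cartesian product, which is one plausible reason for preferring an automaton over a flat lookup table.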