Multilingual speech transcription and language switching method

By analyzing the language information collected by the microphones and using similarity evaluation values to control the speech transcription process, the method removes the cumbersome language switching of existing technologies and achieves fast, accurate multilingual translation.

CN116844549B · Active · Publication Date: 2026-06-30 · GUANGZHOU BAOLUN ELECTRONICS CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: GUANGZHOU BAOLUN ELECTRONICS CO LTD
Filing Date: 2023-07-10
Publication Date: 2026-06-30

AI Technical Summary

Technical Problem

In existing speech-to-text systems, the language type must be re-entered whenever the spoken language changes, which makes the process cumbersome.

Method used

The central control unit analyzes the language information collected by the microphones, estimates and ranks the probabilities of the candidate language types, performs text conversion with the corresponding language engine, and refines the translation process using similarity evaluation values and follow-up processing such as synonym replacement and noise reduction.

Benefits of technology

Improves translation accuracy, shortens the judgment process, quickly resolves cases that do not meet the preset standard, avoids recognition errors and noise interference, and protects the equipment.

✦ Generated by Eureka AI based on patent content.

Abstract figure: CN116844549B_ABST

Abstract

The present application relates to the technical field of voice communication, and in particular to a multilingual speech transcription and language switching method. The method comprises the following steps: S1, identifying the language type; S2, selecting the corresponding language engine for translation; S3, calculating a similarity evaluation value to determine whether the speech transcription meets a preset standard; S4, if the standard is met, continuing the process, and if it is not met, performing synonym replacement and re-translating, or adjusting the translation mode and re-translating; and S5, performing a secondary determination on the re-translated result. Compared with the prior art, the beneficial effect of the present application is that, by calculating the similarity evaluation value and comparing it with the preset standard, the reason the standard is not met can be analyzed quickly, so the corresponding processing mode can be determined and the speech transcription process adjusted accordingly, thereby improving translation accuracy.

Description

Technical Field

[0001] This invention relates to the field of voice communication technology, and in particular to a multilingual speech transcription and language switching method.

Background Technology

[0002] Chinese Patent Application No. 202110371093.9 discloses a speech-to-text system and method based on a multilingual model, comprising a platform, a client connected to the platform, and a storage module, a voice service module, and a display module connected to the client. The platform receives information sent by the client and the voice service module and sends information back to them; the client is used to input the user's personal information and send it to the platform, to pass information from the platform to the user, and to display information through the display module; the storage module stores voice data; and the voice service module transcribes and translates the user's voice data to generate transcribed text and translated text. This system has the following problem: it can only recognize speech in the language type that was input, so the language type must be re-entered whenever the speaker changes to another language, which makes the process cumbersome.

Summary of the Invention

[0003] Therefore, the present invention provides a multilingual speech transcription and language switching method to overcome the cumbersome process of the prior art.

[0004] To achieve the above objective, the present invention provides a multilingual speech transcription and language switching method comprising:

[0005] Step S1: For the language information collected by each of several microphones installed in the venue, the central control unit estimates the probability of each language type, ranks the language types by probability, and takes the most probable one as that microphone's preferred language type.

[0006] Step S2: The central control unit selects the corresponding language engine to convert the speech in the preferred language type into the first language text for each microphone.

[0007] Step S3: The central control unit selects a single microphone as the judgment target and randomly selects one of the recognized language types, namely the language type recognized by a randomly selected other microphone. It converts the first language text of the judgment target into that language type to obtain the second language text, and then converts the second language text back into the language of the judgment target to obtain a new, third language text. The central control unit determines whether the speech transcription meets the preset standard based on the similarity evaluation value between the third language text and the first language text.

[0008] Step S4: If the central control unit determines that the speech transcription meets the preset standard, speech transcription continues. Otherwise, the central control unit either replaces the words in the first language text that were not preserved through the conversion with synonyms and re-translates, or adjusts the translation method and re-translates.

[0009] Step S5: The central control unit calculates a new similarity evaluation value to make a secondary judgment on whether the speech transcription meets the preset standard. Once the standard is met, the output unit provides the transcribed text to each microphone's user in the language corresponding to that microphone.
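Step S1 above reduces to ranking each microphone's candidate language types by estimated probability. The sketch below is a minimal illustration that assumes the probabilities are already available; the microphone names, language codes and values are hypothetical. The full ranking is kept rather than only the top entry, because [0022] later falls back to the second most probable language type.

```python
from typing import Dict, List, Tuple


def rank_language_types(probabilities: Dict[str, float]) -> List[Tuple[str, float]]:
    """Sort candidate language types by probability, highest first.

    Ties are broken alphabetically so that the ranking is deterministic.
    """
    return sorted(probabilities.items(), key=lambda item: (-item[1], item[0]))


def preferred_language_type(probabilities: Dict[str, float]) -> str:
    """Return the most probable language type for one microphone."""
    if not probabilities:
        raise ValueError("no candidate language types")
    return rank_language_types(probabilities)[0][0]


# Hypothetical probabilities per microphone (illustrative values only).
mic_probabilities = {
    "mic-1": {"zh": 0.82, "en": 0.12, "yue": 0.06},
    "mic-2": {"zh": 0.10, "en": 0.85, "fr": 0.05},
}
preferred = {mic: preferred_language_type(p) for mic, p in mic_probabilities.items()}
print(preferred)  # {'mic-1': 'zh', 'mic-2': 'en'}
```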

[0010] Further, in step S3, the central control unit divides the recognized text into blocks of single characters and records each pair of adjacent blocks that forms a meaningful word as a feature word group. It records the proportion of feature word groups in the first language text as a, the proportion of feature word groups in the third language text as A, the number of feature word groups in the first and third language texts that are synonymous with each other as C, and the number of the remaining, non-synonymous feature word groups among all feature word groups in the two texts as D, and sets a similarity evaluation value R based on these quantities.
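The feature-word-group bookkeeping in [0010] might look like the sketch below. It assumes a hypothetical lexicon decides which adjacent character pairs are meaningful and a hypothetical synonym table decides synonymy. It counts C as the third-text groups that equal, or are synonymous with, some first-text group, and D as the unmatched groups in both texts; this is one reading of the definitions above. The proportions a and A, and the formula that combines the quantities into R, are left out because they are not specified here.

```python
from typing import Dict, List, Set, Tuple


def feature_word_groups(text: str, lexicon: Set[str]) -> List[str]:
    """Split text into single-character blocks; keep meaningful adjacent pairs."""
    blocks = [ch for ch in text if not ch.isspace()]
    pairs = (blocks[i] + blocks[i + 1] for i in range(len(blocks) - 1))
    return [pair for pair in pairs if pair in lexicon]


def related(x: str, y: str, synonyms: Dict[str, Set[str]]) -> bool:
    """True if x and y are identical or listed as synonyms (in either direction)."""
    return x == y or y in synonyms.get(x, set()) or x in synonyms.get(y, set())


def count_c_and_d(
    first: List[str], third: List[str], synonyms: Dict[str, Set[str]]
) -> Tuple[int, int]:
    """Return (C, D) under the assumed interpretation described above."""
    third_matched = [w for w in third if any(related(w, f, synonyms) for f in first)]
    first_unmatched = [f for f in first if not any(related(f, w, synonyms) for w in third)]
    c = len(third_matched)
    d = (len(third) - c) + len(first_unmatched)
    return c, d


# Hypothetical lexicon and synonym table.
lexicon = {"今天", "今日", "天气", "很好", "不错"}
synonyms = {"今天": {"今日"}, "很好": {"不错"}}

first_groups = feature_word_groups("今天天气很好", lexicon)  # ['今天', '天气', '很好']
third_groups = feature_word_groups("今日天气不错", lexicon)  # ['今日', '天气', '不错']
print(count_c_and_d(first_groups, third_groups, synonyms))  # (3, 0)
```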

[0011] Furthermore, based on the similarity evaluation value R, the central control unit determines whether the speech transcription of the language information meets the preset standard, as follows:

[0012] First determination method: the central control unit determines that the speech transcription of the language information meets the preset standard and proceeds to transcribe the next piece of language information. This method applies when the similarity evaluation value is greater than the first-level preset similarity set in the central control unit;

[0013] Second determination method: the central control unit determines that the speech transcription does not meet the preset standard. It counts the feature word groups that, after conversion, differ from the original feature word groups and are not synonyms of them, and determines the speech transcription optimization method based on that count. This method applies when the similarity evaluation value is less than or equal to the first-level preset similarity and greater than or equal to the second-level preset similarity set in the central control unit, where the second-level preset similarity is less than the first-level preset similarity;

[0014] Third determination method: the central control unit determines that the speech transcription does not meet the preset standard because of noise in the speech extraction process, and determines the noise reduction method based on the difference between the second-level preset similarity and the similarity evaluation value R. This method applies when the similarity evaluation value is less than the second-level preset similarity.

[0015] Furthermore, under the second determination method, the central control unit records the number of feature word groups that differ from the original feature word groups and are not synonyms after conversion as the number of unqualified feature word groups, and determines from this number the specific reason the speech transcription does not meet the preset standard, as follows:

[0016] First cause determination method: the central control unit determines that the speech transcription does not meet the preset standard because a feature word group has multiple interpretations, and re-translates the remaining feature word groups in the language text. This method applies when the number of unqualified feature word groups is less than or equal to the preset number of unqualified feature word groups set in the central control unit;

[0017] Second cause determination method: the central control unit determines that the speech transcription does not meet the preset standard because the translation of the feature words does not meet the preset standard, and determines the feature-word translation method based on the difference between the number of unqualified feature word groups and the preset number. This method applies when the number of unqualified feature word groups is greater than the preset number.

[0018] Furthermore, under the first cause determination method, the central control unit re-translates the remaining feature word groups in the language text: it replaces each such feature word group with synonyms corresponding to its different meanings to form several new sentences, converts these sentences into the language of a randomly selected language type to obtain updated second language texts, converts those back into updated third language texts in the language of the judgment target, calculates the similarity evaluation value R1 between the first language text and each updated third language text, and selects the updated third language text with the highest R1 for the secondary judgment of whether the speech transcription meets the preset standard.

[0019] Furthermore, the central control unit uses the updated similarity evaluation value R1 to determine the secondary judgment of whether the speech transcription meets the preset standard, as follows:

[0020] First secondary determination method: the central control unit determines that the speech transcription meets the preset standard and, through the output unit, outputs the text converted into the language corresponding to each microphone to that microphone's user. This method applies when the updated similarity evaluation value R1 is greater than the first-level preset similarity;

[0021] Second secondary determination method: the central control unit determines that the speech transcription does not meet the preset standard because of noise in the speech extraction process, and determines the noise reduction method based on the difference between the second-level preset similarity and the similarity evaluation value R. This method applies when the updated similarity evaluation value R1 is less than or equal to the first-level preset similarity.

[0022] Furthermore, under the second cause determination method, the central control unit selects the language type with the second-highest probability, after the preferred language type, and repeats the speech transcription and determination.

[0023] Furthermore, under the third determination method, the central control unit records the difference between the second-level preset similarity and the similarity evaluation value R as the second-level similarity difference, and determines the noise reduction method for the speech extraction process based on it, as follows:

[0024] First noise reduction method: the central control unit uses a first power correction coefficient to correct the filter power to a first power. This method applies when the second-level similarity difference is greater than the first second-level preset similarity difference set in the central control unit;

[0025] Second noise reduction method: the central control unit uses a second power correction coefficient to correct the filter power to a second power. This method applies when the second-level similarity difference is less than or equal to the first second-level preset similarity difference and greater than or equal to the second second-level preset similarity difference set in the central control unit, where the first second-level preset similarity difference is greater than the second;

[0026] Third noise reduction method: the central control unit uses a third power correction coefficient to correct the filter power to a third power. This method applies when the second-level similarity difference is less than the second second-level preset similarity difference.

[0027] Furthermore, once the filter power has been adjusted to the critical power, the central control unit issues a noise warning so that ambient noise can be reduced.

[0028] Furthermore, in step S1, the central control unit uses the distinctive pronunciations of each language as characteristic sound waves and compares them with the collected sound-wave information to determine the language type of the speech collected by each microphone.

[0029] Compared with the prior art, the present invention has the following beneficial effects. By calculating the probability of each language type for a microphone, ranking the language types, and recording the most probable one as the preferred language type, the language type of each microphone can be determined quickly. After the corresponding language engine generates the first language text, the text from each microphone is translated into the second language text in the language of another microphone, and the second language text is translated back into the language of the first language text to generate the third language text. Calculating the similarity evaluation value and comparing it with the preset standard allows the reason for not meeting the standard to be analyzed quickly, so the corresponding processing method can be determined and the speech transcription process adjusted accordingly, thereby improving translation accuracy.

[0030] Furthermore, by setting a similarity evaluation value, the invention quantifies whether the meaning of the third language text matches that of the first language text. A numerical comparison directly indicates whether the meanings are the same, so the accuracy of the translation can be judged quickly, shortening the judgment process and further improving translation accuracy.

[0031] Furthermore, comparing the similarity evaluation value with a preset standard quickly determines whether the speech transcription meets the standard and, if it does not, which processing method to apply, so the problem can be resolved quickly and translation accuracy is further improved.

[0032] Furthermore, by determining from the number of unqualified feature word groups the specific reason the preset standard is not met, the invention can make targeted adjustments quickly, further improving translation accuracy.

[0033] Furthermore, by replacing the remaining feature words with synonyms, re-translating the first language text, and then calculating the updated similarity evaluation value, the invention re-evaluates the re-translated text. This prevents the meaning from changing through improper translation of a word with multiple meanings, further improving translation accuracy.

[0034] Furthermore, the invention performs a secondary judgment on the speech transcription using the updated similarity evaluation value, so it can either quickly generate the corresponding language text or determine that noise is the cause and quickly apply noise reduction, further improving translation accuracy.

[0035] Furthermore, when the preferred language type does not meet the preset standard, the invention selects the next most probable language type for speech recognition and recalculates the similarity evaluation value. This avoids translation errors in which speech recognition mistakes or personal accents cause the translation result to deviate from the original meaning, improving translation accuracy.

[0036] Furthermore, by calculating the second-level similarity difference, the invention quickly determines how to adjust the filter power, so noise is filtered out and does not interfere with speech recognition, further improving translation accuracy.

[0037] Furthermore, the invention issues a warning once the filter power reaches the critical value, preventing excessive adjustment of the filter power and damage to the filter, and thereby extending the filter's service life.

[0038] Furthermore, by analyzing the characteristic sound waves of each language, the invention can identify language types quickly, further improving translation accuracy.

Attached Figure Description

[0039] Figure 1 is a flowchart of the multilingual speech transcription and language switching method of the present invention;

[0040] Figure 2 is a flowchart of the determination method of the present invention;

[0041] Figure 3 is a flowchart of the secondary determination method of the present invention;

[0042] Figure 4 is a flowchart of the noise reduction method determination process of the present invention.

Detailed Implementation

[0043] To make the objectives and advantages of the present invention clearer, the present invention will be further described below with reference to embodiments; it should be understood that the specific embodiments described herein are merely for explaining the present invention and are not intended to limit the present invention.

[0044] Preferred embodiments of the present invention will now be described with reference to the accompanying drawings. Those skilled in the art should understand that these embodiments are merely illustrative of the technical principles of the present invention and are not intended to limit the scope of protection of the present invention.

[0045] It should be noted that in the description of this invention, the terms "upper", "lower", "left", "right", "inner", "outer", etc., which indicate directions or positional relationships, are based on the directions or positional relationships shown in the accompanying drawings. This is only for the convenience of description and is not intended to indicate or imply that the device or element must have a specific orientation, or be constructed and operated in a specific orientation. Therefore, it should not be construed as a limitation of this invention.

[0046] Furthermore, it should be noted that, in the description of this invention, unless otherwise explicitly specified and limited, the terms "installation," "connection," and "linking" should be interpreted broadly. For example, they can refer to a fixed connection, a detachable connection, or an integral connection; they can refer to a mechanical connection or an electrical connection; they can refer to a direct connection or an indirect connection through an intermediate medium; and they can refer to the internal connection of two components. Those skilled in the art can understand the specific meaning of the above terms in this invention according to the specific circumstances.

[0047] As shown in Figure 1, which is a flowchart of the multilingual speech transcription and language switching method of the present invention, the method comprises:

[0048] Step S1: For the language information collected by each of several microphones installed in the venue, the central control unit estimates the probability of each language type, ranks the language types by probability, and takes the most probable one as that microphone's preferred language type.

[0049] Step S2: The central control unit selects the corresponding language engine to convert the speech in the preferred language type into the first language text for each microphone;

[0050] Step S3: The central control unit selects a single microphone as the judgment target and randomly selects one of the recognized language types, namely the language type recognized by a randomly selected other microphone. It converts the first language text of the judgment target into that language type to obtain the second language text, and then converts the second language text back into the language of the judgment target to obtain a new, third language text. The central control unit determines whether the speech transcription meets the preset standard based on the similarity evaluation value between the third language text and the first language text.

[0051] Step S4: If the central control unit determines that the speech transcription meets the preset standard, speech transcription continues. Otherwise, the central control unit either replaces the words in the first language text that were not preserved through the conversion with synonyms and re-translates, or adjusts the translation method and re-translates.

[0052] Step S5: The central control unit calculates a new similarity evaluation value to make a secondary judgment on whether the speech transcription meets the preset standard. Once the standard is met, the output unit provides the transcribed text to each microphone's user in the language corresponding to that microphone.
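The round trip in step S3 can be sketched as follows. The `translate` and `similarity` hooks are hypothetical placeholders: the patent does not name a translation engine, and the formula for R is not reproduced in this text. Restricting the random choice to microphones whose language differs from the judgment target's is an added assumption, since a same-language round trip would be trivial. A toy word-lookup translator makes the example runnable.

```python
import random
from typing import Callable, Dict, Tuple

# Hypothetical hooks (not from the patent):
#   translate(text, source_lang, target_lang) -> translated text
#   similarity(first_text, third_text) -> similarity evaluation value
Translate = Callable[[str, str, str], str]
Similarity = Callable[[str, str], float]


def round_trip_check(
    target_mic: str,
    first_texts: Dict[str, str],
    mic_languages: Dict[str, str],
    translate: Translate,
    similarity: Similarity,
    rng: random.Random,
) -> Tuple[float, str]:
    """Translate the target's first text into another mic's language and back."""
    source_lang = mic_languages[target_mic]
    # Assumption: only microphones with a different language are candidates.
    candidates = sorted(
        m for m, lang in mic_languages.items()
        if m != target_mic and lang != source_lang
    )
    if not candidates:
        raise ValueError("no other microphone with a different language type")
    other_lang = mic_languages[rng.choice(candidates)]

    first = first_texts[target_mic]
    second = translate(first, source_lang, other_lang)  # second language text
    third = translate(second, other_lang, source_lang)  # third language text
    return similarity(first, third), third


# Toy word-lookup "translator" and exact-match similarity, for illustration only.
TOY_TABLES = {
    ("en", "fr"): {"hello": "bonjour", "world": "monde"},
    ("fr", "en"): {"bonjour": "hello", "monde": "world"},
}


def toy_translate(text: str, src: str, dst: str) -> str:
    table = TOY_TABLES[(src, dst)]
    return " ".join(table.get(word, word) for word in text.split())


def exact_match(a: str, b: str) -> float:
    return 1.0 if a == b else 0.0


r, third_text = round_trip_check(
    "mic-1", {"mic-1": "hello world"}, {"mic-1": "en", "mic-2": "fr"},
    toy_translate, exact_match, random.Random(0),
)
print(r, third_text)  # 1.0 hello world
```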

[0053] Specifically, in step S3, the central control unit divides the recognized text into blocks of single characters and records each pair of adjacent blocks that forms a meaningful word as a feature word group. It records the proportion of feature word groups in the first language text as a, the proportion of feature word groups in the third language text as A, the number of feature word groups in the first and third language texts that are synonymous with each other as C, and the number of the remaining, non-synonymous feature word groups among all feature word groups in the two texts as D, and sets a similarity evaluation value R based on these quantities.

[0054] As shown in Figure 2, which is a flowchart of the determination method of the present invention, the central control unit determines whether the speech transcription of the language information meets the preset standard based on the similarity evaluation value R, as follows:

[0055] First determination method: the central control unit determines that the speech transcription of the language information meets the preset standard and proceeds to transcribe the next piece of language information. This method applies when the similarity evaluation value is greater than the first-level preset similarity of 0.9 set in the central control unit;

[0056] Second determination method: the central control unit determines that the speech transcription does not meet the preset standard. It counts the feature word groups that, after conversion, differ from the original feature word groups and are not synonyms of them, and determines the speech transcription optimization method based on that count. This method applies when the similarity evaluation value is less than or equal to the first-level preset similarity and greater than or equal to the second-level preset similarity of 0.7 set in the central control unit;

[0057] Third determination method: the central control unit determines that the speech transcription does not meet the preset standard because of noise in the speech extraction process, and determines the noise reduction method based on the difference between the second-level preset similarity and the similarity evaluation value R. This method applies when the similarity evaluation value is less than the second-level preset similarity.
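With the thresholds given in this embodiment (first-level preset similarity 0.9, second-level preset similarity 0.7), the three-way determination on R can be sketched as below. The enum names are illustrative labels, not terms from the patent. Boundary handling follows the text: an R equal to 0.9 or to 0.7 falls into the second determination method.

```python
from enum import Enum


class Determination(Enum):
    MEETS_STANDARD = 1       # first determination method
    CHECK_FEATURE_WORDS = 2  # second determination method
    NOISE_REDUCTION = 3      # third determination method


FIRST_LEVEL_SIMILARITY = 0.9   # first-level preset similarity ([0055])
SECOND_LEVEL_SIMILARITY = 0.7  # second-level preset similarity ([0056])


def determine(r: float) -> Determination:
    """Classify the similarity evaluation value R into one of three methods."""
    if r > FIRST_LEVEL_SIMILARITY:
        return Determination.MEETS_STANDARD
    if r >= SECOND_LEVEL_SIMILARITY:
        return Determination.CHECK_FEATURE_WORDS
    return Determination.NOISE_REDUCTION


for value in (0.95, 0.9, 0.7, 0.69):
    print(value, determine(value).name)
```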

[0058] Specifically, under the second determination method, the central control unit records the number of feature word groups that differ from the original feature word groups and are not synonyms after conversion as the number of unqualified feature word groups, and determines from this number the specific reason the speech transcription does not meet the preset standard, as follows:

[0059] First cause determination method: the central control unit determines that the speech transcription does not meet the preset standard because a feature word group has multiple interpretations, and re-translates the remaining feature word groups in the language text. This method applies when the number of unqualified feature word groups is less than or equal to the preset number of unqualified feature word groups, 3, set in the central control unit;

[0060] Second cause determination method: the central control unit determines that the speech transcription does not meet the preset standard because the translation of the feature words does not meet the preset standard, and determines the feature-word translation method based on the difference between the number of unqualified feature word groups and the preset number. This method applies when the number of unqualified feature word groups is greater than the preset number.
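A sketch of the cause determination with the preset number of unqualified feature word groups set to 3, as in this embodiment. Treating a third-text group as unqualified when it neither equals nor is a synonym of any first-text group is an assumed reading of [0058], and the synonym table is hypothetical. The function also returns the excess over the preset number, since the second cause determination method bases the translation method on that difference.

```python
from typing import Dict, List, Set, Tuple

PRESET_UNQUALIFIED = 3  # preset number of unqualified feature word groups ([0059])


def count_unqualified(
    first: List[str], third: List[str], synonyms: Dict[str, Set[str]]
) -> int:
    """Count third-text groups that match no first-text group, even as a synonym."""
    def related(x: str, y: str) -> bool:
        return x == y or y in synonyms.get(x, set()) or x in synonyms.get(y, set())

    return sum(1 for w in third if not any(related(w, f) for f in first))


def determine_cause(unqualified: int) -> Tuple[str, int]:
    """Return the cause label and the excess over the preset number."""
    if unqualified <= PRESET_UNQUALIFIED:
        # First cause: a feature word group has multiple interpretations.
        return "multiple_meanings", 0
    # Second cause: the feature-word translation does not meet the standard.
    return "translation_failure", unqualified - PRESET_UNQUALIFIED


n = count_unqualified(["今天", "天气"], ["今日", "下雨"], {"今天": {"今日"}})
print(n, determine_cause(n))  # 1 ('multiple_meanings', 0)
```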

[0061] Specifically, under the first cause determination method, the central control unit re-translates the remaining feature words in the language text: it replaces each feature word with synonyms corresponding to its different meanings to form several new sentences, converts these sentences into the language of a randomly selected language type to obtain updated second language texts, converts those back into updated third language texts in the language of the judgment target, calculates the similarity evaluation value between the first language text and each updated third language text, takes the highest value as R1, and makes the secondary judgment of whether the speech transcription meets the preset standard based on the corresponding updated third language text.
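The synonym-substitution retry in [0061] can be sketched by enumerating every combination of alternative readings, round-tripping each candidate sentence, and keeping the best score as R1. The alternatives table and the `round_trip` and `similarity` hooks are hypothetical, and keeping the original word as one of the options is an added assumption. The number of candidates grows multiplicatively with the number of ambiguous words.

```python
import itertools
from typing import Callable, Dict, List, Tuple


def candidate_sentences(
    tokens: List[str], alternatives: Dict[str, List[str]], sep: str = ""
) -> List[str]:
    """Build every sentence obtained by swapping ambiguous tokens for alternatives."""
    options = [[tok] + alternatives.get(tok, []) for tok in tokens]
    return [sep.join(combo) for combo in itertools.product(*options)]


def best_r1(
    first_text: str,
    candidates: List[str],
    round_trip: Callable[[str], str],
    similarity: Callable[[str, str], float],
) -> Tuple[float, str]:
    """Round-trip each candidate and return (R1, updated third language text)."""
    scored = []
    for sentence in candidates:
        third = round_trip(sentence)  # to another language and back
        scored.append((similarity(first_text, third), third))
    return max(scored, key=lambda pair: pair[0])


def word_jaccard(a: str, b: str) -> float:
    """Toy similarity: Jaccard index of the word sets."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0


tokens = ["I", "saw", "a", "bat"]
cands = candidate_sentences(tokens, {"bat": ["club", "racket"]}, sep=" ")
print(cands)  # ['I saw a bat', 'I saw a club', 'I saw a racket']
# Identity round trip, for illustration only.
r1, third = best_r1("I saw a bat", cands, round_trip=lambda s: s, similarity=word_jaccard)
print(r1, third)  # 1.0 I saw a bat
```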

[0062] As shown in Figure 3, which is a flowchart of the secondary judgment method of the present invention, the central control unit uses the updated similarity evaluation value R1 to determine the secondary judgment of whether the speech transcription meets the preset standard, as follows:

[0063] First secondary determination method: the central control unit determines that the speech transcription meets the preset standard and, through the output unit, outputs the text converted into the language corresponding to each microphone to that microphone's user. This method applies when the updated similarity evaluation value R1 is greater than the first-level preset similarity;

[0064] Second secondary determination method: the central control unit determines that the speech transcription does not meet the preset standard because of noise in the speech extraction process, and determines the noise reduction method based on the difference between the second-level preset similarity and the similarity evaluation value R. This method applies when the updated similarity evaluation value R1 is less than or equal to the first-level preset similarity.
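The secondary judgment reduces to one comparison of R1 against the first-level preset similarity. The sketch below follows [0064] literally by computing the noise-reduction difference from the original R rather than from R1; note that for any R in the second-determination band (0.7 ≤ R ≤ 0.9) this difference is zero or negative. The returned labels are illustrative.

```python
from typing import Optional, Tuple

FIRST_LEVEL_SIMILARITY = 0.9   # [0055]
SECOND_LEVEL_SIMILARITY = 0.7  # [0056]


def secondary_determination(r1: float, r: float) -> Tuple[str, Optional[float]]:
    """Return ("output", None) or ("noise_reduction", second-level similarity difference)."""
    if r1 > FIRST_LEVEL_SIMILARITY:
        # First secondary determination: output text to each microphone's user.
        return "output", None
    # Second secondary determination: the difference uses the original R, per [0064].
    # Rounding suppresses floating-point noise such as -0.10000000000000009.
    return "noise_reduction", round(SECOND_LEVEL_SIMILARITY - r, 9)


print(secondary_determination(0.95, 0.80))  # ('output', None)
print(secondary_determination(0.85, 0.80))  # ('noise_reduction', -0.1)
```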

[0065] Specifically, under the second cause determination method, the central control unit selects the language type with the second-highest probability, after the preferred language type, and repeats the speech transcription and determination.
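Falling back to the second most probable language type can be written as a short loop over the S1 ranking. The `evaluate` hook, standing for "re-transcribe in this language and re-run the determination", is hypothetical; `max_attempts=2` mirrors the text, which only mentions the second-highest candidate.

```python
from typing import Callable, List, Optional, Tuple


def transcribe_with_fallback(
    ranking: List[Tuple[str, float]],
    evaluate: Callable[[str], bool],
    max_attempts: int = 2,
) -> Optional[str]:
    """Try language types in descending probability; return the first that passes."""
    for language, _probability in ranking[:max_attempts]:
        if evaluate(language):
            return language
    return None  # neither the preferred nor the fallback language passed


ranking = [("zh", 0.80), ("yue", 0.15), ("en", 0.05)]
print(transcribe_with_fallback(ranking, evaluate=lambda lang: lang == "yue"))  # yue
print(transcribe_with_fallback(ranking, evaluate=lambda lang: lang == "en"))   # None
```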

[0066] As shown in Figure 4, which is a flowchart of the noise reduction method determination process of the present invention, under the third determination method the central control unit records the difference between the second-level preset similarity and the similarity evaluation value R as the second-level similarity difference, and determines the noise reduction method for the speech extraction process based on it, as follows:

[0067] First noise reduction method: the central control unit uses a first power correction coefficient of 1.1 to correct the filter power to a first power. This method applies when the second-level similarity difference is greater than the first second-level preset similarity difference of 0.15 set in the central control unit;

[0068] Second noise reduction method: the central control unit uses a second power correction coefficient of 1.2 to correct the filter power to a second power. This method applies when the second-level similarity difference is less than or equal to the first second-level preset similarity difference and greater than or equal to the second second-level preset similarity difference of 0.10 set in the central control unit;

[0069] The third noise reduction method is that the central control unit uses a third power correction coefficient of 1.3 to correct the filter power to a third power; the third noise reduction method applies when the second-level similarity difference is less than the second preset second-level similarity difference.
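The three-tier selection in [0067] to [0069] can be sketched in Python as shown below. The thresholds (0.15, 0.10) and coefficients (1.1, 1.2, 1.3) come from those paragraphs. The text does not say how a coefficient "corrects" the filter power, so this sketch assumes the power is multiplied by the coefficient. The function names are hypothetical.

```python
# Thresholds and coefficients as stated in [0067]-[0069].
FIRST_PRESET_DIFF = 0.15
SECOND_PRESET_DIFF = 0.10
CORRECTION_COEFFICIENTS = {1: 1.1, 2: 1.2, 3: 1.3}


def select_noise_reduction(second_level: float, r: float) -> int:
    """Return 1, 2 or 3 for the noise reduction method to apply."""
    diff = second_level - r  # second-level similarity difference
    if diff > FIRST_PRESET_DIFF:
        return 1
    if diff >= SECOND_PRESET_DIFF:
        return 2
    return 3


def corrected_filter_power(filter_power: float, second_level: float, r: float) -> float:
    """Assumption: correcting the power means multiplying it by the coefficient."""
    method = select_noise_reduction(second_level, r)
    return filter_power * CORRECTION_COEFFICIENTS[method]
```

The mapping from difference to coefficient follows the paragraphs literally; the sketch does not reinterpret which tier receives the larger coefficient.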

[0070] Specifically, after adjusting the filter power to the critical power, the central control unit issues a noise warning prompting a reduction in ambient noise.
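The critical-power behaviour can be sketched as follows. The text names a critical power but gives no value for it, so `critical_power` is a hypothetical parameter. The sketch assumes the corrected power is held at the critical power once it would reach or exceed it, and that a warning flag is raised at that point.

```python
def apply_power_correction(
    filter_power: float, coefficient: float, critical_power: float
) -> tuple[float, bool]:
    """Correct the filter power without exceeding the critical power.

    Returns the new filter power and whether a noise warning should be issued.
    """
    corrected = filter_power * coefficient
    if corrected >= critical_power:
        # The filter is at its limit: hold it at the critical power and warn
        return critical_power, True
    return corrected, False
```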

[0071] Specifically, in step S1, the central control unit uses the distinctive pronunciations of each language as characteristic sound waves and compares them with the collected sound-wave information to determine the language type of the speech collected by the microphone.
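The text does not specify how the comparison with characteristic sound waves is scored. The sketch below substitutes cosine similarity between fixed-length feature vectors and normalizes the scores into the probabilities that step S1 sorts. The feature vectors, the per-language templates and the function names are all assumptions, and feature extraction from raw audio is out of scope.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_languages(
    features: list[float], templates: dict[str, list[float]]
) -> list[tuple[str, float]]:
    """Score each language's characteristic template against the collected
    features, normalize the scores into probabilities, and sort them."""
    scores = {lang: max(0.0, cosine(features, tpl)) for lang, tpl in templates.items()}
    total = sum(scores.values())
    if total == 0:
        raise ValueError("collected audio matches no characteristic template")
    probs = {lang: s / total for lang, s in scores.items()}
    return sorted(probs.items(), key=lambda item: item[1], reverse=True)
```

The first entry of the returned list plays the role of the preferred language type.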

[0072] The technical solution of the present invention has been described above with reference to the preferred embodiments shown in the accompanying drawings. However, it will be readily understood by those skilled in the art that the scope of protection of the present invention is obviously not limited to these specific embodiments. Without departing from the principles of the present invention, those skilled in the art can make equivalent changes or substitutions to the relevant technical features, and the technical solutions after these changes or substitutions will all fall within the scope of protection of the present invention.

[0073] The above description is merely a preferred embodiment of the present invention and is not intended to limit the invention. Various modifications and variations can be made to the present invention by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the scope of protection of the present invention.

Claims

1. A voice transcription multi-language transcription and switching method, characterized by comprising: Step S1: the central control unit analyzes the probabilities of the language types present in the language information collected by several microphones installed in the venue, sorts the language types by probability, and takes the language type with the highest probability as the preferred language type for each microphone; Step S2: the central control unit selects the language engine corresponding to each microphone's preferred language type and converts that microphone's speech into first language text; Step S3: the central control unit selects a single microphone as the judgment target and randomly selects a language type recognized for another microphone; it converts the first language text of the judgment target into that randomly selected language type to obtain the corresponding second language text, and then converts the second language text back into the language of the judgment target to obtain the third language text; the central control unit determines whether the speech transcription meets the preset standard based on the similarity evaluation value between the third language text and the first language text; Step S4: if the central control unit determines that the speech transcription meets the preset standard, the speech transcription process continues; if it determines that the speech transcription does not meet the preset standard, the central control unit replaces the unconverted words in the first language text with synonyms and re-translates them, or adjusts the translation method and re-translates; Step S5: the central control unit calculates the updated similarity evaluation value and makes a second judgment on whether the speech transcription meets the preset standard.
Once the preset standard is met, the speech text converted into the language of each microphone is provided through the output unit to the corresponding microphone user for reading. In step S3, the central control unit divides the recognized text information into one or more blocks, each composed of a single character, and records any two adjacent blocks that together carry a meaning as a feature word group. The central control unit records the proportion of feature word groups in the first language text as 'a', the proportion of feature word groups in the third language text as 'A', the number of synonymous feature word groups shared by the first and third language texts as 'C', and the number of remaining, non-synonymous feature word groups among all feature word groups in the two texts as 'D', and then sets the similarity evaluation value R.
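The round-trip check in step S3 can be sketched in Python as shown below, under several assumptions:

- `translate` stands in for the language engines and is a hypothetical callable.
- A "meaningful" pair of adjacent characters is approximated by membership in a hypothetical lexicon.
- C and D are counted over sets of feature word groups using a caller-supplied synonym map.

The formula that combines a, A, C and D into R is not reproduced in this text. The returned ratio C / (C + D) is therefore only a placeholder and is not the patent's R.

```python
from typing import Callable

Translate = Callable[[str, str, str], str]  # (text, source_lang, target_lang) -> text


def feature_word_groups(text: str, lexicon: set[str]) -> set[str]:
    """Pair adjacent single-character blocks; keep pairs that carry a meaning,
    approximated here by membership in a (hypothetical) lexicon."""
    chars = [c for c in text if not c.isspace()]
    return {a + b for a, b in zip(chars, chars[1:]) if a + b in lexicon}


def count_c_d(g1: set[str], g3: set[str], synonyms: dict[str, set[str]]) -> tuple[int, int]:
    """C: groups of the first text found in the third text, identically or as a
    listed synonym. D: every remaining group in either text."""
    matched1 = {g for g in g1 if g in g3 or synonyms.get(g, set()) & g3}
    matched3 = {h for h in g3 if h in g1 or any(h in synonyms.get(g, set()) for g in g1)}
    return len(matched1), len(g1 - matched1) + len(g3 - matched3)


def round_trip_check(
    first_text: str, source: str, pivot: str, translate: Translate,
    lexicon: set[str], synonyms: dict[str, set[str]],
) -> tuple[str, int, int, float]:
    second_text = translate(first_text, source, pivot)  # first -> second language text
    third_text = translate(second_text, pivot, source)  # second -> third language text
    c, d = count_c_d(feature_word_groups(first_text, lexicon),
                     feature_word_groups(third_text, lexicon), synonyms)
    # Placeholder only: the patent's formula for R is not reproduced in this text.
    r_placeholder = c / (c + d) if c + d else 0.0
    return third_text, c, d, r_placeholder
```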

2. The speech-to-text multilingual transcription and switching method according to claim 1, characterized in that the central control unit determines, based on the similarity evaluation value R, which determination method applies when judging whether the speech transcription of the language information meets the preset standard, wherein: the first determination method is that the central control unit determines that the speech transcription of the language information meets the preset standard and proceeds to transcribe the next piece of language information; the first determination method applies when the similarity evaluation value R is greater than the first-level preset similarity set in the central control unit; the second determination method is that the central control unit determines that the speech transcription does not meet the preset standard, counts the feature word groups that differ from the original feature word groups after conversion and are not synonyms of them, and determines the optimization method for the speech transcription based on that count; the second determination method applies when the similarity evaluation value R is less than or equal to the first-level preset similarity and greater than or equal to the second-level preset similarity set in the central control unit, the second-level preset similarity being less than the first-level preset similarity; the third determination method is that the central control unit determines that the speech transcription does not meet the preset standard because of noise in the speech extraction process, and determines the noise reduction method for the speech extraction process based on the difference between the second-level preset similarity and the similarity evaluation value R.
The third determination method applies when the similarity evaluation value R is less than the second-level preset similarity.
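The three determination methods of claim 2 partition the range of R by the two preset similarities. The sketch below shows that partition. The claim gives no numeric thresholds, so both are parameters. The enum labels and function name are hypothetical.

```python
from enum import Enum


class Determination(Enum):
    MEETS_STANDARD = 1        # first method: continue with the next piece of speech
    OPTIMIZE_TRANSLATION = 2  # second method: inspect unqualified feature word groups
    REDUCE_NOISE = 3          # third method: choose a noise reduction method


def determine(r: float, first_level: float, second_level: float) -> Determination:
    """Map the similarity evaluation value R onto the three determination methods."""
    if not second_level < first_level:
        raise ValueError("second-level preset similarity must be below the first-level one")
    if r > first_level:
        return Determination.MEETS_STANDARD
    if r >= second_level:
        return Determination.OPTIMIZE_TRANSLATION
    return Determination.REDUCE_NOISE
```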

3. The speech-to-text multilingual transcription and switching method according to claim 2, characterized in that, under the second determination method, the central control unit records the number of feature word groups that differ from the original feature word groups after conversion and are not synonyms of them as the number of unqualified feature word groups, and determines from this number the specific reason why the speech transcription does not meet the preset standard, wherein: the first cause determination method is that the central control unit determines that the speech transcription does not meet the preset standard because feature word groups have multiple interpretations, and re-translates the remaining feature word groups in the language text; the first cause determination method applies when the number of unqualified feature word groups is less than or equal to the preset number of unqualified feature word groups set in the central control unit; the second cause determination method is that the central control unit determines that the speech transcription does not meet the preset standard because the translation of the feature word groups falls short of the preset standard, and determines the translation method for the feature word groups based on the difference between the number of unqualified feature word groups and the preset number; the second cause determination method applies when the number of unqualified feature word groups is greater than the preset number of unqualified feature word groups.
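The sketch below shows how the count of unqualified feature word groups could drive the cause determination. It assumes an original group is unqualified when the converted text contains neither that group nor a synonym of it from a caller-supplied map. The preset count has no value in the claim, so it is a parameter. All names are hypothetical.

```python
from enum import Enum


class Cause(Enum):
    MULTIPLE_INTERPRETATIONS = 1  # first cause: re-translate with synonym candidates
    POOR_TRANSLATION = 2          # second cause: change the translation method


def count_unqualified(g1: set[str], g3: set[str], synonyms: dict[str, set[str]]) -> int:
    """Original groups with neither an identical nor a synonymous counterpart
    in the converted text."""
    return sum(1 for g in g1 if g not in g3 and not (synonyms.get(g, set()) & g3))


def determine_cause(unqualified: int, preset_unqualified: int) -> tuple[Cause, int]:
    """Return the cause and, for the second cause, the excess over the preset count."""
    if unqualified <= preset_unqualified:
        return Cause.MULTIPLE_INTERPRETATIONS, 0
    return Cause.POOR_TRANSLATION, unqualified - preset_unqualified
```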

4. The speech-to-text multilingual transcription and switching method according to claim 3, characterized in that, under the first cause determination method, the central control unit re-translates the remaining feature word groups in the language text as follows. It replaces each ambiguous feature word group with synonyms covering its multiple meanings to form several new sentences. It converts each new sentence into the randomly selected language type to obtain an updated second language text, then converts each updated second language text back into the language of the judgment target to obtain an updated third language text. It calculates the similarity evaluation value between the first language text and each updated third language text, takes the highest of these values as R1, and uses R1 to make the second judgment on whether the speech transcription meets the preset standard.
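The candidate search in claim 4 can be sketched with `itertools.product`. The sketch assumes the first language text is already split into feature word groups. It also assumes the ambiguous groups and their synonym candidates are supplied by the caller. The round trip through the randomly selected language and the similarity measure are hypothetical callables, because the claim does not define them concretely.

```python
from itertools import product
from typing import Callable


def best_candidate_r1(
    tokens: list[str],
    alternatives: dict[str, list[str]],
    round_trip: Callable[[str], str],
    similarity: Callable[[str, str], float],
) -> tuple[float, str]:
    """Enumerate sentences in which each ambiguous group is replaced by one of
    its candidate synonyms, round-trip each, and keep the highest similarity."""
    first_text = "".join(tokens)
    # Ambiguous groups contribute their candidates; other groups stay unchanged.
    options = [alternatives.get(t) or [t] for t in tokens]
    best_r1, best_sentence = float("-inf"), first_text
    for combo in product(*options):
        candidate = "".join(combo)
        third_text = round_trip(candidate)  # to the pivot language and back
        score = similarity(first_text, third_text)
        if score > best_r1:
            best_r1, best_sentence = score, candidate
    return best_r1, best_sentence
```

The number of candidate sentences is the product of the candidate counts, so the enumeration grows quickly when many groups are ambiguous.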

5. The speech-to-text multilingual transcription and switching method according to claim 4, characterized in that the central control unit uses the updated similarity evaluation value R1 to select the secondary determination method for judging whether the speech transcription meets the preset standard, wherein: the first secondary determination method is that the central control unit determines that the speech transcription meets the preset standard and outputs, through the output unit, the language text converted into the language of each microphone to the corresponding microphone user for reading; the first secondary determination method applies when the updated similarity evaluation value R1 is greater than the first-level preset similarity; the second secondary determination method is that the central control unit determines that the speech transcription does not meet the preset standard because of noise in the speech extraction process, and determines the noise reduction method for the speech extraction process from the difference between the second-level preset similarity and the similarity evaluation value R; the second secondary determination method applies when the updated similarity evaluation value R1 is less than or equal to the first-level preset similarity.

6. The speech-to-text multilingual transcription and switching method according to claim 3, characterized in that, under the second cause determination method, the central control unit selects the language type with the second-highest probability, that is, the most probable language type other than the preferred language type, and repeats the speech transcription and the determination with it.

7. The speech-to-text multilingual transcription and switching method according to claim 2, characterized in that, under the third determination method, the central control unit records the difference between the second-level preset similarity and the similarity evaluation value R as the second-level similarity difference, and determines the noise reduction method for the speech extraction process based on this difference, wherein: the first noise reduction method is that the central control unit uses a first power correction coefficient to correct the filter power to a first power; the first noise reduction method applies when the second-level similarity difference is greater than the first preset second-level similarity difference set in the central control unit; the second noise reduction method is that the central control unit uses a second power correction coefficient to correct the filter power to a second power; the second noise reduction method applies when the second-level similarity difference is less than or equal to the first preset second-level similarity difference and greater than or equal to the second preset second-level similarity difference set in the central control unit, the first preset second-level similarity difference being greater than the second; the third noise reduction method is that the central control unit uses a third power correction coefficient to correct the filter power to a third power; the third noise reduction method applies when the second-level similarity difference is less than the second preset second-level similarity difference.

8. The speech-to-text multilingual transcription and switching method according to claim 7, characterized in that, after adjusting the filter power to the critical power, the central control unit issues a noise warning prompting a reduction in ambient noise.

9. The speech-to-text multilingual transcription and switching method according to claim 1, characterized in that, in step S1, the central control unit uses the distinctive pronunciations of each language as characteristic sound waves and compares them with the collected sound-wave information to determine the language type of the speech collected by the microphone.