Methods, devices, media, and electronic equipment for dynamically determining the duration of speech segmentation
By dynamically updating the duration of speech segmentation, based on user voice and silence detection, and using historical duration datasets to adjust the duration of segmentation, the problem of inaccurate segmentation in voice interaction is solved, and the efficiency and accuracy of interaction are improved.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- CHINA FAW CO LTD
- Filing Date
- 2023-06-09
- Publication Date
- 2026-06-30
AI Technical Summary
In existing technologies, improper VAD interval settings can lead to abnormal or inefficient voice interaction and an inability to accurately segment voice input.
By acquiring user voice data, current sentence segment duration, and silence duration, the pending duration is dynamically updated and stored in a historical duration dataset. The current sentence segment duration is then adjusted using the historical duration dataset, dynamically based on individual speaking habits and speaking speed.
It enables dynamic adjustment of sentence segmentation duration based on individual differences, avoiding the impact of abnormal historical waiting time duration, and improving the accuracy and efficiency of voice interaction.
Figure CN116884400B_ABST
Description
Technical Field
[0001] This application relates to the field of speech recognition technology, and more specifically, to a method, apparatus, medium, and electronic device for dynamically determining the duration of speech segmentation.
Background Technology
[0002] Sentence segmentation is crucial for correctly interpreting the semantics of audio. For example, in the audio of "Sorry, no song could be played," clear time intervals (i.e., intervals during which VAD=No) are visible between different Chinese characters. If the segmentation threshold for VAD=No is set to 300ms, the audio is divided into two sentences: "Sorry" and "No song could be played." If the threshold is set to 500ms, the audio is identified as a single sentence: "Sorry, no song could be played."
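The effect of the VAD interval in the example above can be illustrated with a minimal sketch. The word timings below are invented for illustration (the source gives no timings), and `split_on_silence` is a hypothetical helper, not a function named in this application. It only assumes that a cut is made wherever the pause between two words reaches the chosen interval.

```python
from typing import List, Tuple

# Hypothetical word timings: (text, start_ms, end_ms). These values are made up
# for illustration and do not come from the application.
Word = Tuple[str, int, int]


def split_on_silence(words: List[Word], vad_interval_ms: int) -> List[str]:
    """Group words into sentences, cutting wherever the pause between two
    consecutive words is at least vad_interval_ms."""
    sentences: List[List[str]] = []
    prev_end = None
    for text, start, end in words:
        if prev_end is None or start - prev_end >= vad_interval_ms:
            sentences.append([])  # start a new sentence
        sentences[-1].append(text)
        prev_end = end
    return [" ".join(s) for s in sentences]


words = [
    ("Sorry,", 0, 400),
    ("no", 800, 950),  # assumed 400 ms pause after "Sorry,"
    ("song", 1000, 1300),
    ("could", 1350, 1600),
    ("be", 1650, 1750),
    ("played", 1800, 2300),
]

two_sentences = split_on_silence(words, 300)  # 400 ms pause >= 300 ms: cut
one_sentence = split_on_silence(words, 500)   # 400 ms pause < 500 ms: no cut
print(two_sentences)
print(one_sentence)
```

With a fixed interval, the same recording is segmented differently depending only on the chosen threshold, which is the sensitivity the rest of this application tries to remove.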
[0003] However, in practical applications, setting the VAD interval too short may split a single voice input into multiple parts, leading to abnormal interactions. For example, if the user pauses for a long time in the middle of "Navigate to Xizhimen" and the VAD interval is short, "Xizhimen" may be cut off as a separate sentence, even though "Xizhimen" alone is not a complete sentence. Conversely, setting the VAD interval too long results in a longer wait after each voice input, reducing the efficiency of voice interaction.
[0004] Therefore, this application provides a method for dynamically determining the duration of speech segmentation to solve the above-mentioned technical problems.
Summary of the Invention
[0005] The purpose of this application is to provide a method, apparatus, medium, and electronic device for dynamically determining the duration of speech segmentation, which can solve at least one of the technical problems mentioned above. The specific solution is as follows:
[0006] According to a specific embodiment of this application, in a first aspect, this application provides a method for dynamically determining the duration of speech segmentation, including:
[0007] Acquire the user's voice, the current sentence segmentation duration, and the current pending duration;
[0008] Perform silence detection on the user's voice and obtain the current silence duration;
[0009] Update the current pending duration based on the current silence duration, the current pending duration, the current sentence segmentation duration, and the preset maximum sentence segmentation duration, and store the current pending duration as a historical pending duration in the historical duration dataset;
[0010] Obtain the current storage quantity from the historical duration dataset;
[0011] When the current storage quantity is less than or equal to a preset first storage quantity threshold, the current sentence segmentation duration is dynamically updated based on the historical duration dataset and the preset first sentence segmentation rule.
[0012] When the current storage quantity is greater than the preset first storage quantity threshold, the current sentence segmentation duration is dynamically updated based on the historical duration dataset and the preset second sentence segmentation rule.
[0013] Optionally, updating the current pending duration based on the current silence duration, the current pending duration, the current sentence segmentation duration, and the preset maximum sentence segmentation duration includes:
[0014] When the current silence duration is less than the current sentence segmentation duration, and the current silence duration is greater than the current pending duration, the current pending duration is updated based on the current silence duration.
[0015] Optionally, the method further includes:
[0016] When the current silence duration is greater than the current sentence segmentation duration, and the current silence duration is less than the preset maximum sentence segmentation duration, the current pending duration is updated based on the current silence duration.
[0017] Optionally, the method further includes:
[0018] When the current silence duration is greater than the current sentence segmentation duration, and the current silence duration is greater than or equal to the preset maximum sentence segmentation duration, the current pending duration remains unchanged.
[0019] Optionally, when the current storage quantity is less than or equal to a preset first storage quantity threshold, dynamically updating the current sentence segmentation duration based on the historical duration dataset and a preset first sentence segmentation rule includes:
[0020] When the current storage quantity is less than or equal to a preset first storage quantity threshold, the largest historical pending duration in the historical duration dataset is obtained as the first historical pending duration.
[0021] The current sentence segmentation duration is dynamically updated based on the first historical pending duration.
[0022] Optionally, when the current storage quantity is greater than a preset first storage quantity threshold, dynamically updating the current sentence segmentation duration based on the historical duration dataset and a preset second sentence segmentation rule includes:
[0023] When the current storage quantity is greater than a preset first storage quantity threshold, a historical duration queue is obtained based on the historical duration dataset, arranged in order of duration.
[0024] Based on a preset high-ranking percentage, the second historical pending duration at the corresponding rank is obtained from the historical duration queue;
[0025] The current sentence segmentation duration is dynamically updated based on the second historical pending duration.
[0026] Optionally, after dynamically updating the current sentence segmentation duration, the following may also be included:
[0027] When the current storage quantity equals a preset second storage quantity threshold, the earliest historical pending time duration is deleted from the historical duration dataset, wherein the preset second storage quantity threshold is greater than a preset first storage quantity threshold.
[0028] According to a specific embodiment of this application, in a second aspect, this application provides a device for dynamically determining the duration of speech segmentation, comprising:
[0029] The first acquisition unit is used to acquire the user's voice, the current sentence segmentation duration, and the current pending time duration;
[0030] The second acquisition unit is used to perform silence detection on the user's voice and acquire the current silence duration.
[0031] The storage unit is used to update the current pending duration based on the current silence duration, the current pending duration, the current sentence segmentation duration, and the preset maximum sentence segmentation duration, and to store the current pending duration as a historical pending duration in the historical duration dataset;
[0032] The third acquisition unit is used to acquire the current storage quantity from the historical duration dataset;
[0033] The first update unit is used to dynamically update the current sentence segmentation duration based on the historical duration dataset and the preset first sentence segmentation rule when the current storage quantity is less than or equal to a preset first storage quantity threshold.
[0034] The second update unit is used to dynamically update the current sentence segmentation duration based on the historical duration dataset and the preset second sentence segmentation rule when the current storage quantity is greater than the preset first storage quantity threshold.
[0035] Optionally, updating the current pending duration based on the current silence duration, the current pending duration, the current sentence segmentation duration, and the preset maximum sentence segmentation duration includes:
[0036] When the current silence duration is less than the current sentence segmentation duration, and the current silence duration is greater than the current pending duration, the current pending duration is updated based on the current silence duration.
[0037] Optionally, updating the current pending duration based on the current silence duration, the current pending duration, the current sentence segmentation duration, and the preset maximum sentence segmentation duration includes:
[0038] When the current silence duration is greater than the current sentence segmentation duration, and the current silence duration is less than the preset maximum sentence segmentation duration, the current pending duration is updated based on the current silence duration.
[0039] Optionally, updating the current pending duration based on the current silence duration, the current pending duration, the current sentence segmentation duration, and the preset maximum sentence segmentation duration includes:
[0040] When the current silence duration is greater than the current sentence segmentation duration, and the current silence duration is greater than or equal to the preset maximum sentence segmentation duration, the current pending duration remains unchanged.
[0041] Optionally, when the current storage quantity is less than or equal to a preset first storage quantity threshold, dynamically updating the current sentence segmentation duration based on the historical duration dataset and a preset first sentence segmentation rule includes:
[0042] When the current storage quantity is less than or equal to a preset first storage quantity threshold, the largest historical pending duration in the historical duration dataset is obtained as the first historical pending duration.
[0043] The current sentence segmentation duration is dynamically updated based on the first historical pending duration.
[0044] Optionally, when the current storage quantity is greater than a preset first storage quantity threshold, dynamically updating the current sentence segmentation duration based on the historical duration dataset and a preset second sentence segmentation rule includes:
[0045] When the current storage quantity is greater than a preset first storage quantity threshold, a historical duration queue is obtained based on the historical duration dataset, arranged in order of duration.
[0046] Based on a preset high-ranking percentage, the second historical pending duration at the corresponding rank is obtained from the historical duration queue;
[0047] The current sentence segmentation duration is dynamically updated based on the second historical pending duration.
[0048] Optionally, after dynamically updating the current sentence segmentation duration, the following may also be included:
[0049] The deletion unit is used to delete the earliest historical pending duration from the historical duration dataset when the current storage quantity is equal to a preset second storage quantity threshold, wherein the preset second storage quantity threshold is greater than a preset first storage quantity threshold.
[0050] According to a specific embodiment of this application, in a third aspect, this application provides a computer-readable storage medium having a computer program stored thereon, wherein when the program is executed by a processor, it implements the method for dynamically determining the duration of speech segmentation as described in any of the preceding claims.
[0051] According to a specific embodiment of this application, in a fourth aspect, this application provides an electronic device, including: one or more processors; and a storage device for storing one or more programs, which, when executed by the one or more processors, cause the one or more processors to implement the dynamic determination method for speech segmentation duration as described in any of the preceding claims.
[0052] Compared with the prior art, the above-described solutions of this application have at least the following beneficial effects:
[0053] This application provides a method, apparatus, medium, and electronic device for dynamically determining the duration of speech segmentation. This application updates the current pending duration based on the user's current silence duration, the previously determined current pending duration, the current segmentation duration, and a preset maximum segmentation duration. The current pending duration is stored as a historical pending duration in a historical duration dataset. The historical duration dataset stores the historical pending duration corresponding to each silence encountered during silence detection on the user's speech. By dynamically updating the current segmentation duration using multiple historical pending durations stored in the historical duration dataset, the influence of abnormal historical pending durations on the determination of the current segmentation duration is avoided. Furthermore, the current segmentation duration can be dynamically updated according to each individual's speaking habits and speed, thereby meeting the needs of sentence segmentation.
Attached Figure Description
[0054] Figure 1 shows a flowchart of a method for dynamically determining the duration of speech segmentation according to an embodiment of this application;
[0055] Figure 2 shows a unit block diagram of a device for dynamically determining the duration of speech segmentation according to an embodiment of this application.
Detailed Implementation
[0056] To make the objectives, technical solutions, and advantages of this application clearer, the application will be further described in detail below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments. Based on the embodiments in this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.
[0057] The terminology used in the embodiments of this application is for the purpose of describing particular embodiments only and is not intended to limit the application. The singular forms “a,” “said,” and “the” used in the embodiments of this application and the appended claims are also intended to include the plural forms, and “multiple” generally includes at least two unless the context clearly indicates otherwise.
[0058] It should be understood that the term "and / or" used in this article is merely a description of the relationship between related objects, indicating that three relationships can exist. For example, A and / or B can represent: A existing alone, A and B existing simultaneously, and B existing alone. Additionally, the character " / " in this article generally indicates that the preceding and following related objects have an "or" relationship.
[0059] It should be understood that although the terms first, second, third, etc., may be used in the embodiments of this application, these descriptions should not be limited to these terms. These terms are only used to distinguish the descriptions. For example, first may also be referred to as second without departing from the scope of the embodiments of this application, and similarly, second may also be referred to as first.
[0060] Depending on the context, the words “if” or “suppose” as used here can be interpreted as “when” or “in response to determination” or “in response to detection.” Similarly, depending on the context, the phrases “if determination” or “if detection (of the stated condition or event)” can be interpreted as “when determination” or “in response to determination” or “when detection (of the stated condition or event)” or “in response to detection (of the stated condition or event).”
[0061] It should also be noted that the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that an article or device that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such an article or device. Without further limitation, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the article or device that includes said element.
[0062] It should be noted that any symbols and / or numbers present in the specification that are not marked in the accompanying drawings are not reference numerals.
[0063] The optional embodiments of this application are described in detail below with reference to the accompanying drawings.
[0064] The embodiments provided in this application are embodiments of a method for dynamically determining the duration of speech segmentation.
[0065] The embodiments of this application are described in detail below with reference to Figure 1.
[0066] Step S101: Obtain the user's voice, the current sentence segmentation duration, and the current pending duration.
[0067] This application embodiment can dynamically adjust the sentence segmentation duration based on the rhythm and characteristics of the pre-recorded audio or video while playing it, ensuring accurate sentence segmentation based on the dynamically adjusted segmentation duration. For example, this applies to teaching audio or video pre-recorded in a classroom. It can also dynamically adjust the sentence segmentation duration based on the rhythm and characteristics of the audio while recording on-site audio or video, ensuring accurate real-time sentence segmentation based on the dynamically adjusted segmentation duration. For example, this applies to teaching audio or video collected on-site in a classroom. Of course, this application embodiment is not limited to these embodiments.
[0068] The user's voice can be the voice of a user conversation, such as the voice of a multi-person chat; it can also be the voice of a human-computer interaction, such as the voice of a driver interacting with a vehicle system; it can also be the voice of a speaker; this application is not limited to these.
[0069] The sentence segmentation duration is the length of silence used to mark the time boundary between complete semantic segments in speech. Since everyone's speaking habits and speed differ, the appropriate sentence segmentation duration also varies. A fixed sentence segmentation duration can only suit a specific group of people and cannot meet the needs of a wide range of individuals. Therefore, this application provides a method for dynamically determining the speech segmentation duration, which can adjust the sentence segmentation duration according to the characteristics of various groups of people, thereby meeting the actual needs of sentence segmentation.
[0070] The current sentence segmentation duration is the sentence segmentation duration dynamically determined during the process of segmenting speech. Each time a silence is encountered in the speech, the current sentence segmentation duration is checked to determine whether it needs to be updated.
[0071] The pending duration is the intermediate duration for determining the sentence segmentation duration, and it serves as a reference duration for generating the sentence segmentation duration.
[0072] The current pending duration is the dynamically determined pending duration during the process of segmenting speech. Each time silence is encountered in the speech, the current pending duration is checked to determine if it needs to be updated.
[0073] Step S102: Perform silence detection on the user's voice and obtain the current silence duration.
[0074] This application embodiment uses silence in speech as a feature to dynamically update the current sentence segment duration.
[0075] The term "silence" does not mean the complete absence of sound in speech. Rather, it refers to audio whose average amplitude is less than a preset silence amplitude threshold within a preset duration.
[0076] The silence duration refers to the duration of silence.
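The definition of silence above can be sketched as a frame-based check. This is only an illustration under stated assumptions: the frame length, the use of mean absolute amplitude, and the function name `silence_durations_ms` are not specified in this application and are chosen here for simplicity.

```python
from typing import List, Sequence


def silence_durations_ms(
    samples: Sequence[float],
    sample_rate_hz: int,
    frame_ms: int,
    amplitude_threshold: float,
) -> List[int]:
    """Return the duration, in milliseconds, of each run of silent frames.

    A frame counts as silent when its mean absolute amplitude is below
    amplitude_threshold (an assumed reading of "average amplitude").
    A silent run that reaches the end of the input is also reported.
    """
    frame_len = sample_rate_hz * frame_ms // 1000
    durations: List[int] = []
    run = 0  # number of consecutive silent frames seen so far
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        mean_abs = sum(abs(x) for x in frame) / frame_len
        if mean_abs < amplitude_threshold:
            run += 1
        elif run:
            durations.append(run * frame_ms)
            run = 0
    if run:
        durations.append(run * frame_ms)
    return durations


# Synthetic 1 kHz signal: 30 ms of speech-like level, 50 ms near-silence, 20 ms speech.
signal = [0.5] * 30 + [0.001] * 50 + [0.5] * 20
detected = silence_durations_ms(signal, sample_rate_hz=1000, frame_ms=10,
                                amplitude_threshold=0.01)
print(detected)
```

Low-level background noise in the quiet stretch is still treated as silence, which matches the point that silence is not the complete absence of sound.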
[0077] Step S103: Update the current pending duration based on the current silence duration, the current pending duration, the current sentence segmentation duration, and the preset maximum sentence segmentation duration, and store the current pending duration as a historical pending duration in the historical duration dataset.
[0078] The historical duration dataset can be a data table in a database, a spreadsheet, a text file, or a data structure built in memory.
[0079] The historical duration dataset stores the historical pending duration corresponding to each instance of silence during the silence detection process of the user's voice. This embodiment dynamically updates the current sentence segmentation duration using multiple historical pending durations in the historical duration dataset, avoiding the influence of abnormal historical pending durations on determining the current sentence segmentation duration.
[0080] In some specific embodiments, updating the current pending duration based on the current silence duration, the current pending duration, the current sentence segmentation duration, and the preset maximum sentence segmentation duration, and storing the current pending duration as a historical pending duration in the historical duration dataset, includes the following steps:
[0081] Step S103a: When the current silence duration is less than the current sentence segmentation duration and the current silence duration is greater than the current pending duration, update the current pending duration based on the current silence duration.
[0082] In this specific embodiment, from the moment silence is detected, if the silence ends within the current sentence segmentation duration, the current silence duration is recorded. When the current silence duration is greater than the current pending duration, the current pending duration is updated using the current silence duration. For example, when the current silence duration T = 10ms is less than the current sentence segmentation duration T1 = 20ms, and the current silence duration T = 10ms is greater than the current pending duration T0 = 8ms, the current pending duration is updated based on the current silence duration T = 10ms, i.e., T0 = 10ms.
[0083] To avoid the current pending duration being too short, which could lead to an excessively short sentence segmentation duration and result in semantic errors, this specific embodiment raises the current pending duration, ensuring it is at an appropriate length.
[0084] In some other specific embodiments, the method further includes the following steps:
[0085] Step S103b: When the current silence duration is greater than the current sentence segmentation duration and the current silence duration is less than the preset maximum sentence segmentation duration, update the current pending duration based on the current silence duration.
[0086] In this specific embodiment, from the moment silence is detected, if the silence has not ended within the current sentence segmentation duration but has ended within the preset maximum sentence segmentation duration, the current silence duration is recorded, and the current pending duration is updated using the current silence duration. For example, when the current silence duration T = 25ms is greater than the current sentence segmentation duration T1 = 20ms, and the current silence duration T = 25ms is less than the preset maximum sentence segmentation duration T2 = 3000ms, the current pending duration is updated based on the current silence duration T = 25ms, i.e., T0 = 25ms.
[0087] To avoid an excessively long pending duration, which could lead to multiple sentences being misidentified as a single sentence and result in semantic errors, this specific embodiment adjusts the current pending duration to a shorter duration, ensuring it is at an appropriate length.
[0088] In some other specific embodiments, the method further includes the following steps:
[0089] Step S103c: When the current silence duration is greater than the current sentence segmentation duration, and the current silence duration is greater than or equal to the preset maximum sentence segmentation duration, the current pending duration remains unchanged.
[0090] For example, when the current silence duration T = 3500ms is greater than the current sentence segmentation duration T1 = 20ms, and the current silence duration T = 3500ms is greater than or equal to the preset maximum sentence segmentation duration T2 = 3000ms, the current pending duration T0 = 25ms remains unchanged.
[0091] When the current silence duration is greater than the current sentence segmentation duration and greater than or equal to the preset maximum sentence segmentation duration, the silence can be understood as marking the end of a round of user speech rather than a pause within it. Updating the current pending duration at this time would not help determine the current sentence segmentation duration correctly; on the contrary, it would distort it. Therefore, this specific embodiment does not update the current pending duration.
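The three branches S103a, S103b, and S103c can be summarized in one small function. This is a sketch under assumptions, not the application's implementation: the name `update_pending` is hypothetical, and the case where the silence duration exactly equals the current sentence segmentation duration is not covered by the text, so the sketch leaves the pending duration unchanged in that case.

```python
def update_pending(silence_ms: int, pending_ms: int,
                   seg_ms: int, max_seg_ms: int) -> int:
    """Return the new current pending duration after one detected silence.

    silence_ms  current silence duration (T)
    pending_ms  current pending duration (T0)
    seg_ms      current sentence segmentation duration (T1)
    max_seg_ms  preset maximum sentence segmentation duration (T2)
    """
    if silence_ms < seg_ms:
        # S103a: the silence ended before the segmentation duration elapsed;
        # keep the larger of the old pending duration and this silence.
        return silence_ms if silence_ms > pending_ms else pending_ms
    if seg_ms < silence_ms < max_seg_ms:
        # S103b: longer than the segmentation duration but still a plausible
        # pause inside one round of speech.
        return silence_ms
    # S103c: the silence reached the preset maximum, so it is treated as the
    # end of the round. Assumption: silence_ms == seg_ms also lands here,
    # because the text does not define that boundary case.
    return pending_ms


# The three worked examples from the text (T1 = 20 ms, T2 = 3000 ms):
after_a = update_pending(10, 8, 20, 3000)     # S103a: T0 becomes 10
after_b = update_pending(25, 10, 20, 3000)    # S103b: T0 becomes 25
after_c = update_pending(3500, 25, 20, 3000)  # S103c: T0 stays 25
print(after_a, after_b, after_c)
```

Whatever value is returned would then be appended to the historical duration dataset as a historical pending duration.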
[0092] Step S104: Obtain the current storage quantity from the historical duration dataset.
[0093] For example, if 10,000 historical pending durations are stored in the historical duration dataset, then the current storage quantity = 10,000.
[0094] Step S105: When the current storage quantity is less than or equal to a preset first storage quantity threshold, the current sentence segmentation duration is dynamically updated based on the historical duration dataset and the preset first sentence segmentation rule.
[0095] In some specific embodiments, when the current storage quantity is less than or equal to a preset first storage quantity threshold, dynamically updating the current sentence segmentation duration based on a historical duration dataset and a preset first sentence segmentation rule includes the following steps:
[0096] Step S105-1: When the current storage quantity is less than or equal to a preset first storage quantity threshold, obtain the largest historical pending duration in the historical duration dataset as the first historical pending duration.
[0097] Step S105-2: Dynamically update the current sentence segmentation duration based on the first historical pending duration.
[0098] For example, suppose the preset first storage quantity threshold is 100 and the current storage quantity is 80, i.e., 80 historical pending durations are stored in the historical duration dataset. If the largest of them (i.e., the first historical pending duration) is 300ms, then the current sentence segmentation duration is set to 300ms.
[0099] In this specific embodiment, since the number of historical pending duration samples in the historical duration dataset is too small, the maximum historical pending duration is used to update the current sentence segmentation duration. Priority is given to ensuring the semantic integrity after sentence segmentation, avoiding excessively short segments that could cause semantic errors.
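The first sentence segmentation rule can be sketched as taking the maximum of the stored values. The function name `first_rule_duration`, the fallback for an empty dataset, and the sample values are assumptions for illustration only.

```python
from typing import List

FIRST_THRESHOLD = 100  # preset first storage quantity threshold from the example


def first_rule_duration(history_ms: List[int], current_seg_ms: int) -> int:
    """Return the largest historical pending duration.

    Assumption: if nothing has been stored yet, the current sentence
    segmentation duration is kept as-is (the text does not cover this case).
    """
    if not history_ms:
        return current_seg_ms
    return max(history_ms)


# 80 made-up historical pending durations between 100 ms and 300 ms.
history_ms = [100 + (i % 5) * 50 for i in range(80)]
seg_ms = 500  # assumed starting value

if len(history_ms) <= FIRST_THRESHOLD:
    seg_ms = first_rule_duration(history_ms, seg_ms)
print(seg_ms)
```

Choosing the maximum errs on the side of waiting longer while few samples are available, which favors semantic integrity over response speed.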
[0100] Step S106: When the current storage quantity is greater than the preset first storage quantity threshold, the current sentence segmentation duration is dynamically updated based on the historical duration dataset and the preset second sentence segmentation rule.
[0101] In the embodiments of this application, steps S105 and S106 are not limited to a particular order.
[0102] In some specific embodiments, the step of dynamically updating the current sentence segmentation duration based on the historical duration dataset and the preset second sentence segmentation rule when the current storage quantity is greater than a preset first storage quantity threshold includes the following steps:
[0103] Step S106-1: When the current storage quantity is greater than the preset first storage quantity threshold, obtain a historical duration queue arranged in order of duration based on the historical duration dataset.
[0104] The historical duration queue arranged in order of duration can be understood as either generating the historical duration queue in ascending order or in descending order.
[0105] Step S106-2: Obtain the second historical pending duration at the corresponding rank from the historical duration queue based on the preset high-ranking percentage.
[0106] The preset high-ranking percentage specifies the rank of the second historical pending duration, expressed as a percentage of the total number of historical pending durations in the historical duration queue when they are arranged from low to high. For example, if the preset high-ranking percentage is 99% and the historical duration queue contains 10,000 historical pending durations arranged from low to high, then the historical pending duration ranked at position 10,000 × 99% = 9,900 in the historical duration queue is the second historical pending duration.
[0107] Step S106-3: Dynamically update the current sentence segmentation duration based on the second historical pending duration.
[0108] For example, continuing the above example, if the historical pending duration at position 9900 in the historical duration queue is 200ms, then the current sentence segmentation duration is 200ms.
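The second sentence segmentation rule amounts to a rank-based (percentile-style) lookup in the sorted history. The sketch below is illustrative: `second_rule_duration` is a hypothetical name, the data are synthetic, and rounding the rank down when the product is not a whole number is an assumption, since the text only shows the exact case 10,000 × 99% = 9,900.

```python
from typing import List


def second_rule_duration(history_ms: List[int], high_percent: int) -> int:
    """Return the historical pending duration at the given rank percentage.

    The history is sorted from low to high and the 1-based rank is
    len(history) * high_percent / 100, rounded down (assumed), at least 1.
    Expects a non-empty history, which holds whenever this rule applies.
    """
    ordered = sorted(history_ms)
    rank = max(1, len(ordered) * high_percent // 100)
    return ordered[rank - 1]


# Synthetic history of 10,000 values: 9,900 ordinary pauses of at most 200 ms
# plus 100 abnormally long ones of 2,000 ms or more.
history_ms = [200 - (i % 150) for i in range(9900)] + [2000 + i for i in range(100)]

new_seg_ms = second_rule_duration(history_ms, 99)  # rank 9,900 of 10,000
print(new_seg_ms)
```

Because the lookup stops at rank 9,900, the 100 abnormally long values at the top of the queue do not affect the result, which is how this rule limits the influence of abnormal historical pending durations.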
[0109] In some specific embodiments, after dynamically updating the current sentence segmentation duration, the following steps are also included:
[0110] When the current storage quantity equals a preset second storage quantity threshold, the earliest historical pending time duration is deleted from the historical duration dataset, wherein the preset second storage quantity threshold is greater than a preset first storage quantity threshold.
[0111] The earliest historical pending duration is the first historical pending duration stored in the historical duration dataset.
[0112] For example, if the preset second storage quantity threshold is 10,000, and the current storage quantity is equal to the preset second storage quantity threshold, which is also 10,000, then the earliest historical pending duration will be deleted from the historical duration dataset. This ensures that after the next historical pending duration is stored, the storage quantity in the historical duration dataset will always remain at 10,000, which guarantees both the representativeness of the current sentence segmentation duration and the efficiency of dynamically determining the current sentence segmentation duration.
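The bounded history described above behaves like a first-in, first-out buffer. The following sketch uses `collections.deque` from the Python standard library; the helper name `evict_if_full` and the small threshold of 5 (instead of 10,000) are assumptions chosen to keep the example short.

```python
from collections import deque
from typing import Deque


def evict_if_full(history: Deque[int], second_threshold: int) -> None:
    """Delete the earliest stored duration once the count equals the threshold,
    so that the next stored value brings the count back to the threshold."""
    if len(history) == second_threshold:
        history.popleft()


SECOND_THRESHOLD = 5  # stand-in for the preset second storage quantity threshold
history: Deque[int] = deque()
largest_size_seen = 0

for pending_ms in range(8):  # store eight made-up values 0..7
    history.append(pending_ms)
    largest_size_seen = max(largest_size_seen, len(history))
    # ... the current sentence segmentation duration would be updated here ...
    evict_if_full(history, SECOND_THRESHOLD)

print(list(history), largest_size_seen)
```

A `deque` created with `maxlen` would give a similar sliding window automatically; the explicit check is used here only to mirror the order of operations in the text, where deletion happens after the segmentation duration has been updated.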
[0113] This application embodiment updates the current pending duration based on the user's current silence duration, the previously determined current pending duration, the current sentence segmentation duration, and the preset maximum sentence segmentation duration. The current pending duration is stored as a historical pending duration in a historical duration dataset. The historical duration dataset stores the historical pending duration corresponding to each silence encountered during silence detection on the user's voice. By dynamically updating the current sentence segmentation duration using multiple historical pending durations stored in the historical duration dataset, the influence of abnormal historical pending durations on determining the current sentence segmentation duration is avoided. Furthermore, the current sentence segmentation duration can be dynamically updated according to each individual's speaking habits and speed, thereby meeting the needs of sentence segmentation.
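To show how steps S103 to S106 fit together, the following end-to-end sketch processes a sequence of detected silences. It is one possible reading of the method, not the application's implementation: the class name `DynamicSegmenter`, the initial pending duration of 0, the floor rounding of the rank, and all numeric values are assumptions.

```python
from collections import deque
from typing import Deque, List


class DynamicSegmenter:
    """Hypothetical wrapper around steps S103 to S106."""

    def __init__(self, seg_ms: int, max_seg_ms: int, first_threshold: int,
                 second_threshold: int, high_percent: int) -> None:
        self.seg_ms = seg_ms                      # current segmentation duration
        self.max_seg_ms = max_seg_ms              # preset maximum
        self.first_threshold = first_threshold
        self.second_threshold = second_threshold  # must exceed first_threshold
        self.high_percent = high_percent
        self.pending_ms = 0                       # assumed initial pending duration
        self.history: Deque[int] = deque()        # historical duration dataset

    def on_silence(self, silence_ms: int) -> int:
        """Process one detected silence and return the updated segmentation duration."""
        # S103: update the pending duration, then store it.
        if silence_ms < self.seg_ms:
            self.pending_ms = max(self.pending_ms, silence_ms)  # S103a
        elif self.seg_ms < silence_ms < self.max_seg_ms:
            self.pending_ms = silence_ms                        # S103b
        # otherwise unchanged (S103c; equality with seg_ms is an assumed case)
        self.history.append(self.pending_ms)

        # S104: current storage quantity.
        count = len(self.history)

        if count <= self.first_threshold:
            # S105: few samples, take the largest stored value.
            self.seg_ms = max(self.history)
        else:
            # S106: take the value at the preset high-ranking percentage.
            ordered = sorted(self.history)
            rank = max(1, count * self.high_percent // 100)
            self.seg_ms = ordered[rank - 1]

        # Optional step: drop the earliest entry once the dataset is full.
        if count == self.second_threshold:
            self.history.popleft()
        return self.seg_ms


segmenter = DynamicSegmenter(seg_ms=500, max_seg_ms=3000, first_threshold=3,
                             second_threshold=6, high_percent=80)
silences_ms: List[int] = [200, 350, 5000, 600, 300, 250, 280]
trace = [segmenter.on_silence(s) for s in silences_ms]
print(trace)
```

In this run the 5,000 ms silence exceeds the preset maximum and leaves the pending duration untouched, so it does not inflate the segmentation duration.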
[0114] This application also provides apparatus embodiments that follow the above embodiments, for implementing the method steps described in the above embodiments. The interpretation of the same names is the same as that in the above embodiments, and they have the same technical effects as those in the above embodiments, so they will not be repeated here.
[0115] like Figure 2 As shown, this application provides a dynamic determination device 200 for the duration of speech segmentation, comprising:
[0116] The first acquisition unit 201 is used to acquire the user's voice, the current sentence segmentation duration, and the current pending duration;
[0117] The second acquisition unit 202 is used to perform silence detection on the user's voice and acquire the current silence duration;
[0118] The storage unit 203 is used to update the current pending duration based on the current silence duration, the current pending duration, the current sentence segmentation duration, and the preset maximum sentence segmentation duration, and to store the current pending duration as a historical pending duration in the historical duration dataset;
[0119] The third acquisition unit 204 is used to acquire the current storage quantity from the historical duration dataset;
[0120] The first update unit 205 is used to dynamically update the current sentence segmentation duration based on the historical duration dataset and the preset first sentence segmentation rule when the current storage quantity is less than or equal to a preset first storage quantity threshold.
[0121] The second update unit 206 is used to dynamically update the current sentence segmentation duration based on the historical duration dataset and the preset second sentence segmentation rule when the current storage quantity is greater than the preset first storage quantity threshold.
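As a rough illustration of how units 201 to 206 might cooperate, the following Python sketch folds them into one class. It is not the device of this application: the class name `SegmentationState`, the default values (2000 ms, 5 entries, 90 percent), the nearest-rank percentile, and the choice to set a duration equal to the value it is "updated based on" are all assumptions made for the example.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class SegmentationState:
    segmentation_ms: int              # current sentence segmentation duration (unit 201)
    pending_ms: int = 0               # current pending duration (unit 201)
    max_segmentation_ms: int = 2000   # hypothetical preset maximum
    first_threshold: int = 5          # hypothetical first storage quantity threshold
    high_percent: int = 90            # hypothetical "preset high percentage"
    history: deque = field(default_factory=deque)

    def on_silence(self, silence_ms: int) -> int:
        """Process one detected silence (unit 202 output) and return the new duration."""
        # Storage unit 203: update the pending duration, then store it.
        below = self.pending_ms < silence_ms < self.segmentation_ms
        above = self.segmentation_ms < silence_ms < self.max_segmentation_ms
        if below or above:
            self.pending_ms = silence_ms  # assumption: "based on" means "set equal to"
        self.history.append(self.pending_ms)

        # Third acquisition unit 204: current storage quantity.
        count = len(self.history)

        if count <= self.first_threshold:
            # First update unit 205: use the maximum stored pending duration.
            self.segmentation_ms = max(self.history)
        else:
            # Second update unit 206: use the value at a high-percentage rank.
            ordered = sorted(self.history)
            rank = (count * self.high_percent + 99) // 100  # ceiling, 1-based
            self.segmentation_ms = ordered[rank - 1]
        return self.segmentation_ms


state = SegmentationState(segmentation_ms=500)
results = [state.on_silence(ms) for ms in (300, 400, 2500)]
```

In this run the third silence (2500 ms) is at or above the assumed maximum, so the pending duration stays at 400 ms and that value is stored again.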
[0122] Optionally, updating the current pending duration based on the current silence duration, the current pending duration, the current sentence segmentation duration, and the preset maximum sentence segmentation duration includes:
[0123] When the current silence duration is less than the current sentence segmentation duration and greater than the current pending duration, the current pending duration is updated based on the current silence duration.
[0124] Optionally, updating the current pending duration based on the current silence duration, the current pending duration, the current sentence segmentation duration, and the preset maximum sentence segmentation duration includes:
[0125] When the current silence duration is greater than the current sentence segmentation duration and less than the preset maximum sentence segmentation duration, the current pending duration is updated based on the current silence duration.
[0126] Optionally, updating the current pending duration based on the current silence duration, the current pending duration, the current sentence segmentation duration, and the preset maximum sentence segmentation duration includes:
[0127] When the current silence duration is greater than the current sentence segmentation duration and greater than or equal to the preset maximum sentence segmentation duration, the current pending duration remains unchanged.
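The three cases above can be sketched as a single Python function. This is a minimal sketch, not the implementation of this application: the function and parameter names are hypothetical, "updated based on the current silence duration" is assumed to mean "set equal to it", and any case not listed above (for example, a silence exactly equal to the segmentation duration) is assumed to leave the pending duration unchanged.

```python
def update_pending(silence_ms: int, pending_ms: int,
                   segmentation_ms: int, max_segmentation_ms: int) -> int:
    """Return the new current pending duration for one detected silence."""
    # Case 1: a pause shorter than the segmentation duration but longer than
    # the pending duration recorded so far.
    if pending_ms < silence_ms < segmentation_ms:
        return silence_ms  # assumption: update means "take the silence duration"

    # Case 2: a pause longer than the segmentation duration but still below
    # the preset maximum.
    if segmentation_ms < silence_ms < max_segmentation_ms:
        return silence_ms

    # Case 3 (silence at or above the maximum) and all unlisted cases:
    # keep the pending duration as it is.
    return pending_ms
```

For instance, with a segmentation duration of 300 ms, a pending duration of 200 ms, and a maximum of 2000 ms, a 250 ms or 450 ms silence would replace the pending duration, while a 150 ms or 2500 ms silence would leave it at 200 ms.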
[0128] Optionally, when the current storage quantity is less than or equal to a preset first storage quantity threshold, dynamically updating the current sentence segmentation duration based on the historical duration dataset and a preset first sentence segmentation rule includes:
[0129] When the current storage quantity is less than or equal to a preset first storage quantity threshold, the maximum historical pending duration in the historical duration dataset is obtained as the first historical pending duration.
[0130] The current sentence segmentation duration is dynamically updated based on the first historical pending duration.
[0131] Optionally, when the current storage quantity is greater than a preset first storage quantity threshold, dynamically updating the current sentence segmentation duration based on the historical duration dataset and a preset second sentence segmentation rule includes:
[0132] When the current storage quantity is greater than a preset first storage quantity threshold, the historical pending durations in the historical duration dataset are arranged in order of duration to obtain a historical duration queue;
[0133] The second historical pending duration is obtained from the historical duration queue at the rank corresponding to a preset high percentage;
[0134] The current sentence segmentation duration is dynamically updated based on the second historical pending duration.
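The two sentence segmentation rules can be sketched in Python as follows. The sketch rests on several assumptions that are not stated in this application: the function name, the ascending sort order, the nearest-rank reading of "the rank corresponding to a preset high percentage", the example values of 5 entries and 90 percent, and the choice to use the selected historical pending duration directly as the new segmentation duration.

```python
def next_segmentation_duration(history: list[int],
                               first_threshold: int,
                               high_percent: int) -> int:
    """Pick a new sentence segmentation duration from stored pending durations."""
    if not history:
        raise ValueError("the historical duration dataset is empty")

    count = len(history)
    if count <= first_threshold:
        # First rule: with few samples, take the maximum stored duration.
        return max(history)

    # Second rule: sort by duration and take the entry at a high-percentage rank,
    # so that a few abnormally long entries at the top are ignored.
    ordered = sorted(history)
    rank = (count * high_percent + 99) // 100  # integer ceiling, 1-based rank
    rank = min(max(rank, 1), count)
    return ordered[rank - 1]


few = [300, 320, 410]
many = [300, 310, 320, 330, 340, 350, 360, 370, 380, 5000]  # 5000 is an outlier

small_sample_result = next_segmentation_duration(few, first_threshold=5, high_percent=90)
large_sample_result = next_segmentation_duration(many, first_threshold=5, high_percent=90)
```

In the second call the abnormal 5000 ms entry lies above the 90 percent rank and does not affect the result, which is the outlier behavior the percentile rule is meant to provide.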
[0135] Optionally, the device further includes:
[0136] A deletion unit, used to delete the earliest historical pending duration from the historical duration dataset after the current sentence segmentation duration is dynamically updated, when the current storage quantity is equal to a preset second storage quantity threshold, wherein the preset second storage quantity threshold is greater than the preset first storage quantity threshold.
[0137] This embodiment updates the current pending duration based on the user's current silence duration, the previously determined current pending duration, the current sentence segmentation duration, and the preset maximum sentence segmentation duration. The current pending duration is then stored as a historical pending duration in a historical duration dataset, which holds the historical pending duration corresponding to each silence detected in the user's voice. Because the current sentence segmentation duration is dynamically updated from multiple historical pending durations stored in the dataset, the influence of abnormal historical pending durations on the current sentence segmentation duration is avoided. Furthermore, the current sentence segmentation duration can be dynamically updated according to each individual's speaking habits and speaking speed, thereby meeting the user's sentence segmentation needs.
[0138] This embodiment provides an electronic device, comprising: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to perform the method steps described in the above embodiment.
[0139] This application provides a non-volatile computer storage medium storing computer-executable instructions that, when executed, perform the method steps described in the above embodiments.
[0140] Finally, it should be noted that the embodiments in this specification are described in a progressive manner, with each embodiment focusing on its differences from the others. For parts that are similar or identical between embodiments, the embodiments may be referred to one another. Because the systems and apparatus disclosed in the embodiments correspond to the methods disclosed in the embodiments, they are described only briefly; for relevant details, refer to the method section.
[0141] The above embodiments are only used to illustrate the technical solutions of this application, and are not intended to limit them. Although this application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of this application.
Claims
1. A method for dynamically determining the duration of speech segmentation, characterized in that it comprises:
acquiring the user's voice, the current sentence segmentation duration, and the current pending duration;
performing silence detection on the user's voice to obtain the current silence duration;
updating the current pending duration based on the current silence duration, the current pending duration, the current sentence segmentation duration, and the preset maximum sentence segmentation duration, and storing the current pending duration as a historical pending duration in the historical duration dataset;
obtaining the current storage quantity from the historical duration dataset;
when the current storage quantity is less than or equal to a preset first storage quantity threshold, dynamically updating the current sentence segmentation duration based on the historical duration dataset and the preset first sentence segmentation rule;
when the current storage quantity is greater than the preset first storage quantity threshold, dynamically updating the current sentence segmentation duration based on the historical duration dataset and the preset second sentence segmentation rule;
wherein updating the current pending duration based on the current silence duration, the current pending duration, the current sentence segmentation duration, and the preset maximum sentence segmentation duration comprises:
when the current silence duration is less than the current sentence segmentation duration and greater than the current pending duration, updating the current pending duration based on the current silence duration;
when the current silence duration is greater than the current sentence segmentation duration and less than the preset maximum sentence segmentation duration, updating the current pending duration based on the current silence duration;
when the current silence duration is greater than the current sentence segmentation duration and greater than or equal to the preset maximum sentence segmentation duration, keeping the current pending duration unchanged.
2. The method according to claim 1, characterized in that dynamically updating the current sentence segmentation duration based on the historical duration dataset and the preset first sentence segmentation rule, when the current storage quantity is less than or equal to the preset first storage quantity threshold, comprises:
when the current storage quantity is less than or equal to the preset first storage quantity threshold, obtaining the maximum historical pending duration in the historical duration dataset as the first historical pending duration;
dynamically updating the current sentence segmentation duration based on the first historical pending duration.
3. The method according to claim 1, characterized in that dynamically updating the current sentence segmentation duration based on the historical duration dataset and the preset second sentence segmentation rule, when the current storage quantity is greater than the preset first storage quantity threshold, comprises:
when the current storage quantity is greater than the preset first storage quantity threshold, arranging the historical pending durations in the historical duration dataset in order of duration to obtain a historical duration queue;
obtaining the second historical pending duration from the historical duration queue at the rank corresponding to a preset high percentage;
dynamically updating the current sentence segmentation duration based on the second historical pending duration.
4. The method according to claim 1, characterized in that, after dynamically updating the current sentence segmentation duration, the method further comprises:
when the current storage quantity equals a preset second storage quantity threshold, deleting the earliest historical pending duration from the historical duration dataset, wherein the preset second storage quantity threshold is greater than the preset first storage quantity threshold.
5. A device for dynamically determining the duration of speech segmentation, characterized in that it is used to perform the method for dynamically determining the duration of speech segmentation according to any one of claims 1 to 4, the device comprising:
a first acquisition unit, used to acquire the user's voice, the current sentence segmentation duration, and the current pending duration;
a second acquisition unit, used to perform silence detection on the user's voice and acquire the current silence duration;
a storage unit, used to update the current pending duration based on the current silence duration, the current pending duration, the current sentence segmentation duration, and the preset maximum sentence segmentation duration, and to store the current pending duration as a historical pending duration in the historical duration dataset;
a third acquisition unit, used to acquire the current storage quantity from the historical duration dataset;
a first update unit, used to dynamically update the current sentence segmentation duration based on the historical duration dataset and the preset first sentence segmentation rule when the current storage quantity is less than or equal to the preset first storage quantity threshold;
a second update unit, used to dynamically update the current sentence segmentation duration based on the historical duration dataset and the preset second sentence segmentation rule when the current storage quantity is greater than the preset first storage quantity threshold.
6. A computer-readable storage medium having a computer program stored thereon, characterized in that, when the program is executed by a processor, it implements the method according to any one of claims 1 to 4.
7. An electronic device, characterized in that it comprises:
one or more processors;
a storage device for storing one or more programs;
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method according to any one of claims 1 to 4.