Method and device for determining open set voiceprint recognition threshold, and electronic equipment
By calculating the similarity between the voiceprint feature vectors of known and unknown classes and the average voiceprint vectors of the categories, the threshold range for voiceprint recognition is determined and the threshold with the highest accuracy is selected. This solves the problem of low computational efficiency in open set voiceprint recognition and achieves efficient and accurate voiceprint recognition.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- CHINA TELECOM CORP LTD
- Filing Date
- 2023-09-04
- Publication Date
- 2026-06-23
Abstract drawing: Figure CN117174093B_ABST
Description
Technical Field
[0001] This invention relates to the field of voiceprint recognition technology, and in particular to a method, apparatus, and electronic device for determining an open-set voiceprint recognition threshold.
Background Technology
[0002] Voiceprint recognition refers to identifying the speaker of a given audio recording, given an existing voiceprint feature database. It is generally divided into closed-set voiceprint recognition (CSR) and open-set voiceprint recognition (OSR) based on the scope of speaker identification. Closed-set voiceprint recognition simply selects the class with the highest similarity score from known voiceprint categories; while open-set voiceprint recognition first needs to distinguish whether it is a new voiceprint category before further voiceprint recognition or expansion of the voiceprint database.
[0003] Therefore, for open-set voiceprint recognition, an open-set voiceprint recognition threshold is needed to determine whether the test audio originates from a new voiceprint category, and the appropriateness of the preset threshold directly affects the accuracy of open-set voiceprint recognition. However, the open-set voiceprint recognition threshold needs to be obtained by analyzing the representation and distribution of known and unknown voiceprint data in a high-dimensional space, which results in a large computational load and long computation time, reducing the computational efficiency of the open-set voiceprint recognition threshold. Furthermore, the audio recording environments for test data and training data are different, so the optimal open-set voiceprint recognition threshold obtained based on the training data will often fail in different test scenarios, requiring continuous adjustment according to the application scenario, further reducing the computational efficiency of the open-set voiceprint recognition threshold.
Summary of the Invention
[0004] This invention provides a method, apparatus, and electronic device for determining an open-set voiceprint recognition threshold, in order to solve the problem of low computational efficiency caused by existing open-set voiceprint recognition threshold calculation methods.
[0005] In a first aspect, embodiments of the present invention provide a method for determining an open-set voiceprint recognition threshold, the method comprising:
[0006] Obtain the known class test voiceprint feature vectors corresponding to the known class voiceprint datasets and the unknown class voiceprint feature vectors corresponding to the unknown class voiceprint datasets;
[0007] Calculate the first similarity between the test voiceprint feature vector of each known class and the mean voiceprint vector of each class, and calculate the second similarity between the voiceprint feature vector of each unknown class and the mean voiceprint vector of each class;
[0008] Based on the first similarity and the second similarity, the voiceprint recognition threshold range is determined;
[0009] Calculate the accuracy of each voiceprint recognition threshold within the aforementioned voiceprint recognition threshold range;
[0010] Based on the accuracy of each voiceprint recognition threshold, a target voiceprint recognition threshold is determined from the range of the voiceprint recognition thresholds.
[0011] In a second aspect, embodiments of the present invention also provide a device for determining an open-set voiceprint recognition threshold, the device comprising:
[0012] The first acquisition module is used to acquire the known class test voiceprint feature vectors corresponding to the known class voiceprint dataset and the unknown class voiceprint feature vectors corresponding to the unknown class voiceprint dataset.
[0013] The first calculation module is used to calculate the first similarity between the voiceprint feature vector of each known class and the mean voiceprint vector of each class, and to calculate the second similarity between the voiceprint feature vector of each unknown class and the mean voiceprint vector of each class.
[0014] The first determining module is used to determine the voiceprint recognition threshold range based on the first similarity and the second similarity;
[0015] The second calculation module is used to calculate the accuracy of each voiceprint recognition threshold within the range of the voiceprint recognition threshold.
[0016] The second determining module is used to determine the target voiceprint recognition threshold from the range of the voiceprint recognition thresholds based on the accuracy of each voiceprint recognition threshold.
[0017] In a third aspect, embodiments of the present invention also provide an electronic device, including a memory, a transceiver, and a processor:
[0018] The memory is used to store computer programs; the transceiver is used to send and receive data under the control of the processor; the processor is used to read the computer programs in the memory and execute the method for determining the open set voiceprint recognition threshold as described above.
[0019] In a fourth aspect, embodiments of the present invention also provide a processor-readable storage medium storing a computer program for causing the processor to execute the above-described method for determining the open-set voiceprint recognition threshold.
[0020] In the above embodiments of the present invention, by obtaining the known class test voiceprint feature vectors corresponding to the known class voiceprint dataset and the unknown class voiceprint feature vectors corresponding to the unknown class voiceprint dataset, a first similarity is calculated between each known class test voiceprint feature vector and the mean value of the voiceprint vectors of each category, and a second similarity is calculated between each unknown class voiceprint feature vector and the mean value of the voiceprint vectors of each category. Based on the first similarity and the second similarity, a voiceprint recognition threshold range is determined, the accuracy of each voiceprint recognition threshold within the voiceprint recognition threshold range is calculated, and a target voiceprint recognition threshold is determined from the voiceprint recognition threshold range based on the accuracy of each voiceprint recognition threshold.
[0021] This method calculates the similarity between each voiceprint feature vector (the known class test voiceprint feature vectors of the known class voiceprint dataset and the unknown class voiceprint feature vectors of the unknown class voiceprint dataset) and the mean voiceprint vector of each class. By analyzing the similarity distribution between similar and dissimilar classes, the voiceprint recognition threshold range is first determined. Then, the target voiceprint recognition threshold is determined from the accuracy of the voiceprint recognition thresholds within the range. This method achieves the clustering of voiceprint features in high-dimensional space, avoids similarity calculation for all voiceprint feature vectors, greatly reduces the amount of computation, improves the speed of similarity distribution analysis, and improves computational efficiency.
Attached Figure Description
[0022] To more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the description of the embodiments of the present invention will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0023] Figure 1 is a flowchart illustrating the steps of the method for determining the open-set voiceprint recognition threshold provided in an embodiment of the present invention;
[0024] Figure 2 is a schematic diagram of the distribution map of known class voiceprint feature vectors and the distribution map of unknown class voiceprint feature vectors provided in an embodiment of the present invention;
[0025] Figure 3 is a graph showing the relationship between voiceprint recognition threshold and accuracy provided in an embodiment of the present invention;
[0026] Figure 4 is a flowchart illustrating the specific steps of the method for determining the open-set voiceprint recognition threshold provided in an embodiment of the present invention;
[0027] Figure 5 is a structural block diagram of the device for determining the open-set voiceprint recognition threshold provided in an embodiment of the present invention;
[0028] Figure 6 is a structural block diagram of an electronic device provided in an embodiment of the present invention.
Detailed Implementation
[0029] In this embodiment of the invention, the term "and / or" describes the relationship between associated objects, indicating that three relationships can exist. For example, A and / or B can represent: A existing alone, A and B existing simultaneously, or B existing alone. The character " / " generally indicates that the preceding and following associated objects have an "or" relationship.
[0030] In the embodiments of this application, the term "multiple" refers to two or more, and other quantifiers are similar.
[0031] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of this application, and not all of the embodiments. Based on the embodiments of this application, all other embodiments obtained by those of ordinary skill in the art without creative effort are within the scope of protection of this application.
[0032] Specifically, embodiments of the present invention provide a method for determining the open-set voiceprint recognition threshold. As shown in Figure 1, the method may include the following steps:
[0033] Step 101: Obtain the known class test voiceprint feature vector corresponding to the known class voiceprint dataset and the unknown class voiceprint feature vector corresponding to the unknown class voiceprint dataset.
[0034] In step 101 above, if it is necessary to obtain the voiceprint recognition threshold used to distinguish whether a voiceprint is a new category in the open-set voiceprint recognition task, then it is necessary to obtain voiceprint data from the voiceprint data test set. This voiceprint data test set includes known class test voiceprint datasets and unknown class voiceprint datasets. Furthermore, it is necessary to obtain the known class test voiceprint feature vectors corresponding to the known class test voiceprint datasets in the voiceprint data test set, and also to obtain the unknown class voiceprint feature vectors corresponding to the unknown class voiceprint datasets in the voiceprint data test set.
[0035] It is understandable that the known class test voiceprint feature vector corresponding to the known class voiceprint dataset refers to the known class test voiceprint feature vector corresponding to each known class test voiceprint data in the known class voiceprint dataset. The unknown class voiceprint feature vector corresponding to the unknown class voiceprint dataset refers to the unknown class voiceprint feature vector corresponding to each unknown class voiceprint data in the unknown class voiceprint dataset.
[0036] It should be noted that a known class voiceprint dataset refers to a collection of voiceprint data whose category is already known. An unknown class voiceprint dataset refers to a collection of voiceprint data whose category is unknown.
[0037] Step 102: Calculate the first similarity between the test voiceprint feature vector of each known class and the mean voiceprint vector of each class, and calculate the second similarity between the voiceprint feature vector of each unknown class and the mean voiceprint vector of each class.
[0038] Specifically, before step 102, it is necessary to obtain the mean value of the voiceprint vector corresponding to each category, and use the mean value of the voiceprint vector as the high-dimensional feature representation of that category.
[0039] After step 101, the similarity between the test voiceprint feature vector of each known class and the mean voiceprint vector of each class is calculated, thus obtaining multiple similarities, which are referred to as multiple first similarities. Furthermore, the similarity between the voiceprint feature vector of each unknown class and the mean voiceprint vector of each class is calculated, thus obtaining multiple similarities, which are referred to as multiple second similarities.
[0040] It is understandable that the number of first similarities equals the number of known class test voiceprint feature vectors multiplied by the number of voiceprint vector means, and the number of second similarities equals the number of unknown class voiceprint feature vectors multiplied by the number of voiceprint vector means.
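As an illustration of how many similarity values step 102 produces, the following minimal Python sketch builds a table with one row per feature vector and one column per class mean. The function names, the toy two-dimensional vectors, and the dot product used as a stand-in similarity are assumptions made for this example only; they are not taken from the embodiment.

```python
def similarity_table(vectors, class_means, similarity):
    """Return one row per feature vector and one column per class mean."""
    labels = list(class_means)
    return [[similarity(v, class_means[label]) for label in labels] for v in vectors]


def dot_product(x, y):
    # Stand-in similarity, used only to keep the example runnable (assumption).
    return sum(a * b for a, b in zip(x, y))


# Hypothetical toy data: 3 known class test vectors and 2 class means.
known_test_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
class_means = {"speaker_a": [1.0, 0.0], "speaker_b": [0.0, 1.0]}

first_similarities = similarity_table(known_test_vectors, class_means, dot_product)

# Every vector is compared with every class mean, so the count is 3 * 2.
number_of_first_similarities = sum(len(row) for row in first_similarities)
```

The same table could be built for the unknown class voiceprint feature vectors to obtain the second similarities.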
[0041] Step 103: Determine the voiceprint recognition threshold range based on the first similarity and the second similarity.
[0042] Specifically, by measuring the first similarity between the voiceprint feature vector of each known class and the mean voiceprint vector of each class, and the second similarity between the voiceprint feature vector of each unknown class and the mean voiceprint vector of each class, the range of similarity values can be determined. This range of similarity values is then used as the range of voiceprint recognition threshold values, i.e., the voiceprint recognition threshold range.
[0043] Step 104: Calculate the accuracy of each voiceprint recognition threshold within the range of the voiceprint recognition threshold.
[0044] Specifically, the accuracy of each voiceprint recognition threshold within the range of voiceprint recognition thresholds is calculated. This allows us to determine the accuracy of the voiceprint recognition result obtained by performing open-set voiceprint recognition using each voiceprint recognition threshold within the range of voiceprint recognition thresholds.
[0045] Step 105: Determine the target voiceprint recognition threshold from the range of the voiceprint recognition thresholds based on the accuracy of each voiceprint recognition threshold.
[0046] Specifically, based on the accuracy corresponding to each voiceprint recognition threshold within the voiceprint recognition threshold range, the optimal target voiceprint recognition threshold can be determined from the voiceprint recognition threshold range. Open set voiceprint recognition can be performed through this target voiceprint recognition threshold, which can make the voiceprint recognition more accurate.
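Step 105 can be pictured with a short Python sketch that assumes "optimal" means the threshold with the highest accuracy, as the abstract suggests. The function name and the accuracy values are hypothetical and serve only as an illustration; ties are resolved in favor of the first threshold encountered, which is a choice of this sketch.

```python
def select_target_threshold(accuracy_by_threshold):
    """Pick the threshold whose accuracy is highest (first one wins on ties)."""
    if not accuracy_by_threshold:
        raise ValueError("at least one candidate threshold is required")
    return max(accuracy_by_threshold, key=accuracy_by_threshold.get)


# Hypothetical accuracies for four candidate thresholds.
accuracy_by_threshold = {0.6: 0.75, 0.7: 0.875, 0.8: 0.75, 0.9: 0.625}
target_threshold = select_target_threshold(accuracy_by_threshold)
```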
[0047] In the above embodiments of the present invention, based on the known class test voiceprint feature vectors of the known class voiceprint dataset and the unknown class voiceprint feature vectors of the unknown class voiceprint dataset, the similarity between them and the mean value of the voiceprint feature vectors of each class is calculated. By analyzing the similarity distribution between the same and different classes, the voiceprint recognition threshold range is first determined, and then the target voiceprint recognition threshold is determined by the accuracy of the voiceprint recognition threshold within the range. This method realizes the clustering of voiceprint features in high-dimensional space, avoids similarity calculation for all voiceprint feature vectors, greatly reduces the amount of computation, improves the speed of similarity distribution analysis, and improves computational efficiency.
[0048] In one specific embodiment, the method for determining the open-set voiceprint recognition threshold may further include:
[0049] Obtain a voiceprint data training set, which includes N known training class voiceprint data with category labels, where N is an integer greater than 1;
[0050] The voiceprint feature extraction model is used to extract the known class training voiceprint feature vector for each known class training voiceprint data.
[0051] Based on the known class training voiceprint feature vectors of each known class training voiceprint data, calculate the mean voiceprint vector of the known class training voiceprint feature vectors belonging to the same category.
[0052] Specifically, a voiceprint data training set needs to be obtained. This training set includes N voiceprint data points (i.e., audio data) and the category label of the known class corresponding to each voiceprint data point. Essentially, each voiceprint data point is a training voiceprint data point with a known class label. The deep learning network model is trained using the N known class training voiceprint data points in the training set to obtain the voiceprint feature extraction model. For example, if a database contains audio data from 400 people, each with a total duration exceeding 300 seconds, 300 people are selected as the voiceprint data training set, and the remaining 100 people are selected as part of the test set. The audio data of each person in the training set is divided into 100 segments of 3 seconds each, serving as the known class training voiceprint data in the training set. The segments beyond 300 seconds are also divided into known class test voiceprint data in the test set. Furthermore, 30 segments from each person's 100 segmented audio data can be used as a validation set, not participating in model training, but used to verify model accuracy. The model with the highest validation set accuracy is selected as the voiceprint feature extraction model.
[0053] In addition, each known class of training voiceprint data is input into the voiceprint feature extraction model. The model extracts the useful features from the known class of training voiceprint data and outputs the corresponding known class of training voiceprint feature vector. Through the above feature extraction method, N known class training voiceprint feature vectors are extracted corresponding to N known class training voiceprint data. Each known class training voiceprint feature vector is a multi-dimensional feature vector; for example, each known class training voiceprint feature vector is a 512-dimensional feature vector.
[0054] Since each known class training voiceprint data has a category label, each known class training voiceprint feature vector also has a category label. Therefore, the N known class training voiceprint feature vectors are grouped according to their category labels. The vector mean of the known class training voiceprint feature vectors belonging to the same category is calculated to obtain the voiceprint vector mean for that category. This voiceprint vector mean is a multi-dimensional vector of mean values, one per feature dimension. The above method yields the voiceprint vector mean for each category. Each voiceprint vector mean is stored in a database together with its category label. This method effectively compresses the high-dimensional, massive voiceprint data training set, reducing computational complexity.
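The grouping and averaging described above might look like the following minimal Python sketch. The function name, the label strings, and the two-dimensional toy vectors are assumptions for illustration; the embodiment mentions 512-dimensional feature vectors as one example.

```python
from collections import defaultdict


def class_mean_vectors(feature_vectors, labels):
    """Average the feature vectors that share a category label, dimension by dimension."""
    grouped = defaultdict(list)
    for vector, label in zip(feature_vectors, labels):
        grouped[label].append(vector)

    means = {}
    for label, vectors in grouped.items():
        dimension = len(vectors[0])
        means[label] = [
            sum(v[i] for v in vectors) / len(vectors) for i in range(dimension)
        ]
    return means


# Hypothetical toy training data: four vectors, two categories.
train_vectors = [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0], [2.0, 2.0]]
train_labels = ["a", "a", "b", "b"]
means = class_mean_vectors(train_vectors, train_labels)
```

Only one mean vector per category needs to be stored, which is where the compression mentioned above comes from.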
[0055] The training process of the voiceprint feature extraction model is explained below:
[0056] First, a deep learning network model is created. This model is a deep metric learning model, which can be divided into a backbone network layer, a projection layer, and a classification layer. The backbone network layer uses a residual network structure and outputs a one-dimensional feature vector through global average pooling. The projection layer uses a multi-layer perceptron (MLP) structure, preferably a three-layer MLP. Each layer has batch normalization (BN). The first two layers have rectified linear units (ReLU), while the last layer does not have ReLU, ultimately outputting a feature vector. The classification layer uses the normalized exponential function Softmax for multi-class classification, and the loss function is a weighted sum of the cross-entropy loss function and the circle loss function, as shown in the following formula:
[0057] LOSS=λ1*CrossEntropy Loss+λ2*Circle Loss
[0058] Where LOSS represents the loss function of a deep learning network model;
[0059] λ1 represents the weight value of CrossEntropy Loss;
[0060] λ2 represents the weight value of Circle Loss.
[0061] The aforementioned deep learning network model, through repeated iterations, weight updates, and optimizations across multiple training cycles, yields a convergent, stable, and highly accurate voiceprint recognition model. The classification layer is removed from the voiceprint recognition model, retaining only the backbone network layer and the projection layer, which together form the voiceprint feature extraction model. The backbone network layer extracts useful features, while the projection layer performs dimensionality reduction and feature optimization. The output of the projection layer is the output of the voiceprint feature extraction model. This voiceprint feature extraction model can express the characteristics of the original sound data more accurately with fewer dimensions, thus facilitating the clustering of similar feature vectors and the separation of dissimilar feature vectors in subsequent voiceprint recognition tasks.
[0062] As a specific embodiment of step 101, obtaining the known class test voiceprint feature vector corresponding to the known class voiceprint dataset and the unknown class voiceprint feature vector corresponding to the unknown class voiceprint dataset may specifically include:
[0063] Obtain a known class voiceprint dataset and an unknown class voiceprint dataset. The known class voiceprint dataset includes P known class test voiceprint data with category labels, and the unknown class voiceprint dataset includes Q unknown class voiceprint data without category labels, where P and Q are both integers greater than 1.
[0064] The voiceprint feature extraction model extracts the known class test voiceprint feature vector for each known class test voiceprint data and the unknown class voiceprint feature vector for each unknown class voiceprint data.
[0065] Specifically, to obtain the voiceprint recognition threshold used to distinguish whether a voiceprint belongs to a new category in an open-set voiceprint recognition task, it is necessary to obtain voiceprint data from a voiceprint test set. This test set includes known-class voiceprint datasets and unknown-class voiceprint datasets. The known-class voiceprint dataset includes P known-class test voiceprint data with known category labels, and the unknown-class voiceprint dataset includes Q unknown-class voiceprint data without category labels. Each known-class test voiceprint data is input into the voiceprint feature extraction model, and the corresponding known-class test voiceprint feature vector is output; similarly, each unknown-class voiceprint data is input into the voiceprint feature extraction model, and the corresponding unknown-class voiceprint feature vector is output.
[0066] As a specific embodiment of step 102, the first similarity and the second similarity are calculated using the Euclidean distance formula or the cosine distance formula.
[0067] The Euclidean distance calculation formula is as follows:
[0068] $\mathrm{dist}(X, Y) = \sqrt{\sum_{i=1}^{M} (x_i - y_i)^2}$
[0069] Where X represents a known class of test voiceprint feature vector or an unknown class of voiceprint feature vector;
[0070] Y represents the mean value of the voiceprint vector;
[0071] M represents the total dimension of X or Y;
[0072] i represents the i-th dimension of X or the i-th dimension of Y, where i is a value greater than 0 and less than or equal to M;
[0073] $x_i$ represents the i-th dimension of the voiceprint feature vector X;
[0074] $y_i$ represents the i-th dimension of the voiceprint vector mean Y;
[0075] dist(X, Y) represents the Euclidean distance, i.e., the first similarity or the second similarity.
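The Euclidean distance between a feature vector X and a voiceprint vector mean Y can be sketched in Python as follows. The function name and the toy vectors are assumptions for illustration. Note that a smaller Euclidean distance means the two vectors are closer, so how a distance is compared against a threshold is left open here.

```python
import math


def euclidean_distance(x, y):
    """Square root of the summed squared differences over all M dimensions."""
    if len(x) != len(y):
        raise ValueError("X and Y must have the same number of dimensions")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


# Hypothetical two-dimensional example (a 3-4-5 right triangle).
distance = euclidean_distance([0.0, 0.0], [3.0, 4.0])
```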
[0076] The formula for calculating the cosine distance is:
[0077] $\cos(\theta) = \dfrac{\sum_{j=1}^{M} x_j \, \bar{y}_j^{W}}{\sqrt{\sum_{j=1}^{M} x_j^{2}} \, \sqrt{\sum_{j=1}^{M} \left(\bar{y}_j^{W}\right)^{2}}}$
[0078]
[0079] Wherein, $\cos(\theta)_{(0,1)}$ represents the cosine distance, i.e., the first similarity or the second similarity;
[0080] M represents the total dimension of the known class test voiceprint feature vectors, the total dimension of the unknown class voiceprint feature vectors, or the total dimension of the mean voiceprint vectors;
[0081] j represents the j-th dimension of the mean voiceprint vector;
[0082] $x_j$ denotes $x_j^{\mathrm{known}}$ or $x_j^{\mathrm{unknown}}$, where $x_j^{\mathrm{known}}$ represents the j-th dimension of the known class test voiceprint feature vector and $x_j^{\mathrm{unknown}}$ represents the j-th dimension of the unknown class voiceprint feature vector;
[0083] W represents the W-th category;
[0084] $\bar{y}_j^{W}$ represents the j-th dimension of the voiceprint vector mean in the W-th category. Additionally, the voiceprint vector mean $\bar{y}^{W}$ is calculated as follows:
[0085] $\bar{y}^{W} = \dfrac{1}{n}\left(\sum_{d=1}^{n} x_{d,1}^{W},\ \sum_{d=1}^{n} x_{d,2}^{W},\ \ldots,\ \sum_{d=1}^{n} x_{d,M}^{W}\right)$
[0086] Where W represents the W-th category;
[0087] M represents the total dimension of the known class test voiceprint feature vectors or the total dimension of the unknown class voiceprint feature vectors;
[0088] n represents the number of known class test voiceprint feature vectors or the number of unknown class voiceprint feature vectors in the W-th category;
[0089] d represents the d-th known class test voiceprint feature vector or the d-th unknown class voiceprint feature vector in the W-th category;
[0090] $x_{d,1}^{W}$ represents the first dimension of the d-th known class test voiceprint feature vector in the W-th category, or the first dimension of the d-th unknown class voiceprint feature vector;
[0091] $x_{d,2}^{W}$ represents the second dimension of the d-th known class test voiceprint feature vector in the W-th category, or the second dimension of the d-th unknown class voiceprint feature vector;
[0092] $x_{d,M}^{W}$ represents the M-th dimension of the d-th known class test voiceprint feature vector in the W-th category, or the M-th dimension of the d-th unknown class voiceprint feature vector.
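A minimal Python sketch of the plain cosine value between a feature vector and the voiceprint vector mean of one category is given below. The function name and the toy vectors are assumptions for illustration, and no rescaling of the result is applied, since the sketch only covers the standard cosine definition.

```python
import math


def cosine_similarity(x, class_mean):
    """Dot product of the two vectors divided by the product of their lengths."""
    if len(x) != len(class_mean):
        raise ValueError("both vectors must have the same number of dimensions")
    dot = sum(a * b for a, b in zip(x, class_mean))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_mean = math.sqrt(sum(b * b for b in class_mean))
    if norm_x == 0.0 or norm_mean == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_x * norm_mean)


# Hypothetical toy vectors: identical direction and orthogonal direction.
same_direction = cosine_similarity([2.0, 0.0], [5.0, 0.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

Unlike the Euclidean distance, a larger cosine value indicates that the feature vector points in a direction closer to the class mean.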
[0093] As a specific embodiment of step 103, determining the voiceprint recognition threshold range based on the first similarity and the second similarity may specifically include:
[0094] Step 1031: Based on the first similarity between the test voiceprint feature vector of each known class and the mean voiceprint vector of each category, obtain the distribution map of the known class voiceprint feature vectors with respect to the first similarity.
[0095] Specifically, by using the first similarity between the voiceprint feature vector of each known class and the mean voiceprint vector of each class, a distribution map of the known class voiceprint feature vectors with respect to the first similarity is obtained. In other words, by using the first similarity between the voiceprint feature vector of each known class and the mean voiceprint vector of each class, a distribution map of the known class voiceprint feature vectors with respect to the first similarity is plotted on a Cartesian coordinate system where the horizontal axis represents similarity and the vertical axis represents the number of voiceprint feature vectors.
[0096] Step 1032: Based on the second similarity between each unknown class voiceprint feature vector and the mean of each class voiceprint vector, obtain the distribution map of the unknown class voiceprint feature vector with respect to the second similarity.
[0097] Specifically, similar to the method for obtaining the distribution map of known class voiceprint feature vectors, the distribution map of unknown class voiceprint feature vectors is obtained by using the second similarity between each unknown class voiceprint feature vector and the average voiceprint vector of each class. In other words, the distribution map of unknown class voiceprint feature vectors is obtained by plotting the second similarity between each unknown class voiceprint feature vector and the average voiceprint vector of each class on a Cartesian coordinate system where the horizontal axis represents similarity and the vertical axis represents the number of voiceprint feature vectors.
[0098] Step 1033: Determine the voiceprint recognition threshold range based on the known voiceprint feature vector distribution map and the unknown voiceprint feature vector distribution map.
[0099] Specifically, by placing the distribution maps of known and unknown voiceprint feature vectors in the same Cartesian coordinate system, the distribution and overlap between the known and unknown voiceprint feature vector distribution maps can be clearly displayed, thereby determining the range of voiceprint recognition threshold values.
[0100] It should be noted that the above-mentioned distribution maps of known and unknown voiceprint feature vectors are only examples of how to display them using distribution maps; other methods can also be used. Furthermore, plotting the distribution maps of known and unknown voiceprint feature vectors in a Cartesian coordinate system is only one example; other methods can also be used, and no specific limitation is made.
[0101] As a specific embodiment of step 1033, determining the voiceprint recognition threshold range based on the known voiceprint feature vector distribution map and the unknown voiceprint feature vector distribution map may specifically include:
[0102] Obtain the first space formed between the known type of voiceprint feature vector distribution map and the axis of the first similarity, and the second space formed between the unknown type of voiceprint feature vector distribution map and the axis of the second similarity;
[0103] Obtain the overlapping portion of the first space and the second space;
[0104] The similarity range corresponding to the overlapping portion is used as the voiceprint recognition threshold range.
[0105] Specifically, a first space is obtained between the distribution map of known voiceprint feature vectors and the coordinate axis of the first similarity, and a second space is obtained between the distribution map of unknown voiceprint feature vectors and the coordinate axis of the second similarity. This allows us to intuitively know the overlapping part between the first space and the second space, and thus know the range of similarity values of the overlapping part. The range of similarity values of the overlapping part is used to determine the range of voiceprint recognition threshold. In other words, the range of similarity values of the overlapping part is the range of voiceprint recognition threshold.
[0106] The above embodiments are illustrated below with a specific example:
[0107] As shown in Figure 2, the horizontal axis represents similarity, and the vertical axis represents the number of similarities. If a distribution map of known class voiceprint feature vectors is plotted in this coordinate system, the similarity on the horizontal axis corresponds to the first similarity, and the number of similarities on the vertical axis corresponds to the number of known class test voiceprint feature vectors. If a distribution map of unknown class voiceprint feature vectors is plotted in this coordinate system, the similarity on the horizontal axis corresponds to the second similarity, and the number of similarities on the vertical axis corresponds to the number of unknown class voiceprint feature vectors. The dashed lines in the coordinate system represent the distribution map of unknown class voiceprint feature vectors, and the solid lines represent the distribution map of known class voiceprint feature vectors. The space formed between the solid lines and the horizontal axis is the first space, and the space formed between the dashed lines and the horizontal axis is the second space. The overlapping portion of the first and second spaces can be clearly seen in the diagram. The similarity values of this overlapping portion range from 0.6 to 1, so the voiceprint recognition threshold range is 0.6 to 1.
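One simple way to approximate the overlapping portion numerically, without drawing the curves, is sketched below in Python. It assumes that each distribution covers a single contiguous interval of similarity values, so the overlap is the intersection of the two intervals; the function name and the sample values are hypothetical and are not the data behind Figure 2.

```python
def overlap_range(first_similarities, second_similarities):
    """Intersect the value ranges spanned by the two similarity distributions.

    Returns (low, high), or None when the two ranges do not overlap.
    """
    low = max(min(first_similarities), min(second_similarities))
    high = min(max(first_similarities), max(second_similarities))
    if low > high:
        return None
    return (low, high)


# Hypothetical flat lists of similarity values for known and unknown classes.
known_similarities = [0.62, 0.80, 0.95, 0.99]
unknown_similarities = [0.20, 0.45, 0.70, 0.90]
threshold_range = overlap_range(known_similarities, unknown_similarities)
```

With these sample values the known class similarities start at 0.62 and the unknown class similarities end at 0.90, so candidate thresholds would be searched between those two values.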
[0108] As a specific embodiment of step 104, calculating the accuracy of each voiceprint recognition threshold within the voiceprint recognition threshold range may specifically include:
[0109] Obtain S voiceprint recognition thresholds within the specified voiceprint recognition threshold range, where S is an integer greater than 1;
[0110] For the first voiceprint recognition threshold, a first number of known class test voiceprint feature vectors with a first similarity greater than the first voiceprint recognition threshold is obtained, and a second number of unknown class voiceprint feature vectors with a second similarity less than the first voiceprint recognition threshold is obtained. The first voiceprint recognition threshold represents any one of the S voiceprint recognition thresholds.
[0111] The accuracy of the first voiceprint recognition threshold is calculated based on the first quantity and the second quantity.
[0112] Based on the accuracy of the first voiceprint recognition threshold, the accuracy of the S voiceprint recognition thresholds is obtained.
[0113] Specifically, S voiceprint recognition thresholds are obtained within the voiceprint recognition threshold range. The thresholds can be selected randomly or according to a set pattern, for example in ascending order with a preset value between adjacent thresholds. For example, the minimum value of the range can be used as the first of the S thresholds; the second threshold is the sum of the first threshold and the preset value; the third threshold is the sum of the second threshold and the preset value; and so on, until the last voiceprint recognition threshold within the range is obtained.
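The fixed-step pattern described above can be sketched as follows. This is an illustrative sketch only: the function name `threshold_grid`, the rounding to ten decimal places, and the small tolerance used to absorb floating-point error are assumptions, not details from the embodiment.

```python
import math
from typing import List


def threshold_grid(lower: float, upper: float, step: float) -> List[float]:
    """Return candidate thresholds lower, lower + step, ... that do not exceed upper."""
    if step <= 0:
        raise ValueError("step must be positive")
    if upper < lower:
        raise ValueError("upper must not be below lower")
    # Index-based generation avoids accumulating floating-point error that
    # repeated addition would introduce; 1e-9 is an assumed tolerance.
    steps = math.floor((upper - lower) / step + 1e-9)
    return [round(lower + i * step, 10) for i in range(steps + 1)]
```

With a range of 0.6 to 1 and a preset value of 0.1, this yields five candidate thresholds, which satisfies the requirement that S be greater than 1.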
[0114] For the first voiceprint recognition threshold among S voiceprint recognition thresholds, a first similarity greater than the first voiceprint recognition threshold indicates that the known class test voiceprint feature vector is correctly classified, and a second similarity less than the first voiceprint recognition threshold indicates that the unknown class voiceprint feature vector is correctly classified. Therefore, based on the distribution map of known class voiceprint feature vectors, the total number of known class test voiceprint feature vectors with a first similarity greater than the first voiceprint recognition threshold (i.e., the first number) is obtained, and based on the distribution map of unknown class voiceprint feature vectors, the total number of unknown class voiceprint feature vectors with a second similarity less than the first voiceprint recognition threshold (i.e., the second number) is obtained. The accuracy of the first voiceprint recognition threshold can be calculated using the first and second numbers. The accuracy of each of the S voiceprint recognition thresholds is calculated in the above manner. Therefore, the accuracy of the voiceprint recognition result obtained by performing open-set voiceprint recognition using each voiceprint recognition threshold within its range can be determined.
[0115] Furthermore, the accuracy of the first voiceprint recognition threshold is calculated using the following formula:
[0116] $$\mathrm{ACC} = \frac{M_1 + M_2}{C}$$
[0117] Wherein, ACC represents the accuracy of the first voiceprint recognition threshold;
[0118] $M_1$ represents the first quantity; $M_2$ represents the second quantity;
[0119] $C$ represents the sum of the number of first similarities and the number of second similarities.
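As an illustration of the quantities defined above, the sketch below counts the first quantity and the second quantity for one threshold and divides their sum by C. It assumes the accuracy is (M1 + M2) / C, which is what the definitions of M1, M2 and C suggest, and that each feature vector contributes exactly one similarity value; the function name `threshold_accuracy` is hypothetical.

```python
from typing import Sequence


def threshold_accuracy(
    known_sims: Sequence[float],
    unknown_sims: Sequence[float],
    threshold: float,
) -> float:
    """Accuracy of one candidate threshold, assuming ACC = (M1 + M2) / C."""
    # M1: known class test vectors whose first similarity exceeds the threshold.
    m1 = sum(1 for s in known_sims if s > threshold)
    # M2: unknown class vectors whose second similarity is below the threshold.
    m2 = sum(1 for s in unknown_sims if s < threshold)
    # C: total number of first and second similarities.
    c = len(known_sims) + len(unknown_sims)
    if c == 0:
        raise ValueError("at least one similarity value is required")
    return (m1 + m2) / c
```

For example, with four known class similarities of which three exceed the threshold, and four unknown class similarities of which three fall below it, the accuracy is 6 / 8 = 0.75.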
[0120] As a specific embodiment of step 105, determining the target voiceprint recognition threshold from the range of voiceprint recognition thresholds based on the accuracy of each voiceprint recognition threshold may specifically include:
[0121] Based on the accuracy of each voiceprint recognition threshold, the voiceprint recognition threshold with the highest accuracy within the range of voiceprint recognition thresholds is determined as the target voiceprint recognition threshold.
[0122] Specifically, based on the accuracy of each voiceprint recognition threshold within the range of voiceprint recognition thresholds, the voiceprint recognition threshold with the highest accuracy within the range can be determined as the target voiceprint recognition threshold. Open set voiceprint recognition can be performed using this target voiceprint recognition threshold, which can make the voiceprint recognition more accurate.
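Selecting the target threshold then reduces to picking the candidate with the highest accuracy. The sketch below is a hedged illustration: the function name `select_target_threshold` is hypothetical, and the tie-breaking rule (keep the first candidate that reaches the maximum) is an assumption, since the text does not say how ties are resolved.

```python
from typing import Sequence, Tuple


def select_target_threshold(
    thresholds: Sequence[float],
    accuracies: Sequence[float],
) -> Tuple[float, float]:
    """Return (threshold, accuracy) for the candidate with the highest accuracy."""
    if not thresholds or len(thresholds) != len(accuracies):
        raise ValueError("thresholds and accuracies must be non-empty and equal in length")
    # max() returns the first index that attains the maximum, so on a tie
    # the earliest candidate in the sequence is kept (assumed tie-break).
    best = max(range(len(thresholds)), key=lambda i: accuracies[i])
    return thresholds[best], accuracies[best]
```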
[0123] Furthermore, each voiceprint recognition threshold within the threshold range can be plotted against its corresponding accuracy to generate a Receiver Operating Characteristic (ROC) curve representing the relationship between the voiceprint recognition threshold and accuracy. For example, as shown in Figure 3, the horizontal axis represents the voiceprint recognition threshold, and the vertical axis represents the accuracy. This curve provides a clear overview of the accuracy corresponding to each voiceprint recognition threshold, as well as the highest accuracy value. In Figure 3, the accuracy is highest, at 0.7, for the voiceprint recognition threshold of 0.8472. Therefore, 0.8472 can be determined as the target voiceprint recognition threshold.
[0124] The process of determining the open-set voiceprint recognition threshold is illustrated below through a specific embodiment:
[0125] As shown in Figure 4, step 401: Obtain the voiceprint data training set, and train the deep learning network model on it to obtain the voiceprint feature extraction model.
[0126] Step 402: Extract the known class training voiceprint feature vector of each known class training voiceprint data in the voiceprint data training set through the voiceprint feature extraction model, and calculate the mean voiceprint vector of the known class training voiceprint feature vectors belonging to the same category.
[0127] Step 403: Obtain the known class voiceprint dataset and the unknown class voiceprint dataset from the voiceprint data test set, and extract the known class test voiceprint feature vector of each known class test voiceprint data in the known class voiceprint dataset using the voiceprint feature extraction model, and extract the unknown class voiceprint feature vector of each unknown class voiceprint data in the unknown class voiceprint dataset using the voiceprint feature extraction model.
[0128] Step 404: Calculate the first similarity between the test voiceprint feature vector of each known class and the mean voiceprint vector of each class, and calculate the second similarity between the voiceprint feature vector of each unknown class and the mean voiceprint vector of each class.
[0129] Step 405: Based on the first similarity between the known class test voiceprint feature vector and the mean voiceprint vector of each class, obtain the distribution map of the known class voiceprint feature vector with respect to the first similarity; and based on the second similarity between the unknown class voiceprint feature vector and the mean voiceprint vector of each class, obtain the distribution map of the unknown class voiceprint feature vector with respect to the second similarity.
[0130] Step 406: Obtain the first space formed between the known type voiceprint feature vector distribution map and the axis of the first similarity, and the second space formed between the unknown type voiceprint feature vector distribution map and the axis of the second similarity.
[0131] Step 407: Obtain the overlapping portion of the first space and the second space, and use the similarity range corresponding to the overlapping portion as the voiceprint recognition threshold range.
[0132] Step 408: For the first voiceprint recognition threshold within the voiceprint recognition threshold range, obtain the first number of known class test voiceprint feature vectors with the first similarity greater than the first voiceprint recognition threshold, and obtain the second number of unknown class voiceprint feature vectors with the second similarity less than the first voiceprint recognition threshold.
[0133] Step 409: Calculate the accuracy of the first voiceprint recognition threshold based on the first quantity and the second quantity.
[0134] Step 410: Based on the accuracy of each voiceprint recognition threshold, determine the voiceprint recognition threshold with the highest accuracy within the range of the voiceprint recognition thresholds as the target voiceprint recognition threshold.
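Steps 402 and 404 above (per-class mean vectors and similarity against those means) can be sketched as follows. This is a sketch under stated assumptions: cosine similarity is used as the similarity measure, and each vector is scored by its highest similarity over all class means; neither choice is confirmed by this passage. The function names and the plain-list vector representation are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def class_means(
    vectors: Sequence[Sequence[float]],
    labels: Sequence[Hashable],
) -> Dict[Hashable, List[float]]:
    """Average the training feature vectors that share a category label (step 402)."""
    grouped = defaultdict(list)
    for vec, label in zip(vectors, labels):
        grouped[label].append(vec)
    # zip(*vecs) iterates over dimensions, so each mean is computed per dimension.
    return {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in grouped.items()
    }


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; assumed here, the similarity measure may differ in practice."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_similarity(vector: Sequence[float], means: Dict[Hashable, List[float]]) -> float:
    """Highest similarity between one test vector and any class mean (step 404, assumed)."""
    return max(cosine_similarity(vector, mean) for mean in means.values())
```

Because each test vector is compared with one mean per category rather than with every training vector, the number of similarity computations grows with the number of categories, not with the size of the training set.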
[0135] In summary, the embodiments of the present invention exploit two properties of the voiceprint feature extraction model, namely that feature vectors of the same class cluster together and feature vectors of different classes are separated, so that the mean voiceprint vector can represent each category. This avoids subsequent similarity calculations against all voiceprint feature vectors, which improves processing speed and reduces computational complexity. Furthermore, the similarity between the known class test and unknown class voiceprint feature vectors and the mean voiceprint vector of each category is calculated, and the corresponding distribution maps of known class and unknown class voiceprint feature vectors are plotted. The overlap between these distribution maps allows the voiceprint recognition threshold range to be determined quickly. By calculating the accuracy of each voiceprint recognition threshold within this range, the target voiceprint recognition threshold with the highest accuracy can be determined quickly. The entire process improves the speed and accuracy of obtaining the voiceprint recognition threshold in open-set voiceprint recognition, and thereby improves the robustness, reliability, and efficiency of the system.
[0136] The above describes the method for determining the open set voiceprint recognition threshold provided by the embodiments of the present invention. The following will describe the device for determining the open set voiceprint recognition threshold provided by the embodiments of the present invention in conjunction with the accompanying drawings.
[0137] As shown in Figure 5, this embodiment of the invention also provides a device 500 for determining an open-set voiceprint recognition threshold, the device comprising:
[0138] The first acquisition module 501 is used to acquire the known class test voiceprint feature vectors corresponding to the known class voiceprint dataset and the unknown class voiceprint feature vectors corresponding to the unknown class voiceprint dataset.
[0139] The first calculation module 502 is used to calculate the first similarity between the test voiceprint feature vector of each known class and the mean voiceprint vector of each class, and to calculate the second similarity between the voiceprint feature vector of each unknown class and the mean voiceprint vector of each class.
[0140] The first determining module 503 is used to determine the voiceprint recognition threshold range based on the first similarity and the second similarity.
[0141] The second calculation module 504 is used to calculate the accuracy of each voiceprint recognition threshold within the range of the voiceprint recognition threshold.
[0142] The second determining module 505 is used to determine a target voiceprint recognition threshold from the range of the voiceprint recognition thresholds based on the accuracy of each voiceprint recognition threshold.
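The five modules above can be pictured as five pluggable functions wired together in sequence. The sketch below only illustrates this division of responsibilities; the class name `ThresholdDevice`, the field names, and the callable signatures are hypothetical and do not describe how device 500 is actually implemented.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Scored = List[Tuple[float, float]]  # (threshold, accuracy) pairs


@dataclass
class ThresholdDevice:
    # Module 501: returns (known class test vectors, unknown class vectors).
    acquire: Callable[[], Tuple[List[Vector], List[Vector]]]
    # Module 502: maps feature vectors to similarities against the class means.
    similarities: Callable[[List[Vector]], List[float]]
    # Module 503: derives the threshold range from first and second similarities.
    threshold_range: Callable[[List[float], List[float]], Tuple[float, float]]
    # Module 504: scores candidate thresholds within the range.
    accuracies: Callable[[List[float], List[float], Tuple[float, float]], Scored]
    # Module 505: picks the target threshold from the scored candidates.
    select: Callable[[Scored], float]

    def run(self) -> float:
        """Invoke the five modules in the order given in the description."""
        known_vecs, unknown_vecs = self.acquire()
        first = self.similarities(known_vecs)
        second = self.similarities(unknown_vecs)
        rng = self.threshold_range(first, second)
        scored = self.accuracies(first, second, rng)
        return self.select(scored)
```

Keeping each module behind a plain callable makes it straightforward to merge or split modules, which is consistent with the statement that the division into units is only one possible logical division.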
[0143] In the above embodiments of the present invention, based on the known class test voiceprint feature vectors of the known class voiceprint dataset and the unknown class voiceprint feature vectors of the unknown class voiceprint dataset, the similarity between them and the mean value of the voiceprint feature vectors of each class is calculated. By analyzing the similarity distribution between the same and different classes, the voiceprint recognition threshold range is first determined, and then the target voiceprint recognition threshold is determined by the accuracy of the voiceprint recognition threshold within the range. This method realizes the clustering of voiceprint features in high-dimensional space, avoids similarity calculation for all voiceprint feature vectors, greatly reduces the amount of computation, improves the speed of similarity distribution analysis, and improves computational efficiency.
[0144] Optionally, the device further includes:
[0145] The second acquisition module is used to acquire a voiceprint data training set, which includes N known class training voiceprint data with category labels, where N is an integer greater than 1.
[0146] The first extraction module is used to extract the known class training voiceprint feature vector of each known class training voiceprint data through the voiceprint feature extraction model;
[0147] The third calculation module is used to calculate the average voiceprint vector of the known class training voiceprint feature vectors belonging to the same category, based on the known class training voiceprint feature vector of each known class training voiceprint data.
[0148] Optionally, the first acquisition module 501 is specifically used for:
[0149] Obtain a known class voiceprint dataset and an unknown class voiceprint dataset. The known class voiceprint dataset includes P known class test voiceprint data with category labels, and the unknown class voiceprint dataset includes Q unknown class voiceprint data without category labels, where P and Q are both integers greater than 1.
[0150] The voiceprint feature extraction model extracts the known class test voiceprint feature vector for each known class test voiceprint data and the unknown class voiceprint feature vector for each unknown class voiceprint data.
[0151] Optionally, the first determining module 503 is specifically used for:
[0152] Based on the first similarity between the voiceprint feature vector of each known class and the mean voiceprint vector of each category, obtain the distribution map of the voiceprint feature vector of the known class with respect to the first similarity.
[0153] Based on the second similarity between each unknown class voiceprint feature vector and the mean voiceprint vector of each category, obtain the distribution map of the unknown class voiceprint feature vector with respect to the second similarity.
[0154] Based on the distribution map of known voiceprint feature vectors and the distribution map of unknown voiceprint feature vectors, the voiceprint recognition threshold range is determined.
[0155] Optionally, when the first determining module 503 determines the voiceprint recognition threshold range based on the known type voiceprint feature vector distribution map and the unknown type voiceprint feature vector distribution map, it is specifically used for:
[0156] Obtain the first space formed between the known type of voiceprint feature vector distribution map and the axis of the first similarity, and the second space formed between the unknown type of voiceprint feature vector distribution map and the axis of the second similarity;
[0157] Obtain the overlapping portion of the first space and the second space;
[0158] The similarity range corresponding to the overlapping portion is used as the voiceprint recognition threshold range.
[0159] Optionally, the second calculation module 504 is specifically used for:
[0160] Obtain S voiceprint recognition thresholds within the specified voiceprint recognition threshold range;
[0161] For the first voiceprint recognition threshold, a first number of known class test voiceprint feature vectors with a first similarity greater than the first voiceprint recognition threshold is obtained, and a second number of unknown class voiceprint feature vectors with a second similarity less than the first voiceprint recognition threshold is obtained. The first voiceprint recognition threshold represents any one of the S voiceprint recognition thresholds.
[0162] The accuracy of the first voiceprint recognition threshold is calculated based on the first quantity and the second quantity.
[0163] Based on the accuracy of the first voiceprint recognition threshold, the accuracy of the S voiceprint recognition thresholds is obtained.
[0164] Optionally, the accuracy of the first voiceprint recognition threshold is calculated using the following formula:
[0165] $$\mathrm{ACC} = \frac{M_1 + M_2}{C}$$
[0166] Wherein, ACC represents the accuracy of the first voiceprint recognition threshold;
[0167] $M_1$ represents the first quantity; $M_2$ represents the second quantity;
[0168] $C$ represents the sum of the number of first similarities and the number of second similarities.
[0169] Optionally, the second determining module 505 is specifically used for:
[0170] Based on the accuracy of each voiceprint recognition threshold, the voiceprint recognition threshold with the highest accuracy within the range of voiceprint recognition thresholds is determined as the target voiceprint recognition threshold.
[0171] In summary, the embodiments of the present invention exploit two properties of the voiceprint feature extraction model, namely that feature vectors of the same class cluster together and feature vectors of different classes are separated, so that the mean voiceprint vector can represent each category. This avoids subsequent similarity calculations against all voiceprint feature vectors, which improves processing speed and reduces computational complexity. Furthermore, the similarity between the known class test and unknown class voiceprint feature vectors and the mean voiceprint vector of each category is calculated, and the corresponding distribution maps of known class and unknown class voiceprint feature vectors are plotted. The overlap between these distribution maps allows the voiceprint recognition threshold range to be determined quickly. By calculating the accuracy of each voiceprint recognition threshold within this range, the target voiceprint recognition threshold with the highest accuracy can be determined quickly. The entire process improves the speed and accuracy of obtaining the voiceprint recognition threshold in open-set voiceprint recognition, and thereby improves the robustness, reliability, and efficiency of the system.
[0172] It should be noted that the device for determining the open set voiceprint recognition threshold provided in this embodiment of the invention can implement all the method steps implemented in the above-mentioned method embodiment for determining the open set voiceprint recognition threshold, and can achieve the same technical effect. Here, the parts that are the same as those in the method embodiment and the beneficial effects will not be described in detail.
[0173] It should be noted that the division of units in the embodiments of this application is illustrative and only represents one logical functional division. In actual implementation, other division methods may be used. Furthermore, the functional units in the various embodiments of this application can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated units described above can be implemented in hardware or as software functional units.
[0174] If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a processor-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) or processor to execute all or part of the steps of the methods described in the various embodiments of this application. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.
[0175] As shown in Figure 6, embodiments of the present invention also provide an electronic device, including a memory 620, a transceiver 610, and a processor 600:
[0176] Memory 620 is used to store computer programs;
[0177] Transceiver 610 is used to send and receive data under the control of the processor;
[0178] Processor 600 is configured to read a computer program from memory and execute the steps of the method for determining the open-set voiceprint recognition threshold as described in any of the above embodiments.
[0179] In Figure 6, the bus architecture can include any number of interconnected buses and bridges, which link together various circuits including one or more processors, represented by processor 600, and memory, represented by memory 620. The bus architecture can also link together various other circuits such as peripheral devices, voltage regulators, and power management circuits, which are well known in the art and therefore will not be described further herein. The bus interface provides an interface. The transceiver 610 can be multiple elements, including a transmitter and a receiver, providing a unit for communicating with various other devices over transmission media, including wireless channels, wired channels, optical fibers, etc. The processor 600 is responsible for managing the bus architecture and general processing, and the memory 620 can store data used by the processor 600 during operation.
[0180] The processor 600 can be a central processing unit (CPU), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or a complex programmable logic device (CPLD). The processor can also adopt a multi-core architecture.
[0181] The processor invokes a computer program stored in memory to execute any of the open-set voiceprint recognition threshold determination methods provided in the embodiments of this application, according to the obtained executable instructions. The processor and memory can also be physically separated.
[0182] It should be noted that the electronic device provided in this embodiment of the invention can implement all the method steps implemented in the above embodiment of the method for determining the open set voiceprint recognition threshold, and can achieve the same technical effect. Here, the parts that are the same as those in the method embodiment and the beneficial effects will not be described in detail.
[0183] Embodiments of the present invention also provide a processor-readable storage medium storing a computer program for causing the processor to execute the above-described method for determining the open-set voiceprint recognition threshold.
[0184] The processor-readable storage medium can be any available medium or data storage device that the processor can access, including but not limited to magnetic memory (e.g., floppy disk, hard disk, magnetic tape, magneto-optical disk (MO)), optical memory (e.g., CD, DVD, BD, HVD), and semiconductor memory (e.g., ROM, EPROM, EEPROM, non-volatile memory (NAND FLASH), solid-state drive (SSD)).
[0185] Those skilled in the art will understand that embodiments of this application can be provided as methods, systems, or computer program products. Therefore, this application can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, this application can take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage and optical storage) containing computer-usable program code.
[0186] This application is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of this application. It will be understood that each flow and/or block of the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer-executable instructions. These computer-executable instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce a device for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
[0187] These processor-executable instructions may also be stored in a processor-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the processor-readable memory produce an article of manufacture including an instruction device that implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
[0188] These processor-executable instructions can also be loaded onto a computer or other programmable data processing device, causing a series of operational steps to be performed on the computer or other programmable device to produce computer-implemented processing, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
[0189] Obviously, those skilled in the art can make various modifications and variations to this application without departing from the spirit and scope of this application. Therefore, if such modifications and variations fall within the scope of the claims of this application and their equivalents, this application also intends to include such modifications and variations.
Claims
1. A method for determining the threshold of open-set voiceprint recognition, characterized in that, The method includes: Obtain the known class test voiceprint feature vectors corresponding to the known class voiceprint dataset and the unknown class voiceprint feature vectors corresponding to the unknown class voiceprint dataset; Calculate the first similarity between the test voiceprint feature vector of each known class and the mean voiceprint vector of each class, and calculate the second similarity between the voiceprint feature vector of each unknown class and the mean voiceprint vector of each class; Based on the first similarity and the second similarity, the voiceprint recognition threshold range is determined; Calculate the accuracy of each voiceprint recognition threshold within the voiceprint recognition threshold range; Based on the accuracy of each voiceprint recognition threshold, a target voiceprint recognition threshold is determined from the voiceprint recognition threshold range; Wherein calculating the accuracy of each voiceprint recognition threshold within the voiceprint recognition threshold range includes: Obtain S voiceprint recognition thresholds within the voiceprint recognition threshold range, where S is an integer greater than 1; For the first voiceprint recognition threshold, a first number of known class test voiceprint feature vectors with a first similarity greater than the first voiceprint recognition threshold is obtained, and a second number of unknown class voiceprint feature vectors with a second similarity less than the first voiceprint recognition threshold is obtained, where the first voiceprint recognition threshold represents any one of the S voiceprint recognition thresholds; The accuracy of the first voiceprint recognition threshold is calculated based on the first quantity and the second quantity;
Based on the accuracy of the first voiceprint recognition threshold, the accuracy of the S voiceprint recognition thresholds is obtained.
2. The method according to claim 1, characterized in that, The method further includes: Obtain a voiceprint data training set, which includes N known class training voiceprint data with category labels, where N is an integer greater than 1; The voiceprint feature extraction model is used to extract the known class training voiceprint feature vector for each known class training voiceprint data. Based on the known class training voiceprint feature vectors of each known class training voiceprint data, calculate the mean voiceprint vector of the known class training voiceprint feature vectors belonging to the same category.
3. The method according to claim 1, characterized in that, The process of obtaining the known class test voiceprint feature vectors corresponding to the known class voiceprint dataset and the unknown class voiceprint feature vectors corresponding to the unknown class voiceprint dataset includes: Obtain a known class voiceprint dataset and an unknown class voiceprint dataset. The known class voiceprint dataset includes P known class test voiceprint data with category labels, and the unknown class voiceprint dataset includes Q unknown class voiceprint data without category labels, where P and Q are both integers greater than 1. The voiceprint feature extraction model extracts the known class test voiceprint feature vector for each known class test voiceprint data and the unknown class voiceprint feature vector for each unknown class voiceprint data.
4. The method according to claim 1, characterized in that, Determining the voiceprint recognition threshold range based on the first similarity and the second similarity includes: Based on the first similarity between the voiceprint feature vector of each known class and the mean voiceprint vector of each category, obtain the distribution map of the voiceprint feature vector of the known class with respect to the first similarity. Based on the second similarity between each unknown class voiceprint feature vector and the mean voiceprint vector of each category, obtain the distribution map of the unknown class voiceprint feature vector with respect to the second similarity. Based on the distribution map of known voiceprint feature vectors and the distribution map of unknown voiceprint feature vectors, the voiceprint recognition threshold range is determined.
5. The method according to claim 4, characterized in that, The step of determining the voiceprint recognition threshold range based on the known voiceprint feature vector distribution map and the unknown voiceprint feature vector distribution map includes: Obtain the first space formed between the known type of voiceprint feature vector distribution map and the axis of the first similarity, and the second space formed between the unknown type of voiceprint feature vector distribution map and the axis of the second similarity; Obtain the overlapping portion of the first space and the second space; The similarity range corresponding to the overlapping portion is used as the voiceprint recognition threshold range.
6. The method according to claim 1, characterized in that, The accuracy of the first voiceprint recognition threshold is calculated using the following formula: $\mathrm{ACC} = \frac{M_1 + M_2}{C}$; Wherein, ACC represents the accuracy of the first voiceprint recognition threshold; $M_1$ represents the first quantity; $M_2$ represents the second quantity; $C$ represents the sum of the number of first similarities and the number of second similarities.
7. The method according to claim 1, characterized in that, The step of determining the target voiceprint recognition threshold from the range of voiceprint recognition thresholds based on the accuracy of each voiceprint recognition threshold includes: Based on the accuracy of each voiceprint recognition threshold, the voiceprint recognition threshold with the highest accuracy within the range of voiceprint recognition thresholds is determined as the target voiceprint recognition threshold.
8. A device for determining an open-set voiceprint recognition threshold, characterized in that the device includes: a first acquisition module, used to acquire the known-class test voiceprint feature vectors corresponding to a known-class voiceprint dataset and the unknown-class voiceprint feature vectors corresponding to an unknown-class voiceprint dataset; a first calculation module, used to calculate the first similarity between each known-class voiceprint feature vector and the mean voiceprint vector of each category, and to calculate the second similarity between each unknown-class voiceprint feature vector and the mean voiceprint vector of each category; a first determining module, used to determine the voiceprint recognition threshold range based on the first similarity and the second similarity; a second calculation module, used to calculate the accuracy of each voiceprint recognition threshold within the voiceprint recognition threshold range; and a second determining module, used to determine a target voiceprint recognition threshold from the voiceprint recognition threshold range based on the accuracy of each voiceprint recognition threshold. The second calculation module is specifically used to: obtain S voiceprint recognition thresholds within the voiceprint recognition threshold range, where S is an integer greater than 1; for a first voiceprint recognition threshold, obtain a first quantity of known-class test voiceprint feature vectors whose first similarity is greater than the first voiceprint recognition threshold, and a second quantity of unknown-class voiceprint feature vectors whose second similarity is less than the first voiceprint recognition threshold, where the first voiceprint recognition threshold represents any one of the S voiceprint recognition thresholds; calculate the accuracy of the first voiceprint recognition threshold based on the first quantity and the second quantity; and obtain the accuracies of the S voiceprint recognition thresholds based on the accuracy of the first voiceprint recognition threshold.
9. An electronic device, characterized in that it includes a memory, a transceiver, and a processor: the memory is used to store a computer program; the transceiver is used to send and receive data under the control of the processor; and the processor is used to read the computer program from the memory and execute the method for determining an open-set voiceprint recognition threshold as described in any one of claims 1 to 7.
10. A processor-readable storage medium, characterized in that the processor-readable storage medium stores a computer program for causing a processor to perform the method for determining an open-set voiceprint recognition threshold as described in any one of claims 1 to 7.
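
As an illustration only, and not part of the claims, the threshold selection described in claims 4 to 8 might be sketched in Python as follows. Several choices here are assumptions rather than statements from the claims: similarity is taken to be cosine similarity, each vector is scored by its highest similarity over all category mean vectors, and the overlapping portion of the two distributions is approximated by the interval from the smallest first similarity to the largest second similarity. The function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def max_similarity_to_means(vectors, class_means):
    """Score each vector by its best similarity over all category means.

    Assumption: the claims do not say how per-category similarities are
    reduced to one score; taking the maximum is one plausible choice.
    """
    return [max(cosine(v, m) for m in class_means) for v in vectors]


def threshold_range(first_sim, second_sim):
    """Approximate the overlap of the two similarity distributions.

    Assumption: the overlap is the interval between the smallest
    known-class score and the largest unknown-class score. If the two
    distributions do not overlap, the gap between them is returned.
    """
    lo, hi = min(first_sim), max(second_sim)
    return (lo, hi) if lo <= hi else (hi, lo)


def accuracy(threshold, first_sim, second_sim):
    """ACC = (M1 + M2) / C for one candidate threshold."""
    m1 = sum(1 for s in first_sim if s > threshold)   # known accepted
    m2 = sum(1 for s in second_sim if s < threshold)  # unknown rejected
    c = len(first_sim) + len(second_sim)
    return (m1 + m2) / c


def select_threshold(first_sim, second_sim, s=50):
    """Sweep S evenly spaced thresholds (S > 1) and keep the most accurate."""
    lo, hi = threshold_range(first_sim, second_sim)
    candidates = [lo + (hi - lo) * i / (s - 1) for i in range(s)]
    scored = [(accuracy(t, first_sim, second_sim), t) for t in candidates]
    best_acc, best_t = max(scored, key=lambda pair: pair[0])
    return best_t, best_acc
```

Evenly spaced candidates are a design choice of this sketch; the claims only require S thresholds inside the range. When several candidates tie for the highest accuracy, this sketch returns the lowest one.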