A secure voiceprint authentication method based on hashing and feature transformation
A voiceprint authentication method based on hashing and feature transformation is combined with a lightweight speech synthesis algorithm to generate the authentication target voiceprint. This addresses the inadequacy of relying on a single defense method in voiceprint authentication and provides an efficient, secure voiceprint authentication service.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents (China)
- Current Assignee / Owner
- XIDIAN UNIV
- Filing Date
- 2023-04-26
- Publication Date
- 2026-06-30
AI Technical Summary
Existing voiceprint authentication technologies are insufficient in defending against replay attacks, spoofing attacks, and adversarial attacks due to their reliance on single defense methods. Furthermore, they struggle to balance privacy protection and system availability, resulting in poor security, privacy, and usability.
The invention adopts a voiceprint authentication method based on hashing and feature transformation. The user device and the identity authentication server derive matching voiceprints through hashing and a lightweight speech synthesis algorithm. This defends against replay attacks and adversarial attacks and protects the privacy of the user's voiceprint.
It simplifies the operation process while ensuring privacy and security, improves the usability of voiceprint authentication, defends against various attack methods, and reduces deployment costs and user operation complexity.
Figure CN116566616B_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the field of biometric recognition and identity authentication technology, and specifically relates to a secure voiceprint authentication method based on hashing and feature transformation.
Background Technology
[0002] Voiceprint authentication spares users from managing complex account information and is convenient to operate. Driven by the ever-growing demand for internet services, it therefore has high application value, especially in scenarios such as assisted driving and smart homes. For example, while driving, the driver's identity can be verified by voiceprint, and services such as navigation can then be provided based on the driver's voice commands. Cloud-based remote voiceprint authentication services are highly versatile and can be deployed easily at low cost. However, the user's voiceprint information must be uploaded to the cloud, and the user's environment may be monitored, recorded, or interfered with, so the security and privacy of voiceprint authentication services face serious risks. It is therefore important to study how to protect the privacy of users' voiceprint information and defend against potential attacks while keeping voiceprint authentication services highly available.
[0003] Researchers have identified a number of potential risks in voiceprint authentication, but existing work still faces several challenges. First, defenses against replay and spoofing attacks (i.e., liveness detection) often require collecting additional types or amounts of data from the user during authentication. This typically incurs extra deployment cost, complicates user operation, and harms usability. Furthermore, some of the physical features used for liveness detection are susceptible to interference or forgery, which renders detection ineffective. Second, many schemes exist for detecting spoofing attacks, but they are reactive measures applied after private voiceprint information has already leaked. How to proactively prevent voiceprint information in authentication systems from leaking remains unresolved, because existing methods are still imperfect in accuracy, usability, and overhead. Third, adversarial attacks are typically time-consuming and difficult to execute effectively over the air, yet few effective defenses exist once such an attack does succeed against the authentication system. Defenses based on adversarial training cannot produce robust machine learning models, and defenses that add or remove noise degrade the accuracy and efficiency of authentication. Finally, existing work rarely considers all three types of threats together and provides an effective solution.
[0004] In recent years, researchers have noticed the privacy and security issues of voiceprint authentication services. However, when improving privacy and security, the above methods usually consider only a single risk and struggle to provide comprehensive protection. In addition, because they focus on different problems, the methods adopted by the various schemes often have shortcomings:
(1) Privacy protection: cryptographic methods usually limit the size of users' voice files and the processing efficiency, while methods based on users' voice features require more complex user actions, which reduces usability.
(2) Liveness detection: these methods often require additional data acquisition equipment, user operations, and system overhead, which erodes the usability advantage of the system. Some systems are also affected by noise or noise-reduction processing, so their accuracy and robustness need improvement.
(3) Adversarial attack defense: noise-processing-based methods effectively reduce the attack success rate, but they cannot fully resist adversarial attacks.
Summary of the Invention
[0005] To address the aforementioned problems in related technologies, this invention provides a secure voiceprint authentication method based on hashing and feature transformation. The technical problem addressed by this invention is solved through the following technical solution:
[0006] This invention provides a secure voiceprint authentication method based on hashing and feature transformation, comprising:
[0007] The user equipment sends request information to the service platform based on user operations;
[0008] The service platform sends a request to the identity authentication server based on the request information;
[0009] When the request is an identity authentication request, the identity authentication server determines the first synthesized voiceprint and the first challenge code of the user equipment, sends the first challenge code to the user equipment, and processes the first synthesized voiceprint based on the hash parameters corresponding to the user equipment to obtain the authentication target voiceprint; the first challenge code represents the content of the voice data that needs to be input.
[0010] The user equipment receives first voice data in a first mode input by the user according to the first challenge code, generates a second synthesized voiceprint based on the first voice data and second voice data in a second mode, converts the second synthesized voiceprint into a response voiceprint, and sends it to the identity authentication server.
[0011] The identity authentication server verifies the response voiceprint based on the target voiceprint and the first challenge code;
[0012] When the verification is successful, the service platform provides services to the user equipment.
[0013] The present invention has the following beneficial technical effects:
[0014] 1. Privacy-preserving voiceprint authentication service. This invention achieves privacy protection for the original voiceprint through voiceprint synthesis, preventing attackers from obtaining the user's voiceprint through user information stored in the cloud server and using it to carry out malicious acts.
[0015] 2. Secure voiceprint authentication service. This invention uses hash-based target voiceprint feature transformation to defend against replay attacks and adversarial attacks, overcoming the shortcomings of existing methods such as high deployment costs, complex operation processes, and the ability to defend only against single security risks.
[0016] 3. High-availability voiceprint authentication service. This invention protects user voiceprint privacy by employing voiceprint synthesis and authentication target updating methods. During the authentication phase, only a normal speaking voice input is required, making it simpler to operate than existing methods while ensuring privacy, security, accuracy, and efficiency.
[0017] The present invention will be further described in detail below with reference to the accompanying drawings and embodiments.
Description of the Attached Figures
[0018] Figure 1 is an architecture diagram of an exemplary voiceprint authentication service system provided in an embodiment of the present invention;
[0019] Figure 2 is a flowchart of an exemplary secure voiceprint authentication method based on hashing and feature transformation provided in an embodiment of the present invention;
[0020] Figure 3 is an exemplary registration flowchart provided in an embodiment of the present invention;
[0021] Figure 4 is an exemplary authentication flowchart provided in an embodiment of the present invention;
[0022] Figure 5 is an exemplary flowchart of the interaction between the UA and the UAP provided in an embodiment of the present invention;
[0023] Figure 6 is an exemplary interface diagram of a user performing voiceprint authentication on a webpage, provided in an embodiment of the present invention;
[0024] Figure 7 is an exemplary interface diagram of a user performing voiceprint authentication in a mobile application, provided in an embodiment of the present invention.
Detailed Implementation
[0025] The present invention will be further described in detail below with reference to specific embodiments, but the implementation of the present invention is not limited thereto.
[0026] In the description of this invention, the terms "first" and "second" are used for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly specifying the number of indicated technical features. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of this invention, "a plurality of" means two or more, unless otherwise explicitly specified.
[0027] In the description of this specification, the references to terms such as "one embodiment," "some embodiments," "example," "specific example," or "some examples," etc., indicate that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the invention. In this specification, the illustrative expressions of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in one or more embodiments or examples. In addition, those skilled in the art can combine and integrate the different embodiments or examples described in this specification.
[0028] Although the invention has been described herein in conjunction with various embodiments, those skilled in the art will understand and implement other variations of the disclosed embodiments by reviewing the accompanying drawings, disclosure, and appended claims in carrying out the claimed invention. In the claims, the word "comprising" does not exclude other components or steps, and "a" or "an" does not exclude a plurality. A single processor or other unit can implement several functions listed in the claims. While different dependent claims may recite certain measures, this does not mean that these measures cannot be combined to produce good results.
[0029] In recent years, researchers have begun to focus on the security and privacy issues in voiceprint authentication. Based on the different risks addressed, security and privacy enhancement schemes for voiceprint authentication can be divided into three categories: 1. Voiceprint privacy protection, preventing attackers from obtaining voiceprint privacy data and launching spoofing attacks; 2. Voiceprint liveness detection, detecting whether input data comes from a live user to resist replay attacks and spoofing attacks; 3. Defense against adversarial attacks on voiceprints, preventing voiceprint authentication services from being attacked by adversarial attacks. Representative related works are shown in Table 1.
[0030]
[0031] Table 1
[0032] (1) Voiceprint privacy protection. Scheme [1] reduces the risk of voiceprint information leakage by using deep voiceprint features that have undergone multiple transformations, together with their corresponding blockchain keys. Scheme [2] uses homomorphic encryption to protect the privacy of users' voice data. Because the encryption is homomorphic, data on the server can be queried and similarity can be verified while the data remains encrypted. Scheme [3] uses an obfuscation function to protect voiceprint features. Scheme [4] proposes a privacy protection method based on synthesized voiceprints. The user's voiceprint features in two speaking modes are synthesized into a new voiceprint, which is uploaded to the server for identity verification, so that the original voiceprint is protected.
[0033] (2) Voiceprint liveness detection collects additional physical information about the user's voice, or other behavioral data, and analyzes whether its characteristics match the behavior of the legitimate user in order to defend against replay and spoofing attacks. Liveness detection schemes fall into two categories. The first category relies solely on voice features. Scheme [5] proposes a liveness detection method based on voice gestures. Moving the recording device while speaking produces a distinctive Doppler effect, so detecting the corresponding voiceprint features distinguishes a live speaker from malicious voices used in replay or spoofing attacks. The same authors propose another method that builds a voice model from differences in voice arrival time (Scheme [6]) to distinguish human speech from machine playback. Scheme [7] proposes a liveness detection method based on distortion noise. When a user speaks close to the microphone, the breathing airflow produces a distinctive distortion noise (pop noise) that a replay attack is unlikely to record or imitate from a distance, so it can be used to identify legitimate users and defend against replay attacks. Scheme [8] proposes a voiceprint authentication scheme that resists replay and spoofing attacks. In addition to a random challenge code, each authentication randomly specifies the speaking style in which the challenge code must be repeated, such as accent or pauses. Even if old speech has been recorded, attackers therefore find it difficult to succeed by replaying or imitating it. The second category consists of multimodal detection methods based on other features. Scheme [9] collects video while monitoring the motion of the user device through accelerometer data. It achieves liveness detection by analyzing whether the motion features in the video match the sensor data. Scheme [10] detects the energy field formed by sound propagation and uses it to tell human speech from machine playback. Scheme [11] places the user between two Wi-Fi antennas. Different mouth shapes obstruct the signal differently during speech, so the scheme can determine whether the input voice data comes from a live person.
[0034] (3) Defense against adversarial attacks on voiceprints. Researchers have focused on noise-based defenses against adversarial attacks. Scheme [12] proposes an adversarial attack detection method based on a sandbox test. The original data and its denoised version are fed into the same neural network. If the recognition results agree, the original data is deemed free of adversarial noise. Otherwise, the original data contains adversarial noise that steers the network's recognition result toward the attacker's target. Scheme [13] also examined this vulnerability to adversarial attacks. In that work, noise was respectively added to and removed from adversarial samples. The test results showed that both operations disrupt the noise used by adversarial attacks and greatly reduce the attack success rate.
[0035] When improving the privacy and security of voiceprint authentication, the above methods usually consider only a single risk and struggle to provide comprehensive protection. In addition, because they focus on different problems, the methods adopted by the various schemes often have shortcomings:
(1) Privacy protection: cryptographic methods usually limit the size of users' voice files and the processing efficiency, while methods based on users' voice features require more complex user actions, which reduces usability.
(2) Liveness detection: these methods often require additional data acquisition equipment, user operations, and system overhead, which erodes the usability advantage of the system. Some systems are also affected by noise or noise-reduction processing, so their accuracy and robustness need improvement.
(3) Adversarial attack defense: noise-processing-based methods effectively reduce the attack success rate, but they cannot fully resist adversarial attacks.
[0036] To address the above problems, this invention provides a secure voiceprint authentication method based on hashing and feature transformation. The method is applied to the voiceprint authentication service system shown in Figure 1 and solves the following technical problems:
(1) Protecting voiceprint privacy while preserving authentication efficiency and accuracy. Encryption and fuzzy extraction protect voiceprint data strongly. However, applying them to raw or multi-dimensional voice data incurs large computational overhead, while reducing dimensionality before extraction lowers authentication accuracy. A lightweight, high-precision privacy protection method is therefore needed to protect the user's voiceprint information.
(2) The vulnerability of voiceprint authentication services to replay, spoofing, and adversarial attacks. Defenses that each target a single problem are difficult to integrate into one system and are easily disturbed by external factors. A voice-based defense is therefore needed that comprehensively resists all of these attacks.
(3) Maintaining good usability while enhancing privacy and security. The extra user operations, data collection, and processing usually added for security and privacy reduce system efficiency, raise deployment cost, complicate user operation, and ultimately reduce the usability of voiceprint authentication. An optimized process for voiceprint synthesis and updating is therefore needed to improve usability.
[0037] As shown in Figure 1, the system comprises three main modules: a Relying Party (RP), a User Authentication Provider (UAP), and a User Agent (UA). The RP provides various network services to users and may include more than one network service provider. The RP also includes an authentication request module that handles user access and requests authentication services from the UAP. The UAP can provide authentication services to multiple RPs. It includes a database for storing user information, an authentication management module, and a security auxiliary module that protects user data privacy and authentication security. Users access network services through the UA device. In addition to its user interface, the UA includes a microphone for collecting user data and a security auxiliary module for protecting raw user data and the authentication process.
[0038] For example, both the UAP and the RP can be deployed on a machine with an 11th Gen Intel® Core™ i7-1185G7 CPU @ 3.00 GHz running Microsoft Windows 10. They act as remote servers that provide access services to the user's UA device over the network. Depending on the application scenario, the UA is deployed on different terminal devices, each of which obtains a similar voiceprint authentication service.
[0039] Figure 2 is a flowchart of a secure voiceprint authentication method based on hashing and feature transformation provided in an embodiment of the present invention. As shown in Figure 2, the method includes the following steps:
[0040] S101. The user equipment sends a request message to the service platform based on the user's operation.
[0041] Here, users can initiate a registration request (PRC) or an authentication request (PAC) to the RP through their own UA device. When initiating a PRC, the UA device sends the PRC together with its own address UA_add to the RP. When initiating a PAC, the UA device sends the PAC together with its own address UA_add and its user identifier UA_id to the RP.
[0042] S102. The service platform sends a request to the identity authentication server based on the request information.
[0043] Here, when a user registers with the UAP, the UA device sends the PRC together with UA_add to the RP, and the RP sends the PRC, UA_add, and its own service provider identifier RP_id to the UAP. When a user wants to access an RP service, the UA device sends the PAC together with UA_add and UA_id to the RP, and the RP sends the PAC, UA_add, UA_id, and RP_id to the UAP. This completes the sending of the request.
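Steps S101 and S102 amount to a simple forwarding rule: the UA sends its request, and the RP appends its own identifier before passing the request to the UAP. The sketch below models this rule with hypothetical dataclasses. Field names such as `ua_add` and `rp_id` mirror the identifiers in the text, but the patent does not define a concrete message format, so the layout is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class UARequest:
    kind: str                    # "PRC" (registration) or "PAC" (authentication)
    ua_add: str                  # address of the UA device
    ua_id: Optional[str] = None  # user identifier, present only in a PAC


@dataclass(frozen=True)
class UAPRequest:
    kind: str
    ua_add: str
    ua_id: Optional[str]
    rp_id: str                   # service provider identifier added by the RP


def ua_build_request(kind: str, ua_add: str, ua_id: Optional[str] = None) -> UARequest:
    """S101: the UA builds a PRC or a PAC for the RP."""
    if kind == "PRC":
        return UARequest(kind, ua_add)           # not registered yet, so no UA_id
    if kind == "PAC":
        if ua_id is None:
            raise ValueError("a PAC must carry UA_id")
        return UARequest(kind, ua_add, ua_id)
    raise ValueError(f"unknown request kind: {kind}")


def rp_forward(req: UARequest, rp_id: str) -> UAPRequest:
    """S102: the RP appends its own RP_id and forwards the request to the UAP."""
    return UAPRequest(req.kind, req.ua_add, req.ua_id, rp_id)
```

For example, `rp_forward(ua_build_request("PAC", "10.0.0.5", "ua-42"), "rp-1")` yields the authentication request that the UAP handles in S103.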
[0044] S103. When the request is an identity authentication request, the identity authentication server determines the first synthesized voiceprint and the first challenge code of the user device, sends the first challenge code to the user device, and processes the first synthesized voiceprint based on the hash parameters corresponding to the user device to obtain the authentication target voiceprint; the first challenge code represents the content of the voice data that needs to be input.
[0045] Here, when the request is an authentication request, the UAP uses UA_id to look up the user profile and retrieves the synthesized voiceprint syn_v0 (the first synthesized voiceprint) registered by the UA. It then randomly selects a string from its stored challenge code set {CC}_B (the first challenge code set), generates the challenge code CC_n (the first challenge code) from the selected string, and sends CC_n to the UA. {CC}_B contains multiple strings. Next, the UAP processes syn_v0 based on the counter n (the hash parameter) corresponding to the UA to generate c_v_n (the authentication target voiceprint).
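Challenge selection can be sketched as a uniform random draw from the stored set. The example strings and the function name are hypothetical, and the sketch assumes that the selected string is used directly as the challenge code. Python's `secrets` module is used rather than `random` because an unpredictable challenge is what prevents an attacker from pre-recording the right phrase.

```python
import secrets
from typing import Sequence

# Hypothetical contents: the text says only that each set holds multiple strings.
CC_A = ["seven three nine one", "open the front door", "good morning assistant"]  # registration
CC_B = ["four eight two six", "navigate to the airport", "turn on the lights"]    # authentication


def issue_challenge(cc_set: Sequence[str]) -> str:
    """Draw one challenge string uniformly at random using a CSPRNG."""
    if not cc_set:
        raise ValueError("challenge-code set is empty")
    return secrets.choice(cc_set)
```

`issue_challenge(CC_B)` would serve as CC_n in S103, and `issue_challenge(CC_A)` as CC0 in S201.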
[0046] Specifically, the UAP reads the first audio data x and the first sampling rate fs from syn_v0. It extracts voiceprint features from them and obtains the first initial fundamental frequency feature f0, the first spectral envelope feature sp, and the first aperiodic feature ap. It then hashes f0 according to the counter n corresponding to the UA to obtain the first fundamental frequency feature f0_n. Finally, it applies a preset speech synthesis function to f0_n, sp, ap, and fs to obtain the authentication target voiceprint c_v_n.
[0047] Specifically, the process of reading the first audio data x and the first sampling rate fs from syn_v0 can be represented as: x,fs=sf.read(syn_v0).
[0048] Specifically, the process of extracting voiceprint features based on the first audio data x and the first sampling rate fs to obtain the first initial fundamental frequency feature f0, the first spectral envelope feature sp, and the first aperiodic feature ap can be expressed as: f0,sp,ap=pw.wav2world(x,fs).
[0049] Specifically, the first initial fundamental frequency feature f0 is hashed based on the counter n corresponding to the UA to obtain the first fundamental frequency feature f0_n. The process can be represented as:
$$f0_n = f0 \cdot \left( \left( \mathrm{SHA256}^{n}(f0) \bmod \alpha \right) + \beta \right)$$
where α and β are preset coefficients and SHA256^n(·) denotes SHA-256 applied n times. The hash value SHA256^n(f0) enters the coefficient that scales f0, so this coefficient is highly random. If the coefficient becomes too large or too small, the quality of the generated speech degrades, so its range must be limited. Specifically, the generated c_v_n has the best quality when α = 0.8 and β = 0.4.
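A short worked bound shows how α and β control the range of the coefficient. Write k_n for the factor that multiplies f0, and read SHA256^n(f0) mod α as a number in [0, α). The text does not specify how the digest is mapped to a number, so this reading is an assumption.

$$
k_n = \left(\mathrm{SHA256}^{n}(f0) \bmod \alpha\right) + \beta,
\qquad
0 \le \mathrm{SHA256}^{n}(f0) \bmod \alpha < \alpha
\;\Longrightarrow\;
\beta \le k_n < \alpha + \beta .
$$

With α = 0.8 and β = 0.4, this gives 0.4 ≤ k_n < 1.2. A voiced frame at 200 Hz is therefore mapped to some f0_n in [80, 240) Hz, while unvoiced frames (f0 = 0) stay at 0.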
[0050] Specifically, the process of applying the preset speech synthesis function to the first fundamental frequency feature f0_n, the first spectral envelope feature sp, the first aperiodic feature ap, and the first sampling rate fs to obtain the authentication target voiceprint c_v_n can be expressed as: c_v_n = pw.synthesize(f0_n, sp, ap, fs), where pw.synthesize(·) is the speech synthesis function.
[0051] The UAP updates the authentication target voiceprint through speech synthesis. The synthesis parameters are determined by the registered syn_v0 and by successive hashes of the features of syn_v0. This design preserves the inherent characteristics of the user's voiceprint while hiding the user's original voiceprint well, which makes the original voiceprint difficult to extract from the synthesized speech.
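The UAP side of [0046] to [0051] can be sketched as follows. The sketch assumes that `pw` is the pyworld package and `sf` is soundfile, since the calls in the text match their APIs. Note that pyworld's synthesis function is named `synthesize`. The digest-to-number mapping inside `hash_coefficient` is an assumption: the text does not say how f0 is serialized or how the SHA-256 output is reduced modulo α. All function names are illustrative.

```python
import hashlib

import numpy as np

ALPHA, BETA = 0.8, 0.4  # preset coefficients reported in [0049]


def hash_coefficient(f0: np.ndarray, n: int, alpha: float = ALPHA, beta: float = BETA) -> float:
    """Return (SHA256^n(f0) mod alpha) + beta, a value in [beta, alpha + beta).

    ASSUMPTION: f0 is serialized as raw float64 bytes, SHA-256 is applied n times,
    and the top 53 bits of the final digest are scaled to [0, 1) before "mod alpha".
    """
    digest = np.ascontiguousarray(f0, dtype=np.float64).tobytes()
    for _ in range(n):
        digest = hashlib.sha256(digest).digest()
    u = (int.from_bytes(digest[:8], "big") >> 11) / 2**53  # float in [0, 1)
    return (u % alpha) + beta


def hashed_f0(f0: np.ndarray, n: int) -> np.ndarray:
    """f0_n = f0 * coefficient; unvoiced frames (f0 == 0) remain 0."""
    return f0 * hash_coefficient(f0, n)


def make_target_voiceprint(syn_v0_path: str, n: int, out_path: str) -> None:
    """UAP side of S103: derive c_v_n from the registered syn_v0 (mono audio assumed)."""
    import pyworld as pw   # imported lazily so the helpers above work without it
    import soundfile as sf

    x, fs = sf.read(syn_v0_path)             # x: float64 samples, fs: sampling rate
    f0, sp, ap = pw.wav2world(x, fs)         # F0, spectral envelope, aperiodicity
    c_v_n = pw.synthesize(hashed_f0(f0, n), sp, ap, fs)
    sf.write(out_path, c_v_n, fs)
```

The UA must use exactly the same serialization and mapping. Otherwise the f0_n that the UA computes will not match the UAP's.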
[0052] S104. The user equipment receives the first voice data in the first mode input by the user according to the first challenge code, generates a second synthesized voiceprint according to the first voice data and the second voice data in the second mode, converts the second synthesized voiceprint into a response voiceprint, and sends it to the identity authentication server.
[0053] Here, the first mode and the second mode are two different speech expression modes. For example, the first mode can be a normal speaking mode, such as when the user speaks normally; the second mode can be a special speaking mode, such as when the user sings, imitates a child's voice, or imitates a voice of another gender, etc.
[0054] Here, the UA can process the first voice data n_v_n in the first mode and the second voice data s_v0 in the second mode to obtain the second synthesized voiceprint syn_v_n, where syn_v_n = syn(s_v0, n_v_n). Specifically, the UA reads the audio data and sampling rate of n_v_n, i.e., x1, fs1 = sf.read(n_v_n), and those of s_v0, i.e., y, fs2 = sf.read(s_v0). Next, it extracts the fundamental frequency feature of s_v0 from y and fs2, i.e., f02, t2 = pw.dio(y, fs2). It also extracts the voiceprint features of n_v_n from x1 and fs1, i.e., f01, sp1, ap1 = pw.wav2world(x1, fs1). Finally, the fundamental frequency feature of s_v0 replaces the corresponding part of the voiceprint features of n_v_n, i.e., syn_v_n = pw.synthesize(f02, sp1, ap1, fs1), which yields the second synthesized voiceprint syn_v_n.
[0055] Here, s_v0 can be the voice data that the UA inputs during the registration process with the UAP.
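The pitch transplant in [0054] can be sketched as follows, again assuming that `pw` is pyworld and `sf` is soundfile. pyworld's `synthesize()` expects one F0 value per spectral frame. However, s_v0 and n_v_n are different utterances and usually have different frame counts. The sketch therefore adds a hypothetical `align_f0` step, which the patent text does not describe.

```python
import numpy as np


def align_f0(f0: np.ndarray, num_frames: int) -> np.ndarray:
    """Resample an F0 track to num_frames frames by nearest-neighbour indexing.

    ASSUMPTION: the patent does not say how differing frame counts are reconciled.
    Nearest-neighbour (rather than linear) indexing keeps unvoiced frames exactly 0.
    """
    if len(f0) == 0 or num_frames <= 0:
        raise ValueError("need a non-empty F0 track and a positive frame count")
    idx = np.round(np.linspace(0, len(f0) - 1, num_frames)).astype(int)
    return np.ascontiguousarray(f0[idx], dtype=np.float64)


def make_syn_vn(n_vn_path: str, s_v0_path: str, out_path: str) -> None:
    """UA side of [0054]: pitch from s_v0, spectral features from n_v_n (mono audio assumed)."""
    import pyworld as pw   # imported lazily so align_f0 works without it
    import soundfile as sf

    x1, fs1 = sf.read(n_vn_path)           # normal-mode speech for challenge CC_n
    y, fs2 = sf.read(s_v0_path)            # special-mode speech stored at registration
    f02, _t2 = pw.dio(y, fs2)              # dio returns (f0, temporal positions)
    _f01, sp1, ap1 = pw.wav2world(x1, fs1)
    f02 = align_f0(f02, sp1.shape[0])      # one F0 value per spectral frame of n_v_n
    syn_vn = pw.synthesize(f02, sp1, ap1, fs1)
    sf.write(out_path, syn_vn, fs1)
```

Registration ([0070]) would call the same routine with n_v0 in place of n_v_n to produce syn_v0.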
[0056] Here, the second synthesized voiceprint can be converted into the response voiceprint as follows. First, determine the first fundamental frequency feature f0_n from the first synthesized voiceprint syn_v0 and the counter n corresponding to the user equipment. Next, determine the second sampling rate fs', the second spectral envelope feature sp', and the second aperiodic feature ap' from the second synthesized voiceprint syn_v_n for the same counter n. Finally, perform speech synthesis on f0_n, sp', ap', and fs' to obtain the response voiceprint res_v_n.
[0057] Specifically, the process of determining the second sampling rate fs', the second spectral envelope feature sp', and the second aperiodic feature ap' from the second synthesized voiceprint syn_v_n corresponding to the counter n can be expressed as: x', fs' = sf.read(syn_v_n); f0', sp', ap' = pw.wav2world(x', fs'), where x' is the second audio data and f0' is the second initial fundamental frequency feature.
[0058] Specifically, the process of performing speech synthesis on the first fundamental frequency feature f0_n, the second spectral envelope feature sp', the second aperiodic feature ap', and the second sampling rate fs' to obtain the response voiceprint res_v_n can be expressed as: res_v_n = pw.synthesize(f0_n, sp', ap', fs').
[0059] In other words, in step S104 the user responds to the challenge CC_n initiated by the UAP. The key is to provide synthesized speech that contains the correct content and has the same characteristics as the updated authentication target voiceprint. To do this, the authentication speech that the user speaks according to the challenge code CC_n is synthesized with the special-mode speech stored on the UA during registration. The result is the synthesized speech syn_v_n for the n-th authentication. Next, the UA transforms the features of syn_v_n so that they match the target voiceprint updated on the UAP. Specifically, the UA extracts the features of the syn_v0 previously stored on the UA and obtains the new feature value f0_n by hashing. The UA then synthesizes f0_n with the features of syn_v_n to generate the response voiceprint res_v_n.
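The UA-side conversion in [0056] to [0059] can be sketched as follows. As before, the sketch assumes that `pw` is pyworld and `sf` is soundfile. Both `hash_coefficient` (an assumed digest-to-number mapping) and `align_f0` (a hypothetical frame-count reconciliation) are illustrative choices, not the patent's specified procedure. They are repeated here so that the block runs on its own.

```python
import hashlib

import numpy as np


def hash_coefficient(f0: np.ndarray, n: int, alpha: float = 0.8, beta: float = 0.4) -> float:
    # ASSUMED mapping; it must be identical to the UAP's, or the two f0_n tracks differ.
    digest = np.ascontiguousarray(f0, dtype=np.float64).tobytes()
    for _ in range(n):
        digest = hashlib.sha256(digest).digest()
    u = (int.from_bytes(digest[:8], "big") >> 11) / 2**53  # float in [0, 1)
    return (u % alpha) + beta


def align_f0(f0: np.ndarray, num_frames: int) -> np.ndarray:
    # Hypothetical nearest-neighbour resampling so that len(f0) matches sp'/ap'.
    idx = np.round(np.linspace(0, len(f0) - 1, num_frames)).astype(int)
    return np.ascontiguousarray(f0[idx], dtype=np.float64)


def make_response_voiceprint(syn_v0_path: str, syn_vn_path: str, n: int, out_path: str) -> None:
    """res_v_n from f0_n (derived from syn_v0) and sp', ap' (from syn_v_n); mono audio assumed."""
    import pyworld as pw   # imported lazily so the helpers work without it
    import soundfile as sf

    x0, fs0 = sf.read(syn_v0_path)
    f0, _sp0, _ap0 = pw.wav2world(x0, fs0)      # same analysis as the UAP, so f0 matches
    f0_n = f0 * hash_coefficient(f0, n)         # first fundamental frequency feature f0_n
    x_p, fs_p = sf.read(syn_vn_path)            # x', fs'
    _f0_p, sp_p, ap_p = pw.wav2world(x_p, fs_p)  # f0', sp', ap'
    res_vn = pw.synthesize(align_f0(f0_n, sp_p.shape[0]), sp_p, ap_p, fs_p)
    sf.write(out_path, res_vn, fs_p)
```

The design point is that the UA and the UAP derive f0_n independently from the same syn_v0 and the same counter n, so f0_n itself never has to travel over the network.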
[0060] S105. The identity authentication server verifies the response voiceprint based on the target voiceprint and the first challenge code.
[0061] Specifically, the UAP determines whether the content of the response voiceprint res_v_n is consistent with the content represented by the first challenge code CC_n. If it is, the UAP determines whether the features of the authentication target voiceprint c_v_n are consistent with those of res_v_n. If they are, the UAP determines that res_v_n passes verification; otherwise, verification fails.
[0062] In other words, after receiving the voice response from the User Agent (UA), the UAP verifies it in two steps. First, it verifies whether the speech content is consistent with the content of the challenge code CC_n. Then, it verifies whether the speech features are similar to those of the authentication target voiceprint c_v_n. Only if both steps succeed does the UAP determine that the user has been verified and set the verification result to "1".
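The two-step check can be sketched as below. The patent does not name the speech recognizer, the speaker-embedding model, or the similarity threshold. `transcribe`, `embed`, and `threshold=0.7` are therefore hypothetical placeholders. Cosine similarity is one common way to compare speaker embeddings, but it is not necessarily the method used here.

```python
import math
from typing import Any, Callable, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def normalize_text(text: str) -> str:
    """Case-fold and collapse whitespace before comparing transcripts."""
    return " ".join(text.lower().split())


def verify_response(res_v_n: Any, c_v_n: Any, cc_n: str,
                    transcribe: Callable[[Any], str],
                    embed: Callable[[Any], Sequence[float]],
                    threshold: float = 0.7) -> int:
    """Return 1 only if both checks pass, otherwise 0."""
    # Step 1: does the spoken content match the challenge code CC_n?
    if normalize_text(transcribe(res_v_n)) != normalize_text(cc_n):
        return 0
    # Step 2: are the speaker features close enough to the target voiceprint c_v_n?
    score = cosine_similarity(embed(res_v_n), embed(c_v_n))
    return 1 if score >= threshold else 0
```

Checking the content first means that a stale recording with the wrong phrase is rejected before any speaker comparison runs.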
[0063] S106. When the verification is successful, the service platform provides services to the user equipment.
[0064] Specifically, if the verification is successful, the UAP sends a notification message to the RP, and the RP provides network services to the UA based on the notification message.
[0065] The above S103 to S106 describe the process by which UAP authenticates the identity of a UA that has been successfully registered. The following S201 to S203 describe the registration process of the UA with UAP before the UA is authenticated.
[0066] S201. When the request is a registration request, the identity authentication server determines the second challenge code based on the second challenge code set and sends the second challenge code to the user equipment.
[0067] Here, when the request is a registration request, the UAP randomly selects a string from the challenge code set {CC}_A, generates the challenge code CC0 (the second challenge code) from it, and sends CC0 to the UA.
[0068] S202. The user equipment receives the second voice data input by the user according to the second challenge code and the third voice data input in the first mode, generates a first synthesized voiceprint according to the second voice data and the third voice data, and sends the first synthesized voiceprint to the identity authentication server.
[0069] Specifically, the UA receives the second voice data s_v0 and the third voice data n_v0 in the first mode, both input by the user according to the challenge code CC0, and processes s_v0 and n_v0 to obtain the first synthesized voiceprint syn_v0.
[0070] Here, syn_v0 is generated on the same principle as syn_v_n described above.
[0071] S203. The identity authentication server completes the registration based on the first synthesized voiceprint and jointly determines the hash parameters corresponding to the user equipment with the user equipment.
[0072] Specifically, the UAP determines whether the content of the first synthesized voiceprint syn_v0 is consistent with the content represented by the challenge code CC0. If they are consistent, the UAP generates a user identifier UA_id and a user profile for the UA. It then associates the user identifier UA_id, the first synthesized voiceprint syn_v0 and the service provider identifier RP_id, and stores them in the user profile. Next, the UAP and the UA jointly determine the counter n corresponding to the UA. Finally, the UAP sends the registration result containing the user identifier UA_id to the RP, which forwards it to the UA.
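A minimal sketch of the UAP-side bookkeeping in this paragraph is shown below. It assumes that the content of syn_v0 has already been transcribed to text and that profiles are kept in an in-memory dictionary. The new-user check follows Step 5 of the registration example that follows. Keying that check on the UA address UA_add, generating identifiers with `uuid4`, and all names are assumptions. Negotiation of the counter n is a separate step and is not shown here.

```python
import uuid


def normalize(text):
    """Case- and whitespace-insensitive form of a transcript."""
    return " ".join(text.lower().split())


def register(profiles, ua_add, rp_id, recognized_text, challenge_code, syn_v0):
    """Hypothetical UAP-side registration.

    Returns the new user identifier UA_id, or None if the content check
    or the new-user check fails.
    """
    # Content check: syn_v0 must repeat the challenge code CC0
    if normalize(recognized_text) != normalize(challenge_code):
        return None
    # New-user check: this UA address must not already be registered with this RP
    for profile in profiles.values():
        if profile["ua_add"] == ua_add and profile["rp_id"] == rp_id:
            return None
    ua_id = uuid.uuid4().hex
    # Associate UA_id, syn_v0 and RP_id in the user profile
    profiles[ua_id] = {"ua_id": ua_id, "ua_add": ua_add,
                       "rp_id": rp_id, "syn_v0": syn_v0}
    return ua_id
```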
[0073] Here, different UAs correspond to different counters n. The UAP and the UA can jointly determine the counter n corresponding to the UA using a typical key agreement algorithm, such as the Diffie-Hellman algorithm.
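The counter negotiation can be sketched with textbook finite-field Diffie-Hellman. The group below is a toy (the Mersenne prime 2**127 - 1 with generator 3), chosen only to keep the example short, and it is not secure. A real deployment would use a standardized group such as those in RFC 3526 or RFC 7919, or X25519. The source only says that n is determined by key agreement. Mapping the shared secret to a bounded counter with SHA-256, and the bound of 64, are hypothetical.

```python
import hashlib
import secrets

# Toy parameters for illustration only; far too small for real use.
P = 2**127 - 1
G = 3


def keypair():
    """Return (private, public) with private drawn from [2, P - 2]."""
    private = secrets.randbelow(P - 3) + 2
    return private, pow(G, private, P)


def derive_counter(private, peer_public, max_counter=64):
    """Hypothetical mapping of the DH shared secret to a counter n in [1, max_counter]."""
    shared = pow(peer_public, private, P)
    digest = hashlib.sha256(shared.to_bytes(16, "big")).digest()
    return int.from_bytes(digest, "big") % max_counter + 1
```

Each side runs `keypair()`, sends its public value to the other, and calls `derive_counter` with its own private value. Both sides obtain the same n without transmitting it.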
[0074] The following two examples illustrate the authentication and registration processes described above.
[0075] As shown in Figure 3, the UA registration process includes:
[0076] Step 1: The user initiates a PRC through the UA, and the UA submits the PRC together with its address UA_add to the RP.
[0077] Step 2: After receiving the PRC and UA_add, the RP packages them together with its own identifier RP_id and forwards them to the UAP.
[0078] Step 3: After receiving the PRC, UA_add and RP_id, the UAP randomly selects a string from the challenge code set {CC}_A to generate a challenge code CC0 and sends it to the UA.
[0079] Step 4: The user needs to input two voice segments based on the challenge code CC0: one is a voice segment n_v0 in normal speaking mode, and the other is a voice segment s_v0 in a special speaking mode (e.g., singing, imitating a child's or another gender's voice). The UA generates a synthesized voiceprint syn_v0 based on n_v0 and s_v0 and sends it to the UAP, and the UA stores n_v0 and s_v0.
[0080] Step 5: The UAP checks the received syn_v0 to ensure that the user correctly repeated CC0, and checks whether the UA is a new user of the RP. The UAP then creates a user profile and a user identifier UA_id for that UA, associates UA_id, syn_v0 and RP_id, and stores them in the user profile.
[0081] Step 6: The UAP and the UA use the Diffie-Hellman algorithm to negotiate a security parameter, such as the counter n, which controls the number of hash operations.
[0082] Step 7: The UAP sends the registration result containing UA_id to the RP, which forwards it to the UA. At this point, the UA's registration is complete.
[0083] As shown in Figure 4, the UA authentication process includes:
[0084] Step 1: When a user wants to access the RP service, the user sends a PAC to the RP through their UA.
[0085] Step 2: The RP forwards the PAC, UA_add, UA_id and RP_id to the UAP.
[0086] Step 3: After receiving the PAC, UA_add, UA_id and RP_id, the UAP processes the identity authentication. First, it locates the user profile UA_p based on UA_id. It then randomly selects a string from the challenge code set {CC}_B to generate a challenge code CC_n, where {CC}_A ≠ {CC}_B.
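Step 3 can be sketched as a profile lookup followed by a random draw from the authentication pool. The pools, the profile layout and the function name are hypothetical. The sketch uses Python's `secrets.choice` so that the challenge is not predictable. It also keeps {CC}_A and {CC}_B disjoint, which is a slightly stronger condition than the {CC}_A ≠ {CC}_B stated in Step 3.

```python
import secrets

# Hypothetical, disjoint challenge pools
CC_A = ("open the blue door", "seven green apples", "morning light on the river")
CC_B = ("nice to meet you", "the quiet harbor at dusk", "four paper lanterns")


def start_authentication(profiles, ua_id):
    """Locate the user profile UA_p by UA_id and draw a fresh challenge CC_n from {CC}_B."""
    profile = profiles.get(ua_id)
    if profile is None:
        raise KeyError(f"unknown user identifier: {ua_id}")
    challenge = secrets.choice(CC_B)
    return profile, challenge
```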
[0087] Step 4: The UAP extracts the registered voiceprint syn_v0 from UA_p and generates the authentication target voiceprint c_v_n based on syn_v0.
[0088] Step 5: The user speaks a voice message n_v_n in normal speaking mode according to the challenge code CC_n. The UA then generates the synthesized voiceprint syn_v_n from n_v_n and s_v0, converts syn_v_n into the response voice res_v_n, and sends res_v_n to the UAP for verification.
[0089] Step 6: The UAP checks the received response voice res_v_n. It confirms that the content is consistent with CC_n and that the features match the authentication target voiceprint c_v_n, which yields the authentication result UA_ar.
[0090] Step 7: The UAP notifies the RP of the authentication result UA_ar. If UA_ar indicates successful authentication, the RP provides network services to the UA.
[0091] The present invention has the following beneficial technical effects:
[0092] 1. Privacy-preserving voiceprint authentication service. This invention achieves privacy protection for the original voiceprint through voiceprint synthesis, preventing attackers from obtaining the user's voiceprint through user information stored in the cloud server and using it to carry out malicious acts.
[0093] 2. Secure voiceprint authentication service. This invention uses hash-based feature transformation of the target voiceprint to defend against replay attacks and adversarial attacks. It thereby overcomes shortcomings of existing methods such as high deployment costs, complex operation processes, and the ability to defend against only a single type of security risk.
[0094] 3. High-availability voiceprint authentication service. This invention protects user voiceprint privacy by employing voiceprint synthesis and authentication target updating methods. During the authentication phase, only a normal speaking voice input is required, making it simpler to operate than existing methods while ensuring privacy, security, accuracy, and efficiency.
[0095] The following two examples regarding User Agents (UAs) illustrate the application scenarios of the method provided by this invention:
[0096] Example 1
[0097] In this example, the User Agent (UA) is a web-based application. When users need to log in to their accounts on a webpage, they can obtain a privacy-protected and secure identity authentication service directly over the network. The process consists of two parts: 1. User registration and parameter initialization: the user inputs voice, and the UA generates synthesized speech locally, sends it to the UAP for registration, and negotiates the relevant parameters; 2. User authentication: the user initiates an identity authentication request, the UAP presents an authentication challenge, the user inputs voice as required, and the UA generates a response speech to the challenge. The detailed process is shown in Figure 5.
[0098] Voiceprint registration and initialization phase: During the registration process in this embodiment, the user is required to provide voice in two different speaking modes: normal mode voice and special mode voice. The UA processes the user's voice, and the resulting synthesized voice is uploaded to the UAP for user identity registration, serving as the basis for the target voiceprint in future user authentication. Simultaneously, the UA and UAP negotiate the value of counter n using the Diffie-Hellman algorithm. The UA and UAP store the necessary information for later use.
[0099] Voiceprint authentication: Voiceprint authentication in this embodiment is a challenge-response process. In each authentication, the features of the registered synthesized voiceprint are hashed repeatedly, and the hash result is combined with the original features of the registered synthesized voiceprint to generate a target voiceprint with new features for authentication. The UAP uses this target voiceprint to decide whether the user passes voiceprint authentication. Updating the authentication target in this way resists replay attacks and spoofing attacks. It also makes it difficult for attackers to mount adversarial attacks through iterative attempts. Accordingly, to pass authentication, the user holding the UA must follow the authentication challenge randomly generated by the UAP and give the correct spoken response in normal speaking mode. It is worth noting that the feature transformation and the random voice content challenges together resist replay attacks in every authentication. The UA then synthesizes the user's response with the previously saved special-mode speech to form a synthesized voice. Like the UAP, the UA further transforms the user's input voiceprint. It repeatedly hashes the features of the synthesized voiceprint stored at registration and combines the result with the synthesized voice to generate the response voiceprint, which is sent to the UAP for verification. Figure 6 shows how a user responds on a webpage (i.e., the AP) to a challenge initiated by the UAP. The user needs to repeat the challenge code, which is then processed by the UA and sent to the UAP for verification. Only when the voiceprint verification succeeds can the user log in to their account and access the RP.
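The replay resistance described here comes from rotating the target on every authentication. The sketch below abstracts the voiceprint features to a byte string that stands in for serialized syn_v0 features. It assumes, for illustration only, that both sides advance the counter by one after each successful authentication. It compares digests exactly, whereas the real system compares voice features by similarity. The class and method names are hypothetical.

```python
import hashlib
import hmac


def hash_chain(seed: bytes, n: int) -> bytes:
    """Apply SHA-256 to seed n times."""
    digest = seed
    for _ in range(n):
        digest = hashlib.sha256(digest).digest()
    return digest


class TargetState:
    """Hypothetical per-user state kept in step on the UA and the UAP."""

    def __init__(self, registered_features: bytes, counter: int):
        self.registered_features = registered_features
        self.counter = counter

    def current_target(self) -> bytes:
        return hash_chain(self.registered_features, self.counter)

    def verify_and_advance(self, response: bytes) -> bool:
        ok = hmac.compare_digest(response, self.current_target())
        if ok:
            self.counter += 1  # the next authentication uses a new target
        return ok
```

A response captured during authentication n matches only the n-th target. Once the counter has advanced, replaying it fails.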
[0100] Example 2
[0101] In this example, the User Agent (UA) is a mobile application. When a user registers an account, verifies their identity, and obtains network services through an application installed on a mobile device (e.g., a smartphone), embodiments of the present invention can provide the user with a privacy-protected and secure voiceprint authentication service. The voiceprint registration and authentication process in this example is similar to that in Example 1, as shown in Figure 5. Figure 7 shows the login process in which a user performs voiceprint authentication before accessing network services through a mobile application; the challenge code is "Nice too meet you".
[0102] The present invention differs from the existing solution "A method for protecting privacy-preserving biometric information used in voiceprint authentication" in the following three main aspects: 1. The speech synthesis algorithm used in the present invention for privacy protection differs from that of the existing solution. The speech synthesis algorithm used in the present invention is more lightweight and efficient, and the quality of the synthesized speech generated is higher. 2. The present invention achieves security enhancement based on hashing and feature transformation, which can prevent various attacks that may be encountered in voiceprint authentication, something that the existing solution cannot achieve. 3. The operation process of the present invention differs from that of the existing solution. For the system, it increases the workflow required for updating the authentication target, while for the user, it simplifies the authentication operation process. Overall, the present invention has higher usability.
[0103] The above description, in conjunction with specific preferred embodiments, provides a further detailed explanation of the present invention. It should not be construed that the specific implementation of the present invention is limited to these descriptions. For those skilled in the art, various simple deductions or substitutions can be made without departing from the concept of the present invention, and all such modifications and substitutions should be considered within the scope of protection of the present invention.
Claims
1. A secure voiceprint authentication method based on hash and feature transformation, characterized in that it includes: The user equipment sends request information to the service platform based on user operations; The service platform sends a request to the identity authentication server based on the request information; When the request is an identity authentication request, the identity authentication server determines the first synthesized voiceprint and the first challenge code of the user equipment, sends the first challenge code to the user equipment, and processes the first synthesized voiceprint based on the hash parameters corresponding to the user equipment to obtain the authentication target voiceprint; the first challenge code represents the content of the voice data that needs to be input; The user equipment receives first voice data in a first mode input by the user according to the first challenge code, generates a second synthesized voiceprint based on the first voice data and second voice data in a second mode, converts the second synthesized voiceprint into a response voiceprint using the hash parameters corresponding to the user equipment, and sends it to the identity authentication server; the second voice data in the second mode is pre-stored in the user equipment before the user equipment sends the identity authentication request to its service platform, and is voice data input by the user according to the second challenge code sent by the identity authentication server when the user equipment registers with the identity authentication server; The identity authentication server verifies the response voiceprint based on the authentication target voiceprint and the first challenge code; When the verification is successful, the service platform provides services to the user equipment; The step of processing the first synthesized voiceprint based on the hash parameters corresponding to the user equipment to obtain the authentication target voiceprint includes: Reading the first audio data and the first sampling rate from the first synthesized voiceprint; Extracting voiceprint features based on the first audio data and the first sampling rate to obtain the first initial fundamental frequency feature, the first spectral envelope feature, and the first aperiodic feature; Hashing the first initial fundamental frequency feature according to the hash parameters corresponding to the user equipment to obtain the first fundamental frequency feature, the hash parameters being used to control the number of hash operations; Performing speech synthesis processing based on the first fundamental frequency feature, the first spectral envelope feature, the first aperiodic feature, and the first sampling rate to obtain the authentication target voiceprint.
2. The secure voiceprint authentication method based on hash and feature transformation of claim 1, wherein the step of performing speech synthesis processing based on the first fundamental frequency feature, the first spectral envelope feature, the first aperiodic feature, and the first sampling rate to obtain the authentication target voiceprint includes: Performing speech synthesis processing on the first fundamental frequency feature, the first spectral envelope feature, the first aperiodic feature and the first sampling rate using a preset speech synthesis function to obtain the authentication target voiceprint.
3. The hash and feature transform based secure voiceprint authentication method of claim 1, wherein the step of converting the second synthesized voiceprint into a response voiceprint using the hash parameters corresponding to the user equipment includes: The first fundamental frequency feature is determined based on the first synthesized voiceprint and the hash parameters corresponding to the user equipment; Based on the second synthesized voiceprint and the hash parameters corresponding to the user equipment, the second sampling rate, the second spectral envelope feature, and the second aperiodic feature are determined; The response voiceprint is obtained by performing speech synthesis processing based on the first fundamental frequency feature, the second spectral envelope feature, the second aperiodic feature, and the second sampling rate.
4. The hash and feature transform based secure voiceprint authentication method of claim 1, wherein the request includes the user identifier of the user equipment, and the identity authentication server determining the first synthesized voiceprint and the first challenge code of the user equipment includes: The identity authentication server locates the user configuration file containing the user identifier based on the user identifier; The identity authentication server obtains the pre-stored first synthesized voiceprint from the user configuration file; The identity authentication server randomly selects a string from the first challenge code set and generates the first challenge code based on the selected string, the first challenge code set including multiple strings.
5. The secure voiceprint authentication method based on hashing and feature transformation according to claim 1, characterized in that the identity authentication server verifying the response voiceprint based on the authentication target voiceprint and the first challenge code includes: The identity authentication server determines whether the content of the response voiceprint is consistent with the content represented by the first challenge code; When they match, the identity authentication server determines whether the features of the authentication target voiceprint are consistent with the features of the response voiceprint; When these also match, the identity authentication server determines that the response voiceprint passes verification.
6. The secure voiceprint authentication method based on hashing and feature transformation according to claim 1, characterized in that, The method further includes: When the request is a registration request, the identity authentication server determines the second challenge code based on the second challenge code set and sends the second challenge code to the user equipment; The user equipment receives the second voice data input by the user according to the second challenge code and the third voice data input in the first mode, generates the first synthesized voiceprint according to the second voice data and the third voice data, and sends the first synthesized voiceprint to the identity authentication server. The identity authentication server completes registration based on the first synthesized voiceprint and jointly determines the hash parameter corresponding to the user equipment with the user equipment.
7. The secure voiceprint authentication method based on hashing and feature transformation according to claim 6, characterized in that the request includes the service provider identifier of the service platform, and the identity authentication server completing registration based on the first synthesized voiceprint and jointly determining the hash parameter corresponding to the user equipment with the user equipment includes: The identity authentication server determines whether the content of the first synthesized voiceprint is consistent with the content represented by the second challenge code, and if they are consistent, generates a user identifier and a user configuration file for the user equipment; The identity authentication server associates the user identifier, the first synthesized voiceprint, and the service provider identifier and stores them in the user configuration file; The identity authentication server and the user equipment jointly determine the hash parameters corresponding to the user equipment; The identity authentication server sends the registration result containing the user identifier to the service platform; The service platform forwards the registration result to the user equipment.
8. The secure voiceprint authentication method based on hashing and feature transformation according to claim 1, characterized in that, The first mode and the second mode are two different speech expression modes.