A living body detection method and device based on a vibration signal and a storage medium
By generating random vibration signals through the vibration motor of the terminal device and combining them with signals collected by sensors, and using high-pass filtering and Fourier transform analysis, the problem of AI-synthesized videos being difficult to prevent is solved, achieving highly secure and accurate liveness detection.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- SICHUAN XW BANK CO LTD
- Filing Date
- 2023-05-31
- Publication Date
- 2026-06-19
AI Technical Summary
Existing liveness detection technologies are easily bypassed by AI-synthesized videos. Injected video lacks the borders, moiré patterns, and other recapture features that would reveal it, so it is difficult to distinguish from real-time video recorded on a real device.
Random vibration signals are generated by the vibration motor of the terminal device. The vibration signals are collected by the accelerometer and the audio signals are collected by the microphone. High-pass filtering, Fourier transform and cosine similarity analysis are used to determine whether the video is real.
It improves the security and accuracy of liveness detection, and can simultaneously detect the authenticity of the device and the real-time recording, providing a good user experience without requiring active user cooperation.
Figure CN116682181B_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the field of facial recognition technology, specifically relating to a liveness detection method, device, and storage medium based on vibration signals.
Background Technology
[0002] When conducting business online, in order to protect the security of customers' information, funds, and privacy, it is generally necessary to authenticate the customer's identity online from video images, using liveness detection and facial comparison technology. Liveness detection ensures that the captured video was taken by a real person using a real device in real time, and facial comparison verifies that the captured facial features match those of the customer.
[0003] In existing technologies, the security of liveness detection primarily depends on whether the device is genuine, whether the recording is real-time, and whether the face is real. Most current solutions use device fingerprinting technology to verify device authenticity and identify risks. They introduce random factors to detect real-time recording, such as requiring customers to perform random actions or recite random number sequences. The authenticity of the face is determined by detecting whether it's a copied image, mask, composite, or compared with other faces. However, device fingerprinting technology struggles to detect all emulator software, device modification methods, and tools. With the continuous maturation of AI synthesis technology, it's now possible to generate extremely realistic human images and drive characters to perform specified random actions. While existing liveness detection technologies offer good protection against common attacks such as masks, printed photos, and screen captures, video injection attacks can easily breach these technologies. The high realism of AI-synthesized images, the absence of borders or moiré patterns in injected videos, and the complete indistinguishability from real-time video captured by a physical camera make existing liveness detection technologies vulnerable.
Summary of the Invention
[0004] In view of this, the present invention provides a liveness detection method, device and storage medium based on vibration signals to solve the problem that existing AI-synthesized human images are highly realistic, and the injected video has no borders, moiré patterns and other copy features, and is no different from the video captured in real time by a physical camera, making it easy to bypass existing liveness detection technologies.
[0005] The technical solution adopted in this invention is as follows:
[0006] A liveness detection method based on vibration signals, comprising:
[0007] Step 1: Open the front camera to take a portrait and enter the liveness detection process;
[0008] Face detection technology is used to determine whether a complete face has been captured and whether the face meets the detection requirements.
[0009] After the terminal device detects a face, the vibration motor of the terminal device starts and randomly generates a vibration control signal S; while the vibration motor vibrates, the camera of the terminal device records the user's live video, and the accelerometer of the terminal device collects the vibration signal D of the terminal device.
[0010] The vibration signal D is the superposition of the noise signal generated by the external environment and the vibration signal generated by the vibration motor according to the vibration control signal S.
[0011] Specifically: The vibration control signal S is a vibration signal synthesized based on n random frequencies. The frequencies should avoid noise signal frequency ranges as much as possible, such as the shaking frequency of a person holding a mobile phone or the frequency of external ambient sound.
[0012] For example, considering that users inevitably experience additional vibrations when holding their phones for liveness detection, but these vibrations are all low-frequency, the generated vibration control signal S must consist of signals above 100 Hz to facilitate subsequent filtering of the acquired signals and more effectively analyze the detected feedback signals. n = 3 frequencies are extracted to form a frequency sequence L1, L1 = [f1, f2, f3], where f1 = 110 Hz, f2 = 130 Hz, and f3 = 140 Hz, constituting the signal.
[0013] S=sin(2πtf1)+sin(2πtf2)+sin(2πtf3).
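The formula for S can be sampled directly. The following is a minimal sketch, not the patented implementation: the function name, the 1 kHz sample rate, and the 1 s duration are assumptions made for illustration, since the source specifies only the frequencies.

```python
import math


def synthesize_control_signal(freqs_hz, sample_rate_hz, duration_s):
    """Sample S(t) = sum over f of sin(2*pi*t*f) at a fixed rate."""
    n_samples = int(round(sample_rate_hz * duration_s))
    samples = []
    for k in range(n_samples):
        t = k / sample_rate_hz  # time of sample k, in seconds
        samples.append(sum(math.sin(2.0 * math.pi * t * f) for f in freqs_hz))
    return samples


# Frequency sequence L1 from the example above.
L1 = [110.0, 130.0, 140.0]
# Assumed values: 1 kHz sampling for 1 s (the source does not specify either).
S = synthesize_control_signal(L1, sample_rate_hz=1000.0, duration_s=1.0)
```

How S is turned into motor drive commands depends on the terminal device's vibration API and is outside this sketch.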
[0014] Step 2: Analyze and compare the vibration signal D and the vibration control signal S to determine whether the live video is a real live video.
[0015] Step 2 specifically includes the following steps:
[0016] Step A: Obtain the frequency sequence L1 corresponding to the vibration signal S, with a sequence length of n;
[0017] Step B: Process the vibration data D using a high-pass filter function to filter out the low-frequency vibration signal below the cutoff frequency f_c and obtain the vibration signal D_h;
[0018] The high-pass filter formula is as follows:
[0019] $$D_h(t) = D(\tau) * h(t) = \int_{-\infty}^{\infty} D(\tau)\, h(t-\tau)\, d\tau$$
[0020] Where D(τ) is the input signal, D_h(t) is the output signal, h(t) is the filter impulse response function, and * indicates convolution operation;
[0021] For example, the cutoff frequency f_c is set to 100 Hz. A Butterworth high-pass filter is used to filter out low-frequency signals; its impulse response function h(t) is as follows:
[0022]
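The convolution in Step B can be illustrated in discrete form. The text names a Butterworth filter, but this sketch swaps in a windowed-sinc FIR high-pass (Hamming window, spectral inversion) because its impulse response is easy to write out explicitly. The function names, the 201-tap length, and the 1 kHz sample rate used in testing are all assumptions.

```python
import math


def highpass_kernel(cutoff_hz, sample_rate_hz, num_taps=201):
    """Impulse response h of a windowed-sinc high-pass filter (not Butterworth)."""
    if num_taps % 2 == 0:
        raise ValueError("num_taps must be odd so the kernel has a center tap")
    fc = cutoff_hz / sample_rate_hz  # normalized cutoff, cycles per sample
    mid = num_taps // 2
    lowpass = []
    for i in range(num_taps):
        m = i - mid
        ideal = 2.0 * fc if m == 0 else math.sin(2.0 * math.pi * fc * m) / (math.pi * m)
        hamming = 0.54 - 0.46 * math.cos(2.0 * math.pi * i / (num_taps - 1))
        lowpass.append(ideal * hamming)
    total = sum(lowpass)
    lowpass = [c / total for c in lowpass]  # unit gain at 0 Hz
    highpass = [-c for c in lowpass]        # spectral inversion: delta - lowpass
    highpass[mid] += 1.0
    return highpass


def convolve_same(signal, kernel):
    """Discrete D_h = D * h, centered so the output lines up with the input."""
    mid = len(kernel) // 2
    n = len(signal)
    out = []
    for k in range(n):
        acc = 0.0
        for j, h in enumerate(kernel):
            idx = k + mid - j
            if 0 <= idx < n:  # samples outside the record are treated as zero
                acc += signal[idx] * h
        out.append(acc)
    return out
```

With `h = highpass_kernel(100.0, 1000.0)`, calling `convolve_same(D, h)` returns a sketch of D_h. The first and last 100 samples are affected by the zero padding at the record edges.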
[0023] Step C: Perform a Fourier transform on D_h to obtain its frequency domain signal F, and obtain the frequency sequence L2 from F.
[0024] Specifically, the formula for the Fourier Transform (FT) is as follows:
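[0025] $$F(f) = \int_{-\infty}^{\infty} D_h(t)\, e^{-j 2\pi f t}\, dt$$, where F(f) is the spectrum of the output signal D_h(t) and f is the frequency.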
[0026] Find the frequencies corresponding to the top 2*n peak values with the largest modulus in F, select the n positive frequencies from them, and sort them in ascending order to obtain the frequency sequence L2. For example, L2 = [f_o1, f_o2, f_o3]; specifically, L2 = [110 Hz, 130 Hz, 140 Hz].
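Step C can be sketched with a direct discrete Fourier transform. This is an illustration under assumptions rather than the source's implementation: the function names are hypothetical, the DFT is the plain O(N²) form for self-containment (a real system would use an FFT library), and the test tones are assumed to fall exactly on DFT bins.

```python
import cmath
import math


def dft(x):
    """Two-sided discrete Fourier transform, direct O(N^2) evaluation."""
    n = len(x)
    twiddle = [cmath.exp(-2j * cmath.pi * k / n) for k in range(n)]
    return [sum(x[t] * twiddle[(k * t) % n] for t in range(n)) for k in range(n)]


def extract_frequency_sequence(d_h, sample_rate_hz, n_freqs):
    """Take the 2*n largest-modulus bins, keep the positive frequencies, sort."""
    n = len(d_h)
    spectrum = dft(d_h)
    # Bins above N/2 correspond to negative frequencies.
    signed_hz = [(k if k <= n // 2 else k - n) * sample_rate_hz / n for k in range(n)]
    ranked = sorted(range(n), key=lambda k: abs(spectrum[k]), reverse=True)
    top = ranked[: 2 * n_freqs]
    return sorted(signed_hz[k] for k in top if signed_hz[k] > 0)[:n_freqs]


# Clean stand-in for D_h: 0.5 s at an assumed 1 kHz rate (2 Hz bin spacing).
fs = 1000.0
d_h = [sum(math.sin(2 * math.pi * f * k / fs) for f in (110.0, 130.0, 140.0))
       for k in range(500)]
L2 = extract_frequency_sequence(d_h, fs, n_freqs=3)
```

For a real-valued signal each tone appears at both +f and -f with equal modulus, which is why the 2*n largest bins contain n positive frequencies.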
[0027] Step D: Calculate the cosine similarity between L1 and L2 to obtain the similarity P. Based on the similarity P, determine whether the liveness video is a real liveness video.
[0028] The formula for calculating the cosine similarity is as follows:
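[0029] $$P = \frac{\sum_{i=1}^{k} f_i\, f_{oi}}{\sqrt{\sum_{i=1}^{k} f_i^{2}}\,\sqrt{\sum_{i=1}^{k} f_{oi}^{2}}}$$, where f_i are the elements of L1, f_oi are the elements of L2, and k is equal to the length n of the frequency sequences L1 and L2.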
[0030] If P is greater than the specified threshold, the collected vibration signal is considered to be consistent with the generated vibration signal, and the video is considered to be captured in real time; otherwise, it is considered to be inconsistent.
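The comparison in Step D reduces to a few lines. In this sketch the function names and the threshold value 0.9999 are assumptions, since the source says only "the specified threshold".

```python
import math


def cosine_similarity(l1, l2):
    """P = (L1 . L2) / (|L1| * |L2|) for two equal-length frequency sequences."""
    if len(l1) != len(l2):
        raise ValueError("L1 and L2 must have the same length n")
    dot = sum(a * b for a, b in zip(l1, l2))
    norm = math.sqrt(sum(a * a for a in l1)) * math.sqrt(sum(b * b for b in l2))
    return dot / norm if norm else 0.0


def is_consistent(l1, l2, threshold=0.9999):
    """True when P exceeds the threshold (threshold value is hypothetical)."""
    return cosine_similarity(l1, l2) > threshold


p_match = cosine_similarity([110, 130, 140], [110, 130, 140])
p_mismatch = cosine_similarity([110, 130, 140], [115, 150, 180])
```

Because all frequencies are positive and lie in a narrow band, P stays close to 1 even for a mismatched sequence: `p_mismatch` is about 0.997 here. A usable threshold therefore has to sit very close to 1.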
[0031] It should be noted that, because the vibration motor and microphone are close together and located within the same unit, the characteristics of the acquired audio signal are obvious. To further improve the accuracy of vibration signal detection, optionally, the actual vibration signal can be further analyzed and compared to see if it matches the vibration control signal S, thereby enhancing accuracy. In step 1, while the vibration motor is vibrating, the microphone of the terminal device also collects the audio signal generated by the vibration.
[0032] Step 2 further includes filtering out high-frequency noise signals from the signal and retaining the low-frequency portion below the cutoff frequency f_c2 to obtain the signal D_l(t):
[0033] D_l(t) = D(t) - D_h(t)
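The subtraction can be read as a filter in its own right. A short derivation follows, assuming D_h(t) is produced by convolution with a high-pass impulse response h(t) whose cutoff is f_c2, and that h(t) introduces no time delay. The symbols δ, h_l, H, and H_l are introduced here for illustration and do not appear in the source.

```latex
\begin{aligned}
D_l(t) &= D(t) - D_h(t) \\
       &= (D * \delta)(t) - (D * h)(t) \\
       &= \bigl(D * (\delta - h)\bigr)(t), \\
h_l(t) &= \delta(t) - h(t), \qquad H_l(f) = 1 - H(f).
\end{aligned}
```

Under these assumptions the complementary signal is a low-pass output wherever H(f) is close to 1 above the cutoff and close to 0 below it. If the high-pass filter delays the signal, D(t) would need the same delay applied before subtracting.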
[0034] The filtered audio signal D_l(t) is analyzed and compared with the vibration control signal S. If the comparison results are consistent, it is a real live video. If they are inconsistent, the test fails. This achieves the effect of further determining whether the live video is a real live video.
[0035] The analysis and processing of audio signals is similar to the analysis process of accelerometer sensor signals. Optionally, since the ambient sound environment in which the user is located is generally rich, and there are many high-frequency signals in the ambient sound, a low-pass filter can be used first to filter out the high-frequency components, thereby improving the accuracy of vibration frequency analysis of the vibration motor. For example, the cutoff frequency can be set to 200 Hz, that is, audio signals higher than 200 Hz are filtered out first.
[0036] A device for improving the safety of liveness detection based on random vibration signals, comprising:
[0037] The processor, as well as the camera, sensors, microphone, and vibration motor located on the mobile terminal;
[0038] The camera is used to collect the user's liveness video information;
[0039] The vibration motor is used to generate a vibration control signal S;
[0040] The sensor is used to collect the vibration signal D from the vibration motor;
[0041] The microphone is used to collect audio signals from the vibration motor;
[0042] The processor is used to analyze and compare the vibration signal D and the vibration signal S to determine whether the live video is a real live video.
[0043] A computer-readable storage medium storing at least one piece of program code, the at least one piece of program code being loaded and executed by a processor to implement a liveness detection method based on vibration signals.
[0044] In summary, due to the adoption of the above technical solution, the beneficial effects of the present invention are:
[0045] 1. In this invention, random signals are emitted by the terminal device hardware - vibration motor, and the signals are collected by the terminal device's sensors. The security of liveness detection is improved by analyzing, extracting and comparing the spectral information of the random signals from the collected signals. It has the ability to simultaneously detect whether it is a real device and whether it is recording in real time. It does not require active cooperation from the user, and has good security and user experience.
[0046] 2. In this invention, sound emitted by motor vibration is collected via a microphone and used as part of the audio input for liveness detection. During liveness verification, the sound frequency is extracted and its consistency with the motor vibration frequency is determined. This assists in liveness detection and improves its security.
Attached Figure Description
[0047] The present invention will be described by way of example and with reference to the accompanying drawings, wherein:
[0048] Figure 1 is a flowchart of the liveness detection process of the present invention;
[0049] Figure 2 is a schematic diagram of the vibration signal management module of the present invention;
[0050] Figure 3 is a flowchart of the vibration frequency analysis of the present invention;
[0051] Figure 4 is an example diagram of the specified vibration control signal S of the present invention;
[0052] Figure 5 is a schematic diagram of the vibration signal D collected by the present invention;
[0053] Figure 6 is a schematic diagram of the high-pass filtered signal D_h of the present invention;
[0054] Figure 7 is a schematic diagram of the frequency domain signal F of the present invention.
Detailed Implementation
[0055] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. The components of the embodiments of the present invention described and shown in the accompanying drawings can generally be arranged and designed in various different configurations.
[0056] Therefore, the following detailed description of the embodiments of the invention provided in the accompanying drawings is not intended to limit the scope of the claimed invention, but merely to illustrate selected embodiments of the invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort are within the scope of protection of the invention.
[0057] It should be noted that, unless otherwise specified, the embodiments and features described in this invention can be combined with each other.
[0058] It should be noted that similar labels and letters in the following figures indicate similar items. Therefore, once an item is defined in one figure, it does not need to be further defined and explained in subsequent figures.
[0059] In this invention, unless otherwise explicitly specified and limited, a first feature being "above" or "below" a second feature can include direct contact between the first and second features, or contact between them through another feature located between them. Furthermore, the first feature being "above," "over," or "on top of" the second feature includes the first feature being directly above or diagonally above the second feature, or simply indicates that the first feature is at a higher horizontal level than the second feature. The first feature being "below," "beneath," or "under" the second feature includes the first feature being directly below or diagonally below the second feature, or simply indicates that the first feature is at a lower horizontal level than the second feature.
[0060] It should be noted that, unless otherwise specified, the embodiments and features described in this invention can be combined with each other.
[0061] Example 1
[0062] As shown in Figures 1-6, an embodiment of the present invention discloses a liveness detection method based on vibration signals, comprising:
[0063] Step 1: Open the front camera to take a portrait and enter the liveness detection process;
[0064] Face detection technology is used to determine whether a complete face has been captured and whether the face meets the detection requirements.
[0065] After the terminal device detects a face, the vibration motor of the terminal device starts and randomly generates a vibration control signal S; while the vibration motor vibrates, the camera of the terminal device records the user's live video, and the accelerometer of the terminal device collects the vibration signal D of the terminal device.
[0066] The vibration signal D is the superposition of the noise signal generated by the external environment and the vibration signal generated by the vibration motor according to the vibration control signal S.
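The superposition that makes up D can be mimicked with synthetic data for testing the later steps. This is a sketch under assumptions: the 3 Hz hand-shake component, its amplitude, the Gaussian sensor noise level, and the 1 kHz sample rate are all illustrative values, not measurements from the source.

```python
import math
import random


def simulate_accelerometer(freqs_hz, sample_rate_hz=1000.0, duration_s=1.0, seed=0):
    """Return D = motor vibration + low-frequency hand shake + sensor noise."""
    rng = random.Random(seed)  # seeded so the simulation is repeatable
    n_samples = int(round(sample_rate_hz * duration_s))
    d = []
    for k in range(n_samples):
        t = k / sample_rate_hz
        motor = sum(math.sin(2.0 * math.pi * f * t) for f in freqs_hz)
        hand_shake = 2.0 * math.sin(2.0 * math.pi * 3.0 * t)  # assumed 3 Hz tremor
        sensor_noise = rng.gauss(0.0, 0.1)                    # assumed noise level
        d.append(motor + hand_shake + sensor_noise)
    return d


D = simulate_accelerometer([110.0, 130.0, 140.0])
```

A real accelerometer trace would also reflect the motor's frequency response and the sensor's sampling rate, neither of which this model captures.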
[0067] Specifically: The vibration control signal S is a vibration signal synthesized based on n random frequencies. The frequencies should avoid noise signal frequency ranges as much as possible, such as the shaking frequency of a person holding a mobile phone or the frequency of external ambient sound.
[0068] For example, considering that users inevitably experience additional vibrations when holding their phones for liveness detection, but these vibrations are all low-frequency, the generated vibration control signal S must consist of signals above 100 Hz to facilitate subsequent filtering of the acquired signals and more effectively analyze the detected feedback signals. n = 3 frequencies are extracted to form a frequency sequence L1, L1 = [f1, f2, f3], where f1 = 110 Hz, f2 = 130 Hz, and f3 = 140 Hz, constituting the control signal.
[0069] S=sin(2πtf1)+sin(2πtf2)+sin(2πtf3)
[0070] The control signal S is shown in Figure 4.
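The example uses fixed frequencies, whereas the method calls for n random frequencies above the noise band. One way to draw them is sketched below. The upper bound of 200 Hz, the 10 Hz minimum spacing, the integer-valued frequencies, and the function name are assumptions chosen for illustration.

```python
import secrets


def draw_frequency_sequence(n, low_hz=100, high_hz=200, min_gap_hz=10, max_tries=1000):
    """Draw n distinct integer frequencies strictly between low_hz and high_hz."""
    rng = secrets.SystemRandom()  # OS-backed randomness, hard for an attacker to predict
    candidates = range(low_hz + 1, high_hz)
    for _ in range(max_tries):
        freqs = sorted(rng.sample(candidates, n))
        # Reject draws whose neighbours are too close to separate in the spectrum.
        if all(b - a >= min_gap_hz for a, b in zip(freqs, freqs[1:])):
            return freqs
    raise ValueError("could not place n frequencies with the requested spacing")


L1 = draw_frequency_sequence(3)
```

A cryptographic random source is used here because the sequence acts as a challenge: if an attacker could predict it, a matching vibration trace could be prepared in advance.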
[0071] Step 2: Analyze and compare the vibration signal D and the vibration control signal S to determine whether the live video is a real live video.
[0072] Step 2 specifically includes the following steps:
[0073] Step A: Process the vibration data D (the acquired vibration signal D is shown in Figure 5) using the high-pass filter function to filter out the low-frequency vibration signal below the cutoff frequency f_c and obtain the vibration signal D_h;
[0074] The high-pass filter formula is as follows:
[0075] $$D_h(t) = D(\tau) * h(t) = \int_{-\infty}^{\infty} D(\tau)\, h(t-\tau)\, d\tau$$
[0076] Where D(τ) is the input signal, D_h(t) is the output signal, h(t) is the filter impulse response function, and * indicates convolution operation;
[0077] For example, the cutoff frequency f_c is set to 100 Hz. A Butterworth high-pass filter is used to filter out low-frequency signals; its impulse response function h(t) is as follows:
[0078]
[0079] The signal D_h(t) obtained after high-pass filtering is shown in Figure 6.
[0080] Step B: Perform a Fourier transform on D_h to obtain its frequency domain signal F, and obtain the frequency sequence L2 from F.
[0081] Specifically, the formula for the Fourier Transform (FT) is as follows:
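[0082] $$F(f) = \int_{-\infty}^{\infty} D_h(t)\, e^{-j 2\pi f t}\, dt$$, where F(f) is the spectrum of the output signal D_h(t) and f is the frequency.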
[0083] The frequency domain signal F is shown in Figure 7.
[0084] Find the frequencies corresponding to the top 2*n peak values with the largest modulus in F, select the n positive frequencies from them, and sort them in ascending order to obtain the frequency sequence L2. For example, L2 = [f_o1, f_o2, f_o3]; specifically, L2 = [110 Hz, 130 Hz, 140 Hz].
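Whether the peaks of F can be told apart depends on how long and how fast D_h is sampled. A rough check follows, assuming a discrete Fourier transform over N samples taken at rate f_s for a duration T. These three symbols are assumptions, since the source gives no sampling parameters.

```latex
\begin{aligned}
\Delta f &= \frac{f_s}{N} = \frac{1}{T}, \qquad T = \frac{N}{f_s} \\
f_3 - f_2 &= 140\,\mathrm{Hz} - 130\,\mathrm{Hz} = 10\,\mathrm{Hz}
  \;\Rightarrow\; \Delta f \le 10\,\mathrm{Hz}
  \;\Rightarrow\; T \ge 0.1\,\mathrm{s} \\
f_s &> 2 f_3 = 280\,\mathrm{Hz}
\end{aligned}
```

So for the example frequencies the accelerometer would need to sample faster than 280 Hz, and the record would need to last at least 0.1 s. A longer record generally gives cleaner separation between neighbouring peaks.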
[0085] Step C: Calculate the cosine similarity between L1 and L2 to obtain the similarity P. Based on the similarity P, determine whether the liveness video is a real liveness video.
[0086] The formula for calculating the cosine similarity is as follows:
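[0087] $$P = \frac{\sum_{i=1}^{k} f_i\, f_{oi}}{\sqrt{\sum_{i=1}^{k} f_i^{2}}\,\sqrt{\sum_{i=1}^{k} f_{oi}^{2}}}$$, where f_i are the elements of L1, f_oi are the elements of L2, and k is equal to the length n of the frequency sequences L1 and L2.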
[0088] If P is greater than the specified threshold, the collected vibration signal is considered to be consistent with the generated vibration signal, and the video is considered to be captured in real time; otherwise, it is considered to be inconsistent.
[0089] It should be noted that, due to the close proximity of the vibration motor and microphone, and their integration within the same unit, the acquired audio signal characteristics are distinct. To further improve the accuracy of vibration signal detection, optionally, the actual vibration signal can be further analyzed and compared to see if it matches the vibration control signal S, thereby enhancing accuracy. In step 1, while the vibration motor vibrates, the microphone of the terminal device also collects the audio signal generated by the vibration; step 2 further includes filtering out high-frequency noise signals from the signal and retaining the low-frequency portion below the cutoff frequency f_c2 to obtain the signal D_l(t):
[0090] D_l(t) = D(t) - D_h(t)
[0091] The filtered audio signal D_l(t) is analyzed and compared with the vibration control signal S. If the comparison results are consistent, it is a real live video. If they are inconsistent, the test fails. This achieves the effect of further determining whether the live video is a real live video.
[0092] The analysis and processing of audio signals is similar to the analysis process of accelerometer sensor signals. Optionally, since the ambient sound environment in which the user is located is generally rich, and there are many high-frequency signals in the ambient sound, a low-pass filter can be used first to filter out the high-frequency components, thereby improving the accuracy of vibration frequency analysis of the vibration motor. For example, the cutoff frequency can be set to 200 Hz, that is, audio signals higher than 200 Hz are filtered out first.
[0093] Example 2
[0094] This embodiment proposes a device for improving the safety of liveness detection based on random vibration signals, including:
[0095] The processor, as well as the camera, sensors, microphone, and vibration motor located on the mobile terminal;
[0096] The camera is used to collect the user's liveness video information;
[0097] The vibration motor is used to generate a vibration signal S;
[0098] The sensor is used to collect the vibration signal D from the vibration motor;
[0099] The microphone is used to collect audio signals from the vibration motor;
[0100] The processor is used to analyze and compare the vibration signal D and the vibration signal S to determine whether the live video is a real live video.
[0101] Example 3
[0102] This embodiment proposes a computer-readable storage medium storing at least one piece of program code, which is loaded and executed by a processor to implement a liveness detection method based on vibration signals.
[0103] The various embodiments in this specification are described in a progressive manner, with each embodiment focusing on the differences from other embodiments. The same or similar parts between the various embodiments can be referred to each other.
[0104] The above description of the disclosed embodiments enables those skilled in the art to make or use the invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the invention. Therefore, the invention is not to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims
1. A liveness detection method based on vibration signals, characterized in that it comprises:
Step 1: After the terminal device detects a face, it randomly generates a vibration control signal S. The terminal device then controls the vibration motor to vibrate according to the vibration control signal S. While the vibration motor is vibrating, the camera of the terminal device records the user's live video, and the sensor of the terminal device collects the vibration signal D generated by the vibration of the terminal device.
Step 2: Analyze and compare the vibration signal D and the vibration control signal S to determine whether the live video is a real live video.
The vibration control signal S is a vibration signal synthesized based on n random frequencies, which avoid the frequency range of noise signals; the vibration signal D is the superposition of the noise signal generated by the external environment and the vibration signal generated by the vibration motor according to the vibration control signal S.
Step 2 specifically includes the following steps:
Step A: Obtain the frequency sequence L1 corresponding to the vibration control signal S, with a sequence length of n;
Step B: Set the cutoff frequency f_c based on the frequency range in which L1 lies, process the vibration signal D with the filter function to filter out the components of D that are not in the range defined by the cutoff frequency f_c, and obtain the vibration signal D_h;
Step C: Perform a Fourier transform on D_h to obtain the frequency domain signal F of D_h, and obtain a frequency sequence L2 based on the frequency domain signal F;
Step D: Calculate the cosine similarity between L1 and L2 to obtain the similarity P. Based on the similarity P, determine whether the liveness video is a real liveness video.
In step 1, while the vibration motor is vibrating, the microphone of the terminal device also collects the audio signal generated by the vibration.
Step 2 further includes analyzing and comparing the audio signal with the vibration control signal S to further determine whether the live video is a real live video.
2. The method for liveness detection based on vibration signals according to claim 1, characterized in that, in step B, if the frequency range of L1 is higher than f_c, a high-pass filter function is used to filter the signal D. The high-pass filter formula is as follows: $D_h(t) = D(\tau) * h(t) = \int_{-\infty}^{\infty} D(\tau)\, h(t-\tau)\, d\tau$; where D(τ) is the input signal, D_h(t) is the output signal, h(t) is the impulse response function of the high-pass filter, and * denotes the convolution operation; if the frequency range of L1 is lower than f_c, the low-frequency part of the signal is obtained instead: D_l(t) = D(t) - D_h(t).
3. The method for liveness detection based on vibration signals according to claim 2, characterized in that, in step C, the formula for the Fourier transform is as follows: $F(f) = \int_{-\infty}^{\infty} D_h(t)\, e^{-j 2\pi f t}\, dt$; where F(f) is the spectrum of the output signal and f is the frequency; find the frequencies corresponding to the top 2*n peaks with the largest modulus in F, take the n positive frequencies, and sort them in ascending order to obtain the frequency sequence L2. The length of L2 is the same as the length of L1, which is n.
4. The method for liveness detection based on vibration signals according to claim 1, characterized in that, in step D, the formula for calculating the cosine similarity is as follows: $P = \frac{\sum_{i=1}^{k} f_i\, f_{oi}}{\sqrt{\sum_{i=1}^{k} f_i^{2}}\,\sqrt{\sum_{i=1}^{k} f_{oi}^{2}}}$; where f_i is an element of L1, f_oi is an element of L2, and k is equal to the length n of the frequency sequences L1 and L2.
5. A device for improving the safety of liveness detection based on random vibration signals, used to implement the liveness detection method based on vibration signals as described in any one of claims 1 to 4, characterized in that it comprises: a processor, as well as a camera, sensors, a microphone, and a vibration motor located on the mobile terminal; the camera is used to collect the user's liveness video information; the vibration motor is used to generate a vibration control signal S; the sensor is used to collect the vibration signal D from the vibration motor; the microphone is used to collect audio signals from the vibration motor; the processor is used to analyze and compare the vibration signal D, the audio signal, and the vibration control signal S to determine whether the live video is a real live video.
6. A computer-readable storage medium, characterized in that the computer-readable storage medium stores at least one piece of program code, which is loaded and executed by a processor to implement a vibration signal-based liveness detection method as described in any one of claims 1 to 4.