Main microphone selection method, apparatus, device and storage medium
Selecting the main microphone by combining user location data with audio signal parameters addresses the reliability and real-time problems of main microphone selection in recording scenarios, thereby improving the recording effect.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Application (China)
- Current Assignee / Owner
- GUANGZHOU KINDLINK INTELLIGENT TECHNOLOGY CO LTD
- Filing Date
- 2024-12-17
- Publication Date
- 2026-06-19
AI Technical Summary
In recording and broadcasting scenarios, existing techniques for selecting a suitable master microphone from multiple microphones to improve recording quality are either susceptible to interference or computationally complex, making it difficult to meet real-time requirements.
By acquiring user location data and combining it with audio signal parameters collected by the microphone, a weighted summation and Kalman filtering process is used to select the microphone that is closer to the user and has better recording quality as the main microphone.
It improves the reliability of main microphone selection and recording quality, reduces computational complexity, and meets real-time requirements.

Figure CN122248301A_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of recording and broadcasting, and in particular to a method, apparatus, device and storage medium for selecting a master microphone.
Background Technology
[0002] In recording and broadcasting scenarios, multi-microphone fusion technology is typically used to integrate and process audio signals captured by multiple microphones to improve the audio quality and recording effect. Multiple microphones in a recording and broadcasting scenario are usually deployed in different locations in the space to increase their pickup range.
[0003] In some recording scenarios, the user moves around the space, and one of the multiple microphones must be selected as the primary sound source while the others serve as secondary sound sources. The microphone serving as the primary sound source is the main microphone, and its selection affects the final recording quality. How to select the main microphone from multiple microphones has therefore become a pressing technical problem.
Summary of the Invention
[0004] This application provides a method, apparatus, device, and storage medium for selecting a master microphone from multiple microphones.
[0005] In a first aspect, a main microphone selection method is provided, applied to a recording host in a recording space, wherein multiple microphones are deployed at different locations within the recording space and the recording host is connected to the multiple microphones; the method includes:
[0006] acquiring user location data, which reflects the location of the target user in the recording space; and
[0007] selecting a main microphone from the multiple microphones by combining the audio signal parameters collected by the multiple microphones with the user location data.
[0008] In this technical solution, user location data reflecting the target user's location within the recording space is acquired and combined with the audio signal parameters collected by the multiple microphones to select a main microphone. Because the audio signal parameters reflect each microphone's recording quality and the user location data reflects the user's distance from each microphone, the selection draws on data from multiple dimensions. This enhances the reliability of microphone selection and helps choose a microphone that is closer to the user and records better as the main microphone, thereby improving the recording effect.
[0009] In conjunction with the first aspect, in one possible implementation, a camera is also deployed in the recording space, and a wireless receiving unit is provided in the recording host; the acquisition of user positioning data includes: acquiring target received signal strength and target image, wherein the target received signal strength is the received signal strength of a target wireless signal received by the wireless receiving unit, the target wireless signal is a wireless signal transmitted by a wireless transmitting device carried by the target user, and the target image is an image of the recording space captured by the camera, the target image containing the target user; and the user positioning data is determined based on the target received signal strength and the target image.
[0010] By combining images captured by the camera with the received signal strength, the user's location data in the recording space can be determined. This allows for user location to be achieved using existing hardware in the recording and broadcasting scenario, without the need for additional hardware, thus saving hardware costs.
[0011] In conjunction with the first aspect, in one possible implementation, the plurality of microphones are arranged in a matrix in the recording space; determining the user positioning data based on the target received signal strength and the target image includes: determining a target distance based on the target received signal strength, the target distance being the distance between the target user and the recording host; determining a first coordinate of the target user in a horizontal coordinate system based on the image region where the target user is located in the target image, wherein the first coordinate is a coordinate in a preset direction; and determining a second coordinate of the target user in the horizontal coordinate system based on the first coordinate and the target distance, wherein the second coordinate is a coordinate in a direction perpendicular to the preset direction.
[0012] The distance between the user and the recording host is determined by the received signal strength. Then, the user's coordinates in one direction of the horizontal coordinate system are determined based on the image. Finally, the user's coordinates in another direction of the horizontal coordinate system are determined based on the distance between the user and the recording host and the already determined position coordinates. This method can achieve user positioning on the horizontal plane. The calculation method is simple and can save the computing power required for user positioning.
[0013] In conjunction with the first aspect, in one possible implementation, selecting a master microphone from the plurality of microphones by combining the audio signal parameters collected by the plurality of microphones and the user positioning data includes: determining first-dimensional scoring data based on the audio signal parameters collected by the plurality of microphones, the first-dimensional scoring data including a plurality of first scores, the plurality of first scores being first scores of the plurality of microphones, the first scores being used to reflect the audio quality collected by the microphones; determining second-dimensional scoring data based on the location data of the plurality of microphones and the user positioning data, the second-dimensional scoring data including a plurality of second scores, the plurality of second scores being second scores of the plurality of microphones, the second scores being used to reflect the distance of the microphones relative to the user; and selecting a master microphone from the plurality of microphones based on the first-dimensional scoring data and the second-dimensional scoring data.
[0014] Using scoring data from two dimensions to select the main microphone from multiple microphones increases the reliability of the main microphone selection and helps improve the recorded audio quality.
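The two kinds of scoring data can be sketched as follows. The application does not specify how audio signal parameters or distances are converted into scores, so this sketch assumes, hypothetically, that a first score is a weighted sum of volume and SNR (the parameters named in [0038]) and that a second score is 1 / (1 + d), where d is the microphone-to-user distance in meters. The function names and weights are illustrative only.

```python
import math
from typing import List, Sequence, Tuple


def first_dimension_scores(volumes_db: Sequence[float], snrs_db: Sequence[float],
                           w_volume: float = 0.5, w_snr: float = 0.5) -> List[float]:
    """One first score per microphone; rises with volume and SNR (hypothetical linear form)."""
    return [w_volume * v + w_snr * s for v, s in zip(volumes_db, snrs_db)]


def second_dimension_scores(mic_positions: Sequence[Tuple[float, float]],
                            user_position: Tuple[float, float]) -> List[float]:
    """One second score per microphone; falls as the mic-to-user distance grows: 1 / (1 + d)."""
    ux, uy = user_position
    return [1.0 / (1.0 + math.hypot(mx - ux, my - uy)) for mx, my in mic_positions]
```

Any monotonically decreasing function of distance would serve for the second score; 1 / (1 + d) is simply bounded in (0, 1] and avoids division by zero when the user stands directly under a microphone.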
[0015] In conjunction with the first aspect, in one possible implementation, the first score is positively correlated with audio quality, and the second score is negatively correlated with the distance between the microphone and the user; the step of selecting the main microphone from the plurality of microphones based on the first dimension score data and the second dimension score data includes: determining a confidence score corresponding to the target microphone based on the first score and the second score corresponding to the target microphone, wherein the target microphone is any one of the plurality of microphones, and the confidence score is used to reflect the reliability of using the microphone as the main microphone; and determining the microphone corresponding to the highest confidence score as the main microphone based on the confidence scores corresponding to each of the plurality of microphones.
[0016] The microphone with the highest confidence score is selected as the main microphone. The confidence score reflects the reliability of the microphone as the main microphone. This is equivalent to selecting the microphone with the highest reliability as the main microphone, which can improve audio quality.
[0017] In conjunction with the first aspect, in one possible implementation, determining the confidence score corresponding to the target microphone based on the first score and the second score corresponding to the target microphone includes: performing a weighted summation of the first score and the second score corresponding to the target microphone to obtain the confidence score corresponding to the target microphone.
[0018] The confidence score is obtained by weighted summation of the scores on the two dimensions, which ensures the accuracy of the confidence score.
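A minimal sketch of the weighted summation and highest-confidence selection described in [0015] and [0017], assuming both scores have already been normalized to comparable ranges. The weights w1 and w2 are hypothetical presets, not values from this application.

```python
from typing import Sequence


def select_main_microphone(first_scores: Sequence[float],
                           second_scores: Sequence[float],
                           w1: float = 0.6, w2: float = 0.4) -> int:
    """Return the index of the microphone with the highest confidence score.

    confidence_i = w1 * first_i + w2 * second_i  (hypothetical weights).
    """
    if len(first_scores) != len(second_scores) or not first_scores:
        raise ValueError("need one first score and one second score per microphone")
    confidences = [w1 * a + w2 * b for a, b in zip(first_scores, second_scores)]
    # argmax over the confidence scores
    return max(range(len(confidences)), key=confidences.__getitem__)
```

For instance, `select_main_microphone([0.9, 0.7], [0.1, 0.9])` returns 1: the first microphone records louder, but the second is much closer to the user, so its confidence score (0.78 versus 0.58) is higher.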
[0019] In conjunction with the first aspect, in one possible implementation, after determining the first dimension scoring data based on the audio signal parameters collected by the plurality of microphones, the method further includes performing Kalman filtering on the first dimension scoring data.
[0020] Applying Kalman filtering to the scoring data derived from the audio signal parameters smooths out short-term fluctuations and improves the stability of the scoring data.
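The application does not give the filter model. One common choice, assumed here, is a scalar Kalman filter per microphone with a random-walk state model, in which the true score is expected to stay roughly constant between frames. The noise variances q and r are hypothetical tuning values.

```python
from typing import List, Optional, Sequence


class ScoreKalmanFilter:
    """Scalar Kalman filter for one microphone's first score (random-walk model, assumed)."""

    def __init__(self, q: float = 1e-3, r: float = 1e-1) -> None:
        self.q = q                      # process-noise variance: how fast the true score may drift
        self.r = r                      # measurement-noise variance: how noisy each raw score is
        self.x: Optional[float] = None  # filtered score estimate
        self.p = 0.0                    # variance of the estimate

    def update(self, z: float) -> float:
        if self.x is None:              # first frame: start from the raw score itself
            self.x, self.p = z, self.r
            return self.x
        self.p += self.q                        # predict: score unchanged, uncertainty grows
        k = self.p / (self.p + self.r)          # Kalman gain
        self.x += k * (z - self.x)              # correct toward the new raw score
        self.p *= 1.0 - k                       # uncertainty shrinks after the correction
        return self.x


def filter_first_dimension(filters: Sequence[ScoreKalmanFilter],
                           raw_scores: Sequence[float]) -> List[float]:
    """Filter one frame of first-dimension scoring data, one filter per microphone."""
    return [f.update(z) for f, z in zip(filters, raw_scores)]
```

A lone spike in a raw score, such as one caused by a door slam, moves the filtered score only part of the way toward the spike, which is the stabilizing effect described above.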
[0021] In a second aspect, a master microphone selection device is provided, applied to a recording host in a recording space, wherein multiple microphones are deployed at different positions in the recording space and the recording host is connected to the multiple microphones; the device includes:
[0022] The user positioning module is used to acquire user positioning data, which reflects the location of the target user in the recording space.
[0023] The microphone selection module is used to select the main microphone from the multiple microphones by combining the audio signal parameters collected by the multiple microphones and the user positioning data.
[0024] In a third aspect, a computer device is provided, including a memory and a processor connected to the memory; the processor is configured to execute one or more computer programs stored in the memory and, when executing them, causes the computer device to implement the master microphone selection method of the first aspect.
[0025] In a fourth aspect, a computer-readable storage medium is provided, which stores a computer program including program instructions that, when executed by a processor, cause the processor to perform the master microphone selection method of the first aspect.
[0026] This application achieves the following technical effect: it selects the main microphone by combining the audio signal parameters collected by multiple microphones, which reflect each microphone's recording effect, with user positioning data, which reflects the user's distance from each microphone. Selecting on data from multiple dimensions enhances the reliability of microphone selection and helps choose a microphone that is closer to the user and records better as the main microphone, thereby improving the recording effect.
Attached Figure Description
[0027] To more clearly illustrate the technical solutions of the embodiments of this application, the drawings used in the description of the embodiments of this application will be briefly introduced below. Obviously, the drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0028] Figure 1 is a schematic diagram of a recording and broadcasting system provided in an embodiment of this application;
[0029] Figure 2 is a flowchart of a method for selecting a main microphone provided in an embodiment of this application;
[0030] Figure 3 is a schematic diagram of the target image and the horizontal coordinate system provided in an embodiment of this application;
[0031] Figure 4 is a schematic structural diagram of a main microphone selection device provided in an embodiment of this application;
[0032] Figure 5 is a schematic structural diagram of a computer device provided in an embodiment of this application.
Detailed Implementation
[0033] To make the objectives, technical solutions, and advantages of this application clearer, the following detailed description is provided in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the scope of this application. All other embodiments obtained by those skilled in the art based on the embodiments in this application without inventive effort are within the scope of protection of this application.
[0034] It should be noted that, unless there is a conflict, the various features in the embodiments of this application can be combined with each other, all of which are within the protection scope of this application. Furthermore, although functional modules are divided in the device schematic diagram and a logical order is shown in the flowchart, in some cases, the steps shown or described can be executed in a different order than the module division in the device or the order in the flowchart. Moreover, the terms "first," "second," and "third" used in this application do not limit the data or execution order, but only distinguish identical or similar items with essentially the same function and effect.
[0035] The technical solution of this application is applicable to recording and broadcasting scenarios. A recording and broadcasting scenario refers to a scenario in which a recording and broadcasting system is used to integrate and record live video and audio to generate a recording file. Recording and broadcasting scenarios can include, for example, educational recording and broadcasting scenarios, conference recording and broadcasting scenarios, performance recording and broadcasting scenarios, etc., and are not limited to the examples given here.
[0036] In recording and broadcasting scenarios, multiple microphones are typically deployed at different locations within the space to increase their pickup range. For example, in an educational recording and broadcasting setting, a standard recording classroom is 8 meters by 12 meters. A row of microphones is typically deployed on both the left and right sides of the classroom, with each row containing three microphones. These three microphones are suspended at positions 3 meters, 6 meters, and 9 meters from the front of the classroom, respectively. In total, there are six microphones deployed in the recording classroom, positioned at different locations within the classroom.
[0037] To improve recording quality, a primary microphone is selected from multiple microphones to record the speaker's voice in the space, while the others serve as secondary microphones to supplement the recording. The primary microphone is called the main microphone, and the secondary microphones are called auxiliary microphones. The sounds captured by the main and auxiliary microphones are processed and then mixed to obtain the final audio. For example, the sound recorded by the main microphone (after volume amplification) is mixed with the sound recorded by the auxiliary microphones (after volume reduction) to obtain the final audio. A well-chosen main microphone can enhance the desired sound and suppress unwanted sounds; conversely, an inappropriate selection may have the opposite effect. Therefore, the choice of main microphone significantly impacts the final recording quality, making its selection a crucial technical challenge.
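As a sketch of the mixing step, assuming equal-length floating-point sample buffers in [-1, 1] and hypothetical gains (1.5 for the main microphone, 0.3 for each auxiliary microphone), the amplified main signal and the attenuated auxiliary signals can be summed sample by sample and clipped:

```python
from typing import List, Sequence


def mix_main_and_auxiliary(main: Sequence[float],
                           auxiliaries: Sequence[Sequence[float]],
                           main_gain: float = 1.5,
                           aux_gain: float = 0.3) -> List[float]:
    """Mix an amplified main-mic buffer with attenuated auxiliary-mic buffers.

    Gains are hypothetical; all buffers are assumed to be float PCM of equal length.
    """
    out = []
    for i, m in enumerate(main):
        s = main_gain * m + aux_gain * sum(aux[i] for aux in auxiliaries)
        out.append(max(-1.0, min(1.0, s)))   # clip to the valid sample range
    return out
```

A production mixer would more likely apply gains in dB with smoothing between frames to avoid clicks when the main microphone changes; the linear form keeps the sketch short.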
[0038] Since the volume and signal-to-noise ratio (SNR) of the sound signal captured by a microphone reflect audio quality, one feasible selection scheme is: obtain the volume and SNR of the sound signal captured by each microphone, compute a weighted sum of the two as that microphone's score, and select the microphone with the highest score as the main microphone. In other words, selection is based solely on the audio signal parameters captured by the microphones. The problem with this scheme is that audio signal parameters are easily disturbed by other sound signals, which can lead to an unreasonable choice of main microphone. For example, when the recording space is near a road, the volume of vehicle horns captured by the microphone near the road (hereinafter the first microphone) may be higher than the volume of the sound captured by the microphone near the user (hereinafter the second microphone). Because the first microphone captures the louder signal, selection based on audio signal parameters would make the first microphone the main microphone, even though the sound captured by the second microphone is the valid signal that needs to be recorded.
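The baseline scheme and its failure mode can be illustrated with hypothetical readings: the microphone near the road hears a loud horn, so the weighted volume/SNR score picks it even though the speaker is nearer the other microphone. The equal weights and dB values are assumptions for illustration.

```python
from typing import Sequence


def baseline_select(volumes_db: Sequence[float], snrs_db: Sequence[float],
                    w_volume: float = 0.5, w_snr: float = 0.5) -> int:
    """Baseline scheme: score = weighted sum of volume and SNR; return the best index."""
    scores = [w_volume * v + w_snr * s for v, s in zip(volumes_db, snrs_db)]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical readings: mic 0 is near the road and hears a horn; mic 1 is near the speaker.
volumes = [78.0, 65.0]
snrs = [18.0, 20.0]
print(baseline_select(volumes, snrs))  # prints 0: the horn-dominated microphone wins
```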
[0039] An improved version of the microphone selection scheme could be as follows: Based on the audio signal parameters collected by the microphones, further identification and classification of the sound signals collected by different microphones would determine whether the collected sound signals are valid. The microphone with the highest score and the most valid sound signal would be selected as the main microphone. However, this sound signal identification and classification requires complex calculations, significant computing resources, and considerable processing time, potentially causing lag in the recording host and failing to meet real-time requirements.
[0040] Therefore, this application proposes a new microphone selection scheme: user location data is acquired and combined with the audio signal parameters collected by the microphones to select the main microphone. Adding user location as a new selection dimension enhances the reliability of the selection and ensures that a more suitable microphone is chosen as the main microphone, thus improving recording quality. This added dimension does not incur excessive computational overhead or cause the recording host to lag, so real-time requirements are still met.
[0041] The technical solution of this application can be applied to recording and broadcasting systems. For ease of understanding, the recording and broadcasting system is introduced first. Referring to Figure 1, which is a schematic diagram of the composition of a recording and broadcasting system provided in an embodiment of this application, the recording system 10 includes a recording host 101, a camera 102, and a microphone 103, all deployed in the same recording space. The camera 102 captures images within the recording space to form video. There can be multiple cameras 102 in the recording space, depending on the recording requirements. For example, in an educational recording scenario, three cameras can be set up: a PTZ camera, a teacher camera, and a student camera. The PTZ camera and the student camera are positioned at the rear of the recording classroom to record the entire classroom; the teacher camera is positioned at the front of the classroom to record the podium and its surroundings. The camera 102 can be a standard monocular camera, a binocular camera, or a depth camera. The microphone 103 records sound to form audio. Multiple microphones 103 are deployed at different locations within the recording space. In some cases, the microphones 103 can be arranged in a matrix within the recording space to record sound from different areas. The number of microphones 103 arranged in a matrix is m × n, where m and n depend on the size of the recording space and the recording requirements; for example, in an educational recording scenario, the matrix is 2 × 3. The recording host 101 is the control device of the recording system 10. It is connected to both the camera 102 and the microphones 103 and controls and schedules them to complete the recording.
The recording host 101 can receive the video captured by the camera 102 and the audio recorded by the microphones 103, and can integrate them to form a recorded video. The recording host 101 can also distinguish among the multiple microphones 103 and determine the main microphone from among them, and can schedule or interact with other hardware in the recording space; its functions are not limited to those described here. The position of the recording host 101 in the recording space can be set according to the recording requirements. For example, in an educational recording scenario, the recording host 101 can be mounted on the wall at the back of the recording classroom.
[0042] The recording system 10 may also include other hardware required for recording, such as audio equipment, radio frequency tags, ultra-wideband base stations, large display screens, etc., and this application does not impose any restrictions on this.
[0043] The technical solution of this application is specifically applied to the recording host 101 in the recording and broadcasting system 10. The technical solution of this application is described in detail below.
[0044] Referring to Figure 2, which is a flowchart of a master microphone selection method provided in an embodiment of this application, the method includes the following steps:
[0045] S101, Obtain user location data.
[0046] Here, user location data is used to reflect the location of the target user within the recording space. The target user refers to the user whose voice needs to be recorded in the recording scenario, and can be the main speaker in the recording space. Taking an educational recording scenario as an example, the target user can be the teacher in the recording classroom.
[0047] User location data can reflect the target user's approximate location within the recording space, such as the area the target user occupies within the recording space. Alternatively, user location data can reflect the target user's precise location within the recording space. Whether the user location data specifically reflects the target user's approximate or precise location within the recording space depends on the hardware and methods used to determine the user location.
[0048] In some possible scenarios, target users can be located and their location data obtained based on the existing hardware in the recording system, which includes a recording host, microphone, and camera. In one feasible implementation, user location data can be obtained through the following steps A1-A2:
[0049] A1. Obtain the target received signal strength and target image.
[0050] Here, the target received signal strength refers to the received signal strength of the target wireless signal received by the wireless receiving unit in the recording host. The target wireless signal is the wireless signal transmitted by the wireless transmitting device carried by the target user.
[0051] In one specific implementation, the target wireless signal can be a Bluetooth signal, the wireless receiving unit in the recording host can be a Bluetooth receiving unit, and the wireless transmitting device carried by the target user can be a Bluetooth transmitting device. This Bluetooth transmitting device can be a Bluetooth module compatible with the recording host, or it can be a Bluetooth device such as a mobile phone carried by the target user. The Bluetooth transmitting device carried by the target user can be paired with the recording host via Bluetooth before recording. During recording, the recording host receives the Bluetooth signal transmitted by the Bluetooth transmitting device carried by the target user through its Bluetooth receiving unit, detects the received signal strength of the Bluetooth signal transmitted by the target user's Bluetooth transmitting device, and obtains a received signal strength indicator (RSSI).
[0052] Alternatively, the target wireless signal can also be other types of signals, such as a Zigbee signal.
[0053] The target image is an image of the recording space captured by a camera in the recording system. The target image contains the target user and covers the entire recording space. For example, in an educational recording scenario, the target image could be an image captured by the student camera. An example target image is shown in Figure 3; the recording space shown in Figure 3 has six microphones, labeled m1 to m6.
[0054] A2. Determine user location data based on the target received signal strength and target image.
[0055] When multiple microphones are arranged in a matrix in the recording space, user location data can be determined through the following steps A21-A23:
[0056] A21. Determine the target distance based on the target received signal strength.
[0057] Here, the target distance is the distance between the target user and the recording host, shown for example as d1 in Figure 3.
[0058] In one specific implementation, the target distance can be calculated using a ranging formula, in which D represents distance, rssi represents received signal strength, A represents the received signal strength at a distance of 1 meter, and n represents the environmental attenuation factor; A and n are both preset values. Substituting the target received signal strength into the ranging formula yields the target distance.
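The formula itself is not reproduced in the text. A common RSSI ranging formula consistent with these definitions is the log-distance path-loss model, D = 10^((A − rssi) / (10n)); the sketch below assumes that form, with RSSI values in dBm and hypothetical presets A = −59 dBm and n = 2.0.

```python
def rssi_to_distance(rssi_dbm: float, a_dbm: float = -59.0, n: float = 2.0) -> float:
    """Distance in meters from RSSI via the log-distance path-loss model (assumed form).

    a_dbm: RSSI measured at 1 m (preset A); n: environmental attenuation factor (preset).
    """
    return 10 ** ((a_dbm - rssi_dbm) / (10 * n))
```

With these presets, an RSSI equal to A maps to 1 m and an RSSI of −79 dBm maps to 10 m. Raw RSSI fluctuates considerably indoors, so averaging several readings before conversion is usually advisable.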
[0059] In another specific implementation, different distances corresponding to different received signal strengths can be measured and determined in advance to establish a correspondence between received signal strength and distance; based on the correspondence between received signal strength and distance, the distance corresponding to the target received signal strength can be determined as the target distance.
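A sketch of the lookup-table alternative, assuming a hypothetical calibration table measured in advance. It returns the distance of the entry whose RSSI is nearest to the measurement; interpolating between neighboring entries would be a natural refinement, but the text only calls for a correspondence lookup.

```python
from typing import Dict

# Hypothetical calibration table measured in advance: RSSI (dBm) -> distance (m).
CALIBRATION: Dict[int, float] = {-55: 1.0, -61: 2.0, -66: 4.0, -70: 6.0, -73: 8.0, -76: 10.0}


def lookup_distance(rssi_dbm: float, table: Dict[int, float] = CALIBRATION) -> float:
    """Return the distance of the calibration entry whose RSSI is closest to the measured one."""
    nearest = min(table, key=lambda r: abs(r - rssi_dbm))
    return table[nearest]
```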
[0060] This application does not impose any restrictions on the specific implementation method of determining the target distance based on the received signal strength.
[0061] A22. Determine the first coordinate of the target user in the horizontal coordinate system based on the image region where the target user is located in the target image.
[0062] Here, the horizontal coordinate system is a coordinate system on a horizontal plane parallel to the floor of the recording space. Referring to Figure 3, for the target image shown in Figure 3, the horizontal coordinate system can be established as shown in the figure: a corner of the floor of the recording space is taken as the origin, and the left-right and front-back directions of the recording space are taken as the two coordinate directions.
[0063] The first coordinate is the coordinate in the preset direction, which is the left-right direction of the recording space in the horizontal coordinate system. Taking the horizontal coordinate system in Figure 3 as an example, the camera captures the classroom from back to front; the x-axis represents the left-right direction of the classroom and the y-axis represents the front-back direction. The preset direction is the x-axis direction in Figure 3, so the first coordinate is the x-axis coordinate.
[0064] In one feasible implementation, the image captured by the camera can be pre-divided into multiple image regions based on the distribution of multiple microphones in the horizontal coordinate system. The number of image regions is the same as the number of microphones in the left-right direction on the horizontal plane. Different first coordinates are set for different image regions based on the positions of the microphones in the left-right direction on the horizontal plane, and a correspondence between image regions and first coordinates is pre-established. Based on the correspondence between image regions and first coordinates, the first coordinate corresponding to the image region where the target user is located in the target image is determined as the first coordinate of the target user in the horizontal coordinate system.
[0065] Taking the microphone distribution shown in Figure 3 as an example, a row of microphones is deployed on each of the left and right sides of the classroom, and the origin of the horizontal coordinate system is at the lower right corner of the classroom floor. The coordinates of microphones m1 to m6 in the horizontal coordinate system are (x1, y1), (x2, y1), (x1, y2), (x2, y2), (x1, y3), and (x2, y3), respectively. Since there are two rows of microphones, that is, two microphones along the left-right direction, the image captured by the camera can be divided into two image regions, shown as T1 and T2 in Figure 3. Image region T1 represents the right side of the recording space, and image region T2 represents the left side. An x-axis coordinate is set for each region: x1 + Δx for image region T1 and x2 − Δx for image region T2, where Δx is a preset coordinate offset satisfying Δx < (x2 − x1)/2, so that the two region coordinates stay on their own sides of the classroom's center line. If the target user is located in image region T1, the first coordinate of the target user in the horizontal coordinate system is x1 + Δx; if the target user is located in image region T2, it is x2 − Δx.
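A minimal sketch of the region-to-coordinate mapping for the two-region layout of Figure 3. It assumes, hypothetically, that a person detector supplies the horizontal pixel position of the target user's bounding-box center, that T1 is the right half of the image and T2 the left half, and that x1 = 1.0 m, x2 = 7.0 m and Δx = 1.0 m. The sketch's own guard requires 0 ≤ Δx < (x2 − x1)/2 so that the two region coordinates cannot cross.

```python
def first_coordinate(bbox_center_u: float, image_width: int,
                     x1: float = 1.0, x2: float = 7.0, dx: float = 1.0) -> float:
    """Map the target user's image region (T1 right half, T2 left half) to an x coordinate.

    bbox_center_u (pixel column of the user's bounding-box center), the half-image split,
    and the x1/x2/dx values are hypothetical.
    """
    if not 0.0 <= dx < (x2 - x1) / 2:
        raise ValueError("dx must keep the two region coordinates apart")
    in_t1 = bbox_center_u >= image_width / 2   # right half of the image -> region T1
    return x1 + dx if in_t1 else x2 - dx
```

For a 1920-pixel-wide image, a user centered at column 1500 falls in T1 and gets x = 2.0 m, while a user at column 200 falls in T2 and gets x = 6.0 m.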
[0066] A23. Based on the target user's first coordinate in the horizontal coordinate system and the target distance, determine the target user's second coordinate in the horizontal coordinate system.
[0067] Here, the second coordinate is the coordinate in the horizontal coordinate system along the direction perpendicular to the preset direction. Taking the horizontal coordinate system shown in Figure 3 as an example, the second coordinate is the y-axis coordinate.
[0068] In one specific implementation, the position coordinates of the recording host in the horizontal coordinate system can be obtained. Based on the distance calculation formula, the position coordinates of the recording host in the horizontal coordinate system, the first coordinate of the target user in the horizontal coordinate system, and the target distance, the second coordinate of the target user in the horizontal coordinate system can be determined.
[0069] Continuing with Figure 3, assume that the recording host's position coordinates in the horizontal coordinate system are (x0, y0), the target user's first coordinate is xd, the target user's second coordinate is yd, and the target distance is D. The distance calculation formula is then:
[0070] $(y_d - y_0)^2 + (x_d - x_0)^2 = D^2$
[0071] Since x0, y0, xd, and D are all known, the equation can be solved for yd, which is the target user's second coordinate in the horizontal coordinate system.
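Solving the distance formula for yd generally gives two candidate roots, $y_d = y_0 \pm \sqrt{D^2 - (x_d - x_0)^2}$. The sketch below computes both. Keeping the root that lies inside the room's y-range is an assumption, since the text does not say which root is used; the function names and example coordinates are hypothetical.

```python
import math


def second_coordinate_candidates(x0: float, y0: float, xd: float, d: float) -> tuple:
    """Solve (yd - y0)^2 + (xd - x0)^2 = d^2 for yd and return every real root."""
    remainder = d * d - (xd - x0) ** 2
    if remainder < 0:
        # A noisy distance estimate can be shorter than the horizontal offset.
        raise ValueError("target distance is shorter than |xd - x0|; no real solution")
    h = math.sqrt(remainder)
    return (y0,) if h == 0 else (y0 - h, y0 + h)


def pick_inside_room(candidates: tuple, y_min: float, y_max: float) -> float:
    """Assumption: keep the first candidate that lies within the room's y-extent."""
    for y in candidates:
        if y_min <= y <= y_max:
            return y
    raise ValueError("no candidate lies inside the recording space")


# Hypothetical: host at (4, 0), user's first coordinate xd = 1, target distance D = 5.
roots = second_coordinate_candidates(4.0, 0.0, 1.0, 5.0)
print(roots)                               # (-4.0, 4.0)
print(pick_inside_room(roots, 0.0, 10.0))  # 4.0
```

If the host sits on one wall, only one root falls inside the room, so the ambiguity resolves itself; a host mounted mid-room would need another cue.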
[0072] Alternatively, with the recording host taken as the origin of the horizontal coordinate system, the target user's second coordinate can also be determined from the Pythagorean theorem, that is...
[0073] This application does not limit the method by which the second coordinate of the target user in the horizontal coordinate system is determined based on the first coordinate of the target user in the horizontal coordinate system and the target distance.
[0074] It should be noted that the user location data determined in steps A21-A23 above mainly applies when the camera in the recording system is a monocular camera, and it reflects the user's approximate location in the space.
[0075] Through steps A21-A23 above, the distance between the user and the recording host is first determined from the received signal strength. The user's coordinate in one direction of the horizontal coordinate system is then determined from the image. Finally, the user's coordinate in the other direction is determined from the user-to-host distance and the coordinates already determined. This locates the user on the horizontal plane with a simple calculation, saving the computing power required for user positioning.
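Steps A21-A23 start from a distance estimated from received signal strength, but this section does not restate that conversion. One common choice, assumed here and not taken from the source, is the log-distance path-loss model $\mathrm{RSSI}(d) = \mathrm{RSSI}(d_0) - 10\,n\,\log_{10}(d/d_0)$ with $d_0 = 1$ m. The reference RSSI of -59 dBm and path-loss exponent n = 2 below are hypothetical calibration values.

```python
def rssi_to_distance(rssi_dbm: float,
                     rssi_at_1m_dbm: float = -59.0,
                     path_loss_exponent: float = 2.0) -> float:
    """Invert the log-distance path-loss model to estimate distance in meters.

    RSSI(d) = RSSI(1 m) - 10 * n * log10(d)  =>  d = 10 ** ((RSSI(1 m) - RSSI(d)) / (10 * n))
    """
    if path_loss_exponent <= 0:
        raise ValueError("path-loss exponent must be positive")
    return 10 ** ((rssi_at_1m_dbm - rssi_dbm) / (10 * path_loss_exponent))


print(rssi_to_distance(-59.0))  # 1.0  (reading equals the 1 m reference)
print(rssi_to_distance(-79.0))  # 10.0 (20 dB weaker with n = 2)
```

Both calibration parameters depend on the transmitter and the room, so they would have to be measured on site for the estimate to be meaningful.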
[0076] By combining the images captured by the camera and the received signal strength through the above steps A1-A2, the user's location data in the recording space can be determined. This allows the user's location to be determined using the existing hardware in the recording scene without the need for additional hardware, thus saving hardware costs.
[0077] Besides determining the user's location data through the methods described in steps A1-A2 above, other feasible methods can also be used. For example, if the camera in the recording system is a depth camera, the user's position in the depth image can be determined based on the depth image captured by the depth camera, and then the user's position in the world coordinate system can be determined based on the transformation between the pixel coordinate system and the world coordinate system. Similarly, if the camera in the recording system is a stereo camera, the user's position in the world coordinate system can also be determined based on the information captured by the stereo camera; this application does not impose any limitations on this method.
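For the depth-camera alternative, the pixel-to-world transformation is commonly done with the pinhole camera model. The sketch below assumes known intrinsics (fx, fy, cx, cy) and extrinsics (R, t) such that a world point maps to camera coordinates as P_c = R·P_w + t. These parameter names and the example values are hypothetical, and the source does not specify the calibration model.

```python
def pixel_to_world(u, v, depth, fx, fy, cx, cy, R, t):
    """Back-project pixel (u, v) with depth Z into world coordinates.

    Pinhole model: Xc = (u - cx) * Z / fx, Yc = (v - cy) * Z / fy, Zc = Z.
    Extrinsics map world to camera as Pc = R @ Pw + t, so Pw = R^T @ (Pc - t).
    R is a 3x3 rotation matrix given as nested lists; t is a length-3 sequence.
    """
    pc = ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)
    diff = [pc[i] - t[i] for i in range(3)]
    # Multiply by the transpose of R (the inverse of a rotation matrix).
    return tuple(sum(R[r][c] * diff[r] for r in range(3)) for c in range(3))


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(pixel_to_world(420, 240, 2.0, 500.0, 500.0, 320.0, 240.0, identity, (0.0, 0.0, 0.0)))
# (0.4, 0.0, 2.0)
```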
[0078] In other possible scenarios, additional hardware devices can be added to the recording system to determine user location data. For example, Bluetooth beacons or radio frequency beacons can be added to the recording system to determine user location data based on triangulation technology. User location data determined by triangulation technology can accurately reflect the target user's position in the recording space. Alternatively, multiple positioning technologies can be combined to determine and obtain user location data; and so on, not limited to the examples here.
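The text names triangulation with Bluetooth or radio frequency beacons without giving details. Beacon systems that estimate a distance to each beacon usually solve a trilateration problem instead; the sketch below is that distance-based variant (a swapped-in technique, not necessarily the source's), assuming exactly three non-collinear beacons at known floor positions. Subtracting the second and third circle equations from the first gives a 2×2 linear system, solved here with Cramer's rule. Names and values are hypothetical.

```python
import math


def trilaterate_2d(beacons, distances):
    """Estimate (x, y) from three beacon positions and the distances to them.

    Subtracting circle equations 2 and 3 from circle equation 1 yields
        a*x + b*y = c
        d*x + e*y = f
    which is solved with Cramer's rule.
    """
    (x1, y1), (x2, y2), (x3, y3) = beacons
    r1, r2, r3 = distances
    a, b = 2 * (x2 - x1), 2 * (y2 - y1)
    c = r1**2 - r2**2 - x1**2 + x2**2 - y1**2 + y2**2
    d, e = 2 * (x3 - x1), 2 * (y3 - y1)
    f = r1**2 - r3**2 - x1**2 + x3**2 - y1**2 + y3**2
    det = a * e - b * d
    if abs(det) < 1e-12:
        raise ValueError("beacons are collinear; position is not determined")
    return (c * e - b * f) / det, (a * f - c * d) / det


# Hypothetical beacons at three corners of a 10 m x 10 m room; true user position (3, 4).
beacons = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
dists = [math.hypot(3 - bx, 4 - by) for bx, by in beacons]
print(trilaterate_2d(beacons, dists))  # approximately (3.0, 4.0)
```

With noisy distances and more than three beacons, a least-squares fit over all the linearized equations would be the usual generalization.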
[0079] S102. Select the main microphone from the multiple microphones by combining the audio signal parameters collected by the multiple microphones with the user positioning data.
[0080] In some possible situations, the master microphone can be selected from multiple microphones using the following steps B1-B3:
[0081] B1. Determine the first dimension scoring data based on the audio signal parameters collected by multiple microphones.
[0082] Here, audio signal parameters are parameters that reflect the audio quality captured by the microphone. Audio signal parameters may include volume and signal-to-noise ratio; audio signal parameters may also include other parameters that reflect audio quality.
[0083] The first-dimension scoring data includes multiple first scores, one for each of the multiple microphones, and each first score reflects the audio quality captured by the corresponding microphone. For example, with the six microphones shown in Figure 3, the first-dimension scoring data includes six first scores: those of microphones m1, m2, m3, m4, m5, and m6.
[0084] The first score is positively correlated with the audio quality captured by the microphone; that is, the better the audio quality captured by the microphone, the higher the first score, and the worse the audio quality, the lower the first score.
[0085] In one feasible implementation, the audio signal parameters include volume and signal-to-noise ratio (SNR). In the process of determining the first dimension score data based on the audio signal parameters collected by multiple microphones, the volume and SNR collected by the target microphone can be weighted and summed to obtain the first score of the target microphone, where the target microphone is any one of the multiple microphones. For each of the multiple microphones, the volume and SNR collected by each microphone are weighted and summed separately to obtain the first score of each microphone, thus obtaining the first dimension score data.
[0086] Taking the six microphones shown in Figure 3 as an example, the volume and signal-to-noise ratio (SNR) collected by microphone m1 are weighted and summed to obtain the first score of microphone m1, and the same is done for microphones m2, m3, m4, m5, and m6 to obtain their respective first scores. The first scores of microphones m1 to m6 together constitute the first-dimension scoring data.
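The per-microphone weighted sum of volume and SNR can be sketched as below. The 0.5/0.5 weights and the example readings are hypothetical. Because volume and SNR are typically in different units, a real system would probably rescale them to a common range before weighting; that step is an assumption and is left out here.

```python
def first_scores(readings, w_volume=0.5, w_snr=0.5):
    """Compute each microphone's first score as a weighted sum of volume and SNR.

    readings maps a microphone id to a (volume, snr) pair; the result maps the
    same ids to first scores, i.e. the first-dimension scoring data.
    """
    return {mic: w_volume * volume + w_snr * snr
            for mic, (volume, snr) in readings.items()}


# Hypothetical readings (already on comparable scales) for two of the six microphones.
print(first_scores({"m1": (60.0, 20.0), "m2": (50.0, 10.0)}))
# {'m1': 40.0, 'm2': 30.0}
```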
[0087] A microphone score is obtained by weighted summation of the volume and signal-to-noise ratio received by the microphone, which can objectively evaluate the audio quality captured by the microphone.
[0088] Of course, the first score of each microphone can be determined in other ways, and this application does not limit this.
[0089] In some possible cases, after determining the first score for each microphone and obtaining the first-dimensional score data, Kalman filtering can be applied to the first-dimensional score data.
[0090] Applying Kalman filtering to the scoring data derived from the audio signal parameters smooths out short-term fluctuations and thus improves the stability of the scoring data.
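The source does not give the Kalman filter's state model. A minimal sketch, assuming each microphone's first score is modelled as a slowly varying scalar (random-walk state observed directly), filters every microphone independently. The class name and the noise variances q (process) and r (measurement) are hypothetical tuning values.

```python
class ScoreKalmanFilter:
    """Independent scalar Kalman filters, one per microphone's first score."""

    def __init__(self, q: float = 0.01, r: float = 1.0):
        self.q = q            # process noise variance: how fast the true score may drift
        self.r = r            # measurement noise variance: how noisy each raw score is
        self.estimate = {}    # current filtered score per microphone
        self.variance = {}    # current estimate variance per microphone

    def update(self, scores: dict) -> dict:
        """Fuse one frame of raw first scores and return the filtered scores."""
        filtered = {}
        for mic, z in scores.items():
            if mic not in self.estimate:
                # First observation: initialise the state with the measurement.
                self.estimate[mic] = z
                self.variance[mic] = self.r
            else:
                p = self.variance[mic] + self.q   # predict (state assumed constant)
                k = p / (p + self.r)              # Kalman gain
                self.estimate[mic] += k * (z - self.estimate[mic])  # correct
                self.variance[mic] = (1 - k) * p
            filtered[mic] = self.estimate[mic]
        return filtered


kf = ScoreKalmanFilter()
kf.update({"m1": 40.0})
print(kf.update({"m1": 80.0}))  # m1 moves only part of the way toward the spike
```

A smaller q makes the filtered score steadier but slower to follow a user who actually walks toward another microphone; q/r is the main knob trading stability against responsiveness.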
[0091] B2. Determine the second dimension scoring data based on the location data of multiple microphones and user positioning data.
[0092] Here, the second-dimension scoring data includes multiple second scores, one for each of the multiple microphones, and each second score reflects the distance between the corresponding microphone and the user. For example, with the six microphones shown in Figure 3, the second-dimension scoring data includes six second scores: those of microphones m1, m2, m3, m4, m5, and m6.
[0093] The second score can be negatively correlated with the distance between the microphone and the user; that is, the closer the microphone is to the user, the higher the second score, and the farther the microphone is from the user, the lower the second score.
[0094] Specifically, the distances of multiple microphones relative to the target user can be determined based on the location data of multiple microphones and user positioning data, thus obtaining the relative distances of multiple microphones; based on the relative distances of multiple microphones, the second score of multiple microphones can be determined, thus obtaining the second dimension score data.
[0095] In the process of determining the second score of multiple microphones based on their relative distances, and thus obtaining the second-dimensional score data, the difference between a preset distance and the relative distance to the target microphone can be used to determine the second score of the target microphone. The preset distance could be, for example, the diagonal length of the recording space floor. Alternatively, the second score can be determined by multiplying the reciprocal of the relative distance to the target microphone by a preset score, such as 100. This process is repeated for the relative distance of each microphone to obtain the second scores for multiple microphones, thus yielding the second-dimensional score data.
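Both options for the second score can be sketched as below: the preset distance minus the relative distance (for example the floor diagonal), and a preset score (for example 100) times the reciprocal of the relative distance. Euclidean distance on the horizontal plane and the small `eps` guard against division by zero are assumptions; the function name and coordinates are hypothetical.

```python
import math


def second_scores(mic_positions, user_position, mode="difference",
                  preset_distance=10.0, preset_score=100.0, eps=1e-6):
    """Map each microphone's distance to the user onto a second score.

    mode="difference": score = preset_distance - distance
    mode="reciprocal": score = preset_score / distance
    Both decrease as the microphone gets farther from the user.
    """
    ux, uy = user_position
    scores = {}
    for mic, (mx, my) in mic_positions.items():
        distance = math.hypot(mx - ux, my - uy)   # relative distance on the floor plane
        if mode == "difference":
            scores[mic] = preset_distance - distance
        elif mode == "reciprocal":
            scores[mic] = preset_score / max(distance, eps)  # eps avoids division by zero
        else:
            raise ValueError(f"unknown mode: {mode!r}")
    return scores


mics = {"m1": (0.0, 0.0), "m2": (3.0, 4.0)}
print(second_scores(mics, (3.0, 0.0)))                     # {'m1': 7.0, 'm2': 6.0}
print(second_scores(mics, (3.0, 0.0), mode="reciprocal"))  # m1 ≈ 33.33, m2 = 25.0
```

The difference form changes linearly with distance, while the reciprocal form rewards the nearest microphones much more strongly; which suits a room better is a design choice the text leaves open.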
[0096] B3. Based on the first-dimensional scoring data and the second-dimensional scoring data, select the main microphone from multiple microphones.
[0097] When the first score is positively correlated with audio quality and the second score is negatively correlated with the distance between the microphone and the user, the master microphone can be selected from the multiple microphones through the following steps B31-B32:
[0098] B31. Determine the confidence score of the target microphone based on the first score and the second score corresponding to the target microphone.
[0099] Here, the target microphone is any one of multiple microphones, and the confidence score is used to reflect the reliability of using the microphone as the main microphone; the higher the confidence score, the higher the reliability of using the target microphone as the main microphone, and the lower the confidence score, the lower the reliability of using the target microphone as the main microphone.
[0100] In one feasible implementation, the first score and the second score corresponding to the target microphone can be weighted and summed to obtain the confidence score corresponding to the target microphone. The sum of the weights of the first and second scores is 1.
[0101] B32. Based on the confidence scores of each of the multiple microphones, the microphone with the highest confidence score is determined as the main microphone.
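Steps B31-B32 reduce to a weighted sum followed by an argmax. A minimal sketch, assuming both scores are already on comparable scales (the text does not say how they are normalized). The weight w1 = 0.5 and the example scores are hypothetical, and setting w2 = 1 - w1 keeps the two weights summing to 1 as described.

```python
def select_master_microphone(first, second, w1=0.5):
    """Return (master mic id, confidence scores) from first and second scores.

    confidence = w1 * first + (1 - w1) * second; the highest confidence wins.
    Ties go to the microphone that appears first in `first`.
    """
    if not 0.0 <= w1 <= 1.0:
        raise ValueError("w1 must lie in [0, 1]")
    w2 = 1.0 - w1
    confidence = {mic: w1 * first[mic] + w2 * second[mic] for mic in first}
    master = max(confidence, key=confidence.get)
    return master, confidence


# m2 sounds cleaner, but m1 is much closer to the user.
master, conf = select_master_microphone({"m1": 40.0, "m2": 70.0}, {"m1": 90.0, "m2": 30.0})
print(master, conf)  # m1 {'m1': 65.0, 'm2': 50.0}
```

Raising w1 toward 1 makes selection depend mainly on audio quality; lowering it makes proximity to the user dominate.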
[0102] The microphone with the highest confidence score is selected as the main microphone. The confidence score reflects the reliability of the microphone as the main microphone. This is equivalent to selecting the microphone with the highest reliability as the main microphone, which can improve audio quality.
[0103] In the technical solution corresponding to Figure 2 above, user positioning data reflecting the target user's location within the recording space is acquired and then combined with the audio signal parameters collected by the multiple microphones in the recording space to select the primary microphone. Because the audio signal parameters reflect each microphone's recording quality and the user positioning data reflects the user's distance from each microphone, combining the two amounts to selecting on data from multiple dimensions. This enhances the reliability of microphone selection and favors a microphone that is closer to the user and records with better quality, thereby improving the recording effect.
[0104] The method of this application has been described above; the apparatus of this application will be described below.
[0105] Referring to Figure 4, Figure 4 is a schematic diagram of a master microphone selection device provided in an embodiment of this application. The device is applied to a recording host in a recording space in which multiple microphones are deployed at different locations, and the recording host is connected to the multiple microphones. As shown in Figure 4, the main microphone selection device 30 includes:
[0106] User positioning module 301 is used to acquire user positioning data, which reflects the location of the target user in the recording space.
[0107] The microphone selection module 302 is used to select the main microphone from the multiple microphones by combining the audio signal parameters collected by the multiple microphones and the user positioning data.
[0108] In one possible design, a camera is also deployed in the recording space, and a wireless receiving unit is provided in the recording host; the aforementioned user positioning module 301 is specifically used for: acquiring the target received signal strength and the target image, wherein the target received signal strength is the received signal strength of the target wireless signal received by the wireless receiving unit, the target wireless signal is the wireless signal transmitted by the wireless transmitting device carried by the target user, and the target image is an image of the recording space captured by the camera, the target image containing the target user; and determining the user positioning data based on the target received signal strength and the target image.
[0109] In one possible design, the multiple microphones are arranged in a matrix in the recording space; the user positioning module 301 is specifically used for: determining the target distance based on the target received signal strength, the target distance being the distance between the target user and the recording host; determining the first coordinate of the target user in a horizontal coordinate system based on the image area where the target user is located in the target image, the first coordinate being a coordinate in a preset direction; and determining the second coordinate of the target user in the horizontal coordinate system based on the first coordinate and the target distance, the second coordinate being a coordinate in a direction perpendicular to the preset direction.
[0110] In one possible design, the microphone selection module 302 is specifically used for: determining first-dimensional scoring data based on the audio signal parameters collected by the plurality of microphones, wherein the first-dimensional scoring data includes a plurality of first scores, which are the first scores of the plurality of microphones, and the first scores are used to reflect the audio quality collected by the microphones; determining second-dimensional scoring data based on the position data of the plurality of microphones and the user positioning data, wherein the second-dimensional scoring data includes a plurality of second scores, which are the second scores of the plurality of microphones, and the second scores are used to reflect the distance of the microphones relative to the user; and selecting the main microphone from the plurality of microphones based on the first-dimensional scoring data and the second-dimensional scoring data.
[0111] In one possible design, the first score is positively correlated with audio quality, and the second score is negatively correlated with the distance between the microphone and the user; the microphone selection module 302 is specifically used to: determine the confidence score corresponding to the target microphone based on the first score and the second score corresponding to the target microphone, wherein the target microphone is any one of the plurality of microphones, and the confidence score is used to reflect the reliability of using the microphone as the main microphone; and determine the microphone corresponding to the highest confidence score as the main microphone based on the confidence scores corresponding to the plurality of microphones.
[0112] In one possible design, the microphone selection module 302 is specifically used to: perform a weighted summation of the first score and the second score corresponding to the target microphone to obtain the confidence score corresponding to the target microphone.
[0113] In one possible design, the microphone selection module 302 is specifically used to perform Kalman filtering on the first dimension scoring data.
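The split between module 301 and module 302 can be mirrored by a small composition sketch. The class name, the callable signatures, and the stub modules in the example are hypothetical placeholders; in the described device, module 301 would wrap the received-signal-strength and image positioning, and module 302 the B1-B3 scoring and selection.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

Position = Tuple[float, float]
AudioParams = Dict[str, Tuple[float, float]]   # mic id -> (volume, snr)


@dataclass
class MainMicrophoneSelectionDevice:
    """Composition of the two modules of device 30 (names are illustrative)."""
    user_positioning_module: Callable[[], Position]                      # module 301
    microphone_selection_module: Callable[[AudioParams, Position], str]  # module 302

    def select(self, audio_params: AudioParams) -> str:
        # Module 301 locates the target user; module 302 combines that location
        # with the microphones' audio signal parameters to pick the main microphone.
        user_position = self.user_positioning_module()
        return self.microphone_selection_module(audio_params, user_position)


# Stub modules for illustration only: a fixed user position and a loudest-mic rule
# that ignores the position (a real module 302 would apply the B1-B3 scoring).
device = MainMicrophoneSelectionDevice(
    user_positioning_module=lambda: (2.5, 4.0),
    microphone_selection_module=lambda params, pos: max(params, key=lambda m: params[m][0]),
)
print(device.select({"m1": (60.0, 20.0), "m2": (70.0, 5.0)}))  # m2
```

Injecting the modules as callables keeps positioning and selection independently replaceable, for example swapping in a depth-camera or beacon-based module 301.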
[0114] It should be noted that, for any content not described in the embodiment corresponding to Figure 4, reference may be made to the description of the foregoing method embodiments, which will not be repeated here.
[0115] The aforementioned device acquires user positioning data reflecting the target user's location within the recording space. Then, it combines this data with audio signal parameters collected by multiple microphones in the recording space to select a primary microphone from among the various microphones. Since the audio signal parameters collected by the microphones reflect their recording quality, and the user positioning data reflects the user's distance from the microphone, combining these two data points to select the primary microphone is equivalent to using multi-dimensional data. This enhances the reliability of microphone selection and helps choose a microphone closer to the user with better recording quality as the primary microphone, thereby improving the overall recording effect.
[0116] Referring to Figure 5, Figure 5 is a schematic structural diagram of a computer device provided in an embodiment of this application. The computer device 40 includes a processor 401 and a memory 402, and the memory 402 is connected to the processor 401, for example, via a bus.
[0117] Processor 401 is configured to support the computer device 40 in performing the corresponding functions in the methods described in the above method embodiments. Processor 401 may be a central processing unit (CPU), a network processor (NP), a hardware chip, or any combination thereof. The aforementioned hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof. The aforementioned PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), a generic array logic (GAL), or any combination thereof.
[0118] Memory 402 is used to store program code, etc. Memory 402 may include volatile memory (VM), such as random access memory (RAM); memory 402 may also include non-volatile memory (NVM), such as read-only memory (ROM), flash memory, hard disk drive (HDD), or solid-state drive (SSD); memory 402 may also include combinations of the above types of memory.
[0119] The memory 402 is used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as the program instructions / modules corresponding to the master microphone selection method in the embodiments of this application. The core processor and the graphics processor work together to execute the various functional applications and data processing of the master microphone selection method by running the non-volatile software programs, instructions, and modules stored in the memory, thereby realizing the function of the master microphone selection method provided in the above method embodiments.
[0120] Memory 402 may include a program storage area and a data storage area, wherein the program storage area may store the operating system and applications required for at least one function. The data storage area may store data created based on the use of the master microphone selection device. In some embodiments, the memory may include memory remotely located relative to the processor, which can be connected to the master microphone selection device via a network. Examples of such networks include, but are not limited to, the Internet, corporate intranets, local area networks, mobile communication networks, and combinations thereof.
[0121] The one or more modules are stored in the memory. When executed by the one or more processors, they perform the master microphone selection method in any of the above method embodiments. For example, they perform the method steps described in the above method embodiments to realize the functions of the modules described in the above device embodiments.
[0122] This application also provides a computer-readable storage medium storing a computer program, the computer program including program instructions, which, when executed by a computer, cause the computer to perform the method described in the foregoing embodiments.
[0123] Those skilled in the art will understand that all or part of the processes in the above embodiments can be implemented by a computer program instructing related hardware. The program can be stored in a computer-readable storage medium, and when executed, it can include the processes of the embodiments of the above methods. The storage medium can be a magnetic disk, optical disk, read-only memory (ROM), or random access memory (RAM), etc.
[0124] The above-disclosed embodiments are merely preferred embodiments of this application and should not be construed as limiting the scope of this application. Therefore, any equivalent variations made in accordance with the claims of this application shall still fall within the scope of this application.
Claims
1. A master microphone selection method, characterized in that it is applied to a recording host in a recording space, wherein multiple microphones are deployed at different locations within the recording space and the recording host is connected to the multiple microphones; the method includes: acquiring user location data, which reflects the location of the target user in the recording space; and selecting a main microphone from the multiple microphones by combining the audio signal parameters collected by the multiple microphones with the user location data.
2. The method of claim 1, wherein, The recording space is also equipped with a camera, and the recording host is equipped with a wireless receiving unit; The acquisition of user location data includes: The target received signal strength and target image are obtained, wherein the target received signal strength is the received signal strength of the target wireless signal received by the wireless receiving unit, the target wireless signal is the wireless signal transmitted by the wireless transmitting device carried by the target user, and the target image is an image of the recording space captured by the camera, and the target image includes the target user; The user location data is determined based on the received signal strength of the target and the target image.
3. The method of claim 2, wherein the multiple microphones are arranged in a matrix in the recording space, and determining the user positioning data based on the target received signal strength and the target image includes: determining a target distance based on the target received signal strength, wherein the target distance is the distance between the target user and the recording host; determining, based on the image region where the target user is located in the target image, a first coordinate of the target user in a horizontal coordinate system, wherein the first coordinate is a coordinate in a preset direction; and determining, based on the first coordinate and the target distance, a second coordinate of the target user in the horizontal coordinate system, wherein the second coordinate is a coordinate in a direction perpendicular to the preset direction.
4. The method according to any one of claims 1 to 3, characterized in that, The step of selecting the main microphone from the multiple microphones by combining the audio signal parameters collected by the multiple microphones and the user location data includes: Based on the audio signal parameters collected by the multiple microphones, a first dimension score data is determined. The first dimension score data includes multiple first scores, which are the first scores of the multiple microphones. The first scores are used to reflect the audio quality collected by the microphones. Based on the location data of the multiple microphones and the user positioning data, a second dimension rating data is determined. The second dimension rating data includes multiple second ratings, which are the second ratings of the multiple microphones. The second ratings are used to reflect the distance of the microphones relative to the user. Based on the first dimension scoring data and the second dimension scoring data, select the main microphone from the plurality of microphones.
5. The method of claim 4, wherein, The first rating is positively correlated with audio quality, while the second rating is negatively correlated with the distance between the microphone and the user. The step of selecting the main microphone from the plurality of microphones based on the first dimension scoring data and the second dimension scoring data includes: Based on the first score and the second score corresponding to the target microphone, a confidence score is determined for the target microphone, where the target microphone is any one of the plurality of microphones, and the confidence score is used to reflect the reliability of using the microphone as the main microphone; Based on the confidence scores of each of the multiple microphones, the microphone with the highest confidence score is determined as the main microphone.
6. The method of claim 5, wherein, The step of determining the confidence score corresponding to the target microphone based on the first score and the second score corresponding to the target microphone includes: The confidence score corresponding to the target microphone is obtained by weighted summing of the first score and the second score corresponding to the target microphone.
7. The method of claim 4, wherein, after determining the first-dimension scoring data based on the audio signal parameters collected by the multiple microphones, the method further includes: performing Kalman filtering on the first-dimension scoring data.
8. A master microphone selection device, characterized in that it is applied to a recording host in a recording space, wherein multiple microphones are deployed at different positions in the recording space and the recording host is connected to the multiple microphones; the device includes: a user positioning module, configured to acquire user positioning data, which reflects the location of the target user in the recording space; and a microphone selection module, configured to select the main microphone from the multiple microphones by combining the audio signal parameters collected by the multiple microphones with the user positioning data.
9. A computer device, comprising: The device includes a memory and a processor, the memory being connected to the processor, the processor being configured to execute one or more computer programs stored in the memory, the processor causing the computer device to perform the method as described in any one of claims 1-7 when executing the one or more computer programs.
10. A computer readable storage medium characterized by, The computer-readable storage medium stores a computer program, the computer program including program instructions that, when executed by a processor, cause the processor to perform the method as described in any one of claims 1-7.