Perspective area display method based on sound source position and VR device

By detecting the location and type of sound source using the microphone on the VR device, and dynamically calculating the size and position of the see-through area, the security problem caused by the fixed see-through area in VR devices is solved, and flexible security protection is achieved.

CN119472980B · Active · Publication Date: 2026-06-26 · HISENSE VISUAL TECH CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: HISENSE VISUAL TECH CO LTD
Filing Date: 2024-03-06
Publication Date: 2026-06-26

AI Technical Summary

Technical Problem

The position and size of the see-through area in existing VR devices are fixed, the display method is not flexible enough, and it cannot effectively avoid accidents and dangers during the immersive experience.

Method used

The system receives ambient sound through at least two microphones on the VR device, detects the type of sound source and determines its position coordinates relative to the head. If the sound source is outside the safe zone, the system calculates the size and position of the perspective area based on the intensity of the ambient sound received by the microphones and displays a real-world image in the virtual environment.

Benefits of technology

It achieves dynamic display of the perspective area, which can effectively avoid accidents and dangers during the immersive experience, protect user safety, and does not require additional hardware costs or reliance on pre-built datasets for model training.



Abstract

The application relates to the VR technical field, and provides a perspective area display method based on a sound source position and a VR device, which is used for improving the flexibility of perspective area display on a safety area. The method detects a sound source in a real environment and determines the position coordinates of the sound source relative to a head based on environmental sound received by at least two microphones on the VR device. When it is determined that the sound source is located outside the safety area according to the position coordinates, the size and position of the perspective area on the safety area are calculated according to the intensity of target environmental sound received by the at least two microphones and the position of the sound source, and the real environment image is displayed by starting the perspective area, so that accidents and dangers possibly occurring in the immersive experience process are avoided, and the safety of users is protected.

Description

Technical Field

[0001] This application relates to the field of virtual reality (VR) projection technology, and provides a VR device and a method for displaying a perspective area based on the location of a sound source.

Background Technology

[0002] In VR devices, the safe zone is an important interactive feature. It refers to a safe area or zone set in the physical space to ensure that users do not collide with surrounding objects, injure themselves or others, or cause other unexpected situations when using VR technology. The aim is to ensure that users can enjoy an immersive experience in a safe and controllable environment.

[0003] Currently, most safe zones are a range defined or customized by the user. As shown in Figure 1, the safe zone is a sphere centered on the user. The safe zone can be designed to be dynamically triggered, changing as the user moves. When the user is in a dangerous situation, a perspective area is activated within the safe zone to display the real environment, allowing the user to perceive the real world and avoiding the accidents and detachment from reality that can occur when immersed in a virtual environment, thus protecting the user's safety.

[0004] However, in related technologies, the position and size of the perspective area are often fixed and triggered by a controller ray or a specific button on the VR device, making the display method inflexible.

Summary of the Invention

[0005] This application provides a method and VR device for displaying a perspective area based on the location of a sound source, which improves the flexibility of displaying a perspective area within a safe zone.

[0006] On one hand, embodiments of this application provide a method for displaying a perspective region based on the location of a sound source, applied to VR devices, including:

[0007] At preset time intervals, acquire the volume of ambient sound received by at least two microphones on the VR device;

[0008] If at least one volume is greater than a preset volume threshold, then the type of sound source in the environment is detected based on the target ambient sound received by the at least two microphones during the period when the volume is greater than the preset volume threshold.

[0009] When the sound source type is a target type, the position coordinates of the sound source relative to the head are determined based on the time information of the target environment sound received by the at least two microphones and the position information of the at least two microphones;

[0010] If the sound source is determined to be outside the safe zone of the VR device based on the location coordinates, then the size and position of the see-through area of the sound source on the sphere of the safe zone are calculated based on the location coordinates and the intensity of the target ambient sound received by the at least two microphones, and the see-through area is turned on.

[0011] The real-world image seen through the perspective area is displayed in the virtual environment.

[0012] On the other hand, embodiments of this application provide a VR device, including a processor, a memory, a display, and at least two microphones, wherein the at least two microphones, the display, the memory, and the processor are connected via a bus;

[0013] The memory stores a computer program, and the processor performs the following operations according to the computer program:

[0014] The volume of ambient sound received by the at least two microphones is acquired at preset time intervals;

[0015] If at least one volume is greater than a preset volume threshold, then the type of sound source in the environment is detected based on the target ambient sound received by the at least two microphones during the period when the volume is greater than the preset volume threshold.

[0016] When the sound source type is a target type, the position coordinates of the sound source relative to the head are determined based on the time information of the target environment sound received by the at least two microphones and the position information of the at least two microphones;

[0017] If the sound source is determined to be outside the safe zone of the VR device based on the location coordinates, then the size and position of the see-through area of the sound source on the sphere of the safe zone are calculated based on the location coordinates and the intensity of the target ambient sound received by the at least two microphones, and the see-through area is turned on.

[0018] The real-world image seen through the perspective area is displayed in the virtual environment via the display.

[0019] Optionally, the processor calculates the size and position of the perspective area projected onto the safe zone based on the location coordinates and the intensity of the ambient sound received by the at least two microphones, combined with the spherical radius of the safe zone. Specifically, the operation is as follows:

[0020] The position coordinates are transformed from a rectangular coordinate system to a spherical coordinate system to obtain the spherical coordinates of the sound source;

[0021] The spherical coordinates are equidistantly projected onto the spherical surface of the safe zone to obtain the projection point, and the polar angle and azimuth angle of the spherical coordinates are used as the polar angle and azimuth angle of the projection point;

[0022] The target sound intensity is obtained based on the intensity of the target ambient sound received by each of the at least two microphones;

[0023] The radius of the projection point is determined based on the target sound intensity and the reference sound intensity; wherein the reference sound intensity is the average intensity of the background sound received by the at least two microphones;

[0024] The size and position of the perspective area are obtained based on the radius, polar angle, and azimuth angle of the projection point.

[0025] Optionally, the formula for calculating the radius of the projection point is:

[0026]

[0027] R = α * lg(L / L0)

[0028] Where i = 1, 2, ..., n, $L_i$ represents the intensity of the target ambient sound received by the i-th microphone, L represents the target sound intensity, L0 represents the reference sound intensity, and R represents the radius of the projection point.

[0029] Optionally, the processor detects the type of sound source in the environment based on the target ambient sound received by the at least two microphones during the period when the volume is greater than the preset volume threshold. Specifically, the operation is as follows:

[0030] Normalization and noise reduction are performed separately on the target ambient sound received by each of the at least two microphones during the period when the volume is greater than the preset volume threshold;

[0031] Based on the waveform and frequency domain characteristics of the processed target environmental sound, the type of sound source in the environment is determined.

[0032] Optionally, the processor determines the position coordinates of the sound source relative to the head based on the time information of the target ambient sound received by the at least two microphones and the position information of the at least two microphones. Specifically, the operation is as follows:

[0033] The signal reception time difference between each pair of microphones is obtained based on the start time or end time of the reception of the target environmental sound by the at least two microphones respectively.

[0034] The initial position coordinates of the sound source relative to the head are obtained based on the signal reception time difference between each pair of microphones, the distance between the corresponding two microphones, and the relative positions of the corresponding two microphones to the center of the eyebrows.

[0035] Based on the initial position coordinates, the target position coordinates of the sound source relative to the head are obtained.

[0036] Optionally, the background sound includes at least one of ambient noise and noise generated by the VR device itself, and the preset volume threshold is greater than the average volume of the background sound.

[0037] On the other hand, embodiments of this application provide a computer-readable storage medium storing computer-executable instructions for causing a computer device to execute any of the perspective region display methods based on sound source location provided in embodiments of this application.

[0038] The beneficial effects of the perspective area display method and VR device based on sound source location provided in this application are as follows:

[0039] Based on ambient sounds received by at least two microphones on the VR device, the system detects sound sources in the real environment and determines the position coordinates of the sound sources relative to the head. When the sound source is determined to be outside the safe zone based on the position coordinates, the system calculates the size and position of the see-through area on the safe zone based on the intensity of the target ambient sound received by at least two microphones and the position of the sound source. By opening the see-through area, the system displays a real environment image, thereby avoiding accidents and dangers that may occur during the immersive experience and protecting user safety.

[0040] Other features and advantages of this application will be set forth in the description which follows, and will be apparent in part from the description, or may be learned by practicing the application. The objectives and other advantages of this application may be realized and obtained by means of the structures particularly pointed out in the written description, claims, and drawings.

Attached Figure Description

[0041] To more clearly illustrate the technical solutions in the embodiments of this application or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0042] Figure 1 is a schematic diagram of the safe zone of a VR device provided in an embodiment of this application;

[0043] Figure 2 is a flowchart of a perspective region display method based on sound source location provided in an embodiment of this application;

[0044] Figure 3 is a flowchart of a method for estimating the location of a sound source provided in an embodiment of this application;

[0045] Figure 4 is a schematic diagram of the positional relationship between a sound source and two microphones provided in an embodiment of this application;

[0046] Figure 5 is a schematic diagram of the principle of sound source location estimation provided in an embodiment of this application;

[0047] Figure 6 is a flowchart of a perspective region calculation method based on sound source location provided in an embodiment of this application;

[0048] Figure 7 is a schematic diagram of the perspective area provided in an embodiment of this application;

[0049] Figure 8 is a rendering of the perspective area provided in an embodiment of this application;

[0050] Figure 9 is a flowchart of the VR experience method provided in an embodiment of this application;

[0051] Figure 10 is a structural diagram of a VR device provided in an embodiment of this application.

Detailed Implementation

[0052] To make the objectives, technical solutions, and advantages of the embodiments of this application clearer, the technical solutions of this application will be clearly and completely described below with reference to the accompanying drawings of the embodiments of this application. Obviously, the described embodiments are only some embodiments of the technical solutions of this application, and not all embodiments. Based on the embodiments recorded in this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the technical solutions of this application.

[0053] Based on the exemplary embodiments shown in this application, all other embodiments obtained by those skilled in the art without inventive effort are within the scope of protection of this application. Furthermore, although the disclosures in this application are presented by way of one or more exemplary examples, it should be understood that each aspect of these disclosures can constitute a complete technical solution on its own.

[0054] Furthermore, the terms “including” and “having”, and any variations thereof, are intended to cover a non-exclusive inclusion. For example, a product or device that includes a series of components is not necessarily limited to those components that are clearly listed, but may include other components that are not clearly listed or that are inherent to such product or device.

[0055] As used in this application, the term "module" refers to any known or subsequently developed hardware, software, firmware, artificial intelligence, fuzzy logic, or combination of hardware and / or software code capable of performing the functions associated with that element.

[0056] The design concept of the embodiments of this application will be summarized below in conjunction with application scenarios.

[0057] VR devices protect user safety by displaying a realistic environment through a see-through area within a safe zone. However, in related technologies, the position and size of the see-through area are often fixed and triggered by controller rays or specific buttons on the VR device, resulting in a lack of flexibility in display methods.

[0058] With the development of Artificial Intelligence (AI) technology, deep learning has been widely applied in the VR field. Models trained using deep learning algorithms detect objects in images captured by VR devices, automatically activating the perspective region when a target object is detected. However, this method typically requires a given dataset and prior knowledge for model training, which is time-consuming and places certain demands on the processing power of the VR device. Furthermore, while this method achieves dynamic display of the perspective region, the position and size of the perspective region remain fixed.

[0059] In view of this, embodiments of this application provide a method and VR device for displaying a perspective area based on the location of a sound source. Based on the ambient sound received by at least two microphones on the VR device, the method detects the sound source in the real environment and determines the position coordinates of the sound source relative to the head. Then, it determines whether the sound source is located outside the safe zone by using the position coordinates. If so, it calculates the size and position of the perspective area on the safe zone based on the ambient sound received by at least two microphones, and opens the perspective area to display the real environment image, thereby avoiding accidents and dangers that may occur during the immersive experience and protecting user safety.

[0060] Figure 2 shows the implementation flow of a perspective region display method based on sound source location provided in an embodiment of this application. This flow is executed by a VR device and mainly includes the following steps:

[0061] S201: Acquire the volume of ambient sound received by at least two microphones on the VR device at preset time intervals.

[0062] Typically, VR devices have two or more microphone arrays to achieve spatial stereo sound. When users wear VR devices for immersive experiences such as gaming, entertainment, and social interaction, the user and their real-world environment are constantly changing—e.g., waving arms, running babies, passing pedestrians, toy cars, etc. These dynamic changes can potentially endanger the user and others. During immersive experiences, microphones can be activated to receive real-time sounds from the user's surroundings, and the sound volume can reflect the user's distance from the sound source, allowing for the analysis of potential dangers in the real-world environment through sound analysis.

[0063] In practice, during the immersive experience, the VR device synchronously acquires ambient sounds from at least two microphones at preset time intervals. That is, the timestamps of the two ambient sounds acquired by the VR device are the same. For each microphone, the amplitude of the ambient sound it receives is analyzed to obtain the volume of the ambient sound it receives.

[0064] Taking two microphones as an example, assuming the time interval is 1 second, in the first second, the VR device analyzes the ambient volume based on the ambient sound received by microphone 1 at timestamp S1, and also analyzes the ambient volume based on the ambient sound received by microphone 2 at timestamp S1; in the second second, the VR device analyzes the ambient volume based on the ambient sound received by microphone 1 at timestamp S2, and also analyzes the ambient volume based on the ambient sound received by microphone 2 at timestamp S2.
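The per-microphone volume analysis in S201 can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes each microphone delivers a frame of float samples in [-1, 1] for the same timestamp, and it assumes that "volume" means the RMS level in dB relative to full scale. The names `rms_volume_db` and `frame_volumes` are hypothetical.

```python
import math

def rms_volume_db(samples, eps=1e-12):
    """RMS level of one frame in dB relative to full scale (assumed volume measure)."""
    mean_square = sum(s * s for s in samples) / len(samples)
    # eps keeps log10 defined for an all-zero frame
    return 10.0 * math.log10(mean_square + eps)

def frame_volumes(frames_by_mic):
    """Map each microphone id to the volume of its frame for one shared timestamp."""
    return {mic: rms_volume_db(frame) for mic, frame in frames_by_mic.items()}

# Synthetic frames for timestamp S1: the same 440 Hz tone at two amplitudes.
rate = 16000
quiet = [0.01 * math.sin(2 * math.pi * 440 * n / rate) for n in range(rate // 10)]
loud = [0.5 * math.sin(2 * math.pi * 440 * n / rate) for n in range(rate // 10)]
volumes = frame_volumes({"mic1": quiet, "mic2": loud})
```

Calling `frame_volumes` once per preset interval yields one volume per microphone per timestamp, which is the input S202 compares against the threshold.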

[0065] S202: Determine whether at least one volume is greater than a preset volume threshold. If yes, execute S203; otherwise, return to S201.

[0066] Considering that the ambient sound received by the microphone may include background noise and noise generated by the VR device itself (such as CPU noise and fan noise), a volume threshold can be preset to distinguish real sound from noise, so that noise in the background sound does not interfere with the detection of sound sources in the environment.

[0067] In one optional embodiment, the process of setting the preset volume threshold includes:

[0068] First, acquire the background sound received by at least two microphones when there is no sound source, and analyze the volume of each background sound.

[0069] Then, calculate the average of the background sound volumes and set a preset volume threshold that is greater than this average.

[0070] Taking two microphones as an example, assuming that the volume of the background sound received by microphone 1 is noise1 and the volume of the background sound received by microphone 2 is noise2, then the average volume is (noise1+noise2) / 2, and the preset volume threshold is: (noise1+noise2) / 2+σ.

[0071] Optionally, σ can be 0.5, 0.8, or 1 dB, and can be flexibly adjusted according to actual needs.
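The threshold rule above, together with the S202 check, can be sketched in a few lines. The function names and the sample volumes are hypothetical; the only parts taken from the text are the average-plus-σ rule and the "at least one volume exceeds the threshold" condition.

```python
def preset_volume_threshold(background_volumes_db, sigma_db=0.5):
    """Average background volume plus a margin sigma, as in (noise1 + noise2) / 2 + sigma."""
    mean = sum(background_volumes_db) / len(background_volumes_db)
    return mean + sigma_db

def exceeds_threshold(volumes_db, threshold_db):
    """S202: true if at least one microphone volume is above the threshold."""
    return any(v > threshold_db for v in volumes_db)

# Illustrative background volumes for two microphones, in dB.
noise1, noise2 = -52.0, -50.0
threshold = preset_volume_threshold([noise1, noise2], sigma_db=1.0)
```

With these illustrative values the average is -51 dB, so a 1 dB margin places the threshold at -50 dB.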

[0072] It should be noted that the embodiments of this application do not impose restrictive requirements on the setting method of the preset volume threshold. For example, the volume threshold can be set to be greater than the maximum volume of each background sound, or the volume threshold can be set to be greater than the minimum volume of each background sound.

[0073] S203: Detect the type of sound source in the environment based on the target ambient sound received by at least two microphones during the period when the volume is greater than the preset volume threshold.

[0074] When the volume of at least one of the ambient sounds received by at least two microphones is greater than a preset volume threshold, it indicates that there are other sound sources in the user's real environment besides the background sound. At this time, the ambient sound source can be detected.

[0075] In one optional embodiment, the sound source detection process includes the following steps:

[0076] S2031: Normalize and denoise the target ambient sound received by each of the at least two microphones during the period when the volume is greater than the preset volume threshold.

[0077] The noise reduction process includes, but is not limited to, low-pass filtering, high-pass filtering, band-pass filtering, frequency domain filtering, and time domain filtering.

[0078] S2032: Determine the type of sound source in the environment based on the waveform and frequency domain characteristics of the processed target environmental sound.

[0079] In one example, a set of sound sources to be focused on can be pre-defined; these sources typically possess unique sonic characteristics. For instance, the sound of an object breaking has a low frequency, and the cry of a baby is low-pitched and rhythmic. Therefore, the types of sound sources in the environment can be determined by analyzing the waveform and frequency domain characteristics of the target environment's sounds.
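One way to picture S2031 and S2032 is a sketch that peak-normalizes a frame, takes its magnitude spectrum, and matches the dominant frequency against a pre-defined table. Everything specific here is an assumption: the table `TARGET_BANDS`, its labels and band edges, and the use of a single dominant frequency as the feature are illustrative only, and the noise reduction step is omitted. A naive DFT is used to keep the example dependency-free.

```python
import cmath
import math

def peak_normalize(samples):
    """Scale the frame so its largest magnitude is 1 (one possible normalization)."""
    peak = max(abs(s) for s in samples)
    return [s / peak for s in samples] if peak > 0 else list(samples)

def magnitude_spectrum(samples):
    """Naive DFT magnitudes for bins 0..N/2; adequate for short frames."""
    n = len(samples)
    return [
        abs(sum(s * cmath.exp(-2j * math.pi * k * t / n) for t, s in enumerate(samples)))
        for k in range(n // 2 + 1)
    ]

def dominant_frequency(samples, rate):
    """Frequency in Hz of the strongest non-DC bin."""
    mags = magnitude_spectrum(peak_normalize(samples))
    k = max(range(1, len(mags)), key=mags.__getitem__)
    return k * rate / len(samples)

# Hypothetical lookup table: label -> (min Hz, max Hz). Values are illustrative only.
TARGET_BANDS = {"type_a": (100.0, 400.0), "type_b": (400.0, 1200.0)}

def classify(samples, rate):
    """Return the first label whose band contains the dominant frequency, else None."""
    f = dominant_frequency(samples, rate)
    for label, (low, high) in TARGET_BANDS.items():
        if low <= f < high:
            return label
    return None

rate = 8000
tone = [0.3 * math.sin(2 * math.pi * 500 * t / rate) for t in range(256)]
label = classify(tone, rate)
```

A practical detector would likely combine several waveform and spectral features rather than one frequency; the table lookup only illustrates the idea of matching against pre-defined characteristics.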

[0080] It should be noted that the embodiments of this application do not impose restrictive requirements on the detection method of the sound source type. For example, neural networks, machine learning algorithms, and other algorithms can also be used for detection.

[0081] S204: Determine if the sound source type is the target type. If yes, execute S205; otherwise, return to S201.

[0082] Typically, target types include sound sources that may affect the safety of users and others during VR experiences, such as the sound of objects breaking, things falling, horns, and babies crying.

[0083] S205: Determine the position coordinates of the sound source relative to the head based on the time information of the target ambient sound received by at least two microphones and the position information of at least two microphones.

[0084] When the sound source type is a target type, it indicates that there may be danger to the user or others. Therefore, it is necessary to estimate the position of the sound source relative to the user.

[0085] Sound source estimation refers to the process of determining the location, intensity, frequency components, and other relevant attributes of a sound source in an acoustic environment by analyzing received sound signals. Common sound source estimation methods include sound source location estimation to confirm the three-dimensional coordinates of the sound source, and sound source quantity estimation to confirm the number of sound sources. When using dual microphones for indoor sound source location estimation, this is mostly achieved by utilizing the time difference of arrival (TDOA) and / or level difference of arrival (LDOA) of sound waves arriving at the two microphones.

[0086] Taking TDOA-based sound source location estimation as an example, in one optional implementation, the sound source location estimation process is shown in Figure 3 and mainly includes the following steps:

[0087] S2051: Based on the start or end time of receiving the target ambient sound from at least two microphones, obtain the signal reception time difference between each pair of microphones.

[0088] Taking two microphones (MICs) as an example, as shown in Figure 4, because the distances from the sound source in the environment to MIC 1 and MIC 2 on the VR device are different, the start times at which MIC 1 and MIC 2 begin receiving the target ambient sound from the same sound source are different, or the end times at which they stop receiving it are different, i.e., t1 ≠ t2. Therefore, the signal reception time difference between MIC 1 and MIC 2 is |t1-t2|.
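As a minimal sketch of S2051, the start time at each microphone can be taken as the first sample whose magnitude exceeds an amplitude threshold, and the two start times subtracted. The onset rule, the function names, and the synthetic signals are assumptions for illustration. The signed difference is kept because its sign indicates which microphone received the sound first; its absolute value is |t1-t2|.

```python
def onset_time(samples, rate, amplitude_threshold):
    """Time in seconds of the first sample above the threshold, or None if there is none."""
    for index, s in enumerate(samples):
        if abs(s) > amplitude_threshold:
            return index / rate
    return None

def reception_time_difference(mic1, mic2, rate, amplitude_threshold):
    """Signed difference t1 - t2 between the onset times at MIC 1 and MIC 2."""
    t1 = onset_time(mic1, rate, amplitude_threshold)
    t2 = onset_time(mic2, rate, amplitude_threshold)
    if t1 is None or t2 is None:
        return None
    return t1 - t2

# Synthetic example: the same burst reaches MIC 1 twenty samples before MIC 2.
rate = 48000
burst = [0.8] * 100
mic1 = [0.0] * 480 + burst + [0.0] * 20
mic2 = [0.0] * 500 + burst
dt = reception_time_difference(mic1, mic2, rate, 0.1)
```

Here `dt` is negative, meaning MIC 1 received the burst first.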

[0089] S2052: Based on the reception time difference between each pair of microphones, the distance between the corresponding two microphones, and the relative positions of the corresponding two microphones with respect to the center of the eyebrows, obtain the initial position coordinates of the sound source relative to the head.

[0090] Since the position of each microphone on the VR device is fixed, the distance between any two microphones is also fixed. For a given sound source, the signal reception time difference between two microphones is determined, and since the speed of sound propagation is constant, the difference between the distances from that sound source to the two microphones is a constant value. Therefore, for each pair of microphones, a hyperbolic model representing the sound source position can be constructed based on the distance between the two microphones. Using the signal reception time difference between the two microphones, the position coordinates of the sound source on the hyperbolic model can be determined. Then, with the center of the user's eyebrows as the origin, the position coordinates of the sound source are transformed into the head coordinate system based on the positions of the two microphones relative to the center of the eyebrows, obtaining the position coordinates of the sound source relative to the head.

[0091] Taking two microphones as an example, Figure 5 is a schematic diagram of the location of the sound source of the target ambient sound received by MIC 1 and MIC 2 in Figure 4. Based on the order of the start or end times at which MIC 1 and MIC 2 receive the target ambient sound, the quadrant of the curve where the sound source is located is determined: the sound source is located on the curve in the quadrant of the microphone that receives the target ambient sound first. Then, based on the distance between MIC 1 and MIC 2 and the signal reception time difference, the position coordinates of the sound source on the curve are calculated.
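The hyperbolic model can be made concrete with a small two-dimensional sketch. It assumes MIC 1 at (-d/2, 0) and MIC 2 at (+d/2, 0), a speed of sound of 343 m/s, and a signed time difference dt = t1 - t2. The distance difference c·|dt| fixes the hyperbola x²/a² - y²/b² = 1 with the microphones as foci, and the sign of dt selects the branch on the side of the microphone that received the sound first. A single pair only constrains the source to that branch, so the helper `point_on_branch` evaluates the curve for an assumed y; how the source text selects the point on the curve is not spelled out, and all names here are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature

def hyperbola_branch(mic_distance, dt):
    """Return (a, b, sign) for the branch of x^2/a^2 - y^2/b^2 = 1 that holds the source.

    dt = t1 - t2; the microphones are assumed at (-d/2, 0) and (+d/2, 0).
    """
    a = SPEED_OF_SOUND * abs(dt) / 2.0   # half of the distance difference
    f = mic_distance / 2.0               # focal distance: half the microphone spacing
    if a >= f:
        raise ValueError("time difference too large for this microphone spacing")
    b = math.sqrt(f * f - a * a)
    sign = -1.0 if dt < 0 else 1.0       # dt < 0: MIC 1 (negative x) received first
    return a, b, sign

def point_on_branch(a, b, sign, y):
    """x coordinate of the branch at a given y (y is an assumed extra constraint)."""
    return sign * a * math.sqrt(1.0 + (y * y) / (b * b)), y

# Check against a known source at (-1.0, 2.0) with microphones 0.2 m apart.
d1 = math.hypot(-1.0 + 0.1, 2.0)   # distance from the source to MIC 1
d2 = math.hypot(-1.0 - 0.1, 2.0)   # distance from the source to MIC 2
dt = (d1 - d2) / SPEED_OF_SOUND
a, b, sign = hyperbola_branch(0.2, dt)
x, y = point_on_branch(a, b, sign, 2.0)
```

Evaluating the branch at y = 2.0 recovers the source's x coordinate of about -1.0, which confirms that the source lies on the constructed curve.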

[0092] S2053: Based on the initial position coordinates, obtain the target position coordinates of the sound source relative to the head.

[0093] When a VR device contains only two microphones, the initial position coordinates obtained from that pair of microphones are the target position coordinates of the sound source relative to the head; when a VR device contains more than two microphones, the average of the initial position coordinates obtained from each pair of microphones is used as the target position coordinates of the sound source relative to the head.
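The averaging rule in S2053 can be sketched as follows, assuming each microphone pair has already produced an (x, y, z) estimate in the head coordinate system. The function name and the sample estimates are hypothetical.

```python
def target_position(initial_positions):
    """Average the per-pair (x, y, z) estimates; with one pair this returns it unchanged."""
    n = len(initial_positions)
    return tuple(sum(p[axis] for p in initial_positions) / n for axis in range(3))

# Three microphones give three pairs, hence three initial estimates (illustrative values).
estimates = [(1.0, 2.0, 0.5), (1.2, 1.8, 0.5), (0.8, 2.2, 0.5)]
position = target_position(estimates)
```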

[0094] It should be noted that Figure 4 is merely an example; the embodiments of this application do not impose limiting requirements on the number and location of microphones on VR devices.

[0095] It should be noted that the embodiments of this application do not impose restrictive requirements on the sound source location estimation method. For example, for microphone arrays, beamforming, minimum variance distortionless response (MVDR), subspace decomposition-based methods (such as the MUSIC algorithm), and deep learning methods can also be used.

[0096] S206: Determine whether the sound source is outside the safe zone of the VR device based on the location coordinates. If so, execute S207; otherwise, return to S201.

[0097] After determining the position coordinates of the sound source relative to the head, the positional relationship between the sound source and the safe zone can be determined based on these position coordinates.

[0098] For example, if the distance between the sound source and the head is determined to be greater than the radius of the VR device's safe zone based on the location coordinates, then the sound source is determined to be outside the safe zone.
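The check in S206 reduces to comparing the source's distance from the head origin with the safe zone radius. A minimal sketch, with a hypothetical function name and illustrative values in meters:

```python
import math

def is_outside_safe_zone(position, safe_radius):
    """True if the source's distance from the head origin exceeds the safe zone radius."""
    distance = math.sqrt(sum(c * c for c in position))
    return distance > safe_radius

outside = is_outside_safe_zone((3.0, 4.0, 0.0), 2.0)   # distance 5.0 m
inside = is_outside_safe_zone((0.5, 0.5, 0.5), 1.0)    # distance about 0.87 m
```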

[0099] S207: Based on the location coordinates and the intensity of the target ambient sound received by at least two microphones, calculate the size and position of the perspective area of the sound source on the sphere of the safe zone, and enable the perspective area.

[0100] When the sound source is located outside the safe zone, the spherical design of the safe zone is used to perform projection mapping based on the direction indicated by the coordinates of the location. Combined with the intensity of the target ambient sound received by at least two microphones, the position and size of the perspective area are calculated, and the perspective area is opened according to the calculated position and size.

[0101] Figure 6 shows the calculation process for the perspective region, which mainly includes the following steps:

[0102] S2071: Transform the position coordinates from the rectangular coordinate system to the spherical coordinate system to obtain the spherical coordinates of the sound source.

[0103] Assuming the three-dimensional position coordinates of the sound source in a Cartesian coordinate system are (x, y, z), then the spherical coordinates of the sound source are $(r, \theta, \varphi)$, where the radius $r = \sqrt{x^2 + y^2 + z^2}$, the polar angle $\theta = \arccos(z / r)$, and the azimuth angle $\varphi = \arctan(y / x)$.

[0104] S2072: Equidistantly project the spherical coordinates onto the spherical surface of the safe zone to obtain the projection point, and use the polar angle and azimuth angle of the spherical coordinates as the polar angle and azimuth angle of the projection point.

[0105] The projection point can be considered as the center of the perspective region on the sphere. The orientation of the perspective region on the sphere can be determined by the polar angle and azimuth angle of the sound source.
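Steps S2071 and S2072 can be sketched together: convert the source position to spherical coordinates, keep the polar and azimuth angles, and replace the radius with the safe zone radius R0 to get the projection point. This is an illustrative sketch; the function names are hypothetical, and `math.atan2` is used instead of a plain arctangent so that the azimuth keeps the correct quadrant, which is a choice made here rather than something the text specifies.

```python
import math

def to_spherical(x, y, z):
    """Cartesian (x, y, z) to spherical (r, polar angle theta, azimuth phi)."""
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0.0:
        raise ValueError("the source coincides with the origin; angles are undefined")
    theta = math.acos(z / r)
    phi = math.atan2(y, x)  # quadrant-aware form of arctan(y / x)
    return r, theta, phi

def project_to_safe_zone(position, safe_radius):
    """Projection point on the safe zone sphere: same theta and phi, radius R0."""
    _, theta, phi = to_spherical(*position)
    point = (
        safe_radius * math.sin(theta) * math.cos(phi),
        safe_radius * math.sin(theta) * math.sin(phi),
        safe_radius * math.cos(theta),
    )
    return point, theta, phi

# A source at (3, 0, 4) m is 5 m away; project it onto a safe zone of radius 1.5 m.
point, theta, phi = project_to_safe_zone((3.0, 0.0, 4.0), 1.5)
```

The projection point lies on the line from the head origin to the source, so it marks the direction in which the perspective area should open.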

[0106] S2073: Obtain the target sound intensity based on the intensity of the target ambient sound received by at least two microphones.

[0107] Specifically, the formula for the target sound intensity L is as follows:

[0108]

[0109] Where n represents the number of microphones, n is an integer greater than or equal to 2, i = 1, 2, ..., n, and $L_i$ represents the intensity of the target ambient sound received by the i-th microphone.

[0110] S2074: Determine the radius of the projection point based on the target sound intensity and the reference sound intensity.

[0111] Specifically, the formula for calculating the radius R of the projection point is:

[0112] R = α * lg(L / L0) (Formula 2)

[0113] Wherein, L0 represents the reference sound intensity, which in this embodiment is the average intensity of the background sound received by at least two microphones, α is an empirical constant confirmed based on the radius R0 of the safe zone and test data, and R represents the range of the perspective area on the sphere.
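
Formula 2 can be illustrated with a short Python sketch, where lg is the base-10 logarithm. The function and parameter names are hypothetical, and the value of α in the example is made up, since the text only says that α is an empirical constant.

```python
import math

def perspective_radius(target_intensity, reference_intensity, alpha):
    """R = alpha * lg(L / L0), with lg taken as log base 10."""
    if target_intensity <= 0 or reference_intensity <= 0:
        raise ValueError("intensities must be positive")
    return alpha * math.log10(target_intensity / reference_intensity)

# Illustrative values only: a target sound 100 times the background level
# with alpha = 0.25 gives R = 0.25 * 2 = 0.5.
example_radius = perspective_radius(100.0, 1.0, 0.25)
```

Since the radius grows with the logarithm of the intensity ratio, a tenfold louder sound enlarges the perspective area by a fixed increment of α rather than tenfold.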

[0114] S2075: Obtain the size and position of the perspective area based on the radius, polar angle, and azimuth angle of the projection point.

[0115] Figure 7 is a schematic diagram of the calculated perspective area, in which the radius of the sphere representing the safe zone is R0 and the spherical coordinates of the sound source are ... The perspective area, whose spherical coordinates are ..., is located on the sphere of the safe zone at the point onto which the sound source is projected.

[0116] S208: Display the real-world environment image seen in the perspective area in the virtual environment.

[0117] In one optional embodiment, after enabling the perspective area, a new layer is created on top of the virtual image layer according to the size of the perspective area, and the real environment image seen by the perspective area is rendered and displayed on the new layer. This allows the user to see the real scene in the direction of the sound source when immersed in the virtual environment, avoid possible collisions, and protect their own and others' safety.

[0118] Figure 8 shows a rendering of the perspective area provided in this application. When the VR device detects a baby's crying based on the ambient sound received by the microphones, it calculates the position and size of the perspective area based on the location of the sound source and the intensity of the ambient sound. Through the perspective area, the real environment in the direction of the sound source is displayed in the virtual picture.

[0119] It should be noted that the embodiments of this application do not impose restrictive requirements on the display method of the perspective area. For example, in addition to displaying it as a new layer, an area can be divided on the layer of the virtual image by cutting out holes according to the spherical coordinates of the perspective area, and the real environment image seen in the view can be displayed in the area.
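
One way to picture the hole-cutting variant is a per-direction test that decides whether a viewing direction falls inside the perspective area. The sketch below rests on an assumption that the text does not state: the radius R is treated as an arc length on the safe-zone sphere of radius R0, so the cut-out is a circular cap with half-angle R / R0 around the projection point. All names are hypothetical.

```python
import math

def angular_distance(theta1, phi1, theta2, phi2):
    """Great-circle angle between two directions given as (polar, azimuth)."""
    cos_d = (math.cos(theta1) * math.cos(theta2)
             + math.sin(theta1) * math.sin(theta2) * math.cos(phi1 - phi2))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, cos_d)))

def in_perspective_area(view_theta, view_phi, center_theta, center_phi, radius, r0):
    """True if the view direction lies inside the circular cut-out.

    Assumption: radius is an arc length on the sphere of radius r0,
    so the cap half-angle is radius / r0 (in radians).
    """
    half_angle = radius / r0
    return angular_distance(view_theta, view_phi, center_theta, center_phi) <= half_angle
```

A renderer could evaluate such a test per pixel or per tile and show the camera image wherever it returns true, leaving the virtual image elsewhere.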

[0120] For example, taking a VR device with two microphones, see Figure 9: the flowchart of the VR experience method provided in this embodiment of the application mainly includes the following steps:

[0121] S901: Open the VR application and enter the VR scene.

[0122] S902: Detect whether the switch for controlling the perspective area by sound source is turned on. If yes, execute S903; otherwise, execute S913.

[0123] S903: Obtain the configuration file for the sound source type.

[0124] This configuration file contains a series of target sound source types that affect user safety, such as breaking sounds, falling sounds, baby crying, and horn sounds.

[0125] S904: Detect whether the sound source type switch is turned on. If yes, execute S905; otherwise, execute S913.

[0126] In one alternative example, the detection of various sound sources in the configuration file can be controlled by a type switch. That is, when the sound source type switch is turned on, all types of sound sources in the configuration file can be detected, thereby preventing the problem of opening the perspective area triggered by false sound source detection.

[0127] S905: Acquire the volume of ambient sound received by the two microphones at preset time intervals.

[0128] S906: Determine if there is a volume greater than the preset volume threshold. If yes, execute S907; otherwise, return to S905.

[0129] S907: Detect the type of sound source in the environment based on the target ambient sound received by the two microphones during the period in which the volume exceeds the preset volume threshold.

[0130] S908: Determine if the sound source type is the target type. If yes, execute S909; otherwise, return to S905.

[0131] S909: Based on the time information of the target ambient sound received by the two microphones and the position information of the two microphones, determine the position coordinates of the sound source relative to the head.

[0132] S910: Determine whether the sound source is outside the safe zone of the VR device based on the location coordinates. If so, execute S911; otherwise, return to S905.

[0133] S911: Calculate the size and position of the perspective area of the sound source on the sphere of the safe zone based on the location coordinates and the intensity of the ambient sound received by the two microphones.

[0134] S912: Enable the perspective area and display the real environment image seen through the perspective area in the virtual environment.

[0135] S913: Perform normal immersive interaction in the virtual environment.
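
The branching in steps S902 to S913 can be summarized as a single decision function. This is only a sketch of the control flow: the settings keys, the injected callables, and the returned action strings are all hypothetical stand-ins for the real detection, localization, and rendering components.

```python
def run_detection_cycle(settings, read_volumes, classify, locate, is_outside_safe_zone):
    """One pass through S902-S912; returns the action to take next."""
    if not settings["perspective_switch_on"]:            # S902
        return "immersive"                               # S913
    target_types = settings["target_types"]              # S903: from the configuration file
    if not settings["type_switch_on"]:                   # S904
        return "immersive"                               # S913
    volumes = read_volumes()                             # S905: one volume per microphone
    if max(volumes) <= settings["volume_threshold"]:     # S906
        return "keep_listening"                          # back to S905
    source_type = classify()                             # S907
    if source_type not in target_types:                  # S908
        return "keep_listening"
    position = locate()                                  # S909: coordinates relative to the head
    if not is_outside_safe_zone(position):               # S910
        return "keep_listening"
    return "open_perspective_area"                       # S911-S912
```

Passing the detection steps in as callables keeps the sketch independent of any particular classifier or localization method; a caller would invoke it once per preset time interval.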

[0136] It should be noted that the embodiments of this application do not impose restrictive requirements on the steps for determining the location and type of the sound source. For example, it is also possible to first detect whether the sound source is outside the safe zone, and then detect whether the sound source is the target type.

[0137] In another alternative embodiment, the sound source type switch is used only to enable sound source detection for certain types of sound sources in the configuration file.

[0138] For example, if the configuration file contains breaking sounds, falling sounds, and baby crying sounds, when the sound source type switch is not turned on, only breaking sounds and falling sounds can be detected. When the sound source type switch is turned on, in addition to detecting breaking sounds and falling sounds, baby crying sounds can also be detected.
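
The alternative switch behavior described in this example can be sketched as a small set computation. The function name, the split of the configuration into two groups, and the type labels are assumptions for illustration.

```python
def enabled_source_types(always_on, switch_controlled, type_switch_on):
    """Return the set of sound source types that detection should react to."""
    enabled = set(always_on)
    if type_switch_on:
        # The switch only adds the extra types; the always-on types stay active.
        enabled |= set(switch_controlled)
    return enabled

# Mirrors the example in the text: breaking and falling sounds are always
# detected, baby crying only when the type switch is turned on.
without_switch = enabled_source_types({"breaking", "falling"}, {"baby_cry"}, False)
with_switch = enabled_source_types({"breaking", "falling"}, {"baby_cry"}, True)
```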

[0139] In the embodiments of this application, based on ambient sounds received by at least two microphones on the VR device, specific types of sound sources in the real environment are detected and their position coordinates relative to the head are determined. When the position coordinates determine that the sound source is outside the safe zone, the size and position of the see-through area on the safe zone are calculated based on the ambient sounds received by at least two microphones, and the see-through area is activated to display a real environment image, thereby avoiding accidents and dangers that may occur during the immersive experience and protecting user safety. The entire process does not require additional hardware costs, and compared to existing AI detection algorithms, it does not require model training based on a pre-built dataset, saving resource costs.

[0140] Based on the same technical concept, this application provides a VR device that can implement the steps of the above-described method for displaying perspective areas based on the location of the sound source, and achieve the same technical effect.

[0141] Referring to Figure 10, the VR device includes a processor 1001, a memory 1002, a display 1003, and at least two microphones 1004. The microphones 1004, the display 1003, the memory 1002, and the processor 1001 are connected via a bus 1005.

[0142] The memory 1002 stores a computer program, and the processor 1001 performs the following operations according to the computer program:

[0143] At preset time intervals, acquire the volume of ambient sound received by at least two microphones 1004;

[0144] If at least one volume is greater than a preset volume threshold, the type of sound source in the environment is detected based on the target ambient sound received by at least two microphones 1004 during the period when the volume is greater than the preset volume threshold.

[0145] When the sound source type is the target type, the position coordinates of the sound source relative to the head are determined based on the time information of the target environment sound received by at least two microphones 1004 and the position information of at least two microphones.

[0146] If the sound source is determined to be outside the safe zone of the VR device based on the location coordinates, the size and position of the perspective area of the sound source on the sphere of the safe zone are calculated based on the location coordinates and the intensity of the target ambient sound received by at least two microphones 1004, and the perspective area is enabled.

[0147] The real environment image seen through the perspective area is displayed in the virtual environment via the display 1003.

[0148] Optionally, the processor 1001 calculates the size and position of the perspective area projected onto the safe zone based on the position coordinates and the intensity of ambient sound received by at least two microphones 1004, combined with the spherical radius of the safe zone. Specifically, the operation is as follows:

[0149] Transform the position coordinates from the Cartesian coordinate system to the spherical coordinate system to obtain the spherical coordinates of the sound source;

[0150] The spherical coordinates are equidistantly projected onto the spherical surface of the safe zone to obtain the projection points, and the polar angle and azimuth angle of the spherical coordinates are used as the polar angle and azimuth angle of the projection points.

[0151] The target sound intensity is obtained based on the intensity of the target ambient sound received by at least two microphones 1004.

[0152] The radius of the projection point is determined based on the target sound intensity and the reference sound intensity; where the reference sound intensity is the average intensity of the background sound received by at least two microphones.

[0153] The size and position of the perspective area are obtained based on the radius, polar angle, and azimuth angle of the projection point.

[0154] Optionally, the formula for calculating the radius of the projection point is:

[0155]

[0156] R = α * lg(L / L0)

[0157] Where i = 1, 2, ..., n, $L_i$ represents the intensity of the target ambient sound received by the i-th microphone 1004, L represents the target sound intensity, L0 represents the reference sound intensity, and R represents the radius of the projection point.

[0158] Optionally, the processor 1001 detects the type of sound source in the environment based on the target ambient sound received by at least two microphones 1004 during the period in which the volume exceeds the preset volume threshold. Specifically, the processor 1001 performs the following operations:

[0159] For the target ambient sounds received by at least two microphones 1004 during the period in which the volume exceeds the preset volume threshold, normalization and noise reduction processing are performed respectively.

[0160] Based on the waveform and frequency domain characteristics of the processed target environmental sound, the type of sound source in the environment is determined.
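
The text does not name the normalization, noise reduction, or feature methods, so the following is only a sketch using common stand-ins: peak normalization, a moving-average filter as crude noise reduction, the zero-crossing rate as a waveform feature, and the dominant frequency of a naive DFT as a frequency-domain feature. Every function here is a hypothetical illustration, not the method of the application.

```python
import cmath
import math

def peak_normalize(samples):
    """Scale samples so the largest absolute value becomes 1."""
    peak = max(abs(s) for s in samples)
    return [s / peak for s in samples] if peak > 0 else list(samples)

def moving_average(samples, width=3):
    """Crude noise reduction (an assumption; the text names no filter)."""
    half = width // 2
    out = []
    for i in range(len(samples)):
        window = samples[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out

def zero_crossing_rate(samples):
    """Waveform feature: fraction of adjacent sample pairs that change sign."""
    if len(samples) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))
    return crossings / (len(samples) - 1)

def dominant_frequency(samples, sample_rate):
    """Frequency-domain feature: strongest non-DC bin of a naive O(n^2) DFT, in Hz."""
    n = len(samples)
    best_k, best_mag = 0, 0.0
    for k in range(1, n // 2):
        coeff = sum(s * cmath.exp(-2j * math.pi * k * t / n)
                    for t, s in enumerate(samples))
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    return best_k * sample_rate / n
```

A classifier could then compare such features against per-type ranges (for instance, a high dominant frequency for crying versus a broadband burst for breaking glass); those ranges would have to come from test data and are not given here.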

[0161] Optionally, the processor 1001 determines the position coordinates of the sound source relative to the head based on the time information of the target ambient sound received by at least two microphones 1004 and the position information of at least two microphones 1004. Specifically, the operation is as follows:

[0162] The signal reception time difference between any two microphones 1004 is obtained based on the start or end time of the reception of the target ambient sound by at least two microphones 1004 respectively.

[0163] Based on the signal reception time difference between each pair of microphones 1004, the distance between the corresponding two microphones 1004, and the relative positions of the corresponding two microphones 1004 with the center of the eyebrows, the initial position coordinates of the sound source relative to the head are obtained.

[0164] Based on the initial position coordinates, the target position coordinates of the sound source relative to the head are obtained.
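
The text does not give the formula that turns a reception time difference into a position. As one commonly used relation, assumed here purely for illustration, a distant source seen by one microphone pair satisfies sin(angle) = c * Δt / d, where c is the speed of sound, Δt the time difference, and d the microphone spacing. The names and the far-field assumption are not from the source.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees Celsius

def arrival_angle(time_difference, mic_spacing, speed=SPEED_OF_SOUND):
    """Bearing of a distant source relative to the broadside of a microphone pair.

    time_difference: arrival-time difference between the two microphones, in seconds.
    mic_spacing: distance between the two microphones, in meters.
    Returns an angle in radians in [-pi/2, pi/2]; 0 means equidistant from both.
    """
    ratio = speed * time_difference / mic_spacing
    # Measurement noise can push the ratio slightly past +/-1; clamp it.
    ratio = max(-1.0, min(1.0, ratio))
    return math.asin(ratio)
```

A single pair yields only a bearing relative to the microphone axis, which is consistent with the text combining the time differences of each microphone pair with the microphone positions relative to the center of the eyebrows to reach a head-relative coordinate.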

[0165] Optionally, the background sound includes at least one of ambient noise and noise generated by the VR device itself, and the preset volume threshold is greater than the average volume of the background sound.
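
The relation between the threshold and the background sound can be sketched as follows. The text only requires the preset threshold to exceed the average background volume; the additive margin and the names below are hypothetical.

```python
def volume_threshold(background_volumes, margin):
    """Average background volume plus a positive safety margin."""
    if not background_volumes:
        raise ValueError("need at least one background measurement")
    if margin <= 0:
        raise ValueError("margin must be positive so the threshold exceeds the average")
    return sum(background_volumes) / len(background_volumes) + margin

# Illustrative numbers only: two background readings of 30 and 34 with a
# margin of 10 give a threshold of 42.
example_threshold = volume_threshold([30.0, 34.0], 10.0)
```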

[0166] In this embodiment, the memory 1002 may primarily include a program storage area and a data storage area. The program storage area may store the operating system and programs required for running instant messaging functions; the data storage area may store various instant messaging information and operation instruction sets. The memory 1002 may be volatile memory, such as random-access memory (RAM); it may also be non-volatile memory, such as read-only memory, flash memory, hard disk drive (HDD), or solid-state drive (SSD); or it may be any other medium capable of carrying or storing a desired computer program having an instruction or data structure form and accessible by a computer, but is not limited thereto. The memory 1002 may be a combination of the above-described memories.

[0167] The processor 1001 may include one or more central processing units (CPUs), GPUs, or digital processing units, etc. The processor 1001 is used to implement the steps of any of the aforementioned perspective region display methods based on sound source location when invoking a computer program stored in the memory 1002.

[0168] It should be noted that Figure 10 is merely an example, showing the hardware necessary for a VR device to perform the steps of the sound source location-based perspective area display method provided in the embodiments of this application. Although not shown, the VR device may also include conventional hardware such as microphones, communication interfaces, power supplies, and controllers.

[0169] The embodiments of this application do not limit the specific connection medium between the microphones 1004, the display 1003, the memory 1002, and the processor 1001. In the embodiments of this application, the microphones 1004 and the display 1003 are connected to the memory 1002 and the processor 1001 via the bus 1005, which is drawn as a thick line in Figure 10; the connections between other components are for illustrative purposes only and should not be considered limiting. The bus 1005 can be divided into an address bus, a data bus, a control bus, etc. For ease of description, Figure 10 uses only one thick line, but this does not indicate that there is only one bus or one type of bus.

[0170] For ease of description, VR devices can be divided into modules (or units) according to their functions and described separately. Of course, in implementing this application, the functions of each module (or unit) can be implemented in one or more software or hardware components.

[0171] Those skilled in the art will understand that various aspects of this application can be implemented as a system, method, or program product. Therefore, various aspects of this application can be specifically implemented in the following forms: a completely hardware implementation, a completely software implementation (including firmware, microcode, etc.), or a combination of hardware and software implementations, collectively referred to herein as a "circuit," "module," or "system."

[0172] This application also provides a computer-readable storage medium for storing instructions that, when executed, can perform the steps of any of the sound source location-based perspective region display methods described in the foregoing embodiments.

[0173] This application also provides a computer program product for storing a computer program that performs the steps of any of the sound source location-based perspective region display methods in the foregoing embodiments.

[0174] Those skilled in the art will understand that embodiments of this application can be provided as methods, systems, or computer program products. Therefore, this application can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, this application can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.

[0175] This application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to this application. It should be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

[0176] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

[0177] These computer program instructions may also be loaded onto a computer or other programmable data processing equipment to cause a series of operational steps to be performed on the computer or other programmable equipment to produce a computer-implemented process, such that the instructions that execute on the computer or other programmable equipment provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

[0178] Obviously, those skilled in the art can make various modifications and variations to this application without departing from the spirit and scope of this application. Therefore, if such modifications and variations fall within the scope of the claims of this application and their equivalents, this application also intends to include such modifications and variations.

Claims

1. A method for displaying a perspective area based on the location of a sound source, characterized in that the method is applied to a VR device and includes: acquiring, at preset time intervals, the volume of ambient sound received by at least two microphones on the VR device; if at least one volume is greater than a preset volume threshold, detecting the type of sound source in the environment based on the target ambient sound received by the at least two microphones during the time period when the volume is greater than the preset volume threshold, wherein the preset volume threshold is greater than the average volume of each background sound, and the volume of the background sound is determined in the absence of a sound source; when the sound source type is a target type, determining the position coordinates of the sound source relative to the head based on the time information of the target ambient sound received by the at least two microphones and the position information of the at least two microphones; if the sound source is determined to be outside the safe zone of the VR device based on the position coordinates, calculating the size and position of the perspective area of the sound source on the sphere of the safe zone based on the position coordinates and the intensity of the target ambient sound received by the at least two microphones, and enabling the perspective area; and displaying, in the virtual environment, the real environment image seen through the perspective area; wherein the step of calculating the size and position of the perspective area of the sound source on the sphere of the safe zone based on the position coordinates and the intensity of the target ambient sound received by the at least two microphones includes: transforming the position coordinates from a rectangular coordinate system to a spherical coordinate system to obtain the spherical coordinates of the sound source; equidistantly projecting the spherical coordinates onto the spherical surface of the safe zone to obtain a projection point, and using the polar angle and azimuth angle of the spherical coordinates as the polar angle and azimuth angle of the projection point; obtaining the target sound intensity based on the intensity of the target ambient sound received by each of the at least two microphones; determining the radius of the projection point based on the target sound intensity and the reference sound intensity, wherein the reference sound intensity is the average intensity of the background sound received by the at least two microphones; and obtaining the size and position of the perspective area based on the radius, polar angle, and azimuth angle of the projection point.

2. The method as described in claim 1, characterized in that the formula for calculating the radius of the projection point is: R = α * lg(L / L0), where $L_i$ (i = 1, 2, ..., n) represents the intensity of the target ambient sound received by the i-th microphone, L represents the target sound intensity obtained from the $L_i$, L0 represents the reference sound intensity, R represents the radius of the projection point, and α is an empirical constant confirmed based on the radius of the safe zone and test data.

3. The method as described in claim 1, characterized in that the step of detecting the type of sound source in the environment based on the target ambient sound received by the at least two microphones during the time period when the volume is greater than the preset volume threshold includes: performing normalization and noise reduction processing respectively on the target ambient sounds received by the at least two microphones during the time period when the volume is greater than the preset volume threshold; and determining the type of sound source in the environment based on the waveform and frequency domain characteristics of the processed target ambient sound.

4. The method according to any one of claims 1-3, characterized in that, Determining the position coordinates of the sound source relative to the head based on the time information of the target environmental sound received by the at least two microphones and the position information of the at least two microphones includes: The signal reception time difference between each pair of microphones is obtained based on the start or end time of the reception of the target environmental sound by the at least two microphones respectively. The initial position coordinates of the sound source relative to the head are obtained based on the signal reception time difference between each pair of microphones, the distance between the corresponding two microphones, and the relative positions of the corresponding two microphones to the center of the eyebrows. Based on the initial position coordinates, the target position coordinates of the sound source relative to the head are obtained.

5. The method according to any one of claims 1-3, characterized in that, The background sound includes at least one of ambient noise and noise generated by the operation of the VR device itself.

6. A VR device, characterized in that it includes a processor, a memory, a display, and at least two microphones, wherein the microphones, the display, the memory, and the processor are connected via a bus; the memory stores a computer program, and the processor performs the following operations according to the computer program: acquiring, at preset time intervals, the volume of ambient sound received by the at least two microphones; if at least one volume is greater than a preset volume threshold, detecting the type of sound source in the environment based on the target ambient sound received by the at least two microphones during the time period when the volume is greater than the preset volume threshold, wherein the preset volume threshold is greater than the average volume of each background sound, and the volume of the background sound is determined in the absence of a sound source; when the sound source type is a target type, determining the position coordinates of the sound source relative to the head based on the time information of the target ambient sound received by the at least two microphones and the position information of the at least two microphones; if the sound source is determined to be outside the safe zone of the VR device based on the position coordinates, calculating the size and position of the perspective area of the sound source on the sphere of the safe zone based on the position coordinates and the intensity of the target ambient sound received by the at least two microphones, and enabling the perspective area; and displaying, via the display, the real environment image seen through the perspective area in the virtual environment; wherein the processor calculates the size and position of the perspective area projected onto the safe zone based on the position coordinates and the intensity of the ambient sound received by the at least two microphones, combined with the spherical radius of the safe zone, specifically by: transforming the position coordinates from a rectangular coordinate system to a spherical coordinate system to obtain the spherical coordinates of the sound source; equidistantly projecting the spherical coordinates onto the spherical surface of the safe zone to obtain a projection point, and using the polar angle and azimuth angle of the spherical coordinates as the polar angle and azimuth angle of the projection point; obtaining the target sound intensity based on the intensity of the target ambient sound received by each of the at least two microphones; determining the radius of the projection point based on the target sound intensity and the reference sound intensity, wherein the reference sound intensity is the average intensity of the background sound received by the at least two microphones; and obtaining the size and position of the perspective area based on the radius, polar angle, and azimuth angle of the projection point.

7. The VR device as described in claim 6, characterized in that the formula for calculating the radius of the projection point is: R = α * lg(L / L0), where $L_i$ (i = 1, 2, ..., n) represents the intensity of the target ambient sound received by the i-th microphone, L represents the target sound intensity obtained from the $L_i$, L0 represents the reference sound intensity, R represents the radius of the projection point, and α is an empirical constant confirmed based on the radius of the safe zone and test data.

8. The VR device as described in claim 6, characterized in that the processor detects the type of sound source in the environment based on the target ambient sound received by the at least two microphones during the time period when the volume is greater than the preset volume threshold, specifically by: performing normalization and noise reduction processing respectively on the target ambient sounds received by the at least two microphones during the time period when the volume is greater than the preset volume threshold; and determining the type of sound source in the environment based on the waveform and frequency domain characteristics of the processed target ambient sound.