Method and apparatus for determining occlusion relationship

By acquiring user interaction actions and images on the AR terminal, generating prompt information and inputting it into the segmentation network, the problem of complex computation in the existing technology is solved, and the accurate determination of foreground and background occlusion relationship is achieved, which is suitable for augmented reality applications.

CN116797624B · Active · Publication Date: 2026-06-16 · HANGZHOU YIXIAN XIANJIN TECH CO LTD

3 Cites 0 Cited by

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: HANGZHOU YIXIAN XIANJIN TECH CO LTD
Filing Date: 2023-05-30
Publication Date: 2026-06-16

Patent Timeline

30 May 2023: Application
16 Jun 2026: Publication (CN116797624B)

IPC: G06T7/194

AI Tagging

Application Domain

Image analysis

AI Technical Summary

Technical Problem

Existing approaches require complex computation to select the foreground and background and cannot flexibly obtain the occlusion relationship between them, which leads to inaccurate occlusion judgments.

Method used

The user's selection action on the AR terminal and the selected image are acquired. Prompt information is generated from the action and input, together with the image, into a segmentation network, which determines the occlusion relationship between the foreground and background. Supported interactions include clicking, drawing a frame, drawing a line, and entering text or voice.

Benefits of technology

Foreground and background occlusion relationships can be obtained flexibly through simple interaction, improving the accuracy and efficiency of occlusion determination and enhancing the realistic occlusion effect in real-world applications.

✦ Generated by Eureka AI based on patent content.

Abstract

An embodiment of the application discloses a method and device for determining an occlusion relationship. The method includes: acquiring a user's selection action on an AR terminal and the image selected by the selection action; generating prompt information based on the selection action; and inputting the prompt information and the image into a segmentation network for segmentation to obtain the occlusion relationship between the foreground and background in space. The scheme provided by the application can flexibly obtain the relationship between foreground and background through simple interaction, so that a realistic occlusion relationship can be achieved in augmented reality applications.

Description

Technical Field

[0001] This invention relates to the field of AR technology applications, and in particular to a method and apparatus for determining occlusion relationships.

Background Technology

[0002] Augmented reality (AR) technology overlays virtual content onto a real-world scene to increase its information content or improve its appearance. However, directly overlaying virtual content onto a real-world scene can easily lead to a more cluttered augmented environment. For example, in some applications, the virtual content to be rendered might appear behind people, and directly overlaying it can obscure those people, negatively impacting the visual experience. A common solution is to use semantic segmentation or depth recovery methods to understand the scene, obtaining the mask and depth values of people or other objects to determine occlusion.

[0003] The drawback of existing technologies is that each object requires a separate network for segmentation, and they cannot flexibly determine object occlusion. For example, if there are two people in the image, and you only want one person as the foreground and the other as the background or erase them, semantic segmentation networks cannot achieve this. If instance segmentation is used, each person's mask can be obtained, but other methods are still needed for foreground selection.

[0004] Existing technologies involve complex computations when selecting the foreground and background, which makes it difficult to acquire the foreground and background flexibly and therefore to determine occlusion relationships accurately. There is currently no effective solution to this problem.

Summary of the Invention

[0005] To address the aforementioned technical problems, embodiments of the present invention aim to provide a method and apparatus for determining occlusion relationships, thereby at least solving the problem that the prior art involves complex computations in the process of acquiring foreground and background selection, making it impossible to flexibly acquire foreground and background and thus accurately determine occlusion relationships.

[0006] The technical solution of this invention is implemented as follows:

[0007] This invention provides a method for determining occlusion relationships, including: acquiring a user's selection action on an AR terminal and the image selected by the selection action; generating prompt information based on the selection action; inputting the prompt information and the image into a segmentation network for segmentation to obtain the occlusion relationship between the foreground and background in space.

[0008] Optionally, the selection action includes: clicking, drawing a frame, drawing a line, entering text, or speaking.

[0009] Further, optionally, generating prompt information based on the selection action includes: when the selection action is a click, obtaining the coordinates of the clicked point and generating prompt information based on the point coordinates.

[0010] Optionally, generating prompt information based on the selection action includes: when the selection action is drawing a frame, obtaining the vertex coordinates of the frame and generating prompt information based on the vertex coordinates.

[0011] Optionally, generating prompt information based on the selection action includes: when the selection action is drawing a line, collecting a set of points of the line according to a preset sampling rule, and generating prompt information based on the set of points.

[0012] Optionally, generating prompt information based on the selection action includes: when the selection action is speech, converting the speech to obtain the corresponding text, and generating prompt information based on the text.

[0013] Optionally, inputting the prompt information and image into the segmentation network for segmentation to obtain the occlusion relationship between the foreground and background in space includes: inputting the prompt information and image into the segmentation network for segmentation to obtain the mask value of the foreground in space; determining the background based on the mask value of the foreground; and obtaining the occlusion relationship between the foreground and background based on the foreground and background.

[0014] Further, optionally, the method further includes: when a first image and a second image are obtained in time sequence through the selection action on the AR terminal, after obtaining the occlusion relationship between the foreground and the background in the first image, generating prompt information for the second image based on the mask value of the foreground in the first image, and inputting the prompt information of the second image and the second image into the segmentation network for segmentation to obtain the occlusion relationship between the foreground and the background in the second image.

[0015] This invention provides an occlusion relationship determination device, comprising: an acquisition module for acquiring a user's selection action on an AR terminal and the image selected by the selection action; an information generation module for generating prompt information based on the selection action; and a segmentation module for inputting the prompt information and the image into a segmentation network for segmentation to obtain the occlusion relationship between the foreground and background in space.

[0016] Optionally, the selection action includes: clicking, drawing a frame, drawing a line, entering text, or speaking.

[0017] This invention provides a method and apparatus for determining occlusion relationships. It acquires a user's selection action on an AR terminal and the image selected by that action; generates prompt information based on the selection action; and inputs the prompt information and image into a segmentation network for segmentation to obtain the occlusion relationship between the foreground and background in space. This allows for flexible determination of the foreground-background relationship through simple interaction, thus achieving a realistic occlusion effect in augmented reality applications.

Attached Figure Description

[0018] The accompanying drawings, which are included to provide a further understanding of the invention and form part of this application, illustrate exemplary embodiments of the invention and, together with their description, serve to explain the invention and do not constitute an undue limitation thereof. In the drawings:

[0019] Figure 1 is a flowchart illustrating a method for determining occlusion relationships provided in an embodiment of the present invention;

[0020] Figure 2 is a schematic diagram of another method for determining occlusion relationships provided in an embodiment of the present invention;

[0021] Figure 3 is a schematic diagram illustrating the acquisition of occlusion relationships in a method for determining occlusion relationships according to an embodiment of the present invention;

[0022] Figure 4 is a schematic diagram of a device for determining occlusion relationships provided in an embodiment of the present invention.

Detailed Implementation

[0023] To enable those skilled in the art to better understand the present invention, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings of the embodiments of the present invention. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort should fall within the scope of protection of the present invention.

[0024] It should be noted that the terms "first," "second," etc., in the specification, claims, and drawings of this invention are used to distinguish different objects, rather than to limit a specific order.

[0025] It should also be noted that the various embodiments of the present invention described below can be executed individually or in combination with each other, and the embodiments of the present invention do not impose specific limitations in this regard.

[0026] This invention provides a method for determining occlusion relationships. Figure 1 is a flowchart illustrating a method for determining occlusion relationships provided in an embodiment of the present invention. As shown in Figure 1, when applied to AR devices, the method for determining occlusion relationships provided in this application embodiment includes:

[0027] Step S102: Obtain the user's selection action on the AR terminal and the image selected by the selection action;

[0028] Optionally, the selection action includes: clicking, drawing a frame, drawing a line, entering text, or speaking.

[0029] Step S104: Generate prompt information based on the selection action;

[0030] Further, optionally, generating prompt information based on the selection action includes: when the selection action is a click, obtaining the coordinates of the clicked point and generating prompt information based on the point coordinates.

[0031] Optionally, generating prompt information based on the selection action includes: when the selection action is drawing a frame, obtaining the vertex coordinates of the frame and generating prompt information based on the vertex coordinates.

[0032] Optionally, generating prompt information based on the selection action includes: when the selection action is drawing a line, collecting a set of points of the line according to a preset sampling rule, and generating prompt information based on the set of points.

[0033] Optionally, generating prompt information based on the selection action includes: when the selection action is speech, converting the speech to obtain the corresponding text, and generating prompt information based on the text.

[0034] Step S106: Input the prompt information and image into the segmentation network for segmentation to obtain the occlusion relationship between the foreground and background in space.

[0035] Optionally, inputting the prompt information and image into the segmentation network for segmentation to obtain the occlusion relationship between the foreground and background in space includes: inputting the prompt information and image into the segmentation network for segmentation to obtain the mask value of the foreground in space; determining the background based on the mask value of the foreground; and obtaining the occlusion relationship between the foreground and background based on the foreground and background.
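The three sub-steps in this paragraph can be sketched in Python as follows. This is only an illustration: it assumes the segmentation network returns a per-pixel foreground score between 0 and 1, and the 0.5 threshold, the function names, and the "front"/"back" labels are assumptions made for the example rather than details from the patent.

```python
from typing import List

Scores = List[List[float]]  # assumed per-pixel foreground scores in [0, 1]
Mask = List[List[bool]]


def foreground_mask(scores: Scores, threshold: float = 0.5) -> Mask:
    """Binarize the network output; the threshold value is an assumption."""
    return [[s >= threshold for s in row] for row in scores]


def background_mask(fg: Mask) -> Mask:
    """Treat every pixel that is not foreground as background."""
    return [[not v for v in row] for row in fg]


def occlusion_layers(fg: Mask) -> List[List[str]]:
    """Label each real-scene pixel by its place in the occlusion order.

    'front' pixels occlude virtual content; 'back' pixels are occluded by it.
    """
    return [["front" if v else "back" for v in row] for row in fg]


scores = [[0.9, 0.2], [0.6, 0.1]]
fg = foreground_mask(scores)
bg = background_mask(fg)
layers = occlusion_layers(fg)
```

Deriving the background as the complement of the foreground mask mirrors the text, which determines the background from the foreground mask value rather than segmenting it separately.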

[0036] In summary, combining steps S102 to S106, Figure 2 is a schematic diagram of another method for determining occlusion relationships provided in an embodiment of the present invention. When a user opens an augmented reality application, the rendered content is on top by default and therefore occludes the real scene. When a real object needs to be placed on top, the user can select the foreground on the screen by clicking, drawing lines, drawing frames, or speaking (i.e., the selection action in this embodiment). Prompt information is generated based on the selection action, and the prompt information and the image selected by the selection action are transmitted to the segmentation network. In this embodiment, SAM and/or SEEM are used as examples of the segmentation network. The SAM and/or SEEM network segments the image based on the prompt information to obtain a mask value for the foreground, thereby producing an AR display whose occlusion relationships meet the user's needs.

[0037] As shown in Figure 2, when a user opens an augmented reality application, the virtual content is on top by default, thus obscuring the real scene. At this time, the user can select the foreground on the screen by clicking, drawing lines, drawing frames, or using voice commands. Based on the interaction action (i.e., the selection action in this embodiment), prompt information is generated and transmitted to the segmentation network.

[0038] Due to differences in selection operations during the generation of prompt information, the method for determining occlusion relationships provided in this application embodiment is implemented in the following manner:

[0039] When the user's interaction is a click, for a SAM-type segmentation algorithm, the coordinates of the click can be transmitted to the algorithm as a prompt.

[0040] When the user's interaction is drawing a frame, prompt information can be generated from the coordinates of the four vertices of the frame and transmitted to the segmentation network.

[0041] When the user's interaction is drawing a line, the line is first sampled into points, and prompt information is generated based on the set of sampled points. The prompt information is then transmitted to the segmentation network. The sampling method can be uniform sampling, taking the center point, etc.

[0042] When the user's interaction is voice, the speech is first converted into text using a speech-to-text method, and prompt information is then generated based on the converted text and transmitted to the segmentation network.

[0043] For SEEM-type algorithms, visual prompts can be processed directly through a visual prompt encoder.
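The four interaction cases above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `Prompt` container, the function names, and the `transcribe` callable (standing in for whatever speech-to-text method is used) are hypothetical, and the sketch does not call SAM or SEEM.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence, Tuple

Point = Tuple[float, float]


@dataclass
class Prompt:
    """Hypothetical container for the prompt information sent to the network."""
    points: List[Point] = field(default_factory=list)
    box: Optional[Tuple[float, float, float, float]] = None  # x_min, y_min, x_max, y_max
    text: Optional[str] = None


def prompt_from_click(x: float, y: float) -> Prompt:
    """Click: the clicked coordinates become a single point prompt."""
    return Prompt(points=[(x, y)])


def prompt_from_frame(vertices: Sequence[Point]) -> Prompt:
    """Drawn frame: reduce the four vertices to an axis-aligned box."""
    xs = [v[0] for v in vertices]
    ys = [v[1] for v in vertices]
    return Prompt(box=(min(xs), min(ys), max(xs), max(ys)))


def prompt_from_line(stroke: Sequence[Point], num_samples: int = 5) -> Prompt:
    """Drawn line: sample the stroke uniformly, or take its center point."""
    if not stroke:
        raise ValueError("stroke must contain at least one point")
    if num_samples <= 1 or len(stroke) == 1:
        return Prompt(points=[stroke[len(stroke) // 2]])
    n = min(num_samples, len(stroke))
    step = (len(stroke) - 1) / (n - 1)
    return Prompt(points=[stroke[round(i * step)] for i in range(n)])


def prompt_from_voice(audio: bytes, transcribe: Callable[[bytes], str]) -> Prompt:
    """Voice: convert speech to text first, then use the text as the prompt."""
    return Prompt(text=transcribe(audio).strip())


stroke = [(i, i) for i in range(9)]
line_prompt = prompt_from_line(stroke, num_samples=5)
```

Keeping all prompt types in one container is a design choice of the sketch; it lets a single segmentation call accept whichever fields the interaction produced.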

[0044] After obtaining the foreground mask value through the segmentation network, the occlusion relationship between virtual content and real scene can be reset using the foreground mask.
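One way to picture this reset is a per-pixel compositing rule, sketched below under stated assumptions: pixels are represented by simple labels, `None` marks positions where no virtual content is rendered, and the function name `composite` is illustrative. The patent does not specify how the renderer applies the mask.

```python
from typing import List, Optional

Pixel = str  # labels stand in for colour values in this sketch


def composite(
    camera: List[List[Pixel]],
    virtual: List[List[Optional[Pixel]]],
    fg: List[List[bool]],
) -> List[List[Pixel]]:
    """Draw real foreground over virtual content, and virtual content over the rest."""
    out = []
    for cam_row, virt_row, fg_row in zip(camera, virtual, fg):
        row = []
        for cam, virt, is_fg in zip(cam_row, virt_row, fg_row):
            # Real foreground stays on top; elsewhere virtual content, if any, covers the scene.
            row.append(cam if is_fg or virt is None else virt)
        out.append(row)
    return out


camera = [["person", "wall", "wall"]]
virtual = [["V", "V", None]]
fg = [[True, False, False]]
frame = composite(camera, virtual, fg)
```

Without the mask, the default behaviour described earlier (virtual content always on top) corresponds to passing an all-`False` foreground mask.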

[0045] The processing flow for a single frame is as described above.

[0046] The occlusion relationship determination method provided in this application embodiment can obtain the real occlusion relationship through interaction and supports applications such as AR photography.

[0047] Further, optionally, the method for determining the occlusion relationship provided in this application embodiment further includes: when a first image and a second image are obtained in time sequence through the selection action on the AR terminal, after obtaining the occlusion relationship between the foreground and the background in the first image, generating prompt information for the second image based on the mask value of the foreground in the first image, and segmenting the second image through the segmentation network based on the prompt information of the second image to obtain the occlusion relationship between the foreground and the background in the second image.

[0048] Specifically, Figure 3 is a schematic diagram illustrating the acquisition of occlusion relationships in a method for determining occlusion relationships according to an embodiment of the present invention. As shown in Figure 3, the first image can be the image at time t, and the second image can be the image at time t+1. In this embodiment, time t and time t+1 are times within the same time sequence. The occlusion relationship between the foreground and background in the image at time t is acquired as in steps S102 to S106; after the foreground mask of the image at time t is obtained, prompt information for the image at time t+1 is generated based on that mask.

[0049] The prompt information for the image at time t+1 can depend on the type of selection action. If the selection action is a click, the foreground mask of the image at time t is sampled to obtain a set of points as the prompt; if the selection action is drawing a frame, the bounding box of the mask is calculated as the prompt; alternatively, the mask is used directly as the prompt, or a combination of the mask with a bounding box or points is used as the prompt.
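The two derived prompt types can be sketched as below. The helper names and the evenly spaced sampling scheme are assumptions for illustration; the patent only states that the mask is sampled into points or reduced to its bounding box.

```python
from typing import List, Optional, Tuple

Mask = List[List[bool]]
Point = Tuple[int, int]            # (x, y)
Box = Tuple[int, int, int, int]    # (x_min, y_min, x_max, y_max)


def foreground_pixels(mask: Mask) -> List[Point]:
    """List foreground pixel coordinates in row-major order."""
    return [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]


def mask_bounding_box(mask: Mask) -> Optional[Box]:
    """Frame-style prompt: the tightest box around the previous mask."""
    pixels = foreground_pixels(mask)
    if not pixels:
        return None
    xs = [p[0] for p in pixels]
    ys = [p[1] for p in pixels]
    return (min(xs), min(ys), max(xs), max(ys))


def sample_mask_points(mask: Mask, num_points: int = 3) -> List[Point]:
    """Click-style prompt: evenly spaced samples from the previous mask."""
    pixels = foreground_pixels(mask)
    if not pixels or num_points <= 0:
        return []
    n = min(num_points, len(pixels))
    step = len(pixels) / n
    return [pixels[int(i * step + step / 2)] for i in range(n)]


mask_t = [
    [False, False, False, False],
    [False, True, True, False],
    [False, True, True, False],
    [False, False, False, False],
]
box_prompt = mask_bounding_box(mask_t)
point_prompt = sample_mask_points(mask_t, num_points=2)
```

Both helpers return an empty result when the previous mask is empty, so a caller can fall back to asking the user for a new selection.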

[0050] It should be noted that the embodiments of this application use the image at time t and the image at time t+1 only as examples. In practice, the method for determining the occlusion relationship provided in the embodiments of this application can be applied to consecutive images by cyclically repeating the procedure over each pair of preceding and following frames; this is not specifically limited.
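The cyclic execution over consecutive frames can be sketched as a simple loop. Here `segment` is a hypothetical callable standing in for the segmentation network, prompts are plain dictionaries, and the bounding box is the only propagated prompt type; all of these are assumptions of the sketch.

```python
from typing import Any, Callable, Dict, List, Optional, Sequence, Tuple

Frame = List[List[int]]
Mask = List[List[bool]]
Prompt = Dict[str, Any]
Box = Tuple[int, int, int, int]


def bounding_box(mask: Mask) -> Optional[Box]:
    """Tightest (x_min, y_min, x_max, y_max) box around the mask, if any."""
    xs = [x for row in mask for x, v in enumerate(row) if v]
    ys = [y for y, row in enumerate(mask) if any(row)]
    if not xs:
        return None
    return (min(xs), min(ys), max(xs), max(ys))


def propagate(
    frames: Sequence[Frame],
    first_prompt: Prompt,
    segment: Callable[[Frame, Prompt], Mask],
) -> List[Mask]:
    """Segment each frame, prompting frame t+1 with the mask found in frame t."""
    prompt = first_prompt
    masks: List[Mask] = []
    for frame in frames:
        mask = segment(frame, prompt)
        masks.append(mask)
        box = bounding_box(mask)
        if box is not None:  # keep the previous prompt if the mask is empty
            prompt = {"box": box}
    return masks


seen_prompts: List[Prompt] = []


def toy_segment(frame: Frame, prompt: Prompt) -> Mask:
    """Stand-in network: marks pixels equal to 1 and records the prompt it got."""
    seen_prompts.append(prompt)
    return [[v == 1 for v in row] for row in frame]


frames = [[[0, 1, 0], [0, 1, 0]], [[0, 0, 1], [0, 0, 1]]]
masks = propagate(frames, {"points": [(1, 0)]}, toy_segment)
```

Only the first frame needs a user interaction in this loop; every later frame is prompted from the mask of the frame before it.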

[0051] The occlusion relationship determination method provided in this application proposes to flexibly select foreground and background through interaction, thereby determining the occlusion relationship. On the screen, the user can specify the foreground by clicking, drawing lines, or drawing boxes / circles. The interactive action generates a prompt, which, along with the image of the current frame, is transmitted as input to a general segmentation network, such as an algorithm like SAM or SEEM, to obtain the foreground mask. These masks are then used as the foreground, while other unselected and unsegmented objects can be used as the background. The occlusion relationship determination method provided in this application does not require retraining the network for any new foreground object; the foreground mask can be obtained directly through interaction. The mask of the previous frame can be used as a prompt for the next frame, so that each subsequent frame has a foreground mask for occlusion determination. Single-frame operation can also be used for AR photography.

[0052] The occlusion relationship determination method provided in this application embodiment can flexibly obtain the relationship between foreground and background through interaction, thus enabling realistic occlusion relationships in augmented reality applications. The method can flexibly segment the foreground through interaction, and no retraining is required for new foreground categories. The interaction (i.e., the selection action in this application embodiment) is simple and easy to perform, and it solves the foreground selection problem that semantic segmentation struggles with. For example, suppose a scene to be photographed contains two people and only one of them should be in the foreground. With semantic segmentation, both would be treated as foreground, or other interactive methods would be needed to select the foreground. The occlusion relationship determination method provided in this application embodiment can obtain the desired foreground through simple interaction, such as clicking or drawing a line on the selected person, thereby placing the virtual content behind the desired person and occluding unwanted people or objects.

[0053] This invention provides a method for determining occlusion relationships. The method involves acquiring a user's selection action on an AR terminal and the image selected by that action; generating prompt information based on the selection action; and inputting the prompt information and the image into a segmentation network for segmentation to obtain the occlusion relationship between the foreground and background in space. This allows for flexible determination of the foreground-background relationship through simple interaction, achieving a realistic occlusion effect in augmented reality applications.

[0054] This invention provides a device for determining occlusion relationships. Figure 4 is a schematic diagram of an occlusion relationship determination device provided in an embodiment of the present invention. As shown in Figure 4, the occlusion relationship determination device provided in this application includes: an acquisition module 42, used to acquire a user's selection action on an AR terminal and the image selected by the selection action; an information generation module 44, used to generate prompt information based on the selection action; and a segmentation module 46, used to input the prompt information and the image into a segmentation network for segmentation to obtain the occlusion relationship between the foreground and the background in space.
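The three-module structure can be sketched as three small classes wired together. The class names follow the module names in the text, but the method names, the dictionary-based action and prompt formats, and the `toy_network` stand-in are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Image = List[List[int]]
Mask = List[List[bool]]
Action = Dict[str, Any]   # e.g. {"type": "click", "x": 1, "y": 0}
Prompt = Dict[str, Any]


class AcquisitionModule:
    """Module 42: acquires the selection action and the selected image."""

    def __init__(self, read_action: Callable[[], Action], read_image: Callable[[], Image]):
        self._read_action = read_action
        self._read_image = read_image

    def acquire(self) -> Tuple[Action, Image]:
        return self._read_action(), self._read_image()


class InformationGenerationModule:
    """Module 44: turns the selection action into prompt information."""

    def generate(self, action: Action) -> Prompt:
        if action["type"] == "click":
            return {"points": [(action["x"], action["y"])]}
        if action["type"] == "text":
            return {"text": action["text"]}
        raise ValueError(f"unsupported action type: {action['type']}")


class SegmentationModule:
    """Module 46: feeds prompt and image to a segmentation network."""

    def __init__(self, network: Callable[[Image, Prompt], Mask]):
        self._network = network

    def segment(self, prompt: Prompt, image: Image) -> Mask:
        return self._network(image, prompt)


@dataclass
class OcclusionDevice:
    acquisition: AcquisitionModule
    generation: InformationGenerationModule
    segmentation: SegmentationModule

    def run_once(self) -> Mask:
        action, image = self.acquisition.acquire()
        prompt = self.generation.generate(action)
        return self.segmentation.segment(prompt, image)


def toy_network(image: Image, prompt: Prompt) -> Mask:
    """Stand-in network: selects pixels matching the value at the clicked point."""
    x, y = prompt["points"][0]
    return [[v == image[y][x] for v in row] for row in image]


device = OcclusionDevice(
    AcquisitionModule(lambda: {"type": "click", "x": 1, "y": 0},
                      lambda: [[0, 7, 7], [0, 7, 0]]),
    InformationGenerationModule(),
    SegmentationModule(toy_network),
)
fg_mask = device.run_once()
```

Passing the network in as a callable keeps the segmentation module independent of any particular model, which matches the text's treatment of SAM and SEEM as interchangeable examples.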

[0055] Optionally, the selection action includes: clicking, drawing a frame, drawing a line, entering text, or speaking.

[0056] This invention provides a device for determining occlusion relationships. It acquires a user's selection action on an AR terminal and the image selected by that action; generates prompt information based on the selection action; and inputs the prompt information and the image into a segmentation network for segmentation to obtain the occlusion relationship between the foreground and background in space. This allows for flexible determination of the foreground-background relationship through simple interaction, thus achieving a realistic occlusion effect in augmented reality applications.

[0057] This invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, produce a device for implementing the functions specified in one or more processes of the flowchart and/or one or more boxes of the block diagram.

[0058] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means that implement the functions specified in one or more processes of the flowchart and/or one or more boxes of the block diagram.

[0059] These computer program instructions may also be loaded onto a computer or other programmable data processing equipment to cause a series of operational steps to be performed on the computer or other programmable equipment to produce a computer-implemented process, such that the instructions that execute on the computer or other programmable equipment provide steps for implementing the functions specified in one or more processes of the flowchart and/or one or more boxes of the block diagram.

[0060] The above description is merely a preferred embodiment of the present invention and is not intended to limit the scope of protection of the present invention.

Claims

1. A method for determining occlusion relationships, characterized in that it comprises:
acquiring a user's selection action on an AR terminal and the single-frame image selected by the selection action;
generating prompt information based on the selection action;
inputting the prompt information and the image into a segmentation network for segmentation to obtain the occlusion relationship between the foreground and background in space;
wherein the selection action includes: clicking, drawing a frame, drawing a line, and entering text or voice;
generating prompt information based on the selection action includes: when the selection action is a click, obtaining the coordinates of the clicked point and generating the prompt information based on the coordinates of the clicked point; when the selection action is drawing a frame, obtaining the vertex coordinates of the frame and generating the prompt information based on the vertex coordinates; when the selection action is drawing a line, collecting a set of points of the line according to a preset sampling rule and generating the prompt information based on the set of points; when the selection action is voice, converting the voice to obtain the corresponding text and generating the prompt information based on the text;
inputting the prompt information and the image into the segmentation network for segmentation to obtain the occlusion relationship between the foreground and the background in space includes: inputting the prompt information and the image into the segmentation network for segmentation to obtain the mask value of the foreground in space; determining the background based on the mask value of the foreground; and obtaining the occlusion relationship between the foreground and the background based on the foreground and the background;
the segmentation network includes SAM and/or SEEM; the prompt information and the image selected by the selection action are processed by the SAM and/or SEEM network to obtain the mask value of the foreground; after the mask value of the foreground is obtained through the segmentation network, the occlusion relationship between the virtual content and the real scene is reset using the mask of the foreground;
the method further includes: when a first image and a second image are obtained in time sequence through the selection action on the AR terminal, after obtaining the occlusion relationship between the foreground and the background in the first image, generating the prompt information of the second image according to the mask value of the foreground in the first image, and inputting the prompt information of the second image and the second image into the segmentation network for segmentation to obtain the occlusion relationship between the foreground and the background in the second image;
wherein the first image is the image at time t, and the second image is the image at time t+1; time t and time t+1 are times in the same time series;
the prompt information for the image at time t+1 depends on the type of selection action: if the selection action is a click, the foreground mask value of the image at time t is sampled to obtain a set of points as the prompt; or, if the selection action is drawing a frame, the bounding box of the mask is calculated as the prompt.

2. A device for determining occlusion relationships, characterized in that it comprises:
an acquisition module, used to acquire a user's selection action on an AR terminal and the single-frame image selected by the selection action;
an information generation module, used to generate prompt information based on the selection action;
a segmentation module, used to input the prompt information and the image into a segmentation network for segmentation to obtain the occlusion relationship between the foreground and background in space;
wherein the selection action includes: clicking, drawing a frame, drawing a line, and entering text or voice;
generating prompt information based on the selection action includes: when the selection action is a click, obtaining the coordinates of the clicked point and generating the prompt information based on the coordinates of the clicked point; when the selection action is drawing a frame, obtaining the vertex coordinates of the frame and generating the prompt information based on the vertex coordinates; when the selection action is drawing a line, collecting a set of points of the line according to a preset sampling rule and generating the prompt information based on the set of points; when the selection action is voice, converting the voice to obtain the corresponding text and generating the prompt information based on the text;
inputting the prompt information and the image into the segmentation network for segmentation to obtain the occlusion relationship between the foreground and the background in space includes: inputting the prompt information and the image into the segmentation network for segmentation to obtain the mask value of the foreground in space; determining the background based on the mask value of the foreground; and obtaining the occlusion relationship between the foreground and the background based on the foreground and the background;
the segmentation network includes SAM and/or SEEM; the prompt information and the image selected by the selection action are processed by the SAM and/or SEEM network to obtain the mask value of the foreground; after the mask value of the foreground is obtained through the segmentation network, the occlusion relationship between the virtual content and the real scene is reset using the mask of the foreground;
when a first image and a second image are obtained in time sequence through the selection action on the AR terminal, after obtaining the occlusion relationship between the foreground and the background in the first image, the prompt information of the second image is generated according to the mask value of the foreground in the first image, and the prompt information of the second image and the second image are input into the segmentation network for segmentation to obtain the occlusion relationship between the foreground and the background in the second image;
wherein the first image is the image at time t, and the second image is the image at time t+1; time t and time t+1 are times in the same time series;
the prompt information for the image at time t+1 depends on the type of selection action: if the selection action is a click, the foreground mask value of the image at time t is sampled to obtain a set of points as the prompt; or, if the selection action is drawing a frame, the bounding box of the mask is calculated as the prompt.