Key point extraction method and apparatus
By calculating the position and pose data of the camera device and generating a virtual container for matching and panoramic acquisition, the problem of low efficiency and poor accuracy of key point detection algorithms in the existing technology is solved, and a highly efficient augmented reality effect is achieved.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- BEIJING WODONG TIANJUN INFORMATION TECH CO LTD
- Filing Date
- 2023-04-19
- Publication Date
- 2026-06-16
AI Technical Summary
Existing keypoint detection algorithms are inefficient and have poor accuracy in augmented reality scenarios, resulting in poor augmented reality presentation.
The method receives video information collected by the camera device, calculates the position and pose data of the camera device to determine a reference plane, generates a virtual container and matches it to the target, uses angle data to guide panoramic acquisition, and extracts key frames to determine key points.
It improves the computational efficiency and accuracy of key point localization, making it suitable for application scenarios such as online live streaming, and achieving efficient augmented reality effects.
Abstract
Description
Technical Field
[0001] This invention relates to the field of augmented reality technology, and in particular to a method and apparatus for extracting key points.
Background Technology
[0002] In augmented reality scenarios, virtual targets need to be presented in association with real targets in the scene. For example, when a real flower is detected on a table, a virtual flower is presented next to it. Hereinafter, the real target (such as the real flower in the example above) is referred to as the augmented reality associated target. In practical applications, before the augmented reality effect can be achieved, the augmented reality associated target must first be located in the video information using known keypoint detection algorithms. However, existing keypoint detection algorithms are inefficient and have poor accuracy, resulting in poor augmented reality presentation.
Summary of the Invention
[0003] In view of this, embodiments of the present invention provide a key point extraction method and apparatus, which can improve the computational efficiency and accuracy of key points used for locating augmented reality associated targets.
[0004] To achieve the above objectives, according to one aspect of the present invention, a key point extraction method is provided.
[0005] The key point extraction method of this invention includes: receiving first video information about the current space collected by a camera device on a user terminal; obtaining position and orientation data of the camera device from the first video information; determining the position and orientation data of a reference plane in the current space based on the position and orientation data of the camera device; generating a virtual container of a preset shape and displaying it on the video acquisition interface of the user terminal, so that the user can move and / or scale the virtual container on the video acquisition interface to match the virtual container with an augmented reality associated target in the current space; responding to the completion of the matching, determining the angle data of the camera device toward the virtual container based on the position and orientation data of the reference plane; using the angle data to guide the camera device to perform panoramic acquisition of the virtual container to obtain second video information; extracting key frames from the second video information; and determining preset nodes of the virtual container in the key frames as key points for locating the augmented reality associated target.
[0006] Optionally, determining the position and attitude data of the reference plane in the current space based on the position and attitude data of the camera device includes: extracting, from the first video information, position and attitude data of a preset calibration object used to indicate the reference plane; and obtaining the position and attitude data of the reference plane based on the position and attitude data of the camera device, the position and attitude data of the calibration object, and a preset calibration function.
[0007] Optionally, the method further includes: after the matching is completed, calculating the position and orientation data of the virtual container based on the position and orientation data of the reference plane, and locking the position of the virtual container based on the calculated position and orientation data of the virtual container.
[0008] Optionally, the virtual container includes multiple preset components; and the step of guiding the camera device to perform panoramic acquisition of the virtual container using the angle data to obtain second video information includes: adjusting the appearance data of the components of the virtual container corresponding to the angle data to achieve the guidance.
[0009] Optionally, extracting keyframes from the second video information includes: determining image frames whose center point in the second video information coincides with the center point of the virtual container as target frames, and determining a preset number of image frames adjacent to the target frames as keyframes.
[0010] Optionally, the preset nodes of the virtual container include the center point and edge points of the virtual container, and the virtual container matching the augmented reality associated target of the current space includes: the virtual container accommodating the augmented reality associated target at a minimum scaling ratio, and the appearance data includes color data and / or transparency data; the method further includes: after the position locking is completed, performing focusing on the virtual container based on the position and pose data of the virtual container.
[0011] To achieve the above objectives, according to another aspect of the present invention, a key point extraction device is provided.
[0012] The key point extraction device in this embodiment of the invention may include: a reference plane positioning unit, configured to receive first video information of the current space collected by a camera device at the user end, obtain position and attitude data of the camera device from the first video information, and determine the position and attitude data of the reference plane in the current space based on the position and attitude data of the camera device; an associated target matching unit, configured to generate a virtual container of a preset shape and display it on the video acquisition interface at the user end, so that the user can move and / or scale the virtual container on the video acquisition interface to match the virtual container with an augmented reality associated target in the current space; a panoramic acquisition unit, configured to: in response to the completion of the matching, determine the angle data of the camera device toward the virtual container based on the position and attitude data of the reference plane, and use the angle data to guide the camera device to perform panoramic acquisition of the virtual container to obtain second video information; and a key point determination unit, configured to extract key frames from the second video information and determine the preset nodes of the virtual container in the key frames as key points of the augmented reality associated target.
[0013] Optionally, the virtual container includes multiple preset components; and the panoramic acquisition unit is further configured to: adjust the appearance data of the components of the virtual container corresponding to the angle data according to the angle data to achieve the guidance; the key point determination unit is further configured to: determine the image frame whose image center point in the second video information coincides with the center point of the virtual container as the target frame, and determine a preset number of image frames adjacent to the target frame as the key frame.
[0014] To achieve the above objectives, according to another aspect of the present invention, an electronic device is provided.
[0015] An electronic device according to the present invention includes: one or more processors; and a storage device for storing one or more programs, wherein when the one or more programs are executed by the one or more processors, the one or more processors implement the key point extraction method provided by the present invention.
[0016] To achieve the above objectives, according to another aspect of the present invention, a computer-readable storage medium is provided.
[0017] The present invention provides a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the key point extraction method provided by the present invention.
[0018] According to the technical solution of the present invention, the embodiments described above have the following advantages or beneficial effects:
[0019] After receiving the first video information captured by the camera device, the server calculates the position and orientation data of the camera device to determine the position and orientation data of the reference plane. Then, it generates a virtual container of a preset shape and displays it on the user's video capture interface, allowing the user to manipulate the virtual container to match it with an augmented reality (AR) target. After matching, the server uses the position and orientation data of the reference plane to calculate the angle data of the camera device towards the virtual container. Based on this angle data, it adjusts the color or transparency of the components of the virtual container in real time, guiding the user to perform panoramic video capture to obtain the second video information. Finally, the server extracts keyframes from the second video information and identifies preset nodes of the virtual container in the keyframes as key points. This key point extraction scheme based on matching virtual containers with AR targets is efficient and accurate, suitable for applications such as online live streaming. After key point extraction, subsequent steps can use the extracted key points to locate the AR target and synthesize it with the virtual target. Finally, the video information is transmitted to the user terminal (e.g., the viewing end in a live streaming scenario) via streaming.
[0020] Further effects of the above non-conventional optional approaches will be explained below in conjunction with specific embodiments.
Description of the Drawings
[0021] The accompanying drawings are provided to better understand the invention and are not intended to unduly limit the scope of the invention. Wherein:
[0022] Figure 1 is a schematic diagram of the main steps of the key point extraction method in an embodiment of the present invention;
[0023] Figure 2 is a schematic diagram illustrating the specific execution steps of the key point extraction method in this embodiment of the invention;
[0024] Figure 3 is a schematic diagram of the components of the key point extraction device in an embodiment of the present invention;
[0025] Figure 4 is an exemplary system architecture diagram to which embodiments of the present invention can be applied;
[0026] Figure 5 is a schematic diagram of the structure of an electronic device used to implement the key point extraction method in the embodiments of the present invention.
Detailed Implementation
[0027] The following description, in conjunction with the accompanying drawings, illustrates exemplary embodiments of the present invention, including various details to aid understanding. These details should be considered merely exemplary. Therefore, those skilled in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the invention. Similarly, for clarity and brevity, descriptions of well-known functions and structures are omitted in the following description.
[0028] The key point extraction method of this invention can be used in augmented reality scenarios based on video information. It is understood that in current augmented reality architectures, virtual targets are generally added for specific objects in the scene; these specific objects are referred to as augmented reality associated targets below. The scenarios to which this invention can be applied include any scenario where a user terminal captures video information through a camera device (such as a webcam) and then transmits the video information over a network. The following will use a live streaming scenario as an example. It is understood that in the target scenario, the video information collected by the broadcaster is processed by the server and then streamed to the viewer. The server can achieve augmented reality effects during the video information processing. The execution entity of the following steps and apparatus can be a server in the corresponding environment. It should be noted that, without conflict, the embodiments of this invention and the technical features in the embodiments can be combined with each other.
[0029] Figure 1 is a schematic diagram of the main steps of the key point extraction method according to an embodiment of the present invention.
[0030] As shown in Figure 1, the key point extraction method of this invention can be executed according to the following steps:
[0031] Step S101: Receive the first video information of the current space collected by the camera device at the user end, obtain the position and attitude data of the camera device from the first video information, and determine the position and attitude data of the reference plane in the current space based on the position and attitude data of the camera device.
[0032] In this step, the user terminal (the broadcaster terminal in a live streaming scenario) first collects a segment of video information from the current space, i.e., the first video information, as the positioning basis for the reference plane. In specific applications, to better achieve the relevant data collection for augmented reality, the broadcaster terminal can download and install pre-written augmented reality components from the server to execute the following steps. After receiving the first video information, the server extracts the position and orientation data of the camera device. This position and orientation data can include distance and angle data in the X, Y, and Z directions. Next, the server can use the position and orientation data of the camera device to calculate the position and orientation data of the reference plane in the current space. This reference plane can be a plane where the augmented reality associated target is located or associated, or it can be a curved surface of any shape associated with the augmented reality associated target.
[0033] In practical applications, the server can use the position and pose data of the camera device and a pre-defined calibration function (e.g., a function that converts 2D data from the video information into a 3D scene on the server, which can be provided by augmented reality components) to calculate the position and pose data of the reference plane. Preferably, more accurate reference plane positioning can be achieved using specific calibration objects (such as the well-known ArUco markers). Specifically, after the broadcaster places a pre-defined calibration object indicating the reference plane on the reference plane, the server first extracts the position and pose data of the calibration object from the first video information. Then, based on the position and pose data of the camera device, the position and pose data of the calibration object, and the aforementioned calibration function, it calculates the position and pose data of the reference plane. In practical scenarios, to reduce the computational load, a certain percentage (e.g., 20%) of the data frames within a fixed duration can be sampled for the above calculations. During the calculation, the average of the results for the most recent fixed number of data frames can be used as the current result, which smooths the calculation and reduces the influence of factors such as video jitter.
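The frame sampling and smoothing described above can be sketched as follows. This is a minimal illustration only, not the patented implementation: the names `sample_frames` and `PoseSmoother`, the six-value pose tuple, and the window size are assumptions, and component-wise averaging of angles is only valid away from the wrap-around at ±180 degrees.

```python
from collections import deque


def sample_frames(frames, ratio=0.2):
    """Keep roughly `ratio` of the frames by taking every k-th frame."""
    step = max(1, round(1 / ratio))
    return frames[::step]


class PoseSmoother:
    """Average the most recent `window` pose estimates (hypothetical helper).

    A pose is assumed to be a tuple such as (x, y, z, rx, ry, rz).
    """

    def __init__(self, window=5):
        # deque with maxlen silently drops the oldest estimate when full
        self.recent = deque(maxlen=window)

    def update(self, pose):
        self.recent.append(tuple(pose))
        n = len(self.recent)
        # Component-wise mean over the retained estimates
        return tuple(sum(p[i] for p in self.recent) / n for i in range(len(pose)))


if __name__ == "__main__":
    smoother = PoseSmoother(window=3)
    for raw in sample_frames([(0.0, 0.0, 1.0), (0.1, 0.0, 1.0), (0.2, 0.0, 1.1)], ratio=1.0):
        print(smoother.update(raw))
```

A fixed-size window keeps the memory cost constant and lets old, possibly jittery estimates age out automatically.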
[0034] Step S102: Generate a virtual container of a preset shape and display it on the user's video capture interface, so that the user can move and / or scale the virtual container on the video capture interface to match the augmented reality associated target in the current space.
[0035] After identifying the reference plane in the current space, the augmented reality (AR) associated target can be located based on the reference plane's position. Specifically, the server first generates a virtual container of a preset shape and displays it on the video capture interface of the broadcaster's terminal (it can have a certain degree of transparency to prevent obstruction). This container is used to contain or enclose the AR associated target in the current space for location. The virtual container can be dragged to change its position and can also be scaled up and down using other actions or buttons. As an optional solution, the shape of the virtual container can be a fixed shape (such as spherical or cubic) or a shape adapted to the AR associated target. For example, after the server determines the shape of the AR associated target based on the broadcaster's actions, it returns virtual containers with the same shape from a variety of preset virtual containers to the broadcaster's terminal. In addition, the above virtual container can include multiple preset components. For example, a spherical virtual container is divided into 6 different spherical surfaces with different longitude ranges, and each spherical surface is a component; or a spherical virtual container is composed of 324 points evenly distributed on the spherical surface, and each point is a component. The virtual container also has multiple preset nodes to characterize the position of the virtual container, such as the center point and the edge points in the X, Y, and Z directions (i.e., the intersections of the X-axis, Y-axis, Z-axis and the virtual container).
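As a rough sketch of the data structure this paragraph describes, a spherical virtual container with longitude-based components and preset nodes might look like the following. The class name `SphericalContainer`, its fields, and the node labels are hypothetical; the source does not specify a concrete representation.

```python
from dataclasses import dataclass


@dataclass
class SphericalContainer:
    """Hypothetical spherical virtual container split into longitude segments."""

    center: tuple = (0.0, 0.0, 0.0)
    radius: float = 1.0
    n_components: int = 6  # six longitude ranges, as in the example above

    def move(self, dx, dy, dz):
        """Drag the container by an offset."""
        cx, cy, cz = self.center
        self.center = (cx + dx, cy + dy, cz + dz)

    def scale(self, factor):
        """Scale the container up (factor > 1) or down (factor < 1)."""
        self.radius *= factor

    def preset_nodes(self):
        """Center point plus the intersections of the X, Y, Z axes with the sphere."""
        cx, cy, cz = self.center
        r = self.radius
        return {
            "center": (cx, cy, cz),
            "x+": (cx + r, cy, cz), "x-": (cx - r, cy, cz),
            "y+": (cx, cy + r, cz), "y-": (cx, cy - r, cz),
            "z+": (cx, cy, cz + r), "z-": (cx, cy, cz - r),
        }

    def component_of(self, longitude_deg):
        """Index of the longitude segment that contains the given longitude."""
        width = 360 / self.n_components
        return int((longitude_deg % 360) // width)
```

For example, after `c = SphericalContainer(); c.move(1, 0, 0); c.scale(2)`, the node `c.preset_nodes()["x+"]` is `(3.0, 0.0, 0.0)`.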
[0036] Step S103: In response to the completion of matching, determine the angle data of the camera device facing the virtual container based on the position and attitude data of the reference plane, and use the angle data to guide the camera device to perform panoramic acquisition of the virtual container to obtain the second video information.
[0037] In practical applications, "matching complete" means that the broadcaster has dragged and scaled the virtual container so that it accommodates the augmented reality associated target at the smallest scaling ratio. At this point, the virtual container and the augmented reality associated target have the maximum degree of positional overlap. After the virtual container is locked at the current position, key point detection for the augmented reality associated target can be performed based on the virtual container.
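In the method itself the broadcaster performs this matching by hand. Purely to illustrate what "accommodating the target at the smallest scaling ratio" means geometrically for a spherical container, the sketch below computes the smallest radius that still encloses a set of sample points on the target. The function names, the sample points, and the tolerance are assumptions made for illustration.

```python
import math


def min_enclosing_radius(center, points):
    """Smallest radius of a sphere at `center` that contains all `points`."""
    return max(math.dist(center, p) for p in points)


def is_matched(center, radius, points, tolerance=0.05):
    """True if the sphere contains every point and is at most
    `tolerance` (as a fraction) larger than strictly necessary."""
    needed = min_enclosing_radius(center, points)
    return needed <= radius <= needed * (1 + tolerance)


if __name__ == "__main__":
    target_points = [(1, 0, 0), (0, 2, 0), (0, 0, -1)]  # made-up sample points
    print(min_enclosing_radius((0, 0, 0), target_points))
    print(is_matched((0, 0, 0), 2.05, target_points))
```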
[0038] In this embodiment of the invention, after matching is completed, the server first calculates the position and orientation data of the virtual container based on the position and orientation data of the reference plane. Based on the calculated position and orientation data of the virtual container, the server locks the position of the virtual container so that panoramic data can be subsequently collected from its fixed position. Preferably, after locking, a focusing operation can be performed on the position data of the center point of the virtual container to ensure image clarity.
[0039] Subsequently, the server can determine the angle data of the camera device facing the virtual container based on the position and attitude data of the reference plane. This angle data then guides the camera device to perform panoramic acquisition of the virtual container to obtain second video information. Preferably, the server can adjust the appearance data of the components of the virtual container corresponding to the current angle of the camera device facing the virtual container to guide the panoramic acquisition. It can be understood that the panoramic acquisition refers to acquiring all-around data of the virtual container, such as 360-degree acquisition along the meridian crossing the sphere. Its definition is related to the division of the components of the virtual container.
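One plausible way to obtain such angle data, sketched here under stated assumptions, is to project the vector from the container center to the camera onto the reference plane and measure its azimuth against a reference direction in that plane. The function name `camera_azimuth_deg` and the representation of the reference plane as a normal plus an in-plane reference direction are hypothetical; the source does not define the formula.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _scale(a, k):
    return tuple(x * k for x in a)


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _normalize(a):
    length = math.sqrt(_dot(a, a))
    return _scale(a, 1.0 / length)


def camera_azimuth_deg(camera_pos, container_center, plane_normal, plane_ref_dir):
    """Azimuth (0..360 degrees) of the camera around the container,
    measured in the reference plane from `plane_ref_dir`."""
    n = _normalize(plane_normal)
    # Vector from container to camera, with the out-of-plane part removed
    v = _sub(camera_pos, container_center)
    v_in_plane = _sub(v, _scale(n, _dot(v, n)))
    # Orthonormal in-plane axes: x along the reference direction, y = n x x
    x_axis = _normalize(_sub(plane_ref_dir, _scale(n, _dot(plane_ref_dir, n))))
    y_axis = _cross(n, x_axis)
    angle = math.degrees(math.atan2(_dot(v_in_plane, y_axis), _dot(v_in_plane, x_axis)))
    return angle % 360
```

The resulting azimuth can then be mapped to a longitude range of the container to decide which component the camera is currently facing. The sketch does not handle the degenerate case in which the camera lies exactly on the plane normal through the container center.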
[0040] For example, the above appearance data may include color data, transparency data, size data, etc. Take the case where the components are the six spherical surfaces divided by longitude range: the broadcaster holds the terminal and films while moving around the augmented reality associated target. When the server detects that the angle range of the camera device facing the virtual container has covered a certain spherical surface, it adjusts the transparency of that spherical surface to zero, thus achieving transparency. The other five uncovered spherical surfaces remain in a high-transparency state, and the broadcaster can continue filming around the target in the original direction to achieve panoramic capture. Take instead the case where the components are the aforementioned 324 points: the broadcaster again films while moving around the augmented reality associated target. When the server detects that the angle range of the camera device facing the virtual container has covered a certain point, it adjusts the transparency of that point to zero, thus achieving transparency. The other uncovered points remain in a high-transparency state, and the broadcaster can continue filming around the target in the original direction to achieve panoramic capture. Clearly, the latter example involves more computation than the former.
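The guidance loop in the six-surface example can be sketched as a small coverage tracker. This is a simplified illustration: the class name `CoverageGuide` and the initial transparency value of 0.8 are assumptions, and a component is marked as covered as soon as a single camera azimuth falls inside its longitude range, whereas the text speaks of the angle range covering the component.

```python
class CoverageGuide:
    """Track which longitude components the camera has already faced."""

    def __init__(self, n_components=6, initial_transparency=0.8):
        self.n = n_components
        # Assumed starting value for the "high-transparency" state
        self.transparency = [initial_transparency] * n_components

    def observe(self, azimuth_deg):
        """Record a camera azimuth and set the matching component's transparency to zero."""
        idx = int((azimuth_deg % 360) // (360 / self.n))
        self.transparency[idx] = 0.0
        return idx

    def remaining(self):
        """Indices of components that have not been covered yet."""
        return [i for i, t in enumerate(self.transparency) if t > 0.0]

    def complete(self):
        """True once every component has been covered."""
        return not self.remaining()


if __name__ == "__main__":
    guide = CoverageGuide()
    for azimuth in range(0, 360, 15):  # broadcaster walks once around the target
        guide.observe(azimuth)
    print(guide.complete())
```

With 324 point components instead of six surfaces, the same loop would need a nearest-point lookup per observation, which is consistent with the remark that the point-based variant costs more computation.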
[0041] After the panoramic data acquisition is completed, the server can obtain the second video information captured by the broadcaster during the process.
[0042] Step S104: Extract keyframes from the second video information, and determine the preset nodes of the virtual container in the keyframes as key points for locating augmented reality related targets.
[0043] In this step, the server selects high-quality image frames from the image frames of the second video information as keyframes. For example, the server determines the image frame in the second video information whose image center point coincides with the center point of the virtual container as the target frame, and determines a predetermined number of image frames adjacent to the target frame as keyframes. It can be understood that the frames adjacent to the target frame are those whose frame numbers are close to that of the target frame. They can be image frames before the target frame, image frames after the target frame, or both. For example, the six adjacent frames are the three closest image frames before the target frame and the three closest image frames after it.
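The keyframe rule above can be sketched as follows, assuming the projected 2D position of the container center is already known for every frame. The function name `select_keyframes`, the pixel tolerance used to decide that two points "coincide", and the choice to exclude the target frame itself from its own neighbors are assumptions made for illustration.

```python
import math


def select_keyframes(image_center, projected_centers, before=3, after=3, tol_px=1.0):
    """Return sorted indices of frames adjacent to each target frame.

    image_center:      (x, y) center of the image in pixels.
    projected_centers: per-frame (x, y) of the container center in the image.
    A frame is a target frame when the two points are within `tol_px`.
    """
    n = len(projected_centers)
    keyframes = set()
    for i, center in enumerate(projected_centers):
        if math.dist(image_center, center) <= tol_px:
            lo = max(0, i - before)
            hi = min(n - 1, i + after)
            # Neighbors of the target frame; the window is clipped at the video ends
            keyframes.update(j for j in range(lo, hi + 1) if j != i)
    return sorted(keyframes)


if __name__ == "__main__":
    # Container center drifts across the image and is centered in frame 5
    centers = [(320 + 10 * (k - 5), 240) for k in range(10)]
    print(select_keyframes((320, 240), centers))
```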
[0044] Subsequently, the server can determine preset nodes such as the center point and edge points of the virtual container in the keyframes as key points for locating the augmented reality associated target, and transmit the 3D position and pose data of the key points in the live stream message body to the next node. The next node uses the position and pose data of the key points to perform entity segmentation of the augmented reality associated target and synthesis of the virtual target in the video information, and then sends the final generated video information to the viewing end by streaming, thereby completing the augmented reality effect in the live stream scene. Although the above description is based on a live stream scenario, this invention is not limited to live stream scenarios and can be applied to any augmented reality scenario involving video information.
[0045] Figure 2 is a schematic diagram illustrating the specific execution steps of the key point extraction method in this embodiment of the invention. Referring to Figure 2: in step S201, the server receives the first video information transmitted from the broadcaster; in step S202, the server calculates the position and orientation data of the camera device from the first video information; in step S203, the server determines the position and orientation data of the reference plane in the current space based on the calculated position and orientation data of the camera device, as the calibration basis for subsequent steps; in step S204, the server generates a virtual container of the corresponding shape; in step S205, the server displays the virtual container on the video capture interface of the broadcaster; in step S206, the broadcaster operates the virtual container to achieve matching between the virtual container and the augmented reality associated target; in step S207, the server calculates the angle of the camera device toward the virtual container based on the position and orientation data of the reference plane; in step S208, the server adjusts the color or transparency of the corresponding virtual container components based on the angle of the camera device toward the virtual container; in step S209, the server guides the broadcaster to perform acquisition to obtain the second video information; in step S210, the server selects keyframes from the image frames of the second video information; in step S211, the server determines the preset nodes of the virtual container in the keyframes as key points; in step S212, the server transmits the three-dimensional position and pose data of the key points to the next node through the streaming media server; in step S213, the next node performs entity segmentation and virtual target synthesis in the video information based on the three-dimensional position and pose data of the key points; in step S214, the next node transmits the finally generated video information to the viewing end by streaming, completing the augmented reality effect in the live broadcast scene.
[0046] In the technical solution of this invention embodiment, after receiving the first video information collected by the camera device, the server calculates the position and attitude data of the camera device and then determines the position and attitude data of the reference plane. Next, it generates a virtual container of a preset shape and displays it on the user's video acquisition interface, allowing the user to operate the virtual container to match it with an augmented reality associated target. After matching is complete, the server uses the position and attitude data of the reference plane to calculate the angle data of the camera device towards the virtual container. Based on this angle data, it adjusts the color or transparency of the components of the virtual container in real time, thereby guiding the user to perform panoramic video acquisition of the virtual container to obtain second video information. Finally, the server extracts keyframes from the second video information and determines the preset nodes of the virtual container in the keyframes as key points. The above key point extraction scheme based on matching augmented reality associated targets with virtual containers is highly efficient and accurate, suitable for application scenarios such as online live streaming. After the key points are extracted, subsequent steps can locate the augmented reality associated target based on the extracted key points and then synthesize it with the virtual target. Finally, the video information is transmitted to the user terminal via streaming.
[0047] It should be noted that the collection, gathering, updating, analysis, processing, use, transmission, and storage of user personal information involved in the technical solution of this invention all comply with the provisions of relevant laws and regulations, are used for legitimate purposes, and do not violate public order and good morals. Necessary measures are taken to prevent unauthorized access to user personal information data and to safeguard user personal information security, network security, and national security.
[0048] For the foregoing method embodiments, they are described as a series of actions for ease of description. However, those skilled in the art should understand that the present invention is not limited to the described order of actions, and some steps may actually be performed in other orders or simultaneously. Furthermore, those skilled in the art should also understand that the embodiments described in the specification are preferred embodiments, and the actions and modules involved are not necessarily essential for implementing the present invention.
[0049] To facilitate better implementation of the above-described solutions of the embodiments of the present invention, related apparatus for implementing the above-described solutions is also provided below.
[0050] As shown in Figure 3, the key point extraction device provided in this embodiment of the invention may include: a reference plane positioning unit 301, an associated target matching unit 302, a panoramic acquisition unit 303, and a key point determination unit 304.
[0051] The reference plane positioning unit 301 can be used to receive first video information about the current space collected by the camera device at the user end, obtain the position and attitude data of the camera device from the first video information, and determine the position and attitude data of the reference plane in the current space based on the position and attitude data of the camera device; the associated target matching unit 302 can be used to generate a virtual container of a preset shape and display it on the video acquisition interface at the user end, so that the user can move and / or scale the virtual container on the video acquisition interface to match the virtual container with the augmented reality associated target in the current space; the panoramic acquisition unit 303 can be used to: in response to the completion of the matching, determine the angle data of the camera device facing the virtual container based on the position and attitude data of the reference plane, and use the angle data to guide the camera device to perform panoramic acquisition of the virtual container to obtain second video information; the key point determination unit 304 can be used to extract key frames from the second video information and determine the preset nodes of the virtual container in the key frames as key points of the augmented reality associated target.
[0052] In this embodiment of the invention, the virtual container includes multiple preset components; and the panoramic acquisition unit 303 can be further used to: adjust the appearance data of the components of the virtual container corresponding to the angle data according to the angle data to achieve the guidance; the key point determination unit 304 can be further used to: determine the image frame whose image center point in the second video information coincides with the center point of the virtual container as the target frame, and determine the preset number of image frames adjacent to the target frame as the key frame.
[0053] As a preferred embodiment, the reference plane positioning unit 301 can be further used to: extract, from the first video information, position and attitude data of the preset calibration object used to indicate the reference plane; and obtain the position and attitude data of the reference plane based on the position and attitude data of the camera device, the position and attitude data of the calibration object, and the preset calibration function.
[0054] Preferably, the panoramic acquisition unit 303 can be further used to: calculate the position and attitude data of the virtual container based on the position and attitude data of the reference plane after the matching is completed, and lock the position of the virtual container based on the calculated position and attitude data of the virtual container.
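As a rough illustration of position locking, the sketch below anchors the virtual container at the point where a viewing ray from the camera meets the reference plane. The source only states that the container pose is calculated from the reference plane pose; the ray-plane intersection and the name `lock_on_plane` are assumptions chosen because they are a common way to place an object on a detected plane.

```python
def dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))


def lock_on_plane(ray_origin, ray_dir, plane_origin, plane_normal, eps=1e-9):
    """Return the world point where the viewing ray hits the reference plane.

    ray_origin: camera position in world coordinates.
    ray_dir: direction through the on-screen position of the container.
    plane_origin, plane_normal: reference plane taken from its pose data.
    Returns None if the ray is parallel to the plane or the plane lies
    behind the camera.
    """
    denom = dot(ray_dir, plane_normal)
    if abs(denom) < eps:
        return None  # ray runs parallel to the plane
    offset = [p - o for p, o in zip(plane_origin, ray_origin)]
    t = dot(offset, plane_normal) / denom
    if t < 0:
        return None  # intersection is behind the camera
    return [o + t * d for o, d in zip(ray_origin, ray_dir)]
```

Once such an anchor point is stored in world coordinates, the container can be re-projected from it in every later frame, so it stays fixed in the scene while the camera moves.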
[0055] Furthermore, in this embodiment of the invention, the preset nodes of the virtual container include the center point and edge points of the virtual container; the virtual container matching the augmented reality associated target in the current space means that the virtual container accommodates the augmented reality associated target at the minimum scaling ratio; and the appearance data includes color data and/or transparency data. The panoramic acquisition unit 303 can be further used to: after the position locking is completed, perform focusing on the virtual container based on the position and attitude data of the virtual container.
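The matching condition and the preset nodes can be sketched as follows. The sketch assumes a cuboid container scaled uniformly and treats the eight corners as the "edge points"; the source fixes neither the container shape nor which edge points are used, and both function names are hypothetical.

```python
from itertools import product


def min_uniform_scale(container_size, target_size):
    """Smallest uniform scale at which the container still holds the target.

    container_size: (width, height, depth) of the unscaled container.
    target_size: (width, height, depth) of the target's bounding box.
    """
    if any(c <= 0 for c in container_size):
        raise ValueError("container dimensions must be positive")
    # The axis on which the target is largest relative to the container
    # decides the scale; every other axis then has room to spare.
    return max(t / c for t, c in zip(target_size, container_size))


def container_nodes(center, size):
    """Return the center point followed by the eight corner points.

    Assumption: the 'edge points' are modelled as the corners of a cuboid.
    """
    half = [s / 2 for s in size]
    corners = [tuple(c + sign * h for c, sign, h in zip(center, signs, half))
               for signs in product((-1, 1), repeat=3)]
    return [tuple(center)] + corners
```

A unit cube matched to a target measuring 0.3 x 0.5 x 0.2 would be scaled by 0.5, and the nine nodes of the scaled container are the points later read out of the key frames.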
[0056] According to the technical solution of this embodiment of the invention, after receiving the first video information collected by the camera device, the server calculates the position and attitude data of the camera device and then determines the position and attitude data of the reference plane. Next, it generates a virtual container of a preset shape and displays it on the user's video acquisition interface, allowing the user to operate the virtual container to match it with an augmented reality associated target. After matching is complete, the server uses the position and attitude data of the reference plane to calculate the angle data of the camera device toward the virtual container, and adjusts the color or transparency of the components of the virtual container in real time based on this angle data, thereby guiding the user to perform panoramic video acquisition of the virtual container to obtain second video information. Finally, the server extracts keyframes from the second video information and determines the preset nodes of the virtual container in the keyframes as key points. The above key point extraction scheme based on matching augmented reality associated targets with virtual containers is highly efficient and accurate, suitable for application scenarios such as online live streaming. After the key point extraction is completed, subsequent steps can locate the augmented reality associated target based on the extracted key points and then synthesize it with the virtual target. Finally, the video information is transmitted to the user terminal via streaming.
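The angle-based guidance can be illustrated with a small sketch. It assumes that positions are expressed in a coordinate frame aligned with the reference plane (Y axis along the plane normal), that the components of the virtual container are equal angular sectors around that axis, and that a captured sector is shown by lowering its transparency value. The class name `CoverageGuide` and the alpha values are hypothetical.

```python
from math import atan2, degrees


def azimuth_deg(camera_pos, container_pos):
    """Angle of the camera around the container, in degrees in [0, 360].

    Assumption: both positions are given in a frame whose XZ plane is the
    reference plane, so the Y coordinate is ignored.
    """
    dx = camera_pos[0] - container_pos[0]
    dz = camera_pos[2] - container_pos[2]
    return degrees(atan2(dz, dx)) % 360.0


class CoverageGuide:
    """Tracks which angular components of the container were captured."""

    def __init__(self, n_components=8):
        self.n = n_components
        self.alpha = [1.0] * n_components  # 1.0 means not yet captured

    def update(self, angle_deg):
        """Mark the component facing the camera as captured; return its index."""
        sector = 360.0 / self.n
        idx = min(int((angle_deg % 360.0) // sector), self.n - 1)
        self.alpha[idx] = 0.2  # captured components are drawn more transparent
        return idx

    def complete(self):
        """True once every component has been captured at least once."""
        return all(a < 1.0 for a in self.alpha)
```

Calling `update(azimuth_deg(camera_pos, container_pos))` for each incoming frame would let the interface show the user which sides of the container still need to be filmed, and `complete()` signals when the panoramic acquisition has covered every component.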
[0057] Figure 4 shows an exemplary system architecture 400 to which the key point extraction method or key point extraction device of the present invention can be applied.
[0058] As shown in Figure 4, system architecture 400 may include terminal devices 401, 402, and 403, network 404, and server 405 (this architecture is merely an example; the components included in a specific architecture may be adjusted according to the specific application). Network 404 serves as the medium for providing a communication link between terminal devices 401, 402, and 403 and server 405. Network 404 may include various connection types, such as wired or wireless communication links or fiber optic cables.
[0059] Users can use terminal devices 401, 402, and 403 to interact with server 405 via network 404 to receive or send messages, etc. Various communication client applications, such as live streaming applications (for example only), can be installed on terminal devices 401, 402, and 403.
[0060] Terminal devices 401, 402, and 403 can be various electronic devices with displays and web browsing capabilities, including but not limited to smartphones, tablets, laptops, and desktop computers. In a live streaming scenario, terminal devices 401, 402, and 403 can be either broadcast terminals or viewing terminals.
[0061] Server 405 can be a server that provides various services, such as a live streaming server that supports live streaming applications operated by users using terminal devices 401, 402, and 403 (for example only). The live streaming server can process received augmented reality processing requests and feed back the processing results (for example only, video information synthesized from augmented reality associated targets and virtual targets) to terminal devices 401, 402, and 403.
[0062] It should be noted that the key point extraction method provided in the embodiments of the present invention is generally executed by server 405, and correspondingly, the key point extraction device is generally set in server 405.
[0063] It should be understood that the number of terminal devices, networks, and servers shown in Figure 4 is merely illustrative. Depending on implementation needs, any number of terminal devices, networks, and servers can be included.
[0064] The present invention also provides an electronic device. The electronic device according to an embodiment of the present invention includes: one or more processors; and a storage device for storing one or more programs, wherein when the one or more programs are executed by the one or more processors, the one or more processors implement the key point extraction method provided by the present invention.
[0065] Referring now to Figure 5, it shows a schematic diagram of the structure of a computer system 500 suitable for implementing an electronic device according to embodiments of the present invention. The electronic device shown in Figure 5 is merely an example and should not be construed as limiting the functionality and scope of use of the embodiments of the present invention.
[0066] As shown in Figure 5, the computer system 500 includes a central processing unit (CPU) 501, which can perform various appropriate actions and processes based on programs stored in read-only memory (ROM) 502 or programs loaded from storage section 508 into random access memory (RAM) 503. The RAM 503 also stores various programs and data required for the operation of the computer system 500. The CPU 501, ROM 502, and RAM 503 are interconnected via a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
[0067] The following components are connected to I/O interface 505: an input section 506 including a keyboard, mouse, etc.; an output section 507 including a cathode ray tube (CRT), liquid crystal display (LCD), etc., and speakers, etc.; a storage section 508 including a hard disk, etc.; and a communication section 509 including a network interface card such as a LAN card, modem, etc. The communication section 509 performs communication processing via a network such as the Internet. A drive 510 is also connected to I/O interface 505 as needed. A removable medium 511, such as a disk, optical disk, magneto-optical disk, semiconductor memory, etc., is installed on drive 510 as needed so that computer programs read from it can be installed into storage section 508 as needed.
[0068] In particular, according to the embodiments disclosed in this invention, the processes described in the above main step diagrams can be implemented as computer software programs. For example, embodiments of this invention include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the methods shown in the main step diagrams. In the above embodiments, the computer program can be downloaded and installed from a network via communication section 509, and/or installed from removable medium 511. When the computer program is executed by central processing unit 501, it performs the functions defined in the system of this invention.
[0069] It should be noted that the computer-readable medium shown in this invention can be a computer-readable signal medium or a computer-readable storage medium, or any combination thereof. A computer-readable storage medium can be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of a computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination thereof. In this invention, a computer-readable storage medium can be any tangible medium containing or storing a program that can be used by or in conjunction with an instruction execution system, apparatus, or device. In this invention, a computer-readable signal medium can include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such propagated data signals can take various forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination thereof. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium may be transmitted using any suitable medium, including but not limited to: wireless, wired, optical fiber, RF, etc., or any suitable combination thereof.
[0070] The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of code containing one or more executable instructions for implementing a specified logical function. It should also be noted that in some alternative implementations, the functions indicated in the blocks may occur in a different order than those indicated in the drawings. For example, two consecutively indicated blocks may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved. It should also be noted that each block in a block diagram or flowchart, and combinations of blocks in a block diagram or flowchart, may be implemented using a dedicated hardware-based system that performs the specified function or operation, or using a combination of dedicated hardware and computer instructions.
[0071] The units described in the embodiments of the present invention can be implemented in software or hardware. The described units can also be housed in a processor; for example, a processor can be described as including a reference plane positioning unit, an associated target matching unit, a panoramic acquisition unit, and a key point determination unit. The names of these units do not necessarily limit the specific unit; for example, the reference plane positioning unit can also be described as "a unit that provides reference plane position and attitude data to the panoramic acquisition unit."
[0072] In another aspect, the present invention also provides a computer-readable medium, which may be included in the device described in the above embodiments, or may exist independently without being assembled into the device. The computer-readable medium carries one or more programs, which, when executed by the device, cause the device to perform the following steps: receiving first video information about a current space collected by a camera device at a user end; obtaining position and attitude data of the camera device from the first video information; determining position and attitude data of a reference plane in the current space based on the position and attitude data of the camera device; generating a virtual container of a preset shape and displaying it on the video acquisition interface at the user end, so that the user can move and/or scale the virtual container on the video acquisition interface to match the virtual container with an augmented reality associated target in the current space; in response to the completion of the matching, determining angle data of the camera device facing the virtual container based on the position and attitude data of the reference plane; using the angle data to guide the camera device to perform panoramic acquisition of the virtual container to obtain second video information; extracting key frames from the second video information; and determining preset nodes of the virtual container in the key frames as key points for locating the augmented reality associated target.
[0073] According to the technical solution of this embodiment of the invention, after receiving the first video information collected by the camera device, the server calculates the position and attitude data of the camera device and then determines the position and attitude data of the reference plane. Next, it generates a virtual container of a preset shape and displays it on the user's video acquisition interface, allowing the user to operate the virtual container to match it with an augmented reality associated target. After matching is complete, the server uses the position and attitude data of the reference plane to calculate the angle data of the camera device facing the virtual container. Based on this angle data, it adjusts the color or transparency of the components of the virtual container in real time, thereby guiding the user to perform panoramic video acquisition of the virtual container to obtain second video information. Finally, the server extracts key frames from the second video information and determines the preset nodes of the virtual container in the key frames as key points. The above key point extraction scheme based on matching augmented reality associated targets with virtual containers is highly efficient and accurate, suitable for application scenarios such as online live streaming. After the key points are extracted, subsequent steps can locate the augmented reality associated target based on the extracted key points and then synthesize it with the virtual target. Finally, the video information is transmitted to the user terminal via streaming.
[0074] The specific embodiments described above do not constitute a limitation on the scope of protection of this invention. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions can occur depending on design requirements and other factors. Any modifications, equivalent substitutions, and improvements made within the spirit and principles of this invention should be included within the scope of protection of this invention.
Claims
1. A key point extraction method, characterized by comprising: receiving first video information about the current space collected by the camera device at the user end, obtaining the position and attitude data of the camera device from the first video information, and determining the position and attitude data of the reference plane in the current space based on the position and attitude data of the camera device; generating a virtual container of a preset shape and displaying it on the video acquisition interface at the user end, so that the user can move and/or scale the virtual container on the video acquisition interface to match the virtual container with the augmented reality associated target in the current space; in response to the completion of the matching, determining the angle data of the camera device facing the virtual container based on the position and attitude data of the reference plane, and using the angle data to guide the camera device to perform panoramic acquisition of the virtual container to obtain second video information; and extracting key frames from the second video information, and determining the preset nodes of the virtual container in the key frames as key points for locating the augmented reality associated target.
2. The method according to claim 1, characterized in that determining the position and attitude data of the reference plane in the current space based on the position and attitude data of the camera device includes: extracting, from the first video information, the position and attitude data of a preset calibration object used to indicate the reference plane; and obtaining the position and attitude data of the reference plane based on the position and attitude data of the camera device, the position and attitude data of the calibration object, and a preset calibration function.
3. The method according to claim 1, characterized in that the method further includes: after the matching is completed, calculating the position and attitude data of the virtual container based on the position and attitude data of the reference plane, and locking the position of the virtual container based on the calculated position and attitude data of the virtual container.
4. The method according to claim 1, characterized in that the virtual container includes multiple preset components; and using the angle data to guide the camera device to perform panoramic acquisition of the virtual container to obtain second video information includes: adjusting, according to the angle data, the appearance data of the component of the virtual container that corresponds to the angle data, so as to achieve the guidance.
5. The method according to claim 1, characterized in that extracting key frames from the second video information includes: determining each image frame in the second video information whose image center point coincides with the center point of the virtual container as a target frame, and determining a preset number of image frames adjacent to the target frame as the key frames.
6. The method according to claim 3, characterized in that the preset nodes of the virtual container include the center point and edge points of the virtual container; the virtual container matching the augmented reality associated target in the current space includes: the virtual container accommodating the augmented reality associated target at the minimum scaling ratio; and the method further includes: after the position locking is completed, performing focusing on the virtual container based on the position and attitude data of the virtual container.
7. A key point extraction device, characterized by comprising: a reference plane positioning unit, used to receive first video information about the current space collected by the camera device at the user end, obtain the position and attitude data of the camera device from the first video information, and determine the position and attitude data of the reference plane in the current space based on the position and attitude data of the camera device; an associated target matching unit, used to generate a virtual container of a preset shape and display it on the video acquisition interface at the user end, so that the user can move and/or scale the virtual container on the video acquisition interface to match the virtual container with the augmented reality associated target in the current space; a panoramic acquisition unit, used to: in response to the completion of the matching, determine the angle data of the camera device facing the virtual container based on the position and attitude data of the reference plane, and use the angle data to guide the camera device to perform panoramic acquisition of the virtual container to obtain second video information; and a key point determination unit, used to extract key frames from the second video information and determine the preset nodes of the virtual container in the key frames as key points of the augmented reality associated target.
8. The device according to claim 7, characterized in that the virtual container includes multiple preset components; the panoramic acquisition unit is further used to: adjust, according to the angle data, the appearance data of the component of the virtual container that corresponds to the angle data, so as to achieve the guidance; and the key point determination unit is further used to: determine each image frame in the second video information whose image center point coincides with the center point of the virtual container as a target frame, and determine a preset number of image frames adjacent to the target frame as the key frames.
9. An electronic device, characterized by comprising: one or more processors; and a storage device for storing one or more programs, wherein when the one or more programs are executed by the one or more processors, the one or more processors implement the method according to any one of claims 1-6.
10. A computer-readable storage medium having a computer program stored thereon, characterized in that, when the program is executed by a processor, it implements the method according to any one of claims 1-6.