36 results about "Monocular image" patented technology

Pose estimation method and device, computer device and storage medium

Pending | CN122156300A | Image analysis | Radiology | Nuclear medicine

The application provides a pose estimation method and device, a computer device and a storage medium. The method comprises: acquiring a monocular image, a target mask image corresponding to a target to be measured in the monocular image, and a three-dimensional model of the target to be measured; sampling a plurality of initial orientation data of the target to be measured under a determined view direction; determining initial position data of the target to be measured under each initial orientation based on the target mask image, wherein each initial orientation and its initial position data together constitute one initial pose, and the plurality of initial poses constitute an initial pose estimation set; selecting a target number of candidate poses from the initial pose estimation set and adjusting them to generate a plurality of intermediate poses; constructing a loss function and iteratively optimizing each intermediate pose with it to generate optimized poses; and determining the pose estimate for the target to be measured from the plurality of optimized poses.

Owner:TSINGHUA UNIVERSITY

Method and system for 3D human pose estimation based on context and anatomical interaction

Pending | CN122116413A | Biological models | Biometric pattern recognition | Pattern recognition | Human body

The present application belongs to the field of computer vision and provides a three-dimensional human pose estimation method and system based on context and anatomical interaction, comprising: acquiring a monocular image and extracting from it a two-dimensional coordinate sequence of human joint points in the image coordinate system; fusing the two-dimensional coordinate sequence with the original image under anatomical structure constraints to generate initial high-dimensional features that combine visual semantics and anatomical priors; inputting the initial high-dimensional features into an interactive fusion module that jointly models the local anatomical structure and the global context semantics and captures the dynamic joint dependencies under the current pose, yielding deeply fused enhanced features; and applying a nonlinear transformation and dimension mapping to the enhanced features to output the three-dimensional spatial coordinates of the human joint points. The method achieves advanced estimation accuracy and generalization ability on public benchmarks and effectively addresses the ambiguity and implausible-pose problems in monocular three-dimensional pose estimation.

Owner:SHAANXI NORMAL UNIV

Video see-through camera auto focus method and system for XR headsets

Pending | CN122160625A | Biological models | Mountings | Imaging processing | Fixation point

The present application relates to an autofocus method and system for the video see-through camera of an XR headset, belonging to the technical field of image processing. It addresses the slow focusing of existing video see-through cameras and the lack of an effective mapping between the depth information of a monocular image and the focusing parameters. The method includes: collecting the current image of the video see-through camera of the XR headset and its corresponding depth map; extracting from the depth map a local region centered on the coordinates of the user's fixation point in the current image, as the fixation-area depth map; inputting the fixation-area depth map and the fixation point coordinates into a trained focusing parameter prediction model to obtain the target focusing parameter of the video see-through camera; and adjusting the camera according to the target focusing parameter so that focusing is completed in a single step. An accurate mapping from image to focusing parameter is realized, and focusing speed is improved.

Owner:HANGZHOU HUIJIAN ZHILIAN TECH CO LTD
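
The fixation-area extraction step described in the abstract above can be illustrated with a minimal sketch. It assumes the depth map is a row-major list of rows and that the window is clamped at the image border; the function name `crop_fixation_region` and the `half_size` parameter are hypothetical and do not come from the patent, and the trained focusing parameter prediction model is not reproduced.

```python
def crop_fixation_region(depth_map, fixation_xy, half_size):
    """Return the patch of depth_map centered on the fixation point.

    depth_map   : list of rows (row-major), each row a list of depth values
    fixation_xy : (x, y) pixel coordinates of the user's fixation point
    half_size   : hypothetical window half-width in pixels

    The window is clamped to the image bounds, so patches near the border
    are smaller than (2 * half_size + 1) pixels on a side.
    """
    height = len(depth_map)
    width = len(depth_map[0])
    x, y = fixation_xy
    top = max(0, y - half_size)
    bottom = min(height, y + half_size + 1)
    left = max(0, x - half_size)
    right = min(width, x + half_size + 1)
    return [row[left:right] for row in depth_map[top:bottom]]


# Toy 5x5 depth map whose value encodes (row, column) as row * 10 + column.
toy_depth = [[r * 10 + c for c in range(5)] for r in range(5)]
patch = crop_fixation_region(toy_depth, (2, 2), 1)
```

In a full pipeline the returned patch, together with the fixation coordinates, would be the input to the prediction model; a fixed-size input would additionally need padding at the borders, which this sketch leaves out.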

Bionic human eye target recognition system and method based on monocular vision and laser radar

Pending | CN122260337A | Image analysis | Character and pattern recognition | Optical flow | Image contour

The application discloses a bionic human eye target recognition system and method based on monocular vision and laser radar, comprising the following steps: S1, collecting and registering synchronized monocular images and laser radar point clouds; S2, extracting image contours and point cloud attributes to generate feature maps and attribute sets; S3, jointly encoding them to generate size-distance representations; S4, constructing an improved Ginzburg-Landau phase field energy model based on the representations and attribute sets, and obtaining potential function parameters and anisotropy tensors; S5, introducing SE(2) sub-Riemannian geodesic direction continuity constraints into the phase field gradient term to adjust its strength; S6, aligning the phase fields of adjacent frames in time using optical flow and optimizing their evolution; S7, extracting zero-level contour lines to generate contours and output parameters; S8, inputting the parameters and size-distance representations into a three-dimensional structure program generation module for matching and screening; S9, performing consistency checking and sorting to output a recognition result. The application fuses visual and point cloud features and improves recognition accuracy and stability in complex scenes.

Owner:BEIJING KEANKE INTELLIGENT TECH CO LTD

A monocular image sequence satellite attitude estimation method

Active | CN119068054B | Labeled data | Image sequence

The application discloses a monocular image sequence satellite attitude estimation method, belonging to the technical field of space target perception, comprising the following steps: acquiring satellite sequence images shot by a space-based monocular camera as input images; segmenting the solar panel and the payload main body; obtaining a single-frame satellite attitude based on the segmentation result; screening feature points in the satellite sequence images based on a feature extraction deep learning network and the preliminary semantic segmentation result; guiding feature point matching with the semantic information of the extracted mask; iteratively optimizing the feature point matching using the epipolar constraint; obtaining the essential matrix between adjacent images; calculating the satellite attitude changes across the satellite sequence images; and combining them with the single-frame satellite attitude to obtain the series of attitudes of the satellite in the sequence. According to the technical scheme, the attitude estimation task can be completed without labeled data, and generalization to non-cooperative target satellites is improved.

Owner:BEIHANG UNIV
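
The epipolar check used to prune feature matches in the abstract above can be sketched as follows. This is an illustration only: it assumes normalized homogeneous image coordinates and builds the essential matrix E = [t]x R from a known relative motion, whereas the patent estimates the essential matrix from the matches themselves. All function names and the tolerance `tol` are hypothetical.

```python
def skew(t):
    """Return the 3x3 cross-product matrix [t]x of a 3-vector t."""
    tx, ty, tz = t
    return [[0.0, -tz, ty],
            [tz, 0.0, -tx],
            [-ty, tx, 0.0]]


def matmul3(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def essential_matrix(rotation, translation):
    """E = [t]x R for the relative motion X2 = R X1 + t."""
    return matmul3(skew(translation), rotation)


def epipolar_residual(essential, x1, x2):
    """Algebraic residual x2^T E x1 for normalized homogeneous points."""
    e_x1 = [sum(essential[i][j] * x1[j] for j in range(3)) for i in range(3)]
    return sum(x2[i] * e_x1[i] for i in range(3))


def filter_matches(essential, matches, tol=1e-3):
    """Keep only the (x1, x2) pairs whose residual magnitude is within tol."""
    return [(x1, x2) for x1, x2 in matches
            if abs(epipolar_residual(essential, x1, x2)) <= tol]


# Camera 2 is camera 1 shifted by one unit along x (identity rotation).
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
E = essential_matrix(identity, (1.0, 0.0, 0.0))
good = ((0.125, 0.05, 1.0), (0.375, 0.05, 1.0))  # same 3D point seen in both views
bad = ((0.125, 0.05, 1.0), (0.375, 0.20, 1.0))   # mismatched point
kept = filter_matches(E, [good, bad])
```

An iterative scheme in the spirit of the abstract would alternate between re-estimating the essential matrix from the surviving matches and re-filtering; that estimation step is outside the scope of this sketch.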

Endoscopic three-dimensional reconstruction and polyp measurement method based on geometric consistency and instrument anchoring

Pending | CN122391183A | Geometric consistency | Polyps size

The application discloses an endoscopic three-dimensional reconstruction and polyp measurement method based on geometric consistency and instrument anchoring. First, a geometric consistency perception module projects monocular image features into a Riemannian manifold space, and a manifold-constrained latent diffusion model generates a high-fidelity relative depth map while maintaining cross-domain structural consistency. Second, an instrument-anchored scale estimation module automatically identifies the biopsy forceps in the field of view and extracts their key points, and a PnP solver and Bayesian regression are used to solve the six-degree-of-freedom pose of the instrument. Finally, geometric constraint equations are constructed from the known physical size of the biopsy forceps, a global absolute scale factor is solved, and the relative depth map is converted into a metric three-dimensional point cloud. A few-shot linear calibration mechanism is introduced so that linear deviation caused by domain shift can be eliminated using a small number of clinical frames. According to the application, sub-millimeter polyp size measurement can be realized under monocular endoscopy using conventional surgical instruments and without additional hardware.

Owner:SUZHOU ENTROPTONG INTELLIGENT TECHNOLOGY CO LTD
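
The idea of anchoring scale to an instrument of known size, as described in the abstract above, can be shown with a deliberately simplified sketch. It assumes two instrument key points have already been reconstructed in relative (unitless) 3D coordinates and that their true separation is known; the PnP solver and Bayesian regression named in the abstract are not reproduced, and every function name here is hypothetical.

```python
import math


def scale_factor(key_point_a, key_point_b, known_length_mm):
    """Global scale that maps relative 3D units to millimeters.

    key_point_a, key_point_b : relative 3D coordinates of two instrument
                               key points (for example the two jaw tips)
    known_length_mm          : their true physical separation
    """
    relative_length = math.dist(key_point_a, key_point_b)
    if relative_length == 0.0:
        raise ValueError("key points coincide; scale is undefined")
    return known_length_mm / relative_length


def to_metric(points, scale):
    """Apply the global scale factor to a relative point cloud."""
    return [tuple(scale * c for c in p) for p in points]


def size_mm(point_a, point_b):
    """Euclidean distance between two metric points."""
    return math.dist(point_a, point_b)


# Forceps key points are 0.5 relative units apart but 5 mm apart in reality.
s = scale_factor((0.0, 0.0, 1.0), (0.3, 0.4, 1.0), known_length_mm=5.0)
polyp_edge_points = to_metric([(0.1, 0.2, 1.0), (0.1, 0.5, 1.0)], s)
polyp_width = size_mm(*polyp_edge_points)
```

A single pair of key points is sensitive to detection noise; solving the scale from several geometric constraints at once, as the abstract describes, would be the more robust design.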

Image processing method and electronic device

Pending | CN122313077A | Imaging processing | Multiple sensor

This application discloses an image processing method and electronic device, relating to the field of image processing technology. The method includes: extracting a first set of geometric edge lines from an initial depth image corresponding to a target monocular image; selecting, from the initial set of edge lines in the target monocular image, those whose gradient scores on a depth gradient map generated from the initial depth image are greater than a target score threshold, to determine a second set of geometric edge lines; and fusing the first and second sets of geometric edge lines to determine the image geometric edge lines of the target monocular image and construct a final set of geometric edge lines. The method addresses the severe texture interference, insufficient use of depth information, difficulty in balancing the number and purity of edge lines, and insufficient real-time performance of related technologies. It filters texture noise while preserving the number and robustness of edge lines, meeting the requirements of multi-sensor fusion tasks.

Owner:INSPUR SUZHOU INTELLIGENT TECH CO LTD

A method for travel path control based on monocular vision recognition and an unmanned laying vehicle for crack-resistant base fabric

Active | CN120876808B | Precisely control the direction of travel | Improve laying accuracy | Character and pattern recognition | Pattern recognition | Machine vision

This invention provides a method for controlling the travel path based on monocular vision recognition and an unmanned laying vehicle for crack-resistant base fabric, belonging to the technical field of machine vision recognition. The method includes using a monocular video acquisition device to collect video information of a target reference object on the crack-resistant base fabric laying equipment and converting it into image data to determine the initial center coordinate position of the target reference object in the image; adjusting the acquisition angle of the monocular image acquisition device on the target reference object according to the initial center coordinate position of the target reference object in the image, so that the distance between the initial center coordinate position of the target reference object in the image and a preset target center coordinate position in the image meets a preset initial distance; acquiring the travel center coordinate position of the target reference object in the image in real time and calculating the deviation between the travel center coordinate position and the target center coordinate position; and controlling the travel path based on the calculated deviation. This invention can significantly improve the flatness and efficiency of crack-resistant base fabric laying and effectively reduce labor costs.

Owner:SHANXI EXPRESSWAY DEV CO LTD
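
The deviation-based control loop in the abstract above can be sketched in a few lines. The abstract only states that the travel path is controlled from the deviation between the tracked center and the target center; the proportional law, the deadband, the sign convention and all parameter values below are assumptions made for illustration.

```python
def center_deviation(travel_center, target_center):
    """Pixel offset (dx, dy) of the tracked reference object from the target."""
    return (travel_center[0] - target_center[0],
            travel_center[1] - target_center[1])


def steering_command(deviation_x, gain=0.01, max_steer=1.0, deadband_px=2.0):
    """Proportional steering command from the horizontal pixel deviation.

    Assumed convention: a positive deviation means the reference object has
    drifted to the right in the image, and a negative command steers so as
    to cancel it. Deviations inside the deadband produce no correction.
    """
    if abs(deviation_x) <= deadband_px:
        return 0.0
    command = -gain * deviation_x
    return max(-max_steer, min(max_steer, command))


# Reference object detected 10 px to the right of the preset target center.
dx, dy = center_deviation((330, 240), (320, 240))
command = steering_command(dx)
```

The deadband keeps the vehicle from reacting to single-pixel detection jitter, and the clamp bounds the correction when the reference object is briefly detected far off-center.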

3D multi-person human pose estimation method and system in video based on spatio-temporal attention

Pending | CN122090508A | Character and pattern recognition | Biological models | Pattern recognition | Radiology

This invention discloses a method and system for 3D multi-person human pose estimation in videos based on spatiotemporal attention, relating to the field of pose estimation technology. This invention improves the human pose estimation scheme based on spatiotemporal feature fusion, effectively reconstructing the pose information of multiple people from monocular video sequences. The model can not only accurately estimate the 3D spatial position of each individual's joints, but also clearly reconstruct the relative positions and spatial hierarchy between people, effectively avoiding pose aliasing and computational complexity problems in multi-person scenes, and possessing excellent multi-person 3D human pose estimation capabilities. Without requiring multi-view or depth sensors, it achieves 3D multi-person human pose estimation results highly consistent with real-world scenes, verifying the effectiveness and practical value of the model in achieving high-precision 3D multi-person human pose estimation under monocular image conditions.

Owner:NORTHEASTERN UNIV CHINA

Method, device and equipment for training monocular image three-dimensional scene reconstruction model

Pending | CN122391419A | Pattern recognition | In vehicle

The present disclosure provides a monocular image three-dimensional scene reconstruction model training method, device and equipment, which uses a pre-trained monocular depth estimation model to generate a relative disparity map as a depth prior, introduces the installation height of a vehicle-mounted monocular camera as a physical prior to anchor the relative disparity map into an absolute-scale pseudo-depth map, uses the absolute-scale pseudo-depth map as a label to determine a cross-modal depth distillation loss, and trains a monocular image three-dimensional scene reconstruction model through the cross-modal depth distillation loss, a temporal self-supervised reprojection loss and an Eikonal loss. The present disclosure uses a cross-modal pseudo-supervised distillation mechanism based on a generative model to train the reconstruction model, without requiring sensors such as a laser radar or a multi-view image array to provide a supervision signal, and without manual labeling. The trained monocular image three-dimensional reconstruction model can reconstruct a 3D scene from a single image and can be deployed in real time on a vehicle-mounted embedded device.

Owner:BEIJING TRUNK TECHNOLOGY CO LTD
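
One way a known camera mounting height could anchor a relative disparity map to absolute scale, as the abstract above describes, is sketched below. The geometry is an assumption, not the patent's stated procedure: a pinhole camera with a horizontal optical axis over a flat road, so a road pixel at image row v has depth Z = fy * h / (v - cy). The sketch also assumes the relative disparity differs from metric inverse depth by a scale only (no shift). All names are hypothetical.

```python
from statistics import median


def ground_depth(row, fy, cy, camera_height):
    """Metric depth of a flat-road pixel at the given image row.

    Assumes a level pinhole camera mounted camera_height meters above the road.
    """
    if row <= cy:
        raise ValueError("road pixels must lie below the principal point row")
    return fy * camera_height / (row - cy)


def disparity_scale(ground_samples, fy, cy, camera_height):
    """Median ratio of metric inverse depth to relative disparity.

    ground_samples : list of (row, relative_disparity) pairs taken on the road
    """
    ratios = [(1.0 / ground_depth(row, fy, cy, camera_height)) / rel
              for row, rel in ground_samples if rel > 0]
    return median(ratios)


def pseudo_depth(relative_disparity, scale):
    """Absolute-scale pseudo depth for one pixel of the relative map."""
    return 1.0 / (scale * relative_disparity)


# Camera 1.5 m above the road, fy = 1000 px, principal point row cy = 500.
road_samples = [(600, 2.0 / 15.0), (650, 0.2), (800, 0.4)]
scale = disparity_scale(road_samples, fy=1000.0, cy=500.0, camera_height=1.5)
depth_m = pseudo_depth(0.2, scale)
```

The median makes the scale estimate tolerant of a few road samples that actually fall on vehicles or curbs; a sloped road or a pitched camera would violate the flat-ground assumption and bias the result.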

Two-dimensional image space to three-dimensional geographic space mapping system based on AI target detection

Pending | CN122265568A | Character and pattern recognition | Inference methods | Pattern recognition | Computer vision

The application relates to the general image data processing technical field and discloses a two-dimensional image space to three-dimensional geographic space mapping system based on AI target detection, which comprises a double-channel feature decoupling unit used for extracting an image gradient frequency feature tensor and a pixel-level semantic topology tensor from a two-dimensional pixel array; a semantic prior constraint inference unit used for matching a reference three-dimensional topology tensor in a geometric parameterized asset library by using the pixel-level semantic topology tensor to generate a depth scalar matrix; and a geographic space coupling mapping unit used for correcting a pose parameter matrix through coordinate residual errors of a virtual projection node and a measured pixel topology node. The application utilizes geometric prior to establish a scale constraint mechanism, solves a scale ambiguity problem in monocular image mapping, and improves the stability of geographic space reconstruction in a complex environment.

Owner:CHINA DYNAMICS TECH SHENZHEN CO LTD

Endoscope monocular image depth estimation method and system

Active | CN120655691B | Image pair | Nuclear medicine

The present application relates to an endoscope monocular image depth estimation method and system. The method includes: obtaining a current-frame endoscope monocular image; using the current-frame endoscope monocular image to train a DA-ICGA model, which is based on DARES, includes a PoseNet module and a DAM-LoRA module, and additionally introduces a dual-branch geometric perception module and an image-level contrastive learning module; and performing depth estimation on the endoscope monocular image to be examined with the trained DA-ICGA model. The application effectively suppresses the artifacts produced by overexposed regions and thereby improves monocular depth estimation accuracy.

Owner:JIANGNAN UNIV

An RGB monocular depth estimation method based on meta-learning dual-flow loss balancing

Pending | CN122289341A | Feature extraction | RGB image

This invention proposes an RGB monocular depth estimation method based on meta-learning dual-stream loss equalization. The steps include: acquiring monocular images and obtaining standardized RGB images through multi-stage image preprocessing; using a differentiated network structure of a dual-stream feature extraction module to extract shallow geometric features and deep semantic features in parallel from the standardized RGB images; constructing a cross-modal heterogeneous interaction module based on multi-head cross-attention to perform deep fusion and dynamic adaptation of shallow geometric features and deep semantic features, obtaining the final fused feature map; selecting the original dataset corresponding to the training set and obtaining the equalized total loss through meta-training; obtaining the trained depth estimation model through end-to-end joint training using the equalized total loss; inputting the RGB monocular image to be processed into the trained depth estimation model and outputting the final depth map. This invention can achieve accurate mapping from monocular images to 3D depth maps, adapt to complex interference scenes, and eliminate dependence on real depth labels.

Owner:HENAN INST OF ENG

A 3D track line detection method based on spatial topology mapping

Pending | CN122368938A | Topology mapping | Feature extraction

This invention discloses a 3D track detection method based on spatial topology mapping, belonging to the field of track detection technology. The method first obtains a high-precision 2D track probability map from a monocular image by introducing a feature extraction network with a dual attention mechanism. Then, based on a camera imaging model, the 2D track key points are back-projected into 3D space to generate an initial 3D anchor point sequence. Furthermore, an offset prediction network is designed to correct the lateral and vertical positions of the anchor points, and a two-stage iterative optimization mechanism is used to improve coordinate accuracy. Finally, end-to-end network training is achieved by jointly optimizing the loss functions of 2D segmentation and 3D detection tasks. This invention establishes a direct topology mapping from image pixels to 3D space, effectively solving the problems of low accuracy and poor robustness in 3D reconstruction of traditional methods in complex scenes, and significantly improving the accuracy and continuity of 3D geometric structure perception of track lines.

Owner:BEIHANG UNIV

Vehicle-mounted high-precision positioning method and system based on combination of Beidou and monocular fusion

Pending | CN122386352A | Time domain | Computer graphics (images)

The application discloses a vehicle-mounted high-precision positioning method and system for distribution networks based on Beidou and monocular fusion, relating to the technical field of positioning and navigation, comprising the following steps: obtaining the relative displacement vector of a vehicle between adjacent frames based on a time-domain differential carrier phase model; virtually converting two adjacent monocular images into a dynamic binocular stereo vision model based on the relative displacement vector, and obtaining absolute depth information of image feature points; performing catenary fitting on the power lines in the image, obtaining the physical geometric features of the plane in which the power lines lie, and establishing a physical gravity field geometric constraint; obtaining constraint factors from the physical gravity field geometric constraint, the displacement constraint of the time-domain differential carrier phase model and the absolute depth information, constructing a tightly coupled factor graph model for optimizing the motion state of the vehicle, and solving it to obtain a fusion solution containing the position, speed and attitude of the vehicle. The application realizes high-precision, highly robust positioning and attitude determination and improves the efficiency of distribution network inspection operations.

Owner:SHAOXING DAMING ELECTRICITY CONSTRUCT CO LTD
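
The "virtual binocular" step in the abstract above, where a known displacement between two monocular frames supplies the stereo baseline, can be illustrated with the standard rectified-stereo relation Z = f * B / d. The sketch assumes the two frames have already been rectified so that the displacement acts as a purely horizontal baseline, which the abstract does not state; the function names are hypothetical.

```python
import math


def baseline_from_displacement(displacement):
    """Length in meters of the displacement vector between the two frames."""
    return math.sqrt(sum(c * c for c in displacement))


def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth of a feature point under the rectified-stereo model Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


# Displacement of 0.5 m between frames, focal length 800 px,
# and a feature point that shifts by 20 px between the two images.
baseline = baseline_from_displacement((0.3, 0.4, 0.0))
depth = depth_from_disparity(800.0, baseline, 20.0)
```

Because the baseline comes from a metric displacement measurement, the recovered depth is in meters rather than up to an unknown scale, which is the point of fusing the two sources.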

Three-dimensional reconstruction model multi-modal watermarking processing method based on decoupled injection

Pending | CN122367707A | Feature vector | Training phase

This invention relates to the field of model watermarking technology and discloses a multimodal watermarking processing method based on decoupled injection of 3D reconstruction models. The method involves acquiring a monocular image input, extracting image coding features through a 3D reconstruction backbone network, activating a progressive message encoder, and using a unified semantic module to convert multimodal copyright messages into watermark feature vectors. The watermark feature vectors are then input into a feature injection branch to generate a watermarked rendered image. A progressive decoupling strategy is applied to the rendered image, including a collaborative training phase and an independent optimization phase. The progressive message encoder is removed while the 3D reconstruction backbone network is retained, achieving copyright watermark embedding verification with zero changes to the model structure. The rendering module outputs multi-view rendered images carrying the watermark, achieving zero structural changes during deployment or inference. After training, auxiliary components are removed, and the deployed model maintains the same architecture as the original, without affecting reconstruction quality, balancing embedding strength and model performance.

Owner:MACAO POLYTECHNIC INST

Clothing segmentation

Active | CN117157667B | Image analysis | 2D-image generation | Computer graphics (images) | Radiology

Methods and systems are disclosed for performing operations including receiving a monocular image, the monocular image including a depiction of a user wearing a garment; generating a segmentation of the garment worn by the user in the monocular image; accessing a video feed, the video feed including a plurality of monocular images received prior to the monocular image; smoothing the segmentation of the garment worn by the user using the video feed to provide a smoothed segmentation of the garment worn by the user; and applying one or more visual effects to the monocular image based on the smoothed segmentation of the garment worn by the user.

Owner:SNAP INC
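
The abstract above smooths the current garment segmentation using earlier frames of the video feed but does not say how. The sketch below shows one simple possibility, chosen purely for illustration: blend the current mask with the per-pixel mean of the previous masks and re-threshold. The blending weights, the threshold and the function name are assumptions.

```python
def smooth_mask(current, previous_masks, current_weight=0.4, threshold=0.5):
    """Temporally smooth a binary mask against earlier frames.

    current        : list of rows of 0/1 values for the newest frame
    previous_masks : list of earlier masks with the same shape (may be empty)
    Returns a new binary mask; with no history the current mask is returned.
    """
    if not previous_masks:
        return [list(row) for row in current]
    smoothed = []
    for r, row in enumerate(current):
        out_row = []
        for c, value in enumerate(row):
            history = sum(m[r][c] for m in previous_masks) / len(previous_masks)
            blended = current_weight * value + (1.0 - current_weight) * history
            out_row.append(1 if blended >= threshold else 0)
        smoothed.append(out_row)
    return smoothed


# The left pixel flickers off in the newest frame but was on in both
# earlier frames, so the smoothed mask keeps it on.
stable = smooth_mask([[0, 1]], [[[1, 1]], [[1, 1]]])
```

A lower `current_weight` suppresses flicker more strongly at the cost of lagging behind fast garment motion.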

Methods, apparatus, equipment, and storage media for estimating depth and self-motion trajectory

Active | CN115953468B | Image analysis | Internal combustion piston engines | Color image | View based

This invention discloses a method, apparatus, device, and storage medium for estimating depth and self-motion trajectory. The method includes: acquiring a preset model, a source view, and a target view; the preset model includes a depth estimation network, a motion estimation network, and an implicit cue network; the implicit cue network is used to extract static and dynamic features between the source view and the target view from the motion estimation network and identity-map them to the depth estimation network; the source view and the target view are two adjacent color images; inputting the source view and the target view into the preset model; estimating the camera's self-motion trajectory based on the motion estimation network; and estimating the depth of the source view and / or the target view based on the depth estimation network and the static and dynamic features between the source view and the target view. The solution provided by this invention can effectively alleviate the artifact problem of moving objects, improve the estimation quality of monocular image depth, and reduce pose transformation errors, thereby achieving more accurate estimation of the camera's self-motion trajectory.

Owner:AGRICULTURAL BANK OF CHINA

A monocular 3D detection method based on analog three-view and geometric relation reasoning

Pending | CN122336738A | Pattern recognition | Geometric relations

This invention discloses a monocular 3D detection method based on simulated three-view diagrams and geometric relationship reasoning in the field of computer vision technology. The method includes: extracting multi-scale forward-looking features from a monocular image; decoupling the three-view space by mapping the multi-scale forward-looking features to simulated bird's-eye view space and side view space, generating bird's-eye view feature maps and side view feature maps, and finally generating bird's-eye view center probability priors and side view center probability priors; introducing a geometry-guided refinement module, first performing node gating on the query vector based on the bird's-eye view and side view center probability priors, then constructing a fully connected graph based on the 3D center distance predicted by the query vector for message passing and feature updating; finally, inputting the query vector into the detection head to obtain the 3D detection result. This invention effectively improves the accuracy and robustness of monocular 3D detection in occluded and dense scenes by simulating orthogonal view decoupling features and combining geometric relationship graphs for reasoning.

Owner:NAT UNIV OF DEFENSE TECH

Target detection and semantic segmentation method, device and equipment, and storage medium

Active | CN115410167B | Get rid of strong dependence | Improve performance | Character and pattern recognition | Imaging processing | Point cloud

The application relates to the field of image processing and discloses a target detection and semantic segmentation method, device and equipment and a storage medium. The method comprises the following steps: acquiring monocular images of the same frame captured by multiple cameras, and performing depth labeling to obtain a depth map corresponding to each monocular image; converting the image coordinates of each pixel in the depth map to the camera coordinate system according to the camera intrinsic parameters, to obtain a pseudo point cloud of the pixels in the depth map under the camera coordinate system; converting the pseudo point clouds and the corresponding image features to a preset bird's-eye view coordinate system to obtain the corresponding bird's-eye view point clouds and bird's-eye view features; and performing target detection and semantic segmentation in the bird's-eye view based on the bird's-eye view point clouds and features. The method removes the strong dependence on ranging sensors while maintaining safety, reduces hardware cost, makes effective use of 2D image perception results in 3D perception tasks by extracting image information and transforming it into 3D space, and improves the performance of the perception algorithm.

Owner:GUANGZHOU WERIDE TECH LTD CO
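
The conversion from a depth map to a pseudo point cloud in the abstract above follows the standard pinhole back-projection X = (u - cx) Z / fx, Y = (v - cy) Z / fy. The sketch below shows that step plus a simple bird's-eye-view grid lookup; the axis convention (X right, Y down, Z forward), the treatment of non-positive depth as invalid, the cell size and the function names are assumptions, and the feature transfer to bird's-eye view is not reproduced.

```python
import math


def depth_to_pseudo_points(depth_map, fx, fy, cx, cy):
    """Back-project every valid depth pixel into camera coordinates.

    depth_map : list of rows of depth values; non-positive values are skipped
    Returns a list of (X, Y, Z) tuples with X right, Y down, Z forward.
    """
    points = []
    for v, row in enumerate(depth_map):
        for u, z in enumerate(row):
            if z <= 0:
                continue  # treat missing or invalid depth as no point
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


def bev_cell(point, cell_size):
    """Index (ix, iz) of the bird's-eye-view grid cell containing a point.

    The height axis (Y) is dropped; X and Z span the ground plane.
    """
    x, _, z = point
    return (math.floor(x / cell_size), math.floor(z / cell_size))


# 2x2 toy depth map with one invalid pixel.
toy_depth = [[2.0, 0.0],
             [4.0, 2.0]]
cloud = depth_to_pseudo_points(toy_depth, fx=2.0, fy=2.0, cx=0.0, cy=0.0)
cells = [bev_cell(p, cell_size=0.5) for p in cloud]
```

With several cameras, each cloud would first be moved into a shared vehicle frame using that camera's extrinsic parameters before the bird's-eye-view binning.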

Line-of-sight direction detection method and apparatus, electronic device, and storage medium

Active | CN116434315B | Image enhancement | Image analysis | Radiology | Direction detection

Embodiments of the present disclosure provide a line-of-sight direction detection method and device, electronic equipment and storage medium. The method comprises: obtaining a binocular image, processing the binocular image based on a binocular prediction model to obtain a binocular prediction result, the binocular prediction result representing a first line-of-sight direction corresponding to the binocular image; generating monocular calibration data according to the binocular prediction result, and configuring a monocular prediction model using the monocular calibration data; and when a monocular image is detected, processing the monocular image using the monocular prediction model to obtain a target line-of-sight direction corresponding to the monocular image. The monocular calibration data is generated using the binocular image collected in a normal state and the corresponding binocular prediction result, and the monocular prediction model is calibrated, so that the monocular prediction model has similar prediction ability to the binocular prediction model used to generate the binocular prediction result, and the target line-of-sight direction obtained based on the monocular prediction model has higher accuracy.

Owner:BEIJING ZITIAO NETWORK TECH CO LTD

A method and system for automatic selection of visual markers for mobile robots

Active | CN122115546A | Image analysis | Navigation instruments | Visual marking | Visual monitoring

The present application belongs to the field of visual monitoring, and particularly relates to a visual marker automatic selection method and system for a mobile robot, comprising: acquiring monocular image frame sequences and odometer data at continuous time points; detecting and decoding visual markers for each image to obtain measurement poses of the visual markers, and adaptively assigning uncertainty covariance matrices according to pixel areas, detection confidence or distances of the visual markers; calculating traces of the covariance matrices corresponding to the measurement poses, screening out effective observations satisfying an accuracy threshold, and selecting the one with the minimum trace as a representative observation when there are multiple effective observations at the same time; calculating a pose change between adjacent representative observation time points according to the odometer and assigning a motion uncertainty covariance matrix; taking the pose at the representative observation time point as an optimization variable, constructing a weighted optimization objective function, and solving the objective function to obtain an optimized pose and output a current positioning result.

Owner:SHANGHAI HENGZE FUHUI INTELLIGENT TECHNOLOGY CO LTD +1
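
The screening step in the visual-marker abstract above (compute the trace of each covariance matrix, keep observations that meet an accuracy threshold, pick the one with the minimum trace) can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the `Observation` class, the `max_trace` parameter, and the reading of the accuracy threshold as "trace at or below a limit" are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence

Matrix = Sequence[Sequence[float]]


@dataclass
class Observation:
    """Hypothetical container for one marker detection at a time point."""
    marker_id: int
    covariance: Matrix  # square uncertainty covariance of the measured pose


def trace(m: Matrix) -> float:
    """Sum of the diagonal entries of a square matrix."""
    return sum(m[i][i] for i in range(len(m)))


def select_representative(
    observations: List[Observation], max_trace: float
) -> Optional[Observation]:
    """Keep observations whose covariance trace is within max_trace
    (assumed form of the accuracy threshold) and return the one with
    the smallest trace, or None if no observation qualifies."""
    valid = [o for o in observations if trace(o.covariance) <= max_trace]
    if not valid:
        return None
    return min(valid, key=lambda o: trace(o.covariance))


observations = [
    Observation(1, [[0.5, 0.0], [0.0, 0.5]]),
    Observation(2, [[0.1, 0.0], [0.0, 0.2]]),
    Observation(3, [[2.0, 0.0], [0.0, 2.0]]),
]
best = select_representative(observations, max_trace=1.5)
```

Using the trace as a scalar summary keeps the comparison cheap; it ignores off-diagonal correlation, which may or may not matter for a given marker setup.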

A method for estimating 6D pose of a target object based on two-dimensional monocular image and a multi-target 6-DOF pose detection and three-dimensional scene modeling system

Pending | CN122134799A | Image analysis, 3D-image rendering, Imaging processing, Radiology

This invention discloses a 6D pose estimation method for target objects based on two-dimensional monocular images and a multi-target 6-DOF pose detection and three-dimensional scene modeling system. It includes: S0. Target object rendering: using the 3D model of the target object, obtaining rendering images from different angles and the corresponding 6-DOF pose for each rendering image; S1. Image processing: performing distortion correction on the image based on the camera's distortion parameters, using a neural network to extract and save the mask of the corresponding target object with the same size as the original 2D image, obtaining the real image mask, and updating the camera intrinsic parameters accordingly; S2. Coarse prediction of model 6-DOF pose: selecting the pose with the highest similarity between the rendering image and the real image in the offline library in step S0 as the coarse pose through multi-level similarity estimation; S3. Precise estimation of target 6-DOF pose: refining the pose of each target object through a target hybrid local refinement strategy to obtain the precise 6-DOF pose; S4. 3D scene reconstruction: transferring the obtained precise 6-DOF pose to the world coordinate system, and then combining it with the digital model of each target object to form a 3D digital model of the scene.

Owner:SUZHOU SPARK ROBOT TECH CO LTD
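
Step S4 in the 6D pose abstract above transfers each object pose from the camera frame to the world coordinate system. One common way to express this is to compose 4x4 homogeneous transforms. The sketch below assumes the camera pose in the world frame is known as `T_world_cam`; that name, `T_cam_obj`, and the helper functions are hypothetical and not taken from the patent.

```python
from typing import List

Mat4 = List[List[float]]


def matmul4(a: Mat4, b: Mat4) -> Mat4:
    """Multiply two 4x4 matrices given as nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
        for i in range(4)
    ]


def object_pose_in_world(T_world_cam: Mat4, T_cam_obj: Mat4) -> Mat4:
    """Compose camera-to-world with object-to-camera to get object-to-world."""
    return matmul4(T_world_cam, T_cam_obj)


# Assumed camera pose: rotated 90 degrees about the world z axis, at the origin.
T_world_cam = [
    [0.0, -1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
# Assumed refined object pose: 1 unit along the camera x axis, no rotation.
T_cam_obj = [
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
T_world_obj = object_pose_in_world(T_world_cam, T_cam_obj)
```

The translation column of `T_world_obj` gives the object position in world coordinates, which is where the object's digital model would be placed in the scene.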

A roadside target detection method and device for travel map danger warning

Active | CN117523172B | Improve object detection accuracy, Feature extraction, Computer graphics (images)

The application provides a roadside target detection method and device for travel map danger warning, the method comprises the following steps: processing a roadside image by using a feature extraction model to obtain visual features and ground plane features; processing the visual features by using a visual encoder to obtain visual embedding features; processing the ground plane features by using a ground plane encoder to obtain ground plane embedding features; processing the visual embedding features and the ground plane embedding features by using a ground plane guided decoder to obtain a ground plane perception object query; processing the ground plane perception object query by using a detection head to obtain a target detection result; processing the ground plane features by using a ground plane predictor to obtain a ground plane equation graph; and calculating a target depth value by using two-dimensional pixel coordinates of a target bottom surface center, the ground plane equation graph and camera parameters. The application effectively improves the target detection accuracy of the roadside monocular image and provides a key data basis for the danger warning system of the travel map.

Owner:TSINGHUA UNIVERSITY
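
The last step of the roadside detection abstract above computes a target depth from the pixel coordinates of the target's bottom-surface center, the ground plane equation, and the camera parameters. A minimal sketch of that geometry under a pinhole model follows. It assumes a single plane `a*X + b*Y + c*Z + d = 0` expressed in camera coordinates (the patent describes a per-pixel plane equation map, which would supply the coefficients for the pixel in question). The function name and the intrinsics `fx`, `fy`, `cx`, `cy` are illustrative.

```python
from typing import Tuple


def depth_from_ground_plane(
    u: float,
    v: float,
    plane: Tuple[float, float, float, float],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> float:
    """Intersect the viewing ray through pixel (u, v) with the plane
    a*X + b*Y + c*Z + d = 0 (camera coordinates) and return depth Z."""
    a, b, c, d = plane
    # Ray direction with unit Z component under the pinhole model.
    rx = (u - cx) / fx
    ry = (v - cy) / fy
    denom = a * rx + b * ry + c
    if abs(denom) < 1e-9:
        raise ValueError("viewing ray is parallel to the ground plane")
    z = -d / denom
    if z <= 0:
        raise ValueError("intersection lies behind the camera")
    return z


# Assumed setup: y axis points down, ground is 1.5 m below the camera (Y = 1.5).
ground = (0.0, 1.0, 0.0, -1.5)
z = depth_from_ground_plane(640.0, 510.0, ground, fx=1000.0, fy=1000.0, cx=640.0, cy=360.0)
```

Because the depth follows from geometry rather than from a learned regressor, errors in the plane estimate or in the bottom-center pixel translate directly into depth error, most strongly near the horizon where the denominator approaches zero.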

Incremental 3D Reconstruction Method and System for Unmanned Aerial Vehicles in Complex Mountainous Environments

Pending | CN122312948A | Engineering, 3D reconstruction

This invention discloses an incremental 3D reconstruction method and system for unmanned aerial vehicles (UAVs) in complex mountainous environments, belonging to the field of 3D reconstruction technology. The method includes: acquiring multi-view monocular images; establishing feature correspondences between images; optimizing newly added images to construct a sparse 3D point cloud; initializing each 3D point in the sparse 3D point cloud as a 3D Gaussian primitive; assigning each Gaussian ellipsoid an anisotropic or isotropic distribution by comparing the normal change rate with a preset steepness threshold; projecting the 3D Gaussian primitives and their 3D covariance matrices to generate 2D Gaussian primitives, and accumulating them with weights to generate a 2D image; constructing a joint loss function and using it to optimize the parameters of the 3D Gaussian primitives, obtaining a static 3D Gaussian model; and generating 3D reconstruction results based on the static 3D Gaussian model. This invention enhances the robustness of feature detection and matching in complex mountainous environments by adaptively adjusting the scale parameters based on the slope angle and mean curvature.

Owner:SICHUAN UNIVERSITY OF SCIENCE AND ENGINEERING

Robust stereo matching method based on underwater prior driving data and feature enhancement

Pending | CN122156944A | Character and pattern recognition, Biological models, Parallax, Stereo matching

The application discloses a robust stereo matching method based on underwater prior driving data and feature enhancement, and belongs to the technical field of computer vision. The method comprises the following steps: a physics-guided geometry synthesis module explicitly simulates, from monocular images and depth priors, the nonlinear geometric distortion caused by underwater refraction and similar effects, and generates physically consistent stereo training pairs; a salient-structure-guided enhancement module extracts and fuses salient structure representations of the scene through an attention mechanism, improving the discriminability of features in weak-texture areas; and a coarse-to-fine disparity prediction and adaptation process is applied. The application is mainly used in underwater robot navigation, ocean resource exploration, underwater structure detection and the like. It can effectively overcome the matching problems caused by appearance degradation and geometric distortion of underwater images, achieves high-precision depth estimation with strong generalization without real stereo labeled data, and supports fast adaptation using only monocular images of a target scene, which improves the practicability and flexibility of the method.

Owner:SHANDONG UNIV OF SCI & TECH

A monocular image sequence-based 3D hand pose estimation method and system

Pending | CN122116474A | Character and pattern recognition, Biological models, Heat map, Spatial mapping

The application belongs to the technical field of computer vision and image processing, and particularly relates to a 3D hand posture estimation method and system based on monocular image sequences, which comprises the following steps: acquiring an original image sequence of hand postures by using a monocular camera and performing preprocessing to obtain a monocular image sequence; inputting the monocular image sequence into a 2D posture estimation module constructed by using a LSNetPose2D lightweight network to obtain a joint heat map sequence; inputting the joint heat map sequence into a 3D posture regression module constructed by using an OverLoCKGraphMLP multi-stage hybrid architecture to obtain 3D hand coordinates, and realizing end-to-end 3D hand coordinate regression. The application effectively improves the estimation accuracy and robustness of hand three-dimensional postures under monocular vision conditions by combining deep learning and spatial mapping calibration technology, and solves the inaccurate estimation problem caused by missing parallax information, motion blur and occlusion in the prior art.

Owner:JILIN INST OF CHEM TECH

Automatic pool cleaning device, control method and computer storage medium

Pending | CN122344955A | Acquisition apparatus, Computer graphics (images)

The application provides a pool automatic cleaning device, a control method and a computer storage medium. The pool automatic cleaning device comprises a monocular image acquisition device, and the monocular image acquisition device is used for image acquisition. The method comprises the following steps: controlling the pool automatic cleaning device to move on the water surface; acquiring water surface image information through the monocular image acquisition device during the movement; performing target object identification on the water surface image information; and in the case that a target object floating on the water surface within a first predetermined distance from the pool automatic cleaning device is identified, controlling the pool automatic cleaning device to clean the target object. The method controls the pool automatic cleaning device to actively clean the garbage floating on the water surface within the first predetermined distance from the pool automatic cleaning device, thereby improving the cleaning efficiency of the pool.

Owner:SHENZHEN AIPER INTELLIGENT CO LTD

Monocular depth estimation method based on multi-objective federated evolutionary neural architecture search

Pending | CN122244121A | Image analysis, Character and pattern recognition, Iterative search, Engineering

This invention belongs to the field of computer vision and federated learning technology, and relates to a monocular depth estimation method based on multi-objective federated evolutionary neural architecture search. First, a searchable lightweight encoder supernet is constructed based on a federated learning framework, and each candidate architecture of the encoder architecture population is generated by sampling from the supernet. Second, the NSGA-II multi-objective evolutionary algorithm is used to iteratively search the encoder architecture population to obtain a Pareto-optimal set of encoder architectures that satisfies the dual objectives of depth prediction accuracy and edge inference latency. Next, a monocular depth estimation network is constructed. Then, the monocular depth estimation network is co-trained under a federated learning framework until the network converges. Finally, the monocular image to be tested is input into the trained monocular depth estimation network to obtain a pixel-level depth map corresponding to the input image. This invention solves the problems of low model efficiency, high communication overhead, and difficulty in achieving a balance between accuracy and latency faced by existing models.

Owner:HANGZHOU NORMAL UNIVERSITY
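
The architecture search abstract above targets a Pareto-optimal set of encoders under two objectives, depth prediction accuracy and edge inference latency. The sketch below shows only the non-dominated filtering idea behind such a set; it is not the full NSGA-II algorithm (no crowding distance, crossover, or mutation). The candidate names and the `(name, depth_error, latency_ms)` tuple layout are hypothetical, with both numeric objectives treated as values to minimize.

```python
from typing import List, Tuple

# (name, depth_error, latency_ms); both objectives are minimized.
Candidate = Tuple[str, float, float]


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b in both objectives and strictly
    better in at least one."""
    no_worse = a[1] <= b[1] and a[2] <= b[2]
    strictly_better = a[1] < b[1] or a[2] < b[2]
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Return the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


population = [
    ("A", 0.10, 30.0),
    ("B", 0.12, 20.0),
    ("C", 0.15, 25.0),  # worse than B in both objectives
    ("D", 0.09, 50.0),
]
front = pareto_front(population)
```

Each architecture left on the front represents a different accuracy and latency trade-off, so the final choice can depend on the latency budget of the target edge device.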

A method for identifying urban and rural planning land based on remote sensing images

Pending | CN122313258A | Topology information, Texture representation

This invention relates to the field of image recognition technology, specifically to a method for identifying urban and rural planning land use based on remote sensing imagery. This invention effectively integrates visual, geometric, and topological information, performing radiometric calibration and orthorectification on the acquired optical remote sensing images and metadata. Primarily targeting high-resolution optical remote sensing images, this invention separates complementary spatial-frequency domain texture representations through frequency domain transformation and depth feature extraction techniques to address the feature confusion caused by similar appearances but differences in microstructure among ground features. Utilizing morphological reconstruction operations and projection geometric constraints, through the detection and correlation analysis of shadow areas, it recovers the object-level mask and pseudo-3D height geometric attributes of buildings from monocular images, overcoming the limitations of traditional 2D images in perceiving vertical physical features. Based on scene adjacency graphs, it performs topological modeling and contextual information aggregation on the spatial distribution relationships between ground features, achieving refined classification and identification of building functional attributes.

Owner:QINGDAO URBAN PLANNING & DESIGN INST
