Structured data extraction method, terminal device, and storage medium

A multimodal model detects data regions in monitoring-interface images and associates them with their physical semantics. This addresses the problem of extracted data lacking physical semantics and enables accurate, robust structured data extraction.

CN122223733A · Pending · Publication Date: 2026-06-16 · SUNGROW SMART MAINTENANCE TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications (China)
Current Assignee / Owner
SUNGROW SMART MAINTENANCE TECH CO LTD
Filing Date
2026-03-30
Publication Date
2026-06-16

AI Technical Summary

Technical Problem

In industrial monitoring scenarios, existing technologies extract data with fixed template matching. The extracted data carries no physical semantics, so it cannot be used directly.

Method used

A multimodal model is applied to the monitoring-interface image: detection yields detection boxes, and semantic association processing matches the data in each detection box with its corresponding physical semantics. The output is a structured list of data items.

Benefits of technology

It significantly improves the accuracy and robustness of matching data to physical semantics, enhances how well structured data extraction generalizes across scenarios, and avoids structuring failures caused by deviations in the association between data and physical semantics.

Generated by Eureka AI based on patent content.

Smart Images

  • Figure CN122223733A_ABST

Abstract

The application discloses a structured data extraction method, a terminal device, and a storage medium, and relates to the technical field of computers. The structured data extraction method comprises the following steps: detecting a monitoring interface image to obtain a plurality of detection boxes, the detection boxes being used to determine candidate regions of data in the monitoring interface image; based on image position information of the detection boxes, performing semantic association processing on the detection boxes by using a first multimodal model, so that the data in the detection boxes is matched with corresponding physical semantics; and, based on the matching result and the image position information of the detection boxes, outputting a list of structured data items. By fusing the spatial positions of the data with global text semantic features, the method addresses the problem that data is difficult to bind accurately to its physical semantics and therefore lacks them; it avoids, to a certain extent, structuring failures caused by deviations in the association between data and physical semantics, and significantly improves the accuracy and reliability of the extraction result.
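
The three steps in the abstract (detection boxes, position-based semantic association, structured output) can be illustrated with a minimal Python sketch. This is not the claimed implementation: the abstract performs the association with a first multimodal model, whereas the sketch substitutes a simple nearest-label geometric heuristic so that it runs without any model. The names `DetectionBox`, `is_value`, and `extract_items`, the sample label texts, and the output keys `semantic`, `value`, and `bbox` are all hypothetical and do not come from the patent.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class DetectionBox:
    """A detected candidate region: recognized text plus image position (hypothetical type)."""
    text: str
    x: float  # left edge, pixels
    y: float  # top edge, pixels
    w: float  # width, pixels
    h: float  # height, pixels

    def center(self) -> Tuple[float, float]:
        return (self.x + self.w / 2, self.y + self.h / 2)


def is_value(box: DetectionBox) -> bool:
    """Assumption: a box holds a data value if its text parses as a number."""
    try:
        float(box.text)
        return True
    except ValueError:
        return False


def extract_items(boxes: List[DetectionBox]) -> List[Dict]:
    """Pair each numeric box with the nearest non-numeric (label) box.

    Stand-in for the multimodal semantic association step: only the
    distance between box centers is used here.
    """
    labels = [b for b in boxes if not is_value(b)]
    values = [b for b in boxes if is_value(b)]
    items = []
    # Emit items in reading order (top to bottom, then left to right).
    for v in sorted(values, key=lambda b: (b.y, b.x)):
        nearest = min(
            labels,
            key=lambda lb: math.dist(lb.center(), v.center()),
            default=None,
        )
        items.append({
            "semantic": nearest.text if nearest else None,
            "value": float(v.text),
            "bbox": (v.x, v.y, v.w, v.h),
        })
    return items


# Made-up detection boxes for a two-row monitoring panel.
boxes = [
    DetectionBox("DC Voltage", 10, 10, 80, 20),
    DetectionBox("652.3", 100, 10, 50, 20),
    DetectionBox("AC Power", 10, 40, 80, 20),
    DetectionBox("48.7", 100, 40, 50, 20),
]
items = extract_items(boxes)
```

A purely geometric rule like this is likely to mis-bind values on dense or irregular layouts, which is the kind of association deviation the abstract says the fusion of spatial position and global text semantics is meant to reduce.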