A scene text tracking method and system based on semantic guidance and structure correction

CN121963219B · Active · Publication Date: 2026-06-26 · NANKAI UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
NANKAI UNIV
Filing Date
2026-04-03
Publication Date
2026-06-26

Smart Images

  • Figure CN121963219B_ABST

Abstract

The application discloses a scene text tracking method and system based on semantic guidance and structure correction, belonging to the technical field of image and video recognition. The method comprises the following steps: performing feature interaction between a template token sequence and a search-area token sequence to output an original visual feature map of the search area; performing predictive token correction to obtain a corrected feature map; performing cross-expert calibration on the template frame and the search frame to generate a text semantic calibration mask; multiplying the corrected feature map by the text semantic calibration mask element-wise to obtain a final fused feature, and inputting the fused feature into a prediction head, which performs position prediction and outputs a response map containing the target's center coordinates and size; and adaptively adjusting the search scale, outputting a prediction result, and then fusing that result with the output of a Kalman filter based on a constant-velocity motion model to update the final target position. The application addresses the problems of structural imbalance and deformation and achieves stable motion prediction, thereby enabling efficient, high-precision tracking of a specific text instance.
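
Two of the steps named in the abstract, the element-wise product of the corrected feature map with the calibration mask and the constant-velocity Kalman filter, can be illustrated with a minimal sketch. This is not the patented implementation: the abstract does not state how the prediction and the filter output are fused, so the sketch assumes the standard Kalman update serves as the fusion step. The names `apply_mask` and `ConstantVelocityKalman1D` and the noise values `q` and `r` are hypothetical, and the filter tracks a single coordinate (one instance each would be used for the center x and y).

```python
from dataclasses import dataclass


def apply_mask(features, mask):
    """Element-wise product of two equally shaped 2-D grids (lists of rows)."""
    same_shape = len(features) == len(mask) and all(
        len(f_row) == len(m_row) for f_row, m_row in zip(features, mask)
    )
    if not same_shape:
        raise ValueError("feature map and mask must have the same shape")
    return [
        [f * m for f, m in zip(f_row, m_row)]
        for f_row, m_row in zip(features, mask)
    ]


@dataclass
class ConstantVelocityKalman1D:
    """Kalman filter for one coordinate with state (position, velocity)."""

    pos: float
    vel: float = 0.0
    # Entries of the symmetric 2x2 state covariance P.
    p00: float = 1.0
    p01: float = 0.0
    p11: float = 1.0
    q: float = 1e-2  # process noise variance (assumed value)
    r: float = 1.0   # measurement noise variance (assumed value)

    def predict(self, dt=1.0):
        # State transition F = [[1, dt], [0, 1]]: position advances by vel * dt.
        self.pos += dt * self.vel
        # P <- F P F^T + q * I
        p00 = self.p00 + dt * (2.0 * self.p01 + dt * self.p11) + self.q
        p01 = self.p01 + dt * self.p11
        p11 = self.p11 + self.q
        self.p00, self.p01, self.p11 = p00, p01, p11
        return self.pos

    def update(self, measured_pos):
        # Only the position is observed: H = [1, 0].
        s = self.p00 + self.r            # innovation variance
        k0, k1 = self.p00 / s, self.p01 / s  # Kalman gain
        innovation = measured_pos - self.pos
        self.pos += k0 * innovation
        self.vel += k1 * innovation
        # P <- (I - K H) P
        p00 = (1.0 - k0) * self.p00
        p01 = (1.0 - k0) * self.p01
        p11 = self.p11 - k1 * self.p01
        self.p00, self.p01, self.p11 = p00, p01, p11
        return self.pos


if __name__ == "__main__":
    fused = apply_mask([[1.0, 2.0], [3.0, 4.0]], [[1.0, 0.0], [0.5, 1.0]])
    print(fused)

    # Feed the filter a target center moving 2 pixels per frame.
    kf = ConstantVelocityKalman1D(pos=0.0)
    for frame in range(1, 51):
        kf.predict()
        kf.update(2.0 * frame)  # stand-in for the prediction head's output
    print(round(kf.pos, 2), round(kf.vel, 2))
```

In this sketch the mask suppresses feature locations with low text-semantic confidence before position prediction, while the filter's `update` call blends each per-frame prediction with the motion model's forecast, weighted by the assumed noise variances.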