Filter bar element recognition and interaction intent prediction method

Using a pre-trained visual language model for recognition together with multimodal fusion, the method addresses the limited versatility and intelligence of automated tools in filter bar element recognition and interaction intent prediction. It enables fully automated verification of filtering/sorting functions, with the stated aims of efficient and reliable operation, a better user experience, and higher test coverage.

CN122244883A · Pending · Publication Date: 2026-06-19 · GUANGZHOU PINWEI SOFTWARE CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications (China)
Current Assignee / Owner: GUANGZHOU PINWEI SOFTWARE CO LTD
Filing Date: 2026-04-21
Publication Date: 2026-06-19

AI Technical Summary

Technical Problem

Existing automation tools lack versatility and intelligence in filter bar element recognition and interaction intent prediction. They cannot adapt to various styles of filter bars and cannot predict the associated interactive behaviors of click actions, resulting in a poor user experience.

Method used

A pre-trained visual language model recognizes filter bars of various styles and the operable buttons inside them. A multimodal fusion mechanism associates visual features with text semantics to classify the interaction behavior type, and automated operation instructions are generated from the recognized elements' location information. Linear scaling is applied to adapt to different device resolutions. Image enhancement for e-commerce scenarios and multimodal joint optimization are introduced to improve the model's generalization ability, and the training set is optimized through a confidence verification mechanism.
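The linear scaling and location-based instruction steps can be illustrated with a minimal sketch. The summary does not give the exact formula or instruction format, so everything here is an assumption: the `Box` type, the `scale_box` and `tap_instruction` helpers, and the dictionary-shaped instruction are hypothetical names chosen for illustration, and the scaling is a plain per-axis ratio between a reference screenshot size and the target device size.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Bounding box in pixel coordinates (hypothetical type for this sketch)."""
    left: float
    top: float
    right: float
    bottom: float


def scale_box(box, src_size, dst_size):
    """Linearly rescale a box from src_size (width, height) to dst_size.

    Assumption: each axis is scaled independently by the ratio of the
    target dimension to the source dimension.
    """
    sx = dst_size[0] / src_size[0]
    sy = dst_size[1] / src_size[1]
    return Box(box.left * sx, box.top * sy, box.right * sx, box.bottom * sy)


def tap_instruction(box):
    """Build a tap instruction aimed at the center of the box.

    The instruction format is illustrative, not the one used in the patent.
    """
    x = round((box.left + box.right) / 2)
    y = round((box.top + box.bottom) / 2)
    return {"action": "tap", "x": x, "y": y}


# A button detected on a 1080x1920 screenshot, replayed on a 540x960 device.
detected = Box(100, 200, 300, 260)
scaled = scale_box(detected, (1080, 1920), (540, 960))
instruction = tap_instruction(scaled)
```

Scaling the box before computing the tap point keeps the recognition step independent of the device: the same detection can be replayed on any resolution by changing only `dst_size`.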

Benefits of technology

The method achieves fully automated verification of filtering/sorting functions, which reduces verification costs and improves test coverage and user experience. It removes two limitations of traditional tools, namely the need for manual adaptation to each filter bar style and the inability to predict interaction intent, so that automated tools can run efficiently and reliably across different devices and environments.

Generated by Eureka AI based on patent content.

Abstract

This application provides a method for identifying filter bar elements and predicting interactive intent. It uses a visual language model to uniformly identify filter bar controls and buttons of different styles, and combines multimodal fusion to predict interaction behavior types (pop-up / refresh / jump), solving the problems of poor versatility and lack of intent prediction in traditional tools. Automated instructions are generated based on coordinates and adapted to different device resolutions, achieving cross-platform operation with zero code maintenance. A closed-loop system of confidence verification and training set optimization improves model robustness. Ultimately, it achieves automated verification of the filtering function, replacing time-consuming manual click testing, significantly improving verification efficiency and reliability.
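The closed loop of confidence verification and training set optimization described in the abstract might look roughly like the sketch below. The abstract names the three interaction types but does not specify the threshold, the data layout, or how low-confidence samples are handled, so the `Interaction` enum, the `split_by_confidence` helper, the prediction dictionaries, and the `0.8` threshold are all assumptions made for illustration.

```python
from enum import Enum


class Interaction(Enum):
    """Interaction behavior types named in the abstract."""
    POPUP = "pop-up"
    REFRESH = "refresh"
    JUMP = "jump"


def split_by_confidence(predictions, threshold=0.8):
    """Partition predictions into accepted and needs-review lists.

    Assumption: each prediction is a dict with a "confidence" score in
    [0, 1]; the threshold value is illustrative, not from the patent.
    """
    accepted, needs_review = [], []
    for pred in predictions:
        if pred["confidence"] >= threshold:
            accepted.append(pred)
        else:
            needs_review.append(pred)
    return accepted, needs_review


# Hypothetical model outputs for three filter bar buttons.
predictions = [
    {"label": "Price", "interaction": Interaction.REFRESH, "confidence": 0.95},
    {"label": "Filter", "interaction": Interaction.POPUP, "confidence": 0.62},
    {"label": "Brand", "interaction": Interaction.JUMP, "confidence": 0.81},
]
accepted, needs_review = split_by_confidence(predictions)
```

In a setup like this, accepted predictions would drive the automated clicks, while the needs-review items would be checked and, once corrected, added back to the training set.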