Filter bar element recognition and interaction intent prediction method
A pre-trained vision-language model combined with multimodal fusion addresses the limited versatility and intelligence of automated tools in filter bar element recognition and interaction intent prediction. The method enables fully automated, efficient, and reliable verification of filtering and sorting functions, improving user experience and test coverage.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Applications (China)
- Current Assignee / Owner: GUANGZHOU PINWEI SOFTWARE CO LTD
- Filing Date: 2026-04-21
- Publication Date: 2026-06-19
AI Technical Summary
Existing automation tools lack versatility and intelligence in filter bar element recognition and interaction intent prediction. They cannot adapt to the varied visual styles of filter bars or predict the interaction behavior that a click will trigger, resulting in a poor user experience.
A pre-trained vision-language model recognizes filter bars of various styles and the operable buttons inside them. A multimodal fusion mechanism associates visual features with text semantics to classify the interaction behavior type of each button, and automated operation instructions are generated from the buttons' location information. Linear scaling is applied so that the instructions adapt to different device resolutions. Image augmentation tailored to e-commerce scenarios and multimodal joint optimization improve the model's generalization ability, and a confidence verification mechanism is used to optimize the training set.
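The location-based instruction generation and linear scaling described above can be illustrated with a minimal sketch. Everything named here is an assumption for illustration only: the `Detection` structure, the `(left, top, right, bottom)` box format, the choice of tapping the box center, and the dictionary layout of the instruction are hypothetical, and the patent summary does not specify any of them.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass(frozen=True)
class Detection:
    """One recognized filter bar button (hypothetical structure)."""
    label: str    # text read from the button, e.g. "Price"
    intent: str   # predicted interaction type, e.g. "sort"
    box: Tuple[float, float, float, float]  # (left, top, right, bottom) in screenshot pixels


def scale_point(x: float, y: float,
                src_size: Tuple[int, int],
                dst_size: Tuple[int, int]) -> Tuple[int, int]:
    """Linearly map a point from the screenshot resolution to the device resolution."""
    src_w, src_h = src_size
    dst_w, dst_h = dst_size
    return round(x * dst_w / src_w), round(y * dst_h / src_h)


def tap_instruction(det: Detection,
                    src_size: Tuple[int, int],
                    dst_size: Tuple[int, int]) -> Dict[str, Union[str, int]]:
    """Build a tap instruction aimed at the center of the detected button."""
    left, top, right, bottom = det.box
    center_x = (left + right) / 2
    center_y = (top + bottom) / 2
    x, y = scale_point(center_x, center_y, src_size, dst_size)
    return {
        "action": "tap",
        "x": x,
        "y": y,
        "label": det.label,
        "expected_intent": det.intent,  # assumed use: check against the observed behavior
    }


if __name__ == "__main__":
    # Button detected on a 720x1280 screenshot, replayed on a 1080x1920 device.
    price = Detection(label="Price", intent="sort", box=(100, 200, 300, 260))
    print(tap_instruction(price, src_size=(720, 1280), dst_size=(1080, 1920)))
```

Scaling each axis independently keeps the tap on the same relative position even when the target device has a different aspect ratio, although layouts that reflow at other resolutions would need re-detection rather than scaling alone.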
The method achieves fully automated verification of filtering and sorting functions, reducing verification cost and improving test coverage and user experience. It overcomes the limitations of traditional tools, which require manual adaptation to each style and cannot predict interaction intent, and it keeps automated tools running efficiently and reliably across different devices and environments.