Data processing method, device, storage medium and program product

Transforming high-cardinality feature data into descriptive text and integrating it with unstructured data addresses the curse of dimensionality caused by high-cardinality features, enabling more efficient data processing and model training and improving the accuracy and robustness of task processing.

CN122196173A · Pending · Publication Date: 2026-06-12 · CHINA UNIONPAY

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Application (China)
Current Assignee / Owner: CHINA UNIONPAY
Filing Date: 2026-01-23
Publication Date: 2026-06-12

AI Technical Summary

Technical Problem

In data processing, one-hot encoding of high-cardinality feature data inflates the feature space (the curse of dimensionality), increases computational complexity and storage overhead, dilutes the influence of other features, and reduces accuracy when identifying similar data.
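The dimensionality problem described above can be illustrated with a minimal sketch. The field name `merchant_ids` and the record count are hypothetical and chosen only for illustration; the point is that one-hot encoding adds one column per distinct value, so a single high-cardinality field can dominate the feature space.

```python
def one_hot(values):
    """Return (vocabulary, rows) with one indicator column per distinct value."""
    vocab = sorted(set(values))
    index = {v: i for i, v in enumerate(vocab)}
    rows = []
    for v in values:
        row = [0] * len(vocab)
        row[index[v]] = 1
        rows.append(row)
    return vocab, rows

# Hypothetical data: 1,000 records, each with a distinct merchant ID.
merchant_ids = [f"M{i:05d}" for i in range(1000)]
vocab, rows = one_hot(merchant_ids)

width = len(rows[0])            # one column per distinct ID
nonzero_per_row = sum(rows[0])  # only a single column is set in each row
print(width, nonzero_per_row)
```

Under these assumptions every row is almost entirely zeros, and any other feature (an amount, say) becomes one column among a thousand, which is the dilution effect the summary refers to.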

Method used

High-cardinality feature data is transformed into descriptive text and integrated with the unstructured data to form enhanced unstructured data. Feature vectors are then generated from the enhanced data, with a natural language processing model performing feature extraction, so high-dimensional one-hot encoding is avoided.
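A rough sketch of this idea follows. It is not the patent's implementation: the sentence template in `describe`, the width `DIM`, and the sample record are assumptions, and `embed` is a hashed bag-of-words used purely as a toy stand-in for the natural language processing model. It shows that the vector width stays fixed no matter how many distinct values the high-cardinality field can take.

```python
import hashlib

DIM = 64  # assumed fixed vector width, independent of field cardinality

def describe(field, value):
    """Render one high-cardinality value as text (hypothetical template)."""
    return f"The {field} of this record is {value}."

def embed(text, dim=DIM):
    """Hashed bag-of-words: a toy stand-in for an NLP feature extractor."""
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
    return vec

# Hypothetical unstructured note and high-cardinality field.
note = "Customer disputed the charge by phone."
enhanced = note + " " + describe("merchant ID", "M00042")
vector = embed(enhanced)
print(len(vector))
```

In practice the stand-in would be replaced by whatever language model the task uses; the sketch only makes the shape of the data flow concrete.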

Benefits of technology

The method significantly reduces the dimensionality of feature vectors, alleviates the computational and storage burden on models, improves data processing and model training speed, enhances the semantic coherence and expressiveness of features, and improves the accuracy and reliability of task processing results.

✦ Generated by Eureka AI based on patent content.

Abstract

Embodiments of the present application provide a data processing method, device, storage medium and program product. The method comprises: obtaining first data comprising first structured data and first unstructured data; in response to the first structured data comprising at least one item of second data, where the second data is high-cardinality feature data, determining a first description text of the at least one item of second data; integrating the first description text with the first unstructured data to obtain second unstructured data; and determining third data corresponding to the first data based on second structured data and the second unstructured data, where the second structured data is obtained by removing the at least one item of second data from the first structured data, and the third data is used to generate a feature vector of the first data.
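The claimed data flow can be sketched as below, using the abstract's own terms as comments. This is only an illustration under stated assumptions: the names `FirstData` and `build_third_data`, the way high-cardinality fields are passed in as a set, the description template, and the pass-through behavior when no such field is present are all hypothetical and not taken from the patent.

```python
from dataclasses import dataclass

@dataclass
class FirstData:
    structured: dict   # first structured data
    unstructured: str  # first unstructured data

def build_third_data(first, high_cardinality_fields):
    """Return third data: second structured data plus second unstructured data."""
    # Second data: structured fields flagged as high cardinality (assumed given).
    second = {k: v for k, v in first.structured.items()
              if k in high_cardinality_fields}
    if not second:
        # Assumption: with no high-cardinality field, the data passes through.
        return {"structured": dict(first.structured),
                "unstructured": first.unstructured}
    # First description text (hypothetical template).
    description = " ".join(f"The {k} is {v}." for k, v in second.items())
    # Second unstructured data: description integrated with the original text.
    second_unstructured = f"{first.unstructured} {description}"
    # Second structured data: first structured data minus the second data.
    second_structured = {k: v for k, v in first.structured.items()
                         if k not in second}
    return {"structured": second_structured,
            "unstructured": second_unstructured}

record = FirstData({"amount": 120.5, "merchant_id": "M00042"},
                   "Customer disputed the charge.")
third = build_third_data(record, {"merchant_id"})
print(third)
```

The resulting dictionary plays the role of the third data; a downstream step would turn it into the feature vector of the first data.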