Large-scale data quality anomaly detection method based on data features
A data-feature-based technique for large-scale data, applied in unstructured text data retrieval, electronic digital data processing, text database querying, and similar fields. It addresses problems such as poor versatility, low efficiency, and limited scope, and achieves large-scale, automated anomaly detection with improved detection efficiency.
Examples
Embodiment 1
[0019] A large-scale data quality anomaly detection method based on data characteristics, comprising the following steps:
[0020] Step S1: Build a library of data anomaly detection methods: set a corresponding detection method for each data feature, and collect these methods into a data anomaly detection method library.
[0021] The data anomaly detection method library is stored as a dictionary: the tuple composed of the data feature name and its feature parameters is the key, and the anomaly detection method corresponding to that data feature is the value. Python's dictionary type, which holds key-value pairs, is used to store the data features and their anomaly detection methods, in which th...
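The dictionary layout described in [0021] can be sketched as follows. This is a minimal illustration only: the feature names (`numeric_range`, `normal_distribution`), their parameters, and the two detection functions are hypothetical and are not taken from the patent text.

```python
from statistics import mean, stdev


def detect_out_of_range(values, low, high):
    """Hypothetical method: return values falling outside [low, high]."""
    return [v for v in values if not (low <= v <= high)]


def detect_zscore_outliers(values, threshold):
    """Hypothetical method: return values whose z-score exceeds threshold."""
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Key: (data feature name, feature parameters); value: detection method.
# Both entries are assumed examples, not entries defined by the patent.
method_library = {
    ("numeric_range", (0, 120)): detect_out_of_range,
    ("normal_distribution", (3.0,)): detect_zscore_outliers,
}


def run_detection(feature_name, feature_params, values):
    """Look up the method for a data feature and apply it to the values."""
    method = method_library[(feature_name, feature_params)]
    return method(values, *feature_params)
```

Because tuples are hashable, the pair of feature name and parameters can serve directly as a dictionary key, so the same feature name with different parameters maps to a separate entry.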
Embodiment 2
[0031] This embodiment is generally consistent with Embodiment 1, except that the large-scale data feature traversal process differs. In this embodiment, the traversal process scales the value of each dimension of the word vector to be matched to the range 0 to 255, divides 0 to 255 into several levels, and replaces the value of each dimension with the middle value of the level it falls in, generating a new special word vector. Cosine similarity is then calculated on the special word vectors, which reduces the computational intensity under large-scale data volumes. This solution is still based on fuzzy word vectors to reduce the amount of calculation on large-scale data.
[0032] The substantive effects of the above-mentioned embodiments include: transforming the method of anomaly detection from being driven by detection rules to being driven by data characteristics, generating corresponding outlier detection methods based o...


