The invention discloses a Hadoop-based method for verifying the quality of massive data. The method comprises the following steps: formulating a data quality standard; for a DDL statement, writing the metadata of the created table into Hive; for a DQL statement, converting the SQL string into an abstract syntax tree, performing syntax analysis on the abstract syntax tree, checking the newly generated SQL for semantic errors against the data quality standard, and adding extension information; compiling the abstract syntax tree to generate a corresponding logical execution plan, optimizing the logical execution plan, converting the optimized logical execution plan into a physical plan, generating a MapReduce job, submitting the MapReduce job to YARN for execution, and returning the execution result; and storing the returned execution result in HDFS, then performing data visualization and the exporting, tracking, and tracing of abnormal data. The method thereby achieves data quality verification in which abnormal data is displayed and traceable, and which is easy to configure and easy to classify.
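
By way of illustration only, the following minimal sketch shows one way a data quality standard might be expressed as configurable rules and applied to records so that abnormal data can be separated for export and tracing. It runs in plain Python rather than on Hive, MapReduce, or YARN, and the names `Rule`, `verify`, and `STANDARD`, as well as the sample columns `id` and `age`, are hypothetical and do not appear in the disclosure.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]


@dataclass(frozen=True)
class Rule:
    """One entry of a hypothetical data quality standard."""
    name: str                        # label used when tracing a violation
    column: str                      # column the rule applies to
    check: Callable[[object], bool]  # returns True when the value is acceptable


def verify(records: List[Record], rules: List[Rule]) -> Tuple[List[Record], List[dict]]:
    """Split records into clean rows and abnormal rows with trace information."""
    clean: List[Record] = []
    abnormal: List[dict] = []
    for row_id, record in enumerate(records):
        # Collect the names of every rule this record violates.
        violated = [r.name for r in rules if not r.check(record.get(r.column))]
        if violated:
            # Keep the row index and rule names so the record can be traced later.
            abnormal.append({"row": row_id, "record": record, "violated": violated})
        else:
            clean.append(record)
    return clean, abnormal


# Assumed example standard: a non-null identifier and a plausible age range.
STANDARD = [
    Rule("id_not_null", "id", lambda v: v is not None),
    Rule("age_in_range", "age", lambda v: isinstance(v, int) and 0 <= v <= 150),
]

if __name__ == "__main__":
    rows = [{"id": 1, "age": 30}, {"id": None, "age": 30}, {"id": 3, "age": 999}]
    clean_rows, abnormal_rows = verify(rows, STANDARD)
    for entry in abnormal_rows:
        print(entry["row"], entry["violated"])
```

Keeping each rule as a named, self-contained entry is one way to reflect the easy configuration and easy classification described above: rules can be added or removed without changing the verification loop, and each abnormal record carries the names of the rules it violated.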