Data verification processing method, device and storage medium

By combining multi-dimensional similarity calculation with length correction, the method addresses the insufficient semantic awareness of existing approaches to matching field names against tag names. This improves matching accuracy, robustness, and adaptability while keeping the data governance process autonomous and controllable.

CN122242486A · Pending · Publication Date: 2026-06-19 · CHINASOFT INFORMATION SYST ENG CO LTD


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: CHINASOFT INFORMATION SYST ENG CO LTD
Filing Date: 2026-03-19
Publication Date: 2026-06-19

Application Information


IPC: G06F40/226; G06F40/232; G06F40/242; G06F40/247; G06F40/284; G06F40/30; G06F16/3332; G06F16/335; G06F18/22; G06N3/045; G06N5/022; G06N5/04; G06F16/903





Figure CN122242486A_ABST


Abstract

This application provides a data verification processing method, device, and storage medium. The method includes: acquiring database field information and a standard label field of a field to be verified; determining multiple words to be verified based on the database field information, and multiple standard words based on the standard label field; performing multi-dimensional similarity calculation based on the words to be verified, the standard label field, and the standard words to obtain a similarity result between the field to be verified and the standard label field; and correcting the similarity result based on the length of the field to be verified and the length of the standard label field to obtain matching information between the two. The application captures Chinese semantics in depth, effectively handles field abbreviations, synonym variants, and domain terminology, and significantly improves matching accuracy. It is also fully adaptable to the domestic information technology innovation environment.


Description

Technical Field

[0001] This application relates to the field of data processing technology, and more specifically, to a data verification processing method, device, and storage medium.

Background Technology

[0002] As enterprises deepen their digital transformation, data has risen to become a new type of production factor and a core strategic asset. Against this backdrop, building an independent and controllable data governance system to achieve standardized identification, classification, and grading of physical fields in massive heterogeneous databases has become a fundamental project for ensuring data security, unlocking data value, and supporting the flow of data elements. Establishing a precise mapping relationship between business system databases (physical models) and enterprise standard data element / tag libraries (logical models) is a prerequisite and key entry point for upper-level applications such as data asset catalog construction, data standard implementation evaluation, and automatic identification of sensitive information.

[0003] In existing technologies, string matching based on keyword rules and literal similarity calculation based on edit distance are commonly used. Specifically, field names and tag names are treated as strings, and the similarity score is obtained by calculating the minimum number of edit operations and normalizing it. A threshold (such as ≥0.9) is set to trigger automatic association.

[0004] However, these existing technologies severely lack semantic awareness. Because they rely solely on literal matching, they cannot recognize synonym variants, abbreviation mappings, or domain terms; for example, the field name "mobile" and the tag "PII_PHONE" share almost no characters even though they denote the same concept. The result is a high false negative rate and weak generalization.

Summary of the Invention

[0005] The purpose of this application is to provide a data verification processing method, device, and storage medium that address the shortcomings of the prior art, namely its high false negative rate and weak generalization.

[0006] To achieve the above objectives, the embodiments of this application adopt the following technical solutions. In a first aspect, an embodiment of this application provides a data verification processing method, the method comprising: acquiring database field information and a standard label field of a field to be verified, where the database field information includes the name of the table containing the field to be verified, the name of the field, the comment of the field, and the values of the field; determining multiple words to be verified based on the database field information, and multiple standard words based on the standard label field; performing multi-dimensional similarity calculation based on the words to be verified, the standard label field, and the standard words to obtain a similarity result between the field to be verified and the standard label field; and correcting the similarity result based on the length of the field to be verified and the length of the standard label field to obtain matching information between the field to be verified and the standard label field. The matching information indicates a recommended governance method for the field to be verified and the confidence level of that recommendation.

[0007] In one possible implementation, determining the multiple standard words based on the standard label field includes: identifying explicit delimiters in the standard label field to split it into multiple atomic fields; mapping each atomic field against a predefined domain knowledge base to obtain multiple standard atomic fields; and semantically normalizing the standard atomic fields according to preset semantic rules to obtain the multiple standard words.
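The three steps of [0007] (split on explicit delimiters, map through a domain knowledge base, normalize by semantic rules) can be sketched in Python as below. This is a minimal illustration under stated assumptions: the delimiter set, the contents of `DOMAIN_KB` (standing in for the domain knowledge base), and `SYNONYMS` (standing in for the preset semantic rules) are all hypothetical, since the application does not disclose them.

```python
import re
from typing import Dict, List

# Hypothetical explicit delimiters: underscore, hyphen, dot, slash, whitespace.
DELIMITERS = re.compile(r"[_\-./\s]+")

# Hypothetical domain knowledge base: atomic field -> standard atomic field.
DOMAIN_KB: Dict[str, str] = {
    "pii": "personal information",
    "tel": "telephone",
    "no": "number",
}

# Hypothetical semantic rules: fold synonyms onto one canonical word.
SYNONYMS: Dict[str, str] = {
    "telephone": "phone",
    "mobile": "phone",
    "cellphone": "phone",
}


def split_atomic_fields(label: str) -> List[str]:
    """Identify explicit delimiters and split the label into atomic fields."""
    return [part for part in DELIMITERS.split(label) if part]


def map_to_standard(atoms: List[str]) -> List[str]:
    """Map each atomic field through the knowledge base (unknown atoms pass through)."""
    return [DOMAIN_KB.get(a.lower(), a.lower()) for a in atoms]


def normalize(standard_atoms: List[str]) -> List[str]:
    """Apply the semantic rules and drop duplicates, preserving first-seen order."""
    words: List[str] = []
    for atom in standard_atoms:
        for word in atom.split():
            canonical = SYNONYMS.get(word, word)
            if canonical not in words:
                words.append(canonical)
    return words


def standard_words(label: str) -> List[str]:
    return normalize(map_to_standard(split_atomic_fields(label)))


print(standard_words("PII_PHONE"))  # ['personal', 'information', 'phone']
print(standard_words("TEL-NO"))     # ['phone', 'number']
```

Keeping the knowledge base and the semantic rules as separate tables mirrors the two distinct steps in [0007]: the first expands abbreviations, the second collapses synonyms.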

[0008] In one possible implementation, performing the multi-dimensional similarity calculation based on the words to be verified, the standard label field, and the standard words to obtain the similarity result includes: calculating a global semantic similarity based on the words to be verified and the standard words; calculating a word overlap similarity based on the words to be verified and the standard words; calculating a glyph similarity based on the words to be verified and the standard label field; calculating a semantic association similarity based on the words to be verified and the standard words; and obtaining the similarity result between the field to be verified and the standard label field by weighted fusion of the global semantic, word overlap, glyph, and semantic association similarity results, each of which is computed between the field to be verified and the standard label field.
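[0008] names four similarity dimensions but does not specify the overlap metric or the fusion weights. The sketch below assumes Jaccard overlap for the word overlap similarity and a normalized weighted average for the fusion step. The weights in `DEFAULT_WEIGHTS` are made up for illustration, and the global semantic, glyph, and semantic association scores are treated as inputs.

```python
from typing import Dict, Iterable

# Hypothetical weights; the application does not disclose its fusion weights.
DEFAULT_WEIGHTS: Dict[str, float] = {
    "global_semantic": 0.4,
    "word_overlap": 0.2,
    "glyph": 0.1,
    "semantic_association": 0.3,
}


def word_overlap_similarity(words_to_verify: Iterable[str], standard_words: Iterable[str]) -> float:
    """Jaccard overlap of the two word sets (an assumed choice of overlap metric)."""
    a, b = set(words_to_verify), set(standard_words)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def weighted_fusion(scores: Dict[str, float], weights: Dict[str, float] = DEFAULT_WEIGHTS) -> float:
    """Weighted average of the per-dimension similarity results."""
    missing = set(weights) - set(scores)
    if missing:
        raise KeyError(f"missing similarity dimensions: {sorted(missing)}")
    total = sum(weights.values())
    return sum(weights[k] * scores[k] for k in weights) / total


overlap = word_overlap_similarity(["mobile", "phone", "contact"], ["personal", "information", "phone"])
fused = weighted_fusion({
    "global_semantic": 0.8,
    "word_overlap": overlap,
    "glyph": 0.2,
    "semantic_association": 0.9,
})
print(round(fused, 3))
```

Dividing by the weight total keeps the fused score in [0, 1] even if the configured weights do not sum exactly to 1.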

[0009] In one possible implementation, calculating the global semantic similarity based on the words to be verified and the standard words includes: generating a first vector set from the words to be verified and a second vector set from the standard words; dividing the first vector set into blocks to obtain multiple vector blocks, each containing multiple single-precision floating-point numbers; invoking a first target instruction of a preset instruction set to load each vector block of the first vector into a preset first register, and the vector block at the corresponding position of the second vector into a preset second register; invoking a second target instruction to perform parallel multiply-accumulate operations on the corresponding floating-point numbers in the first and second registers, obtaining a similarity result for each vector block of the first vector; and obtaining the global semantic similarity result between the field to be verified and the standard label field from the similarity results of all vector blocks.
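The block-wise computation in [0009] can be emulated in plain Python to show the data flow: one "register load" per block, one multiply-accumulate per block, and then aggregation of the partial results. The application does not name the instruction set. On x86, the load and multiply-accumulate might correspond to AVX/FMA intrinsics such as `_mm256_loadu_ps` and `_mm256_fmadd_ps`; on ARM (for example Phytium), they might correspond to NEON `vld1q_f32` and `vfmaq_f32`. These are assumptions, and the lane count differs between them. The sketch also assumes mean pooling to turn each vector set into one vector and cosine similarity as the global score. Python floats are double precision, so single-precision rounding is not reproduced.

```python
from math import sqrt
from typing import List, Sequence

# Assumed lane count: eight 32-bit floats fill one 256-bit SIMD register.
LANES = 8


def mean_pool(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Collapse a vector set (one embedding per word) into a single vector (assumed pooling)."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def blockwise_dot(a: Sequence[float], b: Sequence[float], lanes: int = LANES) -> List[float]:
    """Per-block partial dot products, mirroring one load + multiply-accumulate per block."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    partials = []
    for start in range(0, len(a), lanes):
        reg1 = a[start:start + lanes]  # block of the first vector ("first register")
        reg2 = b[start:start + lanes]  # block at the same position of the second vector
        partials.append(sum(x * y for x, y in zip(reg1, reg2)))
    return partials


def global_semantic_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity assembled from the per-block results."""
    ab = sum(blockwise_dot(a, b))
    aa = sum(blockwise_dot(a, a))
    bb = sum(blockwise_dot(b, b))
    if aa == 0.0 or bb == 0.0:
        return 0.0
    return ab / sqrt(aa * bb)


print(blockwise_dot([float(i) for i in range(10)], [1.0] * 10))  # [28.0, 17.0]
```

The last block is shorter than `LANES` here. A real SIMD kernel would pad it or handle the tail with scalar code, which the slicing above does implicitly.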

[0010] In one possible implementation, calculating the glyph similarity based on the words to be verified and the standard label field includes: filtering the words to be verified to obtain at least one target word to be verified; and calculating the glyph similarity between the target words to be verified and the standard label field to obtain the glyph similarity result.

[0011] In one possible implementation, calculating the glyph similarity between the target words to be verified and the standard label field includes: calculating, at the string level, the minimum number of edit operations required between each target word to be verified and the standard label field; and determining the glyph similarity result from these minimum edit counts.
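A sketch of [0010] and [0011] follows: filter the words to be verified, compute the Levenshtein (minimum edit) distance from each remaining word to the raw standard label field, and normalize it to a similarity. The filtering rule (`MIN_LEN`, dropping pure numbers), the case folding, and taking the maximum over target words are assumptions. The application says only that the words are filtered and that the result is derived from the minimum edit counts.

```python
from typing import Iterable, List

# Hypothetical filtering rule: drop very short tokens and pure numbers,
# which tend to produce spurious edit-distance matches.
MIN_LEN = 2


def filter_targets(words: Iterable[str]) -> List[str]:
    return [w for w in words if len(w) >= MIN_LEN and not w.isdigit()]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute, or match at no cost
        prev = cur
    return prev[-1]


def glyph_similarity(words_to_verify: Iterable[str], label: str) -> float:
    """Best normalized edit similarity between any target word and the raw label."""
    label = label.lower()
    best = 0.0
    for word in filter_targets(words_to_verify):
        word = word.lower()
        dist = edit_distance(word, label)
        best = max(best, 1.0 - dist / max(len(word), len(label)))
    return best


print(glyph_similarity(["user", "mobile", "contact", "number"], "PII_PHONE"))
```

For the "mobile" / "PII_PHONE" pair from [0043] and [0044], the score stays far below a literal-match threshold such as 0.9. This is why glyph similarity serves as only one of the four dimensions rather than the sole criterion.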

[0012] In one possible implementation, correcting the similarity result based on the length of the field to be verified and the length of the standard label field includes: calculating the ratio of the length of the field to be verified to the length of the standard label field; and correcting the similarity result based on this ratio to obtain the matching information between the field to be verified and the standard label field.

[0013] In one possible implementation, correcting the similarity result based on the ratio includes: determining an initial adjustment factor from the ratio; adjusting the initial adjustment factor according to the word overlap similarity result between the field to be verified and the standard label field to obtain a target adjustment factor; correcting the similarity result with the target adjustment factor to obtain a target similarity result between the field to be verified and the standard label field; and determining the matching information between the field to be verified and the standard label field from the target similarity result and a preset similarity threshold.
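[0012] and [0013] describe a length-ratio correction but give no formulas. The sketch below shows one way the four steps could fit together: ratio, then initial factor, then overlap-adjusted target factor, then thresholded matching information. Every constant, the penalty rule, and the governance-method labels (`auto_associate`, `manual_review`, `no_match`) are hypothetical.

```python
from dataclasses import dataclass

# All constants below are hypothetical; the application does not disclose them.
RATIO_LOW, RATIO_HIGH = 0.5, 2.0   # length ratios treated as "comparable"
LENGTH_PENALTY = 0.8               # initial factor when the lengths differ a lot
STRONG_OVERLAP = 0.5               # overlap suggesting an abbreviation, not a mismatch
AUTO_THRESHOLD = 0.85              # auto-associate at or above this score
REVIEW_THRESHOLD = 0.6             # send to manual review at or above this score


@dataclass
class MatchInfo:
    governance_method: str  # hypothetical labels: auto_associate / manual_review / no_match
    confidence: float       # the corrected (target) similarity result


def initial_factor(ratio: float) -> float:
    return 1.0 if RATIO_LOW <= ratio <= RATIO_HIGH else LENGTH_PENALTY


def target_factor(factor: float, word_overlap: float) -> float:
    # High word overlap suggests the length gap is an abbreviation, so lift the penalty.
    return 1.0 if word_overlap >= STRONG_OVERLAP else factor


def correct(similarity: float, field_name: str, label: str, word_overlap: float) -> MatchInfo:
    if not label:
        raise ValueError("standard label field must be non-empty")
    ratio = len(field_name) / len(label)
    score = similarity * target_factor(initial_factor(ratio), word_overlap)
    if score >= AUTO_THRESHOLD:
        method = "auto_associate"
    elif score >= REVIEW_THRESHOLD:
        method = "manual_review"
    else:
        method = "no_match"
    return MatchInfo(method, score)


print(correct(0.9, "m", "PII_PHONE", word_overlap=0.0))
```

Letting the word overlap result override the length penalty reflects the intent in [0013]: a short field such as an abbreviation should not be penalized when its words clearly match the label.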

[0014] In a second aspect, another embodiment of this application provides a data verification processing apparatus, the apparatus comprising: an acquisition module configured to acquire database field information and a standard label field of a field to be verified, where the database field information includes the name of the table containing the field to be verified, the name of the field, the comment of the field, and the values of the field; a determination module configured to determine multiple words to be verified based on the database field information, and multiple standard words based on the standard label field; a calculation module configured to perform multi-dimensional similarity calculation based on the words to be verified, the standard label field, and the standard words to obtain a similarity result between the field to be verified and the standard label field; and a correction module configured to correct the similarity result based on the length of the field to be verified and the length of the standard label field to obtain matching information between the field to be verified and the standard label field. The matching information indicates a recommended governance method for the field to be verified and the confidence level of that recommendation.

[0015] In one possible implementation, the determination module is specifically configured to: identify explicit delimiters in the standard label field to split it into multiple atomic fields; map each atomic field against a predefined domain knowledge base to obtain multiple standard atomic fields; and semantically normalize the standard atomic fields according to preset semantic rules to obtain the multiple standard words.

[0016] In one possible implementation, the calculation module is specifically configured to: calculate a global semantic similarity based on the words to be verified and the standard words; calculate a word overlap similarity based on the words to be verified and the standard words; calculate a glyph similarity based on the words to be verified and the standard label field; calculate a semantic association similarity based on the words to be verified and the standard words; and obtain the similarity result between the field to be verified and the standard label field by weighted fusion of the global semantic, word overlap, glyph, and semantic association similarity results, each of which is computed between the field to be verified and the standard label field.

[0017] In one possible implementation, the calculation module is specifically configured to: generate a first vector set from the words to be verified and a second vector set from the standard words; divide the first vector set into blocks to obtain multiple vector blocks, each containing multiple single-precision floating-point numbers; invoke a first target instruction of a preset instruction set to load each vector block of the first vector into a preset first register, and the vector block at the corresponding position of the second vector into a preset second register; invoke a second target instruction to perform parallel multiply-accumulate operations on the corresponding floating-point numbers in the first and second registers, obtaining a similarity result for each vector block of the first vector; and obtain the global semantic similarity result between the field to be verified and the standard label field from the similarity results of all vector blocks.

[0018] In one possible implementation, the calculation module is specifically configured to: filter the words to be verified to obtain at least one target word to be verified; and calculate the glyph similarity between the target words to be verified and the standard label field to obtain the glyph similarity result.

[0019] In one possible implementation, the calculation module is specifically configured to: calculate, at the string level, the minimum number of edit operations required between each target word to be verified and the standard label field; and determine the glyph similarity result from these minimum edit counts.

[0020] In one possible implementation, the correction module is specifically configured to: calculate the ratio of the length of the field to be verified to the length of the standard label field; and correct the similarity result based on this ratio to obtain the matching information between the field to be verified and the standard label field.

[0021] In one possible implementation, the correction module is specifically configured to: determine an initial adjustment factor from the ratio; adjust the initial adjustment factor according to the word overlap similarity result between the field to be verified and the standard label field to obtain a target adjustment factor; correct the similarity result with the target adjustment factor to obtain a target similarity result between the field to be verified and the standard label field; and determine the matching information between the field to be verified and the standard label field from the target similarity result and a preset similarity threshold.

[0022] In a third aspect, another embodiment of this application provides an electronic device comprising a processor, a storage medium, and a bus. The storage medium stores machine-readable instructions executable by the processor. When the electronic device runs, the processor communicates with the storage medium via the bus and executes the machine-readable instructions to perform the steps of any method described in the first aspect.

[0023] In a fourth aspect, another embodiment of this application provides a computer-readable storage medium storing a computer program that, when executed by a processor, performs the steps of any method described in the first aspect.

[0024] The beneficial effects of this application are as follows. The database field information and the standard label field of the field to be verified are acquired. Multiple words to be verified are determined from the database field information, and multiple standard words from the standard label field. A multi-dimensional similarity calculation over the words to be verified, the standard label field, and the standard words yields a similarity result between the field to be verified and the standard label field. That result is then corrected using the lengths of the field to be verified and the standard label field to produce the matching information between them. Because matching no longer depends on single keyword matching or simple edit distance calculation, the approach captures Chinese semantics more deeply and effectively handles field abbreviations, synonym variants, and domain terminology. This significantly improves matching accuracy and makes the algorithm more robust and adaptable. The approach is also fully compatible with the domestic information technology innovation environment, ensuring data sovereignty and independent controllability in the data governance process.

[0025] Furthermore, deriving multiple standard words from the standard label field allows each standard word to take part in the multi-dimensional similarity calculation. The original label string is also transformed into a normalized data element form, which upgrades matching from string similarity to semantic equivalence and further improves matching accuracy.

Attached Figure Description

[0026] To more clearly illustrate the technical solutions of the embodiments of this application, the accompanying drawings used in the embodiments will be briefly introduced below. It should be understood that the following drawings only show some embodiments of this application and should not be regarded as a limitation of the scope. For those skilled in the art, other related drawings can be obtained based on these drawings without creative effort.

[0027] Figure 1 is a flowchart of a data verification processing method provided in an embodiment of this application;
Figure 2 is a flowchart of determining multiple standard words in the data verification processing method provided in an embodiment of this application;
Figure 3 is a flowchart of obtaining the similarity result between the field to be verified and the standard label field in the data verification processing method provided in an embodiment of this application;
Figure 4 is a flowchart of obtaining the similarity result between the field to be verified and the standard label field in the data verification processing method provided in an embodiment of this application;
Figure 5 is a flowchart of obtaining the glyph similarity result between the field to be verified and the standard label field in the data verification processing method provided in an embodiment of this application;
Figure 6 is another flowchart of obtaining the glyph similarity result between the field to be verified and the standard label field in the data verification processing method provided in an embodiment of this application;
Figure 7 is a flowchart of obtaining the matching information between the field to be verified and the standard label field in the data verification processing method provided in an embodiment of this application;
Figure 8 is a flowchart of obtaining the matching information between the field to be verified and the standard label field in the data verification processing method provided in an embodiment of this application;
Figure 9 is a schematic diagram of a data verification processing device provided in an embodiment of this application;
Figure 10 is a schematic structural diagram of an electronic device provided in an embodiment of this application.
Detailed Implementation

[0028] To make the objectives, technical solutions, and advantages of the embodiments of this application clearer, the technical solutions of the embodiments are described clearly and completely below with reference to the accompanying drawings. The accompanying drawings in this application are for illustration and description only and are not intended to limit its scope of protection. The schematic drawings are not drawn to scale. The flowcharts in this application illustrate operations implemented according to some embodiments of this application. The operations in the flowcharts need not be performed in the order shown, and steps without logical dependencies may be performed in reverse order or simultaneously. In addition, guided by the content of this application, those skilled in the art may add one or more other operations to the flowcharts or remove one or more operations from them.

[0029] Furthermore, the described embodiments are merely some, not all, of the embodiments of this application. The components of the embodiments of this application described and illustrated herein can typically be arranged and designed in various different configurations. Therefore, the following detailed description of the embodiments of this application provided in the accompanying drawings is not intended to limit the scope of the claimed application, but merely to illustrate selected embodiments of the application. All other embodiments obtained by those skilled in the art based on the embodiments of this application without inventive effort are within the scope of protection of this application.

[0030] It should be noted that the term "comprising" will be used in the embodiments of this application to indicate the presence of the features declared thereafter, but does not exclude the addition of other features.

[0031] In existing technologies, methods such as manual matching, string matching based on keyword rules, and literal similarity calculation based on edit distance are commonly used. Specifically, in string matching, field names and tag names are treated as strings, and the similarity score is obtained by calculating the minimum number of edit operations and normalizing it. A threshold (such as ≥0.9) is set to trigger automatic association.

[0032] However, these existing technologies severely lack semantic awareness. Relying solely on literal matching, they fail to recognize synonym variants, abbreviation mappings, and domain-specific terms, resulting in a high false negative rate and weak generalization. Furthermore, such algorithms must also meet strict requirements for autonomy, controllability, and adaptability to the deployment environment.

[0033] To address these problems, this application proposes a data verification processing method. The method first obtains the database field information and the standard label field of the field to be verified. It then determines multiple words to be verified from the database field information and multiple standard words from the standard label field. Next, it performs multi-dimensional similarity calculation on the words to be verified, the standard label field, and the standard words to obtain a similarity result between the field to be verified and the standard label field. Finally, it corrects that result based on the lengths of the field to be verified and the standard label field to obtain the matching information between them. Rather than relying on single keyword matching or simple edit distance calculation, the method captures Chinese semantics more deeply and effectively handles field abbreviations, synonym variants, and domain terminology. It significantly improves matching accuracy, makes the algorithm more robust and adaptable, and ensures data sovereignty and autonomous control in the data governance process. It is fully compatible with the domestic information technology innovation environment.

[0034] It will be understood that, during data governance, the data verification processing method provided in this embodiment can be applied to the step of establishing the mapping between the business system database (physical model) and the enterprise standard data element or tag library (logical model). Executing the method enables automatic classification, grading, and standardized management of data.

[0035] Specifically, the matching information obtained by executing the data verification processing method can be applied to the automatic construction of a data asset catalog. In this scenario, the matching information is used to associate physical table fields with standard logical models in batches and to generate data lineage relationships automatically, significantly reducing manual sorting costs. It can also be applied to data standard compliance assessment. In this scenario, the degree to which the business system database conforms to enterprise data standards is analyzed automatically from the matching information, non-compliant or non-standard fields are identified, and a data standardization assessment report is produced. In addition, it can be applied to automatic identification and classification of sensitive data (PKS security linkage). In this scenario, matched tags (such as "personal privacy" and "trade secrets") are mapped automatically to data security levels, and the matching information is passed directly to the access control module of the PKS security system to drive the underlying file encryption, dynamic desensitization, and permission control policies. This realizes an automated closed loop from data identification to security protection and keeps sensitive data secure throughout its lifecycle in a domestic environment.
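The security-linkage scenario in [0035] can be pictured as a lookup from a matched tag to a security level and a protection action. The sketch below applies a simple head-and-tail masking rule as a stand-in for dynamic desensitization. The policy table, level numbers, action names, and masking rule are all hypothetical. File encryption, permission control, and the PKS access control interface are not modeled.

```python
from typing import Dict, Tuple

# Hypothetical policy table: matched tag -> (security level, protection action).
TAG_POLICY: Dict[str, Tuple[int, str]] = {
    "personal privacy": (3, "dynamic_desensitization"),
    "trade secrets": (4, "file_encryption"),
}
DEFAULT_POLICY: Tuple[int, str] = (1, "none")


def mask_middle(value: str, keep_head: int = 3, keep_tail: int = 4) -> str:
    """Illustrative desensitization: keep the head and tail, replace the rest with '*'."""
    if len(value) <= keep_head + keep_tail:
        return "*" * len(value)
    return value[:keep_head] + "*" * (len(value) - keep_head - keep_tail) + value[-keep_tail:]


def apply_policy(tag: str, value: str) -> Tuple[int, str, str]:
    """Return (security level, action, value); only desensitization is applied here."""
    level, action = TAG_POLICY.get(tag, DEFAULT_POLICY)
    if action == "dynamic_desensitization":
        return level, action, mask_middle(value)
    # Other actions are handed to the security system unchanged; they are not modeled.
    return level, action, value


print(apply_policy("personal privacy", "13812345678"))  # (3, 'dynamic_desensitization', '138****5678')
```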

[0036] For example, the data verification processing method provided in this embodiment is applicable to domestic IT innovation platforms such as PKS (Phytium CPU + Kylin OS), Loongson, Shenwei, and Hygon, as well as to general-purpose x86 / Windows / Linux platforms.

[0037] The data verification processing method provided in this application will be described in detail below with reference to several embodiments.

[0038] Figure 1 is a schematic flowchart of a data verification processing method provided in an embodiment of this application. As shown in Figure 1, the method may be executed by any electronic device with processing capability, and includes the following steps.

S101. Obtain the database field information of the field to be verified and the standard label field.

[0039] Optionally, the database field information of the field to be verified and the standard label field can be obtained.

[0040] The database field information includes: the table name where the field to be verified is located, the name of the field to be verified, the comment of the field to be verified, and the value of the field to be verified.

[0041] Specifically, the database field information represents the complete observable information of a specific field in the database within its context. The name of the table containing the field to be verified provides the business domain context. The comment of the field to be verified indicates business rules, quality hints, usage constraints, and the like. The values of the field to be verified are all of the data values stored in that field.

[0042] The standard label field is the target reference to which the field to be verified is aligned. It provides a unique, unambiguous, and manageable expression of the business semantics of the field to be verified. Specifically, the standard label field can be obtained from standard data elements or master data labels in the enterprise's logical model layer.

[0043] For example, taking "mobile" as the field to be verified, the following can be obtained: the name of the table where the field is located (e.g., "user_info"), the comment of that table (e.g., "user information table"), the field name (e.g., "mobile"), the comment of the field (e.g., "contact number"), and the contents of all data cells under the field (i.e., every record is read, such as "138xxxxxxx").

[0044] For example, the standard label field can be "PII_PHONE".
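The inputs described in [0040] to [0044] can be pictured as two small records. The sketch below is a minimal illustration, not the application's data model: the class names `DatabaseFieldInfo` and `StandardLabel` are hypothetical, and the extra `table_comment` attribute is included only because the example in [0043] uses a table comment.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DatabaseFieldInfo:
    """Observable information for one field to be verified (hypothetical container)."""
    table_name: str      # business-domain context, e.g. "user_info"
    table_comment: str   # e.g. "user information table" (used in the [0043] example)
    field_name: str      # e.g. "mobile"
    field_comment: str   # business rules / usage hints, e.g. "contact number"
    values: List[str] = field(default_factory=list)  # data cells under the field


@dataclass
class StandardLabel:
    """Target reference field from the logical model layer (hypothetical container)."""
    name: str            # e.g. "PII_PHONE"


# The example from [0043]-[0044]; the value is the masked placeholder from the text.
mobile = DatabaseFieldInfo(
    table_name="user_info",
    table_comment="user information table",
    field_name="mobile",
    field_comment="contact number",
    values=["138xxxxxxx"],
)
label = StandardLabel(name="PII_PHONE")
```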

[0045] S102. Based on the database field information, determine multiple words to be verified, and based on the standard label field, determine multiple standard words.

[0046] Optionally, after obtaining the database field information and the standard label field, multiple words to be verified and multiple standard words can be determined in parallel.

[0047] In one example, after obtaining the database field information, the database field information can be cleaned, special symbols and stop words can be removed, and a preset word segmentation tool can be called to convert it into multiple words to be verified.

[0048] In another example, after obtaining the database field information, the database field information can be cleaned, special symbols and stop words can be removed, and it can be converted into multiple words to be verified using a pre-trained semantic word segmentation model.

[0049] In another example, after obtaining the standard label field, the standard label field can be cleaned, special symbols and stop words can be removed, and a preset word segmentation tool can be called to convert it into multiple standard words.

[0050] In another example, after the standard label field is obtained, it can be cleaned to remove special symbols and stop words, and then converted into multiple standard words using a pre-trained semantic word segmentation model. The semantic word segmentation model can be trained based on the BERT model.
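The cleaning and segmentation steps in [0047] to [0050] can be sketched as follows. The text does not name the preset word segmentation tool or the stop-word list, so this sketch substitutes a regular-expression splitter (punctuation, snake_case, camelCase) and a hypothetical `STOP_WORDS` set. Chinese comments would need a real Chinese word segmenter or the BERT-based model mentioned above, neither of which is shown here.

```python
import re
from typing import Iterable, List

# Hypothetical stop-word list; a real deployment would load a curated one.
STOP_WORDS = {"the", "of", "a", "an", "and"}


def clean(text: str) -> str:
    """Split camelCase boundaries, replace special symbols with spaces, lowercase."""
    text = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)
    # Keep ASCII letters/digits and CJK ideographs; everything else becomes a space.
    text = re.sub(r"[^0-9A-Za-z\u4e00-\u9fff]+", " ", text)
    return text.lower()


def segment(texts: Iterable[str]) -> List[str]:
    """Clean each text, split on whitespace, and drop stop words."""
    words: List[str] = []
    for text in texts:
        for token in clean(text).split():
            if token not in STOP_WORDS:
                words.append(token)
    return words
```

For instance, `segment(["user_info", "mobile", "contact number"])` yields the words to be verified, while `segment(["PII_PHONE"])` yields `["pii", "phone"]` as raw standard words.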

[0051] S103. Perform multi-dimensional similarity calculations based on each word to be verified, the standard label field, and each standard word to obtain the similarity results between the field to be verified and the standard label field.

[0052] Optionally, after obtaining each word to be verified and each standard word, multi-dimensional similarity calculations can be performed in parallel based on each word to be verified, the standard label field, and each standard word to obtain the similarity results under each dimension, and then fused to obtain the similarity results between the field to be verified and the standard label field.

[0053] Multi-dimensional similarity calculation can include: global semantic dimension, word overlap dimension, glyph dimension, and semantic association dimension.

[0054] The global semantic dimension captures the deep semantics of the field to be verified and the standard label field; the word overlap dimension quickly screens out highly relevant candidates and improves robustness; the glyph dimension captures dirty data common in the IT innovation environment, such as spelling variations, mixed uppercase and lowercase, and missing underscores; and the semantic association dimension strengthens domain credibility.

[0055] Optionally, multi-dimensional similarity calculation may also include: business context association dimension, domestic terminology mapping dimension, and quality perception dimension.

[0056] For example, under the business context association dimension, a business context vector can be constructed based on the statistical distribution characteristics of the table name, comments, and field values of the field to be verified. The business context association result can be obtained based on the cosine similarity between the business context vector and the preset standard label business domain vector. The business context association result is then used for fusion to obtain the similarity result between the field to be verified and the standard label field.

[0057] For example, under the domestic terminology mapping dimension, a dedicated dictionary can be constructed for terminology unique to the IT innovation environment, and a domestic terminology coverage rate can be generated for each standard atomic field. The coverage rate is then used in the fusion to obtain the similarity result between the field to be verified and the standard label field.

[0058] For example, under the quality-aware dimension, the quality features of the field value samples to be verified can be calculated. If the quality of the field value is less than a preset quality threshold (e.g., 50% of the "mobile" field is empty), the global semantic similarity is automatically downweighted while the glyph and / or overlap similarity is upweighted. The quality features may include one or more of the following: empty value rate, repetition rate, length dispersion, and regular expression matching rate.
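The quality-aware dimension in [0058] can be sketched as computing the four listed quality features and shifting weight away from the global semantic dimension when quality is low. Several choices here are assumptions, not the application's: the quality score is taken as 1 minus the empty value rate, the threshold is 0.6 (so that the "50% empty" example triggers it), half of the global weight is moved, and it is split evenly between the glyph and word overlap dimensions. The base weights are the example values from [0090]; the mobile-number regex is hypothetical.

```python
import re
import statistics
from typing import Dict, List

# Example weights from [0090].
BASE_WEIGHTS = {"global": 0.35, "overlap": 0.30, "glyph": 0.10, "assoc": 0.25}


def quality_features(values: List[str], pattern: str) -> Dict[str, float]:
    """Empty value rate, repetition rate, length dispersion, regex match rate."""
    n = len(values)
    non_empty = [v for v in values if v and v.strip()]
    lengths = [len(v) for v in non_empty]
    return {
        "empty_rate": 1 - len(non_empty) / n if n else 1.0,
        "repetition_rate": 1 - len(set(non_empty)) / len(non_empty) if non_empty else 0.0,
        "length_dispersion": statistics.pstdev(lengths) if lengths else 0.0,
        "regex_match_rate": (
            sum(bool(re.fullmatch(pattern, v)) for v in non_empty) / len(non_empty)
            if non_empty else 0.0
        ),
    }


def adjust_weights(features: Dict[str, float], quality_threshold: float = 0.6,
                   shift: float = 0.5) -> Dict[str, float]:
    """Downweight global semantics and upweight glyph/overlap when quality is low."""
    weights = dict(BASE_WEIGHTS)
    quality = 1.0 - features["empty_rate"]  # hypothetical quality score
    if quality < quality_threshold:
        moved = weights["global"] * shift
        weights["global"] -= moved
        weights["glyph"] += moved / 2
        weights["overlap"] += moved / 2
    return weights  # weights still sum to 1.0
```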

[0059] S104. Based on the length of the field to be verified and the length of the standard label field, the similarity result is corrected to obtain the matching information between the field to be verified and the standard label field.

[0060] Optionally, after obtaining the similarity results, a correction factor can be generated based on the length of the field to be verified and the length of the standard label field, and the similarity results can be corrected using the correction factor to obtain the matching information between the field to be verified and the standard label field.

[0061] For example, a correction factor can be generated based on the difference between the length of the field to be verified and the length of the standard label field.

[0062] For example, after the corrected similarity result is obtained, it can be matched against a preset confidence interval to obtain the recommended governance method for the field to be verified and the confidence level of that method.

[0063] The matching information indicates the recommended governance method for the field to be verified and the confidence level of the recommended governance method. Specifically, the recommended governance methods for the field to be verified include automatic annotation and manual review.

[0064] In this embodiment, the database field information and the standard label field of the field to be verified are obtained; multiple words to be verified are determined based on the database field information, and multiple standard words are determined based on the standard label field; multi-dimensional similarity calculations are performed on each word to be verified, the standard label field, and each standard word to obtain the similarity result between the field to be verified and the standard label field; and the similarity result is corrected based on the lengths of the field to be verified and the standard label field to obtain the matching information between them. This approach understands Chinese semantics in depth rather than relying on single-keyword matching or simple edit-distance calculation. It effectively addresses the challenges of matching field abbreviations, synonym variations, and domain terminology, significantly improving matching accuracy and enhancing the algorithm's robustness and adaptability. Furthermore, it is fully compatible with domestic IT innovation environments, ensuring data sovereignty and independent controllability during the data governance process.

[0065] Furthermore, because multiple standard words are identified from the standard label field, the multi-dimensional similarity calculation can be performed on each standard word. The original label string is thereby transformed into a standardized data-element form, which upgrades the matching process from string similarity to semantic equivalence and further improves matching accuracy.

[0066] In one possible implementation, Figure 2 is a flowchart of determining multiple standard words in the data verification processing method provided in an embodiment of this application. As shown in Figure 2, determining multiple standard words based on the standard label field in S102 above includes: S201. Perform explicit delimiter identification on the standard label field to obtain multiple atomic fields.

[0067] Optionally, the standard label field can be explicitly delimited using a preset delimiter to obtain multiple atomic fields.

[0068] The preset delimiters can include "_", "-", "·", camelCase boundaries, etc.

[0069] For example, taking the standard label field as "PII_PHONE", multiple atomic fields can include "PII", "PHONE", and "MOBILE".

[0070] By explicitly identifying delimiters in standard label fields, multiple atomic fields are obtained, which can improve the business interpretability of atomic fields and provide semantically granular input units for subsequent knowledge mapping.
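Explicit delimiter identification (S201) can be sketched with regular expressions covering the delimiters listed in [0068]. Treating camelCase as a case boundary, also splitting an acronym from a following capitalized word, and uppercasing the resulting atomic fields are assumptions made for this illustration; the function name `split_atomic` is hypothetical.

```python
import re
from typing import List

# Explicit delimiters from [0068]: "_", "-", "·" (plus whitespace).
_DELIMS = r"[_\-·\s]+"


def split_atomic(label: str) -> List[str]:
    """Split a standard label field into uppercase atomic fields."""
    # camelCase: break at lower/digit -> upper transitions ("userMobile" -> "user Mobile").
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", label)
    # Acronym followed by a capitalised word ("PIIPhone" -> "PII Phone").
    spaced = re.sub(r"([A-Z]+)([A-Z][a-z])", r"\1 \2", spaced)
    return [part.upper() for part in re.split(_DELIMS, spaced) if part]
```

Note that splitting alone yields only the tokens literally present in the label; related terms such as "MOBILE" in the [0069] example would have to come from a later mapping step.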

[0071] S202. Based on the preset domain knowledge base, map each atomic field to obtain multiple standard atomic fields.

[0072] Optionally, each atomic field can be searched in a preset domain knowledge base, and mapped to a standard atomic field based on the search results. The domain knowledge base can be a knowledge base for fields such as finance, healthcare, and government affairs.

[0073] By mapping each atomic field through a pre-defined domain knowledge base, multiple standard atomic fields are obtained, which can solve the problems of abbreviation ambiguity and synonymy, and achieve cross-system terminology unification.
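Mapping atomic fields through a domain knowledge base (S202) amounts, in its simplest form, to a lookup from abbreviations and synonyms to one canonical atomic field. The `DOMAIN_KB` contents below are a hypothetical excerpt, and keeping unknown atoms unchanged is an assumption; a real knowledge base for finance, healthcare, or government affairs would be far larger and may carry richer metadata.

```python
from typing import Dict, List

# Hypothetical excerpt of a domain knowledge base: variant -> standard atomic field.
DOMAIN_KB: Dict[str, str] = {
    "PII": "PERSONAL_INFO",
    "PHONE": "PHONE",
    "MOBILE": "PHONE",
    "TEL": "PHONE",
    "CELL": "PHONE",
}


def map_atomic(atoms: List[str], kb: Dict[str, str] = DOMAIN_KB) -> List[str]:
    """Map each atomic field to its standard form, de-duplicating synonyms."""
    mapped: List[str] = []
    for atom in atoms:
        std = kb.get(atom.upper(), atom.upper())  # unknown atoms kept as-is (assumption)
        if std not in mapped:                      # synonyms collapse to one entry
            mapped.append(std)
    return mapped
```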

[0074] S203. Based on the preset semantic rules, perform semantic normalization on each standard atomic field to obtain multiple standard words.

[0075] Optionally, the standard atomic fields can be semantically normalized by applying preset semantic rules to obtain unambiguous and computable standard vocabulary.

[0076] The preset semantic rules may include: removing articles and prepositions, nominalizing verbs, unifying singular and plural forms, standardizing tenses, and unifying units of measurement.

[0077] By using preset semantic rules, semantic normalization is performed on each standard atomic field to obtain multiple standard words, which can eliminate the interference of grammatical variations and ensure that words that are semantically equivalent but have different expressions can be correctly matched in subsequent similarity calculations.
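A few of the semantic rules in [0076] can be sketched as small lookup tables applied word by word. All tables here (`ARTICLES_PREPOSITIONS`, `VERB_TO_NOUN`, `UNIT_ALIASES`) are hypothetical, tense standardization is omitted, and the singularization is deliberately naive (it would, for example, wrongly shorten "status"); a production rule set would need proper lemmatization and exception lists.

```python
from typing import List

# Hypothetical rule tables standing in for the preset semantic rules.
ARTICLES_PREPOSITIONS = {"a", "an", "the", "of", "for", "in", "on", "to"}
VERB_TO_NOUN = {"verify": "verification", "register": "registration", "pay": "payment"}
UNIT_ALIASES = {"kg": "kilogram", "kilograms": "kilogram", "cm": "centimeter"}


def singularize(word: str) -> str:
    """Naive English singularisation (no exception list)."""
    if word.endswith("ies") and len(word) > 3:
        return word[:-3] + "y"
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


def normalize(words: List[str]) -> List[str]:
    """Apply stop-word removal, unit unification, nominalisation, singularisation."""
    out: List[str] = []
    for w in (w.lower() for w in words):
        if w in ARTICLES_PREPOSITIONS:
            continue                      # remove articles / prepositions
        w = UNIT_ALIASES.get(w, w)        # unify units of measurement
        w = VERB_TO_NOUN.get(w, w)        # nominalise verbs
        out.append(singularize(w))        # unify singular / plural
    return out
```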

[0078] In one possible implementation, Figure 3 is a flowchart of obtaining the similarity result between the field to be verified and the standard label field in the data verification processing method provided in an embodiment of this application. As shown in Figure 3, performing multi-dimensional similarity calculations in S103 above based on each word to be verified, the standard label field, and each standard word to obtain the similarity result between the field to be verified and the standard label field includes: S301. Calculate the global semantic similarity based on each word to be verified and each standard word to obtain the global semantic similarity result between the field to be verified and the standard label field.

[0079] Optionally, the term frequency of each word to be verified and of each standard word can be counted to construct a vector to be verified and a standard vector, and the cosine of the angle between the two vectors in the vector space can be calculated as the global semantic similarity result between the field to be verified and the standard label field.
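The term-frequency variant in [0079] is a standard cosine similarity over word-count vectors. The sketch below builds both vectors over the shared vocabulary with `collections.Counter`; the function name `tf_cosine` and the choice to return 0.0 for an empty input are assumptions.

```python
import math
from collections import Counter
from typing import List


def tf_cosine(words_a: List[str], words_b: List[str]) -> float:
    """Cosine of the angle between the term-frequency vectors of two word lists."""
    tf_a, tf_b = Counter(words_a), Counter(words_b)
    vocab = set(tf_a) | set(tf_b)
    dot = sum(tf_a[w] * tf_b[w] for w in vocab)       # missing words count as 0
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0                                    # empty input: no similarity
    return dot / (norm_a * norm_b)
```

With `["user", "mobile", "phone"]` against `["personal", "phone"]`, one shared word gives 1 / (√3 · √2) ≈ 0.408.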

[0080] Optionally, taking the Phytium CPU as an example, the dot-product operations on high-dimensional vectors can be accelerated in parallel with the NEON SIMD instruction set, which relieves the performance bottleneck of domestic chips in complex floating-point computation.

[0081] S302. Calculate the word overlap similarity based on each word to be verified and each standard word to obtain the word overlap similarity result between the field to be verified and the standard label field.

[0082] Optionally, the number of intersecting words and the number of union words can be determined based on each word to be verified and each standard word, and the ratio of the number of intersecting words to the number of union words can be calculated as the word overlap similarity result between the field to be verified and the standard label field.

[0083] Here, the intersection vocabulary refers to the words common to the words to be verified and the standard words, and the union vocabulary refers to all distinct words across the words to be verified and the standard words.
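The word overlap similarity in [0082] and [0083] is the Jaccard index over distinct words. The function name `word_overlap` and returning 0.0 when both lists are empty are assumptions.

```python
from typing import List


def word_overlap(words_a: List[str], words_b: List[str]) -> float:
    """|intersection| / |union| over distinct words (Jaccard index)."""
    set_a, set_b = set(words_a), set(words_b)
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)
```

For example, `["user", "mobile", "phone"]` and `["personal", "phone"]` share one of four distinct words, giving 0.25.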

[0084] S303. Calculate the glyph similarity between each word to be verified and the standard label field to obtain the glyph similarity result between the field to be verified and the standard label field.

[0085] Optionally, glyph similarity can be calculated based on each word to be verified and the standard label field to obtain the glyph similarity result between the field to be verified and the standard label field. Here, the glyph similarity can be calculated using the edit distance.

[0086] S304. Calculate the semantic association similarity between each word to be verified and each standard word to obtain the semantic association similarity result between the field to be verified and the standard label field.

[0087] Optionally, it is possible to traverse each word to be verified and each standard word to generate multiple word pairs, and query the thesaurus based on each word pair to calculate the association similarity of each word pair, and calculate the average association similarity value of all word pairs as the semantic association similarity result between the field to be verified and the standard label field.
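The pairwise averaging in [0087] can be sketched with `itertools.product` over the two word lists. The thesaurus contents and scores below are hypothetical, and scoring identical words as 1.0 and unknown pairs as 0.0 are assumptions; a real thesaurus (for example one built from domain synonym dictionaries) would supply its own association measure.

```python
from itertools import product
from typing import Dict, FrozenSet, List

# Hypothetical thesaurus: unordered word pair -> association score in [0, 1].
THESAURUS: Dict[FrozenSet[str], float] = {
    frozenset({"mobile", "phone"}): 0.9,
    frozenset({"contact", "phone"}): 0.6,
}


def pair_score(a: str, b: str) -> float:
    """Association similarity of one word pair."""
    if a == b:
        return 1.0                          # identical words: full association (assumption)
    return THESAURUS.get(frozenset({a, b}), 0.0)


def semantic_association(words: List[str], standard: List[str]) -> float:
    """Average association similarity over all (word, standard word) pairs."""
    pairs = list(product(words, standard))
    if not pairs:
        return 0.0
    return sum(pair_score(a, b) for a, b in pairs) / len(pairs)
```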

[0088] S305. Perform weighted fusion based on the global semantic similarity result, the word overlap similarity result, the glyph similarity result, and the semantic association similarity result to obtain the similarity result between the field to be verified and the standard label field.

[0089] Optionally, after these four results are obtained, they can be weighted and fused according to preset weight information to obtain the similarity result between the field to be verified and the standard label field.

[0090] For example, the weight of the global semantic similarity result can be 35%, the weight of the word overlap similarity result 30%, the weight of the glyph similarity result 10%, and the weight of the semantic association similarity result 25%.
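With the example weights from [0090], the fusion in S305 is a weighted sum. The dimension keys (`"global"`, `"overlap"`, `"glyph"`, `"assoc"`) are hypothetical names, and the sketch assumes each per-dimension score already lies in [0, 1].

```python
from typing import Dict

# Example weights from [0090]; they sum to 1.0.
DEFAULT_WEIGHTS: Dict[str, float] = {
    "global": 0.35, "overlap": 0.30, "glyph": 0.10, "assoc": 0.25,
}


def fuse(scores: Dict[str, float], weights: Dict[str, float] = DEFAULT_WEIGHTS) -> float:
    """Weighted sum of per-dimension similarity results."""
    missing = set(weights) - set(scores)
    if missing:
        raise KeyError(f"missing dimension scores: {sorted(missing)}")
    return sum(weights[k] * scores[k] for k in weights)
```

For scores of 0.8, 0.5, 0.6, and 0.7 this gives 0.28 + 0.15 + 0.06 + 0.175 = 0.665.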

[0091] By determining global semantic similarity results, word overlap similarity results, character shape similarity results, and semantic association similarity results, and then weighting and fusing these results, the similarity result between the field to be verified and the standard label field is obtained. This approach overcomes the limitations of a single dimension, enhances semantic robustness, explicitly models the heterogeneity of field information, improves context awareness, and supports deterministic execution in domestic IT innovation environments. Furthermore, it provides interpretable decision-making basis, supporting a closed-loop data governance system.

[0092] In one possible implementation, Figure 4 is a flowchart of obtaining the global semantic similarity result between the field to be verified and the standard label field in the data verification processing method provided in an embodiment of this application. As shown in Figure 4, calculating the global semantic similarity in S301 above based on each word to be verified and each standard word to obtain the global semantic similarity result between the field to be verified and the standard label field includes: S401. Generate a first vector set based on each word to be verified, and generate a second vector set based on each standard word.

[0093] Optionally, a pre-trained vocabulary processing model can be used to encode each word to be verified and each standard word to generate a first vector set and a second vector set.

[0094] The vocabulary processing model can be a BERT model fine-tuned with Chinese domain data.

[0095] S402. Divide the first vector set into blocks to obtain multiple vector blocks.

[0096] Optionally, the first vector set can be divided into multiple vector blocks.

[0097] Each vector block contains multiple single-precision floating-point numbers. For example, a vector block can be a 128-bit vector block, and each block can include 4 single-precision floating-point numbers.
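The blocking in S402 can be sketched as slicing a vector into groups of four single-precision values, matching one 128-bit register. Zero-padding the final block is an assumption (it does not change a dot product); the function name `to_blocks` is hypothetical.

```python
from typing import List

LANES = 4  # a 128-bit register holds four 32-bit single-precision floats


def to_blocks(vec: List[float], lanes: int = LANES) -> List[List[float]]:
    """Split a vector into fixed-width blocks, zero-padding the tail block."""
    blocks: List[List[float]] = []
    for i in range(0, len(vec), lanes):
        block = list(vec[i:i + lanes])
        block += [0.0] * (lanes - len(block))  # padding lanes contribute nothing to a dot product
        blocks.append(block)
    return blocks
```

A 768-dimensional BERT-base embedding, for instance, splits into 192 such blocks.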

[0098] S403: Call the first target instruction of the preset instruction set to load each vector block of the first vector into the preset first register, and load the vector block of the corresponding position of the second vector into the preset second register.

[0099] Optionally, the first target instruction of the preset instruction set is invoked to load each vector block of the first vector into the preset first register, and the vector block at the corresponding position of the second vector is loaded into the preset second register.

[0100] The instruction set can be the NEON instruction set, the first target instruction can be the 'LD1W' instruction, and both the first and second registers can be NEON registers.

[0101] S404. Call the second target instruction to perform parallel multiplication and addition operations on multiple corresponding floating-point numbers in the first register and the second register to obtain the similarity results corresponding to each vector block in the first vector.

[0102] Optionally, a second target instruction is invoked to perform parallel multiplication and accumulation operations on the four corresponding floating-point numbers in the first and second registers to obtain the similarity results corresponding to each vector block in the first vector. The second target instruction can be an 'MLA' instruction.

[0103] S405. Based on the similarity results corresponding to each vector block in the first vector, obtain the global semantic similarity results between the field to be verified and the standard label field.

[0104] Optionally, the average of all block-level similarities can be taken based on the similarity results corresponding to each vector block in the first vector, and used as the global semantic similarity result between the field to be verified and the standard label field.
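Steps S403 to S405 can be emulated in plain Python to make the arithmetic concrete: each block gets a four-lane multiply-accumulate, the lanes are summed horizontally, and the block results are averaged as [0104] describes. This is only an arithmetic emulation under those assumptions, not NEON code; on AArch64 the per-block step would map to a single-precision NEON intrinsic such as `vfmaq_f32` from `arm_neon.h`. If the embeddings are L2-normalized, summing rather than averaging the block results would give exactly the cosine similarity; the sketch follows the text and averages.

```python
from typing import List

LANES = 4


def block_similarities(a: List[float], b: List[float], lanes: int = LANES) -> List[float]:
    """Per-block dot products, emulating one 4-lane multiply-accumulate per block."""
    if len(a) != len(b):
        raise ValueError("vectors must have equal length")
    sims: List[float] = []
    for start in range(0, len(a), lanes):
        acc = [0.0] * lanes                 # stands in for the accumulator register
        for lane in range(lanes):
            i = start + lane
            if i < len(a):                  # lanes past the end stay zero (padding)
                acc[lane] += a[i] * b[i]    # lane-wise multiply-accumulate
        sims.append(sum(acc))               # horizontal add of the four lanes
    return sims


def global_semantic_similarity(a: List[float], b: List[float]) -> float:
    """Average of the block-level similarity results, as described in [0104]."""
    sims = block_similarities(a, b)
    return sum(sims) / len(sims) if sims else 0.0
```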

[0105] The first target instruction loads each vector block of the first vector into a preset first register, and the vector block corresponding to the second vector into a preset second register. Then, the second target instruction performs parallel multiplication and addition operations on multiple corresponding floating-point numbers in the first and second registers to obtain the similarity results corresponding to each vector block in the first vector. This yields the global semantic similarity result between the field to be verified and the standard label field. This transforms the abstract semantic vector output by the BERT model into a precisely addressable, parallel-schedulable, and cycle-by-cycle-verifiable floating-point operation entity in the registers of a domestic processor, upgrading semantic similarity calculation from a probabilistic reasoning process to a deterministic hardware execution process. Simultaneously, it enables semantic-driven data governance capabilities to break free from dependence on x86, GPU, and CUDA technology stacks, achieving out-of-the-box semantic understanding capabilities on domestic CPUs.

[0106] In one possible implementation, Figure 5 is a flowchart of obtaining the glyph similarity result between the field to be verified and the standard label field in the data verification processing method provided in an embodiment of this application. As shown in Figure 5, calculating the glyph similarity in S303 above based on each word to be verified and the standard label field to obtain the glyph similarity result between the field to be verified and the standard label field includes: S501. Filter the words to be verified to obtain at least one target word to be verified.

[0107] Optionally, each word to be verified can be filtered to obtain the table name and field name in each word to be verified, which can be used as at least one target word to be verified.

[0108] S502. Calculate the glyph similarity between each target word to be verified and the standard label field to obtain the glyph similarity result between the field to be verified and the standard label field.

[0109] Optionally, the glyph similarity is calculated based on each target word to be verified and the standard label field to obtain the glyph similarity result between the field to be verified and the standard label field.

[0110] Optionally, when performing the glyph similarity calculation, the 'VMAX' or 'VSUB' instructions in the NEON instruction set can be used to compute, in parallel, the minimum over multiple columns of a row of the edit-distance matrix, eliminating branch mispredictions.

[0111] By calculating the glyph similarity between each target word to be verified and the standard label field, the glyph similarity results between the field to be verified and the standard label field can be obtained, which can avoid exponential complexity, ensure real-time performance, and enhance robustness.

[0112] In one possible implementation, Figure 6 is another flowchart of obtaining the glyph similarity result between the field to be verified and the standard label field in the data verification processing method provided in an embodiment of this application. As shown in Figure 6, calculating the glyph similarity in S502 above based on each target word to be verified and the standard label field to obtain the glyph similarity result includes: S601. Calculate, at the string level, the minimum number of edits required between each target word to be verified and the standard label field.

[0113] Optionally, for each target word to be verified, the minimum number of string-level edits between that word and the standard label field can be calculated.

[0114] S602. Based on the minimum number of edits required between each target word to be verified and the standard label field, determine the glyph similarity result between the field to be verified and the standard label field.

[0115] Optionally, the smallest of the minimum numbers of edits between the target words to be verified and the standard label field can be taken as the minimum number of edits between the field to be verified and the standard label field. This number can then be normalized to a value between 0 and 1 and used as the glyph similarity result between the field to be verified and the standard label field.
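S601 and S602 can be sketched with the classic Levenshtein dynamic program, keeping one row at a time (the row-wise minimum that [0110] proposes to vectorize is the `min(...)` in the inner loop). Two choices are assumptions not fixed by the text: strings are case-folded before comparison, and the smallest edit count is normalized as 1 - d / max(len(word), len(label)).

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Minimum insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (ca != cb))) # substitution (0 if equal)
        prev = curr
    return prev[-1]


def glyph_similarity(target_words: List[str], label: str) -> float:
    """Normalise the smallest edit distance over the target words to [0, 1]."""
    scored = [(levenshtein(w.lower(), label.lower()), w) for w in target_words]
    if not scored:
        return 0.0
    d, word = min(scored)                    # target word with the fewest edits
    return 1.0 - d / max(len(word), len(label), 1)
```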

[0116] By determining the minimum number of edits required between each target word to be verified and the standard label field, the glyph similarity between the field to be verified and the standard label field can be accurately captured in local high-similarity segments, avoiding global mismatch and significantly reducing the false judgment rate.

[0117] In one possible implementation, Figure 7 is a flowchart of obtaining the matching information between the field to be verified and the standard label field in the data verification processing method provided in an embodiment of this application. As shown in Figure 7, correcting the similarity result in S104 above based on the length of the field to be verified and the length of the standard label field to obtain the matching information between the field to be verified and the standard label field includes: S701. Calculate the ratio of the length of the field to be verified to the length of the standard label field to obtain the ratio result.

[0118] Optionally, the ratio of the length of the field to be verified to the length of the standard label field can be calculated to obtain the ratio result.

[0119] S702. Based on the ratio result, correct the similarity result to obtain the matching information between the field to be verified and the standard label field.

[0120] Optionally, the similarity result can be corrected based on the ratio result to obtain the corrected similarity result. The corrected similarity result can then be matched according to a preset confidence interval to obtain the matching information between the field to be verified and the standard label field, thereby preventing high-score misjudgments caused by partial matching between short and long texts.

[0121] In one possible implementation, Figure 8 is a flowchart of obtaining the matching information between the field to be verified and the standard label field in the data verification processing method provided in an embodiment of this application. As shown in Figure 8, correcting the similarity result in S702 above based on the ratio result to obtain the matching information between the field to be verified and the standard label field includes: S801. Determine the initial adjustment factor based on the ratio result.

[0122] Optionally, the initial adjustment factor can be calculated based on the ratio result.

[0123] For example, after the ratio result r is obtained, the initial adjustment factor can be calculated as 0.85 + 0.15 × r.

[0124] S802. Based on the word overlap similarity results between the field to be verified and the standard label field, adjust the initial adjustment factor to obtain the target adjustment factor.

[0125] Optionally, the initial adjustment factor can be adjusted based on the word overlap similarity results between the field to be verified and the standard label field to obtain the target adjustment factor.

[0126] For example, the product of the initial adjustment factor and the word overlap similarity result can be calculated as the target adjustment factor.

[0127] S803. Correct the similarity results according to the target adjustment factor to obtain the target similarity results between the field to be verified and the standard label field.

[0128] Optionally, the similarity results can be corrected according to the target adjustment factor to obtain the target similarity results between the field to be verified and the standard label field.

[0129] For example, the product of the target adjustment factor and the similarity result is calculated as the target similarity result between the field to be verified and the standard label field.

[0130] S804. Based on the target similarity results and the preset similarity threshold, determine the matching information between the field to be verified and the standard label field.

[0131] Optionally, the target similarity result can be compared with a preset similarity threshold. If the target similarity result is >0.8, the recommended governance method is automatic labeling. If 0.6 < target similarity result <0.8, the recommended governance method is manual review. The target similarity result is then used as the confidence level of the recommended governance method.
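Steps S801 to S804 chain together as follows: an initial factor of 0.85 + 0.15 × r ([0123]), multiplied by the word overlap similarity ([0126]), then multiplied by the fused similarity ([0129]), and finally compared with the thresholds in [0131]. The text does not say what happens at exactly 0.8 or at 0.6 and below, so the sketch assumes 0.8 falls into manual review and returns no recommendation at 0.6 or below; the function names are hypothetical.

```python
from typing import Optional, Tuple


def initial_factor(field_len: int, label_len: int) -> float:
    """Initial adjustment factor 0.85 + 0.15 * r, with r = field length / label length."""
    ratio = field_len / label_len
    return 0.85 + 0.15 * ratio


def match(similarity: float, overlap: float,
          field_len: int, label_len: int) -> Tuple[Optional[str], float]:
    """Return (recommended governance method, confidence level)."""
    target_factor = initial_factor(field_len, label_len) * overlap  # S802
    target_sim = target_factor * similarity                          # S803
    if target_sim > 0.8:
        method: Optional[str] = "automatic labeling"
    elif target_sim > 0.6:
        method = "manual review"
    else:
        method = None  # not specified in the text (assumption)
    return method, target_sim  # target similarity doubles as the confidence level
```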

[0132] By adjusting the initial adjustment factor based on the word overlap similarity results between the field to be verified and the standard label field, a target adjustment factor is obtained. The similarity results are then corrected using the target adjustment factor to obtain the target similarity results between the field to be verified and the standard label field. This determines the matching information between the field to be verified and the standard label field, which can significantly improve the matching accuracy and recall rate in key scenarios, enhance the algorithm's anti-interference ability and scenario adaptability, and improve the engineering implementation efficiency in a domestic environment.

[0133] Based on the same inventive concept, this application also provides a data verification processing device corresponding to the data verification processing method. Since the principle of the device in this application is similar to the data verification processing method described above in this application, the implementation of the device can refer to the implementation of the method, and the repeated parts will not be described again.

[0134] Referring to Figure 9, Figure 9 is a schematic diagram of a data verification processing device provided in an embodiment of this application. The device includes: an acquisition module 901, a determination module 902, a calculation module 903, and a correction module 904. The acquisition module 901 is used to acquire the database field information and the standard label field of the field to be verified, where the database field information includes: the table name where the field to be verified is located, the name of the field to be verified, the comment of the field to be verified, and the value of the field to be verified. The determination module 902 is used to determine multiple words to be verified based on the database field information, and to determine multiple standard words based on the standard label field. The calculation module 903 is used to perform multi-dimensional similarity calculations based on each word to be verified, the standard label field, and each standard word to obtain the similarity result between the field to be verified and the standard label field. The correction module 904 is used to correct the similarity result based on the length of the field to be verified and the length of the standard label field to obtain the matching information between the field to be verified and the standard label field, where the matching information indicates the recommended governance method for the field to be verified and the confidence level of the recommended governance method.

[0135] In one possible implementation, the determination module 902 is specifically used to: perform explicit delimiter identification on the standard label field to obtain multiple atomic fields; map each atomic field based on a preset domain knowledge base to obtain multiple standard atomic fields; and perform semantic normalization on each standard atomic field based on preset semantic rules to obtain multiple standard words.

[0136] In one possible implementation, the calculation module 903 is specifically used to: calculate the global semantic similarity based on each word to be verified and each standard word to obtain the global semantic similarity result between the field to be verified and the standard label field; calculate the word overlap similarity based on each word to be verified and each standard word to obtain the word overlap similarity result; calculate the glyph similarity between each word to be verified and the standard label field to obtain the glyph similarity result; calculate the semantic association similarity based on each word to be verified and each standard word to obtain the semantic association similarity result; and perform weighted fusion of the global semantic, word overlap, glyph, and semantic association similarity results to obtain the similarity result between the field to be verified and the standard label field.

[0137] In one possible implementation, the calculation module 903 is specifically configured to: generate a first vector set based on each word to be verified, and a second vector set based on each standard word; divide the first vector set into blocks to obtain multiple vector blocks, each containing multiple single-precision floating-point numbers; invoke a first target instruction of a preset instruction set to load each vector block of the first vector set into a preset first register, and to load the vector block at the corresponding position of the second vector set into a preset second register; invoke a second target instruction to perform parallel multiply-accumulate operations on the corresponding floating-point numbers in the first register and the second register to obtain the similarity result for each vector block of the first vector set; and obtain the global semantic similarity result between the field to be verified and the standard label field based on the similarity results of the vector blocks of the first vector set.
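The block-wise computation can be emulated in NumPy as below. This is a sketch under stated assumptions. The preset instruction set is not named, so a block width of 8 single-precision lanes (as in a 256-bit SIMD register) is assumed. NumPy row slices stand in for the register loads, and a row-wise multiply followed by a sum stands in for the multiply-accumulate instruction. Mean pooling of the word vectors into one vector per side (`pool`) and cosine normalization of the accumulated dot product are also assumed choices, not disclosed ones.

```python
import numpy as np

# Hypothetical block width: 8 float32 lanes, as in a 256-bit SIMD register.
BLOCK = 8


def pool(word_vectors) -> np.ndarray:
    """Mean-pool a set of word vectors into one float32 vector (assumed step)."""
    return np.mean(np.asarray(word_vectors, dtype=np.float32), axis=0)


def blocked_dot(a, b, block: int = BLOCK) -> np.ndarray:
    """Per-block partial dot products of two equal-length float32 vectors.

    Each row of the reshaped arrays plays the role of one register load;
    the row-wise multiply-and-sum plays the role of a multiply-accumulate.
    """
    a = np.asarray(a, dtype=np.float32)
    b = np.asarray(b, dtype=np.float32)
    if a.shape != b.shape or a.ndim != 1:
        raise ValueError("expected two 1-D vectors of the same length")
    pad = (-a.size) % block  # zero-pad the final partial block
    a = np.pad(a, (0, pad))
    b = np.pad(b, (0, pad))
    return (a.reshape(-1, block) * b.reshape(-1, block)).sum(axis=1, dtype=np.float32)


def global_semantic_similarity(a, b, block: int = BLOCK) -> float:
    """Cosine similarity assembled from the per-block partial sums."""
    dot = float(blocked_dot(a, b, block).sum())
    norm_a = float(np.sqrt(blocked_dot(a, a, block).sum()))
    norm_b = float(np.sqrt(blocked_dot(b, b, block).sum()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

A typical call would be `global_semantic_similarity(pool(vectors_to_verify), pool(standard_vectors))`, where both hypothetical vector lists come from the same embedding model. A native implementation would replace `blocked_dot` with fused multiply-add intrinsics for the target CPU (for example AVX on x86 or NEON on ARM). The block layout, the zero padding of the final partial block, and the accumulation of per-block partial sums would stay the same.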

[0138] In one possible implementation, the calculation module 903 is specifically configured to: filter the words to be verified to obtain at least one target word to be verified; and calculate the glyph similarity between each target word to be verified and the standard label field to obtain the glyph similarity result between the target words to be verified and the standard label field.

[0139] In one possible implementation, calculating the glyph similarity between each target word to be verified and the standard label field to obtain the glyph similarity result includes: calculating, at the string level, the minimum number of edits (the edit distance) required between each target word to be verified and the standard label field; and determining the glyph similarity result between the field to be verified and the standard label field based on these minimum edit counts.
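The filtering of [0138] and the edit-distance step of [0139] can be sketched together as below. The filtering criterion (the hypothetical `STOPWORDS` set plus a minimum length of two characters), the normalization `1 - d / max(len)`, and taking the best score across target words are all assumptions. The edit distance itself is the standard Levenshtein distance computed by dynamic programming. Python strings are sequences of code points, so the same code compares Chinese characters directly.

```python
# Hypothetical filter; the embodiment does not state the filtering criterion.
STOPWORDS = {"the", "of", "tmp", "col"}


def filter_target_words(words):
    """Keep words longer than one character that are not stopwords."""
    return [w for w in words if len(w) > 1 and w.lower() not in STOPWORDS]


def levenshtein(s: str, t: str) -> int:
    """Minimum number of single-character insertions, deletions or substitutions."""
    if len(s) < len(t):
        s, t = t, s  # keep the DP row as short as possible
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            curr.append(min(
                prev[j] + 1,               # delete cs
                curr[j - 1] + 1,           # insert ct
                prev[j - 1] + (cs != ct),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def glyph_similarity(words, label: str) -> float:
    """Best normalised edit similarity between any target word and the label."""
    targets = filter_target_words(words)
    if not targets:
        return 0.0
    return max(1.0 - levenshtein(w, label) / max(len(w), len(label)) for w in targets)
```

For example, `custtel` is one insertion away from `cust_tel`, so `glyph_similarity(["custtel", "x"], "cust_tel")` returns 1 - 1/8 = 0.875; the single-character word `x` is filtered out before comparison.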

[0140] In one possible implementation, the correction module 904 is specifically configured to: calculate the ratio of the length of the field to be verified to the length of the standard label field to obtain a ratio result; and correct the similarity result based on the ratio result to obtain the matching information between the field to be verified and the standard label field.

[0141] In one possible implementation, the correction module 904 is specifically configured to: determine an initial adjustment factor based on the ratio result; adjust the initial adjustment factor based on the word overlap similarity result between the field to be verified and the standard label field to obtain a target adjustment factor; correct the similarity result based on the target adjustment factor to obtain a target similarity result between the field to be verified and the standard label field; and determine the matching information between the field to be verified and the standard label field based on the target similarity result and a preset similarity threshold.
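The length correction of [0140] and [0141] can be sketched as follows. The tolerance band, the form of the initial adjustment factor, the rule for relaxing it by word overlap, the 0.75 threshold, the confidence formula, and the recommendation labels are all hypothetical. The embodiment specifies only the order of operations: ratio, initial factor, overlap-adjusted target factor, corrected similarity, and threshold comparison.

```python
from dataclasses import dataclass


@dataclass
class MatchInfo:
    target_similarity: float
    recommendation: str  # hypothetical labels; not enumerated in the embodiment
    confidence: float


def initial_adjustment_factor(ratio: float, tolerance: float = 2.0) -> float:
    """Assumed rule: no penalty within a length-ratio band, proportional penalty outside."""
    if ratio <= 0:
        return 0.0
    mismatch = max(ratio, 1.0 / ratio)  # >= 1 whether the field is shorter or longer
    if mismatch <= tolerance:
        return 1.0
    return tolerance / mismatch


def target_adjustment_factor(initial: float, word_overlap: float) -> float:
    """Assumed rule: high word overlap relaxes the length penalty toward 1.0."""
    return initial + (1.0 - initial) * word_overlap


def correct_similarity(similarity: float, field_len: int, label_len: int,
                       word_overlap: float, threshold: float = 0.75) -> MatchInfo:
    """Apply the length correction and compare against the similarity threshold."""
    if label_len <= 0:
        raise ValueError("standard label field must be non-empty")
    ratio = field_len / label_len
    factor = target_adjustment_factor(initial_adjustment_factor(ratio), word_overlap)
    target = similarity * factor
    if target >= threshold:
        # Confidence of the mapping recommendation: the corrected similarity itself.
        return MatchInfo(target, "map_to_standard_label", target)
    # Confidence of the review recommendation: distance of the score from a match.
    return MatchInfo(target, "manual_review", 1.0 - target)
```

Under these assumptions, a field four times longer than its label with no word overlap keeps only half of its similarity score (0.9 becomes 0.45) and falls below the threshold. Full word overlap (1.0) restores the factor to 1.0, so a long but lexically matching field is not penalized.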

[0142] For the processing flow of each module in the device and the interaction between the modules, refer to the relevant descriptions in the above method embodiments; details are not repeated here.

[0143] This application also provides an electronic device. Figure 10 is a schematic structural diagram of the electronic device provided in an embodiment of this application. As shown in Figure 10, the electronic device includes a processor 1001, a memory 1002, and, optionally, a bus 1003. The memory 1002 stores machine-readable instructions executable by the processor 1001. When the electronic device is running, the processor 1001 and the memory 1002 communicate via the bus 1003, and the processor 1001 executes the machine-readable instructions to perform the steps of the above-described data verification processing method.

[0144] This application also provides a computer-readable storage medium storing a computer program, which, when run by a processor, executes the steps of the above-described data verification processing method.

[0145] Those skilled in the art will clearly understand that, for convenience and brevity of description, for the specific working processes of the systems and devices described above, reference may be made to the corresponding processes in the method embodiments; details are not repeated here. In the several embodiments provided in this application, it should be understood that the disclosed systems, devices, and methods can be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division into modules is only a logical functional division, and other divisions are possible in an actual implementation. Furthermore, multiple modules or components can be combined or integrated into another system, or some features can be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be implemented through some communication interfaces, and the indirect coupling or communication connection between devices or modules may be electrical, mechanical, or in other forms.

[0146] Furthermore, the functional units in the various embodiments of this application can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. If the functions are implemented as software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this invention, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this invention. The aforementioned storage medium includes: USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, optical disks, and other media capable of storing program code.

[0147] The above are merely specific embodiments of this application, but the scope of protection of this application is not limited thereto. Any changes or substitutions that can be easily conceived by those skilled in the art within the scope of the technology disclosed in this application should be included within the scope of protection of this application.

Claims

1. A data verification processing method, characterized by comprising: acquiring the database field information and the standard label field of a field to be verified, wherein the database field information includes the name of the table containing the field to be verified, the name of the field to be verified, the comment of the field to be verified, and the value of the field to be verified; determining multiple words to be verified based on the database field information, and determining multiple standard words based on the standard label field; performing multi-dimensional similarity calculation based on the words to be verified, the standard label field, and the standard words to obtain a similarity result between the field to be verified and the standard label field; and correcting the similarity result based on the length of the field to be verified and the length of the standard label field to obtain matching information between the field to be verified and the standard label field, wherein the matching information indicates a recommended governance method for the field to be verified and the confidence level of the recommended governance method.

2. The data verification processing method according to claim 1, characterized in that the step of determining multiple standard words based on the standard label field comprises: performing explicit delimiter identification on the standard label field to obtain multiple atomic fields; mapping each atomic field based on a predefined domain knowledge base to obtain multiple standard atomic fields; and performing semantic normalization on the standard atomic fields based on preset semantic rules to obtain multiple standard words.

3. The data verification processing method according to claim 1, characterized in that the step of performing multi-dimensional similarity calculation based on each of the words to be verified, the standard label field, and each of the standard words to obtain the similarity result between the field to be verified and the standard label field comprises: calculating the global semantic similarity based on each of the words to be verified and each of the standard words to obtain a global semantic similarity result between the field to be verified and the standard label field; calculating the word overlap similarity based on the words to be verified and the standard words to obtain a word overlap similarity result between the field to be verified and the standard label field; calculating the glyph similarity based on each of the words to be verified and the standard label field to obtain a glyph similarity result between the field to be verified and the standard label field; calculating the semantic association similarity based on each of the words to be verified and each of the standard words to obtain a semantic association similarity result between the field to be verified and the standard label field; and obtaining the similarity result between the field to be verified and the standard label field by weighted fusion of the global semantic similarity result, the word overlap similarity result, the glyph similarity result, and the semantic association similarity result.

4. The data verification processing method according to claim 3, characterized in that the step of calculating the global semantic similarity based on each of the words to be verified and each of the standard words to obtain the global semantic similarity result between the field to be verified and the standard label field comprises: generating a first vector set based on each of the words to be verified, and a second vector set based on each of the standard words; dividing the first vector set into blocks to obtain multiple vector blocks, each containing multiple single-precision floating-point numbers; invoking a first target instruction of a preset instruction set to load each vector block of the first vector set into a preset first register, and to load the vector block at the corresponding position of the second vector set into a preset second register; invoking a second target instruction to perform parallel multiply-accumulate operations on the corresponding floating-point numbers in the first register and the second register to obtain the similarity result for each vector block of the first vector set; and obtaining the global semantic similarity result between the field to be verified and the standard label field based on the similarity results of the vector blocks of the first vector set.

5. The data verification processing method according to claim 3, characterized in that the step of calculating the glyph similarity between the words to be verified and the standard label field to obtain the glyph similarity result between the words to be verified and the standard label field comprises: filtering the words to be verified to obtain at least one target word to be verified; and calculating the glyph similarity between the target words to be verified and the standard label field to obtain the glyph similarity result between the words to be verified and the standard label field.

6. The data verification processing method according to claim 5, characterized in that the step of calculating the glyph similarity between the target words to be verified and the standard label field to obtain the glyph similarity result between the words to be verified and the standard label field comprises: calculating, at the string level, the minimum number of edits required between each target word to be verified and the standard label field; and determining the glyph similarity result between the target words to be verified and the standard label field based on the minimum number of edits required for each target word to be verified.

7. The data verification processing method according to claim 1, characterized in that the step of correcting the similarity result based on the length of the field to be verified and the length of the standard label field to obtain the matching information between the field to be verified and the standard label field comprises: calculating the ratio of the length of the field to be verified to the length of the standard label field to obtain a ratio result; and correcting the similarity result based on the ratio result to obtain the matching information between the field to be verified and the standard label field.

8. The data verification processing method according to claim 7, characterized in that the step of correcting the similarity result based on the ratio result to obtain the matching information between the field to be verified and the standard label field comprises: determining an initial adjustment factor based on the ratio result; adjusting the initial adjustment factor based on the word overlap similarity result between the field to be verified and the standard label field to obtain a target adjustment factor; correcting the similarity result according to the target adjustment factor to obtain a target similarity result between the field to be verified and the standard label field; and determining the matching information between the field to be verified and the standard label field based on the target similarity result and a preset similarity threshold.

9. An electronic device, characterized by comprising a processor and a memory, the memory storing machine-readable instructions executable by the processor, wherein, when the electronic device is running, the processor executes the machine-readable instructions to perform the steps of the data verification processing method according to any one of claims 1 to 8.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program that, when executed by a processor, performs the steps of the data verification processing method according to any one of claims 1 to 8.