Data object sensitivity tagging method and apparatus, device, and storage medium

By implementing differentiated labeling strategies for data object types on the OLAP platform, including syntax tree parsing of views or bitmaps and inheritance of source table information for data tables, the problem of inconsistent labeling between upstream and downstream OLAP platforms is solved, achieving efficient and accurate sensitivity labeling.

CN122242494A · Pending · Publication Date: 2026-06-19 · CHINA MERCHANTS BANK

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
CHINA MERCHANTS BANK
Filing Date
2026-03-24
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

The sensitive data labeling logic of existing online analytical processing (OLAP) platforms has not been adapted and optimized for their characteristics, resulting in inconsistent labeling across the OLAP platform and high costs of repeated labeling. How to improve the efficiency of sensitive data object labeling while ensuring labeling accuracy has become a problem.

Method used

By responding to automatic labeling requests, the object type of the data object to be labeled is obtained, and syntax tree parsing is performed for the view or bitmap. The data table inherits the labeling information of the source table based on the configuration information, and determines the table-level sensitivity level by combining the field labeling information, thereby realizing a differentiated labeling strategy.

Benefits of technology

It improves the efficiency and accuracy of sensitivity labeling for data objects, adapts to the diverse sensitivity labeling needs of data objects in OLAP scenarios, and ensures the comprehensiveness and accuracy of labeling results.

✦ Generated by Eureka AI based on patent content.

Abstract

This application discloses a data object sensitivity labeling method, apparatus, device, and storage medium, relating to the field of data labeling technology. The disclosed data object sensitivity labeling method includes: responding to an automatic labeling request and obtaining the object type of the data object to be labeled; when the object type is a view or bitmap, performing syntax tree parsing on the data object to be labeled to complete sensitivity field labeling and obtain field labeling information; when the object type is a data table, inheriting the source table labeling information of the data object to be labeled according to configuration information to complete sensitivity field labeling and obtain field labeling information; determining the table-level sensitivity level of the data object to be labeled based on the field labeling information to obtain the data object sensitivity labeling result. This solution can adapt to the characteristics of OLAP platforms and improve the efficiency of data object sensitivity labeling while ensuring labeling accuracy.

Description

Technical Field

[0001] This application relates to the field of data labeling technology, and in particular to a method, apparatus, device, and storage medium for sensitivity labeling of data objects.

Background Technology

[0002] Currently, most sensitive data labeling technologies in the industry are developed and designed based on Online Transaction Processing (OLTP) platforms. Mainstream labeling solutions and products are built around the characteristics of real-time transactional data on OLTP platforms. They can meet the labeling needs of OLTP platforms with a small number of fields, simple business logic, no large amount of upstream batch data dependence, and no historical accumulation tables, and complete the basic sensitive data classification labeling work.

[0003] However, existing Online Analytical Processing (OLAP) platforms simply copy the labeling logic of OLTP platforms without adapting or optimizing it for the upstream dependencies of OLAP platforms. This leads to inconsistent labeling between upstream and downstream processes and high costs associated with repetitive labeling in practical applications. Therefore, how to adapt to the characteristics of OLAP platforms and improve the efficiency of sensitive labeling for data objects while ensuring labeling accuracy has become an unresolved issue.

[0004] The above content is only used to help understand the technical solution of this application and does not represent an admission that the above content is prior art.

Summary of the Invention

[0005] The main objective of this application is to provide a data object sensitivity labeling method, apparatus, device, and storage medium, aiming to solve the technical problem of how to adapt to the characteristics of the OLAP platform and improve the efficiency of data object sensitivity labeling while ensuring labeling accuracy.

[0006] To achieve the above objectives, this application proposes a data object sensitivity labeling method, which includes:

[0007] responding to an automatic labeling request and obtaining the object type of the data object to be labeled; when the object type is a view or bitmap, performing syntax tree parsing on the data object to be labeled to complete sensitive field labeling and obtain field labeling information; when the object type is a data table, inheriting the source table labeling information of the data object to be labeled according to configuration information to complete sensitive field labeling and obtain field labeling information; and determining the table-level sensitivity level of the data object to be labeled based on the field labeling information to obtain a data object sensitivity labeling result.

[0008] In one embodiment, the step of inheriting the source table labeling information of the data object to be labeled according to the configuration information when the object type is a data table, completing the labeling of sensitive fields, and obtaining the field labeling information includes: When the object type is a data table, the source table and the mapping relationship of the source table fields of the data object to be labeled are determined according to the configuration information; The tagging information of the source table is determined based on the source table and the mapping relationship between the source table fields. Inherit the tagging information from the source table, complete the tagging of sensitive fields, and obtain the field tagging information.

[0009] In one embodiment, when the object type is a view or bitmap, the step of parsing the syntax tree of the data object to be labeled, completing the labeling of sensitive fields, and obtaining field labeling information includes: When the object type is a view or bitmap, obtain the data definition language of the data object to be labeled; Generate a syntax tree based on the data definition language; The syntax tree is parsed to obtain the source field processing logic and source field information; Based on the source field processing logic and the source field information, sensitive field labeling is completed to obtain field labeling information.

[0010] In one embodiment, the step of determining the table-level sensitivity level of the data object to be labeled based on the field labeling information, and obtaining the data object sensitivity labeling result, includes: traversing the field list of the data object to be labeled and obtaining the current field; determining the sensitivity level of the current field based on the field labeling information; when the sensitivity level of the current field is not the target sensitivity level, advancing to the next field in the list as the updated current field; and when the sensitivity level of the current field is the target sensitivity level, determining the table-level sensitivity level of the data object to be labeled to be the target sensitivity level, and obtaining the data object sensitivity labeling result.
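
The traversal described above can be sketched as a short loop that stops as soon as a field carrying the target level is found. This is a minimal illustration only: the function name `table_level` and the choice to return `None` when no field matches are assumptions, since this paragraph does not state what happens in that case.

```python
from typing import Optional


def table_level(field_levels: dict[str, str], target_level: str) -> Optional[str]:
    """Return target_level once any field carries it; otherwise None (assumed fallback)."""
    for field_name, level in field_levels.items():  # walk the field list in order
        if level == target_level:
            # One matching field is enough to fix the table-level result.
            return target_level
        # Otherwise continue with the next field as the updated current field.
    return None


labels = {"cust_id": "General Level 1", "card_no": "Sensitive Level 1"}
result = table_level(labels, "Sensitive Level 1")
```

Returning early avoids scanning the remaining fields of very wide tables once the outcome is already determined.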

[0011] In one embodiment, the method further includes: responding to an auxiliary labeling request and obtaining the field names and data volume statistics of the data object to be labeled; determining sample data from the data object to be labeled based on the regular expression matching result of the field names; determining the sensitivity weight of each field name based on the data volume statistics; and inputting the sample data and the sensitivity weights into the auxiliary labeling model to obtain the auxiliary labeling sensitivity level and the auxiliary labeling confidence level.

[0012] In one embodiment, the step of determining the sensitivity weight of the corresponding field name based on the data volume statistics includes: Based on the data volume statistics, determine the number of records, the percentage of non-null values, and the data update frequency for each field name; The sensitivity weight of the corresponding field name is determined based on the number of records in the field, the proportion of non-null values, and the data update frequency.
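
The text names the three inputs to the sensitivity weight but gives no formula. The sketch below is therefore purely hypothetical: the normalisations, the coefficients 0.4 / 0.4 / 0.2, and the function name `sensitivity_weight` are all assumptions chosen only to show how the three statistics might be combined into a single value in the range [0, 1).

```python
import math


def sensitivity_weight(record_count: int, non_null_ratio: float, updates_per_day: float) -> float:
    """Combine the three statistics into one weight (hypothetical formula)."""
    # Assumed normalisation: squash record count onto [0, 1) on a log scale.
    volume = 1.0 - 1.0 / (1.0 + math.log10(1 + record_count))
    # Assumed normalisation: squash update frequency onto [0, 1).
    freshness = updates_per_day / (1.0 + updates_per_day)
    # Assumed coefficients; they sum to 1 so the result stays below 1.
    return 0.4 * volume + 0.4 * non_null_ratio + 0.2 * freshness


small = sensitivity_weight(record_count=10, non_null_ratio=0.9, updates_per_day=1.0)
large = sensitivity_weight(record_count=1_000_000, non_null_ratio=0.9, updates_per_day=1.0)
```

Under these assumptions a field with more records, fewer nulls, or more frequent updates receives a larger weight; any real deployment would need to replace the formula with the one actually intended.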

[0013] In one embodiment, after the step of inputting the sample data and the sensitivity weights into the auxiliary labeling model to obtain the auxiliary labeling sensitivity level and the auxiliary labeling confidence level, the method further includes: in response to a manual labeling request, obtaining the labeling field name and the corresponding manual labeling sensitivity level of the data object to be labeled; and when the auxiliary labeling confidence level is greater than a preset labeling confidence threshold and the manual labeling sensitivity level is lower than the auxiliary labeling sensitivity level, pushing an early warning notification based on the labeling field name.
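
The warning condition is a conjunction of two comparisons and can be sketched as follows. The level names are taken from the example levels given later in the description; the numeric ranks, the default threshold of 0.9, and the function name `should_warn` are assumptions made for illustration.

```python
# Numeric ranks for the example levels (higher means more sensitive); values are assumed.
LEVEL_RANK = {
    "General Level 2": 0,
    "General Level 1": 1,
    "Sensitive Level 2": 2,
    "Sensitive Level 1": 3,
}


def should_warn(manual_level: str, auxiliary_level: str,
                auxiliary_confidence: float, confidence_threshold: float = 0.9) -> bool:
    """True when a confident model suggestion is more sensitive than the manual label."""
    confident = auxiliary_confidence > confidence_threshold
    under_labeled = LEVEL_RANK[manual_level] < LEVEL_RANK[auxiliary_level]
    return confident and under_labeled


warn = should_warn("General Level 1", "Sensitive Level 1", auxiliary_confidence=0.95)
```

Only under-labeling triggers the notification; a manual label that is stricter than the suggestion passes silently.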

[0014] Furthermore, to achieve the above objectives, this application also proposes a data object sensitivity labeling device, which includes: The data acquisition module is used to respond to automatic labeling requests and obtain the object type of the data object to be labeled; The data processing module is used to perform syntax tree parsing on the data object to be labeled when the object type is a view or bitmap, to complete the labeling of sensitive fields and obtain field labeling information. The data processing module is also used to inherit the source table labeling information of the data object to be labeled according to the configuration information when the object type is a data table, to complete the labeling of sensitive fields and obtain field labeling information; The data labeling module is used to determine the table-level sensitivity level of the data object to be labeled based on the field labeling information, and to obtain the data object sensitivity labeling result.

[0015] In addition, to achieve the above objectives, this application also proposes a data object sensitivity labeling device, the device comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor, the computer program being configured to implement the steps of the data object sensitivity labeling method as described above.

[0016] In addition, to achieve the above objectives, this application also proposes a storage medium, which is a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, it implements the steps of the data object sensitivity labeling method described above.

[0017] In addition, to achieve the above objectives, this application also provides a computer program product, which includes a computer program that, when executed by a processor, implements the steps of the data object sensitivity labeling method described above.

[0018] One or more technical solutions proposed in this application have at least the following technical effects: By first responding to automatic labeling requests and obtaining the object type of the data object to be labeled, differentiated labeling strategies can be determined for different types of data objects, such as views, bitmaps, and data tables, to avoid the problem of poor adaptability of a single labeling logic. For views or bitmaps, syntax tree parsing is performed to complete field labeling; for data tables, field labeling is completed by inheriting the source table's labeling information based on configuration information, accurately identifying sensitive fields in different types of data objects. Combining field labeling information to determine the table-level sensitivity level allows for sensitivity labeling of data objects from both field and overall dimensions, improving the comprehensiveness and accuracy of the labeling results. This increases the efficiency of sensitivity labeling of data objects while ensuring labeling accuracy, effectively adapting to the diverse sensitivity labeling needs of data objects in OLAP scenarios.

Description of the Drawings

[0019] The accompanying drawings, which are incorporated in and form part of this specification, illustrate embodiments consistent with this application and, together with the description, serve to explain the principles of this application.

[0020] To more clearly illustrate the technical solutions in the embodiments of this application or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, for those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0021] Figure 1 is a flowchart of Embodiment 1 of the data object sensitivity labeling method of this application;
Figure 2 is a schematic diagram of the view field labeling process provided in Embodiment 1 of the data object sensitivity labeling method of this application;
Figure 3 is a schematic diagram of the table field labeling process provided in Embodiment 1 of the data object sensitivity labeling method of this application;
Figure 4 is a schematic diagram of the automatic labeling process provided in Embodiment 1 of the data object sensitivity labeling method of this application;
Figure 5 is a flowchart of Embodiment 2 of the data object sensitivity labeling method of this application;
Figure 6 is a schematic diagram of the auxiliary labeling suggestion generation process provided in Embodiment 2 of the data object sensitivity labeling method of this application;
Figure 7 is a schematic diagram of the architecture provided in Embodiment 2 of the data object sensitivity labeling method of this application;
Figure 8 is a schematic diagram of the physical deployment logic provided in Embodiment 2 of the data object sensitivity labeling method of this application;
Figure 9 is a schematic diagram of the module structure of the data object sensitivity labeling apparatus according to an embodiment of this application;
Figure 10 is a schematic diagram of the device structure of the hardware operating environment involved in the data object sensitivity labeling method in the embodiments of this application.

[0022] The purpose, features, and advantages of this application will be further explained in conjunction with the embodiments and with reference to the accompanying drawings.

Detailed Description

[0023] It should be understood that the specific embodiments described herein are merely illustrative of the technical solutions of this application and are not intended to limit this application.

[0024] To better understand the technical solution of this application, a detailed description will be provided below in conjunction with the accompanying drawings and specific implementation methods.

[0025] The main solution of this application embodiment is as follows: responding to an automatic labeling request, obtaining the object type of the data object to be labeled; when the object type is a view or bitmap, performing syntax tree parsing on the data object to be labeled, completing sensitivity field labeling, and obtaining field labeling information; when the object type is a data table, inheriting the source table labeling information of the data object to be labeled according to the configuration information, completing sensitivity field labeling, and obtaining field labeling information; determining the table-level sensitivity level of the data object to be labeled based on the field labeling information, and obtaining the data object sensitivity labeling result.

[0026] Currently, most sensitive data labeling technologies in the industry are developed and designed based on Online Transaction Processing (OLTP) platforms. Mainstream labeling solutions and products are built around the characteristics of real-time transactional data on OLTP platforms. They can meet the labeling needs of OLTP platforms with a small number of fields, simple business logic, no large amount of upstream batch data dependence, and no historical accumulation tables, and complete the basic sensitive data classification labeling work.

[0027] However, existing Online Analytical Processing (OLAP) platforms simply copy the labeling logic of OLTP platforms without adapting or optimizing it for the upstream dependencies of OLAP platforms. This leads to inconsistent labeling between upstream and downstream processes and high costs associated with repetitive labeling in practical applications. Therefore, how to adapt to the characteristics of OLAP platforms and improve the efficiency of sensitive labeling for data objects while ensuring labeling accuracy has become an unresolved issue.

[0028] This application provides a solution that, by first responding to automatic labeling requests and obtaining the object type of the data object to be labeled, can determine differentiated labeling strategies for different types of data objects, such as views, bitmaps, and data tables, thus avoiding the problem of poor adaptability of a single labeling logic. For views or bitmaps, syntax tree parsing is performed to complete field labeling; for data tables, field labeling is completed by inheriting the source table's labeling information based on configuration information, accurately identifying sensitive fields in different types of data objects. Combining field labeling information to determine the table-level sensitivity level, it can complete sensitivity labeling of data objects from both field and overall dimensions, improving the comprehensiveness and accuracy of the labeling results. While ensuring labeling accuracy, it improves the efficiency of sensitivity labeling of data objects, effectively adapting to the diverse sensitivity labeling needs of data objects in OLAP scenarios.

[0029] It should be noted that the executing entity in this embodiment can be a computing service device with data processing, network communication, and program execution functions, such as a tablet computer, personal computer, or mobile phone, or an electronic device or data object sensitivity labeling device capable of performing the above functions. The following description uses a data object sensitivity labeling device as an example to illustrate this embodiment and the subsequent embodiments.

[0030] Based on this, embodiments of this application provide a data object sensitivity labeling method. Refer to Figure 1, which is a flowchart of the first embodiment of the data object sensitivity labeling method of this application.

[0031] In this embodiment, the data object sensitivity labeling method includes steps S10 to S40: Step S10: Respond to the automatic labeling request and obtain the object type of the data object to be labeled; It should be noted that the automatic labeling request is an instruction that triggers the automated sensitivity labeling process for data objects. It can be triggered by preset scenarios such as data synchronization and data addition, and is used to request the execution of automated sensitivity level determination operations on specified data objects.

[0032] Additionally, the data objects to be labeled are the data carriers that require sensitivity identification and level determination; they are the target objects of sensitivity labeling operations and include at least different types of data collections such as views, bitmaps, and data tables. Object type is used to distinguish the category to which the data objects to be labeled belong; different object types correspond to different labeling processing logic.

[0033] It should be understood that upon receiving an automatic labeling request, the system will first parse the information related to the data object to be labeled carried in the request and extract the object type corresponding to the data object.
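
Step S10 amounts to reading the object type from the request and selecting a labeling strategy from it. The sketch below shows one possible shape for that dispatch; the request layout (a dictionary with `object_name` and `object_type` keys), the enum values, and the strategy names are all hypothetical.

```python
from enum import Enum


class ObjectType(Enum):
    VIEW = "view"
    BITMAP = "bitmap"
    TABLE = "table"


def choose_strategy(object_type: ObjectType) -> str:
    """Map an object type to the labeling strategy described for it."""
    if object_type in (ObjectType.VIEW, ObjectType.BITMAP):
        return "syntax_tree_parsing"       # views and bitmaps: parse the definition
    return "source_table_inheritance"      # data tables: inherit from the source table


def handle_request(request: dict) -> str:
    # Extract the object type carried in the request (assumed request layout).
    # An unknown type string raises ValueError from the enum lookup.
    object_type = ObjectType(request["object_type"])
    return choose_strategy(object_type)


strategy = handle_request({"object_name": "v_customer", "object_type": "view"})
```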

[0034] Step S20: When the object type is a view or bitmap, perform syntax tree parsing on the data object to be labeled, complete the labeling of sensitive fields, and obtain field labeling information; It should be noted that a view is a virtual table built on one or more data tables. It does not store data itself, but only saves query statements and is used to display data of a specific dimension in the data tables. A bitmap is a BITMAP table in the ClickHouse engine of the OLAP platform that stores data in a bitmap structure. It represents data characteristics through the bit state of the bitmap. Its fields are generated by aggregation and have no direct upstream original field correspondence.

[0035] Additionally, syntax tree parsing is the analysis process that breaks down structured statements of data objects to be tagged, such as view definition statements and bitmap data statements, into structured syntax trees. The parsing process identifies field names, field relationships, data operation logic, and other content in the statements.

[0036] Additionally, sensitive field labeling involves identifying fields containing sensitive information, such as personal identification information or trade secrets, within the data object to be labeled, based on preset sensitive field determination rules. These fields are then labeled with corresponding sensitive types, such as ID card numbers, bank card numbers, or business data. Field labeling information is a collection of information recording the sensitivity level of each field. For example, sensitivity levels can include Sensitive Level 1, Sensitive Level 2, General Level 1, and General Level 2. In the priority ranking rule for sensitivity levels, Sensitive Level 1 > Sensitive Level 2 > General Level 1 > General Level 2, meaning that Sensitive Level 1 has the highest sensitivity level, and General Level 2 has the lowest sensitivity level.
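
The priority ranking Sensitive Level 1 > Sensitive Level 2 > General Level 1 > General Level 2 can be encoded so that ordinary comparisons follow it. The sketch below uses an `IntEnum`; the member names and integer values are assumptions, and only their relative order comes from the text.

```python
from enum import IntEnum


class Sensitivity(IntEnum):
    # Larger value means higher priority; the integers themselves are arbitrary.
    GENERAL_2 = 1
    GENERAL_1 = 2
    SENSITIVE_2 = 3
    SENSITIVE_1 = 4


observed = [Sensitivity.GENERAL_1, Sensitivity.SENSITIVE_2, Sensitivity.GENERAL_2]
ranked = sorted(observed, reverse=True)  # most sensitive first
```

With this encoding, `max()` over a collection of levels returns the most sensitive one without any custom comparison code.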

[0037] It should be understood that when the object type of the data object to be labeled is determined to be a view or bitmap, the structure and statements of the data object are first parsed using a syntax tree to extract each field and its relationship information. Then, based on the preset sensitive field judgment rules, the sensitivity level of each field is identified one by one, and the labeling operation is completed for the identified sensitive fields. Finally, the field labeling information of the data object is summarized.

[0038] In one feasible implementation, step S20 may include steps S21 to S24: Step S21: When the object type is a view or bitmap, obtain the data definition language of the data object to be labeled; It should be noted that Data Definition Language (DDL) is a specialized language used to define the structure of database objects. It can fully describe the creation rules and field processing relationships of data objects such as views and bitmaps. In OLAP scenarios, structural operations such as creating, modifying, and deleting views and bitmaps are all implemented using DDL. The DDL includes information such as the field composition of views and bitmaps, the relationships between fields, aggregation processing rules, and data source pointers. The complete content of the DDL is stored in the metadata database of the OLAP platform, and each view or bitmap corresponds to its own DDL statements.

[0039] It should be understood that when the object type of the data object to be tagged is determined to be a view or bitmap, the data definition language corresponding to the data object is retrieved from the corresponding database storage location to ensure that the retrieved content includes all the structure and processing-related information of the data object.

[0040] Step S22: Generate a syntax tree based on the data definition language; It should be noted that a syntax tree is a tree-like logical structure formed by decomposing structured statements in a data definition language after lexical and syntactic analysis. It presents the hierarchical relationships and logical connections between syntactic units in a statement, intuitively reflecting the processing and tracing paths of fields in a data object. Each node in the syntax tree corresponds to a syntactic unit in the statement, and the connections between nodes correspond to the logical relationships between units.

[0041] It should be understood that, taking the acquired data definition language as input, a dedicated tool library such as the Antlr4 parsing library performs lexical and syntactic analysis on it, arranges the various syntactic units in the statement hierarchically according to the actual logical relationship, generates the corresponding syntax tree, and realizes the visual decomposition of the logic of the data definition language.

[0042] Step S23: Parse the syntax tree to obtain the source field processing logic and source field information; It should be noted that the source field processing logic refers to the processing rules and calculation methods generated from the upstream source fields of the aggregated fields of the data object to be tagged. It includes various operation requirements such as field association, calculation, and filtering, and serves as the basis for tracing the source of the aggregated field data.

[0043] In addition, source field information is the various attribute data of the original fields that provide the data basis for the data object to be tagged from upstream. It includes field name, sensitivity level, data type, etc., and is the direct basis for determining the sensitivity of the aggregated fields of the data object to be tagged.

[0044] It should be understood that the generated syntax tree is traversed in all dimensions and analyzed layer by layer according to the preset parsing rules. The source field processing logic corresponding to each aggregation field is extracted from the tree structure. At the same time, the source fields participating in all aggregation operations are sorted out and their complete source field information is obtained.
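
The layer-by-layer traversal can be illustrated on a small hand-built tree. This sketch does not use Antlr4 or any real SQL grammar: the `Node` class, its `kind` labels, and the example expression `sum(t.amount) + t.fee` are hypothetical stand-ins for whatever a real parser would produce.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str                      # e.g. "select_item", "binary_op", "function", "column"
    value: str = ""
    children: list["Node"] = field(default_factory=list)


def source_fields(node: Node) -> list[str]:
    """Depth-first walk collecting every column referenced below this node."""
    if node.kind == "column":
        return [node.value]
    found: list[str] = []
    for child in node.children:
        found.extend(source_fields(child))
    return found


def processing_logic(node: Node) -> list[str]:
    """Collect the operators and functions applied on the way to the source fields."""
    ops = [node.value] if node.kind in ("binary_op", "function") else []
    for child in node.children:
        ops.extend(processing_logic(child))
    return ops


# Output field "total" defined as: sum(t.amount) + t.fee
total = Node("select_item", "total", [
    Node("binary_op", "+", [
        Node("function", "sum", [Node("column", "t.amount")]),
        Node("column", "t.fee"),
    ]),
])
fields = source_fields(total)
logic = processing_logic(total)
```

Here `fields` holds the upstream columns feeding the aggregated field and `logic` holds the operations applied to them, which correspond to the source field information and source field processing logic respectively.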

[0045] Step S24: Based on the source field processing logic and the source field information, complete the labeling of sensitive fields to obtain field labeling information.

[0046] It should be understood that, based on the processing logic of the extracted source fields, the unique association between each aggregate field and the corresponding source field in the data object to be labeled is determined. Then, based on the sensitive attributes in the source field information, the sensitivity level of each aggregate field is derived according to the preset rules. Sensitive fields are labeled one by one for all fields, and finally, the labeling results of each field are summarized to obtain complete field labeling information.

[0047] For example, refer to Figure 2, which is a schematic diagram of the view field labeling process provided in Embodiment 1 of the data object sensitivity labeling method of this application. As shown in Figure 2, the DDL statement is built into a structured Abstract Syntax Tree (AST). Within the tree structure, the Statement Context related to the select (query) action and the Clause Context related to the where (filter) action are identified. Furthermore, the structure is broken down into elements containing the variable "a" and the wildcard "a". The system first creates a Terminal Node and an ExprContext node containing the constant "1". Then, it calls the PostgreSQLParserVisitor component, which performs deep semantic extraction on the syntax tree by executing the overridden visitStatementContext, visitClauseContext, visitExprContext, and visitTerminal methods. The extracted logic then proceeds to the subsequent traversal and field labeling stage. Based on preset sensitivity labeling rules, fields A and B, both at General Level 1, are combined using join and Expr+ (Expression Plus) logic. During this process, field C, at Sensitive Level 1, is introduced. Based on the highest-level inheritance logic, the system automatically deduces the inherited level, confirms that the final target field is at Sensitive Level 1, and outputs complete field labeling information. This achieves fully automated tracing and classification of field sensitivity from the original script to the final processed field.
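
The highest-level inheritance in this example reduces to taking the maximum level over the source fields that feed the target field. The sketch below reproduces the A, B, C case; the rank table and the function name `derive_level` are assumptions, while the level names and the outcome follow the text.

```python
# Priority order from the description; numeric ranks are assumed.
RANK = {
    "General Level 2": 0,
    "General Level 1": 1,
    "Sensitive Level 2": 2,
    "Sensitive Level 1": 3,
}


def derive_level(source_levels: list[str]) -> str:
    """The derived field inherits the most sensitive level among its sources."""
    return max(source_levels, key=RANK.__getitem__)


source_labels = {"A": "General Level 1", "B": "General Level 1", "C": "Sensitive Level 1"}
target_level = derive_level([source_labels[name] for name in ("A", "B", "C")])
```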

[0048] Step S30: When the object type is a data table, inherit the source table labeling information of the data object to be labeled according to the configuration information, complete the labeling of sensitive fields, and obtain the field labeling information; It should be noted that the configuration information is a pre-defined set of rules used to guide data objects of the data table class to perform tagging operations. It includes the relationship between the data table and the corresponding source table, the inheritance rules of the tagging information of the source table, and other content.

[0049] Additionally, source table labeling information refers to the completed field labeling information of the original data table upon which the data table to be labeled depends, including the sensitivity level of each field in the source table. Field labeling information is the sensitive attribute annotation information for a single field in the data object to be labeled, and it serves as the basis for determining the sensitivity level of the overall data object.

[0050] It should be understood that when the object type of the data object to be tagged is a data table, the preset configuration information is first retrieved, and the source table information and tagging information inheritance rules corresponding to the data table are found from it. Then, the tagging information already completed in the source table is inherited according to the rules. Based on the inherited information, the sensitivity tagging of each field of the current data table is completed, and finally the field tagging information of the data table is generated.
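
Once the field mapping is known, inheritance for a data table is a lookup of each target field's source field in the source table's existing labels. In the sketch below, the dictionary shapes, the field names, and the use of `None` for fields without an upstream label are hypothetical; the text does not specify how unmapped fields are handled.

```python
from typing import Optional


def inherit_labels(field_mapping: dict[str, str],
                   source_labels: dict[str, str]) -> dict[str, Optional[str]]:
    """Copy each source field's level onto the target field that maps to it."""
    return {
        target_field: source_labels.get(source_field)  # None marks a field needing review (assumed)
        for target_field, source_field in field_mapping.items()
    }


# target field -> source field, as resolved from configuration information
mapping = {"id_number": "id_no", "open_date": "open_dt", "remark": "memo"}
# labels already completed on the source table
src = {"id_no": "Sensitive Level 1", "open_dt": "General Level 2"}
table_labels = inherit_labels(mapping, src)
```

Because the labels are copied rather than re-derived, the downstream table stays consistent with its upstream source by construction.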

[0051] In one feasible implementation, step S30 may include steps S31 to S33: Step S31: When the object type is a data table, determine the source table and source table field mapping relationship of the data object to be tagged according to the configuration information; It should be noted that the configuration information consists of various rules and related data stored in the OLAP platform development configuration library and ETL configuration library. This includes the association rules between source table information and landing files, as well as the association rules between landing files and fields loaded into the database. These rules are used to parse the relationship between data tables and source tables.

[0052] In addition, ETL tools are specialized tools for data extraction, transformation, and loading. Kettle is a typical example; its configuration information records the entire chain of data flow relationships and is the basis for determining the source table and the source table field mapping relationship.

[0053] Additionally, the "landing file" refers to the intermediate file generated between extracting the source table information and loading the fields into the database, as recorded in the configuration information of the Kettle ETL tool. This file serves as an intermediate carrier: it holds the source table information after initial processing so that it can be loaded into the target OLAP platform table. The source table extraction, the landing file, and the database field loading relationships together describe how field data flows from the upstream source table to the data table to be labeled, as recorded in the ETL tool configuration information.

[0054] Additionally, the source table is the upstream original data table upon which the data table to be tagged depends. It is the original data source for all fields in the data table to be tagged, and its field attributes are the basis for determining the sensitivity of fields in the data table to be tagged. The field mapping relationship of the source table is a one-to-one correspondence between each field in the data table to be tagged and the corresponding field in the upstream source table, determining the original data attribution of each field in the data table to be tagged.

[0055] It should be understood that when the object type of the data object to be tagged is determined to be a data table, the pre-set configuration information is retrieved, the upstream source table corresponding to the data table is parsed from the configuration information, and then the matching relationship between each field of the data table to be tagged and the field of the source table is sorted out according to the field correspondence criteria in the configuration information, so as to determine the complete source table and source table field mapping relationship.

[0056] In practice, when determining the mapping relationship between the source table and its fields based on the configuration information, the configuration information of ETL tools such as Kettle contained in the configuration information is parsed. The association rules between the extracted table information and the landing file, and the association rules between the landing file and the fields loaded into the database are extracted from it. The complete mapping relationship between the source table and its fields is determined by combining the two types of association rules.
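The combination of the two types of association rules can be sketched as a join on the landing-file column. This is a minimal illustration, not the Kettle configuration format: the function name `compose_field_mapping`, the column names `col_1` to `col_3`, and the target field names are all hypothetical.

```python
from typing import Dict


def compose_field_mapping(
    source_to_file: Dict[str, str],
    file_to_target: Dict[str, str],
) -> Dict[str, str]:
    """Join the two rule sets on the landing-file column.

    Returns {target_field: source_field}.
    """
    # Invert the extraction rules so each landing-file column points back
    # to the source field it was extracted from.
    file_to_source = {file_col: src for src, file_col in source_to_file.items()}
    mapping: Dict[str, str] = {}
    for file_col, target_field in file_to_target.items():
        source_field = file_to_source.get(file_col)
        if source_field is not None:  # skip columns with no known upstream field
            mapping[target_field] = source_field
    return mapping


# Hypothetical rules for a three-field table (names are assumptions).
source_to_file = {"A": "col_3", "B": "col_2", "C": "col_1"}            # source table -> landing file
file_to_target = {"col_2": "tgt_b", "col_1": "tgt_c", "col_3": "tgt_a"}  # landing file -> loaded fields
field_mapping = compose_field_mapping(source_to_file, file_to_target)
```

Because the join is keyed on the landing-file column, a change of field order between the source table and the landing file does not affect the resulting mapping.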

[0057] Step S32: Determine the source table tagging information based on the source table and the source table field mapping relationship. It should be understood that after obtaining the source table and the source table field mapping relationship corresponding to the data table to be tagged, the complete tagging information already stored for the source table is located first. The source table field tagging information corresponding to the fields of the data table to be tagged is then filtered out according to the field mapping relationship, so as to determine the source table tagging information applicable to the data table to be tagged.

[0058] Step S33: Inherit the tagging information from the source table, complete the tagging of sensitive fields, and obtain the field tagging information.

[0059] It should be noted that sensitive field labeling is the operation of labeling each field of the data table to be labeled with the corresponding sensitive attributes. It is a consistent labeling process completed based on the labeling information of the upstream source table, ensuring the consistency of sensitive attributes of upstream and downstream fields.

[0060] It should be understood that the filtered source table labeling information is assigned one-to-one to each field of the data table to be labeled according to the field mapping relationship of the source table, directly inheriting the sensitivity level of the corresponding field in the source table, completing the sensitivity labeling of all fields of the data object to be labeled, summarizing the labeling results of all fields, and forming the field labeling information of the data table to be labeled.

[0061] For example, please refer to Figure 3, which is a schematic diagram of the table field labeling process provided in Embodiment 1 of the data object sensitivity labeling method of this application. As shown in Figure 3, the field information in the warehouse table represents the initial source table information, including fields A, B, and C. Field A has a sensitivity label of "General Level 1," meaning that the sensitivity level of field A extracted by Kettle from the source table is General Level 1. This field information is processed by Kettle and written to a landing file, represented in the diagram as the first standard file information. The field order in the first standard file information is field C, field B, then field A, reflecting the association rules between the extracted source table information and the landing file. Based on the association rules between the landing file and the loaded fields, the information in the landing file is further transformed to obtain the second standard file information, in which the field order is field B, field C, then field A. The configuration information of the ETL tool Kettle is retrieved and parsed. Based on the mapping relationship between the source table and its fields, the sensitivity level of field A is determined to be General Level 1. This General Level 1 sensitivity level is then applied to field A of the data object to be tagged, completing the sensitivity field tagging and obtaining the field tagging information.
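The inheritance in steps S32 and S33 can be sketched as a lookup through the field mapping. This is an illustrative sketch only: `inherit_field_labels` is a hypothetical helper, and the labels shown for fields B and C are assumed values, since the example above only states the level of field A.

```python
from typing import Dict, Optional


def inherit_field_labels(
    field_mapping: Dict[str, str],
    source_labels: Dict[str, str],
) -> Dict[str, Optional[str]]:
    """Copy each source field's level onto the mapped target field.

    A target field whose source field has no label gets None, so that
    it can be picked up by a later manual or assisted labeling pass.
    """
    return {
        target_field: source_labels.get(source_field)
        for target_field, source_field in field_mapping.items()
    }


# Field A carries General Level 1 as in the example; B and C are assumed values.
source_labels = {
    "A": "General Level 1",
    "B": "General Level 2",
    "C": "Sensitive Level 2",
}
# Target field order B, C, A; each target field maps to the same-named source field.
field_mapping = {"B": "B", "C": "C", "A": "A"}
field_labels = inherit_field_labels(field_mapping, source_labels)
```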

[0062] Step S40: Determine the table-level sensitivity level of the data object to be tagged based on the field tagging information, and obtain the data object sensitivity tagging result.

[0063] It should be noted that the table-level sensitivity level is a level used to characterize the sensitivity of the entire data object, based on a comprehensive assessment of the sensitivity of all fields in the data object. Examples include Sensitive Level 1, Sensitive Level 2, General Level 1, and General Level 2.

[0064] In addition, the data object sensitivity labeling result is a complete set of labeling information, including the field labeling information and table-level sensitivity level of the data object to be labeled, which is the final output of the automatic labeling process.

[0065] It should be understood that after obtaining the field labeling information of the data object to be labeled, the sensitivity level of the entire data object will be determined according to the priority sorting rules of the sensitivity level, and its corresponding table-level sensitivity level will be determined. After integrating the field labeling information and the table-level sensitivity level, the final data object sensitivity labeling result will be formed.

[0066] In one feasible implementation, step S40 may include steps S41 to S44: Step S41: Traverse the field list of the data object to be labeled and obtain the current field; It should be noted that the field list is a complete collection of all fields in the data object to be tagged, including all field names, field types and other basic attributes of the data object, and is the basic object for performing field traversal operations.

[0067] Additionally, the current field is the single field currently undergoing sensitivity level identification during the traversal of the field list. It is the specific target of the traversal operation and will switch to the next field in the list as the traversal progresses.

[0068] It should be understood that the operation object is the complete list of fields corresponding to the data object to be labeled. The fields in the list are identified one by one according to the preset traversal order, and the first field in the list that has not been traversed is selected as the current field.

[0069] Step S42: Determine the field sensitivity level of the current field based on the field tagging information; It should be noted that the field sensitivity level is a level identifier used to characterize the sensitivity of a single field in the data object to be tagged, and it has a defined priority ranking rule.

[0070] It should be understood that the process involves retrieving the field labeling information of the completed data object to be labeled, accurately matching the labeling content corresponding to the current field, determining the specific field sensitivity level based on the labeling content, and thus completing the sensitivity level determination for a single field.

[0071] Step S43: When the field sensitivity level is not the target sensitivity level, obtain the updated current field. It should be noted that the target sensitivity level is the pre-defined highest-level sensitivity identifier used to determine the table-level sensitivity level of the data object to be tagged. For example, under the preset priority sorting rule Sensitive Level 1 > Sensitive Level 2 > General Level 1 > General Level 2, Sensitive Level 1 is the target sensitivity level, and the overall table-level sensitivity level can be determined directly as soon as a field at this level is encountered.

[0072] Additionally, the updated current field is the next field whose sensitivity level is to be determined, selected from the field list in traversal order when the sensitivity level of the current field is not the target sensitivity level.

[0073] It should be understood that the determined sensitivity level of the current field is compared with the preset target sensitivity level. If the two are inconsistent, the next field is selected from the field list as the updated current field, and the subsequent sensitivity level determination is continued.

[0074] Step S44: When the field sensitivity level is the target sensitivity level, determine the table-level sensitivity level of the data object to be labeled as the target sensitivity level, and obtain the data object sensitivity labeling result.

[0075] It should be noted that the table-level sensitivity level is a level identifier used to characterize the overall sensitivity of the entire data object to be tagged. It is derived from the field sensitivity level and directly reflects the overall data security risk level of the data object.

[0076] It should be understood that if the current field's sensitivity level is consistent with the preset target sensitivity level, then the target sensitivity level is directly determined as the table-level sensitivity level of the data object to be tagged. The table-level sensitivity level is then integrated with the previously completed field tagging information to form a complete data object sensitivity tagging result.
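Steps S41 to S44 can be sketched as a traversal with an early exit. The text specifies the outcome only when a field at the target level is found; the fallback used here (return the highest level seen, following the priority rule) is an assumption, as are the names `RANK`, `TARGET_LEVEL`, and `table_level`.

```python
from typing import Dict, Optional

# Priority rule from the description: earlier in the list means more sensitive.
PRIORITY = ["Sensitive Level 1", "Sensitive Level 2", "General Level 1", "General Level 2"]
RANK = {level: i for i, level in enumerate(PRIORITY)}
TARGET_LEVEL = PRIORITY[0]


def table_level(field_labels: Dict[str, str]) -> Optional[str]:
    """Traverse the fields and stop as soon as the target level is hit."""
    best: Optional[str] = None
    for _field, level in field_labels.items():
        if level == TARGET_LEVEL:
            return TARGET_LEVEL  # S44: no need to look at the remaining fields
        # Assumed fallback: remember the most sensitive level seen so far.
        if best is None or RANK[level] < RANK[best]:
            best = level
    return best  # None for an empty field list
```

The early exit matters on wide OLAP tables: once one field at the target level is found, the remaining fields cannot change the table-level result.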

[0077] Additionally, the derivation report is a document that records information related to the most sensitive fields in the data object to be tagged. It includes core content such as field name, location, and tagging basis, and serves as a traceable basis for aggregate query permission control and compliance auditing. The metadata database is a storage medium used to store the traceable documents and data produced during the tagging process, specifically for retaining derivation and verification records related to tagging.

[0078] It should be understood that after the table-level sensitivity level of the data object to be tagged is determined as the target sensitivity level, the name, location, and tagging basis of all fields with the highest sensitivity level are recorded immediately. Based on this information, a derivation report is generated and stored in the metadata database to provide traceable data support for subsequent operations such as aggregate query permission control and compliance auditing.
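One possible shape for the derivation report is sketched below. The record layout, the `FieldRecord` and `build_derivation_report` names, and the JSON serialization are assumptions for illustration; the description only requires that field name, location, and tagging basis be recorded and stored in the metadata database.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, List


@dataclass
class FieldRecord:
    name: str
    location: str  # e.g. "table.column"; the format is an assumption
    basis: str     # why the field carries this level
    level: str


def build_derivation_report(table: str, table_level: str, fields: List[FieldRecord]) -> Dict:
    """Keep only the fields whose level equals the table-level result."""
    top = [asdict(f) for f in fields if f.level == table_level]
    return {"table": table, "table_level": table_level, "most_sensitive_fields": top}


# Hypothetical table and fields.
fields = [
    FieldRecord("id_no", "cust_info.id_no", "inherited from source table", "Sensitive Level 1"),
    FieldRecord("branch", "cust_info.branch", "inherited from source table", "General Level 2"),
]
report = build_derivation_report("cust_info", "Sensitive Level 1", fields)
report_json = json.dumps(report, ensure_ascii=False)  # ready to store in a metadata table
```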

[0079] For example, please refer to Figure 4, which is a schematic diagram of the automatic labeling process provided in Embodiment 1 of the data object sensitivity labeling method of this application. As shown in Figure 4, after obtaining the object to be tagged, the object type is determined. If the object type is a view or a BitMap table, the DDL is retrieved from the configuration library, and a syntax tree is generated based on the DDL. Field sensitivity is then parsed using the syntax tree to obtain field tagging information and complete field-level tagging. If the object type is a table, ETL information is retrieved and parsed from the ETL configuration library to obtain the source and target table field mapping relationships. Simultaneously, source table tagging information is retrieved from the internal tagging library, and then the source table field tagging information is inherited to complete field-level tagging. After field-level tagging is completed, field-level merging is performed to generate table-level tagging information, resulting in the data object sensitivity tagging result.
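The type-based branching in the automatic labeling process can be sketched as a small dispatcher. The two labeling strategies are passed in as callables because their internals (DDL syntax tree parsing, ETL configuration parsing) are outside this sketch; `label_fields` and the type strings are hypothetical names.

```python
from typing import Callable, Dict

# A labeler takes an object name and returns {field_name: sensitivity_level}.
Labeler = Callable[[str], Dict[str, str]]


def label_fields(
    object_name: str,
    object_type: str,
    parse_ddl_labeler: Labeler,
    inherit_labeler: Labeler,
) -> Dict[str, str]:
    """Route a data object to the labeling strategy for its type."""
    kind = object_type.strip().lower()
    if kind in ("view", "bitmap"):
        # Views and bitmaps: derive field levels from the DDL syntax tree.
        return parse_ddl_labeler(object_name)
    if kind == "table":
        # Data tables: inherit field levels from the upstream source table.
        return inherit_labeler(object_name)
    raise ValueError(f"unsupported object type: {object_type!r}")
```

The field labels returned by either branch then feed the same table-level merge step.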

[0080] This embodiment provides a data object sensitivity labeling method. By first responding to an automatic labeling request and obtaining the object type of the data object to be labeled, it can determine differentiated labeling strategies for different types of data objects, such as views, bitmaps, and data tables, to avoid the problem of poor adaptability of a single labeling logic. For views or bitmaps, it performs syntax tree parsing to complete field labeling, and for data tables, it inherits the labeling information from the source table based on configuration information to complete field labeling, which can accurately identify sensitive fields in different types of data objects. By combining the field labeling information to determine the table-level sensitivity level, it can complete the sensitivity labeling of data objects from both field and overall dimensions, improving the comprehensiveness and accuracy of the labeling results. While ensuring the accuracy of labeling, it improves the efficiency of data object sensitivity labeling and effectively adapts to the diverse data object sensitivity labeling needs in OLAP scenarios.

[0081] Based on the first embodiment of this application, in the second embodiment of this application, content that is the same as or similar to that of the first embodiment can be found in the description above and is not repeated here. On this basis, referring to Figure 5, the data object sensitivity labeling method further includes steps A10 to A40. Step A10: Respond to the auxiliary labeling request and obtain the field names and data volume statistics of the data object to be labeled. It should be noted that the auxiliary labeling request is an instruction that triggers the AI-assisted labeling process. It can be initiated by the user or triggered by preset conditions and is used to request the generation of sensitivity labeling suggestions for a specified data object.

[0082] In addition, field names are the dedicated names that identify each field in the data object to be labeled. They distinguish the fields from one another, and their semantic features are an important basis for determining the sensitivity of fields.

[0083] In addition, data volume statistics are a collection of information obtained by statistically analyzing the scale and update status of the data corresponding to each field, including indicators such as the total number of rows in the field, the percentage of non-null values, and the data update frequency.

[0084] It should be understood that after receiving an auxiliary labeling request, the data object to be labeled specified in the request is first parsed, and then the field names corresponding to all fields are extracted from the metadata of the data object. At the same time, statistical analysis is carried out on the data of each field to obtain the data volume statistics of the data object to be labeled.

[0085] Step A20: Determine sample data from the data object to be labeled based on the regularization matching result of the field name; It should be noted that regularization matching is a process of standardizing the field name and then comparing it according to preset sensitive word matching rules. Standardization can eliminate format differences in the name and improve the accuracy of the matching results.

[0086] In addition, sample data are representative data segments extracted from the fields of the data object to be labeled. They can intuitively reflect the actual data characteristics of the fields and serve as the data basis for the labeling model to determine the sensitivity of the fields.

[0087] It should be understood that the extracted field names are first processed by regularization and standardization, and then the field names are matched according to the preset rules to obtain the regularization matching results. Based on the regularization matching results, the dataset corresponding to the field in the data object to be labeled is located, and representative data segments are extracted from the dataset to determine the sample data corresponding to each field.
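A minimal sketch of step A20, assuming that "regularization matching" means normalizing the field name and then applying regular-expression rules. The normalization steps, the pattern table `SENSITIVE_PATTERNS`, and the sampling policy (first `n` non-null values) are all illustrative assumptions rather than rules given in the text.

```python
import re
from typing import Dict, List

# Hypothetical sensitive-word rules, applied to the normalized name.
SENSITIVE_PATTERNS = {
    "id_card": re.compile(r"id_?card|cert_?no"),
    "phone": re.compile(r"phone|mobile"),
}


def normalize(name: str) -> str:
    """Standardize a field name: split camelCase, unify separators, lowercase."""
    s = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)  # CustPhone -> Cust_Phone
    s = re.sub(r"[^A-Za-z0-9]+", "_", s)              # any separator run -> "_"
    return s.strip("_").lower()


def match_sensitive(name: str) -> List[str]:
    """Return the rule names whose pattern occurs in the normalized field name."""
    normalized = normalize(name)
    return [rule for rule, pattern in SENSITIVE_PATTERNS.items() if pattern.search(normalized)]


def pick_samples(rows: List[Dict[str, object]], field: str, n: int) -> List[object]:
    """Take the first n non-null values of a field (assumed sampling policy)."""
    values = [row[field] for row in rows if row.get(field) is not None]
    return values[:n]
```

Normalizing before matching lets one rule cover `CustPhoneNo`, `cust_phone_no`, and `CUST-PHONE-NO` alike.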

[0088] Furthermore, Natural Language Processing (NLP) can be used to analyze the semantic features of field names. Semantic vectors can be extracted through operations such as word segmentation, part-of-speech tagging, and sensitive word matching to uncover the semantic features of field names. Semantic vectors are numerical sets that quantify the semantic features of field names, and they can intuitively reflect the semantic attributes of field names.

[0089] It should be understood that before determining the sample data based on the regularization matching results of the field names, semantic feature analysis of the field names will be performed through natural language processing. First, the field names will be segmented, part-of-speech tagging and sensitive word matching will be performed. Then, semantic vectors will be extracted based on the analysis results to provide a semantic basis for subsequent sensitive weight determination and auxiliary labeling.

[0090] Step A30: Determine the sensitivity weight of the corresponding field name based on the data volume statistics information; It should be noted that the sensitivity weight is a numerical value assigned to each field name based on data volume statistics to characterize the degree of influence of the field's quantitative characteristics on the sensitivity level determination. The higher the weight, the greater the reference value of the field's quantitative characteristics for sensitivity determination.

[0091] It should be understood that, first, the statistical indicators in the data volume statistics information are determined, and according to the preset weight allocation rules, each field is quantified and assigned a value based on the specific numerical characteristics of each indicator. After integrating and calculating the assignment results, the sensitivity weight of the corresponding field name is determined.

[0092] In one feasible implementation, step A30 may include steps A31 to A32: Step A31: Determine the number of records, the percentage of non-null values, and the data update frequency for each field name based on the data volume statistics. It should be noted that the number of records for a field is the actual number of records stored in the dataset corresponding to each field name. It directly reflects the data size of that field and is an indicator for measuring the data volume in an OLAP scenario.

[0093] In addition, the percentage of non-null values is the ratio of the number of records containing valid data in each field to the total number of records in that field. It reflects the completeness of the field data: the higher the ratio, the more valuable the field is for actual data utilization.

[0094] In addition, the data update frequency is the number of times data is added, modified, or deleted in each field within a preset time period. It reflects the activity level of the field data and is an important basis for judging the business importance of the field.

[0095] It should be understood that the data volume statistics corresponding to the data object to be labeled are retrieved and matched against the field names one by one. The specific values of the number of field records, the percentage of non-null values, and the data update frequency under each field name are extracted from the data volume statistics, completing the extraction of the quantitative indicators of each field.

[0096] Step A32: Determine the sensitivity weight of the corresponding field name based on the number of records in the field, the proportion of non-null values, and the data update frequency.

[0097] It should be understood that, according to the preset weight calculation rules, corresponding quantification coefficients are assigned to the number of field records, the proportion of non-null values, and the data update frequency. The value of each indicator is multiplied by its quantification coefficient, and a comprehensive calculation is performed on the results. Based on the final calculation result, a sensitivity weight is determined for each field name, thereby quantifying the sensitivity weight. Specifically, fields with large data volume and frequent updates are assigned higher sensitivity weights: the larger the number of field records under a field name, the larger the data volume and the higher the sensitivity weight; the larger the proportion of non-null values, the larger the effective data volume and the higher the sensitivity weight; and the higher the data update frequency, the higher the sensitivity weight.
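The weight calculation in step A32 can be sketched as a weighted sum. The text fixes only the direction of each effect (more records, a higher non-null proportion, and more frequent updates all raise the weight); the coefficient values, the normalization caps, and the names below are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class FieldStats:
    row_count: int          # number of records for the field
    non_null_ratio: float   # share of non-null values, in [0, 1]
    updates_per_day: float  # update frequency over the preset period


# Hypothetical quantification coefficients and normalization caps.
COEFF = {"rows": 0.4, "non_null": 0.3, "updates": 0.3}
MAX_ROWS = 1_000_000_000
MAX_UPDATES = 1000.0


def sensitivity_weight(stats: FieldStats) -> float:
    """Weighted sum of the three indicators, each scaled to [0, 1]."""
    rows = min(stats.row_count / MAX_ROWS, 1.0)
    updates = min(stats.updates_per_day / MAX_UPDATES, 1.0)
    return (
        COEFF["rows"] * rows
        + COEFF["non_null"] * stats.non_null_ratio
        + COEFF["updates"] * updates
    )
```

Scaling each indicator to the same range before weighting keeps the raw row count, which can be many orders of magnitude larger than the other two values, from dominating the result.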

[0098] Step A40: Input the sample data and the sensitivity weights into the auxiliary labeling model to obtain the auxiliary labeling sensitivity level and auxiliary labeling confidence level.

[0099] It should be noted that the auxiliary labeling model is an artificial intelligence model specifically trained for the data characteristics of OLAP scenarios. It integrates the analysis logic of field semantic features and data volume features, and can output sensitive labeling suggestions with confidence level judgment.

[0100] Additionally, the auxiliary labeling sensitivity level is the sensitivity level determined by the auxiliary labeling model based on sample data and sensitivity weights for the field. It is a suggested result of artificial intelligence-assisted labeling and includes different levels of sensitivity labels.

[0101] In addition, the auxiliary labeling confidence level is a numerical value that determines the reliability of the auxiliary labeling model's output auxiliary labeling sensitivity level. It directly reflects the credibility of the labeling suggestions and can be divided into different confidence intervals based on the numerical value.

[0102] It should be understood that the sample data and sensitivity weights corresponding to each field are first adapted to the data format to meet the input requirements of the auxiliary labeling model. Then, the adapted sample data and sensitivity weights are input into the trained auxiliary labeling model, which performs feature analysis and sensitivity level determination, and finally outputs the auxiliary labeling sensitivity level and auxiliary labeling confidence level corresponding to each field.

[0103] It should be noted that the auxiliary labeling confidence level classification rule is a pre-defined criterion used to determine the reliability of the auxiliary labeling suggestions. It divides the confidence level into tiers by numerical range and serves as the basis for deciding whether additional samples need to be collected. When the auxiliary labeling confidence level is less than or equal to the preset labeling confidence level, the amount of sample data collected is increased, and the process returns to the step of determining sample data from the data object to be labeled based on the regularization matching result of the field name to obtain updated sample data, until the auxiliary labeling confidence level is greater than the preset labeling confidence level.

[0104] In practice, after obtaining the confidence level of the auxiliary labeling, its credibility level can be determined according to the grading rules of the auxiliary labeling confidence level. A confidence level of ≥80% is a high credibility suggestion, 60%-79% is a medium credibility suggestion, and <60% is a low credibility suggestion. If it is determined to be a low credibility suggestion, the number of samples collected for that field will be increased immediately until the confidence level of the auxiliary labeling is raised to above 60%, that is, 60% is the preset labeling confidence level.
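The grading rule and the resampling loop can be sketched as follows. The thresholds 80% and 60% come from the text; the `predict` and `draw_samples` callables, the step size, and the `max_samples` cap are assumptions (the text does not state an upper bound, so the cap is only there to guarantee termination).

```python
from typing import Callable, List, Tuple


def credibility(confidence: float) -> str:
    """Map a confidence value in [0, 1] to the three tiers in the text."""
    if confidence >= 0.80:
        return "high"
    if confidence >= 0.60:
        return "medium"
    return "low"


def label_with_resampling(
    predict: Callable[[List[object]], Tuple[str, float]],
    draw_samples: Callable[[int], List[object]],
    start: int = 100,
    step: int = 100,
    max_samples: int = 1000,   # assumed safety cap, not from the text
    threshold: float = 0.60,   # preset labeling confidence level
) -> Tuple[str, float, int]:
    """Grow the sample until confidence exceeds the threshold or the cap is hit."""
    n = start
    while True:
        level, confidence = predict(draw_samples(n))
        if confidence > threshold or n >= max_samples:
            return level, confidence, n
        n += step
```

For example, with a stand-in model whose confidence grows with the sample size, the loop stops at the first sample size whose confidence is strictly above the threshold.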

[0105] Additionally, temporarily storing the auxiliary labeling sensitivity level and confidence level in a caching module keeps high-traffic queries from degrading the efficiency of the auxiliary labeling model and speeds up retrieval of labeling suggestions. Data change monitoring detects, in real time, changes in the state of the data objects to be labeled, including changes in table structure and data loading source, and triggers relabeling.

[0106] It should be understood that after obtaining the auxiliary labeling sensitivity level and auxiliary labeling confidence level, the results will be stored in the cache module. At the same time, the data object to be labeled will be detected in real time through data change monitoring. If a change in table structure or data loading source is detected, the contents in the cache module will be updated immediately, and the auxiliary labeling model will be notified to re-label the field and update the results.
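One simple way to approximate the cache-plus-change-detection behavior is to key each cached suggestion by a fingerprint of the table structure and loading source. This is a pull-based simplification of the real-time monitoring described above, and every name in it (`SuggestionCache`, `fingerprint`) is hypothetical.

```python
import hashlib
from typing import Callable, Dict, List, Tuple

Suggestion = Tuple[str, float]  # (auxiliary sensitivity level, confidence)


def fingerprint(columns: List[str], load_source: str) -> str:
    """Hash the table structure and loading source into a change marker."""
    payload = "|".join(columns) + "#" + load_source
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class SuggestionCache:
    def __init__(self) -> None:
        self._store: Dict[Tuple[str, str], Tuple[str, Suggestion]] = {}

    def get(self, table: str, field: str, marker: str,
            compute: Callable[[], Suggestion]) -> Suggestion:
        """Return the cached suggestion unless the marker changed."""
        key = (table, field)
        entry = self._store.get(key)
        if entry is not None and entry[0] == marker:
            return entry[1]                 # structure unchanged: serve from cache
        result = compute()                  # changed or missing: relabel
        self._store[key] = (marker, result)
        return result
```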

[0107] For example, please refer to Figure 6, which is a schematic diagram of the auxiliary labeling suggestion generation process provided in Embodiment 2 of the data object sensitivity labeling method of this application. As shown in Figure 6, in response to an auxiliary labeling request, the system obtains the data object to be labeled. First, it performs a null value check and data statistics to obtain the field names and corresponding data volume statistics. Then, it performs a regularization check on the field names to determine whether they meet the mandatory conditions. Next, based on the regularization matching results of the field names, it obtains sample data from the data objects to be labeled that meet the mandatory conditions. It then performs a regularization check on the sample data to determine whether it meets the first-level sensitive mandatory conditions. For field names that meet the first-level sensitive mandatory conditions, it directly outputs auxiliary labels with a confidence level greater than the preset labeling confidence level. For field names that do not meet the first-level sensitive mandatory conditions, the sensitivity weight of the corresponding field name is determined based on the data volume statistics. The sample data and sensitivity weights are input into the auxiliary labeling model (i.e., the large model) to identify the sensitivity level, thereby obtaining the auxiliary labeling sensitivity level and auxiliary labeling confidence level. It is then determined whether the auxiliary labeling confidence level is within a preset confidence interval. If it is not within the confidence interval, the amount of sample data collected is increased, and the process returns to the sample data acquisition step to re-collect sample data. If it is within the confidence interval, the auxiliary labeling sensitivity level is taken as the conclusion.

[0108] In one feasible implementation, after step A40, steps A50 to A60 may also be included: Step A50: Respond to the manual marking request and obtain the marking field name and corresponding manual marking sensitivity level of the data object to be marked; It should be noted that a manual labeling request is an instruction that triggers the user to manually perform data object sensitivity labeling operations. It can be generated by the user initiating an operation on a specified field on the front end, and is used to request manual labeling of the selected field.

[0109] Additionally, the labeling field name is a specific name selected by the user in the data object to be labeled for the field that needs to be manually labeled for sensitivity. It is an identifier that distinguishes the specific target of the manual labeling operation.

[0110] In addition, the manual labeling sensitivity level is a level label that users manually set for the selected labeling field name based on their business experience, representing the sensitivity of the field. It is the result of manual labeling operations.

[0111] It should be understood that upon receiving a manual marking request, the system first parses the data object to be marked specified in the request, then extracts all the marking field names selected by the user in this instance from the request, and simultaneously obtains the corresponding manual marking sensitivity level set by the user for each marking field name, thus completing the accurate collection of manual marking information.

[0112] It should be noted that the manual labeling batch operation mode is a batch processing method designed to improve the efficiency of manual labeling. It includes two types: batch selection based on conditions and batch labeling by importing files, adapting to the labeling needs of OLAP platforms with massive fields. The batch export of labeling information is an operation that integrates labeling-related information and generates files. The exported content includes traceability information and verification records, adapting to the needs of compliance auditing.

[0113] It should be understood that while responding to manual marking requests and obtaining the marking field names and manual marking sensitivity levels, batch manual marking operations are supported. Marking field names can be selected in batches according to conditions, and Excel or CSV files can be imported to complete batch marking. Batch export of marking information is also supported, and the exported files contain traceability information and verification records corresponding to the marking field names.
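Batch labeling from an imported CSV file can be sketched with the standard `csv` module. The column names `field_name` and `sensitivity_level` and the reject-by-line-number policy are assumptions; the text only states that Excel or CSV files can be imported.

```python
import csv
import io
from typing import Dict, List, Tuple

VALID_LEVELS = {"Sensitive Level 1", "Sensitive Level 2", "General Level 1", "General Level 2"}


def parse_batch_labels(csv_text: str) -> Tuple[Dict[str, str], List[int]]:
    """Parse an imported file into accepted labels and rejected line numbers.

    Assumes a header row with the columns field_name and sensitivity_level,
    and one record per line.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    accepted: Dict[str, str] = {}
    rejected: List[int] = []
    # The header is line 1, so the first data row is line 2.
    for line_no, row in enumerate(reader, start=2):
        field = (row.get("field_name") or "").strip()
        level = (row.get("sensitivity_level") or "").strip()
        if field and level in VALID_LEVELS:
            accepted[field] = level
        else:
            rejected.append(line_no)  # report back instead of silently dropping
    return accepted, rejected
```

Returning the rejected line numbers lets the front end show the user exactly which rows of the imported file need correction.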

[0114] Step A60: When the auxiliary marking confidence level is greater than the preset marking confidence level and the manual marking sensitivity level is less than the auxiliary marking sensitivity level, a warning notification is pushed based on the marking field name.

[0115] It should be noted that the preset labeling confidence level is a pre-set confidence threshold value used to determine whether the auxiliary labeling suggestions have high reference value, and it is one of the criteria for triggering the warning operation.

[0116] In addition, the early warning notification is a prompt message pushed when the risk of under-marking is detected in manual marking. It includes the marking field name, the difference in sensitivity level between manual and assisted marking, etc., and is used to remind relevant managers to check the marking operation.

[0117] It should be understood that, firstly, the auxiliary labeling confidence level output by the auxiliary labeling model is compared with the preset labeling confidence level. Then, according to the priority rules of sensitivity level, the manual labeling sensitivity level and the auxiliary labeling sensitivity level are compared hierarchically. If the conditions of the auxiliary labeling confidence level being greater than the preset labeling confidence level and the manual labeling sensitivity level being less than the auxiliary labeling sensitivity level are met simultaneously, a corresponding early warning notification is pushed to the designated management personnel, using the labeling field name as the identifier, prompting them to verify the manual labeling operation of that field.
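The two-part trigger condition of step A60 can be written as a single predicate. The priority order and the 60% preset confidence are taken from the examples in the text; the function and variable names are hypothetical.

```python
# Priority rule from the description: a lower rank number means more sensitive.
PRIORITY = ["Sensitive Level 1", "Sensitive Level 2", "General Level 1", "General Level 2"]
RANK = {level: i for i, level in enumerate(PRIORITY)}


def should_warn(
    manual_level: str,
    assisted_level: str,
    assisted_confidence: float,
    preset_confidence: float = 0.60,
) -> bool:
    """True when a trusted suggestion is more sensitive than the manual label."""
    trusted = assisted_confidence > preset_confidence
    # The manual label is "less than" the suggestion when it ranks lower in priority.
    under_marked = RANK[manual_level] > RANK[assisted_level]
    return trusted and under_marked
```

Both conditions must hold: an under-marked field does not raise a warning when the suggestion itself is not trusted, and an over-marked field never does.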

[0118] In addition, alert notifications can be sent to administrators through multiple channels, including platform in-app messages, administrator emails, and work SMS messages, ensuring that the notifications are received in a timely manner. The alert verification record is information that documents the verification process and results of the alert notification; it is important data for compliance audits and must be fully retained.

[0119] It should be understood that when pushing early warning notifications based on the name of the labeled field, multiple channels will be used to send them. The notification content will fully cover the table name to which the labeled field name belongs, the sensitivity level of manual labeling, the sensitivity level of assisted labeling, the confidence level of assisted labeling, the early warning priority, the labeler, and the labeling time. After the administrator completes the early warning verification, the verification process and results will be recorded in the early warning verification record and kept on file.

[0120] For example, please refer to Figure 7, which is a schematic diagram of the architecture provided for Embodiment 2 of the data object sensitivity labeling method of this application. As shown in Figure 7, the layered architecture of the data labeling system includes a presentation layer, a business logic layer, and a data access layer. The presentation layer provides user interaction through a front-end page, offering functions such as manual labeling, batch labeling, viewing labeling information, viewing AI-suggested labels, viewing field-level labels, batch exporting and importing labeling information, one-click AI-substitute labeling, table-level labeling verification, and labeling information modification. The business logic layer is the core processing part, providing backend services. It is subdivided into three modules: automatic labeling, manual labeling, and AI-assisted labeling. The automatic labeling module provides functions such as job link analysis, link labeling inheritance, syntax parsing, and field-level label merging. The manual labeling module provides functions such as batch labeling interfaces, user authentication, labeling information verification, and labeling information export. The AI-assisted labeling module provides functions such as labeling sample collection, labeling agents, labeling result caching, and field change checks. The data access layer is the underlying data storage and interface, connecting to OLAP databases such as ClickHouse and StarRocks, as well as knowledge bases such as TDSQL and Hadoop, to support data access for the upper-layer business logic.

[0121] Furthermore, in the presentation layer, the user manual labeling module receives manual labeling operations from users on data objects, allowing users to manually select and set sensitivity levels for tables, views, and fields within their permission scope. The user batch labeling module receives user operations to select data objects and sensitivity levels in batches, supporting multiple selection of objects (tables / views / fields) with the same sensitivity for one-time batch labeling. The labeling information viewing module provides a labeling information viewing interface, displaying the sensitivity labeling information of data objects. The AI suggestion labeling viewing module displays the AI labeling suggestion information and confidence levels generated by the AI-assisted labeling module. The field-level labeling viewing module receives user commands to view field-level labeling information, displaying the field-level sensitivity labeling results. The batch labeling information export module responds to export requests, generating export files containing labeling information, such as Excel / CSV files. The batch labeling information import module obtains batch import files uploaded by users and parses them to obtain labeling information. The AI one-click import labeling module receives user commands for one-click import of AI suggestions, using the AI suggestion results as manual labeling results. The table-level labeling verification module verifies the table-level labeling results obtained by merging field-level labels against preset rules or user settings, and displays inconsistent results.

[0122] In the business logic layer, the automatic tagging module analyzes the source information of data objects through job chain analysis, and analyzes its upstream data extraction and loading job chain; it inherits the tagging information of the upstream data warehouse based on the chain analysis results through chain tagging inheritance; it performs syntax parsing on the table creation statements of special tables such as views to obtain the field processing logic; and it merges the field-level tagging results through field-level tagging merging to obtain the table-level tagging results.

[0123] The manual marking module exposes a batch marking interface to receive batch marking requests; it uses user authentication to verify whether logged-in users have marking permissions, ensuring that marking records are not illegally tampered with; it uses marking information verification to check the difference between manual marking results and AI suggestions; and it uses marking information export to respond to export requests and generate an export file containing marking information.

[0124] The AI-assisted labeling module collects field names and sample data as samples for labeling; generates labeling suggestions using a trained model through a labeling agent; caches the AI suggestion results through labeling result caching; and monitors field changes through field change checks to trigger relabeling.

[0125] For example, please refer to Figure 8, which is a schematic diagram of the physical deployment logic provided for Embodiment 2 of the data object sensitivity labeling method of this application. As shown in Figure 8, in the data sensitivity labeling tool of the OLAP platform, the front-end service performs access control through the business network and the ACS container, responding to requests (user manual labeling requests, automatic labeling information query requests, and labeling assistance information query requests) and executing the corresponding labeling methods. For manual labeling, the operation is performed within the business network through the ACS container, and the labeling information is stored in the TDSQL database. For automatic labeling, the operation runs within the business network through the ACS container, and the labeling information is stored in the TDSQL database. The corresponding data is then transferred to the development configuration TDSQL and the job knowledge base TDSQL. For AI-assisted labeling, the operation is performed within the business network through the ACS container and connected to the labeling agent, which is also located within the business network to provide intelligent assistance. The automatic labeling and AI-assisted labeling processes operate on the OLAP platform databases, ClickHouse and StarRocks, which together form the basis for data storage and analysis.

[0126] This embodiment provides a data object sensitivity labeling method. By responding to auxiliary labeling requests, it extracts the field names and data volume statistics of the data objects to be labeled, collecting labeling features from both field semantics and data volume dimensions, providing a comprehensive basis for auxiliary labeling. Sample data is determined based on field name regularization matching, improving the targeting and accuracy of sample data selection. Sensitivity weights are assigned to fields according to data volume statistics, allowing sensitivity level determination to be combined with the actual scale and update characteristics of the data, better aligning with the data characteristics of OLAP scenarios. By fusing sample data and sensitivity weights through an auxiliary labeling model, it outputs labeling suggestions with confidence, achieving intelligent auxiliary labeling. This improves the labeling efficiency of massive data objects in OLAP scenarios, reduces the workload of manual labeling, and the output results with confidence provide a definite reference for manual review, effectively reducing the probability of missed or incorrect labeling.

[0127] It should be noted that the above examples are only for understanding this application and do not constitute a limitation on the data object sensitivity labeling method of this application. Any simple modifications based on this technical concept are within the protection scope of this application.

[0128] This application also provides a data object sensitivity labeling device. Referring to Figure 9, the data object sensitivity labeling device includes: a data acquisition module 10, used to respond to automatic labeling requests and obtain the object type of the data object to be labeled; a data processing module 20, used to perform syntax tree parsing on the data object to be labeled when the object type is a view or bitmap, to complete the labeling of sensitive fields and obtain field labeling information, and also used to inherit the source table labeling information of the data object to be labeled according to the configuration information when the object type is a data table, to complete the labeling of sensitive fields and obtain field labeling information; and a data labeling module 30, used to determine the table-level sensitivity level of the data object to be labeled based on the field labeling information, and to obtain the data object sensitivity labeling result.

[0129] In one embodiment, the data processing module 20 is further configured to: when the object type is a data table, determine the source table of the data object to be labeled and the source table field mapping relationship according to the configuration information; determine the source table labeling information based on the source table and the source table field mapping relationship; and inherit the source table labeling information to complete sensitive field labeling and obtain the field labeling information.
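As a rough sketch of the inheritance step in this embodiment, the field mapping can be treated as a lookup from each target field to a (source table, source field) pair. The shapes of `FieldMapping` and `LabelStore` and the function name are assumptions made for illustration; this section does not describe the actual structure of the configuration information.

```python
from typing import Dict, Tuple

# Hypothetical configuration: target field -> (source table, source field).
FieldMapping = Dict[str, Tuple[str, str]]
# Hypothetical label store: (table, field) -> sensitivity level.
LabelStore = Dict[Tuple[str, str], str]


def inherit_field_labels(mapping: FieldMapping, source_labels: LabelStore) -> Dict[str, str]:
    """Copy each source field's label onto the target field mapped to it."""
    inherited: Dict[str, str] = {}
    for target_field, source_key in mapping.items():
        level = source_labels.get(source_key)
        if level is not None:
            # Only labeled source fields contribute; unlabeled ones are skipped.
            inherited[target_field] = level
    return inherited
```

Target fields whose source field has no label are simply omitted from the result here; how such fields are handled in practice is not stated in this section.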

[0130] In one embodiment, the data processing module 20 is further configured to: when the object type is a view or a bitmap, obtain the data definition language of the data object to be labeled; generate a syntax tree based on the data definition language; parse the syntax tree to obtain the source field processing logic and source field information; and, based on the source field processing logic and the source field information, complete sensitive field labeling to obtain the field labeling information.
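The traversal part of this embodiment can be illustrated with a toy expression tree. The sketch below assumes the data definition language has already been parsed by some SQL parser into the hypothetical `Column` and `Func` nodes; it then walks each output expression to collect the processing functions and source columns. Letting an output field take the highest level among its source fields is an assumption of this sketch, since this section does not specify how processing logic adjusts the level.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple, Union

LEVEL_ORDER = {"L1": 1, "L2": 2, "L3": 3, "L4": 4}  # hypothetical levels


@dataclass
class Column:
    table: str
    name: str


@dataclass
class Func:
    name: str
    args: List["Expr"] = field(default_factory=list)


Expr = Union[Column, Func]


def walk(expr: Expr, funcs: List[str], cols: Set[Tuple[str, str]]) -> None:
    """Depth-first walk collecting processing functions and source columns."""
    if isinstance(expr, Column):
        cols.add((expr.table, expr.name))
    else:
        funcs.append(expr.name)
        for arg in expr.args:
            walk(arg, funcs, cols)


def label_view_fields(select_list: Dict[str, Expr],
                      source_labels: Dict[Tuple[str, str], str]) -> Dict[str, dict]:
    """Label each output field of a view from the labels of its source columns."""
    result = {}
    for out_name, expr in select_list.items():
        funcs: List[str] = []
        cols: Set[Tuple[str, str]] = set()
        walk(expr, funcs, cols)
        levels = [source_labels[c] for c in cols if c in source_labels]
        # Assumption: the output field takes the highest source level.
        level = max(levels, key=LEVEL_ORDER.__getitem__) if levels else None
        result[out_name] = {"level": level, "logic": funcs, "sources": sorted(cols)}
    return result
```

The recorded `logic` list keeps the function names seen along the way, so a later rule (for example, one that treats a masking function differently) could consult it without re-parsing.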

[0131] In one embodiment, the data processing module 20 is further configured to: traverse the field list of the data object to be labeled and obtain the current field; determine the sensitivity level of the current field based on the field labeling information; when the sensitivity level of the current field is not the target sensitivity level, move on to the next field and take it as the updated current field; and when the sensitivity level of the current field is the target sensitivity level, determine the table-level sensitivity level of the data object to be labeled to be the target sensitivity level, and obtain the data object sensitivity labeling result.
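The traversal in this embodiment amounts to a scan with early exit, which might look like the sketch below. The function name and the `None` return value for the case where no field reaches the target level are assumptions; this paragraph does not say what the table-level result is in that case.

```python
from typing import Dict, List, Optional


def table_level(fields: List[str], field_labels: Dict[str, str],
                target_level: str) -> Optional[str]:
    """Scan fields in order and stop at the first one at the target level."""
    for current in fields:
        if field_labels.get(current) == target_level:
            # One field at the target level is enough to fix the table level.
            return target_level
        # Otherwise the loop advances to the next field (the updated current field).
    return None  # assumption: no field reached the target level
```

Stopping at the first match avoids scanning the remaining fields, which matters for wide tables.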

[0132] In one embodiment, the data labeling module 30 is further configured to: respond to an auxiliary labeling request and obtain the field names and data volume statistics of the data object to be labeled; determine sample data from the data object to be labeled based on the regularization matching result of the field names; determine the sensitivity weight of the corresponding field name based on the data volume statistics; and input the sample data and the sensitivity weights into the auxiliary labeling model to obtain the auxiliary labeling sensitivity level and the auxiliary labeling confidence level.
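The sample selection step can be sketched as below, under the assumption that "regularization matching" refers to regular-expression matching on field names. The patterns, the `select_samples` name, and the per-field sample limit are all hypothetical and chosen only to make the example concrete.

```python
import re
from typing import Dict, List

# Hypothetical patterns for field names that hint at sensitive content.
NAME_PATTERNS = [
    re.compile(r"(phone|mobile|tel)", re.IGNORECASE),
    re.compile(r"(id_?card|passport)", re.IGNORECASE),
    re.compile(r"e?mail", re.IGNORECASE),
]


def select_samples(rows: List[Dict[str, str]], limit: int = 3) -> Dict[str, List[str]]:
    """Pick up to `limit` non-empty values for each field whose name matches a pattern."""
    if not rows:
        return {}
    # Field names are taken from the first row; all rows are assumed to share them.
    matched = [name for name in rows[0] if any(p.search(name) for p in NAME_PATTERNS)]
    return {
        name: [r[name] for r in rows if r.get(name)][:limit]
        for name in matched
    }
```

Only fields whose names match are sampled, so the auxiliary labeling model receives values from the fields most likely to be sensitive rather than from every column.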

[0133] In one embodiment, the data labeling module 30 is further configured to: determine the number of field records, the proportion of non-null values, and the data update frequency under each field name based on the data volume statistics; and determine the sensitivity weight of the corresponding field name based on the number of field records, the proportion of non-null values, and the data update frequency.
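One way to combine the three statistics into a single weight is a normalized weighted sum, sketched below. The formula, the coefficients 0.4 / 0.4 / 0.2, the normalization caps, and the unit of update frequency (updates per day) are illustrative assumptions only; this section names the three inputs but does not give the actual weighting formula.

```python
import math
from dataclasses import dataclass


@dataclass
class FieldStats:
    record_count: int       # number of field records
    non_null_ratio: float   # proportion of non-null values, 0.0 to 1.0
    updates_per_day: float  # data update frequency (assumed unit)


def sensitivity_weight(stats: FieldStats,
                       max_records: int = 1_000_000_000,
                       max_updates: float = 24.0) -> float:
    """Combine the three statistics into a weight between 0 and 1."""
    # Log scale keeps very large tables from dominating the score.
    size = min(math.log10(stats.record_count + 1) / math.log10(max_records + 1), 1.0)
    freshness = min(stats.updates_per_day / max_updates, 1.0)
    # Hypothetical coefficients; they sum to 1 so the result stays in [0, 1].
    return 0.4 * size + 0.4 * stats.non_null_ratio + 0.2 * freshness
```

Under these assumptions, a field with more records, fewer nulls, or more frequent updates receives a higher weight.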

[0134] In one embodiment, the data labeling module 30 is further configured to: respond to a manual labeling request and obtain the labeling field name and the corresponding manual labeling sensitivity level of the data object to be labeled; and, when the auxiliary labeling confidence level is greater than the preset labeling confidence level and the manual labeling sensitivity level is less than the auxiliary labeling sensitivity level, push an early warning notification based on the labeling field name.

[0135] The data object sensitivity labeling device provided in this application, employing the data object sensitivity labeling method described in the above embodiments, can solve the technical problem of how to adapt to the characteristics of the OLAP platform and improve the efficiency of data object sensitivity labeling while ensuring labeling accuracy. Compared with the prior art, the beneficial effects of the data object sensitivity labeling device provided in this application are the same as those of the data object sensitivity labeling method described in the above embodiments, and other technical features in the data object sensitivity labeling device are the same as those disclosed in the methods of the above embodiments, and will not be repeated here.

[0136] This application provides a data object sensitivity marking device, which includes: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the data object sensitivity marking method in the first embodiment described above.

[0137] Referring now to Figure 10, it shows a structural schematic suitable for implementing the data object sensitivity labeling device in the embodiments of this application. The data object sensitivity labeling device in the embodiments of this application may include, but is not limited to, mobile terminals such as mobile phones, laptops, digital broadcast receivers, PDAs (Personal Digital Assistants), PADs (tablet computers), PMPs (Portable Media Players), and in-vehicle terminals (e.g., in-vehicle navigation terminals), and fixed terminals such as digital TVs and desktop computers. The data object sensitivity labeling device shown in Figure 10 is merely an example and should not impose any limitations on the functionality and scope of use of the embodiments of this application.

[0138] As shown in Figure 10, the data object sensitivity marking device may include a processing unit 1001 (e.g., a central processing unit, a graphics processing unit, etc.), which can perform various appropriate actions and processes according to a program stored in ROM (Read Only Memory) 1002 or a program loaded from storage device 1003 into RAM (Random Access Memory) 1004. RAM 1004 also stores various programs and data required for the operation of the data object sensitivity marking device. The processing unit 1001, ROM 1002, and RAM 1004 are interconnected via bus 1005. Input / output (I / O) interface 1006 is also connected to bus 1005. Typically, the following devices can be connected to I / O interface 1006: input devices 1007 including, for example, touch screens, touchpads, keyboards, mice, image sensors, microphones, accelerometers, gyroscopes, etc.; output devices 1008 including, for example, LCDs (Liquid Crystal Displays), speakers, vibrators, etc.; storage devices 1003 including, for example, magnetic tapes, hard disks, etc.; and communication devices 1009. Communication device 1009 allows the data object sensitivity marking device to communicate wirelessly or by wire with other devices to exchange data. While the figure shows a data object sensitivity marking device with various components, it should be understood that implementation or possession of all the components shown is not required. More or fewer components may be implemented alternatively.

[0139] Specifically, according to the embodiments disclosed in this application, the processes described above with reference to the flowcharts can be implemented as computer software programs. For example, embodiments disclosed in this application include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the methods shown in the flowcharts. In such embodiments, the computer program can be downloaded and installed from a network via communication device 1009, or installed from storage device 1003, or installed from ROM 1002. When the computer program is executed by processing unit 1001, it performs the functions defined in the methods of the embodiments disclosed in this application.

[0140] The data object sensitivity marking device provided in this application, employing the data object sensitivity marking method described in the above embodiments, can solve the technical problem of how to adapt to the characteristics of the OLAP platform and improve the efficiency of data object sensitivity marking while ensuring marking accuracy. Compared with the prior art, the beneficial effects of the data object sensitivity marking device provided in this application are the same as those of the data object sensitivity marking method provided in the above embodiments, and other technical features in this data object sensitivity marking device are the same as those disclosed in the previous embodiment method, and will not be repeated here.

[0141] It should be understood that the various parts disclosed in this application can be implemented using hardware, software, firmware, or a combination thereof. In the description of the above embodiments, specific features, structures, materials, or characteristics can be combined in any suitable manner in one or more embodiments or examples.

[0142] The above description is merely a specific embodiment of this application, but the scope of protection of this application is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the scope of the technology disclosed in this application should be included within the scope of protection of this application. Therefore, the scope of protection of this application should be determined by the scope of the claims.

[0143] This application provides a computer-readable storage medium having computer-readable program instructions (i.e., a computer program) stored thereon, the computer-readable program instructions being used to execute the data object sensitivity labeling method in the above embodiments.

[0144] The computer-readable storage medium provided in this application may be, for example, but is not limited to, a USB flash drive or an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to: electrical connections having one or more wires, portable computer disks, hard disks, RAM (Random Access Memory), ROM (Read Only Memory), EPROM (Erasable Programmable Read Only Memory) or flash memory, optical fiber, CD-ROM (Compact Disc Read Only Memory), optical storage devices, magnetic storage devices, or any suitable combination thereof. In this embodiment, the computer-readable storage medium may be any tangible medium containing or storing a program that can be used by or in conjunction with an instruction execution system, apparatus, or device. The program code contained on the computer-readable storage medium may be transmitted using any suitable medium, including but not limited to: wires, optical cables, RF (Radio Frequency), etc., or any suitable combination thereof.

[0145] The aforementioned computer-readable storage medium may be included in the data object sensitivity marking device; or it may exist independently and not assembled into the data object sensitivity marking device.

[0146] The aforementioned computer-readable storage medium carries one or more programs. When these programs are executed by the data object sensitivity labeling device, the data object sensitivity labeling device: responds to an automatic labeling request and obtains the object type of the data object to be labeled; when the object type is a view or bitmap, it performs syntax tree parsing on the data object to be labeled, completes sensitivity field labeling, and obtains field labeling information; when the object type is a data table, it inherits the source table labeling information of the data object to be labeled according to configuration information, completes sensitivity field labeling, and obtains field labeling information; and determines the table-level sensitivity level of the data object to be labeled based on the field labeling information, and obtains the data object sensitivity labeling result.

[0147] Computer program code for performing the operations of this application can be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code can be executed entirely on the user's computer, partially on the user's computer, as a standalone software package, partially on the user's computer and partially on a remote computer, or entirely on a remote computer or server. In cases involving remote computers, the remote computer can be connected to the user's computer via any type of network, including a LAN (Local Area Network) or WAN (Wide Area Network), or can be connected to an external computer (e.g., via the Internet using an Internet service provider).

[0148] The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of this application. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of code containing one or more executable instructions for implementing a specified logical function. It should also be noted that in some alternative implementations, the functions indicated in the blocks may occur in a different order than those indicated in the drawings. For example, two consecutively indicated blocks may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and / or flowcharts, and combinations of blocks in the block diagrams and / or flowcharts, can be implemented using a dedicated hardware-based system that performs the specified function or operation, or using a combination of dedicated hardware and computer instructions.

[0149] The modules described in the embodiments of this application can be implemented in software or hardware. The names of the modules do not necessarily limit the functionality of the modules themselves.

[0150] The readable storage medium provided in this application is a computer-readable storage medium that stores computer-readable program instructions (i.e., a computer program) for executing the above-described data object sensitivity labeling method. This solves the technical problem of how to adapt to the characteristics of OLAP platforms and improve the efficiency of data object sensitivity labeling while ensuring labeling accuracy. Compared with the prior art, the beneficial effects of the computer-readable storage medium provided in this application are the same as those of the data object sensitivity labeling method provided in the above embodiments, and will not be repeated here.

[0151] This application also provides a computer program product, including a computer program that, when executed by a processor, implements the steps of the data object sensitivity labeling method described above.

[0152] The computer program product provided in this application solves the technical problem of how to adapt to the characteristics of the OLAP platform and improve the efficiency of sensitive labeling of data objects while ensuring labeling accuracy. Compared with the prior art, the beneficial effects of the computer program product provided in this application are the same as those of the sensitive labeling method for data objects provided in the above embodiments, and will not be repeated here.

[0153] The above description is only a part of the embodiments of this application and does not limit the patent scope of this application. All equivalent structural transformations made under the technical concept of this application and using the contents of the specification and drawings of this application, or direct / indirect applications in other related technical fields, are included in the patent protection scope of this application.

Claims

1. A data object sensitivity labeling method, characterized in that the data object sensitivity labeling method includes: responding to an automatic labeling request and obtaining the object type of the data object to be labeled; when the object type is a view or bitmap, performing syntax tree parsing on the data object to be labeled to complete sensitive field labeling and obtain field labeling information; when the object type is a data table, inheriting the source table labeling information of the data object to be labeled according to configuration information to complete sensitive field labeling and obtain field labeling information; and determining the table-level sensitivity level of the data object to be labeled based on the field labeling information to obtain the data object sensitivity labeling result.

2. The method as described in claim 1, characterized in that the step of, when the object type is a data table, inheriting the source table labeling information of the data object to be labeled according to the configuration information to complete sensitive field labeling and obtain the field labeling information includes: when the object type is a data table, determining the source table of the data object to be labeled and the source table field mapping relationship according to the configuration information; determining the source table labeling information based on the source table and the source table field mapping relationship; and inheriting the source table labeling information to complete sensitive field labeling and obtain the field labeling information.

3. The method as described in claim 1, characterized in that the step of, when the object type is a view or bitmap, performing syntax tree parsing on the data object to be labeled to complete sensitive field labeling and obtain the field labeling information includes: when the object type is a view or bitmap, obtaining the data definition language of the data object to be labeled; generating a syntax tree based on the data definition language; parsing the syntax tree to obtain the source field processing logic and source field information; and, based on the source field processing logic and the source field information, completing sensitive field labeling to obtain the field labeling information.

4. The method as described in claim 1, characterized in that the step of determining the table-level sensitivity level of the data object to be labeled based on the field labeling information to obtain the data object sensitivity labeling result includes: traversing the field list of the data object to be labeled and obtaining the current field; determining the sensitivity level of the current field based on the field labeling information; when the sensitivity level of the current field is not the target sensitivity level, obtaining an updated current field; and when the sensitivity level of the current field is the target sensitivity level, determining the table-level sensitivity level of the data object to be labeled to be the target sensitivity level, and obtaining the data object sensitivity labeling result.

5. The method as described in claim 1, characterized in that the method further includes: responding to an auxiliary labeling request and obtaining the field names and data volume statistics of the data object to be labeled; determining sample data from the data object to be labeled based on the regularization matching result of the field names; determining the sensitivity weight of the corresponding field name based on the data volume statistics; and inputting the sample data and the sensitivity weights into the auxiliary labeling model to obtain the auxiliary labeling sensitivity level and the auxiliary labeling confidence level.

6. The method as described in claim 5, characterized in that the step of determining the sensitivity weight of the corresponding field name based on the data volume statistics includes: determining, based on the data volume statistics, the number of field records, the proportion of non-null values, and the data update frequency for each field name; and determining the sensitivity weight of the corresponding field name based on the number of field records, the proportion of non-null values, and the data update frequency.

7. The method as described in claim 5, characterized in that, after the step of inputting the sample data and the sensitivity weights into the auxiliary labeling model to obtain the auxiliary labeling sensitivity level and the auxiliary labeling confidence level, the method further includes: in response to a manual labeling request, obtaining the labeling field name and the corresponding manual labeling sensitivity level of the data object to be labeled; and, when the auxiliary labeling confidence level is greater than the preset labeling confidence level and the manual labeling sensitivity level is less than the auxiliary labeling sensitivity level, pushing an early warning notification based on the labeling field name.

8. A data object sensitivity labeling device, characterized in that the device includes: a data acquisition module, used to respond to automatic labeling requests and obtain the object type of the data object to be labeled; a data processing module, used to perform syntax tree parsing on the data object to be labeled when the object type is a view or bitmap, to complete the labeling of sensitive fields and obtain field labeling information, and also used to inherit the source table labeling information of the data object to be labeled according to the configuration information when the object type is a data table, to complete the labeling of sensitive fields and obtain field labeling information; and a data labeling module, used to determine the table-level sensitivity level of the data object to be labeled based on the field labeling information, and to obtain the data object sensitivity labeling result.

9. A data object sensitivity labeling device, characterized in that the device includes: a memory, a processor, and a computer program stored in the memory and executable on the processor, the computer program being configured to implement the steps of the data object sensitivity labeling method as described in any one of claims 1 to 7.

10. A storage medium, characterized in that the storage medium is a computer-readable storage medium, and a computer program is stored on the storage medium; when the computer program is executed by a processor, it implements the steps of the data object sensitivity labeling method as described in any one of claims 1 to 7.