Dataset conflict detection method and apparatus, electronic device, and storage medium
By constructing a classification and labeling ontology and generating a machine-readable constraint set, hierarchical conflict detection is performed, which solves the problem of insufficient training model accuracy in existing technologies and achieves higher-precision model training.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications (China)
- Current Assignee / Owner
- ZHENHUA ZHIZAO (XIAN) TECH CO LTD
- Filing Date
- 2026-05-18
- Publication Date
- 2026-06-19
AI Technical Summary
Existing technologies lack the ability to detect hierarchical conflicts covering ontology, single sample, dataset, and application layers when generating classification datasets, resulting in insufficient accuracy of the trained models.
A classification and annotation ontology is constructed, an attribute set is defined, and a machine-readable constraint set is generated. Conflict detection methods are used at the ontology layer, annotation layer, dataset layer, and application layer, including ontology structure conflict detection, intra-sample annotation conflict detection, inter-sample annotation conflict detection, and application result conflict detection. The conflict type, location, and correction suggestions are output.
This approach enables the discovery of multi-level conflicts during the dataset construction phase, thereby improving the accuracy of the trained model.
Figure CN122241237A_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of data conflict detection technology, and in particular to a dataset conflict detection method, apparatus, electronic device, and storage medium.
Background Technology
[0002] With the continuous application of artificial intelligence in professional fields such as law, industry, medicine, finance, and energy, building high-quality training datasets for classification tasks has become a fundamental step in the research and development and deployment of artificial intelligence models.
[0003] However, domain-specific classification tasks often have obvious hierarchical structures, attribute constraints, and semantic boundaries. For example, within the same domain, different categories may have parent-child hierarchical relationships, compositional relationships, sibling mutually exclusive relationships, and differences in attribute applicability; the same object may also exhibit type conflicts, attribute value conflicts, or cross-sample semantic inconsistencies in different labeled versions.
[0004] Existing classification datasets are typically generated through manual annotation, rule-based annotation, weakly supervised annotation, or model pre-annotation, and are continuously reused during subsequent data cleaning, version iteration, model training, and online applications. Current technologies usually check only sample format, label sets, or simple enumeration rules, and lack the ability to detect layered conflicts covering the ontology layer, single sample layer, dataset layer, and application layer. As a result, many deep-seated conflicts cannot be detected during the dataset construction phase, which makes training models on classification datasets significantly more difficult and leaves the trained models with insufficient accuracy.
Summary of the Invention
[0005] This application provides a dataset conflict detection method, apparatus, electronic device, and storage medium, which solve the technical problem that models trained on classification datasets generated by the prior art have insufficient accuracy.
[0006] Firstly, this application provides a dataset conflict detection method, the method comprising: constructing a classification annotation ontology, the classification annotation ontology including a set of key entities and relations, the relations including composition relations and special case relations; defining an attribute set, each attribute in the attribute set including an attribute name, an attribute scope, and an attribute value set, and the attribute value set of each attribute including a null value marker; acquiring the sample data to be detected and the annotation results of the sample data to be detected, and converting the annotation results into a sample instance representation, the sample instance representation including a set of labeled objects, a type annotation predicate, and an attribute value annotation predicate; and generating a machine-readable constraint set based on the classification annotation ontology, the attribute set, and attribute inheritance rules, the machine-readable constraint set including an ontology layer constraint set, an annotation layer constraint set, a dataset layer constraint set, and an application layer constraint set.
Ontology layer conflict results are obtained by performing ontology structure conflict detection on the classification annotation ontology based on the ontology layer constraint set; annotation layer conflict results are obtained by performing intra-sample annotation conflict detection on the sample instance representation of a single item of sample data to be detected based on the annotation layer constraint set; dataset layer conflict results are obtained by performing inter-sample annotation conflict detection on multi-version sample annotation results and / or equivalent sample annotation results in the historical dataset based on the dataset layer constraint set; and application layer conflict results are obtained by performing conflict detection on the application results output for the sample data to be detected based on the application layer constraint set.
[0007] In conjunction with the first aspect, in one possible implementation, the ontology layer constraint set includes the directed acyclicity constraint of the classification annotation ontology, the reflexive relation prohibition constraint, the relation type overlap prohibition constraint, the special case parent node uniqueness constraint, the same-name attribute uniqueness constraint, and the attribute definition validity constraint; the annotation layer constraint set includes the unknown type prohibition constraint, the missing type prohibition constraint, the multi-type labeling prohibition constraint, the parent-child type labeling prohibition constraint, the sibling special case labeling prohibition constraint, the undefined attribute prohibition constraint, the attribute domain constraint, the attribute value out-of-bounds prohibition constraint, the single attribute multi-value prohibition constraint, and the null and non-null value coexistence prohibition constraint; the dataset layer constraint set includes the multi-version annotation constraint, the equivalent sample annotation constraint, and the ontology consistency constraint; and the application layer constraint set includes the output result ontology constraint and the constraint between application output and existing samples.
[0008] In conjunction with the first aspect, in one possible implementation, the step of performing intra-sample annotation conflict detection on the sample instance representation of a single item of sample data to be detected to obtain the annotation layer conflict results includes: determining whether there are type annotations that do not belong to the key entity set; if attribute value annotations exist, determining whether the corresponding annotation object is missing a type annotation; determining whether the same annotation object is simultaneously assigned multiple different types, simultaneously annotated as a parent class and one of its descendant special cases, or simultaneously annotated as different special cases under the same parent class; determining whether the attribute in each attribute annotation belongs to the attribute set, whether the type of the annotation object is within the inheritance scope of that attribute, and whether the attribute value is within that attribute's value set; and determining whether the same annotation object has multiple different values for the same attribute, or has null and non-null values for it at the same time.
[0009] In conjunction with the first aspect, in one possible implementation, the step of performing inter-sample annotation conflict detection on the multi-version sample annotation results and / or equivalent sample annotation results in the historical dataset to obtain the dataset layer conflict results includes: normalizing the different annotation versions of the same sample data to be detected, and determining a multi-version annotation conflict if the normalized annotation results are inconsistent; when two items of sample data to be detected correspond to the same object, come from the same source, or share the same classification semantics, establishing an equivalence relation between them, normalizing their annotation results, and determining an equivalent sample annotation conflict if the normalized annotation results are inconsistent; and, when the annotation layer conflict result of any sample data to be detected in the historical dataset is not empty, determining that the historical dataset has an ontology consistency conflict.
[0010] In conjunction with the first aspect, in one possible implementation, the step of performing conflict detection on the application results output for the sample data to be detected to obtain the application layer conflict results includes: converting the application results into an application output representation consistent with the sample instance representation; determining whether the application output representation violates the application layer constraint set, so as to identify output result ontology conflicts; and normalizing the application output representation and the sample annotation results in the historical dataset that satisfy the equivalence relation, comparing them, and determining that the application output conflicts with existing samples when the normalized results are inconsistent.
[0011] In conjunction with the first aspect, in one possible implementation, the method further includes: summarizing the ontology layer conflict results, the annotation layer conflict results, the dataset layer conflict results, and the application layer conflict results, and outputting the conflict type, conflict location, conflict level, and correction suggestions.
[0012] In conjunction with the first aspect, in one possible implementation, the step of summarizing the conflict results of the ontology layer, the annotation layer, the dataset layer, and the application layer, and outputting the conflict type, conflict location, conflict level, and correction suggestions includes: classifying conflicts hierarchically according to the ontology layer, the annotation layer, the dataset layer, and the application layer; determining the conflict location based on the ontology node, attribute, sample identifier, annotation object identifier, or application output location where the conflict occurs; classifying the conflict level according to the degree of impact of the conflict on ontology usability, sample trainability, and application output credibility; generating correction suggestions for different conflict types; the correction suggestions include at least one of deleting illegal relationships, supplementing missing attribute types, modifying attribute values, adjusting the scope of attribute application, merging conflicting version annotations, and triggering manual review.
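As an illustration of the summarizing step in [0011] and [0012], the following minimal Python sketch groups raw conflicts by layer and attaches a level and a correction suggestion. It assumes that each detector emits (conflict type, location) pairs; the RULES table, the level names, and the ConflictReport and summarize names are hypothetical, and the mapping only borrows the suggestion wording listed above rather than reproducing any mapping defined by the application.

```python
from dataclasses import dataclass

# Hypothetical mapping: conflict type -> (layer, level, correction suggestion).
# The application names the layers, the impact criteria, and example
# suggestions; this concrete assignment is illustrative only.
RULES = {
    "special case cycle": ("ontology", "critical", "delete illegal relationship"),
    "missing type prohibition": ("annotation", "major", "supplement missing type"),
    "attribute value out-of-bounds prohibition": ("annotation", "major", "modify attribute value"),
    "multi-version annotation conflict": ("dataset", "major", "merge conflicting version annotations"),
    "application output conflicts with existing sample": ("application", "minor", "trigger manual review"),
}
LAYER_ORDER = ["ontology", "annotation", "dataset", "application"]


@dataclass
class ConflictReport:
    layer: str
    conflict_type: str
    location: str  # ontology node, attribute, sample id, object id, or output position
    level: str
    suggestion: str


def summarize(raw: list[tuple[str, str]]) -> list[ConflictReport]:
    """Turn (conflict_type, location) pairs from the four detectors into reports,
    ordered ontology, annotation, dataset, application."""
    reports = []
    for ctype, location in raw:
        # Conflict types missing from the table fall back to manual review.
        layer, level, suggestion = RULES.get(
            ctype, ("unclassified", "minor", "trigger manual review"))
        reports.append(ConflictReport(layer, ctype, location, level, suggestion))
    rank = {layer: i for i, layer in enumerate(LAYER_ORDER)}
    # sorted() is stable, so conflicts within one layer keep their input order.
    return sorted(reports, key=lambda r: rank.get(r.layer, len(LAYER_ORDER)))
```

A table-driven mapping keeps the level and suggestion policy separate from the detectors, so an ontology maintainer could adjust severities without touching detection code.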
[0013] Secondly, this application provides a dataset conflict detection device, which includes a construction module, a definition module, a transformation module, a constraint generation module, and a detection module. The construction module is used to construct a classification annotation ontology, which includes a set of key entities and relations. The definition module is used to define an attribute set, where each attribute includes an attribute name, an attribute scope, and an attribute value set, and the attribute value set of each attribute includes a null value marker. The transformation module is used to acquire the sample data to be detected and the annotation results of the sample data to be detected, and to convert the annotation results into a sample instance representation, which includes a set of labeled objects, a type annotation predicate, and an attribute value annotation predicate. The constraint generation module is used to generate a machine-readable constraint set based on the classification annotation ontology, the attribute set, and attribute inheritance rules, wherein the machine-readable constraint set includes an ontology layer constraint set, an annotation layer constraint set, a dataset layer constraint set, and an application layer constraint set.
The detection module is used to perform ontology structure conflict detection on the classification annotation ontology based on the ontology layer constraint set to obtain ontology layer conflict results; perform intra-sample annotation conflict detection on the sample instance representation of a single item of sample data to be detected based on the annotation layer constraint set to obtain annotation layer conflict results; perform inter-sample annotation conflict detection on multi-version sample annotation results and / or equivalent sample annotation results in the historical dataset based on the dataset layer constraint set to obtain dataset layer conflict results; and perform conflict detection on the application results output for the sample data to be detected based on the application layer constraint set to obtain application layer conflict results.
[0014] Thirdly, this application provides an electronic device including one or more processors and a memory storing computer-executable instructions, which, when executed by the one or more processors, cause the one or more processors to perform the dataset conflict detection method as described in the first aspect or any possible implementation thereof.
[0015] Fourthly, this application provides a computer-readable storage medium storing computer-readable instructions that, when executed by a computer, implement the dataset conflict detection method as described in the first aspect or any possible implementation thereof.
[0016] This application provides a dataset conflict detection method that can detect conflicts at the ontology layer, annotation layer, dataset layer, and application layer, enabling multi-level conflicts to be discovered during the dataset construction stage. As a result, models trained on datasets that have been checked by this method achieve higher accuracy.
Attached Figure Description
[0017] To more clearly illustrate the technical solutions in the embodiments of this application, the accompanying drawings used in the description of the embodiments of this application will be briefly introduced below. Obviously, the accompanying drawings described below are some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0018] Figure 1 is a flowchart of the dataset conflict detection method provided by some embodiments of this application;
Figure 2 is a schematic diagram of the ontology layer constraints provided by some embodiments of this application;
Figure 3 is a schematic diagram of the annotation layer constraints provided by some embodiments of this application;
Figure 4 is a schematic diagram of the dataset layer constraints provided by some embodiments of this application;
Figure 5 is a schematic diagram of the application layer constraints provided by some embodiments of this application;
Figure 6 is a schematic diagram of the dataset conflict detection device provided by some embodiments of this application.
Detailed Implementation
[0019] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of this application. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.
[0020] This application provides a dataset conflict detection method. As shown in Figure 1, the method includes steps S101 to S105.
[0021] S101. Construct the classification annotation ontology. The classification annotation ontology includes a set of key entities and relationships. Relationships include composition relationships and special case relationships. Composition relationships represent the whole and its parts, while special case relationships represent the parent class and its child class.
[0022] For example, the set of key entities includes power equipment, insulators, suspension insulators, and post insulators; the relationship between insulators and power equipment is a compositional relationship, that is, insulators are a component of power equipment; the relationship between suspension insulators and insulators is a special case relationship, that is, suspension insulators are a special case of insulators.
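To make S101 concrete, the following is a minimal Python sketch of one possible in-memory form of the classification annotation ontology, populated with the example from [0022]. The relation labels, the Ontology class, and its methods are hypothetical; the application names the two relation kinds but prescribes no encoding. The post insulator edge is included because [0050] treats post insulators as a sibling special case of suspension insulators.

```python
from dataclasses import dataclass, field

# Hypothetical relation labels; the application names the two relation kinds
# but does not prescribe an encoding.
COMPOSITION = "composition"    # (part, whole): part is a component of whole
SPECIAL_CASE = "special_case"  # (child, parent): child is a special case of parent


@dataclass
class Ontology:
    entities: set[str] = field(default_factory=set)
    # Each relation is stored as a (kind, source, target) triple.
    relations: set[tuple[str, str, str]] = field(default_factory=set)

    def add_relation(self, kind: str, source: str, target: str) -> None:
        self.relations.add((kind, source, target))

    def parents(self, entity: str) -> set[str]:
        """Direct parent classes of `entity`, following special case relations only."""
        return {t for k, s, t in self.relations if k == SPECIAL_CASE and s == entity}


# The example from [0022]; the post insulator edge is implied by [0050].
onto = Ontology(entities={"power equipment", "insulator",
                          "suspension insulator", "post insulator"})
onto.add_relation(COMPOSITION, "insulator", "power equipment")
onto.add_relation(SPECIAL_CASE, "suspension insulator", "insulator")
onto.add_relation(SPECIAL_CASE, "post insulator", "insulator")
```

Keeping the two relation kinds explicit means that inheritance-based checks can follow special case edges and ignore composition edges.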
[0023] S102. Define an attribute set. Each attribute in the attribute set includes an attribute name, an attribute scope, and a set of attribute values. Each attribute's set of attribute values also includes a null value marker.
[0024] The attribute name, the scope of application of the attribute, and the set of attribute values form an attribute triple, which is used to describe the characteristics of each key entity.
[0025] For example, the attribute name is the defect level, the scope of application of the attribute includes insulators, suspension insulators, post insulators, etc., and the attribute value set includes normal, minor defect, severe defect and null value.
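The attribute triple in [0024] and [0025] can be sketched in Python as a frozen record whose value set always contains the null marker. The "<null>" string, the Attribute class, and the decision to add the marker automatically (rather than reject a definition that lacks it) are assumptions made for illustration.

```python
from dataclasses import dataclass

NULL = "<null>"  # hypothetical encoding of the null value marker


@dataclass(frozen=True)
class Attribute:
    """Attribute triple (name, scope, value set) as described in [0024]."""
    name: str
    scope: frozenset   # key entity types the attribute is declared on
    values: frozenset  # allowed values; always contains NULL

    def __post_init__(self) -> None:
        # [0023] requires every value set to contain the null marker, so this
        # sketch adds it if the caller left it out (a design choice).
        if NULL not in self.values:
            object.__setattr__(self, "values", frozenset(self.values) | {NULL})


# The defect level example from [0025].
defect_level = Attribute(
    name="defect level",
    scope=frozenset({"insulator", "suspension insulator", "post insulator"}),
    values=frozenset({"normal", "minor defect", "severe defect"}),
)
```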
[0026] S103. Obtain the sample data to be tested and the annotation results of the sample data to be tested, and convert the annotation results into a sample instance representation. The sample instance representation includes the set of annotated objects, type annotation predicates, and attribute value annotation predicates.
[0027] The sample instance representation standardizes the annotation results of different sample data to be tested, so that the computer can subsequently apply the same detection operations to all of them.
[0028] For example, B(o, v) can represent the type annotation predicate, where o is the annotated object and v its type, such as B(object in picture 1, suspension insulator); L(o, p, l) can represent the attribute value annotation predicate, where p is the attribute name and l the attribute value, such as L(object in picture, defect level, minor defect).
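The conversion in S103 can be sketched in Python as follows. The sketch assumes a hypothetical JSON-like annotation record in which each object has an "id", a list of "types", and an "attributes" map from attribute name to a list of values. Lists are used deliberately so that erroneous multi-type or multi-value annotations survive the conversion and can be caught later. The record schema, SampleInstance, and to_instance are not from the application.

```python
from dataclasses import dataclass, field


@dataclass
class SampleInstance:
    objects: set[str] = field(default_factory=set)
    B: set[tuple[str, str]] = field(default_factory=set)       # type predicates B(o, v)
    L: set[tuple[str, str, str]] = field(default_factory=set)  # attribute predicates L(o, p, l)


def to_instance(annotation: dict) -> SampleInstance:
    """Convert a hypothetical JSON-like annotation record into B/L predicates."""
    inst = SampleInstance()
    for obj in annotation.get("objects", []):
        oid = obj["id"]
        inst.objects.add(oid)
        # Keep every type, even when several are given, so that later
        # constraints can detect multi-type conflicts.
        for v in obj.get("types", []):
            inst.B.add((oid, v))
        for p, values in obj.get("attributes", {}).items():
            for l in values:
                inst.L.add((oid, p, l))
    return inst


record = {"objects": [{"id": "picture1/obj1",
                       "types": ["suspension insulator"],
                       "attributes": {"defect level": ["minor defect"]}}]}
inst = to_instance(record)
```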
[0029] S104. Generate a machine-readable constraint set based on the classification annotation ontology, attribute set, and attribute inheritance rules. The machine-readable constraint set includes an ontology layer constraint set, an annotation layer constraint set, a dataset layer constraint set, and an application layer constraint set.
[0030] The attribute inheritance rule means that if an attribute applies to a parent class, it automatically applies to all subclasses of that parent class. This keeps the rules consistent and avoids redefining the attribute on each subclass. For example, the defect level attribute applies to insulators, and suspension insulators are a subclass of insulators, so suspension insulators automatically also have the defect level attribute.
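The attribute inheritance rule amounts to closing an attribute's declared scope over all transitive subclasses. Below is a minimal Python sketch of that closure, assuming special case relations are given as a hypothetical parent_of map from each child to its set of direct parents. Composition relations are deliberately ignored, because the rule concerns parent classes rather than wholes.

```python
def descendants(entity: str, parent_of: dict[str, set[str]]) -> set[str]:
    """All transitive subclasses of `entity` under the special case relation."""
    children: dict[str, set[str]] = {}
    for child, parents in parent_of.items():
        for p in parents:
            children.setdefault(p, set()).add(child)
    result: set[str] = set()
    stack = [entity]
    while stack:
        for c in children.get(stack.pop(), ()):
            if c not in result:  # the visited check also guards against cycles
                result.add(c)
                stack.append(c)
    return result


def effective_scope(declared_scope: set[str], parent_of: dict[str, set[str]]) -> set[str]:
    """Declared scope plus every subclass that inherits the attribute."""
    scope = set(declared_scope)
    for e in declared_scope:
        scope |= descendants(e, parent_of)
    return scope


parent_of = {"suspension insulator": {"insulator"}, "post insulator": {"insulator"}}
# Defect level declared on insulator also covers both insulator special cases.
print(effective_scope({"insulator"}, parent_of))
```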
[0031] A machine-readable constraint set is a set of rules that a computer uses to determine whether there are conflicts in a dataset to be detected, enabling the computer to detect conflicts based on each rule in the machine-readable constraint set.
[0032] S105. Based on the ontology layer constraint set, perform ontology structure conflict detection on the classification annotation ontology to obtain ontology layer conflict results. Based on the annotation layer constraint set, perform intra-sample annotation conflict detection on the sample instance representation of a single item of sample data to be tested to obtain annotation layer conflict results. Based on the dataset layer constraint set, perform inter-sample annotation conflict detection on the multi-version sample annotation results and / or equivalent sample annotation results in the historical dataset to obtain dataset layer conflict results. Based on the application layer constraint set, perform conflict detection on the application results output for the sample data to be tested to obtain application layer conflict results.
[0033] When ontology structure conflict detection is performed on the classification annotation ontology, the objects of inspection are the classification annotation ontology and its attributes, and the purpose is to check the relations between key entities, the scope of application of attributes, and similar definitions.
[0034] When performing intra-sample annotation conflict detection on a single sample instance in the sample data to be tested, the object of inspection is the annotation of a single image or a single data item, in order to detect individual incorrectly labeled data and prevent erroneous annotations from entering the database.
[0035] When performing inter-sample annotation conflict detection on multiple versions of sample annotation results in historical datasets, the object of inspection is the annotation results of different versions of samples in the entire sample data to be tested, in order to detect inconsistencies in the annotation of the same object in different historical versions. For example, for an object in the same image, if it is labeled as a suspension insulator and a post insulator in two different historical versions, the inter-sample annotation conflict detection will identify the labeling error. When performing inter-sample annotation conflict detection on equivalent sample annotation results, the object of inspection is multiple samples that can be considered equivalent samples, in order to detect inconsistencies in the labeling of equivalent samples. For example, for images of the same insulator, if one is labeled as having a minor defect and the other as being normal, the inter-sample annotation conflict detection will identify the labeling error.
[0036] The application result output for the sample data to be tested is the result obtained by inputting the dataset to be tested into the model. When conflict detection is performed on this output, the object of inspection is the model's output, and the purpose is to check whether it conforms to the constraint rules in the application layer constraint set. For example, when the model trained on the aforementioned sample dataset is used to identify objects in an image and a suspension insulator is identified as a post insulator, conflict detection on the application result identifies this error.
[0037] Some embodiments of this application provide specific details of ontology layer constraints, annotation layer constraints, dataset layer constraints, and application layer constraints.
[0038] As shown in Figure 2, the ontology layer constraints include the directed acyclicity constraint of the classification and labeling ontology, the reflexive relation prohibition constraint, the relation type overlap prohibition constraint, the special case parent node uniqueness constraint, the same-name attribute uniqueness constraint, and the attribute definition legality constraint.
[0039] The directed acyclicity constraint of the classification annotation ontology means that the special case relations in the ontology must form a directed acyclic graph (DAG). That is, a category cannot ultimately become a subclass of itself through a chain of special case relations; circular inheritance is prohibited. For example, suppose the ontology contains the chain "Equipment-Power Equipment-Insulator". If "Equipment" is incorrectly defined as a special case of "Insulator", the cycle "Equipment-Power Equipment-Insulator-Equipment" is formed, violating the directed acyclicity constraint.
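The directed acyclicity check can be realized with a standard depth-first search that uses three node colors. The following Python sketch assumes that special case relations are supplied as (child, parent) pairs and returns one offending cycle so it can be reported as a conflict location. The function name is hypothetical, and recursion depth is assumed to be acceptable for ontology-sized graphs.

```python
def find_special_case_cycle(edges):
    """edges: iterable of (child, parent) special case pairs.
    Returns a list of nodes forming a cycle (first == last), or None for a DAG."""
    graph = {}
    for child, parent in edges:
        graph.setdefault(child, []).append(parent)
        graph.setdefault(parent, [])
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {n: WHITE for n in graph}
    path = []

    def dfs(node):
        color[node] = GREY
        path.append(node)
        for nxt in graph[node]:
            if color[nxt] == GREY:
                # Back edge: the cycle is the path suffix starting at nxt.
                return path[path.index(nxt):] + [nxt]
            if color[nxt] == WHITE:
                cycle = dfs(nxt)
                if cycle:
                    return cycle
        color[node] = BLACK
        path.pop()
        return None

    for n in graph:
        if color[n] == WHITE:
            cycle = dfs(n)
            if cycle:
                return cycle
    return None


# The faulty chain from [0039]: Equipment wrongly made a special case of Insulator.
edges = [("power equipment", "equipment"), ("insulator", "power equipment"),
         ("equipment", "insulator")]
print(find_special_case_cycle(edges))
```

Python's standard graphlib.TopologicalSorter would also detect such a cycle, because prepare() raises CycleError. The hand-written search is shown here because it returns the cycle path directly.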
[0040] The reflexive relation prohibition constraint means that no key entity can have a composition relation or a special case relation with itself. For example, defining insulator as a component of insulator is meaningless and violates the reflexive relation prohibition constraint.
[0041] The relation type overlap prohibition constraint means that only one explicit relation can exist between two key entities: either a composition relation or a special case relation, never both at the same time. For example, a suspension insulator cannot be defined as both a special case of an insulator and a component of an insulator.
[0042] The special case parent node uniqueness constraint means that, in special case relations, a subclass can have only one direct parent class, which keeps the inheritance relations unambiguous. For example, a suspension insulator cannot be defined as a direct subclass of both insulator and power equipment; it can inherit directly only from insulator.
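The three relation-level constraints in [0040] to [0042] can be checked in a single pass over the relation triples. Here is a minimal Python sketch, assuming relations are (kind, source, target) triples with kind "composition" or "special_case". The decision to treat overlap as unordered (any two relation kinds between the same pair of entities) is an interpretation, not something the application states.

```python
from collections import defaultdict


def check_relations(relations):
    """relations: iterable of (kind, source, target) triples.
    Returns a list of (constraint_name, detail) conflicts."""
    conflicts = []
    kinds_by_pair = defaultdict(set)  # unordered entity pair -> relation kinds
    parents = defaultdict(set)        # child -> direct special case parents
    for kind, src, tgt in relations:
        if src == tgt:
            conflicts.append(("reflexive relation prohibition", (kind, src)))
        kinds_by_pair[frozenset((src, tgt))].add(kind)
        if kind == "special_case":
            parents[src].add(tgt)
    for pair, kinds in kinds_by_pair.items():
        if len(kinds) > 1:
            conflicts.append(("relation type overlap prohibition", tuple(sorted(pair))))
    for child, ps in parents.items():
        if len(ps) > 1:
            conflicts.append(("special case parent node uniqueness", (child, tuple(sorted(ps)))))
    return conflicts


# One violation of each constraint, taken from the examples above.
bad = [("special_case", "suspension insulator", "insulator"),
       ("composition", "suspension insulator", "insulator"),
       ("special_case", "suspension insulator", "power equipment"),
       ("composition", "insulator", "insulator")]
for c in check_relations(bad):
    print(c)
```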
[0043] The same-name attribute uniqueness constraint means that, within the entire attribute set, each meaning must be expressed by exactly one attribute name. For example, defining both "defect level" and "defect grade" as attribute names with the same meaning violates this constraint.
[0044] The attribute definition legality constraint means that the definition of an attribute must be complete and legal. For example, defining an attribute named "voltage rating" whose scope of application specifies a fictitious device that does not exist in the ontology violates the attribute definition legality constraint.
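Checks for the two attribute-level constraints can be sketched in Python as follows. A program cannot tell on its own that "defect level" and "defect grade" mean the same thing, so this sketch assumes a hypothetical synonyms map (alias to canonical name) supplied by the ontology maintainer. Legality is interpreted here as a non-empty name and scope, a scope drawn only from the key entity set, and a value set that contains the null marker. These are assumptions, not the application's exact criteria.

```python
NULL = "<null>"  # hypothetical null marker


def check_attribute_definitions(attributes, entities, synonyms=None):
    """attributes: list of (name, scope, values) triples.
    synonyms: optional dict mapping alias -> canonical name (assumed input)."""
    synonyms = synonyms or {}
    conflicts, seen = [], {}
    for name, scope, values in attributes:
        canonical = synonyms.get(name, name)
        if canonical in seen:
            conflicts.append(("same-name attribute uniqueness", (seen[canonical], name)))
        else:
            seen[canonical] = name
        unknown = set(scope) - set(entities)  # scope entries not in the ontology
        if not name or not scope or unknown:
            conflicts.append(("attribute definition legality", (name, sorted(unknown))))
        if NULL not in values:
            conflicts.append(("attribute definition legality", (name, "missing null marker")))
    return conflicts


entities = {"insulator", "suspension insulator", "post insulator", "circuit breaker"}
attrs = [("defect level", {"insulator"}, {"normal", "minor defect", "severe defect", NULL}),
         ("defect grade", {"insulator"}, {"normal", NULL}),
         ("voltage rating", {"fictitious device"}, {"110kV", NULL})]
print(check_attribute_definitions(attrs, entities, {"defect grade": "defect level"}))
```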
[0045] As shown in Figure 3, the annotation layer constraints include constraints prohibiting unknown types, constraints prohibiting missing types, constraints prohibiting multiple types from being annotated together, constraints prohibiting parent and child types from being annotated together, constraints prohibiting special cases at the same level from being annotated together, constraints prohibiting undefined attributes, constraints on the scope of attribute application, constraints prohibiting attribute values from going out of bounds, constraints prohibiting multiple values for a single attribute, and constraints prohibiting the coexistence of null and non-null values.
[0046] The "Unknown type prohibition constraint" means that the type labeled for an object must be in the key entity set of the classification annotation ontology. For example, if the ontology contains only insulators and circuit breakers but an object in an image is labeled as a transformer, the "Unknown type prohibition constraint" is violated.
[0047] The missing type prohibition constraint states that if an object is labeled with an attribute, then that object must have been labeled with a valid type. For example, in an image, if an object is labeled with the attribute name "defect level" but no type is assigned to the object (such as "insulator"), then the missing type prohibition constraint is violated.
[0048] The multi-type labeling prohibition constraint means that an object can only be labeled with one type. For example, labeling the same object as both a suspension insulator and a post insulator violates the multi-type labeling prohibition constraint.
[0049] The prohibition against combining parent and child type annotations means that an object cannot be annotated as both a class and a subclass of that class. For example, annotating the same object as both an insulator and a suspension insulator violates this prohibition.
[0050] The "Same-level special cases cannot be labeled together" constraint means that an object cannot be labeled as multiple mutually exclusive subclasses under the same parent class at the same time. For example, under the parent class "insulator" there are the same-level special cases "suspension insulator" and "post insulator"; labeling the same object as both violates this constraint.
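The five type-related annotation layer constraints in [0046] to [0050] can be evaluated together over the B and L predicates of one sample. Here is a minimal Python sketch, assuming parent_of maps each special case to its single direct parent, as required by the parent node uniqueness constraint. Following [0048], any object with two or more types is reported as a multi-type conflict. Where applicable, the more specific parent-child or sibling conflict is reported in addition. All names are hypothetical.

```python
from collections import defaultdict


def ancestors(t, parent_of):
    """Transitive special case ancestors; parent_of maps child -> direct parent."""
    out = set()
    while t in parent_of and parent_of[t] not in out:  # stop on cycles
        t = parent_of[t]
        out.add(t)
    return out


def check_types(B, L, entities, parent_of):
    """B: {(o, v)} type predicates; L: {(o, p, l)} attribute predicates."""
    conflicts = []
    types = defaultdict(set)
    for o, v in B:
        if v not in entities:
            conflicts.append(("unknown type prohibition", o, v))
        types[o].add(v)
    for o, p, _ in L:
        if not types.get(o):  # attribute labeled on an object without any type
            conflicts.append(("missing type prohibition", o, p))
    for o, ts in types.items():
        if len(ts) < 2:
            continue
        conflicts.append(("multi-type labeling prohibition", o, tuple(sorted(ts))))
        for a in ts:
            for b in ts:
                if a < b:  # visit each unordered pair once
                    if a in ancestors(b, parent_of) or b in ancestors(a, parent_of):
                        conflicts.append(("parent-child type labeling prohibition", o, (a, b)))
                    elif parent_of.get(a) is not None and parent_of.get(a) == parent_of.get(b):
                        conflicts.append(("sibling special case labeling prohibition", o, (a, b)))
    return conflicts


entities = {"insulator", "suspension insulator", "post insulator", "circuit breaker"}
parent_of = {"suspension insulator": "insulator", "post insulator": "insulator"}
B = {("o1", "transformer"), ("o2", "insulator"), ("o2", "suspension insulator"),
     ("o3", "suspension insulator"), ("o3", "post insulator")}
L = {("o4", "defect level", "minor defect")}
for c in sorted(check_types(B, L, entities, parent_of)):
    print(c)
```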
[0051] The "Undefined attribute prohibition" constraint means that the attribute used in the annotation must be defined in the attribute set. For example, if you annotate an object with the attribute name "production date", but the attribute name "production date" is not defined in the attribute set, it violates the "Undefined attribute prohibition" constraint.
[0052] The attribute scope constraint means that an attribute must be applicable to the type of the object. For example, if the attribute "enamel color" is defined as applicable only to post insulators, assigning this attribute to an object of type suspension insulator would violate the attribute scope constraint.
[0053] The "Attribute Value Out of Bounds Prohibition" constraint means that the value of an attribute must be within the set of values defined for that attribute. For example, if the attribute name is "Defect Level" and the set of values corresponding to it is [Normal, Minor, Severe], then marking the attribute value as "Scrap" violates the "Attribute Value Out of Bounds Prohibition" constraint.
[0054] The "Single Attribute Multiple Value Prohibition" constraint means that, in annotations, the same attribute of an object cannot have multiple different values. For example, an object whose "Defect Level" attribute is annotated with both "Minor" and "Severe" violates the "Single Attribute Multiple Value Prohibition" constraint.
[0055] The prohibition against the coexistence of null and non-null values means that, for the same attribute, a null value and a specific non-null value cannot coexist. For example, an object whose defect level attribute is annotated with both a null value and "minor defect" violates this prohibition.
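The five attribute-related annotation layer constraints in [0051] to [0055] can likewise be sketched in Python over the B and L predicates. The sketch assumes an attributes map from each name to (effective scope, allowed values), where the effective scope already includes subclasses under the attribute inheritance rule. The "<null>" marker and the function name are hypothetical. When null and non-null values coexist, only the more specific conflict is reported.

```python
from collections import defaultdict

NULL = "<null>"  # hypothetical null marker


def check_attributes(B, L, attributes):
    """attributes: dict name -> (effective_scope, allowed_values)."""
    conflicts = []
    types = defaultdict(set)
    for o, v in B:
        types[o].add(v)
    values = defaultdict(set)  # (object, attribute) -> annotated values
    for o, p, l in L:
        if p not in attributes:
            conflicts.append(("undefined attribute prohibition", o, p))
            continue
        scope, allowed = attributes[p]
        # Objects without a type are left to the missing type check.
        if types[o] and not (types[o] & scope):
            conflicts.append(("attribute domain", o, p))
        if l not in allowed:
            conflicts.append(("attribute value out-of-bounds prohibition", o, p, l))
        values[(o, p)].add(l)
    for (o, p), ls in values.items():
        if NULL in ls and len(ls) > 1:
            conflicts.append(("null and non-null coexistence prohibition", o, p))
        elif len(ls) > 1:
            conflicts.append(("single attribute multi-value prohibition", o, p))
    return conflicts


attributes = {
    "defect level": ({"insulator", "suspension insulator", "post insulator"},
                     {"normal", "minor defect", "severe defect", NULL}),
    "enamel color": ({"post insulator"}, {"brown", "white", NULL}),
}
B = {("o1", "suspension insulator"), ("o2", "suspension insulator"),
     ("o3", "insulator"), ("o4", "insulator")}
L = {("o1", "production date", "2020"), ("o1", "enamel color", "brown"),
     ("o2", "defect level", "scrap"),
     ("o3", "defect level", "minor defect"), ("o3", "defect level", "severe defect"),
     ("o4", "defect level", NULL), ("o4", "defect level", "minor defect")}
for c in sorted(check_attributes(B, L, attributes)):
    print(c)
```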
[0056] As shown in Figure 4, the dataset layer constraints include multi-version annotation constraints, equivalent sample annotation constraints, and ontology consistency constraints.
[0057] Multi-version annotation constraint means that the annotations of the same sample data in different historical annotation versions should be consistent after normalization. For example, the same insulator image might be labeled as a suspension insulator by the first annotator in version V1.0, but as a post insulator by the second annotator in version V2.0, which violates the multi-version annotation constraint.
[0058] The equivalent sample labeling constraint means that samples defined as equivalent (such as pictures of the same object taken from different angles, or different texts with the same meaning) should have consistent labeling results. For example, front and side views of the same suspension insulator equipment, one labeled as suspension insulator and the other as post insulator, violate the equivalent sample labeling constraint.
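Both dataset layer comparisons in [0057] and [0058] reduce to "normalize, then compare". Below is a minimal Python sketch, assuming a hypothetical single-object annotation schema ({"types": [...], "attributes": {...}}) and a normalization that ignores case, surrounding whitespace, and ordering. The application does not specify its normalization. The equivalent pairs are taken as input, since deciding equivalence (same object, same source, or same semantics) is outside this sketch.

```python
def normalize(ann):
    """Canonical form of a single-object annotation (hypothetical schema):
    case- and whitespace-insensitive and order-independent."""
    types = frozenset(t.strip().lower() for t in ann.get("types", []))
    attrs = frozenset((p.strip().lower(), str(v).strip().lower())
                      for p, v in ann.get("attributes", {}).items())
    return types, attrs


def multi_version_conflicts(versions_by_sample):
    """versions_by_sample: dict sample_id -> dict version -> annotation."""
    out = []
    for sid, versions in versions_by_sample.items():
        forms = {normalize(a) for a in versions.values()}
        if len(forms) > 1:
            out.append(("multi-version annotation conflict", sid, tuple(sorted(versions))))
    return out


def equivalent_sample_conflicts(annotations, equivalent_pairs):
    """annotations: dict sample_id -> annotation; equivalent_pairs: (id_a, id_b)."""
    return [("equivalent sample annotation conflict", a, b)
            for a, b in equivalent_pairs
            if normalize(annotations[a]) != normalize(annotations[b])]


versions = {
    "img1": {"V1.0": {"types": ["suspension insulator"]}, "V2.0": {"types": ["post insulator"]}},
    "img2": {"V1.0": {"types": ["Insulator "]}, "V2.0": {"types": ["insulator"]}},
}
print(multi_version_conflicts(versions))
```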
[0059] The ontology consistency constraint means that no sample in the entire dataset may violate the annotation layer constraints. For example, if any sample in the dataset violates the aforementioned parent-child type labeling prohibition constraint, the entire dataset is considered to have an ontology consistency conflict.
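The ontology consistency constraint is an aggregation over the annotation layer results of every sample. A minimal Python sketch is shown below, assuming those results are collected into a hypothetical dict from sample id to conflict list. It also returns the offending sample ids, which can serve as conflict locations.

```python
def ontology_consistency(annotation_layer_results):
    """annotation_layer_results: dict sample_id -> list of annotation layer conflicts.
    Returns (is_consistent, sorted ids of samples with a non-empty result)."""
    offending = sorted(sid for sid, conflicts in annotation_layer_results.items() if conflicts)
    return (not offending, offending)


results = {
    "s1": [],
    "s2": [("parent-child type labeling prohibition", "o1", ("insulator", "suspension insulator"))],
}
print(ontology_consistency(results))  # the dataset is inconsistent because of s2
```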
[0060] As shown in Figure 5, the application layer constraint set includes the output result ontology constraint and the constraint between application output and existing samples.
[0061] The output result ontology constraint requires that the model's prediction/recognition results on the input data conform to the classification annotation ontology and the attribute definitions. For example, if the model identifies an object in an image as an insulator with the attribute "defect level" set to "unknown", the output violates the attribute value out-of-bounds prohibition constraint because "unknown" is not in the attribute value set for defect level, resulting in an output result ontology conflict.
[0062] The constraint between application output and existing samples means that the model's prediction for a new sample should be consistent with the labeled results of equivalent samples known in the historical dataset. For example, the model predicts a suspension insulator with a normal defect level for a new image. However, in the historical dataset, there exists an equivalent image of the same object, labeled as a suspension insulator with a minor defect level. Comparison reveals a conflict between the model's output and existing historical samples, thus violating the constraint between application output and existing samples.
[0063] In some embodiments of this application, step S105, which involves performing intra-sample annotation conflict detection on the sample instance representation of a single sample data to be detected to obtain annotation layer conflict results, includes steps S201 to S205.
[0064] S201. Determine if there are any type annotations that do not belong to the key entity set.
[0065] Step S201 checks whether the labeled type is valid. When constructing the classification label ontology, a set of key entities is defined, which is a list of all allowed categories, such as insulators and circuit breakers. In any sample data to be tested, the type labeled for an object must be within this predefined set. If a type not found in the key entity set is labeled, the ontology definition is violated.
[0066] For example, when constructing the classification label ontology, the key entity set is set to {"insulator", "circuit breaker", "disconnect switch"}. When a device in the image is labeled as a transformer, a conflict is detected in step S201 because "transformer" is not in the key entity set, violating the unknown type prohibition constraint.
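As an illustration only, the check of step S201 can be sketched in Python. The sketch assumes that type annotations are encoded as hypothetical (object_id, type_name) pairs and reuses the key entity set from the example above; a complete ontology would also list the special-case subtypes such as "suspension insulator".

```python
# Key entity set taken from the example in [0066]; a complete ontology would
# also list the special-case subtypes (e.g. "suspension insulator").
KEY_ENTITIES = {"insulator", "circuit breaker", "disconnect switch"}


def check_unknown_types(type_annotations, key_entities):
    """Return (object_id, type) pairs whose type is outside the key entity set.

    type_annotations is an iterable of (object_id, type_name) pairs, a
    hypothetical encoding of the type annotation predicate.
    """
    return [(obj, t) for obj, t in type_annotations if t not in key_entities]


sample = [("obj_1", "insulator"), ("obj_2", "transformer")]
print(check_unknown_types(sample, KEY_ENTITIES))  # [('obj_2', 'transformer')]
```

Each returned pair identifies both the offending object and the unknown type, which is the information later needed to locate the conflict.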
[0067] S202. If attribute value annotations exist, determine whether the corresponding annotation object is missing a type annotation.
[0068] Step S202 checks the completeness of the annotation. An object must first have a type in order to meaningfully describe its attributes. If an attribute is annotated for an object but no type is specified for that object, then the attribute annotation lacks a subject to which it is attached and is incomplete.
[0069] For example, in an image of electrical equipment, a rectangular area is labeled with the attribute name "Defect Level" and the corresponding attribute value is "Slight". However, no type is labeled for the rectangular area. Step S202 detects a conflict, violating the missing type prohibition constraint.
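A minimal sketch of step S202 follows, assuming hypothetical tuple encodings for the two annotation predicates: (object_id, type_name) for type annotations and (object_id, attribute_name, value) for attribute value annotations. An object is reported when it appears in the second collection but not in the first.

```python
def check_missing_types(type_annotations, attribute_annotations):
    """Return ids of objects that carry attribute annotations but no type.

    type_annotations: iterable of (object_id, type_name) pairs.
    attribute_annotations: iterable of (object_id, attribute_name, value).
    Both encodings are hypothetical stand-ins for the annotation predicates.
    """
    typed = {obj for obj, _ in type_annotations}
    return sorted({obj for obj, _, _ in attribute_annotations if obj not in typed})


# The rectangular area "region_1" has a defect level but no type.
types = [("obj_1", "insulator")]
attrs = [("obj_1", "defect level", "minor"), ("region_1", "defect level", "slight")]
print(check_missing_types(types, attrs))  # ['region_1']
```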
[0070] S203. Determine whether the same labeled object is simultaneously assigned multiple different types, simultaneously labeled as a special case of the parent class and its descendants, and simultaneously labeled as different special cases under the same parent class.
[0071] Step S203 checks the mutual exclusion and hierarchical consistency of type annotations through three sub-checks: an object can belong to only one most specific category; an object cannot be annotated as a class and its subclass at the same time; and an object cannot be annotated as multiple mutually exclusive subclasses under the same parent class at the same time.
[0072] For example, if the same object in an image is labeled as both a suspension insulator and a post insulator, executing step S203 detects a conflict that violates the multi-type co-annotation prohibition constraint and the sibling special case co-annotation prohibition constraint; if the same object in an image is labeled as both an insulator and a suspension insulator, it violates the parent-child type co-annotation prohibition constraint.
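The three sub-checks of step S203 can be sketched pairwise over the types assigned to one object. This is a sketch under stated assumptions: the special-case relation is held in a hypothetical child-to-parent dictionary, the ontology is assumed to have already passed the acyclicity check, and "multiple types" is interpreted as two types not related by ancestry, which reproduces the two outcomes in the example above.

```python
from itertools import combinations

# Hypothetical special-case relation from the examples: child type -> parent type.
PARENT = {"suspension insulator": "insulator", "post insulator": "insulator"}


def ancestors(t, parent):
    """All types above t; assumes the ontology already passed the acyclicity check."""
    result = set()
    while t in parent:
        t = parent[t]
        result.add(t)
    return result


def check_type_consistency(types, parent):
    """Pairwise S203 sub-checks over the types assigned to one annotated object."""
    conflicts = []
    for a, b in combinations(sorted(set(types)), 2):
        if a in ancestors(b, parent) or b in ancestors(a, parent):
            conflicts.append(("parent-child co-annotation", a, b))
            continue
        # Unrelated types: the object would have two most specific categories.
        conflicts.append(("multiple types", a, b))
        if a in parent and parent[a] == parent.get(b):
            conflicts.append(("sibling special cases", a, b))
    return conflicts


print(check_type_consistency(["suspension insulator", "post insulator"], PARENT))
print(check_type_consistency(["insulator", "suspension insulator"], PARENT))
```

The first call reports both the multi-type and the sibling violation; the second reports only the parent-child violation.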
[0073] S204. Determine whether the attribute corresponding to the attribute annotation belongs to the attribute set, whether the type of the annotation object is within the applicable domain of the corresponding attribute inheritance, and whether the attribute value is within the corresponding attribute value set.
[0074] Step S204 is a triple validation of attribute annotation, ensuring that the attribute itself, the matching relationship between the attribute and the type, and the attribute value are all valid. Specifically, the attribute name used must be in the predefined attribute set; the annotated attribute must be applicable to the type of the object, and if the attribute is applicable to a parent class, it will automatically be applicable to all subclasses of that parent class; the specific value assigned to the attribute must be within the attribute value set specified when the attribute is defined.
[0075] For example, the attribute set contains an attribute named "enamel color," which applies to post insulators, and its value set is {"white", "brown", "null"}. The attribute set also contains an attribute named "defect level," which applies to insulators, and its value set is {"normal", "minor", "severe", "null"}. Through inheritance, this attribute also applies to suspension insulators and post insulators. For an object labeled as a suspension insulator, with the attribute name "enamel color" and the attribute value "white," step S204 detects a conflict because the "enamel color" attribute only applies to post insulators, while the object type is suspension insulators, violating the attribute application domain constraint. Similarly, for an object labeled as a suspension insulator, with the attribute name "defect level" and the attribute value "scrap," step S204 detects a conflict because "scrap" is not within the attribute value set, violating the attribute value out-of-bounds prohibition constraint.
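The triple validation of step S204 might be sketched as follows, using the attribute set from the example above. The dictionary layouts and the string "null" as the null value marker are assumptions for illustration; applicability is inherited by walking up the special-case relation from the object type to the attribute's declared type.

```python
# Attribute set from the example in [0075]: name -> (applicable type, value set).
# "null" is an assumed encoding of the null value marker.
PARENT = {"suspension insulator": "insulator", "post insulator": "insulator"}
ATTRIBUTES = {
    "enamel color": ("post insulator", {"white", "brown", "null"}),
    "defect level": ("insulator", {"normal", "minor", "severe", "null"}),
}


def is_a(t, domain, parent):
    """True if t equals domain or inherits from it via special-case links."""
    while True:
        if t == domain:
            return True
        if t not in parent:
            return False
        t = parent[t]


def check_attribute(obj_type, name, value, attributes=ATTRIBUTES, parent=PARENT):
    """Return the violated constraint for one attribute annotation, or None."""
    if name not in attributes:
        return "undefined attribute"
    domain, values = attributes[name]
    if not is_a(obj_type, domain, parent):
        return "attribute domain violation"
    if value not in values:
        return "attribute value out of bounds"
    return None


print(check_attribute("suspension insulator", "enamel color", "white"))
print(check_attribute("suspension insulator", "defect level", "scrap"))
print(check_attribute("post insulator", "defect level", "minor"))  # None
```

The order of the three tests matters: an undefined attribute has no domain or value set to check against, so it is reported first.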
[0076] S205. Determine whether the same labeled object has multiple different values for the same attribute, or whether it has both null and non-null values.
[0077] Step S205 checks the consistency of value assignments for the same attribute within the same object. For a given attribute and object, the value should be unique and unambiguous: assigning two or more different valid values to the same attribute of the same object is prohibited, as is recording a null value marker together with a specific non-null value.
[0078] For example, if the same suspension insulator object is labeled with the attribute "defect level" having both the values "minor" and "severe", executing step S205 detects a conflict that violates the single attribute multiple value prohibition constraint; if the same suspension insulator object is labeled with the attribute "defect level" having both the values "null" and "minor", executing step S205 detects a conflict that violates the prohibition against the coexistence of null and non-null values.
[0079] Steps S201 to S205 form a progressive detection chain covering type validity, annotation completeness, type logical consistency, attribute validity, and attribute value uniqueness. Through these five steps, the method can systematically discover and locate low-level annotation errors and logical contradictions within a single sample data to be detected, thereby ensuring the quality of each individual sample before the data is stored in the database or used for training, and laying a solid foundation for building a high-quality dataset.
[0080] In some embodiments of this application, step S105, which involves performing inter-sample annotation conflict detection on the multi-version sample annotation results and/or equivalent sample annotation results in the historical dataset to obtain dataset layer conflict results, includes steps S301 to S303.
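Step S205 reduces to grouping attribute values by (object, attribute) and inspecting each group. The sketch below assumes the same hypothetical (object_id, attribute_name, value) triples as before and encodes the null value marker as the string "null".

```python
from collections import defaultdict

NULL = "null"  # assumed encoding of the null value marker


def check_value_uniqueness(attribute_annotations):
    """S205 over (object_id, attribute_name, value) triples of one sample."""
    values = defaultdict(set)
    for obj, name, value in attribute_annotations:
        values[(obj, name)].add(value)
    conflicts = []
    for (obj, name), vs in sorted(values.items()):
        non_null = vs - {NULL}
        if NULL in vs and non_null:
            conflicts.append(("null and non-null coexist", obj, name))
        if len(non_null) > 1:
            conflicts.append(("multiple values", obj, name))
    return conflicts


annotations = [
    ("obj_1", "defect level", "minor"), ("obj_1", "defect level", "severe"),
    ("obj_2", "defect level", "null"), ("obj_2", "defect level", "minor"),
]
print(check_value_uniqueness(annotations))
```

Using a set per group means that an identical value recorded twice is treated as a harmless duplicate rather than a conflict, which is one possible design choice.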
[0081] S301. Normalize different labeled versions of the same sample data to be tested. If the labeled results after normalization are inconsistent, it is determined to be a multi-version labeling conflict.
[0082] Step S301 aims to resolve inconsistencies between multiple versions of the same sample data to be tested, generated at different times, by different annotators, or through different annotation processes. In real-world projects, an image or a piece of data may be repeatedly annotated and revised, resulting in multiple versions (e.g., version V1.0, version V1.1, version V2.0). To compare these versions, their annotation results need to be normalized, i.e., converted into a unified and comparable standard format. If differences exist in the normalized results, it indicates that there are annotation conflicts between different versions of the same sample data to be tested.
[0083] For example, the sample data to be tested is a picture of power equipment inspection. In annotation version V1.0, the first annotator annotated the equipment in the picture as an insulator, with the attribute name being defect level and the attribute value being minor. In annotation version V2.0, the second annotator annotated the same equipment in the picture as a suspension insulator, with the attribute name being defect level and the attribute value being normal. When executing step S301, the annotation content in annotation version V1.0 and annotation content in annotation version V2.0 are normalized, and then the insulator and suspension insulator, as well as minor and normal, are compared. The comparison results are inconsistent, violating the multi-version annotation constraint.
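The description does not fix a particular normalization. As a hypothetical example, step S301 could trim and lower-case type and attribute names and sort the attributes, then group the versions of one sample by their normalized form; more than one group means a multi-version annotation conflict.

```python
from collections import defaultdict


def normalize(annotation):
    """Hypothetical normalization: trim and lower-case names, sort attributes."""
    obj_type = annotation["type"].strip().lower()
    attrs = tuple(sorted(
        (k.strip().lower(), v.strip().lower())
        for k, v in annotation["attributes"].items()
    ))
    return obj_type, attrs


def check_multi_version(versions):
    """versions maps version id -> annotation of one sample.

    Returns groups of version ids that agree after normalization, or an
    empty list when every version normalizes to the same result.
    """
    groups = defaultdict(list)
    for vid in sorted(versions):
        groups[normalize(versions[vid])].append(vid)
    return list(groups.values()) if len(groups) > 1 else []


versions = {
    "V1.0": {"type": "Insulator", "attributes": {"Defect Level": "Minor"}},
    "V1.1": {"type": "insulator ", "attributes": {"defect level": "minor"}},
    "V2.0": {"type": "Suspension Insulator", "attributes": {"Defect Level": "Normal"}},
}
print(check_multi_version(versions))  # [['V1.0', 'V1.1'], ['V2.0']]
```

Versions V1.0 and V1.1 differ only in formatting and therefore agree after normalization; V2.0 forms a separate group, which reproduces the conflict described in the example.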
[0084] S302. When two sample data to be detected correspond to the same object, the same source, or share the same classification semantics, establish an equivalence relationship between the two sample data to be detected, and normalize the annotation results of the two sample data to be detected. If the normalized annotation results are inconsistent, it is determined that there is a conflict in the annotation of equivalent samples.
[0085] Step S302 addresses inconsistent labeling between different sample data to be detected that describe the same real-world object or have the same semantics. The equivalence relationship is the key to this detection and can be established in the following ways: same object: photographs of the same physical entity taken from different angles and at different times; same source: different paragraphs extracted from the same document that describe the same event; shared classification semantics: different texts describing the same concept, or different images belonging to exactly the same scene category.
[0086] For example, one sample data to be tested is a close-up front view of an insulator device, labeled as a suspension insulator with the defect level "normal"; another sample data to be tested is a panoramic side view of the same insulator device, labeled as a suspension insulator with the defect level "minor". When executing step S302, the two sample data to be tested are determined to correspond to the same object based on information such as the device ID, GPS coordinates, and shooting time sequence, and an equivalence relationship is established between them. The labeled content of the two sample data is then normalized and compared. Because the defect level values are inconsistent, the two sample data to be tested are determined to violate the equivalent sample labeling constraint.
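Because equivalence is transitive, pairwise equivalence links can be merged into equivalence classes before comparison. The sketch below uses a union-find structure for that merge (a standard technique chosen here for illustration, not one named in the description) and assumes the pairwise links have already been derived from cues such as device ID or GPS coordinates. The normalization is the same hypothetical one used for step S301.

```python
from collections import defaultdict


def normalize(annotation):
    """Hypothetical normalization: trim and lower-case, sort attributes."""
    attrs = tuple(sorted((k.strip().lower(), v.strip().lower())
                         for k, v in annotation["attributes"].items()))
    return annotation["type"].strip().lower(), attrs


def find(parent, x):
    """Union-find root lookup with path halving."""
    while parent.setdefault(x, x) != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def check_equivalent_samples(annotations, equivalent_pairs):
    """annotations: sample id -> annotation; equivalent_pairs: (id, id) links
    established beforehand, e.g. from matching device IDs or GPS coordinates."""
    parent = {}
    for a, b in equivalent_pairs:
        ra, rb = find(parent, a), find(parent, b)
        if ra != rb:
            parent[ra] = rb
    classes = defaultdict(list)
    for sid in sorted(annotations):
        classes[find(parent, sid)].append(sid)
    # An equivalence class conflicts if its members normalize differently.
    return [members for members in classes.values()
            if len({normalize(annotations[s]) for s in members}) > 1]


annotations = {
    "front": {"type": "Suspension Insulator", "attributes": {"Defect Level": "Normal"}},
    "side": {"type": "suspension insulator", "attributes": {"defect level": "minor"}},
    "post_01": {"type": "post insulator", "attributes": {"defect level": "normal"}},
}
pairs = [("front", "side")]
print(check_equivalent_samples(annotations, pairs))  # [['front', 'side']]
```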
[0087] S303. When the annotation layer conflict result of any sample data to be detected in the historical dataset is not empty, it is determined that there is an ontology consistency conflict in the historical dataset.
[0088] In step S303, a non-empty annotation layer conflict result means that at least one conflict was found after performing intra-sample annotation conflict detection on a certain sample data to be detected. If any sample data to be detected in the historical dataset fails the annotation layer detection, then the entire historical dataset is considered to have an ontology consistency conflict.
[0089] For example, a historical dataset contains 1000 labeled images of power equipment. During intra-sample labeling conflict detection, steps S201 to S205 are performed on each image, and one image is found to violate the attribute application domain constraint: a circuit breaker object is labeled with the attribute "enamel color", which applies only to insulators. Although this is an annotation layer conflict within a single sample, according to step S303 the presence of such an erroneous sample means that the entire historical dataset is determined to have an ontology consistency conflict. This conclusion requires the data manager to fix the annotation layer conflict; otherwise, the dataset cannot be used to train a high-quality model.
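Step S303 is an aggregation over the per-sample results of steps S201 to S205. A minimal sketch, assuming those results are collected in a dictionary keyed by hypothetical sample identifiers, also returns the offending samples so that the conflict can be located:

```python
def check_ontology_consistency(annotation_layer_results):
    """annotation_layer_results: sample id -> list of S201-S205 conflicts."""
    offending = sorted(sid for sid, found in annotation_layer_results.items() if found)
    return {"ontology_consistency_conflict": bool(offending),
            "offending_samples": offending}


# 1000 hypothetical image ids, one of which failed the attribute domain check.
results = {f"img_{i:04d}": [] for i in range(1000)}
results["img_0421"] = [("attribute domain violation", "obj_2", "enamel color")]
print(check_ontology_consistency(results))
```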
[0090] Steps S301 to S303 constitute a three-layer quality firewall at the dataset level: Step S301 ensures the consistency of the evolution history of individual data, avoiding version management chaos; Step S302 ensures the consistency between different data describing the same fact, eliminating semantic contradictions; Step S303 ensures that all members of the dataset adhere to the most basic annotation rules, which is the basic threshold for the dataset to be used for subsequent tasks. The combined detection of steps S301 to S303 can systematically discover and locate macro-level data inconsistencies caused by multi-person collaboration, standard drift, and historical errors, making it a crucial step in building a high-quality, highly consistent model training dataset.
[0091] In some embodiments of this application, step S105, which involves performing conflict detection on the application results output for the sample data to be tested to obtain application layer conflict results, includes steps S401 to S403.
[0092] S401. Convert the application results of the sample data to be tested into an application output representation that is consistent with the sample instance representation.
[0093] Step S401 is a prerequisite for detection. The sample instance representation is the standard data format used by this method for unified processing and detection. When executing step S401, the model's raw output (such as JSON, strings, etc.) is parsed and converted into this standard format before it can be compared with a predefined set of machine-readable constraints. This conversion process ensures the consistency of the detection logic.
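The conversion of step S401 might look like the following sketch. It assumes the model emits JSON objects with "type" and "attributes" fields (a hypothetical output schema) and maps them onto a hypothetical in-memory form of the sample instance representation: a set of objects, type annotation predicates, and attribute value annotation predicates.

```python
import json
from dataclasses import dataclass, field


@dataclass
class InstanceRepr:
    """Hypothetical in-memory form of the sample instance representation."""
    objects: set = field(default_factory=set)
    types: set = field(default_factory=set)       # (object_id, type) pairs
    attributes: set = field(default_factory=set)  # (object_id, name, value)


def to_instance_repr(raw_output: str) -> InstanceRepr:
    """Parse a JSON model output (one detection or a list) and normalize case."""
    data = json.loads(raw_output)
    detections = data if isinstance(data, list) else [data]
    rep = InstanceRepr()
    for i, det in enumerate(detections):
        obj = det.get("id", f"out_{i}")  # synthesize an id if the model gives none
        rep.objects.add(obj)
        rep.types.add((obj, det["type"].strip().lower()))
        for name, value in det.get("attributes", {}).items():
            rep.attributes.add((obj, name.strip().lower(), str(value).strip().lower()))
    return rep


raw = '{"type": "Suspension Insulator", "attributes": {"Defect Level": "Scrap"}}'
print(to_instance_repr(raw))
```

Once outputs share the annotation format, the same checks written for steps S201 to S205 can be reused on them unchanged.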
[0094] S402. Determine whether the application output representation violates the application layer constraint set in order to identify ontological conflicts in the output results.
[0095] Step S402 determines whether the model outputs an illegal type, an incorrect attribute value, or a logically contradictory label. If it does, the model's reasoning process may be faulty, and it is therefore determined that the dataset used to train the model is not accurate enough.
[0096] For example, the model's output is: {Type: Suspension Insulator, Attribute: {Defect Level: Scrap}}. According to the classification annotation ontology definition, the legal attribute value set for the defect level is [Normal, Minor, Severe, Null]. Scrap is not within this value set. Step S402 identifies an ontology conflict in the output result, specifically a violation of the "Attribute Value Out of Bounds Prohibition Constraint". This suggests that the model may have encountered erroneous or non-standard data during training, causing it to learn an illegal output pattern.
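A reduced sketch of step S402 follows. It checks only the unknown type, undefined attribute, and value out-of-bounds constraints against an illustrative ontology fragment, and assumes the output has already been normalized by step S401. A full implementation would apply the whole application layer constraint set, including attribute domains and type co-annotation rules.

```python
# Illustrative ontology fragment; only "defect level" is modeled here.
KEY_ENTITIES = {"insulator", "suspension insulator", "post insulator"}
VALUE_SETS = {"defect level": {"normal", "minor", "severe", "null"}}


def check_output_ontology(output):
    """Check one normalized model output {'type': ..., 'attributes': {...}}."""
    conflicts = []
    if output["type"] not in KEY_ENTITIES:
        conflicts.append(("unknown type", output["type"]))
    for name, value in output["attributes"].items():
        if name not in VALUE_SETS:
            conflicts.append(("undefined attribute", name))
        elif value not in VALUE_SETS[name]:
            conflicts.append(("attribute value out of bounds", name, value))
    return conflicts


output = {"type": "suspension insulator", "attributes": {"defect level": "scrap"}}
print(check_output_ontology(output))
# [('attribute value out of bounds', 'defect level', 'scrap')]
```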
[0097] S403. Normalize the application output representation and the sample labeling results in the historical dataset that satisfy the equivalence relationship, and compare them. If the normalized results are inconsistent, it is determined that the application output conflicts with the existing samples.
[0098] Step S403 compares the model's application output with the historical judgments of human experts, which is crucial for verifying the model's credibility. First, for a new input sample (such as a newly taken photo of a device), equivalent samples (such as old photos of the same device) are found in the historical dataset. Then, the model's application output for the new sample (converted to a standard format) is normalized and compared with the manually labeled results of samples with equivalent relationships. If the comparison results are inconsistent, it indicates that the model's judgment contradicts historical consensus, and the application output is determined to conflict with existing samples.
[0099] For example, when a newly taken photo of a suspension insulator is input into the model, the model's output is {Type: Suspension Insulator, Attribute: {Defect Level: Normal}}. When executing step S403, an old photo of the same device is first found in the historical dataset, and its manual annotation is determined to be {Type: Suspension Insulator, Attribute: {Defect Level: Minor}}. After normalization and comparison, the types are consistent but the defect level values are not, so a conflict between the application output and the existing sample is determined.
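The comparison of step S403 can be sketched as a field-level diff between the normalized model output and each equivalent historical annotation. Here equivalence is simplified to "same device ID", one of the cues mentioned for step S302; the history layout and the identifiers are hypothetical.

```python
def _norm(text):
    return text.strip().lower()


def diff_annotations(a, b):
    """Field-level differences between two {'type', 'attributes'} annotations."""
    diffs = []
    if _norm(a["type"]) != _norm(b["type"]):
        diffs.append(("type", _norm(a["type"]), _norm(b["type"])))
    attrs_a = {_norm(k): _norm(v) for k, v in a["attributes"].items()}
    attrs_b = {_norm(k): _norm(v) for k, v in b["attributes"].items()}
    for name in sorted(attrs_a.keys() | attrs_b.keys()):
        if attrs_a.get(name) != attrs_b.get(name):
            diffs.append((name, attrs_a.get(name), attrs_b.get(name)))
    return diffs


def check_output_vs_history(output, device_id, history):
    """history: (sample_id, device_id, annotation) triples; equivalence is
    assumed to mean "same device ID"."""
    conflicts = []
    for sample_id, hist_device, annotation in history:
        if hist_device == device_id:
            diffs = diff_annotations(output, annotation)
            if diffs:
                conflicts.append((sample_id, diffs))
    return conflicts


output = {"type": "Suspension Insulator", "attributes": {"Defect Level": "Normal"}}
history = [
    ("img_017", "DEV-1", {"type": "suspension insulator", "attributes": {"defect level": "minor"}}),
    ("img_042", "DEV-2", {"type": "post insulator", "attributes": {"defect level": "normal"}}),
]
print(check_output_vs_history(output, "DEV-1", history))
# [('img_017', [('defect level', 'normal', 'minor')])]
```

Reporting the differing fields, rather than only a yes/no result, matches the example above: the type agrees and only the defect level differs.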
[0100] Steps S401 to S403 extend the closed loop of conflict detection to the model application stage. Step S401 unifies the data format, paving the way for detection. Step S402 performs a self-check on the model's application output to ensure that the model output itself is legal and self-consistent. Step S403 performs an external check by comparing the model's application output with the labeled results of samples that satisfy the equivalence relation in the historical dataset, and uses manual annotation to verify the model's application output.
[0101] Furthermore, in some embodiments of this application, the method further includes: summarizing the ontology layer conflict results, annotation layer conflict results, dataset layer conflict results, and application layer conflict results, and outputting the conflict type, conflict location, conflict level, and correction suggestions.
[0102] Specifically, steps S501 to S504 summarize the conflict results at the ontology layer, annotation layer, dataset layer, and application layer, and output the conflict type, conflict location, conflict level, and correction suggestions.
[0103] S501. Classify conflicts hierarchically according to the ontology layer, annotation layer, dataset layer, and application layer.
[0104] Step S501 involves top-level classification of all detected conflicts, categorizing the raw conflict data from different detection stages into four predefined categories based on their logical hierarchy: ontology layer, annotation layer, dataset layer, and application layer. This hierarchical classification ensures the structure and clarity of the report.
[0105] S502. Determine the conflict location based on the ontology node, attribute, sample identifier, labeled object identifier, or application output location where the conflict occurs.
[0106] Step S502 provides precise location information for each conflict, which is crucial for traceability and remediation. Step S502 fills in specific location fields according to the type of conflict: ontology layer conflicts are traced to specific ontology nodes (such as insulators) or relationships (such as special case relationships); annotation layer conflicts are traced to specific attributes; dataset layer conflicts are traced to sample identifiers and annotation object identifiers; and application layer conflicts are traced to specific application output locations.
[0107] S503. Conflict levels are classified according to the degree of impact of conflicts on ontology availability, sample trainability, and application output credibility.
[0108] Step S503 classifies conflicts by severity to help users prioritize the most critical issues. For example, conflict levels can be categorized as follows. High risk: conflicts that render the ontology unusable, the samples completely unusable, or the application output completely unreliable; for example, the classification annotation ontology violates its directed acyclicity constraint. Medium risk: conflicts that affect the credibility of the application output but do not completely compromise the usability of the ontology or the trainability of the samples; for example, the model's application output conflicts with existing samples. Low risk: conflicts caused by formatting or minor logical errors, which can usually be corrected automatically; for example, inconsistent capitalization of attribute values.
[0109] S504. Generate correction suggestions for different conflict types. Correction suggestions include at least one of the following: deleting illegal relationships, supplementing missing attribute types, modifying attribute values, adjusting the scope of attribute application, merging conflict version annotations, and triggering manual review.
[0110] Step S504 matches the preset repair strategy according to the type of conflict and generates specific and actionable correction suggestions.
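Steps S501 to S504 can be combined into a single report builder driven by a rule table. The sketch below is hypothetical throughout: the conflict type names, the rule table, and the fallback entry are assumptions, with the three severity levels taken from the examples in [0108] and the suggestions drawn from the list in S504.

```python
from dataclasses import dataclass

# Hypothetical rule table: conflict type -> (layer, level, correction suggestion).
# The levels follow the examples in [0108]; the table is not exhaustive.
RULES = {
    "ontology cycle": ("ontology", "high", "delete illegal relationship"),
    "output conflicts with existing sample": ("application", "medium", "trigger manual review"),
    "attribute value capitalization": ("annotation", "low", "modify attribute value"),
}
FALLBACK = ("unclassified", "medium", "trigger manual review")  # assumed default
LEVEL_ORDER = {"high": 0, "medium": 1, "low": 2}


@dataclass
class ConflictReport:
    conflict_type: str
    layer: str
    location: str
    level: str
    suggestion: str


def build_report(raw_conflicts):
    """raw_conflicts: (conflict_type, location) pairs; returns reports, most severe first."""
    reports = []
    for ctype, location in raw_conflicts:
        layer, level, suggestion = RULES.get(ctype, FALLBACK)
        reports.append(ConflictReport(ctype, layer, location, level, suggestion))
    return sorted(reports, key=lambda r: LEVEL_ORDER[r.level])


reports = build_report([
    ("attribute value capitalization", "img_3/obj_1/defect level"),
    ("ontology cycle", "insulator <-> suspension insulator"),
])
for r in reports:
    print(r.level, r.layer, r.location, r.suggestion, sep=" | ")
```

Keeping the mapping in a table rather than in code branches lets new conflict types and repair strategies be added without changing the report logic.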
[0111] As shown in Figure 6, this application also provides a dataset conflict detection device 600, which includes a construction module 601, a definition module 602, a transformation module 603, a constraint generation module 604, and a detection module 605.
[0112] The construction module 601 is used to construct the classification annotation ontology, which includes a set of key entities and relationships.
[0113] The definition module 602 is used to define an attribute set. Each attribute in the attribute set includes an attribute name, an attribute scope, and a set of attribute values. Each attribute's set of attribute values includes a null value marker.
[0114] The conversion module 603 is used to acquire the sample data to be detected and the annotation results of the sample data to be detected, and convert the annotation results into a sample instance representation. The sample instance representation includes a set of annotated objects, type annotation predicates, and attribute value annotation predicates.
[0115] The constraint generation module 604 is used to generate machine-readable constraint sets based on the classification annotation ontology, attribute set, and attribute inheritance rules. These machine-readable constraint sets include ontology-level constraint sets, annotation-level constraint sets, dataset-level constraint sets, and application-level constraint sets.
[0116] The detection module 605 is used to perform ontology structure conflict detection on the classification annotation ontology based on the ontology layer constraint set to obtain ontology layer conflict results; to perform intra-sample annotation conflict detection on the sample instance representation of a single sample data to be detected based on the annotation layer constraint set to obtain annotation layer conflict results; to perform inter-sample annotation conflict detection on the multi-version sample annotation results and/or equivalent sample annotation results in the historical dataset based on the dataset layer constraint set to obtain dataset layer conflict results; and to perform conflict detection on the application results output for the sample data to be detected based on the application layer constraint set to obtain application layer conflict results.
[0117] The ontology layer constraints include the directed acyclicity constraint for classification and annotation ontology, the prohibition of reflexive relations, the prohibition of overlapping relation types, the uniqueness constraint of special case parent nodes, the uniqueness constraint of attributes with the same name, and the constraint of the legality of attribute definitions; the annotation layer constraints include the prohibition of unknown types, the prohibition of missing types, the prohibition of multiple types being annotated together, the prohibition of parent-child types being annotated together, the prohibition of annotating sibling special cases together, the prohibition of undefined attributes, the constraint of the domain of application of attributes, the prohibition of attribute values going out of bounds, the prohibition of multiple values for a single attribute, and the prohibition of the coexistence of null and non-null values; the dataset layer constraints include the multi-version annotation constraints, the equivalent sample annotation constraints, and the ontology consistency constraints; the application layer constraint set includes the ontology constraints of the output results, as well as the constraints of the application output and existing samples.
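Several of the ontology layer constraints listed above are purely structural and can be checked on the relation graph. The sketch below covers four of them: reflexive relations, overlapping relation types, special case parent uniqueness, and directed acyclicity. It assumes relations are given as hypothetical (child, relation_type, parent) triples and uses Kahn's topological-sort algorithm (a standard choice made here, not one named in the description) to detect cycles. The attribute-related constraints are not covered.

```python
from collections import defaultdict, deque


def check_ontology_structure(relations):
    """Check (child, relation_type, parent) triples of the ontology.

    relation_type is assumed to be "special case" or "composition".
    """
    conflicts = []
    edges = set()
    rel_types = defaultdict(set)
    sc_parents = defaultdict(set)
    for child, rel, parent in relations:
        if child == parent:
            conflicts.append(("reflexive relation", child))
            continue
        edges.add((child, parent))
        rel_types[(child, parent)].add(rel)
        if rel == "special case":
            sc_parents[child].add(parent)
    for pair, rels in sorted(rel_types.items()):
        if len(rels) > 1:
            conflicts.append(("overlapping relation types", pair))
    for child, parents in sorted(sc_parents.items()):
        if len(parents) > 1:
            conflicts.append(("multiple special-case parents", child))
    # Kahn's algorithm: nodes whose in-degree never reaches 0 lie on a cycle
    # or downstream of one, so the relation graph is not a DAG.
    nodes = {n for edge in edges for n in edge}
    indegree = {n: 0 for n in nodes}
    successors = defaultdict(list)
    for c, p in edges:
        successors[c].append(p)
        indegree[p] += 1
    queue = deque(n for n in nodes if indegree[n] == 0)
    while queue:
        n = queue.popleft()
        for m in successors[n]:
            indegree[m] -= 1
            if indegree[m] == 0:
                queue.append(m)
    stuck = sorted(n for n in nodes if indegree[n] > 0)
    if stuck:
        conflicts.append(("cycle", stuck))
    return conflicts


cyclic = [("suspension insulator", "special case", "insulator"),
          ("insulator", "special case", "suspension insulator")]
print(check_ontology_structure(cyclic))
```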
[0118] When the detection module 605 performs intra-sample annotation conflict detection on the sample instance representation of a single sample data to be detected and obtains the annotation layer conflict result, it determines whether there are type annotations that do not belong to the key entity set; if attribute value annotations exist, it determines whether the corresponding annotation object is missing a type annotation; it determines whether the same annotation object is simultaneously assigned multiple different types, simultaneously annotated as a special case of the parent class and its descendants, and simultaneously annotated as different special cases under the same parent class; it determines whether the attribute corresponding to the attribute annotation belongs to the attribute set, whether the type of the annotation object is within the applicable domain of the corresponding attribute inheritance, and whether the attribute value is within the corresponding attribute value set; it determines whether the same annotation object has multiple different values on the same attribute, or whether there are both null and non-null values.
[0119] When the detection module 605 performs inter-sample annotation conflict detection on the multi-version sample annotation results and / or equivalent sample annotation results in the historical dataset to obtain the dataset-level conflict results: different annotation versions of the same sample data to be detected are normalized respectively. If the normalized annotation results are inconsistent, it is determined to be a multi-version annotation conflict; when two sample data to be detected correspond to the same object, the same source, or share the same classification semantics, an equivalence relationship is established between the two sample data to be detected, and the annotation results of the two sample data to be detected are normalized. If the normalized annotation results are inconsistent, it is determined to be an equivalent sample annotation conflict; when the annotation-level conflict result of any sample data to be detected in the historical dataset is not empty, it is determined that there is an ontology consistency conflict in the historical dataset.
[0120] When the detection module 605 performs conflict detection on the application results output for the sample data to be detected to obtain the application layer conflict results, it converts those application results into an application output representation consistent with the sample instance representation; determines whether the application output representation violates the application layer constraint set in order to identify output result ontology conflicts; and normalizes and compares the application output representation with the sample annotation results in the historical dataset that satisfy the equivalence relationship. If the normalized results are inconsistent, it is determined that the application output conflicts with the existing samples.
[0121] The dataset conflict detection device 600 also includes a correction module, which summarizes the conflict results at the ontology layer, annotation layer, dataset layer, and application layer, and outputs the conflict type, conflict location, conflict level, and correction suggestions.
[0122] The correction module is specifically used for: classifying conflicts hierarchically according to the ontology layer, annotation layer, dataset layer, and application layer; determining the conflict location based on the ontology node, attribute, sample identifier, annotation object identifier, or application output location where the conflict occurs; classifying the conflict level according to the degree of impact of the conflict on ontology usability, sample trainability, and application output credibility; generating correction suggestions for different conflict types; and the correction suggestions include at least one of the following: deleting illegal relationships, supplementing missing attribute types, modifying attribute values, adjusting the scope of attribute application, merging conflicting version annotations, and triggering manual review.
[0123] The apparatus or module described in the above embodiments can be implemented by a computer chip or physical entity, or by a product with a certain function. For ease of description, the above apparatus is described by dividing it into various modules according to their functions. When implementing the embodiments of this application, the functions of each module can be implemented in one or more software and / or hardware. Of course, a module that implements a certain function can also be implemented by combining multiple sub-modules or sub-units.
[0124] This application also provides an electronic device including one or more processors and a memory storing computer-executable instructions, which, when executed by the one or more processors, cause the one or more processors to perform the above-described dataset conflict detection method.
[0125] In the embodiments of this application, the processor can be a central processing unit, or it can be other general-purpose processors, digital signal processors, application-specific integrated circuits, field-programmable gate arrays or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. A general-purpose processor can be a microprocessor or any conventional processor.
[0126] The memory may include read-only memory and random access memory (RAM), and provides instructions and data to the processor. The memory may also include non-volatile random access memory. Optionally, the random access memory may be, for example, high-bandwidth memory. This application also provides a computer-readable storage medium storing computer-readable instructions, which, when executed by a computer, implement the above-described dataset conflict detection method.
[0127] The aforementioned storage media include, but are not limited to, random access memory, read-only memory, cache, hard disk, or memory card.
[0128] The various embodiments in this specification are described in a progressive manner. For the same or similar parts between the various embodiments, please refer to each other. Each embodiment focuses on describing the differences from other embodiments.
[0129] The above embodiments are only used to illustrate the technical solutions of this application, and are not intended to limit this application. Although this application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some or all of the technical features therein. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the scope of the technical solutions of this application.
Claims
1. A dataset conflict detection method, characterized in that it comprises: constructing a classification labeling ontology, which includes a set of key entities and relationships, the relationships including compositional relationships and special case relationships; defining an attribute set, wherein each attribute in the attribute set includes an attribute name, an attribute scope, and an attribute value set, and the attribute value set of each attribute includes a null value marker; acquiring sample data to be detected and the annotation results of the sample data to be detected, and converting the annotation results into a sample instance representation, the sample instance representation including a set of annotated objects, type annotation predicates, and attribute value annotation predicates; generating a machine-readable constraint set based on the classification labeling ontology, the attribute set, and attribute inheritance rules, wherein the machine-readable constraint set includes an ontology layer constraint set, an annotation layer constraint set, a dataset layer constraint set, and an application layer constraint set; performing, based on the ontology layer constraint set, ontology structure conflict detection on the classification labeling ontology to obtain ontology layer conflict results; performing, based on the annotation layer constraint set, intra-sample annotation conflict detection on the sample instance representation of a single piece of sample data to be detected to obtain annotation layer conflict results; performing, based on the dataset layer constraint set, inter-sample annotation conflict detection on multi-version sample annotation results and/or equivalent sample annotation results in a historical dataset to obtain dataset layer conflict results; and performing, based on the application layer constraint set, conflict detection on the application results output for the sample data to be detected to obtain application layer conflict results.
2. The method according to claim 1, characterized in that the ontology layer constraint set includes a directed acyclicity constraint on the classification labeling ontology, a constraint prohibiting reflexive relationships, a constraint prohibiting overlapping relationship types, a constraint requiring a unique special case parent node, a constraint requiring attribute names to be unique, and a constraint requiring legal attribute definitions; the annotation layer constraint set includes constraints prohibiting unknown types, prohibiting missing types, prohibiting multiple types from being annotated together, prohibiting parent and child types from being annotated together, prohibiting special cases of the same level from being annotated together, prohibiting undefined attributes, restricting the scope of attribute application, prohibiting attribute values from exceeding their bounds, prohibiting multiple values for a single attribute, and prohibiting the coexistence of null and non-null values; the dataset layer constraint set includes multi-version annotation constraints, equivalent sample annotation constraints, and ontology consistency constraints; and the application layer constraint set includes output result ontology constraints and constraints between application outputs and existing samples.
3. The method according to claim 2, characterized in that performing intra-sample annotation conflict detection on the sample instance representation of a single piece of sample data to be detected to obtain annotation layer conflict results includes: determining whether any type annotation does not belong to the set of key entities; for each attribute value annotation, determining whether the corresponding annotated object lacks a type annotation; determining whether the same annotated object is simultaneously assigned multiple different types, simultaneously labeled as a parent class and one of its descendant special cases, or simultaneously labeled as different special cases under the same parent class; determining whether the attribute of each attribute annotation belongs to the attribute set, whether the type of the annotated object falls within the corresponding attribute inheritance scope, and whether the attribute value is within the corresponding attribute value set; and determining whether the same annotated object has multiple different values for the same attribute, or has both null and non-null values for it.
4. The method according to claim 2, characterized in that performing inter-sample annotation conflict detection on multi-version sample annotation results and/or equivalent sample annotation results in the historical dataset to obtain dataset layer conflict results includes: normalizing the different annotation versions of the same sample data to be detected, and determining that the annotation versions conflict if the normalized annotation results are inconsistent; when two pieces of sample data to be detected correspond to the same object, come from the same source, or share the same classification semantics, establishing an equivalence relationship between them and normalizing their annotation results, and determining that the equivalent sample annotations conflict if the normalized annotation results are inconsistent; and when the annotation layer conflict result of any sample data to be detected in the historical dataset is not empty, determining that the historical dataset has an ontology consistency conflict.
5. The method according to claim 2, characterized in that performing conflict detection on the application results output for the sample data to be detected to obtain application layer conflict results includes: converting the application result output for the sample data to be detected into an application output representation consistent with the sample instance representation; determining whether the application output representation violates the application layer constraint set, so as to identify ontology conflicts in the output result; and normalizing the application output, comparing it with the sample annotation results in the historical dataset that satisfy the equivalence relationship, and determining that the application output conflicts with the existing samples if the normalized results are inconsistent.
6. The method according to claim 1, characterized in that the method further includes: summarizing the conflict results of the ontology layer, the annotation layer, the dataset layer, and the application layer, and outputting the conflict type, conflict location, conflict level, and correction suggestions.
7. The method according to claim 6, characterized in that summarizing the conflict results of the ontology layer, the annotation layer, the dataset layer, and the application layer and outputting the conflict type, conflict location, conflict level, and correction suggestions includes: categorizing conflicts hierarchically by the ontology layer, the annotation layer, the dataset layer, and the application layer; determining the conflict location based on the ontology node, attribute, sample identifier, annotated object identifier, or application output position where the conflict occurs; classifying conflict levels according to the degree to which each conflict affects ontology usability, sample trainability, and application output credibility; and generating correction suggestions for the different conflict types, the correction suggestions including at least one of: deleting illegal relationships, supplementing missing attribute types, modifying attribute values, adjusting the scope of application of attributes, merging conflicting version labels, and triggering manual review.
8. A dataset conflict detection device, characterized in that it comprises: a construction module, configured to construct a classification labeling ontology, which includes a set of key entities and relationships; a definition module, configured to define an attribute set, wherein each attribute in the attribute set includes an attribute name, an attribute scope, and an attribute value set, and the attribute value set of each attribute includes a null value marker; a conversion module, configured to acquire sample data to be detected and the annotation results of the sample data to be detected, and to convert the annotation results into a sample instance representation, the sample instance representation including a set of annotated objects, type annotation predicates, and attribute value annotation predicates; a constraint generation module, configured to generate a machine-readable constraint set based on the classification labeling ontology, the attribute set, and attribute inheritance rules, wherein the machine-readable constraint set includes an ontology layer constraint set, an annotation layer constraint set, a dataset layer constraint set, and an application layer constraint set; and a detection module, configured to perform, based on the ontology layer constraint set, ontology structure conflict detection on the classification labeling ontology to obtain ontology layer conflict results; to perform, based on the annotation layer constraint set, intra-sample annotation conflict detection on the sample instance representation of a single piece of sample data to be detected to obtain annotation layer conflict results; to perform, based on the dataset layer constraint set, inter-sample annotation conflict detection on multi-version sample annotation results and/or equivalent sample annotation results in the historical dataset to obtain dataset layer conflict results; and to perform, based on the application layer constraint set, conflict detection on the application results output for the sample data to be detected to obtain application layer conflict results.
9. An electronic device, characterized in that it comprises: one or more processors; and a memory storing computer-executable instructions which, when executed by the one or more processors, cause the one or more processors to perform the dataset conflict detection method according to any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer-readable instructions which, when executed by a computer, implement the dataset conflict detection method according to any one of claims 1 to 7.
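As an illustration of the data model in claim 1, the following Python sketch shows one possible way to hold the attribute set and convert raw annotations into a sample instance representation made of an object set, type annotation predicates, and attribute value annotation predicates. It is a minimal sketch under stated assumptions, not the claimed implementation: the null marker string `<NULL>`, the raw annotation format, and the names `Attribute`, `Ontology`, `SampleInstance`, `make_attribute`, and `to_instance` are all hypothetical.

```python
from dataclasses import dataclass, field

NULL = "<NULL>"  # hypothetical null-value marker; claim 1 requires one in every value set


@dataclass(frozen=True)
class Attribute:
    name: str
    scope: frozenset   # entity types the attribute is declared on
    values: frozenset  # allowed values, always including NULL


@dataclass
class Ontology:
    entities: set   # the set of key entities
    relations: set  # (parent, child, kind) with kind "composition" or "special_case"


@dataclass
class SampleInstance:
    objects: set = field(default_factory=set)   # annotated objects
    type_ann: set = field(default_factory=set)  # Type(o, t) predicates stored as (o, t)
    attr_ann: set = field(default_factory=set)  # Attr(o, a, v) predicates stored as (o, a, v)


def make_attribute(name, scope, values):
    """Build an attribute whose value set is guaranteed to contain the null marker."""
    return Attribute(name, frozenset(scope), frozenset(values) | {NULL})


def to_instance(annotations):
    """Convert a hypothetical raw annotation format into predicate sets.

    Each raw item looks like {"id": ..., "types": [...], "attributes": [(attr, value), ...]};
    a value of None is mapped to the null marker.
    """
    inst = SampleInstance()
    for item in annotations:
        obj = item["id"]
        inst.objects.add(obj)
        for t in item.get("types", []):
            inst.type_ann.add((obj, t))
        for attr, value in item.get("attributes", []):
            inst.attr_ann.add((obj, attr, NULL if value is None else value))
    return inst


if __name__ == "__main__":
    inst = to_instance([
        {"id": "o1", "types": ["Contract"], "attributes": [("status", "signed"), ("party", None)]},
        {"id": "o2", "attributes": [("status", "draft")]},  # untyped object
    ])
    print(sorted(inst.type_ann))
    print(sorted(inst.attr_ann))
```

Attributes are given as a list of pairs rather than a dictionary so that a single object can carry several values for the same attribute; that is exactly the situation the annotation layer checks of claim 3 need to be able to see.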
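The ontology layer constraints listed in claim 2 can be sketched as a single Python checking function. This is a sketch under assumptions, not the claimed method: relationships are taken to be `(parent, child, kind)` triples, acyclicity is tested with Kahn's topological-sort algorithm, and "legal attribute definition" is assumed to mean that the attribute scope names only known entities and the value set contains the null marker. The function name `check_ontology` and the conflict labels are hypothetical.

```python
from collections import Counter, defaultdict

NULL = "<NULL>"  # hypothetical null-value marker


def check_ontology(entities, relations, attributes):
    """Ontology layer checks (claim 2), under the assumptions stated above.

    entities:   set of key entity (type) names
    relations:  iterable of (parent, child, kind), kind in {"composition", "special_case"}
    attributes: list of (name, scope, values) tuples
    Returns a list of (constraint, location) tuples; an empty list means no conflict.
    """
    conflicts = []
    kinds_by_pair = defaultdict(set)
    children = defaultdict(set)
    special_parents = defaultdict(set)
    for parent, child, kind in relations:
        if parent == child:
            conflicts.append(("reflexive_relation", parent))
            continue  # self-loops are reported here, not again as cycles
        kinds_by_pair[(parent, child)].add(kind)
        children[parent].add(child)
        if kind == "special_case":
            special_parents[child].add(parent)

    # A pair of nodes may be linked by only one relationship type.
    for pair, kinds in kinds_by_pair.items():
        if len(kinds) > 1:
            conflicts.append(("overlapping_relation_types", pair))

    # Each node may have at most one special case parent.
    for child, parents in special_parents.items():
        if len(parents) > 1:
            conflicts.append(("multiple_special_case_parents", child))

    # Directed acyclicity via Kahn's algorithm: nodes that never reach
    # in-degree 0 lie on a cycle or downstream of one.
    nodes = set(entities) | set(children) | {c for cs in children.values() for c in cs}
    indegree = dict.fromkeys(nodes, 0)
    for cs in children.values():
        for c in cs:
            indegree[c] += 1
    queue = [n for n in nodes if indegree[n] == 0]
    visited = 0
    while queue:
        node = queue.pop()
        visited += 1
        for c in children.get(node, ()):
            indegree[c] -= 1
            if indegree[c] == 0:
                queue.append(c)
    if visited < len(nodes):
        conflicts.append(("cycle", tuple(sorted(n for n in nodes if indegree[n] > 0))))

    # Attribute names must be unique.
    for name, count in Counter(name for name, _, _ in attributes).items():
        if count > 1:
            conflicts.append(("duplicate_attribute_name", name))

    # Assumed legality rules: the scope refers to known entities, the value set holds NULL.
    for name, scope, values in attributes:
        if not set(scope) <= set(entities):
            conflicts.append(("illegal_attribute_scope", name))
        if NULL not in values:
            conflicts.append(("missing_null_marker", name))
    return conflicts
```

Kahn's algorithm is used here because it detects cycles in linear time without recursion; any other cycle test (for example, a depth-first search with colour marking) would serve equally well.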
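The intra-sample checks of claim 3 can be sketched in Python as follows. Several points are assumptions rather than part of the claims: the special case hierarchy is given as a `parent_of` mapping from each type to its unique special case parent; an attribute declared on a type is assumed to be inherited by all of that type's special case descendants; and two types on one object are classified as "parent and child", "sibling special cases", or otherwise "multiple types". The names `ancestors` and `check_sample` are hypothetical.

```python
from itertools import combinations

NULL = "<NULL>"  # hypothetical null-value marker


def ancestors(t, parent_of):
    """Special case ancestors of type t, nearest first (parent_of: child -> parent)."""
    chain = []
    while t in parent_of and parent_of[t] not in chain:
        t = parent_of[t]
        chain.append(t)
    return chain


def check_sample(entities, parent_of, attributes, type_ann, attr_ann):
    """Annotation layer checks for one sample (claim 3).

    attributes maps name -> (declared scope, allowed values); type_ann holds
    (object, type) pairs and attr_ann holds (object, attribute, value) triples.
    """
    conflicts = []
    types_of = {}
    for obj, t in type_ann:
        if t not in entities:
            conflicts.append(("unknown_type", obj, t))
        types_of.setdefault(obj, set()).add(t)

    # Classify every pair of types assigned to the same object.
    for obj, ts in types_of.items():
        for a, b in combinations(sorted(ts), 2):
            if a in ancestors(b, parent_of) or b in ancestors(a, parent_of):
                kind = "parent_child_types"
            elif a in parent_of and parent_of[a] == parent_of.get(b):
                kind = "sibling_special_cases"
            else:
                kind = "multiple_types"
            conflicts.append((kind, obj, (a, b)))

    values_of = {}
    for obj, attr, value in attr_ann:
        if obj not in types_of:
            conflicts.append(("missing_type", obj, attr))
        if attr not in attributes:
            conflicts.append(("undefined_attribute", obj, attr))
            continue
        scope, allowed = attributes[attr]
        # Assumed inheritance rule: an attribute declared on T also applies to
        # every special case descendant of T.
        for t in types_of.get(obj, ()):
            if not ({t, *ancestors(t, parent_of)} & set(scope)):
                conflicts.append(("attribute_out_of_scope", obj, attr))
        if value not in allowed:
            conflicts.append(("value_out_of_bounds", obj, attr))
        values_of.setdefault((obj, attr), set()).add(value)

    for (obj, attr), vals in values_of.items():
        non_null = vals - {NULL}
        if len(non_null) > 1:
            conflicts.append(("multiple_values", obj, attr))
        if NULL in vals and non_null:
            conflicts.append(("null_with_non_null", obj, attr))
    return conflicts
```

For example, with `parent_of = {"Contract": "Doc", "Invoice": "Doc"}`, an object typed as both `Contract` and `Invoice` is reported as `sibling_special_cases`. An object typed as both `Doc` and `Contract` is reported as `parent_child_types`.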
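The dataset layer checks of claim 4 can be sketched as follows. This sketch makes several assumptions that the claims leave open. Each sample is treated as carrying one label of the form `{"type": ..., "attributes": {...}}`. "Normalization" is taken to mean trimming and lower-casing the type and the attribute values. Equivalence between samples that share an object, a source, or a classification semantics key is closed transitively with a union-find structure, which is a design choice of this sketch. All function names and metadata keys are hypothetical.

```python
from collections import defaultdict


def normalize(label):
    """Hypothetical normalization: trim and lower-case the type and attribute values."""
    type_ = label["type"].strip().lower()
    attrs = frozenset((k.strip().lower(), str(v).strip().lower())
                      for k, v in label.get("attributes", {}).items())
    return (type_, attrs)


def multi_version_conflicts(versions_by_sample):
    """versions_by_sample: sample_id -> list of label versions; returns conflicting ids."""
    return sorted(sid for sid, versions in versions_by_sample.items()
                  if len({normalize(v) for v in versions}) > 1)


def equivalence_classes(samples, keys=("object", "source", "semantics")):
    """Union-find: samples sharing any key value are placed in the same class."""
    parent = {sid: sid for sid in samples}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    first_seen = {}
    for sid, meta in samples.items():
        for key in keys:
            value = meta.get(key)
            if value is None:
                continue
            if (key, value) in first_seen:
                parent[find(sid)] = find(first_seen[(key, value)])
            else:
                first_seen[(key, value)] = sid
    classes = defaultdict(set)
    for sid in samples:
        classes[find(sid)].add(sid)
    return [c for c in classes.values() if len(c) > 1]


def equivalent_sample_conflicts(samples):
    """Equivalence classes whose normalized labels disagree."""
    return [sorted(cls) for cls in equivalence_classes(samples)
            if len({normalize(samples[s]["label"]) for s in cls}) > 1]


def ontology_consistency_conflict(annotation_layer_results):
    """True if any sample in the historical dataset has a non-empty annotation layer result."""
    return any(annotation_layer_results.values())
```

With this normalization, the versions `"Contract"` and `" contract"` are treated as the same label, while `"Contract"` and `"Invoice"` are reported as a multi-version conflict.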
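The application layer check of claim 5 can be sketched in Python as follows. In this sketch, the model output is converted into a canonical `(type, frozenset of (attribute, value))` form. That form serves both as the representation consistent with the sample instances and as the normalized form used for comparison, so attribute order does not matter. The output is then checked against output ontology constraints and compared with the labels of equivalent historical samples. The output format, the restriction of the ontology constraints to "known type, defined attribute, allowed value", and the names `to_output_representation` and `check_application_output` are assumptions of this sketch.

```python
NULL = "<NULL>"  # hypothetical null-value marker


def to_output_representation(output):
    """Hypothetical: {"label": ..., "attributes": {...}} -> (type, frozenset of (attr, value))."""
    attrs = frozenset((k, NULL if v is None else v)
                      for k, v in output.get("attributes", {}).items())
    return (output["label"], attrs)


def check_application_output(output, entities, attributes, equivalent_labels):
    """Application layer checks (claim 5).

    entities: set of known types; attributes: name -> set of allowed values;
    equivalent_labels: normalized labels of historical samples equivalent to this input.
    """
    rep = to_output_representation(output)
    type_, attrs = rep
    conflicts = []
    # Output result ontology constraints.
    if type_ not in entities:
        conflicts.append(("output_unknown_type", type_))
    for attr, value in attrs:
        if attr not in attributes:
            conflicts.append(("output_undefined_attribute", attr))
        elif value not in attributes[attr]:
            conflicts.append(("output_value_out_of_bounds", attr))
    # Constraints between the application output and existing samples.
    for label in equivalent_labels:
        if label != rep:
            conflicts.append(("conflicts_with_existing_sample", label))
    return conflicts
```

Reusing one canonical form for both the constraint check and the comparison keeps the two steps of claim 5 consistent. A real system would probably use the same normalization as the dataset layer.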
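The summarizing step of claims 6 and 7 can be sketched as a small reporting function. Conflicts are ordered by layer, and each one is given a location, a level, and a correction suggestion. The claims name the criteria for levels (ontology usability, sample trainability, application output credibility) and the kinds of suggestions, but they give no concrete mapping. The `LEVEL` and `SUGGESTION` tables below are therefore purely hypothetical, as are the `Conflict` record and the `summarize` function. Unmapped conflict types fall back to the level "low" and to manual review.

```python
from dataclasses import dataclass

LAYERS = ("ontology", "annotation", "dataset", "application")

# Hypothetical level table, loosely following the criteria named in claim 7.
LEVEL = {
    "cycle": "critical",                          # breaks ontology usability
    "reflexive_relation": "critical",
    "unknown_type": "high",                       # sample not trainable as labeled
    "multiple_types": "high",
    "value_out_of_bounds": "high",
    "multi_version_conflict": "medium",
    "conflicts_with_existing_sample": "medium",   # lowers output credibility
}

# Hypothetical suggestion table using the suggestion kinds listed in claim 7.
SUGGESTION = {
    "cycle": "delete illegal relationship",
    "reflexive_relation": "delete illegal relationship",
    "missing_type": "supplement missing type annotation",
    "value_out_of_bounds": "modify attribute value",
    "attribute_out_of_scope": "adjust attribute scope of application",
    "multi_version_conflict": "merge conflicting version labels",
}


@dataclass(frozen=True)
class Conflict:
    layer: str     # one of LAYERS
    kind: str      # conflict type
    location: str  # ontology node, attribute, sample id, object id, or output position


def summarize(conflicts):
    """Order conflicts by layer and attach a level and a correction suggestion."""
    report = []
    for c in sorted(conflicts, key=lambda c: (LAYERS.index(c.layer), c.kind, c.location)):
        report.append({
            "layer": c.layer,
            "type": c.kind,
            "location": c.location,
            "level": LEVEL.get(c.kind, "low"),
            "suggestion": SUGGESTION.get(c.kind, "trigger manual review"),
        })
    return report
```

Keeping the level and suggestion tables separate from the detection code allows them to be tuned for each domain without changing any of the layer checks.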