Method for detecting RDFS(c) schema conflicts based on key instances and query rewriting
By converting the RDFS(c) schema into an internal schema, generating key instances, and detecting RDFS(c) schema conflicts with a query rewriting method, the invention overcomes the inefficiency of existing techniques and achieves efficient schema conflict detection and repair.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- TIANJIN POLYTECHNIC UNIV
- Filing Date
- 2022-12-09
- Publication Date
- 2026-06-30
AI Technical Summary
Existing RDFS(c) schema conflict detection methods are inefficient: they cannot efficiently determine whether an RDFS(c) ontology is still valid after inference rules are applied, whereas applying inference rules easily introduces conflicts.
An approach based on key instances and query rewriting is adopted: the RDFS(c) schema is transformed into an internal schema, a key instance mapping set is generated, conflicts are detected through a type framework, and the violated type constraints and other constraints are inferred with a query rewriting method.
The method significantly improves detection efficiency: it detects schema conflicts efficiently on instances of reduced size and supports conflict resolution and schema repair.
Figure CN116069622B_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the field of machine semantic understanding, and in particular relates to a method for detecting RDFS(c) schema conflicts based on key instances and query rewriting.
Background Technology
[0002] As a semantic model proposed by the W3C (World Wide Web Consortium) for a unified description of information resources in the Web environment, RDFS (Resource Description Framework Schema) makes it possible to describe the hierarchical structure between concepts and the semantics of concepts and attributes through a series of semantically defined terms, thus becoming an important foundation for constructing semantic Web ontologies. Ensuring the correctness of RDFS ontologies is key to avoiding error propagation during the ontology modeling process; therefore, the automatic detection of RDFS schema conflicts has received considerable research attention.
[0003] Because RDFS has limited ability to describe constraints, introducing non-graphical constraint mechanisms into RDFS schemas has become an important development direction in recent years. RDFS ontologies that incorporate non-graphical constraints are collectively referred to as constraint-enhanced RDFS ontologies, i.e., RDFS(c) ontologies. Because real-world constraints are diverse and complex, introducing non-graphical constraints makes RDFS(c) schema conflict detection more complex, especially when inference rules are present. Therefore, although several detection tools have been proposed on the basis of this research, such as RDFShape, FHIR, and RDD CHECKER, how to detect schema conflicts automatically and efficiently has not yet been well resolved; it has become a key challenge and one of the research hotspots in RDFS(c) ontology modeling.
[0004] Schemas are not static objects; they evolve over time to reflect changes in the datasets they model. Applying inference rules is a significant source of change in RDF graph datasets, and designing these rules is labor-intensive and prone to introducing conflicts, because applying inference rules may generate new facts that are not captured by the original schema definition.
Summary of the Invention
[0005] The purpose of this invention is to overcome the shortcomings of the prior art and provide a method for detecting RDFS(c) schema conflicts based on key instances and query rewriting. Given an RDFS(c) schema and a set of inference rules, the method determines whether an RDF graph that is initially valid with respect to the schema remains valid after its closure under the inference rules is computed. If conflicts exist, the effects of applying the specific rules that affect the schema are analyzed. It is therefore possible to infer efficiently which constraints and type restrictions might be violated without examining all instances of the schema, thereby supporting conflict resolution and schema repair.
[0006] The present invention solves the existing technical problems by adopting the following technical solution:
[0007] A method for detecting RDFS(c) schema conflicts based on key instances and query rewriting comprises the following steps:
[0008] Step 1: Convert the RDFS(c) schema into an internal schema;
[0009] Step 2: Establish a key instance mapping set and compute the key instances of the internal schema;
[0010] Step 3: Filter the key instance mapping set and generate a type framework;
[0011] Step 4: Use a query rewriting method to infer the violated type constraints and other constraints;
[0012] The internal schema $S = \langle S_P, S_{NL}, S_{ED} \rangle$ is a triple, where the pattern set $S_P$ is a set of triple patterns in which each variable name appears at most once; $S_{NL} = \{ ?v \mid (?v \in S_P) \land \neg \mathit{Literal}(?v) \}$ is the non-literal set; $S_{ED}$ is a set of embedded dependencies; and ?v denotes a variable in $S_P$;
[0013] The three components $S_P$, $S_{NL}$ and $S_{ED}$ of the internal schema are computed as follows:
[0014] $S_P$ is computed as follows: initialize $S_P$ to the empty set. Three kinds of triples can appear in an RDF graph: the class of a single instance, a data property value of a single instance, and an object property relationship between two instances. Accordingly, for each class CL in the RDFS(c) schema, add the triple pattern ⟨?v rdf:type CL⟩ to $S_P$; for each data property DP in the schema, add ⟨?v DP Datatype⟩ to $S_P$; and for each object property OP in the schema, add ⟨?v1 OP ?v2⟩ to $S_P$;
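The $S_P$ construction above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: terms are plain strings, variables are generated as ?v1, ?v2, and so on, and the value position of a data property is assumed to be a fresh variable (the text writes it as Datatype) so that it can later be bound to a literal. The class and property names in the usage are hypothetical.

```python
from itertools import count


def compute_pattern_set(classes, data_props, object_props):
    """Build S_P: one triple pattern per class, data property and object property.

    Every pattern uses fresh variables, so no variable name occurs twice in S_P.
    """
    counter = count(1)

    def fresh():
        return f"?v{next(counter)}"

    s_p = []
    for cl in classes:
        s_p.append((fresh(), "rdf:type", cl))   # class of a single instance
    for dp in data_props:
        # Assumption: the value position is a fresh variable (may hold a literal).
        s_p.append((fresh(), dp, fresh()))       # data property value
    for op in object_props:
        s_p.append((fresh(), op, fresh()))       # object property between two instances
    return s_p
```

For example, `compute_pattern_set([":GasDetector"], [":hasValue"], [":carries"])` yields three patterns whose five variables are all distinct, which is what lets the variables act purely as wildcards.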
[0015] $S_{NL}$ is computed as follows: first add to $S_{NL}$ all variables appearing in the subject and predicate positions of $S_P$. Then scan the non-graphical constraints one by one; for each constraint that restricts the values of a variable to IRIs, add the constrained variable to $S_{NL}$ and remove the constraint from the constraint set;
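A minimal sketch of the $S_{NL}$ computation, assuming a hypothetical constraint encoding: each non-graphical constraint is a dict, and one whose "kind" is "iri_only" restricts the variable stored under "var" to IRIs. Such constraints are absorbed into $S_{NL}$; all others are returned for the $S_{ED}$ step.

```python
def is_var(term):
    return term.startswith("?")


def compute_non_literal_set(s_p, constraints):
    """Return (S_NL, remaining constraints) under the assumed constraint encoding."""
    s_nl = set()
    for subj, pred, _obj in s_p:
        # Subjects and predicates of an RDF triple can never be literals.
        s_nl.update(t for t in (subj, pred) if is_var(t))
    remaining = []
    for c in constraints:
        if c.get("kind") == "iri_only":
            s_nl.add(c["var"])    # absorbed into S_NL and dropped from the set
        else:
            remaining.append(c)   # left for the S_ED conversion
    return s_nl, remaining
```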
[0016] $S_{ED}$ is computed as follows: the triple pattern in $S_P$ corresponding to each class is converted into a unary predicate, and the triple pattern corresponding to each object property and each data property is converted into a binary predicate. The class hierarchy and integrity of the original schema are converted into negative constraints. For each non-graphical constraint, all negated literals are moved from the left-hand side to the right-hand side of the negative constraint and their negation symbols are removed, so that each negative constraint is equivalently transformed into an embedded dependency.
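The two transformations in this paragraph can be sketched as follows. This is an illustration under stated assumptions: atoms are a small hypothetical `Atom` type, a negative constraint is a list of (positive, atom) pairs, and a variable that occurs only inside a negated literal is read as existentially quantified within the negation (the usual safe-negation reading), so it becomes existential in the resulting disjunct.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    pred: str
    args: tuple


def triple_to_atom(triple):
    """Class pattern -> unary predicate; property pattern -> binary predicate."""
    s, p, o = triple
    return Atom(o, (s,)) if p == "rdf:type" else Atom(p, (s, o))


def is_var(term):
    return term.startswith("?")


def negative_to_ed(literals):
    """Turn l1 ∧ ... ∧ lk -> ⊥ into body -> disjunct_1 ∨ ... ∨ disjunct_n.

    Positive literals stay in the body; each negated atom becomes one disjunct
    with its sign removed, paired with its existential variables.
    """
    body = [a for positive, a in literals if positive]
    body_vars = {x for a in body for x in a.args if is_var(x)}
    head = []
    for positive, a in literals:
        if not positive:
            exist = sorted({x for x in a.args if is_var(x)} - body_vars)
            head.append((exist, a))
    return body, head
```

With hypothetical predicates, `negative_to_ed([(True, Atom(":Tag", ("?v1",))), (False, Atom(":carriedBy", ("?v1", "?v2")))])` returns the body `[:Tag(?v1)]` and the single disjunct `∃?v2 :carriedBy(?v1, ?v2)`.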
[0017] Furthermore, the specific implementation method of step 2 is as follows:
[0018] Definition: Given an internal schema S and an inference rule set R, let r be any rule in R, i.e., r ∈ R. A schema BS′ is called the framework primitive of S with respect to r, denoted r(S), if and only if $\mathbb{I}(BS') = \{ I' \mid I' \subseteq r(I),\ I \in \mathbb{I}(S) \}$, where $\mathbb{I}(BS')$ denotes the instance set of BS′, $\mathbb{I}(S)$ the instance set of S, I any instance in $\mathbb{I}(S)$, r(I) the application of r on I, and I′ any subset of r(I). $\langle S'_P, S'_{NL}, \emptyset \rangle$ is called the type framework of S with respect to R, denoted fra(S, R), if and only if $\mathbb{I}(\mathit{fra}(S,R)) = \{ I' \mid I' \subseteq \mathrm{cl}(I,R),\ I \in \mathbb{I}(S) \}$, where $\mathbb{I}(\mathit{fra}(S,R))$ denotes its instance set and $\mathrm{cl}(I,R)$ the closure of I under the rule set R;
[0019] Computing fra(S, R) requires assigning values to the premise A of each inference rule r using all instances I of the internal schema S, i.e., computing the SPARQL evaluation of A over all instances of S, the mapping set $[A]_I$. Instead of all instances of S, the computation uses key instances: A is evaluated over the key instances of S, yielding the key instance mapping set;
[0020] Definition: Given an internal schema S and an inference rule r: A → C, the key instance $\mathbb{S}(S,r)$ of S with respect to r is a set of triples t, each obtained from a triple pattern $t_S \in S_P$ by assigning a value to every position $t[i]$;
[0021] here $i \in \{1,2,3\}$ indexes the triple positions, and the admissible values are determined by the set of variables of the elements of $S_P$, the set of constants in the embedded dependencies $S_{ED}$, the set of constants in the premise A of rule r, and the non-literal restriction;
[0022] Key instances are created by using the constants in $S_{ED}$ and A to replace the variables of the internal schema in all possible ways, including substituting each of the IRIs and literals one by one for μ1 and each of the IRIs one by one for μ2.
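A minimal sketch of key-instance enumeration. The role of μ1 and μ2 is only partly recoverable from the text, so this sketch assumes they are two placeholder constants, encoded hypothetically as the literal `"μ1"` and the IRI `:μ2`, each standing for a value not named in $S_{ED}$ or A. Every variable of every pattern in $S_P$ is replaced in all possible ways, keeping literals out of subject and predicate positions and out of $S_{NL}$ variables.

```python
from itertools import product

# Assumed placeholder encodings (hypothetical): μ1 as a literal, μ2 as an IRI.
MU1 = '"μ1"'
MU2 = ":μ2"


def is_var(t):
    return t.startswith("?")


def is_literal(t):
    return t.startswith('"')


def key_instance(s_p, s_nl, constants):
    """Enumerate the key instance from S_P, S_NL and the constants of S_ED and A."""
    candidates = sorted(set(constants) | {MU1, MU2})
    triples = set()
    for t_s in s_p:
        options = []
        for i, term in enumerate(t_s):
            if not is_var(term):
                options.append([term])  # constants are kept as-is
                continue
            options.append([
                c for c in candidates
                # literals only in object position, never for S_NL variables
                if not is_literal(c) or (i == 2 and term not in s_nl)
            ])
        triples.update(product(*options))
    return triples
```

Because each variable occurs only once in $S_P$, a plain Cartesian product per pattern is enough; the size of the result depends only on the handful of constants in $S_{ED}$ and A, not on any concrete dataset.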
[0023] Furthermore, in step 3 the key instance mapping set is filtered as follows: for each mapping m in the key instance mapping set, perform the following operations:
[0024] First, create a temporary non-literal set $S^m_{NL}$. Examine each variable of the inference rule r one by one: if the variable cannot be bound to a literal when the premise A is assigned values using instances of S and the conclusion C of the rule is instantiated, put the variable into $S^m_{NL}$;
[0025] Then consider the element in the object position of each triple $t_A$ in the premise A of r. Let $\mathbb{S}(S,r)$ denote the key instance of S with respect to r, and let $[\mathrm{exp}(A)]_{\mathbb{S}(S,r)}$ denote the mapping set over the key instance $\mathbb{S}(S,r)$, where exp(A) is the expansion of A. Every variant $t_q$ of $t_A$ in exp(A) must be taken into account. Because each mapping m in $[\mathrm{exp}(A)]_{\mathbb{S}(S,r)}$ is computed on the key instance $\mathbb{S}(S,r)$, there exists at least one $t_q$ such that $m(t_q) \in \mathbb{S}(S,r)$. For each such $t_q$, obtain the set of triple patterns $t_S \in S_P$ used to model $m(t_q)$, i.e., the triple patterns through which $t_A$ or one of its variants matches the key instance under mapping m.
[0026] Furthermore, the generation of the type framework in step 3 is based on the following formula:
[0027] $\mathit{fra}(S,R) = S_n$, where fra(S, R) is the type framework of S with respect to R, consisting of the pattern set $S'_P$, the non-literal set $S'_{NL}$ and the empty set ∅; $S_0 = S$, $S_{i+1} = \bigcup_{r \in R} r(S_i)$ (unions taken component-wise), and $S_n = S_{n-1}$;
[0028] First, $\mathit{fra}(S,R) = \langle S'_P, S'_{NL}, \emptyset \rangle$ is initialized to $\langle S_P, S_{NL}, \emptyset \rangle$, i.e., $S'_P$ and $S'_{NL}$ are initialized with the pattern set and the non-literal set of S, respectively. For each mapping m that was not removed by filtering, examine its bindings one by one: if a binding maps a variable to a value other than μ1 or μ2, keep it unchanged; otherwise, introduce a distinct new variable to replace each occurrence of μ1 and μ2, producing a new binding.
[0029] Then add the triple patterns m(C) to $S'_P$, and add the variables $m(S^m_{NL}) \cap \mathit{vars}(S'_P)$ to $S'_{NL}$. In this way S′ is expanded mapping by mapping until all mappings have been considered; S′ is then the final type framework.
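The expansion described in [0028] and [0029] can be sketched as below. Assumptions: each surviving mapping arrives as a pair (m, $S^m_{NL}$) with m a dict from variables to values, μ1 and μ2 use the same hypothetical encodings as before, and the fresh variable names ?n1, ?n2, … are hypothetical and assumed not to clash with the names already used in $S_P$.

```python
from itertools import count

PLACEHOLDERS = {'"μ1"', ":μ2"}  # assumed encodings of μ1 and μ2


def extend_type_framework(s_p, s_nl, conclusion, surviving):
    """Grow S'_P and S'_NL from the mappings that survived filtering."""
    sp_new, snl_new = list(s_p), set(s_nl)       # start from <S_P, S_NL, ∅>
    fresh = count(1)
    for m, s_m_nl in surviving:
        # Each binding to μ1/μ2 is replaced by its own new variable.
        m2 = {v: (f"?n{next(fresh)}" if val in PLACEHOLDERS else val)
              for v, val in m.items()}
        for triple in conclusion:                 # add m(C) to S'_P
            t = tuple(m2.get(x, x) for x in triple)
            if t not in sp_new:
                sp_new.append(t)
        vars_sp = {x for t in sp_new for x in t if x.startswith("?")}
        snl_new |= {m2.get(v, v) for v in s_m_nl} & vars_sp  # m(S^m_NL) ∩ vars(S'_P)
    return sp_new, snl_new
```

Intersecting with vars($S'_P$) drops variables that were bound to constants or that do not occur in the new patterns, so $S'_{NL}$ only ever mentions variables of $S'_P$.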
[0030] Furthermore, the specific implementation method of step 4 is as follows:
[0031] First, given the internal schema S and the inference rule set R, let S′ denote the result schema of S with respect to R; its definition ranges over the instance set of S, every instance I in that set, and the closure of I under the rule set R, and it distinguishes the sets of embedded dependencies in $S_{ED}$ that are violated by the instances involved;
[0032] Then, from the internal schema $S = \langle S_P, S_{NL}, S_{ED} \rangle$ and the inference rule set R, compute the type framework $\mathit{fra}(S,R) = \langle S'_P, S'_{NL}, \emptyset \rangle$. Let type(S, R) denote the set of violated type constraints and con(S, R) the set of violated constraints; initialize type(S, R) = ∅, con(S, R) = ∅, and the mapping set M = ∅;
[0033] For each embedded dependency ed, examine each inference rule r: A → C one by one to infer whether C can violate ed. Compute all rewritings of A by backward chaining over the rules in R; for each such rewriting $A_w$, compute the mapping set M of the expansion of $A_w$ over the key instance $\mathbb{S}(S,r)$, to infer whether $A_w$ can be matched on an instance of S;
[0034] For each mapping m ∈ M, first remove from m every binding that maps a variable to μ1 or μ2 and apply the resulting mapping to $A_w$; then map each remaining variable to a new IRI to generate an instance I. Compute the closure of I using all embedded dependencies in $S_{ED}$ and the inference rule set R; if ed is violated, add ed to the set of violated embedded dependencies. Finally, obtain type(S, R) by removing $S_P$ from $S'_P$, and output the set of violated type constraints type(S, R) and the set of violated constraints con(S, R).
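Two pieces of step 4 can be sketched compactly: building the candidate instance I from one mapping m and rewriting $A_w$, and computing type(S, R) as the patterns of $S'_P$ not already in $S_P$. The closure and violation check are not repeated here. The placeholder encodings and the fresh IRI names (:fresh1, :fresh2, …) are hypothetical.

```python
from itertools import count

PLACEHOLDERS = {'"μ1"', ":μ2"}  # assumed encodings of μ1 and μ2


def witness_instance(m, a_w):
    """Drop bindings to μ1/μ2, apply m to A_w, give unbound variables new IRIs."""
    kept = {v: c for v, c in m.items() if c not in PLACEHOLDERS}
    counter = count(1)
    fresh_iri = {}
    instance = set()
    for triple in a_w:
        out = []
        for term in triple:
            term = kept.get(term, term)
            if term.startswith("?"):
                if term not in fresh_iri:          # same variable, same new IRI
                    fresh_iri[term] = f":fresh{next(counter)}"
                term = fresh_iri[term]
            out.append(term)
        instance.add(tuple(out))
    return instance


def violated_types(s_p_prime, s_p):
    """type(S, R): the patterns of S'_P that are not already in S_P."""
    return [t for t in s_p_prime if t not in s_p]
```

The resulting instance is as small as the rewriting itself, which is what keeps the subsequent closure and violation check cheap.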
[0035] The advantages and positive effects of this invention are:
[0036] This invention is reasonably designed. First, it defines the concept of an internal schema and provides a method for converting an RDFS(c) schema into an internal schema. Then, key instances of the internal schema are generated; these are significantly smaller than the original instances, which significantly improves detection efficiency. Finally, schema conflict detection is performed on the key instances of the internal schema. During detection, a query rewriting method is used to expand the key instances and construct potential instances that violate constraints; minimizing the rewritings ensures that the constructed instances are of minimal size, which improves the efficiency of inferring constraint violations.
Attached Figure Description
[0037] Figure 1 is a schematic diagram of the RDFS(c) schema describing oil exploration sensors used in this invention;
[0038] Figure 2 shows the inference rules on the RDFS(c) schema $RS_1$;
[0039] Figure 3 shows a valid instance $I_1$ of schema $RS_1$;
[0040] Figure 4 shows a fragment of the pattern set $S^1_P$ of schema $RS_1$;
[0041] Figure 5a is a diagram illustrating the impact of key instances on conflict detection time;
[0042] Figure 5b is a diagram illustrating the impact of query rewriting on conflict detection time.
Detailed Implementation
[0043] The embodiments of the present invention will be further described in detail below with reference to the accompanying drawings.
[0044] A method for detecting RDFS(c) schema conflicts based on key instances and query rewriting comprises the following steps:
[0045] Step 1: Convert the RDFS(c) schema into an internal schema.
[0046] This embodiment uses an RDFS(c) ontology from the field of oil exploration as its use case. Sensors are carried by workers and collect exploration data in real time, and inference rules are used to deduce unsafe working environments from the sensor data. Because the schema changes frequently (e.g., sensor malfunctions or the deployment of new sensors) and the inference rules change as well (e.g., adjustments to business strategies), schema definitions that conflict with the inference rules must be monitored continuously.
[0047] The RDFS(c) schema $RS_1$ for this use case is given in Figure 1. In this schema, sensor measurements record results and relate a specific target (e.g., a location) to a specific attribute (e.g., temperature). The schema includes two types of sensors. The first type is the gas detector, which returns "0" if the gas concentration is within the allowed range and "1" otherwise. The second type is the reader, which senses the tags carried by nearby workers and thus allows workers to be located. Four additional non-graphical constraints specify that: the collected sensor data involve only these two types of sensors (Constraint 1); each detected tag must be carried by at least one worker (Constraint 2); the measured target is restricted to IRIs (Constraint 3); and the measurement result is restricted to IRIs or literals (Constraint 4).
[0048] For schema $RS_1$, define the inference rule set $R_1 = \{r_1, r_2, r_3\}$ (Figure 2), used to compute the closure of instances of $RS_1$. Rule $r_1$ states that a tag recorded by a sensor should be interpreted as a worker tag, recording the location where the worker was found. Rule $r_2$ states that areas with high methane concentrations should be off-limits. Rule $r_3$ states that anyone in a restricted area is illegally trespassing.
[0049] The question is whether applying the inference rule set $R_1$ leads to conflicts with $RS_1$, i.e., under the closure of $R_1$, does every constraint in the schema remain valid on all instances of $RS_1$? The answer is no. For example, the instance $I_1$ shown in Figure 3 is a valid instance of $RS_1$. This instance contains two records of detected worker tags; although one of the tags is known to be carried by worker Smith, there is no such information about the other.
[0050] Rule $r_1$ derives new triples in $I_1$ stating that the tag whose carrier is unknown is a worker tag; however, since there is no information about who carries that tag, Constraint 2 is violated. Rule $r_2$ is triggered in $I_1$ and derives a triple whose type is not one of the types allowed in the original schema, thereby violating a type constraint of $RS_1$; $RS_1$ may need to be extended to allow instances of this type. The facts derived by rules $r_1$ and $r_2$ jointly trigger rule $r_3$, which derives that the person carrying the tag is illegally trespassing in a dangerous area. These new facts contain a new predicate, so a type constraint of $RS_1$ is violated once again. Therefore, to keep the schema consistent, these conflicts must be identified and resolved efficiently.
[0051] For clarity, the present invention uses the following symbols:
[0052] (1) Basic elements. Literals and IRIs are collectively called constants, and variables and constants are collectively called terms. An atom $p(t_1, \ldots, t_n)$ is a predicate p applied to n terms $t_1, \ldots, t_n$, abbreviated $p(\bar{t})$. An atom $p(\bar{t})$ and its negation $\neg p(\bar{t})$ are collectively called (logical) literals. An RDF graph is a set of triples in $\mathcal{I} \times \mathcal{I} \times (\mathcal{I} \cup \mathcal{L})$, where $\mathcal{I}$ is the set of all IRIs, $\mathcal{L}$ is the set of all literals, and $\mathcal{V}$ is the set of all variables. A graph pattern is a set of triple patterns in $(\mathcal{I} \cup \mathcal{V}) \times (\mathcal{I} \cup \mathcal{V}) \times (\mathcal{I} \cup \mathcal{L} \cup \mathcal{V})$. Given a pattern P, $\mathit{vars}(P)$ and $\mathit{consts}(P)$ denote the sets of variables and constants in P, respectively. Literals are written as strings enclosed in double quotes, such as "l"; variables are written as strings prefixed with a question mark, such as ?v. The first, second and third elements of a triple t are called its subject, predicate and object, and are denoted $t[x]$, $x \in \{1, 2, 3\}$.
[0053] (2) Substitutions and mappings. A variable substitution is a partial function $\mathcal{V} \to \mathcal{V} \cup \mathcal{I} \cup \mathcal{L}$; it can also be written as a set $\{x_1/t_1, \ldots, x_n/t_n\}$ in which the variables $x_i$ are pairwise distinct. A mapping is a variable substitution $\mathcal{V} \to \mathcal{I} \cup \mathcal{L}$. Given a mapping m, if $m(?v) = n$, then m contains the binding $?v \to n$. The domain of m, denoted $\mathit{dom}(m)$, is the set of all its variables. Given a graph pattern P and a mapping m, $m(P)$ denotes the pattern obtained by replacing each ?v in P with $m(?v)$. A closed mapping is one that transforms a graph pattern into a graph. Given a graph pattern P and a graph I, the SPARQL evaluation of P on I is denoted by the mapping set $[P]_I$. A graph pattern matches a graph if its evaluation on that graph is a non-empty set of mappings. Given a graph pattern S, $\mathbb{I}(S)$ denotes the set of instances of S, i.e., the graphs modeled by S. Two graph patterns S and S′ are semantically equivalent if and only if they have the same set of instances, i.e., $\mathbb{I}(S) = \mathbb{I}(S')$.
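To make $[P]_I$ and $m(P)$ concrete, here is a naive Python sketch of basic-graph-pattern evaluation by backtracking over triples. It assumes the string conventions of [0052] (variables start with ?), ignores SPARQL features beyond conjunctive triple patterns, and the IRIs in the usage are hypothetical.

```python
def is_var(t):
    return t.startswith("?")


def extend(m, pattern_triple, triple):
    """Try to extend mapping m so that m(pattern_triple) == triple."""
    m = dict(m)
    for p, t in zip(pattern_triple, triple):
        if is_var(p):
            if m.setdefault(p, t) != t:   # variable already bound to another value
                return None
        elif p != t:                      # constant mismatch
            return None
    return m


def evaluate(pattern, graph):
    """[P]_I: every mapping m with dom(m) = vars(P) and m(P) ⊆ I."""
    results = [{}]
    for tp in pattern:
        results = [m2 for m in results for t in graph
                   if (m2 := extend(m, tp, t)) is not None]
    return results


def apply(m, pattern):
    """m(P): replace each variable ?v of P with m(?v)."""
    return {tuple(m.get(x, x) for x in tp) for tp in pattern}


def matches(pattern, graph):
    """A pattern matches a graph iff its evaluation is non-empty."""
    return bool(evaluate(pattern, graph))
```

For instance, over a graph where :reader1 detects :tag1 and :tag2 but only :tag1 is carried by :smith, the pattern ⟨?s :detects ?t⟩ ⟨?w :carries ?t⟩ yields the single mapping {?s → :reader1, ?t → :tag1, ?w → :smith}.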
[0054] (3) Embedded dependencies. A negative constraint is a first-order logic expression of the form $\forall \bar{x}\, \varphi(\bar{x}) \rightarrow \bot$, where $\varphi$ is a conjunction of literals and ⊥ is an atom that always evaluates to false. Intuitively, the left-hand side of a negative constraint describes a condition that no instance may satisfy. An embedded dependency (ED) is a first-order logic expression of the form $\forall \bar{x}\, \varphi(\bar{x}) \rightarrow \exists \bar{y}_1\, \psi_1(\bar{x}, \bar{y}_1) \lor \cdots \lor \exists \bar{y}_n\, \psi_n(\bar{x}, \bar{y}_n)$, where $\varphi$ and each $\psi_i$ are conjunctions of atoms and n is a non-negative integer. When n = 0, the disjunction on the right-hand side is empty and evaluates to ⊥; a negative constraint is therefore a special form of embedded dependency. Embedded dependencies restrict which triples must exist in an RDF graph. For example, the embedded dependency $a \rightarrow c$ states that whenever a graph contains a match of a, it must also contain the corresponding match of c. Formally, given an RDF graph I, $a \rightarrow c$ is violated on I if there exists a mapping $m \in [a]_I$ such that $[m(c)]_I = \emptyset$ (i.e., m(c) does not occur in I); otherwise it is satisfied. Given a set of embedded dependencies E, $\mathit{viol}(E, I)$ denotes the set of pairs $\langle m, e \rangle$ such that $e \in E$ and the mapping m causes e to be violated on I.
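The violation test $[m(c)]_I = \emptyset$ can be sketched as follows. Assumptions: an ED is encoded as a hypothetical tuple (name, body, heads), where body is a graph pattern and heads lists the disjuncts $\psi_1, \ldots, \psi_n$ as graph patterns (an empty list gives a negative constraint). Evaluating a disjunct with the body mapping as a seed is equivalent to evaluating m(c), with unbound head variables acting as the existential ones. The usage mirrors Constraint 2 of $RS_1$ with hypothetical IRIs.

```python
def is_var(t):
    return t.startswith("?")


def evaluate(pattern, graph, seed=None):
    """[P]_I restricted to the mappings that extend `seed`."""
    results = [dict(seed or {})]
    for tp in pattern:
        nxt = []
        for m in results:
            for triple in graph:
                m2, ok = dict(m), True
                for p, t in zip(tp, triple):
                    if (is_var(p) and m2.setdefault(p, t) != t) or (not is_var(p) and p != t):
                        ok = False
                        break
                if ok:
                    nxt.append(m2)
        results = nxt
    return results


def violations(eds, graph):
    """viol(E, I) as (m, name) pairs: m matches the body but no disjunct."""
    out = []
    for name, body, heads in eds:
        for m in evaluate(body, graph):
            if not any(evaluate(h, graph, m) for h in heads):
                out.append((m, name))
    return out
```

On a graph with two worker tags of which only :tag1 is carried, the ED "every worker tag is carried by someone" is violated exactly once, by the mapping {?t → :tag2}.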
[0055] (4) Inference rules. An inference rule has the form $A \rightarrow C$, where the premise A and the conclusion C are both graph patterns. Since each graph pattern corresponds to a conjunctive query, each inference rule can be expressed as a SPARQL CONSTRUCT query: the conclusion C is given in the CONSTRUCT clause and the premise A in the WHERE clause, and C is instantiated once A has been evaluated. $r(I)$ denotes the application of the rule $r: A \rightarrow C$ to a dataset I, i.e., $r(I) = I \cup \bigcup_{m \in [A]_I} m(C)$. The closure of a dataset I under a set of inference rules R, denoted $\mathrm{cl}(I, R)$, is the unique dataset obtained by repeatedly applying all rules in R until no new statements can be derived, i.e., $\mathrm{cl}(I, R) = \bigcup_{i \geq 0} I_i$, where $I_0 = I$ and $I_{i+1} = \bigcup_{r \in R} r(I_i)$.
[0056] Based on the above notation, an RDFS(c) schema is converted into an internal schema in this step as follows:
[0057] To model all instances of a graph pattern, this invention proposes the concept of an internal schema. An internal schema consists of a set of triple patterns, a subset of their variables, and a set of embedded dependencies corresponding to the RDFS(c) constraints. Introducing internal schemas abstracts the method away from the specific syntax of RDFS(c). This brings two advantages: first, it simplifies the method of this invention; second, it makes the method applicable to other languages (such as ShEx and SHACL), provided that a mapping between such a language and the internal schema is established. The definition of the internal schema and its computation are given below.
[0058] Definition 1: An internal schema $S = \langle S_P, S_{NL}, S_{ED} \rangle$ is a triple. The pattern set $S_P$ is a set of triple patterns in which each variable name appears at most once; $S_{NL} = \{ ?v \mid (?v \in S_P) \land \neg \mathit{Literal}(?v) \}$ is called the non-literal set; and $S_{ED}$ is a set of embedded dependencies.
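A minimal Python sketch of $r(I)$ and $\mathrm{cl}(I, R)$ as defined above. It assumes rules are (premise, conclusion) pairs of graph patterns and that rules are safe (every conclusion variable occurs in the premise); the two rules in the usage are hypothetical, loosely modeled on $r_1$.

```python
def is_var(t):
    return t.startswith("?")


def evaluate(pattern, graph):
    """Naive [A]_I over a set of triples."""
    results = [{}]
    for tp in pattern:
        nxt = []
        for m in results:
            for triple in graph:
                m2 = dict(m)
                if all((m2.setdefault(p, t) == t) if is_var(p) else p == t
                       for p, t in zip(tp, triple)):
                    nxt.append(m2)
        results = nxt
    return results


def apply_rule(rule, graph):
    """r(I) = I ∪ ⋃_{m ∈ [A]_I} m(C)."""
    premise, conclusion = rule
    derived = {tuple(m.get(x, x) for x in tp)
               for m in evaluate(premise, graph) for tp in conclusion}
    return set(graph) | derived


def closure(graph, rules):
    """cl(I, R): I_{i+1} = ⋃_{r ∈ R} r(I_i) until no new statement is derived."""
    current = set(graph)
    while True:
        nxt = set(current).union(*(apply_rule(r, current) for r in rules))
        if nxt == current:
            return current
        current = nxt
```

With a rule deriving ⟨?t rdf:type :WorkerTag⟩ from ⟨?s :detects ?t⟩ and a second rule that fires only on worker tags, the second rule produces its conclusion in a later iteration, which is exactly the chaining that lets $r_1$ and $r_2$ jointly trigger $r_3$ in the use case.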
(1) Computing $S_P$
[0060] Taking the RDFS(c) schema $RS_1$ as an example, Figure 4 shows a fragment of the pattern set of its internal schema, in which each variable can be instantiated with either an IRI or a literal. This set of triple patterns constitutes the first element of the internal schema, the pattern set. Note that a valid instance of the internal schema may contain multiple instances of some of the predicates defined in the pattern set, and need not contain instances of all of them. Furthermore, the variables in the pattern set act only as wildcards and do not imply that the triple patterns are joined together.
[0061] As mentioned above, $S_P$ defines the types of triples that can appear in an RDF graph. For any RDFS(c) schema, the triples of a graph instance are of three kinds: the class of a single instance, a data property value of a single instance, and an object property relationship between two instances. Therefore, for each class CL in the RDFS(c) schema, the triple pattern ⟨?v rdf:type CL⟩ is added to $S_P$; for each data property DP, ⟨?v DP Datatype⟩ is added; and for each object property OP, ⟨?v1 OP ?v2⟩ is added.
(2) Computing $S_{NL}$
[0063] The second element of the internal schema is a subset of the variables in the pattern set, called the non-literal set, which specifies which variables cannot be instantiated with literals. For example, in the triple pattern ⟨?v7 … ?v8⟩ of $S^1_P$, the variables ?v7 and ?v8 cannot be instantiated with literals: for ?v7 because the result would not be a valid RDF triple, and for ?v8 because it would violate Constraint 3. Likewise, if a triple pattern with object variable ?v10 belongs to $S^1_P$ and $?v10 \notin S^1_{NL}$, then an instance of $RS_1$ can contain any triple that uses the pattern's property as its predicate. If ⟨?v3 rdf:type …⟩ ∈ $S^1_P$ and $?v3 \in S^1_{NL}$, then an instance of $RS_1$ can contain any entity of that type.
[0064] $S_{NL}$ in effect further restricts the value types of the variables in $S_P$. Because the variables in $S_{NL}$ can only be instantiated with IRIs, $S_{NL}$ must include at least all variables appearing in the subject and predicate positions of $S_P$. After these two kinds of variables have been added to $S_{NL}$, the non-graphical constraints are scanned one by one; for each constraint that restricts the values of a variable to IRIs (e.g., Constraint 3 of $RS_1$), the constrained variable is added to $S_{NL}$ and the constraint is removed from the constraint set.
(3) Computing $S_{ED}$
[0066] The last element of the internal schema is the set of embedded dependencies obtained by transforming the RDFS(c) constraints. Although $S_P$ and $S_{NL}$ together define the set of all triples that can occur in graph instances, not every combination of these triples is a valid instance of the internal schema; the set of embedded dependencies $S_{ED}$ defines further requirements that instances must satisfy. Formally, a graph I is a valid instance of the internal schema $\langle S_P, S_{NL}, S_{ED} \rangle$ if and only if no embedded dependency in $S_{ED}$ is violated on I, and for each triple $t_I \in I$ there exist a mapping m and a triple pattern $t_S \in S_P$ such that $m(t_S) = t_I$ and m does not bind any variable of $S_{NL}$ to a literal.
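The second half of this validity condition (every triple is modeled by some $t_S \in S_P$ without binding an $S_{NL}$ variable to a literal) can be checked as in the sketch below; the embedded-dependency half is assumed to be checked separately. Because each variable occurs at most once in $S_P$, a pattern matches a triple position by position, with no cross-position consistency check. The names in the usage are hypothetical.

```python
def is_var(t):
    return t.startswith("?")


def is_literal(t):
    return t.startswith('"')


def modeled_by(triple, t_s, s_nl):
    """True if some mapping m has m(t_s) == triple and binds no S_NL variable to a literal."""
    for p, t in zip(t_s, triple):
        if is_var(p):
            if p in s_nl and is_literal(t):
                return False
        elif p != t:
            return False
    return True


def conforms_to_patterns(graph, s_p, s_nl):
    """Every triple of the graph is modeled by at least one pattern of S_P."""
    return all(any(modeled_by(t, t_s, s_nl) for t_s in s_p) for t in graph)
```

A triple whose predicate has no pattern in $S_P$ (for example a newly inferred property) fails this check, which is how the new predicates in [0050] surface as type-constraint violations.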
[0067] The computation of $S_{ED}$ is as follows. The elements of an RDFS(c) schema (classes, object properties, data properties, class hierarchy, integrity, and constraints) are formalized; the correspondence between these basic elements and first-order logic expressions is shown in the table below. The triple pattern in $S_P$ corresponding to each class is converted into a unary predicate, and the triple pattern corresponding to each object property and each data property is converted into a binary predicate. For example, the triple pattern ⟨?v3 rdf:type C⟩ of a class C is transformed into C(?v3), and the triple pattern ⟨?v1 P ?v2⟩ of an object property P is transformed into P(?v1, ?v2). The class hierarchy and integrity of the original schema are transformed into negative constraints. For example, the class hierarchy between a class C1 of $RS_1$ and its superclass C2 is formalized as C1(?v) ∧ ¬C2(?v) → ⊥, which states precisely that every instance of C1 must also be an instance of C2.
[0068] For each non-graphical constraint, all negated literals are moved from the left-hand side to the right-hand side of the constraint and their negation symbols are removed, so that each negative constraint is converted into an equivalent embedded dependency. For example, the negative constraint ∀?v1 (C(?v1) ∧ ¬∃?v2 P(?v1, ?v2) → ⊥) yields the embedded dependency ∀?v1 (C(?v1) → ∃?v2 P(?v1, ?v2)).
[0069]
[0070] Step 2: Establish a key instance mapping set and compute the key instances of the internal schema.
[0071] As stated above, the objective of this invention is to predict RDFS(c) constraint violations without considering all instances. That is, given the internal schema $S = \langle S_P, S_{NL}, S_{ED} \rangle$ corresponding to an RDFS(c) schema and a set of inference rules R, determine whether S captures all inferences that applying the rule set R to any potential instance can produce. Once a conflict is detected, infer which constraints might be violated.
[0072] Definition 2: Given an internal schema S and an inference rule set R, a schema BS′ is called the framework primitive of S with respect to r (r ∈ R), denoted r(S), if and only if $\mathbb{I}(BS') = \{ I' \mid I' \subseteq r(I),\ I \in \mathbb{I}(S) \}$; $\langle S'_P, S'_{NL}, \emptyset \rangle$ is called the type framework of S with respect to R, denoted fra(S, R), if and only if $\mathbb{I}(\mathit{fra}(S,R)) = \{ I' \mid I' \subseteq \mathrm{cl}(I,R),\ I \in \mathbb{I}(S) \}$, where $\mathrm{cl}(I,R)$ is the closure of I under R.
[0073] As Definition 2 shows, the type framework of S with respect to R is defined through a restricted set of instances of S, so it is itself essentially a schema. The type framework captures only the triple types that the closure of an instance under the inference rules may contain, without taking into account the embedded dependencies corresponding to the constraints. Note that every subset of the closure of an instance is still an instance of the type framework, so a valid instance of the type framework may contain only the results of the applied inference rules without the triples that match their premises, and vice versa.
[0074] When computing fra(S, R), each inference rule r must be examined individually and the computation performed iteratively. fra(S, R) is obtained by repeatedly computing the framework primitives r(S) for all r ∈ R until no new patterns are inferred, i.e., $\mathit{fra}(S,R) = S_n$, where $S_0 = S$, $S_{i+1} = \bigcup_{r \in R} r(S_i)$ (unions taken component-wise), and $S_n = S_{n-1}$. By Definition 2, computing r(S) requires applying r to all instances of S (including potential instances), which in turn requires assigning values to the premise A of r (a graph pattern) using all instances I of S, i.e., computing the SPARQL evaluation of A on all instances of S. The problem therefore reduces to computing the mapping sets $[A]_I$, which is inefficient because any schema has an enormous number of valid instances. To address this, key instances are used in place of all instances (including potential instances) modeled by S: the mapping set is computed by evaluating A on the key instances of S.
[0075] Definition 3: Given an internal schema S and an inference rule r: A → C, the key instance $\mathbb{S}(S,r)$ of S with respect to r is a set of triples t, each obtained from a triple pattern $t_S \in S_P$ by assigning a value to each position $t[i]$, $i \in \{1, 2, 3\}$, as described below.
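The fixpoint $\mathit{fra}(S,R) = S_n$ can be sketched generically, treating each framework primitive r(S) as a black box supplied by the caller (in the method it is built from the filtered key-instance mappings). Assumptions: a schema is represented as a pair ($S_P$, $S_{NL}$) of frozensets, since $S_{ED}$ is empty in a type framework, and the toy primitive in the usage, which treats patterns as opaque strings, is purely hypothetical.

```python
def type_framework(schema, rules, primitive):
    """fra(S, R) = S_n with S_0 = S, S_{i+1} = ⋃_{r∈R} r(S_i), S_n = S_{n-1}.

    schema: (S_P, S_NL); primitive(r, schema) -> (patterns, non_literals).
    """
    current = (frozenset(schema[0]), frozenset(schema[1]))
    while True:
        parts = [primitive(r, current) for r in rules]
        # Component-wise union; current is kept since each r(S_i) contains S_i.
        nxt = (current[0].union(*(p[0] for p in parts)),
               current[1].union(*(p[1] for p in parts)))
        if nxt == current:
            return current
        current = nxt
```

With a toy primitive that maps pattern "A" to "B" and "B" to "C", starting from {"A"} the loop reaches {"A", "B", "C"} after two rounds and stops on the third, mirroring how rule chains are followed until no new pattern appears.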
[0076] Key instances are created by using the constants in $S_{ED}$ and A to replace the variables of the internal schema in all possible ways: each of the IRIs and literal values is substituted one by one for μ1, and likewise each of the IRIs is substituted one by one for μ2, while ensuring that the result is a valid RDF graph (i.e., literals appear only in the object position) and an instance of the original schema (i.e., no variable in $S_{NL}$ is replaced by a literal). Because the range of values of t[i] is significantly reduced, key instances allow this invention to operate on data of significantly reduced scale.
[0077] Step 3: Filter the key instance mapping set and generate the type framework.
[0078] The computation of the mapping set ⟦exp(A)⟧_𝕊(S,r) of the expanded premise exp(A) over the key instance 𝕊(S, r) described above does not take into account the restrictions that RDF graphs place on literals. For example, a generated mapping may bind a variable in the subject or predicate position to a literal, producing an invalid triple pattern. To comply with these restrictions, ⟦exp(A)⟧_𝕊(S,r) must be filtered to determine which variables belong in the non-literal set of the type framework. The mappings that are not removed are then applied to the conclusion C of rule r to generate the type framework.
[0079] Each mapping m in ⟦exp(A)⟧_𝕊(S,r) is processed as follows. First, a temporary non-literal set S_NL^m is created, and each variable of rule r is examined in turn. If a variable cannot be bound to a literal when the premise A is assigned values from instances of S and the conclusion C is instantiated accordingly, that variable is placed in S_NL^m. Variables in the subject or predicate position of a triple in A or C can never be matched to, or instantiated with, a literal. S_NL^m is therefore initialized to these variables of r.
[0080] Next, consider the element in the object position of each triple t_A in A. Because A has been expanded to exp(A), every variant t_q of t_A in exp(A) must be taken into account. Since m was computed on the key instance 𝕊(S, r), there is at least one t_q such that m(t_q) ∈ 𝕊(S, r). For each such t_q, the set of triples t_S ∈ S_P that model m(t_q) is obtained. These are the triples through which t_A, or one of its variants, allows the key instance to match mapping m. Two cases are then distinguished:
(1) If t_A[3] is a literal, or a variable that m maps to a literal, check whether some t_S exists such that t_S[3] equals that literal or t_S[3] is a variable not in S_NL. If no such t_S exists, m(A) cannot be a valid instance of S, and m is deleted from ⟦exp(A)⟧_𝕊(S,r).
(2) If t_A[3] is a variable that m maps to μ1 or μ2, check whether some t_S exists such that t_S[3] is a variable not in S_NL. If no such t_S can be found, t_A[3] cannot be bound to a literal under this mapping, and the variable t_A[3] is added to S_NL^m.
All triples t_A ∈ A are processed in this way. If m is not ultimately deleted, m binds no variable of S_NL^m to a literal. m can then be applied to rule r, and the frame primitive r(S) is generated from m.
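The two filtering cases can be sketched in Python for a single mapping as follows. This is a simplified sketch. It checks t_A directly instead of enumerating every variant t_q of exp(A), assumes m binds every variable of A, and uses the same string encoding as above (`?` for variables, `"` for literals). `models` and `filter_mapping` are hypothetical names.

```python
from typing import Dict, FrozenSet, List, Optional, Sequence, Set, Tuple

Triple = Tuple[str, str, str]
MU = {"μ1", "μ2"}


def is_var(x: str) -> bool:
    return x.startswith("?")


def is_literal(x: str) -> bool:
    return x.startswith('"')


def models(t_s: Triple, t: Triple, s_nl: FrozenSet[str]) -> bool:
    """True if schema pattern t_s can model the ground triple t."""
    for i, (a, b) in enumerate(zip(t_s, t)):
        if is_var(a):
            if is_literal(b) and (i != 2 or a in s_nl):
                return False  # literals only as objects of non-S_NL variables
        elif a != b:
            return False
    return True


def filter_mapping(premise: List[Triple], conclusion: List[Triple],
                   m: Dict[str, str], s_p: Sequence[Triple],
                   s_nl: FrozenSet[str]) -> Optional[Set[str]]:
    """Return S_NL^m, or None if mapping m must be deleted."""
    # subject/predicate variables of A or C can never be literals
    s_m_nl = {x for t in premise + conclusion for x in t[:2] if is_var(x)}
    for t_a in premise:
        ground = tuple(m.get(x, x) for x in t_a)
        supporters = [t_s for t_s in s_p if models(t_s, ground, s_nl)]
        obj = ground[2]
        if is_literal(obj):
            # case (1): some supporting pattern must admit this literal
            if not any(t_s[2] == obj or (is_var(t_s[2]) and t_s[2] not in s_nl)
                       for t_s in supporters):
                return None
        elif is_var(t_a[2]) and obj in MU:
            # case (2): no supporter lets the object be a literal
            if not any(is_var(t_s[2]) and t_s[2] not in s_nl for t_s in supporters):
                s_m_nl.add(t_a[2])
    return s_m_nl
```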
[0081] The type framework is generated according to $\mathrm{fra}(S, R) = S_n$, where $S_0 = S$, $S_{i+1} = S_i \cup \bigcup_{r \in R} r(S_i)$ and $S_n = S_{n-1}$. First, fra(S, R) = <S′_P, S′_NL, ∅> is initialized to <S_P, S_NL, ∅>, i.e., S′_P and S′_NL are initialized with the pattern set and the non-literal set of S, respectively. For each unfiltered mapping m, each binding is examined in turn. A binding that maps a variable to a value other than μ1 or μ2 is kept unchanged. Otherwise, a fresh variable is introduced to replace each μ1 and μ2, producing a new binding. The triple patterns m(C) are then added to S′_P, and the variables m(S_NL^m) ∩ vars(S′_P) are added to S′_NL. In this way S′ is extended mapping by mapping. Once all mappings have been considered, S′ is the final type framework.
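One round of type-framework construction from the surviving mappings can be sketched in Python as follows. The sketch assumes each mapping arrives together with its S_NL^m from the filtering step and that fresh variable names `?f0`, `?f1`, ... do not already occur in S. `build_type_frame` is a hypothetical name.

```python
import itertools
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]
MU = {"μ1", "μ2"}


def is_var(x: str) -> bool:
    return x.startswith("?")


def build_type_frame(s_p: Iterable[Triple], s_nl: Iterable[str],
                     conclusion: List[Triple],
                     mappings: Iterable[Tuple[Dict[str, str], Set[str]]]):
    """Extend <S_P, S_NL, ∅> with m(C) for every unfiltered mapping m."""
    fresh = (f"?f{i}" for i in itertools.count())  # assumed unused in S
    sp_new: Set[Triple] = set(s_p)
    snl_new: Set[str] = set(s_nl)
    for m, s_m_nl in mappings:
        # keep bindings to ordinary constants; rebind μ1/μ2 to fresh variables
        m2 = {v: (next(fresh) if c in MU else c) for v, c in m.items()}
        sp_new |= {tuple(m2.get(x, x) for x in t) for t in conclusion}
        frame_vars = {x for t in sp_new for x in t if is_var(x)}
        # add m(S_NL^m) ∩ vars(S'_P) to S'_NL
        snl_new |= {m2.get(v, v) for v in s_m_nl} & frame_vars
    return sp_new, snl_new


sp, snl = build_type_frame(
    {("?x", "ex:name", "?n")}, {"?x"}, [("?a", "rdf:type", "ex:Named")],
    [({"?a": "μ2", "?b": '"Bob"'}, {"?a"}), ({"?a": "ex:bob"}, {"?a"})])
```

A binding to a concrete constant (here `ex:bob`) yields a ground pattern and contributes nothing to S′_NL. A binding to μ2 becomes the fresh variable `?f0`, which inherits the non-literal restriction.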
[0082] Step 4: Use query rewriting to infer the violated type constraints and other constraints.
[0083] Inferring RDFS(c) schema conflicts builds on the type framework. This step uses the already obtained S′_P and S′_NL and computes the set of violated constraints by analyzing the interaction between the conclusions of the inference rules and the embedding dependencies. That is, it computes the set of embedding dependencies that are violated on the closure under R of some instance among all possible instances of the internal schema S.
[0084] Definition 4: Given an internal schema S and an inference rule set R, S′ is the schema consequence of S with respect to R if and only if $M(S') = \{ I' \mid \exists I \in M(S):\ \mathrm{viol}(S_{ED}, I') = \mathrm{viol}(S_{ED}, \mathrm{cl}(I, R)) \}$, where M(·) denotes the instance set of a schema, cl(I, R) is the closure of I under R, and viol(S_ED, ·) is the set of embedding dependencies in S_ED violated by an instance.
[0085] By Definition 4, the schema consequence of the internal schema S with respect to the rule set R models exactly the instances I′ that satisfy this condition. Intuitively, an instance I of S does not violate the embedding dependencies in S_ED, but the instance obtained by applying R to I (i.e., cl(I, R)) may violate some of them. I′ and cl(I, R) violate the same set of embedding dependencies, and the schema consequence S′ is the schema that models all such I′.
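The closure cl(I, R) and the violation set viol(S_ED, ·) can be sketched in Python as follows, using naive forward chaining and backtracking pattern matching. This is a minimal sketch under these assumptions: every variable in a rule conclusion also occurs in its premise (no existential rule heads), and an embedding dependency a → c is violated when some match of a cannot be extended to a match of c. All function names are hypothetical.

```python
from typing import Dict, Iterator, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Pattern = List[Triple]


def is_var(x: str) -> bool:
    return x.startswith("?")


def matches(pattern: Pattern, graph: Set[Triple],
            binding: Optional[Dict[str, str]] = None) -> Iterator[Dict[str, str]]:
    """Enumerate all mappings of a conjunctive triple pattern into graph."""
    binding = dict(binding or {})
    if not pattern:
        yield binding
        return
    first, rest = pattern[0], pattern[1:]
    for triple in graph:
        b = dict(binding)
        if all((b.setdefault(p, v) == v) if is_var(p) else p == v
               for p, v in zip(first, triple)):
            yield from matches(rest, graph, b)


def closure(graph: Set[Triple], rules: List[Tuple[Pattern, Pattern]]) -> Set[Triple]:
    """cl(I, R): apply rules A -> C until no new triple is derived."""
    g = set(graph)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            for m in list(matches(premise, g)):  # materialize before adding
                for t in conclusion:
                    new = tuple(m.get(x, x) for x in t)
                    if new not in g:
                        g.add(new)
                        changed = True
    return g


def viol(deps: Dict[str, Tuple[Pattern, Pattern]], graph: Set[Triple]) -> Set[str]:
    """Names of embedding dependencies a -> c violated in graph."""
    return {name for name, (a, c) in deps.items()
            if any(next(matches(c, graph, m), None) is None
                   for m in matches(a, graph))}
```

The example in the test mirrors the intuition of [0085]: an instance that satisfies an embedding dependency can violate it once the rules add new facts.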
[0086] The following explains how the violated type constraints and other constraints are inferred. This invention starts from the key instance of the internal schema S. The new facts that the inference rules R add to the closure of that instance may violate one or more embedding dependencies. The proposed algorithm therefore searches for an instance I of S such that the antecedent a of an embedding dependency a → c maps into cl(I, R), thereby triggering the constraint. For efficiency, I is constructed for each embedding dependency in a "minimal" way: if cl(I, R) does not satisfy the dependency, there is no proper subset I′ of I that is still an instance of S and whose closure cl(I′, R) also fails to satisfy it. By constructing such a minimal instance I for each embedding dependency, it can be inferred whether the corresponding constraint is violated on the potential closure of the key instance.
[0087] If an instance I that violates the constraint exists, it must contain facts matching the antecedent a of the embedding dependency. Such instances can therefore be found by starting from the following triples: (1) instantiations of the premises of inference rules on the closure of the key instance; (2) triples of the key instance whose closure contains facts that can be mapped to a. To find the minimal set of inference-rule premises that must be instantiated, the algorithm first identifies the rules whose conclusions can violate the dependency. It then starts "reverse" reasoning from their premises A to find further rules whose conclusions can trigger A. The rewrites generated by the algorithm of this invention are thus premises "propagated" backwards through the inference rules, from which facts matching the antecedent a can ultimately be inferred. Instantiating such a rewrite (which is also an instance of <S′_P, S′_NL, ∅>) yields exactly such a "minimal" instance of the internal schema S whose closure implies A. Since these rewrites represent all possible instances, the algorithm can then check whether the embedding dependency is satisfied on them. If it is satisfied in all of these rewrites, it is not violated on the closure of any instance of S. The algorithm for detecting RDFS(c) schema conflicts is as follows:
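A single step of the "reverse" reasoning can be sketched in Python as backward chaining with syntactic unification of triple patterns. The sketch makes no claim to match the patent's exact rewriting procedure or its minimality test. It computes one step only; the full algorithm repeats it, and recursive rules would need a termination guard. All names are hypothetical.

```python
import itertools
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Rule = Tuple[List[Triple], List[Triple]]  # (premise A, conclusion C)
_fresh_suffix = itertools.count()


def is_var(x: str) -> bool:
    return x.startswith("?")


def walk(x: str, s: Dict[str, str]) -> str:
    """Follow bindings in substitution s to the current value of x."""
    while is_var(x) and x in s:
        x = s[x]
    return x


def unify(t1: Triple, t2: Triple) -> Optional[Dict[str, str]]:
    """Most general unifier of two triple patterns, or None."""
    s: Dict[str, str] = {}
    for a, b in zip(t1, t2):
        a, b = walk(a, s), walk(b, s)
        if a == b:
            continue
        if is_var(a):
            s[a] = b
        elif is_var(b):
            s[b] = a
        else:
            return None  # two different constants
    return s


def rename_apart(rule: Rule) -> Rule:
    """Suffix the rule's variables so they cannot clash with the goal."""
    k = next(_fresh_suffix)

    def ren(t: Triple) -> Triple:
        return tuple(f"{x}_{k}" if is_var(x) else x for x in t)  # type: ignore[return-value]

    premise, conclusion = rule
    return [ren(t) for t in premise], [ren(t) for t in conclusion]


def rewrite_once(goal: List[Triple], rule: Rule) -> List[List[Triple]]:
    """Replace each goal triple that a conclusion triple of the rule
    unifies with by the rule premise under the unifier."""
    premise, conclusion = rename_apart(rule)
    rewrites = []
    for i, g in enumerate(goal):
        for c in conclusion:
            s = unify(g, c)
            if s is not None:
                rest = goal[:i] + goal[i + 1:] + premise
                rewrites.append([tuple(walk(x, s) for x in t) for t in rest])
    return rewrites
```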
[0088] Input: internal schema S = <S_P, S_NL, S_ED> and rule set R.
[0089] Output: the set of violated type constraints type(S, R) and the set of violated constraints con(S, R).
[0090] (1) Compute the type framework fra(S, R) = <S′_P, S′_NL, ∅>;
[0091] (2) Initialize type(S, R) = ∅, con(S, R) = ∅ and the mapping sets M = M′ = M′′ = ∅;
[0092] (3) For each embedding dependency ed: a → c in S_ED, do:
[0093] (4) Let the minimal set of rule premises be W = ∅;
[0094] (5) If ⟦exp(a)⟧_𝕊(S,r) ≠ ∅, then:
[0095] (6) { For each rule r: A → C ∈ R, reason in reverse and compute, by minimal rewriting, the minimal set A_W of rule premises that imply A;
[0096] (7) Let W = W ∪ {A_W};
[0097] (8) For each A_W ∈ W, do:
[0098] (9) M = ⟦exp(A_W)⟧_𝕊(S,r);
[0099] (10) For each m ∈ M, do:
[0100] (11) Let m = m \ {all bindings that map variables of exp(A_W) to μ1 or μ2};
[0101] (12) Map each variable in m(A_W) to a newly generated IRI, and denote this mapping g;
[0102] (13) Let I = g(m(A_W)) and I′ = ∅;
[0103] (14) While I ≠ I′, do:
[0104] (15) { Let I′ = I;
[0105] (16) For each embedding dependency ed′: a′ → c′ in S_ED, do:
[0106] (17) Let M′ = ⟦a′⟧_I;
[0107] (18) For each m′ ∈ M′, if m′(c′) has no match in I, then:
[0108] (19) { Map each variable in m′(c′) to a newly generated IRI, obtaining g′;
[0109] (20) I = I ∪ g′(m′(c′)); }
[0110] (21) Return to (16); }
[0111] (22) Let I = cl(I, R) and M′′ = ⟦a⟧_I;
[0112] (23) For each m′′ ∈ M′′, if m′′(c) has no match in I, let con(S, R) = con(S, R) ∪ {ed};
[0113] (24) Return to (10);
[0114] (25) Return to (8); }
[0115] (26) Return to (3);
[0116] (27) Let type(S, R) = S′_P \ S_P;
[0117] (28) Return type(S, R) and con(S, R).
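Steps (13) to (21) extend I until every embedding dependency holds. This can be sketched in Python as a chase. It is a minimal sketch, not the patent's implementation. Each unmatched consequent m′(c′) is instantiated with fresh IRIs `ex:n0`, `ex:n1`, ... (hypothetical names, generated deterministically rather than randomly). A round limit guards against non-terminating chases.

```python
import itertools
from typing import Dict, Iterator, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Pattern = List[Triple]


def is_var(x: str) -> bool:
    return x.startswith("?")


def matches(pattern: Pattern, graph: Set[Triple],
            binding: Optional[Dict[str, str]] = None) -> Iterator[Dict[str, str]]:
    """Enumerate all mappings of a conjunctive triple pattern into graph."""
    binding = dict(binding or {})
    if not pattern:
        yield binding
        return
    first, rest = pattern[0], pattern[1:]
    for triple in graph:
        b = dict(binding)
        if all((b.setdefault(p, v) == v) if is_var(p) else p == v
               for p, v in zip(first, triple)):
            yield from matches(rest, graph, b)


def complete_instance(instance: Set[Triple], deps: List[Tuple[Pattern, Pattern]],
                      max_rounds: int = 100) -> Set[Triple]:
    """Add g'(m'(c')) for every match m' of a' whose consequent c' is unmatched."""
    fresh = (f"ex:n{i}" for i in itertools.count())
    I = set(instance)
    for _ in range(max_rounds):
        before = set(I)                       # plays the role of I' in step (15)
        for a, c in deps:                     # step (16)
            for m in list(matches(a, I)):     # step (17)
                if next(matches(c, I, m), None) is None:  # step (18)
                    g: Dict[str, str] = {}
                    for t in c:               # step (19): fresh IRI per unbound variable
                        for x in t:
                            if is_var(x) and x not in m and x not in g:
                                g[x] = next(fresh)
                    I |= {tuple(m.get(x, g.get(x, x)) for x in t) for t in c}  # step (20)
        if I == before:                       # step (14): stop once I = I'
            return I
    raise RuntimeError("no fixpoint within max_rounds; the chase may not terminate")


deps = [([("?x", "rdf:type", "ex:Employee")], [("?x", "ex:worksFor", "?y")]),
        ([("?x", "ex:worksFor", "?y")], [("?y", "rdf:type", "ex:Org")])]
I_min = complete_instance({("ex:bob", "rdf:type", "ex:Employee")}, deps)
```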
[0118] Algorithm 1 gives the pseudocode for computing type(S, R) and con(S, R). For each embedding dependency ed, this invention examines each inference rule r: A → C in turn to infer whether C can violate ed (lines 3-6). It then computes all rewrites of A by backward chaining through the rules in R. For each such rewrite A_w, it computes the mapping set M = ⟦exp(A_w)⟧_𝕊(S,r) to infer whether A_w can be matched on an instance of S. If M is empty, no instance of S matches A_w; otherwise, the algorithm checks whether this potential match violates ed (lines 11-23). For each mapping m ∈ M, the bindings of m that map variables to μ1 or μ2 are first removed, and the remaining mapping is applied to A_w. Each remaining variable is then mapped to a new IRI, which generates the instance I, i.e., the instance obtained from A_w through these mappings (lines 11-13). To ensure that I is an instance of S, I is extended until it satisfies all embedding dependencies in S_ED (lines 14-22). Finally, the closure of I under the inference rules R is computed, and if it violates ed, ed is added to the set of violated embedding dependencies (line 23). Because S′_P contains pattern types that the internal schema does not model, the algorithm finally removes S_P from S′_P. It then outputs the set of violated type constraints type(S, R) and the set of violated constraints con(S, R).
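Lines (11) to (13) turn a rewrite A_w and a key-instance mapping m into a concrete instance I = g(m(A_w)). This can be sketched in Python as follows. The sketch assumes the reading that bindings to μ1/μ2 are dropped, so that the corresponding variables later receive fresh IRIs. It also maps every remaining variable to an IRI, although a variable in a literal-capable object position could in principle be a literal. `instantiate_rewrite` and the `ex:new` prefix are hypothetical.

```python
import itertools
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
MU = {"μ1", "μ2"}


def is_var(x: str) -> bool:
    return x.startswith("?")


def instantiate_rewrite(rewrite: List[Triple], m: Dict[str, str],
                        prefix: str = "ex:new") -> Set[Triple]:
    """Compute I = g(m(A_w)) for one rewrite A_w and one mapping m."""
    kept = {v: c for v, c in m.items() if c not in MU}       # (11) drop μ1/μ2 bindings
    partial = [tuple(kept.get(x, x) for x in t) for t in rewrite]
    counter = itertools.count()
    g: Dict[str, str] = {}
    for t in partial:                                         # (12) fresh IRI per variable
        for x in t:
            if is_var(x) and x not in g:
                g[x] = f"{prefix}{next(counter)}"
    return {tuple(g.get(x, x) for x in t) for t in partial}   # (13) I = g(m(A_w))


I_w = instantiate_rewrite(
    [("?p", "ex:worksFor", "?o"), ("?o", "rdf:type", "?t")],
    {"?p": "μ2", "?o": "μ2", "?t": "ex:Org"})
```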
[0119] Through the above steps, RDFS(c) schema conflict detection based on key instances and query rewriting is implemented.
[0120] Finally, the applicant evaluated the effectiveness of this method on RDFS(c) schemas of different sizes and measured the improvement in detection efficiency brought by key instances and query rewriting. The evaluation method is as follows:
[0121] For the datasets, to reflect the characteristics of different real-world datasets, the test is configured with nine parameters: λ1, λ2, λ3, |PR|, |UR|, |LI|, |R|, |S_P| and |S_ED|. These parameters are used as follows.
Random triple patterns: the predicate is randomly selected from the set PR of predicate IRIs. The subject and the object are each a constant with probability λ1, and otherwise a new variable. A constant in the subject position is instantiated with a random IRI, while a constant in the object position is instantiated with a random IRI or a random literal with probability 50% each. Random IRIs and literals are drawn from the sets UR and LI, respectively (UR ∩ PR = ∅).
Inference rules: the premise and the conclusion are conjunctions of λ2 and λ3 triple patterns, respectively, selected from the previously generated random triple patterns.
Constraints: each constraint of the original schema is created by directly initializing its corresponding embedding dependency. The antecedent of the embedding dependency is randomly selected from all conclusions of the inference rules, and its consequent from all premises of the inference rules. This ensures that the embedding dependencies are relevant and increases the probability that they interact with the inference rules.
In each run of the experiment, this invention generates such a set R of inference rules and populates an internal schema S = <S_P, S_NL, S_ED> from the original schema. To ensure that inference rules are applicable to each schema instance, half of the patterns are initialized with premise triples of randomly selected inference rules and the other half with random triple patterns. All tests were performed on an Intel i7-6700 with 10.5 GB RAM running Windows 10.
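The random triple-pattern and rule generation described above can be sketched in Python as follows. The sets PR, UR and LI in the example are small hypothetical stand-ins, not the sizes used in the experiments. Only λ1 = 0.3 and λ2 = λ3 = 2 are taken from experiment a.

```python
import itertools
import random
from typing import Iterator, List, Sequence, Tuple

Triple = Tuple[str, str, str]


def random_triple_pattern(rng: random.Random, pr: Sequence[str], ur: Sequence[str],
                          li: Sequence[str], lam1: float, fresh: Iterator[str]) -> Triple:
    """Predicate from PR; subject/object constant with probability λ1, else a new variable."""
    predicate = rng.choice(pr)
    subject = rng.choice(ur) if rng.random() < lam1 else next(fresh)
    if rng.random() < lam1:
        # object constant: random IRI or random literal, 50% each
        obj = rng.choice(ur) if rng.random() < 0.5 else rng.choice(li)
    else:
        obj = next(fresh)
    return (subject, predicate, obj)


def random_rule(rng: random.Random, pool: List[Triple], lam2: int, lam3: int):
    """Premise of λ2 and conclusion of λ3 triple patterns drawn from the pool."""
    return rng.sample(pool, lam2), rng.sample(pool, lam3)


# Example run with small hypothetical sets (UR ∩ PR = ∅).
rng = random.Random(0)
fresh_vars = (f"?v{i}" for i in itertools.count())
PR = [f"ex:p{i}" for i in range(6)]
UR = [f"ex:u{i}" for i in range(4)]
LI = [f'"l{i}"' for i in range(4)]
pool = [random_triple_pattern(rng, PR, UR, LI, 0.3, fresh_vars) for _ in range(20)]
premise, conclusion = random_rule(rng, pool, 2, 2)
```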
[0122] Figure 5a compares the detection time with and without key instances and query rewriting as the number of triple patterns in S_P increases (experiment a). The other parameters were set as follows: λ1 = 0.3, λ2 = 2, λ3 = 2, |PR| = 1.5|S_P|, |UR| = |LI| = |S_P|, |R| = 16, |S_ED| = 39. These techniques clearly improve the efficiency of conflict detection. The reason is that the computation of 𝕊(S, R) depends on S_P and S_ED. Although increasing |S_P| enlarges the key instance, the rate of this growth falls off at a polynomial order. The growth rate of the size of the minimal instances that violate ed also benefits from the increase in |S_P|. The combined effect of these two factors yields a quadratic improvement in time performance.
[0123] Figure 5b shows how increasing the number of embedding dependencies in S_ED affects the detection time with and without these techniques (experiment b). The other parameters were set as follows: λ1 = 0.7, λ2 = 3, λ3 = 1, |PR| = 16, |UR| = |LI| = |S_P| = 34, |R| = 9. Both the computation of 𝕊(S, R) and the growth rate of the size of the minimal instances benefit from the increase in |S_ED|. The techniques therefore improve detection efficiency to a similar extent as in experiment a, as the trend of the curves in Figure 5b also reflects. The experimental results show that these techniques significantly improve the efficiency of detecting RDFS(c) schema conflicts.
[0124] It should be emphasized that the embodiments described in this invention are illustrative rather than limiting. Therefore, this invention includes, but is not limited to, the embodiments described in the specific implementation. Any other implementations derived by those skilled in the art based on the technical solutions of this invention are also within the scope of protection of this invention.
Claims
1. A method for detecting RDFS(c) schema conflicts based on key instances and query rewriting, characterized in that it comprises the following steps: Step 1, converting the RDFS(c) schema into an internal schema; Step 2, establishing a key instance mapping set and computing the key instances of the internal schema; Step 3, filtering the key instance mapping set and generating a type framework; Step 4, using query rewriting to infer the violated type constraints and other constraints; wherein the internal schema S = <S_P, S_NL, S_ED> is a triple, in which S_P is a pattern set, i.e., a set of triple patterns in which each variable name appears at most once; S_NL = {?v | (?v ∈ vars(S_P)) ∧ ¬Literal(?v)} is the non-literal set, where ¬Literal(?v) denotes that the variable ?v is non-literal; S_ED is a set of embedding dependencies; and ?v is a variable in S_P; the three components S_P, S_NL and S_ED of the internal schema are computed as follows: S_P: S_P is initially empty, and three kinds of triples appearing in an RDF graph are defined: the class of a single instance, a data attribute value of a single instance, and an object attribute relationship between two instances; for each class CL in the RDFS(c) schema, the triple pattern <?v rdf:type CL> is added to S_P; for each data attribute DP in the schema, <?v DP Datatype> is added to S_P; for each object attribute OP in the schema, <?v1 OP ?v2> is added to S_P; S_NL: all variables appearing in the subject and predicate positions of S_P are added to S_NL; then the non-graph constraints are scanned one by one, and for each constraint that restricts the values of a variable to IRIs, the constrained variable is added to S_NL and the constraint is removed from the constraint set; S_ED: in S_P, the triple pattern corresponding to each class is transformed into a unary predicate, and the triple pattern corresponding to each object attribute and each data attribute is transformed into a binary predicate; the class hierarchy and integrity constraints of the original schema are transformed into negative constraints; for each non-graph constraint, all negated atoms are moved from the left-hand side to the right-hand side of the negative constraint and the negation symbol is removed; and each negative constraint is equivalently transformed into an embedding dependency.
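The construction of S_P and S_NL in Step 1 can be sketched in Python as follows. The S_ED transformation is omitted, and the datatype of a data attribute is not modeled: its object is simply a fresh variable that may bind to a literal. Treating both ends of an object attribute as non-literal stands in for the "values must be IRIs" constraints and is an assumption. `to_internal_schema` is a hypothetical name.

```python
import itertools
from typing import FrozenSet, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]


def to_internal_schema(classes: Iterable[str], data_props: Iterable[str],
                       object_props: Iterable[str]) -> Tuple[List[Triple], FrozenSet[str]]:
    """Build S_P and S_NL for an RDFS(c) schema; each variable is used once."""
    fresh = (f"?v{i}" for i in itertools.count())
    s_p: List[Triple] = []
    s_nl: Set[str] = set()
    for cl in classes:                 # <?v rdf:type CL>
        v = next(fresh)
        s_p.append((v, "rdf:type", cl))
        s_nl.add(v)                    # subject variable: never a literal
    for dp in data_props:              # <?v DP datatype>; datatype not modeled here
        v, d = next(fresh), next(fresh)
        s_p.append((v, dp, d))
        s_nl.add(v)                    # d stays literal-capable
    for op in object_props:            # <?v1 OP ?v2>
        v1, v2 = next(fresh), next(fresh)
        s_p.append((v1, op, v2))
        s_nl.update({v1, v2})          # assumption: object values restricted to IRIs
    return s_p, frozenset(s_nl)


s_p, s_nl = to_internal_schema(["ex:Person"], ["ex:age"], ["ex:knows"])
```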
2. The method for detecting RDFS(c) schema conflicts based on key instances and query rewriting according to claim 1, characterized in that step 2 is implemented as follows: Definition: given an internal schema S and an inference rule set R, with r being any rule in R, i.e., r ∈ R, BS′ is called the frame primitive of S with respect to the inference rule r, denoted r(S), if and only if $M(BS') = \{ I \cup I_r \mid I \in M(S),\ I_r \subseteq r(I) \}$, where M(BS′) denotes the instance set of BS′, M(S) denotes the instance set of S, I is any instance in M(S), r(I) is the application of the inference rule r on I, and I_r is any subset of r(I); <S′_P, S′_NL, ∅> is called the type framework of S with respect to R, denoted fra(S, R), if and only if $M(S') = \{ I \cup J \mid I \in M(S),\ J \subseteq \mathrm{cl}(I, R) \}$, where S′_P denotes the pattern set of S′, S′_NL denotes the non-literal set of S′, M(S′) denotes the instance set of S′, and cl(I, R) is the closure of I under the rule set R; in computing fra(S, R), the premise A of the inference rule r is assigned values over all instances I of the internal schema S, i.e., the SPARQL evaluation of A over all instances of S is computed as the mapping set ⟦A⟧_I; in this computation, key instances are used in place of all instances modeled by S, A is assigned values on the key instances of S, and the key instance mapping set is computed; Definition: given an internal schema S and an inference rule r: A → C, where r is the rule name and A and C are the premise and the conclusion of r, respectively, the key instance of S with respect to r, 𝕊(S, r), is the set of triples t given by $\mathbb{S}(S, r) = \{ t \mid \exists t_S \in S_P\ \forall i \in \{1,2,3\}:\ t[i] = t_S[i] \text{ if } t_S[i] \notin \mathrm{vars}(t_S);\ t[i] \in C_{ED} \cup C_A \cup \{\mu_1\} \text{ if } t_S[i] \in \mathrm{vars}(t_S) \setminus S_{NL};\ t[i] \in \{x \in C_{ED} \cup C_A \mid \neg\mathrm{Literal}(x)\} \cup \{\mu_2\} \text{ if } t_S[i] \in S_{NL} \}$, where vars(·) denotes the set of variables in an element, i ∈ {1, 2, 3}, C_ED is the set of constants in the embedding dependencies S_ED, C_A is the set of constants in the premise A of the inference rule r, and ¬Literal(·) denotes a non-literal; the constants of S_ED and A are used to replace the variables of the internal schema in all possible ways, including replacing μ1 in turn with all IRIs and literals and replacing μ2 in turn with all IRIs, to create the key instance.
3. The method for detecting RDFS(c) schema conflicts based on key instances and query rewriting according to claim 2, characterized in that the key instance mapping set in step 3 is filtered as follows: each mapping m in the key instance mapping set is processed as follows: first, a temporary non-literal set S_NL^m is created, and each variable of the inference rule r is examined in turn; if a variable cannot be bound to a literal when the premise A is assigned values from instances of S and the conclusion C is instantiated, the variable is placed in S_NL^m; then, the element in the object position of each triple t_A in the premise A of r is considered: with 𝕊(S, r) denoting the key instance of S with respect to r and ⟦exp(A)⟧_𝕊(S,r) denoting the mapping set of the expanded premise exp(A) over the key instance 𝕊(S, r), every variant t_q of t_A in exp(A) must be taken into account; because each mapping m in ⟦exp(A)⟧_𝕊(S,r) is computed on 𝕊(S, r), there is at least one t_q such that m(t_q) ∈ 𝕊(S, r), where m(t_q) denotes the pattern obtained by replacing each variable ?v in t_q with m(?v); for each such t_q, the set of triples t_S ∈ S_P that model m(t_q) is obtained, through which t_A, or one of its variants, matches the key instance to the mapping m.
4. The method for detecting RDFS(c) schema conflicts based on key instances and query rewriting according to claim 3, characterized in that the type framework in step 3 is generated according to $\mathrm{fra}(S, R) = S_n$, where fra(S, R) is the type framework of S with respect to R, composed of the pattern set S′_P, the non-literal set S′_NL and the empty set ∅, and $S_0 = S$, $S_{i+1} = S_i \cup \bigcup_{r \in R} r(S_i)$, $S_n = S_{n-1}$; first, fra(S, R) = <S′_P, S′_NL, ∅> is initialized to <S_P, S_NL, ∅>, i.e., S_P is used as the initial value of S′_P and S_NL as the initial value of S′_NL; for each unfiltered mapping m, each binding is examined in turn: a binding that maps a variable to a value other than μ1 or μ2 is kept unchanged, and otherwise a different fresh variable is introduced to replace each μ1 and μ2, generating a new binding; the triple patterns m(C) are then added to S′_P, and the variables m(S_NL^m) ∩ vars(S′_P) are added to S′_NL; in this way S′ is extended mapping by mapping until all mappings have been considered, at which point S′ is the final output type framework.
5. The method for detecting RDFS(c) schema conflicts based on key instances and query rewriting according to claim 1, characterized in that step 4 is implemented as follows: first, given the internal schema S and the inference rule set R, S′ denotes the schema consequence of S with respect to R, i.e., $M(S') = \{ I' \mid \exists I \in M(S):\ \mathrm{viol}(S_{ED}, I') = \mathrm{viol}(S_{ED}, \mathrm{cl}(I, R)) \}$, where M(S′) is the instance set of S′, M(S) is the instance set of S, I is any instance in M(S), cl(I, R) is the closure of I under the rule set R, I′ is an instance in M(S′), viol(S_ED, I′) denotes the set of embedding dependencies in S_ED violated by I′, and viol(S_ED, cl(I, R)) denotes the set of embedding dependencies in S_ED violated by cl(I, R); then, from the internal schema S = <S_P, S_NL, S_ED> and the inference rule set R, the type framework fra(S, R) = <S′_P, S′_NL, ∅> is computed; type(S, R) denotes the set of violated type constraints and con(S, R) the set of violated constraints, and type(S, R) = ∅, con(S, R) = ∅ and the mapping set M = ∅ are initialized; for each embedding dependency ed, each inference rule r: A → C is examined in turn to infer whether C can violate ed, and all rewrites of A are computed by backward chaining through the rules in R; for each such rewrite A_w, the mapping set M = ⟦exp(A_w)⟧_𝕊(S,r) is computed to infer whether A_w can be matched on an instance of S, where 𝕊(S, r) is the key instance of S with respect to r and ⟦exp(A_w)⟧_𝕊(S,r) is the mapping set of exp(A_w) over 𝕊(S, r); for each mapping m ∈ M, the bindings of m that map variables to μ1 or μ2 are first removed and the mapping is applied to A_w, and each remaining variable is then mapped to a new IRI to generate an instance I; the closure of I is computed using all embedding dependencies in S_ED and the inference rule set R, and if it violates ed, ed is added to the set of violated embedding dependencies; finally, S_P is removed from S′_P, and the set of violated type constraints type(S, R) and the set of violated constraints con(S, R) are output.