Method for constructing a large-model-based metadata discovery agent for data weaving
By unifying the access to multi-source heterogeneous data and processing it with a large language model, structured semantic metadata instances are generated, solving the problem of the lack of unified representation for different types of data and realizing unified processing and management across data types.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Applications (China)
- Current Assignee / Owner: CHINESE PEOPLES LIBERATION ARMY INFORMATION SUPPORT CORPS ENGINEERING UNIVERSITY
- Filing Date: 2026-05-21
- Publication Date: 2026-06-19
AI Technical Summary
In existing technologies, different types of data lack a unified data representation structure, making it difficult to perform consistent data processing operations based on a unified input in subsequent processing.
By unifying the access of multi-source heterogeneous data, standardized input data objects and multi-dimensional data feature description objects are constructed. A large language model is used for semantic interpretation and relation inference to generate semantic metadata instances, which are then mapped to a unified semantic metadata model for verification and storage.
It achieves unified representation and consistent processing of different types of data, generates structured semantic metadata, and expresses the structural information, business semantic information and relationship information between data objects in a unified data structure, supporting subsequent unified management and processing.
Figure CN122240712A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of computer technology, and in particular to a method and system for constructing large-model-based metadata discovery agents for data weaving (data fabric).
Background Technology
[0002] With the development of data weaving technology, the data generated in different business systems is increasingly multi-source and heterogeneous, and the data types include structured, semi-structured, unstructured, and multimodal data.
[0003] In existing technologies, the processing of multi-source heterogeneous data typically involves designing separate processing flows for different data types. For structured data, processing is performed by parsing the table structure and field information; for semi-structured data, key-value pairs are generated by parsing the hierarchical structure; for unstructured data, semantic information is extracted using text processing methods; and for multimodal data, feature information is extracted using corresponding perceptual models.
[0004] However, in implementing the technical solution of this application, the inventors found that the above technology has at least the following technical problem: the prior art lacks a unified data representation structure for different types of data, which makes it difficult to perform consistent processing operations on a unified input in subsequent processing.
Summary of the Invention
[0005] To overcome the above shortcomings, this invention provides a method and system for constructing large-model-based metadata discovery agents for data weaving. It aims to solve the problem in the prior art that different types of data lack a unified data representation structure, which makes it difficult to perform consistent processing operations on a unified input in subsequent processing.
[0006] In a first aspect, the present invention provides the following technical solution: a method for constructing a large-model-based metadata discovery agent for data weaving, comprising the following steps:
S1. Access multi-source heterogeneous data, identify and register each accessed data source at the data source level, generate data source identification information, and sample and parse the data based on the data source identification information to form standardized input data objects;
S2. Perform data feature parsing on the standardized input data objects to obtain multidimensional data feature description objects;
S3. Based on a preset semantic understanding prompt template, input the multidimensional data feature description objects into a large language model, which performs semantic interpretation, business meaning induction, and inference of semantic relationships between data objects; perform structured parsing on the model output, and constrain the structured parsing results according to the semantic metadata model constraints;
S4. Generate semantic metadata instances based on the constrained structured parsing results, each instance including at least technical metadata, business metadata, and semantic relationship metadata;
S5. Map the semantic metadata instances to a unified semantic metadata model, perform consistency, integrity, and rationality verification, and write the instances that pass verification into the metadata knowledge graph;
S6. Invoke the semantic metadata written into the metadata knowledge graph, collect the feedback information generated during invocation, and adjust the semantic understanding prompt template, model parameters, or semantic metadata model based on that feedback.
[0007] Preferably, in step S1, sampling and format parsing the data based on the data source identification information to form standardized input data objects includes: accessing structured data from relational and columnar databases, semi-structured data such as JSON, XML, and logs, unstructured data such as text and documents, and multimodal data such as images, video, and audio; identifying and registering each data source at the data source level and generating corresponding data source identification information, which includes at least the data source type, connection method, update time, and access permissions; and, based on the data source identification information, performing unified sampling, format parsing, and object encapsulation on the data, converting data from different sources and with different structural forms into standardized input data objects.
[0008] Preferably, in step S2, performing data feature parsing to obtain multidimensional data feature description objects includes: for structured and semi-structured data, analyzing tables, fields, hierarchical structure, data types, length, precision, primary and foreign key relationships, constraint information, field value distribution, and the proportion of null values to obtain structural features; for unstructured data, parsing keywords, named entities, topic information, and semantic fragments to obtain content features; for multimodal data, invoking the corresponding perception model to generate semantic labels and vectorized feature representations to obtain multimodal semantic features; and uniformly encapsulating the structural features, content features, and multimodal semantic features into a multidimensional data feature description object.
[0009] Preferably, in step S3, the large language model performing semantic interpretation on the data objects includes: organizing the multidimensional data feature description objects according to a preset semantic understanding prompt template to obtain the input content for the large language model, where the template includes at least a data object description, a structural feature summary, a content feature summary, and the semantic metadata model constraints; feeding this input content into the large language model, which generates field-level natural language semantic descriptions representing the business meaning and description of each field; and generating, with the large language model, object-level natural language semantic descriptions that characterize the core business use and business domain role of each data object.
[0010] Preferably, in step S3, inferring semantic relationships between data objects includes: constructing a relationship inference input for at least two data objects; providing the relationship inference input to the large language model, which determines whether a business or semantic relationship exists between the data objects; when such a relationship exists, determining with the large language model the relationship type and the basis for the relationship, where the relationship type includes at least subordination, association, reference, and derivation relationships, and the basis includes at least key fields, context fields, or business semantic correspondence; and outputting the semantic relationship inference results between the data objects.
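The relationship inference step can be pictured with the following minimal Python sketch. It assumes, hypothetically, that each data object is represented by a short text summary and that the model is asked to reply with a JSON list of verdicts carrying `exists`, `relation_type`, and `basis` keys; the function names and the English labels for the four relationship types are illustrative, and the model call itself is omitted.

```python
import json
from dataclasses import dataclass
from itertools import combinations

# Illustrative English labels for the four relationship types named in the text.
RELATION_TYPES = {"subordinate", "association", "reference", "derivation"}


@dataclass
class RelationResult:
    source: str
    target: str
    relation_type: str
    basis: str  # key field, context field, or business-semantic correspondence


def build_relation_inputs(summaries):
    """Pair every two data objects and package their summaries as inference input."""
    return [
        {"source": a, "target": b,
         "source_summary": summaries[a], "target_summary": summaries[b]}
        for a, b in combinations(sorted(summaries), 2)
    ]


def parse_relation_output(raw):
    """Parse a model reply, assumed to be a JSON list of verdicts, into RelationResult records.

    Pairs judged unrelated, or labelled with an unknown relationship type, are skipped.
    """
    results = []
    for item in json.loads(raw):
        if item.get("exists") and item.get("relation_type") in RELATION_TYPES:
            results.append(RelationResult(item["source"], item["target"],
                                          item["relation_type"], item.get("basis", "")))
    return results
```

Building all pairs grows quadratically with the number of objects; a real system would likely pre-filter candidate pairs (for example by shared key fields) before asking the model.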
[0011] Preferably, in step S3, performing structured parsing on the output and constraining the structured parsing results according to the semantic metadata model constraints includes: performing structured parsing on the output of the large language model to obtain structured semantic results containing entity types, attributes, and relationship types; constraining and validating the structured semantic results against the semantic metadata model constraints, retaining only the entity types, attributes, and relationship types that conform to those constraints and removing those that do not; and using the constrained structured semantic results as the input for generating semantic metadata instances.
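A minimal sketch of structured parsing followed by constraint filtering is shown below. The JSON reply layout and the contents of `MODEL_CONSTRAINTS` are hypothetical stand-ins for a real semantic metadata model; also dropping relations whose endpoints were removed is an added design choice, not something the text specifies.

```python
import json

# Hypothetical semantic metadata model constraints: allowed entity types,
# the attributes permitted for each, and the allowed relationship types.
MODEL_CONSTRAINTS = {
    "entity_types": {"Table": {"name", "description", "domain"},
                     "Field": {"name", "description", "business_meaning"}},
    "relation_types": {"subordinate", "association", "reference", "derivation"},
}


def parse_llm_output(raw):
    """Structured parsing: extract the outermost JSON object from the model's reply."""
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    return json.loads(raw[start:end + 1])


def constrain(parsed, constraints=MODEL_CONSTRAINTS):
    """Keep only entities, attributes and relations permitted by the model constraints."""
    allowed = constraints["entity_types"]
    entities = []
    for ent in parsed.get("entities", []):
        etype = ent.get("type")
        if etype not in allowed:
            continue  # entity type not defined in the model: drop it
        attrs = {k: v for k, v in ent.get("attributes", {}).items() if k in allowed[etype]}
        entities.append({"id": ent["id"], "type": etype, "attributes": attrs})
    kept_ids = {e["id"] for e in entities}
    relations = [r for r in parsed.get("relations", [])
                 if r.get("type") in constraints["relation_types"]
                 and r.get("source") in kept_ids and r.get("target") in kept_ids]
    return {"entities": entities, "relations": relations}
```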
[0012] Preferably, in step S4, generating semantic metadata instances based on the constrained structured parsing results includes: generating technical metadata from the constrained structured semantic results to describe fields, data objects, and structural relationships; generating business metadata from the same results to describe business definitions, indicator meanings, and business tags; generating semantic relationship metadata from the same results to describe the logical and semantic relationships between data objects; and uniformly encoding the technical metadata, business metadata, and semantic relationship metadata to form standardized semantic metadata instances.
[0013] Preferably, in step S5, writing the verified semantic metadata instances into the metadata knowledge graph includes: mapping the semantic metadata instances to the unified semantic metadata model and aligning them with a domain ontology or industry standard model; performing consistency, integrity, and rationality checks on the mapped and aligned results; for instances that fail verification, generating correction suggestions or marking them for manual review; and writing the instances that pass verification into the metadata knowledge graph.
[0014] Preferably, in step S6, the step of adjusting the semantic understanding prompt template, model parameters, or semantic metadata model based on the feedback information includes: The semantic metadata written into the metadata knowledge graph is invoked, and feedback information generated during the invocation process is collected. The feedback information includes at least semantic conflicts, usage frequency, and correction records. Based on the feedback information, the semantic understanding prompt template, model parameters, or semantic metadata model are adjusted; The adjusted semantic understanding prompt template, model parameters, or semantic metadata model will be used in the subsequent metadata discovery process.
[0015] In a second aspect, this invention provides the following technical solution: a large-model-based metadata discovery agent system for data weaving, comprising:
a multi-source data access module, used to access multi-source heterogeneous data, identify and register each accessed data source at the data source level, generate data source identification information, and perform preliminary sampling and format parsing of the data based on that information to form standardized input data objects;
a data feature parsing module, used to perform data feature parsing on the standardized input data objects to obtain multidimensional data feature description objects;
a large language model semantic understanding and reasoning module, used to input the multidimensional data feature description objects into the large language model according to the preset semantic understanding prompt template, where the large language model performs semantic interpretation, business meaning induction, and inference of semantic relationships between data objects, and the module performs structured parsing on the output and constrains the results according to the semantic metadata model constraints;
a semantic metadata generation module, used to generate semantic metadata instances from the structured parsing results, each instance including at least technical metadata, business metadata, and semantic relationship metadata;
a metadata modeling, verification, and storage module, used to map the semantic metadata instances to the unified semantic metadata model, perform consistency, completeness, and rationality verification, and write the verified instances into the metadata knowledge graph; and
a discovery result feedback and continuous optimization module, used to invoke the semantic metadata already stored, collect feedback information during invocation, and adjust the semantic understanding prompt template, model parameters, or semantic metadata model based on that feedback.
[0016] In summary, compared with the prior art, the technical solutions conceived by this invention have the following beneficial effects: 1. This invention unifies access to multi-source heterogeneous data and constructs standardized input data objects and multidimensional data feature description objects, transforming structured, semi-structured, unstructured, and multimodal data into a unified representation, so that data from different sources and with different structures has a consistent input structure in subsequent processing, thereby providing a unified processing foundation across data types.
[0017] 2. This invention introduces semantic understanding prompt templates and combines them with a large language model to perform semantic interpretation and semantic relationship inference on data objects. At the same time, it performs structured parsing on the output results and performs constraint processing according to the semantic metadata model, so that the generated semantic results are limited to the preset entity types, attributes and relationships, thereby making the semantic results structurally consistent and meeting the requirements of subsequent modeling.
[0018] 3. Based on the structured semantic results after constraint processing, this invention generates technical metadata, business metadata, and semantic relationship metadata, and performs unified encoding to form semantic metadata instances, so that the structural information, business semantic information, and relationship information between data objects are expressed in a unified data structure, thereby forming a data representation form that can be used for unified management.
[0019] 4. This invention maps semantic metadata instances to the unified semantic metadata model, verifies them, and stores them in a knowledge graph; it also generates feedback information from the invocation results on the metadata application side and adjusts the semantic understanding prompt template, model parameters, or semantic metadata model accordingly, so that stored semantic metadata participates in subsequent processing and drives parameter updates, forming a processing flow that covers generation, storage, and adjustment.
Attached Figure Description
[0020] To more clearly illustrate the technical solutions in the embodiments of this application, the accompanying drawings used in the embodiments will be briefly introduced below. Obviously, the drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0021] Figure 1 is a flowchart of the method for constructing a large-model-based metadata discovery agent for data weaving proposed in this invention.
Figure 2 is a flowchart of the data access and feature parsing process of the large-model-based metadata discovery agent construction method for data weaving proposed in this invention.
Figure 3 is a system architecture diagram of the large-model-based metadata discovery agent for data weaving proposed in this invention.
Figure 4 is a flowchart of the semantic processing and storage process of the large-model-based metadata discovery agent system for data weaving proposed in this invention.
Detailed Implementation
[0022] To make the objectives, technical solutions, and advantages of this invention clearer, the invention will be further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the invention. Furthermore, the technical features involved in the various embodiments of this invention described below can be combined with each other as long as they do not conflict with each other.
[0023] Referring to Figure 1 and Figure 2, in the first embodiment, the present invention provides a method for constructing a metadata discovery agent based on a large language model for data weaving, comprising the following steps:
S1. Access multi-source heterogeneous data, identify and register each accessed data source at the data source level, generate data source identification information, and sample and parse the data based on that information to form standardized input data objects;
S2. Perform data feature parsing on the standardized input data objects to obtain multidimensional data feature description objects;
S3. Based on the preset semantic understanding prompt template, input the multidimensional data feature description objects into the large language model, which performs semantic interpretation, business meaning induction, and inference of semantic relationships between data objects; perform structured parsing on the model output, and constrain the structured parsing results according to the semantic metadata model constraints;
S4. Generate semantic metadata instances based on the constrained structured parsing results, each instance including at least technical metadata, business metadata, and semantic relationship metadata;
S5. Map the semantic metadata instances to a unified semantic metadata model, perform consistency, integrity, and rationality checks, and write the instances that pass the checks into the metadata knowledge graph;
S6. Invoke the semantic metadata written into the metadata knowledge graph, collect the feedback information generated during invocation, and adjust the semantic understanding prompt template, model parameters, or semantic metadata model based on that feedback.
[0024] Specifically, in step S1, data from different data sources are uniformly accessed and processed. For each data source, identification and registration operations are performed to record information such as data source type, connection method, update time, and access permissions, forming corresponding data source identification information. Based on this, the accessed data is sampled according to preset rules, and format parsing operations are performed according to data type to uniformly convert structured data, semi-structured data, unstructured data, and multimodal data into standardized input data objects to ensure that the data input format is consistent in subsequent processing.
[0025] In step S2, data feature parsing processing is performed on the standardized input data object; for structured and semi-structured data, table structure, field information, hierarchical relationships, data types, constraint information, and statistical information are extracted; for unstructured data, keywords, named entities, topic information, and semantic fragments are extracted; for multimodal data, the corresponding perceptual model is called to obtain semantic labels and feature representations; and the above structural features, content features, and multimodal semantic features are uniformly organized and encapsulated to form a multidimensional data feature description object, which is used to characterize the comprehensive features of the data object.
[0026] In step S3, based on a preset semantic understanding prompt template, the multidimensional data feature description objects are organized into input data for the large language model. The semantic understanding prompt template includes data object descriptions, structural feature summaries, content feature summaries, and semantic metadata model constraints. The input data is provided to the large language model, which performs semantic interpretation on the data objects, generating field-level semantic descriptions and object-level semantic descriptions. Simultaneously, semantic relationship inference is performed on multiple data objects to obtain the relationship types and association criteria between the data objects. The output results of the large language model are subjected to structured parsing, converting the natural language results into structured semantic results. The structured semantic results are then validated according to the semantic metadata model constraints, retaining only the entity types, attributes, and relationship types that conform to the constraints, resulting in the constrained structured semantic results.
[0027] In step S4, a semantic metadata instance is generated based on the structured semantic results after constraint processing. Specifically, technical metadata is generated based on field-level semantic descriptions and structural features, business metadata is generated based on object-level semantic descriptions, and semantic relationship metadata is generated based on semantic relationship inference results. The technical metadata, business metadata, and semantic relationship metadata are uniformly encoded to form a standardized semantic metadata instance for subsequent modeling and storage.
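As an illustration only, the instance generation and "unified encoding" of step S4 might look like the sketch below. It assumes, hypothetically, that unified encoding means serializing the three metadata parts into one canonical JSON document; the class, field, and function names are invented for the example.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class SemanticMetadataInstance:
    object_id: str
    technical: dict = field(default_factory=dict)  # field descriptions, structural features
    business: dict = field(default_factory=dict)   # business definition, tags
    relations: list = field(default_factory=list)  # semantic relationship records

    def encode(self):
        """Unified encoding: one canonical JSON document with sorted keys."""
        return json.dumps(asdict(self), ensure_ascii=False, sort_keys=True)


def build_instance(object_id, field_descriptions, structural_features,
                   object_description, relation_results):
    """Assemble technical, business and relationship metadata for one data object."""
    technical = {"fields": field_descriptions, "structure": structural_features}
    business = {"definition": object_description.get("definition", ""),
                "tags": object_description.get("tags", [])}
    relations = [dict(r) for r in relation_results]
    return SemanticMetadataInstance(object_id, technical, business, relations)
```

Sorting keys makes the encoding deterministic, which helps when instances are later compared or deduplicated.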
[0028] In step S5, semantic metadata instances are mapped to a unified semantic metadata model, and the mapping results are aligned according to the domain ontology or industry standard model. Based on this, consistency verification, integrity verification, and rationality verification are performed respectively. For semantic metadata instances that do not meet the verification requirements, correction suggestions are generated or they are marked as pending manual review. For semantic metadata instances that pass the verification, they are written into the metadata knowledge graph, where the semantic metadata instances are graph nodes and the semantic relationship metadata are graph edges, to form a queryable structured metadata set.
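The verification and storage step can be sketched with an in-memory graph in which instances become nodes and relationship records become edges. The concrete checks below (required keys for integrity, a business-definition mismatch for consistency, a self-referencing relation for rationality) are hypothetical examples of each check category; a production system would more likely use a graph database.

```python
from dataclasses import dataclass, field

# Hypothetical integrity rule: every instance must carry these parts.
REQUIRED_KEYS = {"object_id", "technical", "business", "relations"}


@dataclass
class MetadataGraph:
    nodes: dict = field(default_factory=dict)         # object_id -> instance
    edges: list = field(default_factory=list)         # (source, relation type, target)
    review_queue: list = field(default_factory=list)  # instances pending manual review

    def verify(self, inst):
        problems = []
        missing = REQUIRED_KEYS - set(inst)
        if missing:
            problems.append(f"integrity: missing {sorted(missing)}")
        existing = self.nodes.get(inst.get("object_id"))
        if existing is not None and existing["business"] != inst.get("business"):
            problems.append("consistency: conflicts with stored business metadata")
        for rel in inst.get("relations", []):
            if rel.get("target") == inst.get("object_id"):
                problems.append("rationality: self-referencing relation")
        return problems

    def write(self, inst):
        """Write a verified instance as a node plus edges; otherwise queue it for review."""
        problems = self.verify(inst)
        if problems:
            self.review_queue.append((inst.get("object_id"), problems))
            return False
        self.nodes[inst["object_id"]] = inst
        for rel in inst["relations"]:
            self.edges.append((inst["object_id"], rel["type"], rel["target"]))
        return True
```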
[0029] In step S6, the semantic metadata written into the metadata knowledge graph is invoked, for example by providing it to a metadata application side or application function unit, such as a metadata governance agent or a service orchestration agent. Feedback information generated during invocation is recorded, including semantic conflict information, usage frequency information, and correction records. Based on this feedback, the semantic understanding prompt template, model parameters, or semantic metadata model are updated, and the updated content is applied to subsequent data processing flows, forming a continuously updated processing procedure.
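One possible way to turn the recorded feedback into adjustments is sketched below. The conflict-rate threshold, the event kinds, and the idea of reusing correction records as prompt examples are all assumptions made for illustration; the text only states that the template, parameters, or model are adjusted based on the feedback.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Feedback:
    object_id: str
    kind: str        # "use", "conflict", or "correction" (hypothetical event kinds)
    detail: str = ""


def plan_adjustments(events, conflict_threshold=0.2):
    """Flag objects whose conflict rate per use exceeds a threshold; collect corrections."""
    uses = Counter(e.object_id for e in events if e.kind == "use")
    conflicts = Counter(e.object_id for e in events if e.kind == "conflict")
    flagged = [obj for obj, n in conflicts.items()
               if n / max(uses[obj], 1) > conflict_threshold]
    return {
        "revise_prompt_for": sorted(flagged),
        # Correction records could be fed back as examples in the prompt template.
        "few_shot_examples": [e.detail for e in events if e.kind == "correction"],
    }
```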
[0030] Furthermore, in step S1, sampling and format parsing the data based on the data source identification information to form standardized input data objects includes: accessing structured data from relational and columnar databases, semi-structured data such as JSON, XML, and logs, unstructured data such as text and documents, and multimodal data such as images, video, and audio; identifying and registering each data source at the data source level and generating corresponding data source identification information, which includes at least the data source type, connection method, update time, and access permissions; and, based on the data source identification information, having the system perform unified sampling, format parsing, and object encapsulation on the data, converting data from different sources and with different structures into standardized input data objects.
[0031] Specifically, for the set of accessed data sources $S = \{s_1, s_2, \dots, s_n\}$, data source identification is performed one by one. For each data source $s_j$, data source identification information $I_j$ is generated:
$$I_j = (t_j, c_j, u_j, p_j)$$
where $t_j$ is the data source type, used to distinguish structured, semi-structured, unstructured, or multimodal data; $c_j$ is the connection method, identifying the data access protocol or interface type; $u_j$ is the update time, recording the time attribute of the data source; and $p_j$ is the access permission, identifying the data access control level. This identification information serves as the control basis for the subsequent sampling and parsing processes.
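The data source identification information can be held in a small record with a registry that rejects duplicate registrations. This is a minimal sketch; the enum values, the example connection and permission strings, and the `SourceRegistry` class are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class SourceType(Enum):
    STRUCTURED = "structured"
    SEMI_STRUCTURED = "semi_structured"
    UNSTRUCTURED = "unstructured"
    MULTIMODAL = "multimodal"


@dataclass(frozen=True)
class SourceIdentification:
    source_type: SourceType  # data source type
    connection: str          # connection method, e.g. "jdbc" or "http" (illustrative)
    updated_at: datetime     # update time
    permission: str          # access permission level, e.g. "internal" (illustrative)


class SourceRegistry:
    """Registers each accessed data source once and returns its identification info."""

    def __init__(self):
        self._sources = {}

    def register(self, name, ident):
        if name in self._sources:
            raise ValueError(f"data source {name!r} already registered")
        self._sources[name] = ident
        return ident

    def get(self, name):
        return self._sources[name]
```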
[0032] During data sampling, a sampling strategy is selected according to the data source type $t_j$: for structured data sources, data records are extracted by table-level or field-level sampling; for semi-structured data sources, node data is extracted by structure path parsing; for unstructured data sources, semantic units are extracted by text segmentation; and for multimodal data sources, content is extracted by frame extraction or segment clipping. The sampled data subset is denoted
$$D_j = \{d_{j1}, d_{j2}, \dots, d_{jK_j}\}$$
where $d_{jk}$ is the $k$-th sampled data unit from the $j$-th data source and $K_j$ is the number of units sampled from it.
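The per-type sampling strategies can be sketched as one dispatch function. Each strategy below is a deliberately simple stand-in (uniform random rows, dotted-path lookup, blank-line splitting, fixed-stride frame indices); the function names, option keys, and defaults are assumptions, not the patent's parameters.

```python
import random


def sample_structured(rows, k, seed=0):
    """Table-level sampling: draw up to k records uniformly at random."""
    rng = random.Random(seed)
    return rng.sample(rows, min(k, len(rows)))


def sample_semi_structured(doc, paths):
    """Structure-path parsing: collect the nodes found at dotted paths such as 'order.id'."""
    found = []
    for path in paths:
        node = doc
        for key in path.split("."):
            node = node.get(key) if isinstance(node, dict) else None
            if node is None:
                break
        if node is not None:
            found.append(node)
    return found


def sample_text(text, max_chars=200):
    """Text segmentation: split on blank lines and cap each unit's length."""
    return [p.strip()[:max_chars] for p in text.split("\n\n") if p.strip()]


def sample_frames(n_frames, every):
    """Frame extraction: indices of every `every`-th frame of a video."""
    return list(range(0, n_frames, every))


def sample(source_type, data, **opts):
    """Dispatch to a sampling strategy according to the data source type."""
    if source_type == "structured":
        return sample_structured(data, opts.get("k", 100))
    if source_type == "semi_structured":
        return sample_semi_structured(data, opts.get("paths", []))
    if source_type == "unstructured":
        return sample_text(data, opts.get("max_chars", 200))
    if source_type == "multimodal":
        return sample_frames(data, opts.get("every", 25))
    raise ValueError(f"unknown data source type: {source_type}")
```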
[0033] During format parsing, a structure transformation is applied to the sampled data $D_j$. For structured data, the table structure information is parsed, including field names, field types, and constraint information; for semi-structured data, the hierarchical structure is parsed and expanded into a set of key-value pairs; for unstructured data, the text content is extracted and basic cleaning is performed; for multimodal data, file identifiers, time information, and basic attributes are extracted. Parsing unifies data of different formats into an intermediate representation:
$$R_j = \phi(D_j)$$
where $\phi$ is the format parsing function and $R_j$ is the parsed intermediate data.
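Two of the parsing branches are easy to make concrete: expanding a JSON hierarchy into key-value pairs and basic text cleaning. The sketch below assumes a dotted-path key convention with `[i]` for list positions and an intermediate form of `{"kind": ..., "content": ...}`; both conventions are illustrative choices.

```python
import json


def flatten(node, prefix=""):
    """Expand a nested JSON structure into {path: value} key-value pairs."""
    pairs = {}
    if isinstance(node, dict):
        for key, value in node.items():
            pairs.update(flatten(value, f"{prefix}.{key}" if prefix else key))
    elif isinstance(node, list):
        for i, value in enumerate(node):
            pairs.update(flatten(value, f"{prefix}[{i}]"))
    else:
        pairs[prefix] = node  # leaf value
    return pairs


def parse_semi_structured(raw):
    """Format parsing for a JSON sample into a uniform intermediate representation."""
    return {"kind": "key_values", "content": flatten(json.loads(raw))}


def parse_text(raw):
    """Basic cleaning for text: collapse runs of whitespace."""
    return {"kind": "text", "content": " ".join(raw.split())}
```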
[0034] During object encapsulation, the parsed intermediate data $R_j$ is combined with the corresponding data source identification information $I_j$ to generate a standardized input data object $O_j$:
$$O_j = (\mathrm{id}_j, \mathrm{type}_j, \mathrm{data}_j, \mathrm{attr}_j)$$
where $\mathrm{id}_j$ is the unique object identifier, generated by combining the data source identification information with the data content; $\mathrm{type}_j$ is the object type, equal to the data source type, $\mathrm{type}_j = t_j$; $\mathrm{data}_j$ is the data content, equal to the parsed intermediate data, $\mathrm{data}_j = R_j$; and $\mathrm{attr}_j$ is the object attribute information, including source, sampling, and permission information.
[0035] The unique object identifier is generated as
$$\mathrm{id}_j = G\big(t_j \,\Vert\, H(\mathrm{data}_j) \,\Vert\, \tau_j\big)$$
where $G$ is the identifier generation function, $\Vert$ denotes concatenation (splicing), $t_j$ is the data source type, $H(\mathrm{data}_j)$ is a digest (summary) of the data content, and $\tau_j$ is the time information. This ensures that different data sources and different content correspond to distinct object identifiers. Through the above sampling, parsing, and encapsulation, raw data from different data sources is uniformly transformed into a set of standardized input data objects with a consistent structure:
$$\mathcal{O} = \{O_1, O_2, \dots, O_N\}$$
where $N$ is the number of standardized input data objects.
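A minimal sketch of object encapsulation and identifier generation follows. It assumes SHA-256 both for the content digest and for the identifier generation function, a `|` separator for the concatenation, and a 16-hex-character truncated identifier; none of these choices comes from the text, and the content must be JSON-serializable for the digest to work.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass
class StandardizedInputObject:
    object_id: str
    object_type: str  # equal to the data source type
    content: dict     # parsed intermediate data
    attributes: dict  # source, sampling and permission information


def make_object_id(source_type, content, timestamp):
    """Concatenate type, content digest and time, then hash (SHA-256 is an assumption)."""
    digest = hashlib.sha256(json.dumps(content, sort_keys=True).encode()).hexdigest()
    joined = f"{source_type}|{digest}|{timestamp}"
    return hashlib.sha256(joined.encode()).hexdigest()[:16]


def encapsulate(source_type, content, timestamp, source_name, permission):
    """Combine parsed data with source identification info into one standardized object."""
    return StandardizedInputObject(
        object_id=make_object_id(source_type, content, timestamp),
        object_type=source_type,
        content=content,
        attributes={"source": source_name, "sampled_at": timestamp, "permission": permission},
    )
```

Truncating the hash keeps identifiers short at the cost of a slightly higher, though still very small, collision probability.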
[0036] Through the above implementation methods, data from different sources and with different structural forms are uniformly mapped into standardized input data objects with a unified structure, enabling subsequent data feature parsing steps to be performed on the basis of a unified data representation.
[0037] Furthermore, in step S2, performing data feature parsing to obtain multidimensional data feature description objects includes: for structured and semi-structured data, analyzing tables, fields, hierarchical structure, data types, length, precision, primary and foreign key relationships, constraint information, field value distribution, and the proportion of null values to obtain structural features; for unstructured data, parsing keywords, named entities, topic information, and semantic fragments to obtain content features; for multimodal data, invoking the corresponding perception model to generate semantic labels and vectorized feature representations to obtain multimodal semantic features; and uniformly encapsulating the structural features, content features, and multimodal semantic features into a multidimensional data feature description object.
[0038] Specifically, for each standardized input data object $O_j$ obtained in step S1, the corresponding feature parsing method is chosen according to its object type $\mathrm{type}_j$.
[0039] For structured and semi-structured data, a set of structural features is generated by parsing their structural information; it includes the field set, the field type set, and the structural relationship set. The field value distribution is obtained by statistical calculation on the sampled data. The null-value ratio of a field $f$ is defined as
$$r_f = \frac{n_f^{\mathrm{null}}}{n_f}$$
where $r_f$ is the proportion of null values in the field, $n_f^{\mathrm{null}}$ is the number of null values in the field, and $n_f$ is the total number of samples for the field. The field value distribution is obtained by counting the occurrences of distinct values and characterizes the field's values. For primary and foreign key relationships, a set of structural constraints is generated by parsing the reference relationships between fields; for hierarchical structures, a sequence of hierarchical paths is generated by parsing nested paths. The structural information is uniformly represented as
$$F_s = \{F_{\mathrm{name}}, F_{\mathrm{type}}, F_{\mathrm{cons}}, F_{\mathrm{stat}}, F_{\mathrm{hier}}\}$$
where $F_{\mathrm{name}}$ denotes name features, $F_{\mathrm{type}}$ type features, $F_{\mathrm{cons}}$ constraint information, $F_{\mathrm{stat}}$ statistical features, and $F_{\mathrm{hier}}$ hierarchical structure information.
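The statistical part of the structural features (null-value ratio and value distribution per field) can be computed from sampled records as sketched below. Treating both `None` and the empty string as null, and representing records as Python dicts, are assumptions for the example.

```python
from collections import Counter


def field_statistics(values):
    """Null-value ratio (null count / sample count) and value distribution of one field."""
    total = len(values)
    non_null = [v for v in values if v is not None and v != ""]
    nulls = total - len(non_null)
    return {
        "null_ratio": nulls / total if total else 0.0,
        "distribution": dict(Counter(non_null).most_common()),
    }


def structural_statistics(rows):
    """Per-field statistics for a list of sampled dict records; absent keys count as null."""
    fields = sorted({k for row in rows for k in row})
    return {f: field_statistics([row.get(f) for row in rows]) for f in fields}
```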
[0040] For unstructured data, text processing is applied to the data content $\mathrm{data}_j$ to obtain the content feature set $F_c$. First, the text is segmented into a set of semantic fragments $P = \{p_1, p_2, \dots, p_m\}$, where $m$ is the number of semantic fragments. Then keywords and named entities are extracted from each fragment to form a keyword set $K$ and an entity set $E$.
[0041] Each semantic fragment is vectorized as $v_i = \mathrm{Embed}(p_i)$, where $p_i$ is the $i$-th semantic fragment, $v_i$ is its vector representation and $\mathrm{Embed}(\cdot)$ is the text vectorization function. From the vectors of all $m$ fragments, the overall semantic representation of the text is computed as $v_D = \mathrm{Agg}(v_1, v_2, \ldots, v_m)$, where $v_D$ is the semantic vector of the text and $\mathrm{Agg}(\cdot)$ is the aggregation over fragment vectors. The result is the content feature set $F_c = (P, K, E_{\text{ent}}, v_D)$, where $P$ is the fragment set, $K$ the keyword set and $E_{\text{ent}}$ the entity set. For multimodal data, the corresponding perception model is invoked for feature extraction according to the object type $t$. Assume the multimodal data contains a set of modalities $\{M_1, M_2, \ldots, M_q\}$, where $q$ is the number of modalities. Features are extracted from the data of each modality as $u_k = \phi_k(M_k)$, where $u_k$ is the feature vector of the $k$-th modality and $\phi_k(\cdot)$ is the feature extraction function for that modality.
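The text-side computation of [0040] and [0041] (segment, vectorize each fragment, aggregate into one text vector) can be sketched as follows. The splitting rule, the hashed bag-of-words vectorizer and the element-wise mean used as the aggregation are all assumptions for illustration; the description fixes none of them, and a real system would substitute a trained text embedding model.

```python
import re
import zlib
from typing import List

def segment(text: str) -> List[str]:
    # Assumption: fragments are sentence-like spans split on . ! ? ; and newlines
    return [p.strip() for p in re.split(r"[.!?;\n]+", text) if p.strip()]

def embed(fragment: str, dim: int = 8) -> List[float]:
    # Hypothetical stand-in for the vectorization function: a hashed bag of words
    vec = [0.0] * dim
    for token in fragment.lower().split():
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    return vec

def text_vector(fragment_vectors: List[List[float]]) -> List[float]:
    # Assumption: the text-level vector is the element-wise mean of fragment vectors
    n = len(fragment_vectors)
    return [sum(col) / n for col in zip(*fragment_vectors)]

fragments = segment("Order 1001 was created. Customer C-17 paid the invoice!")
vectors = [embed(p) for p in fragments]
doc_vec = text_vector(vectors)
```

`zlib.crc32` is used instead of the built-in `hash` so that the vectors are reproducible across interpreter runs.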
[0042] The feature vectors $u_1, u_2, \ldots, u_q$ of all modalities are fused into the multimodal semantic feature $F_m = \mathrm{Fuse}(u_1, u_2, \ldots, u_q)$, where $\mathrm{Fuse}(\cdot)$ is the feature fusion function; fusion may use vector concatenation or weighted combination. Once the structural, content and multimodal semantic features have been extracted, the three are uniformly encapsulated into the multidimensional data feature description object $O = \mathrm{Encap}(F_s, F_c, F_m)$, where $F_s$, $F_c$ and $F_m$ are the structural, content and multimodal semantic features, $O$ is the multidimensional data feature description object and $\mathrm{Encap}(\cdot)$ is the feature encapsulation function. Because the encapsulated object also carries the object's attribute information in a fixed form, data from different sources share a unified representation structure in the feature space.
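The two fusion options named in [0042], concatenation and weighted combination, and the final encapsulation step might look like this. The `FeatureDescription` fields and the `encapsulate` helper are hypothetical; the description only requires that the three feature families end up in one uniform object.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional

def fuse_concat(vectors: List[List[float]]) -> List[float]:
    """Fusion by vector concatenation."""
    return [x for v in vectors for x in v]

def fuse_weighted(vectors: List[List[float]], weights: List[float]) -> List[float]:
    """Fusion by weighted combination; all modality vectors must share one dimension."""
    if len(vectors) != len(weights) or len({len(v) for v in vectors}) != 1:
        raise ValueError("need one weight per vector and equal vector dimensions")
    return [sum(w * v[i] for v, w in zip(vectors, weights)) for i in range(len(vectors[0]))]

@dataclass
class FeatureDescription:
    # Hypothetical multidimensional data feature description object
    object_id: str
    object_type: str
    structural: Dict[str, Any]
    content: Dict[str, Any]
    multimodal: List[float]

def encapsulate(object_id: str, object_type: str,
                structural: Optional[Dict[str, Any]] = None,
                content: Optional[Dict[str, Any]] = None,
                multimodal: Optional[List[float]] = None) -> FeatureDescription:
    # Absent feature families become empty values, so every object has the same shape
    return FeatureDescription(object_id, object_type,
                              structural or {}, content or {}, multimodal or [])
```

Concatenation keeps every modality's dimensions separate, while weighted combination needs vectors already projected into a shared space; which one applies depends on the perception models in use.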
[0043] Through the above implementation, data objects of different types are converted into multidimensional data feature description objects with a unified structure, so that the subsequent semantic understanding and reasoning steps performed by the large language model operate on a unified input.
[0044] Furthermore, in step S3, the large language model performs semantic interpretation of the data object as follows. Based on a preset semantic understanding prompt template, the multidimensional data feature description object is organized into the input content for the large language model; the template includes at least a data object description, a structural feature summary, a content feature summary and the semantic metadata model constraints. The input content is fed to the large language model, which generates field-level natural language semantic descriptions expressing each field's business meaning and field description, and object-level natural language semantic descriptions expressing the data object's core business use and its role in the business domain.
[0045] In step S3, the semantic relationships between data objects are inferred as follows. A relation inference input is constructed for at least two data objects and provided to the large language model, which determines whether a business or semantic relationship exists between them. When such a relationship exists, the large language model determines the relationship type and the basis for the relationship. The relationship type includes at least the subordinate, association, reference and derivation relationships; the basis includes at least key fields, context fields or business semantic correspondence. The semantic relationship inference result between the data objects is then output.
[0046] In step S3, performing structured parsing on the output and constraining the parsed results according to the semantic metadata model comprises the following steps. The output of the large language model is parsed into structured semantic results containing entity types, attributes and relation types. The structured semantic results are then validated against the semantic metadata model constraints: only entity types, attributes and relation types that conform to the constraints are retained, and non-conforming entries are removed. The constraint-processed structured semantic results serve as the input for generating semantic metadata instances.
[0047] Specifically, for the multidimensional data feature description object $O$ obtained in step S2, the semantic understanding prompt template input is constructed first. It is composed of a data object description, a structural feature summary, a content feature summary and the semantic metadata model constraints, and is expressed as $X = \mathrm{Prompt}(O, C_M)$, where $\mathrm{Prompt}(\cdot)$ is the prompt template construction function, $C_M$ denotes the semantic metadata model constraints and $X$ is the input content for the large language model.
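Assembling the four template sections of [0047] into one model input can be sketched as plain string building. The section headings, the closing instruction and the `build_prompt` name are hypothetical; the description specifies only which sections the template must contain.

```python
from typing import Dict, List

def build_prompt(description: str, structural_summary: List[str],
                 content_summary: List[str], constraints: Dict[str, List[str]]) -> str:
    """Combine object description, feature summaries and model constraints into one input."""
    lines = ["[Data object]", description,
             "[Structural features]", *structural_summary,
             "[Content features]", *content_summary,
             "[Semantic metadata model constraints]"]
    for key, allowed in constraints.items():
        lines.append(f"{key}: {', '.join(allowed)}")
    # Hypothetical output instruction; asking for JSON eases the later structured parsing
    lines.append("Describe each field's business meaning and the object's business use, "
                 "using only the allowed types above, and answer in JSON.")
    return "\n".join(lines)

prompt = build_prompt("order table (relational)",
                      ["order_amount: DECIMAL(10,2), null ratio 0.0"],
                      [],
                      {"entity_types": ["Table", "Field"]})
```

Placing the constraints inside the prompt narrows what the model produces, but it does not replace the explicit constraint filtering applied to the parsed output afterwards.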
[0048] The input content $X$ is fed to the large language model to obtain the semantic output $Y = \mathrm{LLM}(X)$, where $\mathrm{LLM}(\cdot)$ is the large language model processing function and $Y$ is the semantic description result output by the model. The output includes field-level and object-level semantic descriptions: field-level descriptions characterize a field's business meaning, semantic type and field description, while object-level descriptions characterize a data object's business purpose and business role.
[0049] To infer semantic relationships between data objects, a relation inference input is constructed from an object pair as $X_r = \mathrm{RelPrompt}(O_a, O_b)$, where $\mathrm{RelPrompt}(\cdot)$ is the relation inference input construction function and $O_a$, $O_b$ are the feature description objects of the two data objects under analysis. Feeding $X_r$ to the large language model yields the relation inference result $R_{ab} = \mathrm{LLM}(X_r)$, where $\mathrm{LLM}(\cdot)$ is the large language model processing function and $R_{ab}$ is the semantic relationship inference result between the two data objects. The result includes a relationship existence flag, a relationship type and a relationship basis, with the relationship type selected from a predefined set of relationship types.
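A possible shape for the relation inference input and for checking the returned result against the predefined type set of [0049] is shown below. The JSON reply format, the prompt wording and the helper names are assumptions; the four type names follow the relationship types listed in [0045].

```python
import json
from dataclasses import dataclass
from typing import Optional

# Predefined relationship types (subordinate, association, reference, derivation)
RELATION_TYPES = {"subordinate", "association", "reference", "derivation"}

@dataclass
class RelationResult:
    exists: bool
    relation_type: Optional[str]
    basis: Optional[str]

def build_relation_input(desc_a: str, desc_b: str) -> str:
    # Hypothetical relation-inference prompt for one object pair
    return (f"Object A: {desc_a}\nObject B: {desc_b}\n"
            f"Is there a business or semantic relationship? Choose the type from "
            f"{sorted(RELATION_TYPES)} and give the basis. "
            f"Reply as JSON with keys exists, type, basis.")

def parse_relation_result(raw: str) -> RelationResult:
    """Turn the model's JSON reply into an existence flag, type and basis."""
    data = json.loads(raw)
    if not data.get("exists"):
        return RelationResult(False, None, None)
    rtype = data.get("type")
    if rtype not in RELATION_TYPES:
        raise ValueError(f"relation type {rtype!r} is not in the predefined set")
    return RelationResult(True, rtype, data.get("basis"))
```

Rejecting unknown types at this point keeps free-form model vocabulary out of the metadata; an alternative would be to map near-synonyms onto the predefined set.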
[0050] The semantic output $Y$ of the large language model and the relation inference result $R_{ab}$ are parsed into the structured semantic result $S = \mathrm{Parse}(Y, R_{ab})$, where $\mathrm{Parse}(\cdot)$ is the structured parsing function and $S$ is a structured semantic result containing entity types, attribute types and relation types.
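Because model output often wraps its answer in prose, the structured parsing step of [0050] usually has to locate the machine-readable part first. This sketch assumes the model was asked to answer in JSON and that the first bracket or brace in the reply starts the payload; `extract_structured` and the key names are hypothetical.

```python
import json
import re
from typing import Any, Dict, List

def extract_structured(raw_output: str) -> List[Dict[str, Any]]:
    """Pull the first JSON array or object out of free-form model output."""
    match = re.search(r"(\[.*\]|\{.*\})", raw_output, re.DOTALL)
    if match is None:
        raise ValueError("no JSON found in model output")
    data = json.loads(match.group(1))
    items = data if isinstance(data, list) else [data]
    # Normalise every item to the (entity_type, attributes, relations) shape
    return [{"entity_type": it.get("entity_type"),
             "attributes": dict(it.get("attributes", {})),
             "relations": list(it.get("relations", []))} for it in items]

out = ('Result:\n{"entity_type": "Field", "attributes": {"name": "order_amount"}, '
       '"relations": [["belongs_to", "orders"]]}\nDone.')
items = extract_structured(out)
```

The greedy pattern spans from the first opening bracket to the last closing one, so a reply that contains stray brackets in surrounding prose would need a stricter extraction.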
[0051] Constraint processing according to the semantic metadata model is applied to the structured semantic result $S$, giving $S' = \mathrm{Constrain}(S, \mathcal{M})$, where $\mathrm{Constrain}(\cdot)$ is the constraint processing function, $\mathcal{M}$ is the semantic metadata model and $S'$ is the constraint-processed result. Constraint processing includes matching and validating entity types, validating attribute sets and restricting relation types, so that only content satisfying the semantic metadata model constraints is retained.
[0052] During constraint processing, a legal entity type set $\mathcal{E}_L$, a legal attribute set $\mathcal{A}_L$ and a legal relation set $\mathcal{R}_L$ are defined, and the structured semantic results are filtered under the condition $s = (e, A, R) \in S,\ e \in \mathcal{E}_L \wedge A \subseteq \mathcal{A}_L \wedge R \subseteq \mathcal{R}_L$, where $s$ is an element of the structured semantic result $S$, $e$ is its entity type, $A$ its attribute set and $R$ its relation set.
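The filtering condition of [0052] can be applied element by element: elements with an illegal entity type are dropped, and illegal attributes and relation types are stripped from the rest, matching the retain-and-remove behaviour described in [0046]. The `SemanticElement` type and `apply_constraints` name are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

@dataclass
class SemanticElement:
    entity_type: str
    attributes: Dict[str, str]
    relations: List[Tuple[str, str]]   # (relation type, target)

def apply_constraints(elements: List[SemanticElement], legal_entities: Set[str],
                      legal_attrs: Set[str], legal_relations: Set[str]) -> List[SemanticElement]:
    kept = []
    for el in elements:
        if el.entity_type not in legal_entities:
            continue  # entity type not defined in the semantic metadata model
        attrs = {k: v for k, v in el.attributes.items() if k in legal_attrs}
        rels = [(r, t) for r, t in el.relations if r in legal_relations]
        kept.append(SemanticElement(el.entity_type, attrs, rels))
    return kept
```

A stricter variant could reject a whole element when any attribute is illegal; the sketch prefers partial retention so that one hallucinated attribute does not discard otherwise valid metadata.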
[0053] Through the above semantic interpretation, relation inference, structured parsing and constraint processing, the original data feature representation is transformed into a structured semantic result that meets the constraints of the semantic metadata model, providing an input basis for the subsequent generation of semantic metadata instances.
[0054] Furthermore, in step S4, generating semantic metadata instances from the constraint-processed structured parsing results includes the following steps. Technical metadata describing field descriptions, data object descriptions and structural relationships is generated from the constraint-processed structured semantic results. Business metadata describing business definitions, indicator meanings and business tags is generated from the same results. Semantic relationship metadata describing the logical and semantic relationships between data objects is also generated from them. Finally, the technical metadata, business metadata and semantic relationship metadata are uniformly encoded to form standardized semantic metadata instances.
[0055] Specifically, the constraint-processed structured semantic result $S'$ obtained in step S3 is parsed by entity type, attribute information and relationship information to generate metadata of the corresponding category. When generating technical metadata, information describing the data object structure, including field names, field types, field descriptions and the structural relationships between fields, is extracted from the structured semantic result. Technical metadata is expressed as $T = (\mathit{id}_T, \mathit{type}_T, A_T)$, where $\mathit{id}_T$ is the technical metadata identifier, $\mathit{type}_T$ the technical metadata type and $A_T$ the technical metadata attribute set, which includes at least field descriptions, data object descriptions and structural relationship information.
[0056] When generating business metadata, information describing business semantics, including business definitions, indicator meanings and business tags, is extracted from the structured semantic result. Business metadata is expressed as $B = (\mathit{id}_B, \mathit{type}_B, A_B)$, where $\mathit{id}_B$ is the business metadata identifier, $\mathit{type}_B$ the business metadata type and $A_B$ the business metadata attribute set, including business semantic descriptions and business tag information. When generating semantic relationship metadata, relationship information between entities is extracted from the structured semantic result. Semantic relationship metadata is expressed as $L = (\mathit{id}_i, \mathit{id}_j, r)$, where $\mathit{id}_i$ and $\mathit{id}_j$ are the identifiers of two semantic metadata instances and $r$ is the relationship type between them, drawn from the set of relationship types defined in the semantic metadata model constraints.
[0057] After technical metadata, business metadata and semantic relationship metadata have been generated, the three types are uniformly encoded and represented as a semantic metadata instance $I = (\mathit{id}, e, A, R)$, where $\mathit{id}$ is the semantic metadata instance identifier, $e$ the entity type, $A$ the attribute set and $R$ the relation set.
[0058] The attribute set is expressed as $A = \{(a_k, x_k)\}_{k=1}^{n_A}$, where $a_k$ is an attribute name, $x_k$ the corresponding attribute value and $n_A$ the number of attributes. The relation set is expressed as $R = \{(r_l, o_l)\}_{l=1}^{n_R}$, where $r_l$ is a relation type, $o_l$ the identifier of the associated object and $n_R$ the number of relations.
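The instance structure of [0057] and [0058] (identifier, entity type, attribute name/value pairs, relation type/target pairs) maps directly onto a small record type. Serializing to key-sorted JSON is an assumed stand-in for the unspecified unified encoding, and the identifier format `md:...` is hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Tuple

@dataclass
class MetadataInstance:
    instance_id: str
    entity_type: str
    attributes: List[Tuple[str, str]] = field(default_factory=list)  # (attribute name, value)
    relations: List[Tuple[str, str]] = field(default_factory=list)   # (relation type, target id)

    def encode(self) -> str:
        # Hypothetical unified encoding: key-sorted JSON, tuples become arrays
        return json.dumps(asdict(self), sort_keys=True, ensure_ascii=False)

inst = MetadataInstance("md:orders.amount", "Field",
                        [("name", "order_amount"), ("business_meaning", "order amount")],
                        [("belongs_to", "md:orders")])
decoded = json.loads(inst.encode())
```

Keeping attributes as ordered pairs rather than a dictionary preserves the name/value list form of [0058]; a dictionary would be equally valid if attribute names are unique per instance.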
[0059] During unified encoding, metadata from different sources undergoes field standardization, including standardized naming of attribute names, standardized mapping of relation types and unified encoding of entity identifiers. This integrates the different categories of metadata into a unified set of semantic metadata instances $\mathcal{I} = \{I_1, I_2, \ldots, I_N\}$, where $N$ is the number of semantic metadata instances.
[0060] Through the above implementation methods, the structured semantic results after constraint processing are converted into semantic metadata instances with a unified structure, enabling semantic information of data from different sources and of different types to be represented with a consistent data structure, thereby providing a unified input for subsequent metadata modeling, verification and storage processing.
[0061] Furthermore, in step S5, writing the validated semantic metadata instances into the metadata knowledge graph includes the following steps: mapping the semantic metadata instances to a unified semantic metadata model and aligning them with a domain ontology or industry standard model; performing consistency, integrity and rationality checks on the mapped and aligned results; for instances that fail verification, generating correction suggestions or marking them as requiring manual review; and writing the validated semantic metadata instances into the metadata knowledge graph.
[0062] Specifically, for the set of semantic metadata instances generated in step S4, the semantic metadata model mapping is performed first. With the unified semantic metadata model denoted $\mathcal{M}_U$, the mapping function is defined as $I' = f_{\text{map}}(I, \mathcal{M}_U)$, where $f_{\text{map}}(\cdot)$ is the mapping function, $I$ the original semantic metadata instance and $I'$ the mapped instance. Mapping includes entity type matching, attribute field standardization and relation type normalization.
[0063] After mapping to the unified semantic metadata model, the results are aligned with the domain ontology or industry standard model. With that model denoted $\mathcal{O}_D$, the alignment function is defined as $I'' = f_{\text{align}}(I', \mathcal{O}_D)$, where $f_{\text{align}}(\cdot)$ is the alignment function, $I'$ the mapped instance and $I''$ the aligned semantic metadata instance. Alignment includes entity concept matching, attribute semantic unification and relation semantic normalization. After mapping and alignment, the semantic metadata instance is validated with $v = f_{\text{val}}(I'')$, $v \in \{0, 1\}$, where $f_{\text{val}}(\cdot)$ is the validation function; $v = 1$ indicates that validation passed and $v = 0$ that it failed.
[0064] The validation function comprises consistency, integrity and rationality checks. The consistency check verifies that entity types and attributes match; the integrity check verifies that no required attribute is missing; and the rationality check verifies that attribute values satisfy preset rules. The combined result is $v = v_c \wedge v_i \wedge v_r$, where $v_c$, $v_i$ and $v_r$ are the consistency, integrity and rationality check results and $\wedge$ denotes logical AND.
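The three checks of [0064] and their logical AND can be sketched with plain set operations. How the allowed attributes, required attributes and value rules are stored is an assumption made for illustration; the description states only what each check verifies.

```python
from typing import Callable, Dict, Set

def validate(entity_type: str, attributes: Dict[str, str],
             allowed_attrs: Dict[str, Set[str]], required_attrs: Dict[str, Set[str]],
             rules: Dict[str, Callable[[str], bool]]) -> Dict[str, bool]:
    allowed = allowed_attrs.get(entity_type, set())
    required = required_attrs.get(entity_type, set())
    # Consistency: entity type is known and every attribute belongs to it
    v_c = entity_type in allowed_attrs and set(attributes) <= allowed
    # Integrity: no required attribute is missing
    v_i = required <= set(attributes)
    # Rationality: every present attribute with a rule satisfies it
    v_r = all(rule(attributes[name]) for name, rule in rules.items() if name in attributes)
    return {"consistency": v_c, "integrity": v_i, "rationality": v_r,
            "passed": v_c and v_i and v_r}

allowed = {"Field": {"name", "type", "length"}}
required = {"Field": {"name", "type"}}
rules = {"length": lambda v: v.isdigit() and int(v) > 0}
```

Returning each partial result, not only the combined flag, gives the correction step something concrete to act on.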
[0065] For a semantic metadata instance that fails validation, a correction suggestion is generated as $\Delta = f_{\text{corr}}(I'')$, where $f_{\text{corr}}(\cdot)$ is the correction suggestion generation function, $I''$ the failing instance and $\Delta$ the suggested correction. If the suggested correction cannot be applied automatically, the instance is marked as awaiting manual review.
[0066] Semantic metadata instances that pass validation are written into the metadata knowledge graph $G = (V, E)$, where $V$ is the node set and $E$ the edge set. The node set is updated as $V \leftarrow V \cup \{\mathit{id}\}$, where $\mathit{id}$ is the semantic metadata instance identifier, and the edge set as $E \leftarrow E \cup \{(\mathit{id}, r, \mathit{id}')\}$, where $r$ is the relation type and $\mathit{id}'$ the identifier of the associated semantic metadata instance. During writing, node and relationship uniqueness is checked to avoid duplicate writes, and a timestamp is recorded to identify when each entry was stored.
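The node and edge set updates of [0066], with the uniqueness check and ingestion timestamp, can be mimicked in memory. `MetadataGraph` is a hypothetical stand-in; a production system would typically use a graph database whose merge or upsert operation plays the role of the duplicate check.

```python
from datetime import datetime, timezone
from typing import Dict, List, Set, Tuple

class MetadataGraph:
    """In-memory stand-in for the metadata knowledge graph (node set and edge set)."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Dict[str, str]] = {}
        self.edges: Set[Tuple[str, str, str]] = set()

    def write(self, instance_id: str, entity_type: str,
              relations: List[Tuple[str, str]]) -> bool:
        """Write one validated instance; returns False if the node already existed."""
        is_new = instance_id not in self.nodes
        if is_new:
            self.nodes[instance_id] = {
                "entity_type": entity_type,
                "ingested_at": datetime.now(timezone.utc).isoformat(),
            }
        for rel_type, target_id in relations:
            # Set membership rejects duplicate (source, type, target) triples
            self.edges.add((instance_id, rel_type, target_id))
        return is_new

g = MetadataGraph()
first = g.write("md:orders", "Table", [("references", "md:customer")])
second = g.write("md:orders", "Table", [("references", "md:customer")])
```

The sketch does not require the edge target to exist yet, so instances can be written in any order; enforcing referential integrity would be an extra check.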
[0067] Through the above implementation method, the semantic metadata instance completes the process from model mapping, semantic alignment, quality verification to knowledge graph storage, so that the semantic metadata is stored in the metadata knowledge graph in a unified structure, thereby supporting subsequent query and association processing.
[0068] Furthermore, in step S6, adjusting the semantic understanding prompt template, model parameters or semantic metadata model based on feedback information includes the following steps. The semantic metadata written into the metadata knowledge graph is invoked, for example by a metadata governance agent or a service orchestration agent. Feedback information generated during invocation is collected; it includes at least semantic conflicts, usage frequency and correction records. The semantic understanding prompt template, model parameters or semantic metadata model are adjusted based on this feedback, and the adjusted versions are used in subsequent metadata discovery.
[0069] Specifically, the node set $V$ and edge set $E$ of the metadata knowledge graph $G$ obtained in step S5 are exposed through an interface to metadata applications such as metadata governance agents, service orchestration agents or other metadata applications. Invocation includes querying semantic metadata instances, retrieving relation paths and updating metadata. During invocation, each call and its result are recorded, forming the feedback set $\mathcal{F} = \{f_1, f_2, \ldots, f_K\}$, where $f_k$ is the $k$-th feedback record and $K$ the number of feedback records. Each record is expressed as $f_k = (\mathit{id}_k, \tau_k, c_k, t_k)$, where $\mathit{id}_k$ is the identifier of the semantic metadata instance the feedback concerns, $\tau_k$ the feedback type, $c_k$ the feedback content and $t_k$ the feedback time.
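[0070] The feedback type $\tau_k$ takes values from the set $\{\tau_{\text{conf}}, \tau_{\text{freq}}, \tau_{\text{corr}}\}$, where $\tau_{\text{conf}}$ denotes a semantic conflict, $\tau_{\text{freq}}$ a usage frequency record and $\tau_{\text{corr}}$ a correction record.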
[0071] Semantic conflict feedback is produced by comparing the attributes or relationships of semantic metadata instances to identify inconsistencies; usage frequency feedback records how many times a semantic metadata instance is accessed during invocation; and correction record feedback is produced by recording modification operations on a semantic metadata instance.
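The feedback records of [0069] to [0071] (instance identifier, type, content, time) and a simple per-type summary could be kept as follows. The enum values, the `summarize` helper and the attribute comparison used to surface conflicts are illustrative assumptions.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List

class FeedbackType(Enum):
    SEMANTIC_CONFLICT = "conflict"
    USAGE_FREQUENCY = "frequency"
    CORRECTION = "correction"

@dataclass
class Feedback:
    instance_id: str
    kind: FeedbackType
    content: str
    timestamp: float

def summarize(feedback: List[Feedback]) -> Dict[str, Dict[str, int]]:
    """Count feedback records per type and per semantic metadata instance."""
    return {kind.value: dict(Counter(f.instance_id for f in feedback if f.kind is kind))
            for kind in FeedbackType}

def attribute_conflicts(stored: Dict[str, str], observed: Dict[str, str]) -> List[str]:
    """Attribute names whose values differ between the stored instance and a caller's view."""
    return sorted(k for k in stored.keys() & observed.keys() if stored[k] != observed[k])

fb = [Feedback("md:orders", FeedbackType.USAGE_FREQUENCY, "query", 1.0),
      Feedback("md:orders", FeedbackType.USAGE_FREQUENCY, "path lookup", 2.0),
      Feedback("md:customer", FeedbackType.CORRECTION, "rename level", 3.0)]
summary = summarize(fb)
```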
[0072] The semantic understanding prompt template, model parameters and semantic metadata model are adjusted based on the feedback set. With the prompt template denoted $P_T$, the model parameters $\Theta$ and the semantic metadata model $\mathcal{M}$, the adjustment is expressed as $(P_T', \Theta', \mathcal{M}') = f_{\text{adj}}(P_T, \Theta, \mathcal{M}, \mathcal{F})$, where $f_{\text{adj}}(\cdot)$ is the adjustment function, $\mathcal{F}$ the feedback set, and $P_T'$, $\Theta'$ and $\mathcal{M}'$ the adjusted prompt template, model parameters and semantic metadata model.
[0073] When adjusting the prompt template, its constraints are updated according to the semantic conflicts and correction records in the feedback, correcting deviations in semantic generation. When adjusting the model parameters, the parameter configuration is updated according to usage frequency information to suit the processing needs of different types of data objects. When adjusting the semantic metadata model, entity types, attribute sets and relationship types are expanded or constrained according to the feedback. After adjustment, the updated semantic understanding prompt template, model parameters and semantic metadata model are applied to subsequent metadata discovery, so that steps S3 to S5 run on the updated configuration.
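One way to realize the prompt-template and semantic-model adjustments of [0073] is sketched below; the model-parameter adjustment is omitted because the description does not say which parameters exist. The `DiscoveryConfig` fields, the "Avoid:" constraint wording and the `min_support` threshold for admitting a new relation type are all hypothetical choices.

```python
from dataclasses import dataclass, field
from typing import List, Set

@dataclass
class DiscoveryConfig:
    prompt_constraints: List[str] = field(default_factory=list)
    legal_relation_types: Set[str] = field(default_factory=set)
    legal_attributes: Set[str] = field(default_factory=set)

def adjust(config: DiscoveryConfig, conflicts: List[str], corrected_attributes: List[str],
           proposed_relation_types: List[str], min_support: int = 2) -> DiscoveryConfig:
    # Work on a copy so the previous configuration stays available for rollback
    new = DiscoveryConfig(list(config.prompt_constraints),
                          set(config.legal_relation_types), set(config.legal_attributes))
    # Semantic conflicts become explicit prompt constraints
    for note in conflicts:
        rule = f"Avoid: {note}"
        if rule not in new.prompt_constraints:
            new.prompt_constraints.append(rule)
    # Attributes introduced by manual corrections expand the semantic metadata model
    new.legal_attributes.update(corrected_attributes)
    # A relation type is admitted only if feedback proposes it at least min_support times
    for rtype in set(proposed_relation_types):
        if proposed_relation_types.count(rtype) >= min_support:
            new.legal_relation_types.add(rtype)
    return new

base = DiscoveryConfig(["Use only allowed types"], {"reference"}, {"name"})
updated = adjust(base, ["labeling customer_level as amount"], ["unit"],
                 ["derivation", "derivation", "owns"])
```

Requiring repeated support before widening the model is a guard against a single noisy feedback record; the threshold itself would need tuning.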
[0074] Through the above implementation methods, the results of using the metadata knowledge graph are transformed into feedback information that can be used to update the semantic understanding and modeling process, thereby enabling continuous adjustment and updating of the semantic metadata generation process and ensuring that the subsequent processing process remains consistent with the actual calling requirements.
[0075] Example 2: Referring to Figure 3 and Figure 4, in a second embodiment the present invention provides a large language model-based metadata discovery agent system for data weaving, comprising:
A multi-source data access module, used to access multi-source heterogeneous data, identify and register the accessed data sources at the data source level, generate data source identification information, and perform preliminary sampling and format parsing of the data based on that information to form standardized input data objects.
A data feature parsing module, used to parse the data features of the standardized input data objects to obtain multidimensional data feature description objects.
A large language model semantic understanding and reasoning module, used to input the multidimensional data feature description objects into the large language model based on the preset semantic understanding prompt template; the large language model performs semantic interpretation, business meaning induction and inference of semantic relationships between data objects, after which the module performs structured parsing on the output and constrains the parsed results according to the semantic metadata model constraints.
A semantic metadata generation module, used to generate semantic metadata instances from the structured parsing results, the instances including at least technical metadata, business metadata and semantic relationship metadata.
A metadata modeling, validation and storage module, used to map semantic metadata instances to a unified semantic metadata model, perform consistency, integrity and rationality checks, and write the validated semantic metadata instances into the metadata knowledge graph.
A discovery result feedback and continuous optimization module, used to invoke the stored semantic metadata, collect feedback information during invocation, and adjust the semantic understanding prompt template, model parameters or semantic metadata model based on that feedback.
[0076] Specifically, the multi-source data access module includes a data source access unit, a data source identification unit, and a data standardization processing unit. The data source access unit is used to access different types of data sources through database connection interfaces, file interfaces, or multimedia data interfaces; the data source identification unit is used to identify the type of the accessed data source and generate data source identification information; the data standardization processing unit is used to perform sampling, format parsing, and object encapsulation processing based on the data source identification information, converting the raw data into standardized input data objects.
[0077] The data feature parsing module includes a structural feature parsing unit, a content feature parsing unit, a multimodal feature parsing unit, and a feature encapsulation unit. The structural feature parsing unit parses table structures, field information, and constraints; the content feature parsing unit extracts keywords, named entities, and topic information from text; the multimodal feature parsing unit extracts features from image, video, or audio data; and the feature encapsulation unit unifies the structural, content, and multimodal features to form a multidimensional data feature description object.
[0078] The large language model semantic understanding and reasoning module includes a prompt template construction unit, a semantic interpretation unit, a relation inference unit, a structured parsing unit and a constraint processing unit. The prompt template construction unit constructs the semantic understanding prompt template input from the multidimensional data feature description objects; the semantic interpretation unit generates field-level and object-level semantic descriptions; the relation inference unit determines the relationships between multiple data objects and outputs the relation type and association basis; the structured parsing unit converts natural language output into structured semantic results; and the constraint processing unit verifies the structured semantic results against the semantic metadata model constraints and retains only the entity types, attributes and relation types that satisfy them.
[0079] The semantic metadata generation module includes a technical metadata generation unit, a business metadata generation unit, a semantic relationship metadata generation unit, and a unified encoding unit. The technical metadata generation unit generates field descriptions and structural relationship information; the business metadata generation unit generates business semantic descriptions and business tags; the semantic relationship metadata generation unit generates records of relationships between data objects; and the unified encoding unit encodes the above metadata in a unified format to form standardized semantic metadata instances.
[0080] The metadata modeling, validation, and storage module includes a model mapping unit, a model alignment unit, a validation unit, a correction unit, and a graph writing unit. The model mapping unit maps semantic metadata instances to a unified semantic metadata model; the model alignment unit aligns the mapping results with the domain ontology or industry standard models; the validation unit performs consistency, completeness, and rationality checks; the correction unit generates correction suggestions or marks semantic metadata instances that fail validation as requiring manual review; and the graph writing unit writes validated semantic metadata instances into the metadata knowledge graph.
[0081] The discovery result feedback and continuous optimization module includes an interface call unit, a feedback collection unit, and an adjustment unit. The interface call unit provides a semantic metadata access interface to metadata application clients, such as metadata governance agents or service orchestration agents; the feedback collection unit collects semantic conflicts, usage frequency, and correction records during the call process; and the adjustment unit updates the semantic understanding prompt template, model parameters, and semantic metadata model based on the feedback information, and applies the updated results to subsequent processing flows.
[0082] The modules are connected through data interfaces. The output of the multi-source data access module serves as the input of the data feature parsing module. The output of the data feature parsing module serves as the input of the large language model semantic understanding and reasoning module. The output of the large language model semantic understanding and reasoning module serves as the input of the semantic metadata generation module. The output of the semantic metadata generation module serves as the input of the metadata modeling, verification and storage module. The output of the metadata modeling, verification and storage module serves as the input of the discovery result feedback and continuous optimization module, thus forming a complete data processing flow.
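The linear module chain of [0082], where each module's output is the next module's input, amounts to left-to-right function composition. The six placeholder functions below are hypothetical stubs that only mark where each module's real work would happen.

```python
from functools import reduce
from typing import Any, Callable, Dict, List

Stage = Callable[[Any], Any]

def run_pipeline(stages: List[Stage], source: Any) -> Any:
    """Feed each module's output into the next module, in the order given."""
    return reduce(lambda data, stage: stage(data), stages, source)

# Hypothetical stub modules; each adds the kind of output its real counterpart produces
def access(src: str) -> Dict[str, Any]:
    return {"objects": [src]}

def parse_features(d: Dict[str, Any]) -> Dict[str, Any]:
    return {**d, "features": len(d["objects"])}

def reason(d: Dict[str, Any]) -> Dict[str, Any]:
    return {**d, "semantic": [f"{o}: described" for o in d["objects"]]}

def generate(d: Dict[str, Any]) -> Dict[str, Any]:
    return {**d, "instances": ["md:" + o for o in d["objects"]]}

def store(d: Dict[str, Any]) -> Dict[str, Any]:
    return {**d, "graph_nodes": set(d["instances"])}

def feedback(d: Dict[str, Any]) -> Dict[str, Any]:
    return {**d, "feedback": []}

result = run_pipeline([access, parse_features, reason, generate, store, feedback], "orders")
```

In the described system the feedback module also closes a loop back to the reasoning module's configuration; that loop is outside this one-pass sketch.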
[0083] The above system architecture enables a continuous process of accessing, parsing, semantic processing, metadata generation, quality verification, and feedback updates of multi-source heterogeneous data.
[0084] Example 3: In a third embodiment, the present invention provides a specific application of the method to a data weaving scenario. Taking the customer management system, the order management system, and the log and multimedia data in an enterprise data environment as the objects of processing, the metadata discovery process is described in detail below.
[0085] Specifically, in this embodiment, the data sources include order and customer tables in a relational database, access log files in a log system, text data in a document system, and image data in a storage system. A multi-source data access module provides unified access to these data sources, identifies and registers each data source, and generates corresponding data source identification information. Subsequently, based on the data source identification information, field data in the order and customer tables is sampled, text fragments are extracted from the log files, and basic information is parsed from the image data to form standardized input data objects.
[0086] During the data feature parsing process, structural feature parsing is performed on the order table and customer table to obtain field names, field types, primary and foreign key relationships, and field statistical features; text parsing is performed on log data to extract keywords, named entities, and semantic fragments; and label information and feature representations are extracted from image data. These features are then uniformly encapsulated to form a multi-dimensional data feature description object, which serves as input for subsequent processing.
[0087] In the semantic understanding and reasoning process of the large language model, semantic understanding prompt templates are constructed for the field information in the order table, including field names, data types, sample value ranges, and contextual field information. This input is provided to the large language model to generate field-level semantic descriptions, which are used to determine the business meaning and field descriptions of the fields. For the overall information in the customer table, object-level prompt inputs are constructed to generate descriptions of the business uses of data objects and their roles in the business domain.
[0088] At the same time, the order table and customer table are selected as an object pair to construct a relation inference input, which is provided to the large language model to determine whether a relationship exists between the two data objects and to identify the relationship type and associated fields. After the semantic interpretation and relation inference results are obtained, they are subjected to structured parsing to obtain structured semantic results containing entity types, attribute information and relationship information. The entity types, attributes and relationship types in these results are then validated against the semantic metadata model constraints, and only the content that satisfies the constraints is retained.
[0089] During the semantic metadata generation process, technical metadata is generated based on the structured semantic results after constraint processing. This technical metadata describes the structural information of the order and customer fields. Business metadata is generated to describe the business meanings such as order amount and customer level. Semantic relationship metadata is generated to describe the association between order data and customer data. All of the above metadata undergoes unified encoding processing to form standardized semantic metadata instances.
[0090] During metadata modeling, validation, and storage, semantic metadata instances are mapped to a unified semantic metadata model and aligned with a pre-defined business domain model. Consistency, integrity, and rationality checks are performed on the mapping results. Metadata that fails the checks is given correction suggestions or marked for manual review. Semantic metadata that passes the checks is written into a metadata knowledge graph, forming a node and relationship structure.
[0091] During the feedback and continuous optimization process, the metadata governance agent queries and manages data objects based on the metadata knowledge graph, while the service orchestration agent executes data calls and combinations based on semantic relationships. Semantic conflict information, call frequency, and correction records are recorded during the call process, forming feedback information. Based on this feedback information, the constraints in the semantic understanding prompt template are updated, model parameters are adjusted, and the semantic metadata model is expanded or its constraints are updated, ensuring that subsequent processing is executed based on the updated configuration.
[0092] Through the above implementation methods, unified processing of structured data, unstructured data, and multimodal data is achieved in a multi-source data environment, and a semantic metadata representation that can be used for data governance and service orchestration is formed, enabling metadata to be managed and invoked in a unified structure.
[0093] This application also provides a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the steps of the above-described method. The computer-readable storage medium may include, but is not limited to, any type of disk, including floppy disks, optical disks, DVDs, CD-ROMs, microdrives, and magneto-optical disks, as well as ROMs, RAMs, EPROMs, EEPROMs, DRAMs, VRAMs, flash memory devices, magnetic cards or optical cards, nanosystems (including molecular memory ICs), or any type of medium or device suitable for storing instructions and/or data.
[0094] It should be noted that, for the sake of simplicity, the foregoing method embodiments are all described as a series of actions. However, those skilled in the art should understand that this application is not limited to the described order of actions, as some steps may be performed in other orders or simultaneously according to this application. Furthermore, those skilled in the art should also understand that the embodiments described in the specification are preferred embodiments, and the actions and modules involved are not necessarily essential to this application.
[0095] In the above embodiments, the descriptions of each embodiment have different focuses. For parts not described in detail in a certain embodiment, please refer to the relevant descriptions in other embodiments.
[0096] In the several embodiments provided in this application, it should be understood that the disclosed apparatus can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division of units is only a division by logical function, and other divisions are possible in actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. Furthermore, the mutual coupling, direct coupling, or communication connection shown or discussed may be implemented through some service interfaces, and the indirect coupling or communication connection between devices or units may be electrical or take other forms.
[0097] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.
[0098] Furthermore, the functional units in the various embodiments of this application can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or as a software functional unit.
[0099] If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage device. Based on this understanding, the technical solution of this application in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a memory and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this application. The aforementioned memory includes various media capable of storing program code, such as USB flash drives, read-only memory (ROM), random access memory (RAM), portable hard drives, magnetic disks, or optical disks.
[0100] Those skilled in the art will understand that all or part of the steps in the various methods of the above embodiments can be implemented by a program instructing related hardware. The program can be stored in a computer-readable storage device, which may include: a flash drive, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk, etc.
[0101] The foregoing description is merely an exemplary embodiment of this disclosure and should not be construed as limiting the scope of this disclosure. Any equivalent changes and modifications made in accordance with the teachings of this disclosure shall still fall within the scope of this disclosure. Those skilled in the art will readily conceive of embodiments of this disclosure upon considering the specification and practicing the disclosure herein. This application is intended to cover any variations, uses, or adaptations of this disclosure that follow the general principles of this disclosure and include common knowledge or customary techniques in the art not described herein. The specification and embodiments are to be considered exemplary only, and the scope and spirit of this disclosure are defined by the claims.
[0102] The technical features of the above embodiments can be combined in any way. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described. However, as long as there is no contradiction in the combination of these technical features, they should be considered to be within the scope of this specification.
[0103] Those skilled in the art will readily understand that the above description is merely a preferred embodiment of the present invention and is not intended to limit the present invention. Any modifications, equivalent substitutions, and improvements made within the spirit and principles of the present invention should be included within the scope of protection of the present invention.
Claims
1. A method for constructing a metadata discovery agent based on a large model and oriented towards data weaving, characterized in that it comprises the following steps: S1. accessing multi-source heterogeneous data, identifying and registering the accessed data sources, generating data source identification information, and sampling and parsing the data based on the data source identification information to form standardized input data objects; S2. performing data feature parsing on the standardized input data objects to obtain multidimensional data feature description objects; S3. inputting the multidimensional data feature description objects into a large language model based on a preset semantic understanding prompt template, the large language model performing semantic interpretation, business meaning induction, and inference of semantic relationships between data objects; performing structured parsing on the output; and constraining the structured parsing results according to the constraints of the semantic metadata model; S4. generating semantic metadata instances based on the constraint-processed structured parsing results, the semantic metadata instances comprising at least technical metadata, business metadata, and semantic relationship metadata; S5. mapping the semantic metadata instances to a unified semantic metadata model, performing consistency verification, integrity verification, and rationality verification, and writing the semantic metadata instances that pass verification into a metadata knowledge graph; S6. invoking the semantic metadata that has been written into the metadata knowledge graph, collecting the feedback information generated during invocation, and adjusting the semantic understanding prompt template, the model parameters, or the semantic metadata model based on the feedback information.
2. The method for constructing a metadata discovery agent based on a large model and oriented towards data weaving according to claim 1, characterized in that the step of sampling and parsing the data based on the data source identification information to form standardized input data objects specifically comprises: S101. accessing structured data from relational and columnar databases, semi-structured data such as JSON, XML, and logs, unstructured data such as text and documents, and multimodal data such as images, videos, and audio; S102. performing data-source-level identification and registration for each data source and generating the corresponding data source identification information, the data source identification information comprising at least the data source type, connection method, update time, and access permissions; S103. performing unified sampling, format parsing, and object encapsulation on the data based on the data source identification information, thereby converting data from different sources and with different structural forms into standardized input data objects.
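Steps S101 to S103 can be illustrated with a small Python sketch. It is not the claimed implementation: only CSV (standing in for a sampled relational table), JSON, and plain text are parsed here, and the names `DataSourceId`, `StandardInput`, `register`, and `sample_and_wrap` are hypothetical.

```python
import csv
import io
import json
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class DataSourceId:
    """Data source identification information (S102)."""
    source_id: str
    source_type: str    # e.g. "csv", "json", "text"
    connection: str     # connection method / URI (hypothetical format)
    updated_at: str     # update time, ISO 8601
    permissions: tuple  # access permissions


@dataclass
class StandardInput:
    """Standardized input data object (S103): every source becomes a list of dicts."""
    source: DataSourceId
    form: str           # "structured", "semi-structured", "unstructured", "multimodal"
    records: list


FORM_BY_TYPE = {"csv": "structured", "json": "semi-structured", "text": "unstructured"}


def register(source_id, source_type, connection, permissions=("read",)):
    now = datetime.now(timezone.utc).isoformat()
    return DataSourceId(source_id, source_type, connection, now, tuple(permissions))


def sample_and_wrap(src: DataSourceId, payload: str, limit: int = 2) -> StandardInput:
    """Sample, parse by format, and encapsulate as a uniform object."""
    if src.source_type == "csv":
        rows = list(csv.DictReader(io.StringIO(payload)))
    elif src.source_type == "json":
        data = json.loads(payload)
        rows = data if isinstance(data, list) else [data]
    elif src.source_type == "text":
        rows = [{"paragraph": p} for p in payload.split("\n\n") if p.strip()]
    else:
        raise NotImplementedError(f"parser for {src.source_type} not sketched here")
    return StandardInput(src, FORM_BY_TYPE[src.source_type], rows[:limit])
```

Representing every sample as a list of dictionaries is one simple way to give later feature parsing a single input shape regardless of where the data came from.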
3. The method for constructing a metadata discovery agent based on a large model and oriented towards data weaving according to claim 1, characterized in that the step of performing data feature parsing to obtain a multidimensional data feature description object specifically comprises: S201. for structured and semi-structured data, analyzing tables, fields, hierarchical structure, data types, length, precision, primary and foreign key relationships, constraint information, field value distribution, and null value ratio to obtain structural features; S202. for unstructured data, parsing keywords, named entities, topic information, and semantic fragments to obtain content features; S203. for multimodal data, calling the corresponding perception model to generate semantic labels and vectorized feature representations, thereby obtaining multimodal semantic features; S204. encapsulating the structural features, content features, and multimodal semantic features in a unified manner to form the multidimensional data feature description object.
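A partial sketch of S201, S202, and S204 is given below. It profiles only data type, null ratio, and value distribution (length, precision, and key analysis are omitted), uses word frequency as a crude stand-in for keyword extraction, and leaves the multimodal slot to an external perception model; the regex-based type inference and all function names are assumptions.

```python
import re
from collections import Counter


def infer_type(values):
    """Very rough type inference over string samples (assumption, not the claimed method)."""
    non_null = [v for v in values if v not in (None, "")]
    if non_null and all(re.fullmatch(r"-?\d+", str(v)) for v in non_null):
        return "integer"
    if non_null and all(re.fullmatch(r"-?\d+(\.\d+)?", str(v)) for v in non_null):
        return "decimal"
    return "string"


def structural_features(rows):
    """S201-style profile: per-field data type, null ratio and most common values."""
    fields = {}
    for name in rows[0]:
        values = [r.get(name) for r in rows]
        nulls = sum(v in (None, "") for v in values)
        fields[name] = {
            "data_type": infer_type(values),
            "null_ratio": nulls / len(values),
            "top_values": Counter(v for v in values if v not in (None, "")).most_common(2),
        }
    return fields


def content_features(text, k=3):
    """S202 stand-in: most frequent words longer than three letters."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if len(w) > 3]
    return [w for w, _ in Counter(words).most_common(k)]


def describe(rows=None, text=None, multimodal=None):
    """S204-style encapsulation into one multidimensional description object."""
    return {
        "structural": structural_features(rows) if rows else {},
        "content": content_features(text) if text else [],
        "multimodal": multimodal or {},  # labels/vectors from a perception model (S203)
    }
```

A production system would replace the word counter with a proper keyword and named-entity extractor; the point of the sketch is the single description object that carries all three feature families.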
4. The method for constructing a metadata discovery agent based on a large model and oriented towards data weaving according to claim 1, characterized in that the step in which the large model performs semantic interpretation on data objects specifically comprises: S301. organizing the multidimensional data feature description object based on the preset semantic understanding prompt template to obtain the input content for the large language model, wherein the semantic understanding prompt template comprises at least a data object description, a structural feature summary, a content feature summary, and the semantic metadata model constraints; S302. inputting the input content into the large language model, which generates field-level natural language semantic descriptions representing the business meaning and description of each field; S303. generating, by the large language model, object-level natural language semantic descriptions characterizing the core business use and business domain role of each data object.
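S301 to S303 can be sketched as prompt assembly followed by parsing of the reply. The template wording, the expected JSON reply keys (`field_descriptions`, `object_description`), and the function names are hypothetical; the sketch uses Python's standard `string.Template` and does not call any particular model API.

```python
import json
from string import Template

# Hypothetical semantic understanding prompt template (S301).
SEMANTIC_PROMPT = Template(
    "You are a metadata analyst.\n"
    "Data object: $object_description\n"
    "Structural features: $structural_summary\n"
    "Content features: $content_summary\n"
    "Constraints: use only entity types $entity_types and relation types $relation_types.\n"
    "Return JSON with keys 'field_descriptions' (field -> business meaning) "
    "and 'object_description' (core business use and domain role)."
)


def build_prompt(obj_name, features, entity_types, relation_types):
    """Fill the template from a multidimensional feature description object."""
    structural = "; ".join(
        f"{name}:{spec['data_type']} (nulls {spec['null_ratio']:.0%})"
        for name, spec in features.get("structural", {}).items()
    )
    return SEMANTIC_PROMPT.substitute(
        object_description=obj_name,
        structural_summary=structural or "none",
        content_summary=", ".join(features.get("content", [])) or "none",
        entity_types=sorted(entity_types),
        relation_types=sorted(relation_types),
    )


def parse_descriptions(reply: str):
    """S302/S303: split the model's JSON reply into field-level and object-level text."""
    data = json.loads(reply)
    return data["field_descriptions"], data["object_description"]
```

Embedding the model constraints directly in the prompt narrows what the model proposes, while the later constraint-filtering step still guards against replies that ignore them.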
5. The method for constructing a metadata discovery agent based on a large model and oriented towards data weaving according to claim 1, characterized in that the step of inferring semantic relationships between the data objects comprises: S311. constructing a relationship inference input for at least two data objects; S312. providing the relationship inference input to the large model, which determines whether a business or semantic relationship exists between the at least two data objects; S313. when a business or semantic relationship is determined to exist, determining, by the large model, the relationship type and the relationship basis, wherein the relationship type comprises at least subordination, association, reference, and derivation relationships, and the relationship basis comprises at least key fields, context fields, or business semantic correspondence; S314. outputting the semantic relationship inference results between the data objects.
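The pairwise relationship inference of S311 to S314 can be sketched as below. The model call is injected as a plain callable (`ask_llm`) so that no specific LLM API is assumed; the JSON request and reply formats are hypothetical. The usage example substitutes a toy function that reports a `reference` relationship whenever two objects share a field name, purely to exercise the control flow.

```python
import json
from itertools import combinations
from typing import Callable

RELATION_TYPES = {"subordinate", "association", "reference", "derivation"}


def relation_inputs(objects: dict):
    """S311: one inference input per unordered pair of data objects."""
    for (a, fa), (b, fb) in combinations(sorted(objects.items()), 2):
        yield {"left": a, "left_fields": sorted(fa), "right": b, "right_fields": sorted(fb)}


def infer_relations(objects: dict, ask_llm: Callable[[str], str]):
    """S312-S314: ask the model about each pair and keep well-formed answers."""
    results = []
    for item in relation_inputs(objects):
        answer = json.loads(ask_llm(json.dumps(item)))
        if not answer.get("related"):
            continue
        if answer.get("type") not in RELATION_TYPES or not answer.get("basis"):
            continue  # discard answers without a recognised type or a stated basis
        results.append({"between": (item["left"], item["right"]),
                        "type": answer["type"], "basis": answer["basis"]})
    return results


def toy_llm(prompt: str) -> str:
    """Stand-in for a real model: shared field names imply a reference relationship."""
    item = json.loads(prompt)
    shared = sorted(set(item["left_fields"]) & set(item["right_fields"]))
    return json.dumps({"related": True, "type": "reference", "basis": shared} if shared
                      else {"related": False})
```

For example, `infer_relations({"orders": ["order_id", "customer_id"], "customers": ["customer_id", "level"]}, toy_llm)` yields one `reference` relationship between `customers` and `orders` with `customer_id` as its basis. Enumerating all pairs grows quadratically with the number of objects, so a real deployment would likely pre-filter candidate pairs.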
6. The method for constructing a metadata discovery agent based on a large model and oriented towards data weaving according to claim 1, characterized in that, in step S3, performing structured parsing on the output and constraining the structured parsing results according to the semantic metadata model constraints specifically comprises: S321. performing structured parsing on the output of the large model to obtain structured semantic results containing entity types, attributes, and relationship types; S322. performing constraint verification on the structured semantic results according to the semantic metadata model constraints; S323. retaining only the entity types, attributes, and relationship types that conform to the constraints of the semantic metadata model, and removing the attributes and relationship types that do not conform to those constraints; S324. using the constraint-processed structured semantic results as the input for generating semantic metadata instances.
7. The method for constructing a metadata discovery agent based on a large model and oriented towards data weaving according to claim 1, characterized in that the step of generating semantic metadata instances based on the constraint-processed structured parsing results specifically comprises: S401. generating technical metadata based on the constraint-processed structured semantic results, used to describe field descriptions, data object descriptions, and structural relationships; S402. generating business metadata based on the constraint-processed structured semantic results, used to describe business definitions, indicator meanings, and business tags; S403. generating semantic relationship metadata based on the constraint-processed structured semantic results, used to describe the logical and semantic relationships between data objects; S404. uniformly encoding the technical metadata, the business metadata, and the semantic relationship metadata to form a standardized semantic metadata instance.
8. The method for constructing a metadata discovery agent based on a large model and oriented towards data weaving according to claim 1, characterized in that the step of writing the verified semantic metadata instances into the metadata knowledge graph specifically comprises: S501. mapping the semantic metadata instances to the unified semantic metadata model and aligning them with a domain ontology or industry standard model; S502. performing consistency checks, integrity checks, and rationality checks on the mapped and aligned results; S503. generating correction suggestions for semantic metadata instances that fail verification, or marking them as requiring manual review; S504. writing the verified semantic metadata instances into the metadata knowledge graph.
9. The method for constructing a metadata discovery agent based on a large model and oriented towards data weaving according to claim 1, characterized in that, in step S6, the step of adjusting the semantic understanding prompt template, model parameters, or semantic metadata model based on the feedback information comprises: S601. invoking the semantic metadata that has been written into the metadata knowledge graph; S602. collecting the feedback information generated during invocation, the feedback information comprising at least semantic conflicts, usage frequency, and correction records; S603. adjusting the semantic understanding prompt template, the model parameters, or the semantic metadata model based on the feedback information; S604. using the adjusted semantic understanding prompt template, model parameters, or semantic metadata model for subsequent metadata discovery processing.
10. A metadata discovery agent system based on a large model and oriented towards data weaving, characterized in that it comprises: a multi-source data access module, configured to access multi-source heterogeneous data, identify and register the accessed data sources at the data source level, generate data source identification information, and perform preliminary sampling and format parsing of the data based on the data source identification information to form standardized input data objects; a data feature parsing module, configured to perform data feature parsing on the standardized input data objects to obtain multidimensional data feature description objects; a large language model semantic understanding and reasoning module, configured to input the multidimensional data feature description objects into the large language model based on the preset semantic understanding prompt template, the large language model performing semantic interpretation, business meaning induction, and inference of semantic relationships between data objects, and further configured to perform structured parsing on the output and to constrain the structured parsing results according to the semantic metadata model constraints; a semantic metadata generation module, configured to generate semantic metadata instances based on the structured parsing results, the semantic metadata instances comprising at least technical metadata, business metadata, and semantic relationship metadata; a metadata modeling, verification, and storage module, configured to map the semantic metadata instances to a unified semantic metadata model, perform consistency, integrity, and rationality verification, and write the verified semantic metadata instances into the metadata knowledge graph; and a discovery result feedback and continuous optimization module, configured to invoke the semantic metadata that has been stored in the metadata knowledge graph, collect feedback information during invocation, and adjust the semantic understanding prompt template, the model parameters, or the semantic metadata model based on the feedback information.