A data product value allocation method based on lineage analysis

By using fine-grained data lineage analysis based on large language models together with contribution quantification algorithms, the method addresses the subjectivity of data product value allocation and the lack of automated recommendation, achieving transparent and auditable value allocation, reducing transaction friction costs, and promoting the healthy development of the data ecosystem.

CN122240601A · Pending · Publication Date: 2026-06-19 · NUCLEAR POWER OPERATIONS RES INST (NPRI)

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
NUCLEAR POWER OPERATIONS RES INST (NPRI)
Filing Date
2026-05-20
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing technologies rely on human experience in the allocation of data product value, which is highly subjective, inefficient, and lacks a deep lineage analysis to support value judgments. They also lack automated recommendation mechanisms, resulting in allocation schemes that are unconvincing and costly, and they cannot effectively quantify the contribution of data processing.

Method used

By employing fine-grained data lineage analysis based on a large language model, and through interactive data product visualization assembly and configuration, combined with contribution quantification algorithms, we automatically recommend data value allocation ratios, quantify the technical contributions in the data processing process, and generate transparent and auditable value allocation schemes.

Benefits of technology

It has enabled objective, transparent, and auditable calculation of the value distribution of data products, reduced the friction costs of circulation and transactions, stimulated the enthusiasm for data element supply and processing, and promoted the formation of a healthy data ecosystem.

✦ Generated by Eureka AI based on patent content.

Figure CN122240601A_ABST

Abstract

This application belongs to the field of data product value allocation technology, specifically relating to a data product value allocation method based on lineage analysis. This application addresses the core shortcomings of existing technologies in handling data product value allocation, such as reliance on manual processes, superficial lineage analysis, neglect of processed value, and lack of automated recommendation mechanisms, by introducing three core technical means: fine-grained data lineage analysis based on a large language model; interactive and guided data product visualization assembly and configuration; and automatic recommendation of data value allocation ratios based on contribution metrics.

Description

Technical Field

[0001] This application belongs to the field of data product value allocation technology, specifically involving a data product value allocation method based on lineage analysis.

Background Technology

[0002] With the deepening development of the big data era, data has been established as a key production factor. Various organizations, especially large enterprises and public institutions, have aggregated massive and multi-source data assets through technical architectures such as data platforms and data lakes. This raw data undergoes a complex extraction, transformation, and loading (ETL) process, as well as a series of data cleaning, integration, processing, and modeling processes, ultimately forming data products that can be analyzed, applied, and provided to external users.

[0003] The realization of the value of data products relies on a high-quality data supply and complex processing. However, a data product is often not composed of a single data source; it may involve multiple different data sources, data processors, and business entities. For example, a precision marketing data product may derive its data from internal CRM systems, external third-party data providers, and publicly available social media data. This data undergoes multiple rounds of mixing, calculation, and derivation to form the final product. Currently, identification and judgment are mainly performed manually by personnel familiar with the system's background, making it difficult to guarantee its rationality, efficiency, and objectivity.

[0004] A clear value distribution mechanism is a crucial prerequisite for incentivizing data supply, promoting data circulation and trading, and ultimately building a healthy data ecosystem. However, there is currently a lack of mature, automated solutions, inside or outside the industry, for the value distribution of data products. Existing methods mainly suffer from the following limitations:

(1) Reliance on human experience, high subjectivity, and low efficiency: The current mainstream approach relies on data experts or business experts making manual judgments and negotiating based on experience. Product design depends heavily on data engineers writing scripts by hand, which is inefficient and excludes business personnel. This approach depends on the personal knowledge and subjective will of the participants and lacks an objective, unified quantitative basis. It can cope with data products that have a simple structure and a single source, but for data products with complex sources and long processing chains, tracing lineage and estimating contributions by hand is almost impossible, so the allocation plan lacks persuasiveness and negotiation costs are extremely high.

[0005] (2) The lineage analysis technology is superficial and cannot support value judgment: Although there are some data lineage analysis tools on the market (usually built into data governance platforms or data catalog systems), the lineage analysis capabilities of these tools are mostly limited to the "table level" or "task level". They can show the flow relationship of data between databases or tables, but cannot delve into understanding the precise impact and contribution at the field level. More importantly, existing lineage tools mainly serve operational purposes such as data quality, impact analysis, and troubleshooting. Their analysis results (such as table-level dependencies) are qualitative rather than quantitative and cannot be directly converted into weight ratios that can be used for value allocation calculations. They lack a bridge to link technical lineage with business value. The unclear data lineage relationship makes it difficult to trace the original data source and processing path, leading to difficulties in product compliance and ownership definition.

[0006] (3) Disconnect between value allocation and data processing: Existing methods often treat the value of data products as a black box, focusing only on the weight of the original data source while ignoring the technical and intellectual contributions injected during data processing. For example, a simple raw data table may see its value increase exponentially after being processed by a complex algorithm model. Existing extensive allocation methods cannot effectively quantify the role of data processing scripts, algorithm models, etc., in value-added, which dampens the enthusiasm of data processors and is detrimental to the innovation of data products.

[0007] (4) Lack of automated and intelligent recommendation mechanism: Due to the limitations of the above technologies, there is currently no automated method for recommending value allocation ratios. From the definition of data products to the generation of allocation schemes, the entire process requires a large amount of manual intervention, making it impossible to form a replicable and scalable standard process. This seriously restricts the large-scale circulation and trading of data products and has become a technological bottleneck for the development of the data element market.

[0008] In summary, existing technologies have several core shortcomings when dealing with the value distribution of data products, including reliance on manual labor, superficial lineage analysis, neglect of processing value, and lack of automated recommendation mechanisms.

Summary of the Invention

[0009] In view of this, this application provides a data product value allocation method based on lineage analysis. By introducing three core technical means, namely fine-grained data lineage analysis based on large language models, interactive and guided data product visualization assembly and configuration, and automatic recommendation of data value allocation ratio based on contribution measurement, this method addresses the core defects of existing technologies in dealing with data product value allocation, such as reliance on manual labor, superficial lineage analysis, neglect of processing value, and lack of automated recommendation mechanism.

[0010] The first aspect of this application provides a method for allocating the value of data products based on lineage analysis, which includes the following steps.

[0011] Step S1: Obtain the raw material information required to build the data product. The raw material information required for the data product includes data source metadata and data processing scripts.

[0012] Step S2: Send the raw material information required for the data product into the fine-grained lineage analysis engine based on a large language model for processing to obtain the output results of the lineage analysis engine.

[0013] Step S3: Construct a lineage graph based on the output of the lineage analysis engine, and persistently store the lineage graph in a graph database. Nodes in the lineage graph represent data entities, and edges represent data processing relationships.

[0014] Step S4: Receive the multiple target nodes and product definitions finally selected by the user on the graphical user interface displaying the lineage graph. The target nodes are the tables or fields that the user selects to be output as data products.

[0015] Step S5: After the user completes the product definition, the intelligent value allocation engine is triggered. The intelligent value allocation engine runs a contribution metric algorithm based on the lineage graph to obtain the contribution weight of each node and edge on all paths leading to the target node to the final data product. The basic principle of the intelligent value allocation engine is to assign an initial weight to the upstream original data node, and then let the weight be passed downstream along the lineage edges. During the transmission process, different value-added coefficients are assigned to each node and edge according to the complexity of the processing logic.

[0016] Step S6: Distribute the total value weight of the final data product to each of the upstream original data source nodes and key data processing script nodes through attribution algorithms to obtain a value allocation scheme.

[0017] In one specific embodiment of this application, step S11 is a specific implementation of step S1.

[0018] Step S11: Automatically collect the raw material information required to build the data product from the data environment.

[0019] In one specific embodiment of this application, the data collection period in step S11 is from 1 minute to 24 hours.

[0020] In one specific embodiment of this application, steps S41 to S43 are a specific implementation of step S4.

[0021] Step S41: Receive one or more target nodes selected by the user on a graphical user interface displaying the lineage graph. The target nodes are the tables or fields that the user selects to be output as data products.

[0022] Step S42: Automatically highlight all upstream nodes and paths related to the target node.

[0023] Step S43: Receive the user's definition of the business attributes of the data product in an integrated form.

[0024] In one specific embodiment of this application, the business attributes of the data product include product name, description, update frequency, and access interface type.

[0025] In one specific embodiment of this application, the data product value allocation method based on lineage analysis further includes step S7.

[0026] Step S7: Based on the definition information and value distribution plan of the data product, package and release the data product.

[0027] In one specific embodiment of this application, the value-added coefficient ranges from 0.5 to 5.0.

[0028] A second aspect of this application provides a computer apparatus including a processor and a memory. The processor is used to execute a data product value allocation method based on lineage analysis according to the first aspect of this application. The memory is used to store executable instructions of the processor.

[0029] A third aspect of this application provides a computer-readable storage medium having executable instructions stored thereon. When executed by a processor, the executable instructions implement a data product value allocation method based on lineage analysis, as described in the first aspect of this application.

[0030] The fourth aspect of this application provides a computer program product, including a computer program / instructions, which, when executed by a processor, implements a data product value allocation method based on lineage analysis according to the first aspect of this application.

[0031] The beneficial effects of this technical solution are as follows. It proposes, for the first time, a method for quantifying and weighting the technical contributions (such as association, computation, and model training) made during data processing. By setting "value-added coefficients" for different processing logic, the abstract value of data processing is converted into concrete weight values. This provides an objective, transparent, and auditable basis for the value allocation of data products, replacing contentious and uncertain subjective negotiation and laying a foundation for fair and reasonable benefit distribution. At the same time, it builds tradability into the product at the point of creation by moving the value allocation assessment forward to the creation phase of the data product, instead of treating it as an after-the-fact remedy. Every data product created through this method therefore carries a clear "instruction manual" of its value composition, which greatly reduces the friction and trust costs of subsequent market transactions, stimulates the enthusiasm of all parties to participate in the supply and processing of data elements, and promotes the formation of a healthy data ecosystem.

Attached Figure Description

[0032] Figure 1 is a flowchart illustrating a data product value allocation method based on lineage analysis, provided in an embodiment of this application.

[0033] Figure 2 is a simplified flowchart of a data product value allocation method based on lineage analysis, provided in an embodiment of this application.

Detailed Implementation

[0034] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.

[0035] At least one embodiment of this application provides a data product value allocation method based on lineage analysis. Referring to Figure 1, the method includes the following steps S1 to S6.

[0036] Step S1: Obtain the raw material information required to build the data product. The raw material information required for the data product includes data source metadata and data processing scripts.

[0037] In some embodiments, step S11 is a specific implementation of step S1.

[0038] Step S11: Automatically collect the raw material information required to build the data product from the data environment.

[0039] Step S1 or S11 marks the starting point of this method and is also the step for collecting data sources and processing scripts. The system automatically collects the raw material information required to build data products from the data environment (such as Hadoop, data warehouse, object storage, etc.). This information mainly includes two categories: 1) Data source metadata: including the schema information of the database and data tables (such as table names, field names, field comments, data types, etc.); 2) Data processing scripts: including SQL scripts, Python scripts (such as PySpark), Shell scripts, and job configuration files of ETL tools (such as Kettle, DataX) used for data extraction, transformation, and loading (ETL).

[0040] Technical specifications: Data acquisition time window: The system can perform incremental data acquisition according to a set period.

[0041] Parameter range: The acquisition period can range from 1 minute to 24 hours.

[0042] Specific examples: The lower limit is 1 minute, which is suitable for real-time data environments where data processing logic changes frequently; the upper limit is 24 hours, which is suitable for batch processing environments where changes are relatively slow; a representative intermediate value could be 1 hour.

[0043] Script file size: The system needs to be able to handle scripts of different sizes.

[0044] Parameter range: The size of a single script file that can be processed ranges from 1KB to 100MB.

[0045] Specific examples: a lower limit of 1KB corresponds to a simple SQL query; an upper limit of 100MB corresponds to an extremely complex generative script containing a large amount of logic; a common intermediate value is about 1MB.
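As a non-limiting illustration, the raw material information of step S1 and the parameter ranges stated above (acquisition period of 1 minute to 24 hours, script size of 1KB to 100MB) might be represented as follows. This is a minimal sketch; the class and function names (`ColumnMeta`, `TableMeta`, `Script`, `validate_period`, `script_in_range`) are hypothetical and do not come from this application.

```python
from dataclasses import dataclass, field

# Ranges taken from paragraphs [0041] and [0044].
MIN_PERIOD_S = 60                      # 1 minute
MAX_PERIOD_S = 24 * 3600               # 24 hours
MIN_SCRIPT_BYTES = 1024                # 1KB
MAX_SCRIPT_BYTES = 100 * 1024 * 1024   # 100MB


@dataclass
class ColumnMeta:
    name: str
    data_type: str
    comment: str = ""


@dataclass
class TableMeta:
    """Data source metadata: schema information for one table."""
    database: str
    table: str
    columns: list[ColumnMeta] = field(default_factory=list)


@dataclass
class Script:
    """A data processing script (SQL, Python, Shell, or an ETL job file)."""
    script_id: str
    language: str
    text: str


def validate_period(seconds: int) -> int:
    """Return the acquisition period if it lies in the stated range."""
    if not MIN_PERIOD_S <= seconds <= MAX_PERIOD_S:
        raise ValueError(f"period {seconds}s outside 1 minute to 24 hours")
    return seconds


def script_in_range(script: Script) -> bool:
    """Check the script's UTF-8 size against the stated 1KB to 100MB range."""
    size = len(script.text.encode("utf-8"))
    return MIN_SCRIPT_BYTES <= size <= MAX_SCRIPT_BYTES
```

A collector could call `validate_period` once at configuration time and `script_in_range` for each collected script before passing it to step S2.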

[0046] Step S2: Send the raw material information required for the data product into the fine-grained lineage analysis engine based on a large language model for processing to obtain the output results of the lineage analysis engine.

[0047] Step S2 is the fine-grained lineage analysis step based on Large Language Model (LLM). Specifically, the collected data source metadata and data processing scripts are sent to the fine-grained lineage analysis engine based on Large Language Model (LLM) for processing.

[0048] Basic Principle: The fine-grained lineage analysis engine based on a large language model uses a known, pre-trained large language model (LLM) as its core parsing and inference engine. First, the LLM lineage analysis algorithm is deployed in environments such as data platforms and data lakes. The workflow is as follows: Input: The algorithm automatically collects or receives various inputs from the data environment, including but not limited to: ETL job configuration files (such as SQL scripts, Kettle job files), data processing scripts (such as Python, Spark scripts), database schema definitions, and inter-table field comparison results obtained through data exploration techniques.

[0049] Parsing and Reasoning: This fine-grained lineage analysis engine, based on a large language model, leverages the code understanding and logical reasoning capabilities of the LLM to perform deep parsing of data source metadata and data processing scripts. It can not only identify explicit data dependencies (such as INSERT INTO table_a SELECT ... FROM table_b in SQL), but also understand implicit lineage, such as field-level mappings inside complex functions, conditional statements, complex logical judgments, and stored procedure calls. For example, it can parse complex logic like output_field = func_a(table1.field_x) + func_b(table2.field_y) and accurately establish the lineage relationship between output_field and both table1.field_x and table2.field_y. As another example, it can parse field-level mappings like field_c = field_a + field_b.

[0050] Output: LLM generates a fine-grained lineage graph covering three levels: database, table, and field, through inference. This lineage graph is stored in the form of a graph data structure, where nodes represent data entities (databases, tables, fields), edges represent data processing relationships, and information such as source scripts and transformation logic can be labeled.
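As a non-limiting illustration, the field-level output described above might be turned into edge records as sketched below. The JSON layout, the `LineageEdge` record, the function `parse_lineage_json`, and the table name `report` are all assumptions made for this example; the application does not specify the engine's output format. The sketch also drops any source field that is absent from the collected metadata, as one possible guard against references the model may have invented.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class LineageEdge:
    source: str      # upstream field, e.g. "table1.field_x"
    target: str      # downstream field produced by the script
    script_id: str   # script that performs the transformation
    transform: str   # transformation logic, kept as a label


def parse_lineage_json(raw: str, known_fields: set[str]) -> list[LineageEdge]:
    """Convert the engine's (assumed) JSON output into lineage edges."""
    edges = []
    for item in json.loads(raw):
        for source in item["sources"]:
            if source not in known_fields:
                continue  # not present in the collected schema metadata
            edges.append(LineageEdge(source, item["target"],
                                     item["script_id"], item["transform"]))
    return edges


# The example from paragraph [0049], written in the assumed JSON layout.
raw_output = json.dumps([{
    "target": "report.output_field",
    "sources": ["table1.field_x", "table2.field_y"],
    "script_id": "job_001",
    "transform": "func_a(table1.field_x) + func_b(table2.field_y)",
}])
known = {"table1.field_x", "table2.field_y"}
edges = parse_lineage_json(raw_output, known)
```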

[0051] Implementation Requirements and Technical Foundations: This approach requires access to the computing resources of the data platform, as well as a pre-trained large language model (such as variants of Codex or CodeLlama) that performs well in code understanding tasks. The data environment must support log collection and script file access.

[0052] The equipment used to complete the operation includes: servers deployed with the LLM lineage analysis algorithm, computing nodes of the data platform / data lake, and a graph database for storing lineage graphs.

[0053] Technical specifications: LLM context window length: determines the length of script code that can be processed in a single analysis.

[0054] Parameter range: 2k tokens to 128k tokens (a token is the basic text unit processed by LLM).

[0055] Specific examples: a lower limit of 2k tokens can handle most standard SQL scripts; an upper limit of 128k tokens can handle very long and complex integrated scripts; the middle value of 16k tokens or 32k tokens is the common effective length for the current model.

[0056] Lineage analysis depth (recursive parsing level): refers to the number of recursive parsings of nested calls in the data processing script (such as one SQL view referencing another view).

[0057] Parameter range: Depth from 1 level (parse only the current script) to 20 levels (deep recursive parsing of all nested elements).

[0058] Specific examples: A lower limit of 1 layer can be used for quick but coarse analysis; an upper limit of 20 layers can handle extremely complex dependency chains; usually, setting 5 or 10 layers is sufficient to balance analysis effectiveness and performance.
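As a non-limiting illustration, the recursive parsing level can be read as a depth limit on dependency resolution. The sketch below assumes the direct dependencies of each entity have already been extracted into a dictionary; the function name `resolve_upstream` and the view names are hypothetical.

```python
def resolve_upstream(entity: str,
                     direct_deps: dict[str, list[str]],
                     max_depth: int) -> set[str]:
    """Collect upstream entities reachable within max_depth nested levels."""
    if not 1 <= max_depth <= 20:
        raise ValueError("depth must be between 1 and 20 levels")
    seen: set[str] = set()
    frontier = {entity}
    for _ in range(max_depth):
        next_frontier = set()
        for node in frontier:
            for dep in direct_deps.get(node, ()):
                # Skip already visited entities so cyclic references terminate.
                if dep != entity and dep not in seen:
                    seen.add(dep)
                    next_frontier.add(dep)
        if not next_frontier:
            break
        frontier = next_frontier
    return seen


# One SQL view referencing another, as in paragraph [0056].
views = {"v3": ["v2"], "v2": ["v1"], "v1": ["base_table"]}
shallow = resolve_upstream("v3", views, max_depth=1)
deep = resolve_upstream("v3", views, max_depth=5)
```

With a depth of 1 only the view referenced directly by `v3` is found, whereas a depth of 5 reaches the underlying base table.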

[0059] Step S3: Construct a lineage graph based on the output of the lineage analysis engine, and persistently store the lineage graph in a graph database. Nodes in the lineage graph represent data entities, and edges represent data processing relationships.

[0060] Step S3 involves constructing and storing the lineage graph. Specifically, the output of the lineage analysis engine is systematically constructed into a directed graph structure, namely the lineage graph (or lineage relationship graph). Nodes in the lineage graph represent data entities (such as databases, tables, and fields), edges represent data processing relationships, and information such as processing script IDs and transformation types can be attached to the edges. The constructed lineage graph is persistently stored in a graph database (such as Neo4j or Nebula Graph) for subsequent querying and visualization.
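As a non-limiting illustration, the directed graph structure of step S3 can be sketched in memory as follows. This stand-in is not a graph database client and makes no claim about the Neo4j or Nebula Graph APIs; the class `LineageGraph`, its methods, and the example node names are hypothetical.

```python
from collections import defaultdict


class LineageGraph:
    """In-memory stand-in for the persisted lineage graph."""

    def __init__(self) -> None:
        self.nodes: set[str] = set()
        # target -> {source: edge attributes}, and the reverse index.
        self.upstream: dict[str, dict[str, dict]] = defaultdict(dict)
        self.downstream: dict[str, dict[str, dict]] = defaultdict(dict)

    def add_edge(self, source: str, target: str,
                 script_id: str, transform_type: str) -> None:
        """Add one processing relationship with its edge attributes."""
        attrs = {"script_id": script_id, "transform_type": transform_type}
        self.upstream[target][source] = attrs
        self.downstream[source][target] = attrs
        self.nodes.update((source, target))

    def direct_upstream(self, node: str) -> list[str]:
        return sorted(self.upstream.get(node, {}))

    def direct_downstream(self, node: str) -> list[str]:
        return sorted(self.downstream.get(node, {}))


graph = LineageGraph()
graph.add_edge("crm.users.id", "stage.profile.user_id", "job_001", "copy")
graph.add_edge("vendor.events.uid", "stage.profile.user_id", "job_001", "join")
graph.add_edge("stage.profile.user_id", "report.score.user_id", "job_002", "copy")
```

The two lookup methods correspond to the query of direct upstream and downstream dependencies of a specific node.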

[0061] Technical specifications: Response time of lineage graph storage: refers to the time taken to query all direct upstream / downstream dependencies of a specific node.

[0062] Parameter range: Ideal response time should be between 10 milliseconds and 1000 milliseconds (1 second).

[0063] Specific examples: A lower limit of 10 milliseconds is suitable for high-performance real-time queries; an upper limit of 1000 milliseconds is an acceptable upper limit for interactive queries; a typical value is about 100 milliseconds.

[0064] Step S4: Receive the multiple target nodes and product definitions finally selected by the user on the graphical user interface displaying the lineage graph. The target nodes are the tables or fields that the user selects to be output as data products.

[0065] In some embodiments, steps S41 to S43 are specific implementations of step S4.

[0066] Step S41: Receive one or more target nodes selected by the user on a graphical user interface displaying the lineage graph. The target nodes are the tables or fields that the user selects to be output as data products.

[0067] It should be noted that the target node can also be called the "endpoint" data entity or the output node.

[0068] Step S42: Automatically highlight all upstream nodes and paths related to the target node.

[0069] Specifically, the system can automatically trace back and highlight all upstream data entities and processing paths related to the target node based on lineage.

[0070] Step S43: Receive the user's definition of the business attributes of the data product in an integrated form.

[0071] Specifically, users can define metadata such as the name, description, output format (e.g., API, file), and access permissions for the selected dataset within an integrated interface. It should be noted that the system can automatically record the data range and processing path corresponding to this selection operation.

[0072] Step S4, or steps S41 to S43, is the interactive selection and product definition step. Specifically, the system provides a graphical user interface (GUI) that visualizes the stored lineage graph, presenting it to the user in a graphical manner. Users (such as product managers, data analysts, business personnel, or data engineers) select one or more target nodes (i.e., tables or fields expected to be output as data products) from the graph through drag-and-drop, clicking, and other interactive methods. The system automatically highlights all relevant upstream nodes and paths. Subsequently, the user defines the business attributes of the data product in an integrated form, such as product name, description, update frequency, and access interface type (e.g., RESTful API, SDK).
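As a non-limiting illustration, the automatic highlighting of upstream nodes and paths can be sketched as a backward breadth-first traversal from the selected target nodes. The function name `trace_upstream`, the dictionary layout, and the node names are hypothetical.

```python
from collections import deque


def trace_upstream(targets: list[str],
                   upstream: dict[str, list[str]]
                   ) -> tuple[set[str], set[tuple[str, str]]]:
    """Return every node and edge lying on a path into the target nodes."""
    nodes = set(targets)
    edges: set[tuple[str, str]] = set()
    queue = deque(targets)
    while queue:
        node = queue.popleft()
        for source in upstream.get(node, ()):
            edges.add((source, node))
            if source not in nodes:
                nodes.add(source)
                queue.append(source)
    return nodes, edges


# upstream maps each node to its direct upstream nodes.
upstream = {
    "report.score": ["stage.features"],
    "stage.features": ["crm.users", "vendor.events"],
    "other.table": ["crm.users"],
}
highlight_nodes, highlight_edges = trace_upstream(["report.score"], upstream)
```

Nodes that do not feed the selected target, such as `other.table` in this example, are left out of the highlighted set.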

[0073] Technical specifications: Number of UI rendering nodes: To ensure a smooth interactive experience, the number of graph nodes rendered at one time by the front end needs to be limited.

[0074] Parameter range: The number of nodes that can be rendered simultaneously can range from 100 to 5000.

[0075] Specific examples: a lower limit of 100 nodes suits graphs for simple business lines; an upper limit of 5000 nodes covers data domains of medium complexity; 1000 nodes is typically a balance point, and any excess can be loaded interactively through zooming, collapsing, and similar functions.
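As a non-limiting illustration, a back-end interface might cap the nodes returned for one rendering pass by walking upstream from the target nodes and stopping at the configured limit. The function `nodes_to_render` and its return convention are assumptions for this sketch; how the remaining nodes are loaded interactively is left to the front end.

```python
from collections import deque


def nodes_to_render(targets: list[str],
                    upstream: dict[str, list[str]],
                    max_nodes: int = 1000) -> tuple[list[str], bool]:
    """Return up to max_nodes nodes nearest the targets, plus a truncation flag."""
    if not 100 <= max_nodes <= 5000:
        raise ValueError("max_nodes must be between 100 and 5000")
    seen: set[str] = set()
    queue: deque[str] = deque()
    for target in targets:
        if target not in seen:
            seen.add(target)
            queue.append(target)
    ordered: list[str] = []
    while queue and len(ordered) < max_nodes:
        node = queue.popleft()
        ordered.append(node)
        for source in upstream.get(node, ()):
            if source not in seen:
                seen.add(source)
                queue.append(source)
    # Nodes still queued were discovered but not returned.
    return ordered, bool(queue)


# A chain of 150 nodes: n0 <- n1 <- ... <- n149.
chain = {f"n{i}": [f"n{i + 1}"] for i in range(149)}
first_batch, truncated = nodes_to_render(["n0"], chain, max_nodes=100)
```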

[0076] Implementation requirements and technical foundation: Front-end visualization technology (such as graphics libraries like D3.js and G6) is required, along with the ability to communicate with the back-end lineage data interface.

[0077] The devices used to complete the action include: user interaction terminals (such as PCs and laptops), web application servers that provide the interactive interface, and backend business logic servers.

[0078] Step S4 above, through interactive and guided visual assembly and configuration of data products, achieves both low barriers to entry and high efficiency in the data product creation process, promoting efficient data development. By constructing an interactive layer connecting the "technical lineage" (backend) and the "business product" (frontend), its structural feature is that it deconstructs the complex, code-driven development process into three intuitive steps: "visual presentation - interactive selection - form-based definition." Furthermore, by transforming the traditional, technically-oriented data lineage graph into an interactive data product "assembly" interface for business users, the barrier to entry for data product creation is significantly lowered. This allows business personnel to intuitively understand data sources and participate in product design, and transforms the product creation process from "writing code" to "visual configuration," significantly improving efficiency and usability.

[0079] Step S5: After the user completes the product definition, the intelligent value allocation engine is triggered. The intelligent value allocation engine runs a contribution metric algorithm based on the lineage graph to obtain the contribution weight of each node and edge on all paths leading to the target node to the final data product. The basic principle of the intelligent value allocation engine is to assign an initial weight to the upstream original data node, and then let the weight be passed downstream along the lineage edges. During the transmission process, different value-added coefficients are assigned to each node and edge according to the complexity of the processing logic.

[0080] Step S5 is the contribution quantification calculation step, the core of which lies in constructing a contribution quantification model. The underlying premise is that the value contribution of a data entity is reflected in its scarcity as "raw material", its participation in the processing chain, and the complexity of the value-added processing it undergoes.

[0081] Specifically, once the user completes the product definition, the system triggers the intelligent value allocation engine. This intelligent value allocation engine runs a contribution quantification algorithm based on the lineage graph generated in step S3, and its basic principle is as follows.

[0082] a. Node weight initialization: Assign initial weights to the most upstream original data nodes (i.e., data source tables / fields) in the lineage graph. The initial weights can be set using known data value assessment methods (based on, for example, data cost, scarcity, or market valuation), or set to equal values by default.

[0083] b. Value Transfer and Value-Added Calculation: Along the edges representing processing steps, the weights of upstream nodes are transferred to downstream nodes. The transfer function can take the complexity of the processing into account. For example, an SQL statement that merges multiple upstream fields and calculates a new field should have a higher value-added coefficient than a simple field filtering operation. This calculation is analogous to the PageRank algorithm, applied here to data value transfer. During the transfer, different value-added coefficients are assigned according to the complexity of the processing logic (e.g., whether it involves a complex machine learning model, multi-table joins, or calculations), which yields the contribution weight of each node and edge to the final data product. It should be noted that the contribution weight is calculated by multiplying each node's weight by its corresponding value-added coefficient and then normalizing the weights of all nodes so that they sum to one.

[0084] Technical specifications: Value-added coefficient range: used to quantify the complexity of different data processing operations.

[0085] Parameter range: This coefficient is a dimensionless weight value that ranges from 0.5 (e.g., low value gain for simple data filtering) to 5.0 (e.g., high value gain for complex AI model training).

[0086] Specific examples: A lower limit of 0.5 represents cleaning operations that may result in information loss; an upper limit of 5.0 represents algorithmic models that can greatly improve data value; for ordinary join queries and field calculations, the coefficient can be set to 1.0 to 1.5.
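As a non-limiting illustration, one possible reading of the weight initialization, value transfer, and normalization described above is sketched below. The application does not fix the transfer function, so the choices made here are assumptions: each derived node receives the sum of its direct upstream weights multiplied by the value-added coefficient of the processing step that produces it, source nodes default to an initial weight of 1.0, and derived nodes default to a coefficient of 1.0. The function name `contribution_weights` and the node names are hypothetical.

```python
from graphlib import TopologicalSorter

MIN_COEF, MAX_COEF = 0.5, 5.0   # range from paragraph [0085]


def contribution_weights(upstream: dict[str, list[str]],
                         initial: dict[str, float],
                         coef: dict[str, float]) -> dict[str, float]:
    """Propagate weights downstream and normalize them to sum to one.

    upstream: derived node -> direct upstream nodes
    initial:  source node  -> initial weight (assumed default 1.0)
    coef:     derived node -> value-added coefficient (assumed default 1.0)
    """
    weights: dict[str, float] = {}
    # static_order() yields each node after all of its upstream nodes.
    for node in TopologicalSorter(upstream).static_order():
        parents = upstream.get(node, [])
        if not parents:
            weights[node] = initial.get(node, 1.0)
            continue
        c = coef.get(node, 1.0)
        if not MIN_COEF <= c <= MAX_COEF:
            raise ValueError(f"coefficient {c} outside 0.5 to 5.0")
        # Assumed transfer function: sum of upstream weights times coefficient.
        weights[node] = c * sum(weights[p] for p in parents)
    total = sum(weights.values())
    return {node: w / total for node, w in weights.items()}


lineage = {"profile": ["users", "txns"]}
w = contribution_weights(lineage,
                         initial={"users": 1.0, "txns": 1.0},
                         coef={"profile": 1.5})
```

Other transfer functions (for example, averaging instead of summing the upstream weights) would fit the same description and would produce different proportions.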

[0087] Step S6: Distribute the total value weight of the data product corresponding to the target node to each of the upstream original data source nodes and key data processing script nodes through the attribution algorithm to obtain a value allocation scheme.

[0088] Step S6 is the value allocation ratio recommendation step. Specifically, for the target node finally selected by the user in step S4, the system automatically aggregates the weights, calculated in step S5, of all upstream paths leading to the target node. Through an attribution algorithm, the total value weight of the data product is distributed to each of the upstream original data source nodes (which can be associated with the data subject) and key data processing script nodes (which can be associated with the processing subject). Finally, a suggested value allocation ratio scheme for the data product is generated, for example: "Data source A (providing basic user information) accounts for 35%, data source B (providing transaction logs) accounts for 45%, and data processing party C (responsible for developing the user profile model) accounts for 20%." This scheme is presented to the user as a core reference for data product pricing and profit sharing. As another example, a suggested value allocation scheme calculated according to the weight ratio may be: data source A accounts for 40%, data source B accounts for 30%, and data processing party C accounts for 30%. It should be noted that completing step S5 yields a weight sequence that sums to 1. Multiplying the total value of the data product by this weight sequence gives the value allocated to each node. Summing the node values belonging to each participating entity then gives the value allocated to that entity.
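As a non-limiting illustration, the final aggregation described above (multiply the total value by the normalized weights, then sum per participating entity) can be sketched as follows. The function name `allocate`, the node names, and the total value of 1000 are hypothetical; the proportions reuse the 35% / 45% / 20% example from this paragraph.

```python
from collections import defaultdict


def allocate(total_value: float,
             weights: dict[str, float],
             owner_of: dict[str, str]) -> dict[str, float]:
    """Split total_value across participating entities by node weight."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    shares: dict[str, float] = defaultdict(float)
    for node, weight in weights.items():
        # Value of this node, credited to the entity that owns it.
        shares[owner_of[node]] += total_value * weight
    return dict(shares)


weights = {
    "a.users.name": 0.15,       # data source A
    "a.users.age": 0.20,        # data source A
    "b.txn_log.amount": 0.45,   # data source B
    "c.profile_model": 0.20,    # data processing party C
}
owner_of = {
    "a.users.name": "A",
    "a.users.age": "A",
    "b.txn_log.amount": "B",
    "c.profile_model": "C",
}
shares = allocate(1000.0, weights, owner_of)
```

Here the two fields supplied by data source A are summed into a single share for that participant.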

[0089] Implementation conditions and technical basis: Complete lineage data is required, along with reasonable, well-defined weight initialization rules and value transfer functions.

[0090] The equipment used to complete the action: business logic server and computing engine for performing graph calculations.

[0091] In steps S2 and S3 of the above embodiments of this application, a large language model is innovatively applied to the field of data lineage analysis. This enables a deep understanding of, and accurate reasoning about, complex and heterogeneous data processing scripts, overcoming the shortcomings of traditional lineage analysis tools, which can only perform shallow syntax parsing and cannot handle complex logic or implicit dependencies. The result is an end-to-end lineage graph with field-level precision and high reliability. This fine-grained lineage analysis gives a clear and comprehensive view of data lineage relationships and serves as the data foundation for all subsequent intelligent functions.

[0092] Traditional lineage analysis tools rely on simple syntax parsers or regular expressions, whose structure limits their ability to recognize only fixed, surface-level code patterns. In contrast, the large language model used in this application is essentially a deep neural network with massive prior knowledge and powerful logical reasoning capabilities. Like a seasoned data expert, it can understand the semantics and context of code. The large language model can parse scripts that traditional tools struggle with, such as SQL stored procedures and complex Python UDFs (User-Defined Functions), achieving precise lineage tracing at the field level. This stems from the LLM's deep understanding of programming language syntax and semantics, rather than fixed rule-based matching. The result is the generation of unprecedentedly high-precision and high-completeness lineage maps, making the component tracing of data products clear and reliable, and improving the accuracy and depth of data processing. Simultaneously, the large language model can discover implicit data dependencies indirectly generated in data processing scripts through conditional statements, loops, or function encapsulation—something traditional methods cannot achieve. This effectively avoids ownership disputes and misjudgments caused by missing lineages, greatly enhancing the reliability and authority of lineage analysis.

[0093] Step S4 above, through interactive and guided visual assembly and configuration of data products, achieves both low barriers to entry and high efficiency in the data product creation process, promoting efficient data development. This step constructs an interactive layer connecting the "technical lineage" (backend) and the "business product" (frontend). Its structural feature is that it deconstructs the complex, code-driven development process into three intuitive steps: "visual presentation - interactive selection - form-based definition." Since the data product is directly "assembled" based on an authoritative lineage map, its data source and processing path are automatically traced and locked by the system. This mechanism eliminates the deviations and errors that might be introduced by manual recoding, ensuring strict consistency between the data product definition and the underlying data logic, and enhancing the product's reliability and interpretability.
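The automatic tracing of a selected target node's data sources and processing paths can be illustrated with a breadth-first walk over the lineage graph. This is a minimal sketch, assuming the graph is available as a mapping from each node to its direct upstream nodes. The field names are hypothetical, and a real deployment would presumably query the graph database instead.

```python
from collections import deque


def upstream_closure(parents, target):
    """Return every node and edge on a path leading to the target node."""
    seen_nodes = {target}
    seen_edges = set()
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, ()):
            seen_edges.add((parent, node))  # edge direction: upstream -> downstream
            if parent not in seen_nodes:
                seen_nodes.add(parent)
                queue.append(parent)
    return seen_nodes, seen_edges


# Hypothetical field-level lineage: node -> direct upstream nodes.
lineage = {
    "profile.score": ["users.age", "txn.amount"],
    "txn.amount": ["raw_txn.amount"],
    "report.total": ["users.age"],  # unrelated to the selected target
}
nodes, edges = upstream_closure(lineage, "profile.score")
```

The returned sets are what a front end could highlight, and nodes that do not feed the target (here `report.total`) are left out.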

[0094] Steps S5 and S6 above create a quantitative model and method that automatically maps technical lineage to economic value distribution ratios. Distribution ratios are recommended automatically on the basis of quantified contributions, solving the core pain point of "lack of basis for value distribution" in the marketization of data elements. This makes the value distribution of data products "quantifiable" and "automated," providing key technical support for the market circulation of data elements. Transparent and auditable value distribution suggestions are generated automatically during the product creation stage, providing an objective technical basis for data transactions and revenue sharing and greatly promoting the circulation and collaborative production of data products. This embodiment of the application constructs a mapping model of "technical lineage → contribution weight → economic value." The structure of this mapping model is similar to a distributed value calculation network, in which nodes are data entities and edges are value transfer rules.
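The "lineage → contribution weight" part of this mapping can be sketched as a downstream pass over the graph in topological order. The exact value transfer function is not given in this passage, so the rule used here (a derived node's weight is its value-added coefficient multiplied by the sum of its upstream weights) is an assumption. The node names and coefficients are also hypothetical.

```python
from graphlib import TopologicalSorter


def propagate_weights(parents, initial, coefficient):
    """Pass weights downstream along lineage edges.

    parents:     node -> list of direct upstream nodes
    initial:     source node -> initial weight
    coefficient: derived node -> value-added coefficient
    """
    weight = {}
    # static_order() yields upstream nodes before the nodes that depend on them.
    for node in TopologicalSorter(parents).static_order():
        ups = parents.get(node, [])
        if not ups:
            weight[node] = initial.get(node, 1.0)  # original data source
        else:
            # Assumed transfer rule: scale the incoming weight by the coefficient.
            weight[node] = coefficient.get(node, 1.0) * sum(weight[u] for u in ups)
    return weight


# Hypothetical lineage: two sources joined, then fed into a profile model.
parents = {"joined": ["users", "txn"], "profile": ["joined"]}
initial = {"users": 1.0, "txn": 1.0}
coefficient = {"joined": 1.5, "profile": 2.0}
weights = propagate_weights(parents, initial, coefficient)
```

In this example the join step raises the combined source weight from 2.0 to 3.0, and the model step raises it to 6.0. How that accumulated weight is attributed back to sources and scripts is left to the attribution algorithm of step S6.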

[0095] The above embodiments of this application propose for the first time a method for quantifying and weighting the technical contributions (such as association, calculation, and model training) in the data processing process. By setting "value-added coefficients" for different processing logics, the abstract data processing value is transformed into specific weight values, thereby providing an objective, transparent, and auditable basis for the value distribution of data products. This replaces the subjective negotiation that was originally full of controversy and uncertainty, and lays a solid foundation for fair and reasonable distribution of benefits.

[0096] Simultaneously, by embedding circulation-oriented principles at the product creation stage, the value allocation and evaluation process is moved forward to the data product creation phase, rather than being addressed retroactively. This mechanism means that every data product created through this method naturally carries a clear "instruction manual" for its value composition. This significantly reduces the friction and trust costs associated with the subsequent circulation and trading of data products in the market, stimulating the enthusiasm of all parties to participate in the supply and processing of data elements, thereby powerfully promoting the formation of a healthy data ecosystem.

[0097] In summary, the embodiments of this application organically integrate three core technologies: fine-grained data lineage analysis based on Large Language Model (LLM), interactive and guided data product visualization assembly and configuration, and automatic recommendation of data value allocation ratios based on contribution measurement. This results in a complete solution that integrates "precise analysis, convenient creation, and fair allocation," which not only improves specific technical indicators but also demonstrates significant technological advancement and broad industrial application prospects in substantially promoting the macro-level goal of market-based allocation of data elements.

[0098] In at least one embodiment of this application, referring to Figure 2, the data product value allocation method based on lineage analysis also includes step S7.

[0099] Step S7: Based on the product definition and value distribution plan, package and release the data product.

[0100] Step S7 is the data product packaging and release step, which can be the end point of the process. Specifically, after the user confirms the product definition and value allocation plan (which can be adjusted), the system calls upon the data platform's existing API gateway, resource scheduling, and other capabilities (retaining existing technologies) to automatically complete the technical packaging according to the product definition, such as generating data APIs, packaging data files, and setting access permissions and quotas. Finally, the packaged data product is released to the internal market or external data trading platform, completing the entire creation process.
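As a rough illustration of what the packaging step could hand to a release platform, the sketch below bundles the product definition and the confirmed allocation scheme into a JSON manifest. The manifest layout, the field names, and the `quota_per_day` parameter are all assumptions made for this example. The business attributes (name, description, update frequency, access interface type) follow those named in the claims.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class ProductDefinition:
    name: str
    description: str
    update_frequency: str
    access_interface: str
    target_nodes: list


def build_release_manifest(definition, allocation, quota_per_day=1000):
    """Bundle product definition, allocation ratios, and access settings."""
    if abs(sum(allocation.values()) - 1.0) > 1e-6:
        raise ValueError("allocation ratios must sum to 1")
    manifest = {
        "product": asdict(definition),
        "value_allocation": allocation,
        "access": {"quota_per_day": quota_per_day},  # hypothetical quota setting
    }
    return json.dumps(manifest, indent=2, sort_keys=True)


definition = ProductDefinition(
    name="user_profile_api",
    description="User profile scores derived from user and transaction data",
    update_frequency="daily",
    access_interface="REST API",
    target_nodes=["profile.score"],
)
manifest = build_release_manifest(
    definition,
    {"Data source A": 0.35, "Data source B": 0.45, "Processing party C": 0.20},
)
```

Rejecting ratios that do not sum to 1 is a simple guard against releasing a product whose allocation scheme was edited inconsistently.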

[0101] This application embodiment, through the organic combination of the above steps S1 to S7, realizes the interactive and intelligent creation of data products from lineage analysis, intelligent assembly, value assessment to final release, and can solve the problem of data value distribution in an automated, refined, fair and reasonable manner.

[0102] At least one embodiment of this application also provides a computer device including a processor and a memory. The processor is used to execute a data product value allocation method based on lineage analysis provided in any of the above embodiments of this application. The memory is used to store executable instructions of the processor, such as application programs. The number of processors can be one or more. The application programs stored in the memory can include one or more modules, each corresponding to a set of instructions. Furthermore, the processor is configured to execute instructions to perform the above-described data product value allocation method based on lineage analysis.

[0103] The computer device may also include a power supply component configured for power management, a wired or wireless network interface configured to connect the computer device to a network, and an input/output (I/O) interface. The computer device can operate on an operating system stored in memory, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or similar.

[0104] At least one embodiment of this application also provides a computer-readable storage medium storing executable instructions for a computer thereon. When executed by a processor, the executable instructions implement a data product value allocation method based on lineage analysis provided in any of the above embodiments of this application.

[0105] For a non-transitory computer-readable storage medium, when the instructions in the storage medium are executed by the processor of the computer device, the computer device is enabled to perform the aforementioned data product value allocation method based on lineage analysis. This data product value allocation method based on lineage analysis is executed by a proxy program.

[0106] Those skilled in the art will recognize that the algorithmic steps of the various examples described in conjunction with the embodiments disclosed in this application can be implemented in electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this application.

[0107] At least one embodiment of this application also provides a computer program product, including a computer program / instruction, which, when executed by a processor, implements a data product value allocation method based on lineage analysis provided in any of the above embodiments of this application.

[0108] If the aforementioned functions are implemented as software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes over the prior art, or a part of the technical solution, can be embodied in the form of a computer program product. This computer program product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the data product value allocation method based on lineage analysis according to various embodiments of this application. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.

[0109] It should be noted that the combination of the technical features in the embodiments of this application is not limited to the combination methods described in the embodiments of this application or the combination methods described in specific embodiments. All technical features described in this application can be freely combined or combined in any way, unless they contradict each other.

[0110] As indicated in this application and claims, unless the context clearly indicates otherwise, the words "a," "an," and / or "the" do not specifically refer to the singular and may also include the plural. Generally speaking, the term "comprising" only indicates that it includes the explicitly identified steps and elements, which do not constitute an exclusive list, and the method or apparatus may also include other steps or elements.

[0111] The above description is merely a preferred embodiment of this application and is not intended to limit this application. Any modifications or equivalent substitutions made within the spirit and principles of this application should be included within the protection scope of this application.

Claims

1. A method for allocating the value of data products based on lineage analysis, characterized in that it comprises:
Step S1: Obtain the raw material information required to build the data product, the raw material information including data source metadata and data processing scripts;
Step S2: Send the raw material information required for the data product into a fine-grained lineage analysis engine based on a large language model for processing, to obtain the output of the lineage analysis engine;
Step S3: Construct a lineage graph based on the output of the lineage analysis engine, and persistently store the lineage graph in a graph database, wherein nodes in the lineage graph represent data entities and edges represent data processing relationships;
Step S4: Receive the multiple target nodes and the product definition finally selected by the user on a graphical user interface displaying the lineage graph, wherein the target nodes are the tables or fields that the user expects to be output as data products;
Step S5: After the user completes the product definition, trigger an intelligent value allocation engine, which runs a contribution quantification algorithm based on the lineage graph to obtain the contribution weight of each node and edge on all paths leading to the target node with respect to the final data product, wherein the basic principle of the intelligent value allocation engine is to assign an initial weight to each upstream original data node and then pass the weight downstream along the lineage edges, and during this transfer different value-added coefficients are assigned to each node and edge according to the complexity of the processing logic;
Step S6: Distribute the total value weight of the final data product to each of the upstream original data source nodes and key data processing script nodes through an attribution algorithm to obtain a value allocation scheme.

2. The method for allocating the value of data products based on lineage analysis according to claim 1, characterized in that step S1 includes: Step S11: Automatically collect the raw material information required to build the data product from the data environment.

3. The data product value allocation method based on lineage analysis according to claim 2, characterized in that the data collection period in step S11 ranges from 1 minute to 24 hours.

4. The method for allocating the value of data products based on lineage analysis according to claim 1, characterized in that step S4 includes: Step S41: Receive one or more target nodes selected by the user on the graphical user interface displaying the lineage graph, wherein the target nodes are the tables or fields that the user expects to be output as data products; Step S42: Automatically highlight all upstream nodes and paths related to the target node; Step S43: Receive the user's definition of the business attributes of the data product in an integrated form.

5. The data product value allocation method based on lineage analysis according to claim 4, characterized in that the business attributes of the data product include product name, description, update frequency, and access interface type.

6. The method for allocating the value of data products based on lineage analysis according to claim 1, characterized in that it also includes: packaging and releasing the data product based on the definition information and value allocation scheme of the data product.

7. The method for allocating the value of data products based on lineage analysis according to any one of claims 1 to 6, characterized in that the value-added coefficient ranges from 0.5 to 5.0.

8. A computer device, characterized in that it comprises: a processor for executing the data product value allocation method based on lineage analysis as described in any one of claims 1 to 7; and a memory for storing the executable instructions of the processor.

9. A computer-readable storage medium having executable instructions stored thereon, characterized in that, when the executable instructions are executed by a processor, they implement the data product value allocation method based on lineage analysis as described in any one of claims 1 to 7.

10. A computer program product, comprising a computer program/instructions, characterized in that, when the computer program/instructions are executed by a processor, they implement the data product value allocation method based on lineage analysis as described in any one of claims 1 to 7.