A data asset evaluation method
By collecting and analyzing industry guidance documents, combining market data and transaction prices, and using pre-trained models for multi-dimensional evaluation, this approach solves the problems of inaccurate and inflexible data asset valuation in existing technologies, and achieves customized and efficient data asset valuation.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- SHANGHAI ZHONGWEI INTELLIGENT CODE TECHNOLOGY CO LTD
- Filing Date
- 2026-03-23
- Publication Date
- 2026-06-19
AI Technical Summary
Existing technologies fail to fully consider industry orientation and application scenario matching in data asset valuation, resulting in inaccurate and inflexible valuation results. Furthermore, the data catalog generation lacks flexibility and cannot be customized.
By collecting and analyzing industry guidance documents, we can obtain the matching rate of industry guidance types and application scenarios. Combined with market data and transaction prices, we can conduct multi-dimensional evaluation using pre-trained supply and demand relationship models and asset valuation models. This includes defining data asset themes, industries, and scenarios, using natural language processing technology to automatically process industry guidance documents, and collecting market data and transaction prices.
It enables customized assessments based on data asset themes, industries, and scenarios, improving the flexibility and accuracy of assessments, reducing human intervention, providing a scientific basis for enterprise decision-making, and helping enterprises to rationally plan and operate.
[Figure: abstract drawing of CN122243554A]
Description
Technical Field
[0001] This invention relates to the field of data asset analysis technology, and more specifically, to a data asset evaluation method.
Background Technology
[0002] To improve the accuracy of data asset valuation, the prior application with publication number CN114897322A proposes a data asset valuation system and method, including: a catalog module for acquiring a dataset and generating a data catalog based on the dataset; a calculation module for finding historical transaction prices of data based on the data catalog and calculating the scarcity, demand, and quantity of data; and an evaluation module for determining the possession attribute of data, calling a price formula based on the possession attribute, and calculating the asset value by inputting the historical transaction prices, scarcity, and demand. That application improves the accuracy of data asset valuation by scoring the scarcity and demand of data assets and calculating the value range of data assets.
[0003] While the above methods can meet the needs of most scenarios, research and practical application of these methods and existing technologies have revealed at least the following shortcomings:
[0004] (1) Existing technologies mainly rely on historical transaction prices, scarcity and demand to assess the value of data assets, but fail to fully consider factors such as industry orientation and application scenario matching, which may result in incomplete and inaccurate assessment results.
[0005] (2) Existing technologies generate data catalogs based on datasets, an approach that lacks flexibility and therefore does not support customized assessments based on different data asset themes, industries and scenarios.
[0006] In view of this, the present invention proposes a data asset evaluation method to solve the above problems.
Summary of the Invention
[0007] To overcome the aforementioned deficiencies of the prior art and to achieve the above objectives, the present invention provides the following technical solution: a data asset valuation method, comprising:
[0008] Define the data asset themes, data asset industries, and data asset scenario sets;
[0009] Collect industry guidance documents with the same theme as data assets from the industry guidance documents published on information release websites;
[0010] Analyze the industry guidance documents to obtain an industry document set;
[0011] Analyze the industry document set to determine the industry orientation type;
[0012] Extract a set of standardized scenarios from the industry document set, match the set of standardized scenarios with the data asset scenario set, and obtain the application scenario matching rate;
[0013] Collect a market dataset; the market dataset includes market data for Q companies; the companies are those corresponding to the data asset theme and the data asset industry.
[0014] The market dataset is analyzed to obtain financial analysis data, which includes the average total revenue difference, the average total cost difference, the average net profit difference, and the average gross profit margin difference.
[0015] Financial analysis data is input into a pre-trained supply and demand model to obtain supply and demand values;
[0016] Collect the transaction prices of data assets from a historical period to the present;
[0017] The average transaction price is calculated by averaging the transaction prices of data assets.
[0018] The asset valuation data is input into a pre-trained asset valuation model to obtain the asset valuation value. The asset valuation data includes industry orientation type, application scenario matching rate, supply and demand value, and average transaction price.
[0019] Furthermore, the method for collecting the industry guidance documents is as follows:
[0020] Use data scraping tools to extract industry guidance documents with the same theme as the data assets from information publishing websites; the data scraping tools include API interfaces and RSS subscriptions.
[0021] Furthermore, the method for obtaining the industry document set includes:
[0022] Step 1: Convert the industry guidance document format into a parsable text format;
[0023] Step 2: Predefine a stop word list;
[0024] Step 3: Use natural language processing technology to divide the industry guidance document according to subheadings to obtain information subheadings and the information content corresponding to the information subheadings; let the initial value of i be 1, i∈I, where I is the total number of information contents and i is the information content index;
[0025] Step 4: Remove the corresponding words from the industry guidance document based on the stop word list;
[0026] Step 5: Use natural language processing techniques to remove punctuation and special characters from the i-th piece of information, and obtain a set of information keywords based on word segmentation of the information content;
[0027] Step 6: Add the information subheadings and information keyword sets to the industry document set;
[0028] Step 7: Let i = i + 1. If i is less than or equal to I, then execute steps 4 to 6; if i is greater than I, then end.
[0029] Furthermore, the method for obtaining the industry orientation type is as follows:
[0030] Step 1: Obtain the current year when assessing the asset value;
[0031] Step 2: Obtain the publication year of the industry guidance documents from the industry document collection;
[0032] Step 3: Subtract the release year from the current year to obtain the year difference;
[0033] Step 4: Select the largest year difference as the year difference used for industry orientation type determination;
[0034] Step 5: Determine the industry orientation type based on the year difference.
[0035] Furthermore, the method for obtaining the industry orientation type based on the year difference is as follows:
[0036] A first year determination threshold and a second year determination threshold are preset, the first being smaller than the second. If the year difference is less than or equal to the first threshold, the industry orientation type is short-term orientation; if the year difference is greater than the first threshold and less than or equal to the second threshold, the industry orientation type is medium-term orientation; if the year difference is greater than the second threshold, the industry orientation type is long-term orientation.
[0037] Furthermore, the method for extracting the set of standardized scenarios from the industry document set includes:
[0038] Step 1: Match the data asset industry against the information subheadings in the industry document set in turn to obtain the information subheadings that correspond to the data asset industry;
[0039] Step 2: Obtain the set of information keywords corresponding to the information subheadings;
[0040] Step 3: Use natural language processing technology to extract industry application scenarios from the set of information keywords, count the number of industry application scenarios as M, and M industry application scenarios constitute a standardized scenario set.
[0041] Furthermore, the method for matching the standardized scenario set with the data asset scenario set to obtain the application scenario matching rate includes:
[0042] Step 1: The initial value of j is preset to 1, j∈M; the number of matching industry application scenarios and data asset scenarios is preset to num; the initial value of num is 0;
[0043] Step 2: Obtain the j-th industry application scenario in the set of standardized scenarios, and match the industry application scenario with the data asset scenarios in the set of data asset scenarios in turn; if the industry application scenario and the data asset scenario are the same, the match is successful, and let num = num + 1;
[0044] Step 3: Let j = j + 1; if j is less than or equal to M, then continue to Step 2; if j is greater than M, then proceed to Step 4.
[0045] Step 4: Calculate the application scenario matching rate by dividing num by M.
[0046] Furthermore, the method for calculating the average of the total income difference includes:
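[0047] $\overline{\Delta R} = \frac{1}{Q}\sum_{q=1}^{Q}\left(R_q^{\text{cur}} - R_q^{\text{prev}}\right)$;
[0048] where $\overline{\Delta R}$ is the average of the total revenue difference, $R_q^{\text{cur}}$ is the total operating revenue of the q-th company in the current quarter, $R_q^{\text{prev}}$ is the total operating revenue of the q-th company in the previous quarter, and $Q$ is the number of companies in the market dataset;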
[0049] The method for calculating the average total cost difference includes:
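[0050] $\overline{\Delta C} = \frac{1}{Q}\sum_{q=1}^{Q}\left(C_q^{\text{cur}} - C_q^{\text{prev}}\right)$;
[0051] where $\overline{\Delta C}$ is the average of the total cost difference, $C_q^{\text{cur}}$ is the total operating cost of the q-th company in the current quarter, and $C_q^{\text{prev}}$ is the total operating cost of the q-th company in the previous quarter.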
[0052] The method for obtaining the average difference in net profit and the average difference in gross profit margin is the same as the method for calculating the average difference in total revenue.
[0053] Furthermore, the training method for the supply and demand model includes:
[0054] A financial dataset is pre-collected, including financial analysis data and corresponding supply and demand values. The financial dataset is divided into a training set and a test set. A classifier is constructed, using the financial analysis data in the training set as input to the supply and demand model and the supply and demand values in the training set as output. The classifier is trained to obtain an initial classifier. The initial classifier is then tested using the test set, and the output classifier that meets the preset accuracy is used as the supply and demand model. The supply and demand model is either a Naive Bayes model or a Support Vector Machine model.
[0055] Furthermore, the method for calculating the average transaction price of data assets includes:
[0056] Step 1: Create an asset transaction price set by compiling historical asset transaction prices from the past to the present.
[0057] Step 2: Calculate the average transaction price within the asset transaction price set;
[0058] The average transaction price is calculated as follows:
[0059] $\bar{P} = \frac{1}{H}\sum_{h=1}^{H} P_h$;
[0060] where $\bar{P}$ is the average transaction price, $P_h$ is the h-th asset transaction price in the asset transaction price set, and $H$ is the total number of asset transaction prices.
[0061] Furthermore, the training method for the asset valuation model includes:
[0062] An asset dataset is pre-collected, including asset valuation data and the corresponding asset valuation values, which are pre-assessed by a data asset appraiser. The asset dataset is divided into a training set and a test set, and a classifier is constructed. The asset valuation data in the training set is used as input to the asset valuation model, and the asset valuation values in the training set are used as output. During training, minimizing the cross-entropy loss function is the optimization objective. An early stopping strategy is used to monitor the performance on the validation set. The model parameters are adjusted continuously, and training stops when the prediction accuracy on the test set reaches a prediction accuracy threshold. The asset valuation model is a gradient boosting tree model.
[0063] The technical effects and advantages of the data asset valuation method proposed in this invention are as follows:
[0064] Customized assessments based on different data asset themes, industries, and scenarios enhance the flexibility of data asset evaluation. The use of data collection tools and natural language processing technology to automatically collect and process industry guidance documents from information publishing websites improves data processing efficiency and accuracy while reducing manual intervention. Analysis of information keyword sets determines the degree of matching between data assets and industry guidance; data assets that align with industry guidance have increased value, which is beneficial for enterprises to rationally plan and operate within that guidance. Collecting financial data from relevant companies and using supply and demand models to assess the market supply and demand of data assets provides a scientific basis for enterprise decision-making, helping enterprises grasp market dynamics and allocate resources effectively.
[0065] Finally, by comprehensively considering the industry orientation type, application scenario matching rate, supply and demand value, and average transaction price of data assets, the value of data assets is comprehensively evaluated from multiple dimensions, which improves the accuracy and reliability of the evaluation.
Attached Figure Description
[0066] Figure 1 is a schematic diagram of a data asset valuation system according to Embodiment 1 of the present invention;
[0067] Figure 2 is a flowchart of a data asset valuation method according to Embodiment 2 of the present invention;
[0068] Figure 3 is a flowchart of a data asset valuation method according to Embodiment 3 of the present invention;
[0069] Figure 4 is a flowchart of the method for obtaining the industry orientation type according to the present invention.
Detailed Implementation
[0070] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0071] Embodiment 1
[0072] As shown in Figure 1, this embodiment provides a data asset valuation system, which includes a first definition module, a first acquisition module, a first processing module, a second processing module, a third processing module, a second acquisition module, a fourth processing module, a supply and demand module, a third acquisition module, a fifth processing module, and an asset valuation module; the modules are connected by wired and/or wireless means to transmit data between modules;
[0073] The first definition module is used to define the data asset theme, data asset industry, and data asset scenario set; for example, the data asset theme is humanoid robot, the data asset industry is manufacturing, and the data asset scenario set is {{assembly}, {transfer}, {inspection}}, where assembly, transfer, and inspection are all data asset scenarios.
[0074] For example, the data asset theme is intelligent vehicles, the data asset industry is the automotive industry, and the data asset scenario set is {{autonomous driving}, {assisted driving}}, where autonomous driving and assisted driving are data asset scenarios.
[0075] The first acquisition module is used to collect industry guidance documents with the same theme as the data assets from the industry guidance documents published on information release websites; the information release websites include official information disclosure websites and public websites of industry-related associations.
[0076] The method for collecting the industry guidance documents is as follows:
[0077] Use data scraping tools to extract industry guidance documents with the same theme as the data assets from information publishing websites; the data scraping tools include API interfaces and RSS subscriptions.
[0078] It should be noted that industry guidance documents represent the future direction of development and provide safeguards for development; if the data asset theme aligns with industry guidance, the value of the data asset will increase.
[0079] For example, according to publicly available information on an industry website, relevant industry stakeholders have provided the following safeguards to ensure the smooth implementation of the "Guiding Opinions on the Innovative Development of Humanoid Robots".
[0080] First, strengthen overall coordination. Enhance inter-departmental collaboration to coordinate and advance technological breakthroughs, industrial development, and integrated applications. Second, improve industrial policies. Promote the implementation of the humanoid robot innovation project, increasing investment in key areas such as specialized software, core components, and complete robot applications. Third, accelerate talent attraction and development. Strengthen the training of personnel in humanoid robot-related disciplines, including industry-academia-research collaborative training models and high-level talent training mechanisms. Fourth, deepen exchange and cooperation. Expand international cooperation in humanoid robots and promote the internationalization and standardization of the industry.
[0081] The first processing module is used to analyze the industry guidance documents to obtain a set of industry documents; the set of industry documents includes X industry guidance documents; the content of each industry guidance document includes information subheadings and a set of information keywords corresponding to the information subheadings.
[0082] The collection of industry documents is expressed as follows:
[0083] $D = \{(t_1, K_1), (t_2, K_2), \ldots, (t_X, K_X)\}$.
[0084] It should be noted that $t_X$ is the information subheading of the X-th industry guidance document in the industry document set, and $K_X$ is the set of information keywords corresponding to that information subheading.
[0085] The methods for obtaining the industry document set include:
[0086] Step 1: Convert the industry guidance document format to a parsable text format. The industry guidance document format includes PDF, DOC, etc., and the parsable text format includes Word or TXT.
[0087] Step 2: Predefine a stop word list, which includes words such as "at", "and", "of" and "is".
[0088] Step 3: Use natural language processing (NLP) techniques to divide the industry guidance document by subheading, obtaining the information subheadings and their corresponding information content; let the initial value of i be 1, i∈I, where I is the total number of information contents and i is the information content index.
[0089] Step 4: Remove the corresponding words from the industry guidance document based on the stop word list.
[0090] Step 5: Use Natural Language Processing (NLP) techniques to remove punctuation and special characters from the i-th piece of information, and obtain a set of keywords based on word segmentation of the information content. For example, use jieba, an NLP tool specifically designed for processing Chinese, to remove punctuation and special characters from the i-th piece of information. The special characters include parentheses, commas, and slashes.
[0091] Step 6: Add the information subheadings and information keyword sets to the industry document set.
[0092] Step 7: Let i = i + 1. If i is less than or equal to I, then execute steps 4 to 6; if i is greater than I, then end.
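The seven steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation described in the application: it works on English text and uses a whitespace split in place of a word segmenter such as jieba, and the function names `extract_keywords` and `build_document_set`, the extended stop word list, and the (subheading, content) input pairs are all hypothetical.

```python
import re

# Hypothetical stop word list (Step 2). The text names "at", "and", "of", "is";
# the remaining entries are added here only so the English example reads cleanly.
STOP_WORDS = {"at", "and", "of", "is", "the", "in", "a", "to", "for"}


def extract_keywords(content: str) -> list[str]:
    """Steps 4-5: strip punctuation and special characters, segment, drop stop words."""
    # Replace every character that is neither a word character nor whitespace.
    cleaned = re.sub(r"[^\w\s]", " ", content.lower())
    # A whitespace split stands in for a real word segmenter.
    return [word for word in cleaned.split() if word not in STOP_WORDS]


def build_document_set(sections: list[tuple[str, str]]) -> list[tuple[str, list[str]]]:
    """Steps 3-7: walk the (subheading, content) pairs and collect keyword sets."""
    document_set = []
    for subheading, content in sections:  # i runs from 1 to I
        document_set.append((subheading, extract_keywords(content)))  # Step 6
    return document_set


sections = [
    (
        "Create typical scenarios for the manufacturing industry",
        "Promote the application of humanoid robots in assembly, transfer, "
        "inspection, and maintenance.",
    ),
]
document_set = build_document_set(sections)
print(document_set[0][1])
```

Keeping each entry as a (subheading, keyword list) pair mirrors the structure of the industry document set, so later steps can look up keywords by subheading.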
[0093] For example, publicly available information on an industry website shows that the "Guiding Opinions on the Innovative Development of Humanoid Robots" issued by the Ministry of Industry and Information Technology includes:
[0094] Build the "brain" and "cerebellum" of the humanoid robot.
[0095] Develop a humanoid robot "brain" based on a large artificial intelligence model and build a large model training database; develop a "cerebellum" to control the humanoid robot's movement, build a motion control algorithm library, and establish a network control system architecture.
[0096] It should be noted that here the information subheading is "Build the 'brain' and 'cerebellum' of the humanoid robot" and the information content is "Develop a humanoid robot 'brain' based on a large artificial intelligence model and build a large model training database; develop a 'cerebellum' to control the humanoid robot's movement, build a motion control algorithm library, and establish a network control system architecture." The information content is segmented into the following information keyword set: {{development}, {humanoid robot}, {brain}, {construction}, {large model}, {training}, {database}, {development}, {humanoid robot}, {cerebellum}, {construction}, {motion}, {control}, {algorithm library}}.
[0097] Create typical scenarios for the manufacturing industry.
[0098] For structured manufacturing processes, promote the application and popularization of humanoid robots in assembly, transfer, inspection, and maintenance.
[0099] It should be noted that here the information subheading is "Create typical scenarios for the manufacturing industry" and the information content is "For structured manufacturing processes, promote the application and popularization of humanoid robots in assembly, transfer, inspection, and maintenance." The information content is segmented into the following information keyword set: {{targeting}, {manufacturing}, {links}, {promoting}, {humanoid robots}, {assembly}, {transfer}, {inspection}, {maintenance}, {processes}, {application}, {promotion}}.
[0100] The second processing module is used to analyze the industry document set and obtain the industry orientation type. The industry orientation type is one of short-term orientation, medium-term orientation and long-term orientation. The industry orientation type informs enterprise operation: if it is medium-term or long-term orientation, the value of data assets increases.
[0101] As shown in Figure 4, the method for obtaining the industry orientation type is as follows:
[0102] Step 1: Obtain the current year when assessing the asset value.
[0103] Step 2: Obtain the publication year of the industry guidance documents from the industry document collection; such as {{to}, {2025}, {year}}; {{to}, {2027}, {year}}; {{to}, {2035}, {year}}; 2025, 2027 and 2035 are the publication years.
[0104] Step 3: Calculate the year difference by subtracting the publication year from the current year; the method for calculating the year difference is as follows:
[0105] $\Delta Y = Y_{\text{cur}} - Y_{\text{pub}}$;
[0106] It should be noted that $\Delta Y$ is the year difference, $Y_{\text{pub}}$ is the publication year, and $Y_{\text{cur}}$ is the current year.
[0107] Step 4: Select the largest year difference as the year difference used for industry orientation type judgment.
[0108] Step 5: Determine the industry orientation type based on the year difference.
[0109] The method for determining the industry orientation type based on the year difference is as follows:
[0110] A first year determination threshold and a second year determination threshold are preset, the first being smaller than the second. If the year difference is less than or equal to the first threshold, the industry orientation type is short-term orientation; if the year difference is greater than the first threshold and less than or equal to the second threshold, the industry orientation type is medium-term orientation; if the year difference is greater than the second threshold, the industry orientation type is long-term orientation; for example, the first threshold is 2 and the second threshold is 5.
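The five steps and the threshold rule can be sketched as below, using the example thresholds 2 and 5. The sketch follows the wording of Step 3 literally (current year minus publication year); if the years extracted from the documents are in fact target years such as 2035, only `year_difference` would need to change. The function names and the sample years are hypothetical.

```python
def year_difference(current_year: int, publication_year: int) -> int:
    """Step 3 as worded in the text: current year minus publication year."""
    return current_year - publication_year


def orientation_type(year_diff: int, first_threshold: int = 2, second_threshold: int = 5) -> str:
    """Step 5: map a year difference onto the three orientation types."""
    if year_diff <= first_threshold:
        return "short-term"
    if year_diff <= second_threshold:
        return "medium-term"
    return "long-term"


def classify(current_year: int, publication_years: list[int]) -> str:
    """Steps 1-5: take the largest year difference and classify it."""
    largest = max(year_difference(current_year, year) for year in publication_years)  # Step 4
    return orientation_type(largest)


# Hypothetical publication years: differences are 6, 3 and 1, so the largest is 6.
print(classify(2026, [2020, 2023, 2025]))  # long-term
```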
[0111] The third processing module is used to extract a set of standardized scenarios from the industry document set, match the set of standardized scenarios with the data asset scenario set, and obtain the application scenario matching rate.
[0112] The method for extracting a set of standardized scenarios from an industry document set includes:
[0113] Step 1: Match the data asset industry against the information subheadings in the industry document set in sequence to obtain the information subheadings that correspond to the data asset industry; for example, the information subheading is "Create typical scenarios for the manufacturing industry".
[0114] Step 2: Obtain the set of information keywords corresponding to the information subheadings; for example, the set of information keywords is {{targeting}, {manufacturing}, {links}, {promoting}, {humanoid robots}, {assembly}, {transfer}, {inspection}, {maintenance}, {processes}, {application}, {promotion}}.
[0115] Step 3: Use natural language processing (NLP) techniques to extract industry application scenarios from the set of information keywords. Count the number of industry application scenarios as M; the M industry application scenarios constitute the set of standardized scenarios. For example, extraction from the set of information keywords above yields the set of standardized scenarios {{assembly}, {transfer}, {inspection}, {maintenance}}, where M is 4.
[0116] The method for matching the standardized scenario set with the data asset scenario set to obtain the application scenario matching rate includes:
[0117] Step 1: The initial value of j is preset to 1, j∈M; the number of matching industry application scenarios and data asset scenarios is preset to num; the initial value of num is 0.
[0118] Step 2: Obtain the j-th industry application scenario in the set of standardized scenarios, and match the industry application scenario with the data asset scenarios in the set of data asset scenarios in turn; if the industry application scenario and the data asset scenario are the same, the match is successful, and let num = num + 1;
[0119] Step 3: Let j = j + 1; if j is less than or equal to M, then continue to step 2; if j is greater than M, then proceed to step 4.
[0120] Step 4: Calculate the application scenario matching rate; the formula for calculating the application scenario matching rate is:
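[0121] $r = \frac{num}{M}$;
[0122] It should be noted that $r$ is the application scenario matching rate, $num$ is the number of industry application scenarios that match a data asset scenario, and $M$ is the number of industry application scenarios in the set of standardized scenarios. The higher the application scenario matching rate, the more the data asset aligns with industry development, and consequently, the higher the value of the data asset.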
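As a worked illustration of Steps 1 to 4, the sketch below counts exact matches and divides by M. It uses the humanoid robot example from this embodiment: the standardized scenario set {assembly, transfer, inspection, maintenance} and the data asset scenario set {assembly, transfer, inspection}. The function name is hypothetical, and exact string equality is assumed as the matching criterion.

```python
def scenario_matching_rate(standardized: list[str], asset_scenarios: list[str]) -> float:
    """Return num / M, where num counts standardized scenarios found in the asset set."""
    if not standardized:
        raise ValueError("the standardized scenario set must not be empty")
    asset_set = set(asset_scenarios)
    num = 0  # Step 1: match counter starts at 0
    for scenario in standardized:  # Steps 2-3: j runs from 1 to M
        if scenario in asset_set:
            num += 1
    return num / len(standardized)  # Step 4


rate = scenario_matching_rate(
    ["assembly", "transfer", "inspection", "maintenance"],
    ["assembly", "transfer", "inspection"],
)
print(rate)  # 0.75
```

Three of the four standardized scenarios appear in the data asset scenario set, so the matching rate is 3/4.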
[0123] The second acquisition module is used to acquire a market dataset; the market dataset includes market data of Q companies; the companies are those corresponding to the data asset theme and the data asset industry; the market data includes total operating revenue for the current quarter, total operating revenue for the previous quarter, total operating costs for the current quarter, total operating costs for the previous quarter, net profit for the current quarter, net profit for the previous quarter, gross profit margin for the current quarter, and gross profit margin for the previous quarter.
[0124] The market data mentioned above is obtained from the company's operating reports published on the company's official website or the stock exchange's official website; it should be noted that the company's operating reports are usually published once per quarter.
[0125] The fourth processing module is used to analyze the market dataset to obtain financial analysis data, which includes the average difference in total revenue, the average difference in total cost, the average difference in net profit, and the average difference in gross profit margin.
[0126] The method for calculating the average of the total income difference includes:
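[0127] $\overline{\Delta R} = \frac{1}{Q}\sum_{q=1}^{Q}\left(R_q^{\text{cur}} - R_q^{\text{prev}}\right)$;
[0128] where $\overline{\Delta R}$ is the average of the total revenue difference, $R_q^{\text{cur}}$ is the total operating revenue of the q-th company in the current quarter, and $R_q^{\text{prev}}$ is the total operating revenue of the q-th company in the previous quarter; if $\overline{\Delta R}$ is positive, the overall operating revenue of the companies has increased.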
[0129] The method for calculating the average total cost difference includes:
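[0130] $\overline{\Delta C} = \frac{1}{Q}\sum_{q=1}^{Q}\left(C_q^{\text{cur}} - C_q^{\text{prev}}\right)$;
[0131] where $\overline{\Delta C}$ is the average of the total cost difference, $C_q^{\text{cur}}$ is the total operating cost of the q-th company in the current quarter, and $C_q^{\text{prev}}$ is the total operating cost of the q-th company in the previous quarter; if $\overline{\Delta C}$ is positive, the overall operating costs of the companies have increased.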
[0132] The method for obtaining the average difference in net profit and the average difference in gross profit margin is the same as the method for calculating the average difference in total revenue.
[0133] It should be noted that the average total revenue difference, average total cost difference, average net profit difference, and average gross profit margin difference can reflect market demand for such data assets. If the average total cost difference is negative, and the average total revenue difference, average net profit difference, and average gross profit margin difference are positive, then the company has reduced operating costs, improved profitability, and there is strong market demand for such data assets, increasing the value of the data assets. Conversely, if the average total cost difference and average total revenue difference are positive, and the average net profit difference and average gross profit margin difference are negative, then the company has increased operating costs, decreased profitability, lowered sales prices to increase total operating revenue, and there is weak market demand for data assets, decreasing the value of the data assets.
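The four averages can be computed with a few lines of Python. The sketch below assumes each company's market data is held in a small record; the `QuarterlyData` class, its field names, and the sample figures are hypothetical and serve only to show the arithmetic.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class QuarterlyData:
    """Market data of one company (hypothetical field names)."""
    revenue_cur: float
    revenue_prev: float
    cost_cur: float
    cost_prev: float
    net_profit_cur: float
    net_profit_prev: float
    gross_margin_cur: float
    gross_margin_prev: float


def financial_analysis(companies: list[QuarterlyData]) -> dict[str, float]:
    """Average the current-minus-previous quarter difference over the Q companies."""
    return {
        "avg_revenue_diff": mean(c.revenue_cur - c.revenue_prev for c in companies),
        "avg_cost_diff": mean(c.cost_cur - c.cost_prev for c in companies),
        "avg_net_profit_diff": mean(c.net_profit_cur - c.net_profit_prev for c in companies),
        "avg_gross_margin_diff": mean(c.gross_margin_cur - c.gross_margin_prev for c in companies),
    }


# Invented sample figures for Q = 2 companies.
companies = [
    QuarterlyData(120.0, 100.0, 70.0, 75.0, 30.0, 20.0, 0.5, 0.25),
    QuarterlyData(90.0, 80.0, 60.0, 65.0, 20.0, 10.0, 0.75, 0.5),
]
analysis = financial_analysis(companies)
print(analysis)
```

In this sample the average cost difference is negative while the other three averages are positive, which corresponds to the strong-demand case described above.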
[0134] The supply and demand module is used to input the financial analysis data into a pre-trained supply and demand model to obtain a supply and demand value; the supply and demand value is one of -1, 0 and 1, where -1 indicates that supply is less than demand, 0 indicates that supply and demand are balanced, and 1 indicates that supply exceeds demand.
[0135] The training methods for the supply and demand model include:
[0136] A financial dataset is pre-collected, including financial analysis data and corresponding supply and demand values. The financial dataset is divided into a training set and a test set. A classifier is constructed, using the financial analysis data in the training set as input to the supply and demand model and the supply and demand values in the training set as output. The classifier is trained to obtain an initial classifier. The initial classifier is then tested using the test set, and the output classifier that meets the preset accuracy is used as the supply and demand model. The supply and demand model is either a Naive Bayes model or a Support Vector Machine model.
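The train, test, and accept workflow can be illustrated with a small Gaussian Naive Bayes classifier written in plain Python. This is a sketch under stated assumptions rather than the applicant's model: the Gaussian variant is one possible choice of Naive Bayes, the class `GaussianNaiveBayes` and the constant `PRESET_ACCURACY` are hypothetical, and every number in the toy dataset is invented for illustration.

```python
import math
from collections import defaultdict


class GaussianNaiveBayes:
    """Tiny Gaussian Naive Bayes classifier (illustrative, not tuned)."""

    def fit(self, X, y):
        grouped = defaultdict(list)
        for row, label in zip(X, y):
            grouped[label].append(row)
        self.priors, self.stats = {}, {}
        for label, rows in grouped.items():
            self.priors[label] = len(rows) / len(X)
            per_feature = []
            for column in zip(*rows):
                mu = sum(column) / len(column)
                # A small constant keeps the variance strictly positive.
                var = sum((v - mu) ** 2 for v in column) / len(column) + 1e-6
                per_feature.append((mu, var))
            self.stats[label] = per_feature
        return self

    def predict(self, row):
        def log_posterior(label):
            score = math.log(self.priors[label])
            for v, (mu, var) in zip(row, self.stats[label]):
                score += -0.5 * math.log(2 * math.pi * var) - (v - mu) ** 2 / (2 * var)
            return score

        return max(self.stats, key=log_posterior)


# Feature order: average revenue, cost, net profit, gross margin differences.
# Labels: -1 supply below demand, 0 balanced, 1 supply above demand. Toy values only.
train_X = [
    (15, -5, 10, 0.25), (12, -3, 8, 0.20), (18, -6, 11, 0.30),
    (1, 0, 0.5, 0.0), (-1, 1, 0, 0.01), (0, -1, 0.2, -0.01),
    (5, 8, -6, -0.2), (4, 6, -5, -0.15), (6, 9, -7, -0.25),
]
train_y = [-1, -1, -1, 0, 0, 0, 1, 1, 1]
test_X = [(16, -4, 9, 0.22), (0, 0, 0.1, 0.0), (5, 7, -6, -0.2)]
test_y = [-1, 0, 1]

PRESET_ACCURACY = 0.9  # hypothetical acceptance threshold

model = GaussianNaiveBayes().fit(train_X, train_y)
predictions = [model.predict(row) for row in test_X]
accuracy = sum(p == t for p, t in zip(predictions, test_y)) / len(test_y)
accepted = accuracy >= PRESET_ACCURACY  # keep the classifier only if it is accurate enough
print(predictions, accuracy, accepted)
```

In practice a library implementation and a much larger labeled dataset would replace the toy classifier and data; the point here is only the sequence of fitting on the training set, scoring on the test set, and accepting the model once the preset accuracy is met.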
[0137] The third acquisition module is used to collect data asset transaction prices from a historical period to the present. The historical period is the previous 12 months.
[0138] The transaction price of the data assets is obtained from publicly available data on the data asset trading platform.
[0139] The fifth processing module is used to calculate the average transaction price of data assets.
[0140] The method for calculating the average transaction price of data assets includes:
[0141] Step 1: Create an asset transaction price set by collecting historical data on asset transaction prices up to the present.
[0142] Step 2: Calculate the average transaction price within the asset transaction price set.
[0143] The average transaction price is calculated as follows:
[0144] $$\bar{P} = \frac{1}{H}\sum_{h=1}^{H} P_h$$
[0145] where $\bar{P}$ is the average transaction price, $P_h$ is the h-th asset transaction price in the asset transaction price set, and H is the total number of asset transaction prices. The average transaction price reflects the value of the data asset; the higher the average transaction price, the greater the value of the data asset.
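The two-step calculation above amounts to an arithmetic mean over the H collected prices. A minimal sketch follows; the function name `average_transaction_price` and the sample prices are illustrative and are not taken from any trading platform.

```python
def average_transaction_price(prices):
    """Arithmetic mean of the asset transaction price set."""
    if not prices:
        raise ValueError("the asset transaction price set is empty")
    return sum(prices) / len(prices)


# Step 1: build the asset transaction price set (illustrative values).
price_set = [120.0, 95.5, 130.0, 110.5]
# Step 2: average over the H = 4 prices in the set.
avg_price = average_transaction_price(price_set)
```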
[0146] The asset valuation module inputs asset valuation data into a pre-trained asset valuation model to obtain the asset valuation value. The asset valuation data includes industry orientation type, application scenario matching rate, supply and demand relationship value, and average transaction price.
[0147] The training methods for the asset valuation model include:
[0148] An asset dataset is pre-collected, including asset valuation data and the corresponding asset valuation values, which are assessed in advance by a data asset appraiser. The asset dataset is divided into a training set and a test set, and a classifier is constructed. The asset valuation data in the training set is used as the input to the asset valuation model, and the asset valuation values in the training set are used as the output. During training of the asset valuation model, minimizing the cross-entropy loss function is the optimization objective. An early stopping strategy is used to monitor performance on the validation set. The model parameters are adjusted continuously, and training stops when the prediction accuracy on the test set reaches a prediction accuracy threshold. The asset valuation model is a gradient boosting tree model. For example, the prediction accuracy threshold can be set to 95% in this application.
[0149] Example 2
[0150] As shown in Figure 2, this embodiment provides a data asset valuation method, which further includes:
[0151] Obtain actual call and usage data of the target data asset in various business systems within the enterprise; the actual call and usage data includes information such as access time, access source system, business scenario identifier, number of calls, amount of returned data, and interface execution time.
[0152] The actual calls and usage data are input into a pre-built incremental revenue adjustment coefficient setting model to determine the corresponding incremental revenue adjustment coefficient based on the actual usage intensity and effect of data assets in various business scenarios.
[0153] The asset valuation value output by the asset valuation model is adjusted according to the incremental revenue adjustment coefficient to obtain the corrected asset valuation value, which is then used as the final asset valuation value of the target data asset.
[0154] It should be noted that the incremental revenue adjustment coefficient ranges from -1 to 1. For example, let the asset valuation value output by the asset valuation model be the original asset valuation value, and let the incremental revenue adjustment coefficient output by the model be 0.1. If the original asset valuation value is 1 million yuan, then the corrected asset valuation value is 1.1 million yuan.
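The worked example above (a coefficient of 0.1 turning 1 million yuan into 1.1 million yuan) is consistent with a multiplicative correction of the form original value × (1 + coefficient). The text does not state the formula explicitly, so the sketch below should be read as one plausible interpretation; the function name `corrected_valuation` is hypothetical.

```python
def corrected_valuation(original_value: float, coefficient: float) -> float:
    """Apply the adjustment coefficient as a relative correction (assumed form)."""
    # The coefficient is stated to range from -1 to 1.
    if not -1.0 <= coefficient <= 1.0:
        raise ValueError("coefficient must lie in [-1, 1]")
    return original_value * (1.0 + coefficient)


# Example from the text: 1 million yuan with a coefficient of 0.1.
final_value = corrected_valuation(1_000_000, 0.1)
```

Under this reading, a coefficient of 0 leaves the valuation unchanged and a coefficient of -1 reduces it to zero.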
[0155] The training method for the incremental revenue adjustment coefficient setting model includes:
[0156] A dataset for setting incremental revenue adjustment coefficients is constructed in advance. The dataset includes TJ groups of incremental revenue adjustment coefficient setting data and the TJ corresponding incremental revenue adjustment coefficients, where TJ is a positive integer. The incremental revenue adjustment coefficient setting data includes actual call and usage data. The dataset is divided into a training set and a validation set. The training set is used to learn the parameters of the incremental revenue adjustment coefficient setting model, and the validation set is used to monitor the model's generalization performance and overfitting in real time.
[0157] A node-output graph neural network architecture is used as the incremental revenue adjustment coefficient setting model. The incremental revenue adjustment coefficient setting data is standardized and vectorized before being input into the node-output graph neural network architecture, which consists of an input layer, hidden layers, and an output layer. Each hidden layer extracts features using a non-linear activation function, and the output layer uses a Softmax activation function to obtain the probability distribution corresponding to each incremental revenue adjustment coefficient. Finally, the incremental revenue adjustment coefficient corresponding to the highest probability is taken as the prediction result of the incremental revenue adjustment coefficient setting model. During training, the cross-entropy loss function is used as the optimization objective, and a gradient descent-type optimization algorithm is used to update the network weights. An early stopping strategy is implemented: when the prediction accuracy on the validation set reaches or exceeds the prediction accuracy threshold, the incremental revenue adjustment coefficient setting model is considered to have converged, and training is terminated. For example, the prediction accuracy threshold can be set to 95% in this application.
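The output stage described above (Softmax over candidate coefficients, selection of the most probable one, cross-entropy as the training loss) can be sketched in isolation. The grid in `CANDIDATE_COEFFICIENTS` is an assumption, since the text does not say how the range from -1 to 1 is discretized, and the `logits` values are made-up stand-ins for what the hidden layers would produce.

```python
import math

# Assumed discretization of the coefficient range [-1, 1].
CANDIDATE_COEFFICIENTS = [-1.0, -0.5, 0.0, 0.5, 1.0]


def softmax(logits):
    """Numerically stable Softmax over raw output-layer scores."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_coefficient(logits, candidates=CANDIDATE_COEFFICIENTS):
    """Return the candidate with the highest probability, plus all probabilities."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return candidates[best], probs


def cross_entropy(probs, true_index):
    """Cross-entropy loss for a single sample with a one-hot target."""
    return -math.log(probs[true_index])


logits = [0.2, 0.1, 1.0, 2.5, 0.3]      # hypothetical output-layer scores
coefficient, probs = select_coefficient(logits)
loss_if_correct = cross_entropy(probs, 3)
loss_if_wrong = cross_entropy(probs, 0)
```

The loss is smaller when the labeled coefficient is the one the model already favors, which is what gradient-based training pushes toward.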
[0158] It should be noted that the reason for collecting actual call and usage data of the target data asset in various business systems within the enterprise in this invention is that relying solely on external or static indicators such as industry orientation type, application scenario matching rate, supply and demand relationship value, and transaction price cannot accurately reflect the actual contribution of the data asset in the enterprise's real business operations. By collecting and statistically analyzing usage behavior data such as access time, access source system, business scenario identifier, number of calls, returned data volume, and interface execution time, the actual usage of the data asset can be characterized from multiple dimensions such as usage frequency, usage breadth, usage depth, and performance.
[0159] For example, data assets that are called frequently, cover many business systems, are frequently called in key business scenarios, and return a stable amount of data usually have higher practical value in terms of business decision support, operational efficiency improvement, or cost reduction; conversely, data assets that are called less frequently or are only used sporadically in edge scenarios have relatively limited actual monetization value.
[0160] The aforementioned actual call and usage data are correlated with the incremental revenue adjustment coefficient through a pre-built incremental revenue adjustment coefficient setting model. Specifically, the incremental revenue adjustment coefficient setting model takes indicators such as the number of calls, the number of covered scenarios, the call intensity per unit time, the scale of returned data, and the interface execution time as input features. Through a rule engine and machine learning algorithms, it maps usage intensity and usage effect to the potential incremental revenue that the asset can bring.
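To make the input features listed above concrete, the sketch below aggregates raw usage records into those indicators. The `UsageRecord` fields, the system and scenario names, and the choice of one hour as the unit of time are assumptions for illustration; the text names the indicators but does not define how they are computed.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class UsageRecord:
    access_time: datetime
    source_system: str
    scenario_id: str
    call_count: int
    returned_rows: int
    exec_ms: float


def usage_features(records):
    """Aggregate usage records into model input features."""
    if not records:
        raise ValueError("no usage records")
    total_calls = sum(r.call_count for r in records)
    times = [r.access_time for r in records]
    # Observation window in hours, floored at one hour to avoid division by zero.
    span_hours = max((max(times) - min(times)).total_seconds() / 3600.0, 1.0)
    return {
        "total_calls": total_calls,
        "covered_scenarios": len({r.scenario_id for r in records}),
        "source_systems": len({r.source_system for r in records}),
        "calls_per_hour": total_calls / span_hours,
        "mean_returned_rows": sum(r.returned_rows for r in records) / len(records),
        "mean_exec_ms": sum(r.exec_ms for r in records) / len(records),
    }


records = [
    UsageRecord(datetime(2026, 1, 5, 9, 0), "crm", "churn_scoring", 40, 1200, 85.0),
    UsageRecord(datetime(2026, 1, 5, 11, 0), "erp", "inventory_planning", 10, 300, 120.0),
    UsageRecord(datetime(2026, 1, 5, 13, 0), "crm", "churn_scoring", 30, 900, 95.0),
]
features = usage_features(records)
```

These features would then be standardized and vectorized before being passed to the coefficient setting model.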
[0161] For example, when the target data asset is used frequently and stably in multiple high-value scenarios, the incremental revenue adjustment coefficient output by the incremental revenue adjustment coefficient setting model is increased accordingly to amplify the original asset valuation value. Conversely, when the data asset is used less or mainly in low-value scenarios, the incremental revenue adjustment coefficient output by the incremental revenue adjustment coefficient setting model is decreased accordingly to suppress the original asset valuation value. Through the above-mentioned correlation settings, this invention can introduce a dynamic adjustment mechanism in the asset valuation process: "the more it is used and the greater its contribution, the higher the asset valuation value will be; the less it is used and the limited its contribution, the lower the asset valuation value will be." This makes the final asset valuation value more closely match the incremental revenue level of the data asset in actual business operations, effectively supporting the rationality and superiority of the technical solution of this invention.
[0162] Example 3
[0163] As shown in Figure 3, this embodiment provides a data asset valuation method, including:
[0164] Collect industry guidance documents with the same theme as data assets from the industry guidance documents published on information release websites;
[0165] Analyze the industry guidance documents to obtain an industry document set;
[0166] Analyze the industry document set to determine the industry orientation type;
[0167] Extract a set of standardized scenarios from the industry document set, match the set of standardized scenarios with the data asset scenario set, and obtain the application scenario matching rate;
[0168] Collect a market dataset; the market dataset includes market data for Q companies; the companies are those corresponding to the data asset theme and the data asset industry.
[0169] The market dataset is analyzed to obtain financial analysis data, which includes the average total revenue difference, the average total cost difference, the average net profit difference, and the average gross profit margin difference.
[0170] Financial analysis data is input into a pre-trained supply and demand model to obtain supply and demand values;
[0171] Collect the transaction prices of data assets from the historical period to the present;
[0172] Calculate the average transaction price from the collected data asset transaction prices;
[0173] The asset valuation data is input into a pre-trained asset valuation model to obtain the asset valuation value. The asset valuation data includes industry orientation type, application scenario matching rate, supply and demand value, and average transaction price.
[0174] The above description is merely a specific embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the technical scope disclosed in the present invention should be included within the scope of protection of the present invention. Therefore, the scope of protection of the present invention should be determined by the scope of the claims.
[0175] In conclusion, the above description is only a preferred embodiment of the present invention and is not intended to limit the present invention. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the protection scope of the present invention.
Claims
1. A data asset valuation method, characterized in that it includes: defining the data asset theme, the data asset industry, and the data asset scenario set; collecting industry guidance documents with the same theme as the data asset from the industry guidance documents published on information release websites; analyzing the industry guidance documents to obtain an industry document set; analyzing the industry document set to determine the industry orientation type; extracting a standardized scenario set from the industry document set, and matching the standardized scenario set with the data asset scenario set to obtain the application scenario matching rate; collecting a market dataset, where the market dataset includes market data for Q companies and the companies are those corresponding to the data asset theme and the data asset industry; analyzing the market dataset to obtain financial analysis data, which includes the average total revenue difference, the average total cost difference, the average net profit difference, and the average gross profit margin difference; inputting the financial analysis data into a pre-trained supply and demand model to obtain a supply and demand value; collecting the transaction prices of the data asset from the historical period to the present; calculating the average transaction price from the collected transaction prices; and inputting asset valuation data into a pre-trained asset valuation model to obtain the asset valuation value, where the asset valuation data includes the industry orientation type, the application scenario matching rate, the supply and demand value, and the average transaction price.
2. The data asset valuation method according to claim 1, characterized in that, The methods for obtaining the industry document set include: Step 1: Convert the industry guidance document format into a parsable text format; Step 2: Predefine a stop word list; Step 3: Use natural language processing technology to divide the industry guidance document according to subheadings to obtain information subheadings and the information content corresponding to the information subheadings; let the initial value of i be 1, i∈I, where I is the total number of information contents and i is the information content index; Step 4: Remove the corresponding words from the industry guidance document based on the stop word list; Step 5: Use natural language processing techniques to remove punctuation and special characters from the i-th piece of information content, and obtain a set of information keywords by word segmentation of the information content; Step 6: Add the information subheadings and keyword sets to the industry document set; Step 7: Let i = i + 1. If i is less than or equal to I, then execute steps 4 to 6; if i is greater than I, then end.
3. The data asset valuation method according to claim 1, characterized in that, The method for obtaining the industry orientation type is as follows: Step 1: Obtain the current year at the time of asset valuation; Step 2: Obtain the publication years of the industry guidance documents from the industry document set; Step 3: Subtract each publication year from the current year to obtain the year differences; Step 4: Select the largest year difference as the year difference used to determine the industry orientation type; Step 5: Determine the industry orientation type based on the year difference.
4. The data asset valuation method according to claim 3, characterized in that, The method for determining the industry orientation type based on the year difference is as follows: Two year determination thresholds $T_1$ and $T_2$ are preset, where $T_1$ is less than $T_2$; if the year difference is less than or equal to $T_1$, the industry orientation type is short-term oriented; if the year difference is greater than $T_1$ and less than or equal to $T_2$, the industry orientation type is medium-term oriented; if the year difference is greater than $T_2$, the industry orientation type is long-term oriented.
5. The data asset valuation method according to claim 1, characterized in that, The method for extracting a set of standardized scenarios from an industry document set includes: Step 1: Match the information subheadings in the data asset industry with the information subheadings in the industry document set in turn to obtain information subheadings that are the same as those in the data asset industry; Step 2: Obtain the set of information keywords corresponding to the information subheadings; Step 3: Use natural language processing technology to extract industry application scenarios from the set of information keywords, and denote the number of industry application scenarios as M; the M industry application scenarios constitute the standardized scenario set.
6. The data asset valuation method according to claim 1, characterized in that, The method for matching the standardized scenario set with the data asset scenario set to obtain the application scenario matching rate includes: Step 1: The initial value of j is preset to 1, j∈M; the number of matching industry application scenarios and data asset scenarios is preset to num; the initial value of num is 0; Step 2: Obtain the j-th industry application scenario in the set of standardized scenarios, and match the industry application scenario with the data asset scenarios in the set of data asset scenarios in turn; if the industry application scenario and the data asset scenario are the same, the match is successful, and let num = num + 1; Step 3: Let j = j + 1; if j is less than or equal to M, then continue to Step 2; if j is greater than M, then proceed to Step 4. Step 4: Calculate the application scenario matching rate by dividing num by M.
7. The data asset valuation method according to claim 1, characterized in that, The training methods for the supply and demand model include: A financial dataset is pre-collected, including financial analysis data and corresponding supply and demand values. The financial dataset is divided into a training set and a test set. A classifier is constructed, using the financial analysis data in the training set as input to the supply and demand model and the supply and demand values in the training set as output. The classifier is trained to obtain an initial classifier. The initial classifier is then tested using the test set, and the output classifier that meets the preset accuracy is used as the supply and demand model. The supply and demand model is either a Naive Bayes model or a Support Vector Machine model.
8. The data asset valuation method according to claim 1, characterized in that, The method for calculating the average transaction price of data assets includes: Step 1: Create an asset transaction price set by compiling historical asset transaction prices from the past to the present. Step 2: Calculate the average transaction price within the asset transaction price set.
9. The data asset valuation method according to claim 1, characterized in that, The training methods for the asset valuation model include: An asset dataset is pre-collected, including asset valuation data and the corresponding asset valuation values. The asset dataset is divided into a training set and a test set. A classifier is constructed, using the asset valuation data from the training set as input to the asset valuation model and the asset valuation values from the training set as output. During the training of the asset valuation model, minimizing the cross-entropy loss function is used as the optimization objective. An early stopping strategy is employed to monitor the performance on the validation set. The model parameters are adjusted continuously, and training stops when the prediction accuracy on the test set reaches a prediction accuracy threshold. The asset valuation model is a gradient boosting tree model.
10. The data asset valuation method according to claim 1, characterized in that, The method for collecting the industry guidance documents is as follows: Use data scraping tools to extract industry guidance documents with the same theme as the data assets from information release websites; the data scraping tools include API interfaces and RSS subscriptions.
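For illustration only, the year-difference rule of claims 3 and 4 and the matching-rate loop of claim 6 can be sketched as follows. The default threshold values `short_max=1` and `medium_max=3`, the function names, and the sample scenarios are assumptions; the claims require only two preset thresholds with the first smaller than the second.

```python
def largest_year_difference(current_year, publication_years):
    """Claim 3: largest difference between the current year and a publication year."""
    return max(current_year - year for year in publication_years)


def orientation_type(year_diff, short_max=1, medium_max=3):
    """Claim 4: map the year difference to an orientation type (thresholds assumed)."""
    if short_max >= medium_max:
        raise ValueError("the first threshold must be less than the second")
    if year_diff <= short_max:
        return "short-term"
    if year_diff <= medium_max:
        return "medium-term"
    return "long-term"


def scenario_matching_rate(standardized_scenarios, asset_scenarios):
    """Claim 6: share of the M standardized scenarios found in the asset scenario set."""
    if not standardized_scenarios:
        raise ValueError("the standardized scenario set is empty")
    asset_set = set(asset_scenarios)
    num = sum(1 for scenario in standardized_scenarios if scenario in asset_set)
    return num / len(standardized_scenarios)


year_diff = largest_year_difference(2026, [2025, 2024, 2021])
orientation = orientation_type(year_diff)
rate = scenario_matching_rate(
    ["credit scoring", "fraud detection", "marketing", "risk control"],
    ["fraud detection", "risk control", "logistics"],
)
```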