Quality assurance data transaction method and system based on Stackelberg game and crowd-sourced perception

By using KANN-DBSCAN clustering and the Stackelberg game model, the problem of balancing data quality and participant interests in collective intelligence sensing data transactions is solved, achieving effective protection of data quality and optimal matching of interests, thereby improving the fairness and efficiency of transactions.

CN122241279A · Pending · Publication Date: 2026-06-19 · GUIZHOU UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
GUIZHOU UNIV
Filing Date
2026-03-16
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

In existing data trading for collective intelligence sensing, data quality is difficult to guarantee, and the interests of participants are difficult to balance. There is a lack of effective data quality assessment methods and incentive mechanisms, which makes it difficult to improve the fairness and efficiency of the transaction.

Method used

The KANN-DBSCAN clustering algorithm is used to divide the data into regions. Combined with the Stackelberg game model, a data quality assessment and benefit balance mechanism is constructed. Data quality is assessed by data scarcity and similarity. The task allocation and pricing strategy are optimized by greedy strategy algorithm and Newton's iteration method to ensure that both data demanders and providers can obtain the best profits.

Benefits of technology

It effectively ensures data quality and achieves optimal matching of interests between supply and demand, significantly improving the fairness and efficiency of collective sensing data transactions.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure CN122241279A_ABST

Abstract

This application relates to the field of IoT data and discloses a quality assurance data transaction method and system based on Stackelberg game theory and crowdsourced sensing. The method includes: using clustering to divide the data into regions and eliminate fake data; quantifying data scarcity with probability matrices and data similarity with Euclidean distances, and fusing the two with weights into a data quality assessment formula; constructing a Stackelberg game model that defines the payoff strategies of both parties; introducing a reputation mechanism to reconstruct the task allocation problem, proving it to be a 0-1 knapsack problem, and solving for the demand side's optimal allocation strategy with a greedy algorithm; and solving for the provider's optimal bidding strategy with Newton's iteration method. Nash equilibrium analysis is performed for both the single-provider and the multi-provider cases to ensure that both parties obtain optimal profits and complete the data transaction. Through data quality assessment and game-theoretic benefit equilibrium design, this application achieves effective data quality assurance and optimal matching of interests between both parties in crowdsourced sensing data transactions, improving their fairness and efficiency.

Description

Technical Field

[0001] This application belongs to the field of Internet of Things (IoT) data, and specifically relates to a quality assurance data transaction method and system based on Stackelberg game theory and crowd sensing.

Background Technology

[0002] In the information age, with the widespread adoption of smart devices, users generate massive amounts of data daily. This vast amount of data contains rich information and possesses enormous potential commercial and strategic value. More and more companies are beginning to improve their business by purchasing third-party data. Data trading is a potential solution to meet the exponentially growing data demand in the field of artificial intelligence, leading to the emergence of big data trading platforms. However, conventional data trading systems struggle to acquire real-time, highly customized data.

[0003] Crowdsourced sensing data trading, as a novel form of data trading, acquires data by recruiting a large number of data providers distributed across different regions. These providers collect data using sensors on mobile terminals such as smartphones and wearable devices. Due to the wide distribution and high mobility of data providers, coupled with the rich array of sensors on mobile devices, the crowdsourced sensing model can provide real-time, broad-coverage, and multi-source data support. This data trading paradigm provides efficient data support for decision-making in fields such as urban planning, environmental monitoring, health management, and traffic forecasting.

[0004] In existing technologies, data trading for crowdsourced sensing faces two core problems: First, data quality is difficult to guarantee. Low-quality data (including poor-quality data due to the provider's lack of professional capabilities or maliciously submitted forged data) can harm the interests of the demand side. Moreover, existing data quality assessment methods mostly focus on data similarity, ignoring the unique value of rare data and making them susceptible to interference from plagiarized data. Second, it is difficult to balance the interests of participants. Data providers need to pay higher costs to collect high-quality data and require sufficient incentives. However, excessive incentives may harm the interests of the demand side, while insufficient incentives will not attract providers to participate. Most existing incentive mechanisms do not fully consider the impact of data quality on the decision-making of both parties and lack an effective framework to achieve a balance of interests.

[0005] Therefore, existing technologies face an urgent and unresolved technical problem: although the industry recognizes that data quality assurance and balancing the interests of data demanders and data providers are crucial for crowd sensing data transactions, there is a lack of technical solutions that address both simultaneously. Specifically, there is a lack of (1) a comprehensive data quality assessment method that integrates data similarity and data scarcity and can resist interference from forged and plagiarized data, and (2) an incentive mechanism that accurately models the interaction between the two parties in the transaction and achieves a balance of interests. This makes it difficult to improve the fairness and efficiency of crowd sensing data transactions.

[0006] It should be noted that the information disclosed in the background section above is only used to enhance the understanding of the background of this application, and therefore may include information that does not constitute prior art known to those skilled in the art.

Summary of the Invention

[0007] To address or at least alleviate one or more of the above problems, a quality assurance data transaction method and system based on Stackelberg game theory and crowd sensing is provided. By integrating multi-dimensional data quality assessment with game-theoretic interest equilibrium design, it achieves effective data quality assurance and optimal matching of the interests of the supply and demand sides in crowd sensing data transactions, significantly improving their fairness and efficiency.

[0008] To achieve the above objectives, according to a first aspect of this application, a quality assurance data transaction method based on Stackelberg game theory and crowd sensing is provided, comprising the following steps:
Data quality assessment: the KANN-DBSCAN clustering algorithm is used to divide the data into regions based on the location information of the data collected by all data providers, so as to eliminate suspected forged data; a probability matrix is calculated from the data entry matrix to quantify data scarcity, and a Euclidean distance matrix is calculated to quantify data similarity; a data quality formula is formed from data scarcity, data similarity, and weights to assess data quality.
Constructing a Stackelberg game model to balance the interests of participants: the revenue strategies of data demanders and data providers are defined, and the transactions between them are modeled as a Stackelberg game; a task allocation problem model for data demanders is constructed based on the Stackelberg game model, and a reputation mechanism is introduced to reconstruct it; the task allocation problem model is shown to be a 0-1 knapsack problem, and the optimal task allocation strategy of data demanders is calculated with a greedy strategy algorithm; based on this strategy, the profit function of data providers is obtained, and the optimal bidding strategy, at which the first derivative of the profit function is zero, is obtained using the Newton iteration method.
Nash equilibrium analysis: the strategies of data demanders and data providers are analyzed for both the single-provider and the multi-provider cases to ensure that both parties obtain the best profit and complete the data transaction.

[0009] To achieve the above objectives, according to a second aspect of this application, a quality assurance data transaction system based on Stackelberg game theory and crowd sensing is provided, comprising:
a data quality assessment module, configured to assess data quality: the KANN-DBSCAN clustering algorithm is used to divide the data into regions based on the location information of the data collected by all data providers, so as to eliminate suspected forged data; a probability matrix is calculated from the data entry matrix to quantify data scarcity, and a Euclidean distance matrix is calculated to quantify data similarity; a data quality formula is formed from data scarcity, data similarity, and weights to assess data quality;
a game model building module, configured to construct a Stackelberg game model to balance the interests of participants: the revenue strategies of data demanders and data providers are defined, and the transactions between them are modeled as a Stackelberg game; a task allocation problem model for data demanders is constructed based on the Stackelberg game model, and a reputation mechanism is introduced to reconstruct it; the task allocation problem model is shown to be a 0-1 knapsack problem, and the optimal task allocation strategy of data demanders is calculated with a greedy strategy algorithm; based on this strategy, the profit function of data providers is obtained, and the optimal bidding strategy, at which the first derivative of the profit function is zero, is obtained using the Newton iteration method; and
a Nash equilibrium analysis module, configured to perform Nash equilibrium analysis on the strategies of data demanders and data providers for both the single-provider and the multi-provider cases, ensuring that both parties obtain the best profit and complete the data transaction.

[0010] By adopting the above technical solution, this application has the following beneficial effects compared with the prior art. First, this invention proposes a data quality assessment scheme that considers both data similarity and data scarcity while taking into account the spatial characteristics of the data. Specifically, it uses a density clustering algorithm to cluster the spatial information of the data and measures data quality in terms of data scarcity and data similarity, based on the information entropy formula and the Euclidean distance formula, respectively. Second, this invention establishes a Stackelberg game model to simulate the interaction between the two parties. In the first stage, the data provider acts as the leader and determines the price for executing the data collection task; in the second stage, the data demander acts as the follower and, after observing the price, selects suitable data providers to execute tasks, aiming to maximize its overall profit under a fixed incentive budget. Third, for the data demander's task allocation problem in the game model, this invention proves that the problem is equivalent to the 0-1 knapsack problem and designs an algorithm to solve it. For the data provider's pricing problem, it finds that computing the optimal price is a nonlinear optimization problem and solves it numerically with the Newton-Raphson iteration method.

[0011] The specific embodiments of this application will be described in further detail below with reference to the accompanying drawings.

Attached Figure Description

[0012] The accompanying drawings, which form part of this application, are used to provide a further understanding of the application. The illustrative embodiments and descriptions of the application are used to explain the application, but do not constitute an undue limitation of the application. Obviously, the drawings described below are merely some embodiments, and those skilled in the art can obtain other drawings based on these drawings without creative effort.

[0013] In the accompanying drawings:
Figure 1 is a flowchart of the quality assurance data transaction method based on Stackelberg game theory and crowd sensing in this embodiment;
Figure 2 is a flowchart of the data quality assessment process in this embodiment;
Figure 3 is a flowchart of the optimal strategies of the two-player game in this embodiment;
Figure 4 is a schematic diagram of the results of cluster analysis of the dataset using the KANN-DBSCAN algorithm in this embodiment;
Figure 5 is a schematic diagram of the impact of data price and data quality on the data provider's profit in this embodiment;
Figure 6 is a schematic diagram of the impact of data quality on the data demander's profit in this embodiment;
Figure 7 is a schematic diagram of the quality assurance data transaction system based on Stackelberg game theory and crowd sensing in this embodiment.

Detailed Implementation

[0014] To make the objectives, technical solutions, and advantages of the embodiments of this application clearer, the technical solutions in the embodiments will be clearly and completely described below with reference to the accompanying drawings. The following embodiments are used to illustrate this application, but are not intended to limit the scope of this application.

[0015] Example 1: Referring to Figure 1, this application provides a quality assurance data transaction method based on Stackelberg game theory and crowd sensing, comprising the following steps:
S1. Data quality assessment: the KANN-DBSCAN clustering algorithm is used to divide the data into regions based on the location information of the data collected by all data providers, so as to eliminate suspected forged data; a probability matrix is calculated from the data entry matrix to quantify data scarcity, and a Euclidean distance matrix is calculated to quantify data similarity; a data quality formula is formed from data scarcity, data similarity, and weights to assess data quality.
S2. Constructing a Stackelberg game model to balance the interests of the participants: the revenue strategies of data demanders and data providers are defined, and the transactions between them are modeled as a Stackelberg game; a task allocation problem model for data demanders is constructed based on the Stackelberg game model, and a reputation mechanism is introduced to reconstruct it; the task allocation problem model is shown to be a 0-1 knapsack problem, and the optimal task allocation strategy of data demanders is calculated with a greedy strategy algorithm; based on this strategy, the profit function of data providers is obtained, and the optimal bidding strategy, at which the first derivative of the profit function is zero, is obtained using the Newton iteration method.
S3. Nash equilibrium analysis: the strategies of data demanders and data providers are analyzed for both the single-provider and the multi-provider cases to ensure that both parties obtain the best profit and complete the data transaction.

[0016] Referring to Figure 2, this embodiment mainly uses the KANN-DBSCAN clustering algorithm to divide the data into regions and combines data scarcity and data similarity to measure data quality.

[0017] There are two common methods for dividing regions: (1) dividing the sensing area into grids of equal size; (2) determining regions based on the activity range of the data providers. The grid method is simpler and more direct. However, in actual data collection, the spatial distribution of data is often discrete and uneven: densely populated areas such as urban, residential, and commercial areas often have more providers, while industrial parks and rural areas have fewer. Region division should therefore be irregular and determined dynamically according to the current data density of each region. On this basis, this invention uses the KANN-DBSCAN clustering algorithm to divide the data into regions. KANN-DBSCAN clusters data according to the density of points within a neighborhood and can handle datasets of different densities and shapes. Using the distribution characteristics of the dataset itself, it generates candidate values of the neighborhood radius Eps with the K-average nearest neighbor method, and takes the Eps and minimum sample size MinPts values that yield the fewest noise points within the stable interval of the cluster number as the optimal parameters, so that high-accuracy clustering results can be obtained. The specific region partitioning steps are as follows:

[0018] The location information corresponding to the datasets collected by all data providers is input into the KANN-DBSCAN clustering algorithm.

[0019] Based on the spatial distribution of the input data, the KANN-DBSCAN algorithm calculates the K-nearest neighbor distance of each data point in the dataset, that is, the distance from the point to its K-th nearest neighboring data point. It then averages the K-nearest neighbor distances of all data points to obtain the K-average nearest neighbor distance of the dataset.
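The K-average nearest neighbor distance described above can be sketched in Python as follows. This is a minimal illustration, assuming planar coordinates and Euclidean distance (raw latitude/longitude would normally be projected, or measured with a great-circle distance, first); the function name `k_average_nn_distance` is hypothetical and not taken from the source.

```python
import math


def k_average_nn_distance(points, k):
    """Average, over all points, of each point's distance to its k-th nearest neighbour.

    points: sequence of (x, y) tuples; requires 1 <= k <= len(points) - 1.
    """
    if not 1 <= k < len(points):
        raise ValueError("k must satisfy 1 <= k <= len(points) - 1")
    total = 0.0
    for i, p in enumerate(points):
        # Distances from p to every other point, nearest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        total += dists[k - 1]
    return total / len(points)


# One Eps candidate per K value, as in KANN-DBSCAN.
sample = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
eps_candidates = [k_average_nn_distance(sample, k) for k in range(1, len(sample))]
```

The brute-force loop is O(n² log n) and is only meant for small samples such as the 1000 randomly sampled points used in the example; a spatial index would be the usual choice for larger datasets.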

[0020] The K-average nearest neighbor distances corresponding to different K values are taken in turn as the neighborhood radius Eps and input into the DBSCAN algorithm to perform cluster analysis on the dataset, yielding the number of clusters generated for each K value. When the number of clusters generated in three consecutive iterations remains the same, the clustering result is considered stable, and the corresponding number of clusters is denoted as N. Further K values are then selected until the number of clusters generated is no longer N, and the maximum K value for which the number of clusters equals N is selected as the optimal K value. The K-average nearest neighbor distance corresponding to this K value is the optimal Eps parameter.
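The stability rule for choosing K can be sketched as a scan over precomputed cluster counts. The function name and the input layout (one cluster count per K, starting at K = 1) are assumptions for illustration; the counts themselves would come from running DBSCAN with each Eps candidate.

```python
def select_optimal_k(cluster_counts, stable_runs=3):
    """Pick the optimal K from cluster counts using the stability rule above.

    cluster_counts[i] is the number of clusters DBSCAN produced with the
    K-average nearest neighbour distance for K = i + 1 used as Eps.
    Returns (optimal_k, n_clusters), or None if no stable run is found.
    """
    run = 1
    for i in range(1, len(cluster_counts)):
        # Length of the current run of identical cluster counts.
        run = run + 1 if cluster_counts[i] == cluster_counts[i - 1] else 1
        if run >= stable_runs:
            n_clusters = cluster_counts[i]
            j = i
            # Keep increasing K while the cluster count is still N.
            while j + 1 < len(cluster_counts) and cluster_counts[j + 1] == n_clusters:
                j += 1
            return j + 1, n_clusters
    return None


# Hypothetical cluster counts for K = 1..7.
print(select_optimal_k([5, 4, 3, 3, 3, 3, 2]))  # (6, 3)
```

Here the count first stays at 3 for K = 3, 4, 5, remains 3 at K = 6, and drops at K = 7, so K = 6 is returned.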

[0021] Based on the data quality assessment criterion that "if the area where a data point is located lacks other collected records, the data is determined to be fake data," the other parameter of the KANN-DBSCAN clustering algorithm, the minimum sample size MinPts, is set to 2. This means that a valid cluster must contain at least 2 data points and that isolated points are treated as noise points, so that isolated, suspected fake data points are eliminated.
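A minimal DBSCAN sketch with `min_pts=2` shows how isolated points end up labeled as noise (-1) and can be discarded as suspected fake data. This is a plain O(n²) illustration written for clarity, not the source's KANN-DBSCAN implementation; in practice a library implementation such as scikit-learn's `DBSCAN` (whose `labels_` also marks noise as -1) would typically be used.

```python
import math


def dbscan(points, eps, min_pts=2):
    """Minimal DBSCAN. Returns one label per point: cluster id >= 0, or -1 for noise."""
    n = len(points)
    labels = [None] * n

    def neighbors(i):
        # The eps-neighbourhood includes the point itself.
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1  # provisionally noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins the cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:  # core point: expand the cluster through it
                queue.extend(j_nbrs)
    return labels


pts = [(0, 0), (0, 1), (10, 10), (10, 11), (50, 50)]
labels = dbscan(pts, eps=1.5, min_pts=2)
kept = [p for p, lab in zip(pts, labels) if lab != -1]  # drop isolated points
```

With `min_pts=2`, a point survives only if at least one other record lies within Eps, which is exactly the "no other collected records nearby" criterion above.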

[0022] After clustering is completed, the quality of each data entry collected by a data provider in a given round is defined by a data quality formula that fuses, with a weight, a scarcity-based score and a similarity-based score: [formula missing]. In this formula, the final quality score of each data point collected by the data provider in that round is composed of the data quality measured by data scarcity, which is evaluated based on information content and ranges over [value range missing], and the data quality measured by data similarity, which ranges over [value range missing]; the weight is denoted as [symbol missing], and its value is [value missing].
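Because the fusion formula, its value ranges, and its weight are missing from the text, the following is only a hypothetical sketch assuming a convex combination Q = w·Q_scarcity + (1 − w)·Q_similarity with w in [0, 1]; the function and parameter names are illustrative.

```python
def fused_quality(q_scarcity, q_similarity, w):
    """Hypothetical weighted fusion of the two quality scores (assumed convex combination)."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("weight w must lie in [0, 1]")
    return w * q_scarcity + (1.0 - w) * q_similarity


score = fused_quality(0.8, 0.4, w=0.25)  # approximately 0.5
```

A convex combination only makes sense if both scores are on comparable scales, which is presumably why the source specifies a value range for each of them.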

[0023] The steps of the quality assessment scheme are as follows. Assume that in the $t$-th round of data collection tasks, multiple data providers collect $h$ data entries, each containing $n$ attributes (such as temperature, humidity, air pollution index, etc.). These data entries can be represented as the matrix

$$X = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{h1} & x_{h2} & \cdots & x_{hn} \end{bmatrix},$$

where $x_{11}, x_{12}, \ldots, x_{1n}$ are the values of the 1st, 2nd, ..., $n$-th attributes of the first data entry; $x_{21}, x_{22}, \ldots, x_{2n}$ are the values of the 1st, 2nd, ..., $n$-th attributes of the second data entry; and $x_{h1}, x_{h2}, \ldots, x_{hn}$ are the values of the 1st, 2nd, ..., $n$-th attributes of the $h$-th data entry.

[0024] As mentioned above, data quality is measured from two aspects. The first aspect measures data quality in terms of data scarcity. The probability matrix of the data entries is calculated as

$$P = \begin{bmatrix} p_{11} & p_{12} & \cdots & p_{1n} \\ \vdots & \vdots & \ddots & \vdots \\ p_{h1} & p_{h2} & \cdots & p_{hn} \end{bmatrix}, \qquad p_{ab} = \frac{f_{ab}}{h},$$

where $f_{ab}$ is the frequency (number of occurrences) of the value $x_{ab}$ in the $b$-th column and $h$ is the total number of entries collected in this round. The relationship between data quality and data scarcity is then calculated from the probability matrix using the information content $-\log_m p_{ab}$ of each attribute value: [formula missing]. Here $m$ is the base that defines the unit of information content, used to quantify the contribution of data scarcity; $x_{ab}$ is the value of the $b$-th attribute of the $a$-th data entry; and $n$ is the number of attributes of each data entry.
The second aspect measures data quality in terms of data similarity. Since multiple data providers in the same region may collect completely identical data in this scenario, this invention uses the Euclidean distance to measure the similarity between data entries. The Euclidean distance matrix between the data entries is

$$D = \begin{bmatrix} 0 & d_{12} & \cdots & d_{1h} \\ d_{21} & 0 & \cdots & d_{2h} \\ \vdots & \vdots & \ddots & \vdots \\ d_{h1} & d_{h2} & \cdots & 0 \end{bmatrix},$$

where $d_{12}$, $d_{1h}$, $d_{21}$, $d_{h1}$, and $d_{h2}$ are the Euclidean distances between the first and second, the first and $h$-th, the second and first, the $h$-th and first, and the $h$-th and second data entries, respectively. The Euclidean distance between the $a$-th and $b$-th data entries is

$$d_{ab} = \sqrt{(x_{a1} - x_{b1})^2 + \cdots + (x_{an} - x_{bn})^2} = \sqrt{\sum_{k=1}^{n} (x_{ak} - x_{bk})^2},$$

where $x_{a1}$ and $x_{b1}$ are the values of the first attribute of the $a$-th and $b$-th data entries, and $x_{ak}$ and $x_{bk}$ are their values of the $k$-th attribute. The relationship between data quality and data similarity is then defined from these distances: [formula missing].
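The probability matrix and the Euclidean distance matrix can be computed as in the sketch below. Both matrices follow the definitions in the text; how each row is reduced to a single score cannot be recovered from the source, so the sum of information content −log_m p for scarcity and the mean distance to the other entries for similarity are assumptions chosen for illustration. Attributes are assumed to be numeric and already on comparable scales, and all function names are hypothetical.

```python
import math
from collections import Counter


def probability_matrix(entries):
    """p[a][b] = occurrences of entries[a][b] in column b, divided by h (number of entries)."""
    h, n = len(entries), len(entries[0])
    counts = [Counter(row[b] for row in entries) for b in range(n)]
    return [[counts[b][row[b]] / h for b in range(n)] for row in entries]


def scarcity_scores(entries, m=2):
    """Assumed aggregation: sum over attributes of the information content -log_m(p)."""
    return [sum(-math.log(p, m) for p in row) for row in probability_matrix(entries)]


def distance_matrix(entries):
    """d[a][b] = Euclidean distance between data entries a and b over all attributes."""
    return [[math.dist(ra, rb) for rb in entries] for ra in entries]


def similarity_scores(entries):
    """Assumed aggregation: mean distance from each entry to the other h - 1 entries."""
    h = len(entries)
    return [sum(row) / (h - 1) for row in distance_matrix(entries)]


# Three (temperature, humidity) entries; the first two are duplicates.
data = [(20.0, 50.0), (20.0, 50.0), (25.0, 60.0)]
scarcity = scarcity_scores(data)
similarity = similarity_scores(data)
```

Under these assumptions the duplicated entries receive identical, lower scores on both measures, while the distinct third entry scores higher on both, which matches the intent of rewarding rare data and penalizing copies.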

[0025] This invention uses a dataset of the trajectories of 320 taxis in a selected data area (LM) over 30 days, from February 1, 2014 to March 2, 2014; each trajectory is updated with its latitude and longitude approximately every 7 seconds. Each record has the format: taxi ID, date, time, latitude, longitude.

[0026] Referring to Figure 4, which visualizes the results of cluster analysis on the dataset using the KANN-DBSCAN algorithm: 1000 data points were randomly sampled from the dataset, and the K-average nearest neighbor distance of these 1000 data points was calculated for each K value. The K-average nearest neighbor distance corresponding to each K value was then used as the Eps parameter to cluster the sampled data with KANN-DBSCAN, and the number of clusters generated was plotted against the K value. When the number of clusters generated in three consecutive iterations remains the same, the clustering result is considered stable, and the corresponding number of clusters is denoted as N. Further K values are then selected until the number of clusters generated is no longer N, and the maximum K value for which the number of clusters is N is taken as the optimal K value; the K-average nearest neighbor distance corresponding to this K value is the optimal Eps parameter. Finally, the optimal Eps is used as the neighborhood radius, and MinPts is set to 2.

[0027] Example 2: Referring to Figure 3, this embodiment mainly illustrates a method for balancing the interests of the stakeholders based on the Stackelberg game model. Data requesters initiate sensing tasks and typically need to offer incentives to drive data providers' participation. A reasonable incentive allocation scheme brings higher profits to data requesters, while an unreasonable one wastes incentives, so data requesters must decide carefully how to allocate incentives. Data providers, in turn, execute the sensing tasks and set a price for task execution in order to be compensated. If the price is too high, most data requesters will not select them to execute tasks, resulting in low income; if the price is too low, they cannot earn high returns even if they obtain many tasks. These conflicting goals of the two roles hinder the smooth implementation of crowd sensing.

[0028] To explore a solution that balances the interests of all parties, this invention constructs the interactions between them as a two-stage Stackelberg game, based on their respective positions in the transaction. In the first stage, all data providers act as leaders, initially determining the price for performing the perception task and observing the data demanders' reactions. Based on this observation, they adjust their strategies to the most advantageous position for themselves. In contrast, the data demanders act as followers in the Stackelberg game. After observing the data providers' prices, they determine which providers to incentivize to participate in the perception task, ensuring that, with a fixed budget, the collected data maximizes their total profit.

[0029] Assume that all data requesters use a greedy strategy to select the data providers that perform their tasks, that is, within a fixed budget, each data requester $j$ selects the most suitable set of data providers $S_j$ to execute its task so as to maximize its total benefit. The data requester's strategy in the second-stage subgame can therefore be defined as

$$S_j^{*} = \arg\max_{S_j} U_j^{t}(S_j),$$

where $\arg\max$ is the operator returning the maximizing argument, $U_j^{t}$ is the profit obtained by data requester $j$ from its task in round $t$, and $S_j^{*}$ is the set of optimal data providers chosen by the data requester. Data providers determine the price of tasks based on their prediction of the data requesters' strategies, aiming to maximize their own profits. This strategy can be defined as

$$p_i^{*} = \arg\max_{p_i} U_i^{t}(p_i),$$

where $p_i^{*}$ is the optimal price for data provider $i$ to execute the task and $U_i^{t}$ is the profit earned by data provider $i$ from executing the task in round $t$. Assuming the participants are rational, their decisions should ensure a positive return on the transaction, i.e.,

$$U_j^{t} > 0, \qquad U_i^{t} > 0.$$

[0030] Through game theory, this invention aims to find an optimal strategy that maximizes the self-interest of all parties. More importantly, the optimal solution must be a Nash equilibrium, defined as follows: the strategy profile $(S_j^{*}, p_i^{*})$ forms a Nash equilibrium when the following inequalities are satisfied:

$$U_j^{t}(S_j^{*}, p_i^{*}) \ge U_j^{t}(S_j, p_i^{*}), \qquad U_i^{t}(S_j^{*}, p_i^{*}) \ge U_i^{t}(S_j^{*}, p_i),$$

where $S_j$ is any set of data providers that the data requester could select and $p_i$ is any price the data provider could charge for executing the task.

[0031] To obtain the Nash equilibrium, this invention employs backward induction, starting from the second stage of the Stackelberg game, namely the data demanders' task allocation problem.
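Backward induction can be illustrated with a deliberately simplified one-provider, one-requester game over a discrete price grid. The utilities here are hypothetical (requester payoff `value - price`, provider payoff `price - cost`) and are not the source's profit functions: the follower's best response is solved first, and the leader then picks the price that maximizes its profit given that response.

```python
def follower_accepts(price, value, budget):
    """Stage 2 (follower): buy iff the price fits the budget and the net benefit is non-negative."""
    return price <= budget and value - price >= 0


def leader_best_price(prices, cost, value, budget):
    """Stage 1 (leader): choose the price maximising profit given the follower's response."""
    best_price, best_profit = None, 0.0
    for p in prices:
        profit = p - cost if follower_accepts(p, value, budget) else 0.0
        if profit > best_profit:
            best_price, best_profit = p, profit
    return best_price, best_profit


grid = [i * 0.5 for i in range(41)]  # candidate prices 0.0, 0.5, ..., 20.0
print(leader_best_price(grid, cost=4.0, value=10.0, budget=8.0))  # (8.0, 4.0)
```

In this toy setting the provider raises its price until the first of the two follower constraints binds (here the budget of 8), which mirrors the "too high / too low" tension described in Example 2.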

[0032] Assume that the unit price at which each data provider performs the task is given. For a given set of data requesters, each data requester needs to determine which data providers to incentivize to participate in the sensing task. Specifically, under its budget constraint, the data requester needs to determine an optimal set of data providers and allocate incentives to them so as to maximize its own profit. This problem is formulated as an objective function with two constraints: [formula missing]. The first constraint states that the cost of executing the task cannot exceed the data requester's budget. The second constraint ensures the integrity of the task, that is, that the task released by the data requester is executed by the selected data providers. The decision variables are 0-1 variables indicating whether each data provider is selected to execute the task; the remaining symbols denote the set of data providers, the total budget of the data requester, and the total number of collection tasks.

[0033] In the above scenario, the profit of a data requester is closely linked to the quality of the sensing data it acquires. However, data quality is difficult to know in advance, before a transaction is completed. This invention therefore introduces a reputation mechanism that gradually estimates the quality level of the data collected by each data provider over multiple transaction cycles. Because this invention focuses on data quality assessment and on the mechanism for balancing the interests of both parties, the reputation mechanism adopts a scheme proposed in previous research, in which the reputation value of a data provider depends not only on the quality of its historically collected data but also on its potential to collect high-quality data. This design solves a practical problem: when some data providers have a history of very high-quality data, other potentially valuable data providers in the system would otherwise never be selected.

[0034] After data provider $i$ completes the $t$-th collection task, its reputation is defined as

$$r_i^{t} = \bar{q}_i^{t} + u_i^{t},$$

where $r_i^{t}$ is the reputation value of data provider $i$ after the $t$-th task execution, $\bar{q}_i^{t}$ is the average quality of the data collected by data provider $i$ in the first $t$ executions, and $u_i^{t}$ is the reputation upper bound term. The reputation upper bound accounts for the uncertainty of the estimate, giving data providers that were previously less likely to be selected a greater chance of being selected in this round.

[0035] $\bar{q}_i^{t}$ and $u_i^{t}$ are defined as follows:

$$\bar{q}_i^{t} = \frac{1}{n_i^{t}} \sum_{s=1}^{t} z_i^{s} q_i^{s}, \qquad n_i^{t} = \sum_{s=1}^{t} z_i^{s}, \qquad N = \sum_{i} n_i^{t},$$

and $u_i^{t}$ is given by [formula missing]. Here $N$ is the total number of times data providers have been selected to perform tasks, $n_i^{t}$ is the total number of times data provider $i$ has been selected to execute the task up to the $t$-th time, $\bar{q}_i^{t}$ is the average quality of the data collected by data provider $i$ in the first $t$ times, $q_i^{s}$ is the quality score of the data provided by data provider $i$ in the $s$-th task, and $z_i^{s}$ is a 0-1 variable indicating whether data provider $i$ was selected to perform the task the $s$-th time.
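The reputation update can be sketched as follows. The average-quality and selection-count terms follow the definitions in the text; because the upper-bound formula is missing, the sketch assumes a UCB1-style bonus sqrt(c · ln N / n_i) with a hypothetical exploration constant `c`. That assumption matches the stated intent that rarely selected providers receive a larger bonus, but it is not confirmed by the source.

```python
import math


def reputation(qualities, selected, total_selections, c=2.0):
    """Reputation of one provider: average quality plus an assumed UCB1-style bonus.

    qualities[s]: quality score in round s (ignored when the provider was not selected)
    selected[s]:  1 if the provider was selected in round s, else 0
    total_selections: N, the number of selections summed over all providers
    """
    n_i = sum(selected)
    if n_i == 0:
        return math.inf  # never selected yet: rank first so it gets tried
    mean_q = sum(q * z for q, z in zip(qualities, selected)) / n_i
    bonus = math.sqrt(c * math.log(total_selections) / n_i)
    return mean_q + bonus


r = reputation([0.8, 0.6, 0.0], [1, 1, 0], total_selections=4)
```

For two providers with the same average quality, the one selected fewer times gets the larger bonus, so it has a better chance of being chosen in the next round.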

[0036] With the reputation scores of all data providers as a reference, the data requester's task allocation problem can be reformulated as an optimization problem consisting of an objective function and its constraints.

[0037] The task allocation problem is a 0-1 knapsack problem over a set of items, one for each candidate data provider: the data requester's budget plays the role of the knapsack capacity, and the value and weight of each item correspond to the provider's benefit and task execution cost, respectively. Therefore, the data requester's task allocation problem is NP-complete (in its decision form). Since no polynomial-time algorithm is known for NP-complete problems, this invention proposes a greedy strategy-based algorithm to solve this problem.

[0038] In the greedy strategy algorithm proposed in this invention, as shown in Table 1, the data requester chooses, at each selection step, the data provider with the highest benefit value, that is, the one that yields the greatest benefit per unit of cost. The core idea of the greedy algorithm for the data requester's task allocation problem can be summarized as follows. At initialization, all selection variables are set to 0. In each iteration, among the data providers whose execution cost is less than the data requester's currently available budget, the algorithm selects the one with the largest current benefit as the task executor and reduces the budget by the cost that provider incurs in performing the task. The process terminates when the data requester's budget is exhausted, i.e., no remaining provider can be afforded. All selected data providers are then responsible for performing the corresponding sensing tasks.

[0039] Table 1: Greedy Strategy Algorithm:
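Because Table 1 itself is not reproduced here, the following Python sketch shows one way the greedy rule of [0038] could be implemented. Assumptions: each provider's benefit is a precomputed number (for instance a reputation-weighted value), ranking is by benefit per unit cost, a provider is affordable when its cost does not exceed the remaining budget, and only providers with positive benefit are considered; `greedy_allocation` is a hypothetical name.

```python
def greedy_allocation(benefits, costs, budget):
    """Greedy 0-1 task allocation by benefit per unit cost (sketch).

    benefits[i] and costs[i] are provider i's benefit and execution cost (costs > 0).
    Returns the 0-1 selection vector and the unspent budget.
    """
    n = len(benefits)
    x = [0] * n  # all selection variables start at 0
    remaining = budget
    while True:
        best, best_ratio = None, 0.0
        for i in range(n):
            # only unselected, affordable providers with a positive benefit are candidates
            if x[i] == 0 and costs[i] <= remaining and benefits[i] > 0:
                ratio = benefits[i] / costs[i]
                if best is None or ratio > best_ratio:
                    best, best_ratio = i, ratio
        if best is None:
            break  # no affordable provider with positive benefit remains
        x[best] = 1
        remaining -= costs[best]  # deduct the selected provider's cost from the budget
    return x, remaining
```

With benefits (6, 10, 12), costs (1, 2, 3) and a budget of 5, the sketch selects the first two providers (total benefit 16) and leaves 2 units unspent, whereas choosing the second and third would give 22. Like any ratio-greedy heuristic for the 0-1 knapsack problem, it runs quickly but is not guaranteed to reach the exact optimum.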

[0040] Pricing of task execution by data providers: Following the above approach, there are four possible scenarios for whether each data provider executes the task, expressed in terms of the total profit obtained by the data demander in the corresponding round. From these expressions, if the task is to be executed, constraints must be met that involve the cost required for the data provider to collect a unit of data and the two profit coefficients of the data demander. Since this invention focuses on how to facilitate smooth transactions, only the case in which the task is executed is considered, and the optimal pricing strategy of data providers is examined in that case. In that case, the data provider's profit formula simplifies to a function of its task execution price, with weighting coefficients as parameters. Taking the first and second derivatives of this profit with respect to the task execution price gives the expressions used in the analysis below.

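[0041] Under the execution condition above, the intermediate terms satisfy the inequalities derived from the preceding expressions, from which it follows that the second derivative of the data provider's profit with respect to its price is negative. The profit function is therefore strictly concave in the price, so the price at which its first derivative equals zero is the unique price that maximizes the data provider's profit.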

[0042] Substituting the above expressions into the first derivative, setting it equal to zero, taking the natural logarithm of both sides, and rearranging yields the first-order condition for the optimal task execution price. Because this condition is a nonlinear equation, a closed-form solution is not readily obtained, so this invention uses Newton's method to find a numerical solution. Define a function $f$ whose root is the optimal price, and compute its first derivative $f'$. The Newton-Raphson iteration then proceeds in four steps: Step 1: select an initial guess $x_0$; Step 2: compute the new iterate using $x_{k+1} = x_k - f(x_k)/f'(x_k)$, where $x_k$ and $x_{k+1}$ are the values at the k-th and (k+1)-th iterations; Step 3: repeat Step 2 until $x$ converges (i.e., $|x_{k+1} - x_k|$ is very small); Step 4: at convergence, take the converged value as the data provider's optimal task execution price.
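The four Newton-Raphson steps can be sketched in Python as a generic root finder applied to the first derivative of a profit function. The patent's actual profit expression is not reproduced in this text, so `optimal_price` uses a hypothetical stand-in, profit(p) = (p - c)·exp(-a·p) with unit cost c and price sensitivity a, whose maximizer c + 1/a is known in closed form and can therefore check the iteration; `newton_raphson`, `optimal_price`, `tol` and `max_iter` are illustrative names.

```python
import math


def newton_raphson(f, df, x0, tol=1e-10, max_iter=100):
    """Iterate x_{k+1} = x_k - f(x_k) / f'(x_k) until the step is very small."""
    x = x0  # Step 1: initial guess
    for _ in range(max_iter):
        slope = df(x)
        if slope == 0:
            raise ZeroDivisionError("f'(x) is zero; choose another initial guess")
        x_new = x - f(x) / slope  # Step 2: Newton update
        if abs(x_new - x) < tol:  # Step 3: stop when |x_{k+1} - x_k| is very small
            return x_new          # Step 4: converged value
        x = x_new
    raise RuntimeError("Newton-Raphson did not converge")


def optimal_price(c, a, p0):
    """Hypothetical profit model, NOT the patent's: profit(p) = (p - c) * exp(-a * p)."""
    def d_profit(p):  # first derivative: its root is the profit-maximizing price
        return math.exp(-a * p) * (1 - a * (p - c))

    def d2_profit(p):  # second derivative, used as f' in the Newton update
        return -a * math.exp(-a * p) * (2 - a * (p - c))

    return newton_raphson(d_profit, d2_profit, p0)
```

For example, with c = 1, a = 0.5 and p0 = 1, the iterates are 2, about 2.67, about 2.95, and so on, converging to 3 = c + 1/a. This stand-in profit is concave only for p < c + 2/a, so the initial guess should lie in that region.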

[0043] This invention verifies the existence of a unique Nash equilibrium in the proposed Stackelberg game using two scenarios. The first scenario assumes that the entire system has only one data provider. Provided that the execution conditions above are met, the optimal strategy for the data requester is to choose this data provider to execute the task, and the corresponding condition holds in all cases. Given this decision by the data requester, the data provider can set its price using the Newton iteration formula, which ensures that its profit is always greater than zero. Both the data requester and the data provider then obtain optimal profits, the Nash equilibrium conditions stated above are satisfied, and a Nash equilibrium is achieved.

[0044] The second scenario assumes that the system has multiple data providers. For each data provider, the data requester has two strategies. If the data requester's strategy satisfies the first pair of conditions, the data requester does not select that data provider to perform the sensing task, i.e., its selection variable is 0. Otherwise, if the strategy satisfies the second pair of conditions, the data provider may be selected, i.e., its selection variable is 1, and the data provider can use the Newton iteration formula to make pricing decisions that consistently keep the data requester's profit greater than zero. In this case, the data requester can employ the greedy algorithm to derive its optimal task allocation strategy. Both the data requester and the data provider then obtain optimal profits, the Nash equilibrium conditions stated above are satisfied, and a Nash equilibrium is achieved.
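For reference, the equilibrium conditions invoked in [0043] and [0044] can be written in the standard no-unilateral-deviation form below. The notation is hypothetical and is not taken from the patent: $S$ is a set of selected data providers, $p = (p_i, p_{-i})$ a vector of task execution prices, $U_D$ the data requester's profit and $U_{P,i}$ the profit of data provider $i$; the strategy pair $(S^{*}, p^{*})$ is an equilibrium when neither party gains by deviating on its own.

```latex
\begin{aligned}
U_D(S^{*}, p^{*}) &\ge U_D(S, p^{*}) && \text{for every feasible selection } S,\\
U_{P,i}(S^{*}, p_i^{*}, p_{-i}^{*}) &\ge U_{P,i}(S^{*}, p_i, p_{-i}^{*}) && \text{for every provider } i \text{ and every price } p_i .
\end{aligned}
```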

[0045] This invention evaluates the proposed Stackelberg game through simulation experiments. The simulation parameters are shown in Table 2.

[0046] Table 2: Simulation Parameter Table

[0047] Figure 5 illustrates the impact of data price and data quality on the profits of data providers. As the data price gradually increases, the profits of data providers first rise and then fall, because excessively high data prices typically reduce the probability that data demanders purchase the data. Meanwhile, higher data quality makes data demanders more inclined to purchase the data.

[0048] Figure 6 illustrates the impact of data quality on the profits of data demanders. As data quality improves, the profits of data demanders increase continuously. Changes in the data demanders' profit coefficient also affect their profits: data demanders with higher profit coefficients generally have stronger monetization capabilities, so improvements in data quality have a more significant positive impact on them.

[0049] Referring to Figure 7, based on the same inventive concept, this application also provides a quality assurance data trading system based on Stackelberg game theory and crowd intelligence sensing. The quality assurance data trading system comprises:
The data quality assessment module, used for data quality assessment: the KANN-DBSCAN clustering algorithm divides the data into regions based on the location information of the data collected by all data providers, in order to eliminate suspected forged data; a probability matrix is calculated from the data entry matrix to quantify data scarcity, and a Euclidean distance matrix is calculated to quantify data similarity; and a data quality formula combining data scarcity and data similarity with weights is formed to assess data quality.
The game model building module, used to construct the Stackelberg game model that balances the interests of the participants: it defines the revenue strategies of data demanders and data providers and models the transactions between them as a Stackelberg game; constructs a task allocation problem model for data demanders based on the Stackelberg game model and introduces a reputation mechanism to reconstruct it; derives the optimal task allocation strategy of data demanders, showing that the task allocation problem model is a 0-1 knapsack problem, and computes this strategy with a greedy strategy algorithm; and, based on the data demanders' optimal task allocation strategy, obtains the data providers' profit function and uses the Newton iteration method to find the optimal bidding strategy at which the first derivative of that profit function is zero.
The Nash equilibrium analysis module, used to perform Nash equilibrium analysis on the strategies of data demanders and data providers from both the single-provider and the multi-provider perspectives, ensuring that both data demanders and data providers obtain optimal profits and complete the data transaction.
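The data quality assessment module can be illustrated with a small Python sketch of the probability matrix, the Euclidean distance matrix, and a weighted score. Only the structure follows the text; the scoring rules are assumptions: scarcity is taken as the mean self-information -log_m(p) of a data item's attribute values, similarity-based quality as the item's mean Euclidean distance to the other items (more distinct items score higher), and fusion as w · scarcity + (1 - w) · similarity. All function names are hypothetical, attributes are assumed numeric, and the two terms are not normalized to a common scale.

```python
import math
from collections import Counter


def probability_matrix(entries):
    """p[u][b] = count of entry u's value in column b / number of entries."""
    h = len(entries)
    counts = [Counter(column) for column in zip(*entries)]
    return [[counts[b][row[b]] / h for b in range(len(row))] for row in entries]


def scarcity_quality(prob_row, m=2):
    """Assumed form: mean self-information -log_m(p) over the item's attributes."""
    return sum(-math.log(p, m) for p in prob_row) / len(prob_row)


def distance_matrix(entries):
    """Pairwise Euclidean distances between data items."""
    return [[math.dist(u, v) for v in entries] for u in entries]


def similarity_quality(dist_row):
    """Assumed form: mean distance to the other items (the self-distance is 0)."""
    return sum(dist_row) / (len(dist_row) - 1)


def quality_scores(entries, w=0.5, m=2):
    """Weighted fusion of the scarcity- and similarity-based quality terms."""
    P = probability_matrix(entries)
    D = distance_matrix(entries)
    return [w * scarcity_quality(P[u], m) + (1 - w) * similarity_quality(D[u])
            for u in range(len(entries))]
```

On the entries (1, 0), (1, 0), (1, 1), (2, 1), the rare item (2, 1) receives the highest score and the two duplicate items (1, 0) receive identical scores.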

[0050] The above are merely preferred embodiments of this application and are not intended to limit this application in any way. Although preferred embodiments have been disclosed above, any person skilled in the art may, without departing from the scope of the technical solution of this application, make modifications or alterations to the technical content described above to create equivalent embodiments. The implementation schemes in the above embodiments may also be further combined or replaced. Any simple modifications, equivalent changes, and alterations made to the above embodiments based on the technical essence of this application, without departing from the content of its technical solution, still fall within the scope of this application.

Claims

1. A quality assurance data transaction method based on Stackelberg game theory and crowd intelligence sensing, characterized in that it comprises the following steps: data quality assessment: using the KANN-DBSCAN clustering algorithm to divide the data into regions based on the location information of the data collected by all data providers, in order to eliminate suspected forged data; calculating a probability matrix based on the data entry matrix to quantify data scarcity, and calculating a Euclidean distance matrix to quantify data similarity; and forming a data quality formula based on data scarcity, data similarity, and weights to assess data quality; constructing a Stackelberg game model to balance the interests of participants: defining the revenue strategies of data demanders and data providers, and modeling the transactions between data demanders and data providers as a Stackelberg game; constructing a task allocation problem model for data demanders based on the Stackelberg game model, and introducing a reputation mechanism to reconstruct the task allocation problem model; deriving the optimal task allocation strategy of data demanders, the task allocation problem model being shown to be a 0-1 knapsack problem, and calculating the optimal task allocation strategy of data demanders with a greedy strategy algorithm; obtaining the profit function of data providers based on the optimal task allocation strategy of data demanders, and obtaining, with the Newton iteration method, the optimal bidding strategy at which the first derivative of the profit function is zero; and analyzing the strategies of data demanders and data providers for Nash equilibrium from both the single-provider and the multi-provider perspectives, to ensure that both data demanders and data providers obtain optimal profits and complete the data transaction.

2. The method according to claim 1, characterized in that using the KANN-DBSCAN clustering algorithm to divide the data into regions based on the location information of the data collected by all data providers, in order to eliminate suspected forged data, comprises: inputting the location information, i.e., the latitude and longitude, corresponding to the datasets collected by all data providers into the KANN-DBSCAN clustering algorithm; based on the spatial distribution of the collected data, calculating the K-nearest-neighbor distance of each data point with respect to its K nearest neighboring data points, and averaging the K-nearest-neighbor distances of all data points to obtain the K-average nearest neighbor distance of the dataset; sequentially selecting different values of K and inputting them into the DBSCAN algorithm to obtain the optimal K value, the average nearest neighbor distance corresponding to this K value being the optimal Eps parameter; setting the other parameter of the KANN-DBSCAN clustering algorithm, MinPts, to 2; and completing the data region division.

3. The method according to claim 2, characterized in that sequentially selecting different K values in S02, inputting them into the DBSCAN algorithm to obtain the optimal K value, and taking the average nearest neighbor distance corresponding to this K value as the optimal Eps parameter comprises: selecting the average nearest neighbor distances corresponding to different K values and inputting them, together with the corresponding K values, into the DBSCAN algorithm to obtain the number of clusters generated for each K value; when the number of clusters remains the same for three consecutive K values, considering the clustering result stable and denoting that number of clusters as N; continuing to select K values until the number of clusters generated is no longer N; and selecting the largest K value for which the number of clusters is N, the average nearest neighbor distance corresponding to this K value being taken as the optimal Eps parameter.

4. The method according to claim 1, characterized in that calculating a probability matrix based on the data entry matrix to quantify data scarcity, calculating a Euclidean distance matrix to quantify data similarity, and forming a data quality formula based on data scarcity, data similarity, and weights for data quality assessment comprises: defining the data quality formula, in which the final quality score of each data item collected by a data provider in a given round of collection is obtained by weighting the data quality measured by data scarcity and the data quality measured by data similarity; defining the data collected by multiple data providers in a given round of data collection tasks as a set of data items, each comprising multiple attributes, and representing the collected data entries in matrix form; calculating the probability matrix of the data entries, and using the probability matrix to calculate the relationship between data quality and data scarcity; and calculating the Euclidean distance between every pair of data items to obtain the Euclidean distance matrix, and using the Euclidean distance matrix to calculate the relationship between data quality and data similarity.

5. The method according to claim 4, characterized in that the data entries are organized into a matrix in which each row corresponds to one data item and each column to one attribute: the first row holds the values of each attribute of the first data item, the second row those of the second data item, and so on up to the last (h-th) data item.

6. The method according to claim 5, characterized in that calculating a probability matrix for the data entries, using the probability matrix to calculate the relationship between data quality and data scarcity, calculating the Euclidean distance between every pair of data entries to obtain a Euclidean distance matrix, and using the Euclidean distance matrix to calculate the relationship between data quality and data similarity comprises: calculating the probability matrix of the data entries, in which each element is the frequency of that value within its column divided by the total number of entries collected in this round; calculating the relationship between data quality and data scarcity from the probability matrix, where m defines the unit of information content and is used to quantify the contribution of data scarcity, and the calculation runs over the values of all attributes of each data item, indexed by b; constructing the Euclidean distance matrix, whose elements are the Euclidean distances between pairs of data items, for example between the first and second data items, the first and h-th data items, the second and first data items, the h-th and first data items, and the h-th and second data items; and defining the relationship between data quality and data similarity by two expressions computed from the attribute values of the two data entries being compared.

7. The method according to claim 4, characterized in that defining the revenue strategies of data demanders and data providers and modeling the transactions between them as a Stackelberg game comprises: defining the revenue strategies of data demanders and data providers and proposing a Stackelberg game model based on these strategies, wherein the revenue strategy of the data demander is defined in terms of the subgame, an operator, the profit earned by the data requester when the task is performed in a given round, and the set of optimal data providers chosen by the data requester; and the revenue strategy of the data provider is defined in terms of the optimal price for the data provider to execute the task and the profit earned by the data provider when performing the task in a given round; according to the Stackelberg game model, the strategies form a Nash equilibrium, each being optimal given the other, when the following inequalities are satisfied, where the comparison is taken over the sets of data providers that the data requester could select and the prices that the data provider could charge for executing the task.

8. The method according to claim 7, characterized in that constructing a task allocation problem model for data requesters based on the Stackelberg game model, reconstructing the model by introducing a reputation mechanism, deriving the optimal task allocation strategy of data requesters, the task allocation problem being shown to be a 0-1 knapsack problem, and calculating the optimal task allocation strategy with a greedy strategy algorithm comprises: constructing the task allocation problem model in terms of the profit earned by the data requester when the task is performed in a given round; introducing a reputation mechanism to estimate the quality level of the data collected by different data providers, wherein after the t-th collection task the reputation is defined in terms of the reputation value of the i-th data provider after the t-th task execution, the average quality of the data collected by that provider in its first t tasks, and the upper confidence bound of the reputation; and reconstructing the data requester's strategy, wherein the greedy strategy algorithm initially sets all selection variables to 0, with one symbol denoting the set of data demanders and a 0-1 variable indicating whether the i-th data provider is selected to perform the task in the t-th round; in each iteration, the greedy strategy algorithm selects, among the data providers whose execution cost is less than the data requester's currently available budget, the one with the largest current benefit as the task executor, and reduces the budget by the cost incurred by that data provider in performing the task; and the process terminates when the data requester's entire budget is exhausted.

9. The method according to claim 8, characterized in that obtaining the data provider's profit function based on the data demander's optimal task allocation strategy, and using Newton's iteration method to determine the optimal bidding strategy at which the first derivative of the profit function is zero, comprises: expressing the data provider's task execution decision in terms of the human and material costs incurred in carrying out the task, the cost required for the data provider to collect a unit of data, the data provider's task execution price, the two profit coefficients of the data demander, and the quality of the data collected by the data provider; simplifying the data provider's profit formula, with parameters including a weighting coefficient and the revenue generated per unit cost when the data demander selects the data provider to execute the task; taking the first and second derivatives of the profit with respect to the price, from which the profit function is strictly concave, so that there is a price that maximizes the data provider's profit; and deriving, by calculation, the equation to be solved, and obtaining its numerical solution through Newton's iteration method, where the equation involves the profit that the data provider obtains from the data it collects when executing the task.

10. A quality assurance data transaction system based on Stackelberg game theory and crowd intelligence sensing, characterized in that the quality assurance data transaction system comprises:
a data quality assessment module, used for data quality assessment: using the KANN-DBSCAN clustering algorithm to divide the data into regions based on the location information of the data collected by all data providers, in order to eliminate suspected forged data; calculating a probability matrix based on the data entry matrix to quantify data scarcity, and calculating a Euclidean distance matrix to quantify data similarity; and forming a data quality formula based on data scarcity, data similarity, and weights to assess data quality;
a game model building module, used to construct a Stackelberg game model to balance the interests of the participants: defining the revenue strategies of data demanders and data providers, and modeling the transactions between data demanders and data providers as a Stackelberg game; constructing a task allocation problem model for data demanders based on the Stackelberg game model, and introducing a reputation mechanism to reconstruct the task allocation problem model; deriving the optimal task allocation strategy of data demanders, the task allocation problem model being shown to be a 0-1 knapsack problem, and calculating the optimal task allocation strategy of data demanders with a greedy strategy algorithm; and obtaining the profit function of data providers based on the optimal task allocation strategy of data demanders, and obtaining, with the Newton iteration method, the optimal bidding strategy at which the first derivative of the profit function is zero; and
a Nash equilibrium analysis module, used to perform Nash equilibrium analysis on the strategies of data demanders and data providers from both the single-provider and the multi-provider perspectives, ensuring that both data demanders and data providers obtain optimal profits and complete the data transaction.
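Claims 2 and 3 describe how KANN-DBSCAN chooses its Eps parameter. The sketch below follows that procedure in plain Python under stated assumptions: latitude and longitude are treated as planar coordinates, the K-nearest-neighbor distance of a point is taken as its distance to its K-th nearest neighbor, MinPts is fixed at 2 as in claim 2, a basic O(n²) DBSCAN stands in for a library implementation, and all function names are hypothetical. Points left as noise (label -1) correspond to suspected forged data.

```python
import math


def k_avg_nn_distance(points, k):
    """Average, over all points, of the distance to the k-th nearest neighbour."""
    total = 0.0
    for i, p in enumerate(points):
        d = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        total += d[k - 1]
    return total / len(points)


def dbscan(points, eps, min_pts):
    """Basic DBSCAN; returns (number of clusters, labels) with -1 marking noise."""
    n = len(points)
    labels = [None] * n  # None = unvisited, -1 = noise, >= 0 = cluster id
    cluster = -1

    def neighbours(i):
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            more = neighbours(j)
            if len(more) >= min_pts:  # j is a core point: keep expanding
                queue.extend(more)
    return cluster + 1, labels


def select_eps(points, k_max, min_pts=2, stable_runs=3):
    """Return (best K, Eps, N) following the stability rule of claim 3, or None."""
    results = []  # (k, eps, number of clusters)
    for k in range(1, k_max + 1):
        eps = k_avg_nn_distance(points, k)
        results.append((k, eps, dbscan(points, eps, min_pts)[0]))
    for start in range(len(results) - stable_runs + 1):
        window = {r[2] for r in results[start:start + stable_runs]}
        if len(window) == 1:  # same cluster count for `stable_runs` consecutive K
            n_stable = results[start][2]
            best = results[start]
            for r in results[start:]:
                if r[2] != n_stable:
                    break  # the cluster count is no longer N
                best = r
            return best[0], best[1], n_stable
    return None
```

For two well-separated unit squares of four points each, `select_eps(points, k_max=3)` finds a stable count of N = 2 clusters and returns K = 3 with Eps ≈ 1.414; an isolated extra point far from both squares is labelled as noise.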