A data labeling crowdsourcing method, system, and medium with balanced and transparent settlement

By combining hash time locks and zero-knowledge proofs on the blockchain, the invention addresses unbalanced task reward settlement in blockchain crowdsourcing systems, makes task rewards fair and transparent, and prevents malicious behavior. It is applicable to data annotation tasks such as image annotation, OCR recognition, and text translation.

CN116611920B · Active · Publication Date: 2026-06-12 · JINAN UNIVERSITY

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
JINAN UNIVERSITY
Filing Date
2023-04-23
Publication Date
2026-06-12

AI Technical Summary

Technical Problem

Blockchain crowdsourcing systems lack a balanced and transparent task reward settlement mechanism. Without supervision by a trusted third party, existing technologies cannot effectively prevent dishonest behavior by malicious users, namely false reporting by task requesters and free-riding by workers.

Method used

By combining blockchain and hash time locks with zero-knowledge proofs and time-release encryption, a data labeling crowdsourcing method with balanced and transparent settlement is designed to ensure the fairness and transparency of task rewards. A hash time lock transaction mechanism and simple random sampling are used to evaluate task completion to prevent cheating.

Benefits of technology

It achieves objectivity and fairness in task rewards, prevents cheating by malicious users, ensures the protection of the interests of task requesters and workers, and achieves fair transactions without the need for third-party supervision.

✦ Generated by Eureka AI based on patent content.

Abstract

This application discloses a data labeling crowdsourcing method, system, and medium with balanced and transparent settlement. The method includes: establishing a data labeling crowdsourcing system and determining the interactive entities; the task requester mixes labeled data and unlabeled data to obtain the data to be labeled, sets the public conditions for the labeled data, and, after encryption, publishes the task information tuple on the blockchain through a smart contract, deposits the task reward, sets the task deposit, and publishes the task download link; the worker confirms acceptance of the task and deposits the task deposit, generates a proof after downloading and completing the data labeling task, and publishes the task result and the proof; the task requester verifies the worker's proof, evaluates the task result, creates and broadcasts a hash time-locked transaction, and pays the task reward to the worker. The application uses blockchain and hash time locks to conduct fair transactions without the supervision of a trusted third party, achieving objective and fair distribution of task rewards and guaranteeing balanced and transparent settlement of task rewards.

Description

Technical Field

[0001] This invention belongs to the technical field of data annotation and crowdsourcing methods, and is particularly applicable to fields such as image annotation, OCR recognition, text translation, and semantic annotation. Specifically, it relates to a data annotation crowdsourcing method, system, and medium with transparent and balanced settlement.

Background Technology

[0002] Crowdsourcing is a new technology that solicits solutions from the public through internet platforms. Since its inception in 2006, crowdsourcing has gradually become a mainstream business model as a distributed problem-solving mechanism, driven by the rapid development of internet technology. In machine learning, training machine models requires massive amounts of data, especially in data annotation fields such as image processing, natural language processing, and OCR recognition. This necessitates large amounts of raw data for image annotation and text translation, leading to the widespread use of crowdsourcing for image annotation and text translation tasks. In recent years, the rapid development of blockchain technology and its decentralized nature have offered a potential solution to the centralization problem in crowdsourcing systems, making blockchain-based crowdsourcing systems a current research hotspot. However, the lack of third-party oversight has resulted in imbalances in task rewards and settlements on blockchain-based crowdsourcing platforms.

[0003] In a task reward settlement system, if the task reward is paid before the task is completed, workers will be motivated to deliberately not try their best or even not complete the task, resulting in a task solution that is insufficient to meet the requirements of the task requester and harming the interests of the task requester. This is called "free-riding" behavior in crowdsourcing. If the task reward is paid after the task is completed, the task requester can try to reduce the reward they should pay by falsely reporting the task status, which harms the interests of the workers. This is called "false reporting" behavior in crowdsourcing. Most current blockchain-based crowdsourcing systems utilize reputation-based and deposit-based mechanisms to curb dishonest transactions. However, reputation-based crowdsourcing platforms cannot prevent dishonest behavior by "one-off" users. This means task requesters or workers can use one-off accounts (accounts that participate in only one task) to obtain solutions or task rewards without incurring any cost through "free-riding" or "false reporting." Although this results in reputation points being deducted, preventing participation in other crowdsourcing tasks, the malicious user's goal has been achieved. Furthermore, without binding their real-world identity, malicious users can register multiple accounts, using each account for only one task before abandoning it, and continuously use new accounts for malicious transactions to evade the reputation mechanism's restrictions. Currently, there is still no ideal solution for achieving a balanced and transparent task reward settlement mechanism in a blockchain-based crowdsourcing system.

Summary of the Invention

[0004] The main objective of this invention is to overcome the shortcomings and deficiencies of the prior art and provide a data labeling crowdsourcing method, system and medium with balanced and transparent settlement. By using blockchain and hash time locks to conduct fair transactions without the supervision of a trusted third party, the objectivity and fairness of task reward distribution are achieved, and the balance and transparency of task reward settlement are guaranteed.

[0005] The primary objective of this invention is to provide a data labeling crowdsourcing method with transparent and balanced settlement.

[0006] The second objective of this invention is to provide a data labeling crowdsourcing system with transparent settlement balance;

[0007] A third objective of this invention is to provide an electronic device;

[0008] The fourth objective of this invention is to provide a computer-readable storage medium.

[0009] The first objective of this invention is achieved through the following technical solution:

[0010] A data labeling crowdsourcing method with transparent settlement balance includes the following steps:

[0011] Initialization phase: Establish a data labeling crowdsourcing system and determine the interactive entities; the interactive entities include task requesters and workers;

[0012] Task preparation phase: The task requester mixes labeled and unlabeled data to obtain data to be labeled, sets the public conditions for labeled data, performs encryption and uploads to the InterPlanetary File System, publishes the task information tuple on the blockchain through a smart contract, deposits the task reward and sets the task deposit, and publishes the task download link.

[0013] Task execution phase: Workers confirm acceptance of the task and deposit a task deposit through a smart contract. After downloading and completing the data labeling task, they use zero-knowledge proof processing to generate an encrypted proof. They then confirm the completion of the task on the smart contract and publish the task results and proof.

[0014] Task settlement phase: The task requester uses zero-knowledge proof to verify the worker's proof, evaluates the overall completion of the worker's task results using simple random sampling to determine the task reward, creates and broadcasts a hash time-locked transaction, and pays the task reward to the worker when the hash lock is opened; if the hash lock is not opened within 48 hours, the task reward is returned to the task requester.

[0015] As a preferred technical solution, the initialization stage specifically includes:

[0016] Let TR = (TR.Setup, TR.KeyGen, TR.Ext, TR.Enc, TR.Dec) be the time-release encryption process, i.e., TR processing; where TR.Setup(λ) is the setup algorithm of TR processing, taking the security parameter λ as input and outputting the public parameter tpk and the private key tsk, used to initialize TR processing; TR.KeyGen(tpk) is the key generation algorithm of TR processing, taking the public parameter tpk as input and outputting the public key epk and the private key esk, used to generate the encryption and decryption keys; TR.Ext(tpk, tsk, t) is the time-release key generation algorithm of TR processing, taking the public parameter tpk, the private key tsk, and the release time t as input and outputting the time-release key trk, used to generate the time-release key; TR.Enc(tpk, epk, t, M) is the encryption algorithm of TR processing, taking the public parameter tpk, the public key epk, the release time t, and the plaintext message M as input and outputting the ciphertext C, used to encrypt messages; TR.Dec(tpk, esk, trk, C) is the decryption algorithm of TR processing, taking the public parameter tpk, the private key esk, the time-release key trk, and the ciphertext C as input and outputting the plaintext message M, used to decrypt ciphertext.

[0017] Let ZK = (ZK.Setup, ZK.Prover, ZK.Verifier) be the zk-SNARK zero-knowledge proof processing, i.e., ZK processing; where ZK.Setup(λ, ℒ) is the setup algorithm for ZK processing, taking the security parameter λ and the NP language ℒ as input, and outputting the common reference string crs, used to initialize ZK processing; ZK.Prover(s, w, crs) is the proof algorithm for ZK processing, taking the statement s, the secret w, and the common reference string crs as input, and outputting the proof π, used to generate the proof; ZK.Verifier(s, π, crs) is the verification algorithm for ZK processing, taking the statement s, the proof π, and the common reference string crs as input, and outputting 0 or 1, used to verify the proof;

[0018] Let H(·) be a secure hash function; let the total number of data to be labeled be n; let the number of labeled data owned by the task requester be m, where m is between 10% and 20% of n;

[0019] Let the public-private key pair used by the task requester for encryption and decryption be (rpk, rsk);

[0020] Let the symmetric encryption keys held by the worker be key_i, where i is a positive integer denoting the i-th key (for example key1 and key2).

[0021] As a preferred technical solution, the task preparation stage specifically includes:

[0022] The task requester randomly mixes the labeled and unlabeled data to obtain the data to be labeled. The data to be labeled is numbered α = [1, 2, ..., n], and the numbers of the labeled data are recorded as β = [r1, r2, ..., r_m], where each number in β is an element of α;

[0023] Upload the data to be labeled to the InterPlanetary File System and obtain the IPFS content address CID_α of the data to be labeled;

[0024] At the same time, encrypt the annotation information γ of the labeled data using the task requester's private key rsk, and upload it to the InterPlanetary File System to obtain the IPFS content address CID_γ of the annotation information γ;

[0025] Calculate the hash value H(CID_γ) of the IPFS content address of the annotation information γ, and set the public condition for the annotation information γ of the labeled data; the public condition is that the worker confirms the completion of the task;

[0026] Through the smart contract at address C_ξ, publish the task information tuple τ = (CID_α, H(CID_γ)) on the blockchain, deposit the task reward, set the task deposit, and publish the download link for the data to be labeled.
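The mixing step in the task preparation stage can be illustrated with a minimal Python sketch. It assumes in-memory lists and Python's `random` module as the source of randomness; the function names `mix_task_data` and `labeled_share_ok` are hypothetical and do not come from the patent text.

```python
import random


def mix_task_data(labeled, unlabeled, seed=None):
    """Randomly mix labeled and unlabeled items (hypothetical helper).

    labeled:   list of (item, label) pairs owned by the task requester
    unlabeled: list of items that still need labels
    Returns (items, beta, gamma):
      items - the mixed items in published order, numbered 1..n (alpha)
      beta  - sorted numbers of the items whose labels are already known
      gamma - mapping from each number in beta to its known label
    """
    rng = random.Random(seed)
    pool = [(item, label, True) for item, label in labeled]
    pool += [(item, None, False) for item in unlabeled]
    rng.shuffle(pool)

    items = [item for item, _, _ in pool]
    gamma = {number: label
             for number, (_, label, known) in enumerate(pool, start=1)
             if known}
    beta = sorted(gamma)
    return items, beta, gamma


def labeled_share_ok(m, n):
    """Check that m lies between 10% and 20% of n (integer arithmetic)."""
    return n <= 10 * m and 5 * m <= n
```

Only the requester keeps `beta` and `gamma` at this point; the published task contains `items` alone, so a worker cannot tell which positions will later be checked.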

[0027] As a preferred technical solution, the task execution phase specifically includes:

[0028] The worker confirms acceptance of the task through the smart contract at address C_ξ, deposits the task deposit, and downloads the data to be labeled;

[0029] Label the data to be labeled; after completion, encrypt the labeling information δ = {l1, l2, ..., l_n} using key1 to obtain CT_δ, then encrypt CT_δ using key2 to obtain CT = {ct1, ct2, ..., ct_n}; upload CT to the InterPlanetary File System to obtain the IPFS content address CID_δ of CT;

[0030] For the NP language ℒ, run the ZK.Prover(s,w,crs) proof algorithm to generate the proof π; here a = Enc(b,k) means that ciphertext a is obtained by encrypting plaintext b with key k;

[0031] Confirm task completion on the smart contract at address C_ξ, and publish CID_δ and the proof π.

[0032] As a preferred technical solution, the task settlement stage specifically includes:

[0033] Once the worker confirms task completion, the annotation information γ of the labeled data set by the task requester is made public. Based on the numbers β of the annotation information γ, the worker uses key2 to decrypt the corresponding entries in the task result, obtaining the encrypted task result CT_res of the labeled data, which covers the labels l_j with j ∈ [r1, r2, ..., r_m];

[0034] Upload the encrypted task result CT_res of the labeled data to the InterPlanetary File System to obtain its IPFS content address CID_res, and publish CID_res and key1;
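The two encryption layers and the selective reveal described above can be sketched as follows. This is a toy illustration only: the patent does not name a symmetric cipher, so the sketch swaps in a SHA-256 based XOR keystream that is not secure for real use, and all function names are hypothetical.

```python
import hashlib


def _keystream(key: bytes, nonce: int, length: int) -> bytes:
    """Derive a toy keystream from SHA-256 (illustration only, not secure)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(
            key + nonce.to_bytes(8, "big") + counter.to_bytes(8, "big")
        ).digest()
        counter += 1
    return out[:length]


def xor_cipher(data: bytes, key: bytes, nonce: int) -> bytes:
    """Toy symmetric cipher: applying it twice with the same key undoes it."""
    stream = _keystream(key, nonce, len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


def encrypt_labels(labels, key1: bytes, key2: bytes):
    """CT[i] = Enc(Enc(l_i, key1), key2) for item numbers 1..n."""
    ct = {}
    for i, label in enumerate(labels, start=1):
        inner = xor_cipher(label.encode(), key1, i)
        ct[i] = xor_cipher(inner, key2, i)
    return ct


def reveal_labeled(ct, beta, key2: bytes):
    """Strip only the key2 layer for the numbers in beta, giving CT_res."""
    return {j: xor_cipher(ct[j], key2, j) for j in beta}


def open_with_key1(ct_res, key1: bytes):
    """Anyone holding the published key1 can read the sampled labels."""
    return {j: xor_cipher(c, key1, j).decode() for j, c in ct_res.items()}
```

Publishing `key1` together with CT_res exposes only the sampled labels; every other entry of CT still carries the key2 layer, which the worker releases during settlement.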

[0035] All interacting entities run the ZK.Verifier(s,π,crs) verification algorithm to verify the worker's proof π. The worker's task results are evaluated using simple random sampling to assess the overall completion status. Finally, the task requester determines the task reward based on the evaluation results.

[0036] The task requester randomly selects a number x, calculates its hash value H(x), uses it as a hash lock HL(x), and sends it to the worker;

[0037] The task requester creates and broadcasts a hash time-locked transaction tx1: When the hash lock HL(x) is opened by the worker, the task reward is paid from the task requester's wallet address to the worker's wallet address. If the hash lock is not opened within 48 hours, the reward is returned to the task requester's wallet address.
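The behavior of the hash time-locked transaction tx1 can be modeled with a small in-memory Python sketch. It assumes SHA-256 for H and integer timestamps in seconds; the class `HashTimeLock` and its methods are hypothetical and stand in for whatever on-chain script or contract an implementation would actually use.

```python
import hashlib
from dataclasses import dataclass

LOCK_WINDOW_SECONDS = 48 * 3600  # the 48-hour window from the text


@dataclass
class HashTimeLock:
    hash_lock: bytes   # HL(x) = H(x)
    amount: int        # task reward held by the lock
    payer: str         # task requester's wallet address
    payee: str         # worker's wallet address
    created_at: int    # creation time in seconds
    settled: bool = False

    def claim(self, preimage: bytes, now: int) -> str:
        """Open the lock with x before the deadline; funds go to the payee."""
        if self.settled:
            raise ValueError("lock already settled")
        if now >= self.created_at + LOCK_WINDOW_SECONDS:
            raise ValueError("lock expired")
        if hashlib.sha256(preimage).digest() != self.hash_lock:
            raise ValueError("wrong preimage")
        self.settled = True
        return self.payee

    def refund(self, now: int) -> str:
        """After the deadline, unopened funds return to the payer."""
        if self.settled:
            raise ValueError("lock already settled")
        if now < self.created_at + LOCK_WINDOW_SECONDS:
            raise ValueError("lock window still open")
        self.settled = True
        return self.payer
```

The two branches are mutually exclusive: once either `claim` or `refund` succeeds, the lock is settled, which is what gives the exchange its all-or-nothing character.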

[0038] After receiving the hash lock HL(x), the worker runs the TR.Setup(λ) setup algorithm to generate the public parameters tpk and the private key tsk, runs TR.KeyGen(tpk) to generate a public-private key pair (epk, esk), sets the key expiration time t to 24 hours, runs TR.Ext(tpk, tsk, t) to generate a time-release key trk, and runs TR.Enc(tpk, epk, t, key2) to encrypt key2 to obtain the ciphertext C_k;

[0039] If the worker is satisfied with the task reward from the task requester, they create and broadcast a transaction tx2, publishing the ciphertext C_k on the smart contract. When the hash lock HL(x) is opened by the task requester, the worker receives the task reward and sends the worker's private key esk and time-release key trk to the task requester via a smart contract.

[0040] The task requester runs the TR.Dec(tpk, esk, trk, C_k) decryption algorithm to obtain key2, which is then used to decrypt the task result.

[0041] As a preferred technical solution, all interactive entities on the blockchain, including the task requester, can run the ZK.Verifier(s,π,crs) verification algorithm to verify the worker's proof π. Once the proof is verified, the task result of the labeled data is decrypted using key1 to obtain its plaintext. Since only the worker possesses the decryption key key2, other interacting entities cannot obtain the plaintext of the task results for the unlabeled data;

[0042] All interactive entities verify the task results of the decrypted labeled data based on the labeling information γ of the labeled data disclosed by the task requester, and calculate the matching degree with γ.

[0043] Based on the estimation of the whole by simple random sampling, the matching degree between the labeled data and γ is regarded as the matching degree of the data to be labeled, and it is made public on the blockchain;

[0044] The task requester collects the publicly disclosed matching scores of all interactive entities, removes extreme results, performs an average calculation, and discloses the calculation process to obtain the evaluation results.
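The aggregation in this step can be sketched as a trimmed mean. The text does not say how "extreme results" are identified, so the sketch assumes the simplest rule, dropping a fixed number of the lowest and highest scores; the function name `aggregate_scores` and the `trim` parameter are hypothetical.

```python
def aggregate_scores(scores, trim=1):
    """Drop the `trim` lowest and highest scores, then average the rest.

    Returns (mean, kept) so that the kept values can be disclosed along
    with the result, mirroring the requirement to publish the calculation.
    """
    if trim < 0:
        raise ValueError("trim must be non-negative")
    if len(scores) <= 2 * trim:
        raise ValueError("not enough scores to trim")
    ordered = sorted(scores)
    kept = ordered[trim:len(ordered) - trim]
    return sum(kept) / len(kept), kept
```

With this rule a single entity that publishes an implausibly low or high matching score does not move the final evaluation, provided at least three entities report.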

[0045] The second objective of this invention is to provide a data annotation crowdsourcing system with transparent and balanced settlement, applied to the aforementioned data annotation crowdsourcing method with transparent and balanced settlement, including an initialization module, a task preparation module, a task execution module, and a task settlement module;

[0046] The initialization module is used to establish a data labeling crowdsourcing system and determine the interactive entities; the interactive entities include task requesters and workers.

[0047] The task preparation module is used by the task requester to mix labeled and unlabeled data to obtain data to be labeled, set the public conditions for labeled data, perform encryption and upload to the InterPlanetary File System, and then publish the task information tuple on the blockchain through a smart contract, deposit the task reward, set the task deposit, and publish the task download link.

[0048] The task execution module is used by workers to confirm acceptance of tasks and deposit task deposits through smart contracts, download and complete data labeling tasks, use zero-knowledge proof processing to encrypt and generate proofs, confirm task completion on smart contracts, and publish task results and proofs.

[0049] The task settlement module is used by the task requester to process and verify the worker's proof using zero-knowledge proof, evaluate the overall completion of the worker's task results using simple random sampling to determine the task reward, create and broadcast a hash time-locked transaction, and pay the task reward to the worker when the hash lock is opened; if the hash lock is not opened within 48 hours, the task reward is returned to the task requester.

[0050] A third objective of this invention is to provide an electronic device, the electronic device comprising:

[0051] At least one processor; and a memory communicatively connected to said at least one processor; wherein,

[0052] The memory stores computer program instructions that can be executed by the at least one processor, which enables the at least one processor to perform the aforementioned data labeling crowdsourcing method with balanced and transparent settlement.

[0053] The fourth objective of this invention is to provide a computer-readable storage medium storing a program that, when executed by a processor, implements the aforementioned data labeling crowdsourcing method for achieving settlement balance and transparency.

[0054] Compared with the prior art, the present invention has the following advantages and beneficial effects:

[0055] Current decentralized crowdsourcing systems based on blockchain often fail to consider how to fairly distribute task rewards, leading to issues such as false reporting by task requesters or free-riding by workers. Existing methods for task reward allocation mostly rely on reputation-based mechanisms, which cannot prevent malicious behavior from "one-off" accounts. To address these problems, this invention proposes a data annotation crowdsourcing method with balanced and transparent settlement, offering the following advantages:

[0056] (1) In view of the possible cheating behavior in the current decentralized crowdsourcing system based on blockchain, this invention proposes a crowdsourcing protocol to ensure that the cheating party can never obtain benefits, thus realizing the atomicity of transactions.

[0057] (2) This invention targets data annotation crowdsourcing tasks with objective answers, such as image annotation, text translation, and OCR recognition. By publicly disclosing simple random samples, it obtains fair and objective solution evaluation criteria, thereby achieving objectivity and fairness in task reward allocation.

[0058] (3) This invention uses a hash time lock to prevent cheating by "one-time" users. If the task requester tries to obtain the solution without paying the reward, the requester cannot obtain it even though the worker has already submitted the completed task. If the worker tries to obtain the task reward without doing the work, the worker cannot obtain it even though the task requester has already paid the task reward into the hash time-locked transaction.

Attached Figure Description

[0059] To more clearly illustrate the technical solutions in the embodiments of this application, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0060] Figure 1 is a flowchart of a data labeling crowdsourcing method for achieving transparent settlement balance, as described in an embodiment of the present invention.

[0061] Figure 2 is a schematic diagram of the task settlement process in an embodiment of the present invention;

[0062] Figure 3 is a structural diagram of a data labeling crowdsourcing system with transparent settlement balance, as described in an embodiment of the present invention.

[0063] Figure 4 is a schematic diagram of the structure of an electronic device according to an embodiment of the present invention.

Detailed Implementation

[0064] To enable those skilled in the art to better understand the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are merely some embodiments of the present application, and not all embodiments. All other embodiments obtained by those skilled in the art based on the embodiments of the present application without creative effort are within the scope of protection of the present application.

[0065] In this application, the reference to "embodiment" means that a specific feature, structure, or characteristic described in connection with an embodiment may be included in at least one embodiment of this application. The appearance of this phrase in various places throughout the specification does not necessarily refer to the same embodiment, nor is it a mutually exclusive, independent, or alternative embodiment. It will be explicitly and implicitly understood by those skilled in the art that the embodiments described in this application can be combined with other embodiments.

[0066] Referring to Figure 1, this embodiment discloses a data labeling crowdsourcing method with transparent settlement balance, including the following steps:

[0067] I. Initialization Phase: Establish a data labeling crowdsourcing system and determine the interactive entities; in this method, the interactive entities involve task requesters and workers; among them, task requesters are people or organizations that need to post tasks in order to obtain data labeling results; workers are the general public who receive tasks and complete and submit the results within the specified time to obtain task rewards.

[0068] Specifically, in the initialization phase, let TR = (TR.Setup, TR.KeyGen, TR.Ext, TR.Enc, TR.Dec) be the Timed-Release Encryption (TR) process; where TR.Setup(λ) is the TR process setup algorithm, taking the security parameter λ as input and outputting the public parameter tpk and the private key tsk, used to initialize the TR process; TR.KeyGen(tpk) is the TR process key generation algorithm, taking the public parameter tpk as input and outputting the public key epk and the private key esk, used to generate the encryption and decryption keys; TR.Ext(tpk, tsk, t) is the TR process time-release key generation algorithm, taking the public parameter tpk, the private key tsk, and the release time t as input and outputting the time-release key trk, used to generate the time-release key; TR.Enc(tpk, epk, t, M) is the TR process encryption algorithm, taking the public parameter tpk, the public key epk, the release time t, and the plaintext message M as input and outputting the ciphertext C, used to encrypt messages; TR.Dec(tpk, esk, trk, C) is the TR process decryption algorithm, taking the public parameter tpk, the private key esk, the time-release key trk, and the ciphertext C as input and outputting the plaintext message M, used to decrypt the ciphertext.

[0069] Let ZK = (ZK.Setup, ZK.Prover, ZK.Verifier) be the zk-SNARK zero-knowledge proof processing, i.e., ZK processing; where ZK.Setup(λ, ℒ) is the setup algorithm for ZK processing, taking the security parameter λ and the NP language ℒ as input, and outputting the common reference string crs, used to initialize ZK processing; ZK.Prover(s, w, crs) is the proof algorithm for ZK processing, taking the statement s, the secret w, and the common reference string crs as input, and outputting the proof π, used to generate the proof; ZK.Verifier(s, π, crs) is the verification algorithm for ZK processing, taking the statement s, the proof π, and the common reference string crs as input, and outputting 0 or 1, used to verify the proof;

[0070] Let H(·) be a secure hash function; let the total number of data to be labeled be n; let the number of labeled data owned by the task requester be m, where m is between 10% and 20% of n;

[0071] Let the public-private key pair used by the task requester for encryption and decryption be (rpk, rsk);

[0072] Let the symmetric encryption keys held by the worker be key_i, where i is a positive integer denoting the i-th key (for example key1 and key2).

[0073] II. Task Preparation Phase: The task requester mixes labeled and unlabeled data to obtain data to be labeled, sets the public conditions for labeled data, performs encryption and uploads to the InterPlanetary File System, publishes the task information tuple on the blockchain through a smart contract, deposits the task reward and sets the task deposit, and publishes the task download link.

[0074] Specifically, in this stage, the task requester who needs to publish data annotation tasks is mainly responsible for the task formulation and publication operations. The steps are as follows:

[0075] 1) The task requester randomly mixes the labeled and unlabeled data to obtain the data to be labeled. The data to be labeled is numbered α = [1, 2, ..., n], and the numbers of the labeled data are recorded as β = [r1, r2, ..., r_m], where each number in β is an element of α;

[0076] 2) Upload the data to be labeled to the InterPlanetary File System and obtain the IPFS content address CID_α of the data to be labeled;

[0077] 3) Simultaneously, encrypt the annotation information γ of the labeled data using the task requester's private key rsk, and upload it to the InterPlanetary File System to obtain the IPFS content address CID_γ of the annotation information γ;

[0078] 4) Calculate the hash value H(CID_γ) of the IPFS content address of the annotation information γ, and set the public condition for the annotation information γ of the labeled data: the worker confirms that the task has been completed;

[0079] 5) Through the smart contract at address C_ξ, publish the task information tuple τ = (CID_α, H(CID_γ)) on the blockchain, deposit the task reward, set the task deposit, and publish the download link for the data to be labeled.
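Steps 2) to 5) above publish only a hash of CID_γ, which lets anyone later check that the annotation information the requester discloses is the one fixed at publication time. The following Python sketch illustrates that check under stated assumptions: SHA-256 plays the role of H, and `stand_in_cid` is a hypothetical placeholder that does not produce real IPFS CIDs.

```python
import hashlib
from typing import NamedTuple


class TaskTuple(NamedTuple):
    cid_alpha: str     # content address of the data to be labeled
    h_cid_gamma: str   # H(CID_gamma), published before the task starts


def stand_in_cid(content: bytes) -> str:
    """Placeholder content address; real IPFS CIDs use a different encoding."""
    return "demo-" + hashlib.sha256(content).hexdigest()


def H(message: str) -> str:
    """Hash function H, assumed here to be SHA-256."""
    return hashlib.sha256(message.encode()).hexdigest()


def build_task_tuple(data_blob: bytes, encrypted_gamma_blob: bytes):
    """Return the public tuple and the CID_gamma the requester keeps private."""
    cid_alpha = stand_in_cid(data_blob)
    cid_gamma = stand_in_cid(encrypted_gamma_blob)
    return TaskTuple(cid_alpha, H(cid_gamma)), cid_gamma


def disclosure_matches(task: TaskTuple, disclosed_cid_gamma: str) -> bool:
    """Check a later disclosure against the hash fixed at publication."""
    return H(disclosed_cid_gamma) == task.h_cid_gamma
```

Because the tuple is on-chain before any worker accepts the task, a requester who later swaps in different annotation information would fail `disclosure_matches`.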

[0080] III. Task Execution Phase: Workers confirm acceptance of the task and deposit a task deposit through a smart contract. After downloading and completing the data labeling task, they use zero-knowledge proof processing to generate an encrypted proof. They then confirm the completion of the task on the smart contract and publish the task results and proof.

[0081] In this phase, workers receive, complete, and publicly disclose data annotation tasks, specifically:

[0082] 1) The worker confirms acceptance of the task through the smart contract at address C_ξ, deposits the task deposit, and downloads the data to be labeled;

[0083] 2) Label the data to be labeled; after completion, encrypt the labeling information δ = {l1, l2, ..., l_n} using key1 to obtain CT_δ, then encrypt CT_δ using key2 to obtain CT = {ct1, ct2, ..., ct_n}; upload CT to the InterPlanetary File System to obtain the IPFS content address CID_δ of CT;

[0084] 3) For the NP language ℒ, run the ZK.Prover(s,w,crs) proof algorithm to generate the proof π; here a = Enc(b,k) means that ciphertext a is obtained by encrypting plaintext b with key k;

[0085] 4) Confirm task completion on the smart contract at address C_ξ, and publish CID_δ and the proof π.
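The order of contract interactions in this phase (reward escrowed at publication, worker deposit on acceptance, result and proof on completion) can be modeled as a small state machine. The sketch below is a hypothetical in-memory model in Python, not contract code from the patent; the class `TaskContract`, its fields, and the rule that the deposit must equal the required amount exactly are all assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional


class Phase(Enum):
    PUBLISHED = "published"
    ACCEPTED = "accepted"
    COMPLETED = "completed"


@dataclass
class TaskContract:
    requester: str
    reward: int
    required_deposit: int
    phase: Phase = Phase.PUBLISHED
    worker: Optional[str] = None
    cid_delta: Optional[str] = None
    proof: Optional[bytes] = None
    escrow: Dict[str, int] = field(default_factory=dict)

    def __post_init__(self):
        # The requester's reward is held from the moment of publication.
        self.escrow[self.requester] = self.reward

    def accept(self, worker: str, deposit: int) -> None:
        if self.phase is not Phase.PUBLISHED:
            raise RuntimeError("task is not open for acceptance")
        if deposit != self.required_deposit:
            raise ValueError("deposit must equal the required task deposit")
        self.worker = worker
        self.escrow[worker] = deposit
        self.phase = Phase.ACCEPTED

    def confirm_completion(self, worker: str, cid_delta: str, proof: bytes) -> None:
        if self.phase is not Phase.ACCEPTED or worker != self.worker:
            raise RuntimeError("only the accepted worker can confirm completion")
        self.cid_delta = cid_delta   # published address of the encrypted result
        self.proof = proof           # published proof, opaque to this model
        self.phase = Phase.COMPLETED
```

Reaching `Phase.COMPLETED` corresponds to the public condition under which the requester's annotation information γ is disclosed.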

[0086] IV. Task Settlement Phase: The task requester uses zero-knowledge proof to verify the worker's proof, evaluates the overall completion of the worker's task results using simple random sampling to determine the task reward, creates and broadcasts a hash time-locked transaction, and pays the task reward to the worker when the hash lock is opened; if the hash lock is not opened within 48 hours, the task reward is returned to the task requester.

[0087] This phase primarily involves all interacting entities verifying the reward allocation for the task requester, specifically:

[0088] For workers, perform the following:

[0089] 1) Once the worker confirms task completion, the annotation information γ of the labeled data set by the task requester is made public. Based on the numbers β of the annotation information γ, the worker uses key2 to decrypt the corresponding entries in the task result, obtaining the encrypted task result CT_res of the labeled data, which covers the labels l_j with j ∈ [r1, r2, ..., r_m];

[0090] 2) Upload the encrypted task result CT_res of the labeled data to the InterPlanetary File System to obtain its IPFS content address CID_res, and publish CID_res and key1;

[0091] The task requester then performs the following steps:

[0092] 1) All interacting entities first run the ZK.Verifier(s,π,crs) verification algorithm to verify the worker's proof π, use simple random sampling to evaluate the overall completion of the worker's task results, and finally the task requester determines the task reward based on the evaluation results.

[0093] Specifically, since the labeled data is discretely distributed among the unlabeled data, only the task requester knows which data is labeled before task settlement. After the worker completes the task, the worker publishes their task results, and the task requester also publishes the labeling information of the labeled data. Then, all interacting entities can verify the matching degree between the worker's task results and the labeling information of the labeled data published by the task requester, thereby completing the evaluation of the overall completion status.

[0094] All interacting entities on the blockchain, including the task requester, run the ZK.Verifier(s,π,crs) verification algorithm to verify the worker's proof π. Once the proof is verified, they use key1 to decrypt the task results of the labeled data, obtaining their plaintext. Since only the worker possesses the decryption key key2, other interacting entities cannot obtain the plaintext of the remaining task results;

[0095] All interactive entities verify the task results of the decrypted labeled data based on the labeling information γ of the labeled data disclosed by the task requester, and calculate the matching degree with γ.

[0096] Following the simple random sampling estimate of the whole, the matching degree between the task results of the labeled data and γ is taken as the matching degree of all the data to be labeled, and it is made public on the blockchain;

[0097] The task requester collects the publicly disclosed matching scores of all interactive entities, removes extreme results, performs an average calculation, and discloses the calculation process to obtain the evaluation results.
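The evaluation described in [0095] to [0097] can be sketched as below. The source does not say how extreme results are removed, so dropping the single lowest and single highest score is an assumption made here for illustration; `matching_degree` and `aggregate_scores` are hypothetical names.

```python
def matching_degree(revealed: dict, gamma: dict) -> float:
    """Fraction of labeled items whose revealed worker label equals gamma's label."""
    if not gamma:
        raise ValueError("gamma must contain at least one labeled item")
    hits = sum(1 for j, truth in gamma.items() if revealed.get(j) == truth)
    return hits / len(gamma)


def aggregate_scores(scores: list, trim: int = 1) -> float:
    """Drop the `trim` lowest and highest scores (assumed rule), then average."""
    if len(scores) <= 2 * trim:
        raise ValueError("not enough scores to trim")
    kept = sorted(scores)[trim:len(scores) - trim]
    return sum(kept) / len(kept)
```

For example, if the worker matches three of four labeled items, each honest entity publishes 0.75; with the published scores 0.75, 0.75, 0.0, 1.0 and 0.75, trimming the two extremes leaves an evaluation result of 0.75.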

[0098] In this embodiment, the number of labeled data m accounts for 10% to 20% of the number of data to be labeled n, so that the labeled data serve as the evaluation sample. By the simple random sampling estimate, if the matching degree between the worker's results on the labeled data and γ (the objective answers) is 90%, the worker's overall task results can be considered about 90% correct. To cheat, the worker would have to guess or compute which data are labeled before completing the task, which is considered infeasible. If the worker tries to modify the answers after the task requester discloses the annotation information γ, verification of the zero-knowledge proof will fail.
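How closely the sample matching degree tracks the overall correctness can be gauged with the textbook standard error of a proportion under simple random sampling without replacement. The formula is standard statistics rather than part of the source, and the function name and the figures n = 1000, m = 100 are illustrative assumptions.

```python
import math


def srs_standard_error(p_hat: float, m: int, n: int) -> float:
    """Approximate standard error of a sample proportion p_hat from m of n items.

    Uses sqrt(p(1-p)/m * (n-m)/(n-1)), i.e. the binomial variance scaled by
    the finite population correction, with p_hat plugged in for p.
    """
    if n < 2 or not (0 < m <= n):
        raise ValueError("require n >= 2 and 0 < m <= n")
    fpc = (n - m) / (n - 1)
    return math.sqrt(p_hat * (1 - p_hat) / m * fpc)
```

With n = 1000, m = 100 (10% labeled) and a 90% matching degree, the standard error is roughly 0.028, so the overall correctness is estimated to within a few percentage points.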

[0099] 2) The task requester randomly selects a number x, calculates its hash value H(x), uses it as a hash lock HL(x), and sends it to the worker;

[0100] 3) The task requester creates and broadcasts a hash time-locked transaction tx1: When the hash lock HL(x) is opened by the worker, the task reward is paid from the task requester's wallet address to the worker's wallet address. If the hash lock is not opened within 48 hours, the reward is returned to the task requester's wallet address.
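The hash lock and the 48-hour refund rule of tx1 can be modeled off-chain as in the sketch below. It assumes SHA-256 as the hash function H and represents time as plain seconds; the class `HashTimeLock` and its fields are hypothetical and do not correspond to any particular blockchain's transaction format.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class HashTimeLock:
    hash_lock: bytes          # HL(x) = H(x), published by the task requester
    amount: int               # task reward locked in tx1
    sender: str               # task requester's wallet address
    recipient: str            # worker's wallet address
    created_at: float         # creation time in seconds
    timeout_s: float = 48 * 3600
    settled: bool = False

    def claim(self, preimage: bytes, now: float) -> str:
        """Release the reward if the preimage opens the lock before expiry."""
        if self.settled:
            raise RuntimeError("already settled")
        if now >= self.created_at + self.timeout_s:
            raise RuntimeError("lock expired")
        if hashlib.sha256(preimage).digest() != self.hash_lock:
            raise ValueError("preimage does not match hash lock")
        self.settled = True
        return self.recipient

    def refund(self, now: float) -> str:
        """Return the reward to the sender once the time lock has expired."""
        if self.settled:
            raise RuntimeError("already settled")
        if now < self.created_at + self.timeout_s:
            raise RuntimeError("lock has not expired yet")
        self.settled = True
        return self.sender
```

Whoever presents x with H(x) = HL(x) within 48 hours triggers the payout to the worker's address; after that window, only the refund path to the task requester remains.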

[0101] After receiving the hash lock HL(x), the worker performs the following operation:

[0102] 1) Run the setup algorithm TR.Setup(λ) to generate the public parameters tpk and the private key tsk. Run TR.KeyGen(tpk) to generate a public-private key pair (epk, esk). Set the key expiration time t to 24 hours. Run TR.Ext(tpk, tsk, t) to generate the time-release key trk. Run TR.Enc(tpk, epk, t, key2) to encrypt key2 and obtain the ciphertext C_k;

[0103] 2) If the worker is satisfied with the task reward offered by the task requester, the worker creates and broadcasts a transaction tx2 that publishes the ciphertext C_k on the smart contract. When the hash lock HL(x) is opened by the task requester, the worker receives the task reward and sends the worker's private key esk and the time-release key trk to the task requester via the smart contract.

[0104] Finally, the task requester runs the decryption algorithm TR.Dec(tpk, esk, trk, C_k) to obtain key2, which is used to decrypt the task result, thus ending the data annotation task.

[0105] If the task requester does not open the hash lock, the worker will not receive the task reward, and the task requester will also be unable to decrypt and obtain the task result, because the key protected by time-release encryption will expire. If the worker does not broadcast tx2, the task reward will be returned to the task requester after tx1 expires.

[0106] Compared to current decentralized crowdsourcing systems based on blockchain, this invention designs and implements a data annotation crowdsourcing method for data annotation scenarios with objective answers, such as image annotation, text translation, and OCR recognition. This method ensures that each transaction is tamper-proof without any third-party intervention. It addresses the unavoidable free-riding and false reporting behaviors inherent in blockchain-based crowdsourcing systems without trusted third-party oversight. Hash time locking is used to guarantee the atomicity of transactions: 1) the worker receives the task reward at the same time the task requester receives the task result, and vice versa; 2) if the worker does not receive the task reward, the task requester will not receive the task result, and vice versa. Furthermore, it employs the principle of simple random sampling, publicly proving the correctness of the worker's submitted solution by disclosing the sampled answers. This public standard is used to measure task rewards. This invention demonstrates significant innovation in achieving fair transactions without trusted third-party oversight.

[0107] It should be noted that, for the sake of simplicity, the aforementioned method embodiments are all described as a series of actions. However, those skilled in the art should understand that the present invention is not limited to the described order of actions, because according to the present invention, some steps can be performed in other orders or simultaneously.

[0108] Based on the same idea as the data annotation crowdsourcing method with transparent settlement balance in the above embodiments, the present invention also provides a data annotation crowdsourcing system with transparent settlement balance, which can be used to execute the aforementioned data annotation crowdsourcing method with transparent settlement balance. For ease of explanation, the structural diagram of an embodiment of the data annotation crowdsourcing system with transparent settlement balance only shows the parts related to the embodiments of the present invention. Those skilled in the art will understand that the illustrated structure does not constitute a limitation on the device, and may include more or fewer components than shown, or combine certain components, or have different component arrangements.

[0109] As shown in Figure 3, another embodiment of the present invention provides a data labeling crowdsourcing system with transparent settlement balance, including an initialization module, a task preparation module, a task execution module and a task settlement module;

[0110] The initialization module is used to establish the data labeling crowdsourcing system and determine the interactive entities; the interactive entities include task requesters and workers.

[0111] The task preparation module is used by task requesters to mix labeled and unlabeled data to obtain data to be labeled, set the public conditions for labeled data, encrypt and upload to the InterPlanetary File System, and then publish the task information tuple on the blockchain through a smart contract, deposit the task reward, set the task deposit, and publish the task download link.

[0112] The task execution module is used by workers to confirm acceptance of tasks and deposit task deposits through smart contracts. After downloading and completing the data labeling task, the module uses zero-knowledge proof processing to generate encrypted proofs, confirms the completion of the task on the smart contract, and publishes the task results and proofs.

[0113] The task settlement module is used by task requesters to process and verify workers' proofs using zero-knowledge proofs. It evaluates the overall completion of workers' tasks using simple random sampling to determine task rewards, creates and broadcasts a hash time-locked transaction. When the hash lock is opened, the task reward is paid to the worker. If the hash lock is not opened within 48 hours, the task reward is returned to the task requester.

[0114] It should be noted that the settlement-balanced and transparent data labeling crowdsourcing system of the present invention corresponds one-to-one with the settlement-balanced and transparent data labeling crowdsourcing method of the present invention. The technical features and beneficial effects described in the above-mentioned embodiments of the settlement-balanced and transparent data labeling crowdsourcing method are all applicable to the embodiments of the settlement-balanced and transparent data labeling crowdsourcing system. For details, please refer to the description in the embodiments of the method of the present invention, which will not be repeated here.

[0115] Furthermore, in the above embodiment of a transparent data labeling crowdsourcing system with balanced settlement, the logical division of each program module is merely an example. In actual applications, the above functions can be assigned to different program modules as needed, for example, for the sake of corresponding hardware configuration requirements or the convenience of software implementation. That is, the internal structure of the transparent data labeling crowdsourcing system with balanced settlement can be divided into different program modules to complete all or part of the functions described above.

[0116] Referring to Figure 4, in one embodiment, an electronic device is provided for implementing the data labeling crowdsourcing method with transparent settlement balance. The electronic device may include a first processor, a first memory, and a bus, and may also include a computer program, such as a data labeling crowdsourcing program, stored in the first memory and executable on the first processor.

[0117] The first memory includes at least one type of readable storage medium, such as flash memory, portable hard drive, multimedia card, card-type memory (e.g., SD or DX memory), magnetic memory, magnetic disk, optical disk, etc. In some embodiments, the first memory can be an internal storage unit of an electronic device, such as a portable hard drive. In other embodiments, the first memory can be an external storage device of the electronic device, such as a plug-in portable hard drive, Smart Media Card (SMC), Secure Digital (SD) card, Flash Card, etc. Furthermore, the first memory can include both internal and external storage units of the electronic device. The first memory can be used not only to store application software and various types of data installed on the electronic device, such as the code of a data annotation crowdsourcing program, but also to temporarily store data that has been output or will be output.

[0118] In some embodiments, the first processor may be composed of integrated circuits, such as a single packaged integrated circuit or multiple integrated circuits packaged with the same or different functions, including combinations of one or more central processing units (CPUs), microprocessors, digital processing chips, graphics processors, and various control chips. The first processor is the control unit of the electronic device, connecting various components of the entire electronic device through various interfaces and lines. It executes programs or modules stored in the first memory (e.g., data labeling crowdsourcing programs) and calls data stored in the first memory to perform various functions of the electronic device and process data.

[0119] Figure 4 shows only an electronic device with certain components; those skilled in the art will understand that the structure shown in Figure 4 does not constitute a limitation on the electronic device, which may include fewer or more components than shown, combine certain components, or have different component arrangements.

[0120] The data labeling crowdsourcing program stored in the first memory of the electronic device is a combination of multiple instructions, which, when run in the first processor, can achieve the following:

[0121] Initialization phase: Establish the data labeling crowdsourcing system and determine the interaction entities; the interaction entities include task requesters and workers;

[0122] Task preparation phase: The task requester mixes labeled and unlabeled data to obtain data to be labeled, sets the public conditions for labeled data, encrypts and uploads to the InterPlanetary File System, publishes the task information tuple on the blockchain through a smart contract, deposits the task reward and sets the task deposit, and publishes the task download link.

[0123] Task execution phase: Workers confirm acceptance of the task and deposit a task deposit through a smart contract. After downloading and completing the data labeling task, they use zero-knowledge proof processing to generate an encrypted proof. They then confirm the completion of the task on the smart contract and publish the task results and proof.

[0124] Task settlement phase: The task requester uses zero-knowledge proof to verify the worker's proof, evaluates the overall completion of the worker's task results using simple random sampling to determine the task reward, creates and broadcasts a hash time-locked transaction, and pays the task reward to the worker when the hash lock is opened; if the hash lock is not opened within 48 hours, the task reward is returned to the task requester.

[0125] Furthermore, if the modules / units integrated in the electronic device are implemented as software functional units and sold or used as independent products, they can be stored in a non-volatile computer-readable storage medium. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a portable hard drive, a magnetic disk, an optical disk, a computer memory, or a read-only memory (ROM).

[0126] Those skilled in the art will understand that all or part of the processes in the above embodiments can be implemented by a computer program instructing related hardware. The program can be stored in a non-volatile computer-readable storage medium, and when executed, it can include the processes of the embodiments of the above methods. Furthermore, any references to memory, storage, databases, or other media used in the embodiments provided in this application can include non-volatile and / or volatile memory.

[0127] The technical features of the above embodiments can be combined in any way. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described. However, as long as there is no contradiction in the combination of these technical features, they should be considered to be within the scope of this specification.

[0128] The above embodiments are preferred embodiments of the present invention, but the embodiments of the present invention are not limited to the above embodiments. Any changes, modifications, substitutions, combinations, or simplifications made without departing from the spirit and principle of the present invention shall be considered equivalent substitutions and shall be included within the protection scope of the present invention.

Claims

1. A data labeling crowdsourcing method with transparent settlement balance, characterized in that, Includes the following steps: Initialization phase: Establish a data labeling crowdsourcing system and determine the interactive entities; the interactive entities include task requesters and workers; Task preparation phase: The task requester mixes labeled and unlabeled data to obtain data to be labeled, sets the public conditions for labeled data, uploads it to the InterPlanetary File System for encryption, publishes the task information tuple on the blockchain through a smart contract, deposits the task reward and sets the task deposit, and publishes the task download link. Task execution phase: Workers confirm acceptance of the task and deposit a task deposit through a smart contract. After downloading and completing the data labeling task, they use zero-knowledge proof processing to generate an encrypted proof. They then confirm the completion of the task on the smart contract and publish the task results and proof. Task settlement phase: The task requester uses zero-knowledge proof to verify the worker's proof, evaluates the overall completion of the worker's task results using simple random sampling to determine the task reward, creates and broadcasts a hash time-locked transaction, and pays the task reward to the worker when the hash lock is opened; if the hash lock is not opened within 48 hours, the task reward is returned to the task requester. The task settlement phase specifically includes: Once the worker confirms task completion, the annotation information γ of the labeled data set by the task requester is made public. The worker then uses key2 to decrypt the corresponding annotation information in the task result, based on the annotation information γ's number β, to obtain the encrypted task result of the labeled data. 
, where l_j ∈ [r_1, r_2, ..., r_m]; The encrypted task result CT_res of the labeled data is uploaded to the InterPlanetary File System to obtain the IPFS content address CID_res of CT_res, and CID_res and key1 are made public; All interacting entities run the ZK.Verifier(s,π,crs) verification algorithm to verify the worker's proof π. The worker's task results are evaluated using simple random sampling to assess the overall completion status. Finally, the task requester determines the task reward based on the evaluation results. The task requester randomly selects a number x, calculates its hash value H(x), uses it as a hash lock HL(x), and sends it to the worker; The task requester creates and broadcasts a hash time-locked transaction tx1: When the hash lock HL(x) is opened by the worker, the task reward is paid from the task requester's wallet address to the worker's wallet address. If the hash lock is not opened within 48 hours, the reward is returned to the task requester's wallet address. After receiving the hash lock HL(x), the worker runs the setup algorithm TR.Setup(λ) to generate public parameters tpk and private key tsk, runs TR.KeyGen(tpk) to generate a public-private key pair (epk, esk), sets the key expiration time t to 24 hours, runs TR.Ext(tpk, tsk, t) to generate a time-release key trk, and runs TR.Enc(tpk, epk, t, key2) to encrypt key2 to obtain the ciphertext C_k; If the worker is satisfied with the task reward from the task requester, they create and broadcast a transaction tx2, publicly exposing the ciphertext C_k on the smart contract. When the hash lock HL(x) is opened by the task requester, the worker receives the task reward and sends the worker's private key esk and time-release key trk to the task requester via a smart contract. The task requester runs the TR.Dec(tpk, esk, trk, C_k) decryption algorithm to obtain key2, which is then used to decrypt the task result.

2. The data labeling crowdsourcing method for transparent settlement balance according to claim 1, characterized in that, The initialization phase specifically includes: Let TR = (TR.Setup, TR.KeyGen, TR.Ext, TR.Enc, TR.Dec) be the time-release encryption process, i.e., TR processing; TR.Setup(λ) is the TR processing setup algorithm, which takes the security parameter λ as input and outputs the public parameter tpk and the private key tsk, and is used to initialize the TR processing. TR.KeyGen(tpk) is the key generation algorithm for TR processing. It takes the public parameter tpk as input and outputs the public key epk and the private key esk, which are used to generate encryption and decryption keys. TR.Ext(tpk,tsk,t) is the time release key generation algorithm processed by TR. It takes the public parameter tpk, the private key tsk and the release time t as input, and outputs the time release key trk, which is used to generate the time release key. TR.Enc(tpk,epk,t,M) is the encryption algorithm for TR processing. It takes public parameters tpk, public key epk, release time t and plaintext message M as inputs and outputs ciphertext C, which is used to encrypt messages. TR.Dec(tpk,esk,trk,C) is the decryption algorithm for TR processing. It takes public parameter tpk, private key esk, time release key trk and ciphertext C as input, and outputs plaintext message M, which is used to decrypt the ciphertext. 
Let ZK = (ZK.Setup, ZK.Prover, ZK.Verifier) be the zk-SNARK zero-knowledge proof processing, i.e., ZK processing; where ZK.Setup(λ,£) is the setup algorithm of ZK processing, which takes the security parameter λ and NP language £ as input and outputs the common reference string crs, used to initialize ZK processing; ZK.Prover(s,w,crs) is the proof algorithm of ZK processing, which takes the statement s, the secret w and the common reference string crs as input and outputs the proof π, used to generate the proof; ZK.Verifier(s,π,crs) is the verification algorithm of ZK processing, which takes the statement s, the proof π and the common reference string crs as input and outputs 0 or 1, used to verify the proof; Let H(m) be a secure hash function; let the total number of data to be labeled be n; let the number of labeled data owned by the task requester be m, where m = 10%n ~ 20%n; Let the public-private key pair used by the task requester for encryption and decryption be (rpk, rsk); Let the symmetric encryption key held by the worker be key_i, where i is a positive integer denoting the i-th key.

3. The data labeling crowdsourcing method for transparent settlement balance according to claim 2, characterized in that, The task preparation phase specifically includes: The task requester randomly mixes labeled and unlabeled data to obtain the data to be labeled; the data to be labeled are numbered α = [1, 2, …, n], and the numbers of the labeled data are recorded as β = [r_1, r_2, ..., r_m], where [r_1, r_2, ..., r_m] ⊂ [1, 2, …, n]; The data to be labeled are uploaded to the InterPlanetary File System to obtain their IPFS content address CID_α; At the same time, the annotation information γ of the labeled data is encrypted using the private key rsk of the task requester and uploaded to the InterPlanetary File System to obtain the IPFS content address CID_γ of the annotation information γ; The hash value H(CID_γ) of the IPFS content address of the annotation information γ is calculated, and the public condition for the annotation information γ of the labeled data is set; the public condition is that the worker confirms the completion of the task; Through the smart contract at address C_ξ, the task information tuple τ = (CID_α, H(CID_γ)) is published on the blockchain, the task reward is deposited, the task deposit is set, and the download link for the data to be labeled is made public.

4. The data labeling crowdsourcing method for transparent settlement balance according to claim 3, characterized in that, The task execution phase specifically includes: The worker confirms acceptance of the task through the smart contract at address C_ξ, deposits a task deposit, and downloads the data to be labeled; The worker labels the data to be labeled, and the resulting labeling information δ = {l_1, l_2, ..., l_n} is encrypted using key1 to obtain CT_δ; CT_δ is then encrypted using key2 to obtain CT = {ct_1, ct_2, ..., ct_n}; CT is uploaded to the InterPlanetary File System to obtain the IPFS content address CID_δ of CT; For the NP language £ [formula missing in the source text], the ZK.Prover(s,w,crs) proof algorithm is run to generate the proof π, where a=Enc(b,k) means that ciphertext a is obtained by encrypting plaintext b with key k; Task completion is confirmed on the smart contract at address C_ξ, and CID_δ and the proof π are made public.

5. The data labeling crowdsourcing method for transparent settlement balance according to claim 4, characterized in that, All interacting entities on the blockchain, including the task requester, can run the ZK.Verifier(s,π,crs) verification algorithm to verify the worker's proof π. Once the proof is verified, the task result is decrypted using key1 to obtain the plaintext task result of the labeled data. For ct_i ..., since only the worker possesses the decryption key key2, other interacting entities cannot obtain the plaintext of the task results for the unlabeled data; All interacting entities verify the decrypted task results of the labeled data against the labeling information γ disclosed by the task requester, and calculate the matching degree with γ. Based on the estimation of the whole by simple random sampling, the matching degree between the labeled data and γ is regarded as the matching degree of the data to be labeled, and it is made public on the blockchain; The task requester collects the matching degrees published by all interacting entities, removes extreme results, calculates the average, and discloses the calculation process to obtain the evaluation result.

6. A data labeling crowdsourcing system with transparent settlement and balance, characterized in that, A data labeling crowdsourcing method with transparent settlement balance, applicable to any one of claims 1-5, includes an initialization module, a task preparation module, a task execution module, and a task settlement module; The initialization module is used to establish a data labeling crowdsourcing system and determine the interactive entities; the interactive entities include task requesters and workers. The task preparation module is used by the task requester to mix labeled and unlabeled data to obtain data to be labeled, set the public conditions for labeled data, upload it to the InterPlanetary File System for encryption, and then publish the task information tuple on the blockchain through a smart contract, deposit the task reward, set the task deposit, and publish the task download link. The task execution module is used by workers to confirm acceptance of tasks and deposit task deposits through smart contracts, download and complete data labeling tasks, use zero-knowledge proof processing to encrypt and generate proofs, confirm task completion on smart contracts, and publish task results and proofs. The task settlement module is used by the task requester to process and verify the worker's proof using zero-knowledge proof, evaluate the overall completion of the worker's task results using simple random sampling to determine the task reward, create and broadcast a hash time lock transaction, and pay the task reward to the worker when the hash lock is opened; if the hash lock is not opened within 48 hours, the task reward is returned to the task requester. The task settlement module is specifically as follows: Once the worker confirms task completion, the annotation information γ of the labeled data set by the task requester is made public. 
The worker then uses key2 to decrypt the corresponding annotation information in the task result, based on the annotation information γ's number β, to obtain the encrypted task result CT_res of the labeled data, where l_j ∈ [r_1, r_2, ..., r_m]; The encrypted task result CT_res of the labeled data is uploaded to the InterPlanetary File System to obtain the IPFS content address CID_res of CT_res, and CID_res and key1 are made public; All interacting entities run the ZK.Verifier(s,π,crs) verification algorithm to verify the worker's proof π. The worker's task results are evaluated using simple random sampling to assess the overall completion status. Finally, the task requester determines the task reward based on the evaluation results. The task requester randomly selects a number x, calculates its hash value H(x), uses it as a hash lock HL(x), and sends it to the worker; The task requester creates and broadcasts a hash time-locked transaction tx1: When the hash lock HL(x) is opened by the worker, the task reward is paid from the task requester's wallet address to the worker's wallet address. If the hash lock is not opened within 48 hours, the reward is returned to the task requester's wallet address. After receiving the hash lock HL(x), the worker runs the setup algorithm TR.Setup(λ) to generate public parameters tpk and private key tsk, runs TR.KeyGen(tpk) to generate a public-private key pair (epk, esk), sets the key expiration time t to 24 hours, runs TR.Ext(tpk, tsk, t) to generate a time-release key trk, and runs TR.Enc(tpk, epk, t, key2) to encrypt key2 to obtain the ciphertext C_k; If the worker is satisfied with the task reward from the task requester, they create and broadcast a transaction tx2, publicly exposing the ciphertext C_k on the smart contract. When the hash lock HL(x) is opened by the task requester, the worker receives the task reward and sends the worker's private key esk and time-release key trk to the task requester via a smart contract.
The task requester runs the TR.Dec(tpk, esk, trk, C_k) decryption algorithm to obtain key2, which is then used to decrypt the task result.

7. An electronic device, characterized in that, The electronic device includes: At least one processor; and, A memory communicatively connected to the at least one processor; wherein, The memory stores computer program instructions that can be executed by the at least one processor to enable the at least one processor to perform a data labeling crowdsourcing method for settlement balance transparency as described in any one of claims 1-5.

8. A computer-readable storage medium storing a program, characterized in that, When the program is executed by the processor, it implements the data labeling crowdsourcing method for settlement balance and transparency as described in any one of claims 1-5.