Proof of Contribution
In the distributed AI system constructed by Janction, we have designed a comprehensive resource scheduling mechanism. The primary purpose of introducing blockchain is to distribute benefits fairly to contributors, so that the collaborative system can operate without requiring trust between participants. However, such a system only works as intended if all participants act in good faith, an assumption that is clearly unrealistic. Malicious participants who aim to gain unfair benefits at low cost will inevitably exist. We therefore need a mechanism that prevents malicious attacks on the collaborative system, and this mechanism goes beyond network security alone.

Janction introduces Proof of Contribution to allocate fair rewards based on the actual contributions of participants to the model. Contribution evaluation must cover both data providers and computational power providers, and must address the validity and reliability of data valuation, the fairness and reasonableness of contribution evaluation, and the computational efficiency of the evaluation itself. We have investigated the following four schemes for participant contribution evaluation:
Individual Approach: A measure of the value of a participant's own data, or a related variant, is taken as the contribution of that participant. The individual approach can be built on any data value metric, in particular individual reputation, individual cross-validation, individual mutual information, individual sampling, and individual influence function. The individual method is simple and efficient, but it does not consider the value an individual participant adds to the federation as a whole. It is suitable for cross-device federation scenarios with a large number of participants.
Leave-one-out: The loss of data value caused by removing a participant from the federation as a whole is taken as the contribution of that participant. The leave-one-out method only measures the value gain of one participant relative to the full federation, which is unfair to multiple similar and substitutable participants: removing any one of them changes little, so each receives a low score. It is suitable for scenarios where scarce participants need to be identified, and is often used as a benchmark for evaluating other methods.
Shapley Value: All possible combinations of participants are enumerated, and the expected marginal gain in data value from a participant joining the federation is taken as the contribution of that participant. The Shapley scheme is intuitive, easy to understand, and ensures fairness in assessing the contribution of each individual participant. It is currently the most widely used scheme in federation contribution assessment.
Least Core (Minimum Kernel): The estimation of each participant's contribution is transformed into an optimization problem, where the objective is to make the sum of the contributions of any combination of participants fall short of the value of their combined data by as little as possible. The least core scheme optimizes the allocation of contributions across sub-coalitions, which ensures that the contribution assigned to each sub-coalition of participants is fair relative to the value that sub-coalition produces. This is more in line with economic principles and is therefore conducive to the long-term, stable development of the AI.
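To make the difference between the first two schemes concrete, here is a minimal Python sketch. The value function (number of distinct data items a coalition covers), the participant names, and the toy data are all assumptions chosen for illustration; they are not Janction's actual data value metric.

```python
def coalition_value(coalition, data):
    """Hypothetical data value: number of distinct items the coalition covers."""
    covered = set()
    for participant in coalition:
        covered |= data[participant]
    return len(covered)


def individual_contributions(data):
    """Individual approach: score each participant on its own data only."""
    return {p: coalition_value({p}, data) for p in data}


def leave_one_out_contributions(data):
    """Leave-one-out: value lost when a participant is removed from the full set."""
    everyone = set(data)
    full_value = coalition_value(everyone, data)
    return {p: full_value - coalition_value(everyone - {p}, data) for p in data}


# Toy data (assumed): "a" and "b" hold identical items, "c" holds one unique item.
toy_data = {"a": {1, 2, 3}, "b": {1, 2, 3}, "c": {4}}

individual = individual_contributions(toy_data)
loo = leave_one_out_contributions(toy_data)
print(individual)  # {'a': 3, 'b': 3, 'c': 1}
print(loo)         # {'a': 0, 'b': 0, 'c': 1}
```

In this toy setting the two substitutable participants "a" and "b" each receive zero under leave-one-out even though together they supply most of the value, which is the unfairness described above. The individual approach ignores the overlap entirely and credits both in full.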
Given a distributed AI with participants $N = \{1, \dots, n\}$, participant contribution evaluation is defined as computing the contribution vector $\phi = (\phi_1, \dots, \phi_n)$, where $\phi_i$ represents the contribution of participant $i$ to the collaboration.
For data value measurement, data providers in Janction's system design exist as individuals, and their data is used in more than one model training or inference process. Evaluating their contribution at the end of every computation task is therefore impractical. Instead, we use the individual method for one-time incentives to data providers, and these incentives are settled first using points. After a certain time period, the evaluation is aggregated based on the results of the model inference and training runs that used the data.

The aggregated evaluation depends on four quantities: the number of times the data is used within the settlement period, the result evaluation of each use, the individual reputation score derived from the participant's on-chain and off-chain behavior in Janction, and an adjustment parameter determined by the Janction DAO based on feedback from each period.
Computational power providers directly participate in model inference and training processes, so we use the Shapley value method for their contribution evaluation. The Shapley value calculation enumerates all subsets that exclude a particular participant $i$ and computes the expected marginal contribution that $i$ brings when added. In practice, this means evaluating each such subset with and without $i$ and weighting the resulting marginal contribution by the probability that the subset occurs.
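The contribution of participant $i$ is the sum, over all subsets $S$ that exclude $i$, of the probability that $i$ joins the participant set $S$ multiplied by the marginal contribution of $i$ to the task:

$$
\phi_i(v) = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|! \, (|N| - |S| - 1)!}{|N|!} \left[ v(S \cup \{i\}) - v(S) \right]
$$

$N$: The set of participants in the computation process.
$S \subseteq N \setminus \{i\}$: All subsets excluding participant $i$.
$|S|$: The size of subset $S$ (i.e., the number of elements it contains).
$|N|$: The size of set $N$ (i.e., the total number of participants).
$v(S)$: The contribution of subset $S$ to the result function $v$.
$v(S \cup \{i\})$: The contribution of subset $S$ to the result function $v$ after adding participant $i$.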
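The Shapley computation described above can be sketched in Python as follows. This is a minimal illustration using exact enumeration with the standard Shapley weights, not Janction's implementation. The function name `shapley_values`, the toy result function `toy_value`, and the capacity figures are hypothetical. Exact enumeration costs on the order of $2^{|N|}$ evaluations of the result function, so it is only practical for small participant sets.

```python
from itertools import combinations
from math import factorial


def shapley_values(participants, value):
    """Exact Shapley values: enumerate every subset S of N minus {i}."""
    n = len(participants)
    phi = {}
    for i in participants:
        others = [p for p in participants if p != i]
        total = 0.0
        for size in range(n):  # |S| ranges over 0 .. n-1
            # Probability weight |S|! (|N| - |S| - 1)! / |N|!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of i when added to S
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi


def toy_value(coalition):
    """Hypothetical result function: pooled compute, capped at the task's demand."""
    compute = {"a": 4.0, "b": 4.0, "c": 2.0}  # assumed capacities
    return min(sum(compute[p] for p in coalition), 8.0)


phi = shapley_values(["a", "b", "c"], toy_value)
print(phi)  # approximately a = 3.33, b = 3.33, c = 1.33
```

The three values sum to `toy_value` of the full set (8.0), reflecting the efficiency property of the Shapley value: the whole result is distributed among the participants, and the two interchangeable providers "a" and "b" receive equal shares.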
The above contribution evaluation algorithms are theoretically well-designed, and their specific implementation in Janction will be developed based on practical circumstances.