-
Notifications
You must be signed in to change notification settings - Fork 4
/
mechanism.tex
369 lines (289 loc) · 22.7 KB
/
mechanism.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
% !TEX root=./whitepaper.tex
\section{Incentive Mechanism}
This section describes the incentive mechanism design of the \projabbrev Storage, which consists of two types of actors: users and miners (a.k.a. "storage nodes").
Users pay \projabbrev tokens to create data entries in the log and add data to the network, while miners provide data services and receive \projabbrev tokens as a reward from the network.
The payment from users to miners is mediated by the \project consensus network,
as the service is sustained by the whole network rather than any specific miner.
\projabbrev Storage implements storage service in a ``pay once for a fixed period of time'' manner.
Users pay a storage endowment for each created data entry for a certain period of time, e.g., 3 years, and thereafter the endowment is used to incentivize miners who maintain that data entry by linear distribution.
Users can recharge to extend the storage period.
The storage endowment is maintained per data entry,
and a miner is only eligible for storage rewards from data entries that he has access to.
The total storage reward paid for a data entry is independent of the popularity of that data entry.
For instance, a popular data entry stored by many miners will be frequently mined, but the reward is amortized amongst those miners.
On the other hand, a less popular data entry that is rarely mined would have its storage reward accumulate and hence provide a higher payoff to miners who store this rare data entry.
\subsection{Storage Request Pricing}
The cost of each \projabbrev Storage request is composed of two parts: a) fee and b) storage endowment.
The fee part is paid to host chain miners/validators for invoking the \project contract to process storage requests and add new data entries into the log, which is priced like any other smart contract invocation transaction.
In what follows we focus on the storage endowment part,
which supports the perpetual reward to \projabbrev Storage miners who serve the corresponding data.
Given a data storage request $\sr$ with specific amount of endowment $\sr_{endowment}$ and size of committed data $\sr_{data\_size}$ (measured in number of \sectorsize sectors),
the unit price of $\sr$ is calculated as follows:
\begin{align}
\sr_{unit\_price} &= \sr_{endowment}/\sr_{data\_size}
\end{align}
This unit price $\sr_{unit\_price}$ must exceed a globally specified lower bound to be added to the log, otherwise the request will be pending until when the lower bound decreases below $\sr_{unit\_price}$ (in the meantime miners will most likely not store this unpaid data).
Users are free to set a higher unit price $\sr_{unit\_price}$,
which would motivate more storage nodes mining on that data entry and hence lead to better data availability.
\subsection{\proof}
The \project network adopts a \proof (\sproof) mechanism to incentivize miners to store data.
%
By requiring miners to answer randomly produced queries to archived data chunks, the \sproof mechanism establishes the relation between mining proof generation power and data storage.
%
Miners answer the queries repeatly and computes an output digest for each loaded chunk util find a digest satisfies the mining difficulty (i.e., has enough leading zeros).
%
{\sproof} will stress the miners' disk I/O and reduce their capability in responding user queries.
%
So \projabbrev Storage adopts intermittent mining, in which
%
a mining epoch starts with a block generation at a specific block height on the host chain and stops when a valid {\sproof} is submitted to the \projabbrev Storage contract.
In a strawman design, a \sproof iteration consists of a computing stage and a loading stage.
%
In the computing stage, a miner computes a random recall position (the universal offset in the flow) based on an arbitrary picked random nonce and a mining status read from the host chain.
%
In the loading stage, a miner loads the archived data chunks at the given recall position, and computes output digest by hashing the tuple of mining status and the data chunks.
%
If the output digest satisfies the target difficulty, the miner can construct a legitimate \sproof consists of the chosen random nonce, the loaded data chunk and the proof for the correctness of data chunk to the mining contract.
In the following, we introduce some major considerations in improving the fairness in {\sproof} mining and formalize the {\sproof} mining mechanism.
%
% More specifically, a legitimate \sproof must include the content of queried data chunks (together with the proof of data validity).
% As a result, miners need to access archived data in order to generate \sproof's,
% and data storage service can be incentivized since it is the basis for proof generation power.
% The \sproof challenges are computed from the mining status recorded in the \project contract.
% To customize the challenges for every miner, these queries also depend on the miner's ID as well as timestamp and nonce.
% \project further requires every miner to claim in advance wihch potion of data he stores, and a miner can only get mining reward by answering challenges within his claimed storage range.
\subsubsection{Fairness for Small Miners}
When the storage size of the \projabbrev Storage network significantly exceeds the storage capacity of a single machine, a single machine will take almost all the time in finding an available recall position.
%
This makes the {\sproof} becomes proof of work.
%
To make the {\sproof} mining process be friendly to small miners with a single machine, the mining range is limited to a threshold 8 TB.
%
When the size of archived data chunks exceeds 8 TB, a miner must specify a mining range over the data flow sequence in size of 8 TB.
%
For large miners have enough machines to store all the data, it can mine on different data ranges concurrently.
\subsubsection{Disincentivize Storage Outsourcing}
To promote the whole network storing enough replicas of data chunks, \projabbrev Storage limits the reward share of a single miner with a single storage replica.
%
{\sproof} mechanism seals the data for each miner in different ways, challenges the accessing of sealed data, and prevents mining with data chunks from others.
%
If a miner outsources the data storage and queries the archived data chunks in {\sproof} mining, it must seal the answering data to compute the output digest, which is a large cost.
%
A miner can seal and unseal tens of MB data chunks by a single thread, so it can still serving the user queries efficiently.
%
As reading data from SSDs can reach up to 7 GB/s, the data sealing is a heavy task for {\sproof} mining.
%
As long as the cost imposed by data sealing exceeds the cost of purchasing disks and synchronizing data, the miners intend to store a replica of data chunks instead of sealing data on {\sproof} mining.
More specifically, every 4KB data chunk will be sealed by the miner ID and other context data.
%
Suppose {\sf seal\_seed} is a 32-byte digest of the miner ID and the context data, figure~\ref{fig:seal} defines the seal process.
%
\begin{algorithm}
\small
\SetNlSty{}{}{}
\DontPrintSemicolon
\SetKw{To}{ to }
\SetKw{In}{ in }
\SetKw{Or}{ or }
\SetKw{And}{ and }
\SetKw{From}{ from }
\SetKwInOut{Input}{Input}
\SetKwInOut{Output}{Output}
\SetKw{Define}{define}
\Input{{\sf seal\_seed}, {\sf unsealed\_data} }
\Output{{\sf sealed\_data}}
Regard {\sf unsealed\_data} in length of 4 KB as an array of 32-byte elements $\vec{d}$\;
$h\leftarrow {\sf seal\_seed}$\;
\For{$i$\From $0$\To $127$} {
$\vec{d}[i]\leftarrow \vec{d}[i]{\;\tt XOR\;} h$\;
$h\leftarrow \textrm{Keccak256}(\vec{d}[i])$\;
}
Regard $\vec{d}$ as a 4-KB data chunk {\sf sealed\_data}\;
\caption{Seal 4 KB data chunks}
\label{fig:seal}
\vspace{-4mm}
\end{algorithm}
\subsubsection{Disincentivize Distributed Mining}
As a mining mechanism for incentivizing data storage, a single storage replica should produce a similar hashrate for fairness.
%
In {\sproof} mining process, the hashrate is bottlenecked by the storage Input/Output (I/O).
%
So we expected all the miners to have similar storage I/O per unit of storage.
%
However, a mining farm can significantly increase its I/O compared to normal miners by storing data chunks in a distributed system with in-memory filesystem.
%
As the DDR memory has a significantly larger bandwidth compared to SSD, the mining farm can produce much more hashrate while contributing negligible to the data storage.
\projabbrev Storage prevents this behavior by imposing a large amount of data transfer from the computing stage to the loading stage to encourage miners to complete two stages on the same machine.
%
So if the computing stage and loading stage are separated by different machines, the hashrate is bounded by the network bandwidth, which is usually smaller than the storage I/O bandwidth.
%
In the computing stage, a random scratchpad will be generated coupled with the recall position.
%
It has the same length as the data chunk to be loaded.
%
Before computing the hash of the loaded data chunk, the data chunk should be mixed with the scratchpad by an XOR operation.
However, there is a dilemma between the read bandwidth and the transaction fee for submitting {\sproof} to the host chain.
%
Since loading data chunks from local disks should be more efficient than downloading them via the network,
%
the loading stage should leverage the fast SSD read speeds in sequential access to maximize the data chunk loading bandwidth.
%
According to our benchmarks, randomly loading 256-KB data chunks reaches 80\% read speed of sequential access.
%
However, when submitting the data chunk to the host chain, a 256-KB transaction is too large and will consume a large amount of transaction fee.
%
Reducing the size of the data chunk could mitigate this issue, but would impair the read speed.
{\sproof} mechanism adopts ``batched data loading'' to resolve this dilemma.
%
In each iteration, a miner loads a 256-KB data chunk from storage, divides it into 4-KB data chunks, and computes the hash for each 4-KB data chunk individually.
%
If one of the 64 outputs matches the target quality, the miner finds a valid answer.
%
So only the 4-KB data chunk contributing to this answer is required to be submitted to the host chain.
\subsubsection{Formal Definition for \sproof Mechanism}
Precisely, the mining process has the following steps:
\begin{enumerate}
\item Register the $\sf miner\_id$ on the mining contract;
\item For each mining epoch, repeat the following steps:
\begin{enumerate}
\item Wait for the layer-1 blockchain to release a block at a given epoch height;
\item Get the block hash $\sf block\_hash$ of this block and the relevant context (including ${\sf merkle\_root, data\_length, context\_digest}$) at this time;
\item Compute the number of minable entries $n=\lfloor {\sf data\_length} / 256\text{KB} \rfloor$.
\item For each iteration, repeat the following steps:
\begin{enumerate}
\item Pick a random 32-byte {\sf nonce};
\item Decide the mining range parameters {\sf start\_position} and {\sf mine\_length}; {\sf mine\_length} should be equal to $\min(8\text{TB},n\cdot 256\text{KB})$;
\item Compute the recall position $\tau$ and the scratchpad $\vec{s}$ by the algorithm in figure~\ref{fig:recall};
\item Load the 256-kilobyte sealed data chunk $\vec{d}$ started from the position of $h\cdot 256\text{KB}$;
\item Compute $\vec{w}=\vec{d}{\;\tt XOR\; }\vec{s}$ and divide $\vec{w}$ into 64 4-kilobyte pieces;
\item For each piece $\vec{v}$, compute the Blake2b hash of the tuple ({\sf miner\_id}, {\sf nonce}, {\sf context\_digest}, {\sf start\_position}, {\sf mine\_length}, $\vec{v}$);
\item If one of the Blake2b hash outputs is smaller than a target value, the miner found a legitimate {\sproof} solution.
\end{enumerate}
\end{enumerate}
\end{enumerate}
\begin{algorithm}
\small
\SetNlSty{}{}{}
\DontPrintSemicolon
\SetKw{To}{ to }
\SetKw{In}{ in }
\SetKw{Or}{ or }
\SetKw{And}{ and }
\SetKwInOut{Input}{Input}
\SetKwInOut{Output}{Output}
\SetKw{Define}{define}
\Input{{\sf miner\_id}, {\sf nonce}, {\sf context\_digest}, {\sf start\_position}, {\sf mine\_length}}
\Output{Recall position $\tau$ and the scratchpad $\vec{s}$}
$h\leftarrow \textrm{Blake2b}({\sf miner\_id}, {\sf nonce}, {\sf context\_digest}, {\sf start\_position}, {\sf mine\_length})$\;
Initialize an empty array $\vec{s}$ with 4096 64-byte elements\;
\For{$i\in \{x\in \mathbb{Z}\,|\,0\le x < 4095\}$}
{
$h\leftarrow \textrm{Blake2b}(h)$\;
$\vec{s}[i]\leftarrow h$\;
}
$r \leftarrow \textrm{Keccak256}(h)$\;
$n\leftarrow {\sf mine\_length}/256\mathrm{KB}$\;
$\tau \leftarrow {\sf start\_position} + r \;\text{mod}\; n$
\caption{Computing the recall position and the scratchpad}
\label{fig:recall}
\vspace{-4mm}
\end{algorithm}
% More specifically, the \sproof challenges are produced in two steps:
% first select candidate data entries based on mining status, miner's ID, and timestamp, where the probability of each data entry being selected is proportional to its size;
% then based on the miner specified nonce, determine challenged data chunks from the candidate data entries.
% Thus miners are slightly incentivized to maintain whole data entries rather than fragmented ones.
% Every \sproof challenge contains $\querynum$ queries to randomly selected data chunks,
% out of which at least $\ansnum$ queries must be answered.
% This $\ansnum$-out-of-$\querynum$ design incentivizes smaller nodes who merely store a small fraction of the data by increasing the frequency that they receive mining rewards.
% On the other hand, larger nodes who store a large fraction of the whole \project log tend to get greater reward each time by answering more than $\ansnum$ random queries (see \cref{subsec:mining reward} for details).
% %
% For example, a node storing $10\%$ of total data is able to answer $10\%$ random queries in expectation, which means answering a single random query ($1$-out-of-$1$) with probability exactly $10\%$.
% However, the probability that it manages to answer $2$-out-of-$10$ random queries turns out $1-0.9^{10}-10\times 0.1\times 0.9^9 \approx26.4\%$, while in expectation it still answers only $1$ query.
% Given the \sproof challenge, the mining node then decides whether to respond to it.
% If yes, the node should fill at least $\ansnum$ queried chunks into the proof,
% and the resulted proof is legitimate if its hash satisfies the target difficulty as recorded in the \project contract.
% Otherwise the node has to start over with a new \sproof challenge.
% In summary, the mining process has following steps:
% \begin{enumerate}
% \item Collect relevant metadata, including \emph{mining status}, \emph{target difficulty}, \emph{timestamp}, \etc.
% \label{step:start}
% \item Select candidate data entries by hashing on relevant metadata, \ie $\sproof_{entries}=\left( D_1, \dots, D_{\querynum} \right)=Hash\left(\textsf{mining\_status}, \textsf{timestamp}\right)$.
% If having less than $\ansnum$ data entries in its local storage, restart from step~\ref{step:start}.
% \item Determine challenged data chunks based on a random \emph{nonce}, \ie $\sproof_{chunks}=\left( Q_1, \dots, Q_{\querynum} \right)=Hash\left(\sproof_{entries}, \textsf{nonce}\right)$.
% \label{step:nonce}
% \item Respond to the challenge with the content of at least $\ansnum$ queried chunks, together with a characteristic vector showing which queries are answered,
% \ie $\sproof_{respond}=\left(\mathsf{v}, A_1, \dots, A_{\querynum} \right)$, where $\mathsf{v}\in\{0,1\}^{\querynum}$ is a $\querynum$-bit binary string with Hamming weight $|\mathsf{v}|\ge \ansnum$, and $A_i$ is the respond to query $Q_i$.
% More specifically, $A_i=\bot$ if the corresponding digit in $\mathsf{v}$ is $0$; and $A_i$ is the content of data chunk queried by $Q_i$ concatenated with the Merkle proof of $A_i$ with respect to the storage root of data entry $D_i$.
% The full proof is then defined as $\textsf{mining\_proof}=\left(\textsf{mining\_status}, \textsf{timestamp}, \textsf{nonce}, \sproof_{respond}\right)$.
% \item Examine the full proof against difficulty criteria, \ie check whether $Hash\left( \textsf{mining\_proof} \right) \le \textsf{target\_difficulty}$.
% If not satisfied, restart from step~\ref{step:nonce} with a new nonce, or from step~\ref{step:start} for new candidate entries.
% \item Package \textsf{mining\_proof} into a reward claiming transaction and submit to the \project contract on host chain.
% \end{enumerate}
%\subsection{Mining Reward}
%\label{subsec:mining reward}
%\projabbrev Storage creates pricing segments every 8 GB of data chunks over the data flow.
%
%Each pricing segment is associated with an \emph{Endowment Pool} and a \emph{Reward Pool}.
%
%The \emph{Endowment Pool} collects the storage endowments of all the data chunks belonging to this pricing segment and releases a fixed ratio of balance to the \emph{Reward Pool} every second.
%
%The rate of reward release is set at 4\% per year.
%The mining reward is paid to miners for providing data service.
%Miners receive mining reward when submit the first legitimate {\sproof} for a mining epoch to \projabbrev Storage contract.
%The mining reward consists of two parts:
%\begin{itemize}
% \item {\bf Base reward}: the base reward, denoted by $R_{base}$, is paid for every accepted mining proof. The base reward per proof decreases over time.
% \item {\bf Storage reward}: the storage reward, denoted by $R_{storage}$, is the perpetual reward from storing data.
%
% When a {\sproof} falls in a pricing segment, half of the balance in its \emph{Reward Pool} is claimed as the storage reward.
%\end{itemize}
%The total reward for a new mining proof is thus:
%\begin{align}
% R_{total} = R_{base} + R_{storage}
%\end{align}
\subsection{How to Balance the Number of Replicas?}
Every miner in \projabbrev Storage is required to explicitly claim the range of data entries that he maintains,
and then he may only use the claimed data in his mining process.
That is, a miner cannot receive mining rewards from unclaimed data, and on the other hand, the mining output is decreased if he fails to answer a challenge within his claimed range.
As a result, a miner has no incentive to lie about the range of data used for mining.
Since miners truthfully expose the list of data entries that they maintain,
it is easy to observe and estimate the number of replicas for every specific piece of data.
Furthermore, recalling that the storage endowment is paid to miners who maintain corresponding pieces of data,
miners are incentivized to store rare data and hence balance the number of replicas.
%\subsection{How to Incentivize Data Sharing?}
%In a storage network, storing data can be easily incentivized by relating the mining reward to storage size, whereas sharing is much less incentivized as the communication of data must happen outside consensus.
%Therefore, a rational node would prefer to share nothing to save bandwidth and power expenses.
%One might suggest using a mechanism similar to the BitTorrent network, where each node maintains a local ranking of its peers and refuses to share data with leechers.
%However, such a mechanism is far from optimal due to the mismatch between willingness and capability. This is especially so for a storage network as a newcomer cannot quickly bootstrap (they have nothing to share), whereas a powerful node with sufficient data has much less incentive to increase its ranking score.
%\projabbrev Storage introduces a built-in royalty mechanism to incentivize data-sharing behavior.
%The royalty mechanism allows a storage node to receive dividends
%from the data it shares with others
%when a new mining proof is mined based on the shared data.
%Thus, storage nodes have a clear incentive to share data with others, and newcomers get a better kick-start.
%The essence of the royalty mechanism is that data providers are rewarded only after the data is shared.
%Data sharing is hard to incentivize mainly because off-chain sharing behavior is not publicly observable.
%In particular, a third-party can hardly verify whether some node A sends specific data to another node B through an off-chain channel,
%especially when both of the two nodes could be lying.
%However, the \project loyalty mechanism bypasses the problem by switching to another observable event, \ie a new mining proof is generated based on the shared data.
%If the data sharing did not happen (e.g. the data provider ignored the request without sending data), then the requester would not be able to use that piece of data in its mining proof, and hence there is no reward for the unanswered data request.
%On the other hand, when the data is shared and by probability queried by the \sproof challenge, a rational miner will include it in the mining proof for a better payoff, which is positive even after deducting the dividends to the data provider.
%More specifically, the royalty mechanism works as follows:
%\begin{itemize}
% \item The data requester sends his request to the \projabbrev Storage contract to create a royalty record. Each royalty record includes the range of requested data entries, the designated data provider, and the length of the royalty period.
% \item The data provider, when observing a new request on the host chain, decides on its own whether or not to send requested data to the requester.
% \item When the requester submits a mining proof $\pi$, the \project contract checks for every \sproof challenge query $Q_i$ in $\pi$ whether $Q_i$ falls in the range of some valid royalty record.
% If yes, the data provider gets a certain fraction of the storage reward as a dividend.
% \begin{itemize}
% \item If yes, then half of the corresponding storage reward $R_i$ as in \cref{eq:storage reward} will be assigned to the data provider, \ie each of the author of $\pi$ and the data provider gets $\frac{1}{2} R_i$.
% \item Otherwise the author of $\pi$ gets full storage reward $R_i$ for query $Q_i$.
% \end{itemize}
% \item Expired royalty records will be removed.
%\end{itemize}
%As a result, data providers are incentivized to store and share data,
%and the expected royalty dividends are proportional to the amount of shared data.
%There is no incentive to forge sharing relations since the total storage reward $R_i$ is unchanged with or without data sharing royalty.
%The expiration of royalty records is introduced for three reasons: a) in case a request is ignored, the requester can resubmit it to another data provider after the previous request expires;
%b) a permanent royalty record does not make sense in practice, since the requester can always open a new account and hence renders the record useless;
%c) cleaning expired royalty records reduces the overhead of maintaining royalty-sharing relations.
%Furthermore, the history of data sharing and royalty dividends can be observed onchain, which improves the effectiveness of credit ranking in the \projabbrev Storage network.
% \section{Fairness and Counter-Cheating}