-
Yeah, the more nodes the more security you gain overall, and adding heavy ECC on top also provides very strong guarantees. The flipside is, well... you need more nodes per dataset, which might not always be possible for less popular datasets. I don't think there is anything in Dagger's design that conflicts with heavily distributing the dataset. I see the contract as a way of relating a set of chunks to how long these chunks need to be kept by the network. The fact that the chunks can be logically related is secondary and isn't a hard requirement at all.

A general side note as to why we need proofs at all: we don't need them to elevate the overall security of the dataset (though they do indirectly contribute to it); proofs are there to allow traceability, i.e. rewarding and punishing network participants, which lets us reliably implement/deploy incentives. Redundancy is what gives you durability, and proofs and durability are related only to the extent that proofs help us detect that redundancy of some dataset (or specific chunk) has decreased and punish the node that allowed the dataset/chunk to go amiss. But, as you noted, enough redundancy already gives us enough probabilistic security that the data won't ever be lost.
I think this can be mitigated, maybe we can even aggregate proofs from different datasets. But yes, definitely a big tradeoff.
It depends on what you mean by plain replicas; we might have many duplicate chunks in the network, both under the same or different contracts, as well as by way of ephemeral/opportunistic caching. At any rate, if we use systematic ECC it means that we're expanding the dataset with additional chunks, but it doesn't fundamentally change the structure of the dataset, otherwise it wouldn't be systematic. This is why systematic codes are generally preferred: decoding (recovery) is usually quite costly, and this is also the reason why plain copies in addition to ECC have an advantage over pure ECC. They allow recovering the dataset without having to perform any sort of decoding, only resorting to decoding when enough plaintext pieces have been lost. Keep in mind that certain ECC does allow for some level of local recovery, but as it happens these two aspects, recovery and redundancy, are orthogonal.
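To make the "systematic" point concrete, here's a minimal Python sketch (not Dagger's actual codec; a single XOR parity chunk stands in for real Reed-Solomon parity): the original chunks are stored verbatim and parity is only appended, so reads never touch the decoder, and decoding is only invoked once a plain chunk goes missing.

```python
# Toy sketch of the "systematic" property: original chunks are kept unchanged
# and parity is appended, so reads never require decoding. A single XOR parity
# chunk stands in for Reed-Solomon parity and tolerates the loss of one chunk.

CHUNK_SIZE = 4  # tiny chunks for illustration

def split(data: bytes, size: int) -> list[bytes]:
    return [data[i:i + size].ljust(size, b"\0") for i in range(0, len(data), size)]

def xor_parity(chunks: list[bytes]) -> bytes:
    parity = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, b in enumerate(chunk):
            parity[i] ^= b
    return bytes(parity)

data_chunks = split(b"hello systematic coding!", CHUNK_SIZE)
encoded = data_chunks + [xor_parity(data_chunks)]   # data kept verbatim, parity appended

# Fast path: the dataset is readable directly from the plain chunks, no decoding.
assert b"".join(encoded[:len(data_chunks)]).rstrip(b"\0") == b"hello systematic coding!"

# Slow path: if one plain chunk is lost, reconstruct it from the survivors + parity.
lost = 2
survivors = [c for i, c in enumerate(encoded) if i != lost]
assert xor_parity(survivors) == data_chunks[lost]
```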
Yep, see above
Also a good point
Yeah, this is a concern as well, and this is why we might prefer smaller blocks, say 64KB, which is what Reed-Solomon over GF(2^16) allows for.
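A quick back-of-envelope sketch of why the field size matters (assumed numbers, not a spec): if every block contributes one 16-bit symbol to each codeword, the total number of blocks per codeword is capped at 2^16 - 1, which is what makes 64KB blocks workable even for gigabyte-scale datasets.

```python
# Back-of-envelope check (illustrative, not Dagger code).
FIELD_BITS = 16
MAX_SYMBOLS = 2**FIELD_BITS - 1          # 65535: max blocks (data + parity) per codeword
BLOCK_SIZE = 64 * 1024                   # 64 KB blocks

dataset_size = 1 * 1024**3               # hypothetical 1 GB dataset
data_blocks = dataset_size // BLOCK_SIZE # 16384 data blocks
parity_blocks = data_blocks              # assume 2x expansion (rate 1/2)

assert data_blocks + parity_blocks <= MAX_SYMBOLS   # 32768 <= 65535, fits
print(f"{data_blocks} data + {parity_blocks} parity blocks, limit {MAX_SYMBOLS}")

# Over GF(2^8) the cap would be 255 blocks total, which for the same 1 GB
# dataset would force blocks of several megabytes, hence the larger field.
```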
Generally, I'd say that this would complicate the overall design/implementation, but might be worth looking into.
Yeah, I guess we need to understand what "a lot of nodes" means. It's obvious that the more nodes in the network and the more the dataset is distributed across these nodes, the more secure the dataset (given sufficient ECC). So is 20 nodes a lot or a little?
Yep, also a good point
Yeah, this is an interesting idea in general; having nodes aggregate proofs locally for all the datasets they have might not be a bad idea. The problem is not necessarily keys per dataset, but rather the fact that proofs for each dataset might have to be produced at different times. Still, this is worth considering. Another way of looking at this would be to apply ZK proofs over all the CPORs generated locally, but at that point we might just use ZK proofs directly and be done with it. Certainly something we can look into; the only reason we haven't is time.
The problem is that, no matter how you look at it, verifiers need to be staked to be able to verify, otherwise you have the nothing-at-stake problem. Now, if storing nodes that are also validators lose stake equally for missed verifications as well as for missed proofs, it could work, but I still see a possibility of collusion, since I assume that nodes on the same contract know who the other nodes on that contract are. So you inherently lose the pseudo-anonymity of the verifiers, which I think is very important to be able to guarantee that the data is still accessible and not being withheld. In general, I don't see any advantage in having nodes do cross-verification as opposed to having dedicated verifiers. Great comments overall!!
-
The more I think about it, the more I like the idea of heavily distributing a dataset.
e.g.: I want to store a 1 GB file, I'll ECC it up to 2 GB, and distribute 100 MB to each of 20 nodes (numbers spelled out in the sketch below the lists).
Cons:
Pros:
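Spelling out the numbers in the example above (assumed parameters: a rate-1/2 MDS erasure code, one share per node, decimal GB):

```python
# Working through the 1 GB -> 2 GB -> 20 nodes example (assumed parameters).
DATA_SIZE = 1 * 10**9          # 1 GB original dataset
EXPANSION = 2                  # rate-1/2 code: parity equals data
NODES = 20

encoded_size = DATA_SIZE * EXPANSION
per_node = encoded_size // NODES                 # 100 MB stored on each node
nodes_needed = NODES // EXPANSION                # any 10 nodes suffice (MDS assumption)
tolerated_failures = NODES - nodes_needed        # up to 10 nodes can vanish

print(f"per node: {per_node / 10**6:.0f} MB, "
      f"recover from any {nodes_needed} of {NODES} nodes, "
      f"tolerating {tolerated_failures} node failures")
```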
Regarding the number of proofs, my understanding is that we could combine multiple dataset proofs as long as the datasets have the same private key & block size.
So, instead of having "1 proof / storage node / contract", we could have "1 proof / storage node / client". If we build the clients to have affinity for storage nodes which already store some of their data, that could reduce the number of proofs significantly. (100 trillion datasets in S3, but I'm pretty sure there are far fewer than 100 trillion AWS customers. Can't find any numbers, but it should be in the millions at most.)
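To illustrate why the "same private key & block size" condition buys combinability, here's a toy in the spirit of a private-key homomorphic-tag PoR (heavily simplified, not Dagger's actual CPOR construction; all names and parameters are made up): blocks from different datasets tagged under one key can be challenged together and checked with a single verification equation.

```python
# Toy illustration: blocks get a linearly homomorphic tag
#   tag_i = alpha * m_i + prf(key, id_i)   (mod p)
# so a random linear combination of blocks and tags from *any* datasets
# tagged under the same key verifies with one equation.
import hashlib, secrets

P = 2**127 - 1                      # toy prime modulus

def prf(key: bytes, index: bytes) -> int:
    return int.from_bytes(hashlib.sha256(key + index).digest(), "big") % P

def tag_blocks(key: bytes, alpha: int, dataset_id: str, blocks: list[int]) -> list[int]:
    return [(alpha * m + prf(key, f"{dataset_id}:{i}".encode())) % P
            for i, m in enumerate(blocks)]

def respond(challenge, blocks_by_ds, tags_by_ds):
    # Storage node: one response covering blocks drawn from several datasets.
    mu = sigma = 0
    for (ds, i), nu in challenge:
        mu = (mu + nu * blocks_by_ds[ds][i]) % P
        sigma = (sigma + nu * tags_by_ds[ds][i]) % P
    return mu, sigma

def verify(key, alpha, challenge, mu, sigma):
    # Verifier: a single check, regardless of how many datasets were challenged.
    expected = (alpha * mu + sum(nu * prf(key, f"{ds}:{i}".encode())
                                 for (ds, i), nu in challenge)) % P
    return sigma == expected

key, alpha = secrets.token_bytes(32), secrets.randbelow(P)
blocks = {"dsA": [11, 22, 33], "dsB": [44, 55]}
tags = {ds: tag_blocks(key, alpha, ds, bs) for ds, bs in blocks.items()}

challenge = [(("dsA", 0), secrets.randbelow(P)), (("dsA", 2), secrets.randbelow(P)),
             (("dsB", 1), secrets.randbelow(P))]
mu, sigma = respond(challenge, blocks, tags)
assert verify(key, alpha, challenge, mu, sigma)
```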
Other idea: if you have enough people in a single contract, they can start to check each other (if they are not under the same governance, ofc). We could keep public verifiability, but make each participant of a contract an "aggregator". That would lead to better scalability, since each contract is "self-sufficient".
Could also open other interesting possibilities, but I've run out of characters for this message, so LMK what you think 🙂
by @Menduist