Home
- Where can I find BlockSci's documentation?
- Does BlockSci support cryptocurrency XYZ?
- Does BlockSci support Monero?
- Does BlockSci support Ethereum?
- Does BlockSci support Omni Layer / Colored Coins / etc.?
- What software do you use to develop BlockSci?
- Does BlockSci run on CentOS / Windows / etc.?
- Does BlockSci provide state-of-the-art clustering?
- How do I use BlockSci's clustering module?
- Which heuristic is the clusterer using by default?
- How do I disable change address clustering?
- Why is cluster.size() slow?
- How can I map addresses to exchanges or pools?
- How do I extract the full scriptPubKey and scriptSig of an output/input?
- How can I plot the UTXO Age Distribution over time?
Documentation for the Python module is available here.
BlockSci supports many cryptocurrencies that are similar to Bitcoin (e.g., they forked Bitcoin's codebase and made no modifications to the data model). BlockSci comes with a disk parser that is highly optimized for Bitcoin, and an RPC parser that should work with most forks of Bitcoin (but is much slower than the disk parser).
The disk parser can break when a cryptocurrency changes the data format, adds new consensus rules or otherwise changes the rules of how blocks and transactions are created.
No. Monero's data model is different from Bitcoin's and thus doesn't currently work with BlockSci. It would be possible to extend BlockSci to support Monero, but this is currently not on our roadmap.
No. Ethereum's design is fundamentally different from Bitcoin's and thus incompatible with BlockSci.
BlockSci only handles parsing of the core blockchain layer (layer 1), but exposes any special data stored in the blockchain. Thus, for most protocols that build upon layer 1 you can write your own analysis code.
Related issues:
We're developing BlockSci on macOS using Xcode. You can easily generate an Xcode project using cmake:
mkdir xcode && cd xcode
cmake -G Xcode ..
We don't have any recommendations for IDEs on other platforms, though we use gdb to debug BlockSci on Linux.
We only provide support for Ubuntu and macOS (OS X). It may be possible to run BlockSci on other platforms by manually compiling the various dependencies.
BlockSci provides the fundamental building blocks of address clustering: multi-input clustering with CoinJoin detection and change address clustering with support for various different change address heuristics.
There are, however, many individual corner cases (e.g., MtGox allowing users to import their private keys, breaking the multi-input heuristic) that deserve special consideration in order to prevent the formation of large superclusters. To some degree, address clustering today is more art than science, and building such a highly optimized clustering module is out of scope for BlockSci. Anything that goes beyond the basic address clustering described above, you'll need to implement yourself.
Here's some helpful literature on address clustering:
- A Fistful of Bitcoins: Characterizing Payments Among Men with No Names
- The Unreasonable Effectiveness of Address Clustering
- Data-Driven De-Anonymization in Bitcoin
We recommend using the clustering module available through the Python interface.
If you haven't used the clusterer before, you'll need to first create a clustering:
import blocksci
chain = blocksci.chain("/path/to/blocksci/data/") # in v0.6 this needs to point to the config file
cm = blocksci.cluster.ClusterManager.create_clustering("/directory/where/cluster/files/can/be/stored", chain)
If you already created such a clustering, you can simply load it:
cm = blocksci.cluster.ClusterManager("/directory/where/cluster/files/can/be/stored", chain)
By default, the clusterer uses the following two heuristics:
- Multi-Input: Inputs that are co-spent in the same transaction are clustered together, unless the transaction looks like a CoinJoin transaction.
- Legacy Change: If there is an output that has less value than any of the inputs and was the first output to send coins to the associated address, it is clustered as the change address.
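To make the two default heuristics more concrete, here is a minimal, self-contained sketch that applies them to toy data with a union-find structure. It does not use BlockSci and is not BlockSci's implementation: the `Tx` class, the `looks_like_coinjoin` flag (a stand-in for real CoinJoin detection), and the `cluster` function are hypothetical names for illustration. It also assumes one particular reading of the legacy change rule: an output counts as change only if its value is below every input value, its address has not received coins before, and it is the only such output in the transaction.

```python
from dataclasses import dataclass


@dataclass
class Tx:
    # Hypothetical toy transaction: inputs and outputs are (address, value) pairs.
    inputs: list
    outputs: list
    looks_like_coinjoin: bool = False  # stand-in for real CoinJoin detection


class UnionFind:
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def cluster(txs):
    """Cluster addresses with the multi-input and (simplified) legacy change rules."""
    uf = UnionFind()
    seen = set()  # addresses that received coins in an earlier transaction
    for tx in txs:
        in_addrs = [addr for addr, _ in tx.inputs]
        for addr, _ in tx.inputs + tx.outputs:
            uf.find(addr)  # register every address, even if it stays alone
        if in_addrs and not tx.looks_like_coinjoin:
            # Multi-input: all co-spent input addresses share one cluster.
            for addr in in_addrs[1:]:
                uf.union(in_addrs[0], addr)
            # Legacy change (assumed reading): a single fresh output that is
            # smaller than every input is treated as the change address.
            min_in = min(value for _, value in tx.inputs)
            candidates = [a for a, v in tx.outputs if v < min_in and a not in seen]
            if len(candidates) == 1:
                uf.union(in_addrs[0], candidates[0])
        seen.update(addr for addr, _ in tx.outputs)
    groups = {}
    for addr in list(uf.parent):
        groups.setdefault(uf.find(addr), set()).add(addr)
    return sorted(sorted(group) for group in groups.values())


example_txs = [
    Tx(inputs=[], outputs=[("A", 50)]),  # coinbase-like funding
    Tx(inputs=[], outputs=[("B", 50)]),
    Tx(inputs=[], outputs=[("E", 40)]),
    Tx(inputs=[("A", 50), ("B", 50)], outputs=[("C", 70), ("D", 30)]),
    Tx(inputs=[("C", 70), ("E", 40)],
       outputs=[("F", 40), ("G", 40), ("H", 30)],
       looks_like_coinjoin=True),
]
clusters = cluster(example_txs)
print(clusters)
```

In this toy chain, A and B are merged because they are co-spent, D joins them as the presumed change output, and the transaction flagged as a CoinJoin merges nothing, so C and E stay in separate clusters.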
BlockSci provides a number of different change address heuristics.
You can use a different change address heuristic by passing it to the create_clustering function. For example:
reuse_change_heuristic = blocksci.heuristics.change.address_reuse()
cm = blocksci.cluster.ClusterManager.create_clustering("/directory/where/cluster/files/can/be/stored", chain, reuse_change_heuristic)
Currently, you need to use the following workaround to disable change address clustering:
no_change_heuristic = blocksci.heuristics.change.legacy() - blocksci.heuristics.change.legacy()
cm = blocksci.cluster.ClusterManager.create_clustering("/directory/where/cluster/files/can/be/stored", chain, no_change_heuristic)
In v0.6, you can use the none heuristic:
cm = blocksci.cluster.ClusterManager.create_clustering("/directory/where/cluster/files/can/be/stored", chain, blocksci.heuristics.change.none)
Clustering operates on equiv addresses. When you call cluster.size(), BlockSci first has to look up in a database with which address types each equiv address is actually used on chain.
If you only need the number of equiv addresses in the cluster, use cluster.type_equiv_size instead. It does not perform the database lookups and simply returns that count.
BlockSci allows you to tag address clusters with names, but we don't provide any such tags ourselves. There are a few public sources such as WalletExplorer or Blockchain.info, but they may not be reliable or complete.
BlockSci can map blocks to pools by looking at the information contained in the coinbase transaction, but the data we use to identify pools does not cover all pools/coinbase transactions. Furthermore, there's no guarantee that this information is correct.
blocksci.get_miner(chain[300005])
>>> 'SlushPool'
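The limitation described above is easier to see with a small example. One common way to attribute a block to a pool is to search the coinbase script for a known marker string. The sketch below shows that idea on raw bytes. It is not BlockSci's implementation, and it may or may not match what get_miner does internally. The `KNOWN_POOL_TAGS` table and the `guess_pool` function are hypothetical, and the table entries are illustrative only.

```python
# Hypothetical tag table: marker bytes that a pool embeds in its coinbase script.
# The entries are illustrative; any real table is larger and still incomplete.
KNOWN_POOL_TAGS = {
    b"/slush/": "SlushPool",
    b"/ExamplePool/": "ExamplePool",
}


def guess_pool(coinbase_script: bytes):
    """Return the first pool whose tag occurs in the script, or None if no tag matches."""
    for tag, name in KNOWN_POOL_TAGS.items():
        if tag in coinbase_script:
            return name
    return None


# A made-up coinbase script: some leading bytes followed by a tag.
print(guess_pool(b"\x03\xe5\x93\x04/slush/\x00\x01"))
# A script without any known tag cannot be attributed.
print(guess_pool(b"\x03\xe5\x93\x04\xde\xad\xbe\xef"))
```

Because the marker is freely chosen by whoever mines the block, a match is only a hint: a miner can omit the tag, change it, or copy another pool's tag.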
For most standard scripts, BlockSci does not store the full scriptSig and scriptPubKey but instead extracts the important information and stores it as an Address. The documentation page Docs » Reference » Address Classes » Addresses describes in more detail what is stored.
The actual scriptSig and scriptPubKey are stored only for non-standard scripts. For example:
myout = chain.tx_with_hash("15c2b9bc3b93e0c0a037c5fa8402d0e34e13d3bb0ce7fca65888e5d24e597dcc").outputs[0]
myout.address_type == blocksci.address_type.nonstandard
>> True
myout.address.out_script
>> 'OP_DEPTH OP_1SUB OP_IF OP_RETURN 737069746861736820616e6420796d6f64652c2062726f6772616d6d657273346c796665 OP_ENDIF 0 OP_TOALTSTACK OP_DUP OP_HASH256 efb81cd930d56703304f63d7f94575c4cd17f0985ed2fd126aabf1d866471d2f OP_EQUAL OP_IF 1 OP_TOALTSTACK OP_ENDIF OP_DUP OP_HASH256 9ddd5c986827e8bc5848b4fdc1f8152f597b852ed2429ae7ee2baf7a14096a8f OP_EQUAL OP_IF 1 OP_TOALTSTACK OP_ENDIF OP_DUP OP_HASH256 fda5bd74925349ba07de25db126b9148a7a508e48475c33d2abe7c81a341a3ab OP_EQUAL OP_IF 1 OP_TOALTSTACK OP_ENDIF OP_FROMALTSTACK'