Malte Möser edited this page Jul 25, 2020 · 17 revisions

Frequently Asked Questions


General

Where can I find BlockSci's documentation?

Documentation for the Python interface is available here. Most users will want to use this interface.

Does BlockSci support cryptocurrency XYZ?

BlockSci can support many cryptocurrencies that are similar to Bitcoin (i.e., they forked Bitcoin's codebase and did not modify the data model). BlockSci comes with a disk parser that is highly optimized for Bitcoin, and an RPC parser that should work with forks of Bitcoin (but is much slower than the disk parser).

The disk parser can break when a cryptocurrency changes the data format, adds new consensus rules or otherwise changes the rules of how blocks and transactions are created.

Does BlockSci support Monero?

No. Monero's data model is different from Bitcoin's and thus doesn't currently work with BlockSci. It would be possible to extend BlockSci to support Monero, but this is currently not on our roadmap.

Does BlockSci support Ethereum?

No. Ethereum's design is fundamentally different from Bitcoin's and thus incompatible with BlockSci.

Does BlockSci support Omni Layer / Colored Coins / etc.?

BlockSci only handles parsing of the core blockchain layer (layer 1), but exposes any special data stored in the blockchain. Thus, for most protocols that build upon layer 1, you can write your own analysis code.

Related issues:

What software do you use to develop BlockSci?

We're developing BlockSci on macOS using Xcode. You can easily generate an Xcode project using CMake:

mkdir xcode && cd xcode
cmake -G Xcode -DOPENSSL_ROOT_DIR=/usr/local/opt/openssl ..

We don't have any recommendations for IDEs on other platforms, though we are using gdb to debug BlockSci on Linux.

Does BlockSci run on CentOS / Windows / etc.?

We only provide support for Ubuntu and macOS. It may be possible to run BlockSci on other platforms by manually compiling the various dependencies.

Common Issues

BlockSci Diagnostics (v0.6)

If you are using the v0.6 development branch and encounter an issue with your BlockSci setup, you can try running blocksci_parser YOURCONFIG.json doctor to diagnose issues. Note that this only checks a handful of potential issues and may not be able to identify your specific problem.

Open files limit: Addresses are missing transactions

The default open files limit of many Linux distributions (e.g., Ubuntu) is too low for BlockSci. Among other things, this can cause transactions to appear to be missing from addresses (e.g., when using addr.txes()). After you have increased the open files limit, reparse the chain and the missing transactions should show up.
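To see which limit applies to your current session, here is a minimal sketch using Python's standard resource module (Linux and macOS only). The threshold of 64000 is a hypothetical placeholder, not a value required by BlockSci. Raising the limit itself happens outside Python, e.g. with ulimit -n in the shell that later runs the parser.

```python
import resource


def check_open_files_limit(minimum):
    """Return (soft, hard, ok) for RLIMIT_NOFILE of the current process.

    `minimum` is a threshold you choose yourself (hypothetical, not a BlockSci value).
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # An unlimited soft limit always satisfies the threshold.
    ok = soft == resource.RLIM_INFINITY or soft >= minimum
    return soft, hard, ok


soft, hard, ok = check_open_files_limit(minimum=64000)
print(f"soft={soft} hard={hard} sufficient={ok}")
```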

AMI Disk Space

As of August 2019, the default 500 GB disk of the v0.5 AMI may no longer suffice. We therefore strongly recommend choosing a larger disk (e.g., 600 GB) when you first create the instance.

Follow this guide to increase the disk space of your existing AMI.

Clustering

Does BlockSci provide state-of-the-art clustering?

BlockSci provides the fundamental building blocks of address clustering: multi-input clustering with CoinJoin detection, and change address clustering with support for various change address heuristics.

There are, however, many corner cases (e.g., MtGox allowing users to import their private keys, which breaks the multi-input heuristic) that require special treatment to prevent "superclusters". Superclusters are extremely large clusters that form when different clusters collapse into each other due to over-eager address linking. To some degree, address clustering today is more art than science, and building a highly accurate clustering module, while possible, is not on the current roadmap for BlockSci. Anything beyond the basic address clustering described above you'll need to implement yourself.

Here's some helpful literature on address clustering:

How do I use BlockSci's clustering module?

We recommend using the clustering module available through the Python interface.

If you haven't used the clusterer before, you'll need to first create a clustering:

import blocksci
chain = blocksci.chain("/path/to/blocksci/data/") # in v0.6 this needs to point to the config file

cm = blocksci.cluster.ClusterManager.create_clustering("/directory/where/cluster/files/can/be/stored", chain)

If you already created such a clustering, you can simply load it:

cm = blocksci.cluster.ClusterManager("/directory/where/cluster/files/can/be/stored", chain)

Which heuristic is the clusterer using by default?

By default, the clusterer uses the following two heuristics:

  • Multi-Input: Inputs that are co-spent in the same transaction are clustered together, unless the transaction looks like a CoinJoin transaction.
  • Legacy Change: If there is an output that has less value than any of the inputs and was the first output to send coins to the associated address, it is clustered as the change address. (In v0.6, no change address clustering is performed by default.)
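To make the two heuristics concrete, here is a toy model in plain Python using a union-find structure. It is not BlockSci's implementation. Transactions are hypothetical dicts with "inputs" and "outputs" as (address, value) pairs, and "is_coinjoin" is a precomputed flag standing in for BlockSci's CoinJoin detection. Linking change only when exactly one output qualifies, and skipping change detection for CoinJoins, are assumptions of this sketch.

```python
class UnionFind:
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        # Path halving keeps the trees shallow.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)


def toy_cluster(txes, use_change=True):
    """Cluster addresses of hypothetical transactions given in chain order."""
    uf = UnionFind()
    seen = set()  # addresses that have already received coins
    for tx in txes:
        in_addrs = [a for a, _ in tx["inputs"]]
        for a in in_addrs:
            uf.find(a)  # register single-input addresses too
        if in_addrs and not tx["is_coinjoin"]:
            # Multi-input: co-spent inputs belong to the same entity.
            for a in in_addrs[1:]:
                uf.union(in_addrs[0], a)
            if use_change:
                # Legacy change: smaller than every input and first use of the address.
                min_in = min(v for _, v in tx["inputs"])
                candidates = [a for a, v in tx["outputs"] if v < min_in and a not in seen]
                if len(candidates) == 1:
                    uf.union(in_addrs[0], candidates[0])
        for a, _ in tx["outputs"]:
            uf.find(a)
            seen.add(a)
    groups = {}
    for addr in list(uf.parent):
        groups.setdefault(uf.find(addr), set()).add(addr)
    return sorted(sorted(g) for g in groups.values())


txes = [
    {"inputs": [("A", 5), ("B", 3)], "outputs": [("C", 6), ("D", 1)], "is_coinjoin": False},
    {"inputs": [("E", 4), ("F", 4)], "outputs": [("G", 4), ("H", 4)], "is_coinjoin": True},
]
print(toy_cluster(txes))  # [['A', 'B', 'D'], ['C'], ['E'], ['F'], ['G'], ['H']]
```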

How do I use a different change address heuristic?

BlockSci provides a number of different change address heuristics.

You can use a different change address heuristic by passing it to the create_clustering function. For example:

reuse_change_heuristic = blocksci.heuristics.change.address_reuse()
cm = blocksci.cluster.ClusterManager.create_clustering("/directory/where/cluster/files/can/be/stored", chain, reuse_change_heuristic)

How do I disable change address clustering?

Currently, you need to use the following workaround to disable change address clustering:

no_change_heuristic = blocksci.heuristics.change.legacy() - blocksci.heuristics.change.legacy()
cm = blocksci.cluster.ClusterManager.create_clustering("/directory/where/cluster/files/can/be/stored", chain, no_change_heuristic)

In v0.6, you can use the none heuristic:

cm = blocksci.cluster.ClusterManager.create_clustering("/directory/where/cluster/files/can/be/stored", chain, blocksci.heuristics.change.none)
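The legacy() - legacy() workaround suggests that subtracting change heuristics behaves like a set difference over candidate change outputs, so a heuristic minus itself never flags anything. Here is a plain-Python model of that idea; a "heuristic" is a hypothetical function from a toy transaction to a set of output indices, not BlockSci's ChangeHeuristic class.

```python
from typing import Callable, Dict, List, Set

Tx = Dict[str, List[int]]  # toy transaction: {"inputs": [values], "outputs": [values]}
Heuristic = Callable[[Tx], Set[int]]  # returns indices of candidate change outputs


def smaller_than_inputs(tx: Tx) -> Set[int]:
    # Hypothetical stand-in for a real change heuristic.
    smallest = min(tx["inputs"])
    return {i for i, v in enumerate(tx["outputs"]) if v < smallest}


def difference(a: Heuristic, b: Heuristic) -> Heuristic:
    # Outputs flagged by `a` but not by `b`.
    return lambda tx: a(tx) - b(tx)


no_change = difference(smaller_than_inputs, smaller_than_inputs)

tx = {"inputs": [5, 3], "outputs": [6, 1]}
print(smaller_than_inputs(tx))  # {1}
print(no_change(tx))            # set()
```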

Why do some clusters appear to be empty?

Clusters may appear to be empty (with cluster.size() == 0 and cluster.transactions() == []) while cluster.type_equiv_size is greater than 0. This is not a bug, but an artifact of BlockSci's internal deduplication.

For example, assume there is a multisig address with three pubkeys. BlockSci keeps track of the three pubkeys independently of their combined use in the multisig address. During clustering, each of these four addresses (the multisig address and the three pubkeys) starts in its own cluster. If the individual pubkeys are never used on their own, they remain in their single-address clusters. If a method such as .size() or .transactions() is called on such a cluster, BlockSci checks whether the addresses in the cluster have actually been used. If an address has never been used individually (as in the example above), BlockSci reports that the cluster is empty.
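The multisig example can be modeled in a few lines. This is a hypothetical toy, not BlockSci's data structures: type_equiv_size counts the equiv addresses in a cluster, while size() only counts addresses that were actually used on chain (simplified here to one boolean per address).

```python
class ToyCluster:
    def __init__(self, equiv_addresses, used_on_chain):
        self.equiv_addresses = list(equiv_addresses)
        self.used_on_chain = used_on_chain  # shared mapping: address -> used by itself?

    @property
    def type_equiv_size(self):
        # Cheap: counts equiv addresses without checking on-chain usage.
        return len(self.equiv_addresses)

    def size(self):
        # Needs the usage lookup: counts only addresses actually used on chain.
        return sum(1 for a in self.equiv_addresses if self.used_on_chain[a])


used = {"multisig": True, "pubkey1": False, "pubkey2": False, "pubkey3": False}
clusters = {a: ToyCluster([a], used) for a in used}  # each starts in its own cluster
print(clusters["pubkey1"].size(), clusters["pubkey1"].type_equiv_size)  # prints: 0 1
```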

Why is cluster.size() slow?

Clustering works on equiv addresses (see above). When calling cluster.size(), BlockSci first needs to look up in a database which address types each equiv address has actually been used with on chain.

If you don't need that check, use cluster.type_equiv_size, which skips the database lookups and simply returns the number of equiv addresses in the cluster.

How do I use BlockSci's tagging feature?

You can pass an {address: tag} dictionary to the blocksci.cluster.ClusterManager.tagged_clusters(<tags>) method to retrieve an iterator over all clusters that contain tagged addresses. See below for an example that uses a graphsense-tagpack file with tags from walletexplorer.com.

import yaml  # PyYAML

def import_from_tagpack(chain, filename):
    # Build an {address: tag} dictionary from a TagPack YAML file
    with open(filename, "r") as tag_file:  # closes the file after loading
        data = yaml.safe_load(tag_file)

    tags = {chain.address_from_string(x['address']): str(x['label']) for x in data['tags']}

    print(data['description'])
    print("Curated by {}\n".format(data['creator']))
    print("Successfully loaded {} tags.".format(len(tags)))

    return tags

tags = import_from_tagpack(chain, "data/walletexplorer.yaml")
tagged_clusters = cm.tagged_clusters(tags).to_list()

Refer to the documentation for more information about the TaggedCluster and TaggedAddress classes.
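Conceptually, tagged_clusters selects the clusters that contain at least one tagged address. Here is a toy model of that selection in plain Python; clusters are a hypothetical {cluster_id: [addresses]} dictionary and addresses are strings, not BlockSci objects.

```python
def toy_tagged_clusters(clusters, tags):
    """Yield (cluster_id, {address: tag}) for each cluster containing a tagged address.

    clusters: hypothetical dict cluster_id -> iterable of addresses.
    tags: dict address -> tag, like the one built by import_from_tagpack.
    """
    for cid, addresses in clusters.items():
        found = {a: tags[a] for a in addresses if a in tags}
        if found:
            yield cid, found


clusters = {0: ["a1", "a2"], 1: ["a3"], 2: ["a4", "a5"]}
tags = {"a2": "ExchangeX", "a5": "PoolY"}
print(list(toy_tagged_clusters(clusters, tags)))
# [(0, {'a2': 'ExchangeX'}), (2, {'a5': 'PoolY'})]
```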

Analysis / How do I ...?

How can I map addresses to exchanges or pools?

BlockSci allows you to tag address clusters with names, but we don't provide any such tags ourselves. There are a few public sources, such as WalletExplorer or Blockchain.info, but they may be unreliable or incomplete.

BlockSci can map blocks to pools by looking at the information contained in the coinbase transaction, but the data we use to identify pools does not cover all pools/coinbase transactions. Furthermore, there's no guarantee that miners report their identity correctly in the coinbase transaction.

>>> blocksci.get_miner(chain[300005])
'SlushPool'

Related Issues: #160, #250
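The pool attribution described above boils down to looking for known marker strings in the coinbase input script. Here is a minimal sketch of that idea; the tag table is hypothetical and incomplete, and it is not the data BlockSci uses. Since miners can write arbitrary text into the coinbase, a match is only a hint.

```python
# Hypothetical, incomplete marker table: coinbase substring -> pool name.
POOL_TAGS = {
    b"/slush/": "SlushPool",
    b"/ExamplePool/": "ExamplePool",
}


def guess_miner(coinbase_script: bytes) -> str:
    """Return the first pool whose marker occurs in the coinbase script, else 'Unknown'."""
    for tag, name in POOL_TAGS.items():
        if tag in coinbase_script:
            return name
    return "Unknown"


print(guess_miner(b"\x03\x8d\x93\x04/slush/\x00"))  # SlushPool
```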

How do I extract the full scriptPubKey and scriptSig of an output/input?

For most standard scripts, BlockSci does not store the full scriptSig and scriptPubKey but instead extracts the important information and stores it as an Address. The documentation page Reference » Address Classes » Addresses describes which information is stored.
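To illustrate what "extracting the important information" means, here is a stdlib-only sketch (not BlockSci's parser) that recognizes the P2PKH and P2SH scriptPubKey templates and keeps only the 20-byte hash, falling back to the raw script otherwise. The kind labels are plain strings chosen for this example.

```python
def extract_address_data(script_pubkey: bytes):
    """Return (kind, payload): the 20-byte hash for P2PKH/P2SH, else the raw script."""
    # P2PKH: OP_DUP OP_HASH160 <push 20 bytes> OP_EQUALVERIFY OP_CHECKSIG
    if (len(script_pubkey) == 25 and script_pubkey[:3] == b"\x76\xa9\x14"
            and script_pubkey[23:] == b"\x88\xac"):
        return "pubkeyhash", script_pubkey[3:23]
    # P2SH: OP_HASH160 <push 20 bytes> OP_EQUAL
    if (len(script_pubkey) == 23 and script_pubkey[:2] == b"\xa9\x14"
            and script_pubkey[22:] == b"\x87"):
        return "scripthash", script_pubkey[2:22]
    # Anything else must be kept in full.
    return "nonstandard", script_pubkey


h = bytes(range(20))
print(extract_address_data(b"\x76\xa9\x14" + h + b"\x88\xac")[0])  # pubkeyhash
```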

The actual scriptSig and scriptPubKey are stored only for non-standard scripts. For example:

>>> myout = chain.tx_with_hash("15c2b9bc3b93e0c0a037c5fa8402d0e34e13d3bb0ce7fca65888e5d24e597dcc").outputs[0]
>>> myout.address_type == blocksci.address_type.nonstandard
True
>>> myout.address.out_script
'OP_DEPTH OP_1SUB OP_IF OP_RETURN 737069746861736820616e6420796d6f64652c2062726f6772616d6d657273346c796665 OP_ENDIF 0 OP_TOALTSTACK OP_DUP OP_HASH256 efb81cd930d56703304f63d7f94575c4cd17f0985ed2fd126aabf1d866471d2f OP_EQUAL OP_IF 1 OP_TOALTSTACK OP_ENDIF OP_DUP OP_HASH256 9ddd5c986827e8bc5848b4fdc1f8152f597b852ed2429ae7ee2baf7a14096a8f OP_EQUAL OP_IF 1 OP_TOALTSTACK OP_ENDIF OP_DUP OP_HASH256 fda5bd74925349ba07de25db126b9148a7a508e48475c33d2abe7c81a341a3ab OP_EQUAL OP_IF 1 OP_TOALTSTACK OP_ENDIF OP_FROMALTSTACK'

How can I extract balances of all addresses?

See Faster way to get all address balances #264
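The idea behind computing all balances is that an address's balance is the sum of its unspent outputs. Here is a minimal sketch of that aggregation on hypothetical (address, value) pairs; obtaining the UTXO set from BlockSci efficiently is what the linked issue discusses and is not shown here.

```python
from collections import defaultdict


def balances_from_utxos(utxos):
    """Sum unspent output values per address.

    utxos: iterable of (address, value_in_satoshis) pairs for outputs that are
    still unspent at the height of interest (hypothetical toy input).
    """
    totals = defaultdict(int)
    for address, value in utxos:
        totals[address] += value
    return dict(totals)


print(balances_from_utxos([("a", 5000), ("b", 100), ("a", 250)]))  # {'a': 5250, 'b': 100}
```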

How can I plot the UTXO Age Distribution over time?

See Updating the UTXO set at each block #108
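Conceptually, this means replaying the chain block by block, updating the UTXO set, and recording the age of every unspent output after each block. Here is a toy sketch on hypothetical block data (lists of created and spent output ids, with heights given by list position). It recomputes the histogram after every block, which is simple but not efficient.

```python
from collections import Counter


def utxo_age_histograms(blocks, bucket_size):
    """Yield (height, Counter of age buckets) after each block.

    blocks: list of dicts with hypothetical keys "created" and "spent" (output ids).
    bucket_size: ages are grouped into buckets of this many blocks (e.g. 144, about a day).
    """
    created_at = {}  # output id -> height at which it was created
    for height, block in enumerate(blocks):
        # Add new outputs first so outputs created and spent in the same block are removed.
        for out_id in block["created"]:
            created_at[out_id] = height
        for out_id in block["spent"]:
            created_at.pop(out_id, None)
        yield height, Counter((height - h) // bucket_size for h in created_at.values())


blocks = [
    {"created": ["a", "b"], "spent": []},
    {"created": ["c"], "spent": ["a"]},
    {"created": [], "spent": []},
]
for height, ages in utxo_age_histograms(blocks, bucket_size=1):
    print(height, dict(ages))
```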

Development version v0.6

We're currently working on a new version on the v0.6 branch. This development branch can be unstable at times.

Syntax changes for blocksci_parser

In v0.6 you first need to create a config file (e.g., btc.json). The general syntax is:

blocksci_parser <config file> generate-config <coin type> <data directory> [--max-block <max block>] [--disk <coin directory>] [--rpc <username> <password> [--address <address>] [--port <port>]]

Run blocksci_parser help to get more information about the available options.

Then, you can update the chain by running blocksci_parser <config file> update.
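If you drive the parser from Python, the documented syntax can be assembled into argument lists. This is a sketch under assumptions: the helper names are hypothetical, only the flags shown in the syntax above are used, and "bitcoin" in the example is a placeholder coin type, not a checked list of supported values.

```python
def generate_config_cmd(config_file, coin_type, data_dir, max_block=None,
                        disk_dir=None, rpc=None):
    """Build argv for `blocksci_parser <config> generate-config ...` (hypothetical helper).

    rpc: optional dict with "username" and "password", plus optional "address" and "port".
    """
    cmd = ["blocksci_parser", config_file, "generate-config", coin_type, data_dir]
    if max_block is not None:
        cmd += ["--max-block", str(max_block)]
    if disk_dir is not None:
        cmd += ["--disk", disk_dir]
    if rpc is not None:
        cmd += ["--rpc", rpc["username"], rpc["password"]]
        if "address" in rpc:
            cmd += ["--address", rpc["address"]]
        if "port" in rpc:
            cmd += ["--port", str(rpc["port"])]
    return cmd


def update_cmd(config_file):
    """Build argv for `blocksci_parser <config> update`."""
    return ["blocksci_parser", config_file, "update"]


print(generate_config_cmd("btc.json", "bitcoin", "/data/blocksci", disk_dir="/home/user/.bitcoin"))
```

The resulting lists can be passed to subprocess.run(cmd, check=True).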
