multi-node2vec

This is Python source code for the multi-node2vec algorithm. Multi-node2vec is a fast network embedding method for multilayer networks that identifies a continuous and low-dimensional representation for the unique nodes in the network.

Details of the algorithm can be found in the paper: Fast Embedding of Multilayer Networks: An Algorithm and Application to Group fMRI by JD Wilson, M Baybay, R Sankar, and P Stillman.

Preprint: https://arxiv.org/pdf/1809.06437.pdf

Contributors:

Melanie Baybay, University of San Francisco, Department of Computer Science
Rishi Sankar, Henry M. Gunn High School
James D. Wilson (maintainer), University of San Francisco, Department of Mathematics and Statistics

Questions or Bugs? Contact James D. Wilson at [email protected]

Description

The Mathematical Objective

A multilayer network of length m is a collection of networks, or graphs, {G₁, ..., G_m}, where the graph G_j models the relational structure of the jth layer of the network. Each layer G_j = (V_j, W_j) is described by its vertex set V_j, which contains the units, or actors, of the layer, and its edge weights W_j, which describe the strength of the relationships between nodes. Layers in the multilayer sequence may be heterogeneous in their vertices, edges, and size. Denote the set of unique nodes in {G₁, ..., G_m} by 𝒩, and let N = |𝒩| denote the number of nodes in that set.
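To make this definition concrete, the sketch below stores each layer G_j = (V_j, W_j) as a pair of a vertex set and a dictionary of edge weights, then recovers the unique node set 𝒩 and its size N. The in-memory representation and the helper name unique_nodes are assumptions for illustration only; the script itself reads layers from CSV files and may store them differently.

```python
from typing import Dict, Hashable, List, Set, Tuple

# A layer G_j = (V_j, W_j): a vertex set and a map from node pairs to edge weights.
# (Hypothetical representation, chosen only to mirror the definition above.)
Layer = Tuple[Set[Hashable], Dict[Tuple[Hashable, Hashable], float]]


def unique_nodes(layers: List[Layer]) -> Set[Hashable]:
    """Return the set of unique nodes across all layers of a multilayer network."""
    nodes: Set[Hashable] = set()
    for vertices, _weights in layers:
        nodes |= vertices
    return nodes


# Two heterogeneous layers: they share nodes "b" and "c" but differ in size and edges.
G1: Layer = ({"a", "b", "c"}, {("a", "b"): 0.9, ("b", "c"): 0.4})
G2: Layer = ({"b", "c", "d"}, {("c", "d"): 0.7})

nodes = unique_nodes([G1, G2])  # the unique node set
N = len(nodes)                  # N = number of unique nodes, here 4
```

Note that a node appearing in several layers is counted once, which is why the embedding is learned for 𝒩 rather than for each layer separately.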

The aim of multi-node2vec is to learn an interpretable, low-dimensional feature representation of the node set 𝒩. In particular, it seeks a D-dimensional representation

F: 𝒩 → ℝ^D,

where D ≪ N. The function F can be viewed as an N × D matrix whose rows {f_v: v = 1, ..., N} are the feature vectors of the nodes in 𝒩.
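In code, F can be held as N rows of length D together with an index from each node to its row. In this sketch the entries are random placeholders standing in for learned values; the helper placeholder_embedding and the sorted row order are assumptions for illustration, not part of the package.

```python
import random
from typing import Dict, Hashable, Iterable, List, Tuple


def placeholder_embedding(nodes: Iterable[Hashable], d: int, seed: int = 0
                          ) -> Tuple[Dict[Hashable, int], List[List[float]]]:
    """Return (row index, F): an N x d matrix with one row f_v per unique node.

    Entries are random placeholders; multi-node2vec would learn them instead.
    """
    rng = random.Random(seed)
    index = {v: i for i, v in enumerate(sorted(nodes))}  # node -> row of F
    F = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(len(index))]
    return index, F


index, F = placeholder_embedding({"a", "b", "c", "d"}, d=3)
f_b = F[index["b"]]  # the D-dimensional feature vector of node "b"
```

For the fMRI case study described below (N = 264, D = 100), F has 264 × 100 = 26,400 entries.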

The Algorithm

The multi-node2vec algorithm estimates F through maximum likelihood estimation and relies on two core steps:

NeighborhoodSearch: a collection of vertex neighborhoods from the observed multilayer graph, also known as a BagofNodes, is identified using a second-order random walk on the multilayer network.
Optimization: given a BagofNodes, F is estimated by maximizing the log-likelihood of F | N with stochastic gradient descent on a two-layer Skip-gram neural network model.
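The two steps can be sketched in plain Python as follows. This is a simplified illustration, not the package's implementation. It assumes each layer has already been converted to an unweighted graph (a dict from node to neighbor set), and it assumes a layer-hopping rule in which, at each step, the walker switches with probability r to another layer where the current node has edges. Within a layer it uses the standard node2vec biases (weight 1/p to return to the previous node, 1 to a common neighbor, 1/q to move further away), which is what makes the walk second-order; the script's exact use of p, q, and r may differ. The Optimization step is represented only by the (center, context) pairs a Skip-gram model is trained on; the SGD itself is omitted. All function names, and the walks_per_node parameter, are hypothetical.

```python
import random
from typing import Dict, Hashable, List, Optional, Set

Graph = Dict[Hashable, Set[Hashable]]  # one unweighted layer: node -> neighbors


def node2vec_step(graph: Graph, prev: Optional[Hashable], cur: Hashable,
                  p: float, q: float, rng: random.Random) -> Hashable:
    """One second-order step inside a single layer, using node2vec-style biases."""
    neighbors = sorted(graph[cur])
    if prev is None or prev not in graph:
        return rng.choice(neighbors)  # no usable history in this layer
    weights = []
    for x in neighbors:
        if x == prev:
            weights.append(1.0 / p)   # return to the previous node
        elif x in graph[prev]:
            weights.append(1.0)       # stay at distance 1 from prev
        else:
            weights.append(1.0 / q)   # move away from prev
    return rng.choices(neighbors, weights=weights, k=1)[0]


def multilayer_walk(layers: List[Graph], start: Hashable, walk_length: int,
                    p: float = 1.0, q: float = 0.5, r: float = 0.25,
                    rng: Optional[random.Random] = None) -> List[Hashable]:
    """Walk across layers; the layer-hopping rule here is an assumption."""
    if rng is None:
        rng = random.Random()
    start_layers = [i for i, g in enumerate(layers) if g.get(start)]
    if not start_layers:
        return [start]  # isolated in every layer
    layer = rng.choice(start_layers)
    walk: List[Hashable] = [start]
    prev: Optional[Hashable] = None
    while len(walk) < walk_length:
        cur = walk[-1]
        # Assumed rule: with probability r, hop to another layer where cur has edges.
        others = [i for i, g in enumerate(layers) if i != layer and g.get(cur)]
        if others and rng.random() < r:
            layer = rng.choice(others)
        graph = layers[layer]
        if not graph.get(cur):
            break  # dead end (only possible with directed input)
        walk.append(node2vec_step(graph, prev, cur, p, q, rng))
        prev = cur
    return walk


def bag_of_nodes(layers: List[Graph], walk_length: int, walks_per_node: int = 1,
                 seed: int = 0, **bias: float) -> List[List[Hashable]]:
    """NeighborhoodSearch sketch: walks_per_node walks from every unique node."""
    rng = random.Random(seed)
    nodes = sorted(set().union(*layers))
    return [multilayer_walk(layers, v, walk_length, rng=rng, **bias)
            for _ in range(walks_per_node) for v in nodes]


def skipgram_pairs(walks: List[List[Hashable]], window_size: int):
    """(center, context) pairs within window_size positions of each other."""
    pairs = []
    for walk in walks:
        for i, center in enumerate(walk):
            lo, hi = max(0, i - window_size), min(len(walk), i + window_size + 1)
            pairs.extend((center, walk[j]) for j in range(lo, hi) if j != i)
    return pairs


layer_a = {1: {2}, 2: {1, 3}, 3: {2}}
layer_b = {1: {3}, 3: {1}, 4: set()}
walks = bag_of_nodes([layer_a, layer_b], walk_length=5, p=1.0, q=0.5, r=0.25)
pairs = skipgram_pairs(walks, window_size=2)
```

Many word2vec implementations also shrink the context window at random for each position; skipgram_pairs uses the full window for clarity.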

The following image provides a schematic:

Running multi-node2vec

Requirements

This package requires Python == 3.6 with the following libraries:

numpy==1.12.1
pandas==0.24.0
gensim==2.3.0
networkx==2.5.1

You can install these libraries by running the command

pip install -r requirements.txt

from this project's root directory.

Usage

python3 multi_node2vec.py [--dir [DIR]] [--output [OUTPUT]] [--d [D]] [--walk_length [WALK_LENGTH]] [--window_size [WINDOW_SIZE]] [--n_samples [N_SAMPLES]] [--thresh [THRESH]] [--w2v_iter [W2V_ITER]] [--w2v_workers [W2V_WORKERS]] [--rvals [RVALS]] [--pvals [PVALS]] [--qvals [QVALS]]

Arguments

--dir [directory name] : Absolute path to directory of correlation/adjacency matrix files in csv format. Note that each .csv should contain an adjacency matrix with columns and rows labeled by the node ID.
--output [filename] : Absolute path to output file (no extension).
--d [dimensions] : Number of embedding dimensions, D. Default is 100.
--walk_length [n] : Length of each random walk for identifying multilayer neighborhoods. Default is 100.
--window_size [w] : Size of context window used for Skip Gram optimization. Default is 10.
--n_samples [samples] : Number of times to sample a layer. Default is 1.
--thresh [thresh] : Threshold for converting a weighted network to an unweighted one. All weights less than or equal to thresh will be considered 0 and all others 1. Default is 0.5. Use None if the network is unweighted.
--w2v_workers [workers] : Number of parallel worker threads. Default is 8.
--rvals [layer walk prob] : The unnormalized walk probability for traversing layers. Default is 0.25.
--pvals [return prob] : The unnormalized walk probability of returning to a previously seen node. Default is 1.
--qvals [explore prob] : The unnormalized walk probability of exploring new nodes. Default is 0.50.
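The --dir and --thresh conventions above can be illustrated with the standard csv module. This sketch assumes each file's first row holds the node IDs after an empty corner cell and each following row starts with its node ID. It applies the documented rule (weights less than or equal to thresh become 0, all others 1), treats thresh=None as "already unweighted" by keeping every nonzero entry (an assumption), and, as a further assumption, drops self-loops such as the unit diagonal of a correlation matrix. The names read_layer and load_layers are hypothetical; the script may parse files differently.

```python
import csv
import io
from pathlib import Path
from typing import Dict, Iterable, List, Optional, Set

Graph = Dict[str, Set[str]]  # node ID -> set of neighbor IDs


def read_layer(lines: Iterable[str], thresh: Optional[float] = 0.5) -> Graph:
    """Parse one labeled adjacency matrix and binarize it into a neighbor map."""
    reader = csv.reader(lines)
    header = next(reader)
    col_ids = [c.strip() for c in header[1:]]  # skip the corner cell
    graph: Graph = {v: set() for v in col_ids}
    for row in reader:
        if not row:
            continue  # tolerate blank lines
        u = row[0].strip()
        graph.setdefault(u, set())
        for v, cell in zip(col_ids, row[1:]):
            w = float(cell)
            # Documented rule: w <= thresh -> 0, otherwise 1.
            # Assumption: thresh=None means "already unweighted", so keep nonzero entries.
            is_edge = (w != 0) if thresh is None else (w > thresh)
            if is_edge and u != v:  # assumption: ignore self-loops
                graph[u].add(v)
    return graph


def load_layers(directory: str, thresh: Optional[float] = 0.5) -> List[Graph]:
    """Read every .csv in a directory (sorted by file name) as one layer."""
    layers = []
    for path in sorted(Path(directory).glob("*.csv")):
        with open(path, newline="") as f:
            layers.append(read_layer(f, thresh))
    return layers


# A toy 3-node correlation matrix with a unit diagonal.
example = ",1,2,3\n1,1.0,0.8,0.2\n2,0.8,1.0,0.5\n3,0.2,0.5,1.0\n"
layer = read_layer(io.StringIO(example), thresh=0.5)
# 0.8 > 0.5 gives the edge 1-2; 0.5 is not above the threshold, so 2-3 is dropped.
```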

Examples

Quick Test example

This example runs multi-node2vec on a small test multilayer network with 2 layers and 264 nodes in each layer. It takes about 2 minutes to run on a personal computer using 8 cores.

python3 multi_node2vec.py --dir data/test --output results/test --d 100 --window_size 2 --n_samples 1 --thresh 0.5 --rvals 0.25

fMRI Case Study

This example runs multi-node2vec on the multilayer network representing group fMRI of 74 healthy controls, as in the paper Fast Embedding of Multilayer Networks: An Algorithm and Application to Group fMRI. The model generates 100 features for each of 264 unique nodes using a walk parameter r = 0.25. The values of p (= 1) and q (= 0.50) are set to the defaults available in the original node2vec specification. It takes about an hour to run on a personal computer using 8 cores.

python3 multi_node2vec.py --dir data/CONTROL_fmt --output results/control --d 100 --window_size 10 --n_samples 1 --rvals 0.25 --pvals 1 --thresh 0.5 --qvals 0.5
