TypeError when Running Leiden Community Detection on Large Graph #213

mjaworski22 · 2022-07-25T16:42:58Z

Describe the bug
When I ran cdlib.algorithms.leiden() on a graph with 400k nodes and 600k edges, the algorithm ran correctly and identified communities in the graph. When I ran it on a graph with 1 million nodes and 1.6 million edges, I got a TypeError.

To Reproduce
Steps to reproduce the behavior:

CDlib version: 0.2.6
Operating System: Windows 10
Python version: 3.9.7
Version(s) of CDlib required libraries:
numpy => 1.22.0
future => 0.18.2
matplotlib => 3.4.3
scikit-learn => 0.24.2
tqdm => 4.62.3
networkx => 2.6.3
demon => 2.0.6
python-louvain => 0.16
nf1 => 0.0.4
scipy => 1.7.1
pulp => 2.6.0
seaborn => 0.11.2
pandas => 1.3.4
eva_lcd => 0.1.1
bimlpa => 0.1.2
markov_clustering => 0.0.6.dev0
chinese_whispers => 0.8.0
python-igraph => 0.9.11
angel-cd => 1.0.3
pooch => 1.6.0
dynetx => 0.3.1
thresholdclustering => 1.1
pyclustering => 0.10.1.2
cython => 0.29.24
python-Levenshtein => 0.12.2

Step 1:
Load dataset from csv file into NetworkX graph object using the following function:

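import networkx as nx
import pandas as pd

def load(csv_path):
    # Read the edge list; each row is one edge carrying a 'value' attribute.
    df = pd.read_csv(csv_path)
    # Build an undirected graph keyed on the two address columns.
    G = nx.from_pandas_edgelist(df, 'from_address', 'to_address', edge_attr='value', create_using=nx.Graph())

    return G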

Step 2:
Run cdlib.algorithms.leiden() on the NetworkX graph from Step 1 using the following function:

from cdlib import algorithms

def find_coms_leiden(graph_nx):
    # Run Leiden with default parameters on the NetworkX graph.
    coms = algorithms.leiden(graph_nx)

    return coms

Step 3:
Write the communities object to a file using the following function:

from cdlib import readwrite

def write_coms(coms, out_file):
    # Write the communities to a comma-delimited file.
    readwrite.write_community_csv(coms, out_file, ",")

Step 4:
Main

def main():
    Graph = load('./data.csv')
    coms = find_coms_leiden(Graph)
    write_coms(coms, 'coms.csv')

When I run this on the dataset with 1M nodes and 1.6M edges, I get:

Traceback (most recent call last):
  File "...\main.py", line 90, in <module>
    main()
  File "...\main.py", line 78, in main
    coms = find_coms_leiden(Graph)
  File "...\main.py", line 33, in find_coms_leiden
    coms = algorithms.leiden(graph_nx)
  File "C:\Anaconda\lib\site-packages\cdlib\algorithms\crisp_partition.py", line 599, in leiden
    g = convert_graph_formats(g_original, ig.Graph)
  File "C:\Anaconda\lib\site-packages\cdlib\utils.py", line 187, in convert_graph_formats
    return __from_nx_to_igraph(graph, directed)
  File "C:\Anaconda\lib\site-packages\cdlib\utils.py", line 122, in __from_nx_to_igraph
    gi.add_edges([(u, v) for (u, v) in g.edges()])
  File "C:\Anaconda\lib\site-packages\igraph\__init__.py", line 376, in add_edges
    res = GraphBase.add_edges(self, es)
TypeError: only non-negative integers, strings or igraph.Vertex objects can be converted to vertex IDs

Expected behavior
When I run on the dataset with 400k nodes and 600k edges, the program loads the data, calculates communities, and writes them to file properly:
See Screenshot 2 in Screenshots Section

Running on the dataset with 1M nodes and 1.6M edges is expected to write its output to file in the same way (with different data, obviously).

Screenshots
Example of the expected result written to file when using input data of 400k nodes and 600k edges:

Additional Context
I used nx.info(my_graph) to check how many nodes and edges are in the input graphs. This ran before cdlib.algorithms.leiden() and successfully parsed the data.

github-actions · 2022-07-25T16:43:47Z

Thanks for submitting your first issue!

GiulioRossetti · 2022-07-26T18:58:32Z

Thanks for raising the issue.

Have you tried loading the network with igraph instead of using networkx?

It seems that the error occurs during the graph conversion.
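
Going by the wording of the TypeError, one possible cause is that the larger dataset contains node labels that are neither strings nor non-negative integers (for example a NaN from an empty CSV cell, or a negative number). This is a guess, not a confirmed diagnosis. The helper below is a hypothetical sketch that scans an edge list for such labels. It mirrors the error message only, not igraph's internal checks, and does not account for igraph.Vertex objects.

```python
import numbers


def find_invalid_vertex_labels(edges, limit=10):
    """Return up to `limit` (edge_index, label) pairs whose label is
    neither a string nor a non-negative integer."""
    bad = []
    for i, (u, v) in enumerate(edges):
        for label in (u, v):
            is_str = isinstance(label, str)
            # numbers.Integral also covers NumPy integer types.
            is_nonneg_int = isinstance(label, numbers.Integral) and label >= 0
            if not (is_str or is_nonneg_int):
                bad.append((i, label))
                if len(bad) >= limit:
                    return bad
    return bad
```

Calling find_invalid_vertex_labels(G.edges()) on the NetworkX graph before the call to leiden() would list the first few offending labels, if there are any.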
