TypeError when Running Leiden Community Detection on Large Graph #213

Open
mjaworski22 opened this issue Jul 25, 2022 · 2 comments
@mjaworski22

mjaworski22 commented Jul 25, 2022

Describe the bug
When I run cdlib.algorithms.leiden() on a graph with 400k nodes and 600k edges, the algorithm works correctly and identifies communities. When I run it on a graph with 1 million nodes and 1.6 million edges, I get a TypeError.

To Reproduce
Steps to reproduce the behavior:

  • CDlib version: 0.2.6
  • Operating System: Windows 10
  • Python version: 3.9.7
  • Version(s) of CDlib required libraries:
    numpy => 1.22.0
    future => 0.18.2
    matplotlib => 3.4.3
    scikit-learn => 0.24.2
    tqdm => 4.62.3
    networkx => 2.6.3
    demon => 2.0.6
    python-louvain => 0.16
    nf1 => 0.0.4
    scipy => 1.7.1
    pulp => 2.6.0
    seaborn => 0.11.2
    pandas => 1.3.4
    eva_lcd => 0.1.1
    bimlpa => 0.1.2
    markov_clustering => 0.0.6.dev0
    chinese_whispers => 0.8.0
    python-igraph => 0.9.11
    angel-cd => 1.0.3
    pooch => 1.6.0
    dynetx => 0.3.1
    thresholdclustering => 1.1
    pyclustering => 0.10.1.2
    cython => 0.29.24
    python-Levenshtein => 0.12.2

Step 1:
Load dataset from csv file into NetworkX graph object using the following function:

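import networkx as nx
import pandas as pd

def load(csv_path):
    # Build an undirected graph from the edge list, keeping 'value' as an edge attribute.
    df = pd.read_csv(csv_path)
    G = nx.from_pandas_edgelist(df, 'from_address', 'to_address', edge_attr='value', create_using=nx.Graph())

    return G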

Step 2:
Run cdlib.algorithms.leiden() on the NetworkX graph from Step 1 using the following function:

from cdlib import algorithms

def find_coms_leiden(graph_nx):
    # Run Leiden with CDlib's default parameters.
    coms = algorithms.leiden(graph_nx)

    return coms

Step 3:
Write the communities object to a file using the following function:

from cdlib import readwrite

def write_coms(coms, out_file):
    # Write the communities to out_file, using "," as the delimiter.
    readwrite.write_community_csv(coms, out_file, ",")

Step 4:
Call the functions above from main:

def main():
    graph = load('./data.csv')
    coms = find_coms_leiden(graph)
    write_coms(coms, 'coms.csv')

if __name__ == '__main__':
    main()

When I run this with 1M nodes and 1.6M edges, I get the following traceback:
Traceback (most recent call last):
  File "...\main.py", line 90, in <module>        
    main()
  File "...\main.py", line 78, in main
    coms = find_coms_leiden(Graph)
  File "...\main.py", line 33, in find_coms_leiden
    coms = algorithms.leiden(graph_nx)
  File "C:\Anaconda\lib\site-packages\cdlib\algorithms\crisp_partition.py", line 599, in leiden
    g = convert_graph_formats(g_original, ig.Graph)
  File "C:\Anaconda\lib\site-packages\cdlib\utils.py", line 187, in convert_graph_formats
    return __from_nx_to_igraph(graph, directed)
  File "C:\Anaconda\lib\site-packages\cdlib\utils.py", line 122, in __from_nx_to_igraph
    gi.add_edges([(u, v) for (u, v) in g.edges()])
  File "C:\Anaconda\lib\site-packages\igraph\__init__.py", line 376, in add_edges
    res = GraphBase.add_edges(self, es)
TypeError: only non-negative integers, strings or igraph.Vertex objects can be converted to vertex IDs

Expected behavior
With 400k nodes and 600k edges, the program loads the data, computes the communities, and writes them to file correctly (see Screenshot 2 in the Screenshots section).

Running with 1M nodes and 1.6M edges should write the output to file in the same way, with different data.

Screenshots
Example of the expected result written to file, using input data of 400k nodes and 600k edges:
image

Additional Context
I used nx.info(my_graph) to check how many nodes and edges are in the input graphs. It was run before cdlib.algorithms.leiden() and completed successfully on the data.
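
A possible way to narrow this down, offered as a sketch rather than a confirmed diagnosis: the error message says igraph only accepts non-negative integers, strings, or igraph.Vertex objects as vertex IDs, while nx.info only reports node and edge counts and does not look at label types. Assuming the larger CSV produces some node labels of another type (for example NaN floats from empty cells, which is only a guess), a check like the one below would surface them. The helper names label_type_counts and unconvertible_labels are hypothetical and not part of CDlib or NetworkX.

```python
from collections import Counter


def label_type_counts(nodes):
    """Count node labels by Python type name."""
    return Counter(type(n).__name__ for n in nodes)


def unconvertible_labels(nodes):
    """Return labels that are neither strings nor non-negative integers.

    This mirrors the wording of the igraph error message. Assumptions of
    this sketch: bool is rejected even though it subclasses int, and
    non-Python integer types (e.g. NumPy integers) are reported as well;
    whether igraph accepts those is not checked here.
    """
    bad = []
    for n in nodes:
        if isinstance(n, str):
            continue
        if isinstance(n, int) and not isinstance(n, bool) and n >= 0:
            continue
        bad.append(n)
    return bad


# Stand-in for list(G.nodes()); these labels are made up for illustration.
sample_nodes = ["0xabc", "0xdef", float("nan"), 42, -1]
print(label_type_counts(sample_nodes))
print(unconvertible_labels(sample_nodes))
```

On the real data this would be called as label_type_counts(G.nodes()) and unconvertible_labels(G.nodes()), where G is the graph returned by load().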

@github-actions

Thanks for submitting your first issue!

@GiulioRossetti
Owner

GiulioRossetti commented Jul 26, 2022

Thanks for raising the issue.

Have you tried loading the network with igraph instead of using networkx?

It seems that the error occurs during the graph conversion.
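
For reference, a minimal sketch of what building the igraph graph directly might look like, skipping the NetworkX conversion. It assumes the CSV columns are from_address, to_address, and value as in the report, reads every label as a string, and skips rows with an empty endpoint. The helpers index_edges and build_igraph are hypothetical names, and whether this avoids the TypeError on the 1M-node data is untested.

```python
import csv
import io


def index_edges(rows, source="from_address", target="to_address", weight="value"):
    """Map string labels to consecutive integer IDs.

    Returns (labels, edges, weights). Rows with an empty endpoint are
    skipped; weights are kept as raw strings and can be converted later.
    """
    ids = {}
    edges, weights = [], []
    for row in rows:
        u = (row[source] or "").strip()
        v = (row[target] or "").strip()
        if not u or not v:
            continue
        # setdefault assigns the next free ID the first time a label is seen.
        edges.append((ids.setdefault(u, len(ids)), ids.setdefault(v, len(ids))))
        weights.append(row[weight])
    labels = list(ids)  # dict insertion order matches the assigned IDs
    return labels, edges, weights


def build_igraph(labels, edges, weights):
    """Build an undirected igraph graph from the indexed edge list."""
    import igraph as ig  # python-igraph, imported lazily

    g = ig.Graph(n=len(labels), edges=edges, directed=False)
    g.vs["name"] = labels    # keep the original addresses as vertex names
    g.es["value"] = weights  # keep the edge attribute from the CSV
    return g


# Tiny in-memory CSV standing in for ./data.csv; the rows are made up.
sample = io.StringIO(
    "from_address,to_address,value\n"
    "0xa,0xb,1\n"
    "0xb,0xc,2\n"
    ",0xc,3\n"
)
labels, edges, weights = index_edges(csv.DictReader(sample))
print(labels, edges, weights)
```

The graph returned by build_igraph(labels, edges, weights) could then be passed to algorithms.leiden() in place of the NetworkX graph. Unlike nx.Graph, igraph keeps duplicate edges, so the communities may differ from the NetworkX-based run unless the graph is simplified first.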
