Draft how-to for getting nodes and edges tables from network #23

Open

caro401 wants to merge 1 commit into main from 20-howto-access-networkdata-tables

Collaborator

caro401 commented Dec 11, 2023

This is aimed at technical users (jupyter users of the python API and/or me developing UI code).

Looking for review from @makkus that the content is technically correct, and from an end user @CBurge95 that it's clear enough (have I provided enough context for example), and addresses her original questions from #20


          Draft how-to for getting nodes and edges tables from network

41256df

caro401 requested review from makkus and CBurge95

December 11, 2023 16:13

caro401 linked an issue

that may be closed by this pull request

Accessing/querying network_data & module outputs #20

Open

makkus reviewed

Collaborator

makkus left a comment

Added my comments. I'll update here once I've decided on the interface and added the `network_graph.pick.table` module.

developer/how-to-view-the-data-in-networks.md

		@@ -0,0 +1,66 @@
		# How to view the data in a NetworkData type

Collaborator

makkus Dec 24, 2023

One comment in advance: personally, I'd probably have a section in the docs that deals with tabular data, and only explain here how to get to the tables, then link to the more generic documentation on querying and other things you can do with them.

developer/how-to-view-the-data-in-networks.md


		Quite often, you'll want to inspect the raw contents of the nodes and/or edges tables which contain the data behind a `NetworkData` value. This might be to get an overview of what's in your network, or to look at the values of centrality measures you've just calculated and applied to the network.

		The nodes and edges tables can be accessed from a `NetworkData` value by calling the `get_table` method on the `NetworkData`, passing the appropriate table name `"nodes"` or `"edges"` as argument. The resulting value is a `KiaraTable`, which in turn is backed by a `pyarrow.Table` from [Apache Arrow](https://arrow.apache.org/docs/python/generated/pyarrow.Table.html). The Arrow table contains the raw data, and can be accessed via the `arrow_table` property on a `KiaraTable`.

Collaborator

makkus Dec 24, 2023

After the refactoring we talked about, the data type is now called `NetworkGraph`, but I tried to keep the interface the same as much as possible. `get_table` would still work, but a user could also just use the `edges` and `nodes` attributes and get the same `KiaraTable` instance they would get with `get_table`.

developer/how-to-view-the-data-in-networks.md


		In order to view the data contained in the Arrow table, you'll need to turn it into a different data format. The `pyarrow.Table` data type provides a few options for converting the data, for example `to_pandas()` to get a pandas DataFrame, and `to_pydict()` and `to_pylist()` to get plain Python data types, which you can then manipulate as you choose.

		Be aware that doing any of these data transformations means your whole nodes or edges table will be loaded into memory on your computer. If your tables are really big, this could cause your code to run slowly and use a lot of memory (RAM).

Collaborator

makkus Dec 24, 2023

I guess here it would make sense to point out that using the arrow data directly is considered best practice overall, unless you are writing custom code that is not going to get re-used, or where you know for sure you won't have to deal with unexpectedly large amounts of data.

For frontend developers that would mean using the Arrow JS library, and ideally either sending/receiving 'unserialized' Arrow format or, even better, trying to get a pointer to the data in memory for zero-copy style access (not always possible). For Jupyter users it would mean using polars or duckdb (or any of the modules that use it internally, like the `query.table` one you point out below).

developer/how-to-view-the-data-in-networks.md

+              # if you're in a jupyter context, printing edges_kiara_table will give you a preview of the data
+              # get all the data via the underlying Arrow table
+              edges_data = edges_table.arrow_table.to_pylist()

Collaborator

makkus Dec 24, 2023

Personally, I have never needed to use the `to_pylist` or `to_pydict` methods. I think a much more common use case (at least for Jupyter users) would be the pandas export, since there is a high likelihood they are using pandas anyway. If there is indeed a valid use case for frontend devs to use this over 'pure' Arrow access, I'd say we can probably assume frontend devs have more programming background, and can figure things out themselves with a few links we could provide. Long story short, I would tend to document the pandas code, and not `to_pylist`.

developer/how-to-view-the-data-in-networks.md

+              # let's call it my_network_data
+              # get the nodes table for the network, as a `KiaraTable`
+              nodes_kiara_table = my_network_data.get_table("nodes")

Collaborator

makkus Dec 24, 2023

This is probably not a good idea, because if you do it like this you break the lineage of the result value. Whether that matters depends of course on your particular circumstances, but I guess it's better not to confuse people by documenting a practice that would only make sense for some sort of frontend-preview scenario, but would be ill-advised within a Jupyter/Python research workflow.

Up until now, in all the network analysis examples where there was a use case like this, the querying always happened on the source tables (before they became network_data/network_graph). We can easily support this scenario too: all it takes is adding a module `network_graph.pick.table` (or something like that) that takes a network graph and either an 'edges' or 'nodes' string as input, and returns a table as result. I can easily add that, and will have it ready in the 'tropy' plugin in the next few days.

Anyway, the result (of type 'table') can subsequently be used in the code below, and lineage will be intact in the result of that.

Collaborator

makkus commented Dec 29, 2023

Ok, I've written a small Jupyter notebook that contains a version where the nodes table is picked via a kiara operation so as not to break lineage (attached to this comment; I had to zip it, otherwise GitHub wouldn't let me add it). This should work with an updated environment that has the most recent version of the tropy plugin installed (`pip install -e kiara_plugin.tropy`).

notebook_example.zip
