Accessing/querying network_data & module outputs #20
Comments
That's the plan as of now: returning a new network_data value that has all the same columns, plus one or two additional ones containing centrality information (there are still some open questions in my mind about what exactly to attach, but that's independent of this issue). How to preview values is of course another question, and probably one of the central usage patterns for any UI we are building. Having all of this in the same (result) network_data value should make things easier though, because that value can be rendered directly into a graph visualization, for example by making nodes bigger depending on centrality, or however else you plan to make that intuitive to the user. Having to look up two values to do that would be much harder and messier in terms of frontend code, IMHO.
Please can I just have a method on network_data that gives me all the data contained in the nodes table, and another method that gives me all the data in the edges table? Or at least a clear set of steps for how to get this using the methods that currently exist on network_data. I don't understand Arrow well enough to go through the underlying data types to dig out this information. I'll deal with visualising it; I just need access to the raw table data in some table-like, array-like or dict-like format. I often want to show the nodes and edges data separately, so separate methods are useful. For the moment, I don't care about performance, memory overhead or serialization cost, just the ability to show this data to a Tropy-mini-app user, who I know will have a fairly small data set. The largest data set we've ever seen in Tropy is ~100k items, and most projects are in the 100s of items, so getting all data at once is not a concern at all.
Sure, just tell me what format you want. You said you were happy with the Arrow format, otherwise I'd have given you more options. Pandas DataFrame? Something else?
I don't know what the possible options are. If you can just tell me how to get all of the data out of Arrow in the correct way, that's fine. A data frame is probably also fine, although I thought you were moving to Polars now? Equally a JSON blob or a Python list of lists; I don't really care as long as all the data is in it.
Not sure about "correct"; that would probably be using the Arrow data directly. But it's tabular data, so in theory we can export it to anything in the exact format you need. Check out: https://arrow.apache.org/docs/python/generated/pyarrow.Table.html
I guess the [...] For the example Python code I gave you, accessing the methods will look something like: [...]
Alternatively, the [...]
All the non-Arrow ones will definitely load all data into memory, but as long as you don't care... And for use cases like this I don't either; this is more important within a kiara module.
Following on from this morning's discussion on module outputs, and the use of tables as repeat information. Rather than using tables to display the updated nodes/edges information after running an operation, are we able to access this information as stored in the network_data object? Can this be queried and filtered, like with SQL? It doesn't make much sense to simply generate this information without being able to view it or do something with it.
This is particularly useful going forward into creating the mini-app: once we've run an operation, we want to be able to preview it, filter it, re-sort it etc.
If we can do this, we can return network_data as the only output for any centrality module, just with the updated information.
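To illustrate the kind of querying meant here, a minimal sketch using Python's built-in sqlite3 on a made-up nodes table. The table name, column names and the `top_nodes` helper are hypothetical, not the actual network_data schema or API.

```python
import sqlite3

# In-memory database with an invented nodes table that carries a centrality column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE nodes (id INTEGER PRIMARY KEY, label TEXT, degree_centrality REAL)"
)
conn.executemany(
    "INSERT INTO nodes VALUES (?, ?, ?)",
    [(0, "a", 0.5), (1, "b", 1.0), (2, "c", 0.25)],
)


def top_nodes(conn, min_centrality, limit=10):
    """Filter nodes by a centrality threshold and sort them, highest first."""
    cursor = conn.execute(
        "SELECT id, label, degree_centrality FROM nodes "
        "WHERE degree_centrality >= ? "
        "ORDER BY degree_centrality DESC LIMIT ?",
        (min_centrality, limit),
    )
    return cursor.fetchall()


# Preview: only nodes with centrality of at least 0.4, most central first.
preview = top_nodes(conn, 0.4)
```

Something along these lines would cover the preview, filter and re-sort cases for the mini-app.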