Normalize GeoPandas support #200

robfitzgerald · 2024-05-09T03:14:53Z

The result of a CompassApp::run is a Python Object which has some standard keys and structures that are known/specified internally in the rust codebase. Handling this monolithic object in Python is counter-intuitive since we are not leveraging Pandas/GeoPandas, which are the standard format for analysis of columnar (geo) data.

The CompassApp::run method could include a geopandas: bool = false (or gdf if that's more Pythonic) argument which will assume the user wants GeoDataFrame outputs for route and tree data when it is true, but output the monolithic Object when false (by default).

The text was updated successfully, but these errors were encountered:

robfitzgerald · 2024-05-09T04:37:22Z

here's some thoughts on how we could do this.

I'd propose the signature for run gets expanded to support a few different combinations.

class CompassApp:

  def run(self, queries, gdf: bool = false, index_col: Optional[str]=None)

Top-level structure

In my mind a route is mapped to a single row of a dataframe. I think for a tree, a link is mapped to a row. Because their schemas are different, I think we want the output of a gpd=true run to return:

GeoDataFrame when either routes or trees are found (but not both)
Tuple[GeoDataFrame, GeoDataFrame] when both are present

This is similar to osmnx.to_gdfs().

Returning Routes

Summary row data

Any row in a route GeoDataFrame should have all traversal and cost columns that may be relevant to post-processing:

column	mapping
_index	(user-defined `index_col` path, or, by default, the top-level index value in the Compass output)
distance	traversal_summary.distance
time	traversal_summary.time
...
distance_cost	cost.distance
time_cost	cost.time
total_cost	cost.total_cost

All of these should be set only if they exist (optional semantics).

Geo row identifiers

The route geometries are parsed as geometry objects. A _path column is added to the dataframe in the case that there is more than one path. The value is the route.path index.

run() result	description	route column?
one route `{}`	`route.path` has one entry	no
n routes `[]`	`route.path` has i entries	append `_path` column with value i

As a result, a row is uniquely indexed by the _index column when there is one route per row, and a combination of _index and _path when route.path has more than one entry. Put another way: we end up with k * q rows for q queries and k paths.

Geometry data

We need to cover the possible geometry output types. We can know in advance what the geometry type is by inspecting the configuration of the CompassApp (via #201). We can then switch our geometry parsing method accordingly:

Compass `traversal.route` config value	parse method for string `s`
json	(no geometry)
wkt	shapely.wkt.loads(s)
geojson	shapely.ops.shape(s)

Returning trees

Trees are similar but different. Each Compass output row may have an entire tree, but, we do not want a GDF of trees (plural), we want instead to plot a single tree. Following the logic above, we can add one more index column so that each row is a link in a tree in a result:

for each Compass run result row _i_:
  for each tree _j_:
    for each tree branch _k_:
      create row for `_index` _i_, `_tree` _k_,  `_edge_id` edgeid(_k_)

robfitzgerald · 2024-05-09T04:47:08Z

btw, if we do the proposed above, then i feel like all of our plotting functions can disappear and get replaced by GeoDataFrame plotting methods, which for example supports the feature described in #199, but also can swap-in replace the existing to_plot_folium methods in compass-app-py.

robfitzgerald · 2024-05-09T15:25:02Z

Just thinking about it. Maybe instead of gdf: bool we instead provide output_format: str where it can be "json" or "geopandas"?

robfitzgerald · 2024-10-24T19:27:52Z

working with Compass right now in a python environment, and i want my run result as a dataframe. my result has thousands of rows, so, avoiding a copy would be ideal. that's where we should be using polars.

two thoughts:

expand on our CSV output policy so that the end result is a polars dataframe
- nice time to add JSONPath support to replace the current mapper operations
revise ResponsePersistencePolicy::PersistResponseInMemory so it has an optional output format
revise the run response to be either JSON or a DataTable
revise the Python run method so it returns either a PyObject Dict or a DataTable (via polars/Apache Arrow -> pandas.DataFrame method which depends on pandas and pyarrow)

edit: then, to make this result a geopandas.GeoDataFrame could be simple with the user providing a JSONPath to the geometry object

nreinicke · 2024-10-25T19:28:52Z

Yeah I really like the idea of using polars here for zero copy operations and I like that this will still allow support of geopandas if we use the pyarrow backend.

robfitzgerald added the python Applies to the python code label May 9, 2024

nreinicke added this to the PyCon 2024 milestone May 9, 2024

This was referenced May 14, 2024

route_to_geopandas function #210

Closed

tree_to_geopandas function #211

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Normalize GeoPandas support #200

Normalize GeoPandas support #200

robfitzgerald commented May 9, 2024 •

edited

Loading

robfitzgerald commented May 9, 2024 •

edited

Loading

robfitzgerald commented May 9, 2024

robfitzgerald commented May 9, 2024

robfitzgerald commented Oct 24, 2024 •

edited

Loading

nreinicke commented Oct 25, 2024

Normalize GeoPandas support #200

Normalize GeoPandas support #200

Comments

robfitzgerald commented May 9, 2024 • edited Loading

robfitzgerald commented May 9, 2024 • edited Loading

Top-level structure

Returning Routes

Summary row data

Geo row identifiers

Geometry data

Returning trees

robfitzgerald commented May 9, 2024

robfitzgerald commented May 9, 2024

robfitzgerald commented Oct 24, 2024 • edited Loading

nreinicke commented Oct 25, 2024

robfitzgerald commented May 9, 2024 •

edited

Loading

robfitzgerald commented May 9, 2024 •

edited

Loading

robfitzgerald commented Oct 24, 2024 •

edited

Loading