run_in_batches function #209

Open
nreinicke opened this issue May 14, 2024 · 3 comments
Labels
python Applies to the python code

@nreinicke
Collaborator

The current workflow when using the Python API is to load a CompassApp object from a config and then call the run() method, passing a set of queries as Python dict objects. We could add another way to interact with the application: point it at a config file and a query JSON file, and have it read the queries in batches and write the results to an output file in batches. This could take the form of a single function that looks like this:

def run_in_batches(app_config_file: Union[str, Path], query_file: Union[str, Path], batch_size: int):
    ...
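
A rough sketch of what the batching loop could look like, not the project's implementation. To stay independent of the real library, it takes a hypothetical `run_fn` callable (standing in for a loaded app's run method) instead of `app_config_file`. The `output_file` parameter, the `iter_batches` helper, the JSON-array input format and the JSON-lines output format are all assumptions made for illustration.

```python
import json
from itertools import islice
from pathlib import Path
from typing import Any, Callable, Dict, Iterator, List, Union

Query = Dict[str, Any]


def iter_batches(items: List[Query], batch_size: int) -> Iterator[List[Query]]:
    """Yield consecutive batches of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    it = iter(items)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def run_in_batches_sketch(
    run_fn: Callable[[List[Query]], List[Query]],  # hypothetical stand-in for app.run
    query_file: Union[str, Path],
    output_file: Union[str, Path],  # hypothetical: not in the proposed signature
    batch_size: int,
) -> int:
    """Run queries batch by batch and write each batch's results as JSON lines.

    Returns the number of results written.
    """
    with open(query_file) as f:
        # Assumption: the query file holds a single JSON array of query objects.
        queries = json.load(f)

    written = 0
    with open(output_file, "w") as out:
        for batch in iter_batches(queries, batch_size):
            # Results are flushed to disk per batch rather than held in memory.
            for result in run_fn(batch):
                out.write(json.dumps(result) + "\n")
                written += 1
    return written
```

Note that this version still loads the whole query list with `json.load`; only execution and output are batched. Reading the input in batches as well would need a line-delimited query file or a streaming JSON parser.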
@nreinicke nreinicke added this to the PyCon 2024 milestone May 14, 2024
@nreinicke nreinicke added the python Applies to the python code label May 14, 2024
@robfitzgerald
Collaborator

and assuming we could wire this in via the CompassApp.run method:

class CompassApp:

    def run(self, ..., chunksize: Optional[int] = None):
        if chunksize is not None:
            return run_in_batches(...)
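
To make the dispatch idea concrete, here is a small self-contained sketch. `AppSketch`, `_run_all` and the `calls` list are hypothetical stand-ins, not part of CompassApp; only the `chunksize` branch mirrors the snippet above.

```python
from typing import Any, Dict, List, Optional

Query = Dict[str, Any]


class AppSketch:
    """Hypothetical stand-in for CompassApp; only the dispatch logic matters here."""

    def __init__(self) -> None:
        # Records the size of each underlying call, purely for illustration.
        self.calls: List[int] = []

    def _run_all(self, queries: List[Query]) -> List[Query]:
        # Placeholder for the real work done on one set of queries.
        self.calls.append(len(queries))
        return [{"query": q, "status": "ok"} for q in queries]

    def run(
        self, queries: List[Query], chunksize: Optional[int] = None
    ) -> List[Query]:
        # Default behaviour is unchanged: run everything in one call.
        if chunksize is None:
            return self._run_all(queries)
        if chunksize < 1:
            raise ValueError("chunksize must be a positive integer")
        # Opt-in behaviour: process the queries one chunk at a time.
        results: List[Query] = []
        for start in range(0, len(queries), chunksize):
            results.extend(self._run_all(queries[start:start + chunksize]))
        return results
```

Because `chunksize` defaults to `None`, existing callers of `run` would see no change, which is the usual appeal of the optional-keyword style used by pandas.read_csv.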

thinking: i feel like stacking alternatives into the core run method would probably be more easily discoverable for users. same goes for the "to geopandas" behaviors. maybe all of that wiring-in is just one more downstream issue to collect these all as behaviors of CompassApp.run, which gives us one of those big APIs like Pandas.read_csv or similar that serves all users.

@nreinicke nreinicke changed the title from "Add batch running capability to python API" to "run_in_batches function" May 17, 2024
@nreinicke
Collaborator Author

maybe all of that wiring-in is just one more downstream issue to collect these all as behaviors of CompassApp.run, which gives us one of those big APIs like Pandas.read_csv or similar that serves all users.

Yeah, I like that idea. I changed the scope of this to just completing the function, and then we can build out a new issue for wiring it in (along with any other run methods we discover).

@zenon18
Contributor

zenon18 commented May 20, 2024

zenon18 (Mark W.) would like to work on this issue.
