Add additional tutorials #2418

Open
25 tasks
adamjstewart opened this issue Nov 19, 2024 · 6 comments
Labels
  • documentation: Improvements or additions to documentation
  • good first issue: A good issue for a new contributor to work on

@adamjstewart
Collaborator

adamjstewart commented Nov 19, 2024

Issue

We are preparing for a TorchGeo tutorial at AGU and need to greatly expand our existing list of tutorials. This issue lists the tutorials that still need to be added and tracks progress towards completion.

General requirements:

  • Accessible: tutorials should require no prior knowledge of ML or RS
  • End-to-end: complete training and inference pipelines
  • Well-tested: all tutorials must be tested in CI to ensure they remain up-to-date
  • Resource-efficient: CI necessitates toy datasets that can be quickly downloaded
  • Linter-approved: all notebooks should pass our ruff style checks

Fix

The current plan is to completely rewrite all of our existing tutorials and organize them as follows:

  • Getting Started: general overview of DL/RS/TorchGeo, links to all other tutorial sections @adamjstewart
    • Introduction to deep learning: datasets, train-val-test splits, PyTorch, training, evaluation @adamjstewart
    • Introduction to remote sensing: challenges of RS data, CRS, projections, resolution @adamjstewart
    • Introduction to TorchGeo: types of datasets, purpose of samplers @adamjstewart
  • Basic Usage: targeted towards the ML crowd, more focused on training and evaluation
    • Datasets: NonGeo/Geo/Raster/Vector/Intersection/Union, dataset splitters
    • Samplers: GeoSampler, ROI, train vs evaluation samplers
    • Transforms: how to perform preprocessing, data augmentation, spectral indices, etc.
    • Models: how to use models from timm, torchvision, and SMP, how to load pre-trained models, torch.hub, etc.
    • Lightning: purpose of data modules and trainers, examples for classification, regression, etc. @burakekim
    • CLI: command-line interface and experimentation, reproducibility and best practices @adamjstewart
  • Case Studies: end-to-end workflows, targeted towards the RS crowd, more focused on inference
  • Customization and Contributing: how to write your own datasets and contribute them back @adamjstewart
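
For the Datasets item, the difference between an intersection and a union of two geospatial datasets comes down to which spatial extent can be sampled: only where both overlap (typical when pairing imagery with labels), or anywhere either one has data. A minimal sketch in plain Python, assuming a hypothetical `Box` tuple and made-up coordinates rather than TorchGeo's own classes:

```python
from typing import NamedTuple, Optional


class Box(NamedTuple):
    """Hypothetical axis-aligned extent; not TorchGeo's bounding box class."""

    minx: float
    miny: float
    maxx: float
    maxy: float


def intersection(a: Box, b: Box) -> Optional[Box]:
    """Region covered by both extents, or None if they do not overlap."""
    box = Box(
        max(a.minx, b.minx),
        max(a.miny, b.miny),
        min(a.maxx, b.maxx),
        min(a.maxy, b.maxy),
    )
    if box.minx >= box.maxx or box.miny >= box.maxy:
        return None
    return box


def union_bounds(a: Box, b: Box) -> Box:
    """Smallest extent that contains both inputs."""
    return Box(
        min(a.minx, b.minx),
        min(a.miny, b.miny),
        max(a.maxx, b.maxx),
        max(a.maxy, b.maxy),
    )


imagery = Box(0, 0, 10, 10)  # made-up extent of a raster scene
labels = Box(5, 5, 15, 15)  # made-up extent of a partly overlapping mask

overlap = intersection(imagery, labels)  # only here do both have data
combined = union_bounds(imagery, labels)  # here at least one may have data
```

A tutorial would presumably use the library's own dataset classes instead; the sketch only illustrates why a sampler over an intersection sees a smaller region than one over a union.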

In this design, there will only be 4 sections on the sidebar, but each one will expand when clicked on, listing all available tutorials. This will allow a growing number of tutorials without cluttering the docs. We will also move the tutorials above the API reference.

adamjstewart added the documentation label Nov 19, 2024
adamjstewart added this to the 0.6.2 milestone Nov 19, 2024
adamjstewart pinned this issue Nov 19, 2024
adamjstewart added the good first issue label Nov 19, 2024

@kaushikCanada

I created my land cover classification Barlow Twins model on WorldView-3 imagery for my PhD thesis. I did all of the coding in TorchGeo. It felt so much more relaxed with TorchGeo doing the heavy lifting.

@burakekim
Contributor

While not directly related -- I remember spending time trying to understand when and where normalization is applied to the datasets. It might not require a tutorial, but clarification in the documentation would be helpful. Let me know if there is a better place to share this suggestion.

@burakekim
Contributor

Re tutorial preparation: Count me in!

I am open to topics beyond land-cover mapping and can work with FTW since I have already spent some time familiarizing myself with it. I would like to focus on its instance segmentation labels and show how they can be useful in real-life applications. But I feel the storyline might not be very striking, if that is what we are going for -- likely something like: here's an inference tile, here are the instance segmentation masks, and some stats.

cc: @calebrob6 -- not sure if the FTW folks plan to do something like this already

@adamjstewart
Collaborator Author

adamjstewart commented Nov 23, 2024

I agree, we should clarify the normalization thing, you're not the only one who has told me that. Let's briefly mention that in the Lightning tutorial, and then I'll mention it in more detail in the Custom Data Modules tutorial. We have actually talked about changing the default to be no normalization (mean=0, std=1), but let's save that for 0.7.0, not 0.6.2.
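
To make the "no normalization (mean=0, std=1)" remark concrete: standard normalization computes `(x - mean) / std`, so those values leave the data untouched. A minimal sketch in plain Python, assuming a hypothetical `normalize` helper and made-up pixel values (this is not TorchGeo's implementation):

```python
from statistics import mean, pstdev


def normalize(values, mean_, std):
    """Hypothetical helper: standard (x - mean) / std normalization."""
    return [(v - mean_) / std for v in values]


band = [120.0, 80.0, 100.0, 140.0]  # made-up pixel values for one band

# mean=0, std=1 is the identity: nothing is subtracted, nothing is rescaled
unchanged = normalize(band, 0.0, 1.0)

# Using the band's own statistics gives zero mean and unit standard deviation
standardized = normalize(band, mean(band), pstdev(band))
```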

Would love to have an Instance Segmentation tutorial, but first we need an InstanceSegmentationTask, which will also need to wait for 0.7.0. So not for AGU, but for future tutorials, yes please!

Any specific sections you would like to start working on? I can sign you up.

@burakekim
Contributor

Re normalization, for future reference: #1780 (comment) and #1841

As for another tutorial that I can start working on right away, this could be it: "Lightning: purpose of data modules and trainers, examples for classification, regression, semantic segmentation, etc." The description seems quite open-ended -- what exactly do you have in mind for this item? My first impression is that we demonstrate how to construct trainers for different tasks and provide a high-level overview of their outputs. Or is it more about showing how the tasks are structured, down to the source code?

@adamjstewart
Collaborator Author

adamjstewart commented Nov 23, 2024

> The description seems quite open-ended -- what exactly do you have in mind for this item?

The Lightning tutorial should answer the following questions:

  • What is PyTorch Lightning?
  • What is a data module?
  • What is a trainer/task/thingy?
  • How do you combine them?
  • How to specify custom loggers, callbacks, etc.?
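
As a rough illustration of how those pieces divide the work, here is a toy sketch in plain Python. The classes `ToyDataModule`, `ToyTask`, and `ToyTrainer` are hypothetical stand-ins written for this example, not the Lightning or TorchGeo API; only the overall shape (a trainer's `fit` receiving a task and a data module) mirrors the real thing.

```python
class ToyDataModule:
    """Owns the data: where it comes from and how it is batched."""

    def __init__(self, batch_size=2):
        self.batch_size = batch_size
        # Made-up samples of y = 2x
        self.samples = [(float(x), 2.0 * x) for x in range(1, 5)]

    def train_dataloader(self):
        for i in range(0, len(self.samples), self.batch_size):
            yield self.samples[i : i + self.batch_size]


class ToyTask:
    """Owns the model and the loss."""

    def __init__(self, lr=0.01):
        self.weight = 0.0  # the whole "model": y_hat = weight * x
        self.lr = lr

    def training_step(self, batch):
        # Mean squared error and its gradient with respect to the weight.
        # A real task would only declare its optimizer and let the trainer
        # apply the update; this toy folds the update in for brevity.
        n = len(batch)
        loss = sum((self.weight * x - y) ** 2 for x, y in batch) / n
        grad = sum(2 * (self.weight * x - y) * x for x, y in batch) / n
        self.weight -= self.lr * grad
        return loss


class ToyTrainer:
    """Owns the loop; a real trainer also handles logging, callbacks, devices."""

    def __init__(self, max_epochs=50):
        self.max_epochs = max_epochs

    def fit(self, task, datamodule):
        losses = []
        for _ in range(self.max_epochs):
            for batch in datamodule.train_dataloader():
                losses.append(task.training_step(batch))
        return losses


task = ToyTask()
losses = ToyTrainer(max_epochs=50).fit(task, ToyDataModule())
```

Swapping the data module changes the data without touching the task, and swapping the task changes the model without touching the loop, which is the separation the tutorial would need to explain.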

Honestly, our current Lightning tutorial isn't horrible and might be mostly sufficient. It's the other sections I'm more worried about.

No need to show source code for any tutorial except the Customization and Contributing section.


3 participants