Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add method to list leaf nodes #188

Merged
merged 7 commits into from
Oct 31, 2024
Merged

feat: add method to list leaf nodes #188

merged 7 commits into from
Oct 31, 2024

Conversation

jokasimr
Copy link
Contributor

Fixes #183 #177

@@ -200,6 +205,15 @@ def _repr_html_(self) -> str:
nodes = ((key, data) for key, data in self.underlying_graph.nodes.items())
return pipeline_html_repr(nodes)

def leafs(self) -> tuple[type, ...]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Can we consider making this a free function? We should avoid bloating the interface of Pipeline.
  • What is a leaf? Can we use some clearer terminology such as used in mathematics for DAGs, or by NetworkX? Sink nodes, for example

Copy link
Contributor Author

@jokasimr jokasimr Oct 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is a leaf? Can we use some clearer terminology such as used in mathematics for DAGs, or by NetworkX? Sink nodes, for example

I think sink_nodes is a better name than leaf, but they are both "leaking" the graph abstraction below Pipeline. Maybe we should have something more specific to the domain, like final_results or output_keys.

Can we consider making this a free function? We should avoid bloating the interface of Pipeline.

I'm not so sure about this. We could make it a free function, but what's the advantage of doing so? It's intrinsically tightly coupled to the Pipeline class. Having it be a method makes it easier for people to find it using dot-notation access.

def final_result_keys(self) -> tuple[type, ...]:
"""Returns the keys that are not inputs to any other providers."""
sink_nodes = [
cast(type, node)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand. What is this cast doing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's just for appeasing mypy.
We promise the return type of the function is tuple[type, ...] because we know our keys are types.
But what we get from the underlying_graph is Any.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay took another look now and it was actually not necessary to cast in case we just specified the return as tuple[Key, ...] instead

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed. Our keys are, in general not type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are using type in a lot of other places instead of Key though, that's what confused me.
For example https://github.com/scipp/sciline/blob/main/src/sciline/pipeline.py#L108-L111

Is that just a mistake or is there a difference between the contexts that makes type the right annotation there?

@@ -200,6 +209,13 @@ def _repr_html_(self) -> str:
nodes = ((key, data) for key, data in self.underlying_graph.nodes.items())
return pipeline_html_repr(nodes)

def final_result_keys(self) -> tuple[Key, ...]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry about discussing names a second time, but can we think about the name once more. This seems a bit hard to remember, with the 3-part name. Have you considered output_keys? It would also be symmetric with the terminology "inputs" or input keys", which we might use elsewhere.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries, yeah if you think output_keys is better we can use that instead 👍

Have you considered output_keys?

Yes it's one of the names I suggested the first time you brought up naming 😆

@jokasimr jokasimr enabled auto-merge (squash) October 31, 2024 12:16
@jokasimr jokasimr merged commit 8f40579 into main Oct 31, 2024
8 checks passed
@jokasimr jokasimr deleted the list-leafs branch October 31, 2024 12:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Listing all leaf types
2 participants