-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rewrite of Sciline's Pipeline #165
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't reviewed DataGraph
yet. That will need more time.
pyproject.toml
Outdated
@@ -30,6 +30,7 @@ requires-python = ">=3.10" | |||
# Run 'tox -e deps' after making changes here. This will update requirement files. | |||
# Make sure to list one dependency per line. | |||
dependencies = [ | |||
"cyclebane @ git+https://github.com/scipp/cyclebane@fix-conceptual-naming-issues", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should merge this and release cyclebane before merging the current PR. I don't want to risk depending on a branch in a released package.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It still requires review.
self._args = args | ||
self._kwargs = kwargs | ||
self._return = return_ | ||
|
||
def __len__(self) -> int: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think __len__
is the best choice here. The user might wonder whether this includes the return annotation or not. (This is how len(__annotations__)
works)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the same argument, isn't what existing methods like keys()
return also confusing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True. It should be more like argument_keys
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is just an internal implementation detail, I suggest to leave this for now.
src/sciline/_provider.py
Outdated
out = self.map_keys(lambda arg: _bind_free_typevars(arg, bound=bound)) | ||
if self._return is not None: | ||
out._return = _bind_free_typevars(self._return, bound=bound) | ||
return out |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be moved into map_keys
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't that make the method name misleading (since it is not only mapping the keys any more)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above. In Sciline, 'key' refers to the things that connect providers, i.e., domain types. So it applies to the return type as well.
But my question was in particular about whether it always makes sense to map the return key.
tests/task_graph_test.py
Outdated
@@ -31,7 +31,7 @@ def as_float(x: int) -> float: | |||
|
|||
def make_task_graph() -> Graph: | |||
pl = sl.Pipeline([as_float], params={int: 1}) | |||
return pl.build(float, handler=sl.HandleAsBuildTimeException()) | |||
return pl.build((float,), handler=sl.HandleAsBuildTimeException()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the interface of build
and get
different now? Why? This seems confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
build
is a lower-level method with a simpler interface. The complexity of distinguishing between single and multiple keys was moved in to get
. build
should (as before) not be used by users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then why is it public?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious about performance. Did you check if the new solution is any slower than the old?
I know that building graphs should be fast compared to the actual computation. But given that this may be done per update in Beamlime, I would like to make sure that it doesn't add too much latency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cyclebane currently has some sub-optimal handling of graph creation after map
-reduce
. But I have not tested/benchmarked so far.
But given that this may be done per update in Beamlime
I don't think we should or plan to do that?
"""Returns the set of all TypeVars in a type expression.""" | ||
if isinstance(t, TypeVar): | ||
return {t} | ||
return set(itertools.chain(*map(_find_all_typevars, get_args(t)))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The chain of map looked a bit odd to me at first. Is it equivalent to this?
return set(itertools.chain(*map(_find_all_typevars, get_args(t)))) | |
return set.union(*map(_find_all_typevars, get_args(t))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't get that to work, can you fix the suggested syntax?
if key is NoneType: | ||
raise ValueError('Key must not be None') | ||
if key in self._graph: | ||
self._graph.remove_edges_from(list(self._graph.in_edges(key))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self._graph.remove_edges_from(list(self._graph.in_edges(key))) | |
self._graph.remove_edges_from(self._graph.in_edges(key)) |
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think NetworkX' in_edges
returns a view, so that would break? @MridulS can correct me?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it's the view of the dictionary underneath, so need to create a new list object to remove the edges.
""" | ||
# This is a questionable approach: Using MyGeneric[T] as a key will actually | ||
# not pass mypy [valid-type] checks. What we do on our side is ok, but the | ||
# calling code is not. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you elaborate why this fails in mypy?
Does this mean that every insertion of a param needs to have a type: ignore[valid-type]
comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apparently MyGeneric[T]
is not a valid type. MyGeneric[MyType]
is ok. So no, not every insertion needs a type: ignore[valid-type]
, just the less common ones using a generic (not specialized).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be better to annotate __setitem__
with something more lenient? Maybe even Any
? I don't like having legitimate usages flagged as errors. That is, I would prefer false negatives over false positives.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should chat about that, as I have struggled quite a bit coming up with something sensible that is not just Any
everywhere.
|
||
def __setitem__(self, key: Key, value: DataGraph | Any) -> None: | ||
""" | ||
Provide a concrete value for a type. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or a data graph.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a bit of text explaining that type vars must have constraints?
"Then we can compute `Result` for each index in the parameter table:" | ||
"Note how we used a node name of the pipeline as the column name in the parameter table.\n", | ||
"For convenience we used a `pandas.DataFrame` to represent the table above, but the use of Pandas is entirely optional.\n", | ||
"Equivalently the table could be represented as a `dict`, where each key corresponds to a column header and each value is a list of values for that column.\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add an example of this?
Co-authored-by: Jan-Lukas Wynen <[email protected]>
Co-authored-by: Jan-Lukas Wynen <[email protected]>
Are there more comments here that cannot be done in a follow-up? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am fine with merging. But please make sure to add docstrings where missing in follow up work.
Overview
This is a rewrite of
Pipeline
, as discussed in many places. Please see therewrite.ipynb
notebook for some context.List of changes
TypeVar
s need to be constrained. The new mechanism eagerly inserts duplicate nodes based on this.insert
should replace existing providers or not, and may even make it impossible to replace specialized providers with a generic one. The new implementation is very simple, so that is what we do.map
andreduce
based on Cyclebane. Not thatgroupby
is not supported yet, but we have never used this in practice.__setitem__
and__getitem__
can be used to compose more complex graphs, sidestepping the dependency-injection-driven assembly of the base pipeline.