-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for parameter tables #33
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Incomplete review
# Copyright (c) 2023 Scipp contributors (https://github.com/scipp) | ||
from __future__ import annotations | ||
|
||
from typing import Any, Collection, Dict, Mapping, Optional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typing.Collection
is deprecated in favour of collections.abc.Collection
in python 3.9
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My problem is that `typing.abc.* is not generic in 3.8.
src/sciline/pipeline.py
Outdated
@@ -39,8 +55,12 @@ class AmbiguousProvider(Exception): | |||
"""Raised when multiple providers are found for a type.""" | |||
|
|||
|
|||
Provider = Callable[..., Any] | |||
Key = type | |||
def _indexed_key(index_name: Any, i: int, value_name: Type[T] | Item[T]) -> Item[T]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought you want to support python 3.8? This use of the pipe was added in 3.10
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, odd, why is it passing builds then?
src/sciline/pipeline.py
Outdated
|
||
|
||
def _find_nodes_in_paths( | ||
graph: Mapping[T, Tuple[Callable[..., Any], Collection[T]]], start: T, end: T |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function and _find_all_paths
use different graph types. Can you give the arguments a clearer name to distinguish? The type hint alone doesn't really say what kind of graph we are dealing with.
Also, why are there different graph types in the first place?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seemed easier to handle in _find_all_paths
, without having the unused callable in there. Renaming graph->dependencies
there.
src/sciline/pipeline.py
Outdated
@@ -96,6 +241,8 @@ def __init__( | |||
""" | |||
self._providers: Dict[Key, Provider] = {} | |||
self._subproviders: Dict[type, Dict[Tuple[Key | TypeVar, ...], Provider]] = {} | |||
self._param_tables: Dict[Key, ParamTable] = {} | |||
self._param_series: Dict[Key, Key] = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Confusing name. This maps from parameter names to dimensions., not so series objects.
if arg not in graph: | ||
stack.append(arg) | ||
return graph | ||
|
||
def _build_series(self, tp: Type[Series[KeyType, ValueType]]) -> Graph: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quite a long and deeply nested function. Can you split it up, e.g., along the steps?
Also, I don't understand it. We may need to go through it together. But refactoring may already help.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is the part that I wanted to do the walk-through for.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I completely rewrote this, moving the functionality mostly into the "grouper" classes. I think it is more readable now, please have a look @jl-wynen .
tests/graph_test.py
Outdated
graph = {"D": ["B", "C"], "C": ["A"], "B": ["A"]} | ||
assert _find_all_paths(graph, "D", "A") == [["D", "B", "A"], ["D", "C", "A"]] | ||
assert _find_all_paths(graph, "B", "C") == [] | ||
assert _find_all_paths(graph, "B", "A") == [["B", "A"]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing cases for an empty graph.
But this tests an implementation detail. Can we do without it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably, I think I just added it during dev.
return self._index | ||
|
||
def __contains__(self, key: Any) -> bool: | ||
return self._columns.__contains__(key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you use the dunder methods of self._columns
instead of the builtin functions which call them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No reason.
tests/pipeline_with_index_test.py
Outdated
assert pl.compute(Sum) == "['a1', 'a2', 'ccc3']" | ||
|
||
|
||
def test_diamond_dependency_pulls_values_from_columns_in_same_param_table() -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not look like a diamond dependency to me. It looks like a Y: Sum depends on float which depends on Param1 and Param2. But those depend on nothing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant: they both depend on the same param table and thus index.
Co-authored-by: Jan-Lukas Wynen <[email protected]>
Fixes #32.
Support parameter tables, enabling "map"-, "reduce"-, and "groupby"-like graph building. Please see the added documentation page in the user guide for details.
Internal change: The graph now does not use keyword arguments any more. This did not work with Dask anyway, and just got in the way when adding the new functionality.