Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for parameter tables #33

Merged
merged 87 commits into from
Aug 11, 2023
Merged

Add support for parameter tables #33

merged 87 commits into from
Aug 11, 2023

Conversation

SimonHeybrock
Copy link
Member

@SimonHeybrock SimonHeybrock commented Aug 2, 2023

Fixes #32.

Support parameter tables, enabling "map"-, "reduce"-, and "groupby"-like graph building. Please see the added documentation page in the user guide for details.

Internal change: The graph now does not use keyword arguments any more. This did not work with Dask anyway, and just got in the way when adding the new functionality.

Copy link
Member

@jl-wynen jl-wynen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incomplete review

# Copyright (c) 2023 Scipp contributors (https://github.com/scipp)
from __future__ import annotations

from typing import Any, Collection, Dict, Mapping, Optional
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typing.Collection is deprecated in favour of collections.abc.Collection in python 3.9

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My problem is that `typing.abc.* is not generic in 3.8.

@@ -39,8 +55,12 @@ class AmbiguousProvider(Exception):
"""Raised when multiple providers are found for a type."""


Provider = Callable[..., Any]
Key = type
def _indexed_key(index_name: Any, i: int, value_name: Type[T] | Item[T]) -> Item[T]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought you want to support python 3.8? This use of the pipe was added in 3.10

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, odd, why is it passing builds then?



def _find_nodes_in_paths(
graph: Mapping[T, Tuple[Callable[..., Any], Collection[T]]], start: T, end: T
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function and _find_all_paths use different graph types. Can you give the arguments a clearer name to distinguish? The type hint alone doesn't really say what kind of graph we are dealing with.

Also, why are there different graph types in the first place?

Copy link
Member Author

@SimonHeybrock SimonHeybrock Aug 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seemed easier to handle in _find_all_paths, without having the unused callable in there. Renaming graph->dependencies there.

src/sciline/pipeline.py Outdated Show resolved Hide resolved
src/sciline/pipeline.py Show resolved Hide resolved
@@ -96,6 +241,8 @@ def __init__(
"""
self._providers: Dict[Key, Provider] = {}
self._subproviders: Dict[type, Dict[Tuple[Key | TypeVar, ...], Provider]] = {}
self._param_tables: Dict[Key, ParamTable] = {}
self._param_series: Dict[Key, Key] = {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confusing name. This maps from parameter names to dimensions., not so series objects.

src/sciline/pipeline.py Outdated Show resolved Hide resolved
if arg not in graph:
stack.append(arg)
return graph

def _build_series(self, tp: Type[Series[KeyType, ValueType]]) -> Graph:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quite a long and deeply nested function. Can you split it up, e.g., along the steps?

Also, I don't understand it. We may need to go through it together. But refactoring may already help.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is the part that I wanted to do the walk-through for.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I completely rewrote this, moving the functionality mostly into the "grouper" classes. I think it is more readable now, please have a look @jl-wynen .

src/sciline/pipeline.py Outdated Show resolved Hide resolved
graph = {"D": ["B", "C"], "C": ["A"], "B": ["A"]}
assert _find_all_paths(graph, "D", "A") == [["D", "B", "A"], ["D", "C", "A"]]
assert _find_all_paths(graph, "B", "C") == []
assert _find_all_paths(graph, "B", "A") == [["B", "A"]]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing cases for an empty graph.
But this tests an implementation detail. Can we do without it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably, I think I just added it during dev.

return self._index

def __contains__(self, key: Any) -> bool:
return self._columns.__contains__(key)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you use the dunder methods of self._columns instead of the builtin functions which call them?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No reason.

src/sciline/visualize.py Outdated Show resolved Hide resolved
src/sciline/visualize.py Show resolved Hide resolved
tests/pipeline_with_index_test.py Outdated Show resolved Hide resolved
assert pl.compute(Sum) == "['a1', 'a2', 'ccc3']"


def test_diamond_dependency_pulls_values_from_columns_in_same_param_table() -> None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not look like a diamond dependency to me. It looks like a Y: Sum depends on float which depends on Param1 and Param2. But those depend on nothing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant: they both depend on the same param table and thus index.

src/sciline/pipeline.py Outdated Show resolved Hide resolved
Co-authored-by: Jan-Lukas Wynen <[email protected]>
@SimonHeybrock SimonHeybrock merged commit a4cec89 into main Aug 11, 2023
5 checks passed
@SimonHeybrock SimonHeybrock deleted the varargs branch August 11, 2023 09:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Variadic workflows
2 participants