Merge remote-tracking branch 'origin/main' into v2-prototype

scipp · May 8, 2024 · bcee03a
2 parents 52c1646 + 03b0a16
Showing 10 changed files with 212 additions and 13 deletions.
diff --git a/docs/developer/adr/0001-remove-isinstance-checks-when-setting-parameters.md b/docs/developer/adr/0001-remove-isinstance-checks-when-setting-parameters.md
@@ -0,0 +1,70 @@
+# ADR 0001: Remove isinstance checks when setting parameters
+
+- Status: accepted
+- Deciders: Jan-Lukas, Neil, Simon
+- Date: 2024-04-15
+
+## Context
+
+Sciline builds a data dependency graph based on type hints of callables.
+Dependencies can be fulfilled by setting values (instances of classes) as so-called *parameters*.
+In an attempt to extend the correctness guarantees of the dependency graph, Sciline's `__setitem__` checks if the value is an instance of the key (a type) when setting a parameter.
+
+This has led to a number of problems.
+For example, supporting different file handle types is too difficult [#140](https://github.com/scipp/sciline/issues/140),
+parameter type handling is too inflexible in general [#144](https://github.com/scipp/sciline/issues/144),
+and the mechanism is broken with Python 3.12 type aliases [#145](https://github.com/scipp/sciline/issues/145).
+In short, the mechanism gets in the way of the user, since it causes false positives.
+
+Considering the bigger picture, we can think of this mechanism as a poor man's form of *validation*.
+Validation of input parameters is very important when running workflows, but it should be done in a more explicit way.
+Validating the type is only a fraction of what we want to do when validating parameters.
+Therefore, we should remove this mechanism and replace it with a more general validation mechanism.
+The more general validation mechanism can be considered out of scope for Sciline, and should be implemented in the user code or using other common libraries such as `pydantic`.
+
+Finally, we can think of this mechanism as a form of runtime type checking.
+We should ask ourselves if this is the intended scope of Sciline.
+If it is, shouldn't we also check that each provider actually returns the correct type?
+
+The main problem with not checking value types when setting parameters is that it is not possible to catch such errors with `mypy`, in contrast to return values of providers, which `mypy` *can* check.
+
+Consider the following example of setting $Q$ bins for a workflow, given by a `scipp.Variable`, which would then be passed to `scipp.hist` to create a histogram:
+
+```python
+pipeline[QBins] = sc.linspace(...)
+pipeline[QBins] = 1000  # error in current implementation
+pipeline[QBins] = sc.linspace(..., unit='m')  # no error, but wrong unit
+```
+
+Checking the type catches the first error, but not the second.
+Paradoxically, setting an integer would often be a valid operation in the example, since `scipp.hist` can handle this case, whereas the wrong unit would not be valid.
+This may indicate that defining `QBins` as an alias of `scipp.Variable` is actually an instance of an anti-pattern.
+Instead, imagine we have defined a specific `class QBins`, which performs validation in its constructor, and defines `__call__` so it can be used as a provider:
+
+```python
+pipeline.insert(QBins(sc.linspace(...)))
+pipeline.insert(QBins(1000))  # ok
+pipeline.insert(QBins(sc.linspace(..., unit='m')))  # error constructing QBins
+```
+
+This example illustrates that a clearer and more specific expression of intent can avoid the need for relying on checking the type of the value when setting a parameter.
+
+## Decision
+
+- The core scope of Sciline is the definition of task graphs.
+  Type validation is not.
+- Remove the mechanism that checks if a value is an instance of the key when setting it as a parameter.
+- Encourage users to validate inputs in providers, which can also be tested in unit tests without setting up the full workflow.
+- Encourage users to use a more general parameter validation mechanism using other libraries.
+- Consider adding a mechanism to inject a callable to use for parameter validation as an argument when creating a `Pipeline`.
+
+## Consequences
+
+### Positive
+
+- The mechanism will no longer get in the way of the user.
+- The code will be simplified slightly.
+
+### Negative
+
+- `sciline.Pipeline` will support duck-typing for parameters, in a way that cannot be checked with `mypy`.
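
Note on ADR 0001: the "specific `class QBins`" idea from the Context section can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from this commit. `Edges` is a hypothetical stand-in for a `scipp.Variable` with a unit, `EXPECTED_UNIT` is an assumed value, and having `__call__` return `self` is one possible reading of "defines `__call__` so it can be used as a provider".

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Edges:
    """Hypothetical stand-in for a scipp.Variable: bin edges plus a unit."""

    values: Tuple[float, ...]
    unit: str


class QBins:
    """Hypothetical parameter class that validates its input on construction."""

    EXPECTED_UNIT = "1/angstrom"  # assumed unit for Q, for illustration only

    def __init__(self, bins: Union[int, Edges]) -> None:
        if isinstance(bins, int) and not isinstance(bins, bool):
            # A plain bin count is accepted, mirroring the ADR's "ok" case.
            if bins <= 0:
                raise ValueError("number of bins must be positive")
        elif isinstance(bins, Edges):
            # Explicit edges must carry the expected unit.
            if bins.unit != self.EXPECTED_UNIT:
                raise ValueError(
                    f"expected unit {self.EXPECTED_UNIT!r}, got {bins.unit!r}"
                )
        else:
            raise TypeError(f"unsupported bins: {bins!r}")
        self.bins = bins

    def __call__(self) -> "QBins":
        # Zero-argument callable with a return annotation, so the instance
        # could act as a provider of itself (an assumption, see lead-in).
        return self


ok_count = QBins(1000)
ok_edges = QBins(Edges((0.0, 0.5, 1.0), "1/angstrom"))
try:
    QBins(Edges((0.0, 0.5, 1.0), "m"))
except ValueError as err:
    rejected = str(err)  # wrong unit is caught when constructing QBins
```

Because the validation lives in the constructor, it can be unit tested without setting up a pipeline, which matches the third bullet of the Decision section.
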
diff --git a/docs/developer/adr/0002-remove-special-handling-of-optional-and-union.md b/docs/developer/adr/0002-remove-special-handling-of-optional-and-union.md
@@ -0,0 +1,105 @@
+# ADR 0002: Remove special handling of Optional and Union
+
+- Status: accepted
+- Deciders: Jan-Lukas, Johannes, Mridul, Simon, Sunyoung
+- Date: 2024-04-15
+
+## Context
+
+### General
+
+Sciline builds a data dependency graph based on type hints of callables.
+Some callables may have optional inputs, which are commonly represented by `Optional[T]` in the type hint, for some type `T`.
+Therefore, in [#50](https://github.com/scipp/sciline/pull/50) we have added special handling for `Optional` and [#89](https://github.com/scipp/sciline/pull/89) extended this for `Union`.
+In the case of `Optional`, the way this works is that `sciline.Pipeline` prunes branches at the node where the optional input is used, if any ancestor node has unsatisfied dependencies.
+Instead, an implicit `None` provider is added.
+This has a series of problems, which we exemplify for the case of `Optional`.
+
+1. Default values (which are currently ignored by Sciline) are overridden by the implicit `None` provider.
+   In other words, Sciline assumes that the default value of the optional input is `None`.
+2. Entire branches are pruned, which can hide bugs.
+   If the users added providers for the optional input, they will not be used if any of them has unintentionally unsatisfied dependencies.
+3. The special mechanism prevents the (in principle very valid) use of any providers that return an `Optional` or `Union` type.
+4. Optional inputs cannot be set to `None` *explicitly*.
+
+In summary, the special handling of `Optional` and `Union` is too implicit and causes more problems than it solves.
+There are a couple more aspects to consider.
+
+### Readability of user code
+
+Handling `Optional` explicitly would make user code more readable.
+Consider the following example:
+
+```python
+pipeline[MyParam] = 1.2
+```
+
+In the current implementation this gives no indication to the user that `MyParam` is not a required input.
+Furthermore, if the line is removed, the user may not realize that `MyParam` is available as an optional input.
+With the proposed change, the user can make this explicit:
+
+```python
+pipeline[Optional[MyParam]] = 1.2
+```
+
+Above it is clear that `MyParam` is optional, and it can be set to `None` explicitly:
+
+```python
+pipeline[Optional[MyParam]] = None
+```
+
+### Code complexity and maintainability
+
+The special handling of `Optional` and `Union` is a significant source of complexity in the code, requiring a significant amount of unit testing.
+
+### Conceptual clarity
+
+The ongoing redesign of Sciline highlighted that the current implementation is conceptually flawed.
+It makes it tricky to represent the internals of `sciline.Pipeline` as a simple data dependency graph.
+The special handling of `Optional` and `Union` seems to require pervasive changes to the code, which is a sign that it is not a good fit for the design.
+
+### Counter arguments
+
+#### Multiple providers may depend on the same input, but not all optionally
+
+This seems like a special case that we have not seen in practice, and is likely not worth the complexity of the current implementation.
+
+#### Using a provider returning a non-optional output to fulfill an optional input
+
+This is a very valid use case, but it would be made impossible if we stop associating a node `T` with an optional input `Optional[T]`.
+There are a couple of possible workarounds:
+
+- Add an explicit `Optional` provider that wraps (or depends on) the non-optional provider.
+- Modify the graph structure (which we plan to support in the redesign of Sciline) using something like `pipeline[Optional[MyParam]] = pipeline[MyParam]`.
+
+#### Using a provider to return one of a union's types
+
+Same as above, for `Optional[T]`.
+
+#### Setting union parameters is unwieldy
+
+Given a provider `f(x: A | B | C) -> D: ...`, a user would need to set a value for the input of `f` like `pipeline[A | B | C] = ...`.
+It would be easier if they could be more specific, like `pipeline[A] = ...`.
+
+In this case, we think defining an alias for `A | B | C` would be a better solution than the current special handling of `Union`.
+It would force the user to be more explicit about the input type, which is a good thing.
+Conceptually the use of `Union` may just be an indicator that `f` depends on some common aspect of `A`, `B`, and `C`, which could be made explicit by defining a new type or protocol.
+
+## Decision
+
+Remove the special handling of `Optional` and `Union`.
+
+## Consequences
+
+### Positive
+
+- Sciline's code will be simplified significantly.
+- User code will be more readable.
+- Implicit behavior around pruning and using `None` providers will be removed.
+- Users can use providers that return `Optional` or `Union` types.
+- Decouples the handling of optional inputs from the handling of default values.
+  This will enable us to make independent decisions about how to handle default values.
+
+### Negative
+
+- Workarounds are needed for the use case of using a provider returning a non-optional output to fulfill an optional input, and for setting union parameters.
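
Note on ADR 0002: the first workaround listed under "Using a provider returning a non-optional output to fulfill an optional input" might look like the sketch below. It uses only the standard library and does not touch Sciline. `my_param`, `optional_my_param`, and the `providers` dict are hypothetical names that stand in for a graph keyed by return type hints.

```python
from typing import NewType, Optional, get_type_hints

MyParam = NewType("MyParam", float)


def my_param() -> MyParam:
    """Hypothetical provider of the non-optional value."""
    return MyParam(1.2)


def optional_my_param(value: MyParam) -> Optional[MyParam]:
    """Explicit wrapper: depends on MyParam and provides Optional[MyParam]."""
    return value


# Without special handling, the two return annotations are distinct keys.
produced = get_type_hints(my_param)["return"]
wrapped = get_type_hints(optional_my_param)["return"]

# Toy stand-in for a graph that maps each key to the provider producing it.
providers = {produced: my_param, wrapped: optional_my_param}
```

A consumer annotated with `Optional[MyParam]` would then be served by the wrapper, while a consumer annotated with `MyParam` keeps using the original provider.
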
diff --git a/docs/developer/architecture-decision-records.md b/docs/developer/architecture-decision-records.md
@@ -0,0 +1,10 @@
+# Architecture Decision Records
+
+```{toctree}
+---
+maxdepth: 1
+glob: true
+---
+
+adr/*
+```
diff --git a/docs/developer/index.md b/docs/developer/index.md
@@ -14,4 +14,5 @@ getting-started
 coding-conventions
 dependency-management
 architecture-and-design/index
+architecture-decision-records
 ```
diff --git a/docs/index.md b/docs/index.md
@@ -116,7 +116,7 @@ The [User Guide](user-guide/index) also contains a description of more advanced
 At first it may feel unclear how to apply Sciline to your own code and workflows, given only the above documentation.
 Consider the following concrete examples of how we use Sciline in our own projects:
 
- - [Small Angle Neutron Scattering](https://scipp.github.io/esssans/examples/sans2d.html)
+ - [Small Angle Neutron Scattering](https://scipp.github.io/esssans/user-guide/isis/sans2d.html)
  - [Neutron Powder Diffraction](https://scipp.github.io/essdiffraction/examples/POWGEN_data_reduction.html)
  - [Neutron Reflectometry](https://scipp.github.io/essreflectometry/examples/amor.html)
 

diff --git a/src/sciline/scheduler.py b/src/sciline/scheduler.py
@@ -1,7 +1,16 @@
 # SPDX-License-Identifier: BSD-3-Clause
 # Copyright (c) 2023 Scipp contributors (https://github.com/scipp)
 import inspect
-from typing import Any, Callable, Dict, Hashable, Optional, Protocol, Tuple
+from typing import (
+    Any,
+    Callable,
+    Dict,
+    Hashable,
+    Optional,
+    Protocol,
+    Tuple,
+    runtime_checkable,
+)
 
 from sciline.typing import Graph
 
@@ -10,6 +19,7 @@ class CycleError(Exception):
     pass
 
 
+@runtime_checkable
 class Scheduler(Protocol):
     """
     Scheduler interface compatible with :py:class:`sciline.Pipeline`.

diff --git a/src/sciline/task_graph.py b/src/sciline/task_graph.py
@@ -80,6 +80,10 @@ def __init__(
                 scheduler = DaskScheduler()
             except ImportError:
                 scheduler = NaiveScheduler()
+        elif not isinstance(scheduler, Scheduler):
+            raise ValueError(
+                "Scheduler interface must be compatible with sciline.Scheduler"
+            )
         self._scheduler = scheduler
 
     def compute(self, targets: Targets | None = None) -> Any:
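
Note on the `isinstance(scheduler, Scheduler)` check: for a `@runtime_checkable` protocol, `isinstance` only verifies that the required members exist and does not compare signatures. The sketch below illustrates this with a hypothetical `SchedulerLike` protocol. The `get(graph, keys)` shape is an assumption made for the example and is not copied from `sciline.scheduler`.

```python
from typing import Any, Dict, Hashable, List, Protocol, runtime_checkable


@runtime_checkable
class SchedulerLike(Protocol):
    """Hypothetical protocol, assumed to require a single `get` method."""

    def get(self, graph: Dict[Hashable, Any], keys: List[Hashable]) -> Any: ...


class DictScheduler:
    """Matches the protocol structurally, without inheriting from it."""

    def get(self, graph, keys):
        return tuple(graph[key] for key in keys)


class WrongSignature:
    """Has a `get` attribute, but with an incompatible signature."""

    def get(self):
        return None


def check_scheduler(scheduler: object) -> object:
    # Mirrors the shape of the new guard: reject objects lacking `get`.
    if not isinstance(scheduler, SchedulerLike):
        raise ValueError("Scheduler interface must be compatible with SchedulerLike")
    return scheduler


accepted = check_scheduler(DictScheduler())
also_accepted = check_scheduler(WrongSignature())  # passes: only names are checked
try:
    check_scheduler("not a scheduler")  # str has no `get` attribute
except ValueError as err:
    message = str(err)
```

The guard therefore catches obviously wrong arguments such as a string, while signature mismatches still surface only when the scheduler is called.
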

diff --git a/tests/serialize/json_test.py b/tests/serialize/json_test.py
@@ -2,7 +2,6 @@
 # Copyright (c) 2023 Scipp contributors (https://github.com/scipp)
 # type: ignore
 
-import sys
 from copy import deepcopy
 from typing import Any, NewType, TypeVar
 
@@ -169,7 +168,6 @@ def node_sort_key(node: dict[str, Any]) -> str:
 }
 
 
-@pytest.mark.skipif(sys.version_info < (3, 10), reason="requires python3.10 or higher")
 def test_serialize() -> None:
     pl = sl.Pipeline([make_int_b, zeros, to_string], params={Int[A]: 3})
     graph = pl.get(str)
@@ -217,7 +215,6 @@ def fn_w_kwonlyargs(*, x: int) -> float:
 }
 
 
-@pytest.mark.skipif(sys.version_info < (3, 10), reason="requires python3.10 or higher")
 def test_serialize_kwonlyargs() -> None:
     pl = sl.Pipeline([fn_w_kwonlyargs], params={int: 3})
     graph = pl.get(float)
@@ -261,7 +258,6 @@ def test_serialize_kwonlyargs() -> None:
 }
 
 
-@pytest.mark.skipif(sys.version_info < (3, 10), reason="requires python3.10 or higher")
 def test_serialize_lambda() -> None:
     lam = lambda x: float(x)  # noqa: E731
     lam.__annotations__['x'] = int

diff --git a/tests/task_graph_test.py b/tests/task_graph_test.py
@@ -81,3 +81,13 @@ def test_keys_iter() -> None:
     tg = pl.get(list[str])
     assert len(list(tg.keys())) == 4  # there are no duplicates
     assert set(tg.keys()) == {A, B, Str[B], list[str]}
+
+
+def test_scheduler_not_supported() -> None:
+    with pytest.raises(
+        ValueError,
+        match="Scheduler interface must be compatible with sciline.Scheduler",
+    ):
+        TaskGraph(
+            graph={}, targets=(), scheduler="not a scheduler"  # type: ignore[arg-type]
+        )
diff --git a/tests/utils_test.py b/tests/utils_test.py
@@ -57,7 +57,6 @@ def test_key_name_builtin_generic() -> None:
     assert _utils.key_name(dict[str, MyType]) == 'dict[str, MyType]'
 
 
-@pytest.mark.skipif(sys.version_info < (3, 10), reason="requires python3.10 or higher")
 def test_key_name_custom_generic() -> None:
     MyType = NewType('MyType', float)
     Var = TypeVar('Var')
@@ -70,7 +69,6 @@ class G(sciline.Scope[Var, str], str):
     assert _utils.key_name(G[MyType]) == 'G[MyType]'
 
 
-@pytest.mark.skipif(sys.version_info < (3, 10), reason="requires python3.10 or higher")
 def test_key_name_custom_generic_two_params() -> None:
     MyType = NewType('MyType', float)
     Var1 = TypeVar('Var1')
@@ -89,9 +87,6 @@ def test_key_full_qualname_builtin() -> None:
     assert _utils.key_full_qualname(object) == 'builtins.object'
 
 
-# NewType returns a class since python 3.10,
-# before that, we cannot get a proper name for it.
-@pytest.mark.skipif(sys.version_info < (3, 10), reason="requires python3.10 or higher")
 def test_key_full_qualname_new_type() -> None:
     # The __qualname__ of NewTypes is the same as __name__, the result is therefore
     # missing test_key_full_qualname_new_type.<locals>
@@ -142,7 +137,6 @@ def test_key_full_qualname_builtin_generic() -> None:
     )
 
 
-@pytest.mark.skipif(sys.version_info < (3, 10), reason="requires python3.10 or higher")
 def test_key_full_qualname_custom_generic() -> None:
     MyType = NewType('MyType', float)
     Var = TypeVar('Var')
@@ -165,7 +159,6 @@ class G(sciline.Scope[Var, str], str):
     )
 
 
-@pytest.mark.skipif(sys.version_info < (3, 10), reason="requires python3.10 or higher")
 def test_key_full_qualname_custom_generic_two_params() -> None:
     MyType = NewType('MyType', float)
     Var1 = TypeVar('Var1')