-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Best practices when writing a kiara module #24
Comments
use imports only within the
|
some useful insights are available via the review of the following PR: DHARPA-Project/kiara_plugin.topic_modelling#4 I am trying to recap some of the points encountered via this first review (more best practices items might come over time and as more modules and reviews take place), please correct me if needed @makkus
|
It's always good to do that, yes. Technically, kiara will check that the data type is correct (and the value is set if it is marked 'required'), but there are a lot of cases where the generic validation that kiara can perform are not sufficient. So, in short: always make sure that you are getting valid input for your specific task at hand. |
thanks, |
Well, in my opinion code in kiara plugins should be type-hinted, linted, etc. Since it's re-useable code, code quality requirements should be higher than what makes sense for Jupyter notebooks for example. But that's just my opinion. For now, I've enabled both ruff and mypy checks by default whenever code is pushed to the 'develop' & 'main' branches on Github. I can disable them if that is the consensus. Pydantic is a dependency, and it's using Python type hints at runtime to do some validation/schema-related stuff. That's different to what we are talking about here, mypy, which does run checks on the code and makes sure that type-hinted Python code is not doing anything it is not allowed to do. In my experience, it's a fairly good failsafe against writing stupid code, and it helped make my code better for sure. I can't really give an introduction to Python type hints and using mypy here, but there are really good resources out there, like: Also, ChatGPT can help you with examples, errors, etc. In the context of kiara plugins, everything should be setup, so as long as you have the If I remember right, I set the mypy config to 'strict', so yes, it will fail if it can't determine the variable type automatically and no type-hint is provided. In a lot of cases it can be fairly smart about it though, and figure it out itself. |
Thanks, Concerning the consensus, hints were discussed in #10. I had understood that hints were not to be made into a best practice item, but from more recent discussions on Slack, it seemed that it changed, this is also why I am asking here. So, to recap: type hints are now recommended as a best practice and checks happen via mypy. |
As I said, I can disable it if people think that is the right way to go about it. Should I? |
As for the previous discussion, I think we only talked about type hints in the context of code examples in documentation, and I agree its a good idea to leave them out there. Having them in re-usable plugin Python code is different though, and would need different considerations IMHO. |
Ah yes indeed, perfect then, all is clear. |
After doing some code reviews, I realized there is no information about how kiara does input validation, and what type of things a module developer should validate themselves. So I'll try to summarize the situation here a bit, feel free to add your own experiences or tell me if something is unclear/doesn't make sense. Basically, the first step that happens is the module developer creating the inputs schema. They specify the name of the input field, the type of each input, the documentation for the field (that is not important in this context), whether it has a default value if the user doesn't provide anything, and whether the field is 'optional'. Those last two interact in a way (assuming the user didn't specify an input):
The second important thing to be aware of is that the
You can get the wrapped data directly by either doing:
Alternatively, as I said, you can use the
The type of
Just be aware that the value is not the same as the value data. All this has (hopefully fairly straightforward) implications on what the module developer needs to test. I'm going to split up the cases depending on what is specified in the schema for a field (in all cases, if there is a value, the developer can assume it is the correct type, so no need to check that): field is not optional, default is set or notThe developer can assume that the field is set, otherwise kiara would have already thrown an exception. field is optional, default is setThe developer can assume that the field is set, because if no user input happened, then kiara would have just used the default. field is optional, no defaultHere, the developer needs to do the check themselves, something like:
(what exactly happens depends on the module of course, this is just to demonstrate) Sidenote: if
Because we already established that the There is a second validation step after this basic one where kiara can take over some of the work: validating that the value (which we assume is set from here on out) is in a form that makes sense for the module. For example if you expect a table to have a specific column, you need to check that it is available, and raise an Exception if it is not, ideally with a helpful error message so the user knows what exactly is expected. Kiara can't test this, because it doesn't know what it should test for. There is a possibility to specify the data type in more detail so that kiara can take over some of that validation, so keep that in mind and let me know if you ever think this would make your code significantly more streamlined. For now, I don't want to overcomplicate things, doing it like this is in most cases sufficient. |
Thanks for the additional info about data types. I had indeed misinterpreted the previous comment when it was said:
Concerning type hints, do we need to specify them when we know them from the module's input specification? |
In that case you only know the 'kiara data type', not the Python one. And type hints only relate to Python ones. This is where the critical thing happens:
(obviously this would apply to every other data type in the same way) If you look up the
From now on the type checker assumes it's an instance of
Here, the type checker knows for sure that |
Thanks, |
Sorry for the misunderstanding, it's specifically the Python class |
Basically, the You can see the available data-types with |
Ok, I see,
|
Yes, it's a two step process, but it's the other way around to what you wrote. Maybe think of it from the perspective of the type checker. That's a 'pure Python' thing, with no knowledge of kiara, so it has no idea that the The only way the type checker can know what type a variable is supposed to be is either by:
The latter is when you as a user doesn't have to worry, because whoever wrote the method you are using did the work for you (by specifying the output type). Some libraries don't have type hints, in which case you have to do the work manually still. Or the result type is of type
Long story short, because the
(for all practical purposes, this is equivalent to what you had) If you didn't specify the |
Alright, yes, thanks for clarifying, type hinting is optional in the second line of the code example and not the first since the type is clear from |
(I thought that it was the other way around, assuming the input documentation specified the type and thought this made it unnecessary for type hinting / redundant with type indicated in input doc) |
Yeah, fair enough, don't blame you to get confused, lots of types of different types floating around :) |
Since there is no other place for this yet, I'll use this issue to add all the best practices (and other tips and tricks) I can think of for developing kiara modules here. This can then be either massaged in its own doc page, or maybe a FAQ-type page.
The text was updated successfully, but these errors were encountered: