kiara features #29
Introspection
kiara lets frontends introspect basically every aspect of its internal workings; if something is not exposed, there is a good chance it can be added. Objects that can be inspected include the current environment the backend is running in (Python env, OS, etc.), which plugins are installed and their details (documentation, which modules they contain, data-types, operations, pipelines, archives, ...), and the current config and runtime config. It also lets the frontend inspect which contexts are available (because the user created them some time before) and their details (like the archives each contains), which context is currently in use (and its details), and it lets users create and switch to new contexts. A context is basically just a workspace with its own data values and aliases. In most cases, kiara has two different methods to retrieve information about each of the internal objects: a quick one that contains only basic data, and an expensive one that contains basically everything kiara knows about the object. For details on exactly which internal objects can be inspected, read the |
Modules, Operations, Pipelines
Those are arguably the most important concepts within kiara, and introspection is available for all of them. Custom operation instances can be created via the API, a list of all pre-registered operations can be retrieved, and the details of each operation can be inspected. The same goes for pipelines and modules. There are also methods to retrieve a filtered list of operations, pipelines, and modules, which is useful for specific use-cases like only presenting operations to the user that work with a specific file-type. In addition, the API lets the user register pipeline structures, which in turn can then be used like any other operation. A pipeline config can be a file local to the backend, or a dictionary containing the pipeline config data directly. kiara also has so-called operation types, which are basically categories operations can be sorted into. One operation can be of multiple types. One such category would be 'pipeline', for example, indicating that the operation contains multiple sub-operations. Another use is for operations that should have the same, known interface regardless of their input data type, which could be useful in specific frontend use-cases. |
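To make the "filtered list of operations" idea concrete, here is a minimal Python sketch. It does not use kiara's real API: `ToyOperation`, `filter_operations`, and the operation ids below are all made up for illustration, and the real endpoints may look quite different.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyOperation:
    """Hypothetical stand-in for an operation; not a kiara class."""

    operation_id: str
    input_types: tuple
    operation_types: tuple = ()


def filter_operations(operations, input_type=None, operation_type=None):
    """Return the sorted ids of operations that match every given filter."""
    matches = []
    for op in operations:
        if input_type is not None and input_type not in op.input_types:
            continue
        if operation_type is not None and operation_type not in op.operation_types:
            continue
        matches.append(op.operation_id)
    return sorted(matches)


# Made-up operations, only here to exercise the filter.
OPERATIONS = [
    ToyOperation("import.table", ("file",), ("import",)),
    ToyOperation("table.query", ("table", "string")),
    ToyOperation("topic_model.run", ("table",), ("pipeline",)),
]

table_ops = filter_operations(OPERATIONS, input_type="table")
pipeline_ops = filter_operations(OPERATIONS, operation_type="pipeline")
```

A frontend could use a filter like this to show the user only the operations that accept the value they currently have selected.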
Data / Values
kiara has its own data types, which like everything else can be inspected. kiara data types wrap around Python data types, and provide functionality that is used by the introspection features, as well as for purposes like validation, serialization, and deserialization. In some cases they are also used to parse strings into the actual data, as strings are often the only way for users to input data directly (as in the case of the CLI, or yaml/json config files). kiara lets frontends 'register' data, which involves creating said wrapper around it after validating it against the data type it is supposed to represent, calculating its hash, and assigning it a globally unique value id. A kiara context keeps track of data using so-called 'data-archives' / 'data-stores', which are classes that implement the necessary management features. Currently kiara has two types of stores: sqlite and filesystem. Up until now 'filesystem' was the default, but that may/will change in the future (transparently to the user). Stores keep track of the values they contain, and are responsible for serializing and deserializing them. In most cases the details here are not important for end-users/front-end devs, as the API abstracts away the details of the stores. That might change in the future. kiara also has a feature that lets users assign meaningful (to them) aliases to values, similar to how filesystems work for operating systems (filenames -> inodes). This might or might not be useful for a specific frontend; it's certainly possible to create one without using this feature. The metadata that is contained in a kiara value includes its 'pedigree' (direct ancestor values & module type used to create it), 'lineage' (all ancestor values & module types used to create it), details about the environments it was created in (not fully implemented), its data type, pre-computed data-type specific properties, the hash for its serialized form, and a few other things I probably forgot. 
From the next version onwards kiara also supports exporting one or several values into a file that can be shared with other people, and used by them to import those values (incl. their metadata) into their own contexts. The API supports listing the ids of all values in a context, retrieving details about all of them, a filtered list of them (e.g. only values that are of a specific data type). Deleting of values is not supported atm, as it's surprisingly complex to implement and not a high priority for the use-cases I came across. |
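The registration step described above (wrap the data, hash it, assign a unique id, then list values by type) can be sketched roughly as follows. This is a toy model, not kiara's implementation: `ToyValue`, `ToyDataStore`, and their methods are hypothetical names, validation is reduced to a simple type check, and SHA-256 is an arbitrary choice of hash.

```python
import hashlib
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyValue:
    """Hypothetical immutable wrapper around a piece of data."""

    value_id: uuid.UUID
    data_type: str
    value_hash: str
    data: bytes


class ToyDataStore:
    """Toy in-memory store; real stores would persist to disk or sqlite."""

    def __init__(self):
        self._values = {}

    def register(self, data, data_type):
        # Stand-in for data-type specific validation.
        if not isinstance(data, bytes):
            raise TypeError("this toy store only accepts bytes")
        value = ToyValue(
            value_id=uuid.uuid4(),
            data_type=data_type,
            value_hash=hashlib.sha256(data).hexdigest(),
            data=data,
        )
        self._values[value.value_id] = value
        return value

    def list_value_ids(self, data_type=None):
        """List all value ids, optionally only those of one data type."""
        return [
            value_id
            for value_id, value in self._values.items()
            if data_type is None or value.data_type == data_type
        ]


store = ToyDataStore()
first = store.register(b"some,csv,content", "file")
second = store.register(b"some,csv,content", "file")
```

In this toy version every registration gets a fresh id even when the bytes are identical, while the hashes match; whether kiara behaves the same way for repeated registrations is not something this sketch claims.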
Job management
kiara contains a component that manages when and how the jobs the user wants to run are run. In most cases a frontend would use the 'queue_job' endpoint, which returns a reference (uuid) to the job that can be used to retrieve the status and eventually the result of the job. The job manager also has introspection features, like listing all jobs, and retrieving details about a specific job. There also exists a 'run_job' endpoint, which blocks (so probably not suitable for a UI frontend), and returns the result of the job directly. In addition, there are also 'queue_manifest' and 'run_manifest' endpoints, which are lower-level and not recommended for use by frontends. |
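The difference between the non-blocking and the blocking pattern can be illustrated with a small, self-contained sketch built on Python's standard `concurrent.futures`. Only the names 'queue_job' and 'run_job' come from the description above; the class `ToyJobManager`, the method signatures, the status strings, and the other method names are assumptions made for this example and are not kiara's actual API.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor


class ToyJobManager:
    """Toy job manager: queue work, get a uuid back, poll for the result."""

    def __init__(self, max_workers=2):
        self._executor = ThreadPoolExecutor(max_workers=max_workers)
        self._futures = {}

    def queue_job(self, func, **inputs):
        """Submit a job without blocking and return its job id."""
        job_id = uuid.uuid4()
        self._futures[job_id] = self._executor.submit(func, **inputs)
        return job_id

    def get_job_status(self, job_id):
        future = self._futures[job_id]
        if not future.done():
            return "pending_or_running"
        return "failed" if future.exception() is not None else "success"

    def get_job_result(self, job_id, timeout=None):
        """Block until the job has finished, then return its result."""
        return self._futures[job_id].result(timeout=timeout)

    def run_job(self, func, **inputs):
        """Blocking convenience wrapper: queue the job and wait for it."""
        return self.get_job_result(self.queue_job(func, **inputs))

    def list_job_ids(self):
        return list(self._futures)

    def shutdown(self):
        self._executor.shutdown(wait=True)


def add(a, b):
    return a + b


manager = ToyJobManager()
job_id = manager.queue_job(add, a=1, b=2)  # returns immediately
result = manager.get_job_result(job_id)    # blocks until done
direct = manager.run_job(add, a=20, b=22)  # blocks for the whole job
manager.shutdown()
```

A UI would typically call the queueing variant and poll the status from its event loop, so that the interface stays responsive while the job runs.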
Configuration
The basic aspects of a kiara backend can be configured. This is split up into base configuration (KiaraConfig) and runtime config (KiaraRuntimeConfig). Which configuration options are available can be looked up in the source code, or retrieved via introspection (get familiar with the cli and its In most cases, configuration should not be necessary as the defaults should be sufficient, but if you have special requirements, then check out the configuration model classes. |
Let me know if you think I forgot anything, that's entirely possible. |
Also, happy to expand on anything that is unclear. |
just a quick comment about that: I think this shifted since the early stages of the project, since 2 years ago we defined a user persona of "modules creators" (via community plugins) (there is a discussion about that here: #5) |
What I meant is that with just the backend, kiara is fairly useless. We want people to eventually do research with it. Just writing a module without anyone ever using it obviously doesn't make any sense, so module creators can't be our main persona. For doing research with kiara, we need to have at least one frontend, even if it's just a very thin one like using the backend via jupyter (and the requirements that this would put on the backend). |
And this means that whatever features are important, they need to be defined/arrived at via frontend requirements. Otherwise there is no justification for any of them to exist. |
Sure, this is not contradictory at all, but I think it is important not to forget about these module creators (who know the data analysis/statistics Python ecosystem and not the software engineering Python ecosystem, which are two very distinct areas of expertise, even if both fields are technical/computing related and Python is common to both). These users are important since they would bridge the gap for modules that do not exist yet and are needed, as the dharpa team won't be able to anticipate all needed modules. |
How do these somewhat abstract things map to what a user (of any kind) can actually do with kiara? Given a (future, imagined, all-powerful) UI or python expertise, what kind of tasks can I do, or what things can I achieve using kiara? My best guess from the above is as follows, but as you can tell from the amount of question marks, I really don't have a clear picture of what's currently possible
Does this list cover everything kiara can currently actually do? And does this cover all the end-user needs that have been identified in the various user research/surveys that have been done? |
Ok, again, the main point I'm trying to get across is that I never got 'real' requirements from someone responsible for the frontend experience. I had to 'translate' the end-user needs we collected into a backend design, without knowing about the layer that sits between the end-user and the backend API. I hope it's clear that this is not how things ideally work, but that was out of my control, and this is why the list of features I gave you is fairly generic/abstract. For some of the requirements (notes for example) I found it impossible to come up with an implementation without knowing more about how a frontend intends to use it. Does that make sense? I can answer all of the questions above, but before that I'd like to make sure we're on the same page about our basic premise:
If we can all agree on that premise in some way, that would be great. Then I'll go through all of the questions above and comment on them, assuming behind all of them is an actual use-case and reason why those things should be possible. And I'd suggest that we come up with descriptions / wireframes / or whatever details we have about our two (? not sure about the topic-modeling one) mini-apps. We know what data will go into them, and what users want to do with them (users in this case being Caitlin and Lorella/Mariella), and we use that to come up with a list of specs/requirements the backend needs to satisfy? As I said, it's fairly likely that some of the stuff kiara can do is not necessary at all, and would not be used at all by such a mini-app. Aliases, for example, we might not need at all. Sharing workflows/pipelines, I can't really see how that would be important for a mini-app, but if you can show me a usage flow for that, I'm more than happy to implement it or change an existing implementation. I guess the short of it is that I really need your help with all that, and that I don't have all the answers, and also that we should probably ask ourselves whether we should talk about all the possible features we could have or that exist, or limit our discussion to only what is needed for our specific next goal, the mini-apps. And those should have their own requirements inbuilt, totally independent from what exists atm, right? |
I feel like I spent a huge amount of time making this easy, writing tutorials on how to create modules targeted at that audience, creating the plugin template to make it easy to get started, investigating ways to make it easier to create a Python env (pixi), etc. To be honest, I'm running out of ideas (and time) to spend a lot more on this, and I was hoping that the docs sub-project could take off some of the load and others could jump in and clean up what already exists. If there is something you want specifically me to do, I'm also happy to do it, but as I said, my own ideas are sort of running out, and I could really need some help there...
Again, I feel like I've always had that in mind, and tried to make that as easy as possible. There is a discussion to be had about the quality of the modules we ship 'officially', but that would be independent of this. |
there was not a feature request at all in my previous message, this was just to acknowledge the existence of such back-end users, nothing more |
I do not understand what you mean by "not sure about the topic-modeling" one ? |
Concerning the topic modeling one, I think you saw the initial jupyter notebook, the wireframes, the list of inputs/outputs, the modules roadmap, and you participated in several of the functional previous prototype versions. Could you please explain what would be needed at this stage? |
Just that I'm not sure if we're planning to have a topic-modeling mini-app. |
A frontend dev who takes all that and designs/architects the frontend? Decides on the technical details, how it's implemented etc. As I said before, I can't do that. |
Ah ok, this is something to confirm with @caro401 indeed. If not, I could come up with a streamlit one and/or be available to help if anything is required. At the moment I am preparing the modules, and using this as an opportunity for doc material. |
Just weighing in here - we absolutely will still be having a topic modelling mini-app, and all the modules that are written for that plugin can be used for the app as well as all the existing use-cases (i.e. the CLI and Jupyter), so prepping the modules and doing the documentation is hugely valuable work, thank you Mariella!! The goal is ultimately to have a topic-modelling mini-app and a network analysis / Tropy one, which (in essence) can use the same UI framework, and these will be the plugins/modules that are currently under development. These will also be usable in Jupyter notebooks, just with obviously a little more flexibility in terms of potentially combining plugins / introducing new ones given that it will be outside of a UI framework. In terms of @caro401 's initial features question, aka at its most basic, what can kiara do for a researcher:
In all of this kiara acts as a 'wrapper' to the process, tracing and recording the metadata. Markus I understand this will probably be 'higher level' rather than technically accurate but as an overview of kiara from a very basic user point of view (removed from any consideration of API or frontend), this is correct yes? Essentially this is what Caro needs for the mini-app (I believe) so just confirmation or correction on this would be great. In terms then of building the mini-app(s) what is needed is really:
Some of this probably goes off this initial issue raised and also doesn't cover everything (like notes, for example, but we can put a pin in that until the end of February at least) and some of it requires a little further discussion, but for the main part this should set out some idea of what is needed versus what already exists and does or doesn't need further work on. |
Yes, exactly, for the purpose of the mini-app that would be a good set of initial requirements/features that kiara can fulfill. No need to complicate it further IMHO.
The API was specifically not built with Jupyter in mind, and that it works is purely coincidental (and hopefully also a bit because its design is good enough). Personally I'd probably have written another layer for that specific purpose, but since everyone seems happy with its current state I won't bother. So, just for the record: I don't think the requirements for using kiara via Python/Jupyter are remotely the same as the ones needed to develop a UI. But again, since everyone seems happy I'm not going to argue that point any further. If the API needs to be changed in reaction to new/unforeseen requirements coming from the mini-apps, this might be a problem because how to use it might change, which would have a knock-on effect on the Jupyter usage since that would have to be adjusted as well: docs would get outdated and need updating, and older Jupyter notebooks might not work anymore. I'll obviously put those breaking changes in the changelog, but not everyone reads those, so the experience can still be frustrating for end-users.
Not the onboarding ones, I have barely started working on them. |
Ah well, might as well go through the list since I can't be bothered to write any more code today. I want to make clear that for a lot of how things should work I don't have an answer myself, it's something I always assumed I'd have some help for figuring things out. Anyway, here goes:
Yes, you can import data. Basically by specifying inputs that refer to 'outside things' to an operation (like a file path/url). Simple data types like bools, integers, strings are imported directly, everything else is basically a byte-array. Since the important bits are the actual bytes (pun not intended), in most cases we are talking about files (either local ones, or remotely downloaded ones). How kiara stores those is abstracted away, and it depends on whether the user decides to 'store' a value or not. If not, depending on the file type the bytes might be stored in a temp file, or in memory. If the user decides to store the value, kiara stores it inside a kiara store (which can either be a folder structure, or a sqlite database -- as I said before this will be documented more in the issue I started about the data export). In addition, kiara also stores the metadata you should have seen by now in some way or other (if not, Aliases are just references: meaningful (to the user -- they are choosing them after all), human readable strings that point to a value id. Value ids are globally unique. Aliases can be overwritten, so the same alias (string) might point to a different value id depending on when you look. Those aliases are also stored inside a different store -- an alias store. As I said, it's not clear to me whether a frontend would need aliases or not. It's certainly possible to create a UI without those, but it really depends on how you plan to guide the user, how you partition the UI, how/whether you want to provide data management, previews, etc. It's there if you need it, but you can also just ignore it. Deleting aliases would be possible. It's not implemented, but I can do that quickly if there is a requirement. Just didn't need it so far. 
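The alias behaviour described above (a human-readable string pointing at a value id, which can later be re-pointed at a different id) can be pictured with the small sketch below. `ToyAliasStore` and its methods are hypothetical and only model the idea; the `history` method is an illustrative extra, not something the thread says kiara provides.

```python
import uuid


class ToyAliasStore:
    """Toy alias store: maps user-chosen strings to value ids."""

    def __init__(self):
        self._current = {}
        self._history = []

    def set_alias(self, alias, value_id):
        # Overwriting is allowed: the alias simply points somewhere new.
        self._current[alias] = value_id
        self._history.append((alias, value_id))

    def resolve(self, alias):
        """Return the value id the alias points to right now."""
        return self._current[alias]

    def history(self, alias):
        """Return every value id the alias has pointed to, oldest first."""
        return [value_id for name, value_id in self._history if name == alias]


aliases = ToyAliasStore()
old_id, new_id = uuid.uuid4(), uuid.uuid4()

aliases.set_alias("my_corpus", old_id)
before = aliases.resolve("my_corpus")

aliases.set_alias("my_corpus", new_id)  # same alias, different value
after = aliases.resolve("my_corpus")
```

The value ids themselves never change; only the mapping from alias to id does, which is why anything that needs a stable reference should hold on to the value id rather than the alias.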
Deleting values that have been 'stored' into a store is different, it's a fairly complex problem that I didn't have time to tackle yet, and again I'd probably like to know more about how frontends deal with data and present it to users before I finally get to it. It's difficult because I'm not sure what to do with other values that are before or behind the value that should be deleted in the lineage graph. Again, so far I didn't have a usage situation where this was necessary, and with the upcoming export feature it might become even less important. But again, not sure, it would really depend.
No, as I've tried to stress before, data within kiara can never be changed. You can only ever create new values, using old ones (as inputs) plus an operation (a 'configured module'). kiara tries to be smart about how it actually stores the bytes, to avoid storing the same byte-sequences over and over again. This works sometimes better, and sometimes worse, but that's nothing a non-backend dev would need to worry about (yet, anyway, if our storage needs get out of hand we might have to look into it). Again, aliases are tangential, and not necessary for any of this to work, they are just human readable references that point to a value id (at that point in time -- since the aliases could point to something else in the future). As I said above, if values are stored, kiara also stores the metadata of that value, which includes something I call 'pedigree' (basically the direct parent op & input values). This is then used to construct the lineage internally, basically gluing together all the pedigrees until a value doesn't have an op that produced it, but was imported directly.
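How a lineage can be assembled by "gluing together" pedigrees can be sketched as a simple graph walk. This is a toy model under stated assumptions, not kiara's code: `ToyPedigree`, `build_lineage`, the string value ids, and the module names are invented for the example, and an imported value is represented by a pedigree of `None`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyPedigree:
    """Hypothetical record of the direct parents of a value."""

    module_type: str
    inputs: tuple  # value ids of the direct input values


def build_lineage(value_id, pedigrees):
    """Collect the pedigrees of a value and of all its ancestors.

    `pedigrees` maps value id -> ToyPedigree, or None for imported values.
    """
    lineage = {}
    to_visit = [value_id]
    while to_visit:
        current = to_visit.pop()
        if current in lineage:
            continue  # already handled, e.g. an input shared by two ops
        pedigree = pedigrees.get(current)
        lineage[current] = pedigree
        if pedigree is not None:
            to_visit.extend(pedigree.inputs)
    return lineage


# Made-up example: a file is imported, turned into a table, tokenized,
# and finally fed into a topic model together with an imported integer.
PEDIGREES = {
    "raw_file": None,
    "num_topics": None,
    "table": ToyPedigree("create.table", ("raw_file",)),
    "tokens": ToyPedigree("tokenize", ("table",)),
    "topics": ToyPedigree("topic_model", ("tokens", "num_topics")),
}

topics_lineage = build_lineage("topics", PEDIGREES)
table_lineage = build_lineage("table", PEDIGREES)
```

The walk stops at values whose pedigree is `None`, which corresponds to values that were imported directly rather than produced by an operation.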
Again, refer to the issue I started a few days ago, once I have a release I'll update this. No point duplicating the info, and it's not 100% fleshed out anyway. But since you can't change data in kiara this might not be relevant anyway?
Yes.
Well, a pipeline is just a yaml/json file, so yes, you can share that yaml file. Not sure why we would need that at this point, given our focus on jupyter and the mini-apps, but it's easy to do. The tutorial I've written about pipelines should give a good base for more detailed questions if necessary. Not sure how you define workflow. We've used that word on different occasions, and I've been guilty of using it to mean different things over time as my own picture of it evolved and I played around with a few ideas briefly. But there is no 'official' workflow feature atm, as far as I am aware. Again, this would probably be shaped by requirements coming from a frontend design decision. Sometimes I refer to a 'workflow (or kiara-) session', by which I mean the time between when a kiara context is spun up and when the process it lives in ends. But that's just for technical stuff.
Yes, sorta. That's just one of those accidental features I needed for dev work (separating test data, etc.). I can flesh it out with little additional work if there is a requirement, but this is not something that came from user stories or something like that. Ignore it if you don't need it.
Again, this is something I need for dev work, and can be fleshed out if we come across something we want users to be able to change. Unless you have any ideas what that would be, you can ignore it and just keep in the back of your head that there is the option to implement something in that area. There are no user-stories related to that as far as I'm aware.
Yes, and again, whether it's useful or not totally depends on how you decide to implement your frontend. Querying configuration will probably never be useful for a mini-app, but why it's useful to list all operations that it can execute should be fairly obvious, I guess. But really really depends, can't stress that enough. For the streamlit prototype I tried to implement the frontend in a way that it doesn't take anything for granted (which modules/operations are available, which data types, how to render the data types, ...), and does things by investigating the internal kiara objects via the API. If you want to hard-code all the modules/operations you want to use like I think was talked about, it's probably not useful at all, and you wouldn't need most of the endpoints related to that. This is a sort of low-level feature, that is never directly referenced by user-stories (apart from some doc-related ones maybe), but I consider it a technical means to implement the higher-level features that actually come from user-stories. I might be wrong here, but I'm cautiously confident that I'm not.
Dunno about complicated. I'd need some actual usage patterns from a real frontend that wants to do things concurrently, but I've prepared some abstractions so that it'll be possible. kiara itself is not thread-safe, but running jobs concurrently would be possible if needed. The more important pattern I've prepared for is non-blocking job submission (
This is something I'm not sure about at all, so you'll need to ask someone else. I'd need some actual specs from a frontend how this would look like and work, what the notes would be attached to, how editing notes would work, and quite a few other things. So, that would be for you to figure out, but as soon as I get a description (and/or detailed specs), I'm confident I can implement it.
There are a lot of small features that came up while I developed the more overarching features Caitlin summarized (and which I think are the important ones for us atm). Most of those other smaller things are necessary for development, testing, or some experiments I did but small enough so I did not want to confuse/overload other people unless they become necessary because of a 'real-world' requirement. I didn't throw some of those half-baked features out because I figured there might be a chance they would become relevant in which case I'd continue the work (the API contains a few endpoints like that -- but they would be marked in the code docs). I guess the more important question for me would be: is there anything else you need for your mini-app? If you have any requirement, just let me know, and either I'll write you some example code how to do it (if already possible), or implement the thing if I can, or tell you why it's not possible or why I have concerns... |
Just a quick comment following our discussions during today's dev meeting. |
Here is a condensed version of the main kiara features. Most of them you should already have come across if you have used kiara in the past and went through the existing tutorials I wrote (information in those would be complementary to this -- I might not have included stuff here that is contained in there). The information here can be found in more detail via (mainly) the KiaraAPI source code or discovered via the cli (using the --help option of the kiara command and sub-commands).
As we've discussed before, the more important features are the ones that come from frontend requirements via the use-cases, those should be the 'official' ones. The backend won't be our product, the frontend(s) will be.
So there is a good chance I have missed some obvious, necessary features (since I haven't gotten real frontend requirements/specs yet), and in some cases I did deliberately not implement anything (like notes/comments) because the design of necessary API endpoints would depend entirely on how the frontend implements a specific feature. To that end, it's also important to note (again) that this API is the result of me guessing the endpoints a frontend might want to use, and designing them in a way so that changes can be made relatively easy. Which also means that I sort of assumed we can adjust the endpoints once frontend development gets on its way, and 'real' requirements crop up.
Obviously, that won't be possible every time, but given the constraints in the project it's something I tried to account for. So if you come across requirements that are not implemented, or slightly different to existing endpoints, let me know and we can discuss and adjust.