ENH: DAG facilitating nested DataCatalog structure #648
Comments
Hi @felixschmitz, what you are discussing is only a cosmetic issue when creating an image of the DAG with Using the data catalog name to resolve duplicates sounds like a reasonable idea. pytask does something related to nodes with path names by removing as many prefixes as possible while keeping the name unique. Using the nested dictionary and reading the prefixes from the keys is only possible if there is a Do you want to fix it? I could give you some guidance.
Thanks, and apologies for my late response. However, I disagree with your description of what pytask is doing "related to nodes with path names by removing as many prefixes as possible while keeping the name unique". What I observe is that, in the image of the DAG, the element of a data catalog gets shortened to "fitted_model", which is not unique across the data catalogs. Hence, all tasks point towards "fitted_model", suggesting there is a single (shared) product. My point is that the name is non-unique when it should be unique. Do you agree?
I don't think we have a disagreement here 😄. Maybe my comment was confusing and went off on a tangent. It was about related functionality that keeps displayed names short, not about this specific issue. Long names easily clutter the whole command line interface. When it comes to the data catalog node names in the DAG, pytask just displays the names of the nodes, which are the keys in the data catalog, without any shortening. In the linked PR I have a fix ready; I just have to think about whether I like it. BTW: next time, please consider posting the command you ran plus the image of the DAG. It would have helped me understand the issue even better.
Alright, that helped! Thanks a lot. I'd be happy if the feature (or a similar solution) made it into the package.
I second that the DAG representation can be very misleading, so some fix would be great! While you are at the data catalog, would it be possible to bring
Is your feature request related to a problem?
When using a nested DataCatalog of the kind
and adding products to a DataCatalog, e.g. via the following task:
as described in the extended DataCatalog guide, I would expect the DAG to reflect the nested structure of the DataCatalog.
For now, only the PickleNode's name, "fitted_model" in the example, is used in the representation of the DAG. With multiple models and datasets, the information "fitted_model" is on the one hand insufficient and, on the other hand, produces a DAG that implies the wrong structure and dependencies.
Describe the solution you'd like
I would want the DAG to reflect the nested structure of the DataCatalog and not only use the PickleNode's name. One approach would be to display in the DAG the name of the DataCatalog and the PickleNode, e.g. "ols1-data_1-fitted_model". Another approach would be to use the keys of nested_data_catalogs and join these with the PickleNode's name, producing a similar result in the example above, but guaranteeing a more informative name in general.
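To illustrate the second approach, here is a minimal sketch in plain Python that does not use pytask itself. The contents of nested_data_catalogs, the helper qualified_node_names, and the "-" separator are assumptions made up for this illustration; plain lists of node names stand in for the actual DataCatalog objects, and the original example from the issue is not reproduced here.

```python
# Hypothetical stand-in for the nested catalogs: model -> dataset -> node names.
# In a real project the leaves would be DataCatalog objects, not lists of strings.
nested_data_catalogs = {
    "ols1": {"data_1": ["fitted_model"], "data_2": ["fitted_model"]},
    "ols2": {"data_1": ["fitted_model"]},
}


def qualified_node_names(nested, prefix=(), sep="-"):
    """Yield one display name per node: the key path joined with the node name."""
    for key, value in nested.items():
        path = (*prefix, key)
        if isinstance(value, dict):
            # Descend into the next nesting level, remembering the keys seen so far.
            yield from qualified_node_names(value, path, sep)
        else:
            # Leaf level: prepend the full key path to every node name.
            for node_name in value:
                yield sep.join((*path, node_name))


names = list(qualified_node_names(nested_data_catalogs))
# The bare node names collide, while the qualified names do not.
bare = [name.rsplit("-", 1)[-1] for name in names]

for name in names:
    print(name)
```

With names built this way, each of the three "fitted_model" products would appear as its own node in the DAG instead of collapsing into one shared node.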