[INF]Documentation improvement #867

Open
19 tasks
viswa-nvidia opened this issue Mar 21, 2023 · 1 comment
Labels: documentation (Improvements or additions to documentation), roadmap

viswa-nvidia commented Mar 21, 2023

Description

We want to focus on improving our documentation.

We brainstormed and collected the following ideas (not yet prioritised):

General

  • Merlin Overview Page
  • Installation
  • Schema File and Definition
  • Merlin DAG
  • Session-based TensorFlow API
  • Negative sampling strategy API documentation
  • Migration guide from Transformers4Rec to Merlin Models Transformer API
  • Examples by industry (retail/finance/M&E)
  • Examples by use case (homepage carousel/item2item/etc.)
  • Best practices for enterprise customers

Quick start example documentation

How to write an operator

  • NVTabular
  • Merlin Systems

Inline Documentation (Coverage)

Docstring Coverage (March 28th):

  • Merlin Models: 41%
  • Transformers4Rec: 41%
  • Merlin Systems: 80%
  • Merlin Core: 81%
  • DataLoader: 79%
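
To make the Total / Miss / Cover figures concrete, here is a minimal sketch of how docstring coverage can be counted for one module with Python's standard `ast` module. It is not the tool that produced the numbers in this issue (the thread does not name it), and the counting rule (module, classes, and functions each count once) is an assumption; the `example` source and the `docstring_coverage` helper are hypothetical.

```python
import ast
from typing import Tuple


def docstring_coverage(source: str) -> Tuple[int, int]:
    """Return (total, missing) docstring counts for one module's source text."""
    tree = ast.parse(source)
    # Count the module itself plus every class and function definition.
    nodes = [tree] + [
        node
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
    ]
    missing = sum(1 for node in nodes if ast.get_docstring(node) is None)
    return len(nodes), missing


# Hypothetical module: the module and one method are documented,
# the class and the helper function are not.
example = '''
"""Module docstring."""

class Encoder:
    def call(self, inputs):
        """Encode the inputs."""
        return inputs

def helper(x):
    return x
'''

total, missing = docstring_coverage(example)
covered = total - missing
print(f"total={total} miss={missing} cover={covered} cover%={100 * covered / total:.0f}%")
```

A real coverage tool usually offers options to skip private members, `__init__` methods, or test files, so its counts can differ from this sketch.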

Some previous attempts:
#788
#795
#794

viswa-nvidia added the documentation label on Mar 21, 2023
viswa-nvidia added this to the Merlin 23.04 milestone on Mar 21, 2023

bschifferer commented Mar 28, 2023

Docstring Coverage (March 28th):
Merlin Models: 41%
Transformers4Rec: 41%
Merlin Systems: 80%
Merlin Core: 81%
DataLoader: 79%

Merlin Models:
============================ Coverage for /workspace/01_MerlinDev/62_DocStrings/models/merlin/ ============================
--------------------------------------------------------- Summary ---------------------------------------------------------

| Name | Total | Miss | Cover | Cover% |
|------|-------|------|-------|--------|
| datasets/synthetic.py | 4 | 1 | 3 | 75% |
| datasets/advertising/criteo/dataset.py | 5 | 3 | 2 | 40% |
| datasets/ecommerce/aliccp/dataset.py | 4 | 2 | 2 | 50% |
| datasets/ecommerce/booking/dataset.py | 5 | 1 | 4 | 80% |
| datasets/ecommerce/dressipi/dataset.py | 3 | 2 | 1 | 33% |
| datasets/entertainment/movielens/dataset.py | 7 | 4 | 3 | 43% |
| models/config/schema.py | 11 | 6 | 5 | 45% |
| models/tf/loader.py | 5 | 2 | 3 | 60% |
| models/tf/blocks/cross.py | 7 | 5 | 2 | 29% |
| models/tf/blocks/dlrm.py | 2 | 1 | 1 | 50% |
| models/tf/blocks/experts.py | 19 | 12 | 7 | 37% |
| models/tf/blocks/interaction.py | 13 | 8 | 5 | 38% |
| models/tf/blocks/mlp.py | 8 | 6 | 2 | 25% |
| models/tf/blocks/optimizer.py | 12 | 4 | 8 | 67% |
| models/tf/blocks/retrieval/base.py | 15 | 9 | 6 | 40% |
| models/tf/blocks/retrieval/matrix_factorization.py | 5 | 3 | 2 | 40% |
| models/tf/blocks/sampling/base.py | 4 | 4 | 0 | 0% |
| models/tf/blocks/sampling/cross_batch.py | 6 | 4 | 2 | 33% |
| models/tf/blocks/sampling/in_batch.py | 7 | 5 | 2 | 29% |
| models/tf/core/aggregation.py | 39 | 29 | 10 | 26% |
| models/tf/core/base.py | 39 | 27 | 12 | 31% |
| models/tf/core/combinators.py | 40 | 21 | 19 | 48% |
| models/tf/core/encoder.py | 20 | 9 | 11 | 55% |
| models/tf/core/index.py | 14 | 9 | 5 | 36% |
| models/tf/core/prediction.py | 9 | 5 | 4 | 44% |
| models/tf/core/tabular.py | 36 | 27 | 9 | 25% |
| models/tf/distributed/embedding.py | 5 | 2 | 3 | 60% |
| models/tf/experimental/sample_weight.py | 5 | 3 | 2 | 40% |
| models/tf/inputs/continuous.py | 9 | 6 | 3 | 33% |
| models/tf/inputs/embedding.py | 46 | 32 | 14 | 30% |
| models/tf/losses/base.py | 1 | 1 | 0 | 0% |
| models/tf/metrics/evaluation.py | 19 | 12 | 7 | 37% |
| models/tf/metrics/topk.py | 25 | 15 | 10 | 40% |
| models/tf/models/base.py | 62 | 34 | 28 | 45% |
| models/tf/models/utils.py | 2 | 2 | 0 | 0% |
| models/tf/outputs/base.py | 12 | 9 | 3 | 25% |
| models/tf/outputs/block.py | 5 | 3 | 2 | 40% |
| models/tf/outputs/classification.py | 15 | 10 | 5 | 33% |
| models/tf/outputs/contrastive.py | 13 | 9 | 4 | 31% |
| models/tf/outputs/topk.py | 13 | 5 | 8 | 62% |
| models/tf/outputs/sampling/base.py | 10 | 7 | 3 | 30% |
| models/tf/outputs/sampling/in_batch.py | 7 | 5 | 2 | 29% |
| models/tf/outputs/sampling/popularity.py | 5 | 3 | 2 | 40% |
| models/tf/prediction_tasks/base.py | 22 | 16 | 6 | 27% |
| models/tf/prediction_tasks/classification.py | 12 | 8 | 4 | 33% |
| models/tf/prediction_tasks/next_item.py | 6 | 3 | 3 | 50% |
| models/tf/prediction_tasks/regression.py | 5 | 3 | 2 | 40% |
| models/tf/prediction_tasks/retrieval.py | 5 | 4 | 1 | 20% |
| models/tf/transformers/block.py | 16 | 7 | 9 | 56% |
| models/tf/transformers/transforms.py | 23 | 14 | 9 | 39% |
| models/tf/transforms/bias.py | 17 | 13 | 4 | 24% |
| models/tf/transforms/features.py | 50 | 36 | 14 | 28% |
| models/tf/transforms/noise.py | 5 | 4 | 1 | 20% |
| models/tf/transforms/regularization.py | 3 | 2 | 1 | 33% |
| models/tf/transforms/sequence.py | 48 | 23 | 25 | 52% |
| models/tf/transforms/tensor.py | 5 | 4 | 1 | 20% |
| models/tf/utils/batch_utils.py | 8 | 5 | 3 | 38% |
| models/tf/utils/repr_utils.py | 5 | 5 | 0 | 0% |
| models/tf/utils/search_utils.py | 3 | 3 | 0 | 0% |
| models/tf/utils/testing_utils.py | 11 | 7 | 4 | 36% |
| models/tf/utils/tf_utils.py | 24 | 15 | 9 | 38% |
| models/torch/losses.py | 2 | 1 | 1 | 50% |
| models/torch/block/base.py | 19 | 14 | 5 | 26% |
| models/torch/block/mlp.py | 4 | 4 | 0 | 0% |
| models/torch/features/base.py | 1 | 1 | 0 | 0% |
| models/torch/features/continuous.py | 4 | 3 | 1 | 25% |
| models/torch/features/embedding.py | 15 | 10 | 5 | 33% |
| models/torch/features/tabular.py | 4 | 1 | 3 | 75% |
| models/torch/model/base.py | 32 | 24 | 8 | 25% |
| models/torch/model/prediction_task.py | 6 | 6 | 0 | 0% |
| models/torch/tabular/aggregation.py | 13 | 9 | 4 | 31% |
| models/torch/tabular/base.py | 29 | 15 | 14 | 48% |
| models/torch/tabular/transformations.py | 9 | 7 | 2 | 22% |
| models/torch/utils/data_utils.py | 14 | 9 | 5 | 36% |
| models/torch/utils/torch_utils.py | 20 | 14 | 6 | 30% |
| models/utils/dataset.py | 7 | 4 | 3 | 43% |
| models/utils/dependencies.py | 4 | 4 | 0 | 0% |
| models/utils/doc_utils.py | 1 | 1 | 0 | 0% |
| models/utils/misc_utils.py | 9 | 4 | 5 | 56% |
| models/utils/nvt_utils.py | 1 | 1 | 0 | 0% |
| models/utils/registry.py | 19 | 12 | 7 | 37% |
| models/utils/schema_utils.py | 12 | 9 | 3 | 25% |
|------|-------|------|-------|--------|
| TOTAL | 1172 | 692 | 480 | 41.0% |
-------------------------------------------------------------------------------------------------------------------------
(16 of 98 files omitted due to complete coverage)
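
One way to read a summary like this is to rank files by how many docstrings are missing rather than by percentage. This is only an illustrative sketch: the ranking rule is an assumption, not a plan stated in this thread, and `rank_by_missing` is a hypothetical helper. The rows are copied from a handful of lines in the Merlin Models summary.

```python
# (file, total, miss) copied from a few rows of the Merlin Models summary
rows = [
    ("models/tf/blocks/sampling/base.py", 4, 4),
    ("models/tf/outputs/topk.py", 13, 5),
    ("models/tf/core/aggregation.py", 39, 29),
    ("models/tf/inputs/embedding.py", 46, 32),
    ("models/tf/models/base.py", 62, 34),
    ("models/tf/transforms/features.py", 50, 36),
]


def rank_by_missing(rows):
    """Sort rows by missing docstrings (largest first), then by file name."""
    return sorted(rows, key=lambda row: (-row[2], row[0]))


ranked = rank_by_missing(rows)
for name, total, miss in ranked:
    print(f"{miss:>3} missing of {total:>3}  {name}")
```

Ranking by absolute count puts large files such as `models/tf/transforms/features.py` first, while a file at 0% with only four objects (`models/tf/blocks/sampling/base.py`) lands near the bottom. Ranking by Cover% would reverse that emphasis.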

Transformers4Rec:
========================== Coverage for /workspace/01_MerlinDev/62_DocStrings/Transformers4Rec/ ===========================
--------------------------------------------------------- Summary ---------------------------------------------------------

| Name | Total | Miss | Cover | Cover% |
|------|-------|------|-------|--------|
| merlin_standard_lib/proto/schema_bp.py | 47 | 8 | 39 | 83% |
| merlin_standard_lib/schema/schema.py | 30 | 29 | 1 | 3% |
| merlin_standard_lib/utils/embedding_utils.py | 2 | 2 | 0 | 0% |
| transformers4rec/config/schema.py | 6 | 6 | 0 | 0% |
| transformers4rec/config/transformer.py | 22 | 22 | 0 | 0% |
| transformers4rec/data/dataset.py | 4 | 4 | 0 | 0% |
| transformers4rec/torch/experimental.py | 4 | 3 | 1 | 25% |
| transformers4rec/torch/losses.py | 2 | 1 | 1 | 50% |
| transformers4rec/torch/masking.py | 19 | 8 | 11 | 58% |
| transformers4rec/torch/ranking_metric.py | 9 | 8 | 1 | 11% |
| transformers4rec/torch/trainer.py | 19 | 6 | 13 | 68% |
| transformers4rec/torch/block/base.py | 19 | 19 | 0 | 0% |
| transformers4rec/torch/block/mlp.py | 4 | 4 | 0 | 0% |
| transformers4rec/torch/block/transformer.py | 8 | 5 | 3 | 38% |
| transformers4rec/torch/features/base.py | 1 | 1 | 0 | 0% |
| transformers4rec/torch/features/continuous.py | 4 | 3 | 1 | 25% |
| transformers4rec/torch/features/embedding.py | 17 | 11 | 6 | 35% |
| transformers4rec/torch/features/sequence.py | 9 | 6 | 3 | 33% |
| transformers4rec/torch/features/tabular.py | 4 | 1 | 3 | 75% |
| transformers4rec/torch/model/base.py | 31 | 20 | 11 | 35% |
| transformers4rec/torch/model/prediction_task.py | 14 | 11 | 3 | 21% |
| transformers4rec/torch/tabular/aggregation.py | 13 | 9 | 4 | 31% |
| transformers4rec/torch/tabular/base.py | 29 | 15 | 14 | 48% |
| transformers4rec/torch/tabular/transformations.py | 12 | 9 | 3 | 25% |
| transformers4rec/torch/utils/data_utils.py | 14 | 8 | 6 | 43% |
| transformers4rec/torch/utils/examples_utils.py | 4 | 1 | 3 | 75% |
| transformers4rec/torch/utils/schema_utils.py | 1 | 1 | 0 | 0% |
| transformers4rec/torch/utils/torch_utils.py | 29 | 16 | 13 | 45% |
| transformers4rec/utils/data_utils.py | 4 | 2 | 2 | 50% |
| transformers4rec/utils/dependencies.py | 3 | 3 | 0 | 0% |
|------|-------|------|-------|--------|
| TOTAL | 413 | 242 | 171 | 41.4% |
-------------------------------------------------------------------------------------------------------------------------

Merlin Systems:
======================= Coverage for /workspace/01_MerlinDev/62_DocStrings/systems/merlin/systems/ ========================
--------------------------------------------------------- Summary ---------------------------------------------------------

| Name | Total | Miss | Cover | Cover% |
|------|-------|------|-------|--------|
| model_registry.py | 4 | 2 | 2 | 50% |
| dag/ops/faiss.py | 8 | 3 | 5 | 62% |
| dag/ops/feast.py | 6 | 1 | 5 | 83% |
| dag/ops/fil.py | 27 | 7 | 20 | 74% |
| dag/ops/implicit.py | 6 | 2 | 4 | 67% |
| dag/ops/pytorch.py | 4 | 1 | 3 | 75% |
| dag/ops/session_filter.py | 5 | 1 | 4 | 80% |
| dag/ops/softmax_sampling.py | 4 | 1 | 3 | 75% |
| dag/ops/tensorflow.py | 5 | 1 | 4 | 80% |
| dag/ops/unroll_features.py | 3 | 2 | 1 | 33% |
| dag/ops/workflow.py | 4 | 1 | 3 | 75% |
| dag/runtimes/triton/ops/fil.py | 11 | 1 | 10 | 91% |
| dag/runtimes/triton/ops/operator.py | 4 | 1 | 3 | 75% |
| dag/runtimes/triton/ops/pytorch.py | 6 | 1 | 5 | 83% |
| triton/utils.py | 5 | 2 | 3 | 60% |
| triton/models/pytorch_model.py | 3 | 1 | 2 | 67% |
| workflow/base.py | 3 | 3 | 0 | 0% |
| workflow/hugectr.py | 2 | 2 | 0 | 0% |
| workflow/pytorch.py | 1 | 1 | 0 | 0% |
| workflow/tensorflow.py | 1 | 1 | 0 | 0% |
|------|-------|------|-------|--------|
| TOTAL | 176 | 35 | 141 | 80.1% |
-------------------------------------------------------------------------------------------------------------------------

Merlin Core:
============================= Coverage for /workspace/01_MerlinDev/62_DocStrings/core/merlin/ =============================
--------------------------------------------------------- Summary ---------------------------------------------------------

| Name | Total | Miss | Cover | Cover% |
|------|-------|------|-------|--------|
| dag/base_operator.py | 11 | 2 | 9 | 82% |
| dag/graph.py | 5 | 2 | 3 | 60% |
| dag/node.py | 13 | 4 | 9 | 69% |
| dtypes/shape.py | 8 | 6 | 2 | 25% |
| schema/schema.py | 22 | 5 | 17 | 77% |
| schema/io/schema_bp.py | 51 | 11 | 40 | 78% |
| table/conversions.py | 4 | 4 | 0 | 0% |
| table/cupy_column.py | 6 | 1 | 5 | 83% |
| table/numpy_column.py | 6 | 1 | 5 | 83% |
| table/tensor_column.py | 7 | 1 | 6 | 86% |
| table/tensor_table.py | 12 | 3 | 9 | 75% |
| table/tensorflow_column.py | 8 | 3 | 5 | 62% |
| table/torch_column.py | 6 | 1 | 5 | 83% |
|------|-------|------|-------|--------|
| TOTAL | 232 | 44 | 188 | 81.0% |
-------------------------------------------------------------------------------------------------------------------------

DataLoader:
========================== Coverage for /workspace/01_MerlinDev/62_DocStrings/dataloader/merlin/ ==========================
--------------------------------------------------------- Summary ---------------------------------------------------------

| Name | Total | Miss | Cover | Cover% |
|------|-------|------|-------|--------|
| dataloader/loader_base.py | 14 | 7 | 7 | 50% |
| dataloader/tensorflow.py | 5 | 1 | 4 | 80% |
| dataloader/ops/embeddings/embedding_op.py | 7 | 2 | 5 | 71% |
| dataloader/utils/tf/tf_trainer.py | 2 | 1 | 1 | 50% |
| dataloader/utils/torch/torch_trainer_dist.py | 4 | 3 | 1 | 25% |
|------|-------|------|-------|--------|
| TOTAL | 66 | 14 | 52 | 78.8% |
-------------------------------------------------------------------------------------------------------------------------
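
As a quick sanity check on the five TOTAL rows, the Cover% values can be recomputed from Total and Miss. The sketch below uses only the numbers shown in the summaries; the pooled "Overall" figure is an illustrative aggregate that is not reported anywhere in this thread, and `cover_percent` is a hypothetical helper.

```python
# (total, miss) pairs copied from the TOTAL rows of the five summaries
totals = {
    "Merlin Models": (1172, 692),
    "Transformers4Rec": (413, 242),
    "Merlin Systems": (176, 35),
    "Merlin Core": (232, 44),
    "DataLoader": (66, 14),
}


def cover_percent(total, miss):
    """Percentage of objects that have a docstring, rounded to one decimal."""
    return round(100 * (total - miss) / total, 1)


per_package = {name: cover_percent(t, m) for name, (t, m) in totals.items()}
for name, pct in per_package.items():
    print(f"{name}: {pct}%")

# Pooled coverage across all five packages (illustrative, not from the thread).
all_total = sum(t for t, _ in totals.values())
all_miss = sum(m for _, m in totals.values())
overall = cover_percent(all_total, all_miss)
print(f"Overall: {overall}%")
```

The per-package results match the TOTAL rows (41.0%, 41.4%, 80.1%, 81.0%, 78.8%). The pooled figure is dominated by Merlin Models, which accounts for more than half of all counted objects.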
