Releases: NVIDIA-Merlin/models
v0.8.0
What’s Changed
⚠ Breaking Changes
- Add loader transforms @edknv (#740)
- Move transformation-layers to sub package @marcromeyn (#710)
🐜 Bug Fixes
- Bug when predicting with list-column @marcromeyn (#758)
- Fix saving + reloading models with ContrastiveOutput @marcromeyn (#757)
- Fixed inconsistency on multi-task learning metrics due to a mismatch of tasks and metrics order @gabrielspmoreira (#751)
- Implement on_epoch_end method on BatchedDataset @oliverholworthy (#724)
- Fixes RegressionTask which was raising an error because compute_output_shape() was not defined @gabrielspmoreira (#741)
🚀 Features
- Return correct type from data augmentation layer depending on the context @oliverholworthy (#703)
- Add test for TwoTowerModel with InputBlockV2 and Embeddings @oliverholworthy (#759)
- XGBoost - Use DaskDMatrix for evals data to ensure metrics in logs match result of evaluate @oliverholworthy (#682)
- Add BroadcastToSequence Layer to broadcast context features to match sequence shapes. @oliverholworthy (#737)
- Fix saving + reloading models with ContrastiveOutput @marcromeyn (#757)
- Add select_by_tag to ParallelBlock @edknv (#701)
- Add loader transforms @edknv (#740)
- Making InputBlockV2 more generic @marcromeyn (#754)
- Example notebook of Wide&Deep model @gabrielspmoreira (#716)
- Introducing Outputs-V2 @marcromeyn (#715)
- Move transformation-layers to sub package @marcromeyn (#710)
- Add EncoderBlock @marcromeyn (#705)
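The BroadcastToSequence entry (#737) describes broadcasting context features to match sequence shapes. The idea can be sketched in plain Python, assuming one context value per example and a variable-length sequence per example. The function name `broadcast_context_to_sequence` is hypothetical, and this is an illustration of the concept rather than the layer's actual implementation or API.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def broadcast_context_to_sequence(
    context: Sequence[T], sequences: Sequence[Sequence[object]]
) -> List[List[T]]:
    """Repeat each example's context value once per step of its sequence.

    Hypothetical helper for illustration only; not part of Merlin Models.
    """
    if len(context) != len(sequences):
        raise ValueError("context and sequences must have the same batch size")
    # One output row per example, with the same length as that example's sequence.
    return [[value] * len(seq) for value, seq in zip(context, sequences)]


# Two examples: a user-level feature and item-id sequences of length 3 and 1.
user_age = [31, 47]
item_ids = [[10, 11, 12], [20]]
broadcast = broadcast_context_to_sequence(user_age, item_ids)
```

After this step, each context feature has the same per-example length as the sequence feature, so the two can be combined step by step.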
📄 Documentation
- Fix Feature Engineering Dressipi @bschifferer (#723)
🔧 Maintenance
- Improve reliability of test_train_metrics_steps with a custom test metric @oliverholworthy (#766)
- Remove cluster_type from dask_client test fixture @oliverholworthy (#763)
- Update tensorflow version test matrix version specifiers for clarity @oliverholworthy (#726)
- Update ItemRetrievalScorer to use tf_inspect to get call args @oliverholworthy (#756)
- Added InputBlockV2 support to DeepFMModel (refactored and fixed) and DCNModel @gabrielspmoreira (#717)
v0.7.0
What’s Changed
🐜 Bug Fixes
- Fix ecommerce session based test @edknv (#690)
- Add dtype conversion for continuous columns in synthetic data generation @oliverholworthy (#662)
- Breaks ties of top-k metrics @gabrielspmoreira (#653)
- Enable saving of TwoTowerModel @oliverholworthy (#615)
- Update stacking in DLRMBlock to use correct axis for the dot product. @oliverholworthy (#637)
- Fix lightfm evaluate code @benfred (#616)
- Enable predict in XGBoost without targets in provided dataset @oliverholworthy (#612)
- Use labels instead of predictions in XGBoost eval data @karlhigley (#610)
- Stop reordering columns in merlin.models.xgb.dataset_to_xy @radekosmulski (#603)
🚀 Features
- Extend EmbeddingTable and Embeddings with support for shared embeddings @oliverholworthy (#700)
- Filter features in ParallelBlock @edknv (#686)
- Improvements in Wide and Deep architecture @Timmy00 (#638)
- Add CategoryEncoding transformation block @Timmy00 (#614)
- Add Layer freezing @Timmy00 (#635)
- Add save and load methods to XGBoost class @oliverholworthy (#656)
- Restructuring Embeddings @marcromeyn (#649)
- enhance pretrained-embeddings example to set trainable=False for the pre-trained embeddings @rnyak (#654)
- Tf/categorical prediction v2 @sararb (#633)
- Enable customization of embeddings in DLRM Model @oliverholworthy (#619)
- Add trainable and embeddings_initializers args to Embeddings function @rnyak (#634)
- Add Wide Deep model @Timmy00 (#623)
- Enable static weights to be used with EmbeddingTable.from_pretrained @oliverholworthy (#632)
- Enable TabularData to be used with EmbeddingTable @oliverholworthy (#631)
- Add LazyAdam optimizer @Timmy00 (#602)
- Tf/contrastive prediction @marcromeyn (#594)
- Hashed Crosses for all levels @Timmy00 (#590)
- Supports task-specific sample weights, weighted_metrics and fixes InputBlockV2 @gabrielspmoreira (#600)
- Introducing new base-implementations to replace prediction-tasks @marcromeyn (#589)
- Add multi optimizer @Timmy00 (#581)
- Introducing MapValues @marcromeyn (#591)
- Fix CategoricalOneHot with compute_output_shape and check inputs. @Timmy00 (#597)
- Add EmbeddingOptions arg in the DLRM block @rnyak (#592)
- Decoupling of InputBlock(v2) and Embedding Tables, support to Ragged Embeddings Lookup and AverageEmbeddingsByWeightFeature @gabrielspmoreira (#593)
📄 Documentation
- update logo usecase @bschifferer (#693)
- [WIP] Add Dressipi RecSys Util Function to Merlin Models @bschifferer (#664)
- update 01-Getting-started @radekosmulski (#671)
- update logos for example notebooks 02 - 06 @radekosmulski (#672)
- add tracking logo to examples/07 @radekosmulski (#661)
- Add lightfm and implicit training examples @radekosmulski (#629)
🔧 Maintenance
- Separate the dev and docs requirements @karlhigley (#708)
- Fix ecommerce session based test @edknv (#690)
- Add dtype conversion for continuous columns in synthetic data generation @oliverholworthy (#662)
- Add unit test for the trainable embeddings func @rnyak (#648)
- Update versioneer from 0.20 to 0.23 @oliverholworthy (#646)
- Remove -e from pip install commands in github workflows @oliverholworthy (#645)
- Trying to make model unit-tests faster @marcromeyn (#595)
v0.6.0
What’s Changed
⚠ Breaking Changes
- Remove masking for now @marcromeyn (#557)
- Removing Queue-based negative-sampling for now @marcromeyn (#558)
🐜 Bug Fixes
- Add signature checks for various models like two-tower & DLRM @marcromeyn (#588)
- Keep Dataloader lazy when using map operations @oliverholworthy (#572)
- Enable serialization of TensorInitializer @oliverholworthy (#549)
- EmbeddingFeatures - Move embedding table creation from build to init @oliverholworthy (#532)
- Handle ParallelBlock in Model.from_block @oliverholworthy (#517)
- Propagate schema to branches if needed in ParallelBlock @marcromeyn (#478)
🚀 Features
- Hashed cross @Timmy00 (#587)
- Update UniformNegativeSampling to handle targets and add optional control for testing @oliverholworthy (#583)
- Allow for blocks in the Model to transform the targets @marcromeyn (#554)
- Add In-Batch Negative Sampling Block for positive-only batches @oliverholworthy (#560)
- Add functionality to set activation as a list in MLPBlock @rvk007 (#548)
- Add Cond Layer to tensorflow combinators @oliverholworthy (#552)
- Adding AsRaggedTensors transformation @marcromeyn (#556)
- Breaking out BaseModel to support sub-classing @marcromeyn (#518)
- Add new block EmbeddingTable @oliverholworthy (#541)
- Adding pre & post to Model @marcromeyn (#542)
- XGBoost - Add evals argument to fit to support eval on datasets other than training @oliverholworthy (#538)
- XGBoost - Train/eval with features from schema passed in to the constructor @oliverholworthy (#531)
- XGBoost - Use DaskDeviceQuantileDMatrix with GPU Training @oliverholworthy (#528)
- Use standard Keras embedding layers inside EmbeddingFeatures block @karlhigley (#472)
- Refactored top-k metrics and created TopKMetricsAggregator for optimized metrics computation @gabrielspmoreira (#514)
📄 Documentation
- Attribute source of MF image @mikemckiernan (#568)
- add xgboost example @radekosmulski (#522)
- add usecase with pretrained embeddings @radekosmulski (#508)
- Update URLs to Criteo datasets @mikemckiernan (#516)
🔧 Maintenance
- Adding the last 2 major TF versions to our CPU-tests @marcromeyn (#573)
- Add Keras layer_test function to testing_utils @oliverholworthy (#584)
- Attempt to remove docs dependencies @mikemckiernan (#586)
- Integration Tests for Retrieval models @gabrielspmoreira (#537)
- Move data-augmentation transformations into the new data_augmentation package @marcromeyn (#566)
- Moving tf/core/blocks to tf/core @marcromeyn (#565)
- Replace usage of tensorflow.python.keras with tensorflow.keras @oliverholworthy (#555)
- Make blocks part of Model and not of SequentialBlock @marcromeyn (#551)
- Adding schemas to manifest @marcromeyn (#539)
- Add tests for train_metrics_steps ensuring we can control the frequency of computing metrics @oliverholworthy (#519)
- Add tests for calling filter block when output returns empty @oliverholworthy (#481)
v0.5.0
What’s Changed
- Quick fix to simplify DLRM when there are no continuous features @marcromeyn (#479)
- Move Input-blocks @marcromeyn (#471)
- Fixing the inconsistency when exporting embeddings to cudf DataFrame @gabrielspmoreira (#452)
- Remove the mask variable from the ModelContext @karlhigley (#449)
- Move masking into the FeatureContext @karlhigley (#443)
- Create FeatureContext and pass it to model layers with call_layer @karlhigley (#428)
- Rename "mask schema" to "mask" or "feature mask" @karlhigley (#425)
- Rename BlockContext to ModelContext @karlhigley (#417)
- Create FeatureCollection as a way to pass input features to blocks @karlhigley (#418)
- Added TensorInitializer that can be used to initialize variables/embeddings with pretrained weights @gabrielspmoreira (#357)
⚠ Breaking Changes
- Remove Block.to_model @benfred (#501)
- Moving loss- and metrics-calculation to model instead of the predicti… @marcromeyn (#431)
- Make Model creation explicit @karlhigley (#432)
🐜 Bug Fixes
- Fixes computing metrics each N steps during training and adding regularization_loss metric @gabrielspmoreira (#511)
- Fix to pass shuffle argument to dataloader @benfred (#493)
- fix broken links in the 04-Exporting-ranking-models.ipynb nb @rnyak (#490)
- Fixes errors on retrieval models evaluation in graph mode @gabrielspmoreira (#488)
- Fixes and improvements in ranking metrics @gabrielspmoreira (#475)
- Fixes and improves masking (for both Causal LM and Masked LM) and adds SequenceAggregation.MASKED_MEAN (used by YouTubeDNN) @gabrielspmoreira (#464)
- Replace use of removeprefix function which requires Python 3.9+ @oliverholworthy (#465)
- Parametrize base-model-tests to run in both graph- and eager-mode @marcromeyn (#456)
- Add validation for movielens variant and fix docstring @oliverholworthy (#434)
- Use collections.abc for Sequence import @oliverholworthy (#427)
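The fix in #465 concerns `str.removeprefix`, which is only available on Python 3.9 and later. A minimal sketch of a replacement that also runs on earlier Python 3 versions follows; the helper name `remove_prefix` is hypothetical and this is not the code from the PR.

```python
def remove_prefix(text: str, prefix: str) -> str:
    """Return text without the leading prefix, if present.

    Hypothetical helper; mirrors str.removeprefix (Python 3.9+)
    using only startswith and slicing.
    """
    if prefix and text.startswith(prefix):
        return text[len(prefix):]
    return text


stripped = remove_prefix("model/embeddings", "model/")
unchanged = remove_prefix("embeddings", "model/")
```

Only a leading match is removed; a string that does not start with the prefix is returned as-is.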
🚀 Features
- XGBoost - Switch to dask API @oliverholworthy (#466)
- Fixes and improves masking (for both Causal LM and Masked LM) and adds SequenceAggregation.MASKED_MEAN (used by YouTubeDNN) @gabrielspmoreira (#464)
- Moving loss- and metrics-calculation to model instead of the predicti… @marcromeyn (#431)
- XGBoost - Change group_columns to qid_column @oliverholworthy (#462)
- First sketch of XGBoost integration with Merlin Dataset @oliverholworthy (#433)
- Getting rid of feature-variables in ModelContext @marcromeyn (#455)
- Adding the missing logQ correction to sampled softmax @gabrielspmoreira (#457)
- Making YoutubeDNNRetrievalModel more configurable @gabrielspmoreira (#447)
- Making EmbeddingOptions.embedding_dims to have priority over .inferred_embedding_dims @gabrielspmoreira (#398)
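The logQ correction entry (#457) refers to a common adjustment for sampled softmax: the log of each candidate's sampling probability is subtracted from its logit, so that items sampled often as negatives are not over-penalized. A minimal sketch in plain Python, assuming the sampling probabilities are known; the function and variable names are hypothetical and this is not the implementation from the PR.

```python
import math
from typing import List, Sequence


def logq_correct(
    logits: Sequence[float], sampling_probs: Sequence[float]
) -> List[float]:
    """Subtract log(Q(item)) from each logit.

    Hypothetical helper: sampling_probs[i] is the assumed probability
    of item i being drawn by the negative sampler.
    """
    if len(logits) != len(sampling_probs):
        raise ValueError("logits and sampling_probs must have the same length")
    return [logit - math.log(q) for logit, q in zip(logits, sampling_probs)]


# Same raw score, but the first item is sampled far more often than the second.
corrected = logq_correct([2.0, 2.0], [0.5, 0.01])
```

With equal raw logits, the rarely sampled item ends up with the larger corrected logit, which offsets the sampler's bias toward frequent items.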
📄 Documentation
- Update to refer to merlin-tensorflow container @mikemckiernan (#512)
- Fix prediction task documentation @benfred (#505)
- Fix dataloader docstring @benfred (#492)
- Add common release-drafter workflow @mikemckiernan (#439)
- Add common release-drafter configuration @mikemckiernan (#435)
- Adding container info to notebook 01, 02 and 06 @bschifferer (#424)
🔧 Maintenance
- Fix pytest marker for integration tests. @oliverholworthy (#510)
- Add a GA workflow that requires labels on PR's @benfred (#506)
- Use shared implementation of triage workflow @benfred (#504)
- remove unnecessary line @radekosmulski (#494)
- Adding automatic branch pruning to ParallelBlock @marcromeyn (#477)
- Move tabular tests from test_core to test_tabular @marcromeyn (#476)
- Remove tf.test.is_gpu_available call @benfred (#470)
- Update setup.py to include only the merlin namespace @oliverholworthy (#469)
- Add labels to PRs for change log @mikemckiernan (#453)
- Make use of model_test for all ranking- and benchmark-models @marcromeyn (#440)
- Move tests to tests/unit or tests/integration @jperez999 (#445)
- Creating generic model_test @marcromeyn (#430)
- Create a separate GitHub workflow for each backend @marcromeyn (#423)
- rm git pull cmd in test unit @jperez999 (#438)
- Adding CI for notebooks 03, 04, 05 and 06 @bschifferer (#414)
v0.4.0
Known Issues
- model.evaluate() gives much higher acc value for small batch size when we set validation_data in the model.fit() #373
What's Changed
- Fixing MF by applying L2-norm in the towers individually by @gabrielspmoreira in #353
- Ensure inferred embedding dim from feature cardinality is multiple of 8 by @gabrielspmoreira in #355
- Update advanced_example notebook by @sararb in #360
- Refactoring the evaluation of RetrievalModel by @marcromeyn in #352
- First design of popularity correction by @sararb in #362
- Fixes and improvements in embeddings export for MatrixFactorizationModel and TwoTowerModel by @gabrielspmoreira in #365
- Adding AlphaDropout support, which is recommended by selu activation by @gabrielspmoreira in #369
- Simplified TopKIndexBlock implementation and fixed intermittent inconsistency on retrieval models eval by @gabrielspmoreira in #370
- Evaluation metrics of a top-k recommender by @sararb in #371
- Adding unittests for example notebooks by @bschifferer in #377
- Adding option for embeddings_l2_reg, which is in particular critical for MF by @gabrielspmoreira in #381
- Creating a reg_factor for PopularityLogitsCorrection by @gabrielspmoreira in #386
- Adding unittests for Notebook 04 and 05 by @bschifferer in #385
- fix accessing item-id embeddings table based on IntDomain name by @sararb in #376
- Fix top-k recommender to return correct top-k relevant items. by @sararb in #389
- Update testbook tests to use absolute notebook paths by @karlhigley in #394
- Add missing features to Ali-ccp data generator by @rnyak in #399
- Fixes error with fit() with validation data by @gabrielspmoreira in #402
- Update README.rst by @bschifferer in #401
- Add unit test for 03_exploring_different_models by @rnyak in #405
- remove unneeded packages by @jperez999 in #412
- Use shuffle_df from merlin-core by @benfred in #413
- Adding CI for movielens notebooks by @bschifferer in #411
- Remove .python-version file by @karlhigley in #416
- Update the tags selection from the schema by excluding the target variable by @sararb in #397
- Create a basic conda recipe by @benfred in #420
- docs: Use sphinx-ext-toc to control examples by @mikemckiernan in #404
Full Changelog: v0.3.0...v0.4.0
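The v0.4.0 entry on inferred embedding dimensions (#355) says the dimension inferred from feature cardinality is made a multiple of 8. One way to do that is sketched below. The fourth-root heuristic and the multiplier value are assumptions chosen for illustration, and both function names are hypothetical; this is not the library's code.

```python
import math


def round_up_to_multiple(value: int, multiple: int = 8) -> int:
    """Round value up to the nearest multiple (hypothetical helper)."""
    return int(math.ceil(value / multiple) * multiple)


def infer_embedding_dim(cardinality: int, multiplier: float = 2.0) -> int:
    """Infer an embedding dimension from a feature's cardinality.

    Assumption: a fourth-root heuristic scaled by `multiplier`,
    then rounded up so the result is always a multiple of 8.
    """
    raw = int(math.ceil(multiplier * cardinality ** 0.25))
    return round_up_to_multiple(max(raw, 1))


dim = infer_embedding_dim(10_000)
```

Rounding up rather than to the nearest multiple guarantees that small cardinalities still get a non-zero dimension.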
v0.3.0
What's Changed
- fix for tf when creating cpu dataloader by @jperez999 in #222
- Create the necessary blocks/methods for ItemRetrieval evaluation by @sararb in #189
- Adding wrap_as_model to Block by @marcromeyn in #209
- chore: add PR doc previews by @mikemckiernan in #229
- Refactors and optimizes evaluation with ranking metrics by @gabrielspmoreira in #226
- Dataloader/targets by @marcromeyn in #195
- Break up core.py by @marcromeyn in #227
- Improvements in two-tower model and DCN by @gabrielspmoreira in #237
- Adding NCFModel as benchmark by @sararb in #230
- fix: Typo in workflow script by @mikemckiernan in #248
- Updates container testing by @albert17 in #234
- Specify encoding when reading csv file for ml-100k by @benfred in #249
- chore: Trigger on closed pull requests by @mikemckiernan in #251
- docs: Incremental improvements by @mikemckiernan in #193
- Set up automated docstring coverage checks by @karlhigley in #257
- Automatic setting of retrieval model top-k evaluation by @sararb in #238
- Fixing MLPBlock saving with regularizers by @marcromeyn in #264
- Check if valid data is merlin.io.dataset and then convert to BatchedDataset by @rnyak in #269
- Fix mypy issues & make MF return a RetrievalModel by @marcromeyn in #266
- Update Dataset schema when schema is passed in BatchedDataset by @marcromeyn in #265
- Fix small bug when shuffle is passed in .fit by @marcromeyn in #272
- Moving RetrievalModel._ensure_unique to utils by @marcromeyn in #267
- Fix bug in tower saving where all features were required by @marcromeyn in #275
- Move interrogate docstring coverage settings to pyproject.toml by @karlhigley in #276
- Fix MatrixFactorizationModel bug by @sararb in #280
- Implementation of DeepFM model by @sararb in #271
- Add intersphinx mapping for merlin-core by @benfred in #281
- Examples for Getting Started and Schema File by @bschifferer in #253
- Explore different ranking models by @rnyak in #256
- Item retrieval evaluation fixes by @gabrielspmoreira in #279
- [Doc] a hyperlink missing by @rhdong in #286
- docs: Add API documentation by @mikemckiernan in #212
- Exclude additional directories from docstring coverage req't by @karlhigley in #293
- fix evaluation loss equal to zero and add docstrings by @sararb in #294
- docs: Add nightly multi-version build by @mikemckiernan in #298
- fix: Do not rebuild on PR close by @mikemckiernan in #299
- Retrieval models (ALI-CCP)- Two-Tower model example by @rnyak in #262
- docs: Revise API page by @mikemckiernan in #305
- Exporting ranking models by @rnyak in #300
- Small fixes on Matrix Factorization by @gabrielspmoreira in #306
- first draft of advanced example notebook by @sararb in #295
- Fix default int64 dtype in index.py to avoid dtype errors in Retrieval model validation step by @rnyak in #307
- Specify encoding when reading csv file for ml-1m by @benfred in #310
- Example Readme by @bschifferer in #296
- fix categorical-ohe block by @sararb in #314
- Improving handling of data in examples + tests by @marcromeyn in #309
- quick fix in aliccp schema by @sararb in #323
- fix binary tag in Ali-CCP workflow by @sararb in #325
- updates from readme bash by @sararb in #303
- Pairwise losses fixes by @gabrielspmoreira in #317
- Upgrading black + enabling black on notebooks by @marcromeyn in #326
- docs: Add redirect page by @mikemckiernan in #329
- Fix the evaluation of sampled softmax-based tasks by @sararb in #315
- Fixing synthetic data-generation for aliccp-raw by @marcromeyn in #327
- Adding codespell + flake8-nb to pre-commit by @marcromeyn in #331
- remove barchart from example_utils by @rnyak in #334
- Fix 03-03-05 nbs wrt bug bash by @rnyak in #333
- Merlin models examples fixes by @gabrielspmoreira in #313
- Update NVTabular and merlin-core requirements for the release by @benfred in #347
- Report skipped tests in blossom CI by @benfred in #348
- Fixing error on retrieval model evaluation when using mixed_precision by @gabrielspmoreira in #337
- Automate pushing package to pypi by @benfred in #349
New Contributors
Full Changelog: v0.2.0...v0.3.0
v0.2.0
What's Changed
- Fix TARGET tag typo in dataloader and sampler initialization in ItemRetrievalTask() by @rnyak in #176
- Pointwise, Pairwise and Listwise Losses by @gabrielspmoreira in #166
- Add download, convert and etl util func for movielens dataset by @rnyak in #155
- Copying Github templates from T4Rec by @marcromeyn in #165
- [REVIEW] docs: initial commit for Sphinx docs by @mikemckiernan in #174
- fix references to item-id column by @sararb in #180
- Activates GPU CI System by @albert17 in #186
- quick fix of setting dtypes from sparse inputs by @sararb in #183
- Adding some basic doc-strings by @marcromeyn in #185
- Introducing TupleAggregator & use it for CosineSimilarity + ElementWiseMultiply by @marcromeyn in #164
- Unified Schema by @benfred in #184
- Adding models by @marcromeyn in #187
- Fix SyntheticData.read_schema with proto text files by @benfred in #191
- Adding batch-predict by @marcromeyn in #145
- Move to 'merlin' namespace by @benfred in #198
- Fix ItemRetrieval assertion and output errors by @rnyak in #200
- Adds initializers and regularizers options to MLPBlock by @gabrielspmoreira in #206
- Update merlin core version by @benfred in #213
- Assume 'merlin' is a first party package for isort by @benfred in #216
- Implemented support for TF mixed precision (mixed_float16) by @gabrielspmoreira in #217
- Add shim to adapt the implicit library by @benfred in #218
- Add a helper function to get the movielens dataset by @benfred in #207
- Add shim to adapt the lightfm library by @benfred in #219
New Contributors
- @mikemckiernan made their first contribution in #174
Full Changelog: v0.1.0...v0.2.0