v0.2.0
jackalcooper released this 09 Oct 06:57
Changelog
v0.2.0 (09/10/2020)
Op fixes and performance optimization
Support fusing the binary add op with its predecessor node
Kernel performance optimization
- Fused BatchNormAddRelu #3519
- bn_add_relu use bit mask #3645
- layer_norm param grad #3604
- Fused layer norm #3591
- BiasAdd Row Col Half2 #3636
- MaskAndScaleHalf2 #3643
- Optimize CudaAsyncMemoryCopier #3543
- Avoid using local memory in CropMirrorNormalizeGpuKernel #3539
- LayerNormGpuKernel use fused InstanceScaleCenter #3573
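The idea behind a fused BatchNorm + add + ReLU kernel with a bit mask can be illustrated with a minimal NumPy sketch. This is not OneFlow's kernel code: the function names, argument layout, and the use of `np.packbits` are assumptions made for illustration. The sketch computes the normalization, the residual add, and the ReLU in one pass, and stores one bit per element so the backward pass does not need to keep the full pre-activation tensor.

```python
import numpy as np


def bn_add_relu_forward(x, addend, mean, var, gamma, beta, eps=1e-5):
    """Hypothetical fused forward: normalize, add the residual, apply ReLU.

    Returns the output and a bit-packed mask (1 bit per element) marking
    where the pre-activation value was positive.
    """
    pre = gamma * (x - mean) / np.sqrt(var + eps) + beta + addend
    positive = pre > 0
    y = np.where(positive, pre, 0.0)
    mask = np.packbits(positive.ravel())  # 8 elements per stored byte
    return y, mask


def relu_backward_from_mask(dy, mask):
    """Hypothetical backward for the ReLU part, using only the bit mask."""
    bits = np.unpackbits(mask)[: dy.size]  # drop the padding bits
    positive = bits.astype(bool).reshape(dy.shape)
    return np.where(positive, dy, 0.0)
```

Under these assumptions the mask costs one eighth of a byte per element, compared with two or four bytes per element for a saved half or float activation.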
Implement model update ops as user ops, and support fusion for model update ops
- Add model update user ops #3546
- Migrate L1L2RegularizeGradientOp to UserOp Framework #3527
- model update fuse scalar_mul_by_tensor #3635
- Dev indexed slices model update user ops #3561
- Dev adam xla and rm sys op #3584
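As a rough illustration of what fusing gradient scaling and L1/L2 regularization into a model update means, here is a minimal NumPy sketch. It uses the conventional formulas (an L1 term of `l1 * sign(w)` and an L2 term of `l2 * w`) and a plain SGD step. The function names and parameters are hypothetical and do not reflect OneFlow's actual op signatures.

```python
import numpy as np


def l1_l2_regularize_gradient(weight, grad, l1=0.0, l2=0.0):
    """Conventional L1/L2 regularized gradient (assumed formula)."""
    return grad + l1 * np.sign(weight) + l2 * weight


def fused_sgd_update(weight, grad, lr, scale=1.0, l1=0.0, l2=0.0):
    """Hypothetical fused update: scale, regularize, and step in one call.

    Unfused, each of these would be a separate op with its own pass over
    the weight and gradient tensors.
    """
    g = l1_l2_regularize_gradient(weight, scale * grad, l1, l2)
    return weight - lr * g
```

The benefit of fusion in this setting is fewer kernel launches and fewer reads and writes of the intermediate gradient tensor.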
NCCL supports setting the maximum number of fused ops
- Add nccl_fusion_max_ops #3567
New ops
- [feature] Fused ImageDecoderRandomCropResize #3644
- Add AmpWhiteIdentityOp #3658
- Add ImageDecoderRandomCropResizeOp::InferParallelSignature #3646
- Dev add op tril #3511
- add masked fill op #3515
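For readers unfamiliar with the last two ops, the following NumPy sketch shows the conventional semantics of `tril` and masked fill. It assumes the new ops follow the usual definitions (keep the lower triangle; replace elements where the mask is true). It is a reference for the math only, not a use of the OneFlow API, and the helper names are hypothetical.

```python
import numpy as np


def tril_reference(x):
    """Keep the lower triangle (including the diagonal), zero the rest."""
    return np.tril(x)


def masked_fill_reference(x, mask, value):
    """Replace elements of x with value wherever mask is true (assumed convention)."""
    return np.where(mask, np.asarray(value, dtype=x.dtype), x)


x = np.arange(1, 10, dtype=np.float32).reshape(3, 3)
lower = tril_reference(x)
filled = masked_fill_reference(x, x > 5, -1.0)
```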
cuDNN algorithm inference supports a global cache
- Add CudnnConvAlgoCache #3649
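The general pattern of a global algorithm cache can be sketched as a memoization table keyed by the convolution configuration. The class below is a hypothetical Python illustration, not the `CudnnConvAlgoCache` implementation: the key layout and the `infer` callback are assumptions.

```python
from typing import Callable, Dict, Hashable


class ConvAlgoCache:
    """Hypothetical cache: run the expensive algorithm search once per config."""

    def __init__(self) -> None:
        self._cache: Dict[Hashable, str] = {}
        self.searches = 0  # number of times the search actually ran

    def get_or_infer(self, key: Hashable, infer: Callable[[], str]) -> str:
        if key not in self._cache:
            self._cache[key] = infer()
            self.searches += 1
        return self._cache[key]


cache = ConvAlgoCache()
# Assumed key: (input shape, filter shape, data type).
key = ((8, 3, 224, 224), (64, 3, 7, 7), "float32")
first = cache.get_or_infer(key, lambda: "algo_found_by_search")
second = cache.get_or_infer(key, lambda: "never_called")
```

With a cache of this shape, ops that share the same configuration reuse one search result instead of each repeating the search.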
Bug fixes and other changes
- fix broadcast div grad #3525
- fix optimizer copy-paste bug #3508
- fix bug about pad value #3640
- Optimize some default values #3648
- Fix cuda runtime #3621
- Fix reshape inplace #3545
- Refactor rmsprop mean_square and add unit tests for optimizers #3523
- Remove cuDNN fields from OperatorConf #3536
- Add UserOpConfWrapperBuilder::ScopeSymbolId #3528
- Fix NcclCollectiveBoxing builder_name #3563
- rm conv2d cpu testcase #3574
- fix broadcast_to_compatible_with grad bug #3609
- Add inline for half #3600
- Fix converter half #3599
- Fix gpu_atomic_max double overload use fmaxf #3578
- fix upsample #3579
Eager Execution
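Added more comments to the eager-related code; adjusted the stateless_call instruction to distinguish between its two kinds of arguments, mutable_input and output; implemented the broadcast instruction.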
- fix fmt cuda_copy_d2h_stream_type #3606
- add comments for cuda_copy_d2h_stream_type.cpp #3603
- Fix TopoForEachNode in GenCollectiveBoxingPlan #3566
- Split call_op_kernel instruction args into const_input/mutable_input/output #3562
- split BlobObject and EagerBlobObject #3485
- remove unused code under vm/ #3585
- Dev broadcast instruction #3555
- Broadcast instruction #3552
pybind11 integration
SWIG and pybind11 now coexist in OneFlow; the codebase will gradually switch over to pybind11.
- pybind11 integration #3517
- upgrad to pybind11 master and pass exe path #3522
- Update rel script for pybind11 #3526
- Dev oneflow pybind api #3625
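Build tooling optimizations and fixes
Fixed some unreasonable configurations that caused build failures or slow builds, sped up dependency downloads, and fixed the Ubuntu Dockerfile.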
- [bug] fix ubuntu docker build #3504
- change link order to fix the cpu+openblas build #3634
- [bug] fix bug: oneflow cpu-only lib flags #3615
- add convert_url_to_oss_https_url and DCN flag #3595
- Add cn url in readme #3583
- make absl use tar not git #3570
- Optimize nvcc gencode flag #3577
Transport (network transport subsystem)
Supports dynamic peer-to-peer (P2P) network transport
- [feature] Transport #3549
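CFG tool integration
CFG is a tool based on proto syntax that generates code for exchanging data between Python and C++.
XLA support improvements
Upgraded to the latest version of TF
GRPC upgrade
Upgraded to the latest version of GRPC
CI and test improvements
Added XLA to CI, improved the op test cases, and set up automatic upload of the latest master commit.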
- Parallel unit tests (Step 1, refactor existing unit tests) #3632
- Add build type for pr oss upload #3627
- XLA ci support #3564
- Auto upload tar to aliyun oss #3592
- Don't pack source code if it is not master #3593
- move fmt to github hosted #3559
- refactor ci #3557
- CtrlTest find available port for ctrl port instead of handwriting #3610
ONNX support
Optimized the IR and updated the test scripts
- onnx update #3495
Documentation additions and revisions
Python frontend fixes
- Fix the bug of using op_module_builder in namespace scope #3513
- Comment release global for now to avoid random crash in python #3629
- update lib name in link flags #3623
- rm spaces in rm_spaces optimizer.py #3619
Optimizations and fixes for common system components
- [enhancement] flat ErrorProto error_type #3474
- [enhancement] Added user_op_conf getter for BatchAxisContext/KernelInitContext/SbpContext #3506
- [bug] Fix UserOpConfWrapper::has_input/has_output #3507
- support reflecting cfg message #3655
- Refactor scope #3652
- Refactor placement scope #3650
- Bugfix split config proto and session job set #3637
- [Bug fix] Release global variables #3624
- Add OpRegistry::SetAreaId #3608
- Dev converter #3580
- Tensor::dptr support half #3582
- Use InferOutBlobDescsIf instead of InferBlobDescsIf in InferOpNodeLogicalBlobDesc #3535
- Add ctrl_in_op_name only when unreachable #3537