v0.2.0

jackalcooper released this 09 Oct 06:57


Changelog

v0.2.0 (09/10/2020)

Op fixes and performance optimizations

Support fusing a binary add op into its predecessor node

FuseAddToOutput #3524
Dropout support add_to_output #3569
Dev matmul add to output #3581
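
To illustrate what fusing a binary add into its predecessor can look like, here is a minimal pure-Python sketch. It is not OneFlow's implementation, and the names `unfused` and `matmul_add_to_output` are hypothetical. The unfused path writes the product to a temporary and then runs a separate add; the fused path lets the producer accumulate directly into the buffer that already holds the addend.

```python
def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]


def unfused(a, b, addend):
    """Two ops: matmul writes a temporary, then a separate add reads it."""
    tmp = matmul(a, b)
    return [[tmp[i][j] + addend[i][j] for j in range(len(tmp[0]))]
            for i in range(len(tmp))]


def matmul_add_to_output(a, b, out):
    """One op (hypothetical name): accumulate the product into `out` in place."""
    inner = len(b)
    for i in range(len(a)):
        for j in range(len(b[0])):
            out[i][j] += sum(a[i][k] * b[k][j] for k in range(inner))
    return out


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
addend = [[1, 1], [1, 1]]

expected = unfused(a, b, addend)
# The fused variant starts from a copy of the addend and accumulates into it.
fused = matmul_add_to_output(a, b, [row[:] for row in addend])
```

Both paths produce the same values; the fused one avoids the temporary buffer and the extra pass over it.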

Kernel performance optimizations

Fused BatchNormAddRelu #3519
bn_add_relu use bit mask #3645
layer_norm param grad #3604
Fused layer norm #3591
BiasAdd Row Col Half2 #3636
MaskAndScaleHalf2 #3643
Optimize CudaAsyncMemoryCopier #3543
Avoid using local memory in CropMirrorNormalizeGpuKernel #3539
LayerNormGpuKernel use fused InstanceScaleCenter #3573

Implement model update ops as user ops, and support fusion for model update ops

Add model update user ops #3546
Migrate L1L2RegularizeGradientOp to UserOp Framework #3527
model update fuse scalar_mul_by_tensor #3635
Dev indexed slices model update user ops #3561
Dev adam xla and rm sys op #3584

NCCL: support setting the maximum number of fused ops

Add nccl_fusion_max_ops #3567
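
As a rough illustration of what a cap on fused ops implies, the sketch below splits a sequence of collective ops into groups of at most `max_ops` each. This is an assumption about the general idea only; `group_for_fusion` is a hypothetical helper, and the actual grouping logic behind `nccl_fusion_max_ops` may use additional criteria.

```python
def group_for_fusion(op_names, max_ops):
    """Split ops into consecutive groups of at most `max_ops` (hypothetical helper)."""
    if max_ops < 1:
        raise ValueError("max_ops must be at least 1")
    return [op_names[i:i + max_ops] for i in range(0, len(op_names), max_ops)]


ops = ["all_reduce_%d" % i for i in range(5)]
groups = group_for_fusion(ops, max_ops=2)
```

With five ops and a cap of two, the last group holds the single leftover op.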

New ops

[feature] Fused ImageDecoderRandomCropResize #3644
Add AmpWhiteIdentityOp #3658
Add ImageDecoderRandomCropResizeOp::InferParallelSignature #3646
Dev add op tril #3511
add masked fill op #3515
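
For readers unfamiliar with the last two ops, here is a pure-Python sketch of the conventional semantics of `tril` and masked fill. It assumes the usual definitions (keep the lower triangle; replace elements where the mask is set) and does not call OneFlow; the function signatures are illustrative, not the library's API.

```python
def tril(matrix, diagonal=0):
    """Keep elements on or below the given diagonal; zero out the rest."""
    return [[v if j - i <= diagonal else 0 for j, v in enumerate(row)]
            for i, row in enumerate(matrix)]


def masked_fill(matrix, mask, value):
    """Replace elements with `value` wherever the mask is truthy."""
    return [[value if m else v for v, m in zip(row, mask_row)]
            for row, mask_row in zip(matrix, mask)]


x = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
mask = [[0, 1, 0],
        [0, 0, 1],
        [1, 0, 0]]

lower = tril(x)
filled = masked_fill(x, mask, -1)
```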

cuDNN algorithm inference supports a global cache

Add CudnnConvAlgoCache #3649
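
The idea of a global algorithm cache can be sketched as memoization keyed by the convolution configuration. The class below is a hypothetical stand-in, not the `CudnnConvAlgoCache` from the PR: the key layout and the `fake_search` function are assumptions made for illustration, and no cuDNN call is involved.

```python
import threading


class ConvAlgoCache:
    """Process-wide memo of algorithm choices, keyed by conv configuration."""

    def __init__(self, search_fn):
        self._search_fn = search_fn      # expensive search, run once per key
        self._cache = {}
        self._lock = threading.Lock()    # guard concurrent lookups

    def get(self, key):
        with self._lock:
            if key not in self._cache:
                self._cache[key] = self._search_fn(key)
            return self._cache[key]


search_calls = []


def fake_search(key):
    """Stand-in for an expensive algorithm search; records each invocation."""
    search_calls.append(key)
    return "algo_%d" % len(search_calls)


cache = ConvAlgoCache(fake_search)
# Hypothetical key: (input shape, filter shape, data type).
key = ((8, 3, 224, 224), (64, 3, 7, 7), "float16")
first = cache.get(key)
second = cache.get(key)
```

The second lookup returns the stored choice without running the search again, which is the saving a shared cache is meant to provide when many ops have identical configurations.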

Bug fixes and other changes

fix broadcast div grad #3525
fix optimizer copy-paste bug #3508
fix bug about pad value #3640
Optimize some default values #3648
Fix cuda runtime #3621
Fix reshape inplace #3545
Refactor rmsprop mean_square and add unit tests for optimizers #3523
Remove cuDNN fields from OperatorConf #3536
Add UserOpConfWrapperBuilder::ScopeSymbolId #3528
Fix NcclCollectiveBoxing builder_name #3563
rm conv2d cpu testcase #3574
fix broadcast_to_compatible_with grad bug #3609
Add inline for half #3600
Fix converter half #3599
Fix gpu_atomic_max double overload use fmaxf #3578
fix upsample #3579

Eager Execution

Added more comments to eager-related code; adjusted the stateless_call instruction to distinguish between mutable_input and output arguments; implemented the broadcast instruction.

fix fmt cuda_copy_d2h_stream_type #3606
add comments for cuda_copy_d2h_stream_type.cpp #3603
Fix TopoForEachNode in GenCollectiveBoxingPlan #3566
Split call_op_kernel instruction args into const_input/mutable_input/output #3562
split BlobObject and EagerBlobObject #3485
remove unused code under vm/ #3585
Dev broadcast instruction #3555
Broadcast instruction #3552

pybind11 integration

SWIG and pybind11 now coexist in OneFlow; the codebase will gradually switch to pybind11.

pybind11 integration #3517
upgrad to pybind11 master and pass exe path #3522
Update rel script for pybind11 #3526
Dev oneflow pybind api #3625

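Build tooling optimizations and fixes

Fixed some unreasonable configurations that caused build failures or slow builds, sped up dependency downloads, and fixed the Ubuntu Dockerfile.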

[bug] fix ubuntu docker build #3504
change link order to fix the cpu+openblas build #3634
[bug] fix bug: oneflow cpu-only lib flags #3615
add convert_url_to_oss_https_url and DCN flag #3595
Add cn url in readme #3583
make absl use tar not git #3570
Optimize nvcc gencode flag #3577

Transport: network transport subsystem

Supports dynamic P2P network transport.

[feature] Transport #3549

CFG tool integration

CFG is a tool, based on proto syntax, that generates code for exchanging data between Python and C++.

Dev integrate cfg #3597
Less usage of PbMessage in Operator #3651

XLA support improvements

Upgraded to the latest TF release (2.3.0).

upgrade XRT XLA to TF 2.3.0 #3531
Fix XLA crash #3548

GRPC upgrade

Upgraded to the latest GRPC version.

Upgrade grpc #3551
[bug] [bugfix] GRPC: control server CompletionQueue shutdown. #3589

CI and test improvements

Added XLA to CI, improved op test cases, and automatically upload the latest master commit.

Parallel unit tests (Step 1, refactor existing unit tests) #3632
Add build type for pr oss upload #3627
XLA ci support #3564
Auto upload tar to aliyun oss #3592
Don't pack source code if it is not master #3593
move fmt to github hosted #3559
refactor ci #3557
CtrlTest find available port for ctrl port instead of handwriting #3610

ONNX support

Optimized the IR and updated test scripts.

onnx update #3495

Documentation additions and revisions

Add api docs zzk #3505
Add api docs zzk #3533
Add api docs zzk #3514
fix masked_fill op doc #3560

Python frontend fixes

Fix the bug of using op_module_builder in namespace scope #3513
Comment release global for now to avoid random crash in python #3629
update lib name in link flags #3623
rm spaces in rm_spaces optimizer.py #3619

Optimizations and fixes for common system components

[enhancement] flat ErrorProto error_type #3474
[enhancement] Added user_op_conf getter for BatchAxisContext/KernelInitContext/SbpContext #3506
[bug] Fix UserOpConfWrapper::has_input/has_output #3507
support reflecting cfg message #3655
Refactor scope #3652
Refactor placement scope #3650
Bugfix split config proto and session job set #3637
[Bug fix] Release global variables #3624
Add OpRegistry::SetAreaId #3608
Dev converter #3580
Tensor::dptr support half #3582
Use InferOutBlobDescsIf instead of InferBlobDescsIf in InferOpNodeLogicalBlobDesc #3535
Add ctrl_in_op_name only when unreachable #3537
