Use pre-trained BERT weights with TorchSharp-defined BERT for NLP tasks #457
Replies: 9 comments 7 replies
-
@fwaris Please look at these transformer diagrams, created from GPT-2.ONNX using a .NET WinForms application. What you have done is exciting. The community needs a way to reverse engineer the Hugging Face models (TensorFlow or ONNX) into TorchSharp models. We need your feedback on the following questions:
This is a quick write-up. Essentially, we need UI-assisted reverse engineering of Hugging Face models into TorchSharp (model and weight transfer), to assist and empower less experienced users to contribute to building up the .NET Hugging Face models. If possible, we learn best practices from Hugging Face. The .NET tokenizers can be provided by BlingFire. We will have a repository of tokenizers corresponding to what is available from Hugging Face. The WinForms application (for example) would download the right tokenizers and provide code generation for encoding and decoding. It does not have to be WinForms; it could also be a .NET library that generates the needed diagrams for .NET Interactive or another UI (web: Blazor, or WPF/UWP). What you have done brings us closer to that vision. It is about time that we pool the community's effort into a .NET (community-driven) Hugging Face solution.
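As a sketch of what a shared tokenizer repository would provide, here is a toy greedy longest-match (WordPiece-style) subword tokenizer. It is written in dependency-free Python purely to illustrate the algorithm that libraries like BlingFire implement natively; the `##` continuation convention follows BERT's tokenizer, and the vocabulary below is made up for the example - this is not any specific BlingFire API.

```python
def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first subword split, BERT WordPiece style.

    Non-initial pieces carry a '##' prefix. If no prefix of the
    remaining text matches the vocabulary, the whole word maps to unk.
    """
    tokens, start = [], 0
    while start < len(word):
        end = len(word)
        match = None
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # continuation-piece convention
            if piece in vocab:
                match = piece
                break
            end -= 1  # shrink the candidate and retry
        if match is None:
            return [unk]
        tokens.append(match)
        start = end
    return tokens
```

For example, with a vocabulary containing `un`, `##aff`, and `##able`, the word `unaffable` splits into those three pieces; a word with no matching prefix collapses to `[UNK]`.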
-
@mfagerlund I know your hands are full NOW. However, if you have time to spare at the end of the year or the beginning of next year, do put a bit of your brain power into this (VERY COMPLEX, but requiring genius design considerations) overdue .NET deep-NLP vision :-)
-
I was assigned a new project today; it is about the research and application of NLP 😥
-
The basic transformer module exists in TorchSharp today; however, NLP with transformers is a large subject area these days, what with ChatGPT, GPT-3, GPT-4, etc. It's hard for individuals (or even small companies) to train models like ChatGPT due to the infrastructure required. ChatGPT further uses reinforcement learning with human supervision and, as far as I know, that data is not public. If you want state-of-the-art results, I suggest that you use the APIs provided by OpenAI or Microsoft. If you want to train a generative language model from scratch in .NET, see this code repo. It has the code for defining a GPT-3 style model and training it with a small corpus.
-
@fwaris has done much for GPT. You need to bring that from F# to C#.
-
You already know deep reinforcement learning; try to understand https://github.com/fwaris/LangModel and port it from F# to C#, if you have time.
-
@ChengYen-Tang
-
I do not have enough time to examine how to port the F# code in https://github.com/fwaris/LangModel to C#. Since we do not YET have a TorchText for .NET, I wonder how to re-purpose the many projects you have written for NLP/NLG in F# to C#, with specific interest in initiating a TorchText for .NET. See: "Best approach for designing F# libraries for use from both F# and C#"; "F# - C# Interop".
-
The core models are written in TorchSharp.Fun - a thin functional wrapper over TorchSharp. TorchSharp.Fun models are easily convertible to C#. There are two basic cases:

F# Case 1:

```fsharp
Linear(100, 100) ->> Dropout(0.1) ->> ReLU()
```

Equivalent C# Case 1:

```csharp
var model = Sequential(
    ("lin1", Linear(100, 100)),
    ("drp1", Dropout(0.1)),
    ("rel1", ReLU()));
```

F# Case 2:

```fsharp
let lin1 = Linear(100, 100)
let drp1 = Dropout(0.1)
let rel1 = ReLU()
let myModel = F [] [lin1; drp1; rel1] (fun t -> t --> lin1 --> drp1 --> rel1)
```

Equivalent C# Case 2:

```csharp
public class MyModel : Module<Tensor, Tensor> {
    private Module<Tensor, Tensor> lin1 = Linear(100, 100);
    private Module<Tensor, Tensor> drp1 = Dropout(0.1);
    private Module<Tensor, Tensor> rel1 = ReLU();

    public MyModel(string name) : base(name) {
        RegisterComponents();
    }

    public override Tensor forward(Tensor input) {
        var l1 = this.lin1.forward(input);
        var d1 = this.drp1.forward(l1);
        return this.rel1.forward(d1);
    }
}
```

**Opinion**

Data science work requires formulating scores of small hypotheses (re model/parameters/features) and associated experiments to prove/disprove each. This is iterative work that is best done interactively. I use F# Interactive (REPL) for model development. TorchSharp.Fun was created to allow quick iteration on model structure in F# script code - it reduces boilerplate. Also, today, data scientists should be polyglots. I routinely use Python and Scala (Spark) at work, in addition to F#. If you are doing data science in .NET, then it's worthwhile learning F#.
-
The goal was to:
It was accomplished. The notebook is available in this repo. Output from a sample run is here: https://github.com/fwaris/BertTorchSharp/blob/master/saved_output.ipynb.
There were several hurdles to overcome:
To my surprise, the easiest to overcome was defining the BERT model. Given that PyTorch/TorchSharp support transformers directly, I was able to construct BERT easily - the code is less than 60 lines and quite readable.
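For intuition on why the model definition stays so small when the framework supplies the transformer building blocks, here is scaled dot-product attention - the core operation inside each transformer layer - sketched in dependency-free Python. This is a conceptual illustration only, not the TorchSharp code from the notebook; in TorchSharp it is handled by the built-in transformer/attention modules.

```python
import math

def softmax(xs):
    # numerically stable softmax over one row of scores
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def matmul(a, b):
    # (n, k) x (k, m) -> (n, m), lists-of-lists
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(r) for r in zip(*a)]

def attention(q, k, v):
    """softmax(Q K^T / sqrt(d)) V - one attention head, no masking."""
    d = len(q[0])
    scores = matmul(q, transpose(k))
    scaled = [[s / math.sqrt(d) for s in row] for row in scores]
    weights = [softmax(row) for row in scaled]  # rows sum to 1
    return matmul(weights, v)
```

Each output row is a convex combination of the value rows, with mixing weights determined by query-key similarity; stacking this with feed-forward layers and residual connections is essentially what the framework's transformer module packages up.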
The hardest (for me) was reading the checkpoints, which requires understanding the LevelDB file format. This functionality is now available as a NuGet package: TfCheckpoint.
The mapping of weights from the checkpoint to TorchSharp was tricky but not difficult. There are a few googlies to watch out for.
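One such googly - a general TensorFlow-vs-PyTorch convention difference, not something specific to this repo: TensorFlow stores a Dense layer's kernel with shape (in_features, out_features), while a PyTorch/TorchSharp Linear weight has shape (out_features, in_features), so dense kernels must be transposed when copied across. A minimal Python illustration of that transpose (the helper name is mine):

```python
def tf_dense_kernel_to_torch_linear(kernel):
    """Transpose a TF Dense kernel, shape (in, out), into the
    (out, in) layout that a PyTorch/TorchSharp Linear weight uses."""
    return [list(col) for col in zip(*kernel)]
```

Copying such a tensor without the transpose either fails on a shape mismatch or, worse for square layers, silently scrambles the weights - which is why this class of mapping bug is worth checking tensor by tensor.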
It's expensive and time-consuming to train BERT (or other language models). The ability to use pre-trained weights makes NLP tasks much easier to perform in TorchSharp.