ccmehmet/Llama2-Finetuned-for-Translation

Fine-Tuned Llama-2 For Machine Translation

This repository contains the code for fine-tuning Meta's Llama-2 model for neural machine translation (NMT) from Bengali to English.

Task

Chosen Task:
Neural Machine Translation (NMT) from Bengali to English

Why did I choose it?
I have been working with neural machine translation for a while, and for my research I am exploring different machine translation models. Since any LLM can be fine-tuned for a specific task, I wanted to see how well an LLM performs at machine translation. I also had a good dataset available, so this task was a natural choice for me.

Dataset

Base Dataset: BUET-BanglaNMT Dataset (2.5 million pairs)
Preprocessed Dataset: Preprocessed Dataset (2.1 million pairs)
Small Dataset: Small Dataset (200k pairs)

Why did I choose this dataset?
This is one of the largest Bengali-to-English parallel corpora available. I formatted the dataset for my task according to the model's expected input. I started with the full dataset, but due to limited compute and time I also created a smaller subset and fine-tuned the model on that.
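As an illustration of this kind of cleaning step, the sketch below drops empty and overlong pairs from a parallel corpus. The filtering criteria and the `max_words` threshold are assumptions for illustration, not the exact preprocessing pipeline used in the notebooks.

```python
# Illustrative sketch of filtering a parallel corpus; the real
# preprocessing that reduced 2.5M pairs to 2.1M may differ.

def clean_pairs(pairs, max_words=128):
    """Drop pairs where either side is empty or exceeds max_words words."""
    cleaned = []
    for bn, en in pairs:
        bn, en = bn.strip(), en.strip()
        if not bn or not en:
            continue  # skip pairs missing a source or target side
        if len(bn.split()) > max_words or len(en.split()) > max_words:
            continue  # skip pairs too long for the training context
        cleaned.append((bn, en))
    return cleaned

raw = [("আমি ভাত খাই।", "I eat rice."), ("", "empty source"), ("ঠিক আছে", "")]
print(clean_pairs(raw))  # keeps only the first, complete pair
```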

I have used the BUET-BanglaNMT Dataset from HuggingFace. It contains around 2.5 million Bengali-English sentence pairs.
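For fine-tuning, each sentence pair has to be rendered as a single training string in the model's expected format. A minimal sketch, assuming the Llama-2-chat `[INST]` prompt template; the exact prompt and instruction wording used in the notebooks may differ:

```python
# Hypothetical sketch: wrap a Bengali-English pair in a Llama-2-chat
# style [INST] prompt for supervised fine-tuning. The instruction text
# is an assumption, not the one actually used in this repository.

def format_translation_example(bengali: str, english: str) -> str:
    """Render one parallel pair as a single supervised training string."""
    instruction = "Translate the following Bengali sentence to English."
    return f"<s>[INST] {instruction}\n{bengali} [/INST] {english} </s>"

example = format_translation_example("আমি ভাত খাই।", "I eat rice.")
print(example)
```

Applying such a function over the cleaned corpus yields the text column that a trainer like `SFTTrainer` can consume directly.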

Model

I have used Meta's Llama-2 model. This is my fine-tuned adapter: Fine-Tuned Llama-2
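A minimal inference sketch for loading the LoRA adapter on top of the base model with `transformers` and `peft`. The adapter repo id `your-username/llama2-bn-en-adapter` is a placeholder (the real adapter name is behind the link above), and the heavy imports are kept inside the loader so the sketch can be read without those libraries installed:

```python
def build_inference_prompt(bengali: str) -> str:
    """Build the same [INST] prompt used at training time, minus the target."""
    instruction = "Translate the following Bengali sentence to English."
    return f"<s>[INST] {instruction}\n{bengali} [/INST]"

def load_translator(base_id: str = "meta-llama/Llama-2-7b-hf",
                    adapter_id: str = "your-username/llama2-bn-en-adapter"):
    """Load the (gated) base Llama-2 model and apply the LoRA adapter.

    adapter_id is a placeholder, not the actual adapter repo name.
    Requires access to the gated Llama-2 weights on HuggingFace.
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base = AutoModelForCausalLM.from_pretrained(
        base_id, torch_dtype=torch.float16, device_map="auto"
    )
    model = PeftModel.from_pretrained(base, adapter_id)
    return tokenizer, model
```

With the model loaded, translating a sentence is a matter of tokenizing `build_inference_prompt(...)`, calling `model.generate`, and decoding the continuation after `[/INST]`.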
