# llm-datasets

Here are 12 public repositories matching this topic...

neo4j-labs / text2cypher

Collection of text2cypher datasets, evaluations, and fine-tuning instructions

neo4j graph cypher cypher-query-language llm llms llm-training llm-datasets text2cypher

Updated Jun 13, 2024
Jupyter Notebook

dsdanielpark / open-llm-datasets

Repository for organizing datasets and papers used in Open LLM.

natural-language-processing datasets large-language-models llm llm-training llm-datasets

Updated Jul 6, 2023

discus-labs / discus

A data-centric AI package for ML/AI. Get the best high-quality data for the best results. Discord: https://discord.gg/t6ADqBKrdZ

python openai gpt synthetic-data fine-tuning synthetic-dataset-generation ner-data huggingface-transformers gpt-4 large-language-models llms llm-training llm-datasets fine-tuning-llm

Updated Nov 20, 2023
Python

asimsinan / LLM-Research

A collection of LLM-related papers, theses, tools, datasets, courses, open-source models, and benchmarks

arxiv-papers large-language-models llm llms llm-datasets llm-tools buyuk-dil-modelleri llm-research llm-theses llm-benchmarking llm-frameworks

Updated Oct 8, 2024
Python

altunenes / rustysozluk

Efficiently fetch eksisozluk.com entries and run sentiment analysis (Turkish only) on them using Rust

rust scraper sentiment-analysis turkish eksisozluk rust-lang webscraping eksi-sozluk reqwest duyguanalizi rust-scraping llm-training llm-datasets

Updated Feb 8, 2024
Rust

DefinetlyNotAI / LLM_Data

Source code from a number of very famous repos, in Python, gathered in this repo as pure localdocs to train code AI

c data cpp cuda jupyter-notebook python3 code-examples llm llm-datasets data-dum programming-data programming-data-sets llm-code

Updated Sep 14, 2024
Python

arian-askari / SOLID

Synthetically generating intent-aware information-seeking dialogues. Useful for tasks such as training and evaluating user intent predictors, with the possibility of training and evaluating on real human dialogues. The backbone LLM of SOLID is Zephyr-7b-beta.

solid dataset-generation conversational-ai intent-classification llm-training llm-inference llm-datasets llm-dialogs llm-conversations zephyr-7b-beta intent-aware-conversation-generation solid-rl

Updated Aug 18, 2024
Python

tiddly-gittly / TiddlyWiki-LLM-dataset

WikiText syntax dataset generation pipeline and open dataset for auto UI generation in TiddlyWiki. (WIP)

dataset tiddlywiki wikitext llm llm-training llm-datasets

Updated Nov 20, 2024
TypeScript

redblock-ai / parrot-python

PARROT (Performance Assessment of Reasoning and Responses On Trivia) is a novel benchmarking framework designed to evaluate Large Language Models (LLMs) on real-world, complex, and ambiguous QA tasks.

benchmarking-framework llm-inference llm-datasets llm-qa-document llm-benchmarking

Updated Oct 14, 2024
Python

aloobun / ccpem-modified

A modified dataset consisting of English dialogs between a user and an assistant discussing movie preferences in natural language.

dataset llm-datasets

Updated Sep 29, 2023

jsurrea / LLM-Latino

Collection of ETL scripts used to create a dataset of text in Spanish to train Large Language Models.

python web-scraping google-cloud-platform etl-pipeline llm-datasets

Updated Aug 5, 2024
Python

aloobun / basedUX

Minimal dataset consisting of 363 Human & Assistant dialogs

dataset llm-datasets

Updated Oct 1, 2023
