A WebUI for Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
Argilla is a collaboration platform for AI engineers and domain experts who require high-quality outputs, full data ownership, and overall efficiency.
⚗️ distilabel is a synthetic data and AI feedback framework for AI engineers who require high-quality outputs, full data ownership, and overall efficiency.
Official release of the InternLM2.5 7B base and chat models, with 1M-token context support.
Language Modeling Research Hub, a comprehensive compendium for enthusiasts and scholars delving into the fascinating realm of language models (LMs), with a particular focus on large language models (LLMs)
Aligning a GPT-2 model to generate non-toxic text.
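As a reduced illustration of this kind of detoxification loop (a real pipeline would typically use PPO with a KL penalty to a reference model, e.g. via trl), here is a single REINFORCE-style step that downweights toxic continuations using an off-the-shelf toxicity classifier. The prompt and the `unitary/toxic-bert` classifier are assumptions, not the repo's exact setup:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer, pipeline

tok = GPT2Tokenizer.from_pretrained("gpt2")
policy = GPT2LMHeadModel.from_pretrained("gpt2")
toxicity = pipeline("text-classification", model="unitary/toxic-bert")
opt = torch.optim.AdamW(policy.parameters(), lr=1e-5)

prompt = tok("The internet is", return_tensors="pt")
sample = policy.generate(**prompt, max_new_tokens=20, do_sample=True,
                         pad_token_id=tok.eos_token_id)
text = tok.decode(sample[0], skip_special_tokens=True)
reward = -toxicity(text)[0]["score"]  # top-label score as a toxicity proxy

# REINFORCE step: scale the sample's NLL by its (negative) reward so
# toxic continuations become less likely. No baseline or KL term here.
nll = policy(sample, labels=sample).loss
loss = reward * nll
opt.zero_grad(); loss.backward(); opt.step()
```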
A Python client library for improving your LLM app's accuracy.
SimPO: Simple Preference Optimization with a Reference-Free Reward
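For orientation, SimPO replaces DPO's reference-model log-ratios with length-normalized policy log-probabilities and adds a target reward margin. A minimal PyTorch sketch of that loss; tensor names and the β/γ defaults are illustrative, not the repo's API:

```python
import torch
import torch.nn.functional as F

def simpo_loss(chosen_logps, rejected_logps, chosen_len, rejected_len,
               beta=2.0, gamma=1.0):
    # Reference-free implicit reward: length-normalized policy log-likelihood.
    r_chosen = beta * chosen_logps / chosen_len
    r_rejected = beta * rejected_logps / rejected_len
    # Bradley-Terry logistic loss with a target reward margin gamma.
    return -F.logsigmoid(r_chosen - r_rejected - gamma).mean()
```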
Tracking instruction-tuned LLM openness. Paper: Liesenfeld, Andreas, Alianda Lopez, and Mark Dingemanse. 2023. “Opening up ChatGPT: Tracking Openness, Transparency, and Accountability in Instruction-Tuned Text Generators.” In Proceedings of the 5th International Conference on Conversational User Interfaces. doi:10.1145/3571884.3604316.
The official implementation of Self-Play Preference Optimization (SPPO)
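The paper frames each self-play iteration as a squared-loss regression: push the log-ratio between the updated policy and the current policy toward a scaled, centered estimate of the response's win probability. A hedged sketch under that reading; names and the η default are illustrative:

```python
import torch

def sppo_loss(policy_logps, current_logps, win_prob, eta=1e3):
    # Regress log(pi_theta / pi_t) onto the centered, scaled estimate
    # of P(y beats the current policy pi_t | x) from a preference model.
    log_ratio = policy_logps - current_logps
    target = eta * (win_prob - 0.5)
    return ((log_ratio - target) ** 2).mean()
```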
An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.
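At its core, this style of evaluator reports a pairwise win rate against a baseline, with an LLM acting as judge. A minimal sketch of the metric; the `judge` callable is a hypothetical stand-in for the annotator:

```python
def win_rate(model_outputs, baseline_outputs, judge):
    """Pairwise win rate: the judge (in practice an LLM annotator) picks
    the better output per instruction; ties count as half a win."""
    wins = 0.0
    for ours, theirs in zip(model_outputs, baseline_outputs):
        verdict = judge(ours, theirs)  # returns "A", "B", or "tie"
        wins += {"A": 1.0, "tie": 0.5}.get(verdict, 0.0)
    return wins / len(model_outputs)
```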
Video Diffusion Alignment via Reward Gradients. We improve a variety of video diffusion models, such as VideoCrafter, OpenSora, ModelScope, and StableVideoDiffusion, by fine-tuning them with reward models such as HPS, PickScore, VideoMAE, VJEPA, YOLO, and aesthetic scorers.
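Stripped to a toy, the recipe is: run a few differentiable sampler steps, score the result with a differentiable reward model, and backpropagate the reward gradient into the denoiser's weights. All modules below are toy stand-ins, not the repo's models:

```python
import torch

# Toy stand-ins: a denoiser and a differentiable reward model.
denoiser = torch.nn.Linear(16, 16)
reward_model = torch.nn.Linear(16, 1)
opt = torch.optim.AdamW(denoiser.parameters(), lr=1e-4)

for _ in range(100):
    x = torch.randn(8, 16)          # start from pure noise
    for _ in range(4):              # a few differentiable denoising steps
        x = x - 0.1 * denoiser(x)
    loss = -reward_model(x).mean()  # ascend the reward's gradient
    opt.zero_grad(); loss.backward(); opt.step()
```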
[ICML 2024] Code for the paper "Confronting Reward Overoptimization for Diffusion Models: A Perspective of Inductive and Primacy Biases"
$\beta$-DPO: Direct Preference Optimization with Dynamic $\beta$
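One way to read the idea: keep the standard DPO logistic loss but calibrate β per batch from the implicit-reward margin. The sketch below uses a simplified batch-level rule; the paper's exact calibration and its data filtering differ:

```python
import torch
import torch.nn.functional as F

def dynamic_beta_dpo_loss(pi_w, pi_l, ref_w, ref_l,
                          beta0=0.1, alpha=0.6, m0=0.0):
    # Implicit-reward margin of the standard DPO objective.
    margin = (pi_w - ref_w) - (pi_l - ref_l)
    # Batch-level calibration: raise beta when the batch's mean margin
    # exceeds the target m0, lower it otherwise (simplified).
    beta = beta0 * (1.0 + alpha * (margin.mean().detach() - m0))
    return -F.logsigmoid(beta * margin).mean()
```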
RewardBench: the first evaluation tool for reward models.
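The benchmark's core metric is simple: a reward model passes an example when it scores the chosen response above the rejected one, and accuracy is averaged over pairs. A sketch, with `reward_model` as an assumed scoring callable:

```python
def pairwise_accuracy(reward_model, prompts, chosen, rejected):
    # A reward model "passes" an example when it scores the chosen
    # response above the rejected one; the benchmark reports accuracy.
    hits = sum(reward_model(p, c) > reward_model(p, r)
               for p, c, r in zip(prompts, chosen, rejected))
    return hits / len(prompts)
```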