llm-evaluation
Here are 84 public repositories matching this topic...
Initiative to evaluate and rank the most popular LLMs across common task types based on their propensity to hallucinate.
Updated Nov 15, 2023
This repository contains the lab work for the Coursera course "Generative AI with Large Language Models".
Updated Dec 1, 2023 - Jupyter Notebook
Exploring the depths of LLMs 🚀
Updated Dec 7, 2023 - Jupyter Notebook
Template for an AI application that extracts job information from a job description using OpenAI functions and LangChain.
Updated Dec 21, 2023 - Python
Mind Your Format: Towards Consistent Evaluation of In-Context Learning Improvements
Updated Jan 18, 2024 - Jupyter Notebook
The implementation for the EMNLP 2023 paper "Beyond Factuality: A Comprehensive Evaluation of Large Language Models as Knowledge Generators".
Updated Jan 22, 2024 - Python
A simple GPT-based evaluation tool for multi-aspect, interpretable assessment of LLMs.
Updated Jan 29, 2024 - Python
Calibration Game: a game for getting better at identifying hallucinations in LLMs.
Updated Feb 4, 2024 - CSS
Code and data for Koo et al.'s ACL 2024 paper "Benchmarking Cognitive Biases in Large Language Models as Evaluators".
Updated Feb 16, 2024 - Jupyter Notebook
Upload, score, and visually compare multiple LLM-graded summaries simultaneously!
Updated Mar 8, 2024 - JavaScript
[ICLR 2024] Jaiswal, A., Gan, Z., Du, X., Zhang, B., Wang, Z., & Yang, Y. "Compressing LLMs: The Truth Is Rarely Pure and Never Simple."
Updated Mar 13, 2024 - Python
Evaluating LLMs with CommonGen-Lite
Updated Mar 21, 2024 - Python
Visualize LLM Evaluations for OpenAI Assistants
Updated Mar 27, 2024 - TypeScript
Link your OpenAI Assistants to a custom store and evaluate Assistant responses.
Updated May 24, 2024 - Python
A learning project demonstrating the LangChain framework, LangSmith for tracing, OpenAI LLM models, and a Pinecone serverless vector DB, built with Jupyter Notebook and Python.
Updated Mar 29, 2024 - Jupyter Notebook
A comprehensive resource hub compiling all LLM papers accepted at the International Conference on Learning Representations (ICLR) 2024.
Updated Apr 4, 2024
[Personalize@EACL 2024] LLM Agents in Interaction: Measuring Personality Consistency and Linguistic Alignment in Interacting Populations of Large Language Models.
Updated Apr 8, 2024 - Python