llm-evaluation
Here are 84 public repositories matching this topic...
Initiative to evaluate and rank the most popular LLMs across common task types based on their propensity to hallucinate.
Updated Nov 15, 2023
This repository contains the lab work for the Coursera course "Generative AI with Large Language Models".
Updated Dec 1, 2023 - Jupyter Notebook
Exploring the depths of LLMs 🚀
Updated Dec 7, 2023 - Jupyter Notebook
Template for an AI application that extracts job information from a job description using OpenAI functions and LangChain.
Updated Dec 21, 2023 - Python
Mind Your Format: Towards Consistent Evaluation of In-Context Learning Improvements
Updated Jan 18, 2024 - Jupyter Notebook
The implementation for the EMNLP 2023 paper "Beyond Factuality: A Comprehensive Evaluation of Large Language Models as Knowledge Generators".
Updated Jan 22, 2024 - Python
A simple GPT-based evaluation tool for multi-aspect, interpretable assessment of LLMs.
Updated Jan 29, 2024 - Python
Calibration Game: a game for getting better at identifying hallucinations in LLMs.
Updated Feb 4, 2024 - CSS
Code and data for Koo et al.'s ACL 2024 paper "Benchmarking Cognitive Biases in Large Language Models as Evaluators".
Updated Feb 16, 2024 - Jupyter Notebook
Upload, score, and visually compare multiple LLM-graded summaries simultaneously!
Updated Mar 8, 2024 - JavaScript
[ICLR 2024] Jaiswal, A., Gan, Z., Du, X., Zhang, B., Wang, Z., & Yang, Y. "Compressing LLMs: The Truth Is Rarely Pure and Never Simple."
Updated Mar 13, 2024 - Python
Evaluating LLMs with CommonGen-Lite
Updated Mar 21, 2024 - Python
Visualize LLM Evaluations for OpenAI Assistants
Updated Mar 27, 2024 - TypeScript
Link your OpenAI Assistants to a custom store and evaluate Assistant responses.
Updated May 24, 2024 - Python
A learning project demonstrating the LangChain framework, LangSmith for tracing, OpenAI LLM models, and a Pinecone serverless vector DB, built with Jupyter Notebook and Python.
Updated Mar 29, 2024 - Jupyter Notebook
A comprehensive resource hub compiling all LLM papers accepted at the International Conference on Learning Representations (ICLR) 2024.
Updated Apr 4, 2024
[Personalize@EACL 2024] LLM Agents in Interaction: Measuring Personality Consistency and Linguistic Alignment in Interacting Populations of Large Language Models.
Updated Apr 8, 2024 - Python