CivAgent is an LLM-based Human-like Agent acting as a Digital Player within the Strategy Game Unciv.
A framework for automatically manipulating and evaluating the political ideology of LLMs with two ideology tests: Wahl-O-Mat and Political Compass Test.
Use LLMs for web scraping (data collection)
A prompt collection for testing and evaluation of LLMs.
The prompt engineering, prompt management, and prompt evaluation tool for Java.
Visualize LLM Evaluations for OpenAI Assistants
The prompt engineering, prompt management, and prompt evaluation tool for Ruby.
An open source library for asynchronous querying of LLM endpoints
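The asynchronous-querying pattern such a library implements can be sketched with Python's asyncio. The endpoint call below is a stub (`fake_llm_call` is a hypothetical name, not the library's API), since real client interfaces vary:

```python
import asyncio

async def fake_llm_call(prompt: str) -> str:
    # Stand-in for a real HTTP call to an LLM endpoint (hypothetical).
    await asyncio.sleep(0.01)  # simulate network latency
    return f"response to: {prompt}"

async def query_all(prompts):
    # Fan out all requests concurrently; gather preserves input order.
    return await asyncio.gather(*(fake_llm_call(p) for p in prompts))

results = asyncio.run(query_all(["a", "b", "c"]))
```

Issuing the calls concurrently means total wall time is bounded by the slowest single request rather than the sum of all of them.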
Litmus tests HTTP requests and responses, including those from LLMs. Users define expected results for specific requests; Litmus sends those requests, compares the responses against the expected results (using LLMs), and reports any discrepancies, ensuring the accuracy and consistency of API responses through comprehensive end-to-end testing.
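The request/expected-result comparison loop Litmus describes can be sketched as follows. All names here are illustrative, not Litmus's actual API, and the HTTP call is replaced by a canned stub:

```python
def send_request(request: str) -> dict:
    # Stub standing in for a real HTTP call (hypothetical responses).
    canned = {"/health": {"status": "ok"}, "/version": {"version": "1.0"}}
    return canned.get(request, {})

def run_litmus_style_tests(cases):
    # For each (request, expected) pair, send the request, compare the
    # response to the expectation, and collect any discrepancies.
    discrepancies = []
    for request, expected in cases:
        actual = send_request(request)
        if actual != expected:
            discrepancies.append((request, expected, actual))
    return discrepancies

cases = [("/health", {"status": "ok"}), ("/version", {"version": "2.0"})]
report = run_litmus_style_tests(cases)
```

Here the `/version` case fails (expected `2.0`, got `1.0`), so it is the one discrepancy reported; an LLM-based comparison would replace the strict equality check with a semantic judgment.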
Dive into the world of LLM guardrails using tools like NVIDIA's NeMo Guardrails. Discover the mechanisms that ensure applications produce reliable, robust, safe, and ethical outputs, and understand their crucial role in LLM applications.
The prompt engineering, prompt management, and prompt evaluation tool for TypeScript, JavaScript, and NodeJS.
The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism
Evaluate LLMs and RAG pipelines with custom reasoning functions and datasets using LangChain
LLM evaluation
ThinkBench is an LLM benchmarking tool focused on evaluating the effectiveness of chain-of-thought (CoT) prompting for answering multiple-choice questions.
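The kind of comparison ThinkBench performs — measuring multiple-choice accuracy with and without a chain-of-thought prompt — can be sketched as below. The "model" is a stub with purely illustrative behaviour, not ThinkBench's actual interface:

```python
def stub_model(prompt: str) -> str:
    # Hypothetical model: answers "B" when asked to reason step by
    # step, "A" otherwise (illustrative only).
    return "B" if "step by step" in prompt else "A"

def accuracy(questions, template: str) -> float:
    # Score the model on (question, gold answer) pairs under a given
    # prompt template.
    correct = 0
    for question, gold in questions:
        answer = stub_model(template.format(question=question))
        correct += answer == gold
    return correct / len(questions)

qs = [
    ("2+2=? (A) 5 (B) 4", "B"),
    ("Capital of France? (A) Rome (B) Paris", "B"),
]
direct_acc = accuracy(qs, "{question}")
cot_acc = accuracy(qs, "Let's think step by step. {question}")
```

Running the same question set under both templates isolates the effect of the CoT prefix, which is the quantity a benchmark like this reports.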