CivAgent is an LLM-based Human-like Agent acting as a Digital Player within the Strategy Game Unciv.
A framework for automatically manipulating and evaluating the political ideology of LLMs with two ideology tests: Wahl-O-Mat and Political Compass Test.
Use LLMs for web scraping (data collection)
A prompt collection for testing and evaluation of LLMs.
The prompt engineering, prompt management, and prompt evaluation tool for Java.
Visualize LLM Evaluations for OpenAI Assistants
The prompt engineering, prompt management, and prompt evaluation tool for Ruby.
An open source library for asynchronous querying of LLM endpoints
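The asynchronous-querying pattern such a library implements can be sketched with Python's asyncio. The endpoint call below is a stub (`fake_llm_call` is a hypothetical name, not the library's API), since real client interfaces vary:

```python
import asyncio

async def fake_llm_call(prompt: str) -> str:
    # Stand-in for a real HTTP call to an LLM endpoint (hypothetical).
    await asyncio.sleep(0.01)  # simulate network latency
    return f"response to: {prompt}"

async def query_all(prompts):
    # Fan out all requests concurrently; gather preserves input order.
    return await asyncio.gather(*(fake_llm_call(p) for p in prompts))

results = asyncio.run(query_all(["a", "b", "c"]))
```

Issuing the calls concurrently means total wall time is bounded by the slowest single request rather than the sum of all of them.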
Litmus tests HTTP requests and responses, including those from LLMs. Users define expected results for specific requests; Litmus sends those requests, compares the responses against the expected results (using LLMs), and reports any discrepancies, ensuring the accuracy and consistency of API responses through comprehensive end-to-end testing.
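The request/expected-result comparison loop Litmus describes can be sketched as follows. All names here are illustrative, not Litmus's actual API, and the HTTP call is replaced by a canned stub:

```python
def send_request(request: str) -> dict:
    # Stub standing in for a real HTTP call (hypothetical responses).
    canned = {"/health": {"status": "ok"}, "/version": {"version": "1.0"}}
    return canned.get(request, {})

def run_litmus_style_tests(cases):
    # For each (request, expected) pair, send the request, compare the
    # response to the expectation, and collect any discrepancies.
    discrepancies = []
    for request, expected in cases:
        actual = send_request(request)
        if actual != expected:
            discrepancies.append((request, expected, actual))
    return discrepancies

cases = [("/health", {"status": "ok"}), ("/version", {"version": "2.0"})]
report = run_litmus_style_tests(cases)
```

Here the `/version` case fails (expected `2.0`, got `1.0`), so it is the one discrepancy reported; an LLM-based comparison would replace the strict equality check with a semantic judgment.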
Dive into the world of LLM guardrails using tools like NVIDIA's NeMo Guardrails. Discover the mechanisms that ensure applications produce reliable, robust, safe, and ethical outputs, and understand their crucial role in LLM applications.
The prompt engineering, prompt management, and prompt evaluation tool for TypeScript, JavaScript, and NodeJS.
The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism
Evaluate LLMs and RAG pipelines with custom reasoning functions and datasets using LangChain
LLM evaluation
ThinkBench is an LLM benchmarking tool focused on evaluating the effectiveness of chain-of-thought (CoT) prompting for answering multiple-choice questions.
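The kind of comparison ThinkBench performs — measuring multiple-choice accuracy with and without a chain-of-thought prompt — can be sketched as below. The "model" is a stub with purely illustrative behaviour, not ThinkBench's actual interface:

```python
def stub_model(prompt: str) -> str:
    # Hypothetical model: answers "B" when asked to reason step by
    # step, "A" otherwise (illustrative only).
    return "B" if "step by step" in prompt else "A"

def accuracy(questions, template: str) -> float:
    # Score the model on (question, gold answer) pairs under a given
    # prompt template.
    correct = 0
    for question, gold in questions:
        answer = stub_model(template.format(question=question))
        correct += answer == gold
    return correct / len(questions)

qs = [
    ("2+2=? (A) 5 (B) 4", "B"),
    ("Capital of France? (A) Rome (B) Paris", "B"),
]
direct_acc = accuracy(qs, "{question}")
cot_acc = accuracy(qs, "Let's think step by step. {question}")
```

Running the same question set under both templates isolates the effect of the CoT prefix, which is the quantity a benchmark like this reports.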