The LLM Evaluation Framework
Updated Jul 16, 2024 - Python
LightEval is a lightweight LLM evaluation suite that Hugging Face has been using internally with the recently released LLM data processing library datatrove and LLM training library nanotron.
NeurIPS 2023 - TopP&R: Robust Support Estimation Approach for Evaluating Fidelity and Diversity in Generative Models Official Code
Python SDK for agent monitoring, LLM cost tracking, benchmarking, and more. Integrates with most LLMs and agent frameworks like CrewAI, Langchain, and Autogen
The most comprehensive Python package for evaluating survival analysis models.
📈 Implementation of eight evaluation metrics to assess the similarity between two images. The eight metrics are as follows: RMSE, PSNR, SSIM, ISSM, FSIM, SRE, SAM, and UIQ.
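Two of the simpler metrics in that list, RMSE and PSNR, can be sketched in plain Python. This is an illustrative implementation for intuition, not the package's own code; it assumes 8-bit images flattened into equal-length pixel sequences:

```python
import math

def rmse(img_a, img_b):
    """Root-mean-square error between two equal-length pixel sequences."""
    assert len(img_a) == len(img_b), "images must have the same number of pixels"
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(img_a, img_b)) / len(img_a))

def psnr(img_a, img_b, max_val=255.0):
    """Peak signal-to-noise ratio in dB; higher means more similar."""
    err = rmse(img_a, img_b)
    if err == 0:
        return float("inf")  # identical images
    return 20 * math.log10(max_val / err)

# Toy 4-pixel "images" (hypothetical values, 8-bit range)
a = [0, 128, 255, 64]
b = [0, 120, 250, 70]
print(rmse(a, b))
print(psnr(a, b))
```

SSIM, FSIM, and the other structural metrics are considerably more involved (windowed statistics, gradient maps), which is where a dedicated package earns its keep.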
Python client for Kolena's machine learning testing platform
Evaluate custom and HuggingFace text-to-image/zero-shot-image-classification models like CLIP, SigLIP, DFN5B, and EVA-CLIP. Metrics include Zero-shot accuracy, Linear Probe, Image retrieval, and KNN accuracy.
Production-Grade Evaluation for LLM-Powered Applications
Python SDK for running evaluations on LLM generated responses
Source code for "Taming Visually Guided Sound Generation" (Oral at the BMVC 2021)
Metrics to evaluate the quality of responses of your Retrieval Augmented Generation (RAG) applications.
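As a rough illustration of what a RAG metric measures, the sketch below computes a naive context-recall-style score: the fraction of answer tokens that appear anywhere in the retrieved contexts. This is a toy lexical heuristic for intuition only, not the library's actual (LLM-based) implementation:

```python
def naive_context_recall(answer: str, contexts: list[str]) -> float:
    """Toy metric: fraction of answer tokens found in any retrieved context."""
    answer_tokens = set(answer.lower().split())
    context_tokens = set(" ".join(contexts).lower().split())
    if not answer_tokens:
        return 0.0
    return len(answer_tokens & context_tokens) / len(answer_tokens)

# Hypothetical question-answering example
score = naive_context_recall(
    "Paris is the capital of France",
    ["France's capital is Paris", "The Eiffel Tower is in Paris"],
)
print(score)
```

A low score suggests the generated answer contains claims the retriever never surfaced, which is the kind of signal these metrics formalize.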
Valor is a centralized evaluation store which makes it easy to measure, explore, and rank model performance.
Evaluates neuron segmentations in terms of statistics related to the number of splits and merges
Awesome diffusion Video-to-Video (V2V). A collection of papers on diffusion model-based video editing, a.k.a. video-to-video (V2V) translation, plus a video editing benchmark codebase.

A list of works on evaluation of visual generation models, including evaluation metrics, models, and systems
VELOCITI Benchmark Evaluation and Visualisation Code
Pip-compatible CodeBLEU metric implementation, available for Linux/macOS/Windows
Design and implement monitoring and evaluation frameworks. Measure and report on the impact of programs.
This repository contains analysis and predictive modeling of household electricity consumption using Python. It includes data cleaning, exploratory data analysis (EDA), time series forecasting (ARIMA, SARIMA, LSTM), and model evaluation to optimize energy usage.
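Model evaluation in a forecasting project like this typically comes down to error metrics on a held-out period. A minimal sketch in plain Python, using made-up daily kWh readings and a naive last-value forecast as the baseline (a real pipeline would substitute ARIMA/SARIMA/LSTM predictions for the baseline):

```python
import math

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root-mean-square error; penalizes large misses more than MAE."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Hypothetical daily household consumption in kWh
history = [12.1, 11.8, 12.5, 13.0]
actual_next = [12.7, 12.9, 13.1]

# Naive baseline: repeat the last observed value for every future day
naive_forecast = [history[-1]] * len(actual_next)

print(mae(actual_next, naive_forecast))
print(rmse(actual_next, naive_forecast))
```

Comparing a trained model's MAE/RMSE against this naive baseline is a quick sanity check that the model is learning anything at all.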