LLM Evaluation Tools

1. Langfuse — by Langfuse (YC W23), Germany

Langfuse is an open-source platform for collaboratively debugging and analyzing LLM applications. It offers essential features...
2. Scale Evaluation — by Scale, United States

Scale Evaluation is a platform for assessing large language models, addressing critical gaps in evaluation datasets...
3. Chatbot Arena

Chatbot Arena lets users engage with various anonymous AI chatbots, including ChatGPT, Gemini, and Claude. Users can ask questions,...

4. Arize Phoenix — by Arize AI, United States

Phoenix is an open-source observability tool that lets AI engineers and data scientists experiment, evaluate, and troubleshoot AI and...
5. Opik — by Comet, United States

Opik helps developers debug, evaluate, and monitor LLM applications and workflows. By enabling trace logging and performance scoring,...
6. promptfoo — by Promptfoo, United States

Used by over 70,000 developers, promptfoo tests LLM applications through automated red teaming for generative AI. Its custom probes...
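As a sketch of how promptfoo evaluations are typically declared, the following is a minimal promptfooconfig.yaml following the documented prompts/providers/tests schema; the prompt text, provider choice, and expected value here are illustrative assumptions, not taken from the source:

```yaml
# Minimal promptfoo configuration (sketch; prompt, model, and values are assumptions)
prompts:
  - "Reply with only the capital city of {{country}}."

providers:
  - openai:gpt-4o-mini  # hypothetical provider choice

tests:
  - vars:
      country: France
    assert:
      - type: icontains   # case-insensitive substring check on the model output
        value: Paris
```

Running `promptfoo eval` against a file like this executes each prompt/provider/test combination and reports pass/fail per assertion.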
7. Galileo — by Galileo, United States

Galileo's Evaluation Intelligence Platform lets AI teams evaluate and monitor their generative AI applications at scale. With tools...
8. Ragas — United States

Ragas is an open-source framework for rigorously testing and evaluating Large Language Model applications. It provides automatic...
9. DeepEval — by Confident AI, United States

DeepEval is an open-source framework for evaluating large language models (LLMs) in Python. It offers specialized unit testing akin to...
10. AgentBench — China

AgentBench is an evaluation framework for assessing the performance of autonomous AI agents. It employs a standardized set of...
11. Keywords AI — by Keywords AI, United States

Keywords AI is a platform for AI startups that streamlines the monitoring and debugging of LLM workflows. With a unified API...
12. ChainForge — United States

ChainForge is an open-source visual programming environment for prompt engineering and evaluating large language models. It empowers users...
13. Symflower — by Symflower, Austria

Symflower combines static, dynamic, and symbolic analyses with Large Language Models (LLMs) to deliver superior code quality...
14. Literal AI — by Literal AI, United States

Literal AI is a platform for engineering and product teams that streamlines the development of production-grade Large Language Model...
15. Traceloop — by Traceloop, Israel

Traceloop helps developers monitor Large Language Models (LLMs) with real-time alerts for quality changes and insights into how...
16. TruLens — United States

TruLens 1.0 is an open-source Python library that helps developers evaluate and enhance their Large Language Model (LLM)...