LLMops/Evaluation.md · andysingal/LLMops

Self-improving evaluation in LangSmith

Custom LLM Evaluations ⚙️: Function Calling Agent

SummHay benchmark

DeepEval-LlamaIndex

langsmith-evaluation-helper

LLM Hallucination Index

Running SWE-bench with LangSmith
