cheRAGh · چراغ · Embedding Benchmark

Persian RAG Embedding Evaluation

cheRAGh — Benchmarking Suite for Persian RAG Systems
چراغِ جست‌وجو افروختیم در شامِ پرسش‌ها
که پیدا گردد از نورش، رهِ پاسخ زِ چالش‌ها
We lit the lamp of search in the night of questions,
so that by its light the path to answers may emerge from the challenges.

cheRAGh (چراغ) is a unified benchmarking suite for Persian Retrieval-Augmented Generation (RAG) systems. It covers embedding models, rerankers, retrieval quality, tool calling, and large language model performance on Persian-language datasets from five domains: General, Scientific, Education, Legal, and Religious.

This report presents the evaluation results of embedding models in the cheRAGh benchmark. Retrieval performance is measured using Recall@5 and MRR.

31 models · 5 domains · Avg MRR 0.650 · Avg Recall@5 0.717
01
Benchmark Datasets
The benchmark covers five Persian-language domains: Legal, Religious, General, Scientific, and Education. Each domain contains queries paired with a single ground-truth document. For every query, a model retrieves the top-k candidates from the full corpus and is evaluated on ranking quality (MRR) and retrieval coverage (Recall@5).
📚 Education: official high-school textbook passages
🌐 General: open-domain Q&A platform logs on everyday topics
⚖️ Legal: legal statutes, court decisions, and regulatory texts
🕌 Religious: Islamic texts, scholarly commentary, and theological Q&A
🔬 Scientific: peer-reviewed scientific paper abstracts
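The retrieval step of the evaluation (top-k retrieval over the full corpus, one ground-truth document per query) can be sketched as follows. This is a minimal sketch that assumes embeddings are scored by cosine similarity; the page does not name the scoring function. `cosine`, `gold_rank`, and the toy 2-D vectors are hypothetical illustrations. In a real run the vectors would come from the embedding model under test.

```python
import math
from typing import List, Optional


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def gold_rank(query_vec: List[float], corpus_vecs: List[List[float]],
              gold_index: int, k: int = 5) -> Optional[int]:
    """Score every corpus document against the query (assumed cosine scoring),
    keep the top k, and return the 1-based rank of the ground-truth document,
    or None if it falls outside the top k."""
    scores = [cosine(query_vec, doc) for doc in corpus_vecs]
    top_k = sorted(range(len(corpus_vecs)), key=lambda i: scores[i], reverse=True)[:k]
    return top_k.index(gold_index) + 1 if gold_index in top_k else None


# Hypothetical toy corpus of four 2-D "embeddings".
corpus = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
print(gold_rank([0.9, 0.1], corpus, gold_index=2))       # 2
print(gold_rank([0.9, 0.1], corpus, gold_index=2, k=1))  # None
```

Returning only the ground-truth rank (or None) per query is enough to compute both MRR and Recall@5 afterwards.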
02
Leaderboard
Colour key: green ≥ 0.75 · orange 0.50 to < 0.75 · red < 0.50 · bold green = best in column
Sorted by Average MRR, descending
Columns: # · Model · Ctx Len (maximum context length) · Emb Dim (embedding dimension) · Params · Size · Architecture · MRR and Recall@5 for each of Education, General, Legal, Religious, and Scientific · Average MRR and Average Recall@5
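The leaderboard ordering and colour key can be sketched as follows. This sketch assumes the Average columns are the unweighted mean of the five per-domain scores; the page does not state how the average is computed. The model names and scores below are hypothetical placeholders, not benchmark results.

```python
from statistics import mean

DOMAINS = ("Education", "General", "Legal", "Religious", "Scientific")


def colour(score: float) -> str:
    """Map a score to the leaderboard colour key."""
    if score >= 0.75:
        return "green"
    if score >= 0.50:
        return "orange"
    return "red"


def leaderboard(results: dict) -> list:
    """Return (model, average MRR) pairs sorted by average MRR, descending.

    Assumption: the average is an unweighted mean over the five domains.
    """
    rows = [(model, mean(per_domain[d] for d in DOMAINS))
            for model, per_domain in results.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical per-domain MRR scores for two placeholder models.
example = {
    "model-a": {"Education": 0.80, "General": 0.60, "Legal": 0.55,
                "Religious": 0.70, "Scientific": 0.85},
    "model-b": {"Education": 0.90, "General": 0.70, "Legal": 0.65,
                "Religious": 0.75, "Scientific": 0.90},
}
for model, avg in leaderboard(example):
    print(f"{model}: {avg:.3f} ({colour(avg)})")
```

Stating the orange band as "≥ 0.50 and < 0.75" leaves no gap for scores such as 0.745.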
03
Visualisations
Figures: MRR by dataset · Recall@5 by dataset (share of queries with the ground-truth document in the top 5) · radar plots of MRR (multi-domain strength profile) and Recall@5 (multi-domain coverage profile) · performance heatmap of all models × datasets, coloured from worst to best
04
About the Metrics
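MRR (Mean Reciprocal Rank)
Averages the reciprocal rank (1/rank) of the ground-truth document across all queries. Ranking it first for every query gives 1.0; ranking it fifth every time gives 0.20. MRR therefore rewards placing the correct document at the top.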
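Written out, MRR = (1/|Q|) · Σ over queries q of 1/rank_q, where rank_q is the 1-based position of the ground-truth document for query q. The sketch below is a minimal illustration that takes one rank per query. It assumes the common convention that a query whose ground-truth document is not retrieved contributes 0, since the page does not state how such queries are handled.

```python
def mean_reciprocal_rank(ranks):
    """Mean Reciprocal Rank over per-query results.

    Each entry is the 1-based rank of the ground-truth document, or None if it
    was not retrieved (assumed to contribute 0, the usual convention).
    """
    if not ranks:
        raise ValueError("need at least one query")
    return sum(1.0 / r for r in ranks if r is not None) / len(ranks)


print(mean_reciprocal_rank([1, 1, 1]))        # 1.0  (always ranked first)
print(mean_reciprocal_rank([5, 5]))           # 0.2  (always ranked fifth)
print(mean_reciprocal_rank([1, 2, None, 4]))  # 0.4375
```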
Recall@5
Fraction of queries where the correct document appears in the top 5 results (k = 5).
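Because each query has exactly one ground-truth document, Recall@5 reduces to a per-query hit (1 if the document is in the top 5, otherwise 0) averaged over all queries. The minimal sketch below takes one rank per query, with None meaning the document was not retrieved. The function name `recall_at_k` is illustrative.

```python
def recall_at_k(ranks, k=5):
    """Fraction of queries whose single ground-truth document is ranked
    within the top k. Each entry is a 1-based rank, or None if not retrieved."""
    if not ranks:
        raise ValueError("need at least one query")
    hits = sum(1 for r in ranks if r is not None and r <= k)
    return hits / len(ranks)


print(recall_at_k([1, 3, 7, None]))  # 0.5  (ranks 1 and 3 are within the top 5)
```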
About cheRAGh
cheRAGh (چراغ) is a unified benchmarking suite for Persian RAG systems, covering embeddings, rerankers, retrieval quality, tool calling, and end-to-end pipeline performance.