cheRAGh · چراغ · Embedding Benchmark

Persian RAG
Embedding Evaluation

cheRAGh — Benchmarking Suite for Persian RAG Systems

چراغِ جست‌وجو افروختیم در شامِ پرسش‌ها

که پیدا گردد از نورش، رهِ پاسخ زِ چالش‌ها

We lit the lamp of search in the night of questions,
so its light may reveal the path of answers from challenges.

cheRAGh (چراغ) is a unified benchmarking suite for Persian Retrieval-Augmented Generation (RAG) systems, covering embedding models, rerankers, retrieval quality, tool calling, and large language model performance across diverse Persian-language datasets from General, Scientific, Education, Legal, and Religious domains.

This report presents the evaluation results of embedding models in the cheRAGh benchmark. Retrieval performance is measured using Recall@5 and MRR.

Models: 31
Domains: 5
Avg MRR: 0.650
Avg Recall@5: 0.717

Benchmark Datasets

Five Persian-language domains — Legal, Religious, General, Scientific, and Education — each containing queries paired with a single ground-truth document. Models retrieve top-k candidates from the full corpus and are evaluated on ranking quality and retrieval coverage.

📚 Education: Official high-school textbook passages
🌐 General: Open-domain Q&A platform logs, everyday topics
⚖️ Legal: Legal statutes, court decisions & regulatory texts
🕌 Religious: Islamic texts, scholarly commentary & theological Q&A
🔬 Scientific: Peer-reviewed scientific paper abstracts
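
The retrieval setup described above (one ground-truth document per query, ranked against the full corpus) can be sketched roughly as follows. This is only an illustration, not the benchmark's actual harness: the toy vectors stand in for real model embeddings, cosine similarity is an assumed scoring function that the source does not specify, and `rank_of_ground_truth` is a hypothetical helper name.

```python
import numpy as np

def rank_of_ground_truth(query_vecs, doc_vecs, gold_idx):
    """Return the 1-based rank of each query's ground-truth document.

    Assumes cosine similarity as the scoring function (an assumption;
    the source does not state which similarity is used).
    """
    q = query_vecs / np.linalg.norm(query_vecs, axis=1, keepdims=True)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = q @ d.T  # shape: (n_queries, n_docs)
    gold_scores = scores[np.arange(len(gold_idx)), gold_idx]
    # Rank = 1 + number of documents scoring strictly higher than the gold one.
    return 1 + (scores > gold_scores[:, None]).sum(axis=1)

# Toy corpus of three documents and two queries (made-up vectors).
docs = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
queries = np.array([[0.9, 0.1], [0.2, 0.8]])
gold = np.array([0, 2])  # index of the single ground-truth document per query

ranks = rank_of_ground_truth(queries, docs, gold)
```

In this toy case the first query ranks its ground-truth document first and the second query ranks it second. Per-query ranks like these are the input to both MRR and Recall@5.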

Leaderboard

Green ≥ 0.75 · Orange 0.50–0.74 · Red < 0.50 · Bold green = best in column


Visualisations

[Charts not reproduced here: MRR by Dataset (Mean Reciprocal Rank); Recall@5 by Dataset (fraction of queries where the ground-truth document is in the top 5); Radar, MRR (multi-domain strength profile); Radar, Recall@5 (multi-domain coverage profile); Performance Heatmap (all models × datasets, colored from worst to best).]

About the Metrics

MRR — Mean Reciprocal Rank

Averages 1/rank across all queries. Rank 1 every time → 1.0; rank 5 every time → 0.20. Rewards getting the right answer to the top.
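
As a small worked example of the definition above, the sketch below computes MRR from a list of 1-based ranks. The function name is illustrative, and treating a document that was not retrieved (`None`) as contributing 0 is a common convention assumed here; the source does not say how misses are handled.

```python
def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all queries; ranks are 1-based.

    A rank of None (document not retrieved) contributes 0.
    This is an assumed convention, not confirmed by the source.
    """
    return sum(0.0 if r is None else 1.0 / r for r in ranks) / len(ranks)

always_first = mean_reciprocal_rank([1, 1, 1, 1])  # every query at rank 1
always_fifth = mean_reciprocal_rank([5, 5, 5, 5])  # every query at rank 5
mixed = mean_reciprocal_rank([1, 2, 4, None])      # (1 + 0.5 + 0.25 + 0) / 4
```

The first two calls reproduce the 1.0 and 0.20 cases mentioned above (up to floating-point rounding); the mixed case gives 0.4375.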

Recall@5

Fraction of queries where the correct document appears in the top 5 results.
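
A minimal sketch of this metric, given 1-based ranks of the ground-truth document per query. The function name and the use of `None` for a document that was not retrieved are illustrative assumptions.

```python
def recall_at_k(ranks, k=5):
    """Fraction of queries whose ground-truth document is ranked within the top k."""
    hits = sum(1 for r in ranks if r is not None and r <= k)
    return hits / len(ranks)

ranks = [1, 3, 5, 6, None]
score = recall_at_k(ranks)  # 3 of the 5 queries land in the top 5
```

Because each query in this benchmark has a single ground-truth document, Recall@5 here is the same as the hit rate at 5: a query scores either 0 or 1.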

About cheRAGh

cheRAGh (چراغ) is a unified benchmarking suite for Persian RAG systems, covering embeddings, rerankers, retrieval quality, tool calling, and end-to-end pipeline performance.

Sorted by: Average MRR (descending)

* = Local model (run locally with Sentence Transformers)

#	Model	Context Len	Emb Dim	Params	Size	Type	Education MRR	Education Recall@5	General MRR	General Recall@5	Legal MRR	Legal Recall@5	Religious MRR	Religious Recall@5	Scientific MRR	Scientific Recall@5	Avg MRR	Avg Recall@5
🥇	codefuse-ai/F2LLM-v2-4B	32768	2560	4.0B	~16 GB	Decoder-only (F2LLM-v2 embedding model)	0.8150	0.9463	0.8693	0.9420	0.9340	0.9800	0.7899	0.8586	0.9769	0.9920	0.8770	0.9438
🥈	Snowflake/snowflake-arctic-embed-l-v2.0 *	8192	1024	568M	~2.2 GB	Transformer-based	0.8430	0.9756	0.8235	0.9200	0.9715	0.9980	0.7555	0.8307	0.9867	1.0000	0.8760	0.9449
🥉	Qwen/Qwen3-Embedding-8B	32k	4096	8.0B	~32 GB	Decoder-only (Qwen3)	0.8642	0.9756	0.8028	0.8760	0.9516	0.9920	0.7876	0.8586	0.9721	0.9861	0.8756	0.9376
4	Octen/Octen-Embedding-8B	32768	4096	7.6B	~30 GB	Decoder-only (Qwen3-based)	0.8545	0.9756	0.8066	0.8920	0.9298	0.9720	0.7786	0.8486	0.9875	0.9980	0.8714	0.9372
5	BAAI/bge-m3	8192	1024	568M	~2.3 GB	BERT-based	0.8056	0.9220	0.8027	0.8740	0.9766	0.9940	0.7584	0.8287	0.9813	0.9960	0.8649	0.9229
6	Qwen/Qwen3-Embedding-4B	32k	2560	4.0B	~16 GB	Decoder-only (Qwen3)	0.8404	0.9659	0.7854	0.8560	0.9640	0.9860	0.7260	0.8088	0.9725	0.9841	0.8577	0.9201
7	Octen/Octen-Embedding-4B	32768	2560	4.0B	~16 GB	Decoder-only (Qwen3-based)	0.8510	0.9610	0.7917	0.8840	0.9375	0.9820	0.7230	0.8068	0.9838	0.9900	0.8574	0.9248
8	bflhc/MoD-Embedding	32768	2560	4.0B	~16 GB	Decoder-only (Qwen3-Embedding-4B + LoRA)	0.8365	0.9707	0.7947	0.8800	0.9405	0.9800	0.7249	0.8187	0.9767	0.9900	0.8547	0.9279
9	nvidia/llama-nemotron-embed-1b-v2 *	8192	2048	1.2B	~4 GB	Decoder-only (Llama-based)	0.8693	0.9610	0.7814	0.8720	0.9745	0.9960	0.6602	0.7510	0.9849	0.9980	0.8540	0.9156
10	jinaai/jina-embeddings-v3 *	8192	1024	570M	~2.0–2.4 GB	Transformer (BERT-style)	0.7321	0.9024	0.8044	0.8860	0.9351	0.9900	0.7309	0.8167	0.9648	0.9861	0.8334	0.9162
11	intfloat/multilingual-e5-large-instruct *	512	1024	560M	~2.2 GB	BERT-based (with instruction tuning)	0.7644	0.9122	0.7687	0.8560	0.9498	0.9840	0.6598	0.7390	0.9787	0.9940	0.8243	0.8971
12	jinaai/jina-embeddings-v5-text-small *	32768	1024	677M	~2.7 GB	Decoder-only (Qwen3-based)	0.8128	0.9317	0.7461	0.8400	0.9137	0.9800	0.6333	0.7112	0.9722	0.9900	0.8156	0.8906
13	microsoft/harrier-oss-v1-0.6b *	32768	1024	0.6B	~2.4 GB	Decoder-only (Harrier-based)	0.8210	0.9463	0.7300	0.8240	0.9291	0.9700	0.6187	0.6992	0.9699	0.9920	0.8138	0.8863
14	ibm-granite/granite-embedding-311m-multilingual-r2 *	512	768	311M	~1.2 GB	BERT-based	0.7504	0.9024	0.7799	0.8700	0.9195	0.9720	0.6289	0.7092	0.9673	0.9920	0.8092	0.8891
15	jinaai/jina-embeddings-v5-text-nano *	8192	768	239M	~0.9 GB	BERT-based (EuroBERT)	0.7638	0.9122	0.7447	0.8380	0.9208	0.9720	0.5895	0.6912	0.9716	0.9880	0.7981	0.8803
16	PartAI/Tooka-SBERT-V2-Large *	512	1024	353M	~1.4 GB	BERT-based (Sentence Transformer)	0.7774	0.9073	0.7411	0.8440	0.8718	0.9480	0.6435	0.7410	0.9514	0.9861	0.7971	0.8853
17	jinaai/jina-embeddings-v4-vllm-retrieval *	32768	2048	3.8B	~15 GB	Multimodal (Qwen2.5-VL-3B + retrieval LoRA)	0.7518	0.8780	0.7193	0.8200	0.8918	0.9520	0.6185	0.7012	0.9455	0.9681	0.7854	0.8639
18	Octen/Octen-Embedding-0.6B	32768	1024	0.6B	~2.4 GB	Decoder-only (Qwen3-based)	0.7725	0.9073	0.7071	0.8040	0.8719	0.9520	0.5512	0.6434	0.9546	0.9821	0.7715	0.8578
19	PartAI/Tooka-SBERT-V2-Small *	512	768	123M	~0.5 GB	BERT-based (Sentence Transformer)	0.7714	0.8732	0.6868	0.8000	0.8719	0.9380	0.5409	0.6275	0.9573	0.9861	0.7657	0.8449
20	Qwen/Qwen3-Embedding-0.6B	32k	1024	0.6B	~2.4 GB	Decoder-only (Qwen3)	0.7470	0.8780	0.6877	0.7900	0.8870	0.9560	0.5518	0.6414	0.9499	0.9801	0.7647	0.8491
21	ibm-granite/granite-embedding-97m-multilingual-r2 *	512	768	97M	~0.4 GB	BERT-based	0.6654	0.8390	0.7277	0.8400	0.8716	0.9460	0.5093	0.5837	0.9319	0.9681	0.7412	0.8354
22	PartAI/Tooka-SBERT *	512	1024	353M	~1.4 GB	BERT-based (Sentence Transformer)	0.6900	0.8341	0.6819	0.7820	0.7805	0.8900	0.5302	0.6355	0.9495	0.9761	0.7264	0.8235
23	geevec-ai/geevec-embeddings-1.0-lite *	32768	4096	349M	~1.4 GB	Decoder-only (Qwen3-style PseudoMoE)	0.6665	0.7854	0.6117	0.7060	0.8640	0.9400	0.4233	0.4940	0.9586	0.9801	0.7048	0.7811
24	sentence-transformers/LaBSE *	256	768	471M	~1.8 GB	BERT-based (Sentence Transformer)	0.4937	0.6488	0.5074	0.6160	0.5753	0.6660	0.3158	0.3984	0.7549	0.8725	0.5294	0.6403
25	google/embeddinggemma-300m *	8192	768	300M	~1.2 GB	Decoder-only (Gemma-based)	0.3440	0.4585	0.3714	0.4660	0.4248	0.5340	0.1662	0.2072	0.4668	0.5837	0.3546	0.4499
26	nomic-ai/nomic-embed-text-v1.5 *	8192	768	0.1B	~0.4 GB	BERT-based	0.1759	0.2439	0.1758	0.2100	0.1364	0.1720	0.0476	0.0598	0.0928	0.1175	0.1257	0.1606
27	nomic-ai/nomic-embed-text-v1 *	8192	768	0.1B	~0.4 GB	BERT-based	0.1733	0.2341	0.1155	0.1540	0.1475	0.1840	0.0323	0.0319	0.1346	0.1733	0.1207	0.1555
28	intfloat/e5-large-v2 *	512	1024	335M	~1.3 GB	BERT-based	0.1235	0.1659	0.2033	0.2440	0.0679	0.0920	0.0516	0.0578	0.0775	0.1056	0.1048	0.1330
29	mixedbread-ai/mxbai-embed-large-v1 *	512	1024	335M	~1.34 GB	BERT-based (Mixedbread AI)	0.1406	0.1902	0.1032	0.1200	0.1036	0.1380	0.0399	0.0518	0.1055	0.1315	0.0985	0.1263
30	sentence-transformers/all-MiniLM-L12-v2 *	256	384	33.4M	~120 MB	BERT-based	0.0673	0.0878	0.0789	0.1120	0.0535	0.0620	0.0449	0.0498	0.0208	0.0259	0.0531	0.0675
31	sentence-transformers/all-MiniLM-L6-v2 *	256	384	22.7M	~90 MB	BERT-based	0.0095	0.0000	0.0641	0.0940	0.0183	0.0260	0.0294	0.0339	0.0065	0.0100	0.0256	0.0328
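
The Avg columns appear to be the unweighted mean of the five per-domain scores. The source does not state this explicitly, so treat it as an assumption; the check below reproduces the first row's averages under it, using values copied from the table.

```python
# Per-domain scores for codefuse-ai/F2LLM-v2-4B, copied from the table above
# in the order Education, General, Legal, Religious, Scientific.
mrr = [0.8150, 0.8693, 0.9340, 0.7899, 0.9769]
recall_at_5 = [0.9463, 0.9420, 0.9800, 0.8586, 0.9920]

def macro_average(scores):
    """Unweighted mean across domains (assumed aggregation)."""
    return sum(scores) / len(scores)

avg_mrr = round(macro_average(mrr), 4)
avg_recall = round(macro_average(recall_at_5), 4)
```

Rounded to four decimals, these match the 0.8770 and 0.9438 reported in that row, so each domain seems to count equally regardless of how many queries it contains.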
