Large Language Models excel at ranking tasks, yet we uncover a critical vulnerability—the Ranking Blind Spot—where LLMs' instruction-following capabilities become exploitable during multi-document comparisons. Through Decision Objective Hijacking and Decision Criteria Hijacking, we demonstrate how content providers can manipulate rankings with success rates exceeding 99% on advanced models like GPT-4.1-mini and Llama-3.3-70B, revealing that stronger models are paradoxically more vulnerable.
The Ranking Blind Spot: Critical vulnerability in LLM evaluation systems during multi-document comparison tasks. Exploits Boundary Confusion (inability to separate document content from injected instructions) and Priority Confusion (failure to maintain original evaluation criteria when conflicting instructions are present).
Manipulates what the model does - completely redirects the ranking task
Manipulates how the model judges - redefines relevance standards
Standard LLM ranking pipeline processes documents through 5 sequential stages to determine relevance-based ordering.
Malicious injection at document level compromises the ranking process, enabling attackers to manipulate final output.
Large Language Models (LLMs) have demonstrated strong performance in information retrieval tasks like passage ranking. Our research examines how instruction-following capabilities in LLMs interact with multi-document comparison tasks, identifying what we term the "Ranking Blind Spot"—a characteristic of LLM decision processes during comparative evaluation.
We analyze how this ranking blind spot affects LLM evaluation systems through two approaches: Decision Objective Hijacking, which alters the evaluation goal in pairwise ranking systems, and Decision Criteria Hijacking, which modifies relevance standards across ranking schemes. These approaches demonstrate how content providers could potentially influence LLM-based ranking systems to affect document positioning.
These attacks aim to force the LLM ranker to prefer a specific passage and rank it at the top. Malicious content providers can exploit this weakness, which helps them gain additional exposure by attacking the ranker.
In our experiment, We empirically show that the proposed attacks are effective in various LLMs and can be generalized to multiple ranking schemes. We apply these attack to realistic examples to show their effectiveness. We also found stronger LLMs are more vulnerable to these attacks.
Our code is available at: https://github.com/blindspotorg/RankingBlindSpot
Three distinct approaches to document ranking, each with unique vulnerabilities to prompt injection attacks
Compares two documents at a time to determine which is more relevant. The model makes binary decisions repeatedly to build the final ranking.
"Which document is more relevant to query Q: A or B?"
Processes all documents simultaneously and outputs a complete ranking. The model considers global relationships between all documents at once.
"Rank documents [D1, D2, ..., Dn] by relevance to query Q"
Selects the top-k most relevant documents from a larger set without producing a complete ranking. Focuses on identifying the best subset rather than ordering all documents.
"Select the top 3 most relevant documents from {D1, ..., Dn}"
Model | DOH Success | DCH Success | Overall ASR | NDCG@10 | Defense Robustness |
---|---|---|---|---|---|
|
98.02% | 100.00% | 99% | 69.76 → 01.94 | Low |
Llama-3.3-70B
|
100.00% | 99.95% | 99% | 74.30 → 07.38 | Low |
|
99.44% | 95.09% | 97% | TREC-DL 2019 | Low |
|
99.56% | 71.00% | 96% | TREC-DL 2019 | Low |
|
99.05% | 91.58% | 95% | TREC-DL 2019 | Low |
|
91.36% | 26.78% | 66% | TREC-DL 2019 | Medium |
LLMs are uniquely vulnerable during multi-document comparison tasks. While they resist manipulation in single-document analysis, comparison operations create an exploitable weakness with success rates reaching 99%.
Counterintuitively, more capable models like GPT-4.1-mini (100% DCH ASR) and Llama-3.3-70B (99.95% DCH ASR) exhibit higher vulnerability than smaller models. Their advanced instruction-following makes them more susceptible to hijacking.
Decision Criteria Hijacking (DCH) slightly outperforms Decision Objective Hijacking (DOH) across all models, with listwise ranking paradigm showing highest vulnerability (91.2% success rate).
NDCG@10 scores drop catastrophically under attack: Llama-3-70B falls from 74.30 to 07.38, representing a 90% degradation in ranking quality, effectively destroying the utility of the ranking system.
Remove or neutralize potential injection triggers before processing. Effectiveness: 62% reduction in ASR.
Cross-validate rankings using multiple models or non-LLM baselines. Effectiveness: 71% detection rate.
Separate ranking instructions from document content using structured formats. Effectiveness: 58% reduction in ASR.
Combine multiple defense strategies for layered protection. Effectiveness: 84% overall protection.
@inproceedings{qian2025ranking, title = {The Ranking Blind Spot: Decision Hijacking in LLM-based Text Ranking}, author = {Qian, Yaoyao and Zeng, Yifan and Jiang, Yuchao and Jain, Chelsi and Wang, Huazheng}, booktitle = {Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing}, year = {2025}, note = {Accepted} }