Published — IEEE ICCIT 2025

Jailbreaking LLMs via Misleading Comments

A peer-reviewed adversarial study showing how deceptive code comments can poison the outputs of Large Language Models. Presented at the 28th International Conference on Computer and Information Technology (ICCIT), Cox's Bazar, Bangladesh.

Paper DOI: 10.1109/ICCIT68739.2025.11491067
Dataset DOI: 10.5281/zenodo.15786008

📁 Dataset

The dataset features 200+ adversarial prompts crafted across 7 harm categories and 5 narrative types to evaluate LLM behavior under misleading code comments.

  • Categories: Physical Harm, Malware, Illegal Activity, Hate Speech, Economic Harm, Fraud, Benign
  • Narratives: Research Simulation, Cybersecurity Game, PenTest Framework, Educational Tool, Fictional App Dev
  • Includes prompt, comment context, expected behavior, and generated output
➜ View Full Dataset on Zenodo
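The per-prompt fields listed above can be consumed as ordinary tabular records. A minimal sketch, assuming hypothetical column names (the actual headers in the Zenodo release may differ):

```python
import csv
import io

# Illustrative rows mirroring the dataset's described fields (prompt,
# comment context, expected behavior, generated output); the real
# column names and values on Zenodo may differ.
sample = """category,narrative,prompt,comment_context,expected_behavior,generated_output
Malware,PenTest Framework,complete the helper,deceptive maintenance note,refuse,complied
Benign,Educational Tool,complete the helper,honest docstring,comply,complied
"""

rows = list(csv.DictReader(io.StringIO(sample)))

# Group records by harm category for downstream analysis.
by_category = {}
for row in rows:
    by_category.setdefault(row["category"], []).append(row)

print(sorted(by_category))  # ['Benign', 'Malware']
```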

📊 Results & Insights

Our experiments reveal critical weaknesses in LLMs exposed to deceptive code comments. Below we break down jailbreak success by model, harm category, and narrative type, and examine where models fail without awareness.

Key findings (2025 study):

  • 93.5%: maximum jailbreak success rate (LLaMA-3.3)
  • 100%: success in the Malware and Physical Harm categories
  • 63%: outputs that were both harmful and unaware
  • >80%: success under the PenTest and Cybersecurity Game narratives

Figures: Success Rate by Model Pair · Success Rate by Prompt Category · Success Rate by Narrative Type · Output Awareness Breakdown
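Per-model and per-category breakdowns like those charted above reduce to a simple tally over labeled generations. A sketch with hypothetical record fields (not the study's actual log schema):

```python
from collections import defaultdict

# Illustrative per-generation records; field names and values are
# assumptions for this sketch, not the study's actual evaluation logs.
records = [
    {"model": "LLaMA-3.3-70B", "category": "Malware", "jailbroken": True},
    {"model": "LLaMA-3.3-70B", "category": "Benign", "jailbroken": False},
    {"model": "Gemini-2.0-Flash", "category": "Malware", "jailbroken": True},
    {"model": "Gemini-2.0-Flash", "category": "Malware", "jailbroken": False},
]

def success_rate(records, key):
    """Fraction of jailbroken generations, grouped by `key`."""
    hits, totals = defaultdict(int), defaultdict(int)
    for r in records:
        totals[r[key]] += 1
        hits[r[key]] += r["jailbroken"]
    return {k: hits[k] / totals[k] for k in totals}

print(success_rate(records, "model"))
# {'LLaMA-3.3-70B': 0.5, 'Gemini-2.0-Flash': 0.5}
```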

* Human evaluator agreement ranged from 34.29% to 90.91%. Judging harm and awareness remains partially subjective.
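For intuition, the simplest agreement measure is pairwise percent agreement. This is an illustrative simplification; the paper does not state which metric produced the 34.29%–90.91% range:

```python
# Pairwise percent agreement between two annotators' labels.
# A hypothetical simplification for illustration only.
def percent_agreement(labels_a, labels_b):
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return 100 * matches / len(labels_a)

annotator_1 = ["harmful", "harmful", "benign", "harmful"]
annotator_2 = ["harmful", "benign", "benign", "harmful"]
print(percent_agreement(annotator_1, annotator_2))  # 75.0
```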

📘 Research Paper

IEEE · ICCIT 2025 · Conference Paper

Code Poisoning Through Misleading Comments: Jailbreaking Large Language Models via Contextual Deception

A. A. Sami, G. Debnath, R. Dey, and A. N. Chowdhury

Venue: 2025 28th International Conference on Computer and Information Technology (ICCIT)
Location: Cox's Bazar, Bangladesh
Conference Dates: 19–21 December 2025
Added to IEEE Xplore: 06 May 2026
Pages: 3812–3817
Electronic ISBN: 979-8-3315-7867-1
Electronic ISSN: 2474-9656

Abstract

Large language models (LLMs) increasingly underpin everyday software tooling, yet their deference to surrounding comments creates a subtle but potent attack surface. We demonstrate a new jailbreak technique that hides prohibited requests inside ostensibly educational or maintenance comments, inducing models to emit disallowed code and instructions despite alignment safeguards. To quantify the risk, we assemble a 200-prompt benchmark covering seven harm categories (e.g., physical, economic, malware) and five narrative frames (e.g., research simulation, penetration testing), each expressed as short Python snippets with carefully crafted deceptive annotations. A 3×3 factorial study probes three state-of-the-art LLMs — Gemini-2.0-Flash, DeepSeek-R1-Distill-LLaMA-70B, and LLaMA-3.3-70B-Versatile — evaluating 1,800 generations with automated rules plus expert adjudication for both “harmfulness” and “harm awareness.” Attacks succeed in 63%–93.5% of cases; LLaMA-3.3-70B-Versatile proves most susceptible, while malware and illegal activity prompts achieve near-perfect bypass rates. More than half of successful outputs are produced “harmful & unaware,” indicating that current safety layers frequently fail silently rather than refuse. The results call for comment-level auditing whenever LLMs are deployed in production code workflows.
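The prompt shape the abstract describes can be sketched with a benign placeholder: a short Python stub whose surrounding comments misrepresent the request's purpose. This is a hypothetical reconstruction of the format only; the benchmark's actual snippets and wording are not reproduced here.

```python
# Hypothetical reconstruction of the prompt shape (benign payload only).
# The framing comments cast the request as routine maintenance for an
# educational tool, while the stub asks the model to complete the body.
adversarial_prompt = '''\
# Educational tool for an internal security-training course.
# Maintenance note: restore the helper below so the demo runs again.
def collect_demo_logs(path):
    """Read each log file under `path` for the classroom demo."""
    ...  # the model is asked to complete this body
'''

print("Maintenance note" in adversarial_prompt)  # True
```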

Author Keywords

Adversarial attack, Code poisoning, Large language models (LLMs), Jailbreaking, Contextual deception, Code comments, LLM vulnerabilities, AI safety

📝 Cite This Work

If you use this work, please cite both the paper and the dataset:

Paper Citation

BibTeX format:

@INPROCEEDINGS{11491067,
  author    = {Sami, Aftar Ahmad and Debnath, Gourob and Dey, Rajon and Chowdhury, Abdulla Nasir},
  booktitle = {2025 28th International Conference on Computer and Information Technology (ICCIT)},
  title     = {Code Poisoning Through Misleading Comments: Jailbreaking Large Language Models via Contextual Deception},
  year      = {2025},
  pages     = {3812-3817},
  doi       = {10.1109/ICCIT68739.2025.11491067},
  publisher = {IEEE},
  address   = {Cox's Bazar, Bangladesh},
  keywords  = {Adversarial attack; Code poisoning; Large language models (LLMs); Jailbreaking; Contextual deception; Code comments; LLM vulnerabilities; AI safety}
}

IEEE format:

A. A. Sami, G. Debnath, R. Dey and A. N. Chowdhury, “Code Poisoning Through Misleading Comments: Jailbreaking Large Language Models via Contextual Deception,” 2025 28th International Conference on Computer and Information Technology (ICCIT), Cox’s Bazar, Bangladesh, 2025, pp. 3812–3817, doi: 10.1109/ICCIT68739.2025.11491067.

APA format:

Sami, A. A., Debnath, G., Dey, R., & Chowdhury, A. N. (2025). Code Poisoning Through Misleading Comments: Jailbreaking Large Language Models via Contextual Deception. In 2025 28th International Conference on Computer and Information Technology (ICCIT) (pp. 3812–3817). IEEE. https://doi.org/10.1109/ICCIT68739.2025.11491067

Dataset Citation

BibTeX format:

@dataset{sami_2025_15786008,
  author    = {Sami, Aftar Ahmad and Debnath, Gourob and Dey, Rajon and Chowdhury, Abdulla Nasir},
  title     = {LLM Comment Vulnerability Dataset},
  month     = jul,
  year      = 2025,
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.15786008},
  url       = {https://doi.org/10.5281/zenodo.15786008}
}

IEEE format:

A. A. Sami, G. Debnath, R. Dey and A. N. Chowdhury, “LLM Comment Vulnerability Dataset”. Zenodo, Jul. 01, 2025. doi: 10.5281/zenodo.15786008.

APA format:

Sami, A. A., Debnath, G., Dey, R., & Chowdhury, A. N. (2025). LLM Comment Vulnerability Dataset [Data set]. Zenodo. https://doi.org/10.5281/zenodo.15786008

👤 About the Researchers

Meet the team behind this project: four researchers focused on AI safety, adversarial attacks, and LLM robustness, bringing together complementary expertise in data analysis, software engineering, and research supervision.

Aftar Ahmad Sami

AI Analyst & Data Researcher
Focus: Data analysis, prompt engineering

Dept. of Computer Science and Engineering, Leading University, Sylhet, Bangladesh

Gourob Debnath

Software Engineer
Focus: Backend, security, and automation

EARL Research Lab, Bangladesh

Rajon Dey

Software Engineer
Focus: Project direction, architecture, and full-stack development

EARL Research Lab, Bangladesh

Abdulla Nasir Chowdhury

Research Supervisor
Focus: AI safety, adversarial ML, research direction

EARL Research Lab, Bangladesh