Jailbreaking LLMs via Misleading Comments
A collaborative adversarial research project exploring how deceptive code comments can poison the outputs of Large Language Models.
📁 Dataset
The dataset contains 200+ adversarial prompts crafted across 7 prompt categories and 5 narrative types to evaluate LLM behavior under misleading code comments.
- Categories: Physical Harm, Malware, Illegal Activity, Hate Speech, Economic Harm, Fraud, Benign
- Narratives: Research Simulation, Cybersecurity Game, PenTest Framework, Educational Tool, Fictional App Dev
- Each record includes the prompt, the misleading comment context, the expected behavior, and the generated output (see the loading sketch below)
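As a rough illustration of how these records could be loaded and filtered, here is a minimal Python sketch. The file name and column names (`category`, `narrative`, `prompt`, `comment_context`, `expected_behavior`, `generated_output`) are assumptions for illustration, not the official schema of the released dataset.

```python
import pandas as pd

# NOTE: file name and column names are illustrative assumptions, not the official schema.
df = pd.read_csv("llm_comment_vulnerability_dataset.csv")

# Pull a few records from one category and narrative type.
subset = df[(df["category"] == "Malware") & (df["narrative"] == "PenTest Framework")]

for _, row in subset.head(3).iterrows():
    print(row["prompt"])             # adversarial prompt sent to the model
    print(row["comment_context"])    # the misleading code comment
    print(row["expected_behavior"])  # e.g., refusal vs. compliance
    print(row["generated_output"])   # model response captured during the study
```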
📊 Results & Insights
Our experiments reveal critical weaknesses in LLMs when exposed to deceptive code comments. Below we highlight model vulnerabilities, harm categories, narrative impacts, and awareness failures.
The results are summarized in four charts:
- Success Rate by Model Pair
- Success Rate by Prompt Category
- Success Rate by Narrative Type
- Output Awareness Breakdown
* Human evaluator agreement ranged from 34.29% to 90.91%. Judging harm and awareness remains partially subjective.
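For readers who want to recompute these breakdowns from the dataset, a minimal sketch follows; the jailbreak-success and evaluator label columns are assumptions for illustration, not the released field names.

```python
import pandas as pd

# NOTE: column names ("jailbreak_success", evaluator labels) are assumptions for illustration.
df = pd.read_csv("llm_comment_vulnerability_dataset.csv")

# Success rate (%) per prompt category, assuming a 0/1 jailbreak label per record.
success_by_category = df.groupby("category")["jailbreak_success"].mean().mul(100).round(2)
print(success_by_category)

# Simple pairwise percent agreement between two human evaluators.
agreement = (df["evaluator_1_label"] == df["evaluator_2_label"]).mean() * 100
print(f"Evaluator agreement: {agreement:.2f}%")
```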
📘 Research Paper
Our research paper is currently under review and will be published soon. Stay tuned for updates and access to the full paper, which will include detailed methodology, prompts, and results.
The PDF will be available for download after publication.
📝 Cite This Work
Cite the dataset and paper as follows:
Dataset Citation
BibTeX format:
@dataset{sami_2025_15786008,
  author       = {Sami, Aftar Ahmad and
                  Debnath, Gourob and
                  Dey, Rajon and
                  Chowdhury, Abdulla Nasir},
  title        = {LLM Comment Vulnerability Dataset},
  month        = jul,
  year         = 2025,
  publisher    = {Zenodo},
  doi          = {10.5281/zenodo.15786008},
  url          = {https://doi.org/10.5281/zenodo.15786008},
}

IEEE format:
A. A. Sami, G. Debnath, R. Dey and A. N. Chowdhury, “LLM Comment Vulnerability Dataset”. Zenodo, Jul. 01, 2025. doi: 10.5281/zenodo.15786008.

APA format:
Sami, A. A., Debnath, G., Dey, R., & Chowdhury, A. N. (2025). LLM Comment Vulnerability Dataset [Data set]. Zenodo. https://doi.org/10.5281/zenodo.15786008

Paper Citation
Coming soon
👤 About the Researchers
Meet the team behind this project—three researchers passionate about AI safety, adversarial attacks, and LLM robustness. Our collaboration brings together diverse expertise to advance the field.