文章
106
标签
103
分类
28
首页
归档
分类
类别分类
作者分类
标签
友链
关于
LLM Security Group 's Notes
搜索
首页
归档
分类
类别分类
作者分类
标签
友链
关于
LLM Security Group 's Notes
全部文章 - 106
2025
2025-10-31
Jailbreaking Black Box Large Language Models in Twenty Queries
2025-10-26
Multi-Turn Jailbreaking Large Language Models via Attention Shifting
2025-10-24
FigStep: Jailbreaking Large Vision-Language Models via Typographic Visual Prompts
2025-10-24
Con Instruction: Universal Jailbreaking of Multimodal Large Language Models via Non-Textual Modalities
2025-10-19
Jailbroken: How Does LLM Safety Training Fail?
2025-10-18
Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To!
2025-10-14
Weak-to-Strong Jailbreaking on Large Language Models
2025-09-19
Audio Jailbreak Attacks: Exposing Vulnerabilities in SpeechGPT in a White-Box Frameworkgeneration models
2025-09-15
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
2025-09-14
Safe in Isolation, Dangerous Together: Agent-Driven Multi-Turn Decomposition Jailbreaks on LLMs
1
2
3
4
LLM Security Group
分享知识,认识世界
文章
106
标签
103
分类
28
Follow Me
公告
This is my Blog
最新文章
PLeak: Prompt Leaking Attacks against Large Language Model Applications
2025-11-24
Making Them Ask and Answer: Jailbreaking Large Language Models in Few Queries via Disguise and Reconstruction
2025-11-23
BaitAttack: Alleviating Intention Shift in Jailbreak Attacks via Adaptive Bait Crafting
2025-11-23
Salience-Aware Face Presentation Attack Detection via Deep Reinforcement Learning
2025-11-20
Towards Universal AI-Generated Image Detection by Variational Information Bottleneck Network
2025-11-20
分类
ADVERSARIAL DEFENSE
1
AI系统优化
1
Adversarial
2
Adversarial Text Generation
1
Adversarial attack
1
Attack
1
BLACK BOX ATTACKS
1
High Confidence Predictions for Unrecognizable Images
1
标签
噪声表示学习
GCG优化
人脸伪造检测
基本迭代法
基于CatmullRom样条回归
算法
补丁攻击
模型安全
大型多模态模型
PUZZLED
微调
成对排序学习
多智能体协作
对抗提示
自适应感知模块
面部伪装攻击检测
特征增强
注意力分散
adversarial example
Image Recognition
密码攻击
上下文学习
越狱攻击防御
特征融合
对抗样本
编码器解码器
Search-R1
LLM辅助越狱
进化算法
Adversarial Text Generation
信噪分离
注意力机制
MASTERKEY
多轮越狱
频域特征
可学习干预
BaitAttack
数据集创建(自动标注)
PAPILLON
梯度上升
归档
十一月 2025
23
十月 2025
25
九月 2025
13
八月 2025
45
网站信息
文章数目 :
106
本站访客数 :
本站总浏览量 :
最后更新时间 :
搜索
数据加载中