Posts
106
Tags
103
Categories
28
LLM Security Group's Notes
Archive
All posts - 106
2025
2025-11-03
AutoDAN: Interpretable Gradient-Based Adversarial Attacks on Large Language Models
2025-11-03
GPT-4 Is Too Smart to Be Safe: Stealthy Chat with LLMs via Cipher
2025-11-01
MASTERKEY: Automated Jailbreaking of Large Language Model Chatbots
2025-10-31
SELFDEFEND: LLMs Can Defend Themselves against Jailbreaking in a Practical Manner
2025-10-31
Rethinking Image Forgery Detection via Soft Contrastive Learning and Unsupervised Clustering
2025-10-31
Jailbreaking Black Box Large Language Models in Twenty Queries
2025-10-29
RAIDX: A Retrieval-Augmented Generation and GRPO Reinforcement Learning Framework for Explainable Deepfake Detection
2025-10-29
Let Images Speak More: An Efficient Method for Detecting Image Manipulation History
2025-10-27
Jailbreaking? One Step Is Enough
2025-10-26
Multi-Turn Jailbreaking Large Language Models via Attention Shifting
LLM Security Group
Sharing knowledge, understanding the world
Latest posts
PLeak: Prompt Leaking Attacks against Large Language Model Applications
2025-11-24
Making Them Ask and Answer: Jailbreaking Large Language Models in Few Queries via Disguise and Reconstruction
2025-11-23
BaitAttack: Alleviating Intention Shift in Jailbreak Attacks via Adaptive Bait Crafting
2025-11-23
Salience-Aware Face Presentation Attack Detection via Deep Reinforcement Learning
2025-11-20
Towards Universal AI-Generated Image Detection by Variational Information Bottleneck Network
2025-11-20
Categories
ADVERSARIAL DEFENSE
1
AI system optimization
1
Adversarial
2
Adversarial Text Generation
1
Adversarial attack
1
Attack
1
BLACK BOX ATTACKS
1
High Confidence Predictions for Unrecognizable Images
1
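Tags
Noise representation learning
GCG optimization
Face forgery detection
Basic iterative method
CatmullRom spline regression-based
Algorithms
Patch attack
Model security
Large multimodal models
PUZZLED
Fine-tuning
Pairwise learning to rank
Multi-agent collaboration
Adversarial prompts
Adaptive perception module
Face spoofing attack detection
Feature enhancement
Attention distraction
adversarial example
Image Recognition
Cipher attack
In-context learning
Jailbreak attack defense
Feature fusion
Adversarial examples
Encoder-decoder
Search-R1
LLM-assisted jailbreaking
Evolutionary algorithms
Adversarial Text Generation
Signal-noise separation
Attention mechanism
MASTERKEY
Multi-turn jailbreaking
Frequency-domain features
Learnable intervention
BaitAttack
Dataset creation (automatic annotation)
PAPILLON
Gradient ascent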
Archive
November 2025
23
October 2025
25
September 2025
13
August 2025
45