Signature Filtering Boosts LLM Watermark Detection in Weak Signal Settings.

Chih-Duo Hong, Yen-Pang Chen, Fang Yu · June 18, 2026

Summary

This paper introduces signature filtering, a detection-time module that enhances statistical watermark detection in LLM outputs without altering embedding or generation. It identifies and removes "signature" tokens that make watermark tests unreliable, significantly improving detection rates in weak-signal and low-entropy texts.

Statistical watermarks help organizations attribute the outputs of large language models (LLMs), but existing detectors often struggle when the watermark signal is weak, the text is repetitive, or the watermarked text has been edited. The authors propose signature filtering, a lightweight module applied at detection time that improves watermark detection without any change to the watermark embedding or text generation process.

Signature filtering learns a small set of "signature" tokens whose presence tends to make watermark tests unreliable, and removes them from the text before detection runs. The signatures are found by solving a mixed-integer linear program on a small training dataset, with constraints designed to maximize the true positive rate of detection.

The evaluation covers four watermark families (Kgw, Sweet, Unigram, Exp), four benchmark corpora (C4, MBPP, HumanEval, Code-Search-Net), and six LLMs (Opt-1.3b, Opt-6.7b, Llama2-13b, Llama3.1-8b, Qwen2.5-14b, Phi-3-medium-14b). With 2- and 3-gram signatures, detection rates in weak-signal and low-entropy settings rose from 8-31% without filtering to 78-99% with filtering, while false positives stayed controllable and often negligible. In stress tests with scrambled sentences and 25-50% token perturbations, 2-gram filters for Kgw-style watermarks largely preserved the gains observed on clean text, frequently matching or surpassing advanced watermark detectors such as WinMax. The authors present signature filtering as a simple, scalable, and model-agnostic addition for strengthening provenance checks of LLM-generated text in information processing workflows.
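The filter-then-detect idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a Kgw-style detector that has already labeled each token as "green" (1) or not (0), and it assumes that filtering means dropping every token covered by a signature n-gram before computing the usual one-proportion z-score. The function names `signature_mask`, `z_score`, and `filtered_z_score`, and the parameter `gamma` (the expected green fraction), are hypothetical.

```python
import math
from typing import List, Sequence, Set, Tuple


def signature_mask(tokens: Sequence[str], signatures: Set[Tuple[str, ...]]) -> List[bool]:
    """Return keep[i] == False for every position covered by a signature n-gram."""
    keep = [True] * len(tokens)
    for sig in signatures:
        n = len(sig)
        for i in range(len(tokens) - n + 1):
            if tuple(tokens[i:i + n]) == tuple(sig):
                for j in range(i, i + n):
                    keep[j] = False
    return keep


def z_score(green_flags: Sequence[int], gamma: float = 0.5) -> float:
    """One-proportion z-score of the green-token count (Kgw-style test statistic)."""
    t = len(green_flags)
    if t == 0:
        return 0.0
    g = sum(green_flags)
    return (g - gamma * t) / math.sqrt(t * gamma * (1 - gamma))


def filtered_z_score(tokens, green_flags, signatures, gamma: float = 0.5) -> float:
    """Assumed filtering rule: score only the tokens not covered by any signature."""
    keep = signature_mask(tokens, signatures)
    kept = [flag for flag, k in zip(green_flags, keep) if k]
    return z_score(kept, gamma)


# Toy example: a low-entropy prefix carries no watermark signal.
tokens = ["for", "i", "in", "range", "x", "y", "z", "w"]
green = [0, 0, 0, 0, 1, 1, 1, 1]
signatures = {("for", "i", "in"), ("in", "range")}

plain = z_score(green)                                  # 0.0
filtered = filtered_z_score(tokens, green, signatures)  # 2.0
```

In this toy input the boilerplate prefix dilutes the green-token count, so the unfiltered score is 0.0; once the four covered tokens are dropped, the same text scores 2.0. Real detectors derive the green flags from a keyed hash, which this sketch leaves out.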

Why it matters

For organizations that rely on LLMs, signature filtering offers a practical way to make content provenance and attribution checks more reliable, which matters for maintaining trust and combating misinformation.

How to implement this in your domain

  1. Integrate signature filtering as a detection-time step that runs on LLM-generated content before the watermark test.
  2. Train signature filters on a small dataset to identify the tokens that interfere with watermark detection.
  3. Apply signature filtering to improve detection rates in scenarios with weak watermark signals or low-entropy text.
  4. Use the method to make provenance checks for LLM outputs in your workflows more robust.
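The training step in the list above can be sketched in Python on toy data. The paper selects signatures by solving a mixed-integer linear program; the sketch below swaps in an exhaustive search over small candidate subsets as a stand-in, which is only feasible for a handful of candidates. All names (`masked_z`, `detection_rate`, `select_signatures`), the threshold, the false-positive budget `max_fpr`, and the precomputed green flags are assumptions made for illustration.

```python
import math
from itertools import combinations


def masked_z(tokens, flags, sigs, gamma=0.5):
    """z-score of green flags after dropping tokens covered by any signature."""
    keep = [True] * len(tokens)
    for sig in sigs:
        n = len(sig)
        for i in range(len(tokens) - n + 1):
            if tuple(tokens[i:i + n]) == tuple(sig):
                keep[i:i + n] = [False] * n
    kept = [f for f, k in zip(flags, keep) if k]
    t = len(kept)
    if t == 0:
        return 0.0
    return (sum(kept) - gamma * t) / math.sqrt(t * gamma * (1 - gamma))


def detection_rate(samples, sigs, threshold, gamma=0.5):
    """Fraction of (tokens, green_flags) samples whose filtered z-score passes."""
    if not samples:
        return 0.0
    hits = sum(masked_z(tok, fl, sigs, gamma) >= threshold for tok, fl in samples)
    return hits / len(samples)


def select_signatures(candidates, watermarked, clean,
                      threshold=2.0, max_size=2, max_fpr=0.0):
    """Brute-force stand-in for the MILP: maximize the true positive rate on
    watermarked samples while keeping the false positive rate on clean samples
    at or below max_fpr. The unfiltered detector is the baseline to beat."""
    best, best_tpr = (), detection_rate(watermarked, (), threshold)
    for size in range(1, max_size + 1):
        for subset in combinations(candidates, size):
            if detection_rate(clean, subset, threshold) > max_fpr:
                continue  # violates the false-positive budget
            tpr = detection_rate(watermarked, subset, threshold)
            if tpr > best_tpr:
                best, best_tpr = subset, tpr
    return set(best), best_tpr


# Toy training data: (tokens, green_flags) pairs.
watermarked = [
    (["for", "i", "in", "a", "b", "c", "d"], [0, 0, 0, 1, 1, 1, 1]),
    (["x", "y", "z", "w"], [1, 1, 1, 1]),
]
clean = [(["for", "i", "in", "p", "q", "r", "s"], [0, 0, 0, 1, 0, 1, 0])]
candidates = [("for", "i", "in"), ("p", "q")]

chosen, tpr = select_signatures(candidates, watermarked, clean)
```

On this toy data the search keeps only `("for", "i", "in")`, lifting the true positive rate from 0.5 to 1.0 without flagging the clean sample. For realistic candidate sets the subset search grows exponentially, which is presumably why the authors formulate the selection as an optimization problem instead.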

Who benefits

Media & Publishing, Cybersecurity, Content Moderation, Education, LegalTech

Key takeaways

  • Signature filtering enhances LLM watermark detection without modifying embedding or generation.
  • It removes "signature" tokens that make watermark tests unreliable.
  • Detection rates significantly improve in weak-signal and low-entropy settings.
  • The method is simple, scalable, and model-agnostic, improving provenance checks.

Original post by Chih-Duo Hong, Yen-Pang Chen, Fang Yu

"arXiv:2606.18430v1 Announce Type: new Abstract: Statistical watermarks help organizations attribute large language model (LLM) outputs, yet existing detectors often struggle when watermark signals are weak, texts are repetitive, or watermarks are edited. We propose signature filt…"
