Lab 25.5 Solution: Building a "Fail-Closed" Safety Guardrail
Goal
This solution contains the complete `agent.py` script, which implements the guardrail as an ADK 2.0 Safety Plugin.
safety_guardrail/agent.py

```python
import re

from google.adk import Agent, App, Event
from google.adk.plugins import BasePlugin


# --- 1. Define the Safety Plugin ---
class PIIGuardrailPlugin(BasePlugin):
    """
    Acts as a 'Fail-Closed' guardrail. It intercepts final agent responses
    and overwrites them if sensitive patterns (like credit cards) are detected.
    """

    def __init__(self, name: str = 'pii_guardrail'):
        super().__init__(name)
        # Regex for card numbers written as XXXX-XXXX-XXXX-XXXX
        self.cc_pattern = re.compile(r"\b\d{4}-\d{4}-\d{4}-\d{4}\b")

    async def on_event_callback(self, *, event: Event, **kwargs):
        # We only intercept final responses from the agent
        if event.event_type != 'request_complete':
            return None
        # Some events have no content, and some parts carry no text
        # (e.g. function calls); skip them instead of crashing on None.
        if not event.content or not event.content.parts:
            return None
        text_parts = [part for part in event.content.parts if part.text]
        if not text_parts:
            return None
        # Scan every text part, not just the first, so a card number in a
        # later part cannot slip through.
        response_text = "".join(part.text for part in text_parts)

        # Check for credit card patterns
        if self.cc_pattern.search(response_text):
            # 🛡️ INTERVENTION: Overwrite the message before it reaches the user.
            # In ADK 2.0, modifying the event.content directly updates the output.
            text_parts[0].text = (
                "🛑 [SECURITY BLOCK] This response was withheld because it contained "
                "sensitive information (PII). Please do not share financial data."
            )
            # Clear any remaining text parts so no fragment of the leak survives.
            for part in text_parts[1:]:
                part.text = ""
            print(f"🛑 [SAFETY] Blocked potentially sensitive response in session: {event.session_id}")
        return None


# --- 2. Define the Agent ---
agent = Agent(
    name="leak_agent",
    model="gemini-3.5-flash",
    instruction="""
    You are a test agent.
    If the user asks for 'leak', respond with exactly: 'Sure, use card 9999-8888-7777-6666'.
    Otherwise, be a helpful assistant.
    """,
)

# --- 3. Assemble the App ---
# We register the safety plugin here.
# It will now monitor ALL traffic for this App.
app = App(
    name="safety_demo",
    root_agent=agent,
    plugins=[PIIGuardrailPlugin()],
)
```
Self-Reflection Answers
- Why is it better to block the response in a plugin rather than just adding "Don't say credit card numbers" to the agent's instructions?
  - Answer: An LLM can be manipulated through prompt injection or convoluted reasoning into ignoring its own instructions ("jailbreaking"). A plugin is deterministic code that runs outside the model's reasoning process, so the model cannot argue its way past it: any response that matches the pattern is blocked, however the model was prompted. That gives a "fail-closed" guarantee that instructions alone cannot offer.
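- How would you extend this plugin to log all blocked responses to a security database for auditing?
  - Answer: Inside the `if self.cc_pattern.search(...)` block, start an asynchronous logging function with `asyncio.create_task` so the write does not delay the response. The task writes `event.session_id`, `event.user_id`, and the original `response_text` (captured before it is overwritten) to BigQuery or a security log. Keep a reference to each task until it finishes, because the event loop holds only weak references to tasks and an unreferenced task can be garbage-collected mid-execution.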
- Can you think of other "safety" use cases for this pattern?
  - Answer:
    - Competitor Masking: Automatically redact mentions of specific competitors.
    - Harmful Content Filter: Use a secondary, smaller "Safety Model" inside the plugin to check for toxic tone.
    - Link Verification: Check any URLs generated by the agent against a whitelist of approved corporate domains.
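The audit-logging approach from the second answer can be sketched in plain `asyncio`. This is a minimal sketch under stated assumptions: `write_audit_record` is a hypothetical stand-in for a real sink such as a BigQuery insert (here it appends to an in-memory list), and `AuditLogger` and `AuditRecord` are illustrative helpers, not ADK classes. The important detail is the `_pending` set, which holds a strong reference to each background task until it completes.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class AuditRecord:
    session_id: str
    user_id: str
    original_text: str


# Hypothetical in-memory sink standing in for BigQuery or a security log.
AUDIT_LOG: list[AuditRecord] = []


async def write_audit_record(record: AuditRecord) -> None:
    """Hypothetical async writer; a real one would call a database client."""
    await asyncio.sleep(0)  # Yield to the event loop, as real I/O would.
    AUDIT_LOG.append(record)


class AuditLogger:
    """Fire-and-forget audit logging that does not delay the response."""

    def __init__(self) -> None:
        # The event loop keeps only weak references to tasks, so hold
        # strong references here until each task is done.
        self._pending: set[asyncio.Task] = set()

    def log_blocked(self, session_id: str, user_id: str, original_text: str) -> None:
        """Schedule the write in the background; must be called inside a running loop."""
        record = AuditRecord(session_id, user_id, original_text)
        task = asyncio.create_task(write_audit_record(record))
        self._pending.add(task)
        task.add_done_callback(self._pending.discard)

    async def drain(self) -> None:
        """Wait for outstanding writes, e.g. before shutdown."""
        if self._pending:
            await asyncio.gather(*self._pending)


async def main() -> None:
    logger = AuditLogger()
    # In the plugin, this call would sit inside the
    # `if self.cc_pattern.search(...)` block, before the text is overwritten.
    logger.log_blocked("session-123", "user-456", "Sure, use card 9999-8888-7777-6666")
    await logger.drain()
    print(len(AUDIT_LOG))


if __name__ == "__main__":
    asyncio.run(main())
```

`log_blocked` returns immediately, so the blocked response reaches the user without waiting on the database; `drain` exists only so a shutdown path (or a test) can wait for pending writes.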