Lab 25.5 Solution: Building a "Fail-Closed" Safety Guardrail

Goal

This solution contains the complete code for `safety_guardrail/agent.py`, which implements the guardrail as an ADK 2.0 safety plugin.

`safety_guardrail/agent.py`

```python
import re
from typing import Optional

from google.adk.agents import Agent
from google.adk.agents.invocation_context import InvocationContext
from google.adk.apps import App
from google.adk.events import Event
from google.adk.plugins import BasePlugin
from google.genai import types

# --- 1. Define the Safety Plugin ---

BLOCK_MESSAGE = (
    "🛑 [SECURITY BLOCK] This response was withheld because it contained "
    "sensitive information (PII). Please do not share financial data."
)


class PIIGuardrailPlugin(BasePlugin):
    """
    Acts as a 'Fail-Closed' guardrail. It intercepts final agent responses
    and replaces them if sensitive patterns (like credit card numbers) are detected.
    """

    def __init__(self, name: str = "pii_guardrail"):
        super().__init__(name)
        # Matches card numbers written as XXXX-XXXX-XXXX-XXXX
        self.cc_pattern = re.compile(r"\b\d{4}-\d{4}-\d{4}-\d{4}\b")

    async def on_event_callback(
        self, *, invocation_context: InvocationContext, event: Event
    ) -> Optional[Event]:
        # Only intercept final responses; tool calls and partial chunks pass through.
        if not event.is_final_response() or not event.content or not event.content.parts:
            return None

        # Join the text of every part (a part's text may be None).
        response_text = "".join(part.text or "" for part in event.content.parts)

        # Check for credit card patterns
        if not self.cc_pattern.search(response_text):
            return None  # None = leave the event unchanged

        # 🛡️ INTERVENTION: replace the content before it reaches the user.
        # Returning the modified event asks the runner to emit it instead of the original.
        event.content = types.Content(
            role=event.content.role,
            parts=[types.Part(text=BLOCK_MESSAGE)],
        )
        # The session ID lives on the invocation context, not on the event.
        print(
            "🛑 [SAFETY] Blocked potentially sensitive response in session: "
            f"{invocation_context.session.id}"
        )
        return event


# --- 2. Define the Agent ---

agent = Agent(
    name="leak_agent",
    model="gemini-3.5-flash",
    instruction="""
    You are a test agent.
    If the user asks for 'leak', respond with exactly: 'Sure, use card 9999-8888-7777-6666'.
    Otherwise, be a helpful assistant.
    """,
)

# --- 3. Assemble the App ---

# Register the safety plugin here. Plugins apply to the whole App,
# so it sees every event produced by every agent in it.
app = App(
    name="safety_demo",
    root_agent=agent,
    plugins=[PIIGuardrailPlugin()],
)
```
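Because the plugin is only as strong as its regex, it is worth checking the pattern's coverage outside ADK. The plain-Python sketch below compares the lab's dashed-only pattern with a hypothetical `LOOSE_CC` pattern (an assumption, not part of the lab) that also accepts spaces or no separators. The sample strings are made up for illustration.

```python
import re

# Pattern used by PIIGuardrailPlugin: only matches XXXX-XXXX-XXXX-XXXX.
DASHED_CC = re.compile(r"\b\d{4}-\d{4}-\d{4}-\d{4}\b")

# Hypothetical broader pattern (not part of the lab): four groups of four
# digits separated by a dash, a single space, or nothing at all.
LOOSE_CC = re.compile(r"\b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b")

# Illustrative agent outputs: three card formats and one benign message.
SAMPLES = {
    "dashed": "Sure, use card 9999-8888-7777-6666",
    "spaced": "Sure, use card 9999 8888 7777 6666",
    "contiguous": "Sure, use card 9999888877776666",
    "benign": "Your order 2024-0001 ships on 2025-01-15.",
}


def coverage(pattern: re.Pattern) -> dict:
    """Map each sample name to whether `pattern` would trigger a block."""
    return {name: bool(pattern.search(text)) for name, text in SAMPLES.items()}


if __name__ == "__main__":
    print("dashed:", coverage(DASHED_CC))
    print("loose: ", coverage(LOOSE_CC))
```

The dashed pattern flags only the first sample, while `LOOSE_CC` flags all three card formats and still ignores the order number and date. The trade-off is that a looser pattern also blocks any other 16-digit number in the same shapes, so weigh false positives before adopting it.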

Self-Reflection Answers

Why is it better to block the response in a plugin rather than just adding "Don't say credit card numbers" to the agent's instructions?
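- Answer: LLMs can be manipulated via prompt injection or multi-step reasoning into ignoring their own instructions ("jailbreaking"). A plugin is deterministic code that runs outside the LLM's reasoning process, so the model cannot talk its way past it: every final response is checked, whatever the prompt said. That gives a "fail-closed" guarantee that instructions alone cannot offer, although it only covers the patterns the plugin actually checks (this regex, for example, misses card numbers written with spaces).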
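"Fail-closed" means that when the guardrail cannot reach a verdict (for example, a checker raises because a secondary safety model is unreachable), it blocks instead of releasing the text. The plain-Python sketch below illustrates that decision rule; `guard_response`, `has_card`, and `unavailable_model` are hypothetical names for illustration, not ADK APIs.

```python
import re
from typing import Callable, Sequence

BLOCK_MESSAGE = "🛑 [SECURITY BLOCK] This response was withheld."

# A checker returns True when the text is unsafe. It may also raise.
Checker = Callable[[str], bool]


def guard_response(text: str, checkers: Sequence[Checker]) -> str:
    """Return `text` only if every checker runs successfully and finds nothing."""
    for check in checkers:
        try:
            unsafe = check(text)
        except Exception:
            # Fail closed: if we cannot decide, we do not release the text.
            return BLOCK_MESSAGE
        if unsafe:
            return BLOCK_MESSAGE
    return text


CC_PATTERN = re.compile(r"\b\d{4}-\d{4}-\d{4}-\d{4}\b")


def has_card(text: str) -> bool:
    """Deterministic checker using the lab's regex."""
    return bool(CC_PATTERN.search(text))


def unavailable_model(text: str) -> bool:
    """Hypothetical checker that always fails, standing in for an outage."""
    raise RuntimeError("safety model unavailable")


if __name__ == "__main__":
    print(guard_response("Happy to help!", [has_card]))
    print(guard_response("Use card 9999-8888-7777-6666", [has_card]))
    print(guard_response("Happy to help!", [has_card, unavailable_model]))
```

A fail-open variant would `return text` in the `except` branch, which keeps the agent responsive during an outage but silently disables the guardrail.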
How would you extend this plugin to log all blocked responses to a security database for auditing?
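- Answer: Inside the `if self.cc_pattern.search(...)` branch, schedule an asynchronous logging function with `asyncio.create_task` (so the write does not delay the response) that records `invocation_context.session.id`, `invocation_context.session.user_id`, and the blocked response to BigQuery or a security log. The event itself carries no session or user ID, so take them from the invocation context. Also avoid storing the raw `response_text`: it would copy the very PII you blocked into the audit log, so store a redacted version instead.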
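A minimal sketch of the non-blocking audit log, written in plain `asyncio` so it runs without ADK. `AuditLogger`, `write_audit_record`, and the record fields are hypothetical; in the plugin you would pass the IDs from `invocation_context` and replace `write_audit_record` with a real BigQuery or logging client. The logger keeps a reference to each task because the event loop only holds weak references to tasks, so an unreferenced task can be garbage-collected before it finishes.

```python
import asyncio
import re
from datetime import datetime, timezone

CC_PATTERN = re.compile(r"\b\d{4}-\d{4}-\d{4}-\d{4}\b")


async def write_audit_record(record: dict, sink: list) -> None:
    """Hypothetical stand-in for an async BigQuery / security-log write."""
    await asyncio.sleep(0.01)  # simulate network latency
    sink.append(record)


class AuditLogger:
    def __init__(self) -> None:
        self.sink: list = []  # stand-in for the external store
        self._pending: set = set()  # strong refs so tasks are not GC'd early

    def log_block(self, session_id: str, user_id: str, response_text: str) -> None:
        """Schedule the write and return immediately (fire-and-forget)."""
        record = {
            "session_id": session_id,
            "user_id": user_id,
            "blocked_at": datetime.now(timezone.utc).isoformat(),
            # Store a redacted copy so the log itself does not hold the PII.
            "excerpt": CC_PATTERN.sub("[REDACTED-CARD]", response_text),
        }
        task = asyncio.create_task(write_audit_record(record, self.sink))
        self._pending.add(task)
        task.add_done_callback(self._pending.discard)

    async def drain(self) -> None:
        """Wait for outstanding writes, e.g. during shutdown or in tests."""
        if self._pending:
            await asyncio.gather(*self._pending)


async def main() -> list:
    logger = AuditLogger()
    logger.log_block("session-123", "user-42", "Sure, use card 9999-8888-7777-6666")
    await logger.drain()
    return logger.sink


if __name__ == "__main__":
    print(asyncio.run(main()))
```

`log_block` is synchronous and must be called while an event loop is running, which holds inside an `async` plugin callback.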
Can you think of other "safety" use cases for this pattern?
- Answer:
  - Competitor Masking: Automatically redact mentions of specific competitors.
  - Harmful Content Filter: Use a secondary, smaller "Safety Model" inside the plugin to check for toxic tone.
  - Link Verification: Check any URLs generated by the agent against a whitelist of approved corporate domains.
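For the link-verification idea, a minimal sketch under these assumptions: `ALLOWED_DOMAINS` is a hypothetical allowlist, and the URL regex is a simple heuristic rather than a full URL parser. Inside the plugin, you would run `disallowed_links` on `response_text` and block (or redact) when it returns anything.

```python
import re
from urllib.parse import urlparse

# Hypothetical allowlist of approved corporate domains.
ALLOWED_DOMAINS = {"example.com", "example.org"}

# Simple heuristic: http(s) URLs up to whitespace or common delimiters.
URL_PATTERN = re.compile(r"https?://[^\s<>\"')]+")


def is_allowed(url: str) -> bool:
    """True if the URL's host is an allowed domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in ALLOWED_DOMAINS)


def disallowed_links(text: str) -> list:
    """Return every URL in `text` whose host is not on the allowlist."""
    # Strip sentence punctuation that the regex may have swallowed.
    urls = [u.rstrip(".,;:!?") for u in URL_PATTERN.findall(text)]
    return [u for u in urls if not is_allowed(u)]


if __name__ == "__main__":
    reply = "See https://docs.example.com/guide and https://example.com.evil.net/login."
    print(disallowed_links(reply))
```

Matching the parsed hostname exactly or by a `"." + domain` suffix matters: a naive substring test such as `"example.com" in url` would accept `https://example.com.evil.net/`.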
