Lab 25.5: Building a "Fail-Closed" Safety Guardrail

Goal

In this lab, you will implement a Responsible AI (RAI) plugin that acts as a mandatory safety guardrail. The plugin inspects the agent's response for sensitive information (such as credit card numbers) and blocks the response if such data is present.

This demonstrates the Fail-Closed pattern used in enterprise AI applications: when a safety check flags a response, the system withholds the output instead of letting it through.
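
To make the fail-closed idea concrete, here is a minimal, framework-free sketch that contrasts it with fail-open behavior. It does not use the ADK. The names `fail_closed`, `fail_open`, `no_digits`, `broken_check`, and `SAFE_FALLBACK` are hypothetical and exist only for this illustration.

```python
from typing import Callable

# Hypothetical fallback message, used only in this sketch.
SAFE_FALLBACK = "Response withheld by safety guardrail."


def fail_closed(check: Callable[[str], bool], text: str) -> str:
    """Release text only when the check explicitly passes.

    A violation OR an error inside the check both produce the fallback.
    """
    try:
        if check(text):
            return text
    except Exception:
        # A crashing check counts as a failed check.
        pass
    return SAFE_FALLBACK


def fail_open(check: Callable[[str], bool], text: str) -> str:
    """Block only on an explicit violation; errors let the text through."""
    try:
        if not check(text):
            return SAFE_FALLBACK
    except Exception:
        # A crashing check is ignored, so the text is released.
        pass
    return text


def no_digits(text: str) -> bool:
    """Toy check: passes only if the text contains no digits."""
    return not any(ch.isdigit() for ch in text)


def broken_check(text: str) -> bool:
    """Toy check that always fails to run, simulating an outage."""
    raise RuntimeError("classifier unavailable")
```

The two wrappers behave the same when the check runs and returns a verdict. They differ only when the check itself breaks: the fail-closed wrapper withholds the text, while the fail-open wrapper releases it.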

Step 1: Create the Project

Create a new project:
```
uv run adk create safety_guardrail
```

Step 2: Implement the Safety Plugin

Open agent.py and create a plugin class that intercepts the agent's final response and checks it for personally identifiable information (PII).

Exercise: Complete the on_event_callback to detect a simple credit card pattern (4-4-4-4 digits) and block the response.
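
Before wiring the pattern into the plugin, it can help to see what the 4-4-4-4 regex does and does not catch. The sketch below uses only the standard `re` module and the same pattern as the starter code. The helper name `contains_card_number` is hypothetical.

```python
import re

# Same pattern as the starter code: four groups of four digits, dash-separated.
CC_PATTERN = re.compile(r"\b\d{4}-\d{4}-\d{4}-\d{4}\b")


def contains_card_number(text: str) -> bool:
    """Return True if the text contains a XXXX-XXXX-XXXX-XXXX digit pattern."""
    return CC_PATTERN.search(text) is not None


# Inputs the pattern catches.
matched = [
    "Here is your card: 1234-5678-9012-3456",
    "1234-5678-9012-3456 is the number.",
]

# Inputs the pattern misses, because the separators differ or are absent.
missed = [
    "1234 5678 9012 3456",
    "1234567890123456",
    "What is the capital of Italy?",
]

for sample in matched + missed:
    print(f"{contains_card_number(sample)!s:5} {sample}")
```

The misses are worth noting: this simple pattern only recognizes dash-separated numbers, so a real deployment would likely need broader patterns or a dedicated PII detector.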

```python
# In agent.py (Starter Code)
import re
from google.adk import Agent, App, Context, Event
from google.adk.plugins import BasePlugin

class PIIGuardrailPlugin(BasePlugin):
    """Blocks responses containing potential credit card numbers."""
    def __init__(self, name: str = 'pii_guardrail'):
        super().__init__(name)
        # Simple regex for XXXX-XXXX-XXXX-XXXX
        self.cc_pattern = re.compile(r"\b\d{4}-\d{4}-\d{4}-\d{4}\b")

    async def on_event_callback(self, *, event: Event, **kwargs):
        # TODO: 1. Target the correct event type ('request_complete')

        # TODO: 2. Extract the response text

        # TODO: 3. Use self.cc_pattern.search() to check for violations

        # TODO: 4. If a violation is found:
        # - Overwrite event.content.parts[0].text with a safety warning.
        # - Print a log message: "🛑 [PII] Response blocked."
        pass

# --- Agent that might 'leak' data (for testing) ---
agent = Agent(
    name="leak_agent",
    model="gemini-3.5-flash",
    instruction="""
    You are a testing assistant.
    If the user asks for 'test data', respond with: 'Here is your card: 1234-5678-9012-3456'.
    Otherwise, answer normally.
    """
)

# --- Register Plugin with App ---
app = App(
    name="safety_demo",
    root_agent=agent,
    plugins=[PIIGuardrailPlugin()]
)
```

Step 3: Test the Guardrail

Launch the Dev UI:
```
uv run adk web .
```

Trigger the Leak:
- Ask: "Give me some test data."
- Observe: The agent's raw response (visible in the Trace) contains the card number, but the final response shown to the user should be your safety warning.

Normal Interaction:
- Ask: "What is the capital of Italy?"
- Observe: The agent should answer normally, with no safety warning.
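
The two scenarios above can also be checked offline, without the Dev UI or a model call. The following is a sketch under assumptions: `screen_response` and `SAFETY_WARNING` are hypothetical names, the warning text is a placeholder, and the "Rome" answer merely stands in for whatever the model would actually say.

```python
import re

CC_PATTERN = re.compile(r"\b\d{4}-\d{4}-\d{4}-\d{4}\b")

# Placeholder warning text; use whatever wording your plugin emits.
SAFETY_WARNING = "This response was blocked because it may contain sensitive data."


def screen_response(text: str) -> str:
    """Return the safety warning if a card pattern is present, else the text."""
    if CC_PATTERN.search(text):
        print("🛑 [PII] Response blocked.")
        return SAFETY_WARNING
    return text


# Scenario 1: the leaking answer defined in the agent's instruction.
leaky = "Here is your card: 1234-5678-9012-3456"
# Scenario 2: a stand-in for a normal model answer (assumed, not real output).
normal = "The capital of Italy is Rome."

print(screen_response(leaky))
print(screen_response(normal))
```

A check like this only exercises the text-screening logic. It does not confirm that the plugin is registered or that the callback fires on the right event, so the Dev UI test is still needed.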

Lab Summary

You have built an active safety layer for your agent!

- You implemented the Fail-Closed pattern using a Plugin.
- You learned how to intercept and modify agent events before they reach the user.
- You saw that safety logic can be managed centrally, independently of the LLM's own internal filters.

Self-Reflection Questions

- Why is it better to block the response in a plugin rather than just adding "Don't say credit card numbers" to the agent's instructions?
- How would you extend this plugin to log all blocked responses to a security database for auditing?
- Can you think of other "safety" use cases for this pattern (e.g., detecting competitor names, preventing offensive language)?
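
As a starting point for the auditing question, here is one possible sketch that records blocked responses in SQLite using only the standard library. Everything in it is an assumption made for illustration: the table name `blocked_responses`, its columns, and the helpers `open_audit_db` and `log_blocked`. It stores a hash and a redacted copy rather than the raw text, so the audit log does not itself become a store of card numbers.

```python
import hashlib
import re
import sqlite3
from datetime import datetime, timezone

CC_PATTERN = re.compile(r"\b\d{4}-\d{4}-\d{4}-\d{4}\b")


def open_audit_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the audit database. The schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS blocked_responses (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            blocked_at TEXT NOT NULL,
            rule TEXT NOT NULL,
            text_sha256 TEXT NOT NULL,
            redacted_text TEXT NOT NULL
        )
        """
    )
    return conn


def log_blocked(conn: sqlite3.Connection, rule: str, original_text: str) -> None:
    """Record one blocked response without persisting the card number."""
    redacted = CC_PATTERN.sub("[REDACTED-CARD]", original_text)
    digest = hashlib.sha256(original_text.encode("utf-8")).hexdigest()
    blocked_at = datetime.now(timezone.utc).isoformat()
    # "with conn" commits on success and rolls back on error.
    with conn:
        conn.execute(
            "INSERT INTO blocked_responses "
            "(blocked_at, rule, text_sha256, redacted_text) VALUES (?, ?, ?, ?)",
            (blocked_at, rule, digest, redacted),
        )


conn = open_audit_db()
log_blocked(conn, "pii_guardrail", "Here is your card: 1234-5678-9012-3456")
row = conn.execute(
    "SELECT rule, redacted_text FROM blocked_responses"
).fetchone()
print(row)
```

In the plugin, a call like `log_blocked` would sit next to the step that overwrites the response text. A production setup would more likely write to a managed database or logging service than to a local SQLite file.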

🕵️ Hidden Solution 🕵️

Looking for the solution? Here's a hint (Base64 decode me): L2RvYy1hZGstdHJhaW5pbmcvbW9kdWxlMjVfNS1yYWktc2FmZXR5LXBsdWdpbnMvbGFiLXNvbHV0aW9u

The direct link is: Lab Solution
