# Lab 38: Building a Production-Ready Agent Challenge
## Goal
In this lab, you will build a Best Practices Agent that demonstrates several production-ready patterns, including input validation, error handling with retries, and caching.
## Step 1: Create the Agent Project
- Create the agent project:

  ```bash
  adk create best-practices-agent
  ```

  When prompted, choose the Programmatic (Python script) option.

- Navigate into the new directory:

  ```bash
  cd best-practices-agent
  ```
## Step 2: Implement the Production-Ready Tools
**Exercise:** Open `agent.py`. Skeletons for three tools are provided. Your task is to apply the best practices of validation, resilience, and caching, using the `# TODO` comments as a guide.
```python
# In agent.py (Starter Code)
import time
import random
import functools

from pydantic import BaseModel, Field, constr
from retry import retry

from google.adk.agents import Agent
from google.adk.tools import FunctionTool
from google.adk.tools.tool_context import ToolContext  # Import ToolContext

# --- 1. Input Validation with Pydantic ---
class ValidatedInput(BaseModel):
    """A Pydantic model to validate inputs for a tool."""
    user_id: constr(pattern=r'^[a-zA-Z0-9_-]{3,50}$')  # Pydantic v2 uses `pattern` (v1 called it `regex`)
    query: str = Field(..., max_length=1000)
# TODO: 1. Implement this tool. Inside a try/except block, attempt to
# instantiate the `ValidatedInput` model with the provided `user_id` and `query`.
# Return a success dictionary if it validates, or an error dictionary if it fails.
def validate_input_tool(user_id: str, query: str, tool_context: ToolContext) -> dict:
    """Validates user_id and query using a Pydantic model."""
    try:
        ValidatedInput(user_id=user_id, query=query)
        return {"status": "success", "message": "Input is valid."}
    except ValueError as e:
        return {"status": "error", "message": f"Invalid input: {e}"}
# --- 2. Resilience with Retries and Exponential Backoff ---
# TODO: 2. Apply the `@retry` decorator to this function. Configure it to
# try 4 times with a delay that doubles, starting at 1 second.
# Note: `logger=None` disables the retry library's own warning logs;
# the print statements below make each attempt visible instead.
@retry(tries=4, delay=1, backoff=2, logger=None)
def _flaky_api_call():
    """Simulates an API call that might fail."""
    print("Attempting to call the flaky API...")
    if random.random() > 0.33:  # ~67% chance of failure
        print("API call failed! Retrying...")
        raise ConnectionError("The external API is temporarily unavailable.")
    print("API call succeeded!")
    return {"status": "success", "data": "Successfully retrieved data."}
def retry_with_backoff_tool(tool_context: ToolContext) -> dict:
    """Calls an external service that might fail intermittently."""
    try:
        return _flaky_api_call()
    except ConnectionError as e:
        return {"status": "error", "message": f"The API call failed after multiple retries: {e}"}
# --- 3. Performance with Caching ---
# TODO: 3. Apply the `@functools.lru_cache` decorator to this function
# to cache its results. Set a maxsize of 128.
# Note: `lru_cache` is a built-in Python decorator for memoization.
@functools.lru_cache(maxsize=128)
def _slow_database_query(item_id: str) -> str:
    """Simulates a slow database query that takes 2 seconds."""
    print(f"Performing slow query for item: {item_id}...")
    time.sleep(2)
    print("Query complete.")
    return f"Data for {item_id}"
def cache_operation_tool(item_id: str, tool_context: ToolContext) -> dict:
    """Fetches data from a slow database with caching."""
    result = _slow_database_query(item_id)
    return {"status": "success", "data": result}
# --- Agent Definition ---
# TODO: 4. Define the `root_agent`. Give it an instruction to use the
# appropriate tool based on the user's request and register your three
# new tools with it.
root_agent = Agent(
    model='gemini-2.5-flash',
    name='best_practices_agent',
    instruction="""
    You are an agent that demonstrates production best practices.
    You have tools for validation, resilience, and performance.
    Use the appropriate tool based on the user's request.
    """,
    tools=[
        FunctionTool(validate_input_tool),
        FunctionTool(retry_with_backoff_tool),
        FunctionTool(cache_operation_tool),
    ],
)
```
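Before moving on, you can optionally smoke-test the three helpers outside the agent. The script below is a sketch, not part of the lab's starter files: it assumes `agent.py` sits in the current directory, and it passes `None` for `tool_context`, which is safe here only because these particular tools never touch it.

```python
# smoke_test.py (hypothetical helper, run from inside best-practices-agent/)
import time

from agent import validate_input_tool, retry_with_backoff_tool, cache_operation_tool

# 1. Validation: the first call passes the regex, the second fails it.
print(validate_input_tool("alice_01", "hello", tool_context=None))
print(validate_input_tool("u!", "hello", tool_context=None))

# 2. Retries: watch for repeated "Attempting to call the flaky API..." lines.
print(retry_with_backoff_tool(tool_context=None))

# 3. Caching: the first lookup takes ~2 seconds, the repeat returns immediately.
for attempt in ("first", "second"):
    start = time.perf_counter()
    cache_operation_tool("item-123", tool_context=None)
    print(f"{attempt} call took {time.perf_counter() - start:.2f}s")
```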
## Step 3: Run and Test the Agent
- Install dependencies:

  ```bash
  pip install pydantic retry
  ```

- Set up your `.env` file and start the Dev UI:

  ```bash
  adk web best-practices-agent
  ```

- Interact with the Agent and Observe the Patterns:
  - Test Caching (a sketch after this list shows how to confirm hits programmatically):
    - "Fetch item `item-123`" (will be slow)
    - "Fetch item `item-123` again" (will be instant)
  - Test Validation:
    - "Validate user `user_!@#$` with query `test`" (will fail)
  - Test Retries:
    - "Attempt a flaky operation" (observe the console for retry logs)
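Beyond eyeballing response times, `functools.lru_cache` exposes hit/miss counters you can inspect. A minimal check, assuming a Python shell opened in the project directory:

```python
from agent import _slow_database_query

_slow_database_query("item-123")  # miss: runs the 2-second query
_slow_database_query("item-123")  # hit: returns instantly

# Prints CacheInfo(hits=1, misses=1, maxsize=128, currsize=1)
print(_slow_database_query.cache_info())
```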
## Having Trouble?
If you get stuck, you can find the complete, working code in the `lab-solution.md` file.
## Lab Summary
You have successfully built an agent with tools that incorporate production-grade best practices. You have learned to:
- Use `Pydantic` for robust input validation.
- Implement automatic retries with `retry`.
- Use caching with `lru_cache` to improve performance.
## Self-Reflection Questions
- The `@lru_cache` decorator stores results in memory. What are the limitations of this caching strategy in a distributed, multi-instance deployment on a platform like Cloud Run? What would be a better solution for caching in that environment? (A sketch follows this list.)
- Our `_flaky_api_call` function simulates a network error. What other kinds of errors should a production-grade tool handle gracefully?
- How does using a library like `Pydantic` for input validation make your tools more secure and reliable compared to writing manual `if/else` checks?
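As a pointer for the first question: `lru_cache` lives in one process's memory, so every Cloud Run instance warms its own cache, and all of it vanishes on restart. A common alternative is an external shared cache such as Redis (Memorystore on Google Cloud). The sketch below is illustrative only; the host, port, key format, and TTL are assumptions, not part of the lab.

```python
import time
import redis

# Hypothetical connection details; on Cloud Run this would typically be
# a Memorystore for Redis instance reachable from the service.
r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def cached_database_query(item_id: str) -> str:
    """Like the lab's cached query, but the cache is shared across instances."""
    key = f"item-data:{item_id}"
    cached = r.get(key)
    if cached is not None:
        return cached  # cache hit: served from Redis, no slow query

    # Cache miss: do the slow work (stand-in for the real query) ...
    time.sleep(2)
    result = f"Data for {item_id}"

    # ... then store it with a 5-minute TTL so stale data expires on its own.
    r.setex(key, 300, result)
    return result
```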
## 🕵️ Hidden Solution 🕵️
Looking for the solution? Here's a hint (Base64 decode me):

`L2RvYy1hZGstdHJhaW5pbmcvbW9kdWxlMzgtYmVzdC1wcmFjdGljZXMvbGFiLXNvbHV0aW9u`

The direct link is: [Lab Solution](/doc-adk-training/module38-best-practices/lab-solution)