Lab 7: Building a Visual Product Catalog Analyzer
Goal
In this lab, you will build a multimodal agent that analyzes a product image and generates a marketing description. Along the way, you will learn how to handle image inputs and run a vision-capable agent programmatically.
The Architecture
You will use the App and Runner pattern you learned in Module 6 to build a Python script that:
- Loads local image files as types.Part objects.
- Constructs a multimodal prompt containing both text and image data.
- Executes the agent using run_async.
Step 1: Create and Prepare the Project
- Create the agent project:
    adk create visual_catalog
- Navigate into the new directory:
    cd visual_catalog
- Install dependencies. This lab requires the Pillow library for image handling:
    pip install Pillow
- Set up your .env file. Vision models require a Vertex AI configuration. Ensure your .env file looks like this:
    GOOGLE_GENAI_USE_VERTEXAI=1
    GOOGLE_CLOUD_PROJECT="your-project-id"
    GOOGLE_CLOUD_LOCATION="us-central1"
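Before running the script, it can help to confirm that these three settings are actually visible to the Python process. The following is a minimal sketch that uses only the standard library. The helper name missing_vertex_settings is hypothetical and not part of ADK; the variable names are the ones from the .env example above.

```python
import os

# Variable names taken from the .env example in this lab.
REQUIRED_VARS = (
    "GOOGLE_GENAI_USE_VERTEXAI",
    "GOOGLE_CLOUD_PROJECT",
    "GOOGLE_CLOUD_LOCATION",
)


def missing_vertex_settings(env=None):
    """Return the required variable names that are unset or blank.

    Hypothetical helper for illustration; not part of ADK.
    Pass a dict to check it instead of the real environment.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_vertex_settings()
    if missing:
        print("Missing settings:", ", ".join(missing))
    else:
        print("Vertex AI settings look complete.")
```

One way to use this would be to call missing_vertex_settings() right after load_dotenv() and stop early if the returned list is not empty.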
Step 2: Implement the Multimodal Script
Exercise: Create a file named main.py in your visual_catalog directory. Your task is to complete the core logic inside the analyze_product method by filling in the # TODO sections.
Note on Sessions: When using run_async programmatically, you must explicitly create a session in the runner's session service before sending a message, just as you did with curl in Module 5.
import asyncio
import os

from google.adk.agents import LlmAgent
from google.adk.apps import App
from google.adk.runners import InMemoryRunner
from google.genai import types
from dotenv import load_dotenv


# Helper function to load an image from a local file path
def load_image_from_file(path: str) -> types.Part:
    """Load image from file and return a types.Part object."""
    with open(path, 'rb') as f:
        image_bytes = f.read()
    # Anything that is not a .png file is treated as JPEG
    mime_type = 'image/png' if path.lower().endswith('.png') else 'image/jpeg'
    return types.Part(
        inline_data=types.Blob(data=image_bytes, mime_type=mime_type)
    )


class VisualCatalogApp:
    def __init__(self):
        # 1. Define the Intelligence
        self.agent = LlmAgent(
            model='gemini-2.5-flash',
            name='catalog_writer',
            instruction="You are a professional product catalog writer. Analyze the image and write a compelling description."
        )
        # 2. Build the Infrastructure
        self.app = App(name="visual_catalog", root_agent=self.agent)
        self.runner = InMemoryRunner(app=self.app)

    async def analyze_product(self, product_id: str, image_path: str):
        print(f"\n--- Analyzing Product: {product_id} ---")
        user_id = "catalog_admin"
        session_id = f"sess_{product_id}"

        # TODO: 1. Create the session explicitly in the session service
        # Hint: Use self.runner.session_service.create_session(...)
        ...

        # TODO: 2. Load the image using the helper function
        image_part = ...

        # TODO: 3. Create the multimodal message (types.Content object)
        msg = types.Content(
            role="user",
            parts=[
                types.Part(text=f"Analyze this image for product {product_id}."),
                image_part
            ]
        )

        # TODO: 4. Run the agent using run_async
        print("📸 Sending image to Gemini...")
        async for event in self.runner.run_async(
            user_id=user_id,
            session_id=session_id,
            new_message=msg
        ):
            if event.is_final_response():
                print(f"✅ Description:\n{event.content.parts[0].text}\n")


async def main():
    load_dotenv()
    catalog = VisualCatalogApp()
    # Images are expected to be in the parent directory for this lab
    products = [
        ('HEADPHONES-01', '../headphones.jpg'),
        ('LAPTOP-02', '../laptop.jpg'),
    ]
    for product_id, path in products:
        if os.path.exists(path):
            await catalog.analyze_product(product_id, path)
            await asyncio.sleep(1)
        else:
            print(f"⚠️ Image not found: {path}")


if __name__ == '__main__':
    asyncio.run(main())
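The helper load_image_from_file treats every file that is not a .png as a JPEG. If your catalog contains other image formats, one possible alternative is to let Python's standard mimetypes module guess the type from the file extension. This is a sketch under that assumption, not part of the lab solution, and the function name guess_image_mime_type is hypothetical.

```python
import mimetypes


def guess_image_mime_type(path: str) -> str:
    """Guess an image MIME type from the file extension.

    Hypothetical helper for illustration. Raises ValueError when the
    extension is unknown or does not map to an image type.
    """
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Not a recognized image file: {path}")
    return mime_type
```

The returned string could be passed as the mime_type argument of types.Blob in place of the hard-coded check. Failing loudly on an unrecognized extension avoids silently labeling non-JPEG data as image/jpeg.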
Lab Summary
You have successfully built a multimodal agent! You have learned:
- How to package image bytes into a types.Part object.
- The importance of explicitly creating a Session when using run_async programmatically.
- How to construct a structured types.Content object for complex inputs.
Self-Reflection Questions
- Why did we have to call create_session manually this time, whereas in Module 6's run_debug we didn't?
- How does the InMemoryRunner simplify the setup of the SessionService compared to a base Runner?
- If you wanted to analyze a PDF document instead of an image, which mime_type would you use in the types.Blob?
🕵️ Hidden Solution 🕵️
Looking for the solution? Here's a hint (Base64 decode me):
L2RvYy1hZGstdHJhaW5pbmcvbW9kdWxlMDctbXVsdGltb2RhbC1hbmQtaW1hZ2VzL2xhYi1zb2x1dGlvbg==
The direct link is: Lab Solution