Lab 30: Interacting with a Custom Voice Client Challenge
Goal
In this lab, you will run an ADK agent as a backend API server and interact with it using a custom, standalone HTML/JavaScript client. This will demonstrate how to build your own user-facing applications on top of the ADK's streaming capabilities.
Step 1: Prepare the Streaming Agent
We will use the same streaming agent configuration from Module 22.
-
Create the project directory and agent:
mkdir custom-streaming-app
cd custom-streaming-app
adk create --type=config streaming_agent -
Configure the agent:
- Navigate into
streaming_agent. - Configure the
.envfile for Vertex AI. - Replace the contents of
root_agent.yamlwith:name: streaming_conversational_agent
model="gemini-2.5-flash",
instruction: |
You are a friendly and talkative assistant. Keep your answers concise.
streaming: True
- Navigate into
Step 2: Create the Custom HTML/JavaScript Client
Exercise: Navigate back to the custom-streaming-app directory. Create an index.html file. A skeleton is provided below. Your task is to complete the JavaScript logic for the startStreaming function based on the # TODO comments.
<!-- In index.html (Starter Code) -->
<!DOCTYPE html>
<html>
<head>
<title>ADK Custom Streaming Client</title>
<style>
body { font-family: sans-serif; display: flex; flex-direction: column; align-items: center; }
#transcript { width: 500px; height: 300px; border: 1px solid #ccc; padding: 10px; overflow-y: scroll; }
button { margin: 10px; padding: 10px; font-size: 16px; }
</style>
</head>
<body>
<h1>ADK Custom Streaming Client</h1>
<button id="streamButton">Start Streaming</button>
<div id="transcript"></div>
<script>
const streamButton = document.getElementById('streamButton');
const statusDiv = document.getElementById('status');
const transcriptDiv = document.getElementById('transcript');
let websocket;
let audioContext;
let mediaStream;
let mediaRecorder;
let isStreaming = false;
streamButton.onclick = () => {
if (!isStreaming) {
startStreaming();
} else {
stopStreaming();
}
};
function log(message) {
const p = document.createElement('p');
p.textContent = message;
transcriptDiv.appendChild(p);
transcriptDiv.scrollTop = transcriptDiv.scrollHeight;
}
async function startStreaming() {
try {
// 1. Get microphone access
mediaStream = await navigator.mediaDevices.getUserMedia({ audio: true });
// 2. Establish WebSocket connection
const sessionId = Math.random().toString(36).substring(7);
const wsUrl = `ws://localhost:8000/live/${sessionId}?is_audio=true`;
websocket = new WebSocket(wsUrl);
websocket.onopen = () => {
isStreaming = true;
streamButton.textContent = 'Stop Streaming';
statusDiv.textContent = 'Status: Connected';
log('[CLIENT]: WebSocket connection opened.');
// 3. Start recording and sending audio
mediaRecorder = new MediaRecorder(mediaStream, { mimeType: 'audio/webm;codecs=opus' });
mediaRecorder.ondataavailable = (event) => {
if (event.data.size > 0 && websocket.readyState === WebSocket.OPEN) {
websocket.send(event.data);
}
};
mediaRecorder.start(100); // Send data every 100ms
};
websocket.onmessage = (event) => {
// 4. Handle incoming messages (agent's response)
const data = JSON.parse(event.data);
if (data.mime_type === 'text/plain') {
log(`[AGENT]: ${data.data}`);
}
// A production client would also handle 'audio/mp3' mime_type for playback.
};
websocket.onclose = () => {
log('[CLIENT]: WebSocket connection closed.');
stopStreaming();
};
websocket.onerror = (error) => {
log(`[CLIENT]: WebSocket Error: ${JSON.stringify(error)}`);
stopStreaming();
};
} catch (error) {
log(`[CLIENT]: Error starting stream: ${error}`);
}
}
function stopStreaming() {
if (mediaRecorder && mediaRecorder.state === 'recording') {
mediaRecorder.stop();
}
if (mediaStream) {
mediaStream.getTracks().forEach(track => track.stop());
}
if (websocket && websocket.readyState === WebSocket.OPEN) {
websocket.close();
}
isStreaming = false;
streamButton.textContent = 'Start Streaming';
statusDiv.textContent = 'Status: Disconnected';
}
</script>
</body>
</html>
Step 3: Run the Server and the Client
-
Terminal 1 (ADK Server):
- Navigate to the
custom-streaming-appdirectory. - Run
adk api_server streaming_agent.
cd /path/to/custom-streaming-app
adk api_server streaming_agent - Navigate to the
-
Terminal 2 (Client Web Server):
- Navigate to the
custom-streaming-appdirectory (whereindex.htmlis). - Start a simple Python web server.
python3 -m http.server 8081 - Navigate to the
Step 4: Test Your Custom Application
- Open the Client: In your browser, navigate to
http://localhost:8081.- Note on Browser Permissions: Your browser will likely ask for permission to access your microphone. You must grant this for the streaming to work.
- Start Streaming: Click the "Start Streaming" button.
- Talk to the Agent: Speak into your microphone and watch the transcript area for the agent's real-time text response.
- Note on Audio Playback: This client currently only displays the agent's text response. A production client would also handle the
audio/mp3mime type from the server for voice playback.
- Note on Audio Playback: This client currently only displays the agent's text response. A production client would also handle the
Having Trouble?
If you get stuck, you can find the complete, working index.html code in the lab-solution.md file.
Lab Summary
You have successfully built a custom client for a streaming ADK agent. You have learned:
- How to run an ADK agent as a backend service using
adk api_server. - The basic structure of an HTML/JavaScript client for streaming.
- How to use the
WebSocketand Web Audio APIs to create a real-time voice application.
Self-Reflection Questions
- This lab's client only displays the text from the agent. How would you modify the
websocket.onmessagehandler to also process and play back theaudio/mp3data that the server sends? - What are the benefits of using WebSockets for this application compared to the Server-Sent Events (SSE) approach used in the previous UI lab?
- The
MediaRecorderis configured to send audio data every 100ms. What do you think would be the impact on the user experience if you increased this value to 1000ms (1 second)?
🕵️ Hidden Solution 🕵️
Looking for the solution? Here's a hint (Base64 decode me):
L2RvYy1hZGstdHJhaW5pbmcvbW9kdWxlMzAtY3VzdG9tLXN0cmVhbWluZy1jbGllbnQvbGFiLXNvbHV0aW9u
The direct link is: Lab Solution