The basic flow is straightforward: press a button, speak a command like 'forward' or 'left', and the robot moves. Behind the scenes, the STM32F3 Discovery captures audio from an INMP441 microphone over I2S and streams the raw samples over UART to an ESP-01S WiFi module, which forwards them to a Python server running on my laptop. The server wraps the audio in a WAV file, sends it to OpenAI's Whisper API, parses the transcription for keywords, and sends back a simple command byte. The STM32 receives this byte and drives the rear motors through an L293D driver, while an SG90 servo steers the front wheels.
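
The two server-side steps that need no network access, wrapping the raw audio in a WAV container and turning a transcription into a command byte, can be sketched roughly as below. This is a minimal illustration rather than the project's actual code: the audio format (16 kHz, 16-bit, mono), the keyword list, the command byte values, and the function names `pcm_to_wav` and `parse_command` are all assumptions, and the Whisper API call itself is left out.

```python
import io
import wave

# Assumed audio format; the real values depend on how the STM32 configures I2S.
SAMPLE_RATE_HZ = 16000
SAMPLE_WIDTH_BYTES = 2  # 16-bit samples
CHANNELS = 1            # mono

# Hypothetical keyword-to-byte mapping, chosen only for illustration.
COMMANDS = {
    "forward": b"F",
    "back": b"B",
    "left": b"L",
    "right": b"R",
    "stop": b"S",
}
UNKNOWN = b"?"  # returned when no keyword is recognised


def pcm_to_wav(pcm: bytes) -> bytes:
    """Wrap raw PCM samples in an in-memory WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH_BYTES)
        wav.setframerate(SAMPLE_RATE_HZ)
        wav.writeframes(pcm)
    return buf.getvalue()


def parse_command(transcription: str) -> bytes:
    """Return the byte for the first known keyword in the transcription."""
    for word in transcription.lower().split():
        key = word.strip(".,!?\"'")  # drop punctuation the transcriber may add
        if key in COMMANDS:
            return COMMANDS[key]
    return UNKNOWN
```

With this mapping, `parse_command("Go forward.")` returns `b"F"`, and any transcription without a known keyword returns `b"?"`, so the robot can ignore unrecognised speech instead of moving on a guess.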





