ESP audio development boards, powered by ESP32 and ESP32-S2 SoCs, are designed for audio applications such as smart speakers, voice robots, story-teller machines and other voice-controlled devices used in smart-home solutions. They provide out-of-the-box voice enablement, and support connection to multiple voice platforms.
Overview


Top-Notch Audio Solutions
Espressif’s audio boards provide advanced Wi-Fi + Bluetooth dual-mode audio solutions that can be implemented quickly and easily. They support music playback from multiple sources, such as HTTP, HLS (HTTP Live Streaming), SPIFFS, SD card, A2DP-Source, A2DP-Sink and HFP. To this end, a number of audio formats are supported, such as MP3, AAC, FLAC, WAV, OGG, OPUS, AMR, TS, ALC, and G.711.

Extensive Applications
ESP audio boards support one-key Wi-Fi configuration, voice wake-up, voice recognition and cloud-platform access. They are designed for the development of audio and AIoT applications, e.g., Wi-Fi or Bluetooth speakers, speech-based remote controllers, voice robots, smart toys and connected smart-home appliances with wide-ranging audio functionality.
ESP Audio SDKs
Espressif’s SoCs are not just about connectivity. With integrated Wi-Fi and Bluetooth connectivity, excellent computing power and a rich set of peripherals, Espressif offers a complete solution for various types of voice and audio applications. Espressif’s audio SDKs assist with the implementation of smart speakers, web radios, voice assistants and voice-controlled devices in general. They also support out-of-the-box connectivity to various platforms, such as Amazon’s AVS and Amazon Lex, Google’s Voice Assistant and Google Dialogflow, DuerOS, TmallGenie and XiaoAi.
ESP-ADF
ESP-ADF is the official audio development framework for the ESP32 and ESP32-S2 SoCs. With ESP-ADF, you can easily add features, and develop a number of different audio applications ranging from simple to complex ones. ESP-ADF supports connection to various voice platforms.
ESP-VA-SDK
The ESP-Voice-Assistant SDK provides an implementation of Amazon's Alexa Voice Service, Google’s Voice Assistant, and Google's conversational interface (Dialogflow) for the ESP32 microcontroller.
ESP-Skainet
ESP-Skainet is Espressif’s speech recognition SDK targeting voice-controlled devices. ESP-Skainet can operate completely offline, without depending on cloud connectivity. Both wake-word and speech-command (phrase) detection happen locally on Espressif SoCs. ESP-Skainet uses several acoustic algorithms, such as voice-activity detection, acoustic echo cancellation, noise reduction and beamforming, to improve acoustic performance.
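
To give a feel for what voice-activity detection means in practice, here is a minimal, generic sketch of an energy-threshold detector in Python. It is not ESP-Skainet's algorithm or API; the function names, the 160-sample frame length, the 16 kHz sample rate and the -30 dB threshold are all assumptions chosen for illustration.

```python
import math


def frame_energy_db(frame):
    """Return the mean-square energy of one frame in dB (full scale = 1.0)."""
    if not frame:
        return float("-inf")
    mean_square = sum(s * s for s in frame) / len(frame)
    # Guard against log10(0) for an all-zero frame.
    return 10.0 * math.log10(mean_square) if mean_square > 0 else float("-inf")


def detect_voice(samples, frame_len=160, threshold_db=-30.0):
    """Flag each complete frame whose energy exceeds the (assumed) threshold."""
    flags = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        flags.append(frame_energy_db(frame) > threshold_db)
    return flags


# Synthetic input: one silent frame, then one frame of a 440 Hz tone at 16 kHz.
silence = [0.0] * 160
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(160)]
flags = detect_voice(silence + tone)
print(flags)
```

The silent frame is rejected and the tone frame is accepted. A production detector would typically adapt its threshold to background noise rather than use a fixed value.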

Related Audio Hardware

ESP32-LyraT
ESP32-LyraT is a standard hardware platform supporting recording, audio playback, and simple IoT controls. It's designed for dual-core ESP32 audio applications, e.g., Wi-Fi or Bluetooth audio speakers, story-teller machines, reading pens, etc.

ESP32-LyraT-Mini
ESP32-LyraT-Mini is a lightweight audio development board based on ESP32-WROVER-E, which implements a wake-word engine and front-end acoustic algorithms such as AEC, AGC and NS.

ESP32-LyraTD-MSC
ESP32-LyraTD-MSC consists of two parts: the upper board, which provides a three-microphone array, function keys and LED lights; and the lower board, which integrates ESP32-WROVER-E, a Microsemi-DSP chip, and a power management module. It effectively supports far-field voice solutions, such as smart speakers, smart lamps and other voice-controlled appliances in smart-home applications.

ESP32-LyraTD-DSPG
ESP32-LyraTD-DSPG is based on the ESP32-WROVER-B module and a digital signal processor (DSP) that features a three-microphone array for noise reduction, echo cancellation, beamforming and wake-word detection. ESP32-LyraTD-DSPG consists of two development boards. The main board integrates power management, as well as Wi-Fi and audio modules like DSP, codec and power amplifier. The sub-board mainly consists of the microphone array, function keys and LEDs.

ESP32-LyraTD-SYNA
ESP32-LyraTD-SYNA is one of Espressif’s audio development boards, based on the ESP32 MCU and Synaptics’ DSP. It is an acoustic echo cancellation (AEC) solution, supporting voice recognition and voice wake-up. It also supports connection to Amazon’s AVS (Alexa Voice Service), Google’s Dialogflow and GVA (Google Voice Assistant).

ESP32-Vaquita-DSPG
ESP32-Vaquita-DSPG is an Alexa built-in solution powered by ESP32 and DSP Group’s DBMD5P audio SoC. The board, together with Alexa Voice Service (AVS) for AWS IoT, provides a turnkey solution to easily create Alexa built-in IoT devices featuring voice enablement and AWS IoT cloud connectivity.

ESP32-Korvo
ESP32-Korvo is an ESP32-based audio development board with a microphone array. Together with Espressif's speech recognition SDK, ESP-Skainet, ESP32-Korvo is suitable for far-field speech recognition applications that need to achieve low power consumption. ESP32-Korvo is composed of two boards: the main board contains the ESP32-WROVER-E module, a power port, a micro-SD-card slot, earphone and speaker connectors; the sub-board contains a microphone array, function buttons and LEDs.

ESP32-Korvo-DU1906
ESP32-Korvo-DU1906 is an Espressif audio development board with an ESP32-DU1906 module at its core. This board is designed to provide not only advanced end-to-end audio solutions with highly efficient AI capabilities, but also a Cloud + End integrated device-level AIoT platform, which significantly lowers the barrier to building AI capabilities in IoT devices.

ESP32-S2-Kaluga-1
(Audio extension board: ESP-LyraT-8311A)
ESP32-S2-Kaluga-1 is based on ESP32-S2, and has various features, such as an LCD screen display, touch panel control, camera image acquisition, audio playback, etc. It can be flexibly assembled and disassembled, thus fulfilling a variety of customized requirements.

Boards | Related Module | Supported SDKs | DSP (Digital Signal Processing) Chip | Flash / PSRAM | Interfaces | UI |
---|---|---|---|---|---|---|
ESP32-LyraT | ESP32-WROVER-E | ESP-ADF, ESP-VA-SDK, ESP-Skainet | N/A | 4 MB Flash + 8 MB PSRAM | I2S, I2C, JTAG, USB, UART, MicroSD Slot, Audio Output, Speaker Output | Buttons, Function Keys, LEDs, Microphones |
ESP32-LyraT-Mini | ESP32-WROVER-E | ESP-ADF, ESP-VA-SDK, ESP-Skainet | N/A | 4 MB Flash + 8 MB PSRAM | I2S, I2C, JTAG, USB, UART, MicroSD Slot, Audio Output, Speaker Output | Buttons, Function Keys, LEDs, Microphones |
ESP32-LyraTD-MSC | ESP32-WROVER-E | ESP-ADF, ESP-VA-SDK, ESP-Skainet | Microsemi’s ZL38063 Chip | 4 MB Flash + 8 MB PSRAM | I2S, I2C, SPI, JTAG, USB, UART, MicroSD Slot, Audio Output, Speaker Output | Buttons, Function Keys, LEDs, Microphones |
ESP32-LyraTD-DSPG | ESP32-WROVER-B | ESP-VA-SDK | DSP Group’s DBMD5P Chip | 16 MB Flash + 8 MB PSRAM | I2S, I2C, JTAG, USB, UART, Earphone Connector, Speaker Connector, FPC Connector, Mini Din Connector | Buttons, LEDs, Microphones |
ESP32-LyraTD-SYNA | ESP32-WROVER-E | ESP-VA-SDK | Synaptics’ CX20921 Chip | 16 MB Flash + 8 MB PSRAM | I2S, I2C, SPI, JTAG, USB, UART, FPC Connector, Earphone Jack, Speaker Output | Buttons, LEDs, Microphones |
ESP32-Vaquita-DSPG | ESP32-WROVER-E | ESP-VA-SDK | DSP Group’s DBMD5P Chip | 16 MB Flash + 8 MB PSRAM | I2S, I2C, JTAG, USB, UART, Speaker Connector, Earphone Connector, FPC Connector | Buttons, Function Keys, LEDs, Microphones |
ESP32-Korvo | ESP32-WROVER-E | ESP-Skainet | N/A | 16 MB Flash + 8 MB PSRAM | I2S, I2C, JTAG, USB, UART, Micro SD Card, Speaker Connector, Earphone Connector, FPC Connector | Function Buttons, LED, Analog Microphone |
ESP32-Korvo-DU1906 | ESP32-DU1906 | ESP-ADF | Baidu’s DU1906 Chip | 8 MB Flash + 8 MB PSRAM | I2S, I2C, JTAG, USB, UART, MicroSD Slot, LCD Connector, Speaker Connector, Earphone Jacks, Battery Connector | Function Buttons, Microphone Array, LEDs |
ESP32-S2-Kaluga-1 (Audio extension board: ESP-LyraT-8311A) | ESP32-S2-WROVER | ESP-ADF | N/A | 4 MB Flash + 2 MB PSRAM | I2S, I2C, JTAG, USB, UART, LCD FPC Connector, Camera Header, Extension Header, Touch FPC Connector, Battery Port, ESP Prog Connector | Buttons, Function Keys, LED, Touch, LCD Screen, Speaker, Microphones, Camera |
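
When choosing a board, the comparison table can be treated as a small lookup structure. The Python sketch below copies only the SDK, flash and PSRAM columns from the table; the `Board` class and the `boards_supporting` helper are hypothetical names introduced for illustration and are not part of any Espressif tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Board:
    name: str
    sdks: tuple      # supported SDKs, as listed in the table
    flash_mb: int
    psram_mb: int


ALL_SDKS = ("ESP-ADF", "ESP-VA-SDK", "ESP-Skainet")

# Values transcribed from the comparison table.
BOARDS = [
    Board("ESP32-LyraT", ALL_SDKS, 4, 8),
    Board("ESP32-LyraT-Mini", ALL_SDKS, 4, 8),
    Board("ESP32-LyraTD-MSC", ALL_SDKS, 4, 8),
    Board("ESP32-LyraTD-DSPG", ("ESP-VA-SDK",), 16, 8),
    Board("ESP32-LyraTD-SYNA", ("ESP-VA-SDK",), 16, 8),
    Board("ESP32-Vaquita-DSPG", ("ESP-VA-SDK",), 16, 8),
    Board("ESP32-Korvo", ("ESP-Skainet",), 16, 8),
    Board("ESP32-Korvo-DU1906", ("ESP-ADF",), 8, 8),
    Board("ESP32-S2-Kaluga-1", ("ESP-ADF",), 4, 2),
]


def boards_supporting(sdk, min_flash_mb=0):
    """Return names of boards that list `sdk` and have at least `min_flash_mb` of flash."""
    return [b.name for b in BOARDS if sdk in b.sdks and b.flash_mb >= min_flash_mb]


skainet_boards = boards_supporting("ESP-Skainet")
large_flash_va_boards = boards_supporting("ESP-VA-SDK", min_flash_mb=16)
print(skainet_boards)
print(large_flash_va_boards)
```

For example, filtering for ESP-Skainet returns the three LyraT-family boards with all SDKs plus ESP32-Korvo, while requiring ESP-VA-SDK with at least 16 MB of flash narrows the list to the three DSP-equipped boards.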
