Audio System¶
The pib3 audio system provides unified audio playback, recording, text-to-speech (TTS), and device management through a single API that works on both the real robot and the Webots simulation.
Overview¶
The audio system consists of several key components:
- Audio Playback: Play raw audio data or WAV files on local speakers, robot speakers, or both
- Audio Recording: Record from local microphone or robot's microphone array
- Text-to-Speech: Synthesize speech using Piper TTS with German and English voices
- Device Management: List and select specific speakers and microphones
- Output/Input Routing: Control where audio plays/records using the AudioOutput and AudioInput enums
Audio Format¶
All audio in the pib3 system uses a standardized format:
- Sample rate: 16000 Hz (16kHz)
- Channels: 1 (Mono)
- Bit depth: 16-bit signed integers (int16)
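As a rough illustration of what this format means for buffer sizes, the sketch below builds one second of a 440 Hz tone with only the standard library and computes its size in bytes. The helper name make_tone is hypothetical and not part of pib3.

```python
import array
import math

SAMPLE_RATE = 16000   # Hz, from the format above
BYTES_PER_SAMPLE = 2  # 16-bit signed integers
CHANNELS = 1          # mono


def make_tone(freq_hz, duration_s, amplitude=16000):
    """Return a sine tone as signed 16-bit mono samples (hypothetical helper)."""
    n = int(SAMPLE_RATE * duration_s)
    return array.array(
        "h",
        (int(amplitude * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE))
         for i in range(n)),
    )


tone = make_tone(440.0, 1.0)
size_bytes = len(tone) * BYTES_PER_SAMPLE * CHANNELS
print(len(tone), size_bytes)  # 16000 32000
```

One second of audio in this format is therefore 16000 samples, or 32000 bytes of raw PCM.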
Quick Start¶
from pib3 import Robot, AudioOutput, AudioInput
import numpy as np
with Robot(host="172.26.34.149") as robot:
    # Text-to-speech (German voice, played on the robot by default)
    robot.speak("Hallo! Ich bin pib, dein freundlicher Roboter.")
    # Play on local speakers only
    robot.speak("Testing local audio", output=AudioOutput.LOCAL)
    # Play on both robot and laptop
    robot.speak("Playing everywhere!", output=AudioOutput.LOCAL_AND_ROBOT)
    # Play a WAV file (on the robot device)
    robot.play_file("sound.wav", output=AudioOutput.ROBOT)
    # Generate and play a tone
    tone = (np.sin(2 * np.pi * 440 * np.linspace(0, 1, 16000)) * 16000).astype(np.int16)
    robot.play_audio(tone, output=AudioOutput.ROBOT)
    # Record from local microphone - returns numpy int16 array
    audio_data = robot.record_audio(duration=5.0, input_source=AudioInput.LOCAL)
    # Record from robot microphone
    audio_data = robot.record_audio(duration=5.0, input_source=AudioInput.ROBOT)
    # The same audio_data format works for playback!
    robot.play_audio(audio_data)
    # List available devices
    for device in robot.get_audio_output_devices():
        print(f"Speaker: {device}")
    for device in robot.get_audio_input_devices():
        print(f"Microphone: {device}")
AudioOutput Enum¶
The AudioOutput enum controls where audio is played:
from pib3 import AudioOutput
# Available options:
AudioOutput.LOCAL # Play on local machine (laptop) speakers only
AudioOutput.ROBOT # Play on robot speakers only
AudioOutput.LOCAL_AND_ROBOT # Play on both simultaneously
Backend-Specific Behavior¶
- Real Robot: Default is AudioOutput.ROBOT
  - LOCAL: Plays on your laptop/computer
  - ROBOT: Sends audio to the robot via the /audio_playback ROS topic
  - LOCAL_AND_ROBOT: Plays on both (best-effort synchronization)
- Webots Simulation: Default is AudioOutput.LOCAL
  - LOCAL: Plays on your laptop/computer
  - ROBOT: Automatically redirects to LOCAL (no robot speakers in simulation)
  - LOCAL_AND_ROBOT: Automatically redirects to LOCAL (no duplication)
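The routing rules above can be summarised in a few lines of code. This is only a sketch of the documented behaviour, not pib3's implementation: the Output enum is a stand-in for AudioOutput so the example runs on its own, and resolve_output and the backend strings are hypothetical.

```python
from enum import Enum


class Output(Enum):
    """Stand-in for pib3's AudioOutput, redefined so the sketch runs alone."""
    LOCAL = "local"
    ROBOT = "robot"
    LOCAL_AND_ROBOT = "local_and_robot"


def resolve_output(requested, backend):
    """Return the destination that is effectively used (illustrative only)."""
    if backend == "webots":
        # No robot speakers in simulation: every request ends up local.
        return Output.LOCAL
    if requested is None:
        # Real robot: the default is the robot speakers.
        return Output.ROBOT
    return requested


print(resolve_output(Output.LOCAL_AND_ROBOT, "webots"))  # Output.LOCAL
print(resolve_output(None, "robot"))                     # Output.ROBOT
```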
AudioInput Enum¶
The AudioInput enum controls where audio is recorded from:
from pib3 import AudioInput
# Available options:
AudioInput.LOCAL # Record from local machine (laptop) microphone
AudioInput.ROBOT # Record from robot's microphone array
Backend-Specific Behavior¶
- Real Robot: Default is AudioInput.ROBOT
  - LOCAL: Records from your laptop/computer microphone
  - ROBOT: Records from the robot via the /audio_input ROS topic
- Webots Simulation: Default is AudioInput.LOCAL
  - LOCAL: Records from your laptop/computer microphone
  - ROBOT: Automatically redirects to LOCAL (no robot microphone in simulation)
API Reference¶
Playback Methods¶
play_audio()¶
Play raw audio data.
robot.play_audio(
    data: Union[bytes, np.ndarray, List[int]],
    sample_rate: int = 16000,
    output: Optional[AudioOutput] = None,
    block: bool = True
) -> bool
Parameters:
- data: Audio data as bytes, numpy array (int16), or list of int16 samples
- sample_rate: Sample rate in Hz (default: 16000)
- output: Playback destination (LOCAL, ROBOT, LOCAL_AND_ROBOT). If None, uses backend default
- block: If True, wait for playback to complete before returning
Returns: True if playback succeeded
Example:
import numpy as np
# Generate a 440Hz tone for 1 second
sample_rate = 16000
t = np.linspace(0, 1.0, sample_rate, dtype=np.float32)
tone = (np.sin(2 * np.pi * 440 * t) * 16000).astype(np.int16)
# Play on robot
robot.play_audio(tone, output=AudioOutput.ROBOT)
# Play on both robot and laptop
robot.play_audio(tone, output=AudioOutput.LOCAL_AND_ROBOT)
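Since play_audio() accepts several input types, it can help to see how raw bytes and plain lists map onto int16 samples. The sketch below uses only the standard library; to_int16_samples is a hypothetical helper, and the little-endian byte order is an assumption, as the byte order pib3 expects is not stated here.

```python
import array
import sys


def to_int16_samples(data):
    """Normalize bytes or a list of ints to signed 16-bit samples (hypothetical)."""
    if isinstance(data, (bytes, bytearray)):
        if len(data) % 2:
            raise ValueError("byte length must be a multiple of 2 for int16 audio")
        samples = array.array("h")
        samples.frombytes(bytes(data))
        if sys.byteorder == "big":
            samples.byteswap()  # assumption: raw bytes are little-endian PCM
        return samples
    # Lists: array('h') raises OverflowError for values outside -32768..32767
    return array.array("h", data)


print(list(to_int16_samples(b"\x01\x00\xff\xff")))  # [1, -1]
print(list(to_int16_samples([0, 1000, -1000])))     # [0, 1000, -1000]
```

Values outside the int16 range are rejected rather than silently wrapped, which is usually the safer behaviour for audio data.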
play_file()¶
Play audio from a WAV file.
robot.play_file(
    filepath: Union[str, Path],
    output: Optional[AudioOutput] = None,
    block: bool = True
) -> bool
Parameters:
- filepath: Path to WAV file
- output: Playback destination. If None, uses backend default
- block: If True, wait for playback to complete
Returns: True if playback succeeded
Example:
# Play a sound effect on robot
robot.play_file("sounds/beep.wav", output=AudioOutput.ROBOT)
# Play background music locally while robot works
robot.play_file("music.wav", output=AudioOutput.LOCAL, block=False)
Text-to-Speech¶
speak()¶
Synthesize and play text-to-speech using Piper TTS.
robot.speak(
    text: str,
    output: Optional[AudioOutput] = None,
    voice: Optional[str] = None,
    block: bool = True
) -> bool
Parameters:
- text: Text to speak
- output: Playback destination. If None, uses backend default
- voice: Piper voice model name (default: "de_DE-thorsten-high" for German)
- block: If True, wait for speech to complete
Returns: True if speech was played successfully
Available Voices:
- "de_DE-thorsten-high" - German (Thorsten voice, high quality) - Default
- "en_US-lessac-medium" - English (Lessac voice, medium quality)
- Other Piper voices from rhasspy/piper-voices
Voice models are automatically downloaded on first use and cached in ~/.cache/pib3/piper_models/.
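To see which voices have already been downloaded, the cache directory can be inspected directly. The sketch below is a guess at the layout rather than documented behaviour: it assumes each voice is stored as a <voice-name>.onnx file directly inside the cache directory, and cached_voice_models is a hypothetical helper.

```python
from pathlib import Path


def cached_voice_models(cache_dir=None):
    """List voice names found in the cache directory (hypothetical helper)."""
    if cache_dir is None:
        cache_dir = Path.home() / ".cache" / "pib3" / "piper_models"
    cache_dir = Path(cache_dir)
    if not cache_dir.is_dir():
        return []  # nothing downloaded yet
    # Assumption: each voice is stored as <voice-name>.onnx
    return sorted(p.stem for p in cache_dir.glob("*.onnx"))


print(cached_voice_models())
```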
Example:
# German (default)
robot.speak("Hallo! Wie geht es dir?")
# English
robot.speak("Hello! How are you?", voice="en_US-lessac-medium")
# Play on both robot and laptop
robot.speak("Wichtige Nachricht!", output=AudioOutput.LOCAL_AND_ROBOT)
Recording Methods¶
record_audio()¶
Record audio for a specified duration.
robot.record_audio(
    duration: float,
    sample_rate: int = 16000,
    input_source: Optional[AudioInput] = None
) -> Optional[np.ndarray]
Parameters:
- duration: Recording duration in seconds
- sample_rate: Sample rate in Hz (default: 16000)
- input_source: Recording source (LOCAL or ROBOT). If None, uses backend default
Returns: Audio data as numpy array of int16 samples, or None if failed
Example:
from pib3 import AudioInput
from pib3.backends.audio import save_audio_file
# Record from local microphone - returns numpy int16 array
audio_data = robot.record_audio(duration=5.0, input_source=AudioInput.LOCAL)
# Record from robot microphone
audio_data = robot.record_audio(duration=5.0, input_source=AudioInput.ROBOT)
# Save recording to file
save_audio_file("recording.wav", audio_data)
# Play back immediately - same format works for playback!
robot.play_audio(audio_data)
Important: record_audio() requires AudioInput, not AudioOutput. Passing AudioOutput will raise a TypeError with a helpful message.
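A type check of this kind could look roughly like the following. This is a sketch, not pib3's code: both enums are stand-ins so the example runs without the library, and check_input_source and the wording of the error message are hypothetical.

```python
from enum import Enum


class AudioInput(Enum):   # stand-in so the sketch runs without pib3
    LOCAL = "local"
    ROBOT = "robot"


class AudioOutput(Enum):  # stand-in
    LOCAL = "local"
    ROBOT = "robot"
    LOCAL_AND_ROBOT = "local_and_robot"


def check_input_source(input_source):
    """Reject anything that is not an AudioInput (None selects the default)."""
    if input_source is not None and not isinstance(input_source, AudioInput):
        raise TypeError(
            f"input_source must be an AudioInput, got {type(input_source).__name__}"
        )
    return input_source


try:
    check_input_source(AudioOutput.ROBOT)
except TypeError as exc:
    print(exc)  # input_source must be an AudioInput, got AudioOutput
```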
record_to_file()¶
Record audio and save directly to a WAV file.
robot.record_to_file(
    filepath: Union[str, Path],
    duration: float,
    sample_rate: int = 16000,
    input_source: Optional[AudioInput] = None
) -> bool
Parameters:
- filepath: Output file path (WAV format)
- duration: Recording duration in seconds
- sample_rate: Sample rate in Hz (default: 16000)
- input_source: Recording source (LOCAL or ROBOT). If None, uses backend default
Returns: True if recording was saved successfully
Example:
# Record from local microphone to file
robot.record_to_file("my_recording.wav", duration=10.0)
# Record from robot microphone
robot.record_to_file("robot_audio.wav", duration=5.0, input_source=AudioInput.ROBOT)
Device Management¶
Note: Device selection only applies to LOCAL audio (laptop speakers/microphones). Robot audio uses fixed ROS topics (/audio_playback for output, /audio_input for input) with no device selection.
get_audio_output_devices()¶
List available local audio output (speaker) devices.
Returns: List of AudioDevice objects that support output
Example:
for device in robot.get_audio_output_devices():
    print(device)  # e.g., "[0] Built-in Speakers (output)"
get_audio_input_devices()¶
List available audio input (microphone) devices.
Returns: List of AudioDevice objects that support input
Example:
for device in robot.get_audio_input_devices():
    print(device)
set_audio_output_device()¶
Set the audio output device for local playback.
Parameters:
device: Device index (int), name substring (str), AudioDevice object, or None for system default
Example:
# List available devices
devices = robot.get_audio_output_devices()
for d in devices:
    print(d)
# Select by index
robot.set_audio_output_device(0)
# Select by name (substring match)
robot.set_audio_output_device("Speakers")
# Select by AudioDevice object
robot.set_audio_output_device(devices[0])
# Reset to system default
robot.set_audio_output_device(None)
set_audio_input_device()¶
Set the audio input device for local recording.
Parameters:
device: Device index (int), name substring (str), AudioDevice object, or None for system default
Example:
# List available devices
devices = robot.get_audio_input_devices()
for d in devices:
    print(d)
# Select by name
robot.set_audio_input_device("USB Microphone")
# Reset to system default
robot.set_audio_input_device(None)
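The selector forms accepted by set_audio_output_device() and set_audio_input_device() (index, name substring, device object, or None) can be pictured as a small lookup. The sketch below is illustrative only: Device is a minimal stand-in for AudioDevice, resolve_device is hypothetical, and the case-insensitive substring match is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Device:  # minimal stand-in for AudioDevice
    index: int
    name: str


def resolve_device(selector, devices):
    """Map an index, name substring, Device, or None to a Device (illustrative)."""
    if selector is None:
        return None  # None means: use the system default
    if isinstance(selector, Device):
        return selector
    if isinstance(selector, int):
        matches = [d for d in devices if d.index == selector]
    else:
        # Assumption: the substring match ignores case
        matches = [d for d in devices if selector.lower() in d.name.lower()]
    if not matches:
        raise ValueError(f"no audio device matches {selector!r}")
    return matches[0]


devices = [Device(0, "Built-in Microphone"), Device(3, "USB Microphone")]
print(resolve_device("usb", devices).index)  # 3
print(resolve_device(0, devices).name)       # Built-in Microphone
```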
get_current_audio_output_device() / get_current_audio_input_device()¶
Get the currently selected device.
robot.get_current_audio_output_device() -> Optional[AudioDevice]
robot.get_current_audio_input_device() -> Optional[AudioDevice]
Returns: Currently selected AudioDevice, or system default if None was set
Legacy Recording API¶
subscribe_audio_stream()¶
Subscribe to audio stream from the robot's microphone array.
Note: For simple recording, prefer record_audio(). Use this for continuous streaming or custom processing.
Parameters:
callback: Function called with list of int16 audio samples for each chunk (~1024 samples)
Returns: Topic object (call .unsubscribe() to stop streaming)
Example:
import time
from pib3 import AudioStreamReceiver
# Using AudioStreamReceiver helper
receiver = AudioStreamReceiver()
sub = robot.subscribe_audio_stream(receiver.on_audio)
# Record for 10 seconds
time.sleep(10)
sub.unsubscribe()
# Save to WAV file
receiver.save_wav("recording.wav")
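For custom processing, the callback can be any callable that accepts a list of int16 samples. The following sketch shows a bounded buffer fed by such a callback. ChunkCollector is a hypothetical class, and dropping the oldest samples when the limit is reached is an assumption made for this example, not a description of how AudioStreamReceiver behaves.

```python
class ChunkCollector:
    """Collect int16 chunks from a streaming callback (hypothetical helper)."""

    def __init__(self, sample_rate=16000, max_seconds=300.0):
        self.sample_rate = sample_rate
        self.max_samples = int(sample_rate * max_seconds)
        self.samples = []

    def on_audio(self, chunk):
        self.samples.extend(chunk)
        # Drop the oldest samples once the buffer limit is exceeded
        overflow = len(self.samples) - self.max_samples
        if overflow > 0:
            del self.samples[:overflow]

    @property
    def duration(self):
        return len(self.samples) / self.sample_rate


collector = ChunkCollector(max_seconds=1.0)
for _ in range(20):                  # simulate 20 chunks of 1024 samples
    collector.on_audio([0] * 1024)
print(len(collector.samples), collector.duration)  # 16000 1.0
```

With a real connection, collector.on_audio would be passed to robot.subscribe_audio_stream() in place of receiver.on_audio.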
Helper Classes¶
AudioDevice¶
Represents an audio device (speaker or microphone).
from pib3 import AudioDevice
# Properties:
device.index # int: Device index for sounddevice
device.name # str: Human-readable device name
device.channels # int: Number of channels supported
device.sample_rate # float: Default sample rate
device.is_input # bool: True if this is an input (microphone) device
device.is_output # bool: True if this is an output (speaker) device
Standalone Device Functions¶
These functions work without a robot connection and can be used for device discovery.
list_audio_devices()¶
List available audio devices with optional filtering.
from pib3 import list_audio_devices
# List all devices
all_devices = list_audio_devices()
# List only input devices (microphones)
inputs = list_audio_devices(input=True, output=False)
# List only output devices (speakers)
outputs = list_audio_devices(input=False, output=True)
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| input | bool | True | Include input (microphone) devices |
| output | bool | True | Include output (speaker) devices |
Returns: List[AudioDevice]
Deprecated Functions
The following functions are deprecated and will be removed in a future version:
- list_audio_input_devices() → Use list_audio_devices(input=True, output=False)
- list_audio_output_devices() → Use list_audio_devices(input=False, output=True)
AudioStreamReceiver¶
Helper class for buffering and saving audio from the robot's microphone.
from pib3 import AudioStreamReceiver
receiver = AudioStreamReceiver(
    sample_rate: int = 16000,
    max_buffer_seconds: float = 300.0
)
Methods:
- on_audio(samples) - Callback for incoming audio (pass to subscribe_audio_stream())
- get_audio() - Get buffered audio as numpy array (int16)
- get_audio_float() - Get buffered audio as normalized float32 in range [-1, 1]
- clear() - Clear the audio buffer
- save_wav(filepath) - Save buffered audio to WAV file
Properties:
- duration - Duration of buffered audio in seconds
- sample_count - Number of samples in buffer
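The normalization performed by get_audio_float() can be approximated as a simple scaling of int16 values. The sketch below divides by 32768, which is one common convention. Whether pib3 uses 32768 or 32767 as the divisor is not stated here, and int16_to_float is a hypothetical helper.

```python
def int16_to_float(samples):
    """Scale int16 samples to floats by dividing by 32768 (one common convention)."""
    return [s / 32768.0 for s in samples]


print(int16_to_float([-32768, 0, 16384]))  # [-1.0, 0.0, 0.5]
```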
ROS Topics¶
The audio system uses the following ROS topics for communication with the robot:
/audio_playback¶
Type: std_msgs/msg/Int16MultiArray
Direction: Client → Robot
Purpose: Send audio data to robot for playback through speakers
/audio_input¶
Type: std_msgs/msg/Int16MultiArray
Direction: Robot → Client
Purpose: Stream audio from robot's microphone array to client
/audio_stream¶
Type: std_msgs/msg/Int16MultiArray
Direction: Robot → Client
Purpose: Legacy topic for audio streaming (use /audio_input for new code)
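When talking to the robot through rosbridge, a publish request for /audio_playback would carry the samples in the data field of an Int16MultiArray. The sketch below builds such a request as JSON. The helper name playback_message is hypothetical, and whether pib3 sends a recording as one message or splits it into chunks is not specified here.

```python
import json


def playback_message(samples, topic="/audio_playback"):
    """Build a rosbridge 'publish' request carrying an Int16MultiArray."""
    return json.dumps({
        "op": "publish",
        "topic": topic,
        "msg": {
            # std_msgs/MultiArrayLayout: no dimensions declared, zero offset
            "layout": {"dim": [], "data_offset": 0},
            "data": [int(s) for s in samples],
        },
    })


print(playback_message([0, 100, -100]))
```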
Installation Requirements¶
Audio functionality is included in the core pib3 package:
Platform-Specific Requirements¶
Linux¶
PortAudio is required for audio playback and recording. On Debian/Ubuntu, install it with:
sudo apt-get install libportaudio2 portaudio19-dev
macOS¶
No additional system dependencies required:
Windows¶
No additional system dependencies required:
Advanced Usage¶
Custom Audio Processing¶
import numpy as np
from pib3 import Robot, AudioOutput
with Robot(host="172.26.34.149") as robot:
    # Generate custom waveform
    sample_rate = 16000
    duration = 2.0
    n = int(sample_rate * duration)
    # Frequency sweep from 200Hz to 800Hz: integrate the instantaneous
    # frequency to get the phase (sin(2*pi*freq*t) would overshoot 800Hz)
    freq = np.linspace(200, 800, n)
    phase = 2 * np.pi * np.cumsum(freq) / sample_rate
    audio = np.sin(phase) * 16000  # keep as float until the fades are applied
    # Apply fade in/out (in-place float scaling fails on an int16 array)
    fade_samples = int(0.1 * sample_rate)
    audio[:fade_samples] *= np.linspace(0, 1, fade_samples)
    audio[-fade_samples:] *= np.linspace(1, 0, fade_samples)
    # Convert to int16 only at the end
    robot.play_audio(audio.astype(np.int16), output=AudioOutput.ROBOT)
Recording and Playback Round-Trip¶
from pib3 import Robot, AudioOutput, AudioInput
with Robot(host="172.26.34.149") as robot:
    # Record from robot microphone
    robot.speak("Ich höre jetzt zu. Bitte sprechen Sie.")
    audio_data = robot.record_audio(duration=5.0, input_source=AudioInput.ROBOT)
    # Play back the recording - same format works for playback!
    robot.speak("Das haben Sie gesagt:")
    robot.play_audio(audio_data, output=AudioOutput.ROBOT)
Device Selection Example¶
from pib3 import Robot, AudioInput, AudioOutput
with Robot(host="172.26.34.149") as robot:
    # List and select output device
    print("Available speakers:")
    for device in robot.get_audio_output_devices():
        print(f" {device}")
    robot.set_audio_output_device("Headphones")  # Select by name
    # List and select input device
    print("\nAvailable microphones:")
    for device in robot.get_audio_input_devices():
        print(f" {device}")
    robot.set_audio_input_device(0)  # Select by index
    # Now record and play with selected devices
    audio_data = robot.record_audio(duration=3.0, input_source=AudioInput.LOCAL)
    robot.play_audio(audio_data, output=AudioOutput.LOCAL)  # Same format!
Complete Example¶
Here's a complete example demonstrating the main audio features:
from pib3 import Robot, AudioOutput, AudioInput
import numpy as np
with Robot(host="172.26.34.149") as robot:
    # 1. Text-to-speech (German by default)
    robot.speak("Hallo, ich bin pib!")
    # 2. Play on different outputs
    robot.speak("On robot speakers", output=AudioOutput.ROBOT)
    robot.speak("On local speakers", output=AudioOutput.LOCAL)
    robot.speak("On both!", output=AudioOutput.LOCAL_AND_ROBOT)
    # 3. Play a WAV file
    robot.play_file("notification.wav", output=AudioOutput.ROBOT)
    # 4. Generate and play a tone (440Hz for 1 second)
    sample_rate = 16000
    duration = 1.0
    t = np.linspace(0, duration, int(sample_rate * duration))
    tone = (np.sin(2 * np.pi * 440 * t) * 16000).astype(np.int16)
    robot.play_audio(tone, output=AudioOutput.LOCAL)
    # 5. Record from microphone
    print("Recording for 3 seconds...")
    audio_data = robot.record_audio(duration=3.0, input_source=AudioInput.LOCAL)
    print(f"Recorded {len(audio_data)} samples")
    # 6. Play back recording
    robot.play_audio(audio_data, output=AudioOutput.LOCAL)
    # 7. List available devices
    print("Output devices:")
    for device in robot.get_audio_output_devices():
        print(f" {device}")
    print("Input devices:")
    for device in robot.get_audio_input_devices():
        print(f" {device}")
Troubleshooting¶
No audio playback or recording¶
Problem: sounddevice not available warning
Solution:
# Linux
sudo apt-get install libportaudio2 portaudio19-dev
pip install --force-reinstall pib3
# macOS/Windows
pip install --force-reinstall pib3
If you see "OSError: PortAudio library not found" during import pib3 or when running audio code:
- This means the Python sounddevice package is installed but the native PortAudio library is missing or incompatible on the system.
- Fix on Debian/Ubuntu:
sudo apt-get install libportaudio2 portaudio19-dev
pip install --upgrade --force-reinstall sounddevice
After installing PortAudio, re-installing pib3 in the virtualenv may be necessary.
TTS not working¶
Problem: piper-tts is required for text-to-speech
Solution: Install the piper-tts package in the Python environment that runs pib3.
Voice models are downloaded automatically on first use (~20-50MB per voice).
Audio not playing on robot¶
Problem: Audio plays locally but not on robot
Checklist:
1. Verify robot connection: robot.is_connected should be True
2. Check output setting: Use output=AudioOutput.ROBOT
3. Verify rosbridge is running on robot
4. Check robot's audio system is functional
Recording returns empty or None¶
Problem: record_audio() returns empty array or None
Checklist:
1. Check if input device is available: robot.get_audio_input_devices()
2. Verify microphone permissions (especially on macOS)
3. Try selecting a specific device: robot.set_audio_input_device(0)
4. For robot recording, ensure /audio_input topic is publishing
Audio quality issues¶
Problem: Distorted or choppy audio
Solutions:
- Ensure audio is 16kHz, mono, int16 format
- Check network latency (for robot playback/recording)
- Verify sample rate matches (16000 Hz)
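To check a WAV file against the expected format before playing it, the standard library wave module is enough. The sketch below reports every mismatch against 16 kHz, mono, 16-bit. The helper wav_format_problems is hypothetical, and the stereo 44.1 kHz file is generated only to demonstrate the check.

```python
import os
import tempfile
import wave


def wav_format_problems(path):
    """Return a list of mismatches against 16 kHz / mono / 16-bit."""
    problems = []
    with wave.open(str(path), "rb") as wav:
        if wav.getframerate() != 16000:
            problems.append(f"sample rate is {wav.getframerate()} Hz, expected 16000")
        if wav.getnchannels() != 1:
            problems.append(f"{wav.getnchannels()} channels, expected 1 (mono)")
        if wav.getsampwidth() != 2:
            problems.append(f"{wav.getsampwidth() * 8}-bit samples, expected 16-bit")
    return problems


# Demonstration: write a short stereo 44.1 kHz file and inspect it
path = os.path.join(tempfile.mkdtemp(), "stereo_44k.wav")
with wave.open(path, "wb") as wav:
    wav.setnchannels(2)
    wav.setsampwidth(2)
    wav.setframerate(44100)
    wav.writeframes(b"\x00\x00" * 2 * 441)  # 441 silent stereo frames

for problem in wav_format_problems(path):
    print(problem)
```

An empty list means the file already matches the format described in the Audio Format section.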
See Also¶
- Backends API - Base backend interface
- Robot Backend - Real robot backend
- Webots Backend - Simulation backend