1.1 Getting Started
Features
- Real-time pitch estimation using SWIPE
- Real-time loudness using K-weighted RMS
- Voice-quality analysis (HNR, Jitter, Shimmer, CPP/CPPS) via Praat-Parselmouth
- Silero-based VAD for speech/sustained segmentation
- 30 Hz UI updates with smooth motion
- Automatic calibration (background noise, sustained vowel, running speech)
- CSV logging of key per-frame metrics
- Modular interfaces for custom algorithms and robotic extensions
Use Cases
- HRI experiments (CARs / SARs)
- Speech therapy biofeedback
- Assistive voice-based systems
- Pitch/loudness control research
System Requirements
- OS: Ubuntu 22.04 LTS (tested). Debian-based distros also work.
- Unity: 6000.0.58f2 (use this exact version)
- Python: 3.12
- CPU: 4-6 cores or more recommended
- Microphone: Any 16 kHz mono-capable USB or built-in mic
Setup Overview
You will:
- Install Unity Editor (6000.0.58f2) through Unity Hub
- Create a Python virtual environment
- Install Python dependencies
- Open the Unity Project
- Point Unity to the Python executable inside your virtual environment
- Perform calibration
- Set up the experiment
- Run real-time analysis
Folder Overview
Assets/
├─ Calibration Files/   # User-specific calibration JSON + WAV recordings
├─ Logs/                # Per-session CSV exports
├─ Scripts/
│  ├─ Python/           # Python signal processing & calibration
│  └─ Unity/
│     ├─ Core/          # Core Unity controller
│     ├─ Interfaces/    # Abstractions for capture, analysis, alert, and feedback
│     └─ UI/            # Visual components
└─ TrainingJSON/        # User-specific session settings (target goals)
1.2 Installation Guide
Install Unity
- Download Unity Hub
- Install Unity Editor 6000.0.58f2
- Clone or unzip the IVoice repository
- Open it from Unity Hub
Set Up Python
- Open a terminal and change directory to your desired location.
- Create a virtual environment and install the dependencies:
python3 -m venv venv
source venv/bin/activate
pip install numpy torch torchaudio torchcodec soundfile pandas pysptk pydub praat-parselmouth
- Make sure ffmpeg is installed on your machine (required by pydub):
sudo apt install ffmpeg
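After installing, a quick import check can catch a missing package before Unity is involved. The following is a minimal sketch, not part of the repository: the helper names are hypothetical, and the import names are the usual ones for the pip packages listed above (note that praat-parselmouth is imported as parselmouth). Run it with the virtual environment activated.

```python
import importlib.util
import shutil

# Import names for the pip packages installed above.
# praat-parselmouth is imported as "parselmouth".
REQUIRED_MODULES = [
    "numpy", "torch", "torchaudio", "torchcodec", "soundfile",
    "pandas", "pysptk", "pydub", "parselmouth",
]


def missing_modules(names):
    """Return the top-level module names the import system cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def ffmpeg_available():
    """Return True if an ffmpeg executable is on PATH (needed by pydub)."""
    return shutil.which("ffmpeg") is not None


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("Missing Python modules:", ", ".join(missing) or "none")
    print("ffmpeg found:", ffmpeg_available())
```

An empty "missing" list only shows that the packages can be located; it does not prove that each one imports cleanly on your machine.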
Link Python to Unity
- Open the Unity Project via Unity Hub
- In the Unity Hierarchy, find the Settings object
- In the Inspector, find the PythonPath.cs component and set its path to the Python executable inside your virtual environment, which ends in:
/venv/bin/python
- To find the full path to your virtual environment's Python executable, activate the environment and run:
which python
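The same information can be read from inside Python, which also confirms that the interpreter really belongs to a virtual environment. This is a small sketch using only the standard library; the function name is hypothetical.

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the venv directory, while
    # sys.base_prefix points at the installation it was created from.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Interpreter path:", sys.executable)
    print("Inside a virtual environment:", in_virtualenv())
```

The printed interpreter path is the value to paste into the PythonPath.cs component.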
Validate Installation
- Press Play in Unity
- The Console should show the Python engine starting up
- The microphone initializes, and voice calibration completes without errors
- The UI shows idle streaming
1.3 Running the Application
Start Unity
- Before running Unity, make sure Python is linked to Unity as described above
- Make sure your microphone is connected and works properly
- Press Play
- The Python engine boots in the background, and the Console should show no errors
- You are directed to the calibration step; complete the voice calibration
- Set up your session goals in the setup UI
- Once the settings are set up, press the Record button to start a training session.
- Unity streams audio to Python, Python returns per-frame metrics, and the UI shows real-time feedback.
Interpreting UI
- Stack Bars: Instant pitch/loudness classification (Below/Target/Above)
- Graphs: Smoothed trends over time.
- Quality Rings: HNR, Jitter, Shimmer, CPP/CPPS (Red = Below Target; Yellow = Close to Target; Green = Within Target Range)
- Time-in-Target: Cumulative seconds meeting desired behavior.
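The Below/Target/Above classification and the Time-in-Target counter can be illustrated as follows. This is a sketch under assumptions, not the project's implementation: the function names are hypothetical, the band edges are treated as inclusive, and the frame duration is passed in explicitly because the analysis frame rate is not stated here.

```python
def classify(value, target_min, target_max):
    """Classify one frame's value as 'Below', 'Target', or 'Above'."""
    if value < target_min:
        return "Below"
    if value > target_max:
        return "Above"
    return "Target"  # band edges counted as in target (assumption)


def time_in_target(values, target_min, target_max, frame_seconds):
    """Sum the duration of all frames whose value lies inside the band."""
    frames_in_band = sum(
        1 for v in values if classify(v, target_min, target_max) == "Target"
    )
    return frames_in_band * frame_seconds


pitches_hz = [180.0, 205.0, 210.0, 250.0]  # made-up example values
print([classify(p, 200.0, 220.0) for p in pitches_hz])
print(time_in_target(pitches_hz, 200.0, 220.0, frame_seconds=0.5))
```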
1.4 Calibration Guide
Calibration Steps
- Background Noise (5 s): Stay silent
- Sustained Vowel /a/ (5 s): Produce a steady, sustained /a/
- Sentence (Running Speech): Read or speak a natural sentence
Unity will automatically save:
Assets/Calibration Files/
background.wav
sustained.wav
sentence.wav
calibration_data.json
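To confirm that a calibration run produced all four files, a check along these lines can be run from the project root. It is a minimal sketch; the function name is hypothetical and only the file names come from the list above.

```python
from pathlib import Path

EXPECTED_FILES = [
    "background.wav", "sustained.wav", "sentence.wav", "calibration_data.json",
]


def missing_calibration_files(calibration_dir):
    """Return the expected calibration files that are absent from the folder."""
    folder = Path(calibration_dir)
    return [name for name in EXPECTED_FILES if not (folder / name).is_file()]


if __name__ == "__main__":
    missing = missing_calibration_files("Assets/Calibration Files")
    print("Missing calibration files:", ", ".join(missing) or "none")
```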
Understanding Calibration Values
- backgroundThreshold: RMS noise floor
- sustainedAvgPitch/sustainedAvgLoudness: Baseline for sustained feedback
- sentenceAvgPitch/sentenceAvgLoudness: Baseline for running speech
These baselines are used to determine the target bands for training.
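As an illustration of how a baseline could become a target band, the sketch below builds a symmetric band around a calibrated value. The band formula, the tolerance value, the flat JSON layout, and the numbers are all assumptions made for the example; only the key names come from the list above.

```python
import json


def target_band(baseline, tolerance):
    """Return (low, high) as a symmetric band around a baseline.

    `tolerance` is a fraction, e.g. 0.10 for plus or minus 10 percent.
    This formula is an illustrative assumption, not the project's rule.
    """
    return baseline * (1.0 - tolerance), baseline * (1.0 + tolerance)


# Made-up values in an assumed flat layout; only the key names are from the guide.
example_json = """
{
  "backgroundThreshold": 0.01,
  "sustainedAvgPitch": 200.0,
  "sustainedAvgLoudness": 60.0,
  "sentenceAvgPitch": 190.0,
  "sentenceAvgLoudness": 58.0
}
"""

calibration = json.loads(example_json)
low, high = target_band(calibration["sustainedAvgPitch"], tolerance=0.10)
print(f"Sustained pitch band: {low:.1f} to {high:.1f}")
```

To inspect a real calibration, replace the example string with the contents of Assets/Calibration Files/calibration_data.json and check that the keys match before relying on them.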
1.5 Logging & Data Export
Log Location
Assets/Logs/
Each session gets its own non-overwriting file.
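Because every session writes a new file, the most recent session can be picked out by modification time. A minimal sketch, assuming the logs use a .csv extension (the naming scheme itself is not specified here) and using a hypothetical helper name:

```python
from pathlib import Path


def latest_log(log_dir):
    """Return the most recently modified .csv file in log_dir, or None."""
    folder = Path(log_dir)
    if not folder.is_dir():
        return None
    csv_files = list(folder.glob("*.csv"))
    if not csv_files:
        return None
    return max(csv_files, key=lambda path: path.stat().st_mtime)


if __name__ == "__main__":
    print("Latest session log:", latest_log("Assets/Logs"))
```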
Columns Include
- t, t0 (wall time, voiced time)
- mode (speech/sustained)
- pitch, loudness (absolute values)
- rel_pitch_diff, rel_loudness_diff (relative values against baselines)
- Ok, jitter_ok, shimmer_ok, hnr_ok, cpp_ok (flags showing whether pitch was detected (voiced) and whether jitter, shimmer, HNR, and CPP were computed)
- jitter, shimmer, HNR
- CPP, CPPS
- avg_rel_pitch, avg_rel_loud (averaged over the selected averaging window, averageWindowSeconds)
- avg_jitter, avg_shimmer, avg_HNR (averaged over the same window)
- avg_CPP, avg_CPPS (averaged over the same window)
- cumulative in-target time (pitch & loudness)
- panel thresholds (thresholds used for each pitch or loudness panel added to the scene)
- pitchTargetMin, pitchTargetMax
- loudTargetMin, loudTargetMax
- targetMaxPhonation, targetInTarget
- averageWindowSeconds
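A session log can be summarized with the standard csv module. The sketch below averages pitch per mode; it assumes the header row uses the column names exactly as listed above (mode, pitch), which should be checked against a real log, and it skips rows whose pitch cell is empty, non-numeric, or NaN. The function name is hypothetical.

```python
import csv
import io
import math
from collections import defaultdict


def mean_pitch_by_mode(csv_file):
    """Average the `pitch` column per `mode` from an open CSV file object."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.DictReader(csv_file):
        try:
            pitch = float(row["pitch"])
        except (KeyError, TypeError, ValueError):
            continue  # missing column, short row, or non-numeric cell
        if math.isnan(pitch):
            continue  # skip frames logged without a usable pitch
        mode = row.get("mode") or "unknown"
        sums[mode] += pitch
        counts[mode] += 1
    return {mode: sums[mode] / counts[mode] for mode in sums}


# Made-up rows using a few of the column names listed above.
sample = io.StringIO(
    "t,mode,pitch,loudness\n"
    "0.00,sustained,200.0,60.0\n"
    "0.03,sustained,210.0,61.0\n"
    "0.07,speech,190.0,58.0\n"
)
print(mean_pitch_by_mode(sample))
```

For a real log, open the file with `open(path, newline="")` and pass the file object to the function.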
1.6 Troubleshooting
Python Engine Not Launching
- Wrong Python path provided to Unity for linkage
- Missing dependencies
- Virtual environment not activated before pip install
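The first two causes can be tested outside Unity by launching the configured interpreter directly and asking it to import a module. This is a hedged sketch with a hypothetical helper; it does not reproduce how Unity starts the engine, it only checks that the path runs and that the imports succeed.

```python
import subprocess
import sys


def check_interpreter(python_path, modules=("sys",)):
    """Start python_path and import the given modules; return (ok, message)."""
    code = "import " + ", ".join(modules)
    try:
        result = subprocess.run(
            [python_path, "-c", code],
            capture_output=True, text=True, timeout=120,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        # OSError covers a wrong path or a file that is not executable.
        return False, f"could not run interpreter: {exc}"
    if result.returncode != 0:
        lines = result.stderr.strip().splitlines()
        return False, lines[-1] if lines else "import failed"
    return True, "ok"


if __name__ == "__main__":
    # Replace sys.executable with the path set in the PythonPath.cs component.
    print(check_interpreter(sys.executable, modules=("json",)))
```

Passing the dependency import names (for example `("numpy", "parselmouth")`) shows which package, if any, the configured interpreter cannot load.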
No Microphone Detected
- OS permissions
- No connection between Unity and Python (see Python Engine Not Launching)
Unity Errors
- Scripting backend is set incorrectly, or its parameters are not set correctly (for example, a required game object is missing)
- Missing or outdated packages (check Unity's Package Manager)
High Latency
- CPU throttling
- Too many background apps
- Using a 48 kHz microphone (Unity resamples automatically, but this adds load)
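One way to see which sample rate a recording actually has is to read its WAV header with the standard wave module. The sketch below is an assumption-laden illustration: the helper names are hypothetical, the 16 kHz expectation is taken from the microphone requirement, and the wave module only reads PCM WAV files. A saved recording may already have been resampled, so this reports the stored rate, not necessarily the device rate.

```python
import io
import wave


def wav_info(source):
    """Return (sample_rate_hz, channels) from a PCM WAV path (str) or binary file object."""
    with wave.open(source, "rb") as wav_file:
        return wav_file.getframerate(), wav_file.getnchannels()


def needs_resampling(sample_rate_hz, expected_hz=16000):
    """Return True when the rate differs from the expected rate (assumed 16 kHz)."""
    return sample_rate_hz != expected_hz


# Self-contained demo: build a short silent 48 kHz mono WAV in memory.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as writer:
    writer.setnchannels(1)
    writer.setsampwidth(2)  # 16-bit samples
    writer.setframerate(48000)
    writer.writeframes(b"\x00\x00" * 480)
buffer.seek(0)

rate, channels = wav_info(buffer)
print(rate, channels, needs_resampling(rate))
```

For a file on disk, pass its path as a string, for example `wav_info("Assets/Calibration Files/background.wav")`.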