User Documentation

1.1 Getting Started

Features

Real-time pitch estimation using SWIPE
Real-time loudness using K-weighted RMS
Voice-quality analysis (HNR, Jitter, Shimmer, CPP/CPPS) via Praat-Parselmouth
Silero-based VAD for speech/sustained segmentation
30 Hz UI updates with smooth motion
Automatic calibration (background noise, sustained vowel, running speech)
CSV logging of key per-frame metrics
Modular interfaces for custom algorithms and robotic extensions

Use Cases

HRI experiments (CARs / SARs)
Speech therapy biofeedback
Assistive voice-based systems
Pitch/loudness control research

System Requirements

OS: Ubuntu 22.04 LTS (tested). Debian-based distros also work.
Unity: 6000.0.58f2 (use this exact version)
Python: 3.12
CPU: 4-6 cores or more recommended
Microphone: Any 16 kHz mono-capable USB or built-in mic
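
The Python and CPU requirements above can be checked from a terminal before installing anything else. The following is a minimal sketch, not part of IVoice; the function name check_requirements and the 4-core minimum are assumptions taken from the list above.

```python
import os
import sys

# Values taken from the System Requirements list above.
REQUIRED_PYTHON = (3, 12)
MIN_CORES = 4  # lower end of the recommended range (assumption)


def check_requirements(version=None, cores=None):
    """Return a list of human-readable warnings (empty if all checks pass)."""
    version = sys.version_info[:2] if version is None else tuple(version)[:2]
    cores = os.cpu_count() if cores is None else cores
    warnings = []
    if version != REQUIRED_PYTHON:
        warnings.append(f"Python {version[0]}.{version[1]} found, 3.12 expected")
    # os.cpu_count() may return None when the count cannot be determined.
    if cores is not None and cores < MIN_CORES:
        warnings.append(f"{cores} CPU cores found, at least {MIN_CORES} recommended")
    return warnings


if __name__ == "__main__":
    for line in check_requirements() or ["Environment looks OK"]:
        print(line)
```

Run it with the interpreter you plan to use for the virtual environment, since that is the version the check reports.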

Setup Overview

You will:

Install Unity Editor (6000.0.58f2) through Unity Hub
Create a Python virtual environment
Install Python dependencies
Open the Unity Project
Point Unity to the virtual environment's Python executable
Perform calibration
Set up the experiment
Run real-time analysis

Folder Overview

Assets/
|─ Calibration Files/    # User-specific calibration JSON + WAV Recordings
|─ Logs/                 # Per-session CSV exports
|─ Scripts/
|  |─ Python/            # Python signal processing & calibration
|  |─ Unity/              
|     |─ Core/           # Core Unity controller
|     |─ Interfaces/     # Abstractions for capture, analysis, alert, and feedback
|     |─ UI/             # Visual components
|─ TrainingJSON/         # User-specific session settings (target goals)

1.2 Installation Guide

Install Unity

Download Unity Hub
Install Unity Editor 6000.0.58f2
Clone or unzip the IVoice repository
Open it from Unity Hub

Set Up Python

Open a terminal and change directory to your desired location.
Create a virtual environment and install the dependencies:

python3 -m venv venv
source venv/bin/activate

pip install numpy torch torchaudio torchcodec soundfile pandas pysptk pydub praat-parselmouth

Make sure ffmpeg is installed on your machine (required by pydub):

sudo apt install ffmpeg
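
To confirm that the install step worked, the packages can be looked up from inside the activated virtual environment. The sketch below is an illustration rather than a script shipped with IVoice; the function names are hypothetical. It only checks that each module can be located and that ffmpeg is on PATH, not that the versions are compatible.

```python
import importlib.util
import shutil

# pip package name -> module name used in `import`.
# praat-parselmouth is imported as `parselmouth`.
PACKAGES = {
    "numpy": "numpy",
    "torch": "torch",
    "torchaudio": "torchaudio",
    "torchcodec": "torchcodec",
    "soundfile": "soundfile",
    "pandas": "pandas",
    "pysptk": "pysptk",
    "pydub": "pydub",
    "praat-parselmouth": "parselmouth",
}


def missing_packages(packages=PACKAGES):
    """Return pip names whose module cannot be located in this environment."""
    return [pip_name for pip_name, module in packages.items()
            if importlib.util.find_spec(module) is None]


def ffmpeg_available():
    """True if an `ffmpeg` executable is on PATH."""
    return shutil.which("ffmpeg") is not None


if __name__ == "__main__":
    missing = missing_packages()
    print("Missing packages:", ", ".join(missing) or "none")
    print("ffmpeg on PATH:", ffmpeg_available())
```

Any name reported as missing can be installed again with pip while the environment is active.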

Link Python to Unity

Open the Unity Project via Unity Hub
In the Unity Hierarchy, find the Settings object
In the Inspector, find the PythonPath.cs component and set the path to:

<project>/venv/bin/python

To find the path to your virtual environment's Python executable, activate the environment and run:

which python
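
A common mistake is to give Unity the system interpreter instead of the one inside the virtual environment. As a quick check, the sketch below can be run with the exact path entered in Unity; it reports which interpreter is running and whether it belongs to a virtual environment. The function name in_virtualenv is hypothetical.

```python
import sys


def in_virtualenv():
    """True when the running interpreter belongs to a venv."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix points at the Python installation it was created from.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("Virtual environment:", "yes" if in_virtualenv() else "no")
```

If the script reports "no", the path set in Unity most likely points at the system Python rather than <project>/venv/bin/python.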

Validate Installation

Press Play in Unity
The Console should show the Python engine starting up
The microphone is initialized, and voice calibration completes without errors
The UI shows idle streaming

1.3 Running the Application

Start Unity

Make sure Python is linked to Unity as described above
Make sure your microphone is connected and working
Press Play
The Python engine boots in the background; the Console should show no errors
You are directed to the calibration step; calibrate your voice
Set up your session goals in the setup UI
Press the Record button to start a training session
Unity streams audio to Python, Python returns per-frame metrics, and the UI shows real-time feedback
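
To give a feel for what a per-frame metric looks like, the sketch below splits a signal into frames and computes a plain RMS level for each one. It is a simplified illustration only: IVoice uses K-weighted RMS, which applies a weighting filter that is omitted here, and the frame size (16 kHz divided by the 30 Hz update rate), the -120 dB floor, and all function names are assumptions.

```python
import math

SAMPLE_RATE = 16_000      # Hz, matching the microphone requirement
UPDATE_RATE = 30          # Hz, matching the UI update rate
FRAME_SIZE = SAMPLE_RATE // UPDATE_RATE  # 533 samples per frame (assumption)
FLOOR_DB = -120.0         # value reported for digital silence (assumption)


def rms_dbfs(frame):
    """Unweighted RMS level of one frame in dB relative to full scale."""
    if not frame:
        return FLOOR_DB
    mean_square = sum(s * s for s in frame) / len(frame)
    if mean_square <= 0.0:
        return FLOOR_DB
    # 10*log10(mean square) equals 20*log10(RMS).
    return max(FLOOR_DB, 10.0 * math.log10(mean_square))


def split_frames(samples, size=FRAME_SIZE):
    """Yield consecutive non-overlapping frames, dropping a short tail."""
    for start in range(0, len(samples) - size + 1, size):
        yield samples[start:start + size]


if __name__ == "__main__":
    # One second of a 220 Hz sine at half amplitude stands in for mic input.
    signal = [0.5 * math.sin(2 * math.pi * 220 * n / SAMPLE_RATE)
              for n in range(SAMPLE_RATE)]
    for i, frame in enumerate(split_frames(signal)):
        if i < 3:
            print(f"frame {i}: {rms_dbfs(frame):.1f} dBFS")
```

Samples are expected as floats in the range -1.0 to 1.0, so a full-scale constant signal reads 0 dBFS.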

Interpreting UI

Stack Bars: Instant pitch/loudness classification (Below/Target/Above)
Graphs: Smoothed trends over time
Quality Rings: HNR, Jitter, Shimmer, CPP/CPPS (Red = Below Target; Yellow = Close to Target; Green = Within Target Range)
Time-in-Target: Cumulative seconds meeting desired behavior
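
The Below/Target/Above classification and the time-in-target counter can be pictured with a short sketch. This is an illustration of the idea under simple assumptions (an inclusive target band and a fixed time step per frame), not the logic used by the Unity UI; the names classify and TimeInTarget and the example band are hypothetical.

```python
def classify(value, target_min, target_max):
    """Map a value to the three stack-bar states."""
    if value < target_min:
        return "Below"
    if value > target_max:
        return "Above"
    return "Target"


class TimeInTarget:
    """Accumulates seconds during which a metric stays inside its band."""

    def __init__(self, target_min, target_max):
        self.target_min = target_min
        self.target_max = target_max
        self.seconds = 0.0

    def update(self, value, dt):
        # dt is the time since the previous frame, in seconds.
        if classify(value, self.target_min, self.target_max) == "Target":
            self.seconds += dt
        return self.seconds


if __name__ == "__main__":
    # Example band in Hz, chosen only for illustration.
    tracker = TimeInTarget(target_min=180.0, target_max=220.0)
    for pitch in [170.0, 190.0, 205.0, 230.0]:
        state = classify(pitch, tracker.target_min, tracker.target_max)
        total = tracker.update(pitch, dt=1 / 30)
        print(f"{pitch:.0f} Hz -> {state}, in target for {total:.3f} s")
```

Only frames classified as Target add to the counter, which is why the total never decreases during a session.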

1.4 Calibration Guide

Calibration Steps

Background Noise (5s): Stay silent
Sustained Vowel /a/ (5s): Produce a steady voice
Sentence (Running Speech): Read or speak a natural sentence

Unity will automatically save:

<project>/Assets/Calibration Files/
  background.wav
  sustained.wav
  sentence.wav
  calibration_data.json
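
After calibration, a short script can confirm that all four files listed above were written. The sketch below assumes the folder layout shown in this guide; the function name missing_calibration_files is hypothetical.

```python
import sys
from pathlib import Path

# File names as listed in this guide.
EXPECTED_FILES = (
    "background.wav",
    "sustained.wav",
    "sentence.wav",
    "calibration_data.json",
)


def missing_calibration_files(project_root):
    """Return the expected calibration files that are not present."""
    folder = Path(project_root) / "Assets" / "Calibration Files"
    return [name for name in EXPECTED_FILES if not (folder / name).is_file()]


if __name__ == "__main__":
    # Pass the project folder as the first argument; defaults to the
    # current directory.
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    missing = missing_calibration_files(root)
    print("Missing:", ", ".join(missing) if missing else "none")
```

If any file is reported missing, repeat the corresponding calibration step.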

Understanding Calibration Values

backgroundThreshold: RMS noise floor
sustainedAvgPitch/sustainedAvgLoudness: Baseline for sustained feedback
sentenceAvgPitch/sentenceAvgLoudness: Baseline for running speech

These baselines are used to determine the target bands for training.
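
How a baseline turns into a target band is not specified in this guide. As one plausible illustration, the sketch below builds a symmetric band around a pitch baseline and expresses a measured value relative to it. The flat JSON layout, the 10% tolerance, the function names, and the sample values are all assumptions made for the example; only the key names come from the list above.

```python
import json


def target_band(baseline, tolerance=0.10):
    """Symmetric band around a baseline (tolerance is a fraction)."""
    delta = abs(baseline) * tolerance
    return baseline - delta, baseline + delta


def relative_diff(value, baseline):
    """Difference from the baseline as a fraction of the baseline."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (value - baseline) / abs(baseline)


if __name__ == "__main__":
    # Made-up values standing in for calibration_data.json; the real file
    # may be structured differently.
    sample = '{"sustainedAvgPitch": 200.0, "sentenceAvgPitch": 190.0}'
    data = json.loads(sample)
    low, high = target_band(data["sustainedAvgPitch"])
    print(f"Sustained pitch band: {low:.1f} to {high:.1f} Hz")
    print(f"Relative diff at 210 Hz: {relative_diff(210.0, 200.0):+.2f}")
```

To use real values, replace the sample string with the contents of your own calibration_data.json after checking how its keys are nested.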

1.5 Logging & Data Export

Log Location

<project>/Assets/Logs/

Each session gets its own non-overwriting file.

Columns Include

t, t0 (wall-time, voiced-time)
mode (speech/sustained)
pitch, loudness (absolute values)
rel_pitch_diff, rel_loudness_diff (relative values against baselines)
Ok, jitter_ok, shimmer_ok, hnr_ok, cpp_ok (flags: Ok indicates that pitch was detected (voiced); the others indicate whether jitter, shimmer, HNR, and CPP were computed)
jitter, shimmer, HNR
CPP, CPPS
avg_rel_pitch, avg_rel_loud (based on average window seconds selected)
avg_jitter, avg_shimmer, avg_HNR (based on average window seconds selected)
avg_CPP, avg_CPPS (based on average window seconds selected)
cumulative in-target time (pitch & loudness)
panel thresholds (thresholds used for each pitch or loudness panel added to the scene)
pitchTargetMin, pitchTargetMax
loudTargetMin, loudTargetMax
targetMaxPhonation, targetInTarget
averageWindowSeconds
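
Session logs can be inspected outside Unity with a few lines of Python. The sketch below picks the newest log and averages one column. It assumes the logs use a .csv extension, are comma-separated, and have a header row whose names match the column list above; the function names are hypothetical.

```python
import csv
import math
import sys
from pathlib import Path


def latest_log(log_dir):
    """Return the most recently modified .csv file in log_dir, or None."""
    files = sorted(Path(log_dir).glob("*.csv"), key=lambda p: p.stat().st_mtime)
    return files[-1] if files else None


def column_mean(csv_path, column):
    """Mean of a numeric column, skipping blank, NaN, or non-numeric cells."""
    values = []
    with open(csv_path, newline="") as handle:
        for row in csv.DictReader(handle):
            try:
                value = float(row[column])
            except (KeyError, TypeError, ValueError):
                continue
            if math.isfinite(value):
                values.append(value)
    return sum(values) / len(values) if values else None


if __name__ == "__main__":
    # Pass the log folder as the first argument, e.g. <project>/Assets/Logs
    log_dir = sys.argv[1] if len(sys.argv) > 1 else "."
    path = latest_log(log_dir)
    if path is None:
        print("No CSV logs found in", log_dir)
    else:
        print("Log:", path.name)
        print("Mean pitch:", column_mean(path, "pitch"))
        print("Mean loudness:", column_mean(path, "loudness"))
```

A result of None means the column was not found or held no numeric values, which usually points to a different header name in the actual file.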

1.6 Troubleshooting

Python Engine Not Launching

Wrong Python path set in Unity
Missing dependencies
Virtual environment not activated before pip install

No Microphone Detected

OS permissions
No connection between Unity and Python (see Python Engine Not Launching)

Unity Errors

Scripting backend set incorrectly, or its parameters not set correctly (for example, a missing game object)
Missing packages or packages need to be updated (check Unity's package manager)

High Latency

CPU throttling
Too many background apps
Using a 48 kHz microphone (Unity resamples automatically, but this adds load)
