Biocomputing Research Lab (Singh Lab)

User Documentation

1.1 Getting Started

Features

  • Real-time pitch estimation using SWIPE
  • Real-time loudness using K-weighted RMS
  • Voice-quality analysis (HNR, Jitter, Shimmer, CPP/CPPS) via Praat-Parselmouth
  • Silero-based VAD for speech/sustained segmentation
  • 30 Hz UI updates with smooth motion
  • Automatic calibration (background noise, sustained vowel, running speech)
  • CSV logging of key per-frame metrics
  • Modular interfaces for custom algorithms and robotic extensions

Use Cases

  • HRI experiments (CARs / SARs)
  • Speech therapy biofeedback
  • Assistive voice-based systems
  • Pitch/loudness control research

System Requirements

  • OS: Ubuntu 22.04 LTS (tested). Debian-based distros also work.
  • Unity: 6000.0.58f2 (use this exact version)
  • Python: 3.12
  • CPU: 4 to 6 cores or more recommended
  • Microphone: Any 16 kHz mono-capable USB or built-in mic

Setup Overview

You will:

  1. Install Unity Editor (6000.0.58f2) through Unity Hub
  2. Create a Python virtual environment
  3. Install Python dependencies
  4. Open the Unity Project
  5. Point Unity to your virtual environment's Python executable
  6. Perform calibration
  7. Set up the experiment
  8. Run real-time analysis

Folder Overview

Assets/
|─ Calibration Files/    # User-specific calibration JSON + WAV Recordings
|─ Logs/                 # Per-session CSV exports
|─ Scripts/
|  |─ Python/            # Python signal processing & calibration
|  |─ Unity/              
|     |─ Core/           # Core Unity controller
|     |─ Interfaces/     # Abstractions for capture, analysis, alert, and feedback
|     |─ UI/             # Visual components
|─ TrainingJSON/         # User-specific session settings (target goals)

1.2 Installation Guide

Install Unity

  1. Download Unity Hub
  2. Install Unity Editor 6000.0.58f2
  3. Clone or unzip the IVoice repository
  4. Open it from Unity Hub

Set Up Python

  1. Open a terminal and change directory to your desired location.
  2. Create a virtual environment and install the dependencies:
python3 -m venv venv
source venv/bin/activate

pip install numpy torch torchaudio torchcodec soundfile pandas pysptk pydub praat-parselmouth
  3. Make sure ffmpeg is installed on your machine (required by pydub):
sudo apt install ffmpeg
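
To confirm that the environment is complete before opening Unity, a short check like the one below can be run with the virtual environment activated. It is a minimal sketch and not part of IVoice: the function names are hypothetical, and the module list assumes the usual import names for the packages above (the praat-parselmouth package is imported as parselmouth).

```python
import importlib.util
import shutil

# Import names for the packages installed above (assumed, not taken from IVoice).
# The praat-parselmouth distribution is imported as "parselmouth".
REQUIRED_MODULES = [
    "numpy", "torch", "torchaudio", "torchcodec", "soundfile",
    "pandas", "pysptk", "pydub", "parselmouth",
]


def find_missing(modules):
    """Return the top-level module names that cannot be located here."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


def ffmpeg_available():
    """Return True if an ffmpeg executable is found on PATH."""
    return shutil.which("ffmpeg") is not None


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    print("Missing modules:", ", ".join(missing) if missing else "none")
    print("ffmpeg on PATH:", ffmpeg_available())
```

The check only locates the modules without importing them, so it stays fast even for large packages such as torch.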

Link Python to Unity

  1. Open the Unity Project via Unity Hub
  2. In the Unity Hierarchy, select the Settings object
  3. In the Inspector, find the PythonPath.cs component and set the path to:
<project>/venv/bin/python
  4. To find the path to your virtual environment's Python executable, activate the environment and run:
which python
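
The same information can be read from inside Python, which also confirms that the interpreter belongs to a virtual environment rather than the system installation. This is a small sketch with a hypothetical helper name; it relies on the standard behavior that sys.prefix differs from sys.base_prefix inside a venv.

```python
import sys


def in_virtual_environment():
    """True when the interpreter runs inside a venv (prefix differs from base)."""
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    # This is the path to paste into the PythonPath.cs component.
    print("Interpreter:", sys.executable)
    print("Inside a virtual environment:", in_virtual_environment())
```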

Validate Installation

  1. Press Play in Unity
  2. The Console should show the Python engine starting up
  3. The microphone initializes, and calibration completes without errors
  4. The UI shows idle streaming

1.3 Running the Application

Start Unity

  1. Before pressing Play, confirm that Python is linked to Unity as described above
  2. Make sure your microphone is connected and working
  3. Press Play
  4. The Python engine boots in the background; the Console should show no errors
  5. You are directed to the calibration step; complete the voice calibration
  6. Set your session goals in the setup UI
  7. Once the settings are in place, press the Record button to start a training session
  8. Unity streams audio to Python, Python returns per-frame metrics, and the UI shows real-time feedback

Interpreting UI

  • Stack Bars: Instant pitch/loudness classification (Below/Target/Above)
  • Graphs: Smoothed trends over time
  • Quality Rings: HNR, Jitter, Shimmer, CPP/CPPS (Red = Below Target; Yellow = Close to Target; Green = Within Target Range)
  • Time-in-Target: Cumulative seconds meeting desired behavior
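
The Below/Target/Above labels and the Time-in-Target counter can be illustrated with the sketch below. It is not the IVoice implementation: the function names are hypothetical, the band is assumed to be inclusive at both ends, and every frame is assumed to last the same fixed duration.

```python
def classify(value, target_min, target_max):
    """Label a value relative to a closed target band (assumed inclusive)."""
    if value < target_min:
        return "Below"
    if value > target_max:
        return "Above"
    return "Target"


def time_in_target(values, target_min, target_max, frame_seconds):
    """Sum the duration of frames whose value falls inside the band.

    Assumes each frame covers the same frame_seconds interval.
    """
    in_band = sum(
        1 for v in values if classify(v, target_min, target_max) == "Target"
    )
    return in_band * frame_seconds
```

For example, with a hypothetical pitch band of 200 to 220 Hz, the frames 180, 205, 210, and 240 Hz would be labeled Below, Target, Target, and Above, so two frames count toward the cumulative time.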

1.4 Calibration Guide

Calibration Steps

  1. Background Noise (5s): Stay silent
  2. Sustained Vowel /a/ (5s): Hold a steady /a/
  3. Sentence (Running Speech): Read or speak a natural sentence

Unity will automatically save:

<project>/Assets/Calibration Files/
  background.wav
  sustained.wav
  sentence.wav
  calibration_data.json

Understanding Calibration Values

  1. backgroundThreshold: RMS noise floor
  2. sustainedAvgPitch/sustainedAvgLoudness: Baseline for sustained feedback
  3. sentenceAvgPitch/sentenceAvgLoudness: Baseline for running speech

These baselines are used to determine the "target band" for training.
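
As an illustration of how a baseline could feed a target band, the sketch below reads the calibration file and derives a relative difference and a symmetric band. Several things here are assumptions rather than documented behavior: calibration_data.json is assumed to be a flat JSON object using the key names listed above, the relative difference is assumed to be (value - baseline) / baseline, and the symmetric tolerance is hypothetical.

```python
import json


def load_calibration(path):
    """Read the calibration JSON file into a dict (flat layout assumed)."""
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)


def relative_diff(value, baseline):
    """Fractional deviation of value from baseline (assumed definition)."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (value - baseline) / baseline


def target_band(baseline, tolerance):
    """Symmetric band around a baseline; tolerance is a fraction such as 0.25.

    The symmetric shape is an illustrative assumption, not IVoice's rule.
    """
    return baseline * (1 - tolerance), baseline * (1 + tolerance)
```

A typical use would be to load the file, take sustainedAvgPitch or sentenceAvgPitch depending on the mode, and pass it as the baseline to both helpers.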

1.5 Logging & Data Export

Log Location

<project>/Assets/Logs/

Each session gets its own non-overwriting file.

Columns Include

  • t, t0 (wall-clock time, voiced time)
  • mode (speech/sustained)
  • pitch, loudness (absolute values)
  • rel_pitch_diff, rel_loudness_diff (relative values against baselines)
  • Ok, jitter_ok, shimmer_ok, hnr_ok, cpp_ok (flags indicating whether pitch was detected (voiced) and whether jitter, shimmer, HNR, and CPP were computed)
  • jitter, shimmer, HNR
  • CPP, CPPS
  • avg_rel_pitch, avg_rel_loud (based on average window seconds selected)
  • avg_jitter, avg_shimmer, avg_HNR (based on average window seconds selected)
  • avg_CPP, avg_CPPS (based on average window seconds selected)
  • cumulative in-target time (pitch & loudness)
  • panel thresholds (thresholds used for each pitch or loudness panel added to the scene)
  • pitchTargetMin, pitchTargetMax
  • loudTargetMin, loudTargetMax
  • targetMaxPhonation, targetInTarget
  • averageWindowSeconds
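
A session log can be summarized offline with the standard library alone. The sketch below averages one numeric column per mode; it assumes the CSV is comma-delimited with a header row that uses the column names listed above, and the function name is hypothetical.

```python
import csv
import math
from collections import defaultdict


def mean_by_mode(csv_path, column):
    """Average a numeric column per 'mode' value, skipping unusable cells."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            try:
                value = float(row[column])
            except (KeyError, TypeError, ValueError):
                continue  # missing column, short row, or empty/non-numeric cell
            if math.isnan(value):
                continue  # ignore NaN placeholders
            mode = row.get("mode", "")
            totals[mode] += value
            counts[mode] += 1
    return {mode: totals[mode] / counts[mode] for mode in counts}
```

Calling mean_by_mode on a log file with "pitch" or "rel_pitch_diff" as the column returns a dict keyed by mode (speech or sustained); frames without a usable value are left out of the average.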

1.6 Troubleshooting

Python Engine Not Launching

  • Wrong Python path set in Unity (see Link Python to Unity)
  • Missing dependencies
  • Virtual environment not activated before running pip install

No Microphone Detected

  • OS permissions
  • No connection between Unity and Python (see Python Engine Not Launching)

Unity Errors

  • Scripting backend set incorrectly, or its parameters are misconfigured (for example, a missing game object)
  • Missing or outdated packages (check Unity's Package Manager)

High Latency

  • CPU throttling
  • Too many background apps
  • Using a 48 kHz microphone (Unity resamples automatically, but this adds load)