Biocomputing Research Lab (Singh Lab)

Lab banner

Technical Documentation

2.1 System Architecture Overview

IVoice follows a modular dual-process architecture:

Unity (Controller, UI, Feedback, Logging)
  |─ Captures live microphone audio
  |─ Streams PCM frames to Python
  |─ Receives real-time metrics
  |─ Computes deviations + panel stats
  |─ Runs alert policy
  |─ Renders adaptive feedback

Python (Calibration, Estimation)
  |─ Parses incoming audio frames
  |─ Performs pitch/loudness estimation
  |─ Performs VAD gating
  |─ Computes jitter/shimmer/HNR/CPP/CPPS
  |─ Emits one JSON result per frame

Communication is unidirectional:

  • Unity -> Python: Binary 30ms PCM frames
  • Python -> Unity: Per-frame JSON with metrics

All components in Unity are wired through well-defined interfaces inside Abstractions.cs, allowing developers to replace internal modules without touching the rest of the system.

2.2 Unity Side-Architecture

Unity implements real-time streaming, calibration, visualization, and alert policy. Key components live in:

Assets/Scripts/Unity/
        |─ Core/         # Voice capturing, streaming, policy determination, logging
        |─ Interfaces/   # Core abstractions
        |─ UI/           # Visualization

Core/

2.2.1.1 RealTimeVoiceAnalyzer.cs (Primary Controller)

Responsibilities:

  • Initializes IVoiceCapture
  • Forwards each VoiceFrame -> IVoiceAnalysisEngine
  • Receives AnalysisFrame events
  • Computes:
    • relative deviations
    • cumulative time-in-target
    • averaged metrics
    • panel-level Inside/Outside ratios
  • Invokes the current IAlertPolicy
  • Dispatches resulting alerts to IFeedbackPresenter
  • Emits logs to IRunLogger

This is the central orchestrator of the Unity pipeline.

2.2.1.2 VoiceCalibration.cs

Responsibilities:

  • Recording of three modes:
    • background noise
    • sustained vowel
    • sentence
  • Running Python calibration_analysis.py script
  • Producing:
    • WAV files (raw + clean)
    • calibration_data.json used by the engine

Calibration ensures that pitch/loudness deviations reflect user-specific norms.

2.2.1.3 VoiceSessionLogger.cs

Writes per-frame metrics

Interfaces/

2.2.2.1 Abstractions.cs

The file defines:

  • IVoiceCapture

    Abstraction of microphone input:

    • Emits VoiceFrame (float[] samples, ~30 ms)
    • Guarantees stready rate and mono 16 kHz output
    • RealTimeVoiceAnalyzer uses only this interface

    → Developers can plug in USB mic, WebRTC capture, prerecorded clips, sythetic sources, etc.

  • IVoiceAnalysisEngine

    Backend-agnostic estimator:

    • Accepts VoiceFrame
    • Returns AnalysisFrame
    • Default implementation uses real_time_stream.py

    → Developers can replace the Python engine with C++, TensorFlow, cloud APIs, or neural models.

  • ICalibrationProcessor

    Defines calibration steps:

    • Takes AudioClip
    • Produces CalibrationData (JSON + WAV)

    → Current implementation wraps calibration_analysis.py

  • IAlertPolicy

    Encapsulates decision logic:

    • Consumes panel-level statistics
    • Outputs: {None, TooHigh, TooLow, Praise}

    → Developers can insert RL-based policies, personalized adaptation, robot-friendly timing, etc.

  • IFeedbackPresenter

    Defines how alerts appear:

    • Shows high/low/praise states

    → Default is Unity UI; can be replaced with robot gestures.

  • IRunLogger

    For experiment data logging:

    • CSV rows for every AnalysisFrame

    → Supports reproducibility & ML training datasets.

2.3 Python-Side Architecture

Performs all the signal processing and heavy feature extraction.

Assets/Scripts/Python/

2.3.1 real_time_stream.py

Handles:

  • Binary packet parsing
  • Silero VAD gating
  • Low-latency SWIPE pitch detection
  • K-weighted loudness estimation
  • HNR/Jitter/Shimmer
  • CPP/CPPS via PowerCepstrogram
  • JSON output (one per frame)
  • Also uses metric throttling:
    • HNR/Jitter/Shimmer: every 150 ms
    • CPP/CPPS: every 450 ms

2.3.2 calibration_analysis.py

Provides:

  • background noise estimation
  • sustained vowel trimming
  • sentence-level VAD via Silero multi-pass
  • SWIPE + loudness over calibrated segments
  • Used to populate calibration_data.json
← Back to IVoice