Acoustic Analysis: ḍha

▼ Waveform (Amplitude vs Time)

Definition: Raw acoustic signal showing amplitude variation over time. Reveals consonant bursts (sharp peaks), voicing patterns (oscillations), and vowel steady-state regions (regular oscillations).

▼ Frequency Probability Distribution

Definition: Normalized frequency energy distribution showing the relative power at each frequency. Each curve represents one utterance. The area under the curve integrates to 1 (probability distribution).

Spectral Centroid (dotted line): Weighted mean frequency—indicates the "center of mass" of the spectrum. Higher centroid = brighter sound (more high-frequency energy).
Spectral Rolloff 95% (dashed line): Frequency containing 95% of the total energy. Captures the high-frequency cutoff of the sound.

▼ Pitch vs Time

Definition: Fundamental frequency (F0) in Hertz. Represents the perceived tone or "musical note" of the voice. Higher Hz = higher pitch.

▼ Power vs Time

Definition: Sound intensity (RMS amplitude) normalized to 0-1. Shows the loudness of the sound over time. Higher values indicate louder portions of the speech.

▼ Formants vs Time

Definition: Vocal tract resonances in Hertz. F1 (lowest) relates to tongue height, F2 to tongue position (front/back), F3 to lip rounding. Different vowels have characteristic F1/F2 patterns that distinguish them acoustically.

▼ Formant Space (F1 vs F2)

Definition: Traditional vowel space visualization. X-axis (F2): front vowels (high F2) ← → back vowels (low F2). Y-axis (F1): close vowels (low F1) ← → open vowels (high F1), inverted for traditional display. Points closer together indicate similar vowel quality.

▶ Interactive Formant Space (Click to Play Audio)

▼ Spectrograms

Definition: Time-frequency representation showing energy distribution across frequencies. The spectrogram displays which frequencies are present in the sound at each moment in time. Brighter colors indicate higher energy at that frequency.

▼ Findings & Statistics: ḍha

Acoustic Summary for ḍha (Overall)

Overall Statistics (n=4 files)

F1 (Formant 1)

Mean ± SD: 765 ± 431 Hz

F2 (Formant 2)

Mean ± SD: 1808 ± 589 Hz

F3 (Formant 3)

Mean ± SD: 3101 ± 398 Hz

Pitch (across all frames)

Range: 75 - 254 Hz

Mean ± SD: 164 ± 55 Hz

Duration

Range: 0.505 - 0.661 s

Mean ± SD: 0.582 ± 0.055 s

By Gender:

Female (n=2)

F1: 729 ± 137 Hz

F2: 1760 ± 24 Hz

F3: 3139 ± 40 Hz

Pitch: 214 ± 14 Hz

Male (n=2)

F1: 779 ± 18 Hz

F2: 1847 ± 45 Hz

F3: 3059 ± 60 Hz

Pitch: 118 ± 19 Hz

▼ Observations: ḍha

Automated Analysis for ḍha

Programmatically inferred patterns from the acoustic data

Consistency & Variance

Within-group homogeneity and overall variance patterns

• F1 shows substantial variance (CV=0.56), suggesting heterogeneous voice characteristics or synthesis inconsistencies.

• Female voices (n=2) show higher F1 variance (CV=0.19), possibly indicating quality issues or diverse voice models.

• Male voices (n=2) show very tight F1 consistency (CV=0.02), suggesting systematic pronunciation.

Gender Differences

Acoustic differences between male and female voices

• F1 gender difference is moderate (50 Hz / 6.6%, d=0.51).

• F2 gender difference is large (86 Hz / 4.8%, d=2.39).

Phonetic Analysis

Formant validation against expected vowel properties

• Formant analysis: F1=765 Hz (open), F2=1808 Hz (front-central).

Duration Patterns

Temporal characteristics and uniformity

• Duration shows low variance (CV=0.10, range 0.505-0.661s). TTS temporal control is tight, possibly unnaturally uniform compared to natural speech.

TTS Quality

Synthesis consistency and prosodic control

• ggl shows high pitch variance (σ=51 Hz), which may indicate natural-sounding variation or control instability.

Guide for Interpretation

Coefficient of Variation (CV): Ratio of standard deviation to mean. CV < 0.1 indicates tight clustering, CV > 0.2 indicates high variance.
Cohen's d: Standardized effect size (difference ÷ pooled SD). Measures how many standard deviations apart two groups are. d < 0.2 is negligible, 0.2-0.5 is small, 0.5-0.8 is medium, d > 0.8 is large.
Relative differences: Reported as "absolute Hz / percentage of mean" to show both raw and scaled differences (e.g., 48 Hz / 8.5% means 48 Hz is 8.5% of the average F1 value).
Formant validation: Expected ranges based on IPA vowel space: close vowels (F1 < 400 Hz), front vowels (F2 > 2000 Hz), back vowels (F2 < 1000 Hz).
Natural speech baselines: Gender differences typically show F1 ~100 Hz, F2 ~150 Hz; TTS may deviate from these.
TTS quality indicators: Pitch stability (σ < 25 Hz = excellent), formant consistency (lower variance = better control).