वाक् अन्वेषणम् (Speech Exploration) • Acoustic Analysis: ḍha
Original: ढ | Equivalents: ढ, ಢ, ఢ | Files: 4

▼ Waveform (Amplitude vs Time)

Definition: Raw acoustic signal showing amplitude variation over time. Reveals consonant bursts (sharp peaks), voicing patterns (oscillations), and vowel steady-state regions (regular oscillations).
[Figure: Waveform]

▼ Frequency Probability Distribution

Definition: Normalized spectral energy distribution showing the relative power at each frequency. Each curve represents one utterance and is scaled so that the area under it equals 1, as for a probability distribution.
  • Spectral Centroid (dotted line): Power-weighted mean frequency, the "center of mass" of the spectrum. A higher centroid means a brighter sound (more high-frequency energy).
  • Spectral Rolloff 95% (dashed line): Frequency below which 95% of the total spectral energy lies. Marks the effective high-frequency cutoff of the sound.
[Figure: Frequency Probability Distribution]
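
The normalization, centroid, and rolloff described above can be sketched with NumPy. This is a minimal illustration, not the tool's actual implementation: the function names, the use of a single whole-signal FFT, and the rectangle-rule normalization are all assumptions.

```python
import numpy as np

def spectral_distribution(signal, sample_rate):
    """Return (freqs, density); density is scaled so its area over frequency is 1."""
    power = np.abs(np.fft.rfft(signal)) ** 2              # power spectrum
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    df = freqs[1] - freqs[0]                               # bin width in Hz
    density = power / (power.sum() * df)                   # rectangle-rule area = 1
    return freqs, density

def spectral_centroid(freqs, density):
    """Power-weighted mean frequency."""
    weights = density / density.sum()
    return float(np.sum(freqs * weights))

def spectral_rolloff(freqs, density, fraction=0.95):
    """Lowest frequency at which the cumulative energy reaches `fraction`."""
    cumulative = np.cumsum(density) / density.sum()
    index = int(np.searchsorted(cumulative, fraction))
    return float(freqs[min(index, len(freqs) - 1)])

# Hypothetical test signal: two equal-amplitude tones at 500 Hz and 1500 Hz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 500 * t) + np.sin(2 * np.pi * 1500 * t)

freqs, density = spectral_distribution(tone, sample_rate)
centroid = spectral_centroid(freqs, density)   # midway between the two tones
rolloff = spectral_rolloff(freqs, density)     # reached only at the upper tone
```

For this two-tone signal the centroid falls at the midpoint of the tones and the 95% rolloff sits at the upper tone, which matches the "center of mass" and "cutoff" readings given in the definitions.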

▼ Pitch vs Time

Definition: Fundamental frequency (F0) in Hertz, the rate of vocal fold vibration and the main acoustic correlate of perceived pitch (the "musical note" of the voice). Higher Hz = higher pitch.
[Figure: Pitch vs Time]
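
One common way to estimate F0 for a single voiced frame is to find the strongest autocorrelation peak within a plausible pitch range. The sketch below assumes that approach; the report does not say which pitch tracker was used, and the function name and the 75-300 Hz search range are illustrative choices.

```python
import numpy as np

def estimate_f0(frame, sample_rate, fmin=75.0, fmax=300.0):
    """Estimate F0 (Hz) of one voiced frame from its autocorrelation peak."""
    frame = frame - np.mean(frame)                          # remove DC offset
    corr = np.correlate(frame, frame, mode="full")[len(frame) - 1:]  # lags 0..N-1
    lag_min = int(sample_rate / fmax)                       # shortest period searched
    lag_max = int(sample_rate / fmin)                       # longest period searched
    lag = lag_min + int(np.argmax(corr[lag_min:lag_max + 1]))
    return sample_rate / lag

# Hypothetical 50 ms frame of a 200 Hz tone sampled at 16 kHz.
sample_rate = 16000
t = np.arange(800) / sample_rate
f0 = estimate_f0(np.sin(2 * np.pi * 200 * t), sample_rate)
```

Applying this frame by frame over an utterance gives a pitch-versus-time track. Real speech also needs a voicing decision for each frame, which this sketch omits.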

▼ Power vs Time

Definition: Sound intensity measured as RMS amplitude and normalized to the range 0-1. Shows the loudness of the sound over time; higher values mark louder portions of the speech.
[Figure: Power vs Time]
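
A frame-wise RMS curve scaled to 0-1 could be computed as sketched below. The frame and hop sizes and the peak normalization are assumptions, since the report does not state how the curve is normalized.

```python
import numpy as np

def rms_power(signal, frame_length=400, hop_length=160):
    """Frame-wise RMS amplitude, divided by its peak so values lie in 0-1."""
    n_frames = 1 + max(0, (len(signal) - frame_length) // hop_length)
    rms = np.array([
        np.sqrt(np.mean(signal[i * hop_length: i * hop_length + frame_length] ** 2))
        for i in range(n_frames)
    ])
    peak = rms.max()
    return rms / peak if peak > 0 else rms                  # avoid dividing by zero

# Hypothetical signal: a quiet tone followed by one ten times louder.
sample_rate = 16000
t = np.arange(1600) / sample_rate
tone = np.sin(2 * np.pi * 200 * t)
power = rms_power(np.concatenate([0.1 * tone, tone]))
```

The first frames sit near 0.1 and the last frames at 1.0, which reflects the amplitude ratio of the two halves.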

▼ Formants vs Time

Definition: Vocal tract resonances in Hertz. F1 (lowest) varies inversely with tongue height (open vowels have high F1), F2 reflects tongue position (front vowels have high F2, back vowels low F2), and F3 is sensitive to lip rounding. Different vowels have characteristic F1/F2 patterns that distinguish them acoustically.
[Figure: Formants vs Time]

▼ Formant Space (F1 vs F2)

Definition: Traditional vowel space visualization. X-axis (F2): front vowels (high F2) ← → back vowels (low F2). Y-axis (F1): close vowels (low F1) ← → open vowels (high F1), inverted for traditional display. Points closer together indicate similar vowel quality.
[Figure: Formant Space (F1 vs F2)]

▼ Spectrograms

Definition: Time-frequency representation showing how energy is distributed across frequencies at each moment in time. Brighter colors indicate higher energy at that frequency.
[Figure: Spectrograms]
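
A spectrogram of this kind is typically built from a short-time Fourier transform. The following is a minimal sketch under assumed settings (Hann window, 512-sample frames, 128-sample hop, power in dB); the parameters used for the plots in this report are not stated.

```python
import numpy as np

def spectrogram(signal, sample_rate, frame_length=512, hop_length=128):
    """Return (times, freqs, power_db) with power_db shaped (n_freqs, n_frames)."""
    window = np.hanning(frame_length)
    n_frames = 1 + (len(signal) - frame_length) // hop_length
    frames = np.stack([
        signal[i * hop_length: i * hop_length + frame_length] * window
        for i in range(n_frames)
    ])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2        # per-frame power spectrum
    power_db = 10.0 * np.log10(power + 1e-12)               # small offset avoids log(0)
    freqs = np.fft.rfftfreq(frame_length, d=1.0 / sample_rate)
    times = (np.arange(n_frames) * hop_length + frame_length / 2) / sample_rate
    return times, freqs, power_db.T

# Hypothetical input: one second of a 1000 Hz tone at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
times, freqs, spec = spectrogram(np.sin(2 * np.pi * 1000 * t), sample_rate)
peak_freq = freqs[np.argmax(spec[:, 0])]                    # strongest bin in the first frame
```

Each column of `spec` is one time slice. Plotting it with time on the x-axis and frequency on the y-axis gives the display described above, with the brightest band at the tone frequency.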

▼ Findings & Statistics: ḍha

Acoustic Summary for ḍha (Overall)
Overall Statistics (n=4 files)

  • F1 (Formant 1): Mean ± SD: 765 ± 431 Hz
  • F2 (Formant 2): Mean ± SD: 1808 ± 589 Hz
  • F3 (Formant 3): Mean ± SD: 3101 ± 398 Hz
  • Pitch (across all frames): Range: 75 - 254 Hz; Mean ± SD: 164 ± 55 Hz
  • Duration: Range: 0.505 - 0.661 s; Mean ± SD: 0.582 ± 0.055 s

By Gender (Mean ± SD):

Measure    Female (n=2)      Male (n=2)
F1         729 ± 137 Hz      779 ± 18 Hz
F2         1760 ± 24 Hz      1847 ± 45 Hz
F3         3139 ± 40 Hz      3059 ± 60 Hz
Pitch      214 ± 14 Hz       118 ± 19 Hz

▼ Observations: ḍha

Automated Analysis for ḍha

Programmatically inferred patterns from the acoustic data

Consistency & Variance

Within-group homogeneity and overall variance patterns

• F1 shows substantial variance (CV=0.56), suggesting heterogeneous voice characteristics or synthesis inconsistencies.
• Female voices (n=2) show higher F1 variance than male voices (CV=0.19), possibly indicating quality issues or diverse voice models.
• Male voices (n=2) show very tight F1 consistency (CV=0.02), suggesting systematic pronunciation.
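
The CV values above follow directly from the reported means and standard deviations. The sketch below reproduces them and applies the thresholds from the interpretation guide (CV < 0.1 tight, CV > 0.2 high variance). The function names and the "intermediate" label for values between the two thresholds are assumptions.

```python
def coefficient_of_variation(mean, sd):
    """Ratio of standard deviation to mean."""
    return sd / mean

def describe_cv(cv):
    """Label a CV using the report's thresholds; the middle label is an assumption."""
    if cv < 0.1:
        return "tight clustering"
    if cv > 0.2:
        return "high variance"
    return "intermediate"

# F1 summary values taken from the statistics section of this report.
cv_overall = coefficient_of_variation(765, 431)   # rounds to 0.56
cv_female = coefficient_of_variation(729, 137)    # rounds to 0.19
cv_male = coefficient_of_variation(779, 18)       # rounds to 0.02
```

Under these thresholds the overall F1 counts as high variance and the male group as tight clustering, while the female group falls between the two cutoffs.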

Gender Differences

Acoustic differences between male and female voices

• F1 gender difference is moderate (50 Hz / 6.6%, d=0.51).
• F2 gender difference is large (86 Hz / 4.8%, d=2.39).
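
The effect sizes can be approximately reproduced from the per-gender summaries. The sketch assumes the pooled SD for equal group sizes, sqrt((SD_a² + SD_b²) / 2), and a percentage taken relative to the mean of the two group means. Both are assumptions, although they are consistent with the reported F1 figures.

```python
import math

def cohens_d(mean_a, sd_a, mean_b, sd_b):
    """Cohen's d using a pooled SD for two groups of equal size."""
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2.0)
    return abs(mean_a - mean_b) / pooled_sd

def relative_difference(mean_a, mean_b):
    """Return (absolute difference, percentage of the mean of the two group means)."""
    diff = abs(mean_a - mean_b)
    return diff, 100.0 * diff / ((mean_a + mean_b) / 2.0)

# F1 by gender, from the statistics section: female 729 ± 137 Hz, male 779 ± 18 Hz.
d_f1 = cohens_d(729, 137, 779, 18)                # rounds to 0.51
diff_f1, pct_f1 = relative_difference(729, 779)   # 50 Hz, about 6.6%

# F2 by gender: female 1760 ± 24 Hz, male 1847 ± 45 Hz.
d_f2 = cohens_d(1760, 24, 1847, 45)
```

For F2 the rounded summaries give a difference of 87 Hz and a d slightly above the reported 2.39. The report was presumably computed from unrounded values, so small discrepancies of this size are expected.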

Phonetic Analysis

Formant validation against expected vowel properties

• Formant analysis: F1=765 Hz (open), F2=1808 Hz (front-central).
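
The labels "open" and "front-central" could come from simple threshold rules like those below. Only three cutoffs are taken from the interpretation guide (close F1 < 400 Hz, front F2 > 2000 Hz, back F2 < 1000 Hz). The open cutoff of 700 Hz, the 1500 Hz central split, and the function names are hypothetical, chosen only to illustrate the idea.

```python
def classify_height(f1_hz, close_max=400.0, open_min=700.0):
    """Vowel height from F1; open_min is a hypothetical cutoff."""
    if f1_hz < close_max:
        return "close"
    if f1_hz >= open_min:
        return "open"
    return "mid"

def classify_backness(f2_hz, back_max=1000.0, central_split=1500.0, front_min=2000.0):
    """Vowel backness from F2; central_split is a hypothetical cutoff."""
    if f2_hz > front_min:
        return "front"
    if f2_hz < back_max:
        return "back"
    return "front-central" if f2_hz >= central_split else "back-central"

# Overall mean formants reported for ḍha.
height = classify_height(765)       # "open"
backness = classify_backness(1808)  # "front-central"
```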

Duration Patterns

Temporal characteristics and uniformity

• Duration shows low variance (CV=0.10, range 0.505-0.661 s). TTS temporal control is tight, possibly more uniform than natural speech.

TTS Quality

Synthesis consistency and prosodic control

• ggl shows high pitch variance (σ=51 Hz), which may indicate natural-sounding variation or control instability.

Guide for Interpretation

  • Coefficient of Variation (CV): Ratio of standard deviation to mean. CV < 0.1 indicates tight clustering, CV > 0.2 indicates high variance.
  • Cohen's d: Standardized effect size (difference ÷ pooled SD). Measures how many standard deviations apart two groups are. d < 0.2 is negligible, 0.2-0.5 is small, 0.5-0.8 is medium, d > 0.8 is large.
  • Relative differences: Reported as "absolute Hz / percentage of mean" to show both raw and scaled differences (e.g., 48 Hz / 8.5% means 48 Hz is 8.5% of the average F1 value).
  • Formant validation: Expected ranges based on IPA vowel space: close vowels (F1 < 400 Hz), front vowels (F2 > 2000 Hz), back vowels (F2 < 1000 Hz).
  • Natural speech baselines: Gender differences in natural speech are typically about 100 Hz for F1 and about 150 Hz for F2; TTS voices may deviate from these.
  • TTS quality indicators: Pitch stability (σ < 25 Hz = excellent), formant consistency (lower variance = better control).