वाक् अन्वेषणम् (Speech Exploration) • Acoustic Analysis: ti
Original: ति (Devanagari) | Equivalents: ति (Devanagari), ತಿ (Kannada), తి (Telugu) | Files: 9

Waveform (Amplitude vs Time)

Definition: Raw acoustic signal showing how amplitude varies over time. It reveals consonant bursts (sharp transient peaks), voicing (quasi-periodic oscillation), and vowel steady-state regions (stretches where the oscillation repeats with little change).
[Figure: Waveform]

Frequency Probability Distribution

Definition: Normalized spectral energy distribution showing the relative power at each frequency. Each curve represents one utterance, scaled so that the area under it integrates to 1, like a probability distribution.
  • Spectral Centroid (dotted line): Power-weighted mean frequency, i.e. the "center of mass" of the spectrum. A higher centroid means a brighter sound (relatively more high-frequency energy).
  • Spectral Rolloff 95% (dashed line): Frequency below which 95% of the total energy lies. Captures the effective high-frequency cutoff of the sound.
[Figure: Frequency Probability Distribution]
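As a rough illustration of the two summary lines, the sketch below computes a centroid and a 95% rolloff from an already-computed power spectrum. The function names and the toy spectrum are hypothetical, and it assumes power (not magnitude) weighting; the report does not say which weighting it uses.

```python
def normalize(power):
    """Scale non-negative bin powers so they sum to 1 (a discrete probability distribution)."""
    total = sum(power)
    if total <= 0:
        raise ValueError("spectrum has no energy")
    return [p / total for p in power]


def spectral_centroid(freqs, power):
    """Power-weighted mean frequency: sum of f * p over the normalized spectrum."""
    return sum(f * p for f, p in zip(freqs, normalize(power)))


def spectral_rolloff(freqs, power, fraction=0.95):
    """Lowest bin frequency at which the cumulative normalized power reaches `fraction`."""
    cumulative = 0.0
    for f, p in zip(freqs, normalize(power)):
        cumulative += p
        if cumulative >= fraction:
            return f
    return freqs[-1]  # guard against floating-point shortfall just below `fraction`


# Hypothetical toy spectrum: most energy near 250 Hz, a little at 2500 and 4000 Hz.
freqs = [0, 250, 500, 1000, 2500, 4000]
power = [0.0, 6.0, 2.0, 1.0, 0.8, 0.2]
print(spectral_centroid(freqs, power))  # about 630 Hz
print(spectral_rolloff(freqs, power))   # 2500
```

The small high-frequency components pull the centroid well above the 250 Hz peak, while the rolloff lands at the first bin where 95% of the energy has been accumulated.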

Pitch vs Time

Definition: Fundamental frequency (F0) in hertz, the acoustic correlate of perceived pitch (the "musical note" of the voice). Higher F0 is heard as higher pitch.
[Figure: Pitch vs Time]
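The report does not say which pitch tracker produced these curves. As a minimal sketch of one common approach, the code below picks the autocorrelation peak within an assumed 75-500 Hz search range. `estimate_f0` is a hypothetical helper with no voicing decision or octave-error correction, so it should not be read as the tool's method.

```python
import math


def estimate_f0(frame, sample_rate, fmin=75.0, fmax=500.0):
    """Estimate F0 (Hz) of one frame from the highest autocorrelation peak.

    Only lags corresponding to fmin..fmax (assumed bounds) are searched.
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]  # remove DC offset
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), n - 1)
    best_lag, best_r = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation at this lag
        r = sum(x[i] * x[i + lag] for i in range(n - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return sample_rate / best_lag


sr = 16000
frame = [math.sin(2 * math.pi * 200 * i / sr) for i in range(640)]  # 40 ms of a 200 Hz tone
print(estimate_f0(frame, sr))  # 200.0
```

On real speech, a tracker would also need to decide which frames are voiced; unvoiced frames (such as the /t/ closure and burst) would otherwise still receive a spurious F0 value.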

Power vs Time

Definition: Sound intensity measured as RMS amplitude and normalized to the range 0-1. It shows how loudness changes over time; higher values indicate louder portions of the utterance.
[Figure: Power vs Time]
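A minimal sketch of framewise RMS followed by 0-1 scaling. Dividing by the loudest frame is an assumption (the report only says "normalized to 0-1"; min-max scaling is another possibility), and the frame and hop sizes are illustrative.

```python
import math


def frame_rms(samples, frame_len, hop):
    """RMS amplitude of each frame of `frame_len` samples, advancing by `hop` samples."""
    values = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        values.append(math.sqrt(sum(s * s for s in frame) / frame_len))
    return values


def normalize_0_1(values):
    """Divide by the maximum so the loudest frame becomes 1.0 (assumed normalization)."""
    peak = max(values)
    return [v / peak for v in values] if peak > 0 else [0.0 for _ in values]


# Hypothetical signal: a quiet half (amplitude 0.1) followed by a loud half (0.5).
samples = [0.1, -0.1] * 200 + [0.5, -0.5] * 200
print(normalize_0_1(frame_rms(samples, 200, 200)))  # approximately [0.2, 0.2, 1.0, 1.0]
```

Scaling by the maximum keeps silence at 0 and makes curves from different recordings comparable in shape, at the cost of hiding absolute level differences between files.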

Formants vs Time

Definition: Vocal tract resonances in hertz. F1 (the lowest) varies inversely with tongue height (a higher tongue gives a lower F1), F2 reflects tongue position front/back (front vowels have a higher F2), and F3 is often associated with lip rounding. Different vowels have characteristic F1/F2 patterns that distinguish them acoustically.
[Figure: Formants vs Time]

Formant Space (F1 vs F2)

Definition: Traditional vowel-space plot. X-axis (F2): runs from front vowels (high F2) to back vowels (low F2). Y-axis (F1): runs from close vowels (low F1) to open vowels (high F1), inverted in the traditional way so that close vowels appear at the top. Points closer together indicate more similar vowel quality.
[Figure: Formant Space]

Spectrograms

Definition: Time-frequency representation of the signal: for each moment in time, it shows how energy is distributed across frequencies. Brighter colors indicate more energy at that time and frequency.
[Figure: Spectrograms]
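A spectrogram is a sequence of short-time spectra. The sketch below windows overlapping frames with a Hann window and takes the power of each DFT bin. The frame length, hop, and window choice are assumptions, and the naive DFT stands in for the FFT a real tool would use.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def spectrogram(samples, frame_len=256, hop=128):
    """Short-time power spectrum: one row per frame, one value per frequency bin.

    Uses an O(frame_len^2) DFT per frame so that only the standard library is needed.
    """
    window = hann(frame_len)
    rows = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        seg = [samples[start + n] * window[n] for n in range(frame_len)]
        row = []
        for k in range(frame_len // 2 + 1):  # non-negative frequencies only
            acc = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            row.append(abs(acc) ** 2)  # power in bin k
        rows.append(row)
    return rows


sr = 8000
tone = [math.sin(2 * math.pi * 1000 * i / sr) for i in range(800)]  # 0.1 s of a 1 kHz tone
S = spectrogram(tone)
peak_bin = max(range(len(S[0])), key=S[0].__getitem__)
print(len(S), peak_bin * sr / 256)  # 5 1000.0
```

Bin k corresponds to k * sample_rate / frame_len Hz, so a longer frame gives finer frequency resolution but coarser time resolution.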

Findings & Statistics: ti

Acoustic Summary for ti (Overall)
Overall Statistics (n=9 files, values as mean ± SD)

F1 (Formant 1): 693 ± 458 Hz
F2 (Formant 2): 2383 ± 425 Hz
F3 (Formant 3): 3205 ± 341 Hz
Pitch (across all frames): 204 ± 69 Hz, range 75-332 Hz
Duration: 0.551 ± 0.084 s, range 0.451-0.720 s

By Gender (mean ± SD):

Group   | n | F1 (Hz)  | F2 (Hz)    | F3 (Hz)   | Pitch (Hz)
Female  | 3 | 643 ± 65 | 2460 ± 97  | 3256 ± 17 | 255 ± 22
Male    | 3 | 601 ± 49 | 2295 ± 151 | 3068 ± 82 | 133 ± 23
Unknown | 3 | 795 ± 30 | 2385 ± 49  | 3262 ± 51 | 224 ± 17

Observations: ti

Automated Analysis for ti

Programmatically inferred patterns from the acoustic data

Consistency & Variance

Within-group homogeneity and overall variance patterns

• F1 shows high variance (CV=0.66, well above the 0.2 threshold), suggesting heterogeneous voice characteristics or synthesis inconsistencies.
• Voices of unknown gender (n=3) show very tight F1 consistency (CV=0.04), suggesting systematic pronunciation.
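The CV figures above can be checked from the reported summary statistics, since CV is simply SD divided by mean, as defined in the guide at the end of this section. For raw per-file values, the helper assumes the sample SD (n - 1), which the report does not specify; the three raw values in the last usage line are hypothetical.

```python
import statistics


def cv_from_summary(mean, sd):
    """Coefficient of variation from an already-reported mean and SD."""
    return sd / mean


def coefficient_of_variation(values):
    """CV from raw values, assuming the sample SD (n - 1 denominator)."""
    return statistics.stdev(values) / statistics.mean(values)


print(round(cv_from_summary(693, 458), 2))      # overall F1: 0.66
print(round(cv_from_summary(795, 30), 2))       # Unknown-voice F1: 0.04
print(round(cv_from_summary(0.551, 0.084), 2))  # duration: 0.15
print(round(coefficient_of_variation([600, 650, 700]), 3))  # hypothetical F1 values: 0.077
```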

Gender Differences

Acoustic differences between male and female voices

• F1 gender difference is medium (43 Hz / 6.9%, d=0.75).
• F2 gender difference is large (165 Hz / 6.9%, d=1.30).
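The gender-difference figures can be reproduced from the By Gender statistics. This sketch uses the standard pooled SD. Taking the percentage relative to the mean of the two groups is an inference (it reproduces both reported 6.9% values), not something the report states.

```python
import math


def pooled_sd(sd1, n1, sd2, n2):
    """Pooled SD of two groups; equals sqrt((sd1**2 + sd2**2) / 2) when n1 == n2."""
    return math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))


def compare_groups(mean1, sd1, n1, mean2, sd2, n2):
    """Return (difference in Hz, percent of the two-group mean, Cohen's d)."""
    diff = mean1 - mean2
    pct = 100 * abs(diff) / ((mean1 + mean2) / 2)  # inferred percentage base
    d = abs(diff) / pooled_sd(sd1, n1, sd2, n2)
    return diff, pct, d


diff, pct, d = compare_groups(2460, 97, 3, 2295, 151, 3)  # F2: female vs male
print(diff, round(pct, 1), round(d, 2))  # 165 6.9 1.3
```

With the rounded F1 values (643 ± 65 vs 601 ± 49), the same call gives 42 Hz and d ≈ 0.73 rather than the reported 43 Hz and 0.75, presumably because the report works from unrounded means.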

Phonetic Analysis

Formant validation against expected vowel properties

• Formant analysis: F1=693 Hz (mid; above the < 400 Hz expected for the close vowel /i/ ✗), F2=2383 Hz (front ✓).

Duration Patterns

Temporal characteristics and uniformity

• Duration variance is moderate (CV=0.15, range 0.451-0.720s).

TTS Quality

Synthesis consistency and prosodic control

• gtts shows lower F1 variance than ggl, suggesting more consistent synthesis quality.
• ggl shows high pitch variance (σ=65 Hz), which may reflect natural-sounding variation, pooling of voices with different baseline pitch, or control instability.
• gtts shows excellent pitch stability (σ=17 Hz, under the 25 Hz threshold), indicating tight prosodic control.

Cross-Script Comparison

Acoustic similarity across different scripts

• Kannada and Telugu pronunciations cluster together (distance=76 Hz in F1-F2 space), showing consistent articulation across scripts.
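The report does not state how the F1-F2 distance is measured. Assuming plain Euclidean distance in hertz, it reduces to a single `math.hypot` call. The per-script means below are hypothetical placeholders, not values from this report.

```python
import math


def f1f2_distance(a, b):
    """Euclidean distance between two (F1, F2) points in Hz (assumed metric)."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


# Hypothetical per-script means (F1, F2) in Hz, for illustration only.
kannada = (700, 2400)
telugu = (760, 2440)
print(round(f1f2_distance(kannada, telugu)))  # 72
```

Because F2 spans a wider range than F1, an unweighted distance in Hz tends to be dominated by F2 differences; perceptual scales such as Bark are a common alternative.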
Guide for Interpretation
  • Coefficient of Variation (CV): Ratio of standard deviation to mean. CV < 0.1 indicates tight clustering, 0.1-0.2 moderate variance, and CV > 0.2 high variance.
  • Cohen's d: Standardized effect size (difference ÷ pooled SD). Measures how many standard deviations apart two groups are. d < 0.2 is negligible, 0.2-0.5 is small, 0.5-0.8 is medium, d > 0.8 is large.
  • Relative differences: Reported as "absolute Hz / percentage of mean" to show both raw and scaled differences, with the percentage taken relative to the mean of the two groups being compared (e.g., 48 Hz / 8.5% means the 48 Hz difference is 8.5% of that mean).
  • Formant validation: Expected ranges based on IPA vowel space: close vowels (F1 < 400 Hz), front vowels (F2 > 2000 Hz), back vowels (F2 < 1000 Hz).
  • Natural speech baselines: Gender differences are typically around ~100 Hz for F1 and ~150 Hz for F2, though they vary by vowel; TTS may deviate from these.
  • TTS quality indicators: Pitch stability (σ < 25 Hz = excellent), formant consistency (lower variance = better control).
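The thresholds in this guide can be written as small classification helpers. Boundary handling at exactly 0.5 or 0.8 is an assumption, since the guide's ranges share their endpoints; the inputs in the usage lines are values reported above.

```python
def cv_band(cv):
    """Classify a coefficient of variation using the guide's thresholds."""
    if cv < 0.1:
        return "tight"
    if cv > 0.2:
        return "high"
    return "moderate"  # 0.1 <= CV <= 0.2


def effect_size_band(d):
    """Classify |Cohen's d|; treatment of exact boundaries is an assumption."""
    d = abs(d)
    if d < 0.2:
        return "negligible"
    if d < 0.5:
        return "small"
    if d < 0.8:
        return "medium"
    return "large"


def vowel_checks(f1, f2):
    """Apply the approximate formant ranges listed in the guide."""
    return {"close": f1 < 400, "front": f2 > 2000, "back": f2 < 1000}


def pitch_is_excellent(sd_hz):
    """Pitch stability counts as excellent when the SD is under 25 Hz."""
    return sd_hz < 25


print(cv_band(0.66), cv_band(0.15), cv_band(0.04))     # high moderate tight
print(effect_size_band(0.75), effect_size_band(1.30))  # medium large
print(vowel_checks(693, 2383))  # {'close': False, 'front': True, 'back': False}
print(pitch_is_excellent(17), pitch_is_excellent(65))  # True False
```

Applied to the overall means, the vowel check passes the front-vowel test but not the close-vowel test, since an F1 of 693 Hz is well above the range expected for /i/.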