वाक् अन्वेषणम् • Acoustic Analysis: ca
Original: च | Equivalents: च, ಚ, చ | Files: 4

▼ Waveform (Amplitude vs Time)

Definition: Raw acoustic signal showing amplitude variation over time. Reveals consonant bursts (sharp peaks), voicing patterns (oscillations), and vowel steady-state regions (regular oscillations).
Waveform

▼ Frequency Probability Distribution

Definition: Normalized frequency energy distribution showing the relative power at each frequency. Each curve represents one utterance. The area under the curve integrates to 1 (probability distribution).
  • Spectral Centroid (dotted line): Weighted mean frequency—indicates the "center of mass" of the spectrum. Higher centroid = brighter sound (more high-frequency energy).
  • Spectral Rolloff 95% (dashed line): Frequency containing 95% of the total energy. Captures the high-frequency cutoff of the sound.
Frequency Probability Distribution

▼ Pitch vs Time

Definition: Fundamental frequency (F0) in Hertz. Represents the perceived tone or "musical note" of the voice. Higher Hz = higher pitch.
Pitch vs Time

▼ Power vs Time

Definition: Sound intensity (RMS amplitude) normalized to 0-1. Shows the loudness of the sound over time. Higher values indicate louder portions of the speech.
Power vs Time

▼ Formants vs Time

Definition: Vocal tract resonances in Hertz. F1 (lowest) relates to tongue height, F2 to tongue position (front/back), F3 to lip rounding. Different vowels have characteristic F1/F2 patterns that distinguish them acoustically.
Formants vs Time

▼ Formant Space (F1 vs F2)

Definition: Traditional vowel space visualization. X-axis (F2): front vowels (high F2) ← → back vowels (low F2). Y-axis (F1): close vowels (low F1) ← → open vowels (high F1), inverted for traditional display. Points closer together indicate similar vowel quality.
Formant Space
▶ Interactive Formant Space (Click to Play Audio)

▼ Spectrograms

Definition: Time-frequency representation showing energy distribution across frequencies. The spectrogram displays which frequencies are present in the sound at each moment in time. Brighter colors indicate higher energy at that frequency.
Spectrograms

▼ Findings & Statistics: ca

Acoustic Summary for ca (Overall)
Overall Statistics (n=4 files)

F1 (Formant 1)

Mean ± SD: 929 ± 427 Hz

F2 (Formant 2)

Mean ± SD: 1968 ± 546 Hz

F3 (Formant 3)

Mean ± SD: 3214 ± 373 Hz

Pitch (across all frames)

Range: 75 - 287 Hz

Mean ± SD: 177 ± 64 Hz

Duration

Range: 0.476 - 0.581 s

Mean ± SD: 0.531 ± 0.037 s

By Gender:

Female (n=2)

F1: 962 ± 52 Hz

F2: 1983 ± 22 Hz

F3: 3257 ± 90 Hz

Pitch: 232 ± 23 Hz

Male (n=2)

F1: 901 ± 105 Hz

F2: 1957 ± 79 Hz

F3: 3171 ± 24 Hz

Pitch: 127 ± 21 Hz

▼ Observations: ca

Automated Analysis for ca

Programmatically inferred patterns from the acoustic data

Consistency & Variance

Within-group homogeneity and overall variance patterns

• F1 shows substantial variance (CV=0.46), suggesting heterogeneous voice characteristics or synthesis inconsistencies.
• Female voices (n=2) show very tight F1 consistency (CV=0.05), suggesting systematic pronunciation.

Gender Differences

Acoustic differences between male and female voices

• F1 gender difference is moderate (61 Hz / 6.6%, d=0.74).
• F2 gender difference is moderate (26 Hz / 1.3%, d=0.45).

Phonetic Analysis

Formant validation against expected vowel properties

• Formant analysis: F1=929 Hz (open), F2=1968 Hz (front-central).

Duration Patterns

Temporal characteristics and uniformity

• Duration shows low variance (CV=0.07, range 0.476-0.581s). TTS temporal control is tight, possibly unnaturally uniform compared to natural speech.

TTS Quality

Synthesis consistency and prosodic control

• ggl shows high pitch variance (σ=57 Hz), which may indicate natural-sounding variation or control instability.
Guide for Interpretation
  • Coefficient of Variation (CV): Ratio of standard deviation to mean. CV < 0.1 indicates tight clustering, CV > 0.2 indicates high variance.
  • Cohen's d: Standardized effect size (difference ÷ pooled SD). Measures how many standard deviations apart two groups are. d < 0.2 is negligible, 0.2-0.5 is small, 0.5-0.8 is medium, d > 0.8 is large.
  • Relative differences: Reported as "absolute Hz / percentage of mean" to show both raw and scaled differences (e.g., 48 Hz / 8.5% means 48 Hz is 8.5% of the average F1 value).
  • Formant validation: Expected ranges based on IPA vowel space: close vowels (F1 < 400 Hz), front vowels (F2 > 2000 Hz), back vowels (F2 < 1000 Hz).
  • Natural speech baselines: Gender differences typically show F1 ~100 Hz, F2 ~150 Hz; TTS may deviate from these.
  • TTS quality indicators: Pitch stability (σ < 25 Hz = excellent), formant consistency (lower variance = better control).