1. The Nature of Sound

Sound is a mechanical, longitudinal wave — a rhythmic disturbance that propagates through an elastic medium by the successive compression and rarefaction of its constituent particles.

Unlike electromagnetic waves (light, radio), sound requires a material medium to travel. In air at sea level and 20 °C, it travels at approximately 343 m/s. In water, which is far less compressible, it moves at roughly 1,480 m/s, and through steel at an impressive 5,100 m/s.

When a vibrating object — a drum skin, a vocal cord, a loudspeaker cone — moves outward, it pushes neighbouring air molecules together, creating a region of high pressure (compression). As it springs back, it leaves a region of low pressure (rarefaction). This alternating pattern of compressions and rarefactions radiates outward spherically from the source, carrying energy but no net transport of matter.

[Figure: pressure–time plot of a longitudinal wave, marking a compression, a rarefaction, the wavelength λ, and the amplitude]

Longitudinal vs Transverse

In a longitudinal wave, particle displacement is parallel to the direction of propagation. Sound in gases and liquids is always longitudinal. In solids, both longitudinal (P-waves) and transverse (S-waves) can exist — a distinction critical in seismology.

Speed of Sound

For an ideal gas: v = √(γRT/M), where γ is the adiabatic index, R is the gas constant, T is absolute temperature and M is molar mass. Temperature is the dominant factor in air: speed increases about 0.6 m/s per °C.

v = λ · f Fundamental wave equation — speed equals wavelength times frequency
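As a numerical sanity check, both formulas above can be evaluated directly. This short Python sketch (function and variable names are ours, for illustration) reproduces the 343 m/s figure, the ~0.6 m/s per °C rule of thumb, and the wavelength of concert A via v = λ·f:

```python
import math

def speed_of_sound_air(t_celsius):
    """v = sqrt(gamma * R * T / M) for dry air, treated as an ideal gas."""
    gamma = 1.4        # adiabatic index for a diatomic gas
    R = 8.314          # J/(mol*K), universal gas constant
    M = 0.028964       # kg/mol, molar mass of dry air
    return math.sqrt(gamma * R * (t_celsius + 273.15) / M)

v20 = speed_of_sound_air(20.0)     # ~343 m/s, matching the figure above
v21 = speed_of_sound_air(21.0)     # ~0.6 m/s faster, the rule of thumb
lambda_a4 = v20 / 440.0            # v = lambda * f  ->  ~0.78 m at concert A
```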

2. Physical Properties of Sound Waves

Every sound can be described by a small set of physical parameters that together define what we hear.

2.1 Frequency & Pitch

Frequency (f) is the number of complete oscillation cycles per second, measured in Hertz (Hz). It is the primary physical correlate of the perceptual quality of pitch. A concert-A above middle C is defined at exactly 440 Hz. Humans can generally perceive frequencies between about 20 Hz and 20,000 Hz, though this range narrows significantly with age.

Sound | Approx. Frequency | Notes
Infrasound (earthquakes, elephants) | < 20 Hz | Inaudible to humans; felt as vibration
Bass (electric bass, bass drum) | 40 – 200 Hz | Felt and heard
Male voice fundamentals | 85 – 180 Hz | Varies with individual
Female voice fundamentals | 165 – 255 Hz | Varies with individual
Middle C (piano) | 261.6 Hz | C4 in scientific notation
Concert A | 440 Hz | ISO standard tuning reference
Telephone / speech bandwidth | 300 – 3,400 Hz | Sufficient for intelligibility
High-frequency hearing limit (young adult) | ~20,000 Hz | Declines with age (presbycusis)
Ultrasound (diagnostic imaging) | 2 – 20 MHz | Far above human range
Bat echolocation | 20 – 200 kHz | Adaptive for insect detection

2.2 Amplitude, Intensity & the Decibel Scale

Amplitude is the maximum displacement of air molecules from their rest position. Intensity (I) is the power carried per unit area, measured in W/m². Because the human ear spans an enormous dynamic range — roughly 12 orders of magnitude from the threshold of hearing to the threshold of pain — acousticians use a logarithmic scale: the decibel (dB).

Lp = 20 · log10(p / p₀) dB Sound Pressure Level — p₀ = 20 µPa (threshold of hearing)

Every increase of +3 dB doubles the acoustic power. Every increase of +10 dB is perceived as roughly twice as loud by the human ear (a psychoacoustic phenomenon quantified by the phon and sone scales).
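A minimal Python sketch of the SPL formula (names are ours) makes the logarithmic compression concrete, including the +3 dB per doubling of power:

```python
import math

P0 = 20e-6  # Pa: reference pressure p0, the threshold of hearing

def spl_db(p_rms):
    """Sound Pressure Level: Lp = 20 * log10(p / p0), in dB."""
    return 20.0 * math.log10(p_rms / P0)

conversation = spl_db(0.02)    # ~0.02 Pa of pressure is 60 dB SPL
# doubling acoustic power multiplies pressure by sqrt(2): a +3 dB step
three_db_step = spl_db(0.02 * math.sqrt(2)) - conversation
```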

Source | Level (dB SPL) | Effect
Threshold of hearing | 0 dB | Just perceptible
Rustling leaves | 20 dB | Very quiet
Quiet library | 30 – 40 dB | Comfortable silence
Normal conversation | 60 – 65 dB | Comfortable
Heavy traffic | 75 – 85 dB | Prolonged exposure: mild risk
Chainsaw / nightclub | 100 – 110 dB | Damage after 15 min
Jet engine at 30 m | 140 dB | Threshold of pain
Krakatoa eruption (1883) | ~172 dB (at 160 km) | Heard 4,800 km away

2.3 Phase & Waveform

The phase of a wave describes where in its cycle it is at a given point in time. When two sound waves of identical frequency overlap, their relative phase determines whether they constructively interfere (adding together) or destructively interfere (cancelling). This principle underpins noise-cancelling headphones, standing waves in rooms, and many musical phenomena.
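The interference arithmetic is easy to verify numerically. In this sketch, two identical 440 Hz sinusoids are summed first in phase and then 180° out of phase:

```python
import math

f = 440.0
times = [n / 44100.0 for n in range(1000)]   # ~23 ms at CD sample rate

# identical tones, in phase: amplitudes add (constructive interference)
in_phase = [math.sin(2*math.pi*f*t) + math.sin(2*math.pi*f*t) for t in times]
# identical tones, half a cycle apart: they cancel (destructive interference)
anti_phase = [math.sin(2*math.pi*f*t) + math.sin(2*math.pi*f*t + math.pi)
              for t in times]

peak_in = max(abs(x) for x in in_phase)      # ~2.0
peak_anti = max(abs(x) for x in anti_phase)  # ~0.0, up to rounding
```

This is exactly the principle a noise-cancelling headphone exploits: emit the anti-phase copy and the sum tends to zero.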

A pure tone is a single-frequency sinusoidal wave. Real-world sounds are almost always complex waveforms — superpositions of many sinusoids at different frequencies, amplitudes, and phases. The mathematical framework for decomposing these complex sounds is Fourier Analysis, explored in depth in Section 5.

3. Acoustic Behaviour

As sound waves travel through the world, they interact with boundaries and media in rich and often complex ways.

Reflection & Reverberation

When a sound wave strikes a surface, part of its energy is reflected. A single, discrete reflection heard distinctly from the original sound is an echo (requires the reflected path to be >17 m longer, i.e. >50 ms delay). In enclosed spaces, multiple reflections blend into reverberation — the persistence of sound after the source has stopped. Concert halls are designed to have reverb times (T60) of 1.5–2.5 s for orchestral music.

Refraction

Sound refracts — bends — when it passes between media of different acoustic speeds, governed by Snell’s Law: sin(θ1)/v1 = sin(θ2)/v2. Temperature gradients in the atmosphere cause dramatic refractive effects: sound can bend upward (away from Earth) on warm days (explaining why you can’t hear a distant thunderstorm), or downward at night, making sounds carry farther.
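A small sketch of Snell's law for sound (the helper function is illustrative). Going from slow air into fast water there is a critical angle of about 13°, beyond which the incident wave is totally reflected:

```python
import math

def refraction_angle(theta1_deg, v1, v2):
    """Snell's law for sound, sin(t1)/v1 = sin(t2)/v2.
    Returns the refracted angle in degrees, or None beyond the critical angle."""
    s = math.sin(math.radians(theta1_deg)) * v2 / v1
    if abs(s) > 1.0:
        return None   # total reflection back into the slower medium
    return math.degrees(math.asin(s))

# air (343 m/s) into water (1,480 m/s): the ray bends sharply away from the normal
theta2 = refraction_angle(10.0, 343.0, 1480.0)   # ~48.5 degrees
```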

Diffraction

Sound bends around obstacles and spreads through openings — a phenomenon called diffraction. It is most pronounced when the wavelength is comparable to or larger than the obstacle. At 100 Hz (λ ≈ 3.4 m), sound readily diffracts around a wall; at 10,000 Hz (λ ≈ 3.4 cm), the same wall creates a strong acoustic shadow. This is why bass frequencies are “omnidirectional” and high frequencies are directional.

Absorption

All media convert some acoustic energy into heat. Absorption depends on frequency (higher frequencies are absorbed more quickly, hence thunder rumbles rather than cracks at a distance), humidity, temperature, and the material. Porous materials (foam, carpet, fabric) are highly absorptive; dense, hard surfaces (concrete, glass) are highly reflective. The absorption coefficient (α) ranges from 0 (perfect reflector) to 1 (perfect absorber).

The Doppler Effect

When a source and observer are in relative motion, the received frequency differs from the emitted frequency. As a source approaches, compressed wavefronts raise the perceived pitch; as it recedes, expanded wavefronts lower it. This is the familiar shift in pitch of a passing ambulance siren.

f′ = f · (v ± vₒ) / (v ∓ vₛ) Doppler Formula — upper signs when source and observer approach each other
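Plugging numbers into the Doppler formula, with the convention that velocities are positive when the mover heads toward the other party (function name ours):

```python
def doppler_shift(f_emit, v_src=0.0, v_obs=0.0, v=343.0):
    """f' = f * (v + v_obs) / (v - v_src); velocities are positive
    when the mover is heading toward the other party."""
    return f_emit * (v + v_obs) / (v - v_src)

siren = 700.0
approach = doppler_shift(siren, v_src=30.0)    # pitch rises on approach
recede = doppler_shift(siren, v_src=-30.0)     # pitch falls as it passes
```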

Standing Waves & Resonance

When a wave reflects back on itself in a confined space, incident and reflected waves interfere to create a standing wave with fixed nodes (zero displacement) and antinodes (maximum displacement). Resonant modes occur at frequencies where the room or cavity dimensions are integer multiples of half-wavelengths. Room modes (eigenmodes) are a fundamental challenge in acoustic design.

3.1 The Inverse Square Law

In a free field (no reflections), sound radiates spherically from a point source. Because the surface area of a sphere grows as 4πr², the intensity falls as the square of distance:

I ∝ 1 / r²   →   ΔL = −20 · log10(r₂/r₁) dB Inverse Square Law — doubling distance reduces level by 6 dB
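The level-change formula is a one-liner; this sketch confirms the 6 dB per doubling figure:

```python
import math

def level_change_db(r1, r2):
    """Free-field level change moving from distance r1 to r2 from a point source."""
    return -20.0 * math.log10(r2 / r1)

drop_double = level_change_db(1.0, 2.0)    # ~ -6 dB per doubling of distance
drop_ten = level_change_db(1.0, 10.0)      # -20 dB at ten times the distance
```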

3.2 Acoustic Impedance

Acoustic impedance (Z = ρ · v) is the resistance a medium presents to the passage of a sound wave. Impedance mismatches at boundaries cause reflections. The fraction of power transmitted depends on how well impedances match — a concept critical in ultrasonic transducer design and in understanding why sound reflects so strongly at an air-water interface.
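For normal incidence on a plane boundary, the reflected power fraction is R = ((Z₂ − Z₁)/(Z₂ + Z₁))². A quick sketch (values are representative, not exact material data) shows why the air–water interface is almost a perfect mirror for sound:

```python
def power_reflection(z1, z2):
    """Fraction of incident power reflected at a plane boundary between
    impedances z1 and z2, at normal incidence: R = ((z2 - z1)/(z2 + z1))**2."""
    return ((z2 - z1) / (z2 + z1)) ** 2

Z_AIR = 1.21 * 343.0       # rho * v for air, ~415 rayl
Z_WATER = 1000.0 * 1480.0  # ~1.48e6 rayl

R = power_reflection(Z_AIR, Z_WATER)   # ~0.999: almost all power reflects
```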

4. Human Audition

The human auditory system is a marvel of biological engineering, capable of detecting pressure fluctuations as small as 20 micropascals — a displacement of the eardrum smaller than the diameter of a hydrogen atom.

4.1 Anatomy of the Ear

The ear is classically divided into three regions, each performing a distinct signal-processing role:

The Outer Ear consists of the pinna (the visible cartilage structure) and the ear canal (external auditory meatus, ~2.5 cm long). The pinna’s folds and ridges create subtle frequency-dependent reflections that provide cues for vertical sound localisation (up/down). The ear canal acts as a quarter-wave resonator, boosting sensitivity around 2,000–4,000 Hz — exactly the frequency range most important for speech intelligibility.

The Middle Ear begins at the tympanic membrane (eardrum), which converts acoustic pressure fluctuations into mechanical vibrations. These are transmitted and amplified by three tiny ossicles — the malleus, incus, and stapes (hammer, anvil, stirrup). The ossicular chain achieves an impedance match between the low-impedance air of the outer ear and the high-impedance fluid of the inner ear, boosting transmission efficiency by roughly 25–30 dB.

The Inner Ear contains the cochlea, a fluid-filled spiral structure roughly 35 mm long when uncoiled. The stapes drives the oval window, setting up travelling waves on the basilar membrane. The cochlea performs a remarkable mechanical frequency analysis: high frequencies cause maximum displacement near the base; low frequencies near the apex. This spatial segregation of frequency — tonotopy — is preserved all the way to the auditory cortex and is, in essence, a biological implementation of Fourier decomposition.

Hair cells (3,500 inner and ~12,000 outer) sit on the basilar membrane. Their stereocilia deflect with membrane motion, opening ion channels and generating electrical signals — the conversion of mechanical vibration to neural impulse. Outer hair cells also act as mechanical amplifiers, achieving gains up to 40 dB through active electromotility (prestin-based).

4.2 The Auditory Range

Frequency Range Comparison Across Species

Human: 20 Hz – 20 kHz
Dog: 40 Hz – 65 kHz
Cat: 45 Hz – 79 kHz
Bat: 2 kHz – 200 kHz
Dolphin: 1 kHz – 150 kHz
Elephant: 14 Hz – 12 kHz

4.3 Psychoacoustics

Psychoacoustics studies the relationship between physical acoustic stimuli and subjective auditory perception. Key phenomena include:

Equal-Loudness Contours (Fletcher–Munson)

The ear’s sensitivity is highly frequency-dependent. We are most sensitive around 3–4 kHz and far less sensitive to very low or very high frequencies. The Fletcher–Munson curves (1933), refined as ISO 226 equal-loudness contours, map the SPL required at each frequency to produce the same perceived loudness. This informs the A-weighting filter used in sound level meters (dB(A)).

Masking

A loud sound can render softer nearby sounds inaudible — simultaneous masking. Critically, masking also occurs across time: forward masking (a loud sound masks quiet ones for up to 200 ms after) and backward masking. MP3 and AAC audio codecs exploit masking curves to discard psychoacoustically irrelevant information, achieving high compression ratios with minimal perceptual loss.

Binaural Hearing & Localisation

With two ears, the auditory system extracts spatial information using: Interaural Time Differences (ITD, ±0.7 ms, dominant below 1.5 kHz) and Interaural Level Differences (ILD, dominant above 1.5 kHz). The head-related transfer function (HRTF) models how the outer ear and head colour sound differently for each direction, enabling 3D audio rendering.

Pitch Perception

Pitch is not merely frequency. The missing fundamental phenomenon demonstrates that the brain can perceive a pitch even when the fundamental frequency is absent — reconstructed from the pattern of harmonics. This is why a small phone speaker, incapable of reproducing 100 Hz, can still convey a male voice with the correct perceived pitch.

5. Fourier Analysis, Harmonics & the FFT

The single most powerful mathematical tool in acoustics was developed by a French mathematician studying heat flow — and it fundamentally transformed our understanding of sound.

5.1 The Fourier Series

In 1807, Jean-Baptiste Joseph Fourier proposed a revolutionary idea: any periodic function, however complex, can be expressed as a sum of sinusoids (sines and cosines) at harmonically related frequencies. For a signal with period T (fundamental frequency f0 = 1/T), the Fourier Series is:

x(t) = a₀/2 + Σₙ₌₁^∞ [ aₙ cos(2πnf₀t) + bₙ sin(2πnf₀t) ] Fourier Series — sum of harmonics with coefficients aₙ and bₙ

In acoustics, this means every periodic sound — a violin string, a vowel, a trumpet note — is built from a fundamental frequency plus harmonics (integer multiples of the fundamental). The relative amplitudes and phases of these harmonics determine the timbre (tonal colour) of the sound, explaining why a flute and an oboe playing the same note at the same loudness sound utterly different.

[Figure: Harmonic Spectrum — Sawtooth Wave (f₀ = 110 Hz). Bars at the first 12 harmonics, 110 Hz (f₀), 220 Hz (2f₀), …, 1,320 Hz (12f₀), with amplitude falling as 1/n]
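The sawtooth spectrum above can be synthesised directly from its Fourier series: each harmonic n enters with amplitude proportional to 1/n, and each added term sharpens the ramp. A minimal sketch (overall scale factor omitted):

```python
import math

def sawtooth_partial_sum(t, f0, n_harmonics):
    """Band-limited sawtooth built from its Fourier series: harmonic n
    contributes sin(2*pi*n*f0*t) with amplitude proportional to 1/n."""
    return sum(math.sin(2 * math.pi * n * f0 * t) / n
               for n in range(1, n_harmonics + 1))

# ~one period of a 110 Hz sawtooth approximated by its first 12 harmonics
f0 = 110.0
waveform = [sawtooth_partial_sum(n / 44100.0, f0, 12) for n in range(401)]
```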

5.2 The Fourier Transform — Extending to Aperiodic Signals

Real sounds are rarely perfectly periodic. By taking the limit as the period T → ∞, the Fourier Series becomes the Fourier Transform, valid for any aperiodic signal:

X(f) = ∫₋∞^∞ x(t) e^(−j2πft) dt   ↔   x(t) = ∫₋∞^∞ X(f) e^(j2πft) df Continuous Fourier Transform pair — time domain ↔ frequency domain

X(f) is a complex-valued function whose magnitude gives the amplitude spectrum and whose argument gives the phase spectrum. Together, they contain all the information of the original signal — a complete, invertible representation.

5.3 The Discrete Fourier Transform & the FFT

In the digital age, signals are sampled at discrete time intervals. The Discrete Fourier Transform (DFT) computes the frequency spectrum of a sequence of N samples. However, computing all N output bins directly requires O(N²) operations — prohibitively slow for large N.

In 1965, James Cooley and John Tukey published their landmark algorithm: the Fast Fourier Transform (FFT). By exploiting the symmetry of complex exponentials and recursively splitting the DFT, the FFT reduces computation to O(N log N) — for N = 1,048,576, that is a speedup factor of over 50,000×.

X[k] = Σₙ₌₀^(N−1) x[n] · e^(−j2πkn/N),   k = 0, 1, …, N−1 Discrete Fourier Transform — N input samples yield N frequency bins

The FFT is ubiquitous in audio processing. Every spectrum analyser, every audio codec (MP3, AAC, Opus), every digital audio workstation, and every voice assistant uses the FFT as a core operation. In real-time applications, a Short-Time Fourier Transform (STFT) applies overlapping FFT windows to a continuous signal, producing a spectrogram — a two-dimensional time-frequency representation that makes features like formants, vibrato, and transients visually apparent.
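The recursive even/odd splitting that Cooley and Tukey exploited fits in a few lines. This is a textbook radix-2 sketch for clarity, not production code (real applications use an optimised library); it correctly recovers the bin of a pure tone:

```python
import cmath
import math

def fft(x):
    """Radix-2 Cooley-Tukey FFT; len(x) must be a power of two.
    Recursively splits into even- and odd-indexed samples: O(N log N)."""
    N = len(x)
    if N == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    t = [cmath.exp(-2j * cmath.pi * k / N) * odd[k] for k in range(N // 2)]
    return ([even[k] + t[k] for k in range(N // 2)] +
            [even[k] - t[k] for k in range(N // 2)])

# a pure tone with exactly 5 cycles in a 64-sample block lands in bin 5
N = 64
signal = [complex(math.sin(2 * math.pi * 5 * n / N)) for n in range(N)]
spectrum = fft(signal)
peak_bin = max(range(N // 2), key=lambda k: abs(spectrum[k]))
```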

5.4 Windowing

Applying the FFT to a finite block of samples implicitly assumes the signal is periodic within that block — an assumption that introduces spectral leakage (smearing of energy across adjacent frequency bins) when the signal is not. Window functions — Hann, Hamming, Blackman, Kaiser — taper the signal to zero at the block boundaries, trading spectral resolution for reduced leakage. Choosing the right window for the task is a key skill in digital audio analysis.
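Leakage is easy to demonstrate numerically: a tone completing 5.5 cycles in the block aligns with no bin centre and smears energy across the spectrum, and a Hann taper suppresses what reaches distant bins. A sketch using a direct per-bin DFT for clarity (bin choices are arbitrary illustrations):

```python
import math

def hann(N):
    """Hann window: tapers the block smoothly to zero at both ends."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]

def dft_mag(x, k):
    """Magnitude of DFT bin k, computed directly (O(N) per bin)."""
    N = len(x)
    re = sum(x[n] * math.cos(2 * math.pi * k * n / N) for n in range(N))
    im = sum(-x[n] * math.sin(2 * math.pi * k * n / N) for n in range(N))
    return math.hypot(re, im)

N = 256
# 5.5 cycles per block: the tone sits between bins, so energy leaks outward
tone = [math.sin(2 * math.pi * 5.5 * n / N) for n in range(N)]
windowed = [s * w for s, w in zip(tone, hann(N))]

leak_rect = dft_mag(tone, 40)       # leakage far from the tone, no window
leak_hann = dft_mag(windowed, 40)   # same distant bin after a Hann taper
```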

6. The Laplace Transform in Acoustics

Where Fourier Analysis is the tool of steady-state frequency analysis, the Laplace Transform handles transients, initial conditions, and the stability of acoustic systems.

6.1 From Fourier to Laplace

The Laplace Transform, developed by Pierre-Simon Laplace in the late 18th century, generalises the Fourier Transform by introducing a complex frequency variable s = σ + jω, where σ is a real damping factor and ω = 2πf is angular frequency:

X(s) = ∫₀^∞ x(t) e^(−st) dt,     s = σ + jω ∈ ℂ Unilateral (one-sided) Laplace Transform

Setting σ = 0 (i.e., s = jω) recovers the Fourier Transform. The real part σ allows the transform to handle signals that grow or decay exponentially — making it the natural tool for analysing transient acoustic phenomena like the onset of a struck piano string or the decay of a resonant cavity.

6.2 Transfer Functions & Acoustic Filters

In linear acoustic systems (resonators, filters, rooms modelled as LTI systems), the ratio of output to input in the Laplace domain is the transfer function H(s) = Y(s)/X(s). Convolution in the time domain becomes multiplication in the s-domain — a dramatic simplification.

The poles of H(s) (values of s where the denominator is zero) determine the system's resonant frequencies and decay rates. A pole pair at s = −σ₀ ± jω₀ represents a damped resonance at f₀ = ω₀/2π with exponential decay rate σ₀.
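This pole-to-resonance reading can be made concrete. For a hypothetical room mode (the numbers below are invented for illustration), the pole position yields both the modal frequency and the T60 decay time:

```python
import math

def pole_to_resonance(sigma0, omega0):
    """Read a damped resonance off a pole pair s = -sigma0 +/- j*omega0:
    centre frequency f0 = omega0/(2*pi), plus the T60 decay time
    (amplitude decays as exp(-sigma0*t); -60 dB is a factor of 1000)."""
    f0 = omega0 / (2.0 * math.pi)
    t60 = math.log(1000.0) / sigma0
    return f0, t60

# hypothetical 50 Hz room mode with sigma0 = 3.45 (numbers for illustration)
f0, t60 = pole_to_resonance(3.45, 2.0 * math.pi * 50.0)   # ~50 Hz, T60 ~ 2 s
```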

Wave Equation in the s-Domain

The acoustic wave equation, a second-order PDE in time, transforms cleanly under Laplace. Initial conditions (pressure distribution and particle velocity at t = 0) appear naturally as algebraic terms, making the Laplace approach essential for solving room acoustics problems with defined initial states.

Digital Audio & the Z-Transform

The discrete-time analogue of the Laplace Transform is the Z-Transform, where z = e^(sT) and T is the sampling period. All digital audio filters (IIR lowpass, shelving EQ, reverberation algorithms) are designed and analysed using Z-domain techniques, mapped from their continuous-time Laplace prototypes via the bilinear transformation.

6.3 Poles, Zeros, and the Vocal Tract

The human vocal tract is remarkably well-modelled as a time-varying acoustic tube whose resonances — called formants (F1, F2, F3, ...) — shape vowel identity. In the Laplace framework, formants are poles of the vocal tract transfer function. Speech synthesis and analysis systems (LPC — Linear Predictive Coding) estimate these poles directly from the speech signal, encoding a second of speech in as few as 10–12 complex numbers. This is the foundation of telephony codecs and voice synthesis technology.

Summary of transforms in audio: The Fourier Transform reveals steady-state frequency content. The Laplace Transform handles system dynamics and transients. The FFT makes Fourier analysis computationally practical on digital hardware. Together, they form the mathematical backbone of all modern audio engineering.

7. Applications of Acoustics

The principles of sound waves and their mathematical treatment permeate technology, medicine, architecture, and art.

Architectural Acoustics

Designing the acoustic character of concert halls, opera houses, cathedrals, recording studios, and open-plan offices. Key metrics include reverberation time (T20, T30, T60), clarity (C80), definition (D50), and speech transmission index (STI). Simulation software uses the FFT and geometric (ray tracing) or wave-based (finite element) methods to predict room behaviour before construction.

Medical Ultrasonics

Diagnostic ultrasound (2–20 MHz) uses pulse-echo techniques to image soft tissue. The Doppler effect enables blood flow measurement. Therapeutic ultrasound delivers focused acoustic energy for physiotherapy, kidney stone disintegration (lithotripsy), and emerging cancer treatments. FFT-based signal processing is fundamental to reconstructing images from echo data.

Noise Control Engineering

Controlling industrial noise, environmental noise, and vehicle noise through absorption, insulation, and active noise control (ANC). ANC systems sample the noise with a microphone, compute an anti-phase signal in real time using digital signal processing, and emit it through a loudspeaker — exploiting destructive interference. Modern noise-cancelling headphones achieve 30+ dB of attenuation using these principles.

Sonar & Underwater Acoustics

Because sound travels so much better than electromagnetic waves underwater, sonar (Sound Navigation And Ranging) is the primary sensing modality in the ocean. Active sonar emits pulses and detects echoes; passive sonar listens for target signatures. The FFT is used in signal processing chains to detect weak signals in noise and to estimate bearing and range of underwater objects.

Speech & Music Technology

Automatic speech recognition (ASR), text-to-speech synthesis (TTS), audio compression (MP3, AAC, FLAC), music information retrieval (pitch detection, beat tracking, chord recognition), and hearing aids all rely on combinations of Fourier analysis, psychoacoustic models, and machine learning operating in the frequency domain. The STFT and Mel-frequency cepstral coefficients (MFCCs), derived via the FFT, are the standard feature representations for audio ML.

Seismology

Earthquakes generate both P-waves (longitudinal, >6 km/s in crust) and S-waves (transverse, slower, cannot travel through liquids). Analysing these waves’ arrival times at seismograph networks — using Fourier and Laplace methods — allows scientists to locate earthquake epicentres, determine magnitudes, and infer the structure of the Earth’s interior.


7.1 Digital Audio: The Sampling Theorem

All digital audio rests on a single profound theorem: the Nyquist–Shannon Sampling Theorem (1928/1949). It states that a bandlimited signal can be perfectly reconstructed from discrete samples if the sampling rate fs is greater than twice the highest frequency present:

fₛ > 2 · f_max   →   CD audio: fₛ = 44,100 Hz > 2 × 20,000 Hz = 40,000 Hz Nyquist criterion — the foundation of all digital audio

Frequencies above the Nyquist limit (fs/2) fold back into the audible band as aliasing — audible artefacts that are prevented by anti-aliasing filters applied before the analogue-to-digital converter. The Fourier Transform provides the mathematical proof that this sampling and perfect reconstruction is possible.
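The folding behaviour can be sketched in a few lines (helper name ours): any input frequency above the Nyquist limit reappears mirrored inside the band.

```python
def alias_frequency(f_in, fs):
    """Apparent frequency after sampling at fs with no anti-aliasing filter:
    anything above the Nyquist limit fs/2 folds back into [0, fs/2]."""
    f = f_in % fs
    return f if f <= fs / 2.0 else fs - f

fs = 44100.0
folded = alias_frequency(25000.0, fs)    # a 25 kHz tone reappears at 19.1 kHz
in_band = alias_frequency(15000.0, fs)   # below Nyquist: passes unchanged
```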