Physics • Perception • Signal Analysis
From the compression of air molecules to the firing of auditory neurons — a complete journey through the science of sound.
Sound is a mechanical, longitudinal wave — a rhythmic disturbance that propagates through an elastic medium by the successive compression and rarefaction of its constituent particles.
Unlike electromagnetic waves (light, radio), sound requires a material medium to travel. In air at sea level and 20 °C, it travels at approximately 343 m/s. In water, which is far less compressible, it moves at roughly 1,480 m/s, and through steel at an impressive 5,100 m/s.
When a vibrating object — a drum skin, a vocal cord, a loudspeaker cone — moves outward, it pushes neighbouring air molecules together, creating a region of high pressure (compression). As it springs back, it leaves a region of low pressure (rarefaction). This alternating pattern of compressions and rarefactions radiates outward spherically from the source, carrying energy but no net transport of matter.
In a longitudinal wave, particle displacement is parallel to the direction of propagation. Sound in gases and liquids is always longitudinal. In solids, both longitudinal (P-waves) and transverse (S-waves) can exist — a distinction critical in seismology.
For an ideal gas: v = √(γRT/M), where γ is the adiabatic index, R is the gas constant, T is absolute temperature and M is molar mass. Temperature is the dominant factor in air: speed increases about 0.6 m/s per °C.
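As a quick numerical check, here is a minimal sketch of the formula in Python (the constants for dry air are standard textbook values):

```python
import math

def speed_of_sound(gamma: float, molar_mass: float, temp_c: float) -> float:
    """Ideal-gas speed of sound: v = √(γ·R·T / M)."""
    R = 8.314            # universal gas constant, J/(mol·K)
    T = temp_c + 273.15  # absolute temperature, K
    return math.sqrt(gamma * R * T / molar_mass)

AIR_GAMMA, AIR_M = 1.4, 0.02897  # dry air: adiabatic index, molar mass (kg/mol)

print(round(speed_of_sound(AIR_GAMMA, AIR_M, 20), 1))  # 343.2 m/s at 20 °C
print(round(speed_of_sound(AIR_GAMMA, AIR_M, 21)
            - speed_of_sound(AIR_GAMMA, AIR_M, 20), 2))  # ≈ 0.59 m/s per extra °C
```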
Every sound can be described by a small set of physical parameters that together define what we hear.
Frequency (f) is the number of complete oscillation cycles per second, measured in hertz (Hz). It is the primary physical correlate of the perceptual quality of pitch. A concert-A above middle C is defined as exactly 440 Hz. Humans can generally perceive frequencies between about 20 Hz and 20,000 Hz, though this range narrows significantly with age.
| Sound | Approx. Frequency | Notes |
|---|---|---|
| Infrasound (earthquakes, elephants) | < 20 Hz | Inaudible to humans; felt as vibration |
| Bass (electric bass, bass drum) | 40 – 200 Hz | Felt and heard |
| Male voice fundamentals | 85 – 180 Hz | Varies with individual |
| Female voice fundamentals | 165 – 255 Hz | Varies with individual |
| Middle C (piano) | 261.6 Hz | C4 in scientific notation |
| Concert A | 440 Hz | ISO standard tuning reference |
| Telephone / speech bandwidth | 300 – 3,400 Hz | Sufficient for intelligibility |
| High-frequency hearing limit (young adult) | ~20,000 Hz | Declines with age (presbycusis) |
| Ultrasound (diagnostic imaging) | 2 – 20 MHz | Far above human range |
| Bat echolocation | 20 – 200 kHz | Adaptive for insect detection |
Amplitude is the maximum displacement of air molecules from their rest position. Intensity (I) is the power carried per unit area, measured in W/m². Because the human ear spans an enormous dynamic range — roughly 12 orders of magnitude from the threshold of hearing to the threshold of pain — acousticians use a logarithmic scale: the decibel (dB).
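Both standard decibel conversions fit in a few lines; the sketch below assumes the conventional reference values of 10⁻¹² W/m² for intensity and 20 µPa for pressure:

```python
import math

I0 = 1e-12   # reference intensity, W/m² (threshold of hearing)
P0 = 20e-6   # reference pressure, Pa (20 µPa)

def intensity_level_db(intensity: float) -> float:
    """Intensity level: L = 10·log10(I / I0)."""
    return 10 * math.log10(intensity / I0)

def spl_db(pressure: float) -> float:
    """Sound pressure level: L = 20·log10(p / p0)."""
    return 20 * math.log10(pressure / P0)

print(spl_db(20e-6))              # 0 dB: threshold of hearing
print(intensity_level_db(1.0))    # 120 dB: approaching the threshold of pain
print(intensity_level_db(2e-12))  # ≈ 3 dB: doubling the power adds ~3 dB
```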
Every increase of +3 dB doubles the acoustic power. Every increase of +10 dB is perceived as roughly twice as loud by the human ear (a psychoacoustic phenomenon quantified by the phon and sone scales).
| Source | Level (dB SPL) | Effect |
|---|---|---|
| Threshold of hearing | 0 dB | Just perceptible |
| Rustling leaves | 20 dB | Very quiet |
| Quiet library | 30 – 40 dB | Comfortable silence |
| Normal conversation | 60 – 65 dB | Comfortable |
| Heavy traffic | 75 – 85 dB | Prolonged exposure: mild risk |
| Chainsaw / nightclub | 100 – 110 dB | Damage after 15 min |
| Jet engine at 30 m | 140 dB | Threshold of pain |
| Krakatoa eruption (1883) | ~172 dB (at 160 km) | Heard 4,800 km away |
The phase of a wave describes where in its cycle it is at a given point in time. When two sound waves of identical frequency overlap, their relative phase determines whether they constructively interfere (adding together) or destructively interfere (cancelling). This principle underpins noise-cancelling headphones, standing waves in rooms, and many musical phenomena.
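A short numpy sketch makes the role of relative phase concrete: two identical 440 Hz tones are summed once in phase and once in anti-phase.

```python
import numpy as np

fs = 48_000                    # sample rate, Hz
t = np.arange(fs) / fs         # one second of time stamps
tone = np.sin(2 * np.pi * 440 * t)

in_phase   = tone + np.sin(2 * np.pi * 440 * t)           # 0 rad shift
anti_phase = tone + np.sin(2 * np.pi * 440 * t + np.pi)   # π rad (180°) shift

print(np.abs(in_phase).max())    # ≈ 2.0 (constructive: amplitudes add)
print(np.abs(anti_phase).max())  # ≈ 0.0 (destructive: complete cancellation)
```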
A pure tone is a single-frequency sinusoidal wave. Real-world sounds are almost always complex waveforms — superpositions of many sinusoids at different frequencies, amplitudes, and phases. The mathematical framework for decomposing these complex sounds is Fourier Analysis, explored in depth in Section 5.
As sound waves travel through the world, they interact with boundaries and media in rich and often complex ways.
When a sound wave strikes a surface, part of its energy is reflected. A single, discrete reflection heard distinctly from the original sound is an echo (requires the reflected path to be >17 m longer, i.e. >50 ms delay). In enclosed spaces, multiple reflections blend into reverberation — the persistence of sound after the source has stopped. Concert halls are designed to have reverb times (T60) of 1.5–2.5 s for orchestral music.
Sound refracts — bends — when it passes between media of different acoustic speeds, governed by Snell’s Law: sin(θ1)/v1 = sin(θ2)/v2. Temperature gradients in the atmosphere cause dramatic refractive effects: sound can bend upward (away from Earth) on warm days (explaining why you can’t hear a distant thunderstorm), or downward at night, making sounds carry farther.
Sound bends around obstacles and spreads through openings — a phenomenon called diffraction. It is most pronounced when the wavelength is comparable to or larger than the obstacle. At 100 Hz (λ ≈ 3.4 m), sound readily diffracts around a wall; at 10,000 Hz (λ ≈ 3.4 cm), the same wall creates a strong acoustic shadow. This is why bass frequencies are “omnidirectional” and high frequencies are directional.
All media convert some acoustic energy into heat. Absorption depends on frequency (higher frequencies are absorbed more quickly, hence thunder rumbles rather than cracks at a distance), humidity, temperature, and the material. Porous materials (foam, carpet, fabric) are highly absorptive; dense, hard surfaces (concrete, glass) are highly reflective. The absorption coefficient (α) ranges from 0 (perfect reflector) to 1 (perfect absorber).
When a source and observer are in relative motion, the received frequency differs from the emitted frequency. As a source approaches, compressed wavefronts raise the perceived pitch; as it recedes, expanded wavefronts lower it. This is the familiar shift in pitch of a passing ambulance siren.
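For a source moving directly toward or away from a stationary observer, the received frequency is f′ = f · v / (v − v_source). A minimal sketch, assuming 343 m/s for the speed of sound:

```python
def doppler_shift(f_emitted: float, v_source: float, v_sound: float = 343.0) -> float:
    """Received frequency for a source moving toward (positive v_source)
    or away from (negative v_source) a stationary observer:
    f' = f · v / (v − v_source)."""
    return f_emitted * v_sound / (v_sound - v_source)

siren = 700.0  # Hz (illustrative ambulance siren tone)
print(round(doppler_shift(siren, +15.0)))  # 732 Hz, approaching at 54 km/h
print(round(doppler_shift(siren, -15.0)))  # 671 Hz, receding
```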
When a wave reflects back on itself in a confined space, incident and reflected waves interfere to create a standing wave with fixed nodes (zero displacement) and antinodes (maximum displacement). Resonant modes occur at frequencies where the room or cavity dimensions are integer multiples of half-wavelengths. Room modes (eigenmodes) are a fundamental challenge in acoustic design.
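For the simplest case of two parallel walls a distance L apart, the axial mode frequencies are fn = n · c / (2L). A minimal sketch:

```python
def axial_modes(length_m: float, count: int = 5, c: float = 343.0) -> list[float]:
    """Axial room-mode frequencies fn = n·c / (2L): each mode fits an
    integer number of half-wavelengths between two parallel walls."""
    return [n * c / (2 * length_m) for n in range(1, count + 1)]

# A room 5 m across: the lowest modes cluster in the bass region,
# which is why small rooms have such uneven low-frequency response.
print([round(f, 1) for f in axial_modes(5.0)])  # [34.3, 68.6, 102.9, 137.2, 171.5]
```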
In a free field (no reflections), sound radiates spherically from a point source. Because the surface area of a sphere grows as 4πr², the intensity falls as the square of distance: I = P / (4πr²). Each doubling of distance therefore reduces the level by about 6 dB.
Acoustic impedance (Z = ρ · v) is the resistance a medium presents to the passage of a sound wave. Impedance mismatches at boundaries cause reflections. The fraction of power transmitted depends on how well impedances match — a concept critical in ultrasonic transducer design and in understanding why sound reflects so strongly at an air-water interface.
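At normal incidence the reflected fraction of power is R = ((Z2 − Z1) / (Z2 + Z1))². Plugging in representative values for air and water shows how little energy crosses that boundary (a sketch; material constants are approximate):

```python
def power_reflectance(z1: float, z2: float) -> float:
    """Fraction of incident acoustic power reflected at a boundary
    (normal incidence): R = ((Z2 − Z1) / (Z2 + Z1))²."""
    return ((z2 - z1) / (z2 + z1)) ** 2

z_air = 1.21 * 343     # Z = ρ·v ≈ 415 rayl (air at 20 °C)
z_water = 1000 * 1480  # ≈ 1.48 × 10⁶ rayl (water)

print(power_reflectance(z_air, z_water))  # ≈ 0.9989: ~99.9% of power reflected
```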
The human auditory system is a marvel of biological engineering, capable of detecting pressure fluctuations as small as 20 micropascals — a displacement of the eardrum smaller than the diameter of a hydrogen atom.
The ear is classically divided into three regions, each performing a distinct signal-processing role:
The Outer Ear consists of the pinna (the visible cartilage structure) and the ear canal (external auditory meatus, ~2.5 cm long). The pinna’s folds and ridges create subtle frequency-dependent reflections that provide cues for vertical sound localisation (up/down). The ear canal acts as a quarter-wave resonator, boosting sensitivity around 2,000–4,000 Hz — exactly the frequency range most important for speech intelligibility.
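The quarter-wave estimate is easy to check: a tube closed at one end resonates at f = c / (4L). With a 2.5 cm canal this lands inside the 2,000–4,000 Hz band described above (the real canal is neither uniform nor rigid, so the measured peak sits somewhat lower):

```python
def quarter_wave_resonance(length_m: float, c: float = 343.0) -> float:
    """Fundamental resonance of a tube closed at one end: f = c / (4L)."""
    return c / (4 * length_m)

print(round(quarter_wave_resonance(0.025)))  # 3430 Hz, inside the speech band
```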
The Middle Ear begins at the tympanic membrane (eardrum), which converts acoustic pressure fluctuations into mechanical vibrations. These are transmitted and amplified by three tiny ossicles — the malleus, incus, and stapes (hammer, anvil, stirrup). The ossicular chain achieves an impedance match between the low-impedance air of the outer ear and the high-impedance fluid of the inner ear, boosting transmission efficiency by roughly 25–30 dB.
The Inner Ear contains the cochlea, a fluid-filled spiral structure roughly 35 mm long when uncoiled. The stapes drives the oval window, setting up travelling waves on the basilar membrane. The cochlea performs a remarkable mechanical frequency analysis: high frequencies cause maximum displacement near the base; low frequencies near the apex. This spatial segregation of frequency — tonotopy — is preserved all the way to the auditory cortex and is, in essence, a biological implementation of Fourier decomposition.
Hair cells (3,500 inner and ~12,000 outer) sit on the basilar membrane. Their stereocilia deflect with membrane motion, opening ion channels and generating electrical signals — the conversion of mechanical vibration to neural impulse. Outer hair cells also act as mechanical amplifiers, achieving gains up to 40 dB through active electromotility (prestin-based).
[Chart: Frequency Range Comparison Across Species]
Psychoacoustics studies the relationship between physical acoustic stimuli and subjective auditory perception. Key phenomena include:
The ear’s sensitivity is highly frequency-dependent. We are most sensitive around 3–4 kHz and far less sensitive to very low or very high frequencies. The Fletcher–Munson curves (1933), refined as ISO 226 equal-loudness contours, map the SPL required at each frequency to produce the same perceived loudness. This informs the A-weighting filter used in sound level meters (dB(A)).
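For reference, the A-weighting curve has a closed analytic form. The sketch below uses the constants from IEC 61672; treat it as an illustration, not a certified meter:

```python
import math

def a_weighting_db(f: float) -> float:
    """A-weighting gain in dB (analytic form, IEC 61672); 0 dB at 1 kHz."""
    f2 = f * f
    ra = (12194**2 * f2**2) / (
        (f2 + 20.6**2)
        * math.sqrt((f2 + 107.7**2) * (f2 + 737.9**2))
        * (f2 + 12194**2)
    )
    return 20 * math.log10(ra) + 2.00

print(round(a_weighting_db(1000), 1))  # 0.0, the reference frequency
print(round(a_weighting_db(100), 1))   # -19.1, bass is strongly de-emphasised
print(round(a_weighting_db(3000), 1))  # 1.2, slight boost near peak sensitivity
```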
A loud sound can render softer nearby sounds inaudible — simultaneous masking. Critically, masking also occurs across time: forward masking (a loud sound masks quiet ones for up to 200 ms after) and backward masking. MP3 and AAC audio codecs exploit masking curves to discard psychoacoustically irrelevant information, achieving high compression ratios with minimal perceptual loss.
With two ears, the auditory system extracts spatial information using: Interaural Time Differences (ITD, ±0.7 ms, dominant below 1.5 kHz) and Interaural Level Differences (ILD, dominant above 1.5 kHz). The head-related transfer function (HRTF) models how the outer ear and head colour sound differently for each direction, enabling 3D audio rendering.
Pitch is not merely frequency. The missing fundamental phenomenon demonstrates that the brain can perceive a pitch even when the fundamental frequency is absent — reconstructed from the pattern of harmonics. This is why a small phone speaker, incapable of reproducing 100 Hz, can still convey a male voice with the correct perceived pitch.
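The effect is easy to demonstrate numerically: a signal built only from harmonics of 100 Hz, with the 100 Hz component itself absent, still repeats every 10 ms, and that periodicity is the cue the brain hears as pitch. A numpy sketch:

```python
import numpy as np

fs = 16_000
t = np.arange(fs) / fs

# Harmonics of 100 Hz; the 100 Hz fundamental itself is absent
signal = sum(np.sin(2 * np.pi * f * t) for f in (300, 400, 500, 600))

# The waveform still repeats every 10 ms, the period of the missing 100 Hz
period = fs // 100   # 160 samples
print(np.allclose(signal[period:], signal[:-period]))  # True
```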
The single most powerful mathematical tool in acoustics was developed by a French mathematician studying heat flow — and it fundamentally transformed our understanding of sound.
In 1807, Jean-Baptiste Joseph Fourier proposed a revolutionary idea: any periodic function, however complex, can be expressed as a sum of sinusoids (sines and cosines) at harmonically related frequencies. For a signal with period T (fundamental frequency f0 = 1/T), the Fourier Series is: x(t) = a0/2 + Σ [an cos(2πnf0t) + bn sin(2πnf0t)], with the sum running over n = 1, 2, 3, …
In acoustics, this means every periodic sound — a violin string, a vowel, a trumpet note — is built from a fundamental frequency plus harmonics (integer multiples of the fundamental). The relative amplitudes and phases of these harmonics determine the timbre (tonal colour) of the sound, explaining why a flute and an oboe playing the same note at the same loudness sound utterly different.
Real sounds are rarely perfectly periodic. By taking the limit as the period T → ∞, the Fourier Series becomes the Fourier Transform, valid for any aperiodic signal: X(f) = ∫ x(t) e^(−j2πft) dt, integrated over all time.
X(f) is a complex-valued function whose magnitude gives the amplitude spectrum and whose argument gives the phase spectrum. Together, they contain all the information of the original signal — a complete, invertible representation.
In the digital age, signals are sampled at discrete time intervals. The Discrete Fourier Transform (DFT) computes the frequency spectrum of a sequence of N samples. However, computing all N output bins directly requires O(N²) operations — prohibitively slow for large N.
In 1965, James Cooley and John Tukey published their landmark algorithm: the Fast Fourier Transform (FFT). By exploiting the symmetry of complex exponentials and recursively splitting the DFT, the FFT reduces computation to O(N log N) — for N = 1,048,576, that is a speedup factor of over 50,000×.
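To make the contrast concrete, here is a direct O(N²) DFT next to the library FFT; both produce the same spectrum at radically different cost (a numpy sketch):

```python
import numpy as np

def naive_dft(x: np.ndarray) -> np.ndarray:
    """Direct O(N²) DFT: X[k] = Σn x[n]·e^(−j2πkn/N)."""
    N = len(x)
    n = np.arange(N)
    return np.array([np.sum(x * np.exp(-2j * np.pi * k * n / N))
                     for k in range(N)])

x = np.random.default_rng(0).standard_normal(1024)
print(np.allclose(naive_dft(x), np.fft.fft(x)))  # True: identical spectra
# naive: ~N² = 1,048,576 complex multiplies; FFT: ~N·log2(N) = 10,240
```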
The FFT is ubiquitous in audio processing. Every spectrum analyser, every audio codec (MP3, AAC, Opus), every digital audio workstation, and every voice assistant uses the FFT as a core operation. In real-time applications, a Short-Time Fourier Transform (STFT) applies overlapping FFT windows to a continuous signal, producing a spectrogram — a two-dimensional time-frequency representation that makes features like formants, vibrato, and transients visually apparent.
Applying the FFT to a finite block of samples implicitly assumes the signal is periodic within that block — an assumption that introduces spectral leakage (smearing of energy across adjacent frequency bins) when the signal is not. Window functions — Hann, Hamming, Blackman, Kaiser — taper the signal to zero at the block boundaries, trading spectral resolution for reduced leakage. Choosing the right window for the task is a key skill in digital audio analysis.
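A numpy sketch of the effect: a 440 Hz tone that falls between FFT bins leaks badly under a rectangular window and far less under a Hann window. The leakage measure used here (energy outside the five bins around the peak) is an illustrative choice, not a standard metric:

```python
import numpy as np

fs, N = 48_000, 1024
t = np.arange(N) / fs
# 440 Hz sits between bins (bin spacing fs/N ≈ 46.9 Hz), so a plain
# (rectangular-windowed) FFT smears its energy across the spectrum.
x = np.sin(2 * np.pi * 440 * t)

rect = np.abs(np.fft.rfft(x))
hann = np.abs(np.fft.rfft(x * np.hanning(N)))

def leakage(spectrum: np.ndarray) -> float:
    """Fraction of total energy outside the 5 bins around the peak."""
    peak = spectrum.argmax()
    total = np.sum(spectrum**2)
    main = np.sum(spectrum[max(peak - 2, 0):peak + 3] ** 2)
    return 1 - main / total

print(f"rectangular: {leakage(rect):.4f}")  # noticeable leakage
print(f"hann:        {leakage(hann):.6f}")  # orders of magnitude less
```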
Where Fourier Analysis is the tool of steady-state frequency analysis, the Laplace Transform handles transients, initial conditions, and the stability of acoustic systems.
The Laplace Transform, developed by Pierre-Simon Laplace in the late 18th century, generalises the Fourier Transform by introducing a complex frequency variable s = σ + jω, where σ is a real damping factor and ω = 2πf is angular frequency: X(s) = ∫ x(t) e^(−st) dt, integrated from t = 0 to ∞.
Setting σ = 0 (i.e., s = jω) recovers the Fourier Transform. The real part σ allows the transform to handle signals that grow or decay exponentially — making it the natural tool for analysing transient acoustic phenomena like the onset of a struck piano string or the decay of a resonant cavity.
In linear acoustic systems (resonators, filters, rooms modelled as LTI systems), the ratio of output to input in the Laplace domain is the transfer function H(s) = Y(s)/X(s). Convolution in the time domain becomes multiplication in the s-domain — a dramatic simplification.
The poles of H(s) (values where the denominator is zero) determine the system’s resonant frequencies and decay rates. A pole at s = −σ0 ± jω0 represents a damped resonance at f0 = ω0/2π with exponential decay rate σ0.
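The corresponding impulse response is an exponentially decaying cosine, h(t) = e^(−σ0·t) · cos(ω0·t). A minimal sketch with illustrative values:

```python
import numpy as np

fs = 48_000
t = np.arange(fs) / fs

sigma0 = 50.0               # decay rate, 1/s (illustrative)
omega0 = 2 * np.pi * 440.0  # 440 Hz resonance, rad/s (illustrative)

# Impulse response implied by the conjugate pole pair s = −σ0 ± jω0
h = np.exp(-sigma0 * t) * np.cos(omega0 * t)

# After 1/σ0 = 20 ms the envelope has fallen to 1/e of its initial value
print(h[0], np.exp(-sigma0 * 0.02))  # 1.0, ≈ 0.368
```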
The acoustic wave equation, a second-order PDE in time, transforms cleanly under Laplace. Initial conditions (pressure distribution and particle velocity at t = 0) appear naturally as algebraic terms, making the Laplace approach essential for solving room acoustics problems with defined initial states.
The discrete-time analogue of the Laplace Transform is the Z-Transform, where z = e^(sT) and T is the sampling period. All digital audio filters (IIR lowpass, shelving EQ, reverberation algorithms) are designed and analysed using Z-domain techniques, mapped from their continuous-time Laplace prototypes via the bilinear transformation.
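As a small illustration, here is a first-order analog lowpass prototype mapped into the z-domain with SciPy's bilinear transformation. The cutoff and sample rate are arbitrary choices, and production designs usually pre-warp the cutoff to compensate for the transform's frequency warping:

```python
import numpy as np
from scipy.signal import bilinear, lfilter

fs = 48_000            # sample rate, Hz (illustrative)
wc = 2 * np.pi * 1000  # 1 kHz cutoff, rad/s (illustrative)

# Continuous-time prototype: first-order lowpass H(s) = ωc / (s + ωc)
b_s, a_s = [wc], [1.0, wc]

# Map the Laplace-domain prototype into the z-domain
b_z, a_z = bilinear(b_s, a_s, fs)

# Run the resulting digital IIR filter over one second of white noise
noise = np.random.default_rng(0).standard_normal(fs)
smoothed = lfilter(b_z, a_z, noise)
```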
The human vocal tract is remarkably well-modelled as a time-varying acoustic tube whose resonances — called formants (F1, F2, F3, ...) — shape vowel identity. In the Laplace framework, formants are poles of the vocal tract transfer function. Speech synthesis and analysis systems (LPC — Linear Predictive Coding) estimate these poles directly from the speech signal, encoding each short analysis frame (a few tens of milliseconds) of speech with as few as 10–12 coefficients. This is the foundation of telephony codecs and voice synthesis technology.
Summary of transforms in audio: The Fourier Transform reveals steady-state frequency content. The Laplace Transform handles system dynamics and transients. The FFT makes Fourier analysis computationally practical on digital hardware. Together, they form the mathematical backbone of all modern audio engineering.
The principles of sound waves and their mathematical treatment permeate technology, medicine, architecture, and art.
Designing the acoustic character of concert halls, opera houses, cathedrals, recording studios, and open-plan offices. Key metrics include reverberation time (T20, T30, T60), clarity (C80), definition (D50), and speech transmission index (STI). Simulation software uses the FFT and geometric (ray tracing) or wave-based (finite element) methods to predict room behaviour before construction.
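The oldest of these metrics has a famously simple model: Sabine's formula, RT60 = 0.161 · V / Σ(Si · αi) in metric units. A sketch with hypothetical hall dimensions and absorption coefficients:

```python
def sabine_rt60(volume_m3: float, surfaces: list[tuple[float, float]]) -> float:
    """Sabine reverberation time RT60 = 0.161·V / Σ(Si·αi), metric units.
    `surfaces` is a list of (area in m², absorption coefficient α) pairs."""
    total_absorption = sum(area * alpha for area, alpha in surfaces)
    return 0.161 * volume_m3 / total_absorption

# Hypothetical 12 m × 20 m × 8 m hall (1,920 m³) with illustrative α values
surfaces = [
    (512, 0.06),  # plaster walls
    (240, 0.45),  # audience seating over the floor
    (240, 0.04),  # wood ceiling
]
print(round(sabine_rt60(1920, surfaces), 1))  # ≈ 2.1 s, in the orchestral range
```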
Diagnostic ultrasound (2–20 MHz) uses pulse-echo techniques to image soft tissue. The Doppler effect enables blood flow measurement. Therapeutic ultrasound delivers focused acoustic energy for physiotherapy, kidney stone disintegration (lithotripsy), and emerging cancer treatments. FFT-based signal processing is fundamental to reconstructing images from echo data.
Controlling industrial noise, environmental noise, and vehicle noise through absorption, insulation, and active noise control (ANC). ANC systems sample the noise with a microphone, compute an anti-phase signal in real time using digital signal processing, and emit it through a loudspeaker — exploiting destructive interference. Modern noise-cancelling headphones achieve 30+ dB of attenuation using these principles.
Because sound travels so much better than electromagnetic waves underwater, sonar (Sound Navigation And Ranging) is the primary sensing modality in the ocean. Active sonar emits pulses and detects echoes; passive sonar listens for target signatures. The FFT is used in signal processing chains to detect weak signals in noise and to estimate bearing and range of underwater objects.
Automatic speech recognition (ASR), text-to-speech synthesis (TTS), audio compression (MP3, AAC, FLAC), music information retrieval (pitch detection, beat tracking, chord recognition), and hearing aids all rely on combinations of Fourier analysis, psychoacoustic models, and machine learning operating in the frequency domain. The STFT and Mel-frequency cepstral coefficients (MFCCs), derived via the FFT, are the standard feature representations for audio ML.
Earthquakes generate both P-waves (longitudinal, >6 km/s in crust) and S-waves (transverse, slower, cannot travel through liquids). Analysing these waves’ arrival times at seismograph networks — using Fourier and Laplace methods — allows scientists to locate earthquake epicentres, determine magnitudes, and infer the structure of the Earth’s interior.
All digital audio rests on a single profound theorem: the Nyquist–Shannon Sampling Theorem (1928/1949). It states that a bandlimited signal can be perfectly reconstructed from discrete samples if the sampling rate fs is greater than twice the highest frequency present: fs > 2·fmax.
Frequencies above the Nyquist limit (fs/2) fold back into the audible band as aliasing — audible artefacts that are prevented by anti-aliasing filters applied before the analogue-to-digital converter. The Fourier Transform provides the mathematical proof that this sampling and perfect reconstruction is possible.
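Aliasing is easy to exhibit numerically: sampled at 8 kHz, a 5 kHz tone produces exactly the same samples (up to a sign flip) as a 3 kHz tone, which is precisely the fold-back the theorem predicts. A numpy sketch:

```python
import numpy as np

fs = 8_000                 # deliberately low sample rate, Hz
t = np.arange(fs) / fs

alias   = np.sin(2 * np.pi * 5000 * t)  # 5 kHz: above Nyquist (fs/2 = 4 kHz)
genuine = np.sin(2 * np.pi * 3000 * t)  # 3 kHz: the folded-back frequency

# The 5 kHz tone folds to |fs − f| = 3 kHz; for a sine the fold also
# inverts the phase, so the two sample sequences differ only in sign.
print(np.allclose(alias, -genuine))  # True
```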