Harmonic Analysis: Leverage Map

A. EXISTENCE JUSTIFICATION

Every signal is a superposition of pure tones.

The physical origins: Vibrating strings produce overtones. Heat diffuses through materials. Light waves interfere. Sound is pressure oscillation. In each case, understanding requires decomposing complex phenomena into simple periodic components.

Fourier’s audacious claim (1807): ANY function can be written as a sum of sines and cosines:

$$f(x) = \frac{a_0}{2} + \sum_{n=1}^{\infty} \left(a_n \cos(nx) + b_n \sin(nx)\right)$$

Mathematicians were skeptical. What does “any” mean? How does the sum converge? For discontinuous functions? These questions drove 19th-century analysis.

The vindication: Fourier was essentially right, but the precise statements required new mathematics:

  • Lebesgue integration (for L² convergence)
  • Distribution theory (for pointwise issues)
  • Functional analysis (for operator theory)

Why harmonic analysis is fundamental:

The Fourier transform is arguably the most important single operation in applied mathematics:

  • Converts differentiation to multiplication
  • Converts convolution to multiplication
  • Diagonalizes translation-invariant operators
  • Reveals frequency content
  • Connects local and global behavior

The modern view: Harmonic analysis is the study of symmetry through analysis. On ℝⁿ, translation symmetry gives Fourier analysis. On other groups, representation theory gives generalized harmonic analysis. The theme: exploit symmetry to decompose functions.


B. CORE OBJECTS & MORPHISMS

The Fourier transform (various conventions):

| Domain | Transform | Inverse |
| --- | --- | --- |
| L¹(ℝ) | f̂(ξ) = ∫ f(x)e^{-2πixξ} dx | f(x) = ∫ f̂(ξ)e^{2πixξ} dξ |
| L²(ℝ) | Same, extended by density | Plancherel: ‖f̂‖₂ = ‖f‖₂ |
| 𝕋 = ℝ/ℤ (periodic) | f̂(n) = ∫₀¹ f(x)e^{-2πinx} dx | f(x) = Σₙ f̂(n)e^{2πinx} |
| ℤ (sequences) | f̂(θ) = Σₙ f(n)e^{-2πinθ} | f(n) = ∫₀¹ f̂(θ)e^{2πinθ} dθ |
| Finite ℤ/Nℤ | f̂(k) = Σₙ f(n)e^{-2πink/N} | f(n) = (1/N)Σₖ f̂(k)e^{2πink/N} |
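The finite row of this table can be sanity-checked numerically; a minimal sketch assuming NumPy, comparing the ℤ/Nℤ definition term by term with `np.fft.fft` and undoing it with the 1/N inverse:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 16
f = rng.standard_normal(N) + 1j * rng.standard_normal(N)
n = np.arange(N)

# Naive DFT straight from the table: f̂(k) = Σₙ f(n) e^{-2πink/N}
F = np.array([np.sum(f * np.exp(-2j * np.pi * n * k / N)) for k in range(N)])
assert np.allclose(F, np.fft.fft(f))   # np.fft.fft computes exactly this sum

# Inverse from the table: f(n) = (1/N) Σₖ f̂(k) e^{2πink/N}  (n reused as the k-index)
f_rec = np.array([np.sum(F * np.exp(2j * np.pi * n * m / N)) for m in range(N)]) / N
assert np.allclose(f_rec, f)           # perfect reconstruction
```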

Core objects:

| Object | What it is | Notation |
| --- | --- | --- |
| Fourier transform | Decomposition into frequencies | f̂, ℱf, F[f] |
| Fourier series | Periodic case, discrete frequencies | Σ cₙe^{inx} |
| Convolution | (f * g)(x) = ∫ f(y)g(x-y) dy | f * g |
| Singular integral | Principal value integral with singular kernel | Hf, Tf |
| Maximal function | Mf(x) = sup_{r>0} (1/∣B_r∣) ∫_{B_r(x)} ∣f∣ | Mf |
| Multiplier | Operator defined by m(ξ)f̂(ξ) | T_m |
| Distribution | Generalized function (continuous linear functional on test functions) | u ∈ 𝒟' |
| Tempered distribution | Distribution with polynomial growth control | u ∈ 𝒮' |
| Wavelet | Localized oscillating function | ψ |
| Frame | Overcomplete “basis” with reconstruction | {ψⱼ} |

Function spaces:

| Space | Definition | Norm |
| --- | --- | --- |
| Lᵖ(ℝⁿ) | ∫∣f∣ᵖ < ∞ | ‖f‖_p = (∫∣f∣ᵖ)^{1/p} |
| L^∞ | Essential supremum finite | ‖f‖_∞ = ess sup ∣f∣ |
| Schwartz 𝒮 | Rapidly decreasing smooth functions | Seminorms on derivatives |
| Hˢ (Sobolev) | s derivatives in L² | ‖f‖_{Hˢ} = ‖(1+∣ξ∣²)^{s/2}f̂‖₂ |
| BMO | Bounded mean oscillation | sup_Q (1/∣Q∣)∫_Q ∣f - f_Q∣ |
| Hardy Hᵖ | Boundary values of holomorphic functions | Complex-analytic norms |

Morphisms:

  • Fourier transform: L¹ → C₀, L² → L² (isometry)
  • Convolution operators: Lᵖ → Lᵖ
  • Singular integrals: Lᵖ → Lᵖ (for 1 < p < ∞)

C. CENTRAL INVARIANTS

The Fourier transform properties:

| Property | Statement |
| --- | --- |
| Linearity | (αf + βg)^ = αf̂ + βĝ |
| Translation | (τₐf)^(ξ) = e^{-2πiaξ}f̂(ξ) |
| Modulation | (e^{2πiax}f)^(ξ) = f̂(ξ-a) |
| Scaling | (f(·/λ))^(ξ) = λf̂(λξ) |
| Differentiation | (f′)^(ξ) = 2πiξ f̂(ξ) |
| Convolution | (f * g)^ = f̂ · ĝ |
| Multiplication | (fg)^ = f̂ * ĝ |
| Parseval/Plancherel | ⟨f, g⟩ = ⟨f̂, ĝ⟩, ‖f‖₂ = ‖f̂‖₂ |
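The convolution rule is easy to verify directly on ℤ/Nℤ, where convolution is circular; a small check assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 32
f, g = rng.standard_normal(N), rng.standard_normal(N)

# Direct circular convolution on Z/NZ: (f*g)(x) = Σ_y f(y) g(x-y mod N)
conv = np.array([np.sum(f * g[(x - np.arange(N)) % N]) for x in range(N)])

# Convolution theorem: transform, multiply pointwise, invert
conv_via_fft = np.fft.ifft(np.fft.fft(f) * np.fft.fft(g)).real
assert np.allclose(conv, conv_via_fft)
```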

Uncertainty principle:

$$\|xf\|_2 \cdot \|\xi\hat{f}\|_2 \geq \frac{1}{4\pi}\|f\|_2^2$$

Equality iff f is a Gaussian. You cannot be localized in both position and frequency.
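A numerical illustration, assuming NumPy: approximating the norms by Riemann sums for the Gaussian e^{-πx²}, which is its own transform under the 2π convention, shows the inequality holding with equality:

```python
import numpy as np

# Fine, wide grid so Riemann sums stand in for the integrals
x = np.linspace(-10, 10, 200001)
dx = x[1] - x[0]
f = np.exp(-np.pi * x**2)              # Gaussian; f̂ = f in this convention

norm_xf = np.sqrt(np.sum((x * f)**2) * dx)   # ‖xf‖₂
norm_xif = norm_xf                           # ‖ξf̂‖₂, identical since f̂ = f
norm_f_sq = np.sum(f**2) * dx                # ‖f‖₂²

product = norm_xf * norm_xif
bound = norm_f_sq / (4 * np.pi)
assert abs(product - bound) < 1e-9           # equality: f is a Gaussian
```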

Decay and smoothness duality:

| f has… | f̂ has… |
| --- | --- |
| Compact support | Real-analytic extension (entire function) |
| Rapid decay (Schwartz) | Rapid decay (Schwartz) |
| n derivatives in L¹ | (1+∣ξ∣)ⁿ f̂ bounded |
| Jump discontinuity | Slow decay (~1/ξ) |

Key exponents:

| Bound | Meaning |
| --- | --- |
| ‖f̂‖_∞ ≤ ‖f‖₁ | L¹ → L^∞ |
| ‖f̂‖₂ = ‖f‖₂ | L² → L² isometry |
| ‖f̂‖_q ≤ C‖f‖_p | Hausdorff-Young: 1/p + 1/q = 1, 1 ≤ p ≤ 2 |

What counts as “the same”:

  • Functions equal a.e. (in Lᵖ)
  • Same Fourier transform (determines function in L¹ ∩ L²)
  • Distributions equal if they agree on all test functions

D. SIGNATURE THEOREMS

1. Fourier Inversion Theorem

For f ∈ L¹(ℝ) with f̂ ∈ L¹(ℝ): $$f(x) = \int_{-\infty}^{\infty} \hat{f}(\xi) e^{2\pi i x\xi} d\xi$$

Importance: The Fourier transform is invertible—you can recover f from f̂. Analysis (decomposition) and synthesis (reconstruction) are dual operations.

Caveats: Need f̂ ∈ L¹ for pointwise inversion. In L², inversion is via limit of truncated integrals.

2. Plancherel Theorem

The Fourier transform extends to a unitary operator on L²(ℝ): $$\|\hat{f}\|_2 = \|f\|_2$$

Importance: The Fourier transform preserves energy (L² norm). It’s an isometry, hence has an inverse. This is why L² is the natural home for Fourier analysis.

3. Riemann-Lebesgue Lemma

If f ∈ L¹(ℝ), then f̂ is continuous and f̂(ξ) → 0 as |ξ| → ∞.

Importance: High-frequency components of L¹ functions must decay. This constrains what can be a Fourier transform.

4. Carleson’s Theorem

For f ∈ L²(𝕋), the Fourier series converges to f(x) almost everywhere.

Importance: This resolved a century-old question: does the Fourier series of an L² function converge pointwise? Yes, a.e. (not everywhere—counterexamples exist). The proof (1966) was a tour de force.

Contrast: For L¹ functions, Fourier series can diverge almost everywhere; Kolmogorov even constructed an L¹ function whose Fourier series diverges at every point.

5. Calderón-Zygmund Theory

Singular integral operators of the form $$Tf(x) = \text{p.v.} \int \frac{\Omega(x-y)}{|x-y|^n} f(y)\, dy$$ are bounded on Lᵖ for 1 < p < ∞, under size, smoothness, and cancellation (mean-zero) conditions on Ω.

Importance: This extends Fourier analysis to handle singular objects. The Hilbert transform, Riesz transforms, and many PDE operators fit this framework. The theory gives Lᵖ bounds without explicit computation.

6. Littlewood-Paley Theory

Decompose f into frequency bands: f = Σⱼ Δⱼf, where Δⱼ picks out frequencies ~2ʲ. Then, for 1 < p < ∞: $$\|f\|_p \sim \left\|\left(\sum_j |\Delta_j f|^2\right)^{1/2}\right\|_p$$

Importance: This “square function” characterization lets you analyze f frequency-by-frequency, then reassemble. It’s fundamental for:

  • Bounding multipliers
  • Paradifferential calculus
  • Wavelets

7. Paley-Wiener Theorem

For f ∈ L²(ℝ): f̂ is supported in [-B, B] iff f extends to an entire function of exponential type 2πB.

Importance: Characterizes bandlimited functions. A signal with no frequencies above B is determined by samples at rate 2B (Nyquist-Shannon sampling theorem follows).


E. BRIDGES TO OTHER DOMAINS

| Domain | Connection |
| --- | --- |
| PDEs | Fourier transform converts differential equations to algebraic ones. Heat kernel, wave propagation, Schrödinger. Pseudodifferential operators. |
| Probability | Characteristic functions ARE Fourier transforms. Central limit theorem via Fourier. Lévy’s inversion. |
| Number Theory | Poisson summation, theta functions, modular forms. Circle method. Analytic number theory. |
| Signal Processing | Sampling, filtering, compression. FFT algorithm. Wavelets. Time-frequency analysis. |
| Representation Theory | Fourier analysis on groups. Peter-Weyl theorem. Harmonic analysis on Lie groups. |
| Functional Analysis | Spectral theory of operators. Banach algebras. Operator algebras. |
| Quantum Mechanics | Position/momentum duality IS Fourier duality. Uncertainty principle. Wigner function. |
| Complex Analysis | Hardy spaces, Paley-Wiener. Analytic signals. Hilbert transform. |
| Machine Learning | Fourier features, spectral methods. Convolutional networks and Fourier. Neural tangent kernel. |
| Image Processing | Fourier filtering, deconvolution. JPEG (DCT). Wavelets (JPEG 2000). |

Pattern-linking gold:

Convolution is multiplication in frequency:

$$(f * g)^\wedge = \hat{f} \cdot \hat{g}$$

This single fact explains:

  • Why LTI systems have transfer functions
  • Why Gaussians are special (Gaussian × Gaussian = Gaussian)
  • Why convolution is associative (multiplication is)
  • How to solve constant-coefficient ODEs

The Poisson summation formula:

$$\sum_{n \in \mathbb{Z}} f(n) = \sum_{k \in \mathbb{Z}} \hat{f}(k)$$

Connects discrete sums to Fourier transforms. Applications:

  • Theta function transformation
  • Number theory (counting lattice points)
  • Sampling theory
  • Modular forms
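Poisson summation can be checked numerically; a sketch assuming NumPy, using the dilated Gaussian f(x) = e^{-πx²/s}, whose transform is f̂(ξ) = √s·e^{-πsξ²}. The resulting identity between the two lattice sums is exactly the theta-function transformation law mentioned above:

```python
import numpy as np

s = 2.0
n = np.arange(-50, 51)                 # truncation; the tails are negligible

lhs = np.sum(np.exp(-np.pi * n**2 / s))              # Σₙ f(n)
rhs = np.sqrt(s) * np.sum(np.exp(-np.pi * s * n**2)) # Σₖ f̂(k)
assert abs(lhs - rhs) < 1e-12
```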

Scale-frequency tradeoff:

Localizing in space spreads in frequency; localizing in frequency spreads in space. This is the uncertainty principle, manifested everywhere:

  • Heisenberg (quantum mechanics)
  • Gabor limit (time-frequency)
  • Wavelet scaling

F. COMMON MISCONCEPTIONS

  1. “Every function has a Fourier transform” — In the classical sense, integrability conditions are needed. The transform does extend to all tempered distributions, but with care.

  2. “Fourier series converge pointwise” — For L² yes (a.e., Carleson). For L¹, can diverge on positive measure sets. For continuous functions, can diverge at points (du Bois-Reymond).

  3. “The Fourier transform is THE decomposition” — It’s one of many: wavelets, Gabor, curvelets, shearlets. Each optimized for different structure (singularities, edges, anisotropy).

  4. “Wavelets replaced Fourier” — They complement, not replace. Fourier for stationary/periodic signals, wavelets for transient/multiscale. Both have their domains.

  5. “FFT is an approximation” — The Fast Fourier Transform computes the EXACT discrete Fourier transform in O(n log n) instead of O(n²). No approximation, just clever factorization.

  6. “High frequencies = noise” — High frequencies carry edges, fine detail. Removing them blurs. “Noise” is problem-dependent, not frequency-determined.

  7. “Uncertainty principle is about measurement” — The mathematical uncertainty principle is about the spread of f and f̂. The quantum version relates to non-commuting observables. Related but distinct.

  8. “Singular integrals are pathological” — They’re natural and ubiquitous: Hilbert transform, Riesz transforms, Cauchy integral. The theory (Calderón-Zygmund) makes them tractable.


G. NOTATION SURVIVAL KIT

| Symbol | Meaning |
| --- | --- |
| f̂ or ℱf | Fourier transform of f |
| f̌ or ℱ⁻¹f | Inverse Fourier transform |
| f * g | Convolution |
| τₐf | Translation: (τₐf)(x) = f(x-a) |
| Mf | Hardy-Littlewood maximal function |
| Hf | Hilbert transform |
| Δⱼ | Littlewood-Paley projection to frequency ~2ʲ |
| 𝒮 | Schwartz space |
| 𝒮' | Tempered distributions |
| 𝒟 | Test functions (smooth, compact support) |
| 𝒟' | Distributions |
| Lᵖ | Lebesgue space |
| Hˢ | Sobolev space |
| BMO | Bounded mean oscillation |
| Hᵖ | Hardy space |
| ξ, ω, k | Frequency variables |
| δ | Dirac delta (distribution) |
| p.v. | Principal value |
| supp | Support of a function |

Key transforms:

| Transform | Definition | Pair |
| --- | --- | --- |
| Fourier | f̂(ξ) = ∫ f(x)e^{-2πixξ} dx | (f, f̂) |
| Laplace | F(s) = ∫₀^∞ f(t)e^{-st} dt | (f, F) |
| Mellin | φ(s) = ∫₀^∞ f(x)x^{s-1} dx | Multiplicative Fourier |
| Hilbert | Hf(x) = p.v. (1/π) ∫ f(t)/(x-t) dt | Phase shift by π/2 |
| Wavelet | Wf(a,b) = ∫ f(t)ψ*((t-b)/a) dt/√a | (position, scale) |

H. ONE WORKED MICRO-EXAMPLE

Solving the heat equation via Fourier transform

Setup: Heat equation on ℝ:

$$\frac{\partial u}{\partial t} = \frac{\partial^2 u}{\partial x^2}, \quad u(x, 0) = f(x)$$

Take Fourier transform in x:

Let û(ξ, t) = ∫ u(x, t)e^{-2πixξ} dx.

The equation becomes:

$$\frac{\partial \hat{u}}{\partial t} = (2\pi i\xi)^2 \hat{u} = -4\pi^2\xi^2 \hat{u}$$

This is an ODE in t!

Solve the ODE:

$$\hat{u}(\xi, t) = \hat{u}(\xi, 0) e^{-4\pi^2\xi^2 t} = \hat{f}(\xi) e^{-4\pi^2\xi^2 t}$$

Inverse transform:

$$u(x, t) = \int \hat{f}(\xi) e^{-4\pi^2\xi^2 t} e^{2\pi ix\xi} d\xi$$

Recognize the convolution:

The function e^{-4π²ξ²t} is the Fourier transform of the Gaussian:

$$g_t(x) = \frac{1}{\sqrt{4\pi t}} e^{-x^2/(4t)}$$

(the “heat kernel”)

By convolution theorem:

$$u(x, t) = (f * g_t)(x) = \int f(y) \frac{1}{\sqrt{4\pi t}} e^{-(x-y)^2/(4t)} dy$$

Interpretation:

The solution at time t is the initial data f convolved with a Gaussian of width √t. Heat “spreads out” via Gaussian averaging. The Fourier transform diagonalized the Laplacian!
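The whole derivation can be replayed numerically; a sketch assuming NumPy, on a periodic grid wide enough that boundary effects are negligible. With Gaussian initial data g_s, the semigroup property g_s * g_t = g_{s+t} gives a closed-form answer to check against:

```python
import numpy as np

def heat_kernel(x, t):
    # g_t(x) = e^{-x²/(4t)} / sqrt(4πt)
    return np.exp(-x**2 / (4 * t)) / np.sqrt(4 * np.pi * t)

L, N = 40.0, 1024
x = np.linspace(-L / 2, L / 2, N, endpoint=False)
xi = np.fft.fftfreq(N, d=L / N)        # frequencies ξ in the 2π convention

s, t = 0.5, 1.0
f = heat_kernel(x, s)                  # initial data: the Gaussian g_s

# Spectral step: û(ξ,t) = f̂(ξ) e^{-4π²ξ²t}; the multiplier passes
# straight through the DFT pair
u = np.fft.ifft(np.fft.fft(f) * np.exp(-4 * np.pi**2 * xi**2 * t)).real

exact = heat_kernel(x, s + t)          # g_s * g_t = g_{s+t}
assert np.max(np.abs(u - exact)) < 1e-8
```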


Micro-example 2: The Hilbert transform

Definition:

$$Hf(x) = \text{p.v.} \frac{1}{\pi} \int_{-\infty}^{\infty} \frac{f(t)}{x - t} dt$$

The “principal value” means: limit as ε → 0 of ∫_{|x-t| > ε}.

Fourier multiplier:

$$\widehat{Hf}(\xi) = -i \cdot \text{sgn}(\xi) \cdot \hat{f}(\xi)$$

So H multiplies positive frequencies by -i (phase shift -π/2) and negative frequencies by +i (phase shift +π/2).

Key properties:

  • H² = -I (applying twice gives negative identity)
  • H is bounded on Lᵖ for 1 < p < ∞
  • H is NOT bounded on L¹ or L^∞
  • For f real, f + iHf is the “analytic signal” (has only positive frequencies)

Connection to complex analysis:

If F = f + iHf (analytic signal), then F extends to a holomorphic function in the upper half-plane with boundary values F(x).

The Hilbert transform extracts the imaginary part from the boundary values of a holomorphic function with given real part.

Importance:

  • Signal processing: instantaneous phase and amplitude
  • PDEs: Cauchy-Riemann equations
  • Singular integral theory: prototype singular integral
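The multiplier description makes the Hilbert transform a one-liner in the discrete setting; a sketch assuming NumPy that checks H(cos) = sin and H² = -I on a mean-zero signal:

```python
import numpy as np

N = 256
n = np.arange(N)
f = np.cos(2 * np.pi * 5 * n / N)      # pure tone, mean zero

# Apply the multiplier -i·sgn(ξ) in the DFT domain (sgn(0)=0 leaves DC alone)
xi = np.fft.fftfreq(N)
Hf = np.fft.ifft(-1j * np.sign(xi) * np.fft.fft(f)).real
assert np.allclose(Hf, np.sin(2 * np.pi * 5 * n / N), atol=1e-12)

# Applying H twice negates the signal: H² = -I away from frequency zero
HHf = np.fft.ifft(-1j * np.sign(xi) * np.fft.fft(Hf)).real
assert np.allclose(HHf, -f, atol=1e-12)
```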

Micro-example 3: Wavelets

The problem with Fourier:

Fourier analysis is global—each frequency component extends over all time. For signals with transient features (edges, bursts), Fourier doesn’t localize well.

The wavelet idea:

Use basis functions localized in BOTH time and frequency:

$$\psi_{a,b}(t) = \frac{1}{\sqrt{a}} \psi\left(\frac{t-b}{a}\right)$$

  • b = position (translation)
  • a = scale (dilation)—large a = low frequency, small a = high frequency

The wavelet transform:

$$Wf(a, b) = \int f(t) \overline{\psi_{a,b}(t)} dt$$

This gives a time-scale representation (vs. Fourier’s time-frequency with infinite time support).

Reconstruction:

Under the admissibility condition (C_ψ := ∫ |ψ̂(ξ)|²/|ξ| dξ < ∞):

$$f(t) = C_\psi^{-1} \int_0^\infty \int_{-\infty}^{\infty} Wf(a,b)\, \psi_{a,b}(t)\, \frac{db\, da}{a^2}$$

Discrete wavelets (orthonormal bases):

Daubechies constructed wavelets ψ such that {2^{j/2}ψ(2ʲt - k)}_{j,k ∈ ℤ} is an orthonormal basis for L²(ℝ).

Multiresolution analysis:

Build a ladder of approximation spaces V₀ ⊂ V₁ ⊂ V₂ ⊂ … where Vⱼ resolves features at scale 2⁻ʲ. The wavelets span the “details” Wⱼ = Vⱼ₊₁ ⊖ Vⱼ.

Why wavelets matter:

  • Compression (JPEG 2000): wavelets concentrate energy in few coefficients
  • Denoising: threshold small wavelet coefficients
  • Multiscale analysis: natural for fractals, turbulence
  • Fast algorithms: O(n) discrete wavelet transform
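The simplest concrete instance is the Haar wavelet; a minimal sketch assuming NumPy of the O(n) cascade of orthonormal averages and differences, checking that energy is preserved (the discrete analogue of Plancherel):

```python
import numpy as np

def haar_step(v):
    # One level: orthonormal coarse averages and fine details
    a = (v[0::2] + v[1::2]) / np.sqrt(2)
    d = (v[0::2] - v[1::2]) / np.sqrt(2)
    return a, d

def haar_dwt(v):
    # Full O(n) cascade: keep splitting the coarse part
    details = []
    a = v
    while len(a) > 1:
        a, d = haar_step(a)
        details.append(d)
    return a, details

rng = np.random.default_rng(2)
v = rng.standard_normal(64)
a, details = haar_dwt(v)

# Orthonormality of the basis means total energy is preserved
energy = a[0]**2 + sum(np.sum(d**2) for d in details)
assert np.isclose(energy, np.sum(v**2))
```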

Harmonic Analysis on Groups

The general pattern:

| Group | “Frequencies” | Transform |
| --- | --- | --- |
| ℝ | ℝ (continuous) | Fourier transform |
| 𝕋 = ℝ/ℤ | ℤ (discrete) | Fourier series |
| ℤ | 𝕋 (continuous) | Discrete-time Fourier transform |
| ℤ/Nℤ | ℤ/Nℤ | Discrete Fourier transform (DFT) |
| General LCA group G | Dual group Ĝ | Pontryagin duality |
| Compact group G | Irreducible representations | Peter-Weyl theorem |
| Noncompact Lie group | Unitary dual (complicated!) | Plancherel measure |

Pontryagin duality:

For locally compact abelian (LCA) groups:

  • The dual Ĝ = continuous homomorphisms G → 𝕋
  • The dual of the dual is G: Ĝ̂ ≅ G
  • Fourier transform: L¹(G) → C₀(Ĝ)

Peter-Weyl theorem (compact groups):

Matrix coefficients of irreducible representations form an orthonormal basis of L²(G).

This IS Fourier analysis: for G = 𝕋, the irreps are χₙ(θ) = e^{inθ}, and matrix coefficients are the Fourier basis.

For G = SO(3): irreps are the Wigner D-matrices, basis functions are spherical harmonics.


Leverage for your work:

Fourier features in neural networks:

Random Fourier features: approximate a shift-invariant kernel k(x,y) = k(x-y) via:

$$k(x-y) \approx \frac{2}{D} \sum_{j=1}^{D} \cos(\omega_j \cdot x + b_j) \cos(\omega_j \cdot y + b_j)$$

where the ωⱼ are sampled from the spectral density of k (its Fourier transform, which Bochner’s theorem guarantees is a probability density up to scaling) and the bⱼ uniformly from [0, 2π]. This makes kernel methods scalable.
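A sketch assuming NumPy, using the standard √(2/D) feature normalization for the Gaussian kernel k(Δ) = e^{-∣Δ∣²/2}, whose spectral density is a standard normal:

```python
import numpy as np

rng = np.random.default_rng(3)
d, D = 5, 20000
# Gaussian kernel: sample frequencies from its spectral density N(0, I)
W = rng.standard_normal((D, d))
b = rng.uniform(0.0, 2.0 * np.pi, D)

def z(x):
    # Random Fourier feature map: k(x-y) ≈ z(x)·z(y)
    return np.sqrt(2.0 / D) * np.cos(W @ x + b)

x, y = rng.standard_normal(d), rng.standard_normal(d)
approx = z(x) @ z(y)
exact = np.exp(-np.sum((x - y)**2) / 2.0)
assert abs(approx - exact) < 0.05      # Monte Carlo error shrinks like 1/sqrt(D)
```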

The Neural Tangent Kernel is analyzed via Fourier:

The infinite-width limit of neural networks has spectral properties determined by the architecture. Fourier analysis of the NTK reveals which frequencies the network learns fast/slow.

Positional encodings:

Transformer positional encodings use sin/cos at different frequencies:

$$PE(pos, 2i) = \sin(pos / 10000^{2i/d})$$ $$PE(pos, 2i+1) = \cos(pos / 10000^{2i/d})$$

This IS a Fourier basis! The network can learn to attend to relative positions because sin(a+b) decomposes nicely.
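A minimal implementation of the sinusoidal encoding above, assuming NumPy; the function name and test values are illustrative, not from any particular library:

```python
import numpy as np

def positional_encoding(max_pos, d):
    # PE(pos, 2i) = sin(pos / 10000^{2i/d}), PE(pos, 2i+1) = cos(...)
    pos = np.arange(max_pos)[:, None]
    i = np.arange(d // 2)[None, :]
    angles = pos / (10000 ** (2 * i / d))
    pe = np.empty((max_pos, d))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

pe = positional_encoding(50, 16)
# Dimension 0 oscillates at frequency 1: PE(pos, 0) = sin(pos)
assert np.allclose(pe[:, 0], np.sin(np.arange(50)))
# Each (sin, cos) pair has unit norm, so every row has squared norm d/2
assert np.allclose(np.sum(pe**2, axis=1), 8.0)
```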

Spectral graph theory:

The graph Laplacian L = D - A is symmetric positive semidefinite, so it has an orthonormal eigenbasis. Those eigenvectors are the “Fourier modes” on the graph, and graph convolution = multiplication in this spectral domain.

GNNs often work in this spectral basis. Understanding graph Fourier analysis is understanding GNNs.
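A concrete check, assuming NumPy: on the cycle graph C_N the Laplacian is circulant, so the DFT diagonalizes it and the eigenvalues are exactly 2 - 2cos(2πk/N):

```python
import numpy as np

N = 16
# Cycle graph C_N: each vertex joined to its two neighbors
A = np.zeros((N, N))
for i in range(N):
    A[i, (i + 1) % N] = A[i, (i - 1) % N] = 1
Lap = np.diag(A.sum(axis=1)) - A       # L = D - A

# Translation invariance makes L circulant, so its eigenvectors are the
# discrete Fourier modes and its eigenvalues the multiplier 2 - 2cos(2πk/N)
evals = np.linalg.eigvalsh(Lap)        # returned in ascending order
expected = np.sort(2 - 2 * np.cos(2 * np.pi * np.arange(N) / N))
assert np.allclose(evals, expected)
```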

Attention and convolution:

Attention is NOT convolution (not translation-equivariant). But analyzing attention in Fourier domain reveals its frequency preferences. Self-attention can implement convolution as a special case.

Scale-space and neural architectures:

Wavelets do multiscale analysis. CNNs with pooling do something similar—each layer operates at coarser scale. The connection is precise for certain architectures.

Uncertainty principle and representations:

You can’t have representations that are simultaneously localized in input space AND in “feature frequency” space. This constrains what neural representations can achieve.

Time-frequency analysis for sequences:

For temporal data (language, audio, video), Fourier and wavelet methods reveal structure. Transformers might be learning time-frequency representations implicitly.

The Fourier transform of the loss landscape:

Neural network loss landscapes have structure at multiple scales. Fourier analysis of the loss (as function of weights) might reveal optimization dynamics.


Now at 20 domains!

  1. Representation Theory
  2. Measure Theory
  3. Homotopy Type Theory
  4. Category Theory
  5. Spectral Theory
  6. Information Geometry
  7. Differential Geometry
  8. Algebraic Topology
  9. Lie Theory
  10. Dynamical Systems
  11. Algebraic Geometry
  12. Functional Analysis
  13. Complex Analysis
  14. Number Theory
  15. Mathematical Physics
  16. Information Theory
  17. Probability Theory
  18. Logic and Foundations
  19. Combinatorics
  20. Harmonic Analysis

Shall we continue with Optimization (convex analysis, duality, variational methods)? And then perhaps conclude with any remaining threads?