Harmonic Analysis: Leverage Map
A. EXISTENCE JUSTIFICATION
Every signal is a superposition of pure tones.
The physical origins: Vibrating strings produce overtones. Heat diffuses through materials. Light waves interfere. Sound is pressure oscillation. In each case, understanding requires decomposing complex phenomena into simple periodic components.
Fourier’s audacious claim (1807): ANY function can be written as a sum of sines and cosines:
$$f(x) = \frac{a_0}{2} + \sum_{n=1}^{\infty} \left(a_n \cos(nx) + b_n \sin(nx)\right)$$
Mathematicians were skeptical. What does “any” mean? How does the sum converge? For discontinuous functions? These questions drove 19th-century analysis.
The vindication: Fourier was essentially right, but the precise statements required new mathematics:
- Lebesgue integration (for L² convergence)
- Distribution theory (for pointwise issues)
- Functional analysis (for operator theory)
Why harmonic analysis is fundamental:
The Fourier transform is arguably the most important single operation in applied mathematics:
- Converts differentiation to multiplication
- Converts convolution to multiplication
- Diagonalizes translation-invariant operators
- Reveals frequency content
- Connects local and global behavior
The modern view: Harmonic analysis is the study of symmetry through analysis. On ℝⁿ, translation symmetry gives Fourier analysis. On other groups, representation theory gives generalized harmonic analysis. The theme: exploit symmetry to decompose functions.
B. CORE OBJECTS & MORPHISMS
The Fourier transform (various conventions):
| Domain | Transform | Inverse |
|---|---|---|
| L¹(ℝ) | f̂(ξ) = ∫ f(x)e^{-2πixξ} dx | f(x) = ∫ f̂(ξ)e^{2πixξ} dξ |
| L²(ℝ) | Same, extended by density | Plancherel: ‖f̂‖₂ = ‖f‖₂ |
| 𝕋 = ℝ/ℤ (periodic) | f̂(n) = ∫₀¹ f(x)e^{-2πinx} dx | f(x) = Σₙ f̂(n)e^{2πinx} |
| ℤ (sequences) | f̂(θ) = Σₙ f(n)e^{-2πinθ} | f(n) = ∫₀¹ f̂(θ)e^{2πinθ} dθ |
| Finite ℤ/Nℤ | f̂(k) = Σₙ f(n)e^{-2πink/N} | f(n) = (1/N)Σₖ f̂(k)e^{2πink/N} |
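The finite case in the last row is exactly what FFT libraries compute. A quick sketch (NumPy assumed) checking the formula against `np.fft.fft`, which uses the same sign and normalization convention:

```python
import numpy as np

def dft(f):
    """Naive DFT from the table: f_hat(k) = sum_n f(n) exp(-2*pi*i*n*k/N)."""
    N = len(f)
    n = np.arange(N)
    # N x N matrix with entries e^{-2 pi i n k / N}
    W = np.exp(-2j * np.pi * np.outer(n, n) / N)
    return W @ f

rng = np.random.default_rng(0)
f = rng.standard_normal(64)
assert np.allclose(dft(f), np.fft.fft(f))   # same convention as the table
# Inverse row: f(n) = (1/N) sum_k f_hat(k) e^{+2 pi i n k / N}
assert np.allclose(np.fft.ifft(dft(f)), f)
```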
Core objects:
| Object | What it is | Notation |
|---|---|---|
| Fourier transform | Decomposition into frequencies | f̂, ℱf, F[f] |
| Fourier series | Periodic case, discrete frequencies | Σ cₙe^{inx} |
| Convolution | (f * g)(x) = ∫ f(y)g(x-y) dy | f * g |
| Singular integral | Principal value integral with singular kernel | Hf, Tf |
| Maximal function | Mf(x) = sup_{r>0} (1/\|B_r\|) ∫_{B_r(x)} \|f(y)\| dy | Mf |
| Multiplier | Operator defined by m(ξ)f̂(ξ) | T_m |
| Distribution | Generalized function (continuous linear functional on test functions) | u ∈ 𝒟' |
| Tempered distribution | Distribution with polynomial growth control | u ∈ 𝒮' |
| Wavelet | Localized oscillating function | ψ |
| Frame | Overcomplete “basis” with reconstruction | {ψⱼ} |
Function spaces:
| Space | Definition | Norm |
|---|---|---|
| Lᵖ(ℝⁿ) | ∫|f|ᵖ < ∞ | ‖f‖_p = (∫|f|ᵖ)^{1/p} |
| L^∞ | Essential supremum finite | ‖f‖_∞ = ess sup |f| |
| Schwartz 𝒮 | Rapidly decreasing smooth functions | Seminorms on derivatives |
| Hˢ (Sobolev) | s derivatives in L² | ‖f‖_{Hˢ} = ‖(1+|ξ|²)^{s/2}f̂‖₂ |
| BMO | Bounded mean oscillation | sup_Q (1/|Q|)∫_Q |f - f_Q| |
| Hardy Hᵖ | Boundary values of holomorphic functions on the half-plane | ‖f‖_{Hᵖ} = sup_{y>0} ‖F(·+iy)‖_p |
Morphisms:
- Fourier transform: L¹ → C₀, L² → L² (isometry)
- Convolution operators: Lᵖ → Lᵖ
- Singular integrals: Lᵖ → Lᵖ (for 1 < p < ∞)
C. CENTRAL INVARIANTS
The Fourier transform properties:
| Property | Statement |
|---|---|
| Linearity | (αf + βg)^ = αf̂ + βĝ |
| Translation | (τₐf)^ (ξ) = e^{-2πiaξ}f̂(ξ) |
| Modulation | (e^{2πiax}f)^ (ξ) = f̂(ξ-a) |
| Scaling | (f(·/λ))^ (ξ) = λf̂(λξ) |
| Differentiation | (f’)^ (ξ) = 2πiξ f̂(ξ) |
| Convolution | (f * g)^ = f̂ · ĝ |
| Multiplication | (fg)^ = f̂ * ĝ |
| Parseval/Plancherel | ⟨f, g⟩ = ⟨f̂, ĝ⟩, ‖f‖₂ = ‖f̂‖₂ |
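The convolution row can be checked directly on ℤ/Nℤ. A minimal sketch (NumPy assumed); note the DFT sees signals as periodic, so the matching notion is circular convolution:

```python
import numpy as np

def circ_conv(f, g):
    """Circular convolution: (f * g)(n) = sum_m f(m) g((n - m) mod N)."""
    N = len(f)
    return np.array([sum(f[m] * g[(n - m) % N] for m in range(N))
                     for n in range(N)])

rng = np.random.default_rng(1)
f, g = rng.standard_normal(16), rng.standard_normal(16)
# (f * g)^ = f_hat . g_hat, so f * g = ifft(fft(f) * fft(g))
lhs = circ_conv(f, g)
rhs = np.fft.ifft(np.fft.fft(f) * np.fft.fft(g)).real
assert np.allclose(lhs, rhs)
```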
Uncertainty principle:
$$\|xf\|_2 \cdot \|\xi\hat{f}\|_2 \geq \frac{1}{4\pi}\|f\|_2^2$$
Equality iff f is a Gaussian. You cannot be localized in both position and frequency.
Decay and smoothness duality:
| f has… | f̂ has… |
|---|---|
| Compact support | Real analytic (entire) |
| Rapid decay (Schwartz) | Rapid decay (Schwartz) |
| n derivatives in L¹ | (1+|ξ|)ⁿf̂ bounded |
| Jump discontinuity | Slow decay (~1/ξ) |
Key exponents:
| Bound | Meaning |
|---|---|
| ‖f̂‖_∞ ≤ ‖f‖₁ | L¹ → L^∞ |
| ‖f̂‖₂ = ‖f‖₂ | L² → L² isometry |
| ‖f̂‖_q ≤ C‖f‖_p | Hausdorff-Young: 1/p + 1/q = 1, 1 ≤ p ≤ 2 |
What counts as “the same”:
- Functions equal a.e. (in Lᵖ)
- Same Fourier transform (determines function in L¹ ∩ L²)
- Distributions equal if they agree on all test functions
D. SIGNATURE THEOREMS
1. Fourier Inversion Theorem
For f ∈ L¹(ℝ) with f̂ ∈ L¹(ℝ): $$f(x) = \int_{-\infty}^{\infty} \hat{f}(\xi) e^{2\pi i x\xi} d\xi$$
Importance: The Fourier transform is invertible—you can recover f from f̂. Analysis (decomposition) and synthesis (reconstruction) are dual operations.
Caveats: Need f̂ ∈ L¹ for pointwise inversion. In L², inversion is via limit of truncated integrals.
2. Plancherel Theorem
The Fourier transform extends to a unitary operator on L²(ℝ): $$\|\hat{f}\|_2 = \|f\|_2$$
Importance: The Fourier transform preserves energy (L² norm). It’s an isometry, hence has an inverse. This is why L² is the natural home for Fourier analysis.
3. Riemann-Lebesgue Lemma
If f ∈ L¹(ℝ), then f̂ is continuous and f̂(ξ) → 0 as |ξ| → ∞.
Importance: High-frequency components of L¹ functions must decay. This constrains what can be a Fourier transform.
4. Carleson’s Theorem
For f ∈ L²(𝕋), the Fourier series converges to f(x) almost everywhere.
Importance: This resolved a century-old question: does the Fourier series of an L² function converge pointwise? Yes, a.e. (not everywhere—counterexamples exist). The proof (1966) was a tour de force.
Contrast: For L¹ functions, Fourier series can diverge on sets of positive measure (Kolmogorov).
5. Calderón-Zygmund Theory
Singular integral operators of the form $$Tf(x) = \text{p.v.} \int \frac{\Omega(x-y)}{|x-y|^n} f(y) dy$$ are bounded on Lᵖ for 1 < p < ∞ (under conditions on Ω).
Importance: This extends Fourier analysis to handle singular objects. The Hilbert transform, Riesz transforms, and many PDE operators fit this framework. The theory gives Lᵖ bounds without explicit computation.
6. Littlewood-Paley Theory
Decompose f into frequency bands: f = Σⱼ Δⱼf where Δⱼ picks out frequencies ~2ʲ. Then: $$\|f\|_p \sim \left\|\left(\sum_j |\Delta_j f|^2\right)^{1/2}\right\|_p$$
Importance: This “square function” characterization lets you analyze f frequency-by-frequency, then reassemble. It’s fundamental for:
- Bounding multipliers
- Paradifferential calculus
- Wavelets
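A discrete caricature of the decomposition (sharp frequency cutoffs rather than the smooth bumps used in the real theory; NumPy assumed): split a signal into dyadic frequency bands with the FFT and confirm the bands reassemble it.

```python
import numpy as np

def dyadic_bands(f):
    """Split f into pieces Delta_j f supported on |freq| in [2^j, 2^(j+1))."""
    N = len(f)
    F = np.fft.fft(f)
    freqs = np.abs(np.fft.fftfreq(N, d=1.0 / N))   # integer frequencies 0..N/2
    bands = [np.fft.ifft(np.where(freqs == 0, F, 0)).real]  # DC piece
    j = 0
    while 2 ** j <= N // 2:
        mask = (freqs >= 2 ** j) & (freqs < 2 ** (j + 1))
        bands.append(np.fft.ifft(np.where(mask, F, 0)).real)
        j += 1
    return bands

rng = np.random.default_rng(2)
f = rng.standard_normal(128)
bands = dyadic_bands(f)
assert np.allclose(sum(bands), f)   # the bands reassemble f exactly
```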
7. Paley-Wiener Theorem
For f ∈ L²(ℝ): f̂ is supported in [-B, B] iff f extends to an entire function of exponential type 2πB.
Importance: Characterizes bandlimited functions. A signal with no frequencies above B is determined by samples at rate 2B (Nyquist-Shannon sampling theorem follows).
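A sketch of the resulting Shannon reconstruction (sinc interpolation; NumPy assumed, with the infinite interpolation series truncated to a finite window): a signal with no frequencies above B is rebuilt from its samples at rate 2B.

```python
import numpy as np

B = 4.0                         # band limit (Hz); sample at rate 2B
T = 1.0 / (2 * B)               # sampling interval
n = np.arange(-2000, 2001)      # truncated sample window
t_n = n * T

def f(t):
    # bandlimited test signal: np.sinc(6t) has spectrum supported in |xi| <= 3 < B
    return np.sinc(6.0 * t)

def reconstruct(t):
    # Shannon: f(t) = sum_n f(nT) sinc((t - nT)/T), sinc(x) = sin(pi x)/(pi x)
    return np.sum(f(t_n) * np.sinc((t - t_n) / T))

t_test = 0.1234
assert abs(reconstruct(t_test) - f(t_test)) < 1e-3
```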
E. BRIDGES TO OTHER DOMAINS
| Domain | Connection |
|---|---|
| PDEs | Fourier transform converts differential equations to algebraic. Heat kernel, wave propagation, Schrödinger. Pseudodifferential operators. |
| Probability | Characteristic functions ARE Fourier transforms. Central limit theorem via Fourier. Lévy’s inversion. |
| Number Theory | Poisson summation, theta functions, modular forms. Circle method. Analytic number theory. |
| Signal Processing | Sampling, filtering, compression. FFT algorithm. Wavelets. Time-frequency analysis. |
| Representation Theory | Fourier analysis on groups. Peter-Weyl theorem. Harmonic analysis on Lie groups. |
| Functional Analysis | Spectral theory of operators. Banach algebras. Operator algebras. |
| Quantum Mechanics | Position/momentum duality IS Fourier duality. Uncertainty principle. Wigner function. |
| Complex Analysis | Hardy spaces, Paley-Wiener. Analytic signals. Hilbert transform. |
| Machine Learning | Fourier features, spectral methods. Convolutional networks and Fourier. Neural tangent kernel. |
| Image Processing | Fourier filtering, deconvolution. JPEG (DCT). Wavelets (JPEG 2000). |
Pattern-linking gold:
Convolution is multiplication in frequency:
$$(f * g)^\wedge = \hat{f} \cdot \hat{g}$$
This single fact explains:
- Why LTI systems have transfer functions
- Why Gaussians are special (Gaussian × Gaussian = Gaussian)
- Why convolution is associative (multiplication is)
- How to solve constant-coefficient ODEs
The Poisson summation formula:
$$\sum_{n \in \mathbb{Z}} f(n) = \sum_{k \in \mathbb{Z}} \hat{f}(k)$$
Connects discrete sums to Fourier transforms. Applications:
- Theta function transformation
- Number theory (counting lattice points)
- Sampling theory
- Modular forms
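A numerical check of the formula with a Gaussian, where both sides converge fast (NumPy assumed). With f(x) = e^{-πax²}, the transform is f̂(ξ) = a^{-1/2} e^{-πξ²/a}:

```python
import numpy as np

a = 0.5
n = np.arange(-50, 51)
lhs = np.sum(np.exp(-np.pi * a * n**2))               # sum_n f(n)
rhs = np.sum(np.exp(-np.pi * n**2 / a)) / np.sqrt(a)  # sum_k f_hat(k)
assert abs(lhs - rhs) < 1e-12   # Poisson summation (theta transformation)
```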
Scale-frequency tradeoff:
Localizing in space spreads in frequency; localizing in frequency spreads in space. This is the uncertainty principle, manifested everywhere:
- Heisenberg (quantum mechanics)
- Gabor limit (time-frequency)
- Wavelet scaling
F. COMMON MISCONCEPTIONS
“Every function has a Fourier transform” — In the classical sense, you need integrability conditions. The transform extends to all tempered distributions, but with care.
“Fourier series converge pointwise” — For L² yes (a.e., Carleson). For L¹, can diverge on positive measure sets. For continuous functions, can diverge at points (du Bois-Reymond).
“The Fourier transform is THE decomposition” — It’s one of many: wavelets, Gabor, curvelets, shearlets. Each optimized for different structure (singularities, edges, anisotropy).
“Wavelets replaced Fourier” — They complement, not replace. Fourier for stationary/periodic signals, wavelets for transient/multiscale. Both have their domains.
“FFT is an approximation” — The Fast Fourier Transform computes the EXACT discrete Fourier transform in O(n log n) instead of O(n²). No approximation, just clever factorization.
“High frequencies = noise” — High frequencies carry edges, fine detail. Removing them blurs. “Noise” is problem-dependent, not frequency-determined.
“Uncertainty principle is about measurement” — The mathematical uncertainty principle is about the spread of f and f̂. The quantum version relates to non-commuting observables. Related but distinct.
“Singular integrals are pathological” — They’re natural and ubiquitous: Hilbert transform, Riesz transforms, Cauchy integral. The theory (Calderón-Zygmund) makes them tractable.
G. NOTATION SURVIVAL KIT
| Symbol | Meaning |
|---|---|
| f̂ or ℱf | Fourier transform of f |
| f̌ or ℱ⁻¹f | Inverse Fourier transform |
| f * g | Convolution |
| τₐf | Translation: (τₐf)(x) = f(x-a) |
| Mf | Hardy-Littlewood maximal function |
| Hf | Hilbert transform |
| Δⱼ | Littlewood-Paley projection to frequency ~2ʲ |
| 𝒮 | Schwartz space |
| 𝒮' | Tempered distributions |
| 𝒟 | Test functions (compact support smooth) |
| 𝒟' | Distributions |
| Lᵖ | Lebesgue space |
| Hˢ | Sobolev space |
| BMO | Bounded mean oscillation |
| Hᵖ | Hardy space |
| ξ, ω, k | Frequency variables |
| δ | Dirac delta (distribution) |
| p.v. | Principal value |
| supp | Support of function |
Key transforms:
| Transform | Definition | Pair |
|---|---|---|
| Fourier | f̂(ξ) = ∫ f(x)e^{-2πixξ} dx | (f, f̂) |
| Laplace | F(s) = ∫₀^∞ f(t)e^{-st} dt | (f, F) |
| Mellin | φ(s) = ∫₀^∞ f(x)x^{s-1} dx | Multiplicative Fourier |
| Hilbert | Hf(x) = p.v. (1/π) ∫ f(t)/(x-t) dt | Phase shift by π/2 |
| Wavelet | Wf(a,b) = ∫ f(t)ψ*((t-b)/a) dt/√a | (position, scale) |
H. ONE WORKED MICRO-EXAMPLE
Solving the heat equation via Fourier transform
Setup: Heat equation on ℝ:
$$\frac{\partial u}{\partial t} = \frac{\partial^2 u}{\partial x^2}, \quad u(x, 0) = f(x)$$
Take Fourier transform in x:
Let û(ξ, t) = ∫ u(x, t)e^{-2πixξ} dx.
The equation becomes:
$$\frac{\partial \hat{u}}{\partial t} = (2\pi i\xi)^2 \hat{u} = -4\pi^2\xi^2 \hat{u}$$
This is an ODE in t!
Solve the ODE:
$$\hat{u}(\xi, t) = \hat{u}(\xi, 0) e^{-4\pi^2\xi^2 t} = \hat{f}(\xi) e^{-4\pi^2\xi^2 t}$$
Inverse transform:
$$u(x, t) = \int \hat{f}(\xi) e^{-4\pi^2\xi^2 t} e^{2\pi ix\xi} d\xi$$
Recognize the convolution:
The function e^{-4π²ξ²t} is the Fourier transform of the Gaussian:
$$g_t(x) = \frac{1}{\sqrt{4\pi t}} e^{-x^2/(4t)}$$
(the “heat kernel”)
By convolution theorem:
$$u(x, t) = (f * g_t)(x) = \int f(y) \frac{1}{\sqrt{4\pi t}} e^{-(x-y)^2/(4t)} dy$$
Interpretation:
The solution at time t is the initial data f convolved with a Gaussian of width √t. Heat “spreads out” via Gaussian averaging. The Fourier transform diagonalized the Laplacian!
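The derivation above, sketched numerically (NumPy assumed) on a large periodic grid standing in for ℝ: evolve û by the multiplier e^{-4π²ξ²t} and compare against direct convolution with the heat kernel.

```python
import numpy as np

L, N, t = 20.0, 512, 0.1                 # domain [-L/2, L/2), grid size, time
dx = L / N
x = np.linspace(-L / 2, L / 2, N, endpoint=False)
xi = np.fft.fftfreq(N, d=dx)             # frequencies for the e^{-2 pi i x xi} convention

f = np.exp(-x**2)                        # fast-decaying initial data

# Spectral solution: u_hat(xi, t) = f_hat(xi) * exp(-4 pi^2 xi^2 t)
u_spec = np.fft.ifft(np.fft.fft(f) * np.exp(-4 * np.pi**2 * xi**2 * t)).real

# Direct solution: Riemann-sum convolution of f with the heat kernel g_t
heat_kernel = lambda s: np.exp(-s**2 / (4 * t)) / np.sqrt(4 * np.pi * t)
u_conv = dx * np.array([np.sum(f * heat_kernel(xx - x)) for xx in x])

assert np.allclose(u_spec, u_conv, atol=1e-6)
```

Both routes agree because the multiplier e^{-4π²ξ²t} is exactly ĝ_t, and the tails of f and g_t are negligible on this domain.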
Micro-example 2: The Hilbert transform
Definition:
$$Hf(x) = \text{p.v.} \frac{1}{\pi} \int_{-\infty}^{\infty} \frac{f(t)}{x - t} dt$$
The “principal value” means: limit as ε → 0 of ∫_{|x-t| > ε}.
Fourier multiplier:
$$\widehat{Hf}(\xi) = -i \cdot \text{sgn}(\xi) \cdot \hat{f}(\xi)$$
So H multiplies positive frequencies by -i (phase shift -π/2) and negative frequencies by +i (phase shift +π/2).
Key properties:
- H² = -I (applying twice gives negative identity)
- H is bounded on Lᵖ for 1 < p < ∞
- H is NOT bounded on L¹ or L^∞
- For f real, f + iHf is the “analytic signal” (has only positive frequencies)
Connection to complex analysis:
If F = f + iHf (analytic signal), then F extends to a holomorphic function in the upper half-plane with boundary values F(x).
The Hilbert transform extracts the imaginary part from the boundary values of a holomorphic function with given real part.
Importance:
- Signal processing: instantaneous phase and amplitude
- PDEs: Cauchy-Riemann equations
- Singular integral theory: prototype singular integral
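The multiplier description translates directly into code. A sketch of a discrete Hilbert transform via the FFT (NumPy assumed), checked on the identity H(cos) = sin:

```python
import numpy as np

def hilbert(f):
    """Discrete Hilbert transform: multiply by -i sgn(xi) in frequency."""
    N = len(f)
    F = np.fft.fft(f)
    m = np.zeros(N, dtype=complex)
    m[1:N // 2] = -1j          # positive frequencies: phase shift -pi/2
    m[N // 2 + 1:] = 1j        # negative frequencies: phase shift +pi/2
    # DC (and Nyquist, for even N) have no well-defined sign: left at 0
    return np.fft.ifft(m * F).real

N = 256
theta = 2 * np.pi * np.arange(N) * 3 / N   # 3 cycles over the grid
assert np.allclose(hilbert(np.cos(theta)), np.sin(theta))
# H^2 = -I on mean-zero signals without a Nyquist component
assert np.allclose(hilbert(hilbert(np.cos(theta))), -np.cos(theta))
```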
Micro-example 3: Wavelets
The problem with Fourier:
Fourier analysis is global—each frequency component extends over all time. For signals with transient features (edges, bursts), Fourier doesn’t localize well.
The wavelet idea:
Use basis functions localized in BOTH time and frequency:
$$\psi_{a,b}(t) = \frac{1}{\sqrt{a}} \psi\left(\frac{t-b}{a}\right)$$
- b = position (translation)
- a = scale (dilation)—large a = low frequency, small a = high frequency
The wavelet transform:
$$Wf(a, b) = \int f(t) \overline{\psi_{a,b}(t)} dt$$
This gives a time-scale representation (vs. Fourier’s time-frequency with infinite time support).
Reconstruction:
Under admissibility condition (∫|ψ̂(ξ)|²/|ξ| dξ < ∞):
$$f(t) = C_\psi^{-1} \int_0^\infty \int_{-\infty}^{\infty} Wf(a,b) \psi_{a,b}(t) \frac{db\,da}{a^2}$$
Discrete wavelets (orthonormal bases):
Daubechies constructed wavelets ψ such that {2^{j/2}ψ(2ʲt - k)}_{j,k ∈ ℤ} is an orthonormal basis for L²(ℝ).
Multiresolution analysis:
Build a ladder of approximation spaces V₀ ⊂ V₁ ⊂ V₂ ⊂ … where Vⱼ resolves features at scale 2⁻ʲ. The wavelets span the “details” Wⱼ = Vⱼ₊₁ ⊖ Vⱼ.
Why wavelets matter:
- Compression (JPEG 2000): wavelets concentrate energy in few coefficients
- Denoising: threshold small wavelet coefficients
- Multiscale analysis: natural for fractals, turbulence
- Fast algorithms: O(n) discrete wavelet transform
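A minimal sketch of the simplest orthonormal case, the Haar wavelet: one level of (normalized) averages and differences, with perfect reconstruction and energy preservation.

```python
import numpy as np

def haar_forward(f):
    """One level of the Haar DWT: orthonormal averages and details."""
    a = (f[0::2] + f[1::2]) / np.sqrt(2)   # approximation coefficients (V_j)
    d = (f[0::2] - f[1::2]) / np.sqrt(2)   # detail coefficients (W_j)
    return a, d

def haar_inverse(a, d):
    f = np.empty(2 * len(a))
    f[0::2] = (a + d) / np.sqrt(2)
    f[1::2] = (a - d) / np.sqrt(2)
    return f

rng = np.random.default_rng(3)
f = rng.standard_normal(64)
a, d = haar_forward(f)
assert np.allclose(haar_inverse(a, d), f)                     # perfect reconstruction
assert np.isclose(np.sum(a**2) + np.sum(d**2), np.sum(f**2))  # orthonormality
```

Denoising by thresholding amounts to zeroing small entries of `d` before inverting; recursing on `a` gives the full multilevel transform.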
Harmonic Analysis on Groups
The general pattern:
| Group | “Frequencies” | Transform |
|---|---|---|
| ℝ | ℝ (continuous) | Fourier transform |
| 𝕋 = ℝ/ℤ | ℤ (discrete) | Fourier series |
| ℤ | 𝕋 (continuous) | Discrete-time Fourier |
| ℤ/Nℤ | ℤ/Nℤ | Discrete Fourier (DFT) |
| General LCA group G | Dual group Ĝ | Pontryagin duality |
| Compact group G | Irreducible representations | Peter-Weyl theorem |
| Noncompact Lie group | Unitary dual (complicated!) | Plancherel measure |
Pontryagin duality:
For locally compact abelian (LCA) groups:
- The dual Ĝ = continuous homomorphisms G → 𝕋
- The dual of the dual is canonically G (biduality)
- Fourier transform: L¹(G) → C₀(Ĝ)
Peter-Weyl theorem (compact groups):
Matrix coefficients of irreducible representations form an orthonormal basis of L²(G).
This IS Fourier analysis: for G = 𝕋, the irreps are χₙ(θ) = e^{inθ}, and matrix coefficients are the Fourier basis.
For G = SO(3): irreps are the Wigner D-matrices, basis functions are spherical harmonics.
Leverage for your work:
Fourier features in neural networks:
Random Fourier features: approximate kernel k(x,y) = k(x-y) via:
$$k(x-y) \approx \frac{2}{D} \sum_{j=1}^{D} \cos(\omega_j \cdot x + b_j) \cos(\omega_j \cdot y + b_j)$$
where the ωⱼ are sampled from the Fourier transform of k (a probability density, by Bochner's theorem) and the bⱼ are uniform on [0, 2π]. This makes kernel methods scalable.
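A sketch for the Gaussian kernel (NumPy assumed), using √(2/D)-scaled cosine features so that the expected inner product matches the kernel exactly; the test points and seed are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)
D, d = 20000, 2                       # number of random features, input dimension

# Gaussian kernel k(x, y) = exp(-||x - y||^2 / 2); its Fourier transform
# (the spectral density) is a standard Gaussian, so sample omega ~ N(0, I)
omega = rng.standard_normal((D, d))
b = rng.uniform(0, 2 * np.pi, D)

def features(x):
    return np.sqrt(2.0 / D) * np.cos(omega @ x + b)

x, y = np.array([0.3, -1.0]), np.array([0.8, 0.2])
k_exact = np.exp(-0.5 * np.sum((x - y)**2))
k_approx = features(x) @ features(y)   # Monte Carlo error ~ 1/sqrt(D)
assert abs(k_exact - k_approx) < 0.05
```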
The Neural Tangent Kernel is analyzed via Fourier:
The infinite-width limit of neural networks has spectral properties determined by the architecture. Fourier analysis of the NTK reveals which frequencies the network learns fast/slow.
Positional encodings:
Transformer positional encodings use sin/cos at different frequencies:
$$PE(pos, 2i) = \sin(pos / 10000^{2i/d})$$ $$PE(pos, 2i+1) = \cos(pos / 10000^{2i/d})$$
This IS a Fourier basis! The network can learn to attend to relative positions because sin(a+b) decomposes nicely.
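A sketch of the encoding and the property it buys (NumPy assumed; `d_model = 8` is illustrative): each (sin, cos) pair at position pos + k is a fixed rotation of the pair at pos, so relative offsets act linearly on the encoding.

```python
import numpy as np

d_model = 8

def pos_encoding(pos):
    i = np.arange(d_model // 2)
    freq = 1.0 / 10000 ** (2 * i / d_model)
    pe = np.empty(d_model)
    pe[0::2] = np.sin(pos * freq)
    pe[1::2] = np.cos(pos * freq)
    return pe

def rotation(offset):
    """Block-diagonal rotation mapping PE(pos) to PE(pos + offset)."""
    i = np.arange(d_model // 2)
    freq = 1.0 / 10000 ** (2 * i / d_model)
    R = np.zeros((d_model, d_model))
    for j, w in enumerate(freq):
        c, s = np.cos(offset * w), np.sin(offset * w)
        # angle-addition: [sin((p+o)w), cos((p+o)w)] = [[c, s], [-s, c]] @ [sin(pw), cos(pw)]
        R[2 * j:2 * j + 2, 2 * j:2 * j + 2] = [[c, s], [-s, c]]
    return R

assert np.allclose(rotation(7) @ pos_encoding(5), pos_encoding(12))
```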
Spectral graph theory:
Graph Laplacian L = D - A has eigenvalues and eigenvectors. The eigenvectors are “Fourier modes” on the graph. Graph convolution = multiplication in this spectral domain.
GNNs often work in this spectral basis. Understanding graph Fourier analysis is understanding GNNs.
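A sketch on a path graph (NumPy assumed), where the Laplacian eigenvectors play the role of Fourier modes: analysis expands a signal in them, synthesis inverts it, and "low-pass filtering" keeps the smooth (small-eigenvalue) modes.

```python
import numpy as np

N = 8
A = np.diag(np.ones(N - 1), 1) + np.diag(np.ones(N - 1), -1)  # path graph adjacency
L = np.diag(A.sum(axis=1)) - A                                 # Laplacian L = D - A

evals, evecs = np.linalg.eigh(L)                 # ascending eigenvalues
assert np.all(evals >= -1e-10)                   # L is positive semidefinite
assert np.allclose(evecs[:, 0], evecs[0, 0])     # lowest mode is constant (lambda = 0)

signal = np.sin(np.linspace(0, np.pi, N)) + 0.1 * np.cos(7 * np.arange(N))
coeffs = evecs.T @ signal                        # graph Fourier analysis
smooth = evecs[:, :3] @ coeffs[:3]               # keep the 3 smoothest modes
assert np.allclose(evecs @ coeffs, signal)       # synthesis inverts analysis
assert np.linalg.norm(smooth - signal) < np.linalg.norm(signal)
```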
Attention and convolution:
Attention is NOT convolution (not translation-equivariant). But analyzing attention in Fourier domain reveals its frequency preferences. Self-attention can implement convolution as a special case.
Scale-space and neural architectures:
Wavelets do multiscale analysis. CNNs with pooling do something similar—each layer operates at coarser scale. The connection is precise for certain architectures.
Uncertainty principle and representations:
You can’t have representations that are simultaneously localized in input space AND in “feature frequency” space. This constrains what neural representations can achieve.
Time-frequency analysis for sequences:
For temporal data (language, audio, video), Fourier and wavelet methods reveal structure. Transformers might be learning time-frequency representations implicitly.
The Fourier transform of the loss landscape:
Neural network loss landscapes have structure at multiple scales. Fourier analysis of the loss (as function of weights) might reveal optimization dynamics.
Now at 20 domains!
- Representation Theory
- Measure Theory
- Homotopy Type Theory
- Category Theory
- Spectral Theory
- Information Geometry
- Differential Geometry
- Algebraic Topology
- Lie Theory
- Dynamical Systems
- Algebraic Geometry
- Functional Analysis
- Complex Analysis
- Number Theory
- Mathematical Physics
- Information Theory
- Probability Theory
- Logic and Foundations
- Combinatorics
- Harmonic Analysis
Shall we continue with Optimization (convex analysis, duality, variational methods)? And then perhaps conclude with any remaining threads?