# Chapter 2: Noise and Smoothing > *"Differentiation amplifies noise. Smoothing reduces noise but introduces bias. There is no free lunch."* ## Connection to the Central Thesis Real measurements are never clean. Sensors have precision limits. Environments fluctuate. Sampling introduces quantization. The data we observe is: $$y_i = x(t_i) + \epsilon_i$$ where x(t) is the true state and ε is measurement noise. This creates a fundamental tension: **differentiation amplifies noise**, but **smoothing distorts the signal**. The approximation paradigm addresses this by treating numerical differentiation as a **regularized inverse problem**—we accept some bias to control variance. The smoothing parameter encodes our belief about the true system: "the underlying dynamics are probably smooth, so extreme oscillations in the derivative are likely noise, not signal." ## The Fundamental Tradeoff Every numerical differentiation method faces the same dilemma: - **No smoothing**: Derivatives are noisy (high variance) - **Heavy smoothing**: Derivatives are biased (systematic error) This is the **bias-variance tradeoff** applied to differentiation. ## Why Differentiation Amplifies Noise ### Frequency Domain Perspective Differentiation in the frequency domain is multiplication by iω: $$\mathcal{F}[f'(t)] = i\omega \cdot \mathcal{F}[f(t)]$$ High-frequency components get amplified. Noise is typically high-frequency. Therefore, differentiation amplifies noise. ### Quantitative Analysis If your signal is y(t) = f(t) + ε(t) where ε is white noise with power spectral density S_ε: - Signal derivative power: |iω|² S_f(ω) = ω² S_f(ω) - Noise derivative power: |iω|² S_ε = ω² S_ε The signal-to-noise ratio after differentiation: $$\text{SNR}_{\text{derivative}} = \frac{\omega^2 S_f(\omega)}{\omega^2 S_\epsilon} = \frac{S_f(\omega)}{S_\epsilon}$$ The SNR is unchanged only if signal and noise have the same spectrum. In practice, noise dominates at high frequencies, so differentiation destroys SNR. ## Smoothing as Regularization ### The Regularization Framework Instead of solving: $$\text{Find } f' \text{ such that } f \text{ matches data}$$ Solve: $$\text{Minimize } \|f - \text{data}\|^2 + \lambda \cdot \text{Roughness}(f)$$ where λ controls the bias-variance tradeoff: - λ = 0: Interpolate exactly (high variance) - λ → ∞: Maximally smooth (high bias) ### Common Roughness Penalties | Penalty | Formula | Effect | |---------|---------|--------| | First derivative | ∫(f')² dt | Penalizes slopes | | Second derivative | ∫(f'')² dt | Penalizes curvature | | Total variation | ∫|f'| dt | Preserves edges | The second derivative penalty is most common—it produces **cubic smoothing splines**. ## Smoothing Methods ### 1. Moving Average The simplest smoother: replace each point with the average of its neighbors. ```python import numpy as np def moving_average(y, window): """Simple moving average.""" kernel = np.ones(window) / window return np.convolve(y, kernel, mode='same') ``` **Properties**: - Reduces variance by factor of 1/window - Introduces phase lag - Blurs sharp features - Not optimal for any particular noise model ### 2. Savitzky-Golay Filter Fit a polynomial locally, use the polynomial's value (or derivative) at center. ```python from scipy.signal import savgol_filter # Smooth and differentiate in one step y_smooth = savgol_filter(y, window_length=11, polyorder=3) dy_smooth = savgol_filter(y, window_length=11, polyorder=3, deriv=1, delta=dt) ``` **Properties**: - Preserves moments up to polynomial order - Better edge preservation than moving average - Analytical derivatives from fitted polynomial - Window size and polynomial order are tuning parameters ### 3. Gaussian Smoothing Convolve with a Gaussian kernel: $$y_{\text{smooth}}(t) = \int y(\tau) \cdot \frac{1}{\sqrt{2\pi}\sigma} e^{-(t-\tau)^2/2\sigma^2} d\tau$$ ```python from scipy.ndimage import gaussian_filter1d y_smooth = gaussian_filter1d(y, sigma=2.0) ``` **Properties**: - Optimal for Gaussian noise (in MSE sense) - No ringing artifacts - σ controls smoothing strength - Derivative of Gaussian = smoothed derivative ### 4. Kernel Regression (Nadaraya-Watson) Weighted average with kernel weights: $$\hat{f}(x) = \frac{\sum_i K\left(\frac{x - x_i}{h}\right) y_i}{\sum_i K\left(\frac{x - x_i}{h}\right)}$$ where K is a kernel (Gaussian, Epanechnikov, etc.) and h is the bandwidth. ```python from sklearn.kernel_ridge import KernelRidge # Kernel regression kr = KernelRidge(kernel='rbf', gamma=1.0) kr.fit(t.reshape(-1, 1), y) y_smooth = kr.predict(t.reshape(-1, 1)) ``` ### 5. Spline Smoothing Minimize: $$\sum_i (y_i - f(t_i))^2 + \lambda \int (f''(t))^2 dt$$ The solution is a **natural cubic spline** with knots at data points. ```python from scipy.interpolate import UnivariateSpline # Smoothing spline (s controls smoothing) spline = UnivariateSpline(t, y, s=len(y)*noise_variance) y_smooth = spline(t) dy_smooth = spline.derivative()(t) ``` ## The Bias-Variance Decomposition For any estimator f̂ of true function f: $$\mathbb{E}[(f̂ - f)^2] = \text{Bias}^2 + \text{Variance}$$ where: - **Bias** = E[f̂] - f (systematic error from smoothing) - **Variance** = E[(f̂ - E[f̂])²] (random error from noise) ### Visualizing the Tradeoff ```python import numpy as np import matplotlib.pyplot as plt from scipy.interpolate import UnivariateSpline # True function t = np.linspace(0, 2*np.pi, 200) y_true = np.sin(t) dy_true = np.cos(t) # Noisy observations np.random.seed(42) noise = 0.1 y_noisy = y_true + noise * np.random.randn(len(t)) # Different smoothing levels smoothing_values = [0.01, 0.1, 1.0, 10.0] fig, axes = plt.subplots(2, 2, figsize=(12, 10)) axes = axes.flatten() for ax, s in zip(axes, smoothing_values): spline = UnivariateSpline(t, y_noisy, s=s*len(t)) dy_est = spline.derivative()(t) ax.plot(t, dy_true, 'k-', label='True derivative', linewidth=2) ax.plot(t, dy_est, 'r-', label=f'Estimated (s={s})', alpha=0.8) ax.set_title(f'Smoothing = {s}') ax.legend() ax.set_xlabel('t') ax.set_ylabel("f'(t)") plt.tight_layout() plt.show() ``` ## Choosing the Smoothing Parameter ### 1. Cross-Validation Leave out data points, predict them, minimize prediction error: $$\text{CV}(\lambda) = \frac{1}{n} \sum_{i=1}^n (y_i - \hat{f}_{-i}(x_i))^2$$ where f̂₋ᵢ is fitted without point i. ### 2. Generalized Cross-Validation (GCV) Efficient approximation to leave-one-out CV: $$\text{GCV}(\lambda) = \frac{\frac{1}{n}\sum_i (y_i - \hat{f}(x_i))^2}{(1 - \text{tr}(S)/n)^2}$$ where S is the smoothing matrix (ŷ = Sy). ### 3. AIC/BIC Information criteria that penalize complexity: $$\text{AIC} = n \log(\text{RSS}/n) + 2 \cdot \text{df}$$ $$\text{BIC} = n \log(\text{RSS}/n) + \log(n) \cdot \text{df}$$ where df is the effective degrees of freedom. ### 4. Known Noise Level If you know σ², set smoothing so residual variance ≈ σ²: ```python # For UnivariateSpline, s ≈ n * sigma^2 spline = UnivariateSpline(t, y, s=len(t) * sigma**2) ``` ## Differentiation-Specific Considerations ### Smoothing for Derivatives vs. Values The optimal smoothing for estimating f(x) differs from optimal smoothing for f'(x): - **For values**: Moderate smoothing - **For derivatives**: More smoothing needed (noise amplification) - **For second derivatives**: Even more smoothing Rule of thumb: Increase smoothing parameter by ~2-4x per derivative order. ### Boundary Effects Smoothing methods often perform poorly near boundaries: - Less data available for local averaging - Splines may oscillate - Derivatives especially unreliable **Solutions**: - Extend data with padding (reflection, extrapolation) - Use boundary-aware methods - Trim boundary regions from analysis ### Preserving Features Heavy smoothing can destroy important features: - Peaks get flattened - Edges get blurred - Oscillations get damped **Solutions**: - Adaptive smoothing (less smoothing where curvature is high) - Edge-preserving methods (total variation, bilateral filter) - Physics-informed constraints ## PyDelt's Smoothing Options ```python from pydelt.interpolation import SplineInterpolator, LowessInterpolator # Spline with explicit smoothing spline = SplineInterpolator(smoothing=0.1) spline.fit(t, y) dy = spline.differentiate(order=1)(t) # LOWESS (robust to outliers) lowess = LowessInterpolator(frac=0.1) lowess.fit(t, y) dy = lowess.differentiate(order=1)(t) ``` ## Practical Guidelines ### When to Use What | Situation | Recommended Method | |-----------|-------------------| | Gaussian noise, smooth function | Spline smoothing | | Outliers present | LOWESS/LOESS | | Sharp features to preserve | Savitzky-Golay or adaptive | | Very high noise | Heavy smoothing + accept bias | | Unknown noise level | Cross-validation | ### Diagnostic Checks 1. **Residual analysis**: Residuals should look like noise, not signal 2. **Sensitivity analysis**: How much do results change with smoothing parameter? 3. **Physical plausibility**: Do derivatives make sense (sign, magnitude)? ## Key Takeaways 1. **Differentiation amplifies noise** by factor ~1/h or ~ω in frequency domain 2. **Smoothing reduces variance** but introduces bias 3. **Optimal smoothing** depends on noise level and derivative order 4. **Cross-validation** provides data-driven smoothing selection 5. **Boundaries and features** require special attention ## Exercises 1. **Bias-variance visualization**: For sin(x) + noise, plot bias² and variance of derivative estimate vs. smoothing parameter. Find the optimal smoothing. 2. **Cross-validation**: Implement leave-one-out CV for spline smoothing. Compare to GCV. 3. **Derivative order**: For the same data, find optimal smoothing for f, f', and f''. How do they differ? 4. **Outlier robustness**: Add outliers to your data. Compare spline vs. LOWESS derivative estimates. --- *Previous: [← Numerical Differentiation](01_numerical_differentiation.md) | Next: [Interpolation Methods →](03_interpolation_methods.md)*