# Chapter 3: Interpolation Methods

> *"To differentiate discrete data, first make it continuous. The choice of interpolation method determines everything."*

## Connection to the Central Thesis

This chapter is the heart of the approximation paradigm. **Interpolation creates the differentiable surrogate** that makes everything else possible.

The true system evolves according to unknown dynamics. We observe discrete samples. By fitting an interpolant, we construct a continuous, differentiable function that approximates the true trajectory. This surrogate:

- **Exists everywhere** in the domain (not just at data points)
- **Has well-defined derivatives** (by construction)
- **Encodes smoothness assumptions** about the true system
- **Yields to standard calculus** tools

The choice of interpolation method is the most consequential decision in numerical differentiation. It determines what class of functions you're searching over, what smoothness you assume, and what errors you'll incur.

## The Interpolation-Differentiation Pipeline

The standard approach to numerical differentiation:

1. **Fit** a continuous function f̂(x) to discrete data (xᵢ, yᵢ)
2. **Differentiate** f̂(x) analytically or numerically
3. **Evaluate** f̂'(x) at desired points

The interpolation method determines:
- Smoothness of derivatives
- Bias-variance tradeoff
- Computational cost
- Behavior at boundaries

## Polynomial Interpolation

### Lagrange Interpolation

Given n+1 points, there's a unique polynomial of degree n passing through all of them:

$$p(x) = \sum_{i=0}^{n} y_i \prod_{j \neq i} \frac{x - x_j}{x_i - x_j}$$

**Problem**: High-degree polynomials oscillate wildly (Runge's phenomenon).

```python
import numpy as np
import matplotlib.pyplot as plt

# Runge's function
f = lambda x: 1 / (1 + 25*x**2)

# Interpolate with increasing degree
x = np.linspace(-1, 1, 1000)
for n in [5, 10, 15]:
    xi = np.linspace(-1, 1, n+1)
    yi = f(xi)
    
    # Lagrange interpolation
    from numpy.polynomial import polynomial as P
    coeffs = np.polyfit(xi, yi, n)
    p = np.polyval(coeffs, x)
    
    plt.plot(x, p, label=f'degree {n}')

plt.plot(x, f(x), 'k--', label='true', linewidth=2)
plt.ylim(-1, 2)
plt.legend()
plt.title("Runge's Phenomenon")
plt.show()
```

**Lesson**: Global polynomial interpolation is unstable. Use piecewise methods instead.

## Spline Interpolation

### What is a Spline?

A spline of degree k is a piecewise polynomial where:
- Each piece is degree k
- Function is Cᵏ⁻¹ continuous (k-1 continuous derivatives)
- Pieces join at **knots**

### Cubic Splines (k=3)

Most common choice. Between knots xᵢ and xᵢ₊₁:

$$S_i(x) = a_i + b_i(x-x_i) + c_i(x-x_i)^2 + d_i(x-x_i)^3$$

Constraints:
- Interpolation: S(xᵢ) = yᵢ
- C¹ continuity: S'ᵢ(xᵢ₊₁) = S'ᵢ₊₁(xᵢ₊₁)
- C² continuity: S''ᵢ(xᵢ₊₁) = S''ᵢ₊₁(xᵢ₊₁)
- Boundary conditions (natural, clamped, not-a-knot)

### Natural Cubic Splines

Boundary condition: S''(x₀) = S''(xₙ) = 0

This minimizes the "total curvature":

$$\int_{x_0}^{x_n} (S''(x))^2 dx$$

among all C² interpolants.

### Smoothing Splines

Don't interpolate exactly—balance fit and smoothness:

$$\min_f \sum_{i=1}^n (y_i - f(x_i))^2 + \lambda \int (f''(x))^2 dx$$

The solution is a natural cubic spline with knots at all data points.

```python
from scipy.interpolate import UnivariateSpline
import numpy as np

t = np.linspace(0, 2*np.pi, 100)
y = np.sin(t) + 0.1 * np.random.randn(100)

# Interpolating spline (s=0)
spline_interp = UnivariateSpline(t, y, s=0)

# Smoothing spline
spline_smooth = UnivariateSpline(t, y, s=1.0)

# Derivatives are analytical
dy_smooth = spline_smooth.derivative()(t)
d2y_smooth = spline_smooth.derivative(n=2)(t)
```

### B-Splines

Splines represented in the B-spline basis:

$$S(x) = \sum_{i} c_i B_{i,k}(x)$$

where Bᵢ,ₖ are B-spline basis functions of degree k.

**Advantages**:
- Numerically stable
- Local support (changing one coefficient affects limited region)
- Efficient algorithms (de Boor's algorithm)

```python
from scipy.interpolate import BSpline, make_interp_spline

# Create B-spline interpolant
spline = make_interp_spline(t, y, k=3)

# Evaluate and differentiate
y_interp = spline(t_new)
dy_interp = spline.derivative()(t_new)
```

## Local Polynomial Regression

### The Idea

Instead of fitting one global function, fit local polynomials around each query point.

At query point x₀:
1. Select nearby data points
2. Fit weighted polynomial (weights decrease with distance)
3. Use polynomial value/derivative at x₀

### LOWESS/LOESS

**LO**cally **W**eighted **S**catterplot **S**moothing:

$$\min_{\beta} \sum_{i=1}^n w_i(x_0) (y_i - \beta_0 - \beta_1(x_i - x_0) - ...)^2$$

where weights w_i(x₀) decrease with |xᵢ - x₀|.

Common weight function (tricube):

$$w(u) = \begin{cases} (1 - |u|^3)^3 & |u| < 1 \\ 0 & |u| \geq 1 \end{cases}$$

```python
from pydelt.interpolation import LowessInterpolator

# LOWESS with 10% of data in each local fit
lowess = LowessInterpolator(frac=0.1)
lowess.fit(t, y)

# Smooth values and derivatives
y_smooth = lowess(t)
dy_smooth = lowess.differentiate(order=1)(t)
```

### Local Linear Approximation (LLA)

PyDelt's LLA fits local polynomials with explicit derivative estimation:

```python
from pydelt.interpolation import LlaInterpolator

# Window-based local fitting
lla = LlaInterpolator(window_size=11, poly_order=2)
lla.fit(t, y)

# First and second derivatives
dy = lla.differentiate(order=1)(t)
d2y = lla.differentiate(order=2)(t)
```

### Comparison: Global vs. Local

| Aspect | Global (Splines) | Local (LOWESS/LLA) |
|--------|------------------|-------------------|
| Smoothness | C² everywhere | May have discontinuities |
| Adaptivity | Same smoothing everywhere | Can adapt to local features |
| Computation | O(n) after setup | O(n²) naive, O(n log n) with trees |
| Extrapolation | Dangerous | Very dangerous |
| Outlier robustness | Poor | Good (with robust weights) |

## Kernel Methods

### Kernel Smoothing

Generalization of local averaging:

$$\hat{f}(x) = \frac{\sum_i K_h(x - x_i) y_i}{\sum_i K_h(x - x_i)}$$

where Kₕ(u) = K(u/h)/h is a scaled kernel.

### Common Kernels

| Kernel | Formula | Properties |
|--------|---------|------------|
| Gaussian | exp(-u²/2) | Infinitely smooth |
| Epanechnikov | (3/4)(1-u²) for |u|<1 | Optimal MSE |
| Tricube | (1-|u|³)³ for |u|<1 | Used in LOESS |
| Uniform | 1 for |u|<1 | Moving average |

### Bandwidth Selection

The bandwidth h controls bias-variance:
- Small h: Low bias, high variance
- Large h: High bias, low variance

Optimal bandwidth (for MSE) scales as n^(-1/5) for kernel regression.

## Radial Basis Functions (RBF)

### The Approach

Represent the function as:

$$f(x) = \sum_{i=1}^n c_i \phi(\|x - x_i\|)$$

where φ is a radial basis function.

### Common RBFs

| RBF | Formula | Properties |
|-----|---------|------------|
| Gaussian | exp(-r²/ε²) | Infinitely smooth |
| Multiquadric | √(1 + (εr)²) | Smooth, good for interpolation |
| Thin plate spline | r² log(r) | Minimizes bending energy |
| Polyharmonic | rᵏ or rᵏ log(r) | Generalizes thin plate |

### Interpolation with RBFs

Solve the linear system:

$$\begin{pmatrix} \Phi & P \\ P^T & 0 \end{pmatrix} \begin{pmatrix} c \\ \lambda \end{pmatrix} = \begin{pmatrix} y \\ 0 \end{pmatrix}$$

where Φᵢⱼ = φ(‖xᵢ - xⱼ‖) and P contains polynomial terms.

```python
from scipy.interpolate import RBFInterpolator
import numpy as np

# 1D example
x = np.linspace(0, 2*np.pi, 20).reshape(-1, 1)
y = np.sin(x.flatten())

# RBF interpolation
rbf = RBFInterpolator(x, y, kernel='thin_plate_spline')

# Evaluate
x_new = np.linspace(0, 2*np.pi, 100).reshape(-1, 1)
y_new = rbf(x_new)
```

## Gaussian Process Regression

### Probabilistic Interpolation

Model the function as a Gaussian process:

$$f(x) \sim \mathcal{GP}(m(x), k(x, x'))$$

where m(x) is the mean function and k(x, x') is the covariance kernel.

### Posterior Mean and Variance

Given data (X, y), the posterior at new point x* is:

$$f(x^*) | X, y \sim \mathcal{N}(\mu^*, \sigma^{*2})$$

where:
$$\mu^* = k(x^*, X) [K + \sigma_n^2 I]^{-1} y$$
$$\sigma^{*2} = k(x^*, x^*) - k(x^*, X) [K + \sigma_n^2 I]^{-1} k(X, x^*)$$

### Derivatives of GPs

If f ~ GP, then f' ~ GP with:
- Mean: derivative of posterior mean
- Covariance: derivative of kernel

```python
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# GP regression
kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)
gp = GaussianProcessRegressor(kernel=kernel)
gp.fit(t.reshape(-1, 1), y)

# Predictions with uncertainty
y_pred, y_std = gp.predict(t_new.reshape(-1, 1), return_std=True)
```

## PyDelt's Interpolation Classes

### Unified Interface

All interpolators share the same API:

```python
from pydelt.interpolation import (
    SplineInterpolator,
    LlaInterpolator,
    GllaInterpolator,
    LowessInterpolator,
    LoessInterpolator,
    FdaInterpolator,
    NeuralNetworkInterpolator
)

# Same pattern for all
interp = SplineInterpolator(smoothing=0.1)
interp.fit(t, y)

# Evaluate
y_smooth = interp(t_query)

# Differentiate
dy = interp.differentiate(order=1)(t_query)
d2y = interp.differentiate(order=2)(t_query)
```

### Method Comparison

| Method | Best For | Smoothing Control | Derivative Quality |
|--------|----------|-------------------|-------------------|
| Spline | Smooth functions | `smoothing` parameter | Excellent (analytical) |
| LLA | Local features | `window_size` | Good (polynomial) |
| GLLA | Noisy + local | `embedding`, `n` | Good |
| LOWESS | Outliers | `frac` | Moderate (numerical) |
| FDA | Functional data | Basis functions | Excellent |
| Neural | Complex patterns | Architecture | Exact (autodiff) |

## Choosing an Interpolation Method

### Decision Tree

1. **Is data very noisy?**
   - Yes → Use smoothing spline or LOWESS
   - No → Interpolating spline may work

2. **Are there outliers?**
   - Yes → LOWESS/LOESS (robust)
   - No → Splines are fine

3. **Do you need derivatives at non-data points?**
   - Yes → Splines or GP (continuous derivatives)
   - No → Local methods are fine

4. **Is the function highly nonlinear/complex?**
   - Yes → Neural network or GP
   - No → Splines or local polynomials

5. **Is computational speed critical?**
   - Yes → Splines (O(n) after setup)
   - No → Any method

## Key Takeaways

1. **Avoid high-degree global polynomials** (Runge's phenomenon)
2. **Splines** are the workhorse: piecewise polynomials with continuity
3. **Local methods** (LOWESS, LLA) adapt to local structure
4. **Smoothing parameter** controls bias-variance tradeoff
5. **PyDelt provides unified interface** across all methods

## Exercises

1. **Runge's phenomenon**: Interpolate 1/(1+25x²) with polynomials of degree 5, 10, 15, 20. Then use cubic splines. Compare.

2. **Spline vs. LOWESS**: Generate sin(x) + outliers. Compare derivative estimates from smoothing spline vs. LOWESS.

3. **Bandwidth selection**: Implement cross-validation for LOWESS bandwidth. Find optimal for sin(x) + noise.

4. **GP derivatives**: Use sklearn's GP to fit noisy data. Extract derivative by finite-differencing the posterior mean. Compare to analytical GP derivative.

---

*Previous: [← Noise and Smoothing](02_noise_and_smoothing.md) | Next: [Multivariate Derivatives →](04_multivariate_derivatives.md)*