# Chapter 1: Numerical Differentiation

> *"The derivative is a limit. In practice, we can't take limits—we can only approximate them."*

## Connection to the Central Thesis

Recall our core problem: we observe a system governed by unknown dynamics dx/dt = f(x,t), but we only have discrete, noisy samples. **We cannot compute the exact derivative**—we can only approximate it.

Finite differences are the simplest approximation. They work directly on the data without fitting a surrogate. This makes them fast and assumption-free, but also fragile: they amplify noise and require careful step size selection.

Understanding finite differences is essential because:
1. They reveal **why differentiation is hard** (ill-posed inverse problem)
2. They establish **error analysis** techniques used throughout
3. They motivate **why we need smoothing and interpolation**

## The Fundamental Problem

You have discrete samples (tᵢ, yᵢ) and want to estimate dy/dt. The textbook definition:

$$f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}$$

is useless because:
1. You can't take h → 0 with finite precision arithmetic
2. Your data has gaps—you can't evaluate f at arbitrary points
3. Your measurements have noise

This chapter covers the classical approaches and their limitations.

## Finite Difference Formulas

### Forward, Backward, and Central Differences

Given evenly-spaced data with step h:

| Formula | Expression | Error Order |
|---------|------------|-------------|
| Forward | (f(x+h) - f(x)) / h | O(h) |
| Backward | (f(x) - f(x-h)) / h | O(h) |
| Central | (f(x+h) - f(x-h)) / 2h | O(h²) |

The central difference is more accurate because errors cancel symmetrically.

### Derivation via Taylor Series

Expand f(x+h) and f(x-h) around x:

$$f(x+h) = f(x) + hf'(x) + \frac{h^2}{2}f''(x) + \frac{h^3}{6}f'''(x) + O(h^4)$$

$$f(x-h) = f(x) - hf'(x) + \frac{h^2}{2}f''(x) - \frac{h^3}{6}f'''(x) + O(h^4)$$

Subtracting:

$$f(x+h) - f(x-h) = 2hf'(x) + \frac{h^3}{3}f'''(x) + O(h^5)$$

So:

$$\frac{f(x+h) - f(x-h)}{2h} = f'(x) + \frac{h^2}{6}f'''(x) + O(h^4)$$

The error is O(h²), not O(h).

### Higher-Order Formulas

More points = higher accuracy:

**Five-point stencil for f'(x):**
$$f'(x) \approx \frac{-f(x+2h) + 8f(x+h) - 8f(x-h) + f(x-2h)}{12h} + O(h^4)$$

**Second derivative (central):**
$$f''(x) \approx \frac{f(x+h) - 2f(x) + f(x-h)}{h^2} + O(h^2)$$

### Implementation

```python
import numpy as np

def finite_diff_derivative(y, h, order=1, accuracy=2):
    """
    Compute derivative using finite differences.
    
    Parameters:
        y: array of function values
        h: step size (assumed uniform)
        order: derivative order (1 or 2)
        accuracy: 2 or 4 (determines stencil width)
    """
    n = len(y)
    dy = np.zeros(n)
    
    if order == 1 and accuracy == 2:
        # Central difference O(h²)
        dy[1:-1] = (y[2:] - y[:-2]) / (2*h)
        dy[0] = (y[1] - y[0]) / h  # Forward at boundary
        dy[-1] = (y[-1] - y[-2]) / h  # Backward at boundary
        
    elif order == 1 and accuracy == 4:
        # Five-point stencil O(h⁴)
        dy[2:-2] = (-y[4:] + 8*y[3:-1] - 8*y[1:-3] + y[:-4]) / (12*h)
        # Fall back to lower order at boundaries
        dy[0] = (y[1] - y[0]) / h
        dy[1] = (y[2] - y[0]) / (2*h)
        dy[-2] = (y[-1] - y[-3]) / (2*h)
        dy[-1] = (y[-1] - y[-2]) / h
        
    elif order == 2 and accuracy == 2:
        # Second derivative O(h²)
        dy[1:-1] = (y[2:] - 2*y[1:-1] + y[:-2]) / h**2
        dy[0] = dy[1]  # Extrapolate at boundaries
        dy[-1] = dy[-2]
        
    return dy
```

## The Two Sources of Error

### Truncation Error

From dropping higher-order Taylor terms. For central difference:

$$\epsilon_{\text{trunc}} \approx \frac{h^2}{6} |f'''(x)|$$

**Decreases** as h → 0.

### Rounding Error

From finite precision arithmetic. When subtracting nearly-equal numbers:

$$\epsilon_{\text{round}} \approx \frac{\epsilon_{\text{machine}} |f(x)|}{h}$$

where ε_machine ≈ 10⁻¹⁶ for float64.

**Increases** as h → 0.

### The Optimal Step Size

Total error ≈ truncation + rounding:

$$\epsilon_{\text{total}} \approx \frac{h^2}{6}|f'''| + \frac{\epsilon_{\text{machine}}|f|}{h}$$

Minimize by taking derivative with respect to h and setting to zero:

$$h_{\text{opt}} \approx \left(\frac{3\epsilon_{\text{machine}}|f|}{|f'''|}\right)^{1/3}$$

For typical functions and float64: **h_opt ≈ 10⁻⁵ to 10⁻⁶**.

```python
import numpy as np
import matplotlib.pyplot as plt

def test_optimal_h():
    """Demonstrate optimal step size."""
    f = np.sin
    f_prime = np.cos
    x = 1.0
    
    h_values = np.logspace(-15, 0, 100)
    errors = []
    
    for h in h_values:
        approx = (f(x + h) - f(x - h)) / (2 * h)
        true = f_prime(x)
        errors.append(abs(approx - true))
    
    plt.loglog(h_values, errors)
    plt.xlabel('Step size h')
    plt.ylabel('Absolute error')
    plt.title('Finite Difference Error vs Step Size')
    plt.axvline(1e-5, color='r', linestyle='--', label='Optimal h ≈ 10⁻⁵')
    plt.legend()
    plt.show()
    
    # Find optimal
    opt_idx = np.argmin(errors)
    print(f"Optimal h: {h_values[opt_idx]:.2e}, Error: {errors[opt_idx]:.2e}")
```

## The Noise Amplification Problem

With noisy data y = f(x) + ε where ε ~ N(0, σ²):

$$\frac{y(x+h) - y(x-h)}{2h} = f'(x) + \frac{\epsilon(x+h) - \epsilon(x-h)}{2h}$$

The noise term has variance:

$$\text{Var}\left(\frac{\epsilon(x+h) - \epsilon(x-h)}{2h}\right) = \frac{2\sigma^2}{4h^2} = \frac{\sigma^2}{2h^2}$$

**Noise in derivative scales as σ/h**. Smaller h = more noise amplification.

### Demonstration

```python
import numpy as np
from pydelt.interpolation import SplineInterpolator

# True function and derivative
t = np.linspace(0, 2*np.pi, 200)
y_true = np.sin(t)
dy_true = np.cos(t)

# Add noise
noise_level = 0.02
y_noisy = y_true + noise_level * np.random.randn(len(t))

# Finite difference (naive)
h = t[1] - t[0]
dy_fd = np.gradient(y_noisy, h)

# Compare noise levels
print(f"Signal noise std: {np.std(y_noisy - y_true):.4f}")
print(f"Derivative noise std: {np.std(dy_fd - dy_true):.4f}")
print(f"Amplification factor: {np.std(dy_fd - dy_true) / np.std(y_noisy - y_true):.1f}x")
```

## Irregular Grids

Real data often has non-uniform spacing. The formulas change:

### Two-Point (Non-uniform)

$$f'(x_i) \approx \frac{f(x_{i+1}) - f(x_{i-1})}{x_{i+1} - x_{i-1}}$$

### Three-Point (Non-uniform)

For points x₀, x₁, x₂ with spacings h₁ = x₁ - x₀ and h₂ = x₂ - x₁:

$$f'(x_1) \approx \frac{h_1^2 f(x_2) - h_2^2 f(x_0) + (h_2^2 - h_1^2) f(x_1)}{h_1 h_2 (h_1 + h_2)}$$

### Fornberg's Algorithm

For arbitrary stencils, use Fornberg's algorithm to compute optimal weights:

```python
def fornberg_weights(z, x, m):
    """
    Compute finite difference weights for derivative of order m
    at point z using function values at points x.
    
    Based on Fornberg (1988).
    """
    n = len(x) - 1
    c = np.zeros((n+1, m+1))
    c[0, 0] = 1
    c1 = 1
    
    for i in range(1, n+1):
        mn = min(i, m)
        c2 = 1
        for j in range(i):
            c3 = x[i] - x[j]
            c2 *= c3
            if i <= m:
                c[i-1, i] = 0
            for k in range(mn, -1, -1):
                c[i, k] = c1 * (k * c[i-1, k-1] - (x[i-1] - z) * c[i-1, k]) / c2
                c[j, k] = ((x[i] - z) * c[j, k] - k * c[j, k-1]) / c3
        c1 = c2
    
    return c[:, m]
```

## PyDelt's Finite Difference Methods

PyDelt provides robust finite difference implementations:

```python
from pydelt import finite_difference

# Basic usage
t = np.linspace(0, 1, 100)
y = np.sin(2 * np.pi * t)

# First derivative
dy = finite_difference(t, y, order=1)

# Second derivative
d2y = finite_difference(t, y, order=2)

# With specified accuracy
dy_high = finite_difference(t, y, order=1, accuracy=4)
```

## When Finite Differences Fail

Finite differences are inadequate when:

1. **Noise level is significant** (σ/h becomes large)
2. **Data is sparse** (can't use higher-order stencils)
3. **Function has sharp features** (truncation error dominates)
4. **You need derivatives at non-sample points**

The solution: **smoothing** and **interpolation**, covered in the next chapters.

## Key Takeaways

1. **Central differences** are O(h²), better than forward/backward O(h)
2. **Optimal h** balances truncation and rounding error (~10⁻⁵ for float64)
3. **Noise amplifies** as 1/h—smaller steps = more noise
4. **Irregular grids** require modified formulas (Fornberg's algorithm)
5. **Finite differences alone** are insufficient for noisy data

## Exercises

1. **Optimal step size**: For f(x) = exp(x) at x = 0, find the step size that minimizes total error. Compare to the theoretical prediction.

2. **Noise amplification**: Generate sin(x) with Gaussian noise σ = 0.01. Plot derivative error vs. step size h. At what h does noise dominate?

3. **Higher-order stencils**: Implement the 5-point stencil for f'(x). Compare accuracy to 3-point for f(x) = x⁵ at x = 1.

4. **Non-uniform grid**: Generate data on a random grid. Implement the 3-point non-uniform formula and compare to linear interpolation + uniform finite difference.

---

*Previous: [← The Numerical Calculus Problem](00_why_calculus.md) | Next: [Noise and Smoothing →](02_noise_and_smoothing.md)*