Analogy

Think of it like this:

A kernel is like a shortcut formula:

  • The “long way”: transform both points to a high-dimensional space, then compute the dot product → expensive
  • The “shortcut”: evaluate the kernel function directly in the original space → cheap
  • The result is the same!

Example:

Long way:
1. Transform x=[1,2] → φ(x)=[x₁², x₂², √2·x₁x₂] = [1, 4, √2·2] (3D)
2. Transform y=[3,4] → φ(y)=[y₁², y₂², √2·y₁y₂] = [9, 16, √2·12] (3D)
3. Compute φ(x)·φ(y) = 1×9 + 4×16 + (√2·2)×(√2·12) = 9 + 64 + 48 = 121

Shortcut:
k(x,y) = (x·y)² = (1×3 + 2×4)² = 11² = 121 ✓ Same answer!
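
A quick numerical check of this example (a minimal NumPy sketch; `phi` below is the explicit degree-2 feature map that corresponds to \(k(x,y) = (x \cdot y)^2\)):

```python
import numpy as np

def phi(v):
    """Explicit degree-2 feature map for 2-D input: [v1^2, v2^2, sqrt(2)*v1*v2]."""
    return np.array([v[0]**2, v[1]**2, np.sqrt(2) * v[0] * v[1]])

x = np.array([1.0, 2.0])
y = np.array([3.0, 4.0])

long_way = phi(x) @ phi(y)   # transform first, then take the dot product
shortcut = (x @ y) ** 2      # kernel evaluated directly in the original space

print(long_way, shortcut)                 # both ~121 (up to floating-point error)
print(np.isclose(long_way, shortcut))     # True: same answer
```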

Why This Matters

The kernel trick allows us to:

  • Work with infinite-dimensional feature spaces without computing them explicitly
  • Keep algorithms computationally tractable when the explicit transformation would be prohibitively expensive or impossible
  • Get the benefits of non-linear transformations at the cost of computing a simple function
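
As a concrete illustration of that last point (a minimal sketch, assuming scikit-learn is available; the dataset and \(\gamma\) value are arbitrary picks), an SVM with an RBF kernel separates two concentric rings that no linear boundary can, without ever materializing the high-dimensional feature vectors:

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: not separable by any line in the original 2-D space.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# Linear kernel: works in the original space, so it struggles on this data.
linear_acc = SVC(kernel="linear").fit(X, y).score(X, y)

# RBF kernel: implicitly works in an infinite-dimensional feature space,
# yet each kernel evaluation is just exp(-gamma * ||x - y||^2).
rbf_acc = SVC(kernel="rbf", gamma=1.0).fit(X, y).score(X, y)

print(f"linear kernel accuracy: {linear_acc:.2f}")  # roughly chance level
print(f"RBF kernel accuracy:    {rbf_acc:.2f}")     # near 1.00
```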

Common Kernels

Polynomial Kernel: \(k(x,y) = (x \cdot y + c)^d\)

  • Computes the dot product in a polynomial feature space
  • The degree \(d\) controls complexity; \(c\) trades off the influence of higher- versus lower-degree terms
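
A minimal sketch in NumPy (the name `poly_kernel` and the defaults for `c` and `d` are illustrative, not a fixed API):

```python
import numpy as np

def poly_kernel(x, y, c=1.0, d=2):
    """Polynomial kernel k(x, y) = (x . y + c)^d."""
    return (np.dot(x, y) + c) ** d

x = np.array([1.0, 2.0])
y = np.array([3.0, 4.0])
print(poly_kernel(x, y))   # (11 + 1)^2 = 144.0
```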

RBF (Gaussian) Kernel: \(k(x,y) = \exp(-\gamma \|x-y\|^2)\)

  • Infinite-dimensional feature space
  • \(\gamma\) controls the width of influence (larger \(\gamma\) → similarity decays faster with distance)
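
The same kind of sketch for the RBF kernel (the helper name and the \(\gamma\) value are illustrative):

```python
import numpy as np

def rbf_kernel(x, y, gamma=0.5):
    """RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    return np.exp(-gamma * np.sum((x - y) ** 2))

x = np.array([1.0, 2.0])
y = np.array([3.0, 4.0])
print(rbf_kernel(x, y))   # exp(-0.5 * 8) = exp(-4) ≈ 0.0183
print(rbf_kernel(x, x))   # 1.0: a point is maximally similar to itself
```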

Linear Kernel: \(k(x,y) = x \cdot y\)

  • Simplest case: no transformation at all
  • Equivalent to working in the original feature space
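
All three are available in `sklearn.metrics.pairwise`; a minimal sketch (assuming scikit-learn is installed, with arbitrary parameter values) that builds the Gram matrix \(K\) with \(K_{ij} = k(x_i, x_j)\) for a tiny dataset:

```python
import numpy as np
from sklearn.metrics.pairwise import linear_kernel, polynomial_kernel, rbf_kernel

X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

# Each call returns the 3x3 Gram matrix K with K[i, j] = k(X[i], X[j]).
K_lin  = linear_kernel(X)                                      # x . y
K_poly = polynomial_kernel(X, degree=2, gamma=1.0, coef0=1.0)  # (x . y + 1)^2
K_rbf  = rbf_kernel(X, gamma=0.5)                              # exp(-0.5 * ||x - y||^2)

print(K_lin[0, 1], K_poly[0, 1], K_rbf[0, 1])   # 11.0, 144.0, exp(-4) ≈ 0.018
```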

Key Insight

You never need to compute \(\phi(x)\) explicitly. The kernel function \(k(x,y)\) directly gives you \(\langle\phi(x), \phi(y)\rangle\) without ever constructing the high-dimensional vectors.

This is why SVMs and other kernel methods can work with extremely high (or even infinite) dimensional spaces efficiently!
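
To see this in code, here is a hedged sketch of kernel ridge regression (one such kernel method) written purely in terms of kernel evaluations: the RBF kernel's feature space is infinite-dimensional, yet \(\phi\) never appears anywhere. The regularization strength `lam` and \(\gamma\) are arbitrary illustrative values.

```python
import numpy as np

def rbf(A, B, gamma=1.0):
    """Gram matrix of the RBF kernel between the rows of A and B."""
    sq_dists = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * sq_dists)

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(100, 1))
y = np.sin(X).ravel() + 0.1 * rng.normal(size=100)

# Kernel ridge regression: alpha = (K + lam * I)^(-1) y.
# Only the kernel matrix K is needed -- phi(x) is never constructed.
lam = 1e-2
K = rbf(X, X)
alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)

# Predict at new points using kernel evaluations against the training set.
X_test = np.linspace(-3, 3, 5).reshape(-1, 1)
y_pred = rbf(X_test, X) @ alpha
print(np.round(y_pred, 2))   # approximately sin(x) at the test points
```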