Numpyでカーネル回帰

f:id:yumaloop:20190426185334p:plain:w500

　カーネル法は，非線形データ解析に対する強力な武器です．ソフトマージンSVM・ガウス過程・クラスタリングなどのアルゴリズムの基本要素として頻出します．このポストでは，カーネル法を使って回帰問題を解く手続きを，Pythonで再現してみました．

　※ なお，カーネル法に関する知識は，以下の本を参考にしています．

カーネル多変量解析―非線形データ解析の新しい展開 (シリーズ確率と情報の科学)

作者: 赤穂昭太郎
出版社/メーカー: 岩波書店
発売日: 2008/11/27
メディア: 単行本
購入: 7人クリック: 180回
この商品を含むブログ (32件) を見る

カーネル法入門―正定値カーネルによるデータ解析 (シリーズ多変量データの統計科学)

作者: 福水健次
出版社/メーカー: 朝倉書店
発売日: 2010/11/01
メディア: 単行本
クリック: 19回
この商品を含むブログ (11件) を見る

カーネル回帰とは？
- カーネル関数とグラム行列
- 勾配降下法でパラメータ探索
カーネル回帰をPythonで実装 (numpy only)

カーネル回帰とは？

カーネル関数とグラム行列

$n$ 個のデータサンプル $\mathcal{D} = \left\{ x^{(i)}, y^{(i)} \right\} \, (i=1, \cdots n)$ から，関数 $y = f(x)$ を近似します．回帰モデルはカーネル関数 $k$ とパラメータ ${\alpha}_i \, (i=1 \cdots n)$ を用いて以下の式で表されます．

$$ \hat{y}^{(i)} = \sum_{j=1}^{n} \alpha_j k(x^{(i)}, x^{(j)}) \,\,\, i=1 \cdots n $$

推定値 $\hat{y}^{(i)}$ と真値 $y^{(i)}$ の間の距離を残差(residual)として計算します．データサンプルにモデルをfitさせて，残差の合計(あるいは平均)をモデルの損失関数として導出し，最適なパラメータ ${\alpha}_i \, (i=1 \cdots n)$ を求めます．

$$ \begin{align} residual(y^{(i)}, \hat{y}^{(i)}) &= y^{(i)} - \hat{y}^{(i)} \\ &= y^{(i)} - \sum_{j=1}^{n} \alpha_j k(x^{(i)}, x^{(j)}) \end{align} $$

$$ \begin{align} L(\alpha) &= \frac{1}{n} \sum_{i=1}^{n} {residual(y^{(i)}, \hat{y}^{(i)}) }^{2} \\ &= \frac{1}{n} \sum_{i=1}^{n} { \left( y^{(i)} - \sum_{j=1}^{n} \alpha_j k(x^{(i)}, x^{(j)}) \right) }^{2} \\ &= {({\bf y} - {\bf K} {\bf \alpha})}^{\mathrm{T}} ({\bf y} - {\bf K} {\bf \alpha}) \end{align} $$

$$ where, \,\,\, {\bf K} = \begin{pmatrix} k(x^{(1)}, x^{(1)}) & \cdots & k(x^{(1)}, x^{(n)}) \\ \vdots & \ddots & \vdots \\ k(x^{(n)}, x^{(1)}) & \cdots & k(x^{(n)}, x^{(n)}) \end{pmatrix} $$

${\bf K}$ はカーネル関数を与えるとデータサンプルから計算させる行列で，Gram行列と呼びます．

勾配降下法でパラメータ探索

パラメータ ${\alpha}_i \, (i=1 \cdots n)$ の探索には，ナイーブな勾配降下法を使います．係数 $\gamma$ は学習率です.

$$ \frac{\partial L(\alpha)}{\partial {\bf \alpha}} = -2 \cdot {\bf K}^{\mathrm{T}} ({\bf y} - {\bf K} {\bf \alpha}) \\ {\alpha}_{new} = {\alpha}_{old} + \gamma \cdot \frac{\partial L(\alpha)}{\partial {\bf \alpha}} \mid_{ \alpha = {\alpha}_{old} } $$

カーネル関数としては，RBF kernel(Gaussian kernel)を使う．

$k(x, x') := \exp \left( {\large - \frac{{|| x - x' ||}^2}{2 {\sigma}^{2}} } \right)$

カーネル回帰をPythonで実装 (numpy only)

コードは全て下記のリポジトリにあります． github.com

import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

# Kernel Function
class RBFkernel():
    def __init__(self, sigma=0.5):
        self.sigma = sigma
        
    def __call__(self, x, y):
        numerator = -1 * np.sum((x - y)**2)
        denominator = 2 * (self.sigma**2)
        return np.exp(numerator / denominator)
    
    def get_params(self):
        return self.sigma
    
    def set_params(self, sigma):
        self.sigma = sigma

# Regression Model
class KernelRegression():
    def __init__(self, kernel):
        self.kernel = kernel
        
    def fit_kernel(self, X, y, lr=0.01, nb_epoch=1000, log_freq=50):
        self.X = X
        self.y = y
        self.n = X.shape[0] # sample size
        self.alpha = np.full(self.n, 1) # param alpha: initialize
        self.gram_matrix = np.zeros((self.n, self.n))
        
        # Gradient Descent Algorithm to optimize alpha
        for epoch in range(nb_epoch):
            
            # Gram Matrix
            for i in range(self.n):
                for j in range(self.n):
                    self.gram_matrix[i][j] = self.kernel(self.X[i], self.X[j])
                    self.loss, self.loss_grad = self.mse(self.X, self.y, self.alpha, self.gram_matrix)
                    self.alpha = self.alpha - lr * self.loss_grad
                    
            if epoch % log_freq == 0:
                print("epoch: {} \t MSE of sample data: {:.4f}".format(epoch, self.loss))
                        
                        
    def mse(self, X, y, alpha, gram_matrix):
        loss = np.dot((y - np.dot(gram_matrix, alpha)), (y - np.dot(gram_matrix, alpha)))
        loss_grad = -2 * np.dot(gram_matrix.T, (y - np.dot(gram_matrix, alpha)))
        return loss, loss_grad
    
    def predict(self, X_new):
        n_new = X_new.shape[0]
        y_new = np.zeros(n_new)
        for i in range(n_new):
            for j in range(self.n):
                y_new[i] += self.alpha[j] * self.kernel(X_new[i], self.X[j])
        return y_new

# Experiment

def actual_function(x):
    return 1.7 * np.sin(2 * x) + np.cos(1.5 * x) + 0.5 * np.cos(0.5 * x) + 0.5 * x  + 0.1

sample_size = 100 # the number of data sample
var_noise = 0.7 # variance of the gaussian noise for samples

# make data sample
x_sample = np.random.rand(sample_size) * 10 - 5
y_sample = actual_function(x_sample) + np.random.normal(0, var_noise, sample_size)

# variables for plot (actual function)
x_plot = np.linspace(-5, 5, 100)
y_plot = actual_function(x_plot)

# plot figure
plt.plot(x_plot, y_plot, color="blue", linestyle="dotted", label="actual function")
plt.scatter(x_sample, y_sample, alpha=0.4, color="blue", label="data sample")
plt.title("Actual function & Data sample (N={})".format(sample_size))
plt.legend(loc="upper left")
plt.xlabel("x")
plt.ylabel("y")
plt.show()

f:id:yumaloop:20190426184220p:plain:w500

# set kernel
sigma=0.2
kernel = RBFkernel(sigma=sigma)

# generate model
model = KernelRegression(kernel)

# fit data sample for the model
model.fit_kernel(x_sample, y_sample, lr=0.01, nb_epoch=1000, log_freq=100)
  epoch: 0      MSE of sample data: 35.2988
  epoch: 100    MSE of sample data: 30.3570
  epoch: 200    MSE of sample data: 29.4241
  epoch: 300    MSE of sample data: 28.8972
  epoch: 400    MSE of sample data: 28.5597
  epoch: 500    MSE of sample data: 28.3322
  epoch: 600    MSE of sample data: 28.1736
  epoch: 700    MSE of sample data: 28.0598
  epoch: 800    MSE of sample data: 27.9759
  epoch: 900    MSE of sample data: 27.9122

# check Gram Matrix of the model
print(model.gram_matrix)
  array([[1.00000000e+000, 3.24082380e-012, 1.86291046e-092, ...,
        3.45570112e-030, 7.50777061e-016, 6.18611768e-129],
       [3.24082380e-012, 1.00000000e+000, 5.11649994e-039, ...,
        1.78993799e-078, 1.05138357e-053, 1.15421467e-063],
       [1.86291046e-092, 5.11649994e-039, 1.00000000e+000, ...,
        6.88153992e-226, 4.47677881e-182, 8.98951607e-004],
       ...,
       [3.45570112e-030, 1.78993799e-078, 6.88153992e-226, ...,
        1.00000000e+000, 4.28581263e-003, 2.58161981e-281],
       [7.50777061e-016, 1.05138357e-053, 4.47677881e-182, ...,
        4.28581263e-003, 1.00000000e+000, 3.95135557e-232],
       [6.18611768e-129, 1.15421467e-063, 8.98951607e-004, ...,
        2.58161981e-281, 3.95135557e-232, 1.00000000e+000]])

# check loss
print(model.loss)
  12.333081499130776

# array for plot (predicted function)
x_new = np.linspace(-5, 5, 100)
y_new = model.predict(x_new)

 plot result
plt.plot(x_plot, y_plot, color="blue", linestyle="dotted", label="actual function")
plt.scatter(x_sample, y_sample, alpha=0.3, color="blue", label="data sample")
plt.plot(x_new, y_new, color="red", label="predicted function")
plt.title("Kernel Regression \n RBF kernel (sigma={}), sample size ={}".format(sigma, sample_size))
plt.legend(loc="upper left")
plt.xlabel("x")
plt.ylabel("y")
plt.show()