How to get the decision function after training an SVM model? #105
I believe this is correct, but it would be great if you could verify it and potentially correct me. (EDIT: if you are only interested in the decision values …)
---

As we know, in the dual problem of the two-class SVM, the constraints on the multipliers are $0 \le \alpha_i \le C$ and $\sum_i y_i \alpha_i = 0$. I tried to check them like this:

```julia
d = 3; N = 10
X = rand(d, N)
y = rand([-1, 1], N)
K = X' * X
model = svmtrain(K, y, kernel=Kernel.Precomputed)
ind_SVs = model.SVs.indices
α = model.coefs ./ y[ind_SVs]
b = model.rho
α, b, all(α .>= 0), sum(α)
```

It turned out that the results were not what I expected.

As for the one-class SVM, we have the constraint $\sum_i \alpha_i = 1$:

```julia
d = 3; N = 10
X = rand(d, N)
K = X' * X
model = svmtrain(K, svmtype=OneClassSVM, kernel=Kernel.Precomputed)
ind_SVs = model.SVs.indices
α = model.coefs
b = model.rho
α, b, sum(α)
```

The result of `sum(α)` here is not 1, though.
---

Thanks, but I do need the exact values of the Lagrange multipliers $\alpha_i$.
---

I think your code is checking `sum(α)`, but the equality constraint is $\sum_i y_i \alpha_i = 0$; checking that instead:

```julia
d = 3; N = 10
X = rand(d, N)
y = rand([-1, 1], N)
model = svmtrain(X, y)
sv_y = model.SVs.y
α = model.coefs ./ sv_y
@show sum(sv_y .* α)
```

I am getting zero (or something close to zero).

Regarding the sign of the Lagrange multipliers: they should still be valid as long as all of them are of the same sign, right? The weight vector reads $w = \sum_i \alpha_i y_i x_i$, and assuming the stored coefficients are $\pm(y_i \alpha_i)$, if all the signs are the same you get either $w$ or $-w$. It all comes down to which label LIBSVM considers to be positive, and I think the answer is that the label of the first training example is chosen as "positive" (even if it happens to be $-1$). So this should give you the expected result:

```julia
d = 3; N = 10
X = rand(d, N)
y = rand([-1, 1], N)
model = svmtrain(X, y)
sv_y = model.SVs.y
α = first(y) * model.coefs ./ sv_y  # if y[1] == -1, flip the signs
@show sum(sv_y .* α)  # should be zero
@show all(α .≥ 0)     # should be true
```
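The sign-flip logic above can be checked without LIBSVM at all. The following toy sketch assumes (as hypothesized in this thread, not verified against the library source) that `model.coefs` stores $s \cdot y_i \alpha_i$, where $s = \pm 1$ is the sign of the first training label:

```julia
# Toy check of the sign-flip recovery (no LIBSVM needed).
# Assumption from the thread: coefs = s .* (y .* α) with s = sign of y[1].
y = [-1, 1, 1, -1, 1]               # first label is -1, so s = -1
α_true = [0.5, 0.2, 0.1, 0.3, 0.5]  # nonnegative, and sum(y .* α_true) == 0
s = first(y)
coefs = s .* (y .* α_true)          # what model.coefs would hold
α_rec = first(y) .* coefs ./ y      # the recovery proposed above
@show α_rec == α_true               # true: s^2 == 1 undoes the flip
@show all(α_rec .>= 0)              # true
```

Since $s^2 = 1$, multiplying by `first(y)` a second time always undoes the flip, regardless of which label LIBSVM treated as positive.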
---

Oh, I'm sorry for the wrong code in the first example. Thanks! But how about the one-class SVM case? Why is the constraint $\sum_i \alpha_i = 1$ violated? The actual `sum(α)` is much larger.
---

For one-class SVM, LIBSVM imposes the constraints $0 \le \alpha_i \le 1$ and $\sum_i \alpha_i = \nu \ell$ (where $\ell$ is the number of training examples), rather than the $\sum_i \alpha_i = 1$ of Schölkopf's formulation; the two are related by a rescaling.
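Numerically, that rescaling is easy to see with the defaults used earlier ($\nu = 0.5$, $\ell = 10$). A small sketch with hypothetical multipliers satisfying LIBSVM's constraints:

```julia
# Hypothetical multipliers satisfying LIBSVM's one-class constraints:
# 0 ≤ αᵢ ≤ 1 and sum(α) == ν*ℓ.
ν, ℓ = 0.5, 10
α = fill(ν, ℓ)           # 0.5 each, so sum(α) == ν*ℓ == 5
@show sum(α)             # 5.0
# Rescale to the Schölkopf convention, 0 ≤ αᵢ ≤ 1/(ν*ℓ) with sum(α) = 1:
α_sch = α ./ (ν * ℓ)
@show sum(α_sch)         # ≈ 1.0
```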
---

Thank you very much! Everything becomes clear. 🤝
---

I kindly ask you to verify our conclusions and let us know here whether they are correct. We can then close the issue.
---

Sure. So far, in the limited cases I have tried, there is no evidence that the conclusions are wrong. I will keep gathering more information to confirm this. To confirm the conclusions theoretically, I think it would be best if someone analyzed the code rigorously.
---

One more question: does anyone know how to retrieve the optimal values of the slack variables $\xi_i$? Thanks!
---

I don't know if that's possible. You might compute them yourself by obtaining the decision values $f(x_i)$ and using $\xi_i = \max(0, 1 - y_i f(x_i))$.
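That relation can be sketched in plain Julia; the decision values `f` below are made up for illustration (in practice they would come from the trained model):

```julia
# Slack variables from decision values: ξᵢ = max(0, 1 - yᵢ f(xᵢ)).
y = [1, -1, 1, -1]
f = [1.5, -0.2, 0.3, -1.1]   # hypothetical decision values
ξ = max.(0, 1 .- y .* f)
@show ξ                      # ≈ [0.0, 0.8, 0.7, 0.0]
```

Correctly classified points outside the margin ($y_i f(x_i) \ge 1$) get $\xi_i = 0$; points inside the margin or misclassified get positive slack.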
---

Generally, the decision function of an SVM model should look like $f(x) = \operatorname{sgn}\left(\sum_i \alpha_i y_i K(x, x_i) + b\right)$. Then, how to get the $\alpha_i$ and $b$ in the function after the model is trained? For example, are they `model.coefs` and `model.rho`?
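Putting the thread's answers together, here is a minimal plain-Julia sketch of evaluating that decision function with a linear kernel. The support vectors and coefficients below are hypothetical stand-ins: `model.coefs` already stores the products $y_i \alpha_i$, and in LIBSVM's convention the decision value is $\sum_i y_i \alpha_i K(x, x_i) - \rho$, so the $b$ here corresponds to `-model.rho`:

```julia
# Evaluate f(x) = sign(Σᵢ yᵢαᵢ K(x, xᵢ) + b) with a linear kernel.
k(u, v) = u' * v                    # linear kernel
SVs = [[1.0, 2.0], [-1.0, 0.5]]     # hypothetical support vectors
yα  = [0.7, -0.7]                   # hypothetical yᵢ*αᵢ (what model.coefs holds)
b   = 0.1                           # for a trained model this would be -model.rho
decision(x) = sum(yα[i] * k(x, SVs[i]) for i in eachindex(SVs)) + b
@show decision([2.0, 1.0])          # ≈ 3.95
@show sign(decision([2.0, 1.0]))    # 1.0
```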