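I need to write a function that fits a curve to a data set. The code below is what I have so far. It tries to use gradient descent to find the polynomial coefficients that best fit the data.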
//solves for y using the form y = a + bx + cx^2 ...
double calc_polynomial(int degree, double x, double* coeffs) {
    double y = 0;
    for (int i = 0; i <= degree; i++)
        y += coeffs[i] * pow(x, i);
    return y;
}
//find polynomial fit
//returns an array of coefficients degree + 1 long
double* poly_fit(double* x, double* y, int count, int degree, double learningRate, int iterations) {
    double* coeffs = malloc(sizeof(double) * (degree + 1));
    double* sums = malloc(sizeof(double) * (degree + 1));
    for (int i = 0; i <= degree; i++)
        coeffs[i] = 0;
    for (int i = 0; i < iterations; i++) {
        //reset sums each iteration
        for (int j = 0; j <= degree; j++)
            sums[j] = 0;
        //update weights
        for (int j = 0; j < count; j++) {
            double error = calc_polynomial(degree, x[j], coeffs) - y[j];
            //update sums
            for (int k = 0; k <= degree; k++)
                sums[k] += error * pow(x[j], k);
        }
        //subtract sums
        for (int j = 0; j <= degree; j++)
            coeffs[j] -= sums[j] * learningRate;
    }
    free(sums);
    return coeffs;
}
And my test code:
double x[] = { 0, 1, 2, 3, 4 };
double y[] = { 5, 3, 2, 3, 5 };
int size = sizeof(x) / sizeof(*x);
int degree = 1;
double* coeffs = poly_fit(x, y, size, degree, 0.01, 1000);
for (int i = 0; i <= degree; i++)
    printf("%lf\n", coeffs[i]);
The code above works when degree = 1, but any higher value causes the coefficients to come back as nan.
I have also tried replacing
coeffs[j] -= sums[j] * learningRate;
with
coeffs[j] -= (1/count) * sums[j] * learningRate;
but then I get 0 back instead of nan.
Does anyone know what I am doing wrong?
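I tried degree = 2 with 10 iterations and got results other than nan (values of around a few thousand), and adding 1 more iteration seems to make the magnitude of the results about 3 times larger. From this observation, I guessed that the results are being multiplied by count.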
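One way to read this observation, written out as a sketch: the notation below ($p_c$ for the polynomial with coefficients $c$, $n$ for count, $\eta$ for learningRate) is introduced here for illustration and does not appear in the original code. Each entry of sums is a sum over all $n$ data points, so the step taken with the raw sums is $n$ times the step that would be taken on the mean error.

```latex
\begin{align*}
\text{sums}_k &= \sum_{j=1}^{n} \bigl(p_c(x_j) - y_j\bigr)\, x_j^{k}
  = \frac{\partial}{\partial c_k}\,\frac{1}{2}\sum_{j=1}^{n} \bigl(p_c(x_j) - y_j\bigr)^2, \\
c_k &\leftarrow c_k - \eta\,\text{sums}_k
  \quad\text{(step on the summed error)}, \\
c_k &\leftarrow c_k - \frac{\eta}{n}\,\text{sums}_k
  \quad\text{(step on the mean error, $n$ times smaller)}.
\end{align*}
```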
In the expression
coeffs[j] -= (1/count) * sums[j] * learningRate;
both 1 and count are integers, so 1/count is evaluated with integer division, and the result is zero whenever count is greater than 1.
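The truncation can be seen in isolation with a small sketch. The two function names below are hypothetical helpers written for this illustration; they are not part of the question's code.

```c
#include <assert.h>

/* 1 and count are both int, so the division is done in integer
   arithmetic and truncated before the result is converted to double. */
double scale_with_int_division(int count) {
    assert(count > 0);
    return 1 / count;
}

/* 1.0 is a double, so count is converted to double first and the
   division is done in floating point. */
double scale_with_double_division(int count) {
    assert(count > 0);
    return 1.0 / count;
}
```

With count = 5, the first function returns 0 and the second returns a value close to 0.2, which is why the update in the question never moves the coefficients away from 0.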
Instead, you can divide the result of the multiplication by count.
coeffs[j] -= sums[j] * learningRate / count;
Another way is to use 1.0 (a double value) instead of 1.
coeffs[j] -= (1.0/count) * sums[j] * learningRate;
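Putting the fix into the whole routine might look like the sketch below. It is one possible arrangement, not the only one: the names poly_fit_mean, eval_poly and mean_squared_error are made up for this example, pow is swapped for a running power of x and Horner's rule, and allocation failures return NULL. Whether a given learning rate and iteration count are enough for the coefficients to fully converge depends on the data and is not guaranteed by this code.

```c
#include <assert.h>
#include <stdlib.h>

/* Evaluates c[0] + c[1]*x + ... + c[degree]*x^degree with Horner's rule. */
static double eval_poly(int degree, double x, const double *coeffs) {
    double y = 0.0;
    for (int i = degree; i >= 0; i--)
        y = y * x + coeffs[i];
    return y;
}

/* Mean squared error of the polynomial over the data set
   (hypothetical helper, useful for checking that the fit improves). */
double mean_squared_error(const double *x, const double *y, int count,
                          int degree, const double *coeffs) {
    assert(count > 0);
    double total = 0.0;
    for (int j = 0; j < count; j++) {
        double error = eval_poly(degree, x[j], coeffs) - y[j];
        total += error * error;
    }
    return total / count;
}

/* Gradient descent on the mean error. Returns a malloc'd array of
   degree + 1 coefficients (caller frees it), or NULL on allocation failure. */
double *poly_fit_mean(const double *x, const double *y, int count, int degree,
                      double learning_rate, int iterations) {
    assert(count > 0 && degree >= 0);
    size_t n_coeffs = (size_t)degree + 1;
    double *coeffs = malloc(n_coeffs * sizeof *coeffs);
    double *sums = malloc(n_coeffs * sizeof *sums);
    if (coeffs == NULL || sums == NULL) {
        free(coeffs);
        free(sums);
        return NULL;
    }
    for (int k = 0; k <= degree; k++)
        coeffs[k] = 0.0;
    for (int i = 0; i < iterations; i++) {
        for (int k = 0; k <= degree; k++)
            sums[k] = 0.0;
        for (int j = 0; j < count; j++) {
            double error = eval_poly(degree, x[j], coeffs) - y[j];
            double power = 1.0; /* x[j]^k, built up without calling pow */
            for (int k = 0; k <= degree; k++) {
                sums[k] += error * power;
                power *= x[j];
            }
        }
        /* Floating-point division by count: this is the fix from above. */
        for (int k = 0; k <= degree; k++)
            coeffs[k] -= learning_rate * sums[k] / count;
    }
    free(sums);
    return coeffs;
}
```

Calling poly_fit_mean(x, y, size, 2, 0.01, 1000) on the test data from the question and comparing mean_squared_error before and after is a quick way to confirm that the coefficients stay finite and the error goes down.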