sparse.model.matrix creating inconsistent output

Posted on 2020-04-10 16:06:39

I have an xgboost model on two different servers - a test server and a production server. Each server has exactly the same data and exactly the same code, but when I apply the same model to the same data in each environment I get a slightly different result. We need the results to be identical.

I've found that the sparse matrix object that the following line returns is different on each server:

mm <- sparse.model.matrix(y ~ ., data = df.new)[,-1]

The mm on the test server has @i and @x of length 182, whereas the mm on the production server has @i and @x of length 184. Again, I've compared the df.new from both servers and they are identical.
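To narrow down where the two extra stored entries come from, one option is to save `mm` on each server (for example with `saveRDS()`), load both files in one session, and list the (row, column) positions stored in one matrix but not the other. The following is only a minimal sketch, assuming the Matrix package is available; `stored_positions` and `diff_positions` are hypothetical helper names, and the two small matrices are stand-ins for the real `mm` objects:

```r
library(Matrix)

# Hypothetical helper: stored (row, column, value) triplets of a sparse matrix.
stored_positions <- function(m) {
  m <- as(m, "CsparseMatrix")
  data.frame(
    row   = m@i + 1L,                          # @i holds 0-based row indices
    col   = rep(seq_len(ncol(m)), diff(m@p)),  # expand the column pointers
    value = m@x
  )
}

# Hypothetical helper: entries stored in `a` that are not stored in `b`.
diff_positions <- function(a, b) {
  pa <- stored_positions(a)
  pb <- stored_positions(b)
  key_a <- paste(pa$row, pa$col)
  key_b <- paste(pb$row, pb$col)
  pa[!key_a %in% key_b, , drop = FALSE]
}

# Stand-ins for the matrices from the two servers (illustrative values only).
mm_test <- sparseMatrix(i = c(1, 3), j = c(1, 2), x = c(1, 2), dims = c(3, 2))
mm_prod <- sparseMatrix(i = c(1, 2, 3), j = c(1, 2, 2),
                        x = c(1, 1e-17, 2), dims = c(3, 2))

# Entries that only the "production" matrix stores.
extra <- diff_positions(mm_prod, mm_test)
print(extra)
```

With the real matrices, `colnames(mm)[extra$col]` would show which model-matrix columns hold the extra entries, and the `value` column would show whether they are tiny numbers rather than genuine data.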

I've tried downgrading the Matrix package on the production server so that the versions match, but it's still producing different results. The only idea I have left is to match the versions of every package.

Does anyone have any suggestions for what might be happening? Unfortunately I can't share the data, but if it helps, it has 227 variables of mixed types (775 after conversion to a sparse model matrix). Many of the variables are mostly 0.

I don't know if it makes a difference or not, but the test server is Windows and the production server is Linux.
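Because the two servers differ in operating system as well as possibly in package versions, it may help to record a small environment fingerprint on each one and compare the outputs side by side. This is a minimal sketch using base R only; `env_fingerprint` is a hypothetical helper name, and the fields it collects are assumptions about what could plausibly matter here:

```r
# Hypothetical helper: collect a few environment details worth comparing.
env_fingerprint <- function(pkgs = c("Matrix", "stats")) {
  list(
    r_version    = R.version.string,
    platform     = R.version$platform,
    os           = Sys.info()[["sysname"]],
    long_double  = capabilities("long.double"),  # extended precision available?
    double_eps   = .Machine$double.eps,
    pkg_versions = vapply(
      pkgs,
      function(p) as.character(utils::packageVersion(p)),
      character(1)
    )
  )
}

fp <- env_fingerprint()
str(fp)
```

Running this on both servers and diffing the printed output is a cheaper first step than aligning the version of every installed package.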

Questioner

user123965

> dd <- data.frame(x=ordered(0:2))
> Matrix::sparse.model.matrix(~x,dd)
3 x 3 sparse Matrix of class "dgCMatrix"
  (Intercept)           x.L        x.Q
1           1 -7.071068e-01  0.4082483
2           1 -7.850462e-17 -0.8164966
3           1  7.071068e-01  0.4082483
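The `-7.850462e-17` entry in this output is zero in exact arithmetic, so whether it ends up stored as a nonzero could plausibly depend on platform-level floating-point details. If that is what is happening in `mm`, one possible workaround is to drop near-zero entries with `Matrix::drop0()` so that both servers store the same pattern. This is a sketch under that assumption; the tolerance `1e-12` is an arbitrary illustrative choice, not a recommended value:

```r
library(Matrix)

dd <- data.frame(x = ordered(0:2))
mm <- sparse.model.matrix(~ x, dd)

# Stored entries before cleaning; this count may include tiny values that
# are zero in exact arithmetic, depending on the platform.
n_before <- length(mm@x)

# Drop stored entries whose absolute value does not exceed the tolerance.
mm_clean <- drop0(mm, tol = 1e-12)
n_after  <- length(mm_clean@x)

print(c(before = n_before, after = n_after))
```

Applying the same cleaning step to the model matrix on both servers before scoring would make the stored pattern independent of such rounding noise, provided no genuine predictor values fall below the chosen tolerance.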
