Question About Methods in Predictive Imputer #116

oattah1 · 2018-06-12T16:56:45Z

Predictive Imputer version:
Python version:
Operating System:

Description

Hi,
Thank you so much for implementing missForest in Python; it has been really helpful. I was wondering why the fit method returns self rather than new_imputed, which would match the output of the pseudocode in the missForest paper (Algorithm 1; https://academic.oup.com/bioinformatics/article/28/1/112/219101)? Also, in test_predictive_imputer.py, I see that in the test_predictive_imputer method, after you instantiate the class with imputer = predictive_imputer.PredictiveImputer(), you call imputer.fit(X).transform(X.copy()) and assign the result to X_trans. I looked at the transform method and noticed that its initial code is similar to the beginning of the fit method. Why is the transform method called after the model is fitted? Also, could you explain more about what the transform method does?

What I Did

I copied the code of the PredictiveImputer class and edited the fit method to print new_imputed, and I also printed X_trans = imputer.fit(X).transform(X.copy()). Here is the result:
New Imputed From Fit Alone: [[-0.75511697 -0.56000831 -0.17324395 ... -0.28197847 -0.75023925
   1.57648847]
 [ 1.57128414  2.01646626 -0.4169594  ... -0.98846385  0.74134789
  -1.37396475]
 [-0.07996254  1.47008046 -0.70736089 ... -0.32923566  2.85404741
  -0.74426667]
 ...
 [ 1.31779515  1.58276785 -0.75654577 ... -0.72755881  1.08314364
  -0.58445367]
 [-0.13867505 -1.01695036 -0.97534895 ...  0.69337525  1.10940039
   0.32357511]
 [-1.29000647 -1.51285039 -0.37142956 ...  1.61670097 -1.3070204
   1.06002442]]
62.30102516297484
363.3446210619877
X-trans: [[-0.75511697 -0.56000831 -0.0921431  ... -0.28197847 -0.75023925
   1.61018726]
 [ 1.60153873  2.01646626 -0.44672337 ... -0.98846385  0.74134789
  -1.37396475]
 [-0.07996254  1.47008046 -0.70736089 ... -0.29503304  2.85404741
  -0.74426667]
 ...
 [ 0.00804308  1.40770558 -0.80077326 ... -0.48625397  1.08314364
  -0.58445367]
 [-0.13867505 -1.01695036 -0.80506119 ...  0.69337525  1.10940039
   0.32357511]
 [-1.19984082 -1.51285039 -0.37142956 ...  1.61670097 -1.3070204
   1.06002442]]
Some elements are identical, but others clearly differ, which is why I was wondering if you could explain the transform method.
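
One way to see exactly which entries differ is an element-wise comparison. This is only a minimal sketch: it uses just the first three values of the first two rows printed above, and the names fit_only and fit_transform are mine for illustration, not names from the library.

```python
import numpy as np

# First three values of the first two rows from each printout above.
fit_only = np.array([
    [-0.75511697, -0.56000831, -0.17324395],
    [1.57128414, 2.01646626, -0.4169594],
])
fit_transform = np.array([
    [-0.75511697, -0.56000831, -0.0921431],
    [1.60153873, 2.01646626, -0.44672337],
])

# True wherever the two results disagree beyond floating-point tolerance.
differs = ~np.isclose(fit_only, fit_transform)

# Row and column indices of the entries that differ.
rows, cols = np.nonzero(differs)
for r, c in zip(rows, cols):
    print(f"[{r}, {c}]: {fit_only[r, c]} vs {fit_transform[r, c]}")
```

On the full arrays, the same comparison could be checked against the positions that were originally missing in X.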

Thank you so much for doing this
Ochiba


oattah1 · 2018-06-12T17:13:07Z

Also, the pseudocode in the missForest paper creates a vector of column indices of X sorted by increasing number of missing values; however, your variable most_by_nan sorts by decreasing number of missing values. Could you explain why you chose that order?
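
To make the difference between the two orderings concrete, here is a minimal sketch on toy data. The matrix and the names nan_counts, increasing, and decreasing are made up for illustration; this is not the library's code.

```python
import numpy as np

# Toy matrix with missing values (hypothetical data, for illustration only).
X = np.array([
    [1.0, np.nan, 3.0],
    [np.nan, np.nan, 6.0],
    [7.0, np.nan, 9.0],
    [np.nan, 11.0, 12.0],
])

# Number of missing values in each column.
nan_counts = np.isnan(X).sum(axis=0)

# Order described in the paper's pseudocode: fewest missing values first.
increasing = np.argsort(nan_counts, kind="stable")

# Opposite order: most missing values first.
decreasing = np.argsort(-nan_counts, kind="stable")

print(nan_counts, increasing, decreasing)
```

With the increasing order, the columns with the fewest missing values are imputed first, so later columns are predicted from features that contain fewer initial guesses.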
Thank you so much
Ochiba

oattah1 · 2018-06-12T19:49:55Z

Also, could n_estimators be changed to 100, like the default in missForest, instead of 50?
