This repository has been archived by the owner on Jun 22, 2022. It is now read-only.

Question About Methods in Predictive Imputer #116

Open
oattah1 opened this issue Jun 12, 2018 · 2 comments

Comments


oattah1 commented Jun 12, 2018

  • Predictive Imputer version:
  • Python version:
  • Operating System:

Description

Hi,
Thank you so much for implementing missForest in Python; it has been really helpful. I was wondering why the fit method returns self instead of new_imputed, which would match the output of the pseudocode in the missForest paper (Algorithm 1; https://academic.oup.com/bioinformatics/article/28/1/112/219101). Also, in test_predictive_imputer.py, I see that the test_predictive_imputer method creates the imputer with imputer = predictive_imputer.PredictiveImputer(), then calls imputer.fit(X).transform(X.copy()) and assigns the result to X_trans. I looked at the transform method and noticed that its opening code is similar to the beginning of the fit method. Why is the transform method called after the model is fitted? Could you also explain more about what the transform method does?
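For readers following the question: in the scikit-learn estimator convention, fit stores what it learned on the object and returns self, which is what makes a chained call like fit(X).transform(X) possible. Below is a minimal sketch of that pattern. ToyMeanImputer is a hypothetical stand-in that fills gaps with column means; it is not the PredictiveImputer code and says nothing about how that class fills values internally.

```python
import math


class ToyMeanImputer:
    """Hypothetical stand-in used only to illustrate fit/transform chaining."""

    def fit(self, X):
        # Learn one statistic per column from the observed (non-NaN) values.
        n_cols = len(X[0])
        self.means_ = []
        for j in range(n_cols):
            observed = [row[j] for row in X if not math.isnan(row[j])]
            self.means_.append(sum(observed) / len(observed))
        # Returning self (rather than the imputed data) is what allows chaining.
        return self

    def transform(self, X):
        # Fill missing entries using what fit learned; the input is left untouched.
        return [
            [self.means_[j] if math.isnan(v) else v for j, v in enumerate(row)]
            for row in X
        ]


nan = float("nan")
X = [[1.0, nan], [3.0, 4.0], [nan, 8.0]]

# Same call shape as imputer.fit(X).transform(X.copy()) in the test file.
X_trans = ToyMeanImputer().fit(X).transform(X)
print(X_trans)
```

Because the learned state lives on the object, the same fitted imputer can later be applied to data it was not fitted on, which a fit method that returned only the imputed matrix would not allow.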

What I Did

I copied the code of the PredictiveImputer class and edited the fit method to print new_imputed, and I also printed X_trans = imputer.fit(X).transform(X.copy()). Here is the result:
New Imputed From Fit Alone: [[-0.75511697 -0.56000831 -0.17324395 ... -0.28197847 -0.75023925
   1.57648847]
 [ 1.57128414  2.01646626 -0.4169594  ... -0.98846385  0.74134789
  -1.37396475]
 [-0.07996254  1.47008046 -0.70736089 ... -0.32923566  2.85404741
  -0.74426667]
 ...
 [ 1.31779515  1.58276785 -0.75654577 ... -0.72755881  1.08314364
  -0.58445367]
 [-0.13867505 -1.01695036 -0.97534895 ...  0.69337525  1.10940039
   0.32357511]
 [-1.29000647 -1.51285039 -0.37142956 ...  1.61670097 -1.3070204
   1.06002442]]
62.30102516297484
363.3446210619877
X-trans: [[-0.75511697 -0.56000831 -0.0921431  ... -0.28197847 -0.75023925
   1.61018726]
 [ 1.60153873  2.01646626 -0.44672337 ... -0.98846385  0.74134789
  -1.37396475]
 [-0.07996254  1.47008046 -0.70736089 ... -0.29503304  2.85404741
  -0.74426667]
 ...
 [ 0.00804308  1.40770558 -0.80077326 ... -0.48625397  1.08314364
  -0.58445367]
 [-0.13867505 -1.01695036 -0.80506119 ...  0.69337525  1.10940039
   0.32357511]
 [-1.19984082 -1.51285039 -0.37142956 ...  1.61670097 -1.3070204
   1.06002442]]
Some elements are the same, but others clearly differ, which is why I was hoping you could explain the transform method.

Thank you so much for doing this
Ochiba


oattah1 commented Jun 12, 2018

Also, in the pseudocode of the missForest paper, they create a vector of sorted column indices of X with respect to an increasing amount of missing values; however, your variable most_by_nan sorts with respect to a decreasing amount of missing values. Could you explain why you did that?
Thank you so much
Ochiba
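To make the two orderings concrete, here is a small sketch that counts missing values per column and sorts the column indices both ways. The data and the names nan_counts, increasing, and decreasing are made up for illustration; this does not reproduce how most_by_nan is actually computed in the library.

```python
import math

nan = float("nan")

# Toy data: column 0 has 1 missing value, column 1 has 2, column 2 has 3.
X = [
    [1.0, nan, nan],
    [2.0, 5.0, nan],
    [3.0, 6.0, 9.0],
    [nan, nan, nan],
]

n_cols = len(X[0])

# Number of missing entries in each column.
nan_counts = [sum(math.isnan(row[j]) for row in X) for j in range(n_cols)]

# Ordering described in the missForest pseudocode: fewest missing values first.
increasing = sorted(range(n_cols), key=lambda j: nan_counts[j])

# Opposite ordering: most missing values first.
decreasing = sorted(range(n_cols), key=lambda j: nan_counts[j], reverse=True)

print(nan_counts, increasing, decreasing)
```

With the increasing order, the columns imputed first are the ones with the most observed values to train on, so later columns are predicted from features that contain fewer initial guesses.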


oattah1 commented Jun 12, 2018

Also, could n_estimators be changed to 100 instead of 50, to match the default in missForest?
