A minor error on page 258 and ch08.ipynb (Training a logistic regression model for document classification) when preparing train and test datasets #139

pavlo-yanchenko · 2023-08-14T13:30:02Z

When we prepare the train and test datasets, we slice the IMDB dataset DataFrame with the .loc indexer (label-based slicing on the index).

X_train = df.loc[:25000, 'review'].values
y_train = df.loc[:25000, 'sentiment'].values
X_test = df.loc[25000:, 'review'].values
y_test = df.loc[25000:, 'sentiment'].values

It's worth noting that, unlike regular Python slices, .loc includes both the start and the stop labels in the result (when they are present in the index). As a result, sample #25000 ends up in both the train and test datasets.
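The overlap can be reproduced on a small scale. The following is a minimal sketch, assuming a toy 10-row DataFrame with the default integer index standing in for the IMDB data; the column contents and the `split` variable are made up for illustration.

```python
import pandas as pd

# Toy stand-in for the IMDB DataFrame (hypothetical data, default RangeIndex 0..9)
df = pd.DataFrame({
    "review": [f"text {i}" for i in range(10)],
    "sentiment": [i % 2 for i in range(10)],
})

split = 5  # plays the role of 25000 in the original code

# .loc slices by label and includes BOTH endpoints
train_idx = df.loc[:split].index   # labels 0..5
test_idx = df.loc[split:].index    # labels 5..9

# Labels that appear in both subsets
overlap = train_idx.intersection(test_idx)

print(len(train_idx), len(test_idx), overlap.tolist())  # 6 5 [5]
```

The training slice has one row more than intended, and that extra row is also the first row of the test slice.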


rasbt · 2023-08-14T13:50:36Z

Great point. I think it's best to switch to .iloc here.
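One way the switch to .iloc could look is sketched below. This is not necessarily the fix that was applied to the book or notebook; it assumes a toy 10-row DataFrame in place of the IMDB data, and `split` is a hypothetical stand-in for 25000.

```python
import pandas as pd

# Toy stand-in for the IMDB DataFrame (hypothetical data)
df = pd.DataFrame({
    "review": [f"text {i}" for i in range(10)],
    "sentiment": [i % 2 for i in range(10)],
})

split = 5  # plays the role of 25000 in the original code

# .iloc slices by position and, like regular Python slices, excludes the stop
X_train = df["review"].iloc[:split].values
y_train = df["sentiment"].iloc[:split].values
X_test = df["review"].iloc[split:].values
y_test = df["sentiment"].iloc[split:].values

print(len(X_train), len(X_test))  # 5 5
```

With positional slicing, the two subsets are disjoint and together cover every row exactly once, regardless of what labels the index holds.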
