Saptashwa Bhattacharyya
1 min readJul 10, 2019


Hello Tony, thanks for visiting! I was busy, so I couldn’t get back to you earlier.

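Cross-validation repeatedly splits the training data into training and validation folds. These folds are different from the real test set that you obtained from train_test_split. So when you apply normalization (StandardScaler) to the whole training set without a pipeline, the scaler's mean and standard deviation are computed from data that includes every validation fold, so information leaks between the training folds and the validation fold. The best parameters obtained this way are therefore biased. With a pipeline, the scaler is refit on the training folds only in each split, which avoids the leak.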
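Here is a minimal sketch of the pipeline approach, in case it helps. The dataset (scikit-learn's built-in breast cancer data), the SVC classifier, and the parameter grid are my assumptions for illustration, not necessarily what you are using. The point is only that GridSearchCV receives the whole pipeline, so StandardScaler is fit on the training folds alone in every split.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumed example data; swap in your own X and y.
X, y = load_breast_cancer(return_X_y=True)

# Hold out the real test set first; it is never seen during the search.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Scaler and classifier are bundled, so the scaler is refit inside each CV split.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("svc", SVC()),
])

# Illustrative grid (assumed values); the "svc__" prefix targets the SVC step.
param_grid = {
    "svc__C": [0.1, 1, 10],
    "svc__gamma": [0.001, 0.01, 0.1],
}

search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_train, y_train)

best_params = search.best_params_
# The refit best pipeline scales X_test using training-set statistics only.
test_score = search.score(X_test, y_test)

print(best_params)
print(test_score)
```

If you instead call StandardScaler().fit_transform(X_train) first and pass the scaled array to GridSearchCV, every validation fold has already influenced the scaling, which is the bias described above.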

Hopefully this helps!


Written by Saptashwa Bhattacharyya

PhD, Astrophysics. Using Deep Learning, Searching Dark Matter! https://www.linkedin.com/in/saptashwa
