How To Standardize Data With Sklearn's Cross_val_score()

September 08, 2024

Let's say I want to use a LinearSVC to perform k-fold cross-validation on a dataset. How would I standardize the data? The best practice I have read is to build your

Solution 1:

You can use a Pipeline to combine both steps and pass it to cross_val_score().

When fit() is called on the pipeline, it fits each transformer in turn and transforms the data, then fits the final estimator on the transformed data. When predict() is called (available only if the last step in the pipeline is an estimator; otherwise use transform()), it applies the fitted transformers to the data and predicts with the final estimator. Because cross_val_score() refits the whole pipeline on each training fold, the scaler never sees the held-out fold.
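To make the fit()/predict() behaviour concrete, here is a minimal sketch that compares a pipeline with the same steps written out by hand. The synthetic dataset from make_classification, the 150/50 split, and random_state=0 are assumptions made for illustration only; they are not part of the original question.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic data and a simple split, used only for illustration.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

# Pipeline: fit() fits the scaler, transforms X_train, then fits the classifier.
pipe = Pipeline([("transformer", StandardScaler()),
                 ("estimator", LinearSVC(random_state=0))])
pipe.fit(X_train, y_train)
pipe_pred = pipe.predict(X_test)  # scales X_test with the training statistics

# The same steps written out by hand.
scaler = StandardScaler().fit(X_train)
clf = LinearSVC(random_state=0).fit(scaler.transform(X_train), y_train)
manual_pred = clf.predict(scaler.transform(X_test))

# Both routes learn the scaling from the training data only.
same_means = np.allclose(pipe.named_steps["transformer"].mean_, scaler.mean_)
same_predictions = np.array_equal(pipe_pred, manual_pred)
print(same_means, same_predictions)
```

The point of the comparison is that the test data is only ever transformed, never used to compute the mean and standard deviation.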

For the LinearSVC case in the question, the pipeline looks like this:

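from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# X and y are your feature matrix and labels
scaler = StandardScaler()
clf = LinearSVC()
pipeline = Pipeline([('transformer', scaler), ('estimator', clf)])

cv = KFold(n_splits=4)
scores = cross_val_score(pipeline, X, y, cv=cv)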
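As a rough check of what cross_val_score() does with a pipeline, the sketch below runs it on synthetic data and then repeats the same work in an explicit loop, fitting the scaler on each training fold only. The dataset from make_classification and random_state=0 are assumptions for illustration; with your own X and y the loop is the same.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic data, used only for illustration.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

pipeline = Pipeline([("transformer", StandardScaler()),
                     ("estimator", LinearSVC(random_state=0))])
cv = KFold(n_splits=4)
scores = cross_val_score(pipeline, X, y, cv=cv)

# Equivalent manual loop: the scaler is refit on every training fold.
manual_scores = []
for train_idx, test_idx in cv.split(X):
    scaler = StandardScaler().fit(X[train_idx])
    clf = LinearSVC(random_state=0)
    clf.fit(scaler.transform(X[train_idx]), y[train_idx])
    # Held-out fold is scaled with the training-fold statistics.
    manual_scores.append(clf.score(scaler.transform(X[test_idx]), y[test_idx]))
manual_scores = np.array(manual_scores)

print(scores)
print(manual_scores)
```

Scaling the full dataset once before calling cross_val_score() would instead leak statistics from the held-out folds into training, which is what the pipeline avoids.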

Check out the Pipeline examples in the scikit-learn documentation to understand it better:

http://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html#examples-using-sklearn-pipeline-pipeline

Feel free to ask if you have any doubts.

theprettymind1987
