
With multiple scorers, if a scorer fails, GridSearchCV will fail #830

Open
@bolliger32


What happened:
I was running GridSearchCV with multiple scoring metrics. One of them ("neg_mean_poisson_deviance") could not be computed for some folds because it is undefined when y_hat is 0. The failure was handled during scoring, but when create_cv_results was called, it raised TypeError: 'float' object is not subscriptable. This is because score normally returns a dictionary when multiple scorers are requested, but in this case it returned the value I had passed as error_score to GridSearchCV, which was np.nan. The issue is between L274 and L297 in methods.py.
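
To illustrate the failure mode, here is a minimal standalone sketch. It is not the dask-ml code from methods.py: `expand_error_score`, `names`, `ok_fold` and `failed_fold` are hypothetical names made up for the example, and the guard shown is only one possible way to keep the per-fold result a dictionary.

```python
import math


def expand_error_score(score, scorer_names, error_score=math.nan):
    """Return a per-scorer dict even when scoring fell back to a scalar.

    Hypothetical helper for illustration only, not part of dask-ml.
    """
    if isinstance(score, dict):
        return score
    # Scoring failed and a bare error_score came back: assign that value
    # to every scorer so downstream code can still index by scorer name.
    return {name: error_score for name in scorer_names}


names = ["neg_mean_squared_error", "neg_mean_poisson_deviance"]
ok_fold = {"neg_mean_squared_error": -0.5, "neg_mean_poisson_deviance": -0.1}
failed_fold = math.nan  # a failed fold yields the bare error_score

# Indexing the bare float by scorer name reproduces the reported TypeError.
try:
    failed_fold["neg_mean_squared_error"]
except TypeError as exc:
    message = str(exc)

# With the guard, every fold is a dict and can be indexed uniformly.
scores = [expand_error_score(s, names) for s in (ok_fold, failed_fold)]
```

Note that this guard marks every scorer as failed for the affected fold, including ones that might have succeeded. Recording np.nan only for the scorer that actually failed would require catching the error per scorer instead.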

What you expected to happen:
I expected the score to be np.nan for the folds in which the scorer failed, without an error being raised.

Minimal Complete Verifiable Example:

from sklearn.linear_model import LinearRegression
from dask_ml.model_selection import GridSearchCV
from sklearn.model_selection import LeaveOneOut
import numpy as np

X = np.array([[1, 2],
              [2, 1],
              [0, 0]])

y = 3 * X[:, 0] + 4 * X[:, 1]
cv = LeaveOneOut()

ols = LinearRegression(fit_intercept=False)
regr = GridSearchCV(
    ols,
    {"normalize": [False, True]},
    scoring=["neg_mean_squared_error", "neg_mean_poisson_deviance"],
    refit=False,
    cv=cv,
    error_score=np.nan,
    n_jobs=1
)
regr.fit(X, y)

This raises the TypeError described above.

Anything else we need to know?:
I think this should be a fairly quick fix, so I'm going to give it a try.
