Support categorical variables with CSVs #10153
Comments
I'm not opposed to this in principle, but I think the API will necessarily be clunky. Would we require (or allow) the user to specify all categories in the call to @esafak we do support categoricals in
Can't we already declare the dtypes of selected columns? I thought the problem was limited to categoricals, but if not, please expand my request to all dtypes.
You can specify the types. I was just thinking pd.read_csv('file.csv', dtype={'A': np.int64, 'B': pd.CategoricalDtype(['cat1', 'cat2', 'cat3'])}) which means you'd need to know all the categories up front. Or we infer them and you'll need to check that there aren't any surprising categories.
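A minimal sketch of the call proposed above, assuming a pandas version in which read_csv accepts a CategoricalDtype in its dtype mapping. The CSV contents and the stray value 'dog' are made up for illustration; they show what happens when a value falls outside the categories declared up front.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical CSV contents standing in for 'file.csv'.
csv_text = "A,B\n1,cat1\n2,cat3\n3,cat2\n4,dog\n"

# Declare the categories up front; 'dog' is deliberately not among them.
b_dtype = pd.CategoricalDtype(["cat1", "cat2", "cat3"])

df = pd.read_csv(io.StringIO(csv_text), dtype={"A": np.int64, "B": b_dtype})

print(df.dtypes)                        # dtype of each column
print(df["B"].cat.categories.tolist())  # the declared categories
print(df["B"].isna().sum())             # count of values outside the declared categories
```

Values that are not among the declared categories are read as missing rather than raising an error, which is why the "surprising categories" check mentioned above still matters.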
Nice workaround, but I think it is still nice to support As a first step, how about converting the specified columns to
It would be nice to be able to read CSVs with categorical variables using read_csv's dtype parameter instead of casting the columns after the fact.
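For reference, the after-the-fact cast that this request wants to avoid looks roughly like the sketch below. The CSV contents and column names are hypothetical; the point is that the column is first read with an inferred type and only then converted, with the categories inferred from the observed values.

```python
import io

import pandas as pd

# Hypothetical CSV contents; column B holds the categorical variable.
csv_text = "A,B\n1,cat1\n2,cat3\n3,cat1\n"

# Step 1: read the file with inferred dtypes.
df = pd.read_csv(io.StringIO(csv_text))

# Step 2: cast the column afterwards; categories come from the data seen.
df["B"] = df["B"].astype("category")

print(df["B"].cat.categories.tolist())
```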