Handling null vs empty strings #114

Closed
andygrove opened this issue May 21, 2018 · 5 comments

@andygrove

Given this input file, I would like the c_string value in row 2 to be returned as None and the c_string value in row 3 to be returned as Some(""). This would be more consistent with how other types are handled.

Is this desirable behavior? If so, I'd like to try and add this feature.

c_int,c_float,c_string,c_bool
1,,"hello",true
2,4.4,,false
3,6.6,"",false

@BurntSushi
Owner

I'm not sure this is a good idea. Semantically, an empty field and an empty quoted field are exactly the same in the underlying CSV data. For example, in order to add this feature, you'd need to thread changes all the way down into csv-core and probably even change its API.

It's not clear why you want this either. Can you provide real world examples?

@andygrove
Author

In the example above I get None for the empty c_float field in row 1, rather than a default value of 0.0, so it seemed inconsistent for string fields to behave differently.

The use case is simply that I'm running SQL queries against CSV files using predicates like c_string IS NULL / c_string IS NOT NULL and they work for some types but not strings.

I admit I haven't read a CSV spec in decades ... so maybe this is just how CSV works?

@andygrove
Author

I just read this blog post about how Spark handles this and it states that empty strings and missing strings are equivalent in CSV, so I guess we can close this issue.

I suppose I can add an option in my project to treat empty string as null or not.

Thanks for the help.
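
A minimal sketch of what such an option could look like, using only the standard library. The names `EmptyStringPolicy`, `to_nullable` and `is_null` are hypothetical and not part of the csv crate or any other project; the sketch assumes the field has already been read as a `&str`.

```rust
/// Hypothetical setting controlling how an empty string field is interpreted.
#[derive(Clone, Copy, Debug, PartialEq)]
enum EmptyStringPolicy {
    /// Treat an empty field as SQL NULL.
    AsNull,
    /// Keep an empty field as an empty string.
    AsEmpty,
}

/// Maps a raw field to an optional value according to the chosen policy.
fn to_nullable(field: &str, policy: EmptyStringPolicy) -> Option<&str> {
    match policy {
        EmptyStringPolicy::AsNull if field.is_empty() => None,
        _ => Some(field),
    }
}

/// Evaluates an `IS NULL` style predicate on a raw field.
fn is_null(field: &str, policy: EmptyStringPolicy) -> bool {
    to_nullable(field, policy).is_none()
}
```

With `EmptyStringPolicy::AsNull`, the c_string values in rows 2 and 3 of the sample file would both satisfy c_string IS NULL; with `AsEmpty`, neither would.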

@BurntSushi
Owner

What you can do is define a custom deserialize_with function for serde that converts empty strings to None values, so that you don't need to spread the case analysis out.

@andygrove
Author

Nice, I'll try that. Thanks again.
