Handling null vs empty strings #114

andygrove · 2018-05-21T13:30:58Z

Given this input file, I would like the c_string value in row 2 to be returned as None and the c_string value in row 3 to be returned as Some(""). This would be more consistent with how other types are handled.

Is this desirable behavior? If so, I'd like to try and add this feature.

c_int,c_float,c_string,c_bool
1,,"hello",true
2,4.4,,false
3,6.6,"",false

BurntSushi · 2018-05-21T14:01:49Z

I'm not sure this is a good idea. Semantically, an empty field and an empty quoted field are exactly the same in the underlying CSV data. For example, in order to add this feature, you'd need to thread changes all the way down into csv-core and probably even change its API.

It's not clear why you want this either. Can you provide real world examples?

andygrove · 2018-05-21T14:07:59Z

In the example above I get a None for the empty c_float field in row 1 rather than a default value of 0.0, so it seemed inconsistent for string fields to behave differently.

The use case is simply that I'm running SQL queries against CSV files using predicates like c_string IS NULL / c_string IS NOT NULL and they work for some types but not strings.

I admit I haven't read a CSV spec in decades ... so maybe this is just how CSV works?

andygrove · 2018-05-21T14:15:34Z

I just read this blog post about how Spark handles this and it states that empty strings and missing strings are equivalent in CSV, so I guess we can close this issue.

I suppose I can add an option in my project to treat empty string as null or not.

Thanks for the help.
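A project-level option like the one mentioned above could look roughly like this. It is only a sketch using the standard library: EmptyStringPolicy and normalize_field are hypothetical names, not part of the csv crate or of any existing project.

```rust
/// Hypothetical per-project setting for how empty string fields are read.
#[derive(Clone, Copy, Debug, PartialEq)]
enum EmptyStringPolicy {
    /// Treat an empty field as NULL (None).
    AsNull,
    /// Keep an empty field as an empty string (Some("")).
    AsEmpty,
}

/// Apply the policy to one raw field taken from a parsed CSV record.
fn normalize_field(raw: &str, policy: EmptyStringPolicy) -> Option<String> {
    match policy {
        EmptyStringPolicy::AsNull if raw.is_empty() => None,
        _ => Some(raw.to_string()),
    }
}

fn main() {
    for policy in [EmptyStringPolicy::AsNull, EmptyStringPolicy::AsEmpty] {
        println!("{:?}: {:?}", policy, normalize_field("", policy));
    }
}
```

With a setting of this kind, a predicate such as c_string IS NULL only has to check for None, and the choice between the two readings is made once, where the field is read.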

BurntSushi · 2018-05-21T14:19:12Z

What you can do is define a custom deserialize_with function for serde that converts empty strings to None values, so that you don't need to spread the case analysis out.

andygrove · 2018-05-21T14:29:44Z

Nice, I'll try that. Thanks again.

andygrove closed this as completed May 21, 2018

BurntSushi mentioned this issue Sep 12, 2018

Empty strings inside a flattened Option<String> don't result in None #131

Closed

youngsofun mentioned this issue Aug 31, 2024

Feature: CSV options do not fulfil our needs databendlabs/databend#16356

Open
