Is your feature request related to a problem?
I wish the pandas merge function would provide a way to verify that it does not merge on NA values, which is usually not what we want. Right now, I always run a pre-check that my right dataframe does not have any NA values in its keys, but when you are doing 10 merges one after the other, this is cumbersome and bug-prone.
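To illustrate the problem, here is a minimal sketch with made-up data (the frames and the column names `left_val` and `right_val` are assumptions for illustration, not taken from a real dataset). It shows that NA keys on both sides are matched against each other, so a left row with a missing key is duplicated for every right row with a missing key:

```python
import numpy as np
import pandas as pd

# Illustrative data only: both frames contain missing merge keys.
df_left = pd.DataFrame({"merge_key": [1.0, np.nan], "left_val": ["a", "b"]})
df_right = pd.DataFrame(
    {"merge_key": [1.0, np.nan, np.nan], "right_val": ["x", "y", "z"]}
)

merged = pd.merge(df_left, df_right, on="merge_key", how="left")

# The NaN key on the left is matched against every NaN key on the right,
# so the single left row "b" appears once per NaN row of df_right.
na_rows = merged[merged["merge_key"].isna()]
print(merged)
```

The merged frame has more rows than `df_left` even though no real key was shared by the extra rows, which is the silent behaviour the request wants to guard against.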
Describe the solution you'd like
This would be an additional parameter (e.g. validate_no_na_merge) in the pd.merge() function, defaulting to False, which, when True, would assert that there are no NA values in the keys of the right dataframe, thereby guaranteeing that no rows are merged on NA keys.
Note this leaves the possibility that keys in the left dataframe contain NA values and will be kept as such in the final merged dataframe. Here we just wish to verify that the merge itself is not made on NA values.
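As a rough sketch of the behaviour I have in mind, the check could look like the wrapper below. The name `merge_no_na` is hypothetical and nothing like it exists in pandas today; it only handles the `on=` form of specifying keys, and it raises a `ValueError` rather than using a bare assert:

```python
import pandas as pd


def merge_no_na(left, right, on, **kwargs):
    """Hypothetical helper (not a pandas API): merge only if the right keys have no NA."""
    # Accept a single column name or a list of column names.
    keys = [on] if isinstance(on, str) else list(on)

    # Flag rows of the right frame where any key column is NA.
    na_mask = right[keys].isna().any(axis=1)
    if na_mask.any():
        raise ValueError(
            f"{int(na_mask.sum())} row(s) of the right DataFrame "
            f"have NA values in merge keys {keys}"
        )

    # Left keys are deliberately not checked: NA keys there are kept as-is.
    return pd.merge(left, right, on=on, **kwargs)
```

With such a helper, a chain of merges would fail loudly at the first right-hand frame that has a missing key, instead of needing a separate pre-check before each call.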
API breaking implications
If the parameter defaults to False, I do not see any breaks.
Describe alternatives you've considered
Right now, I'm always doing a pre-check such as:
assert not df_right.merge_key.isnull().any()
df = pd.merge(df_left, df_right, on='merge_key', ...)
Additional context
None
Thanks for your feedback. Deprecating merge on NA seems a bit too harsh to me: it would potentially change the shape of your df and drop rows that may carry useful information, which seems awkward.
What should I do about this issue? Close it or keep it open?