
ENH: Validate merge is not made on NA values in the merge keys #37518

Closed
ThomasBourgeois opened this issue Oct 30, 2020 · 3 comments
Labels: Enhancement, Needs Triage (issue that has not been reviewed by a pandas team member)

Comments

@ThomasBourgeois

ThomasBourgeois commented Oct 30, 2020

Is your feature request related to a problem?

I wish the pandas merge function would provide a way to verify that it does not merge on NA values, which, most of the time, is not what we want. Right now, I always pre-check that my right dataframe has no NA values in its keys, but when you are doing 10 merges one after the other, this is cumbersome and bug-prone.
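To make the problem concrete, here is a small sketch of how pd.merge pairs rows whose keys are both NA. The frames and column names (df_left, df_right, left_val, right_val) are made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative frames: both sides contain one row whose key is NaN.
df_left = pd.DataFrame({"merge_key": [1.0, np.nan], "left_val": ["a", "b"]})
df_right = pd.DataFrame({"merge_key": [1.0, np.nan], "right_val": ["x", "y"]})

# pandas matches NaN keys against each other, unlike a SQL join.
merged = pd.merge(df_left, df_right, on="merge_key", how="inner")

# Rows of the result that were produced by a NaN-to-NaN key match.
na_matches = merged[merged["merge_key"].isna()]
print(na_matches)
```

The row with left_val "b" ends up joined to the row with right_val "y" even though neither has a real key, which is the silent behaviour this request wants to guard against.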

Describe the solution you'd like

This would be an additional parameter (e.g. validate_no_na_merge) in the pd.merge() function, defaulting to False. When True, it would simply assert that there are no NA values in the keys of the right dataframe, thereby guaranteeing that no merge happens on NA keys.

Note this leaves the possibility that keys in the left dataframe contain NA values and will be kept as such in the final merged dataframe. Here we just wish to verify that the merge itself is not made on NA values.
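As a rough sketch of the behaviour described above, the check could live in a user-side wrapper for now. merge_no_na is a hypothetical helper, not a pandas function, and validate_no_na_merge is only the name suggested in this request; pd.merge has no such parameter.

```python
import numpy as np
import pandas as pd


def merge_no_na(left, right, on, validate_no_na_merge=False, **kwargs):
    """Hypothetical wrapper around pd.merge (not part of pandas).

    When validate_no_na_merge is True, raise if any key column of the
    right frame contains NA. The left frame is deliberately not checked.
    """
    keys = [on] if isinstance(on, str) else list(on)
    if validate_no_na_merge:
        # True for every right-hand row with NA in at least one key column.
        bad_rows = right[keys].isna().any(axis=1)
        if bad_rows.any():
            raise ValueError(
                f"{int(bad_rows.sum())} row(s) of the right frame have NA in merge keys {keys}"
            )
    return pd.merge(left, right, on=on, **kwargs)


# Illustrative data: the left frame has an NA key, the right frame does not.
df_left = pd.DataFrame({"merge_key": [1.0, np.nan], "left_val": ["a", "b"]})
df_right = pd.DataFrame({"merge_key": [1.0, 2.0], "right_val": ["x", "y"]})

result = merge_no_na(
    df_left, df_right, on="merge_key", how="left", validate_no_na_merge=True
)
```

With how="left", the left row whose key is NA is kept in the result with no match from the right, which is the case the note above describes.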

API breaking implications

If the parameter defaults to False, I do not see any breaking changes.

Describe alternatives you've considered

Right now, I always run a pre-check such as:
assert not df_right.merge_key.isnull().any()
df = pd.merge(df_left, df_right, on='merge_key', ...)

Additional context

None

ThomasBourgeois added the Enhancement and Needs Triage labels Oct 30, 2020
@phofl
Member

phofl commented Oct 30, 2020

Hi, thanks for your report.

This has already been discussed, for example in #32306. The idea is to deprecate merging on NaN and remove it in a future version.

@ThomasBourgeois
Author

Thanks for your feedback. Deprecating merge on NA seems a bit too harsh to me: it potentially changes the shape of your df, dropping rows that may hold useful information, which seems awkward.
What should I do about this issue? Close it or keep it open?

@rhshadrach
Member

Closing as a duplicate of #32306. @ThomasBourgeois - if you have thoughts on desired behavior please do express them there!
