two dataframes outer join on null values #7473
See the linked question as well.
Using
You can easily drop the missing values before merging.
This is the default, as normally it would be up to the user to drop missing values before merging. We could do this a bit better, I suppose.
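A minimal sketch of dropping null keys on both sides before merging. The frames, column names, and values are hypothetical and only illustrate the idea; they are not the snippet from this comment.

```python
import numpy as np
import pandas as pd

# Hypothetical frames; names and values are illustrative only.
students = pd.DataFrame({"course": ["math", np.nan], "student": ["Ann", "Alan"]})
instructors = pd.DataFrame({"course": ["math", np.nan], "instructor": ["Lee", "Kim"]})

# Drop rows whose join key is missing on each side, then merge.
merged = students.dropna(subset=["course"]).merge(
    instructors.dropna(subset=["course"]), on="course", how="outer"
)
print(merged)
```

Note that this removes the null-keyed rows from the result entirely, so it is not equivalent to a SQL full outer join.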
Thanks. I think the outer join should follow standard SQL; otherwise it could be confusing.
In your example, you dropped the rows with null values before doing the outer join. But in the two examples above, those rows are not dropped. In the first example, student Alan has course NULL. After joining to the instructor table, the Alan row is not dropped; the final result just says Alan has course NULL and instructor NULL. Similarly, in the second example, John, who has a Department ID of NULL, is not dropped in a full outer join. A solution was suggested in the SO example above for a left join, but for a full outer join we need to do something like
But I think this is unnatural and inconsistent with standard SQL.
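For reference, one possible way to get the SQL-style result described here is to merge only the rows with non-null keys and then append the null-keyed rows from each side unmatched. This is a sketch under assumptions, not the snippet from this comment: `outer_merge_sql_nulls` is a hypothetical helper, it handles a single key column only, and the sample frames are made up.

```python
import numpy as np
import pandas as pd


def outer_merge_sql_nulls(left, right, on):
    """Full outer join on one key column where null keys never match.

    Hypothetical helper, not part of pandas.
    """
    # Rows with a null key are set aside so they cannot match each other.
    left_null = left[left[on].isna()]
    right_null = right[right[on].isna()]
    # Ordinary outer merge on the rows that do have a key.
    matched = left[left[on].notna()].merge(
        right[right[on].notna()], on=on, how="outer"
    )
    # Append the null-keyed rows; columns from the other side become NaN.
    return pd.concat([matched, left_null, right_null], ignore_index=True)


# Made-up data: John has no department, and one department has no ID.
left = pd.DataFrame({"dept_id": [1.0, np.nan], "employee": ["Sue", "John"]})
right = pd.DataFrame({"dept_id": [1.0, np.nan], "dept": ["Sales", "Unassigned"]})

result = outer_merge_sql_nulls(left, right, "dept_id")
print(result)
```

With this approach John is kept with a null department instead of being paired with the null-keyed row on the right.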
My example is very similar to what your solution shows. I think this is not well defined in pandas at the moment. A pull request to address this would be welcome. (I don't know if changing this to the SQL behavior will break anything; if it does, that would need to be addressed.)
@socheon Bit late after your comment but I notice this is still open and it helped me recently. I didn't quite get what I was looking for using your snippet but a bit of adjustment returned what I wanted and is more similar to the results expected doing a full outer join in SQL:
Closing as duplicate of #32306 with a more recent discussion on the future policy we want.
In pandas, unlike SQL, rows seem to be joined on null values. Is this a bug?
related SO: http://stackoverflow.com/questions/23940181/pandas-merging-with-missing-values/23940686#23940686
Code snippet
Output
You can see row 0 in df1 unexpectedly joins to both rows in df2.
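The behavior can be reproduced with a small sketch like the following. The `df1`/`df2` contents here are made up for illustration and are not the frames from the original snippet.

```python
import numpy as np
import pandas as pd

# Illustrative frames: row 0 of df1 has a null key, as do both rows of df2.
df1 = pd.DataFrame({"key": [np.nan, 1.0], "left_val": ["a", "b"]})
df2 = pd.DataFrame({"key": [np.nan, np.nan], "right_val": ["x", "y"]})

# pandas treats null keys as equal to each other when merging.
merged = df1.merge(df2, on="key", how="outer")
print(merged)
```

Here the null-keyed row of `df1` is paired with both null-keyed rows of `df2`, whereas in SQL a NULL key never equals another NULL.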
I would expect the correct answer to be