ENH: Integration with DuckDB

#### Is your feature request related to a problem?
DuckDB supports the consumption and production of data frames. In this issue, I discuss how we could leverage these on two different ends.

1. Loading data frames as DuckDB Tables:
 The ```to_sql()``` function can be extended to leverage the consumption to efficiently load data frames as tables in DuckDB (See #45675).

2. Run SQL on data frames:

Another possibility that I would like to propose, is to use the DuckDB engine to directly run queries on a data frame. Since DuckDB is able to consume and produce those, a user could get a data frame from running SQL directly on that data frame.

e.g.,
```python
d = {'col1': [1, 2], 'col2': [3, 4]}
df_1 = pd.DataFrame(data=d)

d = {'col1': [2, 3]}
df_2 = pd.DataFrame(data=d)

df_result = pandas.sql("select df_1.col2 from df_1 inner join df_2 on (df_1.col1 = df_2.col2)")
```

This should be possible since it would run Duckdb's default connection under the hood. With the advantages of consuming/producing df, directly running SQL, and parallel query execution.

If this is something that the core-dev team/ pandas community would be interested in. I'm happy to give a go to a PR :-)

#### Additional context
DuckDB's blogpost of SQL on Pandas, including benchmarks. https://duckdb.org/2021/05/14/sql-on-pandas.html


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

ENH: Integration with DuckDB #45678

Is your feature request related to a problem?

Additional context

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

ENH: Integration with DuckDB #45678

Description

Is your feature request related to a problem?

Additional context

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions