
Supplement .interactive with an Interactive Pipe class #673

Closed
MarcSkovMadsen opened this issue Nov 2, 2021 · 5 comments
@MarcSkovMadsen
Collaborator

MarcSkovMadsen commented Nov 2, 2021

Background

hvPlot's .interactive is truly powerful.

I believe it has the same potential as Streamlit's API: it is simple, intuitive, beautiful and very powerful. And it comes without these downsides:

  • Slowness of "run script from top to bottom"
  • Spending time on remembering and optimizing caching.
  • Spaghetti code

My Pain

.interactive only supports pipelines starting from a DataFrame. I would like the same power for any pipeline.

As a minimum I would like all the read_XYZ functions, such as read_sql, read_csv and read_parquet, to be interactive. Most of the time my pipeline or data app also holds widgets that select which data to extract, because extracting everything, for example from a large SQL table, would not be feasible.

Sometimes my data also comes from a large, costly simulation that takes specific arguments, and these arguments should be presented as widgets too.

If possible I would like an API that supports beautiful, readable method chaining and feels like an integrated part of .interactive.

Reference Example

For example, I would like to be able to use the same intuitive API, with built-in caching, for this pipeline:

import time

import pandas as pd
import panel as pn


def extract(country):
    # In my case this would often be an expensive SQL query from a large table
    time.sleep(1)
    value = {"DK": 0, "DE": 3, "US": 6}[country]
    return pd.DataFrame({
        "col1": [value]*10,
        "col2": [value+1]*10,
        "col3": [value+2]*10,
    })

def transform(series: pd.DataFrame, col):
    time.sleep(1)
    return series[col].sum()

def pipeline(country, col):
    series = extract(country)
    return transform(series, col)

country_widget = pn.widgets.Select(value="DK", options=["DK", "DE", "US"])
col_widget = pn.widgets.Select(value="col1", options=["col1", "col2", "col3"])

Existing APIs

I can't see how any of the existing Param or Panel APIs support this, not even Pipeline.

pn.bind would be one solution, though:

import time

import pandas as pd
import panel as pn


def extract(country):
    # In my case this would often be an expensive SQL query from a large table
    time.sleep(1)
    value = {"DK": 0, "DE": 3, "US": 6}[country]
    return pd.DataFrame(
        {
            "col1": [value] * 10,
            "col2": [value + 1] * 10,
            "col3": [value + 2] * 10,
        }
    )


def transform(series: pd.DataFrame, col):
    time.sleep(1)
    return series[col].sum()


def pipeline(country, col):
    series = extract(country)
    return transform(series, col)


country_widget = pn.widgets.Select(value="DK", options=["DK", "DE", "US"])
col_widget = pn.widgets.Select(value="col1", options=["col1", "col2", "col3"])

ipipeline = pn.bind(pipeline, country=country_widget, col=col_widget)

pn.Column(country_widget, col_widget, pn.panel(ipipeline, loading_indicator=True)).servable()

But it does not

  • allow method chaining
  • feel as simple and intuitive as .interactive
  • provide built in caching for each step.

It feels like a different API from .interactive and so does not keep things simple.

Proposed API

Something like

ipipeline = (
    Interactive(extract, country=country_widget)
    .pipe(transform, col=col_widget)
)

pn.panel(ipipeline).servable()

I would also like the Interactive class to recognize a DataFrame result and make it .interactive automatically, so that this would also be possible:

ipipeline = Interactive(extract, country=country_widget)[col_widget].sum()

pn.panel(ipipeline).servable()

Additional Context

  • Because the built-in caching would now have to support arbitrary objects, results could be cached in memory, cached with diskcache, or cached with a method the user provides.
  • Would an example showing how to start an .interactive pipeline from a data extraction be enough to solve this pain?
  • What about pipelines that start from several DataFrame extractions and assemble them? It would be nice to make those .interactive with caching as well.
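
The pluggable caching idea in the first bullet could look roughly like the sketch below. Everything in it is an assumption for illustration: cached_step, store and simulate are hypothetical names, not part of hvPlot or Panel, and the store is any mapping-like object. A plain dict gives in-memory caching; a diskcache.Cache should be usable in the same slot because it offers a dict-like interface, but that is not tested here.

```python
import functools
import pickle
from collections.abc import MutableMapping
from typing import Callable, Optional


def cached_step(store: Optional[MutableMapping] = None) -> Callable:
    """Hypothetical decorator (not an hvPlot or Panel API): memoize a step in `store`."""
    # Fall back to an in-memory dict when the user supplies no store
    backing = {} if store is None else store

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Pickle the call so that unhashable arguments can still form a key
            key = pickle.dumps((func.__qualname__, args, sorted(kwargs.items())))
            if key not in backing:
                backing[key] = func(*args, **kwargs)
            return backing[key]

        return wrapper

    return decorator


memory_store = {}      # any mapping-like object could be passed instead
simulation_runs = []   # records every real (uncached) execution


@cached_step(memory_store)
def simulate(scale, offset=0):
    # Stand-in for a costly simulation driven by widget values
    simulation_runs.append((scale, offset))
    return [scale * i + offset for i in range(5)]


result_a = simulate(2, offset=1)
result_b = simulate(2, offset=1)  # served from memory_store, simulate is not re-run
```

Pickling the arguments keeps the sketch short, but it would be costly if a whole DataFrame were passed as an argument; a real implementation would probably need a cheaper hashing strategy.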

MarcSkovMadsen added the TRIAGE and type: enhancement labels and removed the TRIAGE label Nov 2, 2021
MarcSkovMadsen added this to the Wishlist milestone Nov 2, 2021
@jbednar
Member

jbednar commented Nov 2, 2021

Sounds great! I think this fits well with #533, i.e., expanding the power available from the hvPlot-centric interface.

@MarcSkovMadsen
Collaborator Author

You can actually get very close with a hack like

ipipeline = (
    pd.DataFrame().interactive()
    .pipe(extract, country=country_widget)
    .pipe(transform, col=col_widget)
)

But it is not totally satisfactory because

  • It feels like a hack
  • It is not simple and intuitive
  • The signature of extract needs to change to def extract(_, country):.
  • The extract function is not cached. When I change the col_widget value, the extract function is executed again as well.

import time

import pandas as pd
import panel as pn
import hvplot.pandas


def extract(_, country):
    # In my case this would often be an expensive SQL query from a large table
    print(country)
    time.sleep(1)
    value = {"DK": 0, "DE": 3, "US": 6}[country]
    return pd.DataFrame(
        {
            "col1": [value] * 10,
            "col2": [value + 1] * 10,
            "col3": [value + 2] * 10,
        }
    )


def transform(series: pd.DataFrame, col):
    time.sleep(1)
    print(col)
    return series[col].sum()



country_widget = pn.widgets.Select(value="DK", options=["DK", "DE", "US"])
col_widget = pn.widgets.Select(value="col1", options=["col1", "col2", "col3"])

ipipeline = (
    pd.DataFrame().interactive()
    .pipe(extract, country=country_widget)
    .pipe(transform, col=col_widget)
)

pn.Column(
    ipipeline.widgets(),
    ipipeline.panel(loading_indicator=True),
).servable()

@MarcSkovMadsen
Collaborator Author

MarcSkovMadsen commented Nov 4, 2021

So based on the previous example we can define

def interactive(func, *args, **kwargs):
    def wrapper(_, *args, **kwargs):
        return func(*args, **kwargs)
    return (
        pd.DataFrame().interactive()
        .pipe(wrapper, *args, **kwargs)
    )

and get an api like

ipipeline = (
    interactive(extract, country=country_widget)
    .pipe(transform, col=col_widget)
)

So the main problem to solve is the missing caching.
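
One possible stopgap for the missing caching, assuming the widget values are hashable (the strings from the Select widgets are), is to memoize the expensive step with functools.lru_cache before handing it to the interactive helper above. The sketch below is not the hvPlot implementation: it leaves out Panel and hvPlot, uses a dict of lists in place of the DataFrame, and imitates a widget change by calling a plain pipeline function again. The extract_calls list is only there to show when extract really runs.

```python
import functools

extract_calls = []  # records every real (uncached) run of extract


@functools.lru_cache(maxsize=32)
def extract(country):
    # Stand-in for the expensive query; a dict of lists replaces the DataFrame
    extract_calls.append(country)
    value = {"DK": 0, "DE": 3, "US": 6}[country]
    return {
        "col1": [value] * 10,
        "col2": [value + 1] * 10,
        "col3": [value + 2] * 10,
    }


def transform(data, col):
    return sum(data[col])


def pipeline(country, col):
    # Imitates a widget change: every step in the chain is called again
    return transform(extract(country=country), col=col)


first = pipeline("DK", "col1")
second = pipeline("DK", "col2")  # only col changed: extract comes from the cache
third = pipeline("US", "col2")   # country changed: extract really runs again
```

lru_cache returns the same object on every hit, so later steps must not mutate the cached result in place.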

Full Example

import time

import pandas as pd
import panel as pn
import hvplot.pandas


def extract(country):
    # In my case this would often be an expensive SQL query from a large table
    print(country)
    time.sleep(1)
    value = {"DK": 0, "DE": 3, "US": 6}[country]
    return pd.DataFrame(
        {
            "col1": [value] * 10,
            "col2": [value + 1] * 10,
            "col3": [value + 2] * 10,
        }
    )


def transform(series: pd.DataFrame, col):
    time.sleep(1)
    print(col)
    return series[col].sum()



country_widget = pn.widgets.Select(value="DK", options=["DK", "DE", "US"])
col_widget = pn.widgets.Select(value="col1", options=["col1", "col2", "col3"])

def interactive(func, *args, **kwargs):
    def wrapper(_, *args, **kwargs):
        return func(*args, **kwargs)
    return (
        pd.DataFrame().interactive()
        .pipe(wrapper, *args, **kwargs)
    )

ipipeline = (
    interactive(extract, country=country_widget)
    .pipe(transform, col=col_widget)
)

pn.Column(
    ipipeline.widgets(),
    ipipeline.panel(loading_indicator=True),
).servable()

@jbednar
Member

jbednar commented Nov 9, 2021

That API does solve an important issue with .interactive and it looks pretty clean to me! Here interactive is called similarly to pn.bind, but resulting in a chainable object that mirrors the DataFrame API.

I'm not sure precisely how we'd define it generally, given that .interactive supports not just Pandas but also Xarray and potentially other data objects. I guess the default would be to define it in the importable module for each library, e.g. hvplot.pandas.interactive, hvplot.xarray.interactive, etc., each being an object that will mirror the respective API. One could imagine having a single hvplot.interactive function that would read a module variable set by the various importable backends to determine which type of object it's expecting, but that seems brittle and would make it less obvious how to combine multiple sources of data. If there's a way to define it that doesn't depend on the backend, great, but otherwise hvplot.pandas.interactive() isn't too bad.

@maximlt
Member

maximlt commented Oct 28, 2022

Looks like this has been implemented in #720 :)

maximlt closed this as completed Oct 28, 2022
maximlt removed this from the Wishlist milestone Sep 13, 2024