Supplement .interactive with an Interactive Pipe class #673

MarcSkovMadsen · 2021-11-02T06:11:47Z

Background

hvPlot's .interactive is truly powerful.

I believe it has the same potential as Streamlit's API. It's simple, intuitive, beautiful and very powerful. And it comes without these downsides:

The slowness of "run the script from top to bottom"
Time spent remembering and optimizing caching
Spaghetti code

My Pain

.interactive only supports pipelines starting from a DataFrame. I would like the same power for any pipeline.

At a minimum, I would like all the read_XYZ functions, such as read_sql, read_csv and read_parquet, to be interactive. My pipeline or data app usually also holds widgets that select which data to extract, because extracting everything, for example from a large SQL table, would not be feasible.

But sometimes my data would also be extracted from a larger, costly simulation based on specific arguments. These arguments should also be presented as widgets.

If possible, I would like an API that supports beautiful, readable method chaining and feels like an integrated part of .interactive.

Reference Example

For example, I would like to be able to use the same intuitive API, with built-in caching, for this pipeline:

import time

import pandas as pd
import panel as pn


def extract(country):
    # In my case this would often be an expensive SQL query against a large table
    time.sleep(1)
    value = {"DK": 0, "DE": 3, "US": 6}[country]
    return pd.DataFrame({
        "col1": [value]*10,
        "col2": [value+1]*10,
        "col3": [value+2]*10,
    })

def transform(series: pd.DataFrame, col):
    time.sleep(1)
    return series[col].sum()

def pipeline(country, col):
    series = extract(country)
    return transform(series, col)

country_widget = pn.widgets.Select(value="DK", options=["DK", "DE", "US"])
col_widget = pn.widgets.Select(value="col1", options=["col1", "col2", "col3"])

Existing APIs

I can't see how any of the existing Param/Panel APIs support this, not even the Pipeline.

Using pn.bind would be one solution, though:

import time

import pandas as pd
import panel as pn


def extract(country):
    # In my case this would often be an expensive SQL query against a large table
    time.sleep(1)
    value = {"DK": 0, "DE": 3, "US": 6}[country]
    return pd.DataFrame(
        {
            "col1": [value] * 10,
            "col2": [value + 1] * 10,
            "col3": [value + 2] * 10,
        }
    )


def transform(series: pd.DataFrame, col):
    time.sleep(1)
    return series[col].sum()


def pipeline(country, col):
    series = extract(country)
    return transform(series, col)


country_widget = pn.widgets.Select(value="DK", options=["DK", "DE", "US"])
col_widget = pn.widgets.Select(value="col1", options=["col1", "col2", "col3"])

ipipeline = pn.bind(pipeline, country=country_widget, col=col_widget)

pn.Column(country_widget, col_widget, pn.panel(ipipeline, loading_indicator=True)).servable()

ipipeline.mp4

But it does not

allow method chaining
feel as simple and intuitive as .interactive
provide built-in caching for each step.

It feels like a different API from .interactive and thus does not keep things simple.

Proposed API

Something like

ipipeline = (
    Interactive(extract, country=country_widget)
    .pipe(transform, col=col_widget)
)

pn.panel(ipipeline).servable()

I would also like the Interactive class to recognize DataFrames and make them .interactive automatically, so that this would also be possible:

ipipeline = Interactive(extract, country=country_widget)[col_widget].sum()

pn.panel(ipipeline).servable()
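To make the proposal concrete, here is a minimal sketch of how such a class could behave, using only the standard library. Everything in it is hypothetical and not part of hvPlot or Panel: the `FakeWidget` stand-in, the `evaluate` method, and the rule that anything with a `.value` attribute counts as a widget are all assumptions made for illustration. Each step keeps its own cache, keyed by the widget values of that step and of all steps before it, so changing `col_widget` does not rerun `extract`. Widget values are assumed to be hashable.

```python
from typing import Any, Callable


class FakeWidget:
    """Hypothetical stand-in for a Panel widget: it only exposes `.value`."""

    def __init__(self, value: Any) -> None:
        self.value = value


def _resolve(arg: Any) -> Any:
    # Assumption: anything with a `.value` attribute is treated as a widget.
    return arg.value if hasattr(arg, "value") else arg


class Interactive:
    """Hypothetical chainable pipeline with one cache per step."""

    def __init__(self, func: Callable[..., Any], **kwargs: Any) -> None:
        # Each step is (function, keyword arguments, per-step cache).
        self._steps = [(func, kwargs, {})]

    def pipe(self, func: Callable[..., Any], **kwargs: Any) -> "Interactive":
        new = Interactive.__new__(Interactive)
        new._steps = self._steps + [(func, kwargs, {})]
        return new

    def evaluate(self) -> Any:
        result, key = None, ()
        for index, (func, kwargs, cache) in enumerate(self._steps):
            values = {name: _resolve(arg) for name, arg in kwargs.items()}
            # The key accumulates upstream values, so a step is recomputed
            # only when its own or an upstream input has changed.
            key = key + tuple(sorted(values.items()))
            if key not in cache:
                cache[key] = func(**values) if index == 0 else func(result, **values)
            result = cache[key]
        return result


calls = []


def extract(country):
    calls.append("extract")
    value = {"DK": 0, "DE": 3, "US": 6}[country]
    # A plain dict of lists stands in for the DataFrame to avoid dependencies.
    return {"col1": [value] * 10, "col2": [value + 1] * 10, "col3": [value + 2] * 10}


def transform(data, col):
    calls.append("transform")
    return sum(data[col])


country_widget = FakeWidget("DK")
col_widget = FakeWidget("col1")
ipipeline = Interactive(extract, country=country_widget).pipe(transform, col=col_widget)

first = ipipeline.evaluate()   # runs extract and transform
col_widget.value = "col2"
second = ipipeline.evaluate()  # reuses the cached extract result, reruns transform
```

A real implementation would also need to re-evaluate automatically when a widget changes and to mirror the DataFrame API; this sketch only shows the chaining and the per-step caching.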

Additional Context

As the built-in caching would now need to support arbitrary objects, results could be cached in memory, cached using diskcache, or cached using some method that the user provides.
Would an example showing how to start an .interactive pipeline from a data extraction be enough to solve this pain?
What about pipelines that start from multiple extractions of DataFrames and assemble them? It would be nice to be able to make those .interactive with caching as well.
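A minimal sketch of what pluggable caching could look like, assuming the cache is any mapping-like object. The `cached_step` helper and the two toy extraction functions are hypothetical names introduced for illustration. A plain `dict` gives memory caching. A disk-backed mapping such as `diskcache.Cache` might fit the same slot, but that is an assumption and is not exercised here. The example also assembles two separate extractions to show that each one is cached independently.

```python
from collections.abc import MutableMapping
from typing import Any, Callable, Optional


def cached_step(
    func: Callable[..., Any], cache: Optional[MutableMapping] = None
) -> Callable[..., Any]:
    """Wrap `func` so results are stored in `cache`, keyed by its arguments."""
    store: MutableMapping = {} if cache is None else cache

    def wrapper(*args: Any, **kwargs: Any) -> Any:
        # Arguments must be hashable for this key to work.
        key = (func.__name__, args, tuple(sorted(kwargs.items())))
        if key not in store:
            store[key] = func(*args, **kwargs)
        return store[key]

    return wrapper


shared_cache: dict = {}
calls = []


def extract_sales(country):
    calls.append(("sales", country))
    return {"DK": 10, "DE": 20}[country]


def extract_costs(year):
    calls.append(("costs", year))
    return {2020: 3, 2021: 4}[year]


sales = cached_step(extract_sales, shared_cache)
costs = cached_step(extract_costs, shared_cache)


def profit(country, year):
    # Assemble the results of two independent, cached extractions.
    return sales(country) - costs(year)


a = profit("DK", 2020)  # both extractions run
b = profit("DK", 2021)  # only the costs extraction runs again
```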


jbednar · 2021-11-02T23:56:35Z

Sounds great! I think this fits well with #533, i.e., expanding the power available from the hvPlot-centric interface.

MarcSkovMadsen · 2021-11-04T20:47:51Z

You can actually get very close with a hack like

ipipeline = (
    pd.DataFrame().interactive()
    .pipe(extract, country=country_widget)
    .pipe(transform, col=col_widget)
)

works_almost.mp4

But it is not totally satisfactory because

It feels like a hack
It is not simple and intuitive
The signature of extract needs to change to def extract(_, country):.
The extract function is not cached. When I change the col_widget value, you can see that extract is executed again.

import time

import pandas as pd
import panel as pn
import hvplot.pandas  # registers the .interactive accessor on DataFrames


def extract(_, country):
    # In my case this would often be an expensive SQL query against a large table
    print(country)
    time.sleep(1)
    value = {"DK": 0, "DE": 3, "US": 6}[country]
    return pd.DataFrame(
        {
            "col1": [value] * 10,
            "col2": [value + 1] * 10,
            "col3": [value + 2] * 10,
        }
    )


def transform(series: pd.DataFrame, col):
    time.sleep(1)
    print(col)
    return series[col].sum()



country_widget = pn.widgets.Select(value="DK", options=["DK", "DE", "US"])
col_widget = pn.widgets.Select(value="col1", options=["col1", "col2", "col3"])

ipipeline = (
    pd.DataFrame().interactive()
    .pipe(extract, country=country_widget)
    .pipe(transform, col=col_widget)
)

pn.Column(
    ipipeline.widgets(),
    ipipeline.panel(loading_indicator=True),
).servable()

MarcSkovMadsen · 2021-11-04T21:03:12Z

So based on the previous example we can define

def interactive(func, *args, **kwargs):
    def wrapper(_, *args, **kwargs):
        return func(*args, **kwargs)
    return (
        pd.DataFrame().interactive()
        .pipe(wrapper, *args, **kwargs)
    )

and get an api like

ipipeline = (
    interactive(extract, country=country_widget)
    .pipe(transform, col=col_widget)
)

So the main problem to solve is the missing caching.
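Until caching is built in, one possible stopgap is to memoize extract itself with the standard library's functools.lru_cache. The sketch below simulates the widget changes with plain function calls and uses a dict of lists in place of the DataFrame so it runs without dependencies; both are simplifications for illustration. Note that the arguments must be hashable and that every caller receives the same cached object, so the result should not be mutated in place.

```python
import time
from functools import lru_cache

extract_calls = []


@lru_cache(maxsize=32)
def extract(country):
    # Stand-in for an expensive query; runs once per distinct country.
    extract_calls.append(country)
    time.sleep(0.01)
    value = {"DK": 0, "DE": 3, "US": 6}[country]
    return {"col1": [value] * 10, "col2": [value + 1] * 10, "col3": [value + 2] * 10}


def transform(data, col):
    return sum(data[col])


def pipeline(country, col):
    return transform(extract(country), col)


results = [
    pipeline("DK", "col1"),  # extract runs
    pipeline("DK", "col2"),  # only col changed: extract is served from the cache
    pipeline("US", "col2"),  # country changed: extract runs again
]
```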

Full Example

import time

import pandas as pd
import panel as pn
import hvplot.pandas  # registers the .interactive accessor on DataFrames


def extract(country):
    # In my case this would often be an expensive SQL query against a large table
    print(country)
    time.sleep(1)
    value = {"DK": 0, "DE": 3, "US": 6}[country]
    return pd.DataFrame(
        {
            "col1": [value] * 10,
            "col2": [value + 1] * 10,
            "col3": [value + 2] * 10,
        }
    )


def transform(series: pd.DataFrame, col):
    time.sleep(1)
    print(col)
    return series[col].sum()



country_widget = pn.widgets.Select(value="DK", options=["DK", "DE", "US"])
col_widget = pn.widgets.Select(value="col1", options=["col1", "col2", "col3"])

def interactive(func, *args, **kwargs):
    def wrapper(_, *args, **kwargs):
        return func(*args, **kwargs)
    return (
        pd.DataFrame().interactive()
        .pipe(wrapper, *args, **kwargs)
    )

ipipeline = (
    interactive(extract, country=country_widget)
    .pipe(transform, col=col_widget)
)

pn.Column(
    ipipeline.widgets(),
    ipipeline.panel(loading_indicator=True),
).servable()

jbednar · 2021-11-09T01:51:37Z

That API does solve an important issue with .interactive and it looks pretty clean to me! Here interactive is called similarly to pn.bind, but it results in a chainable object that mirrors the DataFrame API.

I'm not sure precisely how we'd define it generally, given that .interactive supports not just Pandas but also Xarray and potentially other data objects. I guess the default would be to define it in the importable module for each library, e.g. hvplot.pandas.interactive, hvplot.xarray.interactive, etc., each being an object that will mirror the respective API. One could imagine having a single hvplot.interactive function that would read a module variable set by the various importable backends to determine which type of object it's expecting, but that seems brittle and would make it less obvious how to combine multiple sources of data. If there's a way to define it that doesn't depend on the backend, great, but otherwise hvplot.pandas.interactive() isn't too bad.
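One backend-independent option, sketched here purely as an illustration, would be to dispatch on the type of the object the function returns rather than on a module variable. The sketch uses functools.singledispatch from the standard library. `FakeFrame`, `FakeArray`, `wrap_interactive` and this `interactive` function are hypothetical names, and the returned strings stand in for real wrapper objects. A clear cost of this approach is that the function has to be evaluated once up front to learn its result type.

```python
from functools import singledispatch
from typing import Any, Callable


class FakeFrame:
    """Hypothetical stand-in for a pandas DataFrame."""


class FakeArray:
    """Hypothetical stand-in for an xarray DataArray."""


@singledispatch
def wrap_interactive(obj: Any) -> str:
    # Fallback for types no backend has registered.
    raise TypeError(f"no interactive wrapper registered for {type(obj).__name__}")


@wrap_interactive.register
def _(obj: FakeFrame) -> str:
    return "pandas-style interactive wrapper"


@wrap_interactive.register
def _(obj: FakeArray) -> str:
    return "xarray-style interactive wrapper"


def interactive(func: Callable[..., Any], **kwargs: Any) -> str:
    # Evaluate once to discover the result type, then dispatch on it.
    return wrap_interactive(func(**kwargs))


frame_pipeline = interactive(lambda country: FakeFrame(), country="DK")
array_pipeline = interactive(lambda year: FakeArray(), year=2021)
```

Each backend module would register its own wrapper on import, so combining several data sources would not depend on which backend was imported last.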

maximlt · 2022-10-28T01:11:11Z

Looks like this has been implemented in #720 :)

MarcSkovMadsen added the TRIAGE and type: enhancement labels and removed the TRIAGE label Nov 2, 2021

MarcSkovMadsen added this to the Wishlist milestone Nov 2, 2021

philippjfr mentioned this issue Mar 23, 2022 in "Add ability to call interactive on bound functions" #720 (Merged)

maximlt closed this as completed Oct 28, 2022

maximlt removed this from the Wishlist milestone Sep 13, 2024
