
Plotting on map projection much slower on v0.6.1 than 0.6.0 #657


Closed
fmaussion opened this issue Nov 15, 2015 · 19 comments

Comments

@fmaussion
Member

The following code snippet plots the time-averaged ERA-Interim 2 m temperature on a map:

import matplotlib.pyplot as plt 
import xray
import cartopy.crs as ccrs 
import time

netcdf = xray.open_dataset('ERA-Int-Monthly-2mTemp.nc')
t2_avg = netcdf.t2m.mean(dim='time')

start_time = time.time()
ax = plt.axes(projection=ccrs.Robinson())
if xray.__version__ == '0.6.0':
    t2_avg.plot(ax=ax, origin='upper', aspect='equal', transform=ccrs.PlateCarree()) 
else:
    t2_avg.plot(ax=ax, transform=ccrs.PlateCarree()) 
ax.coastlines()
plt.savefig('t_xray.png')
print("xray V{}: {:.2f} s".format(xray.__version__, time.time() - start_time))


I've been very careful to check that my environments are exactly the same (mpl 1.4.3, numpy 1.10.1, cartopy 0.13.0).

Here is the output for v0.6.0 and v0.6.1 (the output from the latest master is similar to v0.6.1):

0.6.0:

python test_xray.py
/home/mowglie/.bin/conda/envs/climate/lib/python3.4/site-packages/matplotlib/collections.py:590: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
  if self._edgecolors == str('face'):
xray V0.6.0: 3.21 s

0.6.1:

 python test_xray.py 
/home/mowglie/.bin/conda/envs/upclim/lib/python3.4/site-packages/numpy/lib/shape_base.py:431: FutureWarning: in the future np.array_split will retain the shape of arrays with a zero size, instead of replacing them by `array([])`, which always has a shape of (0,).
  FutureWarning)
/home/mowglie/.bin/conda/envs/upclim/lib/python3.4/site-packages/matplotlib/collections.py:590: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
  if self._edgecolors == str('face'):
xray V0.6.1: 28.52 s

Both runs emit the matplotlib FutureWarning about elementwise comparison, which seems related to the recent numpy. With xray v0.6.1 an additional FutureWarning, from np.array_split, appears.

Interestingly, the bottleneck is clearly the rendering step (plt.savefig('t_xray.png')): without that line, xray v0.6.1 is actually faster than v0.6.0.
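To see where the time goes, the plotting call and the rendering step can be timed separately. Below is a minimal sketch using plain matplotlib with the non-interactive Agg backend and synthetic data of the same 241 x 480 shape as the ERA-Interim grid. The helper name `time_build_and_render` is hypothetical, and cartopy/xray are deliberately left out so the snippet runs on its own.

```python
import io
import time

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: drawing happens when the figure is saved
import matplotlib.pyplot as plt
import numpy as np


def time_build_and_render(draw):
    """Return (build_seconds, render_seconds) for a plotting function.

    `draw` receives a fresh Axes and adds artists to it; the figure is then
    rendered into an in-memory PNG, which is where most drawing work happens.
    """
    fig, ax = plt.subplots()
    start = time.perf_counter()
    draw(ax)
    built = time.perf_counter()
    fig.savefig(io.BytesIO(), format="png")
    rendered = time.perf_counter()
    plt.close(fig)
    return built - start, rendered - built


# Synthetic field on a 241 x 480 grid (same shape as the example in this issue).
lats = np.linspace(90, -90, 241)
lons = np.linspace(0, 360 - 0.75, 480)
lon2d, lat2d = np.meshgrid(lons, lats)
field = lon2d + lat2d

# shading="auto" lets pcolormesh accept centre coordinates of the same shape as the data.
build_s, render_s = time_build_and_render(
    lambda ax: ax.pcolormesh(lons, lats, field, shading="auto"))
print("build: {:.3f} s, render: {:.3f} s".format(build_s, render_s))
```

Passing a draw function that creates a cartopy GeoAxes plot instead would show whether the extra time is spent in `savefig`, as observed above.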

@shoyer
Member

shoyer commented Nov 15, 2015

We changed the default plot type from "imshow" in xray v0.6 to "pcolormesh" in xray v0.6.1. Does selecting imshow manually make a difference, e.g., t2_avg.plot.imshow()?

@fmaussion
Member Author

Thanks for the quick answer. Switching to imshow indeed brought the execution time back to normal. I wouldn't have expected a factor of 10 between imshow and pcolormesh, though...

@fmaussion
Member Author

If I remove the projection, imshow and pcolormesh are equally fast. So the two functions must somehow handle the projection differently?

@shoyer
Member

shoyer commented Nov 15, 2015

Yes, the difference in speed is definitely due to cartopy, which handles the projections. @pelson may be able to clarify whether pcolormesh is always slower than imshow when using cartopy, or whether that is specific to particular projections.

We changed the default from imshow to pcolormesh in #608. I would not be opposed to changing the default back, given that we already produce essentially equivalent plots with both methods. We did add a performance note to our (currently broken) docs: http://xray.readthedocs.org/en/latest/plotting.html#two-dimensions
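Producing equivalent plots with both methods takes a small extra step: pcolormesh draws one quadrilateral per cell and needs the cell edges (N + 1 values per axis), whereas imshow only needs the overall extent. A minimal sketch of one common way to infer edges from cell-centre coordinates follows; the name `infer_interval_breaks` is used here for illustration and is an assumption, not necessarily xray's internal code.

```python
import numpy as np


def infer_interval_breaks(centers):
    """Return N + 1 cell edges for N monotonic, roughly evenly spaced centres.

    Interior edges are midpoints between neighbouring centres; the two outer
    edges are extrapolated by half of the adjacent grid step.
    """
    centers = np.asarray(centers, dtype=float)
    half_steps = 0.5 * np.diff(centers)
    first = centers[0] - half_steps[0]
    last = centers[-1] + half_steps[-1]
    return np.concatenate([[first], centers[:-1] + half_steps, [last]])


lons = np.linspace(0, 360 - 0.75, 480)
lats = np.linspace(90, -90, 241)
lon_edges = infer_interval_breaks(lons)
lat_edges = infer_interval_breaks(lats)
print(lon_edges.shape, lon_edges[0], lon_edges[-1])  # (481,) -0.375 359.625
print(lat_edges.shape, lat_edges[0], lat_edges[-1])  # (242,) 90.375 -90.375
```

For the 0.75° grid used in this issue, the outer latitude edges land at ±90.375°, i.e. half a cell beyond the poles.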

@jhamman and/or @clarkfitzg may have some opinions here.

@fmaussion
Member Author

I guess it's OK to leave it as is, as long as the performance loss is documented. I like the automatic plot() function because it hides the boring mpl internals from my students and (until now) has always done what I expected it to do.

30 seconds to generate a plot in a Notebook is definitely a fun killer. Maybe I should still upgrade to 0.6.1 and explain to my students that they should use imshow() instead of plot().

@pelson

pelson commented Nov 16, 2015

There is definitely scope for being smarter with cartopy's pcolormesh. There isn't an issue for it yet, but I would be happy if you opened a performance-related issue in cartopy. pcolormesh will always be slower than imshow, but in most cases not an order of magnitude slower!

@fmaussion
Copy link
Member Author

Hi again,

I've made a self-contained example below. I have no clue how cartopy works (blind xray user, sorry :-() and was not able to remove the dependency on xray without getting different plots for imshow() and pcolormesh()... If one of the xray gurus could help me remove the xray part of the code, I could open an issue in cartopy. I wonder whether the problem comes from the fact that the ERA-Interim longitudes span 0-360?
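One way to test the 0-360 hypothesis is to shift the grid to -180...180 before plotting and re-time the same script. A minimal numpy sketch, assuming longitude is the last axis of the data; the helper name `to_pm180` is hypothetical, and whether the shift changes cartopy's timing is not established here.

```python
import numpy as np


def to_pm180(lons, data):
    """Map longitudes from [0, 360) to [-180, 180) and reorder the data to match.

    Assumes longitude is the last axis of `data`.
    """
    lons = np.asarray(lons, dtype=float)
    wrapped = (lons + 180.0) % 360.0 - 180.0
    order = np.argsort(wrapped)  # longitudes are unique, so the order is well defined
    return wrapped[order], np.asarray(data)[..., order]


lats = np.linspace(90, -90, 241)
lons = np.linspace(0, 360 - 0.75, 480)
l1, l2 = np.meshgrid(lons, lats)
new_lons, new_data = to_pm180(lons, l1 + l2)
print(new_lons[0], new_lons[-1])  # -180.0 179.25
```

The shifted `new_lons` and `new_data` can then be passed to the same imshow/pcolormesh calls as before.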

import matplotlib.pyplot as plt 
import xray
import numpy as np
import cartopy.crs as ccrs 
import time

nlats, nlons = (241, 480)
lats = np.linspace(90, -90, nlats)
lons = np.linspace(0, 360-0.75, nlons)
l1, l2 = np.meshgrid(lons, lats)
data = xray.DataArray(l1 + l2, [('latitude',  lats), ('longitude', lons)])

start_time = time.time()
fig = plt.figure()
ax = plt.axes(projection=ccrs.Robinson())
data.plot.imshow(ax=ax, transform=ccrs.PlateCarree()) 
ax.coastlines()
plt.savefig('imshow.png')
print("imshow: {:.2f} s".format(time.time() - start_time))

start_time = time.time()
fig = plt.figure()
ax = plt.axes(projection=ccrs.Robinson())
data.plot.pcolormesh(ax=ax, transform=ccrs.PlateCarree()) 
ax.coastlines()
plt.savefig('pcolormesh.png')
print("pcolormesh: {:.2f} s".format(time.time() - start_time))

imshow: 3.09 s
pcolormesh: 27.53 s

@pelson

pelson commented Nov 18, 2015

Thanks @fmaussion - I've raised it in SciTools/cartopy#700. Thanks for putting together the self-contained example; it's fine for it to depend on xray. FWIW, without a DataArray you would do something like plt.pcolormesh(lons, lats, l1 + l2, transform=ccrs.PlateCarree()).

@fmaussion
Copy link
Member Author

Thanks @pelson! Now this is where it gets funny: I've been able to make three similar plots using xray's imshow, xray's pcolormesh and cartopy's pcolormesh. Xray's pcolormesh takes about twice as long as cartopy's:

import matplotlib.pyplot as plt 
from matplotlib.colors import Normalize
import xray
import numpy as np
import cartopy.crs as ccrs 
import time

nlats, nlons = (241, 480)
lats = np.linspace(90, -90, nlats)
lons = np.linspace(0, 360-0.75, nlons)
l1, l2 = np.meshgrid(lons, lats)
data = xray.DataArray(l1 + l2, [('latitude',  lats), ('longitude', lons)])

cmap = plt.get_cmap('viridis')
norm = Normalize(vmin=0, vmax=data.max().values)

start_time = time.time()
fig = plt.figure()
ax = plt.axes(projection=ccrs.Robinson())
data.plot.imshow(ax=ax, transform=ccrs.PlateCarree(), add_colorbar=False, cmap=cmap, vmin=0) 
ax.coastlines()
plt.savefig('imshow_xray.png')
print("imshow xray: {:.2f} s".format(time.time() - start_time))

start_time = time.time()
fig = plt.figure()
ax = plt.axes(projection=ccrs.Robinson())
data.plot.pcolormesh(ax=ax, transform=ccrs.PlateCarree(), add_colorbar=False, cmap=cmap, vmin=0) 
ax.coastlines()
plt.savefig('pcolormesh_xray.png')
print("pcolormesh xray: {:.2f} s".format(time.time() - start_time))

start_time = time.time()
fig = plt.figure()
ax = plt.axes(projection=ccrs.Robinson())
ax.pcolormesh(lons, lats, l1 + l2, transform=ccrs.PlateCarree(), cmap=cmap, norm=norm)
ax.coastlines()
plt.savefig('pcolormesh_cartopy.png')
print("pcolormesh cartopy: {:.2f} s".format(time.time() - start_time))

imshow xray: 3.06 s
pcolormesh xray: 27.50 s
pcolormesh cartopy: 12.14 s

@clarkfitzg
Member

This is surprising! But good to know.

@fmaussion
Copy link
Member Author

Has anyone been able to reproduce this, or is it just me?

@shoyer
Member

shoyer commented Nov 18, 2015

Yes, I'm seeing the same thing. Very weird -- I'll see if I can profile it.


@jhamman
Member

jhamman commented Nov 18, 2015

I can't verify right now, but it may have something to do with using masked arrays under the hood. There are no NaNs in your example, but xray still converts the array to a masked array before plotting. I bet plotting with pcolormesh is slower with masked arrays than with plain numpy arrays.
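The masked-array hypothesis has two parts: whether the conversion masks anything at all, and whether a masked input slows pcolormesh down. A numpy sketch of the first part, assuming a `masked_invalid`-style conversion (not necessarily the exact call xray makes):

```python
import numpy as np

lats = np.linspace(90, -90, 241)
lons = np.linspace(0, 360 - 0.75, 480)
l1, l2 = np.meshgrid(lons, lats)
values = l1 + l2

# Mask invalid (NaN/inf) entries, as a plotting wrapper might do before drawing.
masked = np.ma.masked_invalid(values)

# getmaskarray always returns a full boolean array, even when nothing is masked.
n_masked = int(np.ma.getmaskarray(masked).sum())
print(type(masked).__name__, n_masked)  # MaskedArray 0
```

Since nothing is masked for this field, any slowdown would have to come from the masked-array code path itself rather than from missing cells.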

@fmaussion
Member Author

Changing the input to np.ma.asarray(l1 + l2) did not change anything as far as I can see...

@shoyer
Member

shoyer commented Nov 18, 2015

Profiling for each version of pcolormesh: https://gist.github.com/shoyer/73e3841827fe1eb08d00

Switching to a masked array doesn't seem to make the non-xray version any slower...

@fmaussion
Member Author

See the number of calls:

faster.txt:
4099953 function calls (4093947 primitive calls) in 14.571 seconds

slower.txt:
10316809 function calls (10303964 primitive calls) in 34.330 seconds
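The slower run makes about 2.5 times as many function calls. How the gist above was produced isn't stated; a minimal standard-library sketch for capturing the same kind of summary (total and primitive call counts, plus the costliest call chains) could look like this. The helper name `profile_call` is hypothetical.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs):
    """Run func under cProfile and return (result, stats, report_text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time so the most expensive call chains come first.
    stats.sort_stats("cumulative").print_stats(15)
    return result, stats, stream.getvalue()


def work(n):
    # Toy workload standing in for "plot + savefig".
    return sum(i * i for i in range(n))


result, stats, report = profile_call(work, 10)
print(result, stats.total_calls, stats.prim_calls)
```

Calling it once with xray's `plot.pcolormesh` and once with cartopy's `ax.pcolormesh` (each followed by `savefig`) and comparing `stats.total_calls` would give a comparison like the one above.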

@fmaussion
Member Author

I've had little time to spend on this lately, but I'll try to get back to it in the next few days. Any idea where it could come from? I found it quite hard to debug because of the many decorators... Judging from the profiling and the number of function calls above, something quite big seems to be happening between xray and cartopy. Could it be something as trivial as a function being called twice?

@fmaussion
Member Author

OK, on the current master and with a cartopy installed from conda-forge I am not able to reproduce the problem any more:

imshow xray: 3.30 s
pcolormesh xray: 13.13 s
pcolormesh cartopy: 12.85 s

I don't really understand what happened in between... The factor of 4 between imshow and pcolormesh remains, but at least the show-stopping 30 s plotting time I had before is gone. I'm closing this now; it will remain a mystery.
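The slowdown factors quoted in this thread can be checked directly against the two sets of reported timings:

```python
# Wall-clock timings (seconds) copied from the two runs reported in this issue.
before = {"imshow xray": 3.06, "pcolormesh xray": 27.50, "pcolormesh cartopy": 12.14}
after = {"imshow xray": 3.30, "pcolormesh xray": 13.13, "pcolormesh cartopy": 12.85}

for label, t in (("before", before), ("after", after)):
    vs_imshow = t["pcolormesh xray"] / t["imshow xray"]
    vs_cartopy = t["pcolormesh xray"] / t["pcolormesh cartopy"]
    print("{}: xray pcolormesh is {:.1f}x imshow, {:.1f}x cartopy pcolormesh".format(
        label, vs_imshow, vs_cartopy))
# before: xray pcolormesh is 9.0x imshow, 2.3x cartopy pcolormesh
# after: xray pcolormesh is 4.0x imshow, 1.0x cartopy pcolormesh
```

So the gap between xray's and cartopy's pcolormesh disappeared, while the roughly 4x gap relative to imshow is what remains.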

(Note: on my laptop I'm getting faster plotting results in a virtualenv set up with pip install than with conda...)

@shoyer
Copy link
Member

shoyer commented Jun 15, 2016

Strange... thanks for checking again.


5 participants