
xarray.open_mfdataset() Doesn't Work If dask.distributed Client Has Been Created

I have a bit of a weird problem that I'd appreciate some input on. Basically, I'm running a notebook on the AWS Pangeo Cloud and am opening some GOES-16 satellite data on S3 (with

Solution 1:

Would the following achieve what you are after?

ds = xr.open_mfdataset(file_objs, combine='nested', concat_dim='t', data_vars='minimal', coords='minimal', compat='override')

Note that with these settings the non-dask version loads in about 35 seconds, while the dask version takes roughly 90 seconds. I haven't worked with this data, so I don't know whether it applies here, but dask's scaling advantages may only kick in with a larger number of files (right now there are 24).

This is based on the guidance in the docs:

Commonly, a few of these variables need to be concatenated along a dimension (say "time"), while the rest are equal across the datasets (ignoring floating point differences).

This command concatenates variables along the "time" dimension, but only those that already contain the "time" dimension (data_vars='minimal', coords='minimal'). Variables that lack the "time" dimension are taken from the first dataset (compat='override').
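To see what these three options do without touching S3, here is a minimal sketch on small in-memory datasets. It uses xr.concat rather than open_mfdataset, since the same data_vars, coords and compat options are accepted there. The variable names (rad, gain), the helper make_ds and the values are made up for illustration and are not taken from the GOES-16 files.

```python
import numpy as np
import xarray as xr


def make_ds(t0, gain):
    """Build a toy dataset (hypothetical names, not the GOES-16 layout)."""
    return xr.Dataset(
        {
            # Has the "t" dimension, so it is concatenated.
            "rad": (("t", "x"), np.full((2, 3), float(t0))),
            # Scalar without "t", so it is not concatenated.
            "gain": ((), gain),
        },
        coords={"t": [t0, t0 + 1], "x": [0, 1, 2]},
    )


# Two "files" whose scalar variable differs on purpose.
parts = [make_ds(0, 1.0), make_ds(2, 2.0)]

combined = xr.concat(
    parts,
    dim="t",
    data_vars="minimal",  # only concatenate data variables that already have "t"
    coords="minimal",     # same rule for coordinates
    compat="override",    # skip equality checks; take other variables from the first dataset
)

print(combined)
```

Here rad grows along t, while gain stays a scalar and keeps the value from the first dataset, even though the second dataset disagrees. That skipped comparison is where the time saving comes from, so it is only safe when the non-concatenated variables really are the same across files.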
