Python/xarray for GEOS-Chem data analysis

Author: Jiawei Zhuang (jiaweizhuang@g.harvard.edu)

Last updated: 10/16/2017

Motivation

I have 3 years of experience with IDL and 7 years with MATLAB, but I now use Python for almost all research work because I feel it 10 times more efficient (in terms of human productivity, not just computational performance). Jupyter/IPython Notebook provides a great scientific research environment and avoids slow x11 forwarding when working with a remote server. Not to mention that Python is free and has become the NO.1 programming language in 2017!

This tutorial focuses on xarray, a popular, powerful and elegant Python package for analyzing earth science data. Compared to IDL or MATLAB, Python/xarray allows you to write much less boilerplate codes and focus on real research.

For example, say we want to read a variable O3 from a NetCDF file data.nc and compute its zonal average. The code using xarray would be just 2 lines:

ds = xr.open_dataset("data.nc") # all information is read into the object ds
ds['O3'].mean(dim='lon') # the code is self-descriptive

The same operation using IDL would be something like (based on IDL documentation)

;Ignore those codes if you have never used IDL before (you are lucky).

fid = NCDF_OPEN('data.nc') ; Open the NetCDF file:
var_id = NCDF_VARID(fid, 'O3') ; Get the variable ID
NCDF_VARGET, fid, var_id, data  ; Get the variable data
NCDF_CLOSE, fid ; close the NetCDF file

; then write some code to check data dimension, for example
size(data)

; OK, say we find that the data is a 4D array of shape [lon,lat,lev,time]
; the quickest way to average over the longitude is
mean(data, 1)
; The code would be much longer if you write "for" loops

; What's worse, IDL documentation on this mean() function is wrong!
; If you follow http://www.harrisgeospatial.com/docs/MEAN.html ,
; you would write
mean(data, DIMENSION=1)

; You want to take average over the first dimension "lon",
; But this DIMENSION keyword actually has no effect
; You will just get a global mean from the above expression.
; This bug is hard to find and could mess up your data analysis.

I hope this example has convinced you to switch from IDL to Python 😉

Preparation

Basic knowledge of Python is assumed. For Python beginners, I recommend Python Data Science Handbook by Jake VanderPlas, which is free available online. Skim through Chapter 1, 2 and 4, then you will be good! (Chapter 3 and 5 are important for general data science but not particularly necessary for working with gridded model data.)

It is crucial to pick up the correct tutorial when learning Python, because Python has a wide range of applications other than just science. For scientific research, you should only read data science tutorials. Other famous books such as Fluent Python and The Hitchhiker’s Guide to Python are great for general software development, but not quite useful for scientific research.

Once you manage to use Python in Jupyter Notebook, follow this GCPy page to set up Python environment for Geosciences. Now you should have everything necessary for working with GEOS-Chem data and most of earth science data in general.

Reading and exploring NetCDF file

Opening file

Here we use a GEOS-Chem restart file as an example. You can use your own file or download one at

ftp://ftp.as.harvard.edu/gcgrid/data/ExtData/NC_RESTARTS/initial_GEOSChem_rst.4x5_benchmark.nc
In [1]:
# those modules are almost always imported when working with model data
%matplotlib inline
import numpy as np
import xarray as xr
import matplotlib.pyplot as plt # general plotting
import cartopy.crs as ccrs # plot on maps, better than the Basemap module

xr.open_dataset() reads the entire NetCDF file into this ds object. You can view all variables, coordinates, dimensions and metadata by just printing this object.

In [2]:
ds = xr.open_dataset("./initial_GEOSChem_rst.4x5_benchmark.nc")
ds # same as print(ds) in IPython/Jupyter environment
Out[2]:
<xarray.Dataset>
Dimensions:      (lat: 46, lev: 72, lon: 72, time: 1)
Coordinates:
  * time         (time) datetime64[ns] 2013-07-01
  * lev          (lev) float32 0.9925 0.977456 0.96237 0.947285 0.9322 ...
  * lat          (lat) float32 -89.0 -86.0 -82.0 -78.0 -74.0 -70.0 -66.0 ...
  * lon          (lon) float32 -180.0 -175.0 -170.0 -165.0 -160.0 -155.0 ...
Data variables:
    TRC_NO       (time, lev, lat, lon) float64 2.561e-17 2.561e-17 2.561e-17 ...
    TRC_O3       (time, lev, lat, lon) float64 2.615e-08 2.615e-08 2.615e-08 ...
    TRC_PAN      (time, lev, lat, lon) float64 8.709e-12 8.709e-12 8.709e-12 ...
    TRC_CO       (time, lev, lat, lon) float64 6.75e-08 6.75e-08 6.75e-08 ...
    TRC_ALK4     (time, lev, lat, lon) float64 1.753e-10 1.753e-10 1.753e-10 ...
    TRC_ISOP     (time, lev, lat, lon) float64 5.959e-14 5.959e-14 5.959e-14 ...
    TRC_HNO3     (time, lev, lat, lon) float64 5.854e-15 5.854e-15 5.854e-15 ...
    TRC_H2O2     (time, lev, lat, lon) float64 4.73e-11 4.73e-11 4.73e-11 ...
    TRC_ACET     (time, lev, lat, lon) float64 8.546e-10 8.546e-10 8.546e-10 ...
    TRC_MEK      (time, lev, lat, lon) float64 4.22e-11 4.22e-11 4.22e-11 ...
    TRC_ALD2     (time, lev, lat, lon) float64 2.119e-11 2.119e-11 2.119e-11 ...
    TRC_RCHO     (time, lev, lat, lon) float64 1.152e-12 1.152e-12 1.152e-12 ...
    TRC_MVK      (time, lev, lat, lon) float64 3.882e-13 3.882e-13 3.882e-13 ...
    TRC_MACR     (time, lev, lat, lon) float64 1.161e-12 1.161e-12 1.161e-12 ...
    TRC_PMN      (time, lev, lat, lon) float64 1.593e-15 1.593e-15 1.593e-15 ...
    TRC_PPN      (time, lev, lat, lon) float64 6.739e-13 6.739e-13 6.739e-13 ...
    TRC_R4N2     (time, lev, lat, lon) float64 4.331e-12 4.331e-12 4.331e-12 ...
    TRC_PRPE     (time, lev, lat, lon) float64 2.102e-13 2.102e-13 2.102e-13 ...
    TRC_C3H8     (time, lev, lat, lon) float64 4.072e-11 4.072e-11 4.072e-11 ...
    TRC_CH2O     (time, lev, lat, lon) float64 4.576e-11 4.576e-11 4.576e-11 ...
    TRC_C2H6     (time, lev, lat, lon) float64 5.118e-10 5.118e-10 5.118e-10 ...
    TRC_N2O5     (time, lev, lat, lon) float64 1.116e-17 1.116e-17 1.116e-17 ...
    TRC_HNO4     (time, lev, lat, lon) float64 1.659e-15 1.659e-15 1.659e-15 ...
    TRC_MP       (time, lev, lat, lon) float64 2.642e-10 2.642e-10 2.642e-10 ...
    TRC_DMS      (time, lev, lat, lon) float64 3.006e-10 3.006e-10 3.006e-10 ...
    TRC_SO2      (time, lev, lat, lon) float64 5.637e-12 5.637e-12 5.637e-12 ...
    TRC_SO4      (time, lev, lat, lon) float64 7.844e-12 7.844e-12 7.844e-12 ...
    TRC_SO4s     (time, lev, lat, lon) float64 3.705e-16 3.705e-16 3.705e-16 ...
    TRC_MSA      (time, lev, lat, lon) float64 2.933e-12 2.933e-12 2.933e-12 ...
    TRC_NH3      (time, lev, lat, lon) float64 4.384e-13 4.384e-13 4.384e-13 ...
    TRC_NH4      (time, lev, lat, lon) float64 1.183e-11 1.183e-11 1.183e-11 ...
    TRC_NIT      (time, lev, lat, lon) float64 5.173e-12 5.173e-12 5.173e-12 ...
    TRC_NITs     (time, lev, lat, lon) float64 3.166e-17 3.166e-17 3.166e-17 ...
    TRC_BCPI     (time, lev, lat, lon) float64 9.654e-13 9.654e-13 9.654e-13 ...
    TRC_OCPI     (time, lev, lat, lon) float64 6.117e-12 6.117e-12 6.117e-12 ...
    TRC_BCPO     (time, lev, lat, lon) float64 4.49e-16 4.49e-16 4.49e-16 ...
    TRC_OCPO     (time, lev, lat, lon) float64 1.845e-16 1.845e-16 1.845e-16 ...
    TRC_DST1     (time, lev, lat, lon) float64 3.653e-13 3.653e-13 3.653e-13 ...
    TRC_DST2     (time, lev, lat, lon) float64 6.175e-13 6.175e-13 6.175e-13 ...
    TRC_DST3     (time, lev, lat, lon) float64 3.633e-13 3.633e-13 3.633e-13 ...
    TRC_DST4     (time, lev, lat, lon) float64 9.996e-31 9.996e-31 9.996e-31 ...
    TRC_SALA     (time, lev, lat, lon) float64 5.433e-11 5.433e-11 5.433e-11 ...
    TRC_SALC     (time, lev, lat, lon) float64 1.487e-10 1.487e-10 1.487e-10 ...
    TRC_Br2      (time, lev, lat, lon) float64 1.019e-12 1.019e-12 1.019e-12 ...
    TRC_Br       (time, lev, lat, lon) float64 7.141e-19 7.141e-19 7.141e-19 ...
    TRC_BrO      (time, lev, lat, lon) float64 3.617e-14 3.617e-14 3.617e-14 ...
    TRC_HOBr     (time, lev, lat, lon) float64 2.352e-12 2.352e-12 2.352e-12 ...
    TRC_HBr      (time, lev, lat, lon) float64 9.093e-18 9.093e-18 9.093e-18 ...
    TRC_BrNO2    (time, lev, lat, lon) float64 5.092e-16 5.092e-16 5.092e-16 ...
    TRC_BrNO3    (time, lev, lat, lon) float64 2.606e-16 2.606e-16 2.606e-16 ...
    TRC_CHBr3    (time, lev, lat, lon) float64 1.165e-12 1.165e-12 1.165e-12 ...
    TRC_CH2Br2   (time, lev, lat, lon) float64 8.938e-13 8.938e-13 8.938e-13 ...
    TRC_CH3Br    (time, lev, lat, lon) float64 6.523e-12 6.523e-12 6.523e-12 ...
    TRC_MPN      (time, lev, lat, lon) float64 1e-30 1e-30 1e-30 1e-30 1e-30 ...
    TRC_ISOPN    (time, lev, lat, lon) float64 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ...
    TRC_MOBA     (time, lev, lat, lon) float64 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ...
    TRC_PROPNN   (time, lev, lat, lon) float64 3.486e-14 3.486e-14 3.486e-14 ...
    TRC_HAC      (time, lev, lat, lon) float64 3.487e-13 3.487e-13 3.487e-13 ...
    TRC_GLYC     (time, lev, lat, lon) float64 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ...
    TRC_MMN      (time, lev, lat, lon) float64 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ...
    TRC_RIP      (time, lev, lat, lon) float64 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ...
    TRC_IEPOX    (time, lev, lat, lon) float64 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ...
    TRC_MAP      (time, lev, lat, lon) float64 2.064e-11 2.064e-11 2.064e-11 ...
    TRC_NO2      (time, lev, lat, lon) float64 2.662e-13 2.662e-13 2.662e-13 ...
    TRC_NO3      (time, lev, lat, lon) float64 3.155e-17 3.155e-17 3.155e-17 ...
    TRC_HNO2     (time, lev, lat, lon) float64 1.209e-14 1.209e-14 1.209e-14 ...
    TRC_N2O      (time, lev, lat, lon) float64 3.228e-07 3.228e-07 3.228e-07 ...
    TRC_OCS      (time, lev, lat, lon) float64 5e-10 5e-10 5e-10 5e-10 5e-10 ...
    TRC_CH4      (time, lev, lat, lon) float64 1.743e-06 1.743e-06 1.743e-06 ...
    TRC_BrCl     (time, lev, lat, lon) float64 7.006e-20 7.006e-20 7.006e-20 ...
    TRC_HCl      (time, lev, lat, lon) float64 9.395e-12 9.395e-12 9.395e-12 ...
    TRC_CCl4     (time, lev, lat, lon) float64 8.38e-11 8.38e-11 8.38e-11 ...
    TRC_CH3Cl    (time, lev, lat, lon) float64 5.506e-10 5.506e-10 5.506e-10 ...
    TRC_CH3CCl3  (time, lev, lat, lon) float64 1.68e-11 1.68e-11 1.68e-11 ...
    TRC_CFCX     (time, lev, lat, lon) float64 1.006e-10 1.006e-10 1.006e-10 ...
    TRC_HCFCX    (time, lev, lat, lon) float64 5.37e-11 5.37e-11 5.37e-11 ...
    TRC_CFC11    (time, lev, lat, lon) float64 2.406e-10 2.406e-10 2.406e-10 ...
    TRC_CFC12    (time, lev, lat, lon) float64 5.282e-10 5.282e-10 5.282e-10 ...
    TRC_HCFC22   (time, lev, lat, lon) float64 1.802e-10 1.802e-10 1.802e-10 ...
    TRC_H1211    (time, lev, lat, lon) float64 3.8e-12 3.8e-12 3.8e-12 ...
    TRC_H1301    (time, lev, lat, lon) float64 3.3e-12 3.3e-12 3.3e-12 ...
    TRC_H2402    (time, lev, lat, lon) float64 3.4e-13 3.4e-13 3.4e-13 ...
    TRC_Cl       (time, lev, lat, lon) float64 3.877e-18 3.877e-18 3.877e-18 ...
    TRC_ClO      (time, lev, lat, lon) float64 4.297e-15 4.297e-15 4.297e-15 ...
    TRC_HOCl     (time, lev, lat, lon) float64 1.078e-14 1.078e-14 1.078e-14 ...
    TRC_ClNO3    (time, lev, lat, lon) float64 1.742e-15 1.742e-15 1.742e-15 ...
    TRC_ClNO2    (time, lev, lat, lon) float64 1e-30 1e-30 1e-30 1e-30 1e-30 ...
    TRC_ClOO     (time, lev, lat, lon) float64 2.223e-19 2.223e-19 2.223e-19 ...
    TRC_OClO     (time, lev, lat, lon) float64 1.361e-19 1.361e-19 1.361e-19 ...
    TRC_Cl2      (time, lev, lat, lon) float64 1e-30 1e-30 1e-30 1e-30 1e-30 ...
    TRC_Cl2O2    (time, lev, lat, lon) float64 1e-30 1e-30 1e-30 1e-30 1e-30 ...
    TRC_H2O      (time, lev, lat, lon) float64 0.001177 0.001177 0.001177 ...
    TRC_MTPA     (time, lev, lat, lon) float64 1e-30 1e-30 1e-30 1e-30 1e-30 ...
    TRC_LIMO     (time, lev, lat, lon) float64 1e-30 1e-30 1e-30 1e-30 1e-30 ...
    TRC_MTPO     (time, lev, lat, lon) float64 1e-30 1e-30 1e-30 1e-30 1e-30 ...
    TRC_TSOG1    (time, lev, lat, lon) float64 5.603e-14 5.603e-14 5.603e-14 ...
    TRC_TSOG2    (time, lev, lat, lon) float64 9.888e-13 9.888e-13 9.888e-13 ...
    TRC_TSOG3    (time, lev, lat, lon) float64 2.44e-12 2.44e-12 2.44e-12 ...
    TRC_TSOG0    (time, lev, lat, lon) float64 1.264e-14 1.264e-14 1.264e-14 ...
    TRC_TSOA1    (time, lev, lat, lon) float64 9.28e-14 9.28e-14 9.28e-14 ...
    TRC_TSOA2    (time, lev, lat, lon) float64 2.652e-13 2.652e-13 2.652e-13 ...
    TRC_TSOA3    (time, lev, lat, lon) float64 7.673e-14 7.673e-14 7.673e-14 ...
    TRC_TSOA0    (time, lev, lat, lon) float64 1.336e-13 1.336e-13 1.336e-13 ...
    TRC_ISOG1    (time, lev, lat, lon) float64 2.549e-13 2.549e-13 2.549e-13 ...
    TRC_ISOG2    (time, lev, lat, lon) float64 1.821e-13 1.821e-13 1.821e-13 ...
    TRC_ISOG3    (time, lev, lat, lon) float64 2.367e-12 2.367e-12 2.367e-12 ...
    TRC_ISOA1    (time, lev, lat, lon) float64 4.221e-13 4.221e-13 4.221e-13 ...
    TRC_ISOA2    (time, lev, lat, lon) float64 4.881e-14 4.881e-14 4.881e-14 ...
    TRC_ISOA3    (time, lev, lat, lon) float64 7.445e-14 7.445e-14 7.445e-14 ...
    TRC_BENZ     (time, lev, lat, lon) float64 2.746e-11 2.746e-11 2.746e-11 ...
    TRC_TOLU     (time, lev, lat, lon) float64 8.516e-12 8.516e-12 8.516e-12 ...
    TRC_XYLE     (time, lev, lat, lon) float64 1.201e-12 1.201e-12 1.201e-12 ...
    TRC_ASOG1    (time, lev, lat, lon) float64 5.583e-15 5.583e-15 5.583e-15 ...
    TRC_ASOG2    (time, lev, lat, lon) float64 1.036e-14 1.036e-14 1.036e-14 ...
    TRC_ASOG3    (time, lev, lat, lon) float64 1.367e-13 1.367e-13 1.367e-13 ...
    TRC_ASOAN    (time, lev, lat, lon) float64 5.542e-14 5.542e-14 5.542e-14 ...
    TRC_ASOA1    (time, lev, lat, lon) float64 9.237e-15 9.237e-15 9.237e-15 ...
    TRC_ASOA2    (time, lev, lat, lon) float64 2.774e-15 2.774e-15 2.774e-15 ...
    TRC_ASOA3    (time, lev, lat, lon) float64 4.3e-15 4.3e-15 4.3e-15 ...
Attributes:
    Title:        COARDS/netCDF file created by BPCH2COARDS (GAMAP v2-17+)
    Conventions:  COARDS
    Format:       NetCDF-3
    Model:        GEOS5
    Delta_Lon:    5.0
    Delta_Lat:    4.0
    NLayers:      72
    Start_Date:   20130701
    Start_Time:   0
    End_Date:     20130701
    End_Time:     0
    Delta_Time:   0
    history:      Tue Feb 23 13:37:28 2016: ncatted -a units,TRC_ASOA3,o,c,mo...

ds is an xarray Dataset, which is like an in-memory representation of the entire NetCDF file. Just think this data type as a much more powerful version of MATLAB/IDL structure type, if you are unfamiliar with Python’s object-oriented programming.

In [3]:
type(ds)
Out[3]:
xarray.core.dataset.Dataset

Extracting variable

A Dataset typical contains many variables, just like a NetCDF file. To extract a single variable, simply use ds['varname']

In [4]:
dr = ds['TRC_O3']
dr
Out[4]:
<xarray.DataArray 'TRC_O3' (time: 1, lev: 72, lat: 46, lon: 72)>
[238464 values with dtype=float64]
Coordinates:
  * time     (time) datetime64[ns] 2013-07-01
  * lev      (lev) float32 0.9925 0.977456 0.96237 0.947285 0.9322 0.917116 ...
  * lat      (lat) float32 -89.0 -86.0 -82.0 -78.0 -74.0 -70.0 -66.0 -62.0 ...
  * lon      (lon) float32 -180.0 -175.0 -170.0 -165.0 -160.0 -155.0 -150.0 ...
Attributes:
    long_name:  O3 tracer
    units:      mol/mol

The returning dr is a DataArray, containing all information for a single variable, including the numerical data itself, the coodinate information and additional attributes like long_name and units here.

In [5]:
type(dr)
Out[5]:
xarray.core.dataarray.DataArray

DataArray (a single variable) and Dataset (containing multiple DataArray) are the only two data types you need to know in xarray.

Syntax explanation

Recall that you can select a variable just by its name (ds['varname']), not by its underlying ID. This syntax is nothing special – it just follows the notation of the dictionary type, a basic data type in Python for storing key-value pairs.

In [6]:
# make a simple dictionary
# 'var1' is the key, and 1 is the value
my_dict = dict(var1=1, var2=2)
my_dict
Out[6]:
{'var1': 1, 'var2': 2}
In [7]:
my_dict['var1'] # retrieve the value by the key
Out[7]:
1

Metadata

DataArray has some additional attributes to describe the variable. They can be accessed by .attrs.

In [8]:
dr.attrs
Out[8]:
OrderedDict([('long_name', 'O3 tracer'), ('units', 'mol/mol')])

It is again a dictionary type (but with fixed order). Again, retrieve the value by the key:

In [9]:
dr.attrs['units']
Out[9]:
'mol/mol'

Short cut:

In [10]:
dr.units
Out[10]:
'mol/mol'

DataSet contains global attributes describing that NetCDF file

In [11]:
ds.attrs['Title']
Out[11]:
'COARDS/netCDF file created by BPCH2COARDS (GAMAP v2-17+)'

All available keys are:

In [12]:
ds.attrs.keys()
Out[12]:
odict_keys(['Title', 'Conventions', 'Format', 'Model', 'Delta_Lon', 'Delta_Lat', 'NLayers', 'Start_Date', 'Start_Time', 'End_Date', 'End_Time', 'Delta_Time', 'history'])

Convert to numpy array

If you don’t need additional information like coordinates and units, you can always convert a DataArray to a pure numpy array by .values.

In [13]:
rawdata = dr.values # get pure numpy array

It is a 4D numpy array with the shape (time: 1, lev: 72, lat: 46, lon: 72)

In [14]:
type(rawdata), rawdata.shape
Out[14]:
(numpy.ndarray, (1, 72, 46, 72))

Modifying data

Most of the time you don’t need to convert DataArray to numpy array, because arithmetic operations and numpy functions can directly work on DataArray. Here we multiply the data by \(10^9\) and change the unit to ppbv.

In [15]:
dr *= 1e9 # v/v -> ppbv
dr.attrs['units'] = 'ppbv'
dr
Out[15]:
<xarray.DataArray 'TRC_O3' (time: 1, lev: 72, lat: 46, lon: 72)>
array([[[[ 26.152373, ...,  26.152373],
         ...,
         [  6.460082, ...,   6.460082]],

        ...,
        [[ 59.938827, ...,  59.938827],
         ...,
         [ 68.322656, ...,  68.322656]]]])
Coordinates:
  * time     (time) datetime64[ns] 2013-07-01
  * lev      (lev) float32 0.9925 0.977456 0.96237 0.947285 0.9322 0.917116 ...
  * lat      (lat) float32 -89.0 -86.0 -82.0 -78.0 -74.0 -70.0 -66.0 -62.0 ...
  * lon      (lon) float32 -180.0 -175.0 -170.0 -165.0 -160.0 -155.0 -150.0 ...
Attributes:
    long_name:  O3 tracer
    units:      ppbv

Now let’s look at 3 common use cases. They are meant to be read in order.

Case 1: surface field

Selecting data

Conventionally, you will select data by indexing the pure numerical array rawdata, like that:

In [16]:
data_surf = rawdata[0,0,:,:] # get 1st time slice and 1st level
data_surf.shape # only contain lat and lon dimensions
Out[16]:
(46, 72)

With the DataArray type, you can index the data by dimension names, without thinking about which dimension means time and which one means level.

In [17]:
dr_surf = dr.isel(time=0, lev=0)
dr_surf
Out[17]:
<xarray.DataArray 'TRC_O3' (lat: 46, lon: 72)>
array([[ 26.152373,  26.152373,  26.152373, ...,  26.152373,  26.152373,
         26.152373],
       [ 26.224114,  26.226319,  26.230103, ...,  26.205269,  26.221985,
         26.20175 ],
       [ 25.707591,  25.481674,  25.132504, ...,  25.927429,  25.662809,
         25.705889],
       ...,
       [  8.588493,   9.013182,   8.793466, ...,   8.650769,   8.041175,
          8.159272],
       [  6.469575,   6.469804,   6.465895, ...,   6.476197,   6.476856,
          6.473523],
       [  6.460082,   6.460082,   6.460082, ...,   6.460082,   6.460082,
          6.460082]])
Coordinates:
    time     datetime64[ns] 2013-07-01
    lev      float32 0.9925
  * lat      (lat) float32 -89.0 -86.0 -82.0 -78.0 -74.0 -70.0 -66.0 -62.0 ...
  * lon      (lon) float32 -180.0 -175.0 -170.0 -165.0 -160.0 -155.0 -150.0 ...
Attributes:
    long_name:  O3 tracer
    units:      ppbv

Verify that both methods give the same result:

In [18]:
np.allclose(data_surf, dr_surf.values)
Out[18]:
True

Convenience method for plotting

A 2D DataArray has a convenient plot() method.

In [19]:
dr_surf.plot()
Out[19]:
<matplotlib.collections.QuadMesh at 0x111eaa550>
_images/xarray_for_GC_55_1.png

You can tweak colormap and colorbar range.

In [20]:
dr_surf.plot(cmap='jet', vmin=0, vmax=100)
Out[20]:
<matplotlib.collections.QuadMesh at 0x112083b00>
_images/xarray_for_GC_57_1.png

See matplotlib colormap for all available color schemes.

Using gamap’s color scheme

We can also use the WhGrYlRd scheme from our good old friend gamap.

Download this WhGrYlRd.txt file and make a custom colormap (That’s a quick solution for now. We’ll put this into the GCPy package.)

In [21]:
# get gamap's WhGrYlRd color scheme from file
from matplotlib.colors import ListedColormap
WhGrYlRd_scheme = np.genfromtxt('./WhGrYlRd.txt', delimiter=' ')
WhGrYlRd = ListedColormap(WhGrYlRd_scheme/255.0)
In [22]:
dr_surf.plot(cmap=WhGrYlRd, vmin=0, vmax=100)
Out[22]:
<matplotlib.collections.QuadMesh at 0x1122156a0>
_images/xarray_for_GC_62_1.png

Extending case 1: visualization details

Note: Feel free to jump to Case 2 if you don’t care about detailed plotting for now. Here we demonstrate how to make publication-quality plots only using basic, low-level plotting functions. We’ll provide high-level wrappers in the GCPy package, but knowing how to write things from scratch is also very useful.

Basic heatmap

To tweak details, we can always fall back to the original matplotlib functions. The default plt.pcolormesh() plots 2D heat maps, but it has no idea about the spherical coordinate.

In [23]:
plt.pcolormesh(dr_surf, cmap=WhGrYlRd)
Out[23]:
<matplotlib.collections.QuadMesh at 0x1123a4780>
_images/xarray_for_GC_67_1.png

We can get the coordinate values from the DataArray, because the original NetCDF file contains the coordinate information.

In [24]:
lat = dr_surf['lat'].values
lon = dr_surf['lon'].values
print('lat:\n', lat)
print('lon:\n', lon)
lat:
 [-89. -86. -82. -78. -74. -70. -66. -62. -58. -54. -50. -46. -42. -38. -34.
 -30. -26. -22. -18. -14. -10.  -6.  -2.   2.   6.  10.  14.  18.  22.  26.
  30.  34.  38.  42.  46.  50.  54.  58.  62.  66.  70.  74.  78.  82.  86.
  89.]
lon:
 [-180. -175. -170. -165. -160. -155. -150. -145. -140. -135. -130. -125.
 -120. -115. -110. -105. -100.  -95.  -90.  -85.  -80.  -75.  -70.  -65.
  -60.  -55.  -50.  -45.  -40.  -35.  -30.  -25.  -20.  -15.  -10.   -5.
    0.    5.   10.   15.   20.   25.   30.   35.   40.   45.   50.   55.
   60.   65.   70.   75.   80.   85.   90.   95.  100.  105.  110.  115.
  120.  125.  130.  135.  140.  145.  150.  155.  160.  165.  170.  175.]

Calling plt.pcolormesh(X, Y, data) instead of plt.pcolormesh(data) will set correct coordinate values.

In [25]:
plt.pcolormesh(lon, lat, dr_surf, cmap=WhGrYlRd)
Out[25]:
<matplotlib.collections.QuadMesh at 0x112473cf8>
_images/xarray_for_GC_71_1.png

To gain full control on the figure, we can create an axis object and call the plotting method on it.

(This is not too useful here but is particularly useful for multi-panel plots, where each axis means one subplot so you can fine-tune each panel. See here for a subplot example.)

In [26]:
# has the same effect as the previous code cell
ax = plt.axes()
ax.pcolormesh(lon, lat, dr_surf, cmap=WhGrYlRd)
Out[26]:
<matplotlib.collections.QuadMesh at 0x1125730f0>
_images/xarray_for_GC_73_1.png

Geographic maps

To plot on geographic maps, the only change is adding the projection keyword to plt.axes()

(Recall that ccrs is from the cartopy the package, which is much easier to use than the old Basemap package).

In [27]:
ax = plt.axes(projection=ccrs.PlateCarree())
ax.coastlines()
ax.pcolormesh(lon, lat, dr_surf, cmap=WhGrYlRd)
Out[27]:
<matplotlib.collections.QuadMesh at 0x113dd19e8>
_images/xarray_for_GC_76_1.png

PlateCarree is the most commonly used projection but there are tons of projections available. See cartopy documentation for all options.

Let’s try the Robinson projection. Note that the transform keyword still takes PlateCarree, for some mathematical reasons.

In [28]:
ax = plt.axes(projection=ccrs.Robinson())
ax.coastlines()
ax.pcolormesh(lon, lat, dr_surf, cmap=WhGrYlRd, transform=ccrs.PlateCarree())
Out[28]:
<matplotlib.collections.QuadMesh at 0x113eeccc0>
_images/xarray_for_GC_78_1.png

Correcting map boundaries

You might notice a blank stripe on the right edge (\(180^\circ E/W\)). That’s because pcolormesh(X, Y, data) expects X and Y to be cell bounds, not cell centers. Fortunately, creating cell bound coordinates is trivial:

In [29]:
lon_b = np.linspace(-182.5, 177.5, 73)
print(lon_b)
[-182.5 -177.5 -172.5 -167.5 -162.5 -157.5 -152.5 -147.5 -142.5 -137.5
 -132.5 -127.5 -122.5 -117.5 -112.5 -107.5 -102.5  -97.5  -92.5  -87.5
  -82.5  -77.5  -72.5  -67.5  -62.5  -57.5  -52.5  -47.5  -42.5  -37.5
  -32.5  -27.5  -22.5  -17.5  -12.5   -7.5   -2.5    2.5    7.5   12.5
   17.5   22.5   27.5   32.5   37.5   42.5   47.5   52.5   57.5   62.5
   67.5   72.5   77.5   82.5   87.5   92.5   97.5  102.5  107.5  112.5
  117.5  122.5  127.5  132.5  137.5  142.5  147.5  152.5  157.5  162.5
  167.5  172.5  177.5]
In [30]:
lat_b = np.linspace(-92, 92, 47)
lat_b[0] = -90 # -92 => -90
lat_b[-1] = 90 #  92 =>  90
print(lat_b)
[-90. -88. -84. -80. -76. -72. -68. -64. -60. -56. -52. -48. -44. -40. -36.
 -32. -28. -24. -20. -16. -12.  -8.  -4.   0.   4.   8.  12.  16.  20.  24.
  28.  32.  36.  40.  44.  48.  52.  56.  60.  64.  68.  72.  76.  80.  84.
  88.  90.]

Now there’s no blank stripe in the figure.

In [31]:
ax = plt.axes(projection=ccrs.PlateCarree())
ax.coastlines()
plt.pcolormesh(lon_b, lat_b, dr_surf, cmap=WhGrYlRd)
Out[31]:
<matplotlib.collections.QuadMesh at 0x113fc8978>
_images/xarray_for_GC_84_1.png

Adding details

Add a few more codes to tweak figure size, title, and colorbar.

In [32]:
plt.figure(figsize=[10,6]) # make figure bigger

ax = plt.axes(projection=ccrs.PlateCarree())
ax.coastlines()
plt.pcolormesh(lon_b, lat_b, dr_surf, cmap=WhGrYlRd)

plt.title('Surface Ozone', fontsize=15)
cb = plt.colorbar(shrink=0.6) # use shrink to make colorbar smaller
cb.set_label("ppbv")
_images/xarray_for_GC_87_0.png

Gridlines and latlon ticks need more codes. But we can create a function and don’t have to repeat those those codes for every plot.

In [33]:
from cartopy.mpl.gridliner import LONGITUDE_FORMATTER, LATITUDE_FORMATTER
import matplotlib.ticker as mticker

def add_latlon_ticks(ax):
    '''Add latlon label ticks and gridlines to ax

    Adapted from
    http://scitools.org.uk/cartopy/docs/v0.13/matplotlib/gridliner.html
    '''
    gl = ax.gridlines(crs=ccrs.PlateCarree(), draw_labels=True,
                      linewidth=0.5, color='gray', linestyle='--')
    gl.xlabels_top = False
    gl.ylabels_right = False
    gl.xformatter = LONGITUDE_FORMATTER
    gl.yformatter = LATITUDE_FORMATTER
    gl.ylocator = mticker.FixedLocator(np.arange(-90,91,30))

Apply the above function to the figure, and finally save the figure to a file.

In [34]:
fig = plt.figure(figsize=[10,6])

ax = plt.axes(projection=ccrs.PlateCarree())
ax.coastlines()
plt.pcolormesh(lon_b, lat_b, dr_surf, cmap=WhGrYlRd)
plt.title('Surface Ozone', fontsize=15)

cb = plt.colorbar(shrink=0.6)
cb.set_label("ppbv")

# above codes are exactly the same as the previous example
# only the two lines below are new

add_latlon_ticks(ax) # add ticks and gridlines
fig.savefig('Surface_Ozone.png', dpi=200) # save figure to a file
_images/xarray_for_GC_91_0.png

dpi=200 leads to a pretty high-quality plot. You can view it here.

Case 2: zonal mean

Dropping unnecessary dimension

Recall that our DataArray has 4 dimensions, although the time dimension is redundant.

In [35]:
dr
Out[35]:
<xarray.DataArray 'TRC_O3' (time: 1, lev: 72, lat: 46, lon: 72)>
array([[[[ 26.152373, ...,  26.152373],
         ...,
         [  6.460082, ...,   6.460082]],

        ...,
        [[ 59.938827, ...,  59.938827],
         ...,
         [ 68.322656, ...,  68.322656]]]])
Coordinates:
  * time     (time) datetime64[ns] 2013-07-01
  * lev      (lev) float32 0.9925 0.977456 0.96237 0.947285 0.9322 0.917116 ...
  * lat      (lat) float32 -89.0 -86.0 -82.0 -78.0 -74.0 -70.0 -66.0 -62.0 ...
  * lon      (lon) float32 -180.0 -175.0 -170.0 -165.0 -160.0 -155.0 -150.0 ...
Attributes:
    long_name:  O3 tracer
    units:      ppbv

Get rid of this redundant dimension for convenience.

In [36]:
dr = dr.isel(time=0)  # equivalent to dr.squeeze() in this case
dr
Out[36]:
<xarray.DataArray 'TRC_O3' (lev: 72, lat: 46, lon: 72)>
array([[[ 26.152373,  26.152373, ...,  26.152373,  26.152373],
        [ 26.224114,  26.226319, ...,  26.221985,  26.20175 ],
        ...,
        [  6.469575,   6.469804, ...,   6.476856,   6.473523],
        [  6.460082,   6.460082, ...,   6.460082,   6.460082]],

       [[ 26.328511,  26.328511, ...,  26.328511,  26.328511],
        [ 26.338292,  26.334288, ...,  26.326118,  26.327644],
        ...,
        [ 10.846659,  10.846609, ...,  10.846533,  10.846559],
        [ 10.847146,  10.847146, ...,  10.847146,  10.847146]],

       ...,
       [[ 59.93882 ,  59.93882 , ...,  59.93882 ,  59.93882 ],
        [ 59.93882 ,  59.93882 , ...,  59.93882 ,  59.93882 ],
        ...,
        [ 68.32267 ,  68.32267 , ...,  68.32267 ,  68.32267 ],
        [ 68.322663,  68.322663, ...,  68.322663,  68.322663]],

       [[ 59.938827,  59.938827, ...,  59.938827,  59.938827],
        [ 59.938813,  59.938813, ...,  59.938813,  59.938813],
        ...,
        [ 68.322663,  68.322663, ...,  68.322663,  68.322663],
        [ 68.322656,  68.322656, ...,  68.322656,  68.322656]]])
Coordinates:
    time     datetime64[ns] 2013-07-01
  * lev      (lev) float32 0.9925 0.977456 0.96237 0.947285 0.9322 0.917116 ...
  * lat      (lat) float32 -89.0 -86.0 -82.0 -78.0 -74.0 -70.0 -66.0 -62.0 ...
  * lon      (lon) float32 -180.0 -175.0 -170.0 -165.0 -160.0 -155.0 -150.0 ...
Attributes:
    long_name:  O3 tracer
    units:      ppbv

Of course, you can do the same thing with pure numpy array. Recall that rawdata is 4D.

In [37]:
rawdata.shape
Out[37]:
(1, 72, 46, 72)
In [38]:
data_sqz = rawdata[0,...] # equivalent to rawdata.squeeze() in this case
data_sqz.shape
Out[38]:
(72, 46, 72)

Averaging

Here comes the example at the beginning! Take zonal average without thinking about dimension order.

In [39]:
dr_zmean = dr.mean(dim='lon')
dr_zmean
Out[39]:
<xarray.DataArray 'TRC_O3' (lev: 72, lat: 46)>
array([[ 26.152373,  26.194946,  25.725238, ...,  10.479322,   6.539299,
          6.460082],
       [ 26.328511,  26.341357,  26.298183, ...,  13.434381,  10.794238,
         10.847146],
       [ 26.498927,  26.484916,  26.580563, ...,  17.146443,  18.342079,
         18.30816 ],
       ...,
       [ 59.938841,  59.938838,  60.476615, ...,  68.69855 ,  68.322649,
         68.322649],
       [ 59.93882 ,  59.938823,  60.476616, ...,  68.698553,  68.32267 ,
         68.322663],
       [ 59.938827,  59.938816,  60.476615, ...,  68.698551,  68.322663,
         68.322656]])
Coordinates:
    time     datetime64[ns] 2013-07-01
  * lev      (lev) float32 0.9925 0.977456 0.96237 0.947285 0.9322 0.917116 ...
  * lat      (lat) float32 -89.0 -86.0 -82.0 -78.0 -74.0 -70.0 -66.0 -62.0 ...

You can do the same thing with pure numpy array, as long as you know the longitude axis is the last dimension.

In [40]:
data_zmean = data_sqz.mean(axis=2)
data_zmean.shape
Out[40]:
(72, 46)

But only having pure numerical data makes me feel unsafe. I often need to double check the 72 above means the number of vertical levels, not the size of longitude dimension.

Check if two approaches give the same result:

In [41]:
np.allclose(dr_zmean.values, data_zmean)
Out[41]:
True

Plotting

dr_zmean is a 2D DataArray, so it has a convenient plotting method.

In [42]:
dr_zmean.plot(cmap=WhGrYlRd)
Out[42]:
<matplotlib.collections.QuadMesh at 0x114e3d278>
_images/xarray_for_GC_112_1.png

It looks weird, because the lev coordinate is the sigma value, not pressure or height.

In [43]:
dr_zmean['lev']
Out[43]:
<xarray.DataArray 'lev' (lev: 72)>
array([  9.924996e-01,   9.774562e-01,   9.623704e-01,   9.472854e-01,
         9.322005e-01,   9.171155e-01,   9.020311e-01,   8.869475e-01,
         8.718643e-01,   8.567811e-01,   8.416982e-01,   8.266159e-01,
         8.090211e-01,   7.863999e-01,   7.612653e-01,   7.361340e-01,
         7.110056e-01,   6.858778e-01,   6.544709e-01,   6.167904e-01,
         5.791153e-01,   5.414491e-01,   5.037952e-01,   4.661533e-01,
         4.285283e-01,   3.909265e-01,   3.533493e-01,   3.098539e-01,
         2.635869e-01,   2.237725e-01,   1.900607e-01,   1.615131e-01,
         1.372873e-01,   1.166950e-01,   9.919107e-02,   8.431271e-02,
         7.159988e-02,   6.068223e-02,   5.132635e-02,   4.332603e-02,
         3.649946e-02,   3.067280e-02,   2.569886e-02,   2.146679e-02,
         1.787765e-02,   1.484372e-02,   1.228746e-02,   1.014066e-02,
         8.336023e-03,   6.818051e-03,   5.548341e-03,   4.492209e-03,
         3.618591e-03,   2.899926e-03,   2.311980e-03,   1.833603e-03,
         1.446498e-03,   1.134948e-03,   8.855623e-04,   6.868625e-04,
         5.292849e-04,   4.050606e-04,   3.077095e-04,   2.318676e-04,
         1.731298e-04,   1.279055e-04,   9.328887e-05,   6.678824e-05,
         4.618107e-05,   2.974863e-05,   1.613635e-05,   4.934665e-06], dtype=float32)
Coordinates:
    time     datetime64[ns] 2013-07-01
  * lev      (lev) float32 0.9925 0.977456 0.96237 0.947285 0.9322 0.917116 ...
Attributes:
    long_name:  Eta Centers
    units:      sigma_level
    positive:   up
    axis:       Z

We will provide functions to convert it to height or pressure, but for now let’s just use level index.

In [44]:
dr_zmean['lev'].values = np.arange(1,73)
dr_zmean['lev'].attrs['units'] = 'unitless'
dr_zmean['lev'].attrs['long_name'] = 'level index'
dr_zmean['lev']
Out[44]:
<xarray.DataArray 'lev' (lev: 72)>
array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17, 18,
       19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36,
       37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54,
       55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72])
Coordinates:
    time     datetime64[ns] 2013-07-01
  * lev      (lev) int64 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 ...
Attributes:
    long_name:  level index
    units:      unitless
    positive:   up
    axis:       Z

Now the plot looks normal.

In [45]:
dr_zmean.plot(cmap=WhGrYlRd)
Out[45]:
<matplotlib.collections.QuadMesh at 0x114e9d4a8>
_images/xarray_for_GC_118_1.png

Slicing

Say we only want to plot the troposphere, we can use a slice to select level 1~30.

In [46]:
dr_zmean.isel(lev=slice(0,30))
Out[46]:
<xarray.DataArray 'TRC_O3' (lev: 30, lat: 46)>
array([[  26.152373,   26.194946,   25.725238, ...,   10.479322,    6.539299,
           6.460082],
       [  26.328511,   26.341357,   26.298183, ...,   13.434381,   10.794238,
          10.847146],
       [  26.498927,   26.484916,   26.580563, ...,   17.146443,   18.342079,
          18.30816 ],
       ...,
       [  53.418344,   53.418347,   45.959175, ...,   80.124412,   84.74668 ,
          84.746681],
       [  71.646951,   71.646957,   58.809819, ...,  102.935995,  110.957415,
         110.957423],
       [ 104.243057,  104.243062,   81.957629, ...,  176.496959,  179.441246,
         179.441244]])
Coordinates:
    time     datetime64[ns] 2013-07-01
  * lev      (lev) int64 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 ...
  * lat      (lat) float32 -89.0 -86.0 -82.0 -78.0 -74.0 -70.0 -66.0 -62.0 ...

The equivalent way in numpy would be:

In [47]:
data_zmean[0:30,:]
Out[47]:
array([[  26.15237271,   26.19494634,   25.72523762, ...,   10.47932228,
           6.53929865,    6.46008225],
       [  26.32851093,   26.34135744,   26.29818292, ...,   13.43438139,
          10.79423836,   10.84714629],
       [  26.4989275 ,   26.48491587,   26.58056279, ...,   17.14644329,
          18.34207919,   18.30816032],
       ...,
       [  53.41834353,   53.41834703,   45.95917533, ...,   80.12441219,
          84.74668009,   84.74668078],
       [  71.64695148,   71.64695711,   58.80981943, ...,  102.93599454,
         110.95741481,  110.95742281],
       [ 104.24305685,  104.24306179,   81.95762888, ...,  176.49695927,
         179.44124566,  179.44124409]])

Only plot the tropospheric region:

In [48]:
dr_zmean.isel(lev=slice(0,30)).plot(cmap=WhGrYlRd, vmax=80, vmin=0)
plt.title('tropospheric O3 (ppb)') # overwrite the default title
Out[48]:
<matplotlib.text.Text at 0x1150f8ac8>
_images/xarray_for_GC_125_1.png

Case 3: vertical profile

Selecting data

Say we want to plot the O3 profile at a specific location. Let’s see what locations are available.

In [49]:
print('lat:\n', dr['lat'].values)
print('lon:\n', dr['lon'].values)
lat:
 [-89. -86. -82. -78. -74. -70. -66. -62. -58. -54. -50. -46. -42. -38. -34.
 -30. -26. -22. -18. -14. -10.  -6.  -2.   2.   6.  10.  14.  18.  22.  26.
  30.  34.  38.  42.  46.  50.  54.  58.  62.  66.  70.  74.  78.  82.  86.
  89.]
lon:
 [-180. -175. -170. -165. -160. -155. -150. -145. -140. -135. -130. -125.
 -120. -115. -110. -105. -100.  -95.  -90.  -85.  -80.  -75.  -70.  -65.
  -60.  -55.  -50.  -45.  -40.  -35.  -30.  -25.  -20.  -15.  -10.   -5.
    0.    5.   10.   15.   20.   25.   30.   35.   40.   45.   50.   55.
   60.   65.   70.   75.   80.   85.   90.   95.  100.  105.  110.  115.
  120.  125.  130.  135.  140.  145.  150.  155.  160.  165.  170.  175.]

Say we want to select \((30^{\circ}N, 60^{\circ}E)\). Hey, don’t count which element in lat array is 30! sel (instead of isel) can select data by coordinate values, not by coordinate index. This feature allows you to use almost the same code for data at different resolutions.

In [50]:
profile = dr.sel(lat=30, lon=60)
profile
Out[50]:
<xarray.DataArray 'TRC_O3' (lev: 72)>
array([   40.683279,    50.502422,    58.76155 ,    59.553575,    59.688794,
          59.776063,    59.863531,    59.921533,    59.98514 ,    60.082513,
          60.335651,    61.095989,    62.673543,    65.194925,    67.979464,
          70.64056 ,    73.474794,    74.231899,    74.188087,    75.221195,
          77.541642,    80.683797,    83.345057,    82.974218,    80.626258,
          75.736345,    73.050543,    72.00984 ,    78.042632,    82.909999,
          86.870436,    91.910749,    97.782276,   102.39502 ,   129.772232,
         225.23146 ,   950.023207,   950.023036,  2735.300086,  2735.299631,
        4467.107829,  4467.108738,  6410.347396,  6410.347851,  8152.221199,
        8152.217561,  8152.21847 ,  8152.222108,  8664.22306 ,  8664.223969,
        8664.226698,  8664.225788,  6528.357062,  6528.355243,  6528.353879,
        6528.356153,  3128.377784,  3128.377784,  3128.377784,  3128.37733 ,
        1485.034545,  1485.03409 ,  1485.034431,  1485.034431,   313.26357 ,
         313.263598,   313.263513,   313.263541,    85.170825,    85.170811,
          85.170818,    85.170839])
Coordinates:
    time     datetime64[ns] 2013-07-01
  * lev      (lev) int64 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 ...
    lat      float32 30.0
    lon      float32 60.0
Attributes:
    long_name:  O3 tracer
    units:      ppbv

Plotting

profile is a 1D DataArray and it also has a convenience method for plotting.

In [51]:
profile.plot()
Out[51]:
[<matplotlib.lines.Line2D at 0x1152b4da0>]
_images/xarray_for_GC_134_1.png

This is particularly useful for time-series, but for profile we want lev to be the y-axis. We can always fall back to basic matplotlib functions.

In [52]:
plt.plot(profile, profile['lev'])
plt.ylabel('lev');plt.xlabel('ppbv')
plt.title('Ozone profile at $(30^{\circ}N, 60^{\circ}E)$')
Out[52]:
<matplotlib.text.Text at 0x115317e80>
_images/xarray_for_GC_136_1.png

Writing NetCDF file

During the previous 3 cases, we’ve made several changes to our DataArray object, including

  • scale its value by \(10^{9}\)
  • change its attribute ‘unit’ from ‘mol/mol’ to ‘ppbv’
  • drop the time dimension
  • change its vertical coordinate values to integers
  • change the vertical coordinate unit to ‘unitless’
In [53]:
dr # check its content
Out[53]:
<xarray.DataArray 'TRC_O3' (lev: 72, lat: 46, lon: 72)>
array([[[ 26.152373,  26.152373, ...,  26.152373,  26.152373],
        [ 26.224114,  26.226319, ...,  26.221985,  26.20175 ],
        ...,
        [  6.469575,   6.469804, ...,   6.476856,   6.473523],
        [  6.460082,   6.460082, ...,   6.460082,   6.460082]],

       [[ 26.328511,  26.328511, ...,  26.328511,  26.328511],
        [ 26.338292,  26.334288, ...,  26.326118,  26.327644],
        ...,
        [ 10.846659,  10.846609, ...,  10.846533,  10.846559],
        [ 10.847146,  10.847146, ...,  10.847146,  10.847146]],

       ...,
       [[ 59.93882 ,  59.93882 , ...,  59.93882 ,  59.93882 ],
        [ 59.93882 ,  59.93882 , ...,  59.93882 ,  59.93882 ],
        ...,
        [ 68.32267 ,  68.32267 , ...,  68.32267 ,  68.32267 ],
        [ 68.322663,  68.322663, ...,  68.322663,  68.322663]],

       [[ 59.938827,  59.938827, ...,  59.938827,  59.938827],
        [ 59.938813,  59.938813, ...,  59.938813,  59.938813],
        ...,
        [ 68.322663,  68.322663, ...,  68.322663,  68.322663],
        [ 68.322656,  68.322656, ...,  68.322656,  68.322656]]])
Coordinates:
    time     datetime64[ns] 2013-07-01
  * lev      (lev) int64 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 ...
  * lat      (lat) float32 -89.0 -86.0 -82.0 -78.0 -74.0 -70.0 -66.0 -62.0 ...
  * lon      (lon) float32 -180.0 -175.0 -170.0 -165.0 -160.0 -155.0 -150.0 ...
Attributes:
    long_name:  O3 tracer
    units:      ppbv

We can save this modified DataArray to a file by just one line of code. This simplicity is just amazing, compare to the terribly complicated procedure in IDL.

In [54]:
dr.to_netcdf('O3_restart.nc')

Use ncdump in the Linux shell to check its content:

$ncdump -h O3_restart.nc

netcdf O3_restart {
dimensions:
    lev = 72 ;
    lat = 46 ;
    lon = 72 ;
variables:
    float time ;
        time:_FillValue = NaNf ;
        time:long_name = "Time" ;
        time:axis = "T" ;
        time:delta_t = "0000-00-00 00:00:00" ;
        time:units = "hours since 1985-01-01" ;
        time:calendar = "gregorian" ;
    float lev(lev) ;
        lev:_FillValue = NaNf ;
        lev:long_name = "level index" ;
        lev:units = "unitless" ;
        lev:positive = "up" ;
        lev:axis = "Z" ;
    float lat(lat) ;
        lat:_FillValue = NaNf ;
        lat:long_name = "Latitude" ;
        lat:units = "degrees_north" ;
        lat:axis = "Y" ;
    float lon(lon) ;
        lon:_FillValue = NaNf ;
        lon:long_name = "Longitude" ;
        lon:units = "degrees_east" ;
        lon:axis = "X" ;
    float TRC_O3(lev, lat, lon) ;
        TRC_O3:_FillValue = 1.e+30f ;
        TRC_O3:long_name = "O3 tracer" ;
        TRC_O3:units = "ppbv" ;
        TRC_O3:add_offset = 0.f ;
        TRC_O3:scale_factor = 1.f ;

// global attributes:
        :coordinates = "time" ;
}

We can see that the file has pretty complete information – dimensions, coordinates, units, everything looks fine.

Finally, open this new file to check everything is correct:

In [55]:
xr.open_dataarray('O3_restart.nc')
Out[55]:
<xarray.DataArray 'TRC_O3' (lev: 72, lat: 46, lon: 72)>
[238464 values with dtype=float64]
Coordinates:
    time     datetime64[ns] 2013-07-01
  * lev      (lev) float32 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0 11.0 ...
  * lat      (lat) float32 -89.0 -86.0 -82.0 -78.0 -74.0 -70.0 -66.0 -62.0 ...
  * lon      (lon) float32 -180.0 -175.0 -170.0 -165.0 -160.0 -155.0 -150.0 ...
Attributes:
    long_name:  O3 tracer
    units:      ppbv

Further reading

Check out xarray documentation, especially the examples!