Custom Kernels with apply_ufunc

Overview

In this tutorial, you learn:

  • What apply_ufunc is and its importance in the xarray Python library.

  • The basic usage of apply_ufunc to apply your function to a DataArray.

  • Applying custom kernels to DataArray with CuPy

Prerequisites

Concepts

Importance

Notes

Basics of Cupy

Necessary

Familiarity with Xarray

Necessary

  • Time to learn: 20 minutes

What is apply_ufunc?

apply_ufunc is a powerful function provided by the xarray library, which is commonly used for data manipulation in the Python programming language. This function allows users to apply universal functions (ufuncs) on xarray data structures, including DataArray, Dataset, or Variable objects. With apply_ufunc, users can apply arbitrary functions that are compatible with raw arrays (e.g. NumPy or CuPy), and the function will take care of aligning the input data, looping over dimensions, and maintaining metadata.

See also

See the Xarray tutorial material on apply_ufunc for more.

Simple functions that act independently on each value should work without any additional arguments.

Simple Example

In the example below, we calculate the saturation vapor pressure by using apply_ufunc().

But first, let’s create some sample data to work with.

We’ll use a 3-dimensional dataset (time, latitude, longitude) with random values:

import time

import cupy as cp
import cupy_xarray  # Adds .cupy to Xarray objects
import numpy as np
import pandas as pd
import xarray as xr
np.random.seed(0)

# Create the time range.
date = pd.date_range("2010-01-01", "2020-12-31", freq="M")

# Create the latitude range.
lat = np.arange(-90, 90, 1)

# Create the longitude range.
lon = np.arange(-180, 180, 1)

# Create random data
data_np = np.random.rand(len(date), len(lat), len(lon))
data_cp = cp.array(data_np)

# -- Create DataArray with Numpy data
data_xr_np = xr.DataArray(
    data_np,
    dims=["time", "lat", "lon"],
    coords=[date, lat, lon],
)

# -- Create DataArray with CuPy data
data_xr_cp = xr.DataArray(
    data_cp,
    dims=["time", "lat", "lon"],
    coords=[date, lat, lon],
)

Now, let’s define our function that calculate the saturation vapor pressure using Clausius-Clapeyron equation:

def sat_p(t):
    # return saturation vapor pressure
    # using Clausius-Clapeyron equation
    return 0.611 * np.exp(17.67 * (t - 273.15) * ((t - 29.65) ** (-1)))
start_time_np = time.time()

es_np = xr.apply_ufunc(sat_p, data_xr_np)

end_time_np = time.time()
time_np = end_time_np - start_time_np
type(es_np.data)
numpy.ndarray
start_time_cp = time.time()

es_cp = xr.apply_ufunc(sat_p, data_xr_cp)

end_time_cp = time.time()
time_cp = end_time_cp - start_time_cp

print(
    "apply_ufunc with Xarray DataArrays using CuPy provides a",
    round(time_np / time_cp, 2),
    "x speedup over NumPy.\n",
)
GroupBy with Xarray DataArrays using CuPy provides a 0.22 x speedup over NumPy.

Now, what is the output type? Does apply_ufunc preserve the underlying data type?

type(es_cp.data)
cupy.ndarray

Important

apply_ufunc preserves the underlying data type.

In the timing test, you might notice not much speed-up when using CuPy. But let’s run this cell another time:

start_time_cp = time.time()

es_cp = xr.apply_ufunc(sat_p, data_xr_cp)

end_time_cp = time.time()
time_cp = end_time_cp - start_time_cp

print(
    "apply_ufunc with Xarray DataArrays using CuPy provides a",
    round(time_np / time_cp, 2),
    "x speedup over NumPy.\n",
)
GroupBy with Xarray DataArrays using CuPy provides a 97.87 x speedup over NumPy.

Now, we can see much more speed-up using CuPy as explained in the first lesson:

When running these functions for the first time, you may experience a brief pause. This occurs as CuPy compiles the CUDA functions for the first time and cached them on disk for future use.

Elementwise Kernel with CuPy

Elementwise Kernels in CuPy allow for operations to be performed on an element-by-element basis on CuPy arrays.

To create an elementwise kernel in CuPy, you need to use cupy.ElementwiseKernel class. This class defines a CUDA kernel which can be invoked by the __call__ method of the instance.

This elementwise kernel takes three arguments:

  • a string defining the input type(s),

  • a string defining the output type(s),

  • and a string representing the operation to be performed, written in C syntax.

In this example, we want to calculate Relative Humidity using Revised Magnus coefficients by Alduchov and Eskridge.

Revised Magnus Coefficients by Alduchov and Eskridge: $$ RH = \left(\frac{{6.112 \cdot \exp\left(\frac{{17.67 \cdot (T_d - 273.15)}}{{T_d - 29.65}}\right)}}{{6.112 \cdot \exp\left(\frac{{17.67 \cdot (T - 273.15)}}{{T - 29.65}}\right)}}\right) \times 100 % $$

def calculate_relative_humidity(temp, dew_point):
    """
    Calculate Relative Humidity using Revised Magnus coefficients by Alduchov and Eskridge.

    Args:
        temp (float): Temperature in Celsius.
        dew_point (float): Dew Point Temperature in Celsius.

    Returns:
        float: Relative Humidity in percentage.
    """
    temp += 273.15  # Convert temperature to Kelvin
    dew_point += 273.15  # Convert dew point temperature to Kelvin

    es_temp = 6.112 * np.exp(
        (17.67 * (dew_point - 273.15)) / (dew_point - 29.65)
    )  # Saturation vapor pressure at dew point
    es_dew = 6.112 * np.exp(
        (17.67 * (temp - 273.15)) / (temp - 29.65)
    )  # Saturation vapor pressure at temperature

    relative_humidity = (es_dew / es_temp) * 100.0  # Calculate relative humidity in percentage
    return relative_humidity

But for Elementwise kernels we need to write it in C syntax:

  1. Set the list of input and output arguments and their data types:

    • input arguments : float32 temp, float32 d_temp

    • output arguments : float32 rh

  2. Write the code body:

    temp += 273.15;
    dew_point += 273.15;

    // Calculate saturation vapor pressure at dew point
    float es_temp = 6.112 * exp((17.67 * (dew_point - 273.15)) / (dew_point - 29.65));

    // Calculate saturation vapor pressure at temperature
    float es_dew = 6.112 * exp((17.67 * (temp - 273.15)) / (temp - 29.65));

    // Calculate relative humidity in percentage
    float relative_humidity = (es_dew / es_temp) * 100.0;
    
  1. Define the element-wise class and set the kernel name:

    compute_call = cp.ElementwiseKernel(input_list, output_list, code_body, 'RH')

Now let’s test to see how this works in a real example:

# Create random data
data_cp = 20 * (cp.random.rand(len(date), len(lat), len(lon)))


# -- Create Temp DataArray with CuPy data
temp = xr.DataArray(
    data_cp,
    dims=["time", "lat", "lon"],
    coords=[date, lat, lon],
)


offset = 20 * cp.random.rand(len(date), len(lat), len(lon))

# -- Create Wet Bulb Temp DataArray with CuPy data

temp_wet = xr.DataArray(
    data_cp - offset,
    dims=["time", "lat", "lon"],
    coords=[date, lat, lon],
)
input_list = "float64 temp, float64 dew_temp"
output_list = "float64 rh"

code_body = """

            // Calculate saturation vapor pressure at dew point
            float es_temp = 6.112 * exp((17.67 * (dew_temp)) / (dew_temp - 29.65));

            // Calculate saturation vapor pressure at temperature
            float es_dew = 6.112 * exp((17.67 * (temp)) / (temp - 29.65));

            // Calculate relative humidity in percentage
            rh = (es_dew / es_temp) * 100.0;
            """
## -- define the elementwise kernel:
compute_call = cp.ElementwiseKernel(input_list, output_list, code_body, "RH")

kernel = compute_call(data_cp, data_cp - offset)
%%time
result = xr.apply_ufunc(
    compute_call,
    temp,
    temp_wet,
)
##result
CPU times: user 1.24 ms, sys: 0 ns, total: 1.24 ms
Wall time: 1.24 ms

How much this computation took if we wanted to use pure Python?

%%time
relative_humidity = calculate_relative_humidity(temp, temp_wet)
CPU times: user 3.97 ms, sys: 0 ns, total: 3.97 ms
Wall time: 3.98 ms

We can see using the custom kernel method, we removed the pure Python overhead in between calculations, by creating a custom “elementwise” kernel that will run the entire computations on the GPU device.

Congratulations! You have now uncovered how to use apply_ufunc with custom CUDA kernels.

Summary

In this notebook, we have learned about:

  • What apply_ufunc is and its importance in the xarray Python library.

  • The basic usage of apply_ufunc to apply your function to a DataArray.

  • Applying custom kernels to DataArray with CuPy