Skip to main content

Interpolation

“I have experimental data for absorption (dependent variable) measured at various wavelengths (independent variable). Now I want estimates of the absorption at other values in the same range.”

This is an interpolation problem. We wish to obtain the value of a dependent value at point for which we did not sample the independent value. Interpolation is a mathematical procedure for filling in the gaps between available values. SciPy provides a module for interpolation based on the FITPACK library of FORTRAN functions. (Thus, it is fast and reliable.)

A Simple Example

To gain access to the interpolation functions, import the module:

import scipy.interpolate

Or, simply import the function we need for our one-dimensional problem:

from scipy.interpolate import interp1d 

Suppose that each column of an array expData contains a (wavelength, absorption) pair. We can use interp1d to create a function-like object for interpolation from this data set:

F = interp1d(expData[:,0], expData[:,1], kind='cubic')

Now, F(w) will return an interpolated estimate of the absorption at wavelength w every time it is called.[1] We can use the function to get an estimate for each wavelength in an array as well. Define the points at which you want the estimated values:

desiredRange = np.arange(430, 700, 100)

Then, evaluate the function at the desired points:

interpolatedData = F(desiredRange)

This is the basic procedure for interpolation. Now let’s take a look inside our black box!

More on interp1d(x,y)

interp1d requires two arguments — the x and y values that will be used for interpolation. In this example, we have provided an optional argument kind that specifies the type of interpolation procedure. Here, kind='cubic' instructs Python to use a third-order polynomial to interpolate between data points.

The other options are

  • linear: interpolate along a straight line between neighboring data points
  • nearest: project to the nearest data point
  • zero: project to the preceding data point
  • slinear: use a linear spline
  • quadratic: use a quadratic spline
  • cubic: use a cubic spline

The default of interp1d is a linear interpolation. You can also provide an integer number, in which case the function will use a polynomial of that order to interpolate between points. For example,

f = interp1d(x, y, kind=10)

will use a 10th order polynomial to interpolate between points.

The following script will demonstrate the available options and their output for a sample data set:

import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import interp1d

# Original "data set" --- 21 random numbers between 0 and 1.
x0 = np.linspace(-1,1,21)
y0 = np.random.random(21)

plt.plot(x0, y0, 'o', label='Data')

# Array with points in between those of the data set for interpolation.
x = np.linspace(-1,1,101)

# Available options for interp1d
options = ('linear', 'nearest', 'zero', 'slinear', 'quadratic', 'cubic', 10)

for o in options:
    f = interp1d(x0, y0, kind=o)    # interpolation function
    plt.plot(x, f(x), label=o)      # plot of interpolated data

plt.legend()
plt.show()
Interpolated data from interp1d.
Interpolated data from interp1d.

More on Interpolation

You can see a few general principles of interpolation from the figure:

  • Interpolating functions are continuous.
  • Interpolating functions always pass through the data points.
  • A quadratic function can give a much worse fit than linear interpolation.
  • Increasing the order of the polynomial does not always lead to a better fit.
  • Interpolating functions can oscillate wildly between data points.
  • The fit gets worse toward the edges of the data set.

So what should you do when interpolating your own data?

The cubic spline is the workhorse of the industry. As you can see from the figure, it provides a smooth curve that appears to fit the data well. Smoothness extends beyond what you see in the figure: a cubic spline has continuous first and second derivatives. This is a useful property in physics, where first and second derivatives are quite common in theoretical analyses (Newton’s laws, Maxwell’s equations, the Schroedinger equation, etc.).

A cubic spline interpolation is a good choice in most cases.

A final word of caution: Interpolation and extrapolation are not the same. A good interpolating function can be a terrible approximation outside the set of data points used to create it. For this reason, the functions generated by interp1d(x,y) will not even return a number when you provide a value of the independent variable outside the range of the data set. They will raise a ValueError instead.


  1. F(w) will return an array, even if it is called with a single value. If you want a number instead of a 0-dimensional array that contains a single number, just call float(F(w)).  ↩

Comments

Unknown said…
Sir,Is there any way to extracts the data points through which the curve passes? i mean interpolated points
Jesse said…
We mention a few software packages for obtaining data from plots in Section 3.3.1 of our book ("Obtaining Data").

These particular points were produced by NumPy's random number generator. You should be able to produce a similar figure by running the script in this post. In that case, you can get the value of the data points from the arrays x0 and y0.
Unknown said…
how can we interpolate NAN values ?
Jesse said…
Good question! You should clean up your data set first. You do not want polynomials passing through infinity (inf) or “not a number” (nan) entries. You can use the logical test np.isfinite to exclude both of these.

If your original data are in two arrays x and y, then you can remove the NaN and inf entries using the following:

x_new = x[np.isfinite(y)]
y_new = y[np.isfinite(y)]

This will remove all the points for which y has entries nan and inf from both arrays. Now you are ready to interpolate!
Unknown said…
I wanted to have prediction function that can predict chances of failure based on some previous cases of failures(of a system). Can cubic splines serve my purpose. If so how do we do with python?.

Thanks in advance

Popular posts from this blog

Paths in Python

How do you get your Python interpreter to find modules that are not located in your current working directory? The answer is … you tell it where to look. When you type something like from my_module import my_function Python searches a collection of directories (i.e., folders) on your computer. If the directory containing <my_module.py> is not in this collection, you will receive an ImportError . This can be frustrating if you know the file exists, and even more so if you know where it exists. In this post, we will take a brief look at how to add paths to the collection of directories searched by Python. Paths A path is essentially a set of directions to a file: /Users/username/modules/my_module.py [Mac OS X, Linux] C:\modules\my_module.py [Windows] It tells your operating system how to navigate from a fixed starting point — the “root directory” / in Unix-based systems, or C:\ in Windows — through a collection of folders to the de

Raising a Figure Window to the Foreground

This post describes a utility function that will raise a plot window to the foreground of your screen. The function will only work with the Qt graphics backend, so I will start with a brief overview of graphics backends. If you just want to use the function as a black box, you can do the following: Set the Graphics Backend to “Qt” in the Spyder preferences menu. Copy this function into your working directory. Graphics Backends You may have have written a script to produce precisely the data you need, but a lot of processing is required to transform these numbers into a figure. You need to create a plot window, draw a figure inside of it, and manage all of the attributes that control a figure’s appearance: title, axis labels, line widths, colors, tick marks, etc. All of this happens in the background when you type a command like plt.plot(x,y) . A graphics backend is the software that Python uses to physically draw a figure on your computer screen. If you have already import

Illuminating Surface Plots

Matplotlib provides functions for visualizing three-dimensional data sets. One useful tool is a surface plot. A surface plot is a two-dimensional projection of a three-dimensional object. Much like a sketch artist, Python uses techniques like perspective and shading to give the illusion of a three-dimensional object in space. In this post, I describe how you can control the lighting of a surface plot. Surface Plots First, let’s look at some of the options available with the default three-dimensional plotting tools. This script will create a surface plot of a Bessel function. Its ripples will emphasize the effects of lighting later. import numpy as np import matplotlib.pyplot as plt from mpl_toolkits.mplot3d import Axes3D # Import 3D plotting tools. from scipy.special import jn # Import Bessel function. # Define grid of points. points = np.linspace(-10, 10, 51) X, Y = np.meshgrid(points, points) R = np.sqrt(X**2 + Y**2) Z = jn(0,R) # Create 3D surface plo