## Sunday, June 21, 2015

### Interpolation

“I have experimental data for absorption (dependent variable) measured at various wavelengths (independent variable). Now I want estimates of the absorption at other values in the same range.”

This is an interpolation problem. We wish to obtain the value of a dependent value at point for which we did not sample the independent value. Interpolation is a mathematical procedure for filling in the gaps between available values. SciPy provides a module for interpolation based on the FITPACK library of FORTRAN functions. (Thus, it is fast and reliable.)

## A Simple Example

import scipy.interpolate


Or, simply import the function we need for our one-dimensional problem:

from scipy.interpolate import interp1d


Suppose that each column of an array expData contains a (wavelength, absorption) pair. We can use interp1d to create a function-like object for interpolation from this data set:

F = interp1d(expData[:,0], expData[:,1], kind='cubic')


Now, F(w) will return an interpolated estimate of the absorption at wavelength w every time it is called.[1] We can use the function to get an estimate for each wavelength in an array as well. Define the points at which you want the estimated values:

desiredRange = np.arange(430, 700, 100)


Then, evaluate the function at the desired points:

interpolatedData = F(desiredRange)


This is the basic procedure for interpolation. Now let’s take a look inside our black box!

## More on interp1d(x,y)

interp1d requires two arguments — the x and y values that will be used for interpolation. In this example, we have provided an optional argument kind that specifies the type of interpolation procedure. Here, kind='cubic' instructs Python to use a third-order polynomial to interpolate between data points.

The other options are

• linear: interpolate along a straight line between neighboring data points
• nearest: project to the nearest data point
• zero: project to the preceding data point
• slinear: use a linear spline
• quadratic: use a quadratic spline
• cubic: use a cubic spline

The default of interp1d is a linear interpolation. You can also provide an integer number, in which case the function will use a polynomial of that order to interpolate between points. For example,

f = interp1d(x, y, kind=10)


will use a 10th order polynomial to interpolate between points.

The following script will demonstrate the available options and their output for a sample data set:

import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import interp1d

# Original "data set" --- 21 random numbers between 0 and 1.
x0 = np.linspace(-1,1,21)
y0 = np.random.random(21)

plt.plot(x0, y0, 'o', label='Data')

# Array with points in between those of the data set for interpolation.
x = np.linspace(-1,1,101)

# Available options for interp1d
options = ('linear', 'nearest', 'zero', 'slinear', 'quadratic', 'cubic', 10)

for o in options:
f = interp1d(x0, y0, kind=o)    # interpolation function
plt.plot(x, f(x), label=o)      # plot of interpolated data

plt.legend()
plt.show()


## More on Interpolation

You can see a few general principles of interpolation from the figure:

• Interpolating functions are continuous.
• Interpolating functions always pass through the data points.
• A quadratic function can give a much worse fit than linear interpolation.
• Increasing the order of the polynomial does not always lead to a better fit.
• Interpolating functions can oscillate wildly between data points.
• The fit gets worse toward the edges of the data set.

So what should you do when interpolating your own data?

The cubic spline is the workhorse of the industry. As you can see from the figure, it provides a smooth curve that appears to fit the data well. Smoothness extends beyond what you see in the figure: a cubic spline has continuous first and second derivatives. This is a useful property in physics, where first and second derivatives are quite common in theoretical analyses (Newton’s laws, Maxwell’s equations, the Schroedinger equation, etc.).

A cubic spline interpolation is a good choice in most cases.

A final word of caution: Interpolation and extrapolation are not the same. A good interpolating function can be a terrible approximation outside the set of data points used to create it. For this reason, the functions generated by interp1d(x,y) will not even return a number when you provide a value of the independent variable outside the range of the data set. They will raise a ValueError instead.

1. F(w) will return an array, even if it is called with a single value. If you want a number instead of a 0-dimensional array that contains a single number, just call float(F(w)).  ↩

1. Sir,Is there any way to extracts the data points through which the curve passes? i mean interpolated points

2. We mention a few software packages for obtaining data from plots in Section 3.3.1 of our book ("Obtaining Data").

These particular points were produced by NumPy's random number generator. You should be able to produce a similar figure by running the script in this post. In that case, you can get the value of the data points from the arrays x0 and y0.

3. how can we interpolate NAN values ?

1. Good question! You should clean up your data set first. You do not want polynomials passing through infinity (inf) or “not a number” (nan) entries. You can use the logical test np.isfinite to exclude both of these.

If your original data are in two arrays x and y, then you can remove the NaN and inf entries using the following:

x_new = x[np.isfinite(y)]
y_new = y[np.isfinite(y)]

This will remove all the points for which y has entries nan and inf from both arrays. Now you are ready to interpolate!

4. I wanted to have prediction function that can predict chances of failure based on some previous cases of failures(of a system). Can cubic splines serve my purpose. If so how do we do with python?.