“I have experimental data for absorption (dependent variable) measured at various wavelengths (independent variable). Now I want estimates of the absorption at other values in the same range.”
This is an interpolation problem. We wish to obtain the value of a dependent value at point for which we did not sample the independent value. Interpolation is a mathematical procedure for filling in the gaps between available values. SciPy provides a module for interpolation based on the FITPACK library of FORTRAN functions. (Thus, it is fast and reliable.)
A Simple Example
To gain access to the interpolation functions, import the module:
import scipy.interpolate
Or, simply import the function we need for our one-dimensional problem:
from scipy.interpolate import interp1d
Suppose that each column of an array expData
contains a (wavelength,
absorption) pair. We can use interp1d
to create a function-like object for
interpolation from this data set:
F = interp1d(expData[:,0], expData[:,1], kind='cubic')
Now, F(w)
will return an interpolated estimate of the absorption at wavelength
w
every time it is called.[1] We can use the function to get an
estimate for each wavelength in an array as well. Define the points at which
you want the estimated values:
desiredRange = np.arange(430, 700, 100)
Then, evaluate the function at the desired points:
interpolatedData = F(desiredRange)
This is the basic procedure for interpolation. Now let’s take a look inside our black box!
More on interp1d(x,y)
interp1d
requires two arguments — the x and y values that will be used for
interpolation. In this example, we have provided an optional argument kind
that specifies the type of interpolation procedure. Here, kind='cubic'
instructs Python to use a third-order polynomial to interpolate between data
points.
The other options are
linear
: interpolate along a straight line between neighboring data pointsnearest
: project to the nearest data pointzero
: project to the preceding data pointslinear
: use a linear splinequadratic
: use a quadratic splinecubic
: use a cubic spline
The default of interp1d
is a linear interpolation. You can also provide an
integer number, in which case the function will use a polynomial of that order
to interpolate between points. For example,
f = interp1d(x, y, kind=10)
will use a 10th order polynomial to interpolate between points.
The following script will demonstrate the available options and their output for a sample data set:
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import interp1d
# Original "data set" --- 21 random numbers between 0 and 1.
x0 = np.linspace(-1,1,21)
y0 = np.random.random(21)
plt.plot(x0, y0, 'o', label='Data')
# Array with points in between those of the data set for interpolation.
x = np.linspace(-1,1,101)
# Available options for interp1d
options = ('linear', 'nearest', 'zero', 'slinear', 'quadratic', 'cubic', 10)
for o in options:
f = interp1d(x0, y0, kind=o) # interpolation function
plt.plot(x, f(x), label=o) # plot of interpolated data
plt.legend()
plt.show()
More on Interpolation
You can see a few general principles of interpolation from the figure:
- Interpolating functions are continuous.
- Interpolating functions always pass through the data points.
- A quadratic function can give a much worse fit than linear interpolation.
- Increasing the order of the polynomial does not always lead to a better fit.
- Interpolating functions can oscillate wildly between data points.
- The fit gets worse toward the edges of the data set.
So what should you do when interpolating your own data?
The cubic spline is the workhorse of the industry. As you can see from the figure, it provides a smooth curve that appears to fit the data well. Smoothness extends beyond what you see in the figure: a cubic spline has continuous first and second derivatives. This is a useful property in physics, where first and second derivatives are quite common in theoretical analyses (Newton’s laws, Maxwell’s equations, the Schroedinger equation, etc.).
A cubic spline interpolation is a good choice in most cases.
A final word of caution: Interpolation and extrapolation are not the same. A
good interpolating function can be a terrible approximation outside the set of
data points used to create it. For this reason, the functions generated by
interp1d(x,y)
will not even return a number when you provide a value of the
independent variable outside the range of the data set. They will raise a
ValueError
instead.
-
F(w)
will return an array, even if it is called with a single value. If you want a number instead of a 0-dimensional array that contains a single number, just callfloat(F(w))
. ↩
Comments
These particular points were produced by NumPy's random number generator. You should be able to produce a similar figure by running the script in this post. In that case, you can get the value of the data points from the arrays x0 and y0.
If your original data are in two arrays x and y, then you can remove the NaN and inf entries using the following:
x_new = x[np.isfinite(y)]
y_new = y[np.isfinite(y)]
This will remove all the points for which y has entries nan and inf from both arrays. Now you are ready to interpolate!
Thanks in advance