Sunday, June 21, 2015

Interpolation

“I have experimental data for absorption (dependent variable) measured at various wavelengths (independent variable). Now I want estimates of the absorption at other values in the same range.”

This is an interpolation problem. We wish to obtain the value of a dependent value at point for which we did not sample the independent value. Interpolation is a mathematical procedure for filling in the gaps between available values. SciPy provides a module for interpolation based on the FITPACK library of FORTRAN functions. (Thus, it is fast and reliable.)

A Simple Example

To gain access to the interpolation functions, import the module:

import scipy.interpolate

Or, simply import the function we need for our one-dimensional problem:

from scipy.interpolate import interp1d 

Suppose that each column of an array expData contains a (wavelength, absorption) pair. We can use interp1d to create a function-like object for interpolation from this data set:

F = interp1d(expData[:,0], expData[:,1], kind='cubic')

Now, F(w) will return an interpolated estimate of the absorption at wavelength w every time it is called.[1] We can use the function to get an estimate for each wavelength in an array as well. Define the points at which you want the estimated values:

desiredRange = np.arange(430, 700, 100)

Then, evaluate the function at the desired points:

interpolatedData = F(desiredRange)

This is the basic procedure for interpolation. Now let’s take a look inside our black box!

More on interp1d(x,y)

interp1d requires two arguments — the x and y values that will be used for interpolation. In this example, we have provided an optional argument kind that specifies the type of interpolation procedure. Here, kind='cubic' instructs Python to use a third-order polynomial to interpolate between data points.

The other options are

  • linear: interpolate along a straight line between neighboring data points
  • nearest: project to the nearest data point
  • zero: project to the preceding data point
  • slinear: use a linear spline
  • quadratic: use a quadratic spline
  • cubic: use a cubic spline

The default of interp1d is a linear interpolation. You can also provide an integer number, in which case the function will use a polynomial of that order to interpolate between points. For example,

f = interp1d(x, y, kind=10)

will use a 10th order polynomial to interpolate between points.

The following script will demonstrate the available options and their output for a sample data set:

import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import interp1d

# Original "data set" --- 21 random numbers between 0 and 1.
x0 = np.linspace(-1,1,21)
y0 = np.random.random(21)

plt.plot(x0, y0, 'o', label='Data')

# Array with points in between those of the data set for interpolation.
x = np.linspace(-1,1,101)

# Available options for interp1d
options = ('linear', 'nearest', 'zero', 'slinear', 'quadratic', 'cubic', 10)

for o in options:
    f = interp1d(x0, y0, kind=o)    # interpolation function
    plt.plot(x, f(x), label=o)      # plot of interpolated data

plt.legend()
plt.show()
Interpolated data from interp1d.
Interpolated data from interp1d.

More on Interpolation

You can see a few general principles of interpolation from the figure:

  • Interpolating functions are continuous.
  • Interpolating functions always pass through the data points.
  • A quadratic function can give a much worse fit than linear interpolation.
  • Increasing the order of the polynomial does not always lead to a better fit.
  • Interpolating functions can oscillate wildly between data points.
  • The fit gets worse toward the edges of the data set.

So what should you do when interpolating your own data?

The cubic spline is the workhorse of the industry. As you can see from the figure, it provides a smooth curve that appears to fit the data well. Smoothness extends beyond what you see in the figure: a cubic spline has continuous first and second derivatives. This is a useful property in physics, where first and second derivatives are quite common in theoretical analyses (Newton’s laws, Maxwell’s equations, the Schroedinger equation, etc.).

A cubic spline interpolation is a good choice in most cases.

A final word of caution: Interpolation and extrapolation are not the same. A good interpolating function can be a terrible approximation outside the set of data points used to create it. For this reason, the functions generated by interp1d(x,y) will not even return a number when you provide a value of the independent variable outside the range of the data set. They will raise a ValueError instead.


  1. F(w) will return an array, even if it is called with a single value. If you want a number instead of a 0-dimensional array that contains a single number, just call float(F(w)).  ↩

Monday, June 8, 2015

Python Code in LaTeX

In A Student’s Guide to Python for Physical Modeling, we used the Listings package for LaTeX to format our code samples. In this post, we describe how we used the package and provide our style files for others to use.

Typesetting Computer Code

The listings package makes it easy to format code for typesetting in LaTeX. It allows the author to adjust virtually every aspect of how code is displayed. We used the package to develop our own syntax highlighting scheme.
  • All code is displayed in a monospace (typewriter) font.
  • Built-in functions and reserved words are bold and green.
  • Python exceptions are bold and red.
  • Modules and functions from NumPy and PyPlot are bold and black.
  • Modules and functions from other modules are bold and blue.
  • Strings are red.
  • Keyword arguments to functions are italic.
  • Comments are blue italic.
This scheme is similar to the one used in Spyder. It allows a reader to distinguish built-in functions and modules from variable names and user-defined functions at a glance.

How to Use the Style Files

The bulk of the work lies in setting up an environment to display code the way you want. We’ve already done that for you!
To start producing formatted code, just download one of the LaTeX files below. Put it somewhere LaTeX can find it, then add this line to the preamble of your document:
\include{listings_py34}
Somewhere after \begin{document} and before your first code sample, type
\pysetup
This will set all the style parameters and create the list of keywords to be displayed in different fonts and colors. Now you are ready to start adding code to your document.

Examples

The listings package provides three ways to add code to your document. In some ways, these environments are similar to LaTeX’s native verbatim environment: Every character you type will be printed, including spaces, tabs, and line breaks.

Inline Code

Very short fragments can be displayed inline. To do this, use the \lstinline command.
To plot the data, type \lstinline!plt.plot(x,y)! at the command line.
You do not have to use exclamation points to enclose the code fragment. You can use almost any set of special symbols that do not appear in the code fragment itself.
\lstinline!print(x)!,  \lstinline&print(x)&, and \lstinline$print(x)$ all
produce the same output.

Code Environment

For longer code samples, you can place the code inside of an lstlisting environment.
\begin{lstlisting}
def hello():
    print("Hello, world!")
    return 0
\end{lstlisting}
This will typeset a function definition.

Include Source Code

You can also directly import entire programs using the \lstinputlisting{} command. The program text will be formatted according to the style specifications and displayed exactly as typed.
\lstinputlisting{code/hello.py}
This will display the program hello.py, located in the code/ directory, in the typeset document.

Swapping Keywords Between Lists

Sometimes a word plays multiple roles in Python. For instance, dtype is an attribute of a NumPy array. It is also a keyword argument to the function `np.array’. To properly display the same word in different contexts using the listings package, it must be transferred between lists of keywords. We have provided a simple macro to do this.
\swapKeyword{keyword}{old_list}{new_list}
will delete keyword from old_list (if it is a member) and add it to new_list. For the dtype example, you might type something like
\swapKeyword{dtype}{3}{5}
To create an floating point array, type \lstinline!x = np.array([1,2,3], dtype='float')!.
\swapKeyword{dtype}{5}{3}
We can check the data type of the array: \lstinline!x.dtype!.
dtype is transferred from keyword list 3 (NumPy and PyPlot methods) to keyword list 5 (function keyword arguments) for the first inline code sample, then back its original list for the second. Thus, it appears in italics in the first instance and in bold in the second.

Download the Files

We have prepared two versions of the header file — one for Python 2.7 and one for Python 3.4. These differ only in a few reserved words and keywords that are unique to one version of Python or the other. For example, xrange and raw_input are built-in functions in Python 2.7, but they are not present in Python 3.4. Click on the appropriate link to download the version of the file you want. sample.tex is a short LaTeX document that illustrates the use of the listings package, and sample.pdf is the final output.
listings_py27.tex
listings_py34.tex
sample.tex
sample.pdf

More Information

The listings package maintained by Jobst Hoffman. You can download the User’s Guide for the listings package here.
We have only described a small fraction of what is possible with this package, but we hope that it is a useful fraction for those who wish to write about Python!

Sunday, June 7, 2015

Making Plots for Publication

Overview

This document extends the discussion of plots and graphs given in A Student’s Guide to Python for Physical Modeling. Here we give some recipes that work on our installation. If they don’t work on yours, please comment on that, especially if you find a workaround.

This post describes the following:

  • Setting figure size
  • Importing the SGPniceGraphics module
  • Functions and parameters defined in the module
  • Fixing crowded graphs
  • Saving graphs for modification in other art software
  • Typesetting labels with LaTeX
  • Other rc settings
  • CMYK color
  • Exporting raster images

Setting Figure Size

You can create a figure window of a specific size before plotting with the command

myfigsize = (6, 3)
plt.figure(frameon=False, figsize=myfigsize)

The first keyword argument suppresses the colored panel in the back of the figure, so you won’t need to delete it by hand in some drawing software. The second keyword argument specifies the desired width and height of the figure, in inches (!!). Give it a name (myfigsize) so that you can set it once and then use it throughout your code (“Define once, and reuse often.”)

Importing the SGPniceGraphics Module

The SGPniceGraphics module contains a few functions which are described below. Their purpose is to make plots look nicer. You can download the module here.

To use the module, you must first put it in a place where Python can find it. The easiest way is just to put it in your Global Working Directory, along with the script you are working on. Almost as easily, however, you can put it in some central location and ask Python to add that folder to your Path. From the python menu bar in Spyder, select PYTHONPATH manager. Click the “Add path” button and fill in the full path to the central location. (Mac users can obtain the path name as follows: Navigate to the desired folder; select it in Finder; use “Copy”; launch the Terminal; use “Paste.” The path name will appear at the command line; now copy again and paste it into the PYTHONPATH manager.)

You can now issue the command

from SGPniceGraphics import *

in your Python scripts.

Functions and Parameters Defined in the Module

Pfixplot

By default, Matplotlib’s graphs sit on top of a background panel, which is at best clutter if you are embedding a figure in some other document. By default, Matplotlib’s graphs are also surrounded by a box consisting of four “spines” and their tick marks. You may not want that style.

After you create a figure, use my_axis=plt.gca() to access its axis object. Then Pfixplot(my_axis) will make the plot background transparent. It also removes the top and right axes (“spines”).

Pfixlegend

If you create a legend, for example by

my_legend = my_axis.legend(mylabels)

then by default, the legend will also sit on top of a background panel (i.e., an opaque background), which may obscure something behind it and in any case is clutter.

After you create the legend, you can use Pfixlegend(my_legend) to make the legend background transparent.

Pcircsize

Matplotlib’s default choice for circle markers is bigger than that for asterisk markers, and in my opinion too big. When you create a graph plt.plot, using the marker option ‘o’, you can add the keyword argument ,markersize=Pcircsize to make the circles smaller.

Psavefig

The command

Psavefig('myfilename.pdf')

saves the current figure to a disk file. It just issues plt.savefig with some useful options. After saving, you can edit this vector-graphics file in your favorite art software, such as Inkscape (free) or Adobe Illustrator (not free). You may instead prefer to save in the .svg format, depending on what art software you use.

Fixing Crowded Graphs

If you try to make a graph that’s small enough for a journal article, particularly if it has multiple panels, Matplotlib may crowd it too much. For example, labels may collide with other things. One approach is to resize the graph by hand with the mouse before saving it. But maybe you already specified the size you want, and don’t want some other size! Before you start tweaking everything by hand, first try issuing the command

plt.tight_layout()

Often this persuades Matplotlib to do what you want.

Saving Graphs for Modification in Other Art Software

Graphs are “line art.” That is, they consist of geometrical objects like lines, circles, and text. For such graphics, you’ll get the best results if you save them in one of the “vector graphics” formats, because then they can be reproduced at any size without loss of quality. In contrast, a photograph is “bitmap art” — an array of individual pixels — and must be saved in one of the “raster image” formats. Converting from the pixellated image stored in such an image to the pixels actually used in the printer or screen, at the specified magnification, requires interpolation and can result in loss of quality. Line art can be converted to and saved in a raster format, but it’s a one-way process.

File formats like .tif and .jpg are strictly raster. File formats like .eps, .svg, and .pdf can accommodate either vector or raster objects, or both in the same file. If you save a graph from Matplotlib in .eps, .svg, or .pdf format, its vector character will be preserved and you can then edit the file in a vector-art program like Inkscape or Adobe Illustrator. That’s what we recommend.

Even if you save your graph as a .pdf or .svg file, however, text labels (title, axis labels, tick labels, etc.) will probably be rendered as “outlines.” These are genuine vector art, and will remain crisp upon reduction or enlargement. However, they are not “type,” that is, you can’t easily edit them in another software, change font, and so on. If you want that functionality, you’ll need to override a default parameter setting in Matplotlib. One way to do this, for .pdf files, is to include the line

matplotlib.rcParams['pdf.fonttype'] = 42

in your “rc” file (see below). For .svg files, the corresponding line is

matplotlib.rcParams['svg.fonttype'] = 'none'

Even with these lines, your art software may complain that it isn’t able to access (and hence to edit) some of the fonts used in the graphics file. The following step may be needed:

Duplicate all the folders in

~/anaconda/lib/python3.4/site-packages/matplotlib/mpl-data/fonts

and put the copies in a system font folder. In MacOSX, it’s

~/Library/Fonts

(Troubleshooting tips for Mac OS X can be found here.) In Windows, it’s

c:/windows/fonts 

For our installation, the folders in question were called afm/, pdfcorefonts/, and ttf/.

Finally, you can find other advice on fonts here and here. To implement their suggestions, use the following commands:

matplotlib.rcParams['ps.useafm'] = True
matplotlib.rcParams['pdf.use14corefonts'] = True
matplotlib.rcParams['text.usetex'] = True

These commands force the use of Type 1 fonts in saved figures.

Typesetting Labels with LaTeX

You can ask Matplotlib to render all your axis labels etc. using the LaTeX typesetting system. If you know that system, this option can be especially nice for graphics that you intend to embed in a bigger LaTeX document. To choose this option, use these two lines prior to creating any graph:

plt.rc('text', usetex=True)
plt.rc('font', family='serif')

Note that these lines may make your code run slowly. You may wish to leave them commented out until your graph nears its final form.

Even with the above lines, you may find that some labels come out in non-LaTeX fonts. You can try setting them in LaTeX math mode, like this:

ax.set_xlabel(r'$\mathrm{wavelength,\ \ [}\mathsf{nm}\mathrm{]}$')
ax.set_ylabel(r'$\mathrm{reflectance}$')

Notice the “raw string” syntax

r'stuff'

Placing the letter ‘r’ in front of a string means that we don’t need to escape the backslashes, which in this context will be sent to LaTeX for processing.

Other matplotlibrc Settings

A vast number of style settings have defaults that can be changed. For details, visit http://matplotlib.org/users/customizing.html.

Briefly, alternatives to the default settings are maintained in an initialization file called matplotlibrc. You can download a copy of this file here.

Alternatively, proceed as follows. At the IPython prompt, type

import matplotlib as mpl
mpl.get_configdir()

Python will give you a path to a folder containing your initialization file. If that folder’s name, or any component of its path, begins with a period, it may be “hidden” (invisible to your computer’s regular file finder). However, you can copy the path from the IPython console, open a terminal window, and change directory to it, for example

$ cd <<paste path to your folder here>>
$ cp matplotlibrc ~/
$ cp matplotlibrc backup_matplotlibrc

The first line above changes directory (cd) to the hidden folder. The second line duplicates (copies, “cp”) the initialization file and puts the new copy in your home directory (abbreviated “~/”), where you can see it. The third line makes a backup, just in case. (The dollar signs are command prompts; you don’t type them.)

Whichever of these methods you use to obtain matplotlibrc, it’s a plain-text file that you can examine in your text editor. It is well documented, but most of it is commented out. If you see a default setting that you’d like to change, you have two options:

  1. You can uncomment lines, change values to what you’d like, save the file, and copy it back to where Python expects to find it:

    $ cp ~/matplotlibrc <>

    You may want to use this option if you have a big project and want to enforce uniform graph style throughout it. Restart IPython after this step.

  2. You can instead leave the matplotlibrc file alone and issue override commands within your individual Python scripts. For example,

    import matplotlib as mpl
    mpl.rcParams['lines.linewidth'] = 2
    

Some other useful overrides were mentioned in “Typesetting labels with LaTeX” and “Saving graphs for modification in other art software” above. In addition, for publication you probably will want to override the default font size for labels, and so on.

CMYK Color

There are various ways to specify color. Some publishers demand that submitted color graphics be supplied in the “CMYK” color space, because it corresponds (a bit) more closely to printer inks than the “RGB” color space (which corresponds a bit more closely to the signals sent to a monitor). Matplotlib does not seem to support CMYK color directly. You must postprocess your graphics with some art program, for example Adobe Illustrator for vector graphics like those discussed in this post. Other postprocessing recipes can be found via a web search.

Exporting Raster Images

You may wish to visualize a matrix of numbers as a grayscale image. Chapter 7 of the book recommends using the function plt.imshow for this purpose. You may then wish to draw additional things on your image, for example by using plt.plot, and then save it to a pdf file with plt.savefig. A problem arises, however, if (as often happens with CCD camera data) your pixel grid is very coarse. The figure will look fine in the Python figure window, and it may even look fine when you save it to a file and open it in certain applications, such as Adobe Illustrator. Nevertheless, when embedded into a document, for example using LaTeX, it may look terrible: A very obvious new grid, finer than your pixels but much coarser than the standard 300 pixels per inch, has been imposed on the image.

Here is one method to address this problem. In Illustrator, create a new layer behind the main one. Delete the pixel part of your image from the main layer and paste it (<shift-cmd-V>) to the new layer. Lock the main layer. Now select the image and choose Edit > Rasterize. Choose an appropriate resolution (at least 300 ppi). Then save the image.