Washington University in St. Louis

The Bayesian Data-Analysis Software Package

The programs that run the various Bayesian analyses (the server software) were developed at Washington University by Dr. G. Larry Bretthorst, and the Java client interface was developed by Dr. Karen Marutyan. The combination of the server and client software is called the "Bayesian Data-Analysis Toolbox" software. However, this name is slightly misleading because the software can analyze data from many different sources, not just NMR data. Additionally, unlike the previous interface to this software, the new interface does not require the user to have access to any specialized NMR software. It is completely independent of Varian's VnmrJ, although it can load and process data from a Varian spectrometer.

The software contains a series of programs, which we call packages, that implement various calculations using Bayesian probability theory. Most of these calculations use Markov chain Monte Carlo (McMC). All of the programs except Bayes Analyze can make full use of multiple CPUs if you have them.

The packages are described here in the order in which they appear on the package menu in the interface. The hyperlinks in the following descriptions download the corresponding chapter of the user manual. Here is the current list of packages:

  1. The Exponential package estimates the decay rate constants and amplitudes of signals known to be decaying exponentially. It handles both the case where the number of exponentials is known and the case where it is unknown. In both cases the input can come from ASCII files, from a peak pick, or from Bayes Analyze files. One or more input data sets can be processed. The package looks for exponentially decaying signals that are common to all of the data sets, while allowing each exponential to have different initial conditions in each data set.

  2. The Inversion Recovery package performs a special type of exponential analysis that is very common in NMR. In this problem the NMR signal starts at a negative value and relaxes exponentially to a positive value. The inversion recovery model differs from an exponential-plus-a-constant model only in how it is formulated: the two amplitudes represent the initial (time equal to zero) amplitude and the equilibrium amplitude. These amplitudes are therefore linear combinations of the amplitudes that an exponential-plus-a-constant model would estimate. As a side note, this package is really a special case of the Enter Ascii Model package described below. We call such special cases preloaded Enter Ascii models, because the interface preloads the inversion recovery model from the system models, which simplifies what the user must do to run it. This package can analyze multiple data sets jointly to look for common parameters.

  3. The Diffusion Tensor package analyzes NMR diffusion measurements using models containing one, two or three diffusion tensors, with or without a constant. These models can use either "b" values or "g" (gradient) values for the abscissa, and the "b" values can be either 3D vectors or "b" matrices. The package therefore handles 18 different diffusion tensor models (3 tensor counts x 2 constant options x 3 abscissa types). Because the McMC packages compute the probability for the model using thermodynamic integration, this package can do some simple model selection. As with most packages, multiple ASCII data sets can be analyzed jointly to look for common diffusion tensor parameters.

  4. The Enter Ascii Model package allows the user to define a model and then use Bayesian probability theory to analyze it. To create a simple model, the user must copy the example model and then write the Fortran or C code needed to evaluate the model. In addition, the user must create a file that describes the model parameters and their prior probabilities. The user can load either a user-generated model or a system model (a model written by us). When the package is run, the data and the model are sent to the server. The server compiles the model on the fly and creates a dynamic load library, so to use this package the server must have a Fortran or C compiler installed. The Enter Ascii program is then run, and the model is dynamically loaded and used to analyze the data. As with most packages that require ASCII data, this package can analyze multiple data sets jointly to look for common parameters. Finally, the models used in the Enter Ascii package are the same models used to analyze images. One can therefore use the Enter Ascii package to analyze a few pixels from an image and then proceed to analyze the entire image; see Analyze Image Pixels for more on this.

  5. The Enter Ascii Model Selection package uses the models generated for Enter Ascii to do model selection. After setting up a number of rival models using Enter Ascii, one can load up to 10 of them into this package and compute the posterior probability for each model. Because this is a new package, its manual pages are not yet available.

  6. The Test Ascii Model package supports the Ascii Model packages by giving you a facility for testing models to ensure they are doing their calculations correctly. The package loads a model and then tests it thoroughly by evaluating it 10,000 times using parameters sampled from the priors. In the process, the package catches any arithmetic errors that occur and shows you where the invalid arithmetic occurred. The outputs include a peak posterior probability estimate of the model and plots of the model signal as a function of the parameter samples.

  7. The Magnetization Transfer (two sites) package solves the Bloch-McConnell equations to obtain the exchange rate constants for two-site magnetization exchange. Input to this package is usually the peak amplitudes or intensities from two inversion recovery time courses in which the exchanging peaks are selectively inverted.

  8. The Magnetization Transfer Kinetics package solves the Bloch-McConnell equations at multiple temperatures and concentrations to derive the entropy and enthalpy of the exchange process. Input to this package is the same as for the two-site Magnetization Transfer package, but with measurements at multiple temperatures and concentrations.

  9. The Big Magnetization Transfer package solves the magnetization transfer problem when one of the sites can be considered infinite compared to the other.

  10. The Bayes Analyze package is a time-domain frequency estimation package that is fully capable of determining the number of resonances in an FID and estimating the resonance parameters. It can analyze a single FID, or it can analyze multiple FIDs jointly and look for frequencies common to them. Input can come from several different sources, and the appropriate data conversions are carried out when the data are loaded.

  11. The Big Peak/Little Peak package analyzes time-domain FID data containing a single peak (the big peak) that may be many orders of magnitude more intense than the metabolic peaks of interest (the little peaks). The package treats the big peak as a nuisance and uses Bayesian probability theory to account for it while simultaneously estimating the frequencies, decay rate constants and amplitudes of the resonances of interest.

  12. The Find Resonances package analyzes NMR FID data looking for resonances. It is a model selection program that attempts to determine the number of resonances in the data and to estimate the parameters associated with those resonances, using Markov chain Monte Carlo simulations to compute the posterior probability for the number of resonances. This package essentially solves the same problem as the Bayes Analyze package described above. Because it uses McMC, its calculations are much slower than those in Bayes Analyze, but they are much more thorough and often give much better resolution. Because this is a new package, the manual pages are not yet available.

  13. The Metabolite package analyzes data from a given NMR sample, for example a C13 FID of glutamate. The intensities of the glutamate resonances are related to each other through a metabolic model. This model can be very simple or very complex, and with our help it can be user defined. The metabolic model relates the intensities of the resonances to a series of metabolic parameters, typically fractional rates that describe how much of a compound went through a certain chemical reaction. The resonances in a metabolic model are described in a metabolite file, and the metabolic model itself is encoded in a FORTRAN or C routine. The Metabolite package reads the resonance and metabolic models and then uses Bayesian probability theory to estimate the metabolic parameters as well as the parameters associated with the resonances, i.e., the frequencies and decay rate constants.

  14. The Behrens-Fisher package solves the classical medical testing problem: given two experiments, each consisting of repeated measurements of the same quantity, where some experimental parameter was changed in the second experiment, determine whether the experiments are the same or differ. For more information on this calculation see On the Difference in Means.

  15. The Errors in Variables package solves the errors-in-variables problem, in which a data set has uncertainty in both the X and Y variables. These errors may be known or unknown, so the package solves four different errors-in-variables problems. In the package name, "given" refers to the fact that the program solves the problem given the order of the polynomial to fit. The input data are described in the manual.

  16. The Polynomial Models package fits polynomials of either a given or an unknown order to the input data. When the order is specified, a polynomial of that order is analyzed using Bayesian probability theory to determine the appropriate coefficients. When the order is specified as unknown, Bayesian probability theory is used to compute the posterior probability for the order of the polynomial. The input data are two-column ASCII, and this package does not process multiple data sets.

  17. The MaxEnt Histograms package, a density estimation package, is an ASCII package that takes a two-column ASCII file as input. Column one is just a data point number, and column two is a sample from the unknown density function. The program models the density function as a maximum entropy moment distribution having an unknown number of Lagrange multipliers, so the parameters are the Lagrange multipliers and their number. The program does a Markov chain Monte Carlo simulation with simulated annealing in which the number of multipliers is one more parameter in the simulation. Outputs include the posterior probability for the number of multipliers, the posterior probabilities for the multipliers, scatter plots and the polynomials used in the calculations.

  18. The Binned Histogram package is a new histogramming package. In the previous release of the software, there was a MaxEnt histogramming package, which infers histograms that are functionally maximum entropy moment distributions. That is, it infers the moments, and the number of moments, needed to represent the input samples from the unknown density. This procedure works well for compact distributions, but fails badly when the distribution of samples is multimodal. To estimate density functions when the samples are multimodal, we added a histogramming package that infers what can only be called binned histograms. These histograms can represent any distribution, they have error bars on the number of counts in the bins, and the user can indicate whether the histograms are to be smoothed.

  19. A kernel density estimation package has been added to the list of packages. This is a true density estimation package that estimates a density function by expanding it on a set of kernels. There are nine different kernel types, and the package attempts to determine what superposition of kernels best describes the density function. Because this is a Bayesian estimate of the density function, it comes with uncertainty estimates. Note that I have not yet written the manual pages for this package.

  20. The Linear Phasing package produces linearly phased images. In spin-echo MRI, most images can be phased (converted to absorption-mode images) by calculating two first-order phases and one zero-order phase. Bayes Phase computes these phases and then applies them to the images. The resulting images are then available for further processing by the Analyze Image Pixels package. For more on this calculation see Automatic phasing of MR images. Part I: Linearly varying phase.

  21. The Nonlinear Phasing package phases images whose phase varies in a nonlinear fashion. It can be used to produce absorption-mode images from gradient-echo MR images or from any other image in which the phase varies in an unpredictable fashion. For more information on this calculation see Automatic phasing of MR images. Part II: Voxel-wise phase estimation.

  22. The Image Pixels package loads a predefined model and then uses that model to analyze images on a pixel-by-pixel basis. Models can be loaded from the system directory. These predefined models perform a number of common MRI calculations, such as exponential analysis with one or more exponentials with or without a constant, and diffusion tensor analysis. Additionally, users can copy and then edit an example model to create models of their own. These models can be loaded from the user's home directory and then used to analyze the image.

  23. The Image Pixels package includes an option for finding the peak of the posterior probability. When this option is selected, the package actually runs a different program: a search algorithm that looks for the peak of the posterior probability for the parameters in the model. (The chapter on the Bayes Analyze package has an extensive discussion of the Levenberg-Marquardt and Newton-Raphson search algorithms.) These peak parameter estimates are then used to generate maps of the various parameters appearing in the model. Because this program is a search routine rather than an McMC routine, it is very fast and can give good results with any ASCII model in a small fraction of the time needed to run the Markov chain Monte Carlo simulations.

  24. The Image Pixels Model Selection package extends the concepts in Analyze Image Pixels to model selection. In this package one can load a number of different models and then use Bayesian probability theory to determine which model best accounts for the data. The models used here are the same models used in both the Analyze Image Pixels and Enter Ascii packages. However, because the models can have different parameterizations, the output images here are constructed from the derived parameters. For more on this package and how to use it, see the user manual.
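
The joint structure described in item 1 can be illustrated with a small Python sketch. In that structure, the decay rate constants are shared by all data sets while each data set keeps its own amplitudes. This is not the package's algorithm. The sketch assumes the rates are already known and solves only for each data set's amplitudes by linear least squares, whereas the package itself estimates the rates using Bayesian probability theory. The function names are hypothetical.

```python
import numpy as np

def design_matrix(t, rates):
    """Columns exp(-rate_k * t): one basis function per shared decay rate."""
    return np.exp(-np.outer(t, rates))

def fit_amplitudes(datasets, rates):
    """With the rates held fixed and shared, fit each data set's own amplitudes
    by linear least squares. `datasets` is a list of (t, d) pairs; returns
    one amplitude vector per data set."""
    amps = []
    for t, d in datasets:
        g = design_matrix(t, rates)
        a, *_ = np.linalg.lstsq(g, d, rcond=None)
        amps.append(a)
    return amps

rates = np.array([0.5, 3.0])            # hypothetical rates common to both sets
t1 = np.linspace(0.0, 5.0, 40)
t2 = np.linspace(0.0, 8.0, 60)          # sampling times may differ between sets
d1 = design_matrix(t1, rates) @ np.array([2.0, 1.0])   # noiseless test data
d2 = design_matrix(t2, rates) @ np.array([0.4, 3.0])
print(fit_amplitudes([(t1, d1), (t2, d2)], rates))
```

The two data sets are sampled at different times. Only the rate constants tie them together.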
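
A minimal Python sketch can make the amplitude relationship in item 2 concrete. It assumes a decay rate R (the reciprocal of T1) and noiseless data, and its function names are hypothetical rather than part of the package. It shows that the inversion recovery amplitudes (initial and equilibrium) and the exponential-plus-a-constant amplitudes describe the same signal and are related by a linear map.

```python
import numpy as np

def exp_plus_constant(t, a, b, rate):
    """Exponential plus a constant: d(t) = a*exp(-rate*t) + b."""
    return a * np.exp(-rate * t) + b

def inversion_recovery(t, m_init, m_eq, rate):
    """Inversion recovery in terms of the initial (t = 0) amplitude m_init
    and the equilibrium amplitude m_eq: d(t) = m_eq + (m_init - m_eq)*exp(-rate*t)."""
    return m_eq + (m_init - m_eq) * np.exp(-rate * t)

def to_exp_plus_constant(m_init, m_eq):
    """Linear map between the parameterizations: a = m_init - m_eq, b = m_eq."""
    return m_init - m_eq, m_eq

t = np.linspace(0.0, 5.0, 11)
m_init, m_eq, rate = -1.0, 1.0, 0.8     # starts negative, recovers to +1
a, b = to_exp_plus_constant(m_init, m_eq)

ir = inversion_recovery(t, m_init, m_eq, rate)
epc = exp_plus_constant(t, a, b, rate)
print(np.allclose(ir, epc))             # True: same signal, different parameters
```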
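
Item 3 notes that the McMC packages compute the probability for a model by thermodynamic integration. The identity behind that technique is that log Z equals the integral over beta from 0 to 1 of <log L>_beta. Here L is the likelihood and Z is the evidence (the marginal likelihood used in model selection). The bracket <...>_beta is an average over a tempered posterior proportional to L^beta times the prior. The Python sketch below is a toy, not the package's implementation. It assumes data y_i drawn from N(mu, 1) with a N(0, tau^2) prior on mu, so each tempered average has a closed form and the result can be checked against the exact evidence. In an McMC implementation these averages would instead be estimated from samples drawn at a sequence of beta values.

```python
import numpy as np

def expected_log_likelihood(beta, y, tau):
    """<log L> under the tempered posterior p_beta(mu) ~ L(mu)^beta * prior(mu),
    for y_i ~ N(mu, 1) and prior mu ~ N(0, tau^2). The tempered posterior is
    Gaussian, so the average has a closed form."""
    n = y.size
    precision = beta * n + 1.0 / tau**2
    mean = beta * y.sum() / precision
    var = 1.0 / precision
    # E[sum (y_i - mu)^2] = sum (y_i - mean)^2 + n * var
    return -0.5 * n * np.log(2 * np.pi) - 0.5 * (np.sum((y - mean) ** 2) + n * var)

def log_evidence_ti(y, tau, n_beta=10001):
    """Thermodynamic integration with the trapezoid rule on a uniform beta grid."""
    betas = np.linspace(0.0, 1.0, n_beta)
    f = np.array([expected_log_likelihood(b, y, tau) for b in betas])
    return np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(betas))

def log_evidence_exact(y, tau):
    """Exact log evidence: marginally y ~ N(0, I + tau^2 * 11^T)."""
    n = y.size
    s = y.sum()
    quad = np.sum(y**2) - tau**2 * s**2 / (1.0 + n * tau**2)
    return -0.5 * (n * np.log(2 * np.pi) + np.log(1.0 + n * tau**2) + quad)

rng = np.random.default_rng(0)
y = rng.normal(0.7, 1.0, size=10)
print(log_evidence_ti(y, tau=1.0), log_evidence_exact(y, tau=1.0))
```

The two printed values should agree closely. A fine uniform beta grid is used only for simplicity; practical schemes usually concentrate beta values near zero, where the integrand changes fastest.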
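
Items 7 and 8 refer to the Bloch-McConnell equations. For longitudinal magnetization at two exchanging sites a and b, a standard form is dM/dt = -A (M - M_eq), with A = [[R1a + kab, -kba], [-kab, R1b + kba]]. Here R1a and R1b are the longitudinal relaxation rates and kab and kba are the exchange rate constants. Detailed balance requires kab*Ma_eq = kba*Mb_eq. The Python sketch below solves this forward model with a matrix exponential computed via an eigendecomposition. It assumes first-order exchange and does not show how the package estimates the rate constants. The function name is hypothetical.

```python
import numpy as np

def two_site_recovery(t, m0, m_eq, r1a, r1b, kab, kba):
    """Two-site longitudinal Bloch-McConnell solution
    M(t) = M_eq + expm(-A t) (M(0) - M_eq).
    Returns an array of shape (len(t), 2) whose columns are Ma(t) and Mb(t).
    Assumes A is diagonalizable (true for distinct eigenvalues or no exchange)."""
    m0 = np.asarray(m0, dtype=float)
    m_eq = np.asarray(m_eq, dtype=float)
    a = np.array([[r1a + kab, -kba],
                  [-kab, r1b + kba]])
    evals, evecs = np.linalg.eig(a)
    # Expand the initial deviation from equilibrium in the eigenvectors of A.
    coeffs = np.linalg.solve(evecs, m0 - m_eq)
    # Each eigenmode decays independently as exp(-lambda * t).
    modes = np.exp(-np.outer(t, evals)) * coeffs
    return m_eq + (modes @ evecs.T).real

t = np.linspace(0.0, 10.0, 201)
m_eq = np.array([1.0, 0.5])
kab = 0.5
kba = kab * m_eq[0] / m_eq[1]      # detailed balance: kab*Ma_eq = kba*Mb_eq
m0 = np.array([-1.0, 0.5])         # site a selectively inverted, b at equilibrium
m = two_site_recovery(t, m0, m_eq, r1a=1.0, r1b=1.0, kab=kab, kba=kba)
```

Because site b exchanges with the inverted site a, Mb(t) first dips below its equilibrium value before recovering. This transient is the signature of exchange in a selective inversion experiment.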
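
The linear phase model behind the Linear Phasing package (item 20) can be written as phi(x, y) = phi0 + phi_x*x + phi_y*y. It has one zero-order phase plus a first-order phase along each of the two image axes. A minimal Python sketch of applying such a correction to a complex image is shown below. It assumes the three phases are already known (estimating them is what Bayes Phase does, and that is not shown) and uses centered, normalized pixel coordinates in [-0.5, 0.5). The function name is hypothetical.

```python
import numpy as np

def apply_linear_phase(image, phi0, phi_x, phi_y):
    """Remove a linearly varying phase phi0 + phi_x*x + phi_y*y (radians)
    from a complex 2-D image; x and y are normalized pixel coordinates."""
    ny, nx = image.shape
    x = (np.arange(nx) - nx / 2) / nx
    y = (np.arange(ny) - ny / 2) / ny
    phase = phi0 + phi_x * x[np.newaxis, :] + phi_y * y[:, np.newaxis]
    return image * np.exp(-1j * phase)

# Test image: a real, positive object multiplied by a known linear phase.
ny, nx = 64, 48
magnitude = 1.0 + np.outer(np.hanning(ny), np.hanning(nx))
x = (np.arange(nx) - nx / 2) / nx
y = (np.arange(ny) - ny / 2) / ny
true_phase = 0.3 + 2.0 * x[np.newaxis, :] - 1.5 * y[:, np.newaxis]
image = magnitude * np.exp(1j * true_phase)

phased = apply_linear_phase(image, 0.3, 2.0, -1.5)
print(np.allclose(phased.imag, 0.0), np.allclose(phased.real, magnitude))  # True True
```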

In addition to the discussions of the various packages, the user manual contains discussions of the file formats used by the Bayes Analysis software. These include the four-dimensional floating point format (sometimes called 4dfp) and the ASCII file formats. The manual also covers the directory organization, how to install the software and the interface. Finally, it contains a tutorial on Markov chain Monte Carlo with thermodynamic integration, a discussion of outlier detection, and a detailed description of how to write and build your own models.

This site is being maintained by:
Larry Bretthorst
Dept. Of Chemistry and Radiology
Washington University
St. Louis MO 63130

Phone: 314 362-9994