Monte Carlo
In this seminar I motivate Monte Carlo simulations and give a very elementary introduction to the underlying technique, together with some basic worked examples that should enable you to get started using Numerical Recipes algorithms, ROOT, or CLHEP.
Some useful sources on random number generation:
Numerical Recipes (in C, C++, FORTRAN, Pascal, ...whatever), W. H. Press, S. A. Teukolsky, W. T. Vetterling and B. P. Flannery. Chapter 7 on Random Numbers. Copy in the HEP computer room.
ROOT: http://root.cern.ch particularly the class TRandom1
CLHEP: http://proj-clhep.web.cern.ch/proj-clhep/ and see example below.
F. James, Comput. Phys. Commun. 60, 329-344 (1990)
Motivation
We are frequently faced with the problem that our experiments are so complicated and convoluted that it is not at all transparent theoretically what their result should be. In fact, the vast majority of experiments in particle and nuclear physics are like this. Here are a couple of examples:
EXAMPLE 1:
Two protons collide at high energy, and the experiment is repeated many times. Not surprisingly, the zoo of particles and hadron jets resulting from each interaction is different from all the others. Even if the physics at the interaction point is governed by the standard model, the complexity of the evolution to the final state is enormous. How do you get from the pretty theory of the standard model to the probability distribution for the final state particles and, even worse, the probability distribution for the quantities actually measured by the detector, like energy deposited in some calorimeter pixel of finite acceptance and energy-dependent efficiency, for example? It’s not going to be easy. And it’s absolutely critical you know the answer precisely, because departures from the standard model (discovery of the Higgs, for example) will manifest themselves as deviations of the measured output data from what is predicted in this ‘theory’. If you don’t have a credible theoretical model, you can’t do any physics with your detector.
EXAMPLE 2:
A radioactive source, say cobalt-60, is positioned directly on top of a gamma ray spectrometer consisting of a cooled germanium crystal. What is the probability distribution for the energy deposited in the crystal in the ensuing radioactive decays? In this case, although the underlying theory is well understood, the geometry makes things hard. In what percentage of gamma ray interactions in the crystal will all the energy of the gamma ray be absorbed, for example? The answer is going to depend on many things, including the geometry of the detector (smaller detectors will typically fully absorb a smaller fraction of the gammas), but also what it is surrounded by (gammas may scatter out of the crystal and then back in off shielding). And for some sources, the energy spectrum of the gamma rays from the source may be complicated too.
How Monte Carlo works.
In principle, there are analytic solutions to these types of problems. You start with a precisely known initial state. You then calculate a probability distribution for what might happen next, which includes all possible physics. You then integrate over this probability distribution to get a probability distribution for the state of the system after this initial ‘step’. For each of the allowed outcomes, you calculate a probability distribution for what might happen in ‘step 2’, and integrate over these distributions to get the probability distribution after the second step. And you keep going. At the end of the day, for each possible final outcome you sum the wavefunctions, or the probabilities in the case of systems governed by classical statistics, over every intermediate history yielding that outcome, to obtain the wavefunction or probability for that outcome. In the case of a wavefunction, you then have to take the modulus squared. Of course, there is potentially a very large infinity of intermediate states that give rise to a given final state, and each intermediate state has a probability computed by calculating a multiple integral, so these analytic problems are almost always intractable.
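In symbols, for a system governed by classical statistics, this recipe amounts to a sum over intermediate histories. The following is a sketch of the structure only, with notation invented here for illustration:

P(\text{final}) \;=\; \sum_{\text{histories}} \; \prod_{i=0}^{N-1} p(s_{i+1} \mid s_i)

where a history is a chain of states s_0 \to s_1 \to \dots \to s_N connecting the known initial state s_0 to the final outcome s_N. For a quantum system the probabilities p are replaced by amplitudes, the sum over histories is performed first, and the modulus squared is taken at the end.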
The Monte Carlo approach works this way instead. Take a known initial state. Calculate the probability distribution for what might happen next. Rather than integrating over the possibilities, use a random number generator to pick one outcome, using a probability distribution that reflects knowledge of the local physics. For example, if the initial event is the emission of a gamma ray of a known energy from an isotropic source, you might use random number generators to pick two angles to specify its direction with respect to the detector. Having made this initial choice, you propagate this state a short time (or other independent variable) forward, then you again write down the probability distribution for all allowed physics. Again, you use a random number generator to select from the set of possibilities. And so on, until some definable end point is reached, like the particle entering a sensitive detector. When you repeat this process a very large number of times, the distribution of final states of particles reflects the physics used to calculate the probability distribution at each step. If this physics is a good model of the physics of the actual experiment, the distribution of simulated output should match the data set from the experiment. If there is a mismatch, as long as you didn’t make a mistake, it is a sign that the physics used to step through the Monte Carlo is not a good match to reality. This is the business end of discoveries in particle physics. Monte Carlo simulations are the tool for comparing the actual data with the expectations of theories. If the Higgs is discovered in ATLAS, I am sure that the first evidence for it will be from a disagreement between somebody’s Monte Carlo simulation and some portion of the data.
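As an illustration of one such step, here is a minimal sketch (my own, not part of the tutorial tarball) of picking the two angles for an isotropic gamma ray emission. Note that isotropy requires generating cos(theta) uniformly in [-1,1], not theta uniformly in [0,pi]; the standard library Mersenne Twister stands in for whatever generator your framework provides.

#include <cmath>
#include <random>

// Pick an isotropic random direction (unit vector). std::mt19937 is used
// purely for illustration; in practice use your framework's generator
// (TRandom1, HepJamesRandom, ...).
void isotropicDirection(std::mt19937& gen, double& x, double& y, double& z)
{
  std::uniform_real_distribution<double> flat(0.0, 1.0);
  double u   = 2.0 * flat(gen) - 1.0;      // cos(theta), uniform in [-1,1]
  double phi = 2.0 * M_PI * flat(gen);     // azimuthal angle, uniform in [0,2*pi)
  double s   = std::sqrt(1.0 - u * u);     // sin(theta)
  x = s * std::cos(phi);
  y = s * std::sin(phi);
  z = u;
}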
Random Number Generation
A key component of Monte Carlo simulations is the generation of random numbers. I refer the interested reader to, for example, Chapter 7 of Numerical Recipes for a thorough beginner's introduction to methods for generating pseudorandom numbers in computers. Pseudorandom number generators (from now on abbreviated to random number generators) use iteration algorithms to produce a series of outputs that appear random, although of course they cannot be truly random unless you have a quantum computer. There are many tests of the randomness of a generator, one being the period of repetition. The more samples you have to take from the generator before the sequence repeats itself, the better. However, this test is not the only one. A generator with a long period may still only ever produce even numbers, for example. The advice here is:
(a) Pick a sophisticated generator, preferably one with a refereed publication to back it up. Those used by the ROOT TRandom1 class and by the CLHEP HepJamesRandom class are based on algorithms written up in the paper by F. James cited above.
(b) Avoid random number generators that come as standard with the compiler. For example, in C there is an ANSI-mandated rand() function that you should never use.
Random Number Seeds
Because the generators use iteration, the initial value of the input, the seed, determines exactly the evolution of the sequence. You can therefore, if you wish, set this seed yourself. If you do not, it is frequently set to a default, and hence your random number generator will always have the same output. You may not want this. Or, if you are building and testing a Monte Carlo code, you may wish your code to produce the same pseudorandom output twice for the same input parameters, so you can test modifications to the code. Either way, you may well want to set the seed. Typically the seed is a non-negative four-byte integer, but this is not always the case. The method of setting the seed is generator dependent.
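For example, with the CLHEP engine discussed later in this tutorial the seed is passed to the constructor. A minimal sketch (assuming a reasonably modern CLHEP, where the classes live in the CLHEP namespace):

#include "CLHEP/Random/JamesRandom.h"

int main()
{
  CLHEP::HepJamesRandom engineA(12345);  // explicit seed
  CLHEP::HepJamesRandom engineB(12345);  // same seed, so identical sequence
  double a = engineA.flat();             // uniform deviate in (0,1)
  double b = engineB.flat();             // equal to a, by construction
  return (a == b) ? 0 : 1;               // exits 0: the sequences reproduce
}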
What is produced.
The base unit of output of random number generators is uniform deviates. This is a fancy term for numbers between zero and 1 with a probability density function of 1 in this interval.
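Uniform deviates are the base unit because everything else can be built from them. As a simple illustration (my own sketch, not from the tarball), the inverse-transform method converts a uniform deviate u into an exponentially distributed number, for example a radioactive decay time with mean life tau: solving F(t) = 1 - exp(-t/tau) = u for t gives

#include <cmath>

// Turn a uniform deviate u in (0,1) into an exponential deviate of mean tau.
double exponentialDeviate(double u, double tau)
{
  return -tau * std::log(1.0 - u);
}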
Random number generators I have known
- Algorithms from numerical recipes
The first generator I used in anger was ran1.c, a code from Numerical Recipes in C, chapter 7, which produces uniform deviates. Its cousin gasdev.c produces Gaussian-distributed random numbers from a distribution having mean zero and standard deviation 1, and uses ran1.c plus some tricks. I will not discuss these generators today. They do have the advantage of being entirely stand-alone code, so you do not have to install a large and complex set of libraries like ROOT or CLHEP to use them.
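For reference, the main 'trick' behind gasdev.c is the Box-Muller transform, which turns two independent uniform deviates into two independent Gaussian deviates. A minimal sketch of the basic trigonometric form follows; the Numerical Recipes routine uses the rejection-based polar variant, which avoids the trig calls.

#include <cmath>

// Box-Muller: u1, u2 are independent uniform deviates in (0,1); g1, g2 come
// out as independent Gaussian deviates with mean zero, standard deviation 1.
void boxMuller(double u1, double u2, double& g1, double& g2)
{
  double r = std::sqrt(-2.0 * std::log(u1));
  g1 = r * std::cos(2.0 * M_PI * u2);
  g2 = r * std::sin(2.0 * M_PI * u2);
}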
- ROOT and the TRandom1 class
The first generator I discuss in detail is the TRandom1 class that comes as part of ROOT. The code I used in class to demonstrate this is included as part of the tarball attached at the top of this web page. There are five functions. Examine the source for each one and figure out how it works.
int styles() sets ROOT to make plots on a white background and removes shadows.
int uniformDeviates(UInt_t seed) prints 10 uniform deviates to the terminal using the prescribed random number seed.
int gaussRanom(UInt_t seed) generates 100,000 random numbers from a distribution having mean zero and standard deviation 1, uses them to fill a histogram having 100 bins between -3 and 3, and draws this histogram.
int distFromTheory(UInt_t seed, TH1F** pph) generates 100,000 random numbers from a distribution having a probability density function input by the user; the particular function used in this example is defined in the accompanying source code. The random numbers are plotted on a histogram. Also, the address of the histogram, allocated using new, is written to the address pointed to by the second argument of the function.
To call this function you need to declare a pointer to TH1F:
TH1F* myhistpointer;
Then invoke the function with
distFromTheory(234, &myhistpointer);
This way the address of myhistpointer is passed to the function, and to this address the function writes the address of the histogram. The use of pointers to pointers is necessary so that the function can somehow ‘return’ a pointer to a histogram; a minimal sketch of this pattern is given below. An alternative is to call the function with an argument of type pointer to void, for example, but this technique fell out of style when people started to turn away from C.
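Here is a minimal sketch of the pattern (my own illustration of the idea, not the tutorial's actual source): the function allocates the histogram with new and writes its address through the TH1F** argument, so that after the call the caller's pointer points at the new histogram.

#include "TH1F.h"
#include "TRandom1.h"

int makeGaussHist(UInt_t seed, TH1F** pph)
{
  TRandom1 gen(seed);
  TH1F* h = new TH1F("hg", "Gaussian deviates", 100, -3.0, 3.0);
  for (int i = 0; i < 100000; ++i) h->Fill(gen.Gaus(0.0, 1.0));
  *pph = h;    // write the histogram's address to the caller's pointer
  return 0;
}

// Caller:
//   TH1F* myhistpointer;
//   makeGaussHist(234, &myhistpointer);
//   myhistpointer->Draw();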
int distFromData(UInt_t seed, TH1F* myhistpointer)
This function generates data having the probability density function given by the histogram you pass it. In particular, the histogram pointer returned by the previous function may be used as an input. The result is a new histogram that resembles the output of distFromTheory, although all it actually shares with that output is the underlying probability density function.
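As a hedged sketch of one way to do this (ROOT's TH1 provides a built-in GetRandom() that samples from a histogram's bin contents; the tutorial's distFromData may well do it differently):

#include "TH1F.h"

// Fill a fresh histogram with random numbers drawn from the probability
// density function defined by an existing histogram's contents.
TH1F* resampleHist(TH1F* source, int n = 100000)
{
  TH1F* copy = (TH1F*)source->Clone("hresampled");
  copy->Reset();                        // keep the binning, drop the contents
  for (int i = 0; i < n; ++i)
    copy->Fill(source->GetRandom());    // draw from the source's PDF
  return copy;
}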
CLHEP library random number generators.
The functionality available in ROOT is powerful, but it is not available to you when using special-purpose, industry-standard Monte Carlo tools such as Geant4. In the case of Geant4, which many of you will use, you will need some knowledge of random number generation using CLHEP library classes such as HepJamesRandom. I have included in the above tarball a simple example that can be compiled and linked with the CLHEP library, which is installed on the HEP cluster. You will need to modify the variable CLHEPHOME in the makefile of the same directory to point to the directory containing lib/libCLHEP.a and include/CLHEP/Random/Randomize.h. Make should produce something like:
g++ -c -I/Users/edaw/cern/clhep/include clheprandom.cc
g++ -o spit clheprandom.o -L/Users/edaw/cern/clhep/lib -lstdc++ -lCLHEP
which should generate no errors. To run it, type ./spit and you should see ten random numbers produced, uniformly distributed between -3 and +8. The code is very simple and heavily commented; hopefully the comments will suffice to explain how the routine works. The capabilities of the CLHEP random number generators are far more extensive than this, of course, but at least now you should be able to get random number generation to work in Geant4, and this should help in getting to know this important package.
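For reference, the essence of such a program is only a few lines. This is a hedged sketch of what clheprandom.cc presumably does (the real, commented source is in the tarball):

#include <iostream>
#include "CLHEP/Random/Randomize.h"

int main()
{
  CLHEP::HepJamesRandom engine(700);    // seed chosen arbitrarily for this sketch
  for (int i = 0; i < 10; ++i)          // ten uniform deviates between -3 and +8
    std::cout << CLHEP::RandFlat::shoot(&engine, -3.0, 8.0) << std::endl;
  return 0;
}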
Tarball of source code for this tutorial here