In this section we're going to talk about how we go about finding what we call the signal.  Now this so-called signal could correspond to the production of a Higgs boson, or maybe of a pair of W bosons, or maybe it just corresponds to events with two high-energy jets in them (initiated by gluons or heavy quarks). 

In short:

The signal corresponds to whatever particular process it is you’re interested in studying.

A background process, in contrast, is one which might look a little bit or (if you're unlucky) a lot like the signal process you care about, but isn't.  You'd like to cut away as much of the background as possible but not, if you can help it, by inadvertently removing signal events in the process.

What's the big deal?  "Signal and background, fine, I get it", you could say.

Well here's the big deal: as you might have guessed, it boils down to one of these scenarios:

Actually (as usual) that analogy isn't the best.  Usually you're looking to identify actual signal events and separate them from background events, so at the end of the day you might have a number of signal events (sometimes it's tiny, sometimes not), and you might have a much larger number of background events (again, sometimes, and sometimes not – it depends on the process) which, furthermore, could be decomposed into various types of background events. 

All of that might sound vaguely familiar when you think of it in the context of those stacked histograms in the last section.

At any rate, it might make more sense to draw it like this:

If we take the example of searching for Higgs bosons in our vast set of proton-proton collision data, we might come up with a set of event selection cuts – to be described in a second – which we apply and which help to identify the signal events we care about.  At the end of the day we'll be left with only a handful of true signal events.  Unfortunately the backgrounds are large and even though we've tried to target them, we're still left with a healthy number of them.  Of course all we see are Higgs-like candidate events, so it's all blended together.  How much is signal and how much is background we can only really know from our simulations, but these give us predictions which we can test against the data.

So that's it: needles and haystacks.  Keep that in mind as we move forward.


Performing an Event Selection Cut

In Section 8 we introduced histograms, which are often used by physicists – and scientists in general – to present their data (real or simulated) in a visually appealing way.  Crucially, a histogram (i.e. a binning of the data) is often needed because one never has the luxury of infinite statistics.

Let's focus on our simulated data for now.  This means we know a priori which events are signal and which are background.  We can even give them colour: signal in red, and backgrounds in blue (we'll assume there's just one type of background, or lump them all together).

If we run them through the machinery of our analysis framework, we might get to a certain junction point where we'd like to make a cut on a given quantity.  Let's assume here that the quantity is the mass of some particle we've reconstructed.  Before deciding on the cut value, we fill a histogram to help us see what things look like.

You might get something like this:

Great.  Let's (quite obviously) keep the events above the cut value line and reject or cut the events below the line.  Easy enough, right?
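To make this step concrete, here's a minimal sketch of filling those histograms and applying a 'keep everything above the line' cut.  All the shapes and numbers here are invented purely for illustration (a Gaussian mass peak for signal, a falling exponential for background) – they're not meant to represent any real analysis:

```python
import random

random.seed(42)

# Toy simulated samples (shapes invented purely for illustration).
# Signal: a reconstructed-mass peak; background: a falling exponential.
signal = [random.gauss(125.0, 2.0) for _ in range(1000)]                 # GeV
background = [50.0 + random.expovariate(1 / 40.0) for _ in range(5000)]  # GeV

def fill_histogram(values, lo=50.0, hi=200.0, nbins=30):
    """Bin a list of values into nbins equal-width bins between lo and hi."""
    counts = [0] * nbins
    width = (hi - lo) / nbins
    for v in values:
        if lo <= v < hi:
            counts[int((v - lo) / width)] += 1
    return counts

sig_hist = fill_histogram(signal)
bkg_hist = fill_histogram(background)

# The cut: keep everything above the line, reject everything below it.
cut_value = 100.0
sig_kept = sum(1 for m in signal if m > cut_value)
bkg_kept = sum(1 for m in background if m > cut_value)
print(f"signal kept: {sig_kept}/1000, background kept: {bkg_kept}/5000")
```

With these made-up shapes the cut keeps essentially all of the signal while throwing away the bulk of the background – the easy scenario described above.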

Alright.  What if the situation is a bit less obvious:


So still maybe somewhat obvious, you might think.  After all, things look pretty symmetric if you still cut down the middle, so maybe that's still best, right?

At least one thing is certain: if you do make a cut right at the middle you will remove more background events compared with signal events, so that's great!  But can you afford to do that?  How precious are the signal events you're looking for?  Maybe you should make the cut a little bit more to the left to keep as much signal as you can.  But shoot, that lets in a whole bunch of the blue background which we don't want.

Finding the optimal cut value is not always as obvious as it seems, and it actually depends on what you're trying to measure!  If you only have a handful of signal events, maybe you can't afford to throw out even one or two!  If, in contrast, you're swimming in data, then you can try to keep the purity as high as possible even if it means losing some signal.  You can think of purity as the fraction of events you've labelled as signal which really are signal.

Luckily there’s a lot of statistical theory that can help make the choice so you don't have to 'eyeball it' so to speak.  It involves something called optimization.  Optimization boils down to trying to maximize or minimize some number – not the number of signal or background events separately, but some simple quantity involving both.  Here for our purposes let's not worry about what that number is, but just know that it can be done and then you can use that to defend your choice.  "Because the statisticians said so" works (and you can check their math if you're curious).
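As a rough sketch of what such an optimization could look like in practice: scan candidate cut values and keep the one that maximizes a figure of merit.  Here I use S/√(S+B), one common choice of 'that number' – though not necessarily the one a given analysis would use – and toy distributions I've invented for illustration:

```python
import math
import random

random.seed(0)

# Toy signal and background distributions (invented for illustration).
signal = [random.gauss(110.0, 10.0) for _ in range(2000)]
background = [random.gauss(90.0, 15.0) for _ in range(20000)]

def significance(cut):
    """One common figure of merit, S / sqrt(S + B), for 'keep events above cut'."""
    s = sum(1 for v in signal if v > cut)
    b = sum(1 for v in background if v > cut)
    return s / math.sqrt(s + b) if s + b > 0 else 0.0

# Scan candidate cut values and keep the one that maximizes the figure of merit.
cuts = [60.0 + 0.5 * i for i in range(161)]
best_cut = max(cuts, key=significance)
print(f"optimal cut ≈ {best_cut:.1f}, significance ≈ {significance(best_cut):.2f}")
```

The optimum lands somewhere between the two peaks – not at the naive midpoint – precisely because the two samples have different sizes and widths.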

I want to highlight that I've drawn these two distributions in two different colours, but again, don't be fooled: when you make a real measurement, all you get are numbers, so no one is going to split things into two groups or paint some of them one colour and some of them another colour for you.  You have to rely (at least partly) on simulations to motivate your cuts.  And with simulations you're allowed to use various colours or split up the histograms into two as you please since this can help you motivate your choices.

Here's another even more complicated scenario:

Now it's really getting complicated, isn't it?

We're still ok though: optimization comes to the rescue again to tell us the best place to put our cut line.  We can make a calculation and the output will tell us at what value of the mass to make the cut.

I want to emphasize again that doing this sort of thing 'by eye' can be a tricky business. 

Here's what I mean.

Say the shapes we drew were just that: shapes.  We scaled them up or down such that the area underneath them was equal – makes it easier to put them on the same plot, right?  But what if in actual fact the relative amount of signal is tiny?

Kind of like this:

Signal and background distributions shown separately with their respective normalisations.

In other words, there's roughly twenty times as much background as signal (numbers I made up, of course, just as an illustrative example).  So now we can draw those distributions again, but this time with the right scale.

Signal and background distributions shown overlaid and with their true normalisation scales.

What happened to our picture?

Now it’s harder to see the signal, but you can see that it is of course still there (and, crucially, it has the same shape as it had before, though it’s hard to tell when it’s so squished, right?).

In this case, while it might have seemed like a good idea to make the cut where we did, it maybe didn’t really help us too much overall.  Our signal is still definitely buried under a huge background!

Ok, new strategy.  Another thing we could do is look at the same events in a different way.  By this I mean we could see how the same events are distributed for some other variable.  Again, same events, but we use them to fill a histogram of a different quantity – maybe that's a more discriminating variable if we're lucky.  In other words, maybe one that will make it easier to discriminate between the signal and background.  We'll give this new variable the boring name "alternate variable" just to keep it generic. 

And let's then pretend when the dust settles it comes out like this:

Signal and background distributions overlaid and scaled to their correct normalisations but for an alternate variable which can be seen to be more discriminating.

The areas underneath the curves are identical to the distributions above, but look how much more of the signal is in the 'shallow' part of the background.  Actually, it's almost even protruding from beneath the background.  That should definitely be easier for us to see! 

Remember: we're going to stack those histograms on top of one another when we compare it to the real data, right?  So that means we just need the signal to make a strong enough contribution to push the whole thing up – make a bump, so to speak – such that we can make a claim that the signal contribution has to be there.  Otherwise if our background is large enough, we wouldn't know one way or another – no one would miss the signal if it weren't there.

If we find the perfect variable (perfect here meaning the most discriminating), we can stop there.  And if we have enough statistics to play with we can even toss out a ton of signal events to get the purity we want.  If not, we might have to be even more clever.


Exploiting Correlations

One can go even further and build some new variables out of several others (maybe even thousands) and then hope that this new composite variable gives an even better separation!  This is often trickier than it seems, but it's done all the time.  It ties into the concept of big data which is a bit of a buzzword these days.  The more variables you have, the more information you can squeeze out to tell you the full story.  Sort of.

Why did I say that it's so tricky?

Let's say we have two separate distributions (filled with the exact same events, but for two different histograms with variables X and Y, kind of like the ones above for the mass).  Cutting on X might only help us a bit, and same deal for Y.

So the question is: can we use some fancy mix of X & Y together?  Just some new variable Z where Z = X + Y?  Or something else?

The simple answer: sure. 

But ok, a follow-up question: crucially, is it useful to do so?  Are we really gaining much by doing it?

The answer to that one: it really depends on how much they’re correlated!

In other words, the same events that have a high value of X may also have a high value of Y for the most part. 

When X is high, Y is also high (on average). 

And when X is low, Y is low (again, on average).

This illustrates the idea of correlation.

Correlation is a beautiful thing.  In physics it can either help us or hinder us. 

It can potentially allow us to make a measurement of one quantity we wouldn't normally have direct access to.  But it can also make us do a lot of work and gain effectively nothing (less than nothing since we've wasted a lot of time!).

For better or worse, in nature things are correlated to some degree all the time.  Several aspects of the stock market are also correlated (to the annoyance of investors). Sometimes they’re strongly correlated; sometimes they're weakly or negligibly correlated.  Sometimes they're anti-correlated (when one goes up, the other goes down and vice versa). 

The variables we want to use in doing some sort of analysis don't strictly have to be correlated at all of course, but the greater the degree to which they are correlated (or anti-correlated), the less useful it is to combine them.  Why?  Because the more highly correlated (or anti-correlated) two variables are, the more likely it is that they're just telling you the same information.

Said another way: using some fancy mix of X and Y might help you, but it might not.  It could be just as good to use only one or the other.  Using a combination of two (or more!) variables that are individually highly discriminating and not strongly correlated with each other (we can also say uncorrelated) is in general a good idea – the fact that they're not strongly correlated means they're each giving you a different type of useful information.  A clever combination of the two will then give you better separating power!
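Here's a toy numerical illustration of that point (all numbers invented): combining X with a near-copy of itself gains you essentially nothing, while combining it with an independent, similarly discriminating variable genuinely helps:

```python
import math
import random

random.seed(1)
N = 10000

# Variable X gives some signal/background separation on its own.
x_sig = [random.gauss(1.0, 1.0) for _ in range(N)]
x_bkg = [random.gauss(0.0, 1.0) for _ in range(N)]

# Y_corr is essentially a noisy copy of X: highly correlated with it.
y_corr_sig = [x + random.gauss(0.0, 0.3) for x in x_sig]
y_corr_bkg = [x + random.gauss(0.0, 0.3) for x in x_bkg]

# Y_indep is independent of X but has similar separation on its own.
y_ind_sig = [random.gauss(1.0, 1.0) for _ in range(N)]
y_ind_bkg = [random.gauss(0.0, 1.0) for _ in range(N)]

def separation(sig, bkg):
    """Distance between the signal and background means, in units of spread."""
    mu_s, mu_b = sum(sig) / len(sig), sum(bkg) / len(bkg)
    var_s = sum((v - mu_s) ** 2 for v in sig) / len(sig)
    var_b = sum((v - mu_b) ** 2 for v in bkg) / len(bkg)
    return abs(mu_s - mu_b) / math.sqrt(var_s + var_b)

def combo(xs, ys):
    """The simplest possible combined variable: Z = X + Y."""
    return [x + y for x, y in zip(xs, ys)]

sep_x = separation(x_sig, x_bkg)
sep_corr = separation(combo(x_sig, y_corr_sig), combo(x_bkg, y_corr_bkg))
sep_ind = separation(combo(x_sig, y_ind_sig), combo(x_bkg, y_ind_bkg))

print(f"X alone:           {sep_x:.2f}")
print(f"X + correlated Y:  {sep_corr:.2f}")   # essentially no gain
print(f"X + independent Y: {sep_ind:.2f}")    # a clear gain
```

The correlated copy adds no new information, so the combined variable separates no better than X alone; the independent variable improves the separation by roughly a factor of √2.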

Have I confused you?

Here are some classic pictures illustrating the degree of correlation between two numbers, for non-graphophobes.  I made that word up and spell-check tells me it doesn't exist.  The point: we're drawing two-dimensional (or 2D) histograms here.  The basic concept of a 2D histogram is no different from that of the 1D histograms we introduced before.  But with 2D histograms you need to specify both the x and y position of the bin you want to fill.

Examples of two-dimensional distributions exhibiting varying levels of correlation.

So each of the two-dimensional histograms you see above was filled using 5000 randomly drawn pairs of numbers based on a Gaussian or normal probability distribution – a 'bell curve'.  The first random number you draw tells you the position on the x-axis (horizontal axis) for the bin you're going to fill, but to know exactly which bin to fill you need the value on the y-axis too.  For that you draw another number associated with the first one.  The level to which the pairs of numbers for the x-axis and y-axis are correlated is incorporated directly into the code that produces the plots.
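Here's a sketch of how plots like that could be produced.  The mixing trick below is a standard way to generate pairs of Gaussian numbers with a chosen correlation – not necessarily the exact code behind the plots above:

```python
import math
import random

random.seed(7)

def correlated_pairs(rho, n=5000):
    """Draw n (x, y) pairs of standard Gaussians with correlation rho."""
    pairs = []
    for _ in range(n):
        x = random.gauss(0.0, 1.0)
        # Mix x with independent noise so that corr(x, y) = rho.
        y = rho * x + math.sqrt(1.0 - rho**2) * random.gauss(0.0, 1.0)
        pairs.append((x, y))
    return pairs

def fill_hist2d(pairs, nbins=30, lo=-4.0, hi=4.0):
    """A 2D histogram: increment the bin at the (x, y) position of each pair."""
    counts = [[0] * nbins for _ in range(nbins)]
    width = (hi - lo) / nbins
    for x, y in pairs:
        if lo <= x < hi and lo <= y < hi:
            counts[int((x - lo) / width)][int((y - lo) / width)] += 1
    return counts

for rho in (0.0, 0.5, 0.95):
    hist = fill_hist2d(correlated_pairs(rho))
    print(f"rho = {rho}: {sum(map(sum, hist))} of 5000 pairs landed in range")
```

Plotting each `hist` as a colour map would reproduce the progression described below: a round blob for rho = 0, tilting into a narrow diagonal band as rho approaches 1.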

It's the patterns we want to focus on.

In the left-most plot the two numbers are not correlated.  In fact they are independent (a stronger word).  As you move from left to right, the amount of correlation increases, until in the rightmost plot the two numbers essentially represent the same quantity!  If we wanted to draw the anti-correlated equivalents they'd just be flipped, so to speak.

Note: the patterns don't have to look like the ones above – those are just the classic examples!  In general though, look for the 'one goes up, the other goes up' type of pattern.  The more you see that, the greater the chance you've got a lot of correlation!

Often physicists (or those working in finance, statistics, or biology for example) use something called multivariate techniques to separate tiny signals from otherwise overwhelmingly large backgrounds – neural networks and boosted decision trees are two examples of such techniques.  Crudely speaking this boils down to throwing in distributions for tens, or even hundreds or thousands of different variables, all with varying levels of correlation.  You use these to 'train' the program, which scrutinizes the different inputs and correlations, and ultimately builds some single, super-powerful variable that's better than any individual variable you could put together yourself, and then you make a cut on that one.


The 'Non-Background' Background

There's another less obvious type of background we've not specifically mentioned so far, and that's the idea of combinatorics in true signal events.  So if they're truly signal events, why am I calling them background?

Here's the gist of it:

Consider the decay of two Z bosons (we don't care about where they came from for now).  Let's say each of them decays to a muon-antimuon pair.  Now assume you've identified some event with four muon objects as one of your candidate signal events ('candidate' since you're not entirely sure, but it sure looks right).  For certain candidate events, it might be pretty straightforward to guess which muons should go together, so you might for example have something like this:

Two Z boson candidates each reconstructed from a muon-antimuon pair.


Pretty easy, right?  But ok, the fact that the corresponding muons' (or anti-muons') projected trajectories were close to one another in space made it easier to fit the pieces together by eye.  Notice also that the particles have different charge, so when exposed to strong magnetic fields (like the ones we employ in ATLAS) muons will bend one way and anti-muons will bend the other way.  That will just help confirm our suspicions even more.

So great, we can pat ourselves on the back with that one.

But what if, say, each of the Z bosons were to decay to a quark-antiquark pair (e.g. an up and an anti-up quark just for the sake of argument, but it doesn't really matter).  Remember that we don't reconstruct those quarks directly; we reconstruct what are called hadronic jets and those we use to probe the original quarks of interest.  But the charge information mostly doesn't help us anymore.  And if we're unlucky the four jets in the final state will be distributed such that it's not at all clear who belongs with whom.

Here's what I mean:

Two Z bosons each decaying to a quark-antiquark final state, which leads to four hadronic jets.  It's not immediately clear which ones belong together.

Tricky, right?

Ok, if we guess, then we'll probably get it right sometimes, right?  But sometimes we'll put the wrong pieces together.  You can think of the jets like actual puzzle pieces – sometimes they're meant to go together, and sometimes they just don't fit!

Correctly and incorrectly reconstructed Z boson candidates, demonstrating the concept of a combinatorial background.

Now if we do that many times over and over, and split up the cases where we 'get it right' and where we don't, the resulting histograms take on some interesting shapes!

Combinatorial and correctly reconstructed signal distributions for Z boson decays.

The name of the game here is of course to come up with some sort of algorithm that decides how to put the pieces together such that that algorithm has a very high probability of a correct match.  A certain fraction of the time (hopefully a large fraction!) that algorithm will get it right, but inevitably in the remaining cases it will get things wrong and you'll get garbage – pairs of puzzle pieces that just weren't meant to fit together.  That garbage though (when binned in a histogram) will take on a certain shape.
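As a toy sketch of such an algorithm – with invented jet four-vectors and a deliberately simple 'closest to the Z mass' criterion, which is only one of many possible choices – one could try all three possible pairings and keep the best:

```python
import math

M_Z = 91.2  # Z boson mass in GeV (rounded)

def pair_mass(j1, j2):
    """Invariant mass of two jets, each given as an (E, px, py, pz) four-vector."""
    e, px, py, pz = (a + b for a, b in zip(j1, j2))
    m2 = e**2 - px**2 - py**2 - pz**2
    return math.sqrt(max(m2, 0.0))

def best_pairing(jets):
    """Try all three ways of splitting four jets into two pairs, and keep the
    pairing whose two masses are collectively closest to the Z mass."""
    candidates = []
    for partner in (1, 2, 3):
        rest = [k for k in (1, 2, 3) if k != partner]
        pairs = ((0, partner), (rest[0], rest[1]))
        score = sum(abs(pair_mass(jets[a], jets[b]) - M_Z) for a, b in pairs)
        candidates.append((score, pairs))
    return min(candidates)[1]

# Invented, idealized massless jets: two back-to-back pairs, each with
# a pair mass of exactly 91.2 GeV.
jets = [
    (45.6,  45.6,   0.0, 0.0),
    (45.6, -45.6,   0.0, 0.0),
    (45.6,   0.0,  45.6, 0.0),
    (45.6,   0.0, -45.6, 0.0),
]
print(best_pairing(jets))  # → ((0, 1), (2, 3))
```

In this idealized event the algorithm gets it right; with messy, smeared real jets it will inevitably pick the wrong pairing some fraction of the time, and those wrong pairings are exactly the combinatorial background.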

And when it comes to real backgrounds, say where there were no Z bosons in the first place from our example, then you're certain to get garbage anyway (there simply is no correct way to put the pieces together!).  And that garbage can, and likely will, have an entirely different shape.  What a mess!

On top of the issue of combinatorics, the signal itself becomes distorted by the inherent resolution of our detector or measuring device.

Combinatorics and Resolution in Top Quark Reconstruction

Let's look at one more example to sum it all up.  Specifically, let's take the case of a top quark decaying in your detector into various pieces.  Here's an idea of what you can expect in terms of reconstructing those top quarks and looking for a mass 'peak' (which you can expect to get if you do things right).  Here we consider only signal, while recognizing that part of that signal will be the combinatorial background we just talked about.

From left-to-right it goes from (overly) idealized to what you might actually expect to see from a real detector:

Different effects contributing to the resolution of an invariant mass peak.

Notice that in the above case we don't split up the 'correct' signal from the 'combinatorial' signal cases.  In real life we can't do that anyway.  We can cheat to see how well we're really doing by looking at simulations, but the true signal that's in our data (even in the case that we have no background) will be a messy blend of correctly reconstructed top quarks and some combinatorial garbage. 
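Here's a quick toy sketch of that smearing-plus-combinatorics picture (every number below is invented for illustration): start each correctly reconstructed event at the true mass, smear it by a made-up detector resolution, and blend in a broad combinatorial component:

```python
import math
import random

random.seed(3)
N = 20000

m_top = 172.5       # GeV, true top-quark mass used in the toy
resolution = 15.0   # GeV of detector smearing, invented for illustration

# Idealized case: every correctly reconstructed event would sit at exactly
# m_top.  Detector smearing makes each measured mass fluctuate around it.
measured = [random.gauss(m_top, resolution) for _ in range(N)]

# A broad combinatorial component from wrong pairings (shape invented too).
combinatorial = [random.uniform(100.0, 300.0) for _ in range(N // 2)]

# What we'd actually see in data: both components blended together.
blended = measured + combinatorial

mean = sum(measured) / N
spread = math.sqrt(sum((m - mean) ** 2 for m in measured) / N)
print(f"idealized peak width: 0.0 GeV; after smearing: ≈ {spread:.1f} GeV")
```

Histogramming `blended` would give the right-most, realistic picture: a smeared peak sitting on top of combinatorial garbage, with no way to tell event-by-event which is which.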

That's part of life.

Higgs Hunting!

For the final part of this section, let's talk about another signal you're probably more familiar with from the news, namely the production of a Standard Model Higgs boson.  Now, this is a pretty rare type of event – we knew this based on Standard Model predictions, even though we didn't know what the mass of the Higgs boson was.  We also initially didn't even know if the Higgs boson existed – now we do! – so we could have been looking for needles in a haystack where there were simply no needles to be found.

At any rate, the LHC runs with the predictability of a clock and produces tons of uninteresting events, peppered with the odd one which is really interesting from time to time.  Every once in a while when we're lucky a Higgs boson will be produced.  

Actually, you might wonder, how often is that?  

Consider the entire 2012 data-taking period, during which roughly 10^15 (a one with 15 zeroes!) potentially 'interesting' proton-proton collisions took place.  Of these, we can expect that in roughly 400,000 a Higgs boson was produced...  If you were to just pick an event truly at random, you'd be roughly 200 times less likely to pick a Higgs event than you would be to pick the winning lottery numbers in Lotto 6/49, where your chance of winning is roughly one in thirteen million.
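Here's the back-of-the-envelope arithmetic behind that comparison (all inputs are the rounded numbers quoted above, so the answer is only good to an order of magnitude):

```python
collisions = 1e15      # 'interesting' proton-proton collisions in 2012 (rounded)
higgs_events = 4e5     # of which roughly this many contained a Higgs boson
p_higgs = higgs_events / collisions   # chance a randomly picked event is a Higgs one

p_lottery = 1 / 13e6   # rough Lotto 6/49 odds quoted above

ratio = p_lottery / p_higgs
print(f"picking a Higgs event at random is ~{ratio:.0f}x less likely than winning")
```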

So the answer: it's rare.

You can think of the ATLAS data stream as spewing out events on a conveyor belt, kind of like this:

A cartoon illustrating the probabilistic nature of rare particle production at ATLAS.

The above picture is meant to illustrate the lottery-type nature of …nature.  It’s meant to depict, for a given process (here Higgs Boson production), that on average you have to wait a certain amount of time before you get what you want (somehow macheteing or weeding through the other stuff you don’t care about as best you can). 

However, I want to highlight that neither in the case of the Higgs boson event itself, nor in any of the events in between, did the final-state particles originate from a particular diagram, even if I drew them that way above.  It's fundamentally a quantum mechanical process.  All you can know is average production frequencies, expected waiting times, and so on.

In other words, it's not really meaningful to ask the question: how many seconds will pass between this particular Higgs being produced and the next one?

But you can ask (and get a legitimate answer to) the following questions:

(i) What is the probability that the LHC will produce (and we will reconstruct and measure) 50 Higgs bosons in ATLAS in X amount of time?

(ii) How many Higgs bosons, according to theory, do we expect to be produced, reconstructed and identified over the full 2015 data-taking period of ATLAS?

 [Both answers will of course come with a certain uncertainty!]
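Questions like these are the natural territory of Poisson statistics, the standard distribution for counting rare, independent events.  Here's a sketch using a hypothetical expected count of 45 – a number I've invented purely for illustration:

```python
import math

def poisson_prob(k, mean):
    """Probability of observing exactly k events when `mean` are expected on average."""
    return mean**k * math.exp(-mean) / math.factorial(k)

# Hypothetical expectation, invented for illustration: suppose theory predicts
# an average of 45 reconstructed Higgs bosons in some period of data-taking.
expected = 45.0

# Question (i)-style answer: the chance of observing exactly 50 in that period.
p50 = poisson_prob(50, expected)
print(f"P(exactly 50 observed | 45 expected) ≈ {p50:.3f}")

# Question (ii)-style answer: the expectation is the prediction itself, with a
# natural statistical spread of about sqrt(45) ≈ 6.7 events.
```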

What we’re glossing over is the fact that in the overwhelming majority of events nothing particularly interesting is happening.  We do our best not to ‘waste our film’, so to speak, on such events – using that expression at the risk of losing a fraction of the audience.

And I should emphasize that we never really know for sure, of course, if a particular event truly corresponds to the production of a Higgs boson!  All we can do is say that a given event certainly has all the telltale signatures that a Higgs boson was produced.  We call it a candidate signal event.  And if we have enough candidate signal events compared to the number we would expect if the Higgs didn’t exist, we can even claim a discovery (as we did in 2012)!  It's more statistically involved than that of course, but that's the essence of it.

Tying all of this back to the signal versus background ideas we introduced earlier in this section, we have to come up with some clever event selection cuts to identify the events of interest.  Here we split up into different teams, since the Higgs can decay in several ways (as predicted by theory and now verified by experiment).  Each of those so-called decay channels has different telltale signatures and features its own set of backgrounds.  There are pros and cons for each.  So each analysis team picks a decay channel and gets to work analyzing the data.

 A few of the more common Standard Model Higgs Boson decay channels together with some of their pros and cons.

If we focus on the two-photon decay channel, the background is huge, meaning there are many other processes – not involving Higgs bosons – which look almost indistinguishable.  If we have enough statistics though, and if we try to pick really discriminating variables to squeeze out our signal, we might just be able to see an excess over the background.  That's exactly what happened in 2012.  By combining the results of three main channels, we were able to show conclusively that a Higgs boson-like particle existed with a mass roughly 125 times the mass of an individual proton.

A new particle!

Searching for Higgs Boson decays in di-photon events.

If the resolution of our detector had been much worse, we simply wouldn't have seen anything – the bump would have been washed away and spread over an overwhelmingly large background.

In the next and final section, we'll look at some other types of things we might hope to see in the future.