Informatics Musings: March 2013

Tuesday, 26 March 2013

Exponential Distributions, Part 4 - Manage to the Mean, Not the Mode

In Part 3 we looked at the forgetfulness of exponential queues, namely the memoryless property. In this last article, we look at how to manage queues and systems that behave according to an exponential distribution.

Exponential Attribute #4 - The Mean is More Important Than the Mode

Or, Your Gut Feeling Will Fail You

In a normal distribution (bell curve), the peak value coincides with the mean value. The most common occurrence (the mode) also happens to be the average.

Not so with the exponential distribution. The asymmetrical tail skews the average away from the peak. The mean will always be larger than the mode.
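To put numbers on that gap, here is a minimal sketch using the standard formulas for the Erlang family discussed in this series: the mean is k/lambda and the mode is (k-1)/lambda. The function names and the parameter values in the loop are my own choices for illustration only.

```python
def erlang_mean(k: int, lam: float) -> float:
    """Mean of an Erlang(k, lam) distribution: k / lam."""
    return k / lam


def erlang_mode(k: int, lam: float) -> float:
    """Mode (peak) of an Erlang(k, lam) distribution: (k - 1) / lam."""
    return (k - 1) / lam


# Illustrative shapes only; lam = 1.0 is an assumed rate.
for k in (1, 2, 3, 5):
    lam = 1.0
    mean = erlang_mean(k, lam)
    mode = erlang_mode(k, lam)
    print(f"k={k}: mode={mode:.2f}  mean={mean:.2f}  gap={mean - mode:.2f}")
```

Whatever the shape, the gap between mean and mode works out to 1/lambda, so the mean always sits above the peak.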

So what? It turns out this subtle mathematical difference can play tricks on your mind if you're not careful. If you're a manager, that translates into consistently being over-budget and consistently failing to meet your operational performance targets.

Managing to the Mode or to the Mean?

As a manager, you gradually get a feel for how your department operates. You have a sense of the typical system behaviour, whether it's how long a call on your help desk lasts or how much it will cost to reprogram 100 lines of code. There is randomness involved, but your experience tells you the randomness averages out over time. If your system behaves according to a normal distribution, then it's hard to know if you're managing to the mode or to the mean. In fact, it doesn't matter because the two values are the same.

However, if your system behaves according to an exponential distribution, it suddenly matters which value you are managing to. When you want to manage to achieve the budget targets, you must use the mean. The mode is irrelevant.

I have found that people tend to manage to the most common events. It is easy to get a "gut feel" for the things that happen the most frequently, but it is not as easy to get a "gut feel" for the average. That means your "gut" manages to the mode.

Therefore if you manage according to your gut, an exponential system will cause you to fail every time. If you budget to the mode, you will always end the year over budget. If your target time for callers kept on hold is set by the mode, you will never achieve your target. You must budget and manage to the mean.

Beware of Management CFIT

In aviation, investigators created a term called CFIT, which means Controlled Flight Into Terrain. It can be the cause of a crash when the aircraft is working fine, including the instruments, but the pilot ignores the signs and flies by "gut feel" instead. When visibility is poor, that means the pilot can fly the plane directly into the ground, sea, or mountain without realizing it.

When you are managing a system that behaves in an exponential way, you must fly by instruments and not by your gut. Your instruments tell the nurse manager that the average stay of her inpatients is 5 days, but her gut feel is that 3 days is more common. Her gut is right (i.e. the modal length of stay is 3 days) but she must manage to the mean of 5 days to stay on budget.
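The nurse manager's situation can be sketched with a small simulation. It assumes, purely for illustration, that length of stay follows a gamma distribution with shape 2.5 and scale 2.0, which happens to give a mean of 5 days and a mode of 3 days. The patient count and the seed are also assumptions, not real hospital data.

```python
import random

SHAPE, SCALE = 2.5, 2.0      # assumed: mean = 2.5 * 2 = 5 days, mode = 1.5 * 2 = 3 days
N_PATIENTS = 100_000         # assumed number of simulated admissions


def simulate_total_days(n_patients: int, seed: int = 1) -> float:
    """Total inpatient days for n_patients random stays."""
    rng = random.Random(seed)
    return sum(rng.gammavariate(SHAPE, SCALE) for _ in range(n_patients))


mean_days = SHAPE * SCALE
mode_days = (SHAPE - 1) * SCALE

total_days = simulate_total_days(N_PATIENTS)
budget_at_mode = mode_days * N_PATIENTS
budget_at_mean = mean_days * N_PATIENTS

overrun_at_mode = total_days / budget_at_mode - 1
overrun_at_mean = total_days / budget_at_mean - 1

print(f"Budgeting to the mode: {overrun_at_mode:+.1%} versus plan")
print(f"Budgeting to the mean: {overrun_at_mean:+.1%} versus plan")
```

Under these assumptions the mode-based budget runs roughly two-thirds over, while the mean-based budget lands close to plan.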

That is why reliable and frequent reporting is key to managing exponential systems, because without reliable instruments you cannot pilot the system properly.

Manage to the Mean AND Build Flexibility

Managing to the mean is still not enough to be a successful manager, however. As discussed in Part 2, with exponential distributions the tail wags the dog. You must be prepared for the large but rare outlier events that will have a huge impact on your system, such as the spinal injury patient whose length of stay is greater than 1 year. Over the long term the average value will become correct, but during a single budget year there may not be enough time to absorb the effect of such a large-valued event. To handle those events, you need to build some flexibility into the system.

Flexibility can take different forms. It can involve cross-training of staff to permit additional capacity when a long-duration event occurs, for instance asking a mortgage specialist to cover as a bank teller when one customer has an hour's worth of coins to deposit, or having an extra doctor on call when the Emergency waiting time exceeds a few hours. It can also take the form of budget reserves. For instance, the cost of that rare spinal injury patient cannot be predicted by simply budgeting to that nursing unit's averages, but across a hospital with a few dozen nursing units, those rare outlier events may happen with much more predictability. (This is the Central Limit Theorem in action.) A once-in-a-half-century event for a nursing unit may become a once-every-two-years event across the entire hospital, which becomes much easier to budget for. When an unusual event occurs, transfer that reserve to the affected nursing unit and the departmental budgets stay on track. You have managed to predict an unpredictable event and reduced its impact on your system.
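The pooling arithmetic can be sketched in a few lines. The figure of 25 nursing units is my assumption for "a few dozen", and the calculation assumes the units experience these events independently of one another.

```python
def pooled_event_stats(n_units: int, unit_return_period_years: float):
    """Hospital-wide frequency of an event that is rare for any single unit.

    Assumes each unit has the same yearly chance and units are independent.
    """
    p_unit = 1.0 / unit_return_period_years           # yearly chance for one unit
    events_per_year = n_units * p_unit                # expected events, hospital-wide
    return_period_years = 1.0 / events_per_year       # average gap between events
    p_at_least_one = 1.0 - (1.0 - p_unit) ** n_units  # chance of one or more this year
    return events_per_year, return_period_years, p_at_least_one


# 25 units is an assumed value; 50 years is the "once-in-a-half-century" figure.
events, gap, chance = pooled_event_stats(n_units=25, unit_return_period_years=50)
print(f"Expected events per year across the hospital: {events:.2f}")
print(f"Average years between events: {gap:.1f}")
print(f"Chance of at least one event this year: {chance:.0%}")
```

A reserve sized for one such event every couple of years is a far more tractable budget line than hoping each unit's average absorbs it.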

Summary

What I hope to have shown in these few articles is that exponential distributions are real, they have some counter-intuitive effects, but they are actually quite predictable and manageable if you understand their behaviour.

Monday, 18 March 2013

Exponential Distributions, Part 3 - Queues Just Don't Remember

In Part 1 and Part 2, we have looked at some characteristics of exponential distributions, particularly the Erlang distributions. In this article we tackle another attribute of an exponential queue.

Exponential Attribute #3 - The Memoryless Property

Or, Always Press the Reset Button When You Arrive

Queues are just line-ups, so how can a queue possess a memory?
As it turns out, some queues and systems do have a memory, but not all. It is actually quite intuitive that all queues should have a memory. We expect that what happened before you got in line should affect what happens after you arrive. However, queues that are memoryless are not affected by what happened before you arrive. That behaviour is counter-intuitive, at least at first glance.

What is a "Memoryless" Queue?

A memoryless queue refers to the property that what happened before you arrived in the queue will have no effect on the timing of the next event after you arrive. Why does this matter?

Let's create a fictional example of waiting for a taxi cab to pass by on a street corner to take you to your next meeting. You arrive at a random time and taxis are passing that corner at random intervals. How long will you expect to wait for the next taxi on average?

If some data is collected at that corner over a period of days, let's assume the data shows the taxi intervals follow a normal distribution (bell curve) with an average interval of 5 minutes. When you arrive on the corner, sometimes you will have just missed a taxi and would expect to wait a full 5 minutes for the next one. At other times, you might arrive 5 minutes after the last taxi and would expect to wait a very short time for the next one. On average therefore, you would expect to wait 1/2 of the average interval time, or 2.5 minutes. Further, if there was a hotdog vendor on that corner who paid more attention to taxis than to his customers, you could ask him how long ago the last taxi passed. If it was 5 minutes ago, then you would expect to wait only a short time, whereas if the last taxi passed seconds ago, you would expect to wait about 5 minutes. In other words, the memory of the previous event in the system affects your expected wait time. That is intuitive.

However, now assume the data instead showed that taxi intervals on that corner follow an exponential distribution (the Erlang distribution with k=1) with an average interval of 5 minutes. How long would you expect to wait for a taxi now? It turns out the correct answer is 5 minutes -- not half the average interval, but the average interval itself. Further, it doesn't matter what the hotdog vendor tells you about the last taxi to pass by. Whether it was 5 seconds ago or 10 minutes ago, the expected average wait time for you will be 5 minutes from when you arrive. It is like the queue has no memory -- it doesn't matter what happened before you arrived, the expected wait time clock resets itself. How could this be? Surely it cannot be true. It appears counter-intuitive.

The easiest way to demonstrate this is with a non-queue example. Imagine a factory that manufactures pipe. One machine extrudes pipe continuously and an automated saw immediately cuts the pipe into either 1 meter or 9 meter lengths, depending on the next customer order in the computer system. Effectively the order of pipe lengths is random, but on average the machine produces 50% short pieces and 50% long pieces each shift. The pipe continues moving along the assembly line at a constant speed where a robotic arm randomly selects pieces of pipe for quality control testing. An arm shoots out and knocks the pipe that happens to be passing by at that moment into a bin. The arm is programmed to randomly activate a certain number of times per shift.

At the end of each shift, what proportion of long and short pieces of pipe would you expect to find in the bin? Even though that machine produces half short and half long pipes, that will not be the composition in the bin. Because the long pieces of pipe take 9 times longer to pass the robotic arm, the odds are 9 times greater that a long piece will be selected. Therefore you would expect 90% of the pipes in the bin to be long ones.
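The 90% figure is a length-weighted average, which can be sketched as follows. The function and dictionary names are hypothetical; the lengths and production shares come from the example.

```python
def bin_composition(pieces: dict) -> dict:
    """Share of each pipe type in the bin when the arm fires at random moments.

    pieces maps a name to (length_in_metres, share_of_production).
    The chance of catching a piece is proportional to how long it spends
    passing the arm, which is proportional to its length.
    """
    weights = {name: length * share for name, (length, share) in pieces.items()}
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


pipes = {"short": (1.0, 0.5), "long": (9.0, 0.5)}
shares = bin_composition(pipes)
for name, share in shares.items():
    print(f"{name}: {share:.0%} of the bin")
```

Changing the production shares or the lengths in `pipes` shows how quickly the longer pieces come to dominate the sample.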

It is the same principle with memoryless queues. Even though a long interval between taxi cabs is rare, the odds are greater that you will happen to arrive while one of those long intervals is occurring. That increases the expected wait time, and it turns out to increase it exactly to the mean value. Even if that last taxi came 8 minutes ago, you can still expect to wait 5 more minutes. You could just be in one of those rare but long duration intervals.
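The taxi argument can be checked with a rough simulation. This sketch compares bell-curve intervals with exponential intervals (the Erlang case with k=1, for which the memoryless property holds exactly), both averaging 5 minutes. The 0.5-minute spread of the bell curve, the sample sizes, and the seed are assumptions of mine.

```python
import bisect
import random

MEAN_INTERVAL = 5.0    # minutes between taxis, from the example
N_TAXIS = 200_000      # simulated taxis (assumed sample size)
N_ARRIVALS = 50_000    # simulated passengers (assumed sample size)


def average_wait(draw_interval, seed: int = 42) -> float:
    """Average wait for passengers who show up at uniformly random times."""
    rng = random.Random(seed)
    passing_times = []
    clock = 0.0
    for _ in range(N_TAXIS):
        clock += draw_interval(rng)
        passing_times.append(clock)
    total_wait = 0.0
    for _ in range(N_ARRIVALS):
        arrival = rng.uniform(0.0, clock)
        # First taxi that passes at or after the passenger's arrival time.
        next_taxi = passing_times[bisect.bisect_left(passing_times, arrival)]
        total_wait += next_taxi - arrival
    return total_wait / N_ARRIVALS


def regular_interval(rng: random.Random) -> float:
    # Bell-curve intervals; the 0.5-minute spread is an assumed value.
    return max(0.0, rng.gauss(MEAN_INTERVAL, 0.5))


def exponential_interval(rng: random.Random) -> float:
    return rng.expovariate(1.0 / MEAN_INTERVAL)


regular_wait = average_wait(regular_interval)
memoryless_wait = average_wait(exponential_interval)
print(f"Bell-curve intervals:  average wait {regular_wait:.2f} minutes")
print(f"Exponential intervals: average wait {memoryless_wait:.2f} minutes")
```

With these assumptions the first figure should land near half the interval and the second near the full interval, because random arrivals fall disproportionately into the long gaps.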

And that is what it means for a queue to be memoryless. Whatever happened before you arrived is irrelevant -- the clock resets when you get there.

Friday, 8 March 2013

Exponential Distributions, Part 2 - The Tail Wags the Dog

In Part 1, we looked at the asymmetry of exponential distributions, particularly Erlang Distributions. We concluded that in terms of randomness, bad luck will come more frequently than good luck. Now in Part 2, we look at the magnitude of that bad luck. It's not just about the number of bad luck events, but how large those individual events can be and how much effect they can have on the behaviour of a system.

Exponential Attribute #2: Outliers Cannot Be Ignored
Or, The Tail Wags the Dog

What is an outlier?

An outlier is a statistical term for an observation that is markedly different from the other observations in the dataset. It is ultimately a subjective definition even though they are often determined using statistical formulas. Common practice uses +/- 3 standard deviations from the mean as the boundary for outliers, which is reasonable for normal distributions. However, there is no mathematical rule that says this boundary definition is better than any other.

So what is the purpose of identifying outliers?

The primary purpose is to exclude measurement errors or other unusual occurrences that would bias the dataset and lead one to make a wrong conclusion. That's a worthwhile goal. However, the opposite danger is to exclude valid measurements as outliers and that too can lead one to make a wrong conclusion. Excluding outliers is a double-edged sword.

The chart on the right is a box plot of the Michelson-Morley Experiment results from 1887 where the speed of light was measured as the earth moved through the supposed "aether wind" of space. There were 5 experiments with 20 observations each. The top line of each box shows the 75th percentile value, the bottom line is the 25th percentile, the middle bold line is the median, and the T's are the maximum and minimum. The small circles represent outliers as per the boundary definition above. For experiment #3, four of the observations are deemed to be outliers -- two large-value outliers and two small-value outliers. As you can see, while statistically they may be considered distant from the rest of the values in their group, three of them are within the normal "inlier" ranges of the other four experiments.

This is a good example of the difficulty and arbitrary nature of defining an outlier. When you realize that ALL of the variation in this experiment is solely due to measurement error -- the speed of light is not changing -- then it raises the question as to why some measurement errors would be accepted as inliers and other errors would be deemed outliers. How does one know if the two low-value outliers in experiment #3 are closer to the true value than the other 18 higher values? As it turns out, the true speed of light is at value 792 in this chart (792 + 299,000 km/s). That means one of the low-value outliers in Experiment #3 is just as close to being correct as the 25th percentile value. It was a valid measurement and should not be classified as an outlier.

So even though this dataset was roughly normally distributed, it was still difficult to find the true outliers. With exponential distributions, it gets even more difficult because of the long tail. Simply going out 3 standard deviations from the mean does not give you a reasonable boundary for identifying outliers.

For example, using an outlier boundary of +/- 3 standard deviations on a normal distribution would define 0.27% of the events to be outliers, or 1 out of every 370 events. That means if you were measuring the outdoor temperature once per day, you would expect to see either a high or low outlier about once per year. If however the temperature happens to follow an Erlang distribution (with k=3, lambda=6), you would define 1.2% of the events to be outliers using the 3 standard deviation rule. That means you would expect to see an outlier temperature once per quarter. That's not particularly infrequent. (Yes I know temperature doesn't follow an Erlang distribution, but humour me for a minute!) Erlang distributions do not have the heaviest tails either -- other distributions such as the Log-normal or Weibull can have much heavier tails and therefore much greater proportions of their events beyond the 3-sigma boundary.

It's not just the number of outliers that's important to understand, but the size of each outlier as well. The heavier the tail, the higher the outlier value can be. In a normal distribution, the rare outlier events will still have values fairly close to the outlier boundary. In fact, an outlier greater than +/- 6 standard deviations from the mean essentially never occurs in a normal distribution. Not so with exponential functions. The tails go on and on, and some outliers can have absolutely huge values. Using our Erlang function parameters above, we would expect 0.016% of events to fall beyond 6 standard deviations above the mean, or 16 of every 100,000 events. That's not frequent, but it's a far cry from never! To use our weather analogy one last time (I promise!), that's an incredibly extreme temperature once every 17 years. Or it's like a "storm of the century" that happens 5 or 6 times a century.
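The 3-sigma and 6-sigma figures quoted above can be reproduced with the closed-form Erlang tail probability. This is a minimal sketch using only the standard library; the function names are my own, and the daily-measurement conversion follows the weather analogy.

```python
import math


def normal_two_sided_tail(n_sigma: float) -> float:
    """Fraction of a normal distribution beyond +/- n_sigma."""
    return math.erfc(n_sigma / math.sqrt(2))


def erlang_upper_tail(k: int, lam: float, x: float) -> float:
    """P(X > x) for Erlang(k, lam): exp(-lam*x) * sum_{n<k} (lam*x)^n / n!"""
    lx = lam * x
    return math.exp(-lx) * sum(lx ** n / math.factorial(n) for n in range(k))


def erlang_fraction_beyond(k: int, lam: float, n_sigma: float) -> float:
    """Fraction of Erlang(k, lam) events beyond +/- n_sigma from the mean."""
    mean = k / lam
    sd = math.sqrt(k) / lam
    upper = erlang_upper_tail(k, lam, mean + n_sigma * sd)
    lower_bound = mean - n_sigma * sd
    # Values cannot be negative, so the lower tail vanishes once the bound is <= 0.
    lower = 1.0 - erlang_upper_tail(k, lam, lower_bound) if lower_bound > 0 else 0.0
    return upper + lower


K, LAM = 3, 6.0  # the Erlang parameters used in the text

for n_sigma in (3, 6):
    frac = erlang_fraction_beyond(K, LAM, n_sigma)
    print(f"Erlang beyond {n_sigma} sigma: {frac:.4%}, "
          f"or about once every {1 / frac / 365:.1f} years of daily readings")

print(f"Normal beyond 3 sigma: {normal_two_sided_tail(3):.4%}")
```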

What's important to understand is that these very large and rare events are not outliers. They are valid, real events that are an inherent characteristic of these exponential distributions. They are not measurement errors or aberrations from what is expected. They should not be dismissed but rather expected and planned for.

An Economist Discovers the Exponential World of Healthcare

I saw this play out in stark reality on a project a number of years ago. I was part of a team doing data projections showing how the retiring baby boomers would affect the provincial healthcare system over 25 years. One analysis involved predicting the demand for inpatient services. A semi-retired and respected economics statistician on our team built a spreadsheet model for inpatient demand, crunched the numbers, and declared that inpatient days would go down over the coming decades as the baby boomers retired, and that the total inpatient costs would also drop. The team leader was thrilled with the good news, but I was skeptical. I had read a couple of studies in peer-reviewed journals that reached the exact opposite conclusion, namely inpatient days would rise, average length of stay would rise, and costs would increase materially.

I asked the statistician for his model and data to review his analysis. The data was fine and his model was sound, except for one minor step -- he ignored all of the data points above 3 standard deviations from the mean. I asked him his reason for this omission of a significant portion of the dataset and his reply was, "Well, that's what I always do." Obviously his entire career was spent using macro-economic data that was normally distributed, and he got into the habit of eliminating the outliers for every analysis without even thinking about it. The problem was that one cannot do that with inpatient length-of-stay data, which follows an Erlang distribution. This economist had used his "outlier magic wand" and made the sickest patients in the province instantly disappear! And what do you know? Hospital costs go down when you make the sickest patients disappear! Unfortunately doctors and hospital administrators don't have that magic wand in their pockets and they know the sickest patients have to be treated, often at significant cost to the system. Eliminating them from the projection made no sense.

I redid the analysis using all of the inpatient data and it agreed with the published studies, namely that inpatient days, average length-of-stay, and costs would all rise as baby boomers retired. It took a lot of convincing of the team leader that my analysis was identical to the economist's analysis, except that I included the sick people!

When dealing with exponential distributions in the real world, remember that the tail end of the distribution drives the behaviour of the system. Do not ignore the outliers. They are likely real events, they will happen more frequently than you would like, they will be larger than you like, and if you ignore them you will get the wrong answer.

With exponential distributions, the tail wags the dog.

Postscript

After publishing my article this morning, I found an article by Carl Richards at Motley Fool who argues that outliers in normal distributions shouldn't be ignored either. Seems to be the theme for the week!

Thursday, 7 March 2013

Exponential Distributions, Part 1 - Expect More Bad Luck Than Good

In a recent blog post, I discussed how JPMorgan got burned by assuming derivatives behaved according to a normal distribution when in fact they followed an exponential distribution. I want to delve a bit further into some of the counter-intuitive aspects of exponential distributions because I have found most people are not familiar with them.

Randomness

First it is important to understand that all of these distributions I am referring to deal with randomness. When the next event cannot be predicted precisely, such as tomorrow's stock market close or the roll of the dice, it is an example of randomness. However, if a population of related random events is studied as a group, the randomness will take on a particular form. The histogram for rolling a single die will be flat, with each value from 1 through 6 having an equal 1/6th chance of occurring. Sometimes those histograms take the shape of a bell curve (normal distribution) or an exponential distribution or something completely different. It just happens that normal and exponential curves appear a lot in the real world.

Erlang Distributions, shown to the right, are just one family out of many exponential curves. The mathematics of Erlang functions can be viewed here, but it's not necessary to know the math in order to understand their behaviour. Most queues in the real world follow an Erlang curve (think the line-up at the bank or how long you will be kept on hold by your company's help desk). Exponential distributions come in many shapes and sizes, but they share some common attributes that differ significantly from the Normal Distribution. These mathematical differences create real-world consequences that must be understood in order to manage them appropriately.

Exponential Attribute #1: Asymmetry
Or, Expect More Bad Luck Than Good

Normal distributions are symmetrical about their mean, which implies random variation will occur equally above and below the average value. If you equate randomness with luck, and one side of the mean as good and the other as bad, then you will get approximately equal amounts of good luck (above the mean) and bad luck (below the mean).

Exponential distributions however are lop-sided. This is because there is a minimum value but no practical maximum, and because the mean is relatively close to the minimum. For example, inpatient length of stay in a hospital cannot be less than 1 day but it can extend out to hundreds of days on rare occasions. The duration of a telephone call cannot be less than zero seconds but it can extend into hours or even days if it is a computer modem that's making the call.

How is this important?

It means unusual random events will almost ALWAYS be on the long side of the distribution. You will not get enough randomly small events to balance out the randomly large events as you would expect in a normal distribution. For instance, if shoe size in a population happens to follow a normal distribution, we would expect to find an unusually large set of feet (i.e. more than 3 sigma above the mean) on a rare occasion (about 1 out of every 740 people). Similarly we would also expect to find an unusually small set of feet (i.e. less than 3 sigma below the mean) on an equally rare occasion. If our sample size of the population is sufficient, those large and small outliers would balance each other out and our mean value would be unaffected. Not so with exponential outliers. When you get an inpatient who stays in a hospital ICU bed for 2 years (such as a spinal injury from a diving accident) you will never get an inpatient who stays -2 years to balance things out. That one outlier event will significantly impact the average length of stay for that ICU and probably for the entire hospital.
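A quick simulation makes the imbalance visible. This sketch draws from an Erlang distribution with k=3 and lambda=6 (example parameters I have assumed for illustration) and counts events more than 1.5 standard deviations above and below the mean. The 1.5 threshold is my choice because 3 standard deviations below the mean would be negative, which cannot occur at all.

```python
import math
import random

K, LAM = 3, 6.0      # assumed Erlang parameters, for illustration only
N_EVENTS = 200_000   # assumed sample size


def tail_shares(n_sigma: float = 1.5, seed: int = 7):
    """Fraction of simulated events far above and far below the mean."""
    rng = random.Random(seed)
    mean = K / LAM
    sd = math.sqrt(K) / LAM
    high, low = mean + n_sigma * sd, mean - n_sigma * sd
    above = below = 0
    for _ in range(N_EVENTS):
        # An Erlang(k, lam) draw is a gamma draw with shape k and scale 1/lam.
        x = rng.gammavariate(K, 1.0 / LAM)
        if x > high:
            above += 1
        elif x < low:
            below += 1
    return above / N_EVENTS, below / N_EVENTS


share_above, share_below = tail_shares()
print(f"Unusually large events: {share_above:.2%}")
print(f"Unusually small events: {share_below:.2%}")
```

Under these assumptions the unusually large events outnumber the unusually small ones by roughly ten to one.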

Therefore, exponential distributions bring unbalanced randomness. Warren Buffett found this to be true of the insurance industry, as he states in his 2005 Letter to Shareholders:
"One thing, though, we have learned – the hard way – after many years in the business: Surprises in insurance are far from symmetrical. You are lucky if you get one that is pleasant for every ten that go the other way."

JPMorgan learned this as well. Days where derivative contracts lost big money were not balanced out by days where those same contracts earned big money. The random behaviour of that system was severely unbalanced. The bad luck exceeded the good luck.

Except it wasn't luck. It was predictable randomness.

If your system's randomness follows an exponential curve, then you must plan for more bad luck than good.

In Part 2 we will look at another implication of this asymmetry, namely the large tail of an exponential distribution.

Friday, 1 March 2013

Warren Buffett's Investing Wisdom, Part 2

Today Warren Buffett issued his latest annual report and letter to shareholders. Continuing from Part 1, here are some more of my favourite nuggets of Warren Buffett's wisdom from his previous annual letters to shareholders.

John Stumpf, CEO of Wells Fargo, aptly dissected the recent behavior of many lenders: “It is interesting that the industry has invented new ways to lose money when the old ways seemed to work just fine.” (2007)
I’ve reluctantly discarded the notion of my continuing to manage the portfolio after my death – abandoning my hope to give new meaning to the term “thinking outside the box.” (2007)
For me, Ronald Reagan had it right: “It’s probably true that hard work never killed anyone – but why take the chance?” (2006)
Warning: It’s time to eat your broccoli – I am now going to talk about accounting matters. I owe this to those Berkshire shareholders who love reading about debits and credits. I hope both of you find this discussion helpful. All others can skip this section; there will be no quiz. (2006)
As a wise friend told me long ago, “If you want to get a reputation as a good businessman, be sure to get into a good business.” (2006)
Long ago, Mark Twain said: “A man who tries to carry a cat home by its tail will learn a lesson that can be learned in no other way.” If Twain were around now, he might try winding up a derivatives business. After a few days, he would opt for cats. (2005)
When we finally wind up Gen Re Securities, my feelings about its departure will be akin to those expressed in a country song, “My wife ran away with my best friend, and I sure miss him a lot.” (2005)
Comp committees should adopt the attitude of Hank Greenberg, the Detroit slugger and a boyhood hero of mine. Hank’s son, Steve, at one time was a player’s agent. Representing an outfielder in negotiations with a major league club, Steve sounded out his dad about the size of the signing bonus he should ask for. Hank, a true pay-for-performance guy, got straight to the point, “What did he hit last year?”
When Steve answered “.246,” Hank’s comeback was immediate: “Ask for a uniform.” (2005)
Long ago, Sir Isaac Newton gave us three laws of motion, which were the work of genius. But Sir Isaac’s talents didn’t extend to investing: He lost a bundle in the South Sea Bubble, explaining later, “I can calculate the movement of the stars, but not the madness of men.” If he had not been traumatized by this loss, Sir Isaac might well have gone on to discover the Fourth Law of Motion: For investors as a whole, returns decrease as motion increases. (2005)
R. C. Willey will soon open in Reno. Before making this commitment, Bill and Scott again asked for my advice. Initially, I was pretty puffed up about the fact that they were consulting me. But then it dawned on me that the opinion of someone who is always wrong has its own special utility to decision-makers. (2004)
John Maynard Keynes said in his masterful The General Theory: “Worldly wisdom teaches that it is better for reputation to fail conventionally than to succeed unconventionally.” (Or, to put it in less elegant terms, lemmings as a class may be derided but never does an individual lemming get criticized.) (2004)
Charlie and I detest taking even small risks unless we feel we are being adequately compensated for doing so. About as far as we will go down that path is to occasionally eat cottage cheese a day after the expiration date on the carton. (2003)
[Regarding the annual meeting:] Charlie and I will answer questions until 3:30. We will tell you everything we know . . . and, at least in my case, more. (2003)
Borsheim’s [Jewellery Store] operates on a gross margin that is fully twenty percentage points below that of its major rivals, so the more you buy, the more you save – at least that’s what my wife and daughter tell me. (Both were impressed early in life by the story of the boy who, after missing a street car, walked home and proudly announced that he had saved 5¢ by doing so. His father was irate: “Why didn’t you miss a cab and save 85¢?”) (2003)
When I review the reserving errors that have been uncovered at General Re, a line from a country song seems apt: “I wish I didn’t know now what I didn’t know then.” (2002)
We cherish cost-consciousness at Berkshire. Our model is the widow who went to the local newspaper to place an obituary notice. Told there was a 25-cents-a-word charge, she requested “Fred Brown died.” She was then informed there was a seven-word minimum. “Okay” the bereaved woman replied, “make it ‘Fred Brown died, golf clubs for sale’.” (2002)
Bad terminology is the enemy of good thinking. When companies or investment professionals use terms such as "EBITDA" and "pro forma," they want you to unthinkingly accept concepts that are dangerously flawed. (In golf, my score is frequently below par on a pro forma basis: I have firm plans to "restructure" my putting stroke and therefore only count the swings I take before reaching the green.) (2001)