You Know, Most People Just Listen to Music...

Bus schedules in Los Angeles usually say things like "Between the hours of 9AM and 11AM, a bus departs every 15 minutes from the ____ terminal."  Of course, if you've ever waited for the bus anywhere other than that one stop, you are familiar with the phenomenon that rarely, if ever, does a bus come every fifteen minutes on the dot.  

Now, we can't blame the bus drivers or the bus company, because there's absolutely no way they could predict traffic, pedestrians, light changes, etc. so precisely that they could give a truthful estimate of how often buses arrive halfway between their time checks.  For those who aren’t familiar, time checks are points where the bus company guarantees a bus will leave at a particular time.  The bus will wait at that point to make sure that the buses are spaced as correctly as possible.  These usually occur at the beginning and end of a line, before the bus turns around to run the line again.  

Since I’ve been taking the bus this week, I’ve found myself wondering, “How long should I expect to wait if I just walk out at random times?” (That is, without trying to catch any particular bus or paying any attention at all to a bus schedule.)  I take the Santa Monica bus 12, which supposedly departs the green line station every 15 minutes headed northbound to UCLA on weekday mornings.  So why do I find myself waiting three minutes one day and twenty minutes the next?  

Well, fortunately, we can use a little bit of probability theory to answer this question, and then we’ll verify it with a little bit of MATLAB experimentation.  

First off, we have to figure out what question we’re really asking.  I think we’d all love to know how long we’ll have to wait for the bus on the one particular day that we’re running late and really can’t spare a single second, but alas, questions about a single outcome of a random variable like that cannot be answered.  The question we’re going to answer is: on average, how long will I have to wait if I go out and wait for the bus?  For those who aren’t math people, when we say something like that, what we mean is that we go out and wait for the bus not once, not twice, not ten times, but many, many, many times, until we have enough trials for the average to settle down.  We’re looking for the overarching behavior of the situation, not point-by-point variations.

Ok, so we have our question: “If I walk out randomly to the bus stop, how long on average will I have to wait for the next bus to arrive?”

Now, to answer this, we need a little bit more information: how often do buses pass by my stop?  This is perhaps the critical part of our problem.  This is the “random variable” part and where I’ve been beating my head against the wall for the last few days.  Let’s step aside and look at a non-random version of our problem.

Assume that buses pass by my stop precisely every 15 minutes.  Well, there’s nothing random about that.  If I arrive randomly at the stop without any knowledge of when the previous bus passed or when the next one should pass (it’s my first day taking the bus), I can expect to wait, on average, 7.5 minutes.  I’m not going to get too bogged down in why this is, because this is already getting long-winded, but think of it like this: since I arrived randomly, it’s just as likely that I’ll have to wait 15 minutes as it is that I’ll have to wait zero minutes.  The average of those two is 7.5.  Similarly, it’s just as likely that I’ll have to wait 14.95 minutes as it is that I’ll have to wait 0.05 minutes; the average of those two is 7.5.  And so on for every such pair of points in that interval.  If you’re still confused, I’ll try to find a more thorough description online and add a link to it at the end of this post.  This is an important concept, though, as we’re going to use it in our argument.  The takeaway message here is:

If I arrive randomly in a fixed interval of length x, I can expect to wait x/2 on average for the bus to come.
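If you’d rather see the pairing argument as one line of calculus, here it is.  The notation \(T\) is mine, not from the text above: assume the arrival time \(T\) within an interval of fixed length \(x\) is uniform on \([0, x]\), so it has density \(1/x\), and the wait is \(x - T\).

```latex
\mathbb{E}[W \mid X = x]
  = \int_{0}^{x} (x - t)\,\frac{1}{x}\,dt
  = \frac{1}{x}\left[x t - \frac{t^{2}}{2}\right]_{0}^{x}
  = \frac{1}{x}\left(x^{2} - \frac{x^{2}}{2}\right)
  = \frac{x}{2}
```

With \(x = 15\) this gives the 7.5 minutes from above.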

Ok, so now that we have that established (hopefully), let’s make it real: buses never arrive in fixed intervals.  So that interval length that we specified, X, is actually a random variable.  It’s random because it’ll change based on that motorcycle that pulled out in front of the bus, how many people need to get on and off at each stop, whether or not a bus gets through a light, etc. Even if we had some way to parse all that information, humans are known to behave in “noisy,” i.e. random, ways on repetitive tasks.

Now there are different ways that random variables can be distributed.  Uniformly is one.  This is the case of rolling a die: every outcome is equally likely.  In the case of our bus, that’s like saying it’s equally likely that the time between buses is one minute as it is that it’s twenty minutes, or forty.  

Given that buses leave every fifteen minutes and traffic, as bad as it can be, doesn’t usually slow our bus down forty-minutes-badly, the uniform distribution doesn’t really fit our buses here.  In fact, because our buses depart every fifteen minutes, it’s not crazy to think that on average, there will be about fifteen minutes in between buses at the bus stop.  We’re going to do just that.

The type of distribution that we’re looking for is something that has this kind of flavor: the most likely possibility is that the bus-interval is about fifteen minutes.  The interval can change, but the further away we get from fifteen minutes, the less likely it is that the buses will be separated thusly.

For instance: most buses will be about 15 minutes apart.  It’s totally possible that the bus could take 10 minutes or 20 minutes to get to the stop, but that’s a little less likely than the standard 15 minutes.  It’s even possible that the bus-interval could be 3 minutes or 27 minutes, but that’s less likely than the 10- or 20-minute case and certainly less likely than our standard 15-minute case.  

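One such distribution is the Poisson distribution.  Strictly speaking, a Poisson distribution describes the number of events likely to occur in a given interval of time, for instance, bus passings in an hour.  Here we borrow its shape to model the bus-interval itself, measured in whole minutes with an average of 15: it peaks near the average and tails off on either side, which is exactly the flavor we want.  A Poisson-distributed random variable has some interesting properties too, such as:

\( Var(X)=\mathbb{E}[X] \)
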
which says “the variance of random variable X is equal to the average of the random variable X.”  We’ll use this later.  Don’t fret too much if you don’t know what variance is.  Loosely speaking, it measures how spread out the values are around the mean: it’s the average squared distance between X and its mean, so the bigger it is, the more often you’ll see values far from the average.
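To see the mean-equals-variance property concretely, here’s a small sketch in Python (chosen so it runs with nothing but the standard library; the helper names are my own) that builds the Poisson probabilities for \(\lambda = 15\) and computes the mean and variance straight from them.  The sums are cut off at 200, an assumption that is harmless here because values that large are vanishingly unlikely.

```python
import math


def poisson_pmf(lam, kmax):
    """Return [P(X=0), ..., P(X=kmax)] for a Poisson(lam) random variable.

    Uses the recurrence P(k) = P(k-1) * lam / k, so no huge factorials
    are ever formed.
    """
    probs = [math.exp(-lam)]
    for k in range(1, kmax + 1):
        probs.append(probs[-1] * lam / k)
    return probs


def mean_and_variance(lam, kmax=200):
    """Mean and variance of Poisson(lam), computed straight from the pmf."""
    probs = poisson_pmf(lam, kmax)
    mean = sum(k * p for k, p in enumerate(probs))
    second_moment = sum(k * k * p for k, p in enumerate(probs))
    variance = second_moment - mean ** 2  # Var(X) = E[X^2] - (E[X])^2
    return mean, variance


if __name__ == "__main__":
    m, v = mean_and_variance(15)
    print(f"mean = {m:.6f}, variance = {v:.6f}")
    # prints: mean = 15.000000, variance = 15.000000
```

Try a different \(\lambda\) and the two numbers still match, which is the property we lean on later.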

Ok, let’s move on from all this talk of distributions and get to the meat of the problem.  We want to know the average wait time at our bus stop, where buses pass at random intervals that are Poisson distributed with an average of 15 minutes.

It’s important here to distinguish between the bus-interval (the spacing between buses) and the wait-time.  The buses will pass by the stop whether we’re waiting or not, and the interval is the time between them.  The wait time is how long we have to wait for the bus if we get out there during any one of those intervals.  From here forward we’re going to refer to the bus intervals as \(X\) (or a subscripted variety such as \(X_{i}\)) and the wait times as \(W\) (similarly with subscripts).

We start by setting up our expectation value (a.k.a. the average or mean) of our wait time:

\(\mathbb{E}[W]=\sum\nolimits_{i} \overline{w_{i}}\,p\left(w_{i}\right)\) 

\(w_{i}\) is the wait time if we arrive in an interval of size \(x_{i}\).  Since our arrival time is uniformly distributed, and we’re assuming we do this a boatload of times, we care about the average wait in that interval (written \(\overline{w_{i}}\) in the equation), which, as we previously discussed, is

\(\overline{w_{i}}=\frac{x_{i}}{2}\)

We also need to figure out what \(p\left(w_{i}\right)\) is.  We know that the probability of landing in a given interval, and therefore getting that interval’s wait time, is proportional to the size of the interval (i.e. we are more likely to arrive during a long stretch without a bus), and we also know that it’s proportional to how common that interval is (i.e. we are very likely to arrive in common-length intervals such as 15-minute ones, and a little less likely to arrive in very short or very long intervals because they occur less often; recall the Poisson distribution).  So,

\(p\left(w_{i}\right)\propto x_{i}\,p\left(x_{i}\right)\)

Since it’s a probability distribution, though, we’ve gotta normalize, and as a result we get:

\(p\left(w_{i}\right) =\frac{ x_{i}\,p\left(x_{i}\right)}{\sum\nolimits_{j} x_{j}\,p\left(x_{j}\right)}\)
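To see why weighting by \(x_{i}\) matters, here’s a toy case with made-up numbers (not bus data): suppose the gaps simply alternate between 5 and 25 minutes, so \(p\left(x_{1}=5\right)=p\left(x_{2}=25\right)=\tfrac{1}{2}\) and the average gap is still 15 minutes.  Plugging into the normalized weights, and using the half-the-interval average wait from earlier:

```latex
\begin{aligned}
p\left(w_{1}\right) &= \frac{5\cdot\frac{1}{2}}{5\cdot\frac{1}{2}+25\cdot\frac{1}{2}} = \frac{1}{6},
\qquad
p\left(w_{2}\right) = \frac{25\cdot\frac{1}{2}}{5\cdot\frac{1}{2}+25\cdot\frac{1}{2}} = \frac{5}{6},\\
\mathbb{E}[W] &= \frac{5}{2}\cdot\frac{1}{6}+\frac{25}{2}\cdot\frac{5}{6}
  = \frac{5}{12}+\frac{125}{12} = \frac{130}{12} \approx 10.83 \text{ minutes}.
\end{aligned}
```

Five times out of six a random rider lands in the long gap, so the average wait comes out near 10.8 minutes rather than the 7.5 you’d get from half of the 15-minute average gap.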

Now we put it all together to get:

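\(\mathbb{E}[W]=\sum\nolimits_{i} \overline{w_{i}}\,p\left(w_{i}\right) = \frac{\sum\nolimits_{i} \frac{x_{i}}{2}\,x_{i}\,p\left(x_{i}\right)}{\sum\nolimits_{j} x_{j}\,p\left(x_{j}\right)}\)

\(\mathbb{E}[W]=\frac{1}{2}\,\frac{\sum\nolimits_{i} x_{i}^{2}\,p\left(x_{i}\right)}{\sum\nolimits_{j} x_{j}\,p\left(x_{j}\right)}=\frac{1}{2}\,\frac{\mathbb{E}[X^{2}]}{\mathbb{E}[X]}\)
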
So \(\mathbb{E}[]\) refers to the expectation value of something, and the expectation value is the same as the average of something.  \(\mathbb{E}[X]\) is the average of X.  \(\mathbb{E}[X^{2}]\) is called the second moment, and there’s a fun relation that goes like this,

\( Var(X)=\mathbb{E}[X^{2}]-\left(\mathbb{E}[X]\right)^{2} \)

From this we can rearrange to get,

\( \mathbb{E}[X^{2}]=Var(X)+\left(\mathbb{E}[X]\right)^{2} \)

Putting this back into our equation for our expected wait time, we get,

\( \mathbb{E}[W]=\frac{1}{2}\,\frac{Var(X)+\left(\mathbb{E}[X]\right)^{2}}{\mathbb{E}[X]} \)

But we know that for a Poisson-distributed random variable, which X is, we have

\( Var(X)=\mathbb{E}[X]=\lambda \), 

and thus we work out that

\( \mathbb{E}[W]=\frac{1}{2}\,\frac{\lambda+\lambda^{2}}{\lambda}=\frac{1}{2}(1+\lambda)\).
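As a numerical sanity check on the algebra, this Python sketch evaluates the sum in the \(\mathbb{E}[W]\) formula directly, term by term, for a few values of \(\lambda\), and prints it next to \(\frac{1}{2}(1+\lambda)\).  It assumes, as above, that the bus-interval X takes whole-minute values 0, 1, 2, ... with Poisson probabilities; the helper names are hypothetical and the sums are truncated at 400 minutes, which throws away only a negligible sliver of probability for these \(\lambda\).

```python
import math


def poisson_pmf(lam, kmax=400):
    """P(X=k) for k = 0..kmax via the recurrence P(k) = P(k-1) * lam / k."""
    probs = [math.exp(-lam)]
    for k in range(1, kmax + 1):
        probs.append(probs[-1] * lam / k)
    return probs


def expected_wait(lam):
    """E[W] = sum_i (x_i/2) * x_i * p(x_i) / sum_j x_j * p(x_j), term by term."""
    probs = poisson_pmf(lam)
    numerator = sum((x / 2) * x * p for x, p in enumerate(probs))
    denominator = sum(x * p for x, p in enumerate(probs))
    return numerator / denominator


if __name__ == "__main__":
    for lam in (5, 10, 15, 30):
        closed_form = (1 + lam) / 2
        print(f"lambda = {lam}: sum = {expected_wait(lam):.4f}, "
              f"(1 + lambda)/2 = {closed_form:.4f}")
```

The two columns agree for every \(\lambda\); for \(\lambda = 15\) both read 8.0000.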

Thus, for our case where the average bus interval is 15 minutes (\(\lambda = 15\)), \(\frac{1}{2}(1+15)=8\): we should expect to wait, on average, EIGHT minutes for the bus instead of the seven and a half we might expect.

Now, making experimentally verifiable claims is the soul of science, but I’m sure as hell not going to sit outside for a hundred million billion hours counting buses to get good statistics, so I decided to model this in MATLAB with a little piece of code.  

% Bus_arrivals.m %
% John M Hoffman 2014 %

while (count<100000)

if ped_arrival<bus_time
disp('The average wait time is:')


What it basically does is grab a random number for the time between buses and then a random number for the point at which I arrive at the bus stop.  If my number is less than the bus’s number, then great, I’ll be there for the next bus, and we record the difference as the amount of time that I waited for the bus to arrive.  If my number is bigger than the bus’s, then I missed that bus and we start over without recording anything.  We do this until I’ve “caught” the bus a hundred thousand times and then look at the average of all of those wait times.
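For anyone who wants to rerun the experiment without MATLAB, here’s a Python sketch of the procedure described above.  It is not the original program: the Knuth-style Poisson sampler, the function names, and the 60-minute arrival window are my assumptions.  The window matters: as long as it’s longer than any interval the bus actually produces, the chance of “catching” a given bus is proportional to that bus’s interval, which is exactly the \(x_{i}\,p\left(x_{i}\right)\) weighting from the derivation.

```python
import math
import random


def poisson_sample(lam, rng):
    """Draw one Poisson(lam) integer with Knuth's multiply-uniforms method."""
    threshold = math.exp(-lam)
    k, product = 0, 1.0
    while True:
        product *= rng.random()
        if product <= threshold:
            return k
        k += 1


def simulate_average_wait(lam=15, window=60.0, catches=100_000, seed=1):
    """Rejection-style simulation of the bus experiment (a sketch).

    window is an ASSUMED arrival range in minutes, not taken from the
    original MATLAB code; it must exceed any realistic bus interval so that
    the chance of catching a bus is proportional to that bus's interval.
    """
    rng = random.Random(seed)
    total_wait, count = 0.0, 0
    while count < catches:
        bus_time = poisson_sample(lam, rng)     # interval length; bus shows up at its end
        ped_arrival = rng.uniform(0.0, window)  # when I show up
        if ped_arrival < bus_time:              # I beat the bus: record my wait
            total_wait += bus_time - ped_arrival
            count += 1
        # otherwise I missed this bus: record nothing and start over
    return total_wait / count


if __name__ == "__main__":
    print("The average wait time is:")
    print(simulate_average_wait())
```

With the default settings the printed average should land close to 8 minutes; changing the seed moves it around by a few hundredths.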

Running my program a handful of times for 15-minute average bus intervals, I get the results:

The average wait time is:

The average wait time is:

The average wait time is:

The average wait time is:

So it’s looking pretty good.  An astute reader will point out the pretty consistent little bit of overshoot we have there.  I’m not sure exactly what it’s from, but if I had to guess, it’s not some crazy underlying phenomenon but rather something to do with how MATLAB handles drawing random numbers and my lack of care in accounting for all possible contingencies when programming.  Namely, I’m not sure that the code allows for a zero-wait-time case, or at least as many of them as it should, which would give a slight overestimate to our figure.  Any thoughts would be great and much appreciated!

That being said, none of this helps answer the real question of avoiding the crazy guy whispering in your ear or that teenager with their music up so loud everyone on the bus can hear it.  Math has a solution to that one too:

Count the buses, don’t ride them.