The German Tank Problem
There is a classic practical joke that sometimes does the rounds on the internet: you take three sheep, paint the numbers 1, 3 and 4 on them, then release them into a busy public place. The idea is that, after rounding up the three, people will spend time hunting for sheep number 2, which they will presume is still at large. Barring (baaing…) any doubts I have as to how realistic it is that someone would have three sheep at their disposal, this suggests an interesting mathematical problem. Given that you can see numbers 1, 3 and 4, and assuming the sheep have been labelled from 1 to n, what is the best estimate of the total number of sheep?
It turns out this is essentially the same problem that the Allies came across in the Second World War. While they were confident that they could destroy the inferior Panzer III and Panzer IV German tanks, the Panzer V was a big improvement. However, the Allies weren't sure how many were in operation: were there just a couple being shown off at multiple battlefields, or were there hundreds? Occasionally the Allies managed to capture one, and inside they found a serial number, which gave them a list of numbers similar to the sheep problem above. From an incomplete list of numbers, how could they estimate the total?
It turns out that there is quite a nice formula for estimating this. Assume the tanks are numbered consecutively from 1. If the number of tanks captured is k and the highest number observed is m, then the estimated total number of tanks = m + m/k - 1. One way to read this is as m + (m - k)/k: the highest number seen, plus the average gap between the numbers seen. So if you have seen tanks numbered 23, 47, 49, 51 and 66, then k = 5 and m = 66, and our best guess for the total number of tanks is 66 + 66/5 - 1 = 78.2, which looks reasonable.
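The formula is short enough to try out directly. Here is a minimal Python sketch of it; the function name `estimate_total` is made up for illustration, and the code assumes the numbers run consecutively from 1 with no duplicates among the ones observed.

```python
def estimate_total(observed):
    """Estimate the total count from observed serial numbers.

    Assumes (hypothetically) that items are numbered 1..n with no gaps
    and that each observed number appears only once.
    """
    if not observed:
        raise ValueError("need at least one observed number")
    k = len(observed)   # how many items were seen
    m = max(observed)   # highest number seen
    return m + m / k - 1


tanks = [23, 47, 49, 51, 66]
print(round(estimate_total(tanks), 1))  # 78.2
```

Note that only the largest number and the sample size matter; the other observed values do not change the estimate.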
How about our sheep problem? Well, k = 3 and m = 4, so the estimated flock size = 4 + 4/3 - 1 ≈ 4.33. This isn't much bigger than our highest observed sheep, because the observed numbers are tightly packed and leave very few gaps.
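If you want some reassurance that m + m/k - 1 is a sensible estimate, one rough check is to simulate it: pick a known total, repeatedly draw k numbers at random, and see where the estimates land on average. The sketch below does that in Python. The names `estimate_total` and `average_estimate`, the total of 100, and the number of trials are all illustrative choices, and the simulation assumes every item is equally likely to be seen, which real captures may not satisfy.

```python
import random


def estimate_total(observed):
    """m + m/k - 1, repeated here so the snippet runs on its own."""
    k = len(observed)
    m = max(observed)
    return m + m / k - 1


def average_estimate(true_n, k, trials=20000, seed=0):
    """Average the estimate over many random samples of size k from 1..true_n."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    total = 0.0
    for _ in range(trials):
        # Draw k distinct numbers, each equally likely (an assumption).
        sample = rng.sample(range(1, true_n + 1), k)
        total += estimate_total(sample)
    return total / trials


print(average_estimate(true_n=100, k=5))
```

Individual estimates can be well off, but over many samples the average should sit close to the true total of 100, which is what you would hope for from a formula with no systematic bias.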
There are more sophisticated models available which give better estimates. During the war the Allies found that each tank carried several numbers they could make use of. By collecting the production numbers on the gearboxes, engines and chassis separately (and later in the war, from each wheel), the Allies combined all the data to make very accurate estimates from a small number of tanks. These estimates were usually far better than those made by the secret services. To wrap up, here’s a graph of actual data from the war which shows just how accurate the method became:
[Figure: Number of German Panzer V Tanks]
Our model (red) was very close to what the German records (blue) indicate the actual numbers were, while the secret services' estimates were way off.