TY - JOUR
T1 - The Classical Occupancy Distribution
T2 - Computation and Approximation
AU - O’Neill, Ben
N1 - Publisher Copyright:
© 2020 American Statistical Association.
PY - 2021
Y1 - 2021
N2 - We examine the discrete distributional form that arises from the “classical occupancy problem,” which looks at the behavior of the number of occupied bins when we allocate a given number of balls uniformly at random to a given number of bins. We review the mass function and moments of the classical occupancy distribution and derive exact and asymptotic results for the mean, variance, skewness and kurtosis. We develop an algorithm to compute a cubic array of log-probabilities from the classical occupancy distribution. This algorithm allows the computation of large blocks of values while avoiding underflow problems in computation. Using this algorithm, we compute the classical occupancy distribution for a large block of values of balls and bins, and we measure the accuracy of its asymptotic approximation using the normal distribution. We analyze the accuracy of the normal approximation with respect to the variance, skewness and kurtosis of the distribution. Based on this analysis, we give some practical guidance on the feasibility of computing large blocks of values from the occupancy distribution, and when approximation is required.
AB - We examine the discrete distributional form that arises from the “classical occupancy problem,” which looks at the behavior of the number of occupied bins when we allocate a given number of balls uniformly at random to a given number of bins. We review the mass function and moments of the classical occupancy distribution and derive exact and asymptotic results for the mean, variance, skewness and kurtosis. We develop an algorithm to compute a cubic array of log-probabilities from the classical occupancy distribution. This algorithm allows the computation of large blocks of values while avoiding underflow problems in computation. Using this algorithm, we compute the classical occupancy distribution for a large block of values of balls and bins, and we measure the accuracy of its asymptotic approximation using the normal distribution. We analyze the accuracy of the normal approximation with respect to the variance, skewness and kurtosis of the distribution. Based on this analysis, we give some practical guidance on the feasibility of computing large blocks of values from the occupancy distribution, and when approximation is required.
KW - Approximation
KW - Birthday problem
KW - Classical occupancy distribution
KW - Computation
KW - Coupon collector problem
KW - Distribution and moments
UR - http://www.scopus.com/inward/record.url?scp=85077905574&partnerID=8YFLogxK
U2 - 10.1080/00031305.2019.1699445
DO - 10.1080/00031305.2019.1699445
M3 - Article
SN - 0003-1305
VL - 75
SP - 364
EP - 375
JO - American Statistician
JF - American Statistician
IS - 4
ER -