TY - JOUR
T1 - stratifyR
T2 - An R Package for optimal stratification and sample allocation for univariate populations
AU - Reddy, K. G.
AU - Khan, M. G. M.
N1 - Publisher Copyright: © 2020 John Wiley & Sons Australia, Ltd
PY - 2020/9/1
Y1 - 2020/9/1
AB - This R package determines optimal stratification of univariate populations under stratified sampling designs using a parametric-based method. It determines the optimum strata boundaries (OSB), optimum sample sizes (OSS) and multiple other quantities for the study variable, y, using the best-fit probability density function of a study variable available from survey data. The method requires the parameters and other characteristics of the distribution of the study variable to be known, either from available data or from a hypothetical distribution if the data are not available. In the implementation, the problem of determining the OSB is formulated as a mathematical programming problem and solved by using a dynamic programming technique. If the data of the population (i.e. the study variable) are available to the surveyor, the method estimates its best-fit distribution and determines the OSB and OSS under Neyman allocation, directly. When the dataset is not available, stratification is made based on the assumption that the values of the study variable, y, are available as hypothetical realisations of proxy values of y from past/recent surveys. Thus, it requires certain distributional assumptions about the study variable. At present, the package handles stratification for the populations where the study variable follows a continuous distribution: namely, Pareto, Triangular, Right-triangular, Weibull, Gamma, Exponential, Uniform, Normal, Lognormal and Cauchy distributions. In this paper, applications of major functionalities in the package are illustrated with a number of real/simulated as well as some hypothetical populations.
KW - R project for statistical computing
KW - dynamic programming
KW - mathematical programming problem
KW - optimum sample sizes
KW - optimum strata boundaries
UR - http://www.scopus.com/inward/record.url?scp=85092788696&partnerID=8YFLogxK
U2 - 10.1111/anzs.12301
DO - 10.1111/anzs.12301
M3 - Article
SN - 1369-1473
VL - 62
SP - 383
EP - 405
JO - Australian and New Zealand Journal of Statistics
JF - Australian and New Zealand Journal of Statistics
IS - 3
ER -