Nonparametric Segmentation Methods: Applications of Unsupervised Machine Learning and Revealed Preference
Many recent efforts by econometricians have focused on supervised machine learning techniques to aid empirical studies that use experimental data. By contrast, this article explores the merits of unsupervised machine learning algorithms for informing ex ante policy design using observational data. We examine the extent to which groups of consumers with differing responses to economic incentives can be identified in the context of fruit and vegetable demand. Two classes of nonparametric algorithms, revealed preference and unsupervised machine learning, are compared for segmenting households in the National Consumer Panel. Nonlinear almost-ideal demand models are estimated for each segment to determine which methods group households into segments with different expenditure and price elasticities. In-sample comparisons and out-of-sample prediction results indicate that methods using price-quantity data alone (without demographic, geographic, or other variables) perform better at segmenting households into groups with sizeable differences in price and expenditure responsiveness. These segmentation results suggest considerable heterogeneity in household purchases of fruits and vegetables.
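To make concrete what it means for a method to use price-quantity data alone, the following is a minimal sketch of the standard Generalized Axiom of Revealed Preference (GARP) consistency check, which is the usual building block of revealed preference methods. It is an illustration only and is not taken from the article: the function name `garp_violations`, the toy prices and bundles, and the idea of pooling observations to test whether they could belong to one segment are all assumptions made here.

```python
def garp_violations(prices, quantities):
    """Return the (i, j) observation pairs that violate GARP.

    prices[t] and quantities[t] are equal-length sequences giving the
    price vector and the chosen bundle at observation t (hypothetical
    input layout, chosen for this sketch).
    """
    n = len(prices)
    # cost[i][j]: what bundle j would have cost at the prices of observation i
    cost = [
        [sum(p * q for p, q in zip(prices[i], quantities[j])) for j in range(n)]
        for i in range(n)
    ]
    # Direct revealed preference: bundle i was chosen although j was affordable
    pref = [[cost[i][i] >= cost[i][j] for j in range(n)] for i in range(n)]
    # Transitive closure of the relation (Warshall's algorithm)
    for k in range(n):
        for i in range(n):
            if pref[i][k]:
                for j in range(n):
                    if pref[k][j]:
                        pref[i][j] = True
    # Violation: i is revealed preferred to j, yet at observation j
    # bundle i was strictly cheaper than the bundle actually chosen
    return [
        (i, j)
        for i in range(n)
        for j in range(n)
        if pref[i][j] and cost[j][j] > cost[j][i]
    ]


# Two goods, two observations; numbers are made up for illustration.
prices = [(1, 2), (2, 1)]
consistent = [(2, 1), (1, 2)]    # each bundle unaffordable at the other's prices
inconsistent = [(1, 2), (2, 1)]  # each bundle chosen while the other was cheaper

print(garp_violations(prices, consistent))    # []
print(garp_violations(prices, inconsistent))  # [(0, 1), (1, 0)]
```

Under these assumptions, one could pool the observations of several households and treat an empty violation list as evidence that a single utility function can rationalize them, which is one way a segment might be defined without any demographic or geographic variables.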