Abstract
Many problems in marketing and economics require firms to make targeted consumer-specific decisions, but current estimation methods are not designed to scale to the size of modern data sets. In this article, the authors propose a new algorithm to close that gap. They develop a distributed Markov chain Monte Carlo (MCMC) algorithm for estimating Bayesian hierarchical models when the number of consumers is very large and the objects of interest are the consumer-level parameters. The two-stage and embarrassingly parallel algorithm is asymptotically unbiased in the number of consumers, retains the flexibility of a standard MCMC algorithm, and is easy to implement. The authors show that the distributed MCMC algorithm is faster and more efficient than a single-machine algorithm by at least an order of magnitude. They illustrate the approach with simulations with up to 100 million consumers, and with data on 1,088,310 donors to a charitable organization. The algorithm enables an increase of between $1.6 million and $4.6 million in additional donations when applied to a large modern-size data set compared with a typical-size data set.
Get full access to this article
View all access options for this article.
References
Supplementary Material
Please find the following supplemental material available below.
For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.
For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.
