What if a data analyst is trying to predict the number of customers that go to a store? Or how much water a new city will use in the next fiscal year? There are so many ways in which the daily number of consumers/customers change based on several variables, it is hard to get an accurate number. What most analysts do is something called a Monte Carlo simulation, or averaging thousands of iterations worth of variables. I will show you how I used a Monte Carlo simulation with predicting the number of Buckethead fans around the world!
To start, I needed some information on how many fans are going to each of Buckethead's shows during his 2019 tour. Below is a screenshot of what you might find on Buckethead's "Tour" page, with a few extra columns.
The capacity variable is the venue capacity for the respective concert location, and the random distribution variable is a random number generator (between 0.85 and 1) that represents that venue's potential capacity during one of Buckethead's shows. We cannot assume that there are enough Buckethead fans to max out every venue's capacity, and some people go to multiple shows, so no venue will have a perfect 100% attendance of fans.
The colors are based on what the US Census Bureau designates each United States region: Midwest, South, and Eastern United States. Using the sumproduct feature in Excel, I can calculate the total number of concert attendees multiplied by the distribution variable for each region of the US (with the exception of the West, given that he hasn't toured there yet :( oh well.) The results are in the colored cells in the image below.
The Buckethead Disciple managed to get survey responses from over 800 volunteers that revealed where they live in the world. Based on the percentage of survey respondents, he could determine the rate of participation based on region. If the values in the colored cells are multiplied by those rates, then it is possible to find out the number of fans in each region (with the exception of Africa, because only one African survey respondent participated in the survey.) Here is where Monte Carlo comes in:
Remember that random distribution variable? The numbers in those cells change each time a key is pressed in Microsoft Excel. So the number randomly fluctuates from 0.85 to 1. So in some iterations, Austin, Texas will have full capacity, and other iterations it will have only 95%, or 92%, or 85%. If we can record, say, 10,000 iterations-worth of distributions, then averaging them will be a good estimate of what the actual number will be.
Using a data table operation, I replicated 10,000 iterations of random fluctuations in the distribution variable. This is what it looked like below.
A graph that charts the "Total" column is below.
Notice the fluctuations with an overlay of the "average" dashed, black line. Averaging each of these values gives us the final estimated value: 359,254 worldwide.
I put the population of each of these regions just to put in perspective how many Buckethead fans and Non-Buckethead fans there are. That is how I got the pie chart for the video:
So that's Monte Carlo simulations, everyone!
Comments