Data Mining and Buckethead: Don't Judge a Pike by its Cover

Updated: Apr 8, 2019

Have you ever been curious to know what kinds of music the Buckethead community likes most? Have you ever wondered if people are ever thrown off by Buckethead's skronky or experimental music? Is there a way to predict how much people will like an album based on its characteristics (and how many sales Big B will make from that album)? There is a way to answer all of these questions, using something called data mining!


Data mining is a fancy way to discover relationships in huge data sets by breaking them down into a few variables. Some friends of the Buckethead Disciple helped me run some machine learning algorithms on some data from Bucketheadpikes.com.


Here is what the finished data looked like:

'Sales' was extracted using a VBA function in Excel that downloaded the total number of supporters from each individual pike page. Song rating was determined by the number of fan views on YouTube for each preview song, rated from 1 (very few views) to 4 (the most views); i.e., a pike whose preview song was among the most viewed on YouTube received a 4.
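For the curious, here is a rough sketch of how a 1-to-4 song rating could be built from view counts. It assumes the rating is a simple quartile split (the bottom quarter of songs gets a 1, the top quarter gets a 4). The function name and the view counts are made up for illustration; this is not the actual spreadsheet logic.

```python
def quartile_ratings(view_counts):
    """Rate each view count from 1 (fewest views) to 4 (most views) by quartile."""
    n = len(view_counts)
    # Indices of the songs, ordered from fewest views to most views.
    order = sorted(range(n), key=lambda i: view_counts[i])
    ratings = [0] * n
    for rank, idx in enumerate(order):
        # rank runs from 0 to n-1; split that range into four equal bands.
        ratings[idx] = rank * 4 // n + 1
    return ratings


# Hypothetical YouTube view counts for eight preview songs (not real data).
views = [120, 45000, 900, 3100, 15, 78000, 2200, 640]
print(quartile_ratings(views))  # [1, 4, 2, 3, 1, 4, 3, 2]
```

Ties and uneven group sizes would need a rule of their own; this sketch just goes by sort order.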

Every pike up to 275 ("Dreamthread") and its characteristics were included in this data set. Genre was determined by the overall pike genre, and season was extracted from the Buckethead discography Wikipedia website.


The concert column was whether or not Buckethead had played a song from that pike live in concert (1 for yes, 0 for no).


So what is this "preview song" thing? The preview song is always the first song in a pike that is available for a listen when you open up the pike's page on bucketheadpikes.com. Here is an example:

This is what you see if you click on the Forneau Cosmique pike. The preview song is the album namesake, "Forneau Cosmique." If you haven't purchased the pike yet, then you get to hear only the first song of the pike, i.e. the preview song. On top of the song rating, we also flagged whether each pike's preview song is a favorite in the Buckethead community: 1 if it is, 0 if it isn't. What's the difference between a 1 and a 0 then? Have you ever heard Buckethead's song "Ricochet"? Do you know what pike it is from? I didn't think so. That's why it's a 0. But you probably HAVE heard of "Lebrontron," and you probably know that it comes from the pike It's Alive. Therefore, we coded "Lebrontron" as a 1.


Anyhoo... let's get down to some data mining! Here are the models we ran.

Description: the kind of model we ran. RMSE: Root Mean Square Error; the lower the number, the more reliable the model. R2: R Square; the closer the number is to 1, the more predictive power the model has. The model that gives us the highest R2 will tell us the juiciest goss.
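If you want to see what those two numbers actually measure, here is a small sketch that computes RMSE and R2 from scratch using their standard definitions. Only the first actual/predicted pair (849 vs 840) comes from this article; the other sales figures are made up so the example has something to chew on.

```python
import math


def rmse(actual, predicted):
    """Root Mean Square Error: the typical size of a prediction miss."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(actual, predicted):
    """R Square: share of the variation in actual sales that the model explains."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # unexplained variation
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)           # total variation
    return 1 - ss_res / ss_tot


# First pair is from the article; the rest are hypothetical.
actual = [849, 600, 550, 700]
predicted = [840, 620, 560, 680]

print(round(rmse(actual, predicted), 2))
print(round(r_squared(actual, predicted), 3))
```

A perfect model would score an RMSE of 0 and an R2 of 1.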

The ANN (Artificial Neural Network) Model gave us our best results. Cool beans! Uhhh... what does that mean? I'll show you!

Artificial Neural Network model with 3 singular TanH nodes, 1 singular linear node, 1 secondary TanH node, and 1 secondary linear node.

ANN models are cool because the variables associated with each pike are passed through a series of hidden nodes. Each node weights its inputs, adds a bias term, and applies an activation function (such as TanH), which lets the model pick up on patterns a straight line would miss and makes the predicted pike sales a little more realistic. For example, say a pike was released in 2011, was pike #1, hard rock genre, has a song Buckethead has played live, was released in spring, and has a good preview song whose song rating is a 4/4. ANN will pass those characteristics through its nodes and think, "Oh, this album probably sold really well, my best guess is that it should have sold about 840 copies." Turns out that pike, which is actually pike 1 (It's Alive), sold 849 digital copies, which is really close to what ANN predicted! Our model was off by only 9 sales! If we pass every pike's characteristics through this model, we can teach ANN how to predict all album sales.
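To make the "passing characteristics through nodes" idea a bit more concrete, here is a stripped-down sketch of a forward pass. It is simplified to a single hidden layer with 3 TanH nodes and 1 linear node, so it is not the exact model we ran. The function name, the feature order, and every weight below are invented for illustration; the real weights are learned from the data during training.

```python
import math


def predict_sales(features, hidden_weights, hidden_biases, output_weights, output_bias):
    """Forward pass: 3 TanH hidden nodes + 1 linear hidden node -> one linear output."""
    hidden = []
    for node, (weights, bias) in enumerate(zip(hidden_weights, hidden_biases)):
        # Weighted sum of the inputs plus this node's bias term.
        z = bias + sum(w * x for w, x in zip(weights, features))
        # The first three nodes squash z through TanH; the fourth passes it through as-is.
        hidden.append(math.tanh(z) if node < 3 else z)
    # The output node is a plain weighted sum of the hidden node values.
    return output_bias + sum(w * h for w, h in zip(output_weights, hidden))


# Hypothetical features: [hard_rock, played_live, spring_release, song_rating]
pike = [1, 1, 1, 4]

# Made-up weights, NOT the fitted model.
hidden_weights = [
    [0.5, 0.1, 0.0, 0.3],
    [-0.2, 0.0, 0.1, 0.4],
    [0.3, -0.1, 0.0, 0.2],
    [0.1, 0.0, 0.0, 0.5],
]
hidden_biases = [0.0, -0.5, 0.1, 0.0]
output_weights = [60.0, 40.0, 30.0, 50.0]
output_bias = 600.0

print(predict_sales(pike, hidden_weights, hidden_biases, output_weights, output_bias))
```

Training is the process of nudging all of those weights and biases until the predictions line up with the real sales as closely as possible.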


Here is a chart that shows what ANN predicted vs what the actual pike sales were. Remember, an R2 that's close to 1 means our model is very reliable.

An R2 of .755 is very good for data like this. It means ANN was able to explain about 75.5% of the variation in Y (or how and why pike sales differ).

With this chart, ANN can also tell us something super interesting. Turns out that it doesn't matter what season a pike was released in; pikes sell the same no matter what. It also doesn't matter if Buckethead plays a song in concert; the associated pike doesn't sell any better. However, ANN does tell us some important information about genre and preview songs. It is all summed up in this table:


If you are familiar with multiple linear regression and variable coefficients, these values are basically the Y intercept with 5 coefficients.

ANN was able to tell us that the average Buckethead pike will sell between 575 and 630 digital copies. However, if that pike is an Easy Rock genre, then Buckethead will get between 55 and 108 extra sales! Unfortunately, if that pike is an Experimental genre (like the Halloween Pikes), then it will sell between 50 and 80 fewer digital copies!


Notice how having a fan favorite preview song increased pike sales by between 67 and 80 copies, and that for every extra quartile increase (think of it as another star in a 4-star rating), Big B will sell between 22 and 36 extra pikes! That's up to 144 extra sales (4 × 36) for just one 4-star song!
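Here is a back-of-the-envelope sketch of how those ranges stack up for a given pike. It assumes the effects simply add together like regression coefficients, which is a rough reading of a neural network's output, and it just adds the low ends and the high ends separately. The numbers are the ones quoted above; the names are made up for this example.

```python
# (low, high) effects on digital sales, as quoted in this post.
BASELINE = (575, 630)
EFFECTS = {
    "easy_rock": (55, 108),
    "experimental": (-80, -50),
    "fan_favorite_preview": (67, 80),
}
RATING_PER_STAR = (22, 36)


def sales_range(traits, song_rating):
    """Add up the low ends and the high ends to get a rough sales range."""
    low, high = BASELINE
    for trait in traits:
        trait_low, trait_high = EFFECTS[trait]
        low += trait_low
        high += trait_high
    # Each star of song rating adds between 22 and 36 sales.
    low += RATING_PER_STAR[0] * song_rating
    high += RATING_PER_STAR[1] * song_rating
    return low, high


# An Easy Rock pike with a fan favorite, 4-star preview song.
print(sales_range(["easy_rock", "fan_favorite_preview"], 4))  # (785, 962)

# An Experimental pike with a 1-star preview song.
print(sales_range(["experimental"], 1))  # (517, 616)
```

Notice that the 4-star rating alone accounts for 88 to 144 of those sales, which is where the 144 figure comes from.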


If Buckethead wanted to max out his pike sales, then he might consider releasing a Hard Rock or Easy Rock genre pike with a solid preview song and avoiding the experimentation. Nevertheless, I think Buckethead does what he wants and feels, so my final conclusion is the following:


While Buckethead fans have a variety of preferences, we can conclude from this data that the majority prefer Easy or Hard Rock songs and pikes. Fans are also easily swayed by the first song they hear on an album and might be wary about listening to the rest of an album whose first song isn't their favorite.


This is just a hypothesis that is turning into a theory. Further testing is needed, which I totally intend to expound upon in the future. ;)



