Wednesday, 30 December 2015

Amazon Top 100 (2013 and 2014) Retention Analysis


As regular readers of this blog know, I have kept data on the Amazon Top 100 list of books for the years 2013 and 2014, and have written a number of blogs in which I analyzed that data. It will soon be time to update that database with the most popular new books of 2015. But before doing that, I thought it would be interesting to see just how the Top 100 of 2013 and 2014 did during the year 2015. How did their rankings change? How did their review numbers change? Which books held their rankings the best, by such important factors as genre, early ranking, writer sex, writer age, price and so on?
There are several ways to look at such data. The first and most obvious is via descriptive statistical analysis – i.e. just looking at measures such as average rank by category. Beyond that, more advanced techniques, such as logistic regression, can be used to determine the independent effect of each of these categories. That can help to predict the types of books that best hold their rankings and reviews (and therefore, sales) over time.


This blog will focus on the descriptive statistics, by looking at the average rank of these books, during each month in 2015. The graph above is an example. For the 2013 and 2014 Top 100 lists, the average rank by month in 2015 is given by the height of the bars. Higher average ranks are, of course, not desirable. In these graphs, lower numbers are better, just like in golf.
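For readers who want to try this kind of summary themselves, here is a minimal Python sketch of "average rank by month, by group". It is not the code used for this blog, and the observations in it are made up purely for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical observations: (top100_year, month_of_2015, sales_rank).
# These values are invented for illustration; they are not the blog's data.
observations = [
    (2013, 1, 5000), (2013, 1, 7000),
    (2013, 2, 6500), (2013, 2, 7500),
    (2014, 1, 1500), (2014, 1, 2500),
    (2014, 2, 2000), (2014, 2, 3000),
]

def average_rank_by_month(rows):
    """Return {(top100_year, month): mean rank} for the given observations."""
    groups = defaultdict(list)
    for year, month, rank in rows:
        groups[(year, month)].append(rank)
    return {key: mean(ranks) for key, ranks in groups.items()}

averages = average_rank_by_month(observations)
for (year, month), avg in sorted(averages.items()):
    print(f"Top 100 year {year}, month {month}: average rank {avg:.0f}")
```

The same grouping works for any of the categories discussed later (genre, publisher type, price range and so on) by swapping the first field of each observation.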
The books in the combined 2013 and 2014 Amazon Top 100 lists fell from an average rank of about 4000 in early 2015 to about 9000 by mid-December. One could fit a functional form to the data, but by eye it appears to be quasi-linear, perhaps a gently sloped power law, over the 12-month period.
It is hard to be sure what that represents in terms of reduced sales or income, however, as the relationship between rank and sales is not linear. Furthermore, due to differences in pricing, the relationship between units sold and money earned is not straightforward either.
We can try to estimate the drop in sales via reviews. It turns out that this set of books “earned” about 3.3 reviews per day per book in the early part of 2015 versus about 1.2 per day in the last period. If we assume that a relatively constant percentage of purchasers review the books they buy, that would indicate that sales of these books had declined to a bit over one-third of their early 2015 level by the end of 2015. Note that in their “Top 100” year, these books averaged about 9.4 reviews per book per day. So, by the end of 2015, they were probably selling about one-eighth as many books as they were during their initial publishing year. This is shown in the graph below, along with data for each of the two Top 100 years, and best fit exponential decay curves.
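The arithmetic behind those fractions can be checked with a few lines of Python. This is only a sketch: the three review rates are the ones quoted above, and the function name `relative_sales` is mine, introduced for illustration. It rests entirely on the assumption that a constant share of purchasers leave a review.

```python
# Review rates quoted in the text (reviews per book per day).
top100_year_rate = 9.4   # during the books' Top 100 year
early_2015_rate = 3.3    # early part of 2015
late_2015_rate = 1.2     # last period of 2015

def relative_sales(later_rate, earlier_rate):
    """Sales ratio implied by two review rates.

    ASSUMPTION: a constant share of purchasers leave a review, so the
    ratio of review rates stands in for the ratio of sales.
    """
    return later_rate / earlier_rate

vs_early_2015 = relative_sales(late_2015_rate, early_2015_rate)
vs_top100_year = relative_sales(late_2015_rate, top100_year_rate)

print(f"Late 2015 vs early 2015: {vs_early_2015:.2f}")
print(f"Late 2015 vs Top 100 year: {vs_top100_year:.2f}")
```

The first ratio comes out at roughly 0.36 (a bit over one-third) and the second at roughly 0.13 (about one-eighth).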

At any rate, this particular blog is more interested in comparing how well different categories of books did over time, rather than estimating sales figures (though I might try that in a later blog).


Overall Ranks in 2015, by “Top 100” Year

As you can see, the more recent books from 2014 held their ranks during 2015 better than the books from the 2013 Top 100 list. In both cases, though, the average rank of the books drifted upwards, throughout the year. The 2013 books started 2015 with an average rank of about 6000 at the end of January, and finished at about 12000 in mid-December. The 2014 books started the year at about rank 2000 on average, and ended at about 6000. Remember that these books were in the Top 100 lists in their respective years. However, the Top 100 lists were constructed relative to books published that year, and the ranks in 2015 were against all years, so the declines look worse than they really were.


Ranks in 2015, by Rank Quartile in Top 100 Year

The second set of graphs shows how book ranks changed in 2015, based on each book's initial ranking in its Top 100 year. The first category, labelled 1, represents books that were in the first quartile of ranks in their initial year (i.e. the top 25%). The group of books labelled 2 was in the second quartile, and so on.
It is clear that books that were in the top quartile in their publishing year managed to hold their rank the best, and books in the bottom quartile did the worst, in that regard. That's not too surprising. There wasn't much difference in the two middle quartiles, though. So, it appears that the readers don't distinguish between books in the middle ranks all that much.
 
Looking at the data by year, the pattern repeats itself, or at least approximately so. Books that were in Quartile 1 during their publication year held their ranks the best, while those in Quartile 4 did the worst. Quartiles 2 and 3 reversed between the 2013 and 2014 list, however.
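As a small illustration of the quartile grouping described above, here is a sketch in Python. It assumes the quartiles are simply equal-sized slices of the list by position (1 to 25, 26 to 50, and so on); the function name `rank_quartile` is hypothetical.

```python
def rank_quartile(rank, list_size=100):
    """Map a Top 100 position (1 = best) to a quartile from 1 to 4.

    ASSUMPTION: quartiles are equal-sized slices of the list by position.
    """
    if not 1 <= rank <= list_size:
        raise ValueError(f"rank must be between 1 and {list_size}")
    # With list_size=100: positions 1-25 -> 1, 26-50 -> 2, 51-75 -> 3, 76-100 -> 4.
    return (rank - 1) * 4 // list_size + 1

for position in (1, 25, 26, 50, 51, 75, 76, 100):
    print(position, rank_quartile(position))
```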

Ranks in 2015, by Sex of Writer

The third group of graphs gives book ranks during 2015, by the gender of the writer. It appears that gender didn't make much difference at the start of the year – both female and male writers were averaging about rank 4000. But as the year went on, books by females lost ground in the rankings more quickly than books by males, so that there was a substantial difference by year end.

As we will see later on, much of that is probably a reflection of the genre that the sexes tend to write in. Romances lost their rankings more quickly than other genres, and since women tend to write in the romance genre, their rankings suffered accordingly as 2015 progressed.
In this case, breaking out the data by year did reveal some differences. Among the 2013 Top 100 books, there was little difference between males and females in their 2015 ranks. However, the 2014 Top 100 books indicated an advantage for male writers. With this amount of data, we can't tell whether the male-female difference is real but short-lived, or whether it is a quirk of the datasets.

Ranks in 2015, by Educational Status of Writer

The graph of Rank by Writer Education is a bit counter-intuitive. At the start of the year, books by writers with graduate degrees held their rankings the best, followed by those with some university, then high school, then Bachelor's degree and Unknown. By the end of the year, it was writers with “some university” who held their rankings the best, though.
 

If we collapse these categories into “No degree or unknown status” versus “Has a degree”, things change somewhat. I collapsed the categories in that fashion on the assumption that writers who weren't keen on disclosing their educational status probably didn't have university degrees. But that could be wrong.
Using this re-categorization, the degree holders did somewhat better than the non-degree holders, though the difference was not all that great. Basically, they did better in the middle months of the year, but about the same at the beginning and end of the year. It seems fair to say that there is no clear trend evident. Breaking out the data by year (not shown) also shows no clear trend – in 2013 non-degreed writers seemed to do slightly better, while in 2014 the reverse was true.
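The re-categorization itself is a simple mapping. Here is a sketch in Python; the category labels follow the wording in this post, but their exact spelling in the underlying dataset is an assumption, as is the function name `collapse_education`.

```python
# Labels follow the text of the post; the exact spellings used in the
# underlying dataset are an assumption.
DEGREE_LEVELS = {"Bachelor's degree", "Graduate degree"}

def collapse_education(level):
    """Collapse detailed education levels into two groups.

    "Unknown" is treated as "no degree", mirroring the assumption made in
    the text (which, as noted there, could be wrong).
    """
    if level in DEGREE_LEVELS:
        return "Has a degree"
    return "No degree or unknown status"

for level in ("High school", "Some university", "Bachelor's degree",
              "Graduate degree", "Unknown"):
    print(f"{level} -> {collapse_education(level)}")
```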

Looking at the subject that the writer studied and/or worked in (besides writing), we see that the traditional subjects of English/History/Journalism and Law were most successful at holding their ranks through 2015.

Ranks in 2015, by Age Range of Writer

In this case, a clear trend was evident, in favour of older, more established writers. Generally speaking, the older the writer, the better the books held their rankings. This was probably a reflection of the older writers' longer tenure, and thus more established reputation with readers.

The exception was the first age group, which did somewhat better than the second. I should note that the difficulty that writers in the 35-44 age group had in holding their rank was probably related to genre – this tends to be the age group that writes a lot of Romance books, which don't hold their rank as well as other genres.
Looking at the data by year (not shown here) revealed a similar trend in both years, whereby older writers held their ranks better than younger writers.

Ranks in 2015, by Publisher Type

This graph also shows a very clear trend. Books published by Indie writer/publishers started off 2015 with much higher ranks, and lost ground from that point. Books published by the Big 5 publishers (BPH on the graph) did better, though not great. It was books that were published by the smaller traditional publishers that performed best, in terms of holding their ranks and starting off 2015 at a fairly desirable rank.
 
This again was at least partially a reflection of genre, since Indies are largely found in the Romance genre. However, the extra marketing push of traditional publishing might also be playing a role.
Looking at the data by Top 100 year shows that this effect was very similar for both sets of books. For the Indie books in the 2013 dataset, though, we see that the 2015 average ranks have not seen a clear trend during the year – they have more or less stabilized in the 10,000 to 15,000 range, though with a fair bit of variance.

Ranks in 2015, by Publishing Month

This graph was very interesting, though probably no surprise to anyone with experience in the traditional publishing industry. Clearly, you want to be published in the 11th month, November. Those books started off with very good ranks and held their position. Books published in September also did fairly well. But books published in October were clobbered. That appears to be the no-man's-land of publishing, at least in this dataset.
I imagine that the November effect is related to the most popular and established writers being published in that month, timed carefully to benefit from Christmas gift book buying. It would appear that October books are too far from Christmas to hit that sweet spot. As for September, it seems likely that is a “return-to-school” effect. March also seemed to be a good month, perhaps a “nearing-end-of-term” effect.
This result held true for both the 2013 and 2014 Top 100 lists (graph not shown).
 

Ranks in 2015, by Original Price Range

This graph shows how well books held their rank in 2015, by the price range that they were originally published at. Those ranges were Low = under $4, Moderate = $4 to $7.99, High = $8 and up.
As you can see, the high priced books started 2015 at a lower rank, on average, and held the lower rank better than the other groups. The moderately priced books were next, though by the end of the year there was little difference between them and the high priced books. Low priced books entered the year with the least desirable rankings, and got worse from there.
This effect was also similar for the two years.
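For anyone wanting to reproduce the price grouping, a minimal Python sketch follows. The cut-offs are the ones given above; the function name `price_range` is hypothetical.

```python
def price_range(price):
    """Classify an original list price (in dollars) into the three ranges
    given in the text: Low under $4, Moderate $4 to $7.99, High $8 and up."""
    if price < 0:
        raise ValueError("price cannot be negative")
    if price < 4.00:
        return "Low"
    if price < 8.00:
        return "Moderate"
    return "High"

for price in (0.99, 3.99, 4.00, 7.99, 8.00, 14.99):
    print(f"${price:.2f} -> {price_range(price)}")
```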


Ranks in 2015, by Ebook vs Pbook Price in 2015

During 2015, traditional publishers began increasing book prices, and notably often priced ebooks higher than pbooks (print books). This is an effort to maintain the print book market, and the print book stores that sell those books. Traditional publishers can thereby use their advantage in getting into the big print book stores as a selling point to both readers and writers.
The graph below shows how that worked out, in terms of holding rankings during 2015. One can see that books where the ebook was priced higher than the pbook lost ground in the latter part of 2015, about when that pricing strategy took hold. So, it appears to have hurt those books. The other books, where the ebooks were priced lower than the pbooks, held their rankings better. The “NA” books are those that were only available as ebooks.
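The grouping used here could be sketched in Python as follows. The function name and the group labels other than “NA” are hypothetical, and the post does not say how books with identical ebook and pbook prices were handled, so the last branch is a placeholder.

```python
def price_comparison(ebook_price, pbook_price):
    """Label a book by how its ebook price compares to its print price.

    pbook_price is None when the book was only available as an ebook,
    which corresponds to the "NA" group in the text.
    """
    if pbook_price is None:
        return "NA"
    if ebook_price > pbook_price:
        return "Ebook higher"
    if ebook_price < pbook_price:
        return "Ebook lower"
    # The text does not say how equal prices were treated; this label is
    # a hypothetical placeholder.
    return "Same price"

print(price_comparison(12.99, 9.99))   # ebook priced above the print edition
print(price_comparison(4.99, 11.99))   # ebook priced below the print edition
print(price_comparison(2.99, None))    # ebook only
```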

 
The graph with data split out by Top 100 year shows how this effect was more pronounced in the 2014 set of books, so recency seemed to play a role in this. Note the big jump in ranks for the 2014 books whose ebook was priced higher than the pbook, beginning in Sept 2015.


 

Ranks in 2015, by Genre

This graph was also very interesting. It is clear that Romance books had the shortest shelf life. They tended to start 2015 at the highest ranks and lost rank from there. Next were the “Other” books, a mix of hard-to-categorize fiction and non-fiction. Thriller/Suspense/Crime started off at about the same point as Literary Fiction, but lost more ground as the year progressed. Interestingly, it was Science Fiction and Fantasy that started the year with the lowest ranks and lost the least ground as the year progressed.
The graph with 2013 and 2014 books broken out separately shows that this trend was very similar across both years, though in the 2014 set, Science Fiction held its rank better than Literary Fiction.


To summarize the ability of books to hold their sales rank over time:
  • Newer books did better (2014 publishing vs 2013).
  • Books that were originally better ranked did better.
  • Books by male writers did better, but the effect was fairly small.
  • Books by writers with university degrees did better, but the effect was small.
  • Books by older writers did better.
  • Books by traditional publishers did better.
  • Books published in November did much better, books published in October did much worse.
  • Books that were originally high priced did better.
  • Pricing a book's ebook version higher than its pbook version seemed to hurt its ranking.
  • Romances had the shortest shelf life, Literary Fiction and Science Fiction the longest.
In later blogs, I intend to look at how reviews held up, and also do some multivariate analysis, to see what the most important predictors of a long shelf life were.

=========================================================
After all these stats, you might want to read something less quantitative. So, try a road trip through North America in an 18 wheeler, with “On the Road with Bronco Billy”:




 
Or even better, try a spaceship and planet-side road trip (escaping from slavers), with our gal Kati of Terra:










