Wednesday, 30 December 2015

Amazon Top 100 (2013 and 2014) Retention Analysis

As regular readers of this blog know, I have kept data on the Amazon Top 100 list of books, for the years 2013 and 2014 and have written a number of blogs in which I analyzed that data. It will soon be time to update that database with the most popular new books of 2015. But before doing that, I thought it would be interesting to see just how the Top 100 of 2013 and 2014 did during the year 2015. How did their rankings change? How did their review numbers change? Which books held their rankings the best, by such important factors as genre, early ranking, writer sex, writer age, price and so on?
There are several ways to look at such data. The first and most obvious is via descriptive statistical analysis – i.e. just looking at measures such as average rank by category. Beyond that, more advanced techniques, such as logistic regression, can be used to determine the independent effect of each of these categories. That can help to predict the types of books that best hold their rankings and reviews (and therefore, sales) over time.

This blog will focus on the descriptive statistics, by looking at the average rank of these books, during each month in 2015. The graph above is an example. For the 2013 and 2014 Top 100 lists, the average rank by month in 2015 is given by the height of the bars. Higher average ranks are, of course, not desirable. In these graphs, lower numbers are better, just like in golf.
The books in the combined 2013 and 2014 Amazon Top 100 lists fell from an average rank of about 4000 in early 2015 to about 9000 by mid-December. One could fit a functional form to the data, but by eye it appears to be quasi-linear, or perhaps a gently sloped power law, over the 12 month period.
It is hard to be sure what that represents in terms of reduced sales or income, however, as the relationship between rank and sales is not linear. Furthermore, due to differences in pricing, the relationship between units sold and money earned is not straightforward either.
We can try to estimate the drop in sales via reviews. It turns out that this set of books “earned” about 3.3 reviews per day per book in the early part of 2015 versus about 1.2 per day in the last period. If we assume that a relatively constant percentage of purchasers review the books they buy, that would indicate that sales of these books had declined to a bit over one-third of their early 2015 level by the end of 2015. Note that in their “Top 100” year, these books averaged about 9.4 reviews per book per day. So, by the end of 2015, they were probably selling about one-eighth as many books as they were during their initial publishing year. This is shown in the graph below, along with data for each of the two Top 100 years, and best fit exponential decay curves.
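For readers who want to check the arithmetic, here is a minimal Python sketch of the review-rate comparison. It uses only the three rates quoted above; the function name `relative_sales` is my own label for illustration, and the whole thing rests on the same assumption as the text, namely that a roughly constant share of purchasers leave a review.

```python
# Review rates quoted in the post (reviews per book per day).
TOP_100_YEAR_RATE = 9.4   # during each book's "Top 100" year
EARLY_2015_RATE = 3.3     # early part of 2015
LATE_2015_RATE = 1.2      # last period of 2015


def relative_sales(later_rate: float, earlier_rate: float) -> float:
    """Ratio of review rates, used as a rough proxy for the ratio of sales.

    Assumes a roughly constant share of purchasers leave a review.
    """
    return later_rate / earlier_rate


within_2015 = relative_sales(LATE_2015_RATE, EARLY_2015_RATE)
since_top_100 = relative_sales(LATE_2015_RATE, TOP_100_YEAR_RATE)

print(f"Late 2015 vs early 2015:   {within_2015:.2f}")
print(f"Late 2015 vs Top 100 year: {since_top_100:.2f}")
```

The first ratio comes out a bit over one-third and the second close to one-eighth, which matches the figures in the paragraph.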

At any rate, this particular blog is more interested in comparing how well different categories of books did over time, rather than estimating sales figures (though I might try that in a later blog).

Overall Ranks in 2015, by “Top 100” Year

As you can see, the more recent books from 2014 held their ranks during 2015 better than the books from the 2013 Top 100 list. In both cases, though, the average rank of the books drifted upwards, throughout the year. The 2013 books started 2015 with an average rank of about 6000 at the end of January, and finished at about 12000 in mid-December. The 2014 books started the year at about rank 2000 on average, and ended at about 6000. Remember that these books were in the Top 100 lists in their respective years. However, the Top 100 lists were constructed relative to books published that year, and the ranks in 2015 were against all years, so the declines look worse than they really were.

Ranks in 2015, by Rank Quartile in Top 100 Year

The second set of graphs shows how book ranks changed in 2015, based on the books' initial ranking in the Top 100 year. The first category, labelled 1, represents books that were in the first quartile of ranks in their initial year (i.e. the top 25%). The group of books labelled 2 was in the second quartile, and so on.
It is clear that books that were in the top quartile in their publishing year managed to hold their rank the best, and books in the bottom quartile did the worst, in that regard. That's not too surprising. There wasn't much difference in the two middle quartiles, though. So, it appears that the readers don't distinguish between books in the middle ranks all that much.
Looking at the data by year, the pattern repeats itself, or at least approximately so. Books that were in Quartile 1 during their publication year held their ranks the best, while those in Quartile 4 did the worst. Quartiles 2 and 3 reversed between the 2013 and 2014 list, however.
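The quartile grouping described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the procedure actually used for the graphs: the function names are hypothetical, and the sample values are made up purely to exercise the code.

```python
from collections import defaultdict
from statistics import mean


def quartile(position: int, list_size: int = 100) -> int:
    """Map a Top 100 list position (1 = best) to a quartile from 1 to 4."""
    if not 1 <= position <= list_size:
        raise ValueError("position must be between 1 and list_size")
    return (position - 1) * 4 // list_size + 1


def mean_rank_by_quartile(books):
    """Average the later rank within each quartile.

    `books` is an iterable of (top_100_position, rank_in_2015) pairs.
    """
    groups = defaultdict(list)
    for position, later_rank in books:
        groups[quartile(position)].append(later_rank)
    return {q: mean(ranks) for q, ranks in sorted(groups.items())}


# Made-up values purely to exercise the functions; NOT the blog's data.
toy_books = [(3, 1500), (20, 2500), (40, 5000), (60, 5200), (90, 11000)]
print(mean_rank_by_quartile(toy_books))
```

Positions 1 to 25 land in quartile 1, 26 to 50 in quartile 2, and so on, so a lower average in the output means that group held its rank better.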

Ranks in 2015, by Sex of Writer

The third group of graphs gives book ranks during 2015, by the gender of the writer. It appears that gender didn't make much difference at the start of the year – both female and male writers were averaging about rank 4000. But as the year went on, books by females lost ground in the rankings more quickly than books by males, so that there was a substantial difference by year end.

As we will see later on, much of that is probably a reflection of the genre that the sexes tend to write in. Romances lost their rankings more quickly than other genres, and since women tend to write in the romance genre, their rankings suffered accordingly as 2015 progressed.
In this case, breaking out the data by year did reveal some differences. In the 2013 Top 100 books, there was little difference between males and females, in the ranks by 2015. However, the 2014 Top 100 books indicated an advantage for male writers. With this amount of data, we can't tell whether the male-female difference is real, but short lived, or whether it is a quirk of the datasets.

Ranks in 2015, by Educational Status of Writer

The graph of Rank by Writer Education is a bit counter-intuitive. At the start of the year, books by writers with graduate degrees held their rankings the best, followed by those with some university, then high school, then Bachelor's degree and Unknown. By the end of the year, it was writers with “some university” who held their rankings the best, though.

If we collapse these categories into “No degree or unknown status” versus “Has a degree”, things change somewhat. I collapsed those categories in that fashion, on the assumption that writers who weren't keen on disclosing their educational status, probably didn't have university degrees. But that could be wrong.
Using this re-categorization, the degree holders did somewhat better than the non-degree holders, though the difference was not all that great. Basically, they did better in the middle months of the year, but about the same at the beginning and end of the year. It seems fair to say that there is no clear trend evident. Breaking out the data by year (not shown) also shows no clear trend – in 2013 non-degreed writers seemed to do slightly better, while in 2014 the reverse was true.

Looking at the subject that the writer studied and/or worked in (besides writing), we see that the traditional subjects of English/History/Journalism and Law were most successful at holding their ranks through 2015.

Ranks in 2015, by Age Range of Writer

In this case, a clear trend was evident, in favour of older, more established writers. Generally speaking, the older the writer, the better the books held their rankings. This was probably a reflection of the older writers' longer tenure, and thus more established reputation with readers.

The exception was the first age group, which did somewhat better than the second. I should note that the difficulty that writers in the 35-44 age group had in holding their rank was probably related to genre – this tends to be the age group that writes a lot of Romance books, which don't hold their rank as well as other genres.
Looking at the data by year (not shown here) revealed a similar trend in both years, whereby older writers held their ranks better than younger writers.

Ranks in 2015, by Publisher Type

This graph also shows a very clear trend. Books published by Indie writer/publishers started off 2015 with much higher ranks, and lost ground from that point. Books published by the Big 5 publishers (BPH on the graph) did better, though not great. It was books that were published by the smaller traditional publishers that performed best, in terms of holding their ranks and starting off 2015 at a fairly desirable rank.
This again was at least partially a reflection of genre, since Indies are largely found in the Romance genre. However, the extra marketing push of traditional publishing might also be playing a role.
Looking at the data by Top 100 year shows that this effect was very similar for both sets of books. For the Indie books in the 2013 dataset, though, we see that the 2015 average ranks have not seen a clear trend during the year – they have more or less stabilized in the 10,000 to 15,000 range, though with a fair bit of variance.

Ranks in 2015, by Publishing Month

This graph was very interesting, though probably no surprise to anyone with experience in the traditional publishing industry. Clearly, you want to be published in the 11th month, November. Those books started off with very good ranks and held their position. Books published in September also did fairly well. But books published in October were clobbered. That appears to be the no-man's-land of publishing, at least in this dataset.
I imagine that the November effect is related to the most popular and established writers being published in that month, timed carefully to benefit from Christmas gift book buying. It would appear that October books are too far from Christmas to hit that sweet spot. As for September, it seems likely that is a “return-to-school” effect. March also seemed to be a good month, perhaps a “nearing-end-of-term” effect.
This result held true for both the 2013 and 2014 Top 100 lists (graph not shown).

Ranks in 2015, by Original Price Range

This graph shows how well books held their rank in 2015, by the price range that they were originally published at. Those ranges were Low = under $4, Moderate = $4 to $7.99, High = $8 and up.
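As a small illustration, the three price ranges could be coded along these lines. The function name `price_band` is hypothetical, and I am assuming that prices are in dollars and cents, so that "$4 to $7.99" means anything from $4.00 up to but not including $8.00.

```python
def price_band(original_price: float) -> str:
    """Classify an original list price (in dollars) into the three ranges."""
    if original_price < 0:
        raise ValueError("price cannot be negative")
    if original_price < 4.00:
        return "Low"        # under $4
    if original_price < 8.00:
        return "Moderate"   # $4 to $7.99
    return "High"           # $8 and up


for price in (0.99, 3.99, 4.00, 7.99, 8.00, 12.99):
    print(f"${price:.2f} -> {price_band(price)}")
```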
As you can see, the high priced books started 2015 at a lower rank, on average, and held the lower rank better than the other groups. The moderately priced books were next, though by the end of the year there was little difference between them and the high priced books. Low priced books entered the year with the least desirable rankings, and got worse from there.
This effect was also similar for the two years.

Ranks in 2015, by Ebook vs Pbook Price in 2015

During 2015, traditional publishers began increasing book prices, and notably often priced ebooks higher than pbooks (print books). This is an effort to maintain the print book market, and the print book stores that sell those books. Traditional publishers can thereby use their advantage in getting into the big print book stores as a selling point to both readers and writers.
The graph below shows how that worked out, in terms of holding rankings during 2015. One can see that books where the ebook was priced higher than the pbook lost ground in the latter part of 2015, about when that pricing strategy took hold. So, it definitely hurt those books. The other books, where the ebooks were priced lower than the pbooks held their rankings better. The “NA” books are those that were only available as ebooks.

The graph with data split out by Top 100 year shows how this effect was more pronounced in the 2014 set of books, so recency seemed to play a role in this. Note the big jump in ranks for the 2014 books whose ebook was priced higher than the pbook, beginning in Sept 2015.


Ranks in 2015, by Genre

This graph was also very interesting. It is clear that Romance books had the shortest shelf life. They tended to start 2015 at the highest ranks and lost rank from there. Next were the “Other” books, a mix of hard to categorize fiction and non-fiction. Thriller/Suspense/Crime started off at about the same point as Literary Fiction, but lost more ground as the year progressed. Interestingly, it was Science Fiction and Fantasy that started the year with the lowest ranks and lost the least ground as the year progressed.
The graph with 2013 and 2014 books broken out separately shows that this trend was very similar across both years, though in the 2014 set, Science Fiction held its rank better than Literary Fiction.

To summarize the ability of books to hold their sales rank over time:
  • Newer books did better (2014 publishing vs 2013).
  • Books that were originally better ranked did better.
  • Books by male writers did better, but the effect was fairly small.
  • Books by writers with university degrees did better, but the effect was small.
  • Books by older writers did better.
  • Books by traditional publishers did better.
  • Books published in November did much better, books published in October did much worse.
  • Books that were originally high priced did better.
  • Pricing a book's ebook version higher than its pbook version seemed to hurt its ranking.
  • Romances had the shortest shelf life, literary fiction and Science Fiction the longest.
In later blogs, I intend to look at how reviews held up, and also do some multivariate analysis, to see what the most important predictors of a long shelf life were.

After all these stats, you might want to read something less quantitative. So, try a road trip through North America in an 18 wheeler, with “On the Road with Bronco Billy”:

Or even better, try a spaceship and planet-side road trip (escaping from slavers), with our gal Kati of Terra:

Sunday, 27 December 2015

Girl on the Train - The Sales Bump from Goodreads #1 Readers’ Choice Award

Earlier this month (Dec 1, 2015), Goodreads (the social media site for book discussions, reviews, etc) announced the 2015 Goodreads Choice Awards. A couple of books that were chosen by Goodreads members as #1 in their categories happened to be books that I have been following closely, for much of this year. Those are Harper Lee’s “Go Set a Watchman” and Paula Hawkins’ “Girl on the Train”.
In a couple of earlier blogs, I looked at how this affected Harper Lee’s recently discovered (and published) novel “Go Set a Watchman”. In this blog, I will look at what has happened to the other book in question, The Girl on the Train. I have been following the ups and downs of this book on Amazon for most of the year. It’s been interesting to watch. At times it has started to fall off of its high perch, only to get new momentum and move back up the charts. It looks like the Goodreads award, and the approach of Christmas sales has helped it out, once again.

One additional feature of this analysis is the sub-analysis of sales and reviews in the various national markets in the Amazon English speaking world - the U.S., the U.K., Australia and Canada.
As you can see in the graph, there appears to be a fairly decent rankings improvement in U.S. sales, at about the same time as the Goodreads award (Dec 1), which is indicated by the different coloured point. The book was losing rank, dropping below 50th spot, until about that point, then it picked up again, to the 20 to 30 rank.

A similar effect seems to take place in the U.K. sales, though perhaps not so pronounced. However, the book had never fallen much past the mid-20’s there, so there was less room for improvement.

Sales in Australia were down, by mid-November, then picked up after that, by a considerable amount. It is hard to pin that on the Goodreads award, as the improvement began before December 1.

Like the Australian market, the Canadian market underwent some gyrations well before the Goodreads Award in early December. One could argue that there was some pickup after the award, but the case is not exactly cut and dried.

The evidence from reviews is marginal, at best. In the overall graph, we can see a substantial bump in the number of reviews early in December. However, as the graph shows, these spikes seem to be a regular feature of the Amazon review system, so it is debatable whether we can attribute that spike in reviews to the Goodreads award.

Looking at the U.S. and U.K. results separately, we see much the same pattern. There is a jump in reviews at the beginning of December, but these jumps happen a lot over the course of the year, so one can’t make too much of this particular increase.

When it comes to the two smaller markets, the situation is even more problematic. There simply aren’t enough reviews to even notice an increase from the Goodreads award. 

The presence of data for all four markets does allow us to make a few other observations of general interest. One relates to the size of the markets. In terms of reviews, the figures as of Dec 22, 2015 for Girl on the Train were:
  • U.S. = 28,204 reviews. The U.S. population is about 322.5 million.
  • U.K. = 8,950 reviews. The U.K. population is about 64.8 million.
  • Australia = 652 reviews. Australia’s population is about 24 million.
  • Canada = 418 reviews. Canada’s population is about 36 million.
The U.K. had 32% as many reviews as the U.S., though its population is only 20% of the U.S. population. The fact that the author is British and the book is set in London probably explains this over-representation of reviews (and sales, given that sales should correlate with reviews). Canada had about 1.5% as many reviews as the U.S., though its population is about 11% the size. This may be due to the fairly significant proportion of Canadian readers who are with Kobo or buy through the U.S. Amazon store. Australia had about 2% as many reviews as the U.S., though its population is about 7% the size. This may reflect a slower adoption of e-reading in Australia, but that’s just a guess.
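The comparison of review counts to population can be reproduced with a short Python sketch. The numbers are the ones listed above; the helper name `compare_to_us` and the extra "reviews per million people" column are my own additions for illustration.

```python
# (reviews as of Dec 22, 2015; population in millions), as quoted in the post.
MARKETS = {
    "U.S.": (28_204, 322.5),
    "U.K.": (8_950, 64.8),
    "Australia": (652, 24.0),
    "Canada": (418, 36.0),
}


def compare_to_us(market: str) -> tuple[float, float, float]:
    """Return (review share of U.S., population share of U.S., reviews per million people)."""
    reviews, population = MARKETS[market]
    us_reviews, us_population = MARKETS["U.S."]
    return reviews / us_reviews, population / us_population, reviews / population


for market in MARKETS:
    review_share, pop_share, per_million = compare_to_us(market)
    print(
        f"{market:<10} reviews {review_share:6.1%} of U.S., "
        f"population {pop_share:6.1%} of U.S., "
        f"{per_million:6.1f} reviews per million people"
    )
```

A market whose review share is larger than its population share (the U.K. here) is over-represented relative to the U.S., and one whose review share is smaller (Canada, Australia) is under-represented.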

I also had a look at the correlation coefficients in rank, between the countries. That’s just a way of seeing how much two variables are related – a value of 1 implies a perfect positive correlation, 0 is no correlation, and -1 is perfect negative correlation. So, a high positive number means that when the book’s rank went up in one country, it also went up in the other country. The results were:
  • U.S. and U.K. correlation was about 0.83.
  • U.S. and Canadian correlation was about 0.82.
  • U.S. and Australian correlation was about 0.83.
So, as the book’s rank changed in the U.S. store, it tended to change at a similar rate and in a similar direction in the other stores. In other words, “Girl on the Train” was popular throughout the English speaking world, and its ebbs and flows in popularity were similar throughout that language world.
The other correlations (e.g. between Canada and the U.K.) tended to be in the .60 to .65 range. A path analysis type solution might suggest that these correlations were mediated by the U.S. (a Canada-U.S. correlation of about .8 times a U.S.-U.K. correlation of about .8 gives a Canada-U.K. correlation of about .64), but that’s just a conjecture.
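For anyone curious about the mechanics, here is a minimal Python sketch of a Pearson correlation coefficient and of the "multiply the two paths" conjecture. The function names are hypothetical, and the daily rank series are made-up numbers, not the data behind the graphs. The product rule only holds if the U.S. market fully accounts for the link between the other two, which is an assumption rather than something the data here establishes.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def implied_via_hub(r_a_hub: float, r_hub_b: float) -> float:
    """Correlation between A and B implied if a hub market fully mediates it."""
    return r_a_hub * r_hub_b


# Made-up rank series purely for illustration; NOT the blog's data.
toy_us_ranks = [50, 45, 30, 25, 28]
toy_uk_ranks = [26, 25, 20, 18, 19]
print(f"Toy U.S.-U.K. correlation: {pearson(toy_us_ranks, toy_uk_ranks):.2f}")

# Using the rounded figure from the text, then the reported correlations.
print(f"Implied with .8 and .8:   {implied_via_hub(0.8, 0.8):.2f}")
print(f"Implied with .82 and .83: {implied_via_hub(0.82, 0.83):.2f}")
```

With the reported U.S.-Canada (.82) and U.S.-U.K. (.83) values, the implied Canada-U.K. figure is about .68, slightly above the observed .60 to .65 range but in the same neighbourhood.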

After all these stats, you might want to read something less quantitative, but still featuring interesting non-fiction. So, try a road trip through North America in an 18 wheeler, with “On the Road with Bronco Billy”. You will have a chance to pick up some trucking jargon, some trucking workplace culture, some geography, and a lot of penetrating sociological observations :)

Tuesday, 22 December 2015

Get in the Christmas Spirit with a Christmas-themed Short Story – Miranda and the Not-Christmas Elf

“A Christmas Carol” and “The Gift of the Magi” are great, but sometimes the Christmas spirit needs to be refreshed by something more up-to-date.  Christmas short stories by Helena Puumala are just the ticket to recapture Christmas magic, for children and adults.

“Miranda and the Not-Christmas Elf” will be free on Amazon from Dec 24 to Dec 28 (otherwise, it is 99 cents). It’s a short story about a little girl, troubled by bullies and family problems between her parents, who needs a little Christmas magic to help her overcome her fears.

Product Description

The little pre-school girl, Miranda, is feeling unsafe because of bullies in the neighbourhood and family troubles between her mother and father. Can her friend, young elementary grade age Nathan, use his special powers to call on the North Pole for some Christmas Eve magic, to help her out?

The story is a heartwarming Christmas tale, suitable for children and adults, which will bring a little Christmas magic to us all. It is about 9000 words, or around an hour or so, at typical reading speeds.