Friday, 10 April 2015

How Long is the Time Lag Between Buying a Book and Reviewing it?



It is natural for a writer or publisher (indie or trad) to wonder about the lag time between buying books, reading books, and then reviewing those books.  I don’t mean official book reviews, as in the big newspapers, but rather the reviews that readers leave on Amazon and other sites that allow reviewing.  Having some awareness of this could help in such matters as scheduling the release of a book series – too short a gap and readers might ignore the new release (“I haven’t even read book 1 yet, no sense thinking about book 2”), too long a gap and readers might forget about it (“that series is familiar, but I can’t even remember the plot of book 1”).

So, what is the time lag between buying and reading/reviewing?  Well, obviously only Amazon’s data scientists know for sure, but here are a few small particles of evidence, based on following some books on Amazon over a period of time.

As we know, Amazon shows a book’s sales rank and its number of reviews on their website (among many other things).  Out of curiosity, I followed a couple of books over the last few months, one of which was the classic “To Kill a Mockingbird” by Harper Lee.  The reason that I followed this book was the recent news that Harper Lee is to release a new novel in the summer of 2015 (entitled “Go Set a Watchman”), so I was curious to see how that news affected sales of “To Kill a Mockingbird”.  I followed both the ranking and number of reviews for “Mockingbird” since early February, giving me a useful time series. 

I also followed a recent number one best seller, “The Girl on the Train”.  At this point it is still too early to make much out of that data – the book just hasn’t fallen very far in the rankings, so one can’t really explore a rankings-reviews interaction yet.  Mockingbird, on the other hand, has shown a wide range of sales ranks during that period, so it is feasible to look for a ranking-reviews effect, and therefore estimate the time between purchase and reviewing.

Naturally I acknowledge that following a few books can’t shed light on the full range of books out there.  But, examining “ideal-type cases” can sometimes yield some useful results, especially if one holds the results lightly and provisionally.



1 – Ranks by Date

The first graph shows the sales rank of Mockingbird, from Feb 5, 2015 to April 9, 2015.  Unfortunately, I missed the first few days of its big run, so the sales rank starts at #4.  Google Trends shows a big spike for “Go Set a Watchman” on Feb 3, so I should have caught most of the run.  As you can see, the rank of the book went from #4 to #168 over approximately 2 months (9 weeks), meaning it dropped about 18 ranks per week over that time.  So, fresh publicity for a classic can drive it up to the top of the rankings, but it won’t stay there for very long.
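For anyone who wants to check the weekly rate, here is a minimal Python sketch that uses only the dates and ranks quoted above.  It assumes a simple straight-line average over the whole window, which works out to a little over 18 ranks per week, or roughly 20 in round numbers.

```python
from datetime import date

# Dates and ranks quoted in the post.
start, end = date(2015, 2, 5), date(2015, 4, 9)
start_rank, end_rank = 4, 168

days = (end - start).days                      # length of the tracking window
weeks = days / 7
ranks_per_week = (end_rank - start_rank) / weeks  # simple average, not a fitted slope

print(f"{days} days, {weeks:.0f} weeks, {ranks_per_week:.1f} ranks per week")
```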

The other noteworthy feature is the tendency for the data to get noisier as time progresses.  At first, the best fit line and the data are very close, but they begin to diverge at about the halfway point.  This reflects the fact that rank becomes more sensitive to changes in sales as the book becomes less popular.  For books in the top 50, a small day-to-day change in sales won’t affect the rank very much, but further down the rankings (at larger rank numbers), that same change in sales will have a much more noticeable effect.

 

2 – Reviews by Date

The next graphs show the total number of reviews by date, as well as the number of reviews on each date.






As you can see, the total reviews climbed steadily, in nearly a straight line.  However, a closer examination reveals a bit of jaggedness to the line, with some jumps followed by smooth growth.  The day-to-day reviews graph shows this more clearly, with certain days where the number of reviews is substantially higher than the overall trend, which is usually about 10 to 20 reviews per day.  One should bear in mind that these might be “catch-up days”, where Amazon releases a tranche of reviews all at the same time.
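The step from running totals to per-day counts is just a difference between consecutive days.  Here is a small Python sketch of that step.  The totals are made up for illustration (they are not the Mockingbird data), and the "twice the median" rule for flagging a possible catch-up day is my own arbitrary assumption, not anything Amazon publishes.

```python
from statistics import median

# Made-up cumulative review totals, one per day (NOT the real Mockingbird data).
total_reviews = [5000, 5012, 5027, 5041, 5098, 5110, 5125]

# Reviews posted on each day = that day's total minus the previous day's total.
daily_reviews = [today - yesterday
                 for yesterday, today in zip(total_reviews, total_reviews[1:])]

# Flag possible "catch-up days": counts above an arbitrary threshold
# of twice the median daily count (an assumption for illustration only).
threshold = 2 * median(daily_reviews)
catch_up_days = [i + 1 for i, n in enumerate(daily_reviews) if n > threshold]

print(daily_reviews)   # per-day counts
print(catch_up_days)   # positions in total_reviews where a jump landed
```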



3 – Relationship between Ranks and Reviews

The purpose of this part of the analysis is to see whether the number of reviews corresponds to sales rank.  People have to buy books before they can review them, so one would expect that the number of reviews will track the number of sales, assuming that a fairly constant fraction of purchasers are inclined to make the effort to write a review.  So, for example, if one person in one hundred reviews a book, then sales should be equal to reviews multiplied by 100.  Naturally, this won’t be a mathematical law, but a statistical one – we expect it to work over a reasonable time span, not every day.

Furthermore, it takes time to read a book after purchasing, and then it takes more time to post a review.  This is the time lag between purchasing and reviewing.  To explore that time lag, I calculated the correlation between sales rank and the number of reviews.  The correlation coefficient shows how closely one quantity tracks another – e.g. when sales rank goes down (moves toward #1), we would expect reviews to go up (since there are more readers out there to become reviewers).  I calculated this correlation coefficient between sales rank and reviews at a number of lags, to try to see where the correlation becomes strongest.  This would give an indication of the time lag between purchasing and reviewing, at least the lag that is most common for readers.
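To make the lag idea concrete, here is a minimal Python sketch of a lagged correlation.  It is not the exact calculation behind the graphs; it assumes daily rank and daily review counts as plain lists, and the data is synthetic, built so that reviews echo rank exactly 3 days later.  The function names are my own.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def lagged_correlation(rank, reviews, lag):
    """Correlate rank on day t with reviews on day t + lag."""
    if lag == 0:
        return pearson(rank, reviews)
    return pearson(rank[:-lag], reviews[lag:])

# Synthetic series (not real data): reviews mirror rank with a 3-day delay.
rank = [5, 9, 4, 12, 7, 15, 10, 20, 13, 25, 18, 30, 22, 35, 28, 40]
true_lag = 3
reviews = [90] * true_lag + [100 - r for r in rank[:-true_lag]]

# Try a range of lags and keep the one with the most negative correlation.
correlations = {lag: lagged_correlation(rank, reviews, lag) for lag in range(7)}
best_lag = min(correlations, key=correlations.get)

for lag, r in correlations.items():
    print(lag, round(r, 3))
print("strongest (most negative) correlation at lag", best_lag)
```

With real data the minimum will be much shallower than in this toy example, since many readers review sooner or later than the typical lag.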


 
As you can see, the graph reaches a low point at about 3 to 4 weeks.  Since our correlation coefficient is negative (when sales rank moves closer to 1, reviews go up), the strongest correlation is where the graph is at its lowest point.

Now, as we know, the relationship between sales rank and sales is a power function.   It’s not linear – number 1 books sell a lot more than number 2 books and so forth.  To account for this, we do a mathematical transformation, via logarithms.  This has the effect of turning a power law (which usually looks something like a hyperbola) into a straight line.  We then plot the correlation coefficient of the transformed data, which is shown below.  This has the effect of making the function more symmetric and moving the minimum point a bit to the right, now at about 30 days or a month.
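Here is a short Python sketch of why the log transform straightens things out.  It assumes a power law of the form sales = a * rank^(-b), and the constants a and b are invented purely for illustration (they are not estimates for Amazon).  Taking logs gives log(sales) = log(a) - b * log(rank), which is a straight line with slope -b.

```python
from math import log

# Hypothetical power law: sales = a * rank ** (-b). Both constants are invented.
a, b = 1000.0, 0.8
ranks = [1, 10, 100, 1000]
sales = [a * r ** (-b) for r in ranks]

# After taking logs, the points should fall on a line with slope -b.
log_ranks = [log(r) for r in ranks]
log_sales = [log(s) for s in sales]

# Slope between each pair of neighbouring points in log-log space.
points = list(zip(log_ranks, log_sales))
slopes = [(y2 - y1) / (x2 - x1)
          for (x1, y1), (x2, y2) in zip(points, points[1:])]

print([round(s, 6) for s in slopes])  # every slope should be close to -b
```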




I cherry-picked the 31-day lag, which is the lowest point on the graph, to produce a plot of rank versus reviews at a 31-day lag.  That is shown below, as is the log transform graph of the same data.  As you can see (hopefully :) ), there is a relationship, though a fairly weak one.  So, if you squint at the data long enough, you can be persuaded that the lag between purchasing and reviewing is about 30 days, at least for this classic work of literature.
 



For my final piece of evidence, I have calculated something called the coefficient of determination, which is just the square of the correlation coefficient.  It shows the proportion of the variation in one variable that can be accounted for by the variation in the other variable.  It also has the nice feature of never being negative.
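As a quick illustration of the squaring step, here is a tiny Python sketch.  The correlation values and lag labels are invented for the example and are not the results plotted in this post.

```python
# Invented correlation coefficients at four arbitrary lags (NOT the post's results).
correlation_by_lag = {1: -0.2, 2: -0.4, 3: -0.5, 4: -0.3}

# Coefficient of determination: the square of r, so it is never negative.
r_squared_by_lag = {lag: r ** 2 for lag, r in correlation_by_lag.items()}

# The most negative correlation becomes the largest r squared.
best_lag = max(r_squared_by_lag, key=r_squared_by_lag.get)

for lag, r2 in r_squared_by_lag.items():
    print(lag, round(r2, 2))
```

Squaring turns the minimum of the correlation graph into a maximum, which is why the next graph is read by looking for peaks rather than troughs.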



In this graph, you see a maximum at about 3 weeks, another at about 4-5 weeks and possibly a third at about 7-8 weeks.  It’s hard to say how seriously to take that fine structure, but the one-month effect seems fairly convincing to me.

All that being said, the relationship between purchasing and reviewing is obviously complex.  You need purchasers to create reviewers, so the causality has to work in that direction, at least partially.  But reviews also stimulate purchases (they are “social proof”), so the relationship feeds back in the opposite direction.  Then, to confuse matters more, there is a lot of auto-correlation in the data – that just means if Day 10 has high sales, Day 11 will probably have high sales as well.  Mathematically, that throws some more terms into a complete theoretical description of the relationship.  No doubt, the fully worked out relationship would be a very complicated beast.
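For the curious, auto-correlation is easy to measure: correlate a series with itself shifted by one day.  Here is a minimal Python sketch, using a made-up daily sales series (not real data) and a helper name of my own.

```python
from math import sqrt

def lag1_autocorrelation(series):
    """Pearson correlation between each day's value and the next day's value."""
    xs, ys = series[:-1], series[1:]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up daily sales that drift upward with small wobbles (NOT real data).
daily_sales = [10, 12, 13, 15, 14, 16, 18, 17, 19, 20]

autocorr = lag1_autocorrelation(daily_sales)
print(round(autocorr, 2))  # close to 1 means today's sales predict tomorrow's
```

A value near 1 means neighbouring days carry much the same information, so a correlation at a 30-day lag and one at a 31-day lag are not really independent pieces of evidence.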

It should also be noted that the lag time between purchasing and reviewing is bound to differ for different types of books.  A popular thriller best-seller will probably have a much shorter lag time than a ponderous work of literary fiction (it took me ten years to get around to reading my copy of James Joyce’s Ulysses, but only a few days for the latest Grisham legal thriller).  At any rate, those are matters for another day, and another blog.
 
