It is natural for a writer or publisher (indie or traditional) to
wonder about the lag time between buying books, reading books, and then reviewing those
books. I don’t mean official book
reviews, as in the big newspapers, but rather the reviews that readers leave on
Amazon and other sites that allow reviewing.
Having some awareness of this could help in such matters as scheduling
the release of a book series – too short a gap and readers might ignore the new
release (“I haven’t even read book 1 yet, no sense thinking about book 2”), too
long a gap and readers might forget about it (“that series is familiar, but I
can’t even remember the plot of book 1”).
So, what is the time lag between buying and reading/reviewing? Well, obviously only Amazon’s data scientists
know for sure, but here are a few small particles of evidence, based on
following some books on Amazon over a period of time.
As we know, Amazon shows a book’s sales rank and its number
of reviews on their website (among many other things). Out of curiosity, I followed a couple of
books over the last few months, one of which was the classic “To Kill a
Mockingbird” by Harper Lee. The reason
that I followed this book was the recent news that Harper Lee is to release a
new novel in the summer of 2015 (entitled “Go Set a Watchman”), so I was
curious to see how that news affected sales of “To Kill a Mockingbird”. I followed both the ranking and number of
reviews for “Mockingbird” since early February, giving me a useful time series.
I also followed a recent number one best seller, “The Girl
on the Train”. At this point it is still
too early to make much out of that data – the book just hasn’t fallen very far
in the rankings, so one can’t really explore a rankings-reviews interaction
yet. Mockingbird, on the other hand, has
shown a wide range of sales ranks during that period, so it is feasible to look
for a ranking-reviews effect, and therefore estimate the time between purchase
and reviewing.
Naturally I acknowledge that following a few books can’t
shed light on the full range of books out there. But, examining “ideal-type cases” can
sometimes yield some useful results, especially if one holds the results
lightly and provisionally.
1 – Ranks by Date
The first graph shows the sales rank of Mockingbird, from
Feb 5, 2015 to April 9, 2015.
Unfortunately, I missed the first few days of its big run, so the sales
rank starts at #4. Google Trends shows a
big spike for “Go Set a Watchman” on Feb 3, so I should have caught most of the
run. As you can see, the rank of the
book went from #4 to #168 over approximately 2 months, meaning it dropped about
20 ranks per week over that time. So,
fresh publicity for a classic can drive it up to the top of the rankings, but
it won’t stay there for very long.
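For anyone who wants to check the arithmetic, here is a small Python sketch using only the dates and ranks quoted above. It treats the slide as a straight line between the first and last observations, which is a simplification. The result comes out near 18 ranks per week, in line with the "about 20" figure.

```python
from datetime import date

# Figures quoted in the post: rank #4 on Feb 5, 2015 and #168 on April 9, 2015.
start_day, start_rank = date(2015, 2, 5), 4
end_day, end_rank = date(2015, 4, 9), 168

# Straight-line average between the two endpoints (a simplifying assumption).
elapsed_weeks = (end_day - start_day).days / 7
ranks_per_week = (end_rank - start_rank) / elapsed_weeks

print(f"{elapsed_weeks:.0f} weeks, about {ranks_per_week:.0f} ranks per week")
# prints: 9 weeks, about 18 ranks per week
```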
The other noteworthy feature is the tendency for the data to
get noisier as time progresses. At
first, the best fit line and the data are very close, but they begin to diverge
at about the halfway point. This reflects the fact that ranking becomes more
sensitive to changes in sales as the book becomes less popular. For books in the top 50, a small day-to-day
change in sales won’t affect the rank very much, but further down the rankings, that
same change in sales will have a much more noticeable effect.
2 – Reviews by Date
The next graphs show the total number of reviews by date, as
well as the number of reviews on each date.
As you can see, the total reviews climbed steadily, in
nearly a straight line. However, a
closer examination reveals a bit of jaggedness in the line, with some jumps
followed by smooth growth. The day-to-day
reviews graph shows this more clearly, with certain days where the number
of reviews is substantially higher than the overall trend, which is usually
about 10 to 20 reviews per day. One
should bear in mind that these might be “catch-up days”, where Amazon releases
a tranche of reviews all at the same time.
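Going from the running total to the per-day count is just a matter of taking differences. The sketch below uses made-up totals (not the Mockingbird data) and flags a possible catch-up day as any day more than three times the median. That threshold is an arbitrary choice for illustration.

```python
from statistics import median

# Hypothetical cumulative review totals on consecutive days (not the real data).
total_reviews = [5000, 5012, 5027, 5040, 5101, 5115, 5128]

# Daily reviews are the day-to-day differences of the running total.
daily_reviews = [later - earlier
                 for earlier, later in zip(total_reviews, total_reviews[1:])]

# Flag days well above the typical level; the factor of 3 is an arbitrary choice.
typical = median(daily_reviews)
catch_up_days = [i + 1 for i, n in enumerate(daily_reviews) if n > 3 * typical]

print(daily_reviews)   # reviews added on each day after the first
print(catch_up_days)   # positions in total_reviews where a jump landed
```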
3 – Relationship between Ranks and Reviews
The purpose of this part of the analysis is to see whether
the number of reviews corresponds to sales rank. People have to buy books before they can
review them, so one would expect that the number of reviews will track the
number of sales, assuming that a fairly constant fraction of purchasers are
inclined to make the effort to write a review.
So, for example, if one person in one hundred reviews a book, then sales
should be equal to reviews multiplied by 100.
Naturally, this won’t be a mathematical law, but a statistical one – we
expect it to work over a reasonable time span, not every day.
Furthermore, it takes time to read a book after purchasing,
and then it takes more time to post a review.
This is the time lag between purchasing and reviewing. To explore that time lag, I calculated the
correlation between sales rank and the number of reviews. That coefficient shows how closely one quantity is
related to another – e.g. when sales rank goes down (moves closer to #1), we would expect reviews to
go up (since there are more readers out there to become reviewers). I calculated this correlation coefficient
between sales rank and reviews at a number of lags, to try to see where the
correlation becomes highest. This would
give an indication of the time lag between purchasing and reviewing, at least
the lag that is most common for readers.
As you can see, the graph reaches a low point at about 3 to
4 weeks. Since our correlation
coefficient is negative (when sales rank moves closer to 1, reviews go
up), the strongest correlation is where the graph is at its lowest point.
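To make the idea of a lagged correlation concrete, here is a minimal Python sketch. It is not necessarily the exact calculation behind the graph. The function names are my own, and the data is synthetic: the review counts are built to mirror the rank from 3 days earlier, so the most negative correlation should turn up at a lag of 3.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def lagged_correlation(ranks, reviews, lag):
    """Correlate the rank on day t with the reviews posted on day t + lag."""
    if lag == 0:
        return pearson(ranks, reviews)
    return pearson(ranks[:-lag], reviews[lag:])

# Synthetic illustration only: reviews echo the rank from 3 days earlier.
ranks = [4, 9, 6, 15, 11, 22, 18, 30, 25, 41, 35, 50, 44, 60, 55]
reviews = [60, 60, 60] + [70 - r for r in ranks[:-3]]  # first 3 days are padding

correlations = {lag: lagged_correlation(ranks, reviews, lag) for lag in range(7)}
best_lag = min(correlations, key=correlations.get)  # most negative correlation
print(best_lag)
```

With real data the trough will be much shallower and broader than in this toy case, because a steadily sliding rank correlates with almost anything that also trends over time.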
Now, as we know, the relationship between sales rank and
sales is a power function. It’s not
linear – number 1 books sell a lot more than number 2 books and so forth. To account for this, we do a mathematical
transformation, via logarithms. This has
the effect of turning a power law (which usually looks something like a
hyperbola) into a straight line. We then
plot the correlation coefficient of the transformed data, which is shown
below. This has the effect of making the
function more symmetric and moving the minimum point a bit to the right, now at
about 30 days or a month.
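Here is a quick way to see why the log transform helps. The sketch assumes sales follow a power law of the form sales = a * rank^(-b). The constants a and b are made up for illustration and are not estimates of Amazon's actual curve. After taking logs of both quantities, the slope between any two points is the same, which is what a straight line means.

```python
import math

# Assumed power law for illustration: sales = a * rank ** (-b).
# The constants a and b are made up, not estimates of Amazon's real curve.
a, b = 5000.0, 0.8
ranks = [1, 2, 5, 10, 50, 100, 168]
sales = [a * r ** (-b) for r in ranks]

log_ranks = [math.log(r) for r in ranks]
log_sales = [math.log(s) for s in sales]

# On log-log axes the slope between neighbouring points is constant and equals -b.
slopes = [
    (log_sales[i + 1] - log_sales[i]) / (log_ranks[i + 1] - log_ranks[i])
    for i in range(len(ranks) - 1)
]
print([round(s, 3) for s in slopes])
```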
I cherry-picked the 31-day lag, which is the lowest point on
the graph, to produce a plot of rank versus reviews at a 31-day lag. That is shown below, as is the log transform
graph of the same data. As you can see
(hopefully),
there is a relationship, though a fairly weak one. So, if you squint at the data long enough,
you can be persuaded that the lag between purchasing and reviewing is about 30
days, at least for this classic work of literature.
For my final piece of evidence, I have calculated something
called the coefficient of determination, which is just the square of the
correlation coefficient. It shows the
proportion of the variation in one variable that can be accounted for by the
variation in the other variable. It also
has the nice feature of never being negative.
In this graph, you see a maximum at about 3 weeks, another
at about 4-5 weeks and possibly a third at about 7-8 weeks. It’s hard to say how seriously to take that
fine structure, but the one-month effect seems fairly convincing to me.
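Turning correlation coefficients into coefficients of determination is a one-line step. The values below are hypothetical placeholders, not the numbers behind the graph. They are only there to show how squaring turns the deepest trough into the highest peak.

```python
# Hypothetical correlation coefficients by lag in days (not the post's values).
correlation_by_lag = {7: -0.21, 14: -0.35, 21: -0.48, 31: -0.55, 42: -0.40, 56: -0.44}

# Squaring removes the sign, so the deepest trough in r becomes the highest peak in r^2.
r_squared_by_lag = {lag: r ** 2 for lag, r in correlation_by_lag.items()}
peak_lag = max(r_squared_by_lag, key=r_squared_by_lag.get)

print(peak_lag, round(r_squared_by_lag[peak_lag], 2))
```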
All that being said, the relationship between purchasing and
reviewing is obviously complex. You need
purchasers to create reviewers, so the causality has to work in that direction,
at least partially. But reviews also
stimulate purchases (they are “social proof”), so the relationship feeds back
in the opposite direction. Then, to
confuse matters more, there is a lot of autocorrelation in the data. That just means that if Day 10 has high sales,
Day 11 will probably have high sales as well.
Mathematically, that throws some more terms into a complete theoretical
description of the relationship. No
doubt, the fully worked out relationship would be a very complicated beast.
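Autocorrelation is easy to measure, even if it is hard to correct for. This sketch correlates a series with itself shifted by one day, using hypothetical daily sales figures that drift downward as they might after a publicity spike. The function name and the numbers are my own, for illustration only.

```python
import math

def autocorrelation(series, lag=1):
    """Pearson correlation between a series and itself shifted by `lag` steps."""
    xs, ys = series[:-lag], series[lag:]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread = math.sqrt(
        sum((x - mean_x) ** 2 for x in xs) * sum((y - mean_y) ** 2 for y in ys)
    )
    return cov / spread

# Hypothetical daily sales that drift down slowly (not real data).
daily_sales = [980, 940, 955, 900, 870, 880, 820, 790, 805, 750, 720, 700]
print(round(autocorrelation(daily_sales), 2))
```

A value close to 1 means that each day looks a lot like the day before, which is why neighbouring lags in the correlation graphs tend to give similar answers.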
It should also be noted that the lag time between purchasing
and reviewing is bound to differ for different types of books. A popular thriller best-seller will probably
have a much shorter lag time than a ponderous work of literary fiction (it took
me ten years to get around to reading my copy of James Joyce’s Ulysses, but
only a few days for the latest Grisham legal thriller). At any rate, those are matters for another
day, and another blog.