In a couple of previous blogs, we looked at some statistics on sales
for popular book series of the recent past.
As a memory refresher, those book
series are repeated below:
Author and Series                        Total
J.K. Rowling – Harry Potter              447,000,000
Dan Brown – Robert Langdon               200,000,000
Stephenie Meyer – Twilight               120,000,000
Suzanne Collins – Hunger Games            50,000,000
Robert Jordan – Wheel of Time             44,000,000
Stephen King – The Dark Tower             30,000,000
G.R.R. Martin – Game of Thrones           24,000,000
Veronica Roth – Divergent                 20,000,000
Douglas Adams – Hitchhiker's Guide        16,000,000
Patrick O'Brian – Aubrey/Maturin           4,000,000

As noted previously, these 10 series represent nearly 1
billion copies sold.
In the last blog, we focused primarily on how the number of
Goodreads raters varied by the position of book within the series. In other words, we tracked the number of
people who rated book 1, book 2, book 3, etc. (normalized to book 1 = 100) to
see if there was a pattern to the data.
And in fact we did discover a very prominent trend, which was true for
most of the series that we looked at: for the most part, the number of Goodreads
raters declined from book to book, with the drop-off best modelled by our
old friend, the power law.
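The normalization mentioned above (book 1 = 100) can be sketched in a few lines of Python. The rater counts here are made up purely for illustration; they are not real Goodreads numbers, and the function name is our own.

```python
def normalize_to_first(counts):
    """Rescale a list of rater counts so that the first book equals 100."""
    if not counts or counts[0] <= 0:
        raise ValueError("need a positive count for book 1")
    first = counts[0]
    return [100.0 * c / first for c in counts]


# Hypothetical rater counts for a four-book series (not real data).
example_counts = [2000, 1000, 500, 250]
normalized = normalize_to_first(example_counts)
print(normalized)
```

Putting every series on the same "book 1 = 100" scale is what lets a huge series like Harry Potter be compared with a much smaller one.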
Naturally we didn’t do this merely to analyse the behaviour
of Goodreads raters, as interesting and worthy as that exercise might be. We were, in fact, assuming that the number of
Goodreads raters was a reasonably constant fraction of the number of people
who had actually purchased and read the books in question. So, we assumed that the pattern of Goodreads
raters was likely to be quite similar to the pattern of purchasers, a sort of
attenuated mirror image. We examined the
Harry Potter series in detail (a series for which reasonably accurate book-by-book
sales are known) to verify that the number of Goodreads raters does, in
fact, reflect the sales of the individual books.
The people on Goodreads are an interesting sample of avid
readers, but their willingness to rate a book is a reflection of the popularity
of a book over a long period. You don’t
have to have bought a book very recently to be able to add it to your Goodreads
“have read” list and to rate that book.
That’s interesting data, but what about the more recent developments in
the book trade? A lot has happened in
the past 5 to 10 years, particularly the rise of online book sales, both of physical
books and ebooks. This includes the vast
new supply of books that have been added to the world’s “book population” by
small publishers and self-publishers (generally speaking, we can use the term
Indie for these), as well as the
increased production of the big publishers, frontlist and backlist.
In this blog we will look at Amazon data, to try to get a
handle on the newer publishing world.
For the most part, Amazon reviewers have purchased their books fairly
recently, and usually from Amazon. As
time goes on, those books are being purchased more and more in the Kindle/ebook
format, which is a very different experience from buying physical books. Ebooks are instantaneous, always available,
and relatively cheap (see Dodecahedron blog “Imagine that you had a magic wineglass”
for some further exploration of those ideas).
So, how have these new facts changed the pattern of book buying within
these popular series listed earlier in the blog?
Let’s look at the book series in detail, focusing on the
number of Amazon raters vs the position of the book within the series, and
compare that with our previous results using Goodreads raters. Again, we will go by series book sales, from largest
to smallest. In the graphs that follow,
Amazon data will be in blue (lines and diamond markers), while Goodreads data
will be in red (lines and square markers).
The best-fit equations of these graphs are also shown, highlighted in
the appropriate colors. Also included is
the R-square, which is a way of measuring how well the data actually fit the
equation. An R-square near 1.00 implies
an excellent fit, while an R-square near 0 implies a very poor fit. Scores in between those extremes are less
clear-cut.
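For readers who want to see what is behind that number, here is a minimal sketch of the usual R-square calculation (one minus the ratio of unexplained variation to total variation). The data values are invented for illustration, and the function name is ours.

```python
def r_squared(observed, predicted):
    """Return R-square = 1 - SS_residual / SS_total for paired values."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are all identical")
    return 1.0 - ss_res / ss_tot


# Hypothetical normalized rater counts for a four-book series.
observed = [100.0, 60.0, 40.0, 30.0]

# A model that reproduces the data exactly scores 1.
perfect = r_squared(observed, observed)

# A model that only ever predicts the average scores 0.
average = sum(observed) / len(observed)
useless = r_squared(observed, [average] * len(observed))

print(perfect, useless)
```

So an R-square of 0 means the fitted curve does no better than simply guessing the average for every book.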
In the graphs, the Amazon data are modelled by quadratic
functions. Though they are very
imperfect fits, the quadratic model seemed to capture one very important
feature of the Amazon rater data: in many cases, the earlier and later books in
the series got the most reviews, while the midpoint books were less likely to
be reviewed. A quadratic function
incorporates that well, since a quadratic has exactly one
turning point (a maximum or a minimum).
The R-squares of the power-law and straight-line fits are also shown,
to help indicate which functional form best fits the observed data.
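As a rough illustration of the quadratic idea, the sketch below fits a straight line and a quadratic to a made-up "high at both ends, low in the middle" series using NumPy's `polyfit`. The review counts are hypothetical, and this is our own stand-in for the Excel trendline, not the calculation behind the graphs.

```python
import numpy as np


def r_squared(y, y_hat):
    """R-square = 1 - SS_residual / SS_total."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def fit_polynomial(x, y, degree):
    """Least-squares polynomial fit; returns (coefficients, R-square)."""
    coeffs = np.polyfit(x, y, degree)
    return coeffs, r_squared(y, np.polyval(coeffs, x))


# Hypothetical normalized review counts for a five-book series.
positions = np.array([1, 2, 3, 4, 5], dtype=float)
reviews = np.array([100, 60, 45, 55, 95], dtype=float)

line_coeffs, line_r2 = fit_polynomial(positions, reviews, 1)
quad_coeffs, quad_r2 = fit_polynomial(positions, reviews, 2)

# For y = a*x^2 + b*x + c, the single turning point sits at x = -b / (2a).
a, b, _ = quad_coeffs
turning_point = -b / (2 * a)

print(f"straight line R-square: {line_r2:.3f}")
print(f"quadratic R-square:     {quad_r2:.3f}")
print(f"turning point at book position {turning_point:.2f}")
```

One caution when comparing the two: a straight line is just a quadratic with its squared term set to zero, so a least-squares quadratic can never score a lower R-square than the straight line on the same data. The interesting question is how much higher it scores.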
As before, the Goodreads data are modelled by power law
functions. You can refer to the earlier
blogs on power functions to refresh your memory on those. The main feature of a power law that is
important here, however, is that the series decays steeply at first and then more
gradually: each book draws a fraction of the raters of the one before it, and that
fraction creeps closer to 1 as the series proceeds.
Note that in both
cases, these are standard Excel options for modelling data.
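For anyone who would rather reproduce a power-law fit outside of a spreadsheet, here is a minimal sketch. It fits y = a·x^b by running an ordinary straight-line regression on the logarithms of both variables, which, as far as we are aware, is also how a spreadsheet power trendline is usually computed (treat that as an assumption). The data are generated from an exact power law with made-up parameters, so they are not real rater counts.

```python
import numpy as np


def fit_power_law(x, y):
    """Fit y = a * x**b by least squares on (ln x, ln y).

    Returns (a, b, r2), where r2 is computed on the log-transformed data.
    """
    log_x = np.log(np.asarray(x, dtype=float))
    log_y = np.log(np.asarray(y, dtype=float))
    b, log_a = np.polyfit(log_x, log_y, 1)  # slope, intercept
    fitted = log_a + b * log_x
    ss_res = np.sum((log_y - fitted) ** 2)
    ss_tot = np.sum((log_y - log_y.mean()) ** 2)
    return float(np.exp(log_a)), float(b), float(1.0 - ss_res / ss_tot)


# Hypothetical series: book 1 = 100, decaying as position**(-0.8).
positions = np.arange(1, 7, dtype=float)
raters = 100.0 * positions ** -0.8

a, b, r2 = fit_power_law(positions, raters)

# Fraction of the previous book's raters retained by each later book.
ratios = raters[1:] / raters[:-1]

print(f"a = {a:.1f}, b = {b:.2f}, R-square = {r2:.3f}")
print("book-to-book ratios:", np.round(ratios, 3))
```

The printed ratios show the characteristic power-law shape: the biggest loss comes between books 1 and 2, and each later book hangs on to a somewhat larger share of the previous book's raters. Note also that an R-square computed on logarithms is not measured on quite the same scale as one computed on the raw counts, so comparisons across functional forms are indicative rather than exact.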
1 – Harry Potter (J.K. Rowling)
Testing the Amazon data for three different functional forms
(power law, quadratic and straight line), it turns out that the R-square is
marginally better for the quadratic than for the others.
Power law R-square = 0.060
Quadratic R-square = 0.130
Straight line R-square = 0.003

2 – Robert Langdon (Dan Brown)
In this case, we see that the quadratic function fit the
Amazon data quite well, though that was probably mainly due to the influence of
the last data point, which refers to the most recent book of the series. Evidently that book was much more “popular”
on Amazon than on Goodreads, at least inasmuch as people were inclined to write
reviews.
Again, when testing the three functional forms for the
Amazon data, we find the quadratic has the best fit, somewhat better than a
straight-line fit (though that one wasn’t bad, either).
Power law R-square = 0.490
Quadratic R-square = 0.872
Straight line R-square = 0.671

3 – Twilight (Stephenie Meyer)
For the Amazon data, the quadratic form is far superior to
the others:
Power law R-square = 0.190
Quadratic R-square = 0.997
Straight line R-square = 0.000

4 – Hunger Games (Suzanne Collins)
As with the Twilight series, the Hunger Games series demonstrates the contrast between the Amazon and Goodreads responses very well. However, with only three data points we have to be careful not to overinterpret our results. It is trivially true that 3 points can be made to fit a quadratic perfectly (as long as they aren’t on a straight line), much as 2 points can be made to fit a straight line perfectly. Nonetheless, it is notable that the Amazon data fits the general picture that we have seen in the other cases, with the first and last books drawing more interest than the second.
For the Amazon data, the quadratic form is far superior to the others, though the “perfect fit” to three points is no surprise, as noted above:
Power law R-square = 0.850
Quadratic R-square = 1.000
Straight line R-square = 0.535
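The "three points always fit a quadratic" caveat is easy to check for yourself. In this small sketch the three review counts are invented (they are not the Hunger Games numbers), and NumPy's `polyfit` stands in for the Excel trendline.

```python
import numpy as np


def quadratic_r2(x, y):
    """Fit a quadratic by least squares and return (residuals, R-square)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    coeffs = np.polyfit(x, y, 2)
    residuals = y - np.polyval(coeffs, x)
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return residuals, 1.0 - ss_res / ss_tot


# Any three hypothetical points with distinct x values will do.
residuals, r2 = quadratic_r2([1, 2, 3], [100.0, 40.0, 70.0])

print("residuals:", np.round(residuals, 6))
print("R-square:", round(float(r2), 6))
```

A quadratic has three free coefficients, so it can pass exactly through any three points. An R-square of 1.000 for a trilogy therefore tells us nothing about the quality of the model; only the shape (first and last books above the middle one) carries information.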

5 – Wheel of Time (Robert Jordan)
As with the Harry Potter series, the Amazon data for this long series was not particularly well modelled by the quadratic function. The first and last books of the series were high points in terms of reviews, but some of the middle books also did very well in that regard. Curiously, those were not books that were notable in the Goodreads data, which was modelled fairly well by a power law.
Nonetheless, for the Amazon data, the quadratic form is superior to the others. Basically, though, this series was not well represented by any simple functional form.
Power law R-square = 0.000
Quadratic R-square = 0.240
Straight line R-square = 0.099

6 – The Dark Tower (Stephen King)
As with Harry Potter and Wheel of Time, the Amazon data for
this series is not particularly well modelled by a quadratic, but it does
follow the general trend of the first and last books being reviewed more often
than the middle books. Again, however,
one of the middle books was an “outlier”. On the other hand, the Goodreads data
was an excellent fit to a power law.
Once more, though, for the Amazon data, the quadratic form
is superior to the others.
Power law R-square = 0.010
Quadratic R-square = 0.348
Straight line R-square = 0.074

7 – Game of Thrones (G.R.R. Martin)
This series seems to follow the same general trend as the Twilight and Hunger Games series, which is to say that the first and last books had much higher numbers of reviews, relative to the middle books. So, the fit to the quadratic form is very good (though there are only 5 points). As for the Goodreads data, it is very well fit by the power law.
Once more, for the Amazon data, the quadratic form is far
superior to the others.
Power law R-square = 0.090
Quadratic R-square = 0.941
Straight line R-square = 0.040

8 – Divergent (Veronica Roth)
This series follows a similar pattern to Twilight, Hunger
Games and Game of Thrones. In all of
those cases, the first and last books of the series drew more Amazon interest
than the middle book(s). However, as
with Hunger Games, we must note that there were only three books in the series,
so a quadratic will naturally be a perfect fit.
As for the Goodreads data, as noted earlier, it had a very good fit to a
power law.
As shown below, for the Amazon data, the quadratic fit is superior (trivially so, with
three data points).
Power law R-square = 0.650
Quadratic R-square = 1.000
Straight line R-square = 0.335

9 – Hitchhiker’s Guide (Douglas Adams)
For the Amazon data, the Hitchhiker’s series is a good fit to
a quadratic form. However, that’s mainly
due to the influence of the first point.
The Amazon data actually conform to a power law slightly better than to the
quadratic, much as the Goodreads data did.
The fits of the various functional forms to the Amazon data
make that explicit, below.
Power law R-square = 0.899
Quadratic R-square = 0.808
Straight line R-square = 0.502

10 – Aubrey/Maturin (Patrick O’Brian)
The Aubrey/Maturin series conforms somewhat to the quadratic
form in the Amazon data: the first and last books drew the greatest amount of
interest. But, as with Hitchhiker’s, the
Amazon data actually conformed better to a power law, as did the Goodreads
data.
Once more, comparing the R-squares of the various functional
fits brings that out, as shown below.
Power law R-square = 0.795
Quadratic R-square = 0.572
Straight line R-square = 0.259

Some Conclusions
· It appears that the pattern in the number of Amazon reviewers per book is quite different from the trend in the number of Goodreads raters per book.
· Amazon reviewers seem to be inclined to review the first and last books of a series more than the middle books, resulting in a quadratic fit to the data. As noted earlier, Goodreads raters tend to drop off continuously as the series proceeds, resulting in a power law.
· The Amazon quadratic function phenomenon is much more evident in more recent book series, namely:
    o Robert Langdon (Dan Brown)
    o Twilight (Stephenie Meyer)
    o Hunger Games (Suzanne Collins)
    o Game of Thrones (G.R.R. Martin)
    o Divergent (Veronica Roth)
· In some of the older series, the Amazon and Goodreads trends in reviews/raters were quite similar (best modelled by a power law), namely:
    o Hitchhiker’s Guide (Douglas Adams)
    o Aubrey/Maturin (Patrick O’Brian)
· The other three series were less clear-cut, but the Amazon data still tended to be modelled somewhat better by the quadratic:
    o Harry Potter (J.K. Rowling)
    o Wheel of Time (Robert Jordan)
    o Dark Tower (Stephen King)
· We can’t be sure whether the tendency for the Amazon reviewers to be more focussed on “first and last” is a reflection of underlying purchasing numbers or a reviewing preference, though it’s probably a bit of both.
· Some people may be willing to skip some of the middle books in a series. They may get hooked on the first book, not have time to read some of the middle books (and thus skip them), but want to find out how the story arc went by purchasing and reading the final book.
· On the other hand, people are more likely to want to weigh in with their opinions at the outset of a series or at the conclusion of the series than they are in the middle of the series. There is a common human impulse to jump on the bandwagon at the start and let the world know about it. People also want to make their “summing up” judgements known. So this could account for the predominance of first and last book reviews, even if the middle books were purchased and read.
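As a quick cross-check on the groupings in the list above, the short script below collects the Amazon R-square values reported for each series earlier in this blog and picks out the best-scoring functional form. The numbers are copied from the tables above; the dictionary layout and function name are just our own bookkeeping.

```python
# Amazon-data R-square values as reported in the series sections above.
R_SQUARES = {
    "Harry Potter":       {"power law": 0.060, "quadratic": 0.130, "straight line": 0.003},
    "Robert Langdon":     {"power law": 0.490, "quadratic": 0.872, "straight line": 0.671},
    "Twilight":           {"power law": 0.190, "quadratic": 0.997, "straight line": 0.000},
    "Hunger Games":       {"power law": 0.850, "quadratic": 1.000, "straight line": 0.535},
    "Wheel of Time":      {"power law": 0.000, "quadratic": 0.240, "straight line": 0.099},
    "The Dark Tower":     {"power law": 0.010, "quadratic": 0.348, "straight line": 0.074},
    "Game of Thrones":    {"power law": 0.090, "quadratic": 0.941, "straight line": 0.040},
    "Divergent":          {"power law": 0.650, "quadratic": 1.000, "straight line": 0.335},
    "Hitchhiker's Guide": {"power law": 0.899, "quadratic": 0.808, "straight line": 0.502},
    "Aubrey/Maturin":     {"power law": 0.795, "quadratic": 0.572, "straight line": 0.259},
}


def best_form(scores):
    """Return the name of the functional form with the highest R-square."""
    return max(scores, key=scores.get)


winners = {series: best_form(scores) for series, scores in R_SQUARES.items()}

for series, form in winners.items():
    print(f"{series:20s} {form:14s} R-square = {R_SQUARES[series][form]:.3f}")
```

The quadratic comes out on top for eight of the ten series and the power law for the remaining two (Hitchhiker's Guide and Aubrey/Maturin), which matches the groupings above. The two trilogies score a perfect 1.000 only because three points always fit a quadratic exactly.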
The one thing that does seem pretty clear is that the Amazon ebook world
has produced quite a different reviewing (and presumably purchasing) pattern
than the old world of physical books and bookstores. In the old world, scarcity was the rule: if
you didn’t jump into a series at the start, you might never find the early
books of the series (short of haunting used bookstores). Now, if a series interests you, you can jump
in at any time and read the whole series.
In
our own small way, we have seen this at Dodecahedron Books in the buying
patterns for the Kati of Terra series. When
Kati 2 came out, it sparked as many sales of Kati 1 over the following year as
it did of Kati 2. Kati 3 seems to be
having a somewhat similar effect. In
this case, at least, it seems that people were seeing Kati 2 and saying “that
looks interesting, but I might as well start with the first book of the
series”. Since ebooks are always
available (no windowing as with physical bookstores) this is a perfectly
logical response. It will be
interesting to see how these patterns evolve over time.