Friday, 27 June 2014

Book Statistics Corner, Part 4 – Sales Trends of Ten of the Most Popular Book Series


In a previous post, we looked at some sales statistics for one particular popular book series, Patrick O’Brian’s Aubrey/Maturin series, a historical fiction series about the Royal Navy during the era of the Napoleonic Wars.  Now we will extend this analysis by adding nine more of the most popular series in recent history.  The series were selected from the Wikipedia article “List of best-selling books”.
The author, series title and total sales (copies) are shown below:

Author and Series    Total Sales (copies)
J.K. Rowling - Harry Potter    447,000,000
Dan Brown - Robert Langdon    200,000,000
Stephenie Meyer - Twilight    120,000,000
Suzanne Collins - Hunger Games    50,000,000
Robert Jordan - Wheel of Time    44,000,000
Stephen King - The Dark Tower    30,000,000
G.R.R. Martin - Game of Thrones    24,000,000
Veronica Roth - Divergent    20,000,000
Douglas Adams - Hitchhiker's Guide    16,000,000
Patrick O'Brian - Aubrey/Maturin    4,000,000

Though this only constitutes 10 series, it represents nearly 1 billion copies sold and multiple billions of dollars in profit.  No wonder publishers love a breakout book series.  They are pure gold.
Since actual book sales figures are notoriously hard to come by, we will again use proxy statistics from the Goodreads website to get a feel for how well these series did over time, and especially for how well sales held up from Book 1 to the final book of each series.  But first, we will examine the Harry Potter series a little more closely, since the wiki page had estimates of copies sold for each book in the series.  We can then see how well those estimates compare to the Goodreads statistics.


Book Num    GR Reviews    GR Ratings    Sales (Wiki)    First Pub    Avg GR Rating
1    40,793    2,580,696    107,000,000    1997    4.38
2    16,682    1,177,363    60,000,000    1998    4.28
3    18,824    1,219,695    55,000,000    1999    4.46
4    16,610    1,182,736    55,000,000    2000    4.46
5    15,789    1,136,636    55,000,000    2003    4.40
6    15,599    1,136,725    65,000,000    2005    4.48
7    37,191    1,175,133    50,000,000    2007    4.57
Total    161,488    9,608,984    447,000,000         4.43

As the table and graph indicate, the number of copies sold correlates pretty closely with the number of people who rated the books on Goodreads, once we have normalized the data.  We do that by setting the value of each statistic to 100 for Book 1, then expressing the following volumes relative to that index.  For example, Volume 1 (Philosopher’s Stone) sold 107 million copies, while Volume 2 (Chamber of Secrets) sold 60 million copies.  Since 60/107 = .56, Volume 2 is given the value 56, compared to Volume 1, which is given the value 100.  The other books and measures are treated the same way.  The correlation coefficient between copies sold and number of ratings is .964, which is high (a value of 1.00 would indicate a perfect positive correlation between two variables).
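For anyone who wants to reproduce the indexing and the correlation, here is a minimal sketch in Python.  The figures are copied from the Harry Potter table; the function names (index_to_first, pearson) are my own illustrative choices, and the original calculation was not necessarily done this way.

```python
from math import sqrt

# Harry Potter figures copied from the table (Books 1 to 7).
sales = [107_000_000, 60_000_000, 55_000_000, 55_000_000,
         55_000_000, 65_000_000, 50_000_000]
ratings = [2_580_696, 1_177_363, 1_219_695, 1_182_736,
           1_136_636, 1_136_725, 1_175_133]


def index_to_first(values):
    """Rescale a series so that Book 1 equals 100 (hypothetical helper)."""
    return [100 * v / values[0] for v in values]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


sales_index = index_to_first(sales)
ratings_index = index_to_first(ratings)
r_sales_ratings = pearson(sales_index, ratings_index)

print("Book 2 sales index:", round(sales_index[1]))
print("Correlation, sales vs ratings:", round(r_sales_ratings, 3))
```

Because the correlation coefficient is unaffected by rescaling, indexing to Book 1 changes how the graph looks but not the correlation itself.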

 
Another way to see this is to divide the number of Goodreads ratings by the number of copies sold for each volume.  As you can see, the result is consistently close to 2 percent.  That also shows that Goodreads has a pretty wide reach among readers, at least as far as the Harry Potter series is concerned:

Book Num    Title (Harry Potter and the…)    Ratings pct of Sales
1    Philosopher's Stone    2.4%
2    Chamber of Secrets    2.0%
3    Prisoner of Azkaban    2.2%
4    Goblet of Fire    2.2%
5    Order of the Phoenix    2.1%
6    Half-Blood Prince    1.7%
7    Deathly Hallows    2.4%
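The percentages in this table are just Goodreads ratings divided by copies sold.  A short Python sketch of that arithmetic, using the figures from the Harry Potter table (the variable names are illustrative only):

```python
# Copies sold (wiki estimates) and Goodreads ratings for Books 1 to 7.
sales = [107_000_000, 60_000_000, 55_000_000, 55_000_000,
         55_000_000, 65_000_000, 50_000_000]
ratings = [2_580_696, 1_177_363, 1_219_695, 1_182_736,
           1_136_636, 1_136_725, 1_175_133]

# Ratings as a percentage of copies sold, book by book.
pct_of_sales = [100 * r / s for r, s in zip(ratings, sales)]

# The same ratio for the series as a whole.
overall_pct = 100 * sum(ratings) / sum(sales)

for book, pct in enumerate(pct_of_sales, start=1):
    print(f"Book {book}: {pct:.1f}%")
print(f"Series: {overall_pct:.1f}%")
```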

As for the number of Goodreads members who left reviews of the Harry Potter series, that also correlates nicely with the number of copies sold, up until the final volume (the correlation coefficient between copies sold and number of Goodreads reviews was .963 for the first six volumes).  However, we can see on the graph that the number of reviews shoots way up for “Deathly Hallows”.  In this data at least, there appears to be a greater willingness to leave a review for the final book of the series.  I suppose a proportionately higher percentage of Goodreads raters want to do a “summing up” review, as well as leave a rating.  Intuitively, that makes sense.  We also noted a similar phenomenon in the Patrick O’Brian series in the earlier post, though this time the effect is more pronounced.
We might also note that ratings became more positive as the series went on, from 4.38 and 4.28 for volumes 1 and 2 to 4.48 and 4.57 for volumes 6 and 7.  We can infer from this that, not surprisingly, the readers who continue on with a series tend to be keener on the books than those who don’t.  Again, we also noticed this tendency in the Patrick O’Brian series.
Now let’s look at the book series in detail, focusing on the number of Goodreads raters vs the position of the books within the series.  We will go by series book sales, largest to smallest.
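One way to put a number on the “summing up” effect is to look at reviews as a share of ratings for each book.  This is a rough sketch using the figures from the Harry Potter table; the share itself is my own derived measure, not something plotted in the graph.

```python
# Goodreads reviews and ratings for Harry Potter Books 1 to 7.
reviews = [40_793, 16_682, 18_824, 16_610, 15_789, 15_599, 37_191]
ratings = [2_580_696, 1_177_363, 1_219_695, 1_182_736,
           1_136_636, 1_136_725, 1_175_133]

# Percentage of raters who also left a written review.
review_share = [100 * rv / rt for rv, rt in zip(reviews, ratings)]

for book, share in enumerate(review_share, start=1):
    print(f"Book {book}: {share:.2f}% of raters left a review")

# Which book has the highest review share (1-based book number)?
most_reviewed_book = review_share.index(max(review_share)) + 1
print("Highest review share: Book", most_reviewed_book)
```

On these numbers the final book's review share is more than double that of Book 6, which fits the jump in the review count.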
1 – Harry Potter (J.K. Rowling)
We see that the series followed a power law quite closely, with the second and the last books departing somewhat from the best-fit curve.  The median book had about half the raters that the first book had.  As noted above, if we divide the number of Goodreads raters by the number of copies sold, we come up with a figure of 2.1%.  This relatively low figure may reflect the fact that a substantial part of the audience did not participate on Goodreads, perhaps because they were too young.
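For readers who want to try a power-law fit themselves, here is a minimal Python sketch.  It assumes the curve has the form y = a * x^b, where x is the book's position in the series and y is the rater index, and it fits by ordinary least squares on the logarithms.  That is one common way to do it and may differ from the trendline method behind the graphs; fit_power_law is a hypothetical helper name.

```python
from math import exp, log

# Goodreads ratings for Harry Potter Books 1 to 7, indexed so Book 1 = 100.
ratings = [2_580_696, 1_177_363, 1_219_695, 1_182_736,
           1_136_636, 1_136_725, 1_175_133]
rater_index = [100 * r / ratings[0] for r in ratings]
positions = list(range(1, len(rater_index) + 1))


def fit_power_law(xs, ys):
    """Fit y = a * x**b by least squares on log(x), log(y); return (a, b)."""
    log_x = [log(x) for x in xs]
    log_y = [log(y) for y in ys]
    n = len(log_x)
    mean_x, mean_y = sum(log_x) / n, sum(log_y) / n
    slope = (sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
             / sum((u - mean_x) ** 2 for u in log_x))
    intercept = mean_y - slope * mean_x
    return exp(intercept), slope


a, b = fit_power_law(positions, rater_index)
fitted = [a * x ** b for x in positions]

print(f"a = {a:.1f}, b = {b:.2f}")
for x, actual, fit in zip(positions, rater_index, fitted):
    print(f"Book {x}: actual {actual:.0f}, fitted {fit:.0f}")
```

A negative exponent b means the rater count falls off with position; the closer b is to zero, the flatter the curve.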


2 – Robert Langdon (Dan Brown)
In this case, we see that the function departs from the power law form by quite a bit.  That’s mostly because the second book of the series, The Da Vinci Code, was the big breakout success.  In fact, most people think it is the first book in the series, which was actually Angels and Demons.  But the second book caught the public’s fancy more, probably because of the implications for the church.  Note that the last two books seem to have lagged the first two quite badly, at least in this data.  But it is hard to repeat that level of success.  If we divide the number of Goodreads raters by the number of copies sold, we come up with a ratio of 1.4%.  This probably reflects an older, less social-media driven audience for this type of series (and perhaps a less enthusiastic one).
 
3 – Twilight (Stephenie Meyer)
This four-book series followed a relatively flat power law very closely.  After the initial drop-off from Book 1, she seems to have held on to about 40% of the initial book raters, very consistently.  If we divide the number of Goodreads raters by the number of copies sold, we come up with a ratio of 4.2%, a middling-high figure.

4 – Hunger Games (Suzanne Collins)
This series also conformed closely to the power law, though naturally that’s easier to do with only three data points to fit.  She also did a very good job of holding onto about half of the raters through the final two books of the trilogy.  If we divide the number of Goodreads raters by the number of copies sold, we come up with a ratio of 10.4%.  This would seem to indicate that readers of this series were very enthusiastic about sharing their ratings of the books and were very social media aware.

5 – Wheel of Time (Robert Jordan)
This series conformed fairly well to the power law, but with some bumps along the way.  From reading reviews, it seems that the series lagged somewhat in the latter middle part, then picked up again towards the end.  Nonetheless, it did an excellent job of holding onto raters as the series progressed, given its length.  Nearly half were still engaged for most of the latter half of a long series.  If we divide the number of Goodreads raters by the number of copies sold, we come up with a ratio of 2.2%.  Due to the length of the series, this might also reflect an older, less social-media driven audience.
6 – The Dark Tower (Stephen King)
This one is almost a textbook example of a nice power law.  King did pretty well to hold on to a lot of raters over a long series as well.  If we divide the number of Goodreads raters by the number of copies sold, we come up with a ratio of 2.1%.  Again, due to the length of the series, this might also reflect an older, less social-media driven audience.


7 – Game of Thrones (G.R.R. Martin)
Yes, I know that’s not the real name of the series (that would be A Song of Ice and Fire), but it’s the name of the TV show, so I figure that’s how most people think of it.  Again, it is almost a picture-perfect example of a power law.  The last book has lagged a bit, but he still has two more books to go.  Again, he has done a good job of holding on to nearly half of his original audience, as inferred from Goodreads raters.  If we divide the number of Goodreads raters by the number of copies sold, we come up with a ratio of 7.6%.  Perhaps this is at least partially due to the series having a concurrent TV adaptation, with the consequent buzz and cross promotion.
 

 
8 – Divergent (Veronica Roth)
This is a pretty decent fit, but there are only three points to fit, so that has to be borne in mind.  If we divide the number of Goodreads raters by the number of copies sold, we come up with a ratio of 8.2%.  As with the Hunger Games series, this would seem to indicate an audience that is very enthusiastic about the books and keen to share their feelings on social media.
 
9 – Hitchhiker's Guide (Douglas Adams)
Again, this is a nearly perfect fit to a power law.  However, it has quite a steep drop-off, with the first book in the series getting far more ratings than the later books.  This seems to be a feature of older books and how they interact with Goodreads.  It may be more a reflection of people’s recall of an older series than of underlying sales.  However, if we divide the number of Goodreads raters by the number of copies sold, we come up with a ratio of 6.1%, which is quite high for a series whose author died quite a while back and whose audience probably skews older.
10 – Aubrey/Maturin (Patrick O’Brian)
Again, this is a very good fit to a power law, especially given the length of the series.  We see a bump at Book 10 (that was the book that shared its title with the movie “The Far Side of the World”).  Book 2 is also a bit low.  If we divide the number of Goodreads raters by the number of copies sold, we come up with a ratio of 2.9%, which is what we might expect for a series whose author died quite a while back and whose audience probably skews older.
 

Some Conclusions

· It does appear that the number of Goodreads raters reflects the number of copies sold fairly accurately within a series (i.e. there is a good correlation).  At any rate, that appears to be the case for the Harry Potter series.  For that series, the number of Goodreads raters was about 2% of the copies that were sold.

· However, the same calculation ranged from a high of about 10% for the Hunger Games series to a low of 1.4% for the Dan Brown series.  So there is considerable variability in how readily different audiences make their way to Goodreads and share their opinions by rating the books.

· Nearly all of these very popular series fit a power law very well.  The main exception was the Dan Brown series, where the second book, The Da Vinci Code, outdid the first.  But that book truly was exceptional, on a lot of grounds.

· Most of the newer series retained 40% to 50% of the first book’s Goodreads raters at the midpoint of the series.  The older series had a much greater rate of drop-off, though this may also be related to the fit between the audience of the series and the members of Goodreads.
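To see the spread in rater-to-sales ratios side by side, here is a small Python sketch that gathers the sales totals and ratios quoted in this post.  The implied rater counts are back-calculated (sales times ratio) purely for illustration; they are not figures taken from Goodreads, and they are only as precise as the rounded ratios.

```python
# (total copies sold, Goodreads raters as % of copies sold), as quoted above.
series = {
    "Harry Potter": (447_000_000, 2.1),
    "Robert Langdon": (200_000_000, 1.4),
    "Twilight": (120_000_000, 4.2),
    "Hunger Games": (50_000_000, 10.4),
    "Wheel of Time": (44_000_000, 2.2),
    "The Dark Tower": (30_000_000, 2.1),
    "Game of Thrones": (24_000_000, 7.6),
    "Divergent": (20_000_000, 8.2),
    "Hitchhiker's Guide": (16_000_000, 6.1),
    "Aubrey/Maturin": (4_000_000, 2.9),
}

# Back-calculated, approximate rater totals (an assumption-laden estimate).
implied_raters = {name: round(copies * pct / 100)
                  for name, (copies, pct) in series.items()}

highest = max(series, key=lambda name: series[name][1])
lowest = min(series, key=lambda name: series[name][1])

# List the series from the highest ratio to the lowest.
for name in sorted(series, key=lambda n: series[n][1], reverse=True):
    print(f"{name}: {series[name][1]}% (about {implied_raters[name]:,} raters)")
print("Highest ratio:", highest, "| Lowest ratio:", lowest)
```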

Next time we will look at whether these findings hold for Amazon reviews, or whether things are different in the Kindle world.
Note that I will put up the raw data that these graphs are based on a little later in the week.
 


