In a recent blog post, I compared my analysis of Amazon’s Top 100 Kindle eBooks of 2013 with the data recently released by noted SF writer Hugh Howey and his (currently anonymous) data guru. They analysed a number of snapshot datasets, collected from Amazon’s website via a web “spider”, which can mine publicly available internet data extremely quickly and efficiently. They have since released datasets of increasing size (the latest included 50,000 books) and have delved into books outside the genre categories. My posts can be found under the general title “Amazon Top 100 Kindle Books” on the Dodecahedron Books blog site; Hugh Howey’s can be found on the “Author Earnings” website.
One key difference between my analysis of the Amazon Top 100 eBooks of 2013 and the Howey/DataGuru analysis concerned the proportions of traditionally published books versus Indie books in the Top 100. Though my original analysis was surprising enough in its estimate of Indie penetration among the Amazon best-sellers, the Howey/DataGuru data was even more favourable to Indies. The tables below recap those results, updated with Howey/DataGuru’s most recent findings. The first table shows my result for the percentage of Indie versus Trad books in the Top 100; the second shows the new results reported by Hugh Howey.
Amazon Top 100, 2013 | Total
Traditional          | 76%
Indie                | 24%
Grand Total          | 100%
These are Howey/DataGuru’s numbers from the 50,000-book sample. I have added their “From Small or Medium Publisher”, “Big Five Published” and “Amazon Published” categories together, to be equivalent to my “Traditional” category. Similarly, I have added their “Indie Published” and “From Uncategorized Single-Author Publisher” categories together, to be equivalent to my “Indie” category.
Hugh Howey’s Amazon snapshot, early 2014 | Total
Traditional                              | 64%
Indie                                    | 36%
Grand Total                              | 100%
Why are the results different? Why do Indies account for 36% of Hugh Howey’s February 7, 2014 snapshot, but only 24% of the 2013 Amazon Top 100, by my count?
As I mentioned in an earlier blog, one possibility is simply
that a lot changed between the times that the two samples represent. To recap that blog:
“ My Amazon Top 100 analysis was
based on Amazon’s list of their top 100 books of 2013. In a sense then, it could be thought of as
representing the mid-point of the 2013 data, since it is an accumulation of
data collected throughout the year.
Hugh’s analysis was from a snapshot in February 2014…about 8 months
passed between the mid-point of one sample and the time of the second. In the current publishing world, a lot can
change in 8 months, as we know.”
I also noted a second possibility, which I will explore
below. To recap that blog:
“The second possibility is that
the traditionally published books in the top 100 were more consistently present
in that list over a longer time period, whereas any particular Indie book spends
less time in the top 100, to be replaced by a new Indie book… there is more
“churn” in the Indie books than the Trads….because the Trad authors have had
longer careers and therefore have a ready-made fan base that allows [any
particular trad title] to stick on the top of the list for a longer time. Indies
have a more experimental audience, so any particular book doesn’t stay at the
top as long, though as a group they are very successful.”
To explore this possibility, I constructed a model set of 200 books in Excel, split into two groups:
·         “Non-Stickers”, which sold between a lower and an upper limit of copies each time period (a randomly generated number between 10 and 1000 per month).
·         “Stickers”, which also sold between a lower and an upper limit of copies each time period, but with a slightly higher lower limit, which could be varied (a randomly generated number between a variable lower limit and 1000 copies per month).
I then generated twelve months of artificial data, showing the percentage of books that were “Non-Stickers” each month versus the percentage that were “Stickers”. Note that the “Stickers” have only a slight edge in book sales in the non-control scenarios. Ten trials were performed under each set of assumptions, to ensure that the random number generator gave a good representation of the underlying statistical assumptions (i.e. relying on the Law of Large Numbers, which simply means that as you perform more trials, your results come closer and closer to the theoretical values assumed in your model).
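Since the original model lives in an Excel workbook that is not reproduced here, a rough Python sketch of the same setup may help. The names, the random seed, and the cutoff for the “top of the list” (I assume the top 20 of the 200 books) are my own choices; the post does not state its exact ranking cutoff.

```python
import random

# Sketch of the Excel model described above.  The names, the seed, and
# the top-list cutoff (TOP_N) are assumptions, not taken from the post.
N_BOOKS = 200
STICKER_SHARE = 0.64   # 64% Stickers, matching the Trad share in the snapshot
MONTHS = 12
HIGH = 1000            # upper sales limit for both groups
TOP_N = 20             # assumed size of the "top of the list"

def run_trial(sticker_low, non_sticker_low=10, seed=0):
    """One trial: return (annual, mean monthly) Sticker share of the top list."""
    rng = random.Random(seed)
    n_stickers = round(N_BOOKS * STICKER_SHARE)  # books 0..127 are Stickers
    # sales[b][m] = copies sold by book b in month m
    sales = [[rng.randint(sticker_low if b < n_stickers else non_sticker_low, HIGH)
              for _ in range(MONTHS)]
             for b in range(N_BOOKS)]

    def sticker_share(scores):
        top = sorted(range(N_BOOKS), key=scores.__getitem__, reverse=True)[:TOP_N]
        return sum(b < n_stickers for b in top) / TOP_N

    annual = sticker_share([sum(row) for row in sales])
    monthly = sum(sticker_share([row[m] for row in sales])
                  for m in range(MONTHS)) / MONTHS
    return annual, monthly

# Ten trials per configuration, as in the post:
control = [run_trial(sticker_low=10, seed=s) for s in range(10)]
```

In the control configuration (lower bound of 10 for both groups), the Sticker share of the top list should hover around 64% for both rankings, simply because Stickers make up 64% of the books.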
The first two graphs show the results of having a dataset of 64% “Stickers”/36% “Non-Stickers”, with each group randomly selling somewhere between 10 and 1000 books per month. I chose the 64/36 ratio because those are the proportions of Trads to Indies in Hugh Howey’s dataset of 50,000 Amazon books. This is the control scenario, where Stickers and Non-Stickers sell the same number of books per month on average. That average is 505 books each: the expected value of a uniform random number generator that picks a number between 10 and 1000 each time, with every number equally likely to be chosen.
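The 505 figure is easy to verify: a uniform draw between 10 and 1000, with every value equally likely, has a theoretical mean of (10 + 1000) / 2 = 505. A quick check in Python (a minimal sketch of my own, not part of the original Excel model):

```python
import random

low, high = 10, 1000
theoretical_mean = (low + high) / 2          # (10 + 1000) / 2 = 505.0

# Empirical check: the sample mean converges on 505 as draws accumulate.
rng = random.Random(42)
n_draws = 100_000
empirical_mean = sum(rng.randint(low, high) for _ in range(n_draws)) / n_draws
```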
I then varied the lower limit of books sold for the “Stickers”, raising it slightly with each model run, while keeping it the same for the “Non-Stickers”. The results of half a dozen model runs are shown below, varying the lower limit each time. As you can see, the Howey/DataGuru results are reproduced when the “Stickers” have a lower bound of about 60 sales per month. That implies an average of about 530 books per month, versus the Indies’ average of 505 books per month. It is a difference that hardly shows up in the monthly data, but is very noticeable in the annual data.
The exact numbers for the six model runs are shown below, along with a
graph of the results.
Lower Bound | Upper Bound | Annual, Top Percentile | Monthly, Top Percentile
10          | 1000        | 64%                    | 64%
50          | 1000        | 68%                    | 65%
62          | 1000        | 76%                    | 67%
75          | 1000        | 79%                    | 67%
100         | 1000        | 81%                    | 67%
125         | 1000        | 90%                    | 67%
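For readers who want to check the pattern themselves, here is a self-contained Python re-run of the six configurations, in place of the Excel workbook. The seeds and the top-list cutoff (the top 20 of 200 books) are my assumptions, so the exact percentages will not match the table; the qualitative pattern, however, should: the annual share climbs steadily with the lower bound while the monthly share barely moves.

```python
import random

# 200 books, 128 Stickers (64%), 12 months, 10 trials per configuration.
# TOP_N = 20 (top decile) is an assumed cutoff for the "top of the list".
N, STICKERS, MONTHS, HIGH, TOP_N, TRIALS = 200, 128, 12, 1000, 20, 10

def shares(sticker_low, seed):
    """Sticker share of the top list, ranked by annual and by monthly sales."""
    rng = random.Random(seed)
    sales = [[rng.randint(sticker_low if b < STICKERS else 10, HIGH)
              for _ in range(MONTHS)] for b in range(N)]

    def top_share(scores):
        top = sorted(range(N), key=scores.__getitem__, reverse=True)[:TOP_N]
        return sum(b < STICKERS for b in top) / TOP_N

    annual = top_share([sum(r) for r in sales])
    monthly = sum(top_share([r[m] for r in sales]) for m in range(MONTHS)) / MONTHS
    return annual, monthly

for low in (10, 50, 62, 75, 100, 125):
    runs = [shares(low, s) for s in range(TRIALS)]
    ann = sum(a for a, _ in runs) / TRIALS
    mon = sum(m for _, m in runs) / TRIALS
    print(f"lower bound {low:4d}: annual {ann:.0%}, monthly {mon:.0%}")
```

The key design point is that the annual ranking sums twelve noisy monthly draws, so a small shift in the mean stands out from the noise, while a single month’s ranking is dominated by the noise itself.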
So, projecting these results onto the Trad/Indie comparison, it is clear that if the Trad published books tended to be only a little more consistent in their monthly sales results, they could quite easily account for about 76% of the books in the Amazon Top 100 for the year 2013, yet only about 64% in a daily snapshot in early February 2014.
Obviously, this exercise doesn’t prove that this is what
happened, but it does show that it is quite plausible. Furthermore, if the “stickiness factor” isn’t
related to publisher category, but rather to length of time that a writer has
been in the public eye, then this Trad/Indie difference will wither away, as
Indies have more time to establish themselves in the marketplace.