Tuesday, 1 March 2016

One Year with Harper Lee’s To Kill a Mockingbird and Go Set a Watchman, Part 2




As most people must have heard by now, Harper Lee died a little while ago (Feb 19, 2016).  Earlier, I published a blog post following the Amazon sales rank and imputed sales of her books over the past year (Feb 2015 to Feb 2016), noting how sales corresponded to some key events over the year.  This companion post looks at how the number of Amazon reviews corresponded to those key events.  It also analyzes the relationship between sales rank and number of reviews for these two books, To Kill a Mockingbird and Go Set a Watchman.


As a reminder of that blog, and for context, the graph below shows how the sales rank of To Kill a Mockingbird (TKAM) and Go Set a Watchman (GSAW) varied over the time span from early February 2015 to late February 2016, a period of a little over a year.





1 – Number of Amazon Reviews and Key Events over the Year

The graph below shows the total number of reviews recorded on the Amazon site, by date for the period from early Feb 2015 to early Feb 2016.  The same key events are outlined, as was done for the sales rank graph.





The first key event was the announcement, in early February 2015, that a new book by Harper Lee was in the works.  As you can see, the slope of the TKAM review curve increased when that announcement was made (the blue line gets steeper), indicating that interest was piqued, as reflected in people’s propensity to leave a review.


The next key events were the pre-release of GSAW in late May 2015 and its publication in early July 2015.  The pre-release didn’t do much, if anything, for the review numbers of TKAM.  However, TKAM reviews did pick up with the release of the new book (again, the slope of the blue line increases).  Naturally, once GSAW was released, its number of reviews shot up very quickly, along with its sales.  The rapid increase in reviews would seem to indicate that there was a lot of latent interest in the new book.


Reviews for GSAW began trailing off at about the beginning of October, as indicated by the diminishing slope of the red line.  An inflection point happened sometime in October, with the line bending back down.  The slope of the blue TKAM line also diminished about this time, though the effect is rather slight.


The next major event happened in December, when GSAW won its category in the Goodreads Book of the Year (2015) rankings.  That award, along with Christmas, seems to have turned the line back upward, with an inflection point sometime in January.  The pace of reviews for TKAM didn’t appear to change much, if at all.


Then, of course, we come to Ms. Lee’s death.  A funny thing happens almost immediately: Amazon takes away about 1700 TKAM reviews, overnight, on Feb 21, 2016.  That’s why the blue line takes a sudden plunge, a discontinuity.


One wonders just what happened here.  Many Amazon authors have had the experience of having reviews taken away by Amazon, especially us Indies with modest sales.  The usual explanation is that the reviewer had some kind of family or commercial relationship with the writer or the publisher.  Presumably the same thing is at work here.  Since Harper Lee probably didn’t have 1700 “bogus” reviews from her family and friends, it is natural to assume that this relates to the publisher.  Had the publisher salted in all these reviews?  Or is some other explanation at work?  I suppose that we will never know.


Anyway, after that the TKAM line resumes, and the rate of reviews seems to increase modestly.  GSAW, on the other hand, doesn’t seem to be much affected by the writer’s death.


In the last blog, I noted that death did seem to be a good career move, in terms of sales.  But the effect was not long-lasting.  Both books are now in the 300-400 rank range.  It probably won’t be long before they reach their baseline level, somewhere in the 800 to 1000 rank range.


The graph below gives a day-by-day count of the number of reviews for each book, rather than a running total, along with the key events during the year.  It can be read alongside the comments in the text above.  This format makes some things clearer, but others more obscure (hidden by the day-to-day noise of the time series).  By the way, I cut off the data before the big TKAM recalculation of reviews, as the scale of that one-day change distracted from the other aspects of the graph.
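If you want to build a day-by-day series like this yourself, it can be derived from daily snapshots of the running total by taking the difference between consecutive days. The sketch below is a minimal illustration, assuming one snapshot per day, sorted by date; the function name `daily_counts` and the sample numbers are hypothetical, not the data behind the graphs. A negative difference, like the TKAM drop on Feb 21, 2016, reflects reviews being removed rather than written, so the sketch sets those days aside instead of counting them.

```python
from datetime import date

def daily_counts(snapshots):
    """Turn (date, running_total) snapshots into daily new-review counts.

    Assumes one snapshot per day, sorted by date. Days where the total
    falls (reviews removed by the retailer) are returned separately
    instead of being treated as review activity.
    """
    counts = []
    removals = []
    for (_, prev_total), (day, total) in zip(snapshots, snapshots[1:]):
        change = total - prev_total
        if change < 0:
            removals.append((day, change))
        else:
            counts.append((day, change))
    return counts, removals

# Hypothetical running totals, for illustration only.
snapshots = [
    (date(2016, 2, 18), 1000),
    (date(2016, 2, 19), 1012),
    (date(2016, 2, 20), 1030),
    (date(2016, 2, 21), 100),
    (date(2016, 2, 22), 108),
]
counts, removals = daily_counts(snapshots)
print(counts)
print(removals)
```

One limitation worth keeping in mind: on a day when reviews are removed, any genuinely new reviews written that day are hidden inside the net change.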






2 – Sales Rank versus Number of Reviews

As a data analyst, I am always interested in exploring relationships among variables.  In this case, I will look at just how sales rank and number of reviews were related, for these two books during the time period in question.


The first graph shows the average sales rank during each ten-day period for TKAM, versus the number of reviews that the book received during that same ten-day period.  As you can see, there does seem to be a definite relationship: a lower sales rank (more sales) corresponds to a higher review rate (more reviews).  This is as one would expect.  You need sales to get reviews, but reviews can also trigger sales, due to the “social proof” that people tend to assume from the mere presence of reviews.
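The ten-day periods can be built from daily data by grouping consecutive days into blocks, averaging the sales rank within each block and adding up the new reviews. A minimal sketch, assuming two equal-length daily lists; the names `ten_day_periods`, `ranks` and `new_reviews`, and the sample values, are hypothetical. Any partial block left over at the end is simply dropped.

```python
def ten_day_periods(ranks, new_reviews, period=10):
    """Aggregate daily series into fixed-length periods.

    Returns a list of (average_rank, total_new_reviews) tuples, one per
    complete period. Days left over at the end are discarded.
    """
    if len(ranks) != len(new_reviews):
        raise ValueError("series must have the same length")
    periods = []
    for start in range(0, len(ranks) - period + 1, period):
        rank_block = ranks[start:start + period]
        review_block = new_reviews[start:start + period]
        periods.append((sum(rank_block) / period, sum(review_block)))
    return periods

# Hypothetical data: 25 days, so only two complete ten-day periods.
ranks = [300 + 10 * d for d in range(25)]
new_reviews = [5] * 10 + [3] * 10 + [1] * 5
print(ten_day_periods(ranks, new_reviews))
```

Averaging the rank but summing the reviews matches the description above: rank is a level measured each day, while reviews are counts that accumulate over the period.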




I used Excel’s trend-line option to test a few different functional forms for the relationship.  The best fit was given by an exponential function.  Basically, that implies that the relationship is steepest at low sales ranks (the best-selling periods) and flattens out as the rank increases.
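As far as I understand it, Excel’s exponential trend line fits y = a·e^(bx) by taking the natural log of y and running an ordinary straight-line regression of ln(y) on x, and the R-square it reports belongs to that log-transformed fit. The sketch below reproduces that calculation, with x as the average ten-day sales rank and y as the reviews in the period; the function names and sample points are hypothetical, and every y value must be positive for the log to exist. A negative b gives the shape described above: steep at low ranks, flattening as the rank rises.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x.

    Returns (intercept, slope, r_square).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return intercept, slope, 1 - ss_res / ss_tot

def fit_exponential(xs, ys):
    """Fit y = a * exp(b * x) by regressing ln(y) on x.

    The R-square returned is for the log-transformed fit, not for y itself.
    """
    intercept, b, r_square = fit_line(xs, [math.log(y) for y in ys])
    return math.exp(intercept), b, r_square

# Hypothetical points lying exactly on y = 50 * exp(-0.005 * x).
ranks = [100, 200, 300, 400, 500]
reviews = [50 * math.exp(-0.005 * r) for r in ranks]
a, b, r2 = fit_exponential(ranks, reviews)
print(round(a, 3), round(b, 5), round(r2, 3))
```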

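I should note that removing the outlier at approximately x=100, y=45 only improves the model R-square a little, from 0.742 to 0.767.  An R-square of 1.00 implies a perfect fit, while an R-square of 0.00 implies that the model explains none of the variation.  Unlike the correlation coefficient r, which runs from -1.00 (perfect negative relationship) to +1.00 (perfect positive relationship), R-square cannot be negative for an ordinary least squares fit with an intercept, so the direction of the relationship has to be read from the trend line itself.  So, this is a pretty decent fit.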
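To make the difference between R-square and the correlation coefficient concrete, this short sketch computes both for a perfectly decreasing straight-line relationship (hypothetical numbers and function names). The correlation r comes out at -1, while R-square, defined as 1 minus the residual sum of squares over the total sum of squares, comes out at +1, because the line accounts for all of the variation. For a simple straight-line fit with an intercept, R-square equals r squared.

```python
import math

def correlation(xs, ys):
    """Pearson correlation coefficient r, which ranges from -1 to +1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def r_square(ys, fitted):
    """R-square = 1 - SS_residual / SS_total for a set of fitted values."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical data: reviews fall by exactly 2 for every 100 ranks.
xs = [100, 200, 300, 400]
ys = [40, 38, 36, 34]
fitted = [42 - x / 50 for x in xs]  # the straight line through these points
print(correlation(xs, ys), r_square(ys, fitted))
```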




We can now go on to look at whether the fit gets better or worse if we compare sales rank at period T with number of reviews at period T+1 (using ten-day period averages).  In other words, we are testing how strongly sales predict later reviews.  When we do that, the fit gets worse, with the R-square dropping from .742 to .511 for the exponential functional form.







I then tried the other alternative - testing sales rank at period T against number of reviews in period T-1.  In other words, that tests how strongly reviews predict sales. In this case, the R-square was .567, which is greater than the previous case, but less than the case where sales rank and number of reviews are drawn from the same time period.
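Both lagged comparisons amount to shifting one ten-day series by one period before pairing it with the other. A minimal sketch, assuming `ranks` and `reviews` are equal-length lists of ten-day values (the names and numbers are hypothetical): a lag of +1 pairs rank in period T with reviews in period T+1 (sales leading reviews), and a lag of -1 pairs rank in period T with reviews in period T-1 (reviews leading sales). Each one-period shift leaves one fewer pair to fit.

```python
def lagged_pairs(ranks, reviews, lag):
    """Pair ranks[t] with reviews[t + lag] wherever both exist.

    lag = 0  -> same period
    lag = 1  -> rank at T versus reviews at T+1 (sales leading reviews)
    lag = -1 -> rank at T versus reviews at T-1 (reviews leading sales)
    """
    pairs = []
    for t, rank in enumerate(ranks):
        u = t + lag
        if 0 <= u < len(reviews):
            pairs.append((rank, reviews[u]))
    return pairs

# Hypothetical ten-day average ranks and review counts.
ranks = [350, 420, 510, 600]
reviews = [40, 35, 28, 20]
for lag in (0, 1, -1):
    print(lag, lagged_pairs(ranks, reviews, lag))
```

The resulting pairs can then be fed to whatever trend-line fit is being compared.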


So, it would seem that the relationship between sales rank and reviews is:

·         strongest when the two are close together in time,

·         next strongest when reviews lead sales rank,

·         and weakest when sales rank leads reviews.


Naturally, it would be better to do a multiple regression to pin this down further, but as a first-level qualitative result this is still useful.
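As a rough illustration of what that next step could look like, the sketch below regresses reviews in period T on sales rank in both period T and period T-1 at once, so the two effects are estimated together rather than one at a time. For simplicity it uses a straight-line form rather than the exponential curve above, solved by ordinary least squares through the normal equations; the function names and the sample data are hypothetical, and unlike a statistics package it reports no standard errors.

```python
def solve(matrix, vector):
    """Solve a small square linear system by Gaussian elimination with partial pivoting."""
    n = len(vector)
    a = [row[:] + [v] for row, v in zip(matrix, vector)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x

def multiple_regression(predictors, ys):
    """Least squares for y = b0 + b1*x1 + b2*x2 + ... via the normal equations.

    predictors is a list of rows, one row of x values per observation.
    Returns [b0, b1, b2, ...].
    """
    rows = [[1.0] + list(p) for p in predictors]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical ten-day ranks; reviews built to depend on rank at T and T-1.
ranks = [300, 360, 340, 420, 500, 450, 380]
predictors = [(ranks[t], ranks[t - 1]) for t in range(1, len(ranks))]
reviews = [60 - 0.05 * now - 0.02 * prev for now, prev in predictors]
coefs = multiple_regression(predictors, reviews)
print([round(c, 4) for c in coefs])
```

With real data, the relative size of the two rank coefficients would be one way to judge how much of the relationship is same-period and how much is lagged.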


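The results were broadly similar when looking at sales rank and reviews for GSAW, though a logarithmic function gave the best fit.  The relationship was again strongest within the same time period, but the two lagged cases came out in the reverse order, with sales rank leading reviews fitting slightly better than reviews leading sales rank:

·         strongest when the two are close together in time (R-square=.731),

·         next strongest when sales rank leads reviews (R-square=.686),

·         and weakest when reviews lead sales rank (R-square=.650).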
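Excel’s logarithmic trend line has the form y = a + b·ln(x), which is an ordinary straight-line regression of y on ln(x). Because y itself is not transformed, the R-square from this fit is measured directly on the review counts. A minimal sketch, with a hypothetical function name and sample points; all x values (ranks) must be positive.

```python
import math

def fit_logarithmic(xs, ys):
    """Fit y = a + b * ln(x) by ordinary least squares on ln(x).

    Returns (a, b, r_square). All x values must be positive.
    """
    lx = [math.log(x) for x in xs]
    n = len(xs)
    mean_lx = sum(lx) / n
    mean_y = sum(ys) / n
    slope = sum((u - mean_lx) * (y - mean_y) for u, y in zip(lx, ys)) / sum(
        (u - mean_lx) ** 2 for u in lx
    )
    intercept = mean_y - slope * mean_lx
    ss_res = sum((y - (intercept + slope * u)) ** 2 for u, y in zip(lx, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return intercept, slope, 1 - ss_res / ss_tot

# Hypothetical points lying exactly on y = 200 - 25 * ln(x).
ranks = [50, 100, 200, 400, 800]
reviews = [200 - 25 * math.log(r) for r in ranks]
a, b, r2 = fit_logarithmic(ranks, reviews)
print(round(a, 3), round(b, 3), round(r2, 3))
```

A negative b means each doubling of the rank costs the same fixed number of reviews, so, like the exponential form, the curve is steepest at low ranks.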











So, to sum up:


·         The key events in the year (announcement of the new book, publishing of the new book, award to the new book, author’s death) tended to correspond to increases in sales and reviews for both books, “To Kill a Mockingbird” and “Go Set a Watchman”.



·         There were some unusual re-jiggings of reviews by Amazon, especially for “To Kill a Mockingbird”, where about 1700 reviews were pulled, shortly after Harper Lee’s death.



·         Sales rank and reviews were related in a non-linear fashion.  The best fit was obtained when both variables were drawn from the same ten-day period.




================================================================

Finally, of course, I should remind you that you can buy one of our Dodecahedron Books titles.  Since Harper Lee wrote about the social and racial complexities of the American experience, I will offer up “On the Road with Bronco Billy”, a travelogue and cultural study of late 20th century America, as seen from the cab of a big rig.  It also includes some observations on race and class in America, though not with so fine a literary touch as Harper Lee’s books.  ☺

On the Road with Bronco Billy - A Trucking Journal

Kindle Edition




What follows is an account of a ten-day journey through western North America during a working trip, delivering lumber from Edmonton, Alberta, to Dallas, Texas, and returning with oilfield equipment. The writer had the opportunity to accompany a friend who is a professional truck driver, which he eagerly accepted. He works as a statistician for the University of Alberta, and is therefore generally confined to desk, chair, and computer. The chance to see the world from the cab of a truck, and be immersed in the truck driving culture, was intriguing. In early May 1997 they hit the road.
Some time has passed since this journal was written, and many things have changed since the late 1990s. That makes the journey not just a geographical one but also a historical one, which I think only increases its interest.

We were fortunate to have an eventful trip - a mechanical breakdown, a near miss from a tornado, and a large-scale flood were among these events. But even without these turns of fate, the drama of the landscape, the close-up view of the trucking lifestyle, and the opportunity to observe the cultural habits of a wide swath of western North America would have been sufficient to fill up an interesting journal.

The travelogue is about 20,000 words, about 60 to 90 minutes of reading, at typical reading speeds.




