Big Data and Tracking Consumers in Publishing


        This short paper explores some of the benefits and drawbacks of big data in the publishing industry, as big data is becoming a prevalent feature in analyzing and predicting trends in fields such as retail, healthcare, and crime prevention (Kobo, 2014, para. 1). To look at the role of “big data” in publishing, we first need a general understanding of the term. An article by Forbes contributor Gil Press discusses the origin and popularization of the term “big data” and notes that the definition is problematic and still needs development. Although Press and other experts “predict a relatively short life span for this unfortunate term”, Press gathers various definitions in order to build a well-rounded idea of big data, as it is an amalgamation of many characteristics. The definitions that appealed most to my understanding are:
        (#2) “an all-encompassing term for any collection of data sets so large and complex that it becomes difficult to process using on-hand data management tools or traditional data processing applications.”
        (#6) The new tools helping us find relevant data and analyze its implications.
        (#7) The convergence of enterprise and consumer IT.
        (Press, 2014, p.1).
An article by Kobo also highlights the point that “analyzing these data sets is quickly becoming the basis for competition, productivity, and innovation” (Kobo, 2014, para. 1) as it becomes “the ability of society to harness information in novel ways to produce useful insights or goods and services of significant value” (Press, 2014, p. 1, para. 10). In the publishing industry, the digitization of reading is paving the way for Big Data to become the driving force in evolving the relationships between commercial parties, publishers, authors, readers, and technology.
        Digitization allows reader behaviour to be tracked and reported. Different parties (e.g. commercial parties, publishers, and authors) are able to see what kind of books readers have purchased, “what books were left unopened, which were read to the very last word and how quickly” (Kobo, 2014, para. 4). Statistics drawn from the data can connect the completion rates of books with their sales numbers to help publishers decide whether to invest in an author’s book. As Kobo says, “opportunity lies among those books that have high completion rates yet suffer low sales” because this shows high reader engagement and a rectifiable lack of marketing (Kobo, 2014, p. 3, para. 4). On a publisher’s level, big data helps to highlight trends and reader engagement, thereby giving publishers a picture of which authors to invest more resources in. This can also benefit authors, who can get a better sense of their reader demographic and compare results between their different works. However, while authors used to worry mainly about whether or not their books were purchased, new payment models based on big data now determine how an author earns money. Unlike with print books, these models are based on new parameters, such as pages read for self-published ebooks.
        A related example can be found in Tracking reader habits using tech: Good or bad for readers and writers? by Troy Lambert, which examines various ebook platforms that collect(ed) data. One of the platforms discussed is Amazon’s Kindle Unlimited subscription service, in which authors are compensated according to how many pages are read. This could be problematic because various factors behind reader data can affect the accuracy of page-read counts. Applying a single algorithmic protocol to all genres, whether they are read linearly or non-linearly, is also a problem, as noted by VanDyke in the comments (VanDyke, 2016). Another issue, according to the article, is people cheating or working around the system, which can result in having one’s books pulled or being banned from operating on Amazon (Lambert, 2016, para. 1). While this article did not go into depth about the drawbacks of the payment model, I did find another recent article about the associated publishing platform, KDP (Kindle Direct Publishing) Select, that confirmed my suspicions.
        Experienced KDP Select author David VanDyke reports that he and fellow authors using the platform are losing page reads due to software glitches (VanDyke, 2016). VanDyke also conducted a few experiments to confirm the odd statistics, and concluded that there was some sort of issue with the way reader data was reported. This can lead to large discrepancies in compensation. The article and its comments also illustrate a lack of transparency between Amazon and its authors in trying to diagnose the precise issue, or even in Amazon admitting that there is a problem in the first place. The ongoing discussion in the comment section suggests that PageFlip, a newly incorporated feature, may have affected the way page reads are recorded, alongside reader behaviour such as returning to the book cover after reading or going back to a favourite passage.
        Readers may also have other reasons as to why they stop reading, and making assumptions based on data that doesn’t necessarily account for these causes could be inaccurate. While this issue also stems from the payment model itself, the model can be seen as a result of our attitude towards big data and can be an example of how algorithmic defects become difficult to troubleshoot, especially the more complex a system is. Trying to work towards a solution when there is a lack of transparency because a company wants to appear unaffected makes the situation worse.
When we use data to gauge the success of sales, we must remain aware of the context of said data. Nielsen BookScan numbers, for example, are limited to point-of-sale records based on ISBNs and don’t account for a large portion of sales, such as ebook sales (Michel, 2016, para. 10-12). This awareness is important because we rely on the accuracy, handling, and interpretation of data in order to make decisions that affect the publishing industry on all levels.
        Another feature of big data is “the shift (for individuals) from consuming data to creating data” (Press, 2014, p. 2). It seems the related issues here tend to be ethical ones, such as privacy. When we read ebooks or sign up for Goodreads accounts, it usually involves us giving companies permission to track our habits and preferences. However, it seems that we are okay with providing businesses with our attention and data if it helps to enhance or build products and services catered to us.
        Our participation makes it possible for companies to suggest pregnancy yoga books to a consumer if her data indicates that she is pregnant and likes yoga. My question is whether or not it ever crosses the line. In 5 Reasons to Liberate Your Ebooks From Their DRM Prison, K.T. Bradford recounts five ways in which customers can lose their ebooks due to companies monitoring reader behaviour (Bradford, 2013). In one of these cases, reported data prompted Amazon to confiscate all the purchased ebooks on a woman’s device and deny her access to her account because the DRM detected that she was, along with a related account, in violation of the terms of service (Bradford, 2013, para. 6). Amazon did not tell her why or give her any warning. The subsequent conversations with Amazon were also non-transparent and unfruitful (Bradford, 2013, para. 6). The mishandling of data in monitoring user behaviour contributed to dissatisfaction with another big data service and became an ethical issue.
        In conclusion, privacy, transparency, inaccuracy, and shortcomings in technology are common topics that have negatively affected big data’s transition into publishing. However, big data ultimately contributes to providing better services for authors and consumers. It enables higher levels of interactivity between commercial parties, publishers, authors, and readers. It will also provide the means for new innovations and efficiencies, and work towards mapping an industry that is often unpredictable.


Bradford, K. T. (2013, August 21). 5 Reasons to liberate your ebooks from their DRM prison. Digital Trends. Retrieved from

Kobo. (2014, October 9). Publishing in the era of big data. Kobo Newsroom. Retrieved

Lambert, T. (2016, September 24). Tracking reader habits using tech: Good or bad for readers and writers? Teleread. Retrieved from

Michel, L. (2016, June 30). Everything you wanted to know about book sales (But were afraid to ask). Electric Lit. Retrieved from

Press, G. (2014, September 3). 12 Big data definitions: What’s yours? Forbes. Retrieved

VanDyke, D. (2016, October 8). Amazon KDP Select authors are losing page reads, apparently due to software glitches. Teleread. Retrieved from





PUB 401: Short Paper 1

In “Words with Friends”: Socially Networked Reading on Goodreads, Lisa Nakamura brings into focus a few things that stand out to me the most: the social processes behind digitized reading culture, the remediation of the catalog, how the millennial generation is reflected in the evolving consumption of books, and the commodification of reading that results from these other factors.

She points out that when it comes to the future of reading, many conversations surround the growing obsolescence of print books and the industry’s focus on materialistic aspects of digitization (Nakamura, p. 238). We have an incessant tendency to keep up with upgrades to technological devices, but we would also benefit from studying how social networking affects reading culture.

We can study the way these social processes are created by using Goodreads as a case study because it is a platform on which the experience of interacting with books and other readers aligns with the way much content is popularly consumed. The interface is feed-based with social networking capabilities, and offers ways to stay connected with the outside world (e.g. sharing via social media and interactivity with other platforms). I think this allows us to validate our actions on a higher social level and helps us to position or measure ourselves against others by bringing our reading preferences and behaviours onto a public platform.

The author discusses the way in which Goodreads allows readers to share and provide visual evidence for the books they are reading: the bookshelves are presented to friends in a way that is “bibliocentric” and “egocentric”, as a throwback to the way print books on a physical shelf display one’s personality and social status (Nakamura, p. 240). The capability to track one’s taste extends further than this, because it also allows users to easily share their current literary endeavors as well as their wants in a compact and convenient way. The organization of interactions on the site mimics the social context surrounding print book collections, where people can gauge and relate to friends and strangers by sharing recommendations, sharing wishlists, generating literary dialogue, and so on. While the social aspects of reading have always existed, it is now easier and faster to connect with others through common preferences and feedback on posts. I also enjoy how Nakamura references and discusses the remediation of the catalog on platforms like Goodreads (Nakamura, p. 242). The catalog is a popular form through which we organize and interact with content and, in turn, with other people.

The article also compares the open-access, forum-like environment of Goodreads to the walled-up environment of academic publishing, and notes how conversations and experiences generated in places like Goodreads can be just as culturally valuable and intellectually stimulating as those in academia (Nakamura, p. 242). Many users like myself look less to formal literary criticism in judging books, and instead prefer the opinion of the common masses. The community is informal, but people still care about having provocative discussions. This also relates to how Nakamura talks about the changes in author-reader relations (Nakamura, p. 238). With the digitization of reading culture, the processes of reading and writing become arguably more collaborative, with a higher potential for interactivity.

While my encounters with Goodreads have mostly been limited to book info and review results on Google in the past, I have signed up for it to better understand its capabilities. The site seems a lot more appealing than the last time I visited it, for the very reasons highlighted in Lisa Nakamura’s article. I have become accustomed to consuming content in a certain way, and this is reflected in the way the Goodreads interface is constructed. The site allows users to “follow” a wide variety of genres, and organizes recommendations on feeds similar to those of other social media platforms. As pointed out in the article, the level of interaction, not only with other readers but with the books themselves, creates convenience and a stronger connection to a greater community. There seem to be more actions offered for interacting with a book: the ability to listen to and preview the book, the ability to purchase it through direct links to sellers like Amazon, and the ability to view a “Readers Also Enjoyed” section of recommendations in the sidebar, just to name a few. The familiar protocols from online shopping platforms (e.g. ratings, reviews, Q&A, comments, etc.) and from social networking platforms (e.g. profile, inbox, status updates, community feeds, etc.) are both integrated into the Goodreads interface. Goodreads offers familiar tools of content consumption to provide an outlet for readers to “perform their identities in a public and networked forum” (Nakamura, p. 240).

Nakamura also describes the way Goodreads is a data-rich experience for both the user and the company. The user is both consuming and creating information. She draws a connection to the idea of controlled consumption and how our reading experience is commodified (Nakamura, p. 241). The data resulting from user interaction and behaviour on the site is given to Goodreads partners. Goodreads also controls consumption in the way it directs users toward bringing business to certain companies, and perhaps has a hand in which books are more publicized through recommendation. On the one hand, the company must make money in order to keep its service going; on the other hand, people are being subtly manipulated. There is also another ethical issue here, as user data collection may be viewed as an infringement of privacy. However, users seem to be willing to accept these issues in exchange for the service that the site offers and the exchange of personal information with others. I also believe the consumeristic features need not be so villainized, as those who are savvy with internet culture are also capable of recognizing and maneuvering around marketing and advertising strategies themselves.


Nakamura, Lisa. 2013. “Words with Friends”: Socially Networked Reading on Goodreads. PMLA 128 (1). 238-243.