Some Initial Findings from the Predictive Model

Among the conclusions presented in our research article [DHQ 14.2 2020], we noted:

Our project has identified several challenges that will be of interest to scholars in the digital humanities, particularly those working at the intersection of text analysis, geography, and public data sets. Our original goal to capture and predict mass literary events has largely been met. As “capture,” we have created an archive of nearly a decade of multiple media forms (and metrics for them) associated with a cultural program that has engaged many thousands of people across a major American city for years. In pursuit of “prediction”, we have produced a novel predictive model integrating demographics, book content, and branch data to produce branch-level predictions of book circulation. With this tool, we plan to generate branch-level circulation predictions for book titles (OBOC program or not) beginning in summer 2020. This will be reported in a future paper.

While we expect this predictive model to be of use to CPL staff, it is important to note that it has not been our intent (nor theirs) to optimize against such a model in choosing books. One does not need data-intensive modeling to identify books that circulate highly; for example, a good bet at any time would be current best-sellers by authors with name recognition who have been highly promoted by publishers. But maximizing circulation alone has never been the primary goal of “One Book, One Chicago”. We expect that library staff will continue to make OBOC text and theme choices as they always have, through an in-depth process that considers an entire constellation of cultural and socio-political factors. However, they will now be able to do so with the help of an additional data source: for any given book, they will also be able to calculate “what-if” scenarios for all CPL branches and consider different levels and kinds of promotional activity.

One of the key findings of our predictive model is that prior circulation makes the largest predictive contribution for the circulation of OBOC selected works. This is a measure, however, that will be unavailable for new books and little-known or first-time authors. It is possible, as we have shown, to do similar types of predictions without prior circulation data, but with significantly lower accuracy. This is not a surprising finding, but our ability to quantify the effect will enable library staff to reason about the tradeoffs inherent in choosing works already circulating well in the local library-system as opposed to “importing” choices from outside the system in the name of expanding readers’ horizons.
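To make the with-and-without comparison concrete, here is a minimal sketch of how one might quantify the contribution of a single feature such as prior circulation. It is not the model from the article: it assumes a plain linear regression, entirely synthetic data, and hypothetical feature names (`prior` for prior circulation, `demo` for a branch-level demographic index). The coefficients used to generate the data are arbitrary and chosen only so that prior circulation dominates.

```python
import random

def fit_ols(rows, targets):
    """Least-squares fit with an intercept, solved via the normal equations."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    # Augmented matrix [X^T X | X^T y]
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * y for x, y in zip(X, targets))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

def r_squared(rows, targets, coefs):
    """Share of variance in targets explained by the fitted coefficients."""
    preds = [coefs[0] + sum(c * v for c, v in zip(coefs[1:], r)) for r in rows]
    mean_y = sum(targets) / len(targets)
    ss_res = sum((y - p) ** 2 for y, p in zip(targets, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in targets)
    return 1.0 - ss_res / ss_tot

# Synthetic (made-up) data: circulation driven mostly by prior circulation.
rng = random.Random(0)
full_rows, targets = [], []
for _ in range(200):
    prior = rng.uniform(0, 100)   # hypothetical prior circulation count
    demo = rng.uniform(0, 1)      # hypothetical demographic index
    full_rows.append((prior, demo))
    targets.append(0.8 * prior + 10 * demo + rng.gauss(0, 5))

# Reduced model: the same data with the prior-circulation column dropped.
reduced_rows = [(demo,) for _, demo in full_rows]

r2_full = r_squared(full_rows, targets, fit_ols(full_rows, targets))
r2_reduced = r_squared(reduced_rows, targets, fit_ols(reduced_rows, targets))
print(f"R^2 with prior circulation:    {r2_full:.3f}")
print(f"R^2 without prior circulation: {r2_reduced:.3f}")
```

The gap between the two scores is one simple way to express how much accuracy is lost when prior circulation is unavailable, as it would be for a new book or a first-time author. Both scores here are computed on the same data used for fitting; a real comparison would evaluate on held-out titles or seasons.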

With a new season approaching and data from the last two seasons soon to be available, we can continue to improve the model.
