Circulation Modeling

Partly inspired by the reception of our work at the LITA Forum (see previous post) and partly by a desire to clean up a bunch of ugly code left over from a year of hacking by four different programmers, I rebuilt the entire RCR data pipeline from scratch.

The results are a set of R notebooks with HTML output that you can see on the Circulation Modeling page.

Lessons learned:

  • Our previous model of how books were announced and promoted was incorrect. Two of the books, The Warmth of Other Suns and The Book Thief, were announced in the spring, even though the events were scheduled for the fall. This explains some data patterns that we had seen before but had not properly interpreted. See visualizations below.
  • Some of the principal components have a distinct non-linear relationship to the data. Not that surprising, really, but it hadn’t come out before.
  • Visitor counts, circulation, and book holdings are all highly correlated with each other, probably because OBOC books are distributed based on a branch’s traffic. This is one reason our prior model, which treated these as separate variables, was difficult to fit.
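
To illustrate the last point, here is a minimal sketch of how three measures that all follow a branch's traffic end up strongly correlated with one another. It is written in Python rather than the R used in the notebooks, and everything in it is an assumption made for illustration: the data are synthetic, and the `traffic` variable, the coefficients, and the noise levels are hypothetical, not taken from the RCR data.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Synthetic branch data (NOT the real RCR data): each measure is driven by
# a shared, hypothetical "traffic" level plus its own independent noise.
rng = random.Random(0)
traffic = [rng.uniform(100, 1000) for _ in range(50)]
visitors = [t + rng.gauss(0, 50) for t in traffic]
circulation = [0.6 * t + rng.gauss(0, 40) for t in traffic]
holdings = [0.05 * t + rng.gauss(0, 5) for t in traffic]

columns = {
    "visitors": visitors,
    "circulation": circulation,
    "holdings": holdings,
}

# Correlation for every distinct pair of measures.
names = list(columns)
correlations = {
    (a, b): pearson(columns[a], columns[b])
    for i, a in enumerate(names)
    for b in names[i + 1:]
}

for (a, b), r in correlations.items():
    print(f"{a} vs {b}: r = {r:.2f}")
```

When every pairwise correlation is this high, the three columns carry largely the same information, so a model that enters them as separate terms has little basis for apportioning the effect among them.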

New density plot, correctly aligned:


Old density plot: