Over at if:book, Ray Cha relays and recommends an upcoming chapter from Clifford Lynch, about moving beyond "reader-centric views of scholarly literature." It has much in common with Franco Moretti's work on literary history, and is worth reading for that reason alone.

But I'm also on the lookout for ways to articulate just what it is we're trying to do with CCC Online, and Lynch's piece fits the bill. Namely...

We would also see an explosion in services that provided access to this literature in new and creative ways. Such services would also incorporate specialized vocabulary databases, gazetteers, factual databases, ontologies, and other auxiliary tools to enhance indexing and retrieval. They would rapidly transcend access to address navigation and analysis. One path here leads towards more-customized rehosting of scholarly literatures and underlying evidence into new usage and analysis environments attuned to the specific scholarly practices of various disciplines.

We would also see a move beyond federation and indexing to actual text mining and analysis, to the extraction of hypotheses and correlations that would help to drive ongoing scholarly inquiry. Indeed, the literature would be embedded in a computational context that reorganized and re-evaluated the existing body of knowledge as new literature became available.

That excerpt separates nicely into two halves: the first, which describes what I think we're already doing at the site, although perhaps not to the extent that Lynch imagines, and the second, which in many ways is the prize that we've got our long-term eyes on. If you don't think we're watching projects like this and this, well, you don't know us very well. Heh.

I'm less worried about the potential objections that Cha raises at the end of his post--"Purists will undoubtedly frown upon the use of computation that cannot be replicated by humans in scholarly research"--than I am about getting to the point where such objections can be raised. In other words, I believe that such work, if it can generate compelling results, will override knee-jerk complaints. I think it's also going to be necessary, in our own field at least, to be very careful to qualify the value of this work appropriately. Not that that's always been enough, especially when it comes to quasi-statistical work, which tends to run afoul of the old "me humanities. me hate math." goofiness.

Two other points. The first is one that I'm guessing some people will not appreciate: to an extent, this work is fairly easily decoupled from the "open access" that appears to drive Lynch's piece. That is, the value of data mining is offered as a consequence of open access, and while that is true at a very large scale, I think it possible to do quite a bit in this area without it, honestly. We've been able to work around it, providing the metadata we wanted without having to open up the journal's content, even if we might have preferred it otherwise. And I think that some pretty entrenched attitudes will need to change for what Lynch describes to be more than a thought experiment. Not that they shouldn't change, but I'm not sure how far they actually need to, for this at least.

Second point is that we use a fairly small, fairly simple suite of tools to do what we're doing now. We had to cobble stuff together, and we've done so fairly successfully, but it shouldn't go unmentioned that a couple of good programmers would go a long way towards making this a lot more doable. Personally, I have enough ability to tweak, and I'm pretty good at making MT modules do what I want them to, but we spent a fair bit of time just cobbling. I'm conscious of how much more efficient our system could be.

And yeah, it's only one journal that we're working on, and all things considered, we really have to pace things more slowly than I'd like. But it's also our flagship journal, and if nothing else, we tackled the biggest job first, in designing and testing it on CCC. There's going to be some real value in what we're doing, even if it doesn't hit the scale that Lynch imagines. And we're a pretty solid model for how to accomplish these goals both on a small scale and from the bottom up.

