I've posted a couple of times about the National Library of Australia's newspaper digitisation project and now Gideon Haigh in "The Age" has published a piece about the same project. His essay delves deeper into the differences between it and the Queensland-based Austlit national bibliographic enterprise, and introduces us to some of the top text correctors.
As I've mentioned previously, the digitisation process aims to provide digital text versions of old Australian newspapers via OCR (Optical Character Recognition). The problem is that the standard OCR process produces a large number of mistakes, so the National Library decided to go down the "open-source route", championed so well by Wikipedia, and allow users to fix the text online.
In the international library community, digitisation of old newspapers so that they can be keyword searched is a supercool area. But the NLA's system would be taking this to Brangelina-like coolness by, politely and sotto voce, soliciting members of the public to participate in ironing out the wrinkles in the digital text. What happened next is pretty damn amazing. "What's that movie?" says Cathy Pilgrim, the program's manager. "Field of Dreams? Well, we built it and they did come."
So they - the Australian public - did. Once word began spreading - among genealogists, amateur historians and online library users - the trickle became a flood. Right now, a community of about 3000 far-flung souls are turning on their computers for up to 50 hours a week and tidying text prepared by optical character recognition (OCR) software, as well as subject tagging and even annotating it.
The most prolific of these contributors is Julie Hempenstall who, as of today, has corrected 185,952 lines. I reckon I've done a fair bit and I've only corrected some 3,293 lines. It's a vast difference, and I have no idea how she's done it.