Posted by Dr. Margaret Zeegers on 2009/05/19
Folk may be interested in a massive public project being undertaken by the National Library of Australia. It has initiated its Australian Newspaper Digitisation Project, one of the biggest such undertakings anywhere in the world, which will make 40 MILLION historic newspaper articles available by the end of 2011. Anybody who has ever sat through hours of reading microfilm for their research will understand just what this means for researchers. Microfilm and all that it entails will be a thing of the past, as all those old articles will soon be searchable by computer.
But getting all that material onto computers is not easy, for every single one of those 40 million items needs to be manipulated into a form that can be loaded for computer access using optical character recognition (OCR) software. It is not just a matter of scanning the old microfilmed material in; each item needs to be massaged into a form that the software can deal with. Each scanned page needs to be cleaned up, with all the old ink flecks and various other marks removed, and the copy made perfectly straight on the page, so that OCR can do its thing with the material it is fed. This is a massive undertaking.
But that's not the news. The person-hours needed for 40 million items, and the cost involved, are staggering, much more than the library can manage. What is the news is one most remarkable feature of the project: a community of about 3,000 volunteers has come on board to deal with those articles, spending up to 50 hours per week at their computers working on each item as it presents, preparing it, doing subject tagging, and in some cases even annotating the items before them. One such volunteer has already corrected more than 160,000 lines of text for the project. It is an amazing human response, a response from a community of scholars and readers to a technology problem.
The combination of their efforts with the possibilities that the technology suggests is one of those uniquely 21st-century phenomena that will itself become part of the historical fabric of this project.