The February 2008 issue of PC Pro reports on the British Library’s plan to digitize 100,000 books published in the nineteenth century – 25,000,000 pages.
The digitizing partner chosen is Microsoft, with the actual work being done by a German firm, Content Conversion Specialists (CCS); the library ‘retains the rights to all the data being collected’ but Microsoft has the right to host the collection on its Live Search Books site, for a duration not revealed by the library. A team of five people scans 50,000 pages a day, a rate intended to complete the project in two years. Books smaller than 28 x 35.5 cm can be scanned automatically; the 20-30% that are larger must be scanned manually. All books are visually checked for loose or torn pages, then placed under a lectern with two Canon 16.6 megapixel lenses; the operator turns the first few pages, then the machine uses suction to turn the remainder, at one page every two or three seconds. The operator at the station sees all the pages as thumbnails on a PC, so that errors can be fixed. Fold-outs that can’t be scanned by the machine are around 1% of the total; they’re scanned separately and integrated later by software. The project has a 12-CPU blade server with 40TB of storage.
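The throughput figures quoted above can be cross-checked with some simple arithmetic. The sketch below is only a back-of-envelope check: the 250 working days per year is my assumption, not a figure from the article, and the machine-time estimate ignores the books that have to be scanned manually.

```python
# Figures quoted in the article
TOTAL_PAGES = 25_000_000
TOTAL_BOOKS = 100_000
PAGES_PER_DAY = 50_000

# Assumption (not from the article): about 250 working days in a year
WORKING_DAYS_PER_YEAR = 250


def working_days(total_pages, pages_per_day):
    """Days of scanning needed at a constant daily rate."""
    return total_pages / pages_per_day


def machine_hours(pages, seconds_per_page):
    """Hours of automatic page-turning needed for a number of pages."""
    return pages * seconds_per_page / 3600


days = working_days(TOTAL_PAGES, PAGES_PER_DAY)
pages_per_book = TOTAL_PAGES / TOTAL_BOOKS

print(f"Working days needed: {days:.0f}")
print(f"Years at {WORKING_DAYS_PER_YEAR} days/year: {days / WORKING_DAYS_PER_YEAR:.1f}")
print(f"Average pages per book: {pages_per_book:.0f}")

# At one page every 2-3 seconds, a day's quota needs more machine time
# than a single station could supply in one day.
for seconds in (2, 3):
    hours = machine_hours(PAGES_PER_DAY, seconds)
    print(f"Machine time per day at {seconds}s/page: {hours:.1f} hours")
```

On these assumptions the two-year target is consistent with the daily rate, and the machine-time figure suggests that several scanning stations must be running in parallel, although the article doesn't say how many.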
Resolution is 300dpi for both text and images, which the library says is ideal for reading online but also suitable for print on demand if required in the future. Output formats are JPEG 2000, PDF and plain text; OCR is used to capture the plain text, which is ‘specially processed’ to deal with antique orthography and typography. A secondary check takes place in Romania, and the library batch-samples the files delivered by CCS according to ISO 2859-1.
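As a rough sanity check on the 300dpi figure, the pixel dimensions of the largest automatically scanned page (28 x 35.5 cm) can be compared with the 16.6 megapixel capture quoted above. This is a sketch only: it assumes one capture per page with the page filling the frame, which the article doesn't confirm.

```python
CM_PER_INCH = 2.54


def pixels_at_dpi(width_cm, height_cm, dpi):
    """Pixel dimensions needed to capture a page at the given resolution."""
    width_px = round(width_cm / CM_PER_INCH * dpi)
    height_px = round(height_cm / CM_PER_INCH * dpi)
    return width_px, height_px


# Largest page size handled by the automatic scanner, at 300dpi
width_px, height_px = pixels_at_dpi(28, 35.5, 300)
megapixels = width_px * height_px / 1_000_000

print(f"{width_px} x {height_px} pixels = {megapixels:.1f} megapixels")
```

The result comes in under 16.6 megapixels, so the quoted resolution and the size limit for automatic scanning are at least consistent with each other.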
Scanning takes place underground with no natural daylight, to ensure colour consistency, and the scanning room is air-conditioned: ‘Just one degree in temperature changes the light tuning and requires colour adjustments.’
To deal with copyright issues the library is using ‘a database of authors’; works still in copyright (less than 1%) won’t be digitized, while orphan works (about 40%) will be, subject to a ‘notice and takedown’ procedure on the website.
Note: the article uses ‘scan’ throughout but it’s clear from the diagram that a static photograph of each page is used.


Comments
Yes, I believe Google have been using book scanners which read the distance of the pages, including the curve of the pages, using infrared 3D scanners, so that when scanned they appear as flat images with little or none of the black depth marks that often come with book scanning.
We usually carry out scanning both ways, but the fastest way is always to slice the book and feed-scan the pages, if you are able to.
http://www.pearl-repro.co.uk
http://www.4document-scanning.co.uk
http://www.forms-data-capture.co.uk
http://www.microfiche-microfilm-scanning.co.uk
Yes, speaking of library digitising, I believe Google have been taking the biggest step forward in this area. Their speed is down to using book scanners which read the distance of the pages, including the curve of the pages, using infrared 3D scanners, so that when scanned they appear as flat images with little or none of the black depth marks that often come with book scanning. We usually carry out scanning both ways, but the fastest way is always to slice the book and feed-scan the pages, if you are able to.
http://www.pearl-repro.co.uk
http://www.microfiche-microfilm-scanning.co.uk