
The Archive CD Books Project
The Book Scanning & Digitising Process

Let's start with an average book.

This one happens to be a Kelly's Directory of 1879.

The binding is somewhat tight. Excellent in fact. Just like a new book, it doesn't lie flat when opened.

That presents us with the first problem. What we don't want to do is press it down hard and flat, or fold it back on itself to loosen the binding.

That would definitely damage any valuable old book.

A planetary camera

There are basically two ways of scanning a book:
  1. using a conventional flat bed scanner
  2. using a planetary camera

Cost-wise there is no contest. A conventional flat bed scanner costs around £60 to £100, whereas a planetary camera costs around £12,500.

300dpi-bw

However, the flat bed scanner has some severe limitations.
  • To get a decent image, the book must be placed face down on the glass, picked up, the page turned, and placed down again. That repeated handling is not good for the old and valuable book.
  • The book needs to be pressed down hard on its spine to flatten the pages. That does damage the book.
  • And despite pressing down hard, the page still "curves down" at the centre of the binding, distorting the words which are near the bound edge.
  • Even then, there is a dark shadow down the centre of the book where the light creeps in. Just like on a photocopier.
  • A normal flat bed scanner can't cope with anything larger than A4, and many books are larger than that. (Our planetary camera can scan books four times that size, right up to A2.)

Left: a 300dpi black & white scan from a conventional flat bed scanner. The book has been pressed down quite hard to flatten it (and that does cause damage), but we still see a "squashing" of the words close to the binding and, of course, that dark shadow down the centre where the light gets in.

Readable of course, but not what we would call high quality.

300dpi grey scale - flat bed scanner

Left: an improvement by scanning at 300dpi grey scale.

Much better!

The downside is that one page image can result in a file size of up to 6,000 KB instead of 300 KB, and therefore fewer pages will fit onto a CD. With some types of documents, such as those which are handwritten, there is just no option: they have to be grey-scale images to get the best possible results.
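To put rough numbers on that trade-off, here is a small sketch of the arithmetic. The 700 MB CD capacity is our assumption for illustration (it is not a figure from the scans themselves), and the page sizes are the "up to" figures quoted above, so treat the results as ballpark only.

```python
# Assumption: a 700 MB CD, counted as 700 * 1024 KB.
CD_CAPACITY_KB = 700 * 1024


def pages_per_cd(page_size_kb, capacity_kb=CD_CAPACITY_KB):
    """Return how many whole page images of the given size fit on one CD."""
    return capacity_kb // page_size_kb


grey_pages = pages_per_cd(6000)  # grey-scale page, up to 6,000 KB
bw_pages = pages_per_cd(300)     # black & white page, about 300 KB

print("grey scale pages per CD:", grey_pages)
print("black & white pages per CD:", bw_pages)
```

On those assumptions a CD holds roughly twenty times as many black & white pages as grey-scale ones.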

We still see the "book fold" and the distortion of the words close to the binding, even though the book was pressed down hard again. And we still see the shadow down the centre of the book fold.

Interestingly, we also see the "bleed through" of the print from the back of the page. (This shows as black marks on the black & white scan above). This is very common indeed with old books.
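Why does faint bleed-through turn into solid black marks on a black & white scan? A minimal sketch, assuming the black & white mode works like a simple fixed threshold (we have not confirmed how any particular scanner actually does it). Every pixel darker than the threshold becomes pure black, so a mid-grey smudge ends up just as black as the print itself. The pixel values below are made up for illustration.

```python
def to_bitonal(pixels, threshold=128):
    """Map grey values (0 = black, 255 = white) to pure black or pure white."""
    return [0 if p < threshold else 255 for p in pixels]


# Made-up grey values: print, bleed-through, aged paper, clean paper.
row = [20, 110, 180, 240]
print(to_bitonal(row))  # prints [0, 0, 255, 255]
```

In grey scale the bleed-through (110 here) stays visibly lighter than the print (20), which is part of why the grey-scale image looks so much better.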

The book really is like that, and therefore it will appear on the scanned image. Personally I don't consider it to be a big problem. I would rather see the book as it really is than attempt to make it into a "new book". As it is, it feels like a real old book, and that, to me, is preferable.

So..... we have improved the image, but in the process, we have still caused some damage to the book by pressing it down hard to make it flat. Using a conventional scanner we also have to repeatedly handle the book, picking it up, turning the pages, and then placing it down flat on its face again.



the book in its normal open position

This is one of my favourite photographs. In the middle of winter, a red admiral butterfly hatched out somewhere in the office. It was drawn to the camera by the warmth and brightness of the lamps, and settled on the book and walked around.

So I scanned it! And afterwards grabbed the digital camera and took a photograph. The butterfly was totally unharmed.


Now let's look at what the planetary camera can do.

For a start, the massive advantage is that the book sits on the desk top in its normal open position. It doesn't have to be repeatedly handled, and the only time it is touched is to turn over a page normally.

And it doesn't need to be pressed flat!

this butterfly was scanned by accident, and then it flew away unharmed

The photograph of the book - done by the planetary camera

The planetary camera is really a digital camera. What you see is a photograph of the book, which is saved as a digital image.

It is a sort of cross between a single lens reflex camera and a scanner. There is an optical quality mirror in the camera, which reflects the image onto a superb camera lens. Behind the lens, instead of a film.... is the scanner head.

But it is even more clever than that.

From the time that the "go" button is pressed:

  1. It does a pre-scan to see where on the table the book is, and automatically sets the scan area to include just the book.
  2. It autofocuses on nine different points on the book or object and remembers where those focus points are.
  3. It then scans the whole double page of the book.
  4. The next step is really clever! It recognises that the book curves into the centre and that the words are distorted, so it digitally and progressively straightens up the page so that it is flat!
  5. Then it finds the centre of the book and divides the double page into two separate page images.
  6. And finally it saves the two images to disk as two separate high resolution TIFF files.

And that whole operation (steps 1 to 6) takes about two seconds.
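To give a feel for step 5, here is a minimal sketch of one way a double-page image could be split at the book fold. This is not the camera's built-in software, which we cannot show. It simply assumes the fold is the darkest column in the middle third of the image, and the names `find_gutter` and `split_pages` are invented for the example. The image is a plain list of rows of grey values (0 = black, 255 = white).

```python
def find_gutter(image):
    """Return the x position of the darkest column in the middle third."""
    width = len(image[0])
    lo, hi = width // 3, 2 * width // 3
    if hi <= lo:
        raise ValueError("image is too narrow to contain two pages")
    # Sum each candidate column; the shadowed book fold gives the smallest sum.
    sums = [sum(row[x] for row in image) for x in range(lo, hi)]
    return lo + sums.index(min(sums))


def split_pages(image):
    """Split a double-page image into (left_page, right_page) at the fold."""
    gutter = find_gutter(image)
    left = [row[:gutter] for row in image]
    right = [row[gutter:] for row in image]
    return left, right


# Tiny made-up "scan": 4 rows, 9 columns, light paper (200) with a dark fold (50).
scan = [[200, 200, 200, 200, 50, 200, 200, 200, 200] for _ in range(4)]
left, right = split_pages(scan)
print(len(left[0]), len(right[0]))  # prints "4 5"
```

Restricting the search to the middle third is a design choice in this sketch, so that a dark illustration near the outer margin is not mistaken for the fold.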

Oh yes... and if we wish, we can have it automatically "deskew" and straighten a crooked page, or automatically rotate a page.

without book fold correction

Here we go.....

Left: a scan without the "book fold correction" turned on.

As expected, and because we didn't press the book down flat (and damage it), the words close to the book fold are distorted and compressed.

but.... and see below.....

with book fold correction

..... a second scan of exactly the same part of the same book, with the book fold correction turned on.

Magic !

The built-in software has recognised where the words are distorted, straightened them up for us, and flattened the page.

This really is the same section of the same page. It is wider because the page has been "flattened" out.
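Why should the corrected image come out wider? A rough sketch of the geometry, under our own simplifying assumption that the page surface can be described by its height at evenly spaced points across the page (the camera's real method is not documented here). Seen from above, a curved page looks narrower than it really is, and its true flattened width is the length measured along the curve.

```python
import math


def flattened_width(heights, dx=1.0):
    """Length along a page profile sampled every dx units across the page."""
    return sum(math.hypot(dx, b - a) for a, b in zip(heights, heights[1:]))


# Made-up profiles: a page lying flat, and one dipping into the book fold.
flat_page = [0.0, 0.0, 0.0, 0.0]
curved_page = [0.0, 0.0, 1.0, 3.0]

print(flattened_width(flat_page))    # prints 3.0
print(flattened_width(curved_page))  # larger than 3.0
```

Both profiles cover the same width as seen from above, but the curved one is longer once it is laid flat, which matches the wider corrected image.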

It's uncanny. But it happens.

The final touch, by adjusting the scanner settings, is to lighten up the background "grey" of the page. Old paper does scan as grey (actually it really is grey or brown).
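As an illustration of what such a setting does, here is a minimal sketch of a "white point" adjustment. It assumes the lightening works as a simple linear stretch, which is a common approach but not necessarily what any given scanner does. The name `lighten_background` and the default values are invented for the example.

```python
def lighten_background(pixels, white_point=200, black_point=0):
    """Stretch grey values so that white_point and above become pure white."""
    span = white_point - black_point
    out = []
    for p in pixels:
        # Integer arithmetic keeps the result exact and repeatable.
        value = (p - black_point) * 255 // span
        out.append(min(255, max(0, value)))
    return out


# Made-up grey values: black print, mid grey, aged paper, lighter paper.
print(lighten_background([0, 100, 200, 230]))  # prints [0, 127, 255, 255]
```

The print stays black while the grey of the old paper is pushed up to white.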

Enlarging the images to this extent is, of course, a cruel test.

We wouldn't normally expect to view a book up this close, even on screen, and of course, the print in the original book is tiny.

But in fact, on the Archive CD Books CDs it is possible to set any magnification you wish for comfortable reading on screen: anything from a full page on screen right up to a huge magnification, without any substantial loss of quality or readability. That's because the initial scanned images are of a very high quality indeed. No compromise.

a cruel magnification

And with an image quality this good, we can then use optical character recognition software to read every letter, and then make final PDF format files of the books that can be searched using Adobe Acrobat Reader.... for any word or part of a word. Not just words in a dictionary.
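The difference between searching for part of a word and looking words up in a dictionary can be shown with a small sketch. It assumes the OCR text is available as one plain string per page. The function name and the sample entries are invented for illustration and are not taken from any real directory.

```python
def find_pages(pages, fragment):
    """Return the 1-based numbers of pages whose text contains the fragment."""
    needle = fragment.lower()
    return [n for n, text in enumerate(pages, start=1) if needle in text.lower()]


# Invented sample text, one string per page.
pages = [
    "SMITH John, blacksmith, High street",
    "BAKER Wm., grocer & tea dealer, Market place",
]

print(find_pages(pages, "smith"))  # prints [1]
print(find_pages(pages, "groc"))   # prints [2]
```

Because it matches raw characters rather than whole dictionary words, a search like this also finds surnames, abbreviations and place names.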


Even higher quality bitonal scan


The story doesn't end there though.

We are always striving to make the Archive CD Books better. (And to keep one step ahead).

The next generation is already under way. Surprisingly, not through the use of higher quality grey scale images, but by vastly improved black and white images through the use of some new and clever software processing techniques. The original books were in black and white after all.

This is an example from our latest CD book.

A little part of an advertisement page in a county directory.
Cruelly enlarged from a little more than an inch wide. At normal size it appears to be grey scale, but the enlargement clearly shows that it isn't.

original

People have already asked "How do you do that ? ! ! "

.... sorry. We're not telling.



Digitise-it.com Ltd
5 Commercial Street,  Cinderford, Gloucestershire GL14 2RP England

Phone: +44 (0)1594 829879
Fax : +44 (0) 1594 827864
E-mail : sales@digitise-it.com