
Digitizing Photos and Documents
by Philip Chien
This article is based on a class the author gave at the Maitland Public Library.
A flatbed scanner is one of the most versatile accessories you can get for your computer, and one is generally available for well under $100. Scanners take photos and documents and convert them into computer-readable formats. A scanner combines some of the features of a photocopier, fax machine, and digital camera.
There are many reasons to digitize your photos. You can create collections of your family photos and index them to make them easier to search, you can easily email copies or make duplicate prints, you can put your photos into computer programs or web presentations, and that’s just the beginning. While most folk are using digital cameras for all of those reasons, a scanner makes it possible to convert your old film photos into digital pictures.
Another important use for scanners that many folk are unaware of is the ability to scan documents. You can take almost anything on a sheet of paper and scan it into your computer: contracts, forms, articles from magazines, and other documents. If it’s a printed document there are “Optical Character Recognition” (OCR) programs which will “read” the document and save it as a text file.
Hobbyists have found scanners useful for cataloging reasonably flat objects. You can “photograph” coin and stamp collections, leaves, and similar objects with extremely high quality images that really show the fine details.
Most flatbed scanners have similar basic features. There’s a hinged cover to prevent stray light from entering and a glass surface where you place your photo or document to be scanned. It’s similar to a photocopier, other than the lack of a printer. The scanning mechanism is similar to a fax machine, except it’s in color and instead of transmitting your document over the phone lines the scanner transmits the image to your computer.
Current scanners are color (early ones were only black and white). Some scanners have transparency adapters to permit you to scan in 35 mm. slides. Many scanners have ‘one-touch’ programmable buttons that automate some of the functions. Some scanners are all-in-one units with a built-in printer and possibly slots that accept your digital camera’s memory cards.
Features on high-end units include large scan areas (11x17 inches for oversize documents), an upside-down “V” shaped surface so books can be scanned to their edges, automatic document feeders (useful if you need to scan multiple pages) and duplex (double-sided) document mode (very expensive, but useful if you’re scanning a large number of double-sided documents).
Besides flatbed scanners there are specialized scanners: small units designed only to scan business cards, dedicated units for 35 mm. slides and strips of negatives, portable scanners, handheld scanners, and high-end drum scanners for ultra-high quality photographic studio work. There are also “feeder” scanners where you pass a sheet through a slot. Those scanners have similar capabilities to flatbed scanners, but are less versatile because they’re limited to single sheets which fit inside the slot.
While technically not scanners there are other ways of digitizing real-world images including digital cameras (digital photos directly without film), webcams (small video cameras connected directly to your computer), and video digitizers (capture still frames from a video signal).
Almost every current scanner plugs into your computer’s USB port. Older models may use older interface standards like SCSI, serial, parallel, or need a proprietary interface card.
The feature you might expect to matter most on a scanner, the quality of the scan, is actually unimportant for most uses. Even the simplest, lowest-priced scanner has at least 300 dots per inch resolution and 24 bits of color depth. This is perfectly adequate for most consumer applications. Most scanners are capable of higher resolution (1200 or 2400 dots per inch is common) and that is useful for ultra-high quality scans if necessary.
Scanners come with a CD which includes the driver (software which interacts with your computer’s operating system) and almost always contains some image editing program, typically a “light” version of Adobe Photoshop Elements. Many scanners also come with an Optical Character Recognition program. If you’ve lost the CD or obtained the scanner used you can usually download the drivers from the manufacturer’s website. It’s important to note that many companies, Lexmark in particular, have not written drivers for new operating systems, like Windows Vista, for their older scanners.
The most common standards for scanner drivers are TWAIN (it isn’t an acronym even though it’s in caps) and ISIS (Image and Scanner Interface Specification). A software programmer writing a graphics software package or another program which would benefit from the use of a scanner doesn’t need to know anything about a particular scanner model as long as the program uses the TWAIN or ISIS interface. Likewise the scanner manufacturer doesn’t need to know anything about any specific program if its driver is TWAIN or ISIS compatible. Many scanners also come with a specific driver for Adobe Photoshop because it’s so popular.
Photoshop is THE definitive program for manipulating images. It’s extremely versatile and has many excellent image editing features which you can use to improve images once they’ve been scanned into your computer. In addition Photoshop is expandable through plug-ins, additional software written by other companies. Many graphic programs are designed so they can also use Photoshop plug-ins.
An excellent alternative to Photoshop is GIMP (GNU Image Manipulation Program). Its major advantage is it’s free. It’s a decent program and has many satisfied users who swear by it.
Photoshop and GIMP are both extremely sophisticated programs and it takes a while to learn how to use them properly. There are simpler programs available (both free and pay) which are easier to learn, but have fewer features.
Optical Character Recognition (OCR) is one of the most amazing advances in computer technology. It’s like teaching a dog to sing – the fact that the dog can’t sing well isn’t the point; it’s that it’s even possible. An OCR program is a form of artificial intelligence. It takes the image of words on a page and converts it into text. If you scan a document into your computer and save it, it’s just a picture even though it looks like text and can be printed. But once an OCR program converts the picture into text, the text can be manipulated (edited, typos corrected, reset in a different format, searched for keywords, etc.). An added bonus is that a text file takes up far less storage space than the image of a page of text. But OCR programs are only so good – it’s difficult to tell the difference between the number 1 and a lower case “l”, for example. OCR programs work best with normal readable fonts and have difficulties recognizing esoteric novelty fonts. Fortunately most documents you’d want to OCR are ones which tend to OCR well. In ALL cases it’s your responsibility to have a human inspect the text output and read it for errors (a spell checker can help but is not a substitute for a human brain).
How to use your scanner
You have to make two decisions each time you use your scanner – one simple and one more complicated. The simple question is whether to scan in black and white (actually different levels of gray from white to black) or color. As a rule you should use black and white for printouts and color for pictures. With black and white pictures if you scan them in color you can make the decision later to convert them into black and white.
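If you’re curious what converting a color scan to gray levels involves, here is a minimal sketch. It assumes 8-bit red, green, and blue values and uses the common ITU-R BT.601 luminance weights; your image editing program may use different weights, and the function names are made up for this illustration.

```python
def rgb_to_gray(r: int, g: int, b: int) -> int:
    """Turn one 8-bit RGB pixel into a single 8-bit gray level.

    The weights are the ITU-R BT.601 luminance weights (an assumption;
    image editors may weight the channels differently).
    """
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def to_grayscale(pixels):
    """Convert rows of (r, g, b) tuples into rows of gray levels."""
    return [[rgb_to_gray(*px) for px in row] for row in pixels]


# A tiny 1 x 3 "image": pure red, pure green, pure blue.
print(to_grayscale([[(255, 0, 0), (0, 255, 0), (0, 0, 255)]]))  # [[76, 150, 29]]
```

Green counts the most and blue the least because that roughly matches how bright each color looks to the eye.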
The more complicated decision is the resolution – how fine do you want to scan your image? This depends primarily on the size of the original image, whether or not you want to crop the image down to concentrate on the primary subject in the picture and crop out unwanted stuff, and ultimately how you want to use the final image.
Let’s look at each of these factors.
If you start with a 4x6 print you’re going to have to scan it at a higher resolution than an 8x10 photo if you want the same size resulting image. It’s a simple math problem: multiply the scanner’s resolution by the picture’s dimensions. A 4x6 print scanned at 300 dots per inch is going to result in a 1200 x 1800 pixel image. If you want to scan an 8x10 photo with roughly the same size digital image it needs to be scanned at about 180 dots per inch (1800 pixels divided by 10 inches; the 8 inch side then comes out at 1440 pixels because the two prints have different proportions). It sounds complicated but you can do it by instinct once you’ve scanned a couple of photos, just remembering the typical photos you scan and how you want them to look.
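The arithmetic above can be sketched in a few lines of Python. The function names are invented for this example; the numbers are the ones from the paragraph.

```python
def scanned_pixels(width_in, height_in, dpi):
    """Pixel dimensions you get when scanning a print at a given resolution."""
    return round(width_in * dpi), round(height_in * dpi)


def dpi_for_target(length_in, target_px):
    """Resolution needed so that one side of the print comes out target_px long."""
    return target_px / length_in


print(scanned_pixels(4, 6, 300))   # (1200, 1800)
print(dpi_for_target(10, 1800))    # 180.0
print(scanned_pixels(8, 10, 180))  # (1440, 1800)
```

The last line shows why the 8x10 result is only roughly the same size: the long side matches at 1800 pixels, but the short side is 1440 rather than 1200.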
On the other hand let’s say you only want to use the left half of a 4x6 photo, converting it from a landscape (wider base than height) into a portrait (longer height than base), and still end up with the same 1200 x 1800 result. (This is a drastic case for cropping; in most cases you’re going to want something in the middle and crop out the stuff along the edges.) For this specific case you want a region of the photo 3 inches wide by 4 inches tall, so dividing 1800 pixels (the final vertical dimension) by 4 inches equals 450 dots per inch. You also divide 1200 pixels by 3 inches, which equals 400 dots per inch. In this case you’d scan the photo at 450 dots per inch, the higher of the two, and do the final crop and adjust within the image editing program.
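Here is the cropping case worked as a small Python sketch. The function name is hypothetical; the idea is simply to work out the resolution each side needs and take the larger one, so neither side ends up short of pixels.

```python
def dpi_for_crop(crop_w_in, crop_h_in, target_w_px, target_h_px):
    """Scan resolution needed so a cropped region fills the target pixel size."""
    dpi_for_width = target_w_px / crop_w_in
    dpi_for_height = target_h_px / crop_h_in
    # Use the higher value; the other side will have pixels to spare.
    return max(dpi_for_width, dpi_for_height)


# Left half of a 4x6 landscape print: 3 inches wide, 4 inches tall.
dpi = dpi_for_crop(3, 4, 1200, 1800)
print(dpi)                   # 450.0
print(3 * dpi, 4 * dpi)      # 1350.0 1800.0
```

At 450 dots per inch the region comes out 1350 pixels wide, so you trim 150 pixels of width in the image editing program to reach 1200 x 1800.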
The most important factor when determining resolution is how you’re going to use the photo. You should also take into account how the photo may be used in the future, and how much power your computer has. The latter is an important factor if you’re using an older (slower) computer without a lot of memory. Less powerful computers will take a LONG time to scan and process large high-resolution images.
If the photo is just going to be used as a thumbnail on a webpage a 200 pixel wide image may be enough. But if you want more detail you may want a larger image, roughly 800x600 or 1024x768. These are de facto image sizes because earlier computer monitors had these resolutions. Monitors are now much larger, and widths of 1300 or even 2000 pixels are common.
If you want to print out the photo you’ll want to match the resolution of your printer. (For example, if you want to print an 8x10 photo on a 300 dpi printer the image on your computer should be 2400 x 3000 pixels.)
Some folk scan everything at the highest resolution possible and then edit the image down to the appropriate size with an image editing program. While this method works, the large images take longer to scan, take up more space in your computer’s memory, and take up more space on your hard drive if you save them at their full resolution. If you have a super-powerful computer with lots of storage space this isn’t as much of a concern.
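To get a feel for how quickly the size grows, here is a rough sketch that estimates the uncompressed size of a scan. It assumes 24 bits (3 bytes) per pixel and no compression; files saved in a compressed format such as JPEG will be much smaller, so treat the numbers as the size in memory while editing. The function name is made up for this example.

```python
def uncompressed_bytes(width_in, height_in, dpi, bits_per_pixel=24):
    """Estimate the raw size of a scan, before any file compression."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


# A 4x6 print at three common scanner resolutions.
for dpi in (300, 1200, 2400):
    size = uncompressed_bytes(4, 6, dpi)
    print(f"{dpi:>4} dpi: {size / 1_000_000:,.1f} MB")
```

A 4x6 print works out to about 6.5 MB at 300 dots per inch and about 415 MB at 2400 dots per inch. Doubling the resolution quadruples the size, because both the width and the height double.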
Another factor for scanning images is the source. True photographs and actual objects are continuous images. But printed images, especially newspapers, are optical illusions. Tiny dot patterns are used to simulate shades of color that the human eye blurs together into an image. It’s especially obvious in cheap comic books and newspapers where you can see the dot patterns without a magnifying glass. If you scan these photos directly into your computer you’ll see the dot patterns. Most scanner drivers include a “descreen” function that scans the image at a higher resolution and then blurs the image to remove the dot pattern effect. You have to be careful with the descreen function to get the best results.
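The basic idea behind descreening can be illustrated with a toy example. This is only a sketch of the “scan fine, then blur” principle using simple block averaging on gray levels; real scanner drivers use more sophisticated filters, and the function name here is hypothetical.

```python
def descreen(gray, factor=2):
    """Average each factor x factor block of gray levels into one output pixel.

    This blurs away a fine dot pattern and shrinks the image by `factor`.
    Rows and columns that don't fill a complete block are dropped.
    """
    height = len(gray) - len(gray) % factor
    width = len(gray[0]) - len(gray[0]) % factor
    result = []
    for y in range(0, height, factor):
        row = []
        for x in range(0, width, factor):
            block = [gray[y + dy][x + dx]
                     for dy in range(factor)
                     for dx in range(factor)]
            row.append(round(sum(block) / len(block)))
        result.append(row)
    return result


# A 4 x 4 checkerboard of white (255) and black (0) dots,
# the way a printer might simulate 50% gray.
halftone = [[255 if (x + y) % 2 == 0 else 0 for x in range(4)]
            for y in range(4)]
print(descreen(halftone))  # [[128, 128], [128, 128]]
```

The alternating dots disappear and you’re left with an even mid-gray, which is what your eye sees when it looks at the printed page from a distance.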
Besides scanning photos, documents, and objects, scanners can be useful for other functions. Some scanners come bundled with fax software. The software scans a document and uses your computer’s modem to send the image over a phone line to a standard fax machine. You can use your scanner as a convenient “photocopier” – just scan in a page and then print it on your printer. Many scanners have a button which automates this process. It’s a quick way to make a copy, but the cost per page is relatively high with most all-in-one printer-scanners. With some all-in-one units the copy button runs a program on your computer that performs the scan and print tasks, so the computer has to be on whenever you want to make a copy.
Some talented folk have pushed scanners way beyond their official uses and written software or modified the scanner hardware for some pretty amazing tasks. There are programs which will scan and decode barcodes, scanners which have been modified into panoramic cameras, scanners with robotic page turners to scan in books, and even one enthusiast who scanned in a long playing vinyl record and wrote a program to take the image of the record’s grooves and convert them back into music!
Links
An excellent tutorial for how to use your scanner.
Purchase flatbed scanners from Amazon.com.
Purchase Photoshop CS4 from Amazon.com.
Free Photoshop plug-ins.
Download GIMP, an excellent free image editor.
Download Simple OCR, a free Optical Character Recognition program.
A homemade DIY book scanner project.
About the author
Philip Chien has been using computers since the 1970s.
© 2009 neatinformation.com. All Rights Reserved.