“The first five guys to sign up were two mechanical engineers, two software developers, and an intellectual property lawyer.”
—Dan Reetz, speaking about the diybookscanner.org forum at the New York Law School D is for Digitize conference, October 9, 2009
I personally scan the books I bought because I’m tired of thousands of them cluttering up my house, ending up lost at the bottom of a box marked Dishes, getting eaten by bugs, or attacked by acid inherent in the paper. If you don’t like the idea of reading electronic editions of books, stop reading here! Also, if you think format-shifting is intellectual property theft, stop reading here. There are as many reasons people have for scanning books as there are people:
- Saving family history documents
- Format-shifting for the print-disabled
- Archiving rare books
- Annoyed at the space a huge collection of physical books occupies
- Appalled at the prices for college textbooks
- Increasing access to out-of-copyright works from libraries
- A cheaper, more book-friendly method of scanning
- A mobile alternative to hundreds of pounds of reference books
I didn’t make these up. Each of these represents a real person on the DIY Book Scanner forum. I won’t pull any punches: some uses of this technology are copyright infringement. Some are not. This is technology easily built by anyone with a few hundred U.S. dollars (mostly for the two 8+ megapixel cameras), and there is no reasonable way to stop it. Those against format-shifting should have a long, hard think about what that means. Those who believe the technology can be embargoed should probably stop reading this blog.
Dan Reetz started it all by posting his initial design on instructables.com, winning a competition for a laser etcher in the process. After pages and pages of comments and refinements, he decided to set up a website just for the design, and invited everyone to come on in. It’s been going strong ever since.
“If Dan Reetz didn’t exist, it would be necessary for Cory Doctorow to invent him.”
—Author Robin Sloan, The Future of the Book: Bringing Book Scanning Home, October 12, 2009
Having built a book scanner and digitized a few books on it at about 10 pages a minute (I’m slow and careful; others regularly attain higher speeds), I’ve found that the images have some distortion because the pages are not pressed completely flat. Lens geometry is a minor additional cause of distortion. So how can the images be postprocessed to end up with nice undistorted pages?
Several ideas that have been kicked around the DIY Book Scanner forum are:
- calibration images (see here and here)
- image analysis and modeling for a mathematical distortion inverse transformation (see here)
- stereoscopic imaging for direct measurement of distortion (briefly mentioned here)
- manual page straightening (if all else fails)
After fiddling around briefly with the first, and spending a lot of time on the second, my current objective is to work with the third idea, stereoscopic images. I believe this holds the most promise of accurately determining page distortion without relying on the content of the page. So my next step is to convert my two-camera scanner into a four-camera model, after first experimenting with placing the existing two cameras on the same side to see if stereoscopic imaging is practical without a lot of fiddling.
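To make the stereoscopic idea a bit more concrete, here is a minimal sketch of the textbook depth-from-disparity relation. It assumes an idealized, rectified pair of identical pinhole cameras mounted side by side, which a real scanner rig will only approximate. The function name and all the numbers are hypothetical, chosen only for illustration.

```python
def depth_from_disparity(focal_px, baseline_mm, disparity_px):
    """Distance (mm) from the camera pair to a point on the page.

    Assumes two rectified, parallel pinhole cameras:
    depth = focal length (pixels) * baseline (mm) / disparity (pixels).
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_mm / disparity_px


# Hypothetical rig: 3000 px focal length, cameras 100 mm apart.
FOCAL_PX = 3000.0
BASELINE_MM = 100.0

# A point on the flat part of the page and a point down in the gutter.
# The gutter is farther away, so it shifts less between the two images.
flat_depth = depth_from_disparity(FOCAL_PX, BASELINE_MM, 600.0)
gutter_depth = depth_from_disparity(FOCAL_PX, BASELINE_MM, 588.0)

print(f"flat: {flat_depth:.1f} mm, gutter: {gutter_depth:.1f} mm")
```

With these made-up numbers, a 12-pixel difference in disparity corresponds to roughly a centimeter of depth, which suggests why the approach might measure page curvature without looking at the page content at all. Finding matching points between the two images is the hard part and is not shown here.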
Tags: books, bookscanning, singularity

I built a book reader from a scanner (with OCR software built in) and text-to-speech software. Scanner: Microtek S400. Text-to-speech software: Ghostreader for the Mac.
I use it for text to speech, obviously: to read back complicated Buddhist logic and other texts (in monasteries, monks study and listen to this material constantly, whereas lay people don’t have that advantage… until now, with iPods and machines like the book reader), and to learn material that lends itself to repetition without having to read it. It also works quite well with literature and poetry.
Keep in mind that great writers like Milton had book readers to read to them when they got too old to read. In other words, this is quite a luxury, besides the learning aspect.
Some possible book scanning “gutter math” appears at http://www.tinaja.com/glib/gutter01.pdf
Ah… I had been looking for a good transform. A 2nd-order polynomial in x seemed to work fine for mild curviness, and higher order polynomials tended to be poorer fits. I had not hit upon your equation!
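For anyone who wants to try the 2nd-order polynomial approach, here is a minimal sketch in plain Python. It is not the equation from the linked PDF, just an ordinary least-squares fit of y = a + b*x + c*x^2 to points sampled along one curved text line, followed by subtracting the fitted bend. The function names and sample points are hypothetical, and x is assumed to be normalized to roughly -1..1 across the page to keep the arithmetic well behaved.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = a + b*x + c*x**2; returns (a, b, c)."""
    # Normal equations for the basis 1, x, x^2, as an augmented 3x4 matrix.
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    m = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("need at least three distinct x values")
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))


def straighten(xs, ys, coeffs):
    """Remove the fitted tilt and curvature, leaving the line at height a."""
    _, b, c = coeffs
    return [y - (b * x + c * x * x) for x, y in zip(xs, ys)]


# Hypothetical points along one text baseline (x normalized to -1..1).
xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
ys = [4.5, 2.5, 2.0, 3.0, 5.5]

coeffs = fit_quadratic(xs, ys)
flat = straighten(xs, ys, coeffs)
print(coeffs)
print(flat)
```

The sample points were generated from y = 2 + 0.5x + 3x^2, so the fit should recover those coefficients and the straightened line should sit at a constant height of about 2. On a real scan the points would come from detected text lines or page edges, and each row of the image would be shifted by the fitted amount.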
I designed my camera set-up in 2007. While it has been effective, I believe some of your designs would be more efficient for what we are doing. We went to Southampton County, VA in early 2009 and digitized 56,000 pages of old court documents in 3 weeks. We have distributed the images all over the US to volunteers for indexing. We have already indexed 630,000 names. This is the first county in US history to have this done, but we are hoping others will see the advantage of this and follow suit, giving genealogists and historians further research capability.
Would love to talk to someone there and get some advice.
Ken Brantley
President: The Brantley Association of America.
I have been doing book scanning/rescue for 10 years, using Canon DR-5080C scanners (I have some mods to improve the feed of thin pages and overall feed reliability) and a hydraulic paper cutter that can debind a phone book cleanly. I’m rescuing a full library of semiconductor databooks from the 70s on, including early computing, S-100, etc.
An automated system is a dream I am still chasing. I am encouraged by these efforts and no longer feel like I’m the only one doing it!
Holler if you use DR-5080C scanners and I’ll fix you up with a mod that will make things work MUCH better.
My book scanning is done without funding or pay of any kind. Has anyone figured out how to get help with the scanning process from the local community? I have 150 milk crates that need to be scanned….
drd
If you’re willing to slice a book, then definitely the best way is to just use a sheet-fed scanner. It will probably go a little slower than a camera-based scanner, but you will end up with perfectly aligned scans in the end. Of course, if you’re not willing to slice a book…
Anyway, go ahead and join the diybookscanner community, and ask away! It’s not just for the DIY Book Scanner, but for anyone scanning books.
I second Rob — please come join us at DIYbookscanner.org and contribute what you know. We have lots of helpful, smart people working on both the hardware and software problems facing everyone interested in scanning. You’d be most welcome.
Hello. According to the doctor who examined me recently, I have perfect vision. But since I was involved in an explosion incident in Iraq, neither my hearing nor my vision has seemed right to me. I used to read books, but not anymore: every time I try to read a book, I get a headache and my eyes cannot focus. So I had to start scanning my school books and converting them to audio to listen to, but scanning and editing the books and then converting them to audio is time-consuming. I came across this forum while looking for an auto-scanner. I am not sure if anyone can suggest a scanner, but I am really looking to buy or build my own book scanner that I can easily connect to a computer. My current one is the “HP7200 all in one” wireless, and it works great, especially when scanning a book as a PDF file, but it takes a lot of time to scan a whole book.
If you have any suggestions, please e-mail me at servicenazret@gmail.com. I can only buy a product in the USA, not Canada, the UK, or China.
Thanks.
You might try the Kurzweil Reader, built by my personal hero, Dr. Ray Kurzweil. It is a print-to-speech device which is built specifically for those with difficulty seeing or reading print. I highly recommend it.
So what is the end goal of all this scanning? Do you upload them to the internet or do you create your own private digital libraries?
I am a new postgraduate student and I dread the day when I will have to pay 14k pounds for the program I am enrolled in….
Different people do different things. The one scanning historical documents puts them on the web. I scan my collection of books and save them privately. The end result, if you don’t OCR, can be a PDF, with each page being an image. Personally, I’ve never met an OCR program that I like.