In this post I’ll show you two ways in which you can automatically split a (collection of) scanned pages, each containing several photos, into individual image files. My experience is that for this GIMP works better than Photoshop, and as an added bonus: it’s free!
[2013-05-16 Update: the GIMP script can now handle TIF files as well]
Like many of you, I have old photo albums at home: albums with family photographs, glued to paperboard pages. And you probably want yours in digital format too, e.g. to share with family members, to protect them from degradation and loss, or just for your digital library.
In my case (and probably in yours), the negatives of many of these photographs are unavailable. So the only option is scanning the prints themselves.
Removing the photos from the album pages before scanning them is time-consuming, and may even harm them. That leaves two options: photographing each photo individually, or scanning/photographing whole pages. Of these the latter is the fastest and easiest, since taking macro-photographs of single photos is tricky in terms of lighting, focus, and framing.
But here you run into another problem: you end up with tens to hundreds of scanned pages, each containing multiple distinct photographs. Unless you have the hand-eye coordination of a surgeon, most images will also be slightly skewed. And even if the page is straight, individual photos may be skewed relative to the page since they were glued that way.
You could manually rotate, crop and save each photo, but this takes a loooot of time. So isn’t there an easier way? Of course there is! You’re not the first person who wants to do this. I wasn’t either – but it took me a bit of googling to find a good solution, and now I want to save you the trouble and share it with you.
Solution 1: Adobe Photoshop (easy, costs money, regularly screws up)
Photoshop is the first place where you would expect a solution to exist, since it is the veritable industry standard for photo editing and graphic design.
In recent versions of Photoshop, you’ll find the following item in the “File” menu:
I used Photoshop CS5. The same should apply to other recent versions. The function shown above works on a single scanned page. To process a whole folder of scanned pages, you’d want to use the following option (also in the “File” menu) instead:
This takes you to a directory browser dialog. Press “ok”, and you’ll see Photoshop run through each scanned image, cropping, rotating, and saving each individual photo into an automatically created subfolder with the name “Edited”.
Generally Photoshop worked well, but also had several bloopers like the one above. You can see how it failed to split the three photos, while simultaneously cropping part of the adjacent photos into the output. Not good.
Solution 2: GIMP (easy, free, customizable, sometimes screws up)
GIMP is a free alternative to Photoshop.
The basic interface and functionality of GIMP are similar to Adobe Photoshop, although its multiple-window interface is a bit unusual and quirky compared to other mainstream Windows software.
Standard GIMP doesn’t have a batch-cropping option, but you can install a plugin to automate this task. I’ll now show you how (fear not: it is quite easy).
For automatically splitting scanned photographs I came across a plugin called “Divide Scanned Images”, originally written by Rob Antonishen and posted here. I’ve patched the original version to include improvements suggested by readers in that forum. You can download my version from the step-by-step guide below. And the good news is that, in addition to being free, in my experience it works better than Adobe Photoshop for detecting and dividing scanned photographs! To make it run, you’ll have to:
- Download and install the latest version of GIMP (click here).
- Download deskew.exe (click here) to GIMP’s plugin directory.
On my computer this is C:\Program Files\GIMP 2\lib\gimp\2.0\plug-ins
- Download DivideScannedImages.zip (click here). Unzip, and copy DivideScannedImages.scm to the GIMP scripts folder. On my computer this is
C:\Program Files\GIMP 2\share\gimp\2.0\scripts
- Restart GIMP. You should now see the “Batch Divide Scanned Images…” option as a sub-menu under “Filters -> Batch Tools”. Click on it.
- Unlike Adobe Photoshop, this plugin gives you some choice on how you want it to behave. Many of these settings should be self-explanatory. The important thing is that your scanned images have a consistent region that represents the “background color”. Typically this would be the corners of your scanned image. For me, the following settings worked well:
Only two values were changed from their defaults. First, I set the “Selection Threshold” to 25. This controls how sensitively the background color is separated from the foreground photos. Second, I changed the “Abort Limit”, which specifies the maximum number of photos that can be detected on a single page. In our case no page contained more than 10 photos, so I set it to 10. Feel free to experiment with these settings.
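To give a feel for what these two settings do, here is a minimal Python sketch of the general idea. It is not the plugin's actual code (the plugin is a GIMP Script-Fu script); it is a simplified stand-in that works on a small grid of grayscale values. The function name `find_photo_boxes` and its parameters are my own, chosen to mirror the “Selection Threshold” and “Abort Limit” settings, and I am assuming the background is sampled at the top-left corner.

```python
from collections import deque


def find_photo_boxes(pixels, threshold=25, abort_limit=10, corner=(0, 0)):
    """Return bounding boxes (left, top, right, bottom) of non-background regions.

    pixels is a list of rows of grayscale values (0-255). Hypothetical helper,
    not part of GIMP or of the Divide Scanned Images plugin.
    """
    height, width = len(pixels), len(pixels[0])
    background = pixels[corner[1]][corner[0]]  # assumed: background sampled at a corner
    seen = [[False] * width for _ in range(height)]
    boxes = []

    def is_foreground(x, y):
        # A pixel belongs to a photo if it differs enough from the background.
        return abs(pixels[y][x] - background) > threshold

    for y in range(height):
        for x in range(width):
            if seen[y][x] or not is_foreground(x, y):
                continue
            if len(boxes) >= abort_limit:
                return boxes  # stop once the maximum number of photos is reached
            # Flood fill the connected region and track its bounding box.
            left = right = x
            top = bottom = y
            queue = deque([(x, y)])
            seen[y][x] = True
            while queue:
                cx, cy = queue.popleft()
                left, right = min(left, cx), max(right, cx)
                top, bottom = min(top, cy), max(bottom, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and not seen[ny][nx] and is_foreground(nx, ny)):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((left, top, right, bottom))
    return boxes


# A tiny "scanned page": light background (about 250) with two darker photos.
page = [
    [250, 250, 250, 250, 250, 250, 250],
    [250,  40,  60, 250, 250, 250, 250],
    [250,  50,  70, 250, 120, 130, 250],
    [250, 250, 250, 250, 110, 140, 250],
    [250, 250, 250, 250, 250, 250, 250],
]
boxes = find_photo_boxes(page)
print(boxes)  # [(1, 1, 2, 2), (4, 2, 5, 3)]
```

In this toy version, a higher threshold makes more pixels count as background, and two photos that touch would be flood-filled as one region and come out as a single box.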
The “load from” directory should point to the folder of input scanned pages, and the “save directory” to an empty directory that will contain the output.
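If you want to double-check the two folders before starting a long batch run, a small Python script along these lines can help. This is only a sketch under my own assumptions: the function name `check_batch_folders` is hypothetical, and the list of file extensions is a guess at typical scan formats, not something taken from the plugin.

```python
from pathlib import Path

# Assumed list of scan formats; adjust it to whatever your scanner produces.
SCAN_EXTENSIONS = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def check_batch_folders(load_dir, save_dir):
    """Return (scanned pages found in load_dir, names already present in save_dir)."""
    load_path, save_path = Path(load_dir), Path(save_dir)
    if not load_path.is_dir():
        raise NotADirectoryError(f"Input folder not found: {load_path}")
    # Collect the scanned pages, ignoring subfolders and unrelated files.
    pages = sorted(
        p for p in load_path.iterdir()
        if p.is_file() and p.suffix.lower() in SCAN_EXTENSIONS
    )
    # The output folder should be empty so old and new results do not mix.
    leftovers = sorted(p.name for p in save_path.iterdir()) if save_path.is_dir() else []
    return pages, leftovers
```

Call it with your own two folder paths: the first result tells you how many pages will be processed, and a non-empty second result means the save directory still holds files from an earlier run.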
- Click on OK, and watch it run through all your photos.
Comparing Photoshop with GIMP surprised me, in that GIMP’s filter seemed to be much more reliable, even straight out of the box. It is also possible to customise the filter’s behavior to suit your specific stack of scans.
Note, however, that both of these solutions (Photoshop or GIMP) can and will fail for difficult cases. Here are some tips you should follow to maximize your chances of success:
- The photos should not overlap or touch each other. If they do, they will not be divided from each other by the automatic script
- The scan / photograph should be cropped so that it doesn’t extend beyond the page background; in other words, the page background should reach up to or beyond the edges of the image. For example, seeing the wooden floor (on which an album was placed while photographing it) will screw up the algorithm unless you carefully set up the “Background Sample X/Y offset” values.
- The page background should be uniform (white or black are good), and have enough contrast relative to the photos
- The page (including the background) should be evenly lit
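The background tips above can be checked roughly before you run a batch. The Python sketch below compares the four corners of a page, represented here as a small grid of grayscale values, and reports whether they agree closely enough to count as one uniform background. The function names and the `offset` parameter are my own illustration (loosely inspired by the “Background Sample X/Y offset” setting), not the plugin's code, and the tolerance of 25 is just an assumed value.

```python
def corner_samples(pixels, offset=0):
    """Sample the four corners of the page, `offset` pixels in from each edge."""
    height, width = len(pixels), len(pixels[0])
    far_x, far_y = width - 1 - offset, height - 1 - offset
    return [
        pixels[offset][offset],  # top-left
        pixels[offset][far_x],   # top-right
        pixels[far_y][offset],   # bottom-left
        pixels[far_y][far_x],    # bottom-right
    ]


def background_is_uniform(pixels, threshold=25, offset=0):
    """True if all four corner samples lie within `threshold` of each other."""
    samples = corner_samples(pixels, offset)
    return max(samples) - min(samples) <= threshold


# Evenly lit page: all corners are close to the same light value.
even_page = [
    [250, 248, 251],
    [249,  60, 250],
    [252, 250, 247],
]
# Top-right corner shows something darker, e.g. a wooden floor.
floor_visible = [
    [250, 248,  90],
    [249,  60, 250],
    [252, 250, 247],
]
print(background_is_uniform(even_page))      # True
print(background_is_uniform(floor_visible))  # False
```

A page that fails a check like this is a good candidate for re-cropping or re-scanning before you feed it to either tool.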
Good luck and feel free to comment on your experiences!