Adventures in command-line photo organization


I’ve had a digital camera since 1997, and I’ve miraculously managed to save all the photos I’ve taken during the last two decades. I used to be able to keep up with organizing and tagging everything, but that fell off about 5 years ago (roughly when I stopped uploading to Flickr). Over the years, I would lie to myself and say I’d catch up, but it’s never going to happen. My photo output is too high (especially after becoming a father), and my free time is at dangerous lows (see fatherhood, above).

The previous paragraph is a rambling way of telling you that I’ve given up and decided to go for good enough: giving in to Google Photos and relying on their algorithms / the NSA whenever I need to find a specific photo.

Although not perfect, Google Photos ended up being my choice for the following reasons:

Decision made, the next step was to gather all my files and start uploading. This, dear reader, is where the fun began. In hopes of helping out those who try something similar in the future, here are the steps I took in order to tame my collection.

Like kudzu, my photo files spread and multiplied into a variety of locations, living in folders named latest, latest-too, latest-latest-photos, and clinton-trump-sex-tape.

There were so, so many files. Tons of duplicates from paranoid copying and re-copying. I was willing to spend a little time getting things ready, but not that much time.

First step: Getting rid of the duplicates. Fdupes is a lifesaver here. I created a new folder (inbox) to store all the unorganized photos, then ran the following command:

# Warning: Deletes files without prompting!
fdupes -R -d -N .

This can take a while, but it works like magic and gets rid of files that have exact duplicates (keeping a single copy, of course). The program is quite paranoid about matches being perfect, so some near-duplicates still get by (e.g. photos with slightly different metadata, or a photo that was rotated). I appreciate the paranoia, since it means I could safely let the program run without checking the results manually.
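
If you’d rather review the matches before anything gets deleted, dropping the -d and -N flags makes fdupes simply list each set of duplicates it finds:

# Safe preview: lists duplicate sets without deleting anything
fdupes -R .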

If I had to do this all over again, I’d use something like Duplicate Photos Fixer Pro at this point to clear out more duplicates. Unfortunately, I didn’t know it even existed, so I kept on chugging in the terminal.

Once most of the duplicates were removed, it was time to clean up the directory structure and filenames. Exiftool makes its first appearance now, as we use it to automatically rename files (based on Elle Stone’s explanation):

# Warning: Moves and renames your files without prompting
exiftool '-filename<CreateDate' -d "%Y-%m/%Y-%m-%d %H.%M.%S%%-c.%%le" -r .

This command renames and moves files based on the date the picture was taken. For example, a picture taken at noon on Christmas 2015 would have the following path: 2015-12/2015-12-25 12.00.00.jpg. Using the timestamp as the filename isn’t super-descriptive, but it beats the hell out of IMG_21234.JPG. Grouping by month isn’t perfect either, but we’re not going for perfect here, we’re going for good enough.
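
Since this command moves files without asking, it’s worth knowing that exiftool supports a dry run: writing to TestName instead of filename prints each proposed rename without actually touching any files:

# Dry run: prints "old name --> new name" without moving anything
exiftool '-testname<CreateDate' -d "%Y-%m/%Y-%m-%d %H.%M.%S%%-c.%%le" -r .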

I was hoping I’d be done at this point, but exiftool gave a bunch of errors and warnings and refused to rename a large chunk of files. It turns out that a few batches of files had lost all their EXIF metadata, meaning exiftool had no idea when the photos were taken.
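
To see exactly which files were the problem children, exiftool’s -if flag can filter for files that are missing the relevant tag:

# List files that have no CreateDate in their metadata
exiftool -r -if 'not $CreateDate' -p '$directory/$filename' .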

At this point, I had to take a look at the files manually. I used to let Dropbox auto-import pictures from my phone, and it seems a large number of the files with missing EXIF data were Dropbox imports. Fortunately, the imported files encoded the date the picture was taken in the filename, in a format similar to the new one I was using above. Setting the date from the filename is pretty easy with exiftool (the -ext avi etc. flags opt in a few file extensions that wouldn’t normally be processed):

# Warning: Updates metadata and overwrites original files without prompting
exiftool "-alldates<filename" -overwrite_original_in_place -r -ext mts -ext mt2s -ext avi *

Now all those photos imported by Dropbox had correct date metadata and could be renamed algorithmically. However, there were still a bunch of stubborn files remaining. I looked at the photos manually, and most of the files had a filesystem last-modified time that matched reasonably well with when the photo was likely taken. Exiftool comes to the rescue here again:

# Warning: Updates metadata and overwrites original files without prompting
exiftool "-alldates<FileModifyDate" -overwrite_original_in_place -r *

This command finds the date the filesystem thinks the file was last modified and copies it into the EXIF creation date. Now we can re-run the renaming script, and all the files are neatly organized by month, hooray!

At this point, I decided to run a couple of test uploads to Google Photos. Unsurprisingly, I ran into an issue: Google Photos doesn’t support some of the RAW files that were in my collection (specifically, RW2 files from a Panasonic Lumix DMC-G3).

A bit of research revealed that Google Photos supports Adobe’s DNG RAW format. So I downloaded the Adobe DNG Converter and converted a few files. The files converted well, and I was able to upload them to Google Photos, hooray.

But, of course, I wasn’t done yet. For some reason, Google Photos was reading the incorrect date off the converted DNG files, and they all showed up as being taken at the time of file conversion (i.e. today).

After a lot of trial and error, exiftool saved my ass once again. It turns out that setting the file modification date on the DNG file lets Google read the correct date. The following line sets the modification date to the date the photo was taken:

# Warning: Will make your dog gassy
exiftool '-modifydate<datetimeoriginal' -overwrite_original_in_place -r *.dng

Now that I had a method for making the RW2 files work, I had to figure out how to convert just the RW2 files, leaving all the other RAW images alone. If the DNG converter were a command-line tool, this would be pretty simple. But things aren’t that easy, so I used rsync to create a new folder with just my RW2 files:

# Warning: May cause singularity
rsync -avP --include "*/" --include "*.rw2" --exclude "*" --prune-empty-dirs inbox/ raw-files

The command above creates a parallel folder structure that contains just the RW2 files. I pointed the DNG converter at the directory and let it chug along converting everything. Once it finished, I used exiftool again to set the modify date (see above) so Google Photos doesn’t freak out and go all amnesiac on me.
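
For completeness, that second pass is just the modifydate fix from earlier, pointed at the converted files (this assumes the converter wrote the DNGs into the same raw-files tree):

# Same date fix as before, limited to the freshly-converted DNGs
exiftool '-modifydate<datetimeoriginal' -overwrite_original_in_place -r -ext dng raw-files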

Finally, I removed the RW2 files from the main folder, since I didn’t want the photo uploader to spam me with errors telling me that RW2 files aren’t supported.

# Warning: 010000010101
find inbox -type f -name "*.rw2" -delete

At this point, I hated all my pictures. So I stopped. Good enough!

Updated Pareto Browser Filter


Three years ago, I created the Pareto Browser Filter, a simple method for avoiding problematic browsers. Back then, it mostly just excluded IE7 and below, which had less than 20% combined usage but definitely accounted for more than 80% of cross-browser authoring pain.

The worst offenders with measurable market share today are IE9 and below and Android pre 4.3, which both have about 5% global share (sadly, IE numbers are much higher in a few Asian countries including China). Opera Mini (about 4%) is also missing some key features; however, its usage is mostly international, which means many US-centric developers were never testing on it anyway.

With those numbers in mind, we can construct a new filter aimed at reducing development pain:

if ('requestAnimationFrame' in window) {
  // Passes for:
  // Chrome
  // IE 10+
  // Firefox
  // Safari 6.1+
  // iOS 7.1+
  // Chrome Android
  // Android 4.4+
}

The new test is much simpler; however, it is slightly over-aggressive, covering just under 77% of browsers based on global share. Naturally, usage numbers can vary greatly by site, so the actual number depends on the audience. Fortunately, this percentage will only grow, and should cross 80% in less than a year.
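
To see the filter in context, here’s a minimal sketch of how it typically gates functionality; enhance() is a hypothetical stand-in for whatever modern-browser-only code you want to run:

// Hypothetical placeholder for your modern-browser-only code
function enhance() {
  document.documentElement.className += ' enhanced';
}

if ('requestAnimationFrame' in window) {
  // Modern browser: layer on the fancy stuff
  enhance();
}
// Everyone else gets the basic, unenhanced page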

Back in 2011, even with the filter, you still had to worry about the old IE event model. You also needed hacky scripts in order to create decent responsive designs. In 2014, once you get over the pain of dropping a fifth of your worldwide audience, there are a bunch of fun browser features you can use without guilt (or polyfills):

It’s tough to predict what the next version of this filter will be, since it depends heavily on future market share. Auto-updates in Chrome and Firefox mean new features can hit 50% very quickly. However, they’re held back by the much slower release cycles of IE and iOS.

Talking Treesaver


Video from my portion of the Designing for iPad and Other Mobile Devices panel at the 2011 International Symposium on Online Journalism is now available online:

You can view the presentation slides here:

Bonus: A few weeks ago, I did a Portuguese-language interview about Treesaver with Pedro Telles for a podcast. If you speak Portuguese (or even if you don’t), you can listen online or download the MP3 directly.

Treesaver on The Big Web Show


Scott Kellum and I were on the Big Web Show today with Jeffrey Zeldman and Dan Benjamin, talking about web design, dynamic layout, and a bit of the history of how Treesaver came to be. You can view the full video (or download the audio): Treesaver on the Big Web Show.

Dan’s also posted some bonus material here: Treesaver After Dark.

The Pareto Browser Filter


Lately, I’ve been a fan of using the following conditional as a quick high-pass filter for modern browsers:

if (document.querySelector && window.localStorage && window.JSON) {
  // Passes for:
  // IE8+
  // FF3.5+
  // Safari 4+
  // iOS Safari 4+
  // Android 2.1+
  // Chrome
  // etc
}

With the release of IE9, this test passes for the current and previous version of all major browsers. Depending on which browser usage statistics you use, this covers 70 to 80% of users. For that reason, I call it the Pareto Filter, a nod to the Pareto Principle, a.k.a. the 80-20 rule.

Aside from serving as an effective filter for older browsers, these three pieces of functionality (especially querySelectorAll) are incredibly useful when building modern applications. Being able to count on fast, bug-free implementations saves a lot of headaches (and download time) for authors.

Once IE8 market share falls (which will probably take a while since IE9 doesn’t work on Windows XP), I look forward to adding document.addEventListener to this filter. IE’s non-standard event model has always been a pain when trying to write compact cross-browser code.

If space is a concern, you could drop the querySelector test. If you already don’t care about IE8, add document.addEventListener.
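
For reference, the IE8-free variant suggested above would look something like this:

if (document.querySelector && window.localStorage &&
    window.JSON && document.addEventListener) {
  // Passes for IE9+ and everything else in the list above
}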

Treesaver Open Source Release


Roger Black and Filipe Fortes

I’m happy to announce that the source code for the Treesaver.js Framework is now public. I’m still working on fleshing out the documentation, so let me know which parts are confusing or just require more explanation. The best place to start right now is the Walkthrough, a simple tutorial that goes through the steps of using Treesaver for some sample content.

My business partner in crime, Roger Black, has written up the history of Treesaver, describing how (and partially why) we’re on this crazy mission to change the world of online reading.

Here’s to an interesting 2011!

Treesaver gets Scobleized


Robert Scoble interviewed me about Treesaver yesterday; here is the video:

Introducing Treesaver


The word is out about my latest venture: Treesaver. Nomad Editions is our first customer to make a public announcement; you can see their promotional video below:

The response and interest so far have been fantastic, and it’s great to finally be able to talk about what I’ve been working on for the past several months.

We’ve been getting a ton of questions, and I thought I’d take some time to answer the most common ones.

What is Treesaver?

It’s a way to make column and page-based layouts using HTML, CSS, and JavaScript. Text and image layout is tailored to fit whatever screen the user is using, in order to create an attractive, usable experience. Here’s a video demonstrating how Treesaver adapts to a changing window size:

Why use HTML?

There are many reasons to use HTML, but the most important ones in our opinion are:

What are Treesaver’s Browser/Device Requirements?

The column and page-based layout works on most modern browsers, specifically:

Because Treesaver content is just HTML, it degrades gracefully onto legacy devices. Users with IE6 or older phones will still be able to read any article that uses Treesaver, although it will have a simpler, primarily text layout.

Can Treesaver content be distributed via the App Store?

Yes. We will provide the ability to package Treesaver content as an application that can be downloaded and sold through the App Store.

When can I use Treesaver?

We’re not giving public access to Treesaver right now, although we will be doing a limited beta test in the coming weeks. Subscribe to the Treesaver mailing list to be notified when beta invites are ready to be sent out.

I’m interested in using Treesaver for my publication. How do I get in touch?

Contact us via the Treesaver website.

Will Treesaver be Open Source?

Yes. I’ll be releasing the Treesaver client libraries under an open source license later this year (it’s not quite ready yet; I have to remove all the dirty words first).

Got any Screenshots?

Sure, here are a few from a Nomad Editions sample issue:

Treesaver in Chrome on the Mac

Treesaver in Firefox on Ubuntu

Treesaver in IE

Treesaver on the iPad

Treesaver on the iPhone

More Questions?

Ask @treesaver or @fortes on Twitter.

Treesaver Sneak Peek


Treesaver sneak peek video

Soft Hyphenator


I’m happy to announce Soft Hyphenator, a simple utility I wrote this weekend that adds soft hyphens to HTML content.

Today’s web browsers don’t provide hyphenation, which makes justified text prone to rivers of white space. However, browsers do support the soft hyphen: the &shy; HTML entity. Soft hyphens tell the browser where it can break a word, if necessary.

Adding these hyphens manually is quite tedious. The Soft Hyphenator takes HTML and adds the soft hyphens automatically.
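
To make that concrete, here’s roughly what the transformation looks like (the word and break points below are just for illustration):

<!-- Before: the browser has no idea where it may break the word -->
<p>Antidisestablishmentarianism</p>

<!-- After: each &shy; marks a legal break point, invisible unless used -->
<p>An&shy;ti&shy;dis&shy;es&shy;tab&shy;lish&shy;men&shy;tar&shy;i&shy;an&shy;ism</p>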

The site is written in Python and deployed on Google AppEngine. It uses the BeautifulSoup and python-hyphenator libraries, along with OpenOffice’s hyphenation dictionaries. For the nerdy & curious, I’ve made the source code public.

I’m still quite new to Python and Google AppEngine, but I’m pretty happy with both so far. The entire site took less than 24 hours to write, and the WTF-per-minute rate was pretty low.
