I’ve had a digital camera since 1997, and I’ve miraculously managed to save all the photos I’ve taken during the last two decades. I used to be able to keep up with organizing and tagging everything, but that fell off about 5 years ago (about when I stopped uploading to Flickr). Over the years, I would lie to myself and say I’d be able to catch up; but it’s never going to happen. My photo output is too high (especially after becoming a father), and my free time is at dangerous lows (see fatherhood, above).
The previous paragraph is a rambling way of telling you that I’ve given up, and decided to go for good enough: Giving in to Google Photos and relying on their algorithms / the NSA whenever I need to find a specific photo.
Although not perfect, Google Photos ended up being my choice for the following reasons:
Automatic uploads from mobile
Reasonably-priced storage (completely free if you don’t need backup-quality)
Fully automatic face matching and metadata extraction for search (by no means perfect, but requires zero effort)
Decision made, the next step was to gather all my files and start uploading. This, dear reader, is where the fun began. In hopes of helping out those who try something similar in the future, here are the steps I took in order to tame my collection.
Like Kudzu, my photo files spread and multiplied into a variety of locations, living in folders named latest, latest-too, latest-latest-photos, and clinton-trump-sex-tape.
There were so, so many files. Tons of duplicates from paranoid copying and re-copying. I was willing to spend a little time getting things ready, but not that much time.
First step: Getting rid of the duplicates. Fdupes is a lifesaver here. I created a new folder (inbox) to store all the unorganized photos, then ran the following command:
This can take a while, but it works like magic and gets rid of files with exact duplicates (keeping a single copy, of course). The program is quite paranoid about matches being perfect, so duplicates still get by (e.g. photos with slightly different metadata, or a photo that was rotated). I appreciate the paranoia, since it means I could safely let the program run without checking results manually.
If I had to do this all over again, I would have used something like Duplicate Photos Fixer Pro at this point in order to clear up more duplicates. Unfortunately, I didn’t know it even existed, so I kept on chugging in the terminal.
Once most the duplicates were removed, it was time to clean up the directory structure and filenames. Exiftool makes its first appearance now, as we use it to automatically rename files (based on Elle Stone’s explanation):
# Warning: Moves and renames your files without prompting
exiftool '-filename<CreateDate'-d"%Y-%m/%Y-%m-%d %H.%M.%S%%-c.%%le"
This command renames and moves files based on the date the picture was taken. For example, a picture taken at noon on Christmas 2015 would have the following path: 2015-12/2015-12-2512.00.00.jpg. Using the timestamp as the filename isn’t super-descriptive, but it beats the hell out of IMG_21234.JPG. Grouping by month isn’t perfect either, but we’re not going for perfect here, we’re going for good enough.
I was hoping I’d be done at this point, but exiftool gave a bunch of errors and warnings and refused to rename a large chunk of files. It turns out that a few batches of files had lost all their EXIF metadata, meaning exiftool had no idea when the photos were taken.
At this point, I had to take a look at the files manually. I used to let Dropbox auto-import pictures from my phone, and it seems a large number of the files with the missing EXIF data were imported by Dropbox. Fortunately, the imported files encoded the date the picture was taken in the filename in a format similar to the new one I was using above. Setting the date from the filename is pretty easy with exiftool (the -ext avi, etc flags opt-in a few file extensions that would not normally be processed):
# Warning: Updates metadata and overwrites original files without prompting
exiftool "-alldates<filename" -overwrite_original_in_place -r -ext mts -ext mt2s -ext avi *
Now all those photos imported by Dropbox had correct date metadata and were able to be renamed algorithmically. However, there were still a bunch of stubborn files remaining. I looked at the photos manually and most of the files seemed to have a last modified time via the filesystem that seemed to match reasonably well with when the photo might have been taken. Exiftool comes to the rescue here again:
# Warning: Updates metadata and overwrites original files without prompting
exiftool "-alldates<FileModifyDate" -overwrite_original_in_place -r *
This command finds the date the filesystem thinks the file was last modified and copies it into the EXIF creation date. Now we can re-run the renaming script, and all the files are neatly organized by month, hooray!
At this point, I decided to run a couple of test uploads to Google Photos. Unsurprisingly, I ran into an issue: Google Photos doesn’t support some of the RAW files that were in my collection (specifically, RW2 files from a Panasonic Lumix DMC-G3).
After doing a bit of research, it turns out that Google Photos supports Adobe’s DNG RAW format. So I downloaded the Adobe DNG converter and converted a few files. The files converted well, and I was able to upload to Google Photos, hooray.
But, of course, I wasn’t done yet. For some reason, Google Photos was reading the incorrect date off the converted DNG files, and they all showed up as being taken at the time of file conversion (i.e. today).
After a lot of trial and error, exiftool saved my ass once again. It seems that setting the file modification date on the DNG file makes Google able to read the correct date. The following line sets the modification date to the date the photo was taken:
# Warning: Will make your dog gassy
exiftool '-modifydate<datetimeoriginal' -overwrite_original_in_place -r *.dng
Now that I had a method for making the RW2 files work, I had to figure out how to convert just the RW2 files, leaving all the other RAW images alone. If the DNG were a command-line tool, this would be pretty simple. But things aren’t that easy, so I used rsync to create a new folder with just my RW2 files:
# Warning: May cause singularity
rsync -avP --include"*/" --include"*.rw2" --exclude "*" --prune-empty-dirs inbox/ raw-files
The lines above create a parallel folder structure that contains just the RW2 files. I pointed the DNG converter at the directory, and let it chug along converting everything. Once it finished, I used exiftool again to make sure to set the modify date (see above) so Google Plus doesn’t freak out and go all amnesiac on me.
Finally, I removed the RW2 files from the main folder, since I don’t want the photo uploader to spam me with errors telling me that RW2 files can’t be supported.
# Warning: 010000010101find inbox -type f -name"*.rw2" -delete
At this point, I hated all my pictures. So I stopped. Good enough!
Three years ago, I created the Pareto Browser Filter, a simple method for avoiding problematic browsers. Back then, it mostly just excluded IE7 and below, which had less than 20% combined usage but definitely accounted for more than 80% of cross-browser authoring pain.
The worst offenders with measurable market share today are IE9 and below and Android pre 4.3, which both have about 5% global share (sadly, IE numbers are much higher in a few Asian countries including China). Opera Mini (about 4%) is also missing some key features; however, its usage is mostly international, which means many US-centric developers were never testing on it anyway.
With those numbers in mind, we can construct a new filter aimed at reducing development pain:
The new test is much simpler, however it is slightly over-aggressive, covering just under 77% of browsers based on global share. Naturally, usage numbers can vary greatly by site, so the actual number depends on the audience. Fortunately, this percentage will only increase, and should increase to 80% in less than a year.
Back in 2011, even if with the filter you still had to worry about the old IE event model.aspx). You also needed hacky scripts in order to create decent responsive designs. In 2014, once you get over the pain of dropping a fifth of your worldwide audience, there are a bunch of fun browser features you can use without guilt (or polyfills):
Audio & Video elements
CSS Transforms (including 3D)
It’s tough to predict what the next version of this filter will be, since it depends heavily on future market share. Auto-updates in Chrome and Firefox mean new features can hit 50% very quickly. However, they’re held back by the much slower release cycles of IE and iOS.
With the release of IE9, this test passes for the current and previous version of all major browsers. Depending on which browser usage statistics you use, this covers 70 to 80% users. For that reason, I call it the Pareto Filter, a nod to the Pareto Principle a.k.a. the 80-20 rule.
Aside from serving as an effective filter for older browsers, these three pieces of functionality (especially querySelectorAll) are incredibly useful when building modern applications. Being able to count on fast, bug-free implementations saves a lot of headaches (and download time) for authors.
Once IE8 market share falls (which will probably take a while since IE9 doesn’t work on Windows XP), I look forward to adding document.addEventListener to this filter. IE’s non-standard event model has always been a pain when trying to write compact cross-browser code.
If space is a concern, you could drop the querySelector test. If you already don’t care about IE8, add document.addEventListener.
I’m happy to announce that the source code for the Treesaver.js Framework is now public. I’m still working on fleshing out the documentation, so let me know which parts are confusing or just require more explanation. The best place to start right now is the Walkthrough, a simple tutorial that goes through the steps of using Treesaver for some sample content.
My business partner in crime, Roger Black, has written up the history of Treesaver, describing how (and partially why) we’re on this crazy mission to change the world of online reading.
The response and interest so far has been fantastic, and it’s great to finally be able to talk about what I’ve been working on for the past several months.
We’ve been getting a ton of questions, and I thought I’d take some time to answer the most common ones.
What is Treesaver?
Why use HTML?
There are many reasons to use HTML, but the most important ones in our opinion are:
Lingua Franca of the Internet: HTML is supported by an incredible variety of devices, from expensive desktop computers to cheap mobile phones.
Linking & Search Engines: Articles within Treesaver websites have their own URL which can be linked to or spidered by search engines. Users searching in Google can be taken directly into the rich Treesaver experience.
Bookmarking & Social Media: Because Treesaver can run in the browser, users can bookmark and directly link to articles when sharing with friends. Friends and Family can immediately view shared content in a rich experience without having to go to an app store to download a viewer.
Leverage existing content: Treesaver articles are simply-structured HTML files, meaning little to no conversion is required for existing text content. Existing web assets like video and Flash can continue to be embedded (assuming the browser/device supports Flash, of course).
Ecosystem: Because Treesaver is built upon standard technologies, it can leverage the wide variety of tools available in the web ecosystem: Google Analytics, embedded YouTube videos, slideshow widgets, etc.
Embedding: You can embed a Treesaver article within other webpages, as you would a YouTube video. Embedded articles continue to be formatted by Treesaver, meaning branding and advertising are preserved, even if the article has been embedded on another site.
What are Treesaver’s Browser/Device Requirements?
The column and page-based layout works on most modern browsers, specifically:
Internet Explorer 7 and above
Firefox 2.0 and above
Safari 3 and above (including iPhone, iPad, and many smart phones)
Chrome 3 and above
Opera 10 and above
Because Treesaver content is just HTML, it degrades gracefully onto legacy devices. Users with IE6 or older phones will still be able to read any article that uses Treesaver, although it will have a simpler, primarily text layout.
Can Treesaver content be distributed via the App Store?
Yes. We will provide the ability to package Treesaver content as an application that can be downloaded and sold through the App Store.
When can I use Treesaver?
We’re not giving public access to Treesaver right now, although we will be doing a limited beta test in the coming weeks. Subscribe to the Treesaver mailing list to be notified when beta invites are ready to be sent out.
I’m interested in using Treesaver for my publication, how do I get in touch?
Today’s web browsers don’t provide hyphenation, which makes justified text prone to rivers of white space. However, browsers do support the soft-hyphen: the &shy; HTML entity. Soft hyphens tell the browser where it can break a word, if necessary.
Adding these hyphens manually is quite tedious. The Soft Hyphenator takes HTML and adds the soft hyphens automatically.