File Conveyor by Wim Leers

File Conveyor is a daemon written in Python to detect, process and sync files.

It is designed to sync files to CDNs: Amazon S3, Amazon CloudFront and Rackspace Cloud Files are supported, as well as any Origin Pull CDN or (S)FTP Push CDN.

File Conveyor is supported by the White House!

The development of File Conveyor was supported by The Executive Office of the President.

They're currently evaluating File Conveyor as an alternative to rsync. Their problem was that files were being synced too slowly from whitehouse.gov to their CDN, because rsync needs to scan the entire directory tree to detect changes. File Conveyor is faster, because it relies on inotify to detect changes.

More news to follow later, when the new site goes online.

What does it do? — In one paragraph

This daemon is designed to discover new, changed and deleted files via the operating system's built-in file system monitor. Once discovered, files can optionally be processed by a chain of processors – you can easily write new ones yourself. After files have been processed, they can also be synced (“transported”) to a server.

The 3 stages

Changes to the file system are discovered instantaneously through inotify on Linux and through FSEvents on Mac OS X. On other operating systems, discovery falls back to polling.
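To give an idea of what the polling fallback involves, here is a minimal sketch. It is not File Conveyor's actual code: the function names snapshot and diff_snapshots are made up for this example, it assumes a current Python interpreter, and it assumes that a different modification time means a changed file.

```python
import os


def snapshot(root):
    """Map every file path under root to its last modification time."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                state[path] = os.stat(path).st_mtime
            except OSError:
                # The file vanished between listing and stat; skip it.
                continue
    return state


def diff_snapshots(old, new):
    """Return (created, modified, deleted) as sorted lists of paths."""
    created = sorted(p for p in new if p not in old)
    deleted = sorted(p for p in old if p not in new)
    modified = sorted(p for p in new if p in old and new[p] != old[p])
    return created, modified, deleted
```

A poller would call snapshot at a fixed interval and feed two consecutive results to diff_snapshots. Every pass walks the whole directory tree, which is exactly the cost that inotify and FSEvents avoid.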
↓
Processors are simple Python scripts that can change the filename and apply any sort of processing to the file's contents. E.g.:
- image optimization
- CSS & JS minification
- video transcoding
A couple of file processors are included by default:
- filename, which can convert spaces to underscores or dashes
- google_closure_compiler, to compress JS files using Google Closure Compiler
- image_optimizer, which is capable of optimizing images without losing quality, i.e. it stores images more efficiently (similar to the service that smush.it provides).
- link_updater, which should be used to update the URLs in CSS files so that the CSS file refers to the images, fonts … on the CDN
- unique_filename, to give files a unique name based on their last modification time (mtime) or MD5 hash
- yui_compressor, to compress CSS and JS files using the YUI Compressor
- … (it's easy to create your own, look at processor_sample!)
Proof of simplicity (Google Closure Compiler Processor)
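As a rough illustration of what filename-level processing looks like, the sketch below mimics the ideas behind the filename and unique_filename processors. It does not use File Conveyor's processor API: the function names, the underscore separator and the naming scheme are assumptions made for this example only.

```python
import hashlib
import os


def normalize_filename(path, replacement="_"):
    """Replace spaces in the file name, leaving the directory part alone."""
    directory, name = os.path.split(path)
    return os.path.join(directory, name.replace(" ", replacement))


def unique_filename_md5(path):
    """Insert the MD5 hash of the file's contents before the extension."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    base, extension = os.path.splitext(path)
    return "%s_%s%s" % (base, digest.hexdigest(), extension)


def unique_filename_mtime(path):
    """Insert the file's last modification time before the extension."""
    base, extension = os.path.splitext(path)
    return "%s_%d%s" % (base, int(os.stat(path).st_mtime), extension)
```

These functions only compute the new name; a real processor would also have to write the file under that name. A name that changes whenever the file changes is what makes far-future caching on a CDN safe.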
↓
Transporters are simple threaded abstractions around Django Custom Storage Systems. Currently, the following transporters (and their corresponding protocols) are available:
- Amazon S3
- Amazon CloudFront
- FTP
- Rackspace Cloud Files
- SFTP
- Symlink or Copy (if a file hasn't been changed by a processor, a symlink to it is created to a given directory, otherwise the changed file is copied — this makes it possible to use processors in combination with an Origin Pull CDN).
Proof of simplicity (Amazon S3 Transporter)
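The "Symlink or Copy" behaviour described in the list above can be sketched as follows. This is not the actual transporter: the function name symlink_or_copy is hypothetical, and the sketch assumes that a processor chain which changed nothing hands back the original path.

```python
import os
import shutil


def symlink_or_copy(original, processed, destination_dir):
    """Place a file in destination_dir, symlinking when nothing changed."""
    if not os.path.isdir(destination_dir):
        os.makedirs(destination_dir)
    target = os.path.join(destination_dir, os.path.basename(processed))
    if os.path.lexists(target):
        # Replace whatever an earlier sync left behind.
        os.remove(target)
    if os.path.abspath(processed) == os.path.abspath(original):
        # Untouched by processors: a symlink avoids duplicating the file.
        os.symlink(os.path.abspath(original), target)
        return "symlink"
    # Changed by a processor: copy the processed result.
    shutil.copy2(processed, target)
    return "copy"
```

If destination_dir is served by your web server, an Origin Pull CDN can fetch the processed files from there on demand.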

Configuring File Conveyor happens through a simple XML format. After looking at it for a couple of minutes, it should already make sense. Here's a sample configuration file.

File Conveyor was originally written by Wim Leers as part of his bachelor thesis at Hasselt University in Belgium. For a detailed description of the innards of the daemon, see Wim's bachelor thesis text.

Live sites using File Conveyor

driverpacks.net (>1M page views/month)

CMS support

To further simplify usage of File Conveyor, the following CMSes explicitly support File Conveyor:

Drupal — CDN integration module

Download

You can download this project as either a zip or a tar archive.

You can also clone the project with Git by running:

$ git clone git://github.com/wimleers/fileconveyor.git

Requirements

Python >= 2.5
OS:
- Linux with kernel >= 2.6.13 (for inotify support)
- Mac OS X >= 10.5 (for FSEvents support)
- Windows: untested

Dependencies

Processors

Most processors require you to download and install one or more binaries: file processing is typically a computation-heavy task, so highly optimized binaries are a necessity. Including those binaries in the repository makes no sense.

Transporters

All dependencies are downloaded & installed automatically by pip.
Their licenses are all GPL-compatible.
- boto (MIT license)
- Django (parts) (modified BSD license)
- django.storages (modified BSD license)
- python-cloudfiles (MIT license)

Installation instructions

See INSTALL.txt.

Or, if you have pip installed, you can do this:

pip install -e git+https://github.com/wimleers/fileconveyor@master#egg=fileconveyor

License

GPL v2 or UNLICENSE

Authors

Wim Leers — http://wimleers.com
The authors of the many dependencies without whom this never would've been possible!

Contact

Wim Leers — http://wimleers.com/contact