API-First Drupal: file uploads!

published on April 8, 2018

Drupal 8’s REST API has been maturing steadily since the Drupal 8.0.0 was released in November 2015. One of the big missing features has been file upload support. As of April 3 2018, Drupal 8.6 will support it, when it ships in September 2018! See the change record for the practical consequences: https://www.drupal.org/node/2941420.

It doesn’t make sense for me to repeat what is already written in that change record: that already has both a tl;dr and a practical example.

What I’m going to do instead, is give you a high-level overview of what it took to get to this point: why it took so long, which considerations went into it, why this particular approach was chosen. You could read the entire issue (#1927648), but … it’s one of the longest issues in Drupal history, at 572 comments1. You would probably need at least an entire workday to read it all! It’s also one of the longest commit messages ever, thanks to the many, many people who shaped it over the years:

Issue #1927648 by damiankloip, Wim Leers, marthinal, tedbow, Arla, alexpott, juampynr, garphy, bc, ibustos, eiriksm, larowlan, dawehner, gcardinal, vivekvpandya, kylebrowning, Sam152, neclimdul, pnagornyak, drnikki, gaurav.goyal, queenvictoria, kim.pepper, Berdir, clemens.tolboom, blainelang, moshe weitzman, linclark, webchick, Dave Reid, dabito, skyredwang, klausi, dagmar, gabesullice, pwolanin, amateescu, slashrsm, andypost, catch, aheimlich: Allow creation of file entities from binary data via REST requests

Thanks to all of you in that commit message!

I hope it can serve as a reference not just for people interested in Drupal, but also for people outside the Drupal community: there is no One Best Practice Way to handle file uploads for RESTful APIs. There is a surprising spectrum of approaches2. Some even avoid this problem space even entirely, by only allowing to “upload” files by sending a publicly accessible URL to the file. Read on if you’re interested. Otherwise, go and give it a try!

Design rationale

General:

  • Request with Content-Type: application/octet-stream aka “raw binary” as its body, because base64-encoded means 33% more bytes, implying both slower uploads and more memory consumption. Uploading videos (often hundreds of megabytes or even gigabytes) is not really feasible with base64 encoding.
  • Request header Content-Disposition: file; filename="cat.jpg" to name the uploaded file. See the Mozilla docs. This also implies you can only upload one file per request. But of course, a client can issue multiple file upload requests in parallel, to achieve concurrent/batch uploading.
  • The two points above mean we reuse as much as possible from existing HTTP infrastructure.
  • Of course it does not make sense to have a Content-Type: application/octet-stream as the response. Usually, the response is of the same MIME type as the request. File uploads are the sensible exception.
  • This is meant for the raw file upload only; any metadata (for example: source or licensing) cannot be associated in this request: all you can provide is the name and the data for the file. To associate metadata, a second request to “upgrade” the raw file into something richer would be necessary. The performance benefit mentioned above more than makes up for the RTT of a second request in almost all cases.

PHP-specific:

  • php://input because otherwise limited by the PHP memory limit.

Drupal-specific:

  • In the case of Drupal, we know that it always represents files as File entities. They don’t contain metadata (fields), at least not with just Drupal core; it’s the file fields (@FieldType=file or @FieldType=image) that contain the metadata (because the same image may need different captions depending on its use, for example).
  • When a file is uploaded for a field on a bundle on an entity type, a File entity is created with status=false. The response contains the serialized File entity.
  • You then need a second request to make the referencing entity “use” the File entity, which will cause the File entity to get status=true.
  • Validation: Drupal core only has the infrastructure in place to use files in the context of an entity type/bundle’s file field (or derivatives thereof, such as image fields). This is why files can only be uploaded by specifying an entity type ID, bundle ID and field name: that’s the level where we have settings and validation logic in place. While not ideal, it’s pragmatic: first allowing generic file uploads would be a big undertaking and somewhat of a security nightmare.
  • Access control is similar: you need create access for the referencing entity type and field edit access for the file field.

Result

If we combine all these choices, then we end up with a new file_upload @RestResource plugin, which enables clients to upload a file:

  1. by POSTing the file’s contents
  2. to the path /file/upload/{entity_type_id}/{bundle}/{field_name}, which means that we’re uploading a file to be used by the file field of the specified entity type+bundle, and the settings/constraints of that field will be respected.
  3. … don’t forget to include a ?_format URL query argument, this determines what format the response will be in
  4. sending file data as a application/octet-stream binary data stream, that means with a Content-Type: application/octet-stream request header. (This allows uploads of an arbitrary size, including uploads larger than the PHP memory limit.)
  5. and finally, naming the file using the Content-Disposition: file; filename="filename.jpg" header
  6. the five preceding steps result in a successfully uploaded file with status=false — all that remains is to perform a second request to actually start using the file in the referencing entity!

Four years in the making — summarizing 572 comments

From February 2013 until the end of March 2017, issue #1927648 mostly … lingered. On April 3 of 2017, damiankloip posted an initial patch for an approach he’d been working on for a while, thanks to Acquia (my employer) sponsoring his time. Exactly one year later his work is committed to Drupal core. Shaped by the input of dozens of people! Just look at that commit message!

Want to actually read a summary of those 572 comments? I got you covered!


  1. It currently is the fifth longest Drupal core issue of all time! The first page, with ~300 comments, is >1 MB of HTML↩︎

  2. Examples: Contentful, Twitter, Dropbox and others↩︎