Drupal 8: best authoring experience for structured content?

Published on 31 May, 2013

Drupal 8 will ship with big authoring experience improvements: WYSIWYG editing & in-place editing, thanks to the Spark distribution that Acquia — my employer — is sponsoring.

But how well does it fare with the growing importance of structured content? Do Drupal 8’s WYSIWYG & in-place editing enable it or prevent it?

The new web world order: many form factors

The Big Thing of the last few years: the advent of mobile. Inherent to that: websites that are optimized for mobile devices and act as data providers for apps.

A new form factor — mobile devices — changed web development forever. Before mobile, the life of web developers and authors (content creators) was relatively simple: make sure websites work well on a few typical screen sizes (let’s deny the existence of Internet Explorer 6 and all the misery it caused).

But… we cannot predict what’s next. We cannot predict new content consumption form factors. That’s where content strategy becomes vitally important:

content strategy is to copywriting as information architecture is to design

We have to make sure that our content is structured and has enough metadata to successfully reuse the same (structured) content for different content consumption form factors, without having to edit each piece of content again.

Structured content: successfully dealing with form factors

NPR’s Create Once, Publish Everywhere is the most often cited example of a content strategy that successfully provides content for many form factors. They create content once, then publish it to >10 different platforms. With a small team, they do more than some other companies, because of their excellent content strategy. It took them years to evolve their systems in this direction, and it paid off.

Another example is TV Guide. Back in the 1980s, they decided to capture all semantic metadata, to build a database and extract a magazine from it, rather than just creating a nicely formatted magazine every time. Thanks to that, they’re still relevant today.

It appears that the reuse of content is something every website should strive towards. There’s nothing inherently bad about it. However, there are downsides.

TV Guide editors used a mainframe application (and maybe still do?). NPR editors use this UI:

NPR editors are encouraged to think only about content, not presentation; hence a very basic data entry UI is all they get.[1] This UI looks more like a web front-end to a database than a CMS (anybody else reminded of phpMyAdmin?).


So, while this may be true:

The goal of any CMS should be to gather enough information to present the content on any platform, in any presentation, at any time.

No CMS really aims to have a poor authoring experience, of course.

Drupal & structured content

Drupal is already well prepared for structured content.

All of the principles used when reviewing code proposed for inclusion in Drupal core are a superset of the principles applied to structured content. Drupal demands full separation of concerns at every level. Everything must be overridable/alterable. CSS files are separated by concern as well, so that styling can be overridden cleanly without duplicating all CSS. Content may never contain CSS nor depend on it. And so on.

Five features in particular stand out with regard to structured content and content reuse:

  1. Structured content: Field API.
    It allows content to be modeled as granularly as desired.
  2. Clean content: Filter system.
    Ensures fancy mark-up is only added on output, and the stored content is as clean as possible. E.g. the fancy typographic features in this very piece of text are automatically added by Typogrify.
  3. Different presentations of the same content: view modes.
    A view mode defines the order of the fields and the field formatter & label of each field.[2]
  4. Internal reuse of content (within the website): Views module.
    To create lists, grids, tables, galleries etc. of content, while showing related content. A listing can be configured to use a specific view mode.
  5. External reuse of content (outside the website): REST module.
    To provide JSON, XML, HAL, JSON-LD, YourCustomMarkupLanguage output.
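To make the view mode idea concrete, here is a minimal Python sketch (not Drupal’s Field API; every name in it is hypothetical): a node is modeled as a plain dict of field values, and each view mode is an ordered list of (field, formatter, show label) entries. The same stored node then yields both a full rendering and a trimmed teaser without being edited.

```python
from typing import Callable

# Hypothetical formatters: each turns a raw field value into display text.
FORMATTERS: dict[str, Callable[[object], str]] = {
    "plain": lambda value: str(value),
    "trimmed": lambda value: str(value)[:20] + ("..." if len(str(value)) > 20 else ""),
    "comma_list": lambda value: ", ".join(value),
}

# Hypothetical view modes: ordered (field, formatter, show label?) entries.
VIEW_MODES = {
    "full": [("title", "plain", False), ("body", "plain", False), ("tags", "comma_list", True)],
    "teaser": [("title", "plain", False), ("body", "trimmed", False)],
}


def render(node: dict, view_mode: str) -> list[str]:
    """Render one structured node according to one view mode."""
    lines = []
    for field_name, formatter, show_label in VIEW_MODES[view_mode]:
        if field_name not in node:
            continue  # fields missing from this node are simply skipped
        text = FORMATTERS[formatter](node[field_name])
        lines.append(f"{field_name.capitalize()}: {text}" if show_label else text)
    return lines


node = {
    "title": "Llamas in the wild",
    "body": "Llamas are social animals that live in herds.",
    "tags": ["llama", "nature"],
}
print(render(node, "full"))    # title, full body, labeled tags
print(render(node, "teaser"))  # title, trimmed body, no tags
```

The point of the design is that presentation choices live entirely in the view mode configuration; the node itself never changes.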

Drupal authoring experience

Drupal’s authoring experience used to be remarkably similar to that of NPR’s COPE. We’ve gone through a lot of effort in Drupal 6, 7 and 8 to improve usability in general. In Drupal 8, the Spark distribution on which I work has specifically targeted improving the authoring experience.

Some of the authoring experience improvements in Drupal 8 that are (in part) thanks to Spark:

  1. two-column backend content editing (with publishing options/meta configuration in a sidebar)
  2. in-place editing for fields
  3. CKEditor-powered WYSIWYG editing

The first is noncontroversial when looking at it from a structured content perspective. It’s the second and third that appear to be counter to the premise of structured content — to quote Karen McGrane about WYSIWYG editing:

[…] we allow content creators to embed layout and styling information directly into their content. Unfortunately, the code added by content creators can be at odds with the style sheet, and it’s difficult for developers to parse what’s style and what’s substance. When it comes time to put that content on other platforms, we wind up with a muddled mess.

or Jeff Eaton about in-place editing:

The editing interfaces we offer to users send them important messages, whether we intend it or not. They are affordances, like knobs on doors and buttons on telephones. If the primary editing interface we present is also the visual design seen by site visitors, we are saying: “This page is what you manage! The things you see on it are the true form of your content.”

First, let me state that I in fact do not disagree with either of them. We’ve actually taken that into account while adding WYSIWYG editing and in-place editing to Drupal core. Let me explain how.

WYSIWYG in Drupal 8: enforces clean markup

By default (in the Standard install profile), Drupal 8 will not ship with formatting/layout tools enabled in its WYSIWYG editor (CKEditor).

In Drupal 8, we make sure to prevent crappy markup as well as formatting/layout markup (style and font attributes). It’s not only impossible to set these kinds of “bad attributes” using the WYSIWYG editor’s toolbar, it’s also impossible to paste them in or to insert them using “source mode” (where you can type HTML directly). You can type them in source mode, but they will be stripped upon switching back from source mode to WYSIWYG mode, or upon save if you save without switching back.
This is powered by the new “Advanced Content Filter” feature in CKEditor 4.1, which was added specifically on our request to make this possible.
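The effect of this strictness can be illustrated with a small allowlist filter. This is a minimal Python sketch of the general technique, not CKEditor’s Advanced Content Filter nor Drupal’s filter code; the ALLOWED map and the function names are hypothetical. Tags outside the allowlist are dropped while their text is kept, and disallowed attributes such as style or onclick are stripped. It illustrates the principle and is not a hardened security sanitizer.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlist: tag -> attributes permitted on that tag.
ALLOWED = {
    "p": set(), "strong": set(), "em": set(), "ul": set(), "ol": set(), "li": set(),
    "a": {"href"}, "blockquote": set(), "h2": set(), "h3": set(),
}


class AllowlistFilter(HTMLParser):
    """Keep allowed tags and attributes; drop other tags but keep their text."""

    def __init__(self, allowed: dict[str, set[str]]):
        super().__init__(convert_charrefs=True)
        self.allowed = allowed
        self.out: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag not in self.allowed:
            return  # e.g. <font> or <span>: the tag goes, its text stays
        kept = "".join(
            f' {name}="{escape(value or "", quote=True)}"'
            for name, value in attrs
            if name in self.allowed[tag]  # style, onclick, ... are dropped
        )
        self.out.append(f"<{tag}{kept}>")

    def handle_endtag(self, tag):
        if tag in self.allowed:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def clean(html: str, allowed: dict[str, set[str]] = ALLOWED) -> str:
    parser = AllowlistFilter(allowed)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


dirty = ('<p style="color: red">Hi <font face="Comic Sans MS">there</font>, '
         '<a href="/about" onclick="evil()">about us</a></p>')
print(clean(dirty))  # <p>Hi there, <a href="/about">about us</a></p>
```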

Furthermore, we made it very easy to configure CKEditor in Drupal 8, yet at the same time very hard to break the above strictness. Only the HTML tags and attributes that the enabled CKEditor toolbar buttons need will be allowed, even if you add more buttons. So the above “guaranteed clean HTML” will be true not only for the default WYSIWYG configuration, but for any configuration. Drupal 8 will even automatically sync the WYSIWYG configuration with the filter system configuration.

In the past, configuring WYSIWYG editors was a pain, and in part because of that, the configuration of the WYSIWYG editor and corresponding filter system settings were too permissive.
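The syncing idea can be pictured as follows: every toolbar button declares the HTML it can produce, and the allowed tags are the union of what the enabled buttons need, so adding a button widens the allowlist by exactly that button’s markup and nothing more. A minimal Python sketch; BUTTON_HTML and both functions are hypothetical, and the output string only imitates the space-separated style of an “allowed HTML tags” setting.

```python
# Hypothetical mapping: the HTML each toolbar button is able to produce.
BUTTON_HTML: dict[str, dict[str, set[str]]] = {
    "Bold": {"strong": set()},
    "Italic": {"em": set()},
    "Link": {"a": {"href"}},
    "BulletedList": {"ul": set(), "li": set()},
    "Blockquote": {"blockquote": set()},
}


def allowed_html(toolbar: list[str]) -> dict[str, set[str]]:
    """Union of the tags/attributes needed by the enabled buttons (plus <p>)."""
    allowed: dict[str, set[str]] = {"p": set()}  # this sketch always allows paragraphs
    for button in toolbar:
        for tag, attrs in BUTTON_HTML[button].items():
            allowed.setdefault(tag, set()).update(attrs)
    return allowed


def to_filter_setting(allowed: dict[str, set[str]]) -> str:
    """Serialize as an allowed-tags string such as '<a href> <p>'."""
    return " ".join(
        "<" + " ".join([tag, *sorted(allowed[tag])]) + ">" for tag in sorted(allowed)
    )


print(to_filter_setting(allowed_html(["Bold", "Link"])))
# <a href> <p> <strong>
print(to_filter_setting(allowed_html(["Bold", "Link", "Blockquote"])))
# <a href> <blockquote> <p> <strong>
```

Deriving the filter setting from the toolbar, rather than asking the site builder to keep two lists in sync by hand, is what keeps the configuration from drifting towards “too permissive”.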

Finally, we’re currently working on making sure that inserting an image into a piece of text (with or without a WYSIWYG editor) won’t result in final HTML like <img src="/files/styles/thumbnail/llama.jpg" width="100" height="100" alt="Awesome llama!" />, but instead in a placeholder that the filter system will transform into the final HTML upon output: <img data-file-uuid="aa657593-0da9-42c0-9a05-5d63d27ad27d" data-image-style="thumbnail" />.
In other words: the text should only contain text and programmatic references to other content; the filter system should then handle “upcasting” these into their final form. This will make it much, much easier to upgrade existing content to new image styles, to modify referenced media, to migrate to a new CDN, and whatnot.
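A minimal Python sketch of such upcasting, assuming hypothetical FILES and IMAGE_STYLES lookups in place of Drupal’s file storage and image style configuration, and a placeholder with exactly the two attributes shown above. A real filter would parse the markup instead of relying on a regular expression and a fixed attribute order.

```python
import re
from html import escape

# Hypothetical stand-ins for Drupal's file storage and image style configuration.
FILES = {"aa657593-0da9-42c0-9a05-5d63d27ad27d": {"path": "llama.jpg", "alt": "Awesome llama!"}}
IMAGE_STYLES = {"thumbnail": {"width": 100, "height": 100}}

# Matches the stored placeholder; assumes this exact attribute order.
PLACEHOLDER = re.compile(
    r'<img data-file-uuid="(?P<uuid>[0-9a-f-]+)" data-image-style="(?P<style>[a-z_]+)"\s*/>'
)


def upcast(match: re.Match) -> str:
    """Resolve one placeholder into final <img> HTML at output time."""
    record = FILES[match["uuid"]]
    style = IMAGE_STYLES[match["style"]]
    src = f"/files/styles/{match['style']}/{record['path']}"
    return (
        f'<img src="{escape(src)}" width="{style["width"]}" '
        f'height="{style["height"]}" alt="{escape(record["alt"])}" />'
    )


def filter_images(text: str) -> str:
    return PLACEHOLDER.sub(upcast, text)


stored = ('<p><img data-file-uuid="aa657593-0da9-42c0-9a05-5d63d27ad27d" '
          'data-image-style="thumbnail" /></p>')
print(filter_images(stored))
# <p><img src="/files/styles/thumbnail/llama.jpg" width="100" height="100" alt="Awesome llama!" /></p>
```

Because the stored text only contains the UUID and the style name, changing the thumbnail dimensions (or the path prefix, e.g. for a CDN) changes every rendered image without touching any stored content.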

WYSIWYG in Drupal 8: from brochureware to newspapers

Drupal needs to cater to both the extreme of very structured content for maximal reuse and to the extreme of unstructured content (where pretty much all data is in a single “blob” called the “body” field, besides maybe a “title” and a “tags” field). It also needs to deal with everything in between.

Drupal may be used for news sites, but also for brochureware sites. By having the WYSIWYG editor be configurable, and hence letting the site builder choose whether formatting/layout tools are available or not, we empower the user to choose.

WYSIWYG in Drupal 8: previews are evil? WYSIWYM to the rescue?

A WYSIWYG editor by definition provides a preview — a best effort preview, that is not guaranteed to be accurate. Providing a preview is not a problem in and of itself, as long as the author knows and understands that the content will be used in multiple contexts, where it will look different.

Of course, reality is that not every author will be sufficiently educated, so we have to take potential abuse into account. Drupal’s filter system and very strict WYSIWYG editing in Drupal 8 do precisely that.
What might be even better, though, is to make it visually obvious that the WYSIWYG editor is indeed providing a best-effort preview: visualize the building blocks of the content that the author is using, to make him very aware of the structure of the content that he’s creating.

This is what some people have called WYSIWYM: “What You See Is What You Mean”.[3] Wikipedia defines it as follows:

WYSIWYM (an acronym for “what you see is what you mean”) is a paradigm for editing a structured document. It is an adjunct to the better-known WYSIWYG (what you see is what you get) paradigm, which displays a formatted document on screen as it will appear in only one mode of presentation.

The main advantage of this system is the total separation of presentation and content: users can structure and write the document once, rather than repeatedly altering it for each mode of presentation, which is left to the export system.

An HTML editor built specifically to be a WYSIWYM editor exists: WYMeditor.

WYMeditor’s main concept is to leave details of the document’s visual layout, and to concentrate on its structure and meaning, while trying to give the user as much comfort as possible (at least as WYSIWYG editors).

  • You may have tried a full-featured WYSIWYG editor, but you apprehend that your clients use it inappropriately, with the risk it degenerates visually and on the code quality.
  • You may also have tried the BBcode syntax, Markdown or the wiki-style syntax, but you don’t want to force your clients to solutions that are too technical/complex for them, even if it tends to generate good quality code.

The downside of WYMeditor (besides its utilitarian UI and absence of keyboard accessibility) is that it doesn’t support the whole range of websites that Drupal needs to support: some people want to do everything in a WYSIWYG editor, and for the simplest websites, that’s acceptable. Drupal tries to impose as few choices as possible.

So, ideally, we’d use CKEditor, with a way to turn on a “WYSIWYM mode”. The great news: this already exists to a certain extent in the form of its “Show Blocks” plugin! (Which we’re already shipping with Drupal core specifically to accommodate this.)

If we find this an acceptable solution, then all we need to do is improve CKEditor’s “Show Blocks” plugin!

Of course, this line of reasoning might come across as a superficial solution that isn’t a real solution. But let me demonstrate that the core of this pattern has been used for almost 20 years in the LaTeX world.

WYSIWYM & LaTeX: LyX

I’m sure many of you know LaTeX. It’s a “document markup language and document preparation system”. It’s typically used for writing papers, but also books.[4]

LaTeX is based on the philosophy that authors should be able to focus on the content of what they are writing without being distracted by its visual presentation. In preparing a LaTeX document, the author specifies the logical structure using familiar concepts such as chapter, section, table, figure, etc., and lets the LaTeX system worry about the presentation of these structures. It therefore encourages the separation of layout from content while still allowing manual typesetting adjustments where needed.

That really captures the gist of it: authors focus on content and don’t think about visual presentation; that’s up to “the system” to figure out. Now, here too, it is the domain markup, and complete knowledge of it, that is problematic: the plethora of LaTeX commands.

That’s why tools like LyX exist. LyX is essentially an easier-to-use interface to generate LaTeX. It (mostly) shields the user from the rather complex LaTeX markup. It provides a preview of sorts, but one that clearly looks completely different from the end result that LaTeX’s typesetting will generate: LyX encourages writing based on structure (WYSIWYM) rather than appearance (WYSIWYG).

If all of the above sounded rather abstract, let’s look at an example:

  • Writing LaTeX: here’s a tiny subset of the LaTeX code (see the attached file for more). In inline formulas it looks like this:
    \begin_inset Formula $\lim_{x\rightarrow\infty}f(x)$
    \end_inset
  • Writing LaTeX in LyX:
  • The output for both:

LyX’s initial release was in 1995. It’s still actively used. Many, many papers as well as many books have been written with it.

But… WYSIWYG editors suck!

Sure, WYSIWYG editors sucked… because they allowed for formatting & layout, which Drupal 8’s WYSIWYG editing doesn’t allow.

We still have work to do to stress the importance of content structure over content presentation — see the WYSIWYM section above. But that can be bolted on top of the solid foundations that we already have.

So, these wonderfully colorful quotes used to be painfully true, but they’re not applicable to Drupal 8’s WYSIWYG:

WYSIWYG Editors suck because they promote thinking about style rather than content. While content editors are busy changing headings to Comic Sans, pondering the use of a grimacing smiley on their about us page or getting creative with colour, they are not considering the actual copy they are adding to the site.

WYSIWYG Editors suck because as a designer you lose control over big chunks of the design. Anywhere that allows people to enter HTML via an editor allows them to get as creative as they like, using any mark-up that they like. Unless you carefully go through and remove all the creativity that stuff is going to stay there. For developers, even if you switch off most of the buttons, just allowing the administrator to enter simple formatting and links, you still have a situation where a user is entering HTML which you then display on the website. This can enable all kinds of stuff to get into your content, which is then very hard to remove and fundamentally tied to the current design of the site.

In-place editing

In-place editing does not inherently conflict with structured content. In fact, for most things, Drupal’s implementation of in-place editing stresses the fact that the content is structured: most structured data is impossible to edit in the same way as it is presented. Only for textual fields do we offer the überfancy “true WYSIWYG in-place editing” capability, where Jeff Eaton’s quote from above is most relevant. Even there, though, abuse is prevented by the very restrictively configured WYSIWYG editor. For other fields, like taxonomy terms, image fields, boolean fields and so on, we still offer a form-based editing UI while editing in-place, and the danger of letting content presentation prevail is extremely limited.

To a degree, in-place editing can even be useful in increasing awareness of the need for structured content. If the content isn’t structured (i.e. one blob of data, for example a “body” field containing all content besides the title), then that becomes immediately and painfully obvious: no specialized, optimized in-place editors appear to edit the particular piece of content; instead you’d have to find your way to the particular thing you want to edit in the body field.
In-place editing in the way we’ve implemented it encourages structured content.

In our initial implementation of in-place editing, there was more potential for misunderstanding and abuse. But we’ve made two important changes:

  1. in-place editing is no longer triggered on the page level, but at the entity level: the user must declare his intent to edit a specific entity in-place. So the user can no longer get the impression he’s “editing the page”: he’s explicitly made aware of the type of content (entity type) he’s editing (node, taxonomy term, custom block…) and of the field within that piece of content (entity) that he’s currently editing (Title, Author, Body, Tag, Image…).
  2. in-place editing no longer saves each field individually; instead, the modified fields for a specific entity are queued up and saved at once. This strengthens the communication to the user that he’s editing a single piece of content that just happens to be rendered on this particular page. (In progress.)
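The queued-saving behavior from the second point can be sketched as follows. This is a hypothetical Python model (Entity and QuickEditSession are not Drupal classes): edits are collected per entity and committed together as one save, modeled here as a single revision increment.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    entity_type: str
    entity_id: int
    fields: dict
    revision: int = 1


@dataclass
class QuickEditSession:
    """Hypothetical: collect in-place edits for ONE entity and save them together."""
    entity: Entity
    pending: dict = field(default_factory=dict)

    def edit(self, field_name: str, value) -> None:
        if field_name not in self.entity.fields:
            raise KeyError(f"{self.entity.entity_type} has no field {field_name!r}")
        self.pending[field_name] = value  # queued, not saved yet

    def save(self) -> Entity:
        # One save (one new revision) for all modified fields, not one per field.
        if self.pending:
            self.entity.fields.update(self.pending)
            self.entity.revision += 1
            self.pending.clear()
        return self.entity


node = Entity("node", 1, {"title": "Old title", "body": "Old body", "tags": []})
session = QuickEditSession(node)
session.edit("title", "New title")
session.edit("body", "New body")
saved = session.save()
print(saved.revision, saved.fields["title"])  # 2 New title
```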

Finally, in-place editing is only designed to be used for quick edits (hence it being triggered by a “Quick edit” action in the contextual links of entities). It’s intended to bring a level of “delightful interaction” to editing, instead of being forced to go back to the overwhelming back-end form every single time, even if you don’t need to modify metadata.

Education, understanding, awareness of content reuse

It is absolutely essential that authors (content creators) understand the entire flow of the content: from creating it first, using each field for its proper purpose, to the different ways that content might end up in output.

Because in-place editing happens on the output, and output can happen in many ways, in-place editing never allows all the content to be edited: at the very least it is going to be impossible to edit metadata. From that last perspective, it’s definitely possible for an author to abuse in-place editing.

We need to provide omnipresent, explicit awareness whenever an author is creating or editing content. Both when editing on the back-end and on the front-end. Low-fidelity, simultaneous previews of the different view modes and preferably on multiple form factors would be the ideal here.

Embedding this explicit awareness is something we still have to achieve for Drupal.[5]

Data storage in NPR’s COPE

We saw NPR’s UI earlier in this article. What we haven’t seen yet are two fundamentally different ways of storing the data within what is presented as a single field to the end user:

  1. Each paragraph of a single text field is stored as a distinct database record. This also implies that the position of the paragraph needs to be stored. (See the full diagram for details.)
  2. When saving a paragraph, all HTML markup it contains is stored independently: it stores just the text in one database record, and then there is one database record per HTML tag used within that paragraph, which stores the type of tag, the start and end position of that tag within the text, and the attributes for that tag. They call this Markup Addressing.

In essence: extreme database normalization!
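To make Markup Addressing more tangible, here is a minimal Python sketch of the idea as described above, assuming a single, well-nested paragraph without void elements: the paragraph’s text is kept once, and every tag becomes a separate record with its type, start and end offsets, and attributes. MarkupRecord and address() are hypothetical names; this does not reproduce NPR’s actual schema.

```python
from dataclasses import dataclass
from html.parser import HTMLParser


@dataclass
class MarkupRecord:
    tag: str
    start: int   # offset in the plain text where the markup begins
    end: int     # offset where it ends (exclusive)
    attrs: dict


class MarkupAddresser(HTMLParser):
    """Split one paragraph of HTML into plain text plus positional markup records."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.text = ""
        self.open_tags: list[tuple[str, int, dict]] = []
        self.records: list[MarkupRecord] = []

    def handle_starttag(self, tag, attrs):
        self.open_tags.append((tag, len(self.text), dict(attrs)))

    def handle_endtag(self, tag):
        # Assumes well-nested markup with no void elements such as <br>.
        name, start, attrs = self.open_tags.pop()
        self.records.append(MarkupRecord(name, start, len(self.text), attrs))

    def handle_data(self, data):
        self.text += data


def address(paragraph_html: str) -> tuple[str, list[MarkupRecord]]:
    parser = MarkupAddresser()
    parser.feed(paragraph_html)
    parser.close()
    return parser.text, parser.records


text, records = address('Read <b>this</b> and <a href="/more">that</a>.')
print(text)  # Read this and that.
for record in records:
    print(record.tag, record.start, record.end, record.attrs, repr(text[record.start:record.end]))
```

A platform that cannot use HTML can take the plain text as-is, while an HTML renderer re-applies the records; that is the trade NPR made in exchange for the extra storage overhead.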

Drupal does not yet support this out of the box. The question is whether this is actually necessary. There’s a lot of additional overhead in normalizing data this far. What is the use case for storing individual paragraphs in separate database records, when many paragraphs are meaningless without the surrounding paragraphs?

The use case for storing the markup separately from the text it was applied to is clearer: to easily accommodate platforms that don’t use HTML markup, and to support changes in markup more easily (e.g. <b> → <strong>). NPR decided against the alternative: storing the markup in the database and filtering (stripping/transforming) it on the way out.
The main gripe Daniel Jacobson had with “filter on output” is based on how he’d seen it implemented before: hard-to-maintain scripts, and most systems allowing all markup to be used. However, Drupal already has a mature system to deal with that: its filter system.
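For contrast, here is the “filter on output” approach in miniature: the stored markup keeps its <b> and <i>, and a filter maps them to <strong> and <em> only when rendering. A minimal Python sketch with a hypothetical TAG_RENAMES table; a regular expression is only adequate for already clean, filtered markup like this example, and Drupal’s own filters are PHP plugins, so this merely illustrates the principle.

```python
import re

# Hypothetical output-filter table: stored tag name -> tag name emitted today.
TAG_RENAMES = {"b": "strong", "i": "em"}

# Opening or closing tag: <b>, </b>, <i class="x">, <br />, ...
TAG = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)(\s[^>]*)?>")


def rename_tags(stored_html: str) -> str:
    """Filter on output: the stored text is untouched, only the rendered copy changes."""
    def replace(m: re.Match) -> str:
        slash, name, rest = m.group(1), m.group(2), m.group(3) or ""
        return f"<{slash}{TAG_RENAMES.get(name.lower(), name)}{rest}>"
    return TAG.sub(replace, stored_html)


stored = "<p>This is <b>important</b> and <i>stressed</i>.</p>"
print(rename_tags(stored))  # <p>This is <strong>important</strong> and <em>stressed</em>.</p>
```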

Both architectures have downsides. Neither is clearly superior.[6] Time will tell whether Drupal’s data storage approach needs to evolve.

Conclusion

WYSIWYG and in-place editing can clearly be highly problematic when implemented the way they have been on many websites for about a decade now. On many websites, they have been (ab)used to the extreme of entire HTML pages being built in a WYSIWYG editor, which has caused consistent inconsistency and an utter lack of reuse. Liked by authors at first, until things went bad, or until the next redesign.

The other extreme is a system like NPR’s COPE, where it is guaranteed that content is consistent and reusable. At the cost of the authoring experience.

However, I believe that using WYSIWYG editing in a very disciplined manner, combined with a well-defined system for filtering on output and a data model similar to NPR’s COPE, can yield results as successful as NPR’s COPE, but with a significantly better authoring experience.

Sources & related reading


  1. Both examples are content businesses. The efficient managing and reusing of that content is the whole reason they exist and survive. Hence it is acceptable for them to have a very poor authoring experience. Also: the data model has to be right from the beginning; if something was missing or wrong, it may be impossible to transform old content to the updated data model. Hence there is also an intentional lack of flexibility.

  2. Use the Entity View Modes module to create new view modes.

  3. Not in the sense that it was discussed at the WYSIWYM BoF at DrupalCon Portland, where it was really about semantic annotation.

  4. The whole reason it exists is that somebody got fed up with messing with WYSIWYG editors to get everything just right: the typography, the whitespace, the layout, and so on. Instead, that person wanted to just write the content and have software automatically calculate optimal whitespace and optimal typesetting.

  5. The Spark team has already been working on this to a certain extent: the responsive previews patch. However, it is not tightly integrated with editing, neither on the back-end nor on the front-end.

  6. Ideally, there would be a domain-specific markup (as in, a markup with annotations for the specific knowledge domain of your site) that has more expressive semantics and would then be transformed to HTML when the content gets rendered for web purposes, and to something other than HTML for other purposes. We should explore this.
    But at the same time, the threshold would become rather high: which sites, besides those whose primary business is the longevity of their content, the long-term relevance and reusability of their content, will want to invest to build their domain-specific language?
    It requires a lot of discipline and research to come up with a sufficiently expressive domain-specific markup, precisely because once you’ve begun expressing content using your domain-specific markup, there is no way back. You cannot automatically enrich existing content with newly added domain-specific markup. The domain-specific markup must be complete before you begin using it.
    Not to mention that the author will need a complete understanding of the domain-specific markup as well, because otherwise it will all have been for nothing. Once you enter this realm, it’s also very realistic (and human) for authors to forget about a few elements of the domain-specific markup. So something like a WYSIWYG editor with buttons that generate the domain-specific markup could be a great help. This is once again WYSIWYM.

Well, I’ve been called out so I’d best weigh in. ;-)

I won’t bang my drum about inline editing beyond what I’ve already written, but the “body field” WYSIWYG issue is a related one that I’ve been digging into a lot lately.

First off, I think the approach you’re describing is a huge improvement over the “kitchen sink” approach that is often used when exposing WYSIWYG editing functionality. A lot of the pain and suffering inherent in WYSIWYG can be reduced by stripping out the egregiously presentation-oriented markup features like colors and fonts, and taming the dreaded “paste from Word” feature.

We’ve found (consistently) that the “standard” markup elements like em, strong, blockquote, h1-h6, img, ul, li, and even table are pretty straightforward. As long as people aren’t abusing the tables for layout purposes inside the body, and Drupal fields are being used to manage appropriately “chunked” data, we’re in pretty solid shape. Sufficiently creative output filtering and CSS can adapt that markup to responsive sites and alternative output channels quite effectively. I like to think of this aspect of the issue as the “formatting” problem.

The challenges come when richer semantic concepts enter the picture: captioned figures, document transclusion, interactive elements like header-oriented collapsible text, inline charts and graphs, etc. Those things almost never correspond to a single simple HTML element, and the underlying semantic meaning may need to be represented with different markup depending on the output channel. I’ve seen other writers refer to this as the “upstream meaning, downstream markup” divide.

The formatting problem is (IMO) completely solvable using the kinds of techniques you’re discussing. The meaning/complex structure problem requires some different approaches, and in many cases the toughest aspects will have to be site-specific. The problem isn’t just in visual vs. markup representation, it’s the mismatch between HTML’s vocabulary and the concepts that need to be expressed.

We’re actually in the process of generalizing a couple of the tools we’ve used on previous projects, and working on ways to smooth the implementation curve for the site/business specific pieces that inevitably arise.

I’ll reiterate Karen’s comments from the DrupalCon keynote – I have no objections to assistive editors, even ones called ‘WYSIWYG!’ ;-) A toolbar, visual cues inside the text area instead of “ugly markup,” buttons and tools that make the editing process easier… all of these are really important parts of improving the editorial experience. The challenge is figuring out how to do this without recreating the long-term markup reuse problems that have plagued other systems.

Thanks for the hard work in articulating this stuff, and the great UX and development work that’s been going into the D8 editing interface!

By now, we’ve talked about this in real life, so I think we’re indeed on the same page :)

It’s interesting and very insightful of you to split the problem into formatting (which is solvable and arguably solved in Drupal 8) and meaning/complex structure. The latter is indeed much, much harder. It would be elegantly solvable using domain-specific markup, but unfortunately the technical/financial/educational setup cost of that would be too high for many sites.

I also like the specific challenges you call out: > The challenges come when richer semantic concepts enter the picture: captioned figures, document transclusion, interactive elements like header-oriented collapsible text, inline charts and graphs, etc. Those things almost never correspond to a single simple HTML element, and the underlying semantic meaning may need to be represented with different markup depending on the output channel.

I believe we have solved captioned figures by shipping the Caption filter with Drupal 8 (which equates to “custom markup”: data-caption and data-align attributes) and made using that to caption images — the most common use case — usable for all thanks to a CKEditor Widgets-powered UX. And AFAICT you agree with those claims — let me know if I’m mistaken there :)
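Roughly, such a caption filter turns the data-caption and data-align attributes into figure markup at output time. A minimal Python sketch, assuming well-formed (XHTML-like) stored markup so it can be parsed with xml.etree; the figure/figcaption output and the class names are hypothetical choices, not necessarily what Drupal 8’s Caption filter emits.

```python
import xml.etree.ElementTree as ET


def caption_filter(fragment: str) -> str:
    """Turn <img data-caption=... data-align=...> into a captioned <figure>.

    Assumes the stored fragment is well-formed XML-ish HTML (self-closed <img />).
    """
    root = ET.fromstring(f"<div>{fragment}</div>")
    for parent in list(root.iter()):
        for index, img in enumerate(list(parent)):
            if img.tag != "img" or "data-caption" not in img.attrib:
                continue
            caption = img.attrib.pop("data-caption")
            align = img.attrib.pop("data-align", None)
            css_class = "caption" + (f" align-{align}" if align else "")
            figure = ET.Element("figure", {"class": css_class})
            figure.tail, img.tail = img.tail, None  # keep text that followed the image
            figure.append(img)
            ET.SubElement(figure, "figcaption").text = caption
            parent[index] = figure  # replace the <img> in place
    # Serialize the children of the wrapper <div> (tostring includes each tail).
    return (root.text or "") + "".join(ET.tostring(child, encoding="unicode") for child in root)


stored = ('<p>Meet our mascot.</p>'
          '<img src="llama.jpg" alt="Llama" data-caption="A llama" data-align="center" />')
print(caption_filter(stored))
```

Since the stored markup is a plain <img>, removing the filter degrades gracefully: browsers ignore the data-* attributes and still show the image.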

I believe that in the case of document transclusion, custom markup (e.g. <drupal:entity type="node" id="345"></drupal:entity>) + filter + assistive “WYSIWYG” editing UX is once again the solution. Drupal core could and should provide a built-in solution for that.
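A minimal Python sketch of such a transclusion filter, assuming a hypothetical ENTITIES lookup and a hypothetical teaser rendering; the <drupal:entity> syntax is just the example from above, not an existing Drupal core feature. References to missing entities render as nothing in this sketch; a real implementation would need a deliberate policy for that.

```python
import re
from html import escape

# Hypothetical entity storage; in Drupal this would be an entity load.
ENTITIES = {("node", 345): {"title": "Llama care guide", "url": "/node/345"}}

EMBED = re.compile(r'<drupal:entity type="(?P<type>[a-z_]+)" id="(?P<id>\d+)"></drupal:entity>')


def render_teaser(entity: dict) -> str:
    """Hypothetical 'teaser' rendering of a referenced entity."""
    return f'<aside class="embedded"><a href="{escape(entity["url"])}">{escape(entity["title"])}</a></aside>'


def transclude(text: str) -> str:
    def replace(m: re.Match) -> str:
        entity = ENTITIES.get((m["type"], int(m["id"])))
        return render_teaser(entity) if entity else ""  # missing reference: render nothing
    return EMBED.sub(replace, text)


print(transclude('<p>See also:</p><drupal:entity type="node" id="345"></drupal:entity>'))
# <p>See also:</p><aside class="embedded"><a href="/node/345">Llama care guide</a></aside>
```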

The others are less clear cut and different sites may want to use different approaches, but I think that in general the “just write a filter to deal with your custom markup” approach is solid, and as long as you only have to implement a handful of them, it should also be manageable.

I think the big challenge there is to come up with a system that no longer requires custom filters to be written for each use case, plus accompanying custom assistive “WYSIWYG” editor plugins to make the UX nice. It is my hope that Drupal 8 contrib will experiment a lot in that area, so that we hopefully will learn enough by the time we work on Drupal 9 to make that a reality :)

Matthew Oliveira

11 years 7 months ago

Great article, really enjoyed it.

I second Eaton’s comment about the limitations of WYSIWYG. When there is a simple mapping from the semantic meaning you as an editor are trying to express to one or more simple HTML elements, you’re golden. As soon as you want something like an image caption, which doesn’t map to a single representation in HTML, suddenly the WYSIWYG editor falls apart.

I was encouraged by Nate’s talk (https://portland2013.drupal.org/node/2878), which talked about solving this by hijacking CKEditor’s default dialogs with something custom for Drupal, e.g. an image insert dialog that has an option for a caption. Not sure how this is implemented, but it would be good to have some intermediate representation of that image caption, something like what WordPress does with its Shortcode API:

[caption id="attachment_120" align="alignleft" width="300"]<a href="http://local.alro.com/wp-content/uploads/2013/05/5953291314_74d8e8b37e_o.jpg"><img src="image.jpg" width="300" height="190" /></a> This is a caption[/caption]

On output, it’s filtered into the HTML markup needed, which can change without the content needing to change.
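A minimal Python sketch of that filter-on-output step for the shortcode above. It is not WordPress’s own (PHP) shortcode handling; the regular expressions assume the simple shape shown (an image, optionally wrapped in a link, followed by caption text), and the figure output is a hypothetical choice. Unrecognized shortcodes are left untouched.

```python
import re
from html import escape

# [caption key="value" ...]BODY[/caption]; non-greedy so several shortcodes can coexist.
SHORTCODE = re.compile(r"\[caption(?P<attrs>[^\]]*)\](?P<body>.*?)\[/caption\]", re.S)
ATTR = re.compile(r'(\w+)="([^"]*)"')
# The body starts with an <img>, optionally wrapped in a link; the rest is the caption.
MEDIA_THEN_TEXT = re.compile(r"\s*((?:<a [^>]*>)?<img [^>]*>(?:</a>)?)\s*(.*)", re.S)


def expand_caption(m: re.Match) -> str:
    attrs = dict(ATTR.findall(m["attrs"]))
    parts = MEDIA_THEN_TEXT.match(m["body"])
    if parts is None:
        return m.group(0)  # unrecognized shortcode: leave it untouched
    media, caption = parts.group(1), parts.group(2).strip()
    align = escape(attrs.get("align", "alignnone"))
    return f'<figure class="{align}">{media}<figcaption>{escape(caption)}</figcaption></figure>'


def render_shortcodes(stored: str) -> str:
    return SHORTCODE.sub(expand_caption, stored)


stored = ('[caption id="attachment_120" align="alignleft" width="300"]'
          '<a href="http://local.alro.com/wp-content/uploads/2013/05/5953291314_74d8e8b37e_o.jpg">'
          '<img src="image.jpg" width="300" height="190" /></a> This is a caption[/caption]')
print(render_shortcodes(stored))
```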

Yep, that shortcode-style approach was the starting point of the mechanism we’re leaning on now. My only concern is that it’s essentially inventing a parallel markup format inside of the custom markup format.

The approach taken for image captioning in Drupal 8 (overloading the standard HTML element with data-* attributes) feels like a much more flexible system that could be used in other, similar situations.

Glad you enjoyed it, Matthew! :)

I share your concerns, but this is in fact a solved problem by now — when I wrote the article, that was still a work in progress, but by now it has landed. From my reply above to Jeff Eaton:

> I believe we have solved captioned figures by shipping the Caption filter with Drupal 8 (which equates to “custom markup”: data-caption and data-align attributes) and made using that to caption images — the most common use case — usable for all thanks to a CKEditor Widgets-powered UX.

That implements the spirit of what you were suggesting, but using a different method, for a reason that Jeff Eaton already pointed out in his reply to your comment: > My only concern is that it’s essentially inventing a parallel markup format inside of the custom markup format.

Exactly! That’s highly problematic. It makes it unnecessarily difficult to manage, maintain, massage and transform that content.

The advantages of data-* attributes in comparison are numerous:

  • it’s simply HTML: parsing & transforming can be implemented much more robustly
  • much more extensible¹
  • graceful degradation when the output filters are missing

  1. i.e. you can add not only a data-caption attribute, but also data-source and data-license attributes to an <img>, which would translate into a crazy nested syntax in the [caption …]<img>[/caption] example.
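Because the data-* approach is plain HTML, a standard HTML parser does the heavy lifting. The following Python sketch rewrites <img data-caption data-align> into a figure. It leaves other data-* attributes such as data-license in place, and it passes images without data-caption through untouched, which is the graceful degradation mentioned above. The names are hypothetical, and the <figure class="align-…"> output is an assumption, not the exact markup Drupal’s Caption filter emits.

```python
import html
from html.parser import HTMLParser


class DataCaptionFilter(HTMLParser):
    """Rewrites <img data-caption="..." data-align="..."> into a <figure>.

    Illustrative only: the output markup is an assumption, not the exact
    markup produced by Drupal's Caption filter.
    """

    FILTER_ATTRS = ("data-caption", "data-align")

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    @staticmethod
    def _tag(tag, attrs, self_closing=False):
        parts = [tag]
        for name, value in attrs:
            parts.append(name if value is None else f'{name}="{html.escape(value)}"')
        return "<" + " ".join(parts) + (" />" if self_closing else ">")

    def _img(self, attrs, self_closing):
        values = dict(attrs)
        caption = values.get("data-caption")
        if caption is None:
            # No caption: pass the image through unchanged.
            self.out.append(self._tag("img", attrs, self_closing))
            return
        align = values.get("data-align")
        # Keep every other attribute (e.g. data-source, data-license).
        kept = [(n, v) for n, v in attrs if n not in self.FILTER_ATTRS]
        cls = f' class="align-{html.escape(align)}"' if align else ""
        # Assumption: the caption is plain text, so it is escaped.
        self.out.append(
            f"<figure{cls}>{self._tag('img', kept, self_closing)}"
            f"<figcaption>{html.escape(caption, quote=False)}</figcaption></figure>"
        )

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self._img(attrs, self_closing=False)
        else:
            self.out.append(self._tag(tag, attrs))

    def handle_startendtag(self, tag, attrs):
        if tag == "img":
            self._img(attrs, self_closing=True)
        else:
            self.out.append(self._tag(tag, attrs, self_closing=True))

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(html.escape(data, quote=False))

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")


def filter_data_captions(markup):
    parser = DataCaptionFilter()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)
```

Adding support for another attribute means reading one more key from the same attribute list. The input syntax never needs to grow.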

Joseph

11 years 7 months ago

First of all, I found this post encouraging for the future of Drupal.

Second of all, how do you do footnotes on this site? I like it. Is there a module you’re using or something?

Thanks, -Joseph

As others have noted, having clean and tidy and semantic HTML is only part of the picture. That assumes an HTML output. “The Web” contains more than HTML now (weird as that sounds). The REST API project I’m on now has both browser-based and non-browser-based clients. For that reason, we’re not allowing HTML anywhere but instead planning to use Markdown and ship that straight to the client applications to render to the appropriate format locally. Effectively we’re using Markdown in place of a DSL.

Both Wim and Jeff are right that a DSL is the optimal solution, but also the hardest. The trick, though, is that a DSL is simply an inline form of chunked data. Fields are (as Wim correctly points out) awesome for chunked data.

That is, in the ideal case… Drupal is your DSL. :-)

Oh, the siren song of custom XML schemas. If we keep talking, someone’s inevitably going to say that we should use DITA. ;-)

This is really the heart of the problem, though. Semantically structured HTML, managed with care, can be transformed into other forms but we have to plan for it rather than slathering that on after the fact. I’d also argue that certain techniques (like core tags with data-* attributes to layer additional meaning, or custom HTML5 element types) can get us some of the advantages without going whole-hog XML.

While Drupally field chunking is often a good solution, I don’t think we’re ever going to overcome the need for some rich content in text fields. Fields capture the fact that a piece of data is associated with an entity, but not where that piece of data lives in the narrative flow of a larger body of text. When that aspect is actually important, we enter the world of semantic editors. ;-)

Whoa, this is quite an interesting read. I am surprised to learn how many tools and considerations we already took to better support structured data. Ever since CCK, I think Drupal has tried to provide structured content. The fact that it can do this, even somewhat from the UI, has been a big contributor to its success.

I personally don’t think WYSIWYM is an answer to this need — it’s a way to expose the structure that is applied. It highly depends on education whether content creators can add meaning to that structure. I think it’s in the same realms as markup, since it makes the relationship between content and structure more explicit. I think LaTeX has been so successful, because content creators wish to publish their content within a certain system — the presentation is less important to them than conforming to this system. Although this is also true for Drupal, Drupal’s system allows much more freedom in how you express the content.

I think the best solution would be a mix of #6 and more advanced previewing. Currently previewing is largely a “best attempt” because our technology doesn’t come close, but there are many ways it can get a lot closer. What we really want is for content creators to be able to see their content in different contexts; it should be part of their workflow to preview and adjust/optimise. However, there is currently still a disconnect between the places where “chunks” live and your ability to see that through the creation/editing interfaces. I think Drupal’s job would be to keep track of those connections and provide the ability to see different contexts; devices are really just one of them (as you note, there are view modes, Views and even REST).

IPE, in many ways, brings editing and these different contexts a lot closer — in a way contextual links did this too, but I feel like IPE adds another dimension to it.

The normalisation is quite an interesting approach. I always wondered if the approach of the future isn’t more in the realm of machine learning, where (search) tools have a better understanding of the meaning of sentences. Currently this requires a lot of data attributes for, e.g., Wolfram Alpha to work. But the holy grail, from my point of view, is being able to extract meaning not just from phrases/words, but to divide a paragraph and sentence into meaningful parts (objects, prepositions, modifiers etc.) that can be used as chunks elsewhere.

Just philosophising here :)

@eaton It’s good to see which essential parts we are still missing; I hope plugins will be able to capture the more advanced elements. The question is how that maps to the user experience: these advanced elements often come with a heavy set of configuration.

I think LaTeX has been so successful, because content creators wish to publish their content within a certain system — the presentation is less important to them than conforming to this system.

I don’t think that’s true. It is possible to provide additional metadata or even specific instructions in LaTeX markup for the LaTeX processor, to either guide or precisely control the presentation.

It’s a simple fact that most of us don’t have the necessary skills to perfectly align every single symbol to yield an optimal reading experience. I know I don’t. LaTeX takes those worries away and lets you worry about the content.

Just like you could — in Drupal — add a custom <drupal:entity type="node" id="345" /> HTML tag and write a filter to transform that into something useful, you can write custom commands to accommodate your semantic needs in LaTeX:

\newcommand{name}[num]{definition}

The differences are that LaTeX is oriented towards page output (just like HTML 4) and Drupal’s stored HTML is oriented towards stand-alone pieces of content intended for reuse inside and outside of the website. But in theory, I think it’s perfectly plausible to transform every single piece of “filtered text field” content in Drupal into LaTeX or vice versa.

Anyway, enough about LaTeX.


I personally don’t think WYSIWYM is an answer to this need — it’s a way to expose the structure that is applied. It highly depends on education whether content creators can add meaning to that structure. I think it’s in the same realms as markup, since it makes the relationship between content and structure more explicit.

I’m not sure what you mean by adding meaning to structure. In the context of what we’re talking about, they’re the same? You see the structure of your text. You mean to apply a blockquote structure to a piece of text, so that is what you see in WYSIWYM.

You’re right that it’s in the same realm as markup (and having to know mark-up): you have to know the different concepts. But a big difference is that you don’t have to know the syntax anymore. WYSIWYM to me is just about making writing markup a lot easier.

The point is that authors should not think about what the blockquote or heading or paragraph or code sample looks like, but that the thing they’re writing is in fact a blockquote, heading, paragraph or code sample: WYSIWYM — meaning over a make-believe preview.

The cool thing is that we can offer three ways of content creation in Drupal 8:

  1. WYSIWYG (as is implemented today in Drupal 8)
  2. WYSIWYG + WYSIWYM (by simply enabling the “Show Blocks” plugin that ships with Drupal 8) — you can see this in the screenshot in the article.
  3. WYSIWYM (can be implemented in Drupal 8 by overriding the CKEditor stylesheet to something that styles all content in a monospaced font etc.)

However, using WYSIWYM does not mean that we should abandon previews altogether. It’s merely inappropriate (except for brochureware sites) to be editing inside a preview (i.e. WYSIWYG). Previewing in different contexts is indeed very useful.

It’s the WYSIWYG expectation — “what you see while editing is precisely what you’ll get when viewing” — that is problematic. Hence my proposal to make it visually obvious that it’s a best-effort preview of a single channel/context.

znerol

11 years 7 months ago

Thanks for that very interesting writeup. I especially appreciate the look behind the NPR scenes.

On a Drupal 7 newspaper site we allow our editors to put literally anything into articles: interactive maps, tables, figures (including captions), embedded YouTube movies, etc. However, the body field is restricted to a very tight set of HTML tags allowing not much more than structured text. We do not even allow <img> there.

In order to protect the body from markup unrelated to the text and still make it possible to insert fancy stuff, we developed a scheme largely based on the Field Collection and Field Injector modules. Instead of inserting the <img> tag directly into the body field (or having it inserted by some plugin), our editors upload the image into a field collection item along with a caption (the form is, of course, embedded into the node editing form). A simple integer field allows them to choose the paragraph number where the image should be displayed, if the default value is not good enough.
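The paragraph-number placement can be sketched roughly as follows. This is a hypothetical Python illustration, not the Field Collection or Field Injector modules’ code. It makes three assumptions: the Insertion class stands in for a field collection item, the number means "after paragraph N", and paragraphs are located naively by their closing </p> tags. An out-of-range number falls back to the end of the body, which is one way to handle a body that has no <p> at all.

```python
import re
from dataclasses import dataclass


@dataclass
class Insertion:
    """Hypothetical stand-in for a field collection item."""

    markup: str           # e.g. rendered image + caption, or a YouTube embed
    after_paragraph: int  # 1-based paragraph number chosen by the editor


# Naive: finds closing </p> tags and assumes paragraphs are not nested.
PARA_END = re.compile(r"</p\s*>", re.IGNORECASE)


def inject(body: str, insertions: list) -> str:
    """Render the body with each item placed after its chosen paragraph."""
    ends = [m.end() for m in PARA_END.finditer(body)]
    by_offset = {}
    for item in insertions:
        n = item.after_paragraph
        # Out-of-range numbers (including bodies with no <p> at all) fall back to the end.
        offset = ends[n - 1] if 1 <= n <= len(ends) else len(body)
        by_offset.setdefault(offset, []).append(item.markup)

    pieces, last = [], 0
    for offset in sorted(by_offset):
        pieces.append(body[last:offset])
        pieces.extend(by_offset[offset])
        last = offset
    pieces.append(body[last:])
    return "".join(pieces)
```

The stored body never changes. Only the rendered output interleaves the items.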

We use the same mechanism to support non-restricted HTML content for special cases. The body text remains the same, even when editors insist on an embedded YouTube movie in the middle of the text.

Because field collection items are entities, we can have bulk operations-enabled administrative Views for them. And when that’s not enough, there is still EFQ Entity API.

This mix works out pretty well for us.

Very interesting — thanks for sharing! :)

What you describe is indeed another way to achieve this. However, it seems more restrictive and more brittle to me at first sight: what if there is no paragraph, but only a <blockquote> and a <ul>, for example? Sure, you can accommodate those cases, but it’s easy to think of such edge cases.

Furthermore, that does not solve the case of wanting to “inject” things inline (e.g. a link to a node whose title is automatically updated when the node title changes). For such cases, you still need filtering on output.
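Such an output filter could look roughly like the following Python sketch. It reuses the hypothetical <drupal:entity type="node" id="345" /> tag mentioned elsewhere in this discussion. The titles dictionary is an assumed stand-in for loading the entity at render time, and the /node/345 link path is also an assumption. The title is looked up on every render, so the link text follows title changes without the stored body ever being edited.

```python
import html
import re

# Hypothetical custom tag from the discussion: <drupal:entity type="node" id="345" />
ENTITY_TAG = re.compile(r'<drupal:entity\s+type="(?P<type>\w+)"\s+id="(?P<id>\d+)"\s*/>')


def filter_entity_links(text: str, titles: dict) -> str:
    """Output filter: replace entity tags with links carrying the current title.

    `titles` maps (type, id) -> title and is a hypothetical stand-in for
    loading each referenced entity at render time.
    """

    def replace(match):
        key = (match.group("type"), int(match.group("id")))
        title = titles.get(key)
        if title is None:
            return ""  # referenced entity no longer exists: drop the tag
        href = f"/{key[0]}/{key[1]}"
        return f'<a href="{html.escape(href)}">{html.escape(title, quote=False)}</a>'

    return ENTITY_TAG.sub(replace, text)
```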

I do see a broad range of use cases where this will work just fine though :)

Hey Wim, really nice post and thanks for all your efforts in improving Drupal’s content authoring experience.

Some thoughts:

  • Can you make your comments Markdown enabled? :)
  • What you are doing with images is very similar in its intent with how we ended-up handling rich media assets (audio, video, images etc.) in Public Media Platform. I should be able to share some of that thinking with you soon, in case it’s useful. So: stay tuned.
  • I think that in-place editing, for a content-management system, is evil always and under any circumstances. However you try to dress it up, it’s still an extremely tight coupling of presentation with content, and it promotes editors’ belief that what they see is what everybody else will see – a perception that is increasingly and completely wrong.

Tight coupling of content with presentation is a large topic and an extremely important one. It’s not enough to avoid it at the level of a single content item; it should be a cross-cutting concern.

Here’s the thing: in this day and age our main concern is not just that content destination is diversified (which is what COPE was addressing years ago) but that content’s sources are also highly diversified. For all but the simplest use-cases, it’s smart to embrace the notion that: There is no single CMS anymore!

This notion is so important that we had to revise COPE into CAPE to facilitate it: http://bit.ly/capeapi (the slides have minimal text, but there’s a full narrative in the slide notes).

Bottom-line is: assuming that all content on your website comes from a single CMS is wrong, dead wrong. The reality is that content comes from many sources and it’s because of the “traditional” tight coupling of content editing and content presentation that those sources need to unnecessarily go through the “main CMS”.

I believe that the future is not in a monolithic CMS. Not even one as customizable as Drupal. I believe that the future of content management lies in an elegant collaboration of loosely coupled Content Tools, each one of which exposes some vertical of content via a web API. Each tool can be written in a completely different language/framework and deployed on separate servers.

Content through those APIs can then be run through a web rendering layer to produce HTML. It’s very important to assume that the rendering layer is built in different technology than content tools and deployed separately.

Why?

Two reasons:

  1. Security: your website must be public, but your content management may need to be behind VPN, deployed separately.

  2. We need to stop thinking of the website as something that is allowed to access database systems directly. The web is just one of the many target platforms which we push our content to. Don’t make it special! You would never dream of letting a native iOS app access your databases directly: you’d route it through an API. Do the same for your website as well.
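Here is a minimal Python sketch of the second point, with several assumptions. InMemoryTool is a hypothetical stand-in for separately deployed content services. In practice each one would be an HTTP client to its own web API. The HTML shape is illustrative. The rendering layer only ever calls fetch() and has no database handle, so the website is just another API client, like a native app.

```python
import html
from typing import Protocol


class ContentSource(Protocol):
    """A content tool exposing one vertical of content through an API."""

    def fetch(self, item_id: str) -> dict: ...


class InMemoryTool:
    """Hypothetical stand-in for a remote content service."""

    def __init__(self, items: dict):
        self._items = items

    def fetch(self, item_id: str) -> dict:
        # A real tool would issue an HTTP request here.
        return dict(self._items[item_id])


def render_article_page(article_id: str, articles: ContentSource, videos: ContentSource) -> str:
    """Web rendering layer: composes HTML from API responses only."""
    article = articles.fetch(article_id)
    parts = [
        f"<h1>{html.escape(article['title'])}</h1>",
        f"<p>{html.escape(article['body'])}</p>",
    ]
    for video_id in article.get("related_videos", []):
        video = videos.fetch(video_id)
        parts.append(
            f'<video src="{html.escape(video["url"])}" '
            f'title="{html.escape(video["title"])}"></video>'
        )
    return "\n".join(parts)
```

Swapping InMemoryTool for a client of a real, separately deployed service would not require any change to render_article_page.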

  • These comments are Markdown-enabled! It even says so in Drupal’s helpful (yet terribly crappy UX-wise) filter tips right below the commenting <textarea> ;) :)
  • Glad to hear we’re apparently applying the same reasoning — that’s usually a good sign :) Any news about “I should be able to share some of that thinking with you soon, in case it’s useful. So: stay tuned.”?

This is very interesting:

Bottom-line is: assuming that all content on your website comes from a single CMS is wrong, dead wrong.

And I think it’s indeed increasingly true. But I don’t think it applies to everybody. I’m sure it applies to large organizations such as NPR.

I believe that the future of content management lies in an elegant collaboration of loosely coupled Content Tools, each one of which exposes some vertical of content via a web API. Each tool can be written in a completely different language/framework and deployed on separate servers.

I believe you’re right — but only for big sites. The part I emphasized is what indicates to me that this no longer applies to smaller websites. For smaller websites, what you describe is simply too technically advanced (at least in the foreseeable future). Each “vertical of content” (I assume that means articles versus videos versus …) gets its own tools, you say, but that means many UIs to learn and many systems to connect, manage and scale.

Everything in my article benefits small sites as well as large ones.

I think Drupal will indeed need to become better at becoming “just” a Content Repository, so that it can become a viable component in the architecture you describe. Maybe Drupal should even be split into two parts — then it could very well meet your two requirements/reasons.

But there will always be a need for an integrated system.

Picking up from our Twitter conversation… A great content authoring experience should have the ability to control how text flows around images. WordPress has this ability, and as you say, maybe their tool creates bad markup, to which I say “oh well”. Often there is not a silver bullet in software design, so we have to pick the lesser of two evils. In this case, I think the content author’s experience should win over the themer’s experience because if the author’s experience isn’t great, then there won’t be anything to theme. Also, there are MANY MANY more content authors than themers.

The author doesn’t need font colors, or underline. The theme can take care of that, but they MUST be able to control the flow of content. Authors live in the world of Word, PPT, or similar. Most pick themes to guide their work and won’t change that because they know they’re not designers. But imagine picking a PPT or Keynote theme, and being forced to place images on the left, or right. Like it or not, content authors will compare Drupal’s editing experience to it. You need to get as close as possible.

Jeff,

it’s not the author’s experience vs. the themer’s experience. The only “experience” that matters is that of the reader!

Unfortunately, we see this confusion way too often: developers think they work for authors (mostly because “authors” pay for CMSes… at least when they pay). The reality is: developers, designers, themers and authors all work for the end-user: the reader, listener, watcher. It’s either that or they all fail.

When put in that perspective, priorities do change. Readers couldn’t care less whether the author has the freedom to place images on the right or left; they care about a pleasant and intuitive reading experience.

And: there may be more authors than themers, but there’re millions of times more readers than authors :)

A great content authoring experience should have the ability to control how text flows around images. WordPress has this ability, and as you say, maybe their tool creates bad markup, to which I say oh well. Often there is not a silver bullet in software design, so we have to pick the lesser of two evils.

To be clear, I think a lot of this discussion gets tangled up in the idea of “bad markup,” with old-timers fighting off flashbacks of Adobe PageMill and MS Word HTML output. The concern isn’t simple cases like ordered lists, emphasis, or even inline images – the markup for those things has always been quite straightforward. Rather, it’s the more complex scenarios like fully captioned images, embedded slideshows, inlined media elements, and structurally-intensive “house styles” like citation and design treatment.

The assertion that “We need to get as close as we can to Word and PowerPoint and Keynote” is an assumption that needs to be examined, IMO. It’s no different than someone from 1995 saying that we need an HTML editor that feels like Photoshop, with pixel-perfect drag and drop alignment of every image and text block. Quite a few of those tools were built, in fact, and we learned a lot of painful lessons from them.


Indeed! The idea of “give the user control” vs “not” is, IMO, a false dichotomy. Even if we just consider variable sized screens for the moment (and there are plenty more issues besides that), the question is better phrased:

Do we make the user have to think about the layout anywhere from 2-8 different layout considerations when entering content, or do we automate it so they *don’t* have to be graphic designers just to make a page not look like ass on the latest phone?

Most users really don’t want to have to think about that, I wager. They may think they want all the controls! But they change their mind very quickly. And that’s still just dealing with the visual layout, before we get into questions of content strategy, reuse, Views, multi-channel publishing, and other complications.

“With great power comes great responsibility”, and most content editors, I wager, don’t actually want that responsibility. If they did, they’d be designers, not content editors. :-)

Jeff Noyes

11 years 4 months ago

The reader’s experience is created by the author and the themer. That is to say, the reader’s experience will be poor if the author writes poorly or the content flow is choppy. It will also be poor if themer fails at vertical rhythm, contrast, uses hard-to-read fonts, etc. Drupal can only control reader experience by giving the author and themer the right tools, so I think “the reader’s experience being the most important” is of little relevance here.

In providing the right tools… A themer may have to hold his nose while working with bad WYSIWYG markup, but they know how to maneuver around the cruft, they’re motivated by getting paid, and it’s a set-it-and-forget-it task. In contrast, an author uses this tool to send his readers messages, again, and again, and again. They will only hold their nose if made to. And they do not get paid unless what they’re authoring has an impact. To have an impact, the author needs:

  1. writing skills (we can’t help here)
  2. a content authoring tool that helps them deliver their message
  3. a theme for ensuring their content looks nice / on brand.

I think you guys are overcomplicating my point. I’m not saying we need a WYSIWYG that’s comparable to having Photoshop spit out HTML, or one that makes the author’s experience factor in N layouts across different devices. I’m also not saying we need to recreate the Word or PPT experience in Drupal. All I’m saying is that you have to consider the author’s point of reference — which often will be Word, PPT or similar. And with that, there is a certain bar that will be expected, and that bar has to include the ability to wrap text around images.

FWIW, I just went through this with a client last week. He didn’t know what WYSIWYG meant. He didn’t know that he was authoring markup underneath. All he knew was that he wanted his page to look like X — which was a single column layout, had headers, some bold text, and some images with text wrapping around.

All he knew was that he wanted his page to look like X

How did he want his content to look on a mobile device? How did he want his content to look when somebody shared it on Facebook? How did he want his SERP to appear in Google? How did he want his content to perform on a screen reader?

If your client doesn’t know what a WYSIWYG is, he probably also doesn’t know the importance of COPE/CAPE.

This isn’t really a question of a themer simply working with crufty WYSIWYG markup to make it look right to the author in one singular context. The author in a modern publishing world is creating content that has the potential to be presented in myriad ways. Creating an editing environment that encourages the author to think in those terms helps to decouple the idea that they are creating a page that looks like X, and instead fosters the idea that they are creating content that has value in multiple contexts, and sometimes it will look like X, and sometimes Y.

@Jeff,

I hear you more than I am able to express using this tiny comment text-box. I’ve been in your shoes, working for editors. And then I had a chance to work alongside the editors, designers and product managers for the benefit of users. These things are not the same, not by a long shot.

You are absolutely correct that many editors use MS Word as a point of reference. Add to that the fact that they are paying the bill, and that pretty much explains the horrible experience most websites deliver. I agree with you 100% in the description of the status quo, but that doesn’t make the status quo right.

Those editors aren’t trained user interface or user experience designers. It’s none of their business to make calls on whether text is laid out in one column or two. They simply aren’t qualified for it. Add to that everything Ryan said about the mobile/tablet/Facebook/Google Glass views of the content and it becomes abundantly obvious that using Word as the point of reference is the absolute worst thing the editors can do. To be fair, though, not all editors/writers are equal. Many have no designer, so they have to pretend to be one.

The success of writing platforms such as Medium is also very important. How far is Medium from WYSIWYG/MS Word? As far as it gets. And that is intentional. And plenty of people love it. There’s an important lesson somewhere in there.

Jeff Noyes

11 years 4 months ago

If you haven’t already, it might be worth thinking about whether all of those services should be defaults. Should a vast number of users get an editing experience that doesn’t map to their mental model because some users also want their content to look good on Facebook, Google, etc.? Does it make sense to break advanced features into modular components?

For displays, can you specify wrapping conditions for just the two extremes, phone and desktop, and make others extensible by modules? For multi-channel distributions, can you port to one or more modules?

I’m not sure what the answer is, but it feels to me that you’re falling into the typical Drupal trap and designing for the community, or thinking so hard about how to scale to the power users that you’re possibly neglecting others. Many users want responsive content, multi-distribution channels, e.g. Facebook, Google, Twitter, etc. But many more don’t even know how to think like that. Are you designing for the minority and neglecting the majority? What can be done to do both?

The reason many people want, or realize they should want, multi-channel digital publishing is that there’s overwhelmingly abundant data clearly showing that desktop browsing’s share of overall web browsing is steadily decreasing.

I absolutely agree that many don’t know how to do it correctly, but if Drupal is to stay competitive, that is exactly why Drupal should enable the new way for everybody (current majority as well as current minority) rather than encourage and cater to the dying trend. At least that is what sounds like a reasonable approach to me.

I agree with both of you :) Like I said in the article:

WYSIWYG in Drupal 8: from brochureware to newspapers

Drupal needs to cater to both the extreme of very structured content for maximal reuse and to the extreme of unstructured content (where pretty much all data is in a single “blob” called the “body” field, besides maybe a “title” and a “tags” field). It also needs to deal with everything in between.

Drupal may be used for news sites, but also for brochureware sites. By having the WYSIWYG editor be configurable, and hence letting the site builder choose whether formatting/layout tools are available or not, we empower the user to choose.

Not everybody cares about every potential way a website could be displayed. For most websites, it’s impossible to optimize for every possible channel, especially because channels change and new ones get added over time — to do this well, you almost by definition need a full-time team.

That being said, I think there’s a big distinction between optimizing for every channel (which is very expensive) and ensuring your content is well-structured, so that it is future-proof (which comes at only a fraction of the cost). If you do the latter, you’re still keeping the former open as an option.

Drupal 8 core does not cover every possible use case — and it can’t do that, because there will always be site-specific needs — but it does make a few very common things much more affordable! Plus, by setting a precedent, Drupal developers now have a blueprint of how to solve such challenges.


From Jeff’s comments, I think that all he wanted to point out is that Drupal 8’s WYSIWYG editor makes it easy to align a display: block image, but not a display: inline image, and that the latter is necessary for text flow control, and a great reader experience.

I think he got many more reactions than he expected :D

That’s a fair point, but at least it is now trivial to support that: just add a few lines of CSS.

Funny enough, he’s the first to have made a remark about this in all that time :)

harmless

10 years 11 months ago

I would be happiest with Markdown in my database instead of HTML. Anyone know how uphill/against the grain that would be? Is that a thing that can happen while retaining most of Drupal 8’s content authoring toolkit?

I know CKEditor has some form of Markdown functionality but am unsure if that is compatible with everything D8 is doing with it?

You will of course still be able to use Markdown in D8. But that would indeed mean you lose some of the nice authoring improvements, such as CKEditor, and specifically CKEditor widgets.

That is normal, since Markdown is designed to be written by hand, not to be generated using UI tools.