Content Ops
November 2, 2020

Is DITA Good for Search Engine Optimization (SEO)?

Everyone needs to know a little bit about SEO

We are at the mercy of search engines. Websites like Google, Bing, Baidu, and YouTube (owned by Google) have become as essential to online commerce as the sidewalk is to storefront business. It’s an inescapable fact that the searchability of your content is a crucial consideration.

In my role as Content Marketing Manager at Heretto, I am continuously considering DITA, search engine optimization (SEO), and the interaction between the two. I am constantly analyzing content with two major things in mind:

  1. Helpfulness
  2. Findability

I want to create and arrange content that is both helpful AND findable. I suspect those are your intentions as well.

Throughout this guide, I’ll use some terms specific to DITA XML (like topic, map, etc.). If you aren’t sure what DITA is, check out our explainer video! 

For this article, I will often use the terms Google and search engines interchangeably since, heck, Google has a Rockefeller-style monopoly over internet searches. So it’s important that we understand how Google search works.

How Google Works

To get your content to rank well, you need to understand the process.

Google search works through a three-step process:

  • Crawling: Google automatically follows links within a website to find all the pages under that URL
  • Indexing: Google categorizes each page by looking at things like its title and contents
  • Serving: Google delivers pages to a searcher based on how well the query matches the pages it has already indexed

The rest of this article will focus on how DITA can optimize your standing with crawling and indexing so that you are one of the top pages served by Google to the searcher.

5 SEO Advantages of DITA

1. Google Wants (Some) Structure

Structured content is simply content that follows a predetermined standard. The standard provides consistent rules so that a system knows how to interpret the content.

DITA offers expansive capabilities with structure via a rich selection of elements and metadata configurations. However, Google only really cares about a few key pieces of structure.

Google Wants a Title

This is one of those “so obvious that people forget it” things, but Google always disproportionately values your title when determining the relevance of your content. Within DITA, this is really simple. Every topic has a <title> tag that will always become the <title> when converted to HTML.

A good title:

  • States what the post is actually about
  • Includes relevant keywords
  • Is matched by the URL

For example, if you write the assembly instructions for a toaster, make sure that the title is something like:

<title>How to Assemble a Toastmaster 7000</title>

Never something like this:

<title>Assembly Instructions</title>

If you use the former, you have a solid title that tells readers and Google exactly what the post is about. If you use the latter, you are leaving the matchmaking up to fate (Google). If someone else writes a post called “how to assemble a toastmaster 7000” on a site like Medium or Reddit (you can find anything on Reddit), that post will likely outrank yours, because those sites have a higher Domain Authority plus a page title that more accurately matches the likely search query.
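
To show where the title sits in a complete topic, here is a minimal sketch of a DITA task for the toaster example. The id and the steps are invented for illustration; only the title comes from the example above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE task PUBLIC "-//OASIS//DTD DITA Task//EN" "task.dtd">
<!-- Hypothetical topic: the id and the steps are made up for this example -->
<task id="assemble-toastmaster-7000">
  <!-- The descriptive title that readers and search engines see first -->
  <title>How to Assemble a Toastmaster 7000</title>
  <taskbody>
    <steps>
      <step><cmd>Slide the crumb tray into the base.</cmd></step>
      <step><cmd>Attach the lever knob.</cmd></step>
    </steps>
  </taskbody>
</task>
```

Saving the topic under a matching file name, such as assemble-toastmaster-7000.dita, may also help the URL echo the title, depending on how your publishing tool builds URLs.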

tl;dr - Use DITA title tags and make sure they actually state what the topic is about.


Google Wants a Description

Titles are obvious, but what about that blurb of text that goes underneath the title in the search engine results page (SERP)? That blurb is called a description, and you can either write it yourself or let Google generate one for you (more on that later).

These descriptions are basically the window into the soul of the post for the reader. In DITA, this is called a short description or the <shortdesc>. The short description provides a summary for the rest of the topic. Now, when you convert your DITA code to HTML, it comes out as:

<meta name="description" content="your description here">

The description is then used by Google as the blurb of text that appears under your search result.
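
On the DITA side, the source for that description might look like the sketch below. The topic id and the wording of the description are placeholders, not taken from a real document.

```xml
<!-- Hypothetical topic: the id and the description text are made up -->
<task id="assemble-toastmaster-7000">
  <title>How to Assemble a Toastmaster 7000</title>
  <!-- One or two sentences that summarize the topic for readers and for the SERP -->
  <shortdesc>Put together your Toastmaster 7000 in a few steps, from unboxing to your first slice of toast.</shortdesc>
  <taskbody>
    <!-- steps go here -->
  </taskbody>
</task>
```

Placing the short description directly after the title keeps the summary next to the content it describes, so it is easy to update both together.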

One super important note: according to Google, the description is not a ranking factor.

However, the description is a significant factor in whether or not people click on the link and those clicks are the most important ranking factor. So, while it might not directly affect your rank, it is still hugely consequential.

tl;dr - Use the short description in DITA to change the description that appears under the page link on the search results page.

Google Wants Labeled Images

Fun fact: do you know what the second-largest search engine is, behind Google?

It’s *drum roll* Google Images, with about 22% of total search volume as of 2018.

Want another fun fact? Google is a lot better at indexing text than it is at indexing images.

So, Google puts the onus on you to label your images with something called alt text. The alt text tells Google what the image contains. These images don’t just appear in image search; you will often see them on the right side of the first page of Google as well.

I’m full of fun facts today, so here’s one more: alt text is also essential for people with visual or certain cognitive disabilities. Screen readers use the alt text when describing an image to the listener. So, while alt text is good for SEO, it also (and more importantly) makes your visual content more accessible.

Luckily, DITA offers a simple way to include alt text with every image.

In Heretto, this is as simple as looking for the “alt” field in the attributes tab. Filling this out adds <image alt> to the DITA source code. When this content is converted to web output, the <image alt> in DITA is converted to <img alt> in HTML.
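
In the DITA source, the result might look something like the fragment below. The file path and the alt wording are invented. DITA also accepts alt text as an <alt> element nested inside <image>, which is the form the DITA specification recommends, so check which form your editor actually writes.

```xml
<!-- Attribute form, as described above (file path and text are made up) -->
<image href="images/toastmaster-crumb-tray.png"
       alt="Crumb tray sliding out of the toaster base"/>

<!-- Element form, which the DITA specification recommends -->
<image href="images/toastmaster-crumb-tray.png">
  <alt>Crumb tray sliding out of the toaster base</alt>
</image>
```

Either way, the text should end up as the alt attribute on the HTML <img> element when the content is published.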

Now, this alt text becomes the cheat sheet that Google uses to determine if your image is a good fit for image search results as well as the top of a Google SERP.

If you need some help crafting the alt text for your images, keep these principles from WebAIM in mind. The alt attribute should typically:

  • Be accurate and equivalent in presenting the same content and function of the image
  • Be succinct. Typically no more than a few words are necessary
  • Not be redundant or provide the same information as text within the context of the image
  • Not use the phrases "image of ..." or "graphic of ..." to describe the image

tl;dr - Use DITA’s alt descriptions to improve image SEO as well as make your content more accessible to users with disabilities.

2. Link Management

Link management is crucial! And no, I’m not talking about staying alive in Hyrule (sorry). I’m talking about the hyperlinks that connect the content on your site.

Internal links are one of the most effective ways to improve SEO scores, both through what they communicate to Google and through how they direct users around your website.

In order to appreciate why internal links with DITA are different, we need to appreciate how constructing content with DITA is different. The important takeaway is that DITA topics are self-sufficient and modular, so you can use each topic anywhere you need it. Linear writing styles, like those you see in blogs and novels, entangle ideas so that they become difficult to extract and reference.

Now, this modular methodology is especially important because it means that rather than re-writing content, you are more inclined to link to relevant topics.

For example, let’s say that I’m writing documentation for a smartwatch.

In DITA, I would construct a series of topics about:

  • Turning on your smartwatch
  • Initial set-up for your smartwatch
  • How to connect your smartwatch to your phone
  • Charging your smartwatch
  • Smartwatch features
  • Taking proper care of your smartwatch
  • How to monitor your heart rate with your smartwatch

Each topic would be self-sufficient in that it would thoroughly explain the subject and nothing else.

Now, at other points in the documentation, anytime I mention something about heart rate features, rather than stopping to explain it, I simply link to the topic. This trims the fat from the reader’s perspective and it also gives them clear direction if they want to read more.
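
A minimal sketch of such a link inside a DITA topic, assuming the heart rate topic is stored in a file named monitor-heart-rate.dita (a hypothetical name) and that the surrounding sentence is placeholder text:

```xml
<!-- Fragment from a hypothetical "Smartwatch features" topic -->
<p>Your smartwatch tracks your pulse throughout the day. For details, see
  <xref href="monitor-heart-rate.dita">How to monitor your heart rate with
  your smartwatch</xref>.</p>
```

When the content is published to the web, a cross-reference like this typically becomes an ordinary HTML link to the page for that topic.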

Additionally, when you link to a URL within your own website, it actually boosts that URL for relevant searches. The more you link to a page within your site, the more it communicates to Google that the page is valuable.

So, if you have a URL that is specifically titled “How to monitor your heart rate with your smartwatch” and you have a series of internal links pointing to that URL, then when someone searches “How to monitor your heart rate with my smartwatch” you have much more visibility and a much better chance of being clicked. The alternative is that the searcher ends up on a web forum where the help is less reliable, which lowers customer satisfaction with such a dope smartwatch.

Compare this with documentation that is written in a linear format. Linear content has ideas that overlap and intersect. That format works well for a novel or a blog, but it is terrible for documentation.

With linear writing, documentation for that same smartwatch would not be as cleanly delineated, which would limit internal linking. It would also result in longer web pages with more subjects assigned to a single URL.

For example, with modular writing, you would likely have four distinct pages:

  • Turning on your smartwatch
  • Initial set-up for your smartwatch
  • How to connect your smartwatch to your phone
  • Charging your smartwatch

Each with its own URL.
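
In DITA, a set of pages like this is typically assembled with a map. Here is a minimal sketch; the map title and file names are hypothetical, and whether each topic ends up at its own URL depends on how your publishing tool is configured.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE map PUBLIC "-//OASIS//DTD DITA Map//EN" "map.dtd">
<map>
  <title>Smartwatch User Guide</title>
  <!-- One topicref per self-sufficient topic (file names are made up) -->
  <topicref href="turning-on-your-smartwatch.dita"/>
  <topicref href="initial-set-up.dita"/>
  <topicref href="connect-smartwatch-to-phone.dita"/>
  <topicref href="charging-your-smartwatch.dita"/>
</map>
```

Because the map only points at topics, the same topic file can be referenced from other maps without copying its content.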

In a linear writing style, you might replace those topics with a single web page titled “Getting Started” that has a series of subheadings underneath, walking the user through getting their smartwatch set up.

Now, from a reader’s perspective, these two experiences might both be fine IF they intend to read the getting started guide from start to finish. However, most users simply don’t do that (I know I don’t). When they visit the documentation site, it’s often because a specific step in the process isn’t working. If the user wants to figure out why their Bluetooth won’t connect, they don’t want to sift through the whole walkthrough of turning the smartwatch on and going through the initial set-up.

Additionally, as we mentioned previously, if “How to connect your smartwatch to your phone” is just a subheading instead of being included in the URL and title, it will be more difficult to rank on Google.

tl;dr - DITA’s modular methodology of content architecture means each idea has its own URL and more internal links pointing to it. This, in turn, communicates to Google that the page is valuable. It also enhances the customer experience (CX) when looking for a specific solution.

3. Native Language

This one is a bit obvious, but it’s worth repeating. Well-translated content will always beat out poorly translated content, which will always beat out untranslated content.

DITA is a huge boon to your translation efforts, with translation memory significantly reducing both the time and the cost.

If your product and content are going to a global audience, translation is a no-brainer for SEO.

tl;dr - affordable translation is good, do it. 👍

4. Design (Simplicity)

When SEO specialists talk about technical SEO, they will fixate on structure and code, content, and links. These things are really important, but they can often leave out an important ranking factor: design.


When I say design, I’m not talking about beautiful graphics and exciting JavaScript. I’m talking about consistent, user-focused design choices that improve the readability of the content and overall customer experience.

For documentation (and the rest of the web, tbh), the focus should always be on minimizing friction in the reader’s experience.

We do this by paying attention to things like spacing, margins, fonts, color schemes, accessibility, element placements, etc.

Maintaining these design elements can be difficult for many website systems. A web CMS like WordPress makes it too easy to let different people make different design choices on different pages. The end result is a little bit of anarchy and a lot of friction.

DITA, however, separates the content from the publishing and formatting.

For example, if you were to create a map and series of topics in Heretto, you would assemble all of your content without making a single design choice. Every decision would be content-centric. Content first.
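
As a rough illustration, a step in a content-first topic might look like the fragment below. The smartwatch details are invented; the point is that the markup says what each piece of content is, not how it should look.

```xml
<!-- No fonts, colors, or spacing here: only what each piece of content is -->
<step>
  <cmd>Press and hold <uicontrol>Power</uicontrol> for three seconds.</cmd>
  <info>
    <note type="tip">Charge the watch first if the screen stays dark.</note>
  </info>
</step>
```

How a button name or a tip is rendered (bold text, a colored box, an icon) is decided by the publishing output rather than by the author.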

Now, if you wanted to publish that content as a static website, consistent styling would be applied automatically and uniformly when you publish it. Rather than manually selecting the font during the writing process, those design decisions would be applied once and for all at the moment of publishing.

This results in a much cleaner and more streamlined reading experience that is cohesive across all your content.

Again, design doesn’t directly impact your search rank, but your design directly impacts the user’s behavior, specifically, whether they stay on your website and click further in versus hitting the back button and checking out the next website (a behavior commonly called pogo sticking). Those behaviors are huge ranking factors.

tl;dr - DITA separates content from design so you can add consistent formatting automatically. This leads to better CX and improved rankings.

5. Website Speed

This advantage is a little nerdier 🤓 and a little more Heretto specific, so consider yourself warned.

There are no two ways about it - the speed of your website will directly influence your rank.

Since 2010, Google has used page load speed as a ranking factor.

Many things contribute to a web page’s load time, but three of the top factors are:

  1. The contents of the page (how large it is, animations, illustrations, etc.)
  2. The JavaScript framework
  3. The hosting service

DITA itself has no impact on the above factors. However, the way you leverage your DITA content does matter.

Regarding the contents of your page, remember that DITA, by its nature, will lead to smaller individual pages because the content is modular, concise, and self-sufficient.

Regarding the second and third points (the framework and hosting), I will give an unabashed plug for Heretto’s static site generator (SSG) because we built it exactly for this reason. The SSG allows you to take your DITA content, and in mere minutes host it on a fast hosting service like Netlify as a static website built with the Gatsby JavaScript framework.

This is significant because Gatsby is generally regarded as one of the fastest JavaScript frameworks available and Netlify is also widely considered one of the best hosting options as measured by Time to First Byte (TTFB) for both desktop and mobile devices.

The end result is a static website that checks all the SEO checkboxes:

  • Ridiculously fast
  • Clean and readable
  • Organized and navigable
  • Expertly crafted content (that part is up to you, but I have faith)

tl;dr - Heretto lets you publish a crazy fast static website in a few minutes - check out this walkthrough of Heretto Portal to see it for yourself

So... Is DITA Good for SEO?

I don’t want to oversell the power of tools and standards. Ultimately, the most important factor in appeasing the search engine optimization gods is the quality of your content. No system compensates for bad content. However, a bad system can undermine the value of good content. When you create good content, you want to ensure that you use a tool and a standard that allow it to reach its full potential.

With DITA, you have:

  • A standard that’s conducive to the priorities of search engines
  • A methodology that naturally encourages you to write in a way that lends itself to SEO
  • A way to build a strong network of internal links that communicate relevance to web crawlers and site visitors alike
  • An affordable way to translate more content into the language of the searcher
  • Content separate from format which automates clean and consistent design
  • The best way to get the fastest docs site possible

So, is DITA good for SEO? Yes 🎉

If you’d like to get started with DITA, you can request a demo and learn how to create the content that search engines will readily serve up to searchers.
