
The Ultimate Guide to LEGO Taxonomy & Metadata

Ren Taylor


What makes a 2x4 red LEGO® brick a 2x4 red LEGO® brick? It’s the data: the color, the shape, the number of studs. This information is what allows you to identify it in a pile of thousands of other pieces. This is metadata. Applying this concept to your content library is like creating a LEGO taxonomy for your information. Each topic, procedure, or warning is tagged with data about what it is, who it’s for, and what it does. This simple act of labeling transforms your content from a disorganized collection into a searchable, reusable asset, making every piece easy to find.

What Can LEGO® Bricks Teach Us About Content?

I had several LEGO® brick sets. Once you had a bunch of different sets and had already built them from the model instructions, thousands of pieces would end up mixed together in a box. That was awesome, because then you could build whatever you wanted, break it apart, and repeat infinitely.

The catch was the frustration of searching endlessly for that one essential piece among thousands. It took forever. It was a hassle. You never really grasped how many pieces you’d amassed until you had to find that one.

Imagine being able to search all the LEGO® brick pieces in your collection based on what they looked like, what they were meant for, and what they did. Then they would appear in your hand. It would be a game-changer:

“Red brick. Flat. Six studs. Two columns. Three rows.”

There’s your piece, right in your hand. That would be cool, right?

Knowing approximately where the piece is isn’t enough; you already know it’s in there somewhere. What matters is identifying one specific piece from a set of characteristics.

That is metadata. Data about content. In this case, information about a brick piece. Unfortunately, we can’t use metadata to find a certain piece in a pile. However, we can — and do — use metadata to search through our content libraries.

The Scale of the Challenge: Thousands of Pieces

To put the LEGO challenge into perspective, there are over 3,500 different types of pieces available in dozens of colors. When you’re staring into a bin with thousands of components, the sheer volume is the first hurdle. Many people find it difficult to build creatively not because they lack ideas, but because it’s so tough to find the right pieces in a giant, disorganized pile. The time spent searching for a specific component is time not spent building. This mirrors the experience of many technical writers who know the information they need exists somewhere in their content library, but finding and reusing it is a frustrating and time-consuming hunt.

So, how do serious builders manage this complexity? They develop a system. Experts agree that the most effective method isn’t to sort by color but to organize by each part’s shape and function. It’s much easier to spot a blue 2x4 brick in a container of 2x4 bricks than to find that same piece in a massive bin of everything blue. A piece’s category (its structure) is a more useful identifier than a superficial quality like its color. This systematic approach turns a chaotic collection into a functional library where components are easy to locate and use.

This is where the analogy becomes powerful for content teams. Your content library is your LEGO box, filled with thousands of individual pieces—topics, procedures, warnings, and product descriptions. If your only way to sort them is by “color” (like the document they belong to), you're constantly digging through a massive pile to find the specific component you need. A more effective approach is to organize content by its “shape”—its structure and function. This is the core idea behind structured content like DITA, where every piece of information is a distinct, reusable component defined by what it is, not just where it lives.

If You Can't Find It, Does Your Content Even Exist?

If you’ve ever read a blog post (and you’re reading one right now), you’ve probably seen tags. Tags are simply labels that tell readers what a piece of content is about, so a reader who wants more articles on a similar topic can search for posts that share those tags. Tags like these are a simplified version of how metadata works in Heretto.

Heretto’s Component Content Management System (CCMS) uses metadata because we know that navigating a disorganized content repository is somehow worse than searching for that one LEGO® brick piece amongst the thousands. Plus, no offense to your brick creations, the consequences are greater.

Heretto offers a unique metadata feature similar to searching LEGO® bricks based on their physical features and uses. It’s another method of organization beyond folders that enables you to learn more about your content library, find content quickly, and filter search results.

The Metadata feature in Heretto comes in two forms:

CCMS-Level Metadata
Custom Metadata

The Golden Rule of Sorting: By Type, Not Color

When faced with a mountain of LEGO® bricks, the first instinct for many is to sort them by color. It seems logical and looks neat, but seasoned builders will tell you this is a critical error. As experts point out, sorting by color first makes it incredibly hard to find small, specific pieces. The golden rule is to sort by the type of part. It’s much easier to spot a red 2x4 brick in a bin of other 2x4 bricks than it is to find that same red brick in a giant bin of thousands of other red pieces of all shapes and sizes.

This same principle applies directly to your content library. Sorting content by a single, broad attribute is like sorting bricks by color. It might seem organized on the surface, but it hides the details you need to find and reuse content effectively. A much more powerful approach is to organize content by its function, audience, or product applicability—its 'type.' This is the foundation of managing structured content. By categorizing components based on what they are and what they do, you create a system where finding the exact piece of information you need is simple, regardless of where it will be published.

Using 'Meta Bricks' as Your Foundation

Metadata at the CCMS level is automatically assigned and tracked by Heretto as you create and interact with files. Why? Because we’re content creators short on time too. Heretto automates the mundane tasks like capturing basic metadata about each file that’s created.

At this level, you’d find automatically assigned metadata in Heretto with these labels:

Last Time Modified
Created Time
Needs Attention
Is Valid
Contains Broken Links
Contains Comments
Owned By
Locked By
Status
Content Type

We know that assigning general information about files would be needlessly tedious, so Heretto does it for you.

In context, CCMS-level metadata might function something like this:

What kind of content is in your content library? Perhaps you want to compare the number of Task topics to the number of Concept topics.
Content Type metadata makes this easy.

If I’m going on vacation next week and there are things I haven’t finished, someone else will need to find and access my work.
Metadata lets a teammate check which files are Owned By or Locked By me.

What’s the workflow status of our documentation?
Status metadata shows how much of your repository is In Progress, In Review, Approved, or Needs Reevaluation.
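To make those three questions concrete, here is a minimal Python sketch that answers them over an in-memory list of file records. The FileMetadata class, its field names, and the sample files and user names are hypothetical stand-ins that mirror the labels listed above; this is not Heretto’s data model or API, only an illustration of how auto-assigned metadata turns into quick answers.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

# Hypothetical record shape: field names mirror the CCMS-level labels above,
# but this is NOT Heretto's data model or API.
@dataclass
class FileMetadata:
    name: str
    content_type: str                 # e.g. "Task", "Concept"
    status: str                       # e.g. "In Progress", "In Review"
    owned_by: str
    locked_by: Optional[str] = None   # None when nobody holds the lock


def content_type_counts(files):
    """How many Task topics vs. Concept topics (and so on)?"""
    return Counter(f.content_type for f in files)


def handoff_list(files, person):
    """Files a teammate should check: owned by or locked by `person`."""
    return [f.name for f in files if person in (f.owned_by, f.locked_by)]


def status_breakdown(files):
    """Fraction of the repository in each workflow status."""
    if not files:
        return {}
    counts = Counter(f.status for f in files)
    return {status: n / len(files) for status, n in counts.items()}


repo = [
    FileMetadata("install.dita", "Task", "Approved", "ren"),
    FileMetadata("overview.dita", "Concept", "In Review", "sam", locked_by="ren"),
    FileMetadata("configure.dita", "Task", "In Progress", "ren"),
    FileMetadata("glossary.dita", "Reference", "Approved", "sam"),
]

print(content_type_counts(repo))  # Counter({'Task': 2, 'Concept': 1, 'Reference': 1})
print(handoff_list(repo, "ren"))  # ['install.dita', 'overview.dita', 'configure.dita']
print(status_breakdown(repo))     # {'Approved': 0.5, 'In Review': 0.25, 'In Progress': 0.25}
```

In a real CCMS these numbers come from the repository itself; the point is simply that once every file carries the same labels, each question becomes a one-line count or filter.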

What is a LEGO Taxonomy?

A LEGO taxonomy is a systematic way of organizing all those different pieces into categories based on their characteristics: their shape, their connection points, and their purpose. It’s the strategic plan that turns a chaotic pile of plastic into a functional library of parts. Instead of relying on luck to find that one specific connector, a taxonomy gives you a reliable system for locating exactly what you need, when you need it. By creating these logical groupings, you spend less time searching and more time building. It’s the foundational step that makes large, complex creations possible without the headache.

Understanding Your Pieces: Bricks, Plates, and Tiles

When it comes to sorting, there's a golden rule among serious builders: categorize by type, not by color. As the experts at Brick Architect point out, "sorting by part shape is more effective because it's easier to spot different colors of the same part than to find a specific small part in a bin of all one color." This principle is the bedrock of structured content. Organizing your documentation by its functional type—like tasks, concepts, and reference topics—is far more powerful than using superficial labels. Just as LEGO.com has official categories like 'Bricks,' 'Plates,' and 'Technic Beams,' a well-defined content taxonomy provides the essential framework for your information. This structure is what allows you to effectively manage content in a CCMS, ensuring every component is findable, reusable, and ready for publishing.

Building Your Own 'Meta Bricks' for Custom Needs

Heretto is packed with built-in features that make your life easier, but we don’t know everything about your content. That’s where custom metadata comes into play. Beyond the CCMS-level metadata that’s automatically assigned and tracked in every instance of Heretto, you can also create your own custom metadata.

Creating custom metadata enables you to make bespoke tagging systems that apply to the context of your world of content. With the ability to build metadata systems unique to your organization, you’re better able to gain specific and useful insights.

Remember your oversized pile of LEGO® bricks? Imagine being able to search for certain pieces by nicknames that only you use. Custom metadata makes this a reality for organization-specific content.

Starting with a Simple Taxonomy

When you first dump out a few LEGO® sets, the instinct might be to sort by color. It’s visually satisfying, but it’s not very functional. Experts agree it’s much better to start with a few broad categories based on the type of piece, like ‘Bricks,’ ‘Plates,’ and ‘Everything Else.’ This simple taxonomy creates a foundation you can build on. The same principle applies to your content. Instead of creating dozens of hyper-specific tags from day one, begin with a simple, functional structure. You could start by categorizing your content by its purpose—like ‘Getting Started Guides,’ ‘Troubleshooting Articles,’ or ‘API Reference.’ This approach makes your system easy to learn and manage, ensuring your team can actually use it without getting overwhelmed. It’s about creating structured content with a clear, scalable plan from the beginning.

Developing an Advanced Taxonomy

A taxonomy shouldn't be a static artifact. Just as your LEGO® collection grows, so does your content library. That initial ‘Plates’ bin will eventually overflow, forcing you to subdivide it into ‘1x Plates’ and ‘2x Plates.’ For truly massive collections, you might even sort by both part type and color. Your content taxonomy needs the same flexibility to evolve. As your product line expands and your documentation becomes more complex, those initial broad categories will need more granularity. ‘Troubleshooting Articles’ might need to be broken down by product, feature, or error code. This evolution is a sign of a healthy, growing content operation. A powerful CCMS is built for this kind of scale, allowing you to refine your metadata and content governance strategy over time, ensuring every piece of content remains findable, no matter how large your collection gets.
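One way to picture a taxonomy that can grow is to store each term as a path from broad to specific, so a search on a parent category still finds everything filed beneath it after you subdivide. The sketch below assumes exactly that; the file names, categories, and the matches and find helpers are hypothetical, and this is not a description of how Heretto stores its taxonomies.

```python
# Hypothetical sketch: each taxonomy term is a path from broad to specific,
# so a broad category can be subdivided later without breaking searches on it.
Term = tuple[str, ...]

files: dict[str, Term] = {
    "reset-password.dita": ("Troubleshooting Articles", "Product A", "Login"),
    "error-502.dita": ("Troubleshooting Articles", "Product B"),
    "legacy-faq.dita": ("Troubleshooting Articles",),  # tagged before the split
    "quickstart.dita": ("Getting Started Guides",),
}


def matches(tagged: Term, query: Term) -> bool:
    """True if `tagged` equals the query term or sits anywhere beneath it."""
    return tagged[: len(query)] == query


def find(query: Term) -> list[str]:
    """Names of all files filed under `query`, sorted alphabetically."""
    return sorted(name for name, term in files.items() if matches(term, query))


print(find(("Troubleshooting Articles",)))
# ['error-502.dita', 'legacy-faq.dita', 'reset-password.dita']
print(find(("Troubleshooting Articles", "Product A")))
# ['reset-password.dita']
```

Notice that legacy-faq.dita, tagged before the category was split, still shows up in the broad search; it just won’t appear in the narrower Product A results until someone re-tags it. That trade-off is one reason to refine a taxonomy deliberately rather than all at once.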

See Your LEGO® Taxonomy in Action

For the sake of example, I’ve created a file and hidden it in Heretto. You don’t know the file name, but here’s the metadata that’s been automatically assigned:

Status: In Progress
Content Type: Concept
Owner: Tim Ludwig

Without giving too much away, that’s the basic auto-assigned metadata. I’ve gone a bit further and assigned some custom metadata, based on the custom metadata we set up for Heretto’s own documentation:

Subject Metadata: Management → Content Management
Marketing Metadata: Complexity → Beginner, Persona → Writer
Internal QA Metadata: Failed
Content Maintenance: Needs Improvement

You get the picture. Without any filters applied, there are more than 44,000 files in our own content repository. Sifting through that would be ridiculous. Fortunately, we have metadata to help us. Now, all you have to do is track down a file you don’t know the name of. Easy, right? Check it out:


If you peek at the red arrows in the animation, on the left side you’ll see that I select three of the metadata filters I mentioned above. On the right side, you’ll see those filters narrow the files in our repository down from over 44,000 to four. Then we’re easily able to identify the file called metadata_example.dita, our hidden treasure. All in less than 10 seconds.
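Under the hood, a search like the one in that animation behaves like faceted filtering: each filter you select adds a condition, and a file has to satisfy all of them to stay in the results. Here is a minimal Python sketch of that idea over three hypothetical file records. Values such as Passed, Healthy, and Administrator, the extra file names, and the apply_filters helper are assumptions for illustration, not Heretto’s actual metadata values or search implementation.

```python
# Hypothetical sketch of faceted search: every selected filter is a
# (field, value) pair, and a file stays in the results only if it matches
# all of them (logical AND), so each extra filter can only narrow the list.
repo = [
    {"name": "metadata_example.dita", "Internal QA": "Failed",
     "Content Maintenance": "Needs Improvement", "Persona": "Writer"},
    {"name": "publishing_guide.dita", "Internal QA": "Passed",
     "Content Maintenance": "Needs Improvement", "Persona": "Writer"},
    {"name": "admin_setup.dita", "Internal QA": "Failed",
     "Content Maintenance": "Healthy", "Persona": "Administrator"},
]


def apply_filters(files, filters):
    """Return the files whose metadata matches every (field, value) filter."""
    return [f for f in files
            if all(f.get(field) == value for field, value in filters)]


selected = []
for facet in [("Internal QA", "Failed"),
              ("Content Maintenance", "Needs Improvement"),
              ("Persona", "Writer")]:
    selected.append(facet)  # like ticking one more filter in the UI
    hits = apply_filters(repo, selected)
    print(f"{facet[0]} = {facet[1]}: {len(hits)} file(s) left")

print([f["name"] for f in apply_filters(repo, selected)])
# ['metadata_example.dita']
```

The same logic holds whether you start with three files or 44,000: the result set can only shrink as filters are added, which is why a few well-chosen facets are enough to isolate a file you can’t name.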

Metadata isn’t magic; it’s just thoughtful labeling and meaningful content organization. We’ve got a whole other article on metadata and why Amazon’s search tool is better than Google’s. It’s another great example of how well-developed metadata makes search that much easier. Head over there for a metadata breakdown and fast online shopping tips.

Matching Your Sorting Strategy to Your Collection Size

Just as with a growing pile of LEGO® bricks, the way you organize your content has to evolve. A system that works for a handful of documents will buckle under the weight of thousands. The key is to match your organizational strategy to the scale of your content library. What starts as a simple pile can quickly become an unmanageable mess without a plan. Thinking about your content in terms of collection size helps you anticipate when you’ll need to adopt more sophisticated methods to keep things findable and useful for your team and your customers.

For Small Collections (Under 3,000 Pieces)

When you have fewer than 3,000 bricks, you don’t really need a system. You can just spread them out on the floor and find what you need. This is the content equivalent of a startup’s early days. With a small number of documents, a simple folder structure on a shared drive works just fine. The team is small, and everyone generally knows where to find things. There’s no need for complex taxonomies or metadata because the sheer volume isn’t a problem yet. You can get by with descriptive file names and a logical folder hierarchy, and that’s perfectly okay. The goal is to get work done, not to build a perfect system you don’t need.

For Medium Collections (3,000 - 10,000 Pieces)

As your brick collection grows, spreading them all out becomes impractical. The common next step is to sort them into a dozen or so broad categories: bricks, plates, tiles, wheels, etc. This is the stage where your content library starts to feel the strain. Your simple folder structure is now deep and complex, and finding a specific piece of information requires clicking through multiple layers. You’re spending more time searching and less time creating. This is the point where you begin to see the need for a better way of managing structured content. Broad categories help, but they don’t solve the core problem of pinpointing the exact component you need quickly.

For Large Collections (Over 10,000 Pieces)

With a massive collection, sorting by broad categories is no longer enough. To be efficient, you have to sort common pieces by their specific part type, like "1x2 Brick" or "2x4 Plate." For truly huge collections, you might even sort by both part and color. This is the enterprise level of content. You can’t rely on folder structures or simple tags anymore. You need a granular system based on specific metadata, allowing you to find and reuse precise components across thousands of topics. This is where a Component Content Management System (CCMS) becomes essential. It allows you to treat every piece of content like a specific part, tagged with rich metadata so you can find exactly what you need, every single time.

How to Design Your Custom LEGO® Taxonomy

You already know that custom metadata enables you to create personalized metadata tags that uniquely identify information described in your files. You can create custom metadata using taxonomy or labels. Let’s pump the brakes and review these two branches of custom metadata.

Taxonomy: This is a fancy word for classification. Basically, when you create taxonomy metadata, you’re making classifications of terms that apply to your body of content. For instance, a medical software company and a heavy machinery company will require different taxonomies of terms for their respective documentation repositories. Taxonomy metadata needs to be established ahead of time and can be applied to your content as needed. This ensures that your team uses consistent, predefined terms and doesn’t duplicate or use non-preferred terms.

Labels: These work exactly as they sound. Label metadata can be created and applied on the fly. Labels are less rigid than taxonomy metadata, which makes them especially useful if you haven’t yet established a foundational taxonomy for your content.
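The practical difference between the two branches is when the terms get defined. The minimal sketch below assumes a taxonomy is a predefined set of allowed terms per category, with a small map that redirects non-preferred synonyms to the preferred term, while labels are simply free-form strings. The category names come from this article; the term Outdated, the synonym map, and both helper functions are hypothetical and not Heretto’s API.

```python
# Hypothetical sketch: a taxonomy is a controlled vocabulary defined ahead of
# time; labels are free-form strings created on the fly.
TAXONOMY = {
    "Content Maintenance": {"Healthy", "Needs Improvement", "Outdated"},
    "Internal QA": {"Passed", "Failed"},
}

# Non-preferred terms redirected to the preferred taxonomy term.
PREFERRED = {"Needs Work": "Needs Improvement", "Stale": "Outdated"}


def apply_taxonomy_term(tags, category, term):
    """Attach a taxonomy term, normalizing synonyms and rejecting unknown terms."""
    term = PREFERRED.get(term, term)
    if term not in TAXONOMY.get(category, set()):
        raise ValueError(f"{term!r} is not a defined term in {category!r}")
    tags.setdefault(category, set()).add(term)
    return tags


def apply_label(labels, label):
    """Labels need no setup: any string can be created and applied right away."""
    labels.add(label)
    return labels


tags = apply_taxonomy_term({}, "Content Maintenance", "Needs Work")
print(tags)  # {'Content Maintenance': {'Needs Improvement'}}

labels = apply_label(set(), "q3-launch-review")
print(labels)  # {'q3-launch-review'}

try:
    apply_taxonomy_term(tags, "Internal QA", "Kinda OK")
except ValueError as err:
    print(err)  # 'Kinda OK' is not a defined term in 'Internal QA'
```

The rejection path is what keeps a team from drifting into three spellings of the same idea. Labels give that up in exchange for speed, which is why they work well as a stopgap before a formal taxonomy exists.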

We recommend starting with a brain dump of all the keywords used to describe your content, then considering other data points you might want to use to classify it. When we created our taxonomy, we determined that the most relevant information was about:

Heretto Interfaces: What interfaces are being written about? This makes it easier for us to find topics when we need to make updates.

Marketing: What type of users will want this information? This makes it easier for us to determine if we have content that covers all our personas.

Internal QA: Did this content pass or fail QA testing? This makes it easier for us to test our documentation and software.

Content Maintenance: What is the health of our content? This makes it easier for us to evaluate and maintain our content.

Step 1: Take Everything Apart

Before you can organize anything, you have to see what you’re working with. With LEGO® bricks, this means dismantling every spaceship, castle, and car until you have a giant pile of individual pieces. The same principle applies to your content. You can’t effectively manage or reuse information that’s locked away in monolithic documents like long PDFs or Word files. The first step is to break those documents down into their smallest useful parts—individual topics, procedures, or concepts. This process is the foundation of creating structured content. Each piece becomes a standalone "brick" that can be found, updated, and reused to build countless other documents, which is far more efficient than having the same information copied and pasted in a dozen different places.

Step 2: Sort in Small, Manageable Batches

Staring at a mountain of thousands of LEGO® bricks is overwhelming. The expert advice is to scoop a few handfuls into a smaller tray and focus only on sorting that small batch. This makes the enormous task feel achievable. Apply this same logic to your content projects. Whether you’re migrating to a new system or cleaning up an existing repository, trying to tackle everything at once is a path to burnout. Instead, break the project into manageable phases. You could focus on the content for a single product line, or start by converting your most frequently used articles. This approach allows your team to build momentum, refine your process, and see progress without feeling buried by the scale of the project.

Step 3: Use Clear Containers and Labels

Once you start sorting your bricks, you need a place to put them. The best method is to use clear containers with distinct labels. You might start with broad categories like "Bricks," "Plates," and "Tiles." As your collection grows, you can subdivide those into more specific containers. This is a perfect analogy for building a content taxonomy and applying metadata. Your CCMS is the set of clear containers, and your metadata tags are the labels. By starting with broad terms and getting more specific, you create a logical system for managing your content. This ensures that when someone needs to find a specific "brick" of information, they can find it quickly through filtered search instead of digging through a messy, unlabeled box.

How to Apply Your 'Meta Bricks' Correctly

Creating custom metadata is only one part of an effective metadata strategy. Applying it consistently to your content is the other. Whether you already have a large content set or you’re starting from scratch, you have several options for applying custom metadata.

You can assign metadata at the same time you create a file. This ensures that your metadata is correctly applied from the get-go.

You can assign metadata after you create a file. Assigning it after creation ensures that the custom metadata reflects the final version of the content.

You can assign metadata in bulk to multiple files or maps. Assigning custom metadata in bulk is useful if you already have a large set of content in your repository and are just starting to use metadata.

There’s no one-size-fits-all process for metadata organization and strategy, but having no strategy at all will cause more problems down the road.

Building on Your Metadata Foundation

We know you’ve spent (and will continue to spend) a considerable amount of time planning, writing, and updating your documentation. That’s the reality of content: it’s a collection of living documents.

But your content isn’t really useful unless it’s consumed, and that won’t happen if no one can find it easily. That’s where metadata enters the game, adding machine-readable labels that make your content shout: “I’m right here!”

Metadata doesn’t just help team members find and access content quickly; it can also help your end users. You can expose your metadata tags to users so they can filter their own search results using the metadata you created. And that is conscious content management.

Oh, and LEGO® bricks? Unfortunately, you still need to sift through those the old fashioned way.

LEGO® is a trademark of the LEGO Group of companies which does not sponsor, authorize, or endorse this site.


Planning for Growth and Scalability

Just like a LEGO® collection, your content library is going to grow. A system that works for a few hundred pieces (or articles) will break down when you have thousands. The key is to choose an organizational strategy that can scale with your collection so you aren't forced to constantly change it. Start with broad categories, similar to sorting bricks into groups like 'Bricks' and 'Plates.' For your content, this means establishing a foundational taxonomy. As your repository expands, you can introduce more granular classifications, much like a master builder might eventually sort common parts by both type and color. A flexible approach to content governance ensures your system can evolve without requiring a complete overhaul, saving you from the massive headache of re-sorting everything from scratch later on.

Frequently Asked Questions

Why is organizing content like sorting LEGO® bricks? The main point is that function is a better organizing principle than appearance. Just as it’s easier to find a specific red brick in a bin of similarly shaped bricks, it’s easier to find a specific piece of content when it’s categorized by what it does (like a task or a procedure) rather than where it lives (like in a specific PDF). This approach, using metadata as your sorting system, turns your content library from a chaotic pile into a functional set of reusable components.

I'm new to this. How do I start building a content taxonomy without getting overwhelmed? You don't need a perfect, exhaustive system from the start. Begin by brainstorming a few broad, functional categories that make sense for your content. Think about things like your main products, your user personas, or the purpose of the content, such as 'Getting Started Guides' versus 'Troubleshooting.' The goal is to create a simple, foundational structure that your team can actually use and then build upon it over time as your content library grows.

What's the practical difference between using a taxonomy and using labels? Think of a taxonomy as your set of official, pre-approved categories. It’s a system you plan ahead of time to ensure everyone on your team uses consistent terms to classify content. Labels are more like sticky notes; they can be created and applied on the fly as needed. While labels offer flexibility, a formal taxonomy is what provides the consistent structure needed for reliable searching and content management at scale.

My content library is already huge and disorganized. Is it too late to implement a metadata strategy? It's never too late, but you shouldn't try to tackle it all at once. The key is to break the project into manageable pieces. Instead of trying to organize the entire library, start with the content for a single product or your most frequently updated articles. This allows you to develop and refine your process on a smaller scale, build momentum, and demonstrate value without getting buried by the scope of the project.

This seems like a lot of internal work. How does it actually help my customers? This internal organization has a direct impact on your customers’ experience. When your content is tagged with clear metadata, you can use those tags to power filters on your help site or knowledge base. This allows users to narrow down search results and find the exact answer they need much faster. Instead of wading through irrelevant articles, they can filter by their product version or a specific feature, which reduces their effort and builds confidence in your support resources.

Key Takeaways

Classify content by its purpose, not its location: Treat your content like individual LEGO® bricks. Organizing by function, such as what a topic is or what it does, rather than by the document it lives in makes every piece of information easier to find and reuse.
Create a smart labeling system with metadata: Metadata is the key to a searchable content library. A good strategy uses a mix of automatically tracked data and a custom taxonomy with terms that are meaningful to your team, products, and audience.
Build a taxonomy that can scale with your needs: Your organizational system should evolve as your content library grows. Start with a simple, foundational structure and add more specific categories over time to keep your content manageable and effective.
