Conversion to DITA: Requirements Before Estimates

Yehudit Lindblom

Customers often reach out to request price estimates for migrating non-DITA files to DITA. We ask for information to help us prepare those estimates, but some customers aren’t ready, or are hesitant, to supply it. Conversion to DITA is not a standard, across-the-board process; it depends on the needs of each project.

Restaurant Scenario

Imagine you own a restaurant. A successful restaurant with many happy customers over many years.

Someone walks in – a new customer. Except, wait – they don’t want to sit down to a meal now. They’re looking to come back in three or four months. Great! You start to make a reservation for them. You ask a few questions.

“How many people will be in your party?”

“I have no idea yet. Maybe 10, maybe 50, maybe 100.”

“Okay, we’ll leave it open for now, and we’ll be in touch closer to the event.”

“How much will it cost?”

“Well, we have a full menu, from appetizers to desserts, with everything in between, including lower cost options, such as baked chicken, or premium items including steak, filet, etc. Would you like to set the menu now? We can go over the pricing.”

“No. We don’t yet know what we want to eat. Please just tell me how much it’s going to cost for us to come here.”

“I’d love to do that for you, but I don’t have enough information. It could be as little as $10.00 per person, if you order a drink and a small dessert item. Or, it could be as much as $65.00/person, if you order a drink, an appetizer, a premium menu item, and dessert. We give discounts for larger groups, so if you have more than 20 people, there will be a discount, and if you have more than 50 people, an additional discount.”

“But I want to know how much it’s going to cost.”

The bill before the meal or event, without knowing what the order will be?

That doesn’t sound very realistic, does it?

Yet, that is what it’s like when we’re asked to give a price estimate for a project that doesn’t have requirements in place. This is why we ask for samples for conversion projects, why we ask for your DITA and other input files, and why we try to understand how you’re using your PDF, web help, and other outputs.


You’re about to embark on migrating to DITA. Your team hasn’t yet been trained on how to author in DITA. You haven’t yet worked to understand how you’re going to set up your DITA – will you have specializations? Those pieces of content that look like a table without borders – in DITA, will that be a table? A definition list? Something else? You’re not quite sure yet.

You aren’t sure exactly which publications you’re planning to convert, so you aren’t ready to send your input publications. Do you know how many templates you have? Are your current technical writers/authors consistent with styles, so the publications to be migrated are the same styles across the board? Or do you have several different templates coming from different places, or inconsistent styling?

For the moment, you’re not comfortable sending samples of the eventual output to the conversion team, either, because you haven’t 100% decided that you want your output to continue to look the way it currently does. After all, now’s a great time to make some changes, and the marketing team has input, too.

Why are these important?

All of these issues mean that it’s impossible to predict how much time and effort are necessary to migrate your current content to DITA. We don’t know what the input is. What will the output DITA XML files look like? We don’t have information on what the output content will look like for PDF, HTML, etc., to guesstimate what the output DITA may include.

We want to give you the most accurate estimate possible, and we are happy to help you work through the process to ensure you get the most efficient conversion. If you’re in the exploration process for now, just “window shopping,” we can show you samples of what other clients have migrated and what the prices were, but it won’t be the same as an estimate for your project.

Sorting out the necessary details ahead of the price quote will give both you and us a better understanding of what the migration will take.

5 Basic Components of Your Content Model

Joe Gelb

Yehudit Lindblom

Once you have completed the content audit (see our last post), you can build your content model. Your content model should include:

1)  Which topic types to use in which situations (such as task, reference, or concept)

2)  The specializations you have decided to build

As we noted in our last post, you should consider your use of specializations carefully – they may lead to complexity in the future. Use specializations when they will provide significant benefits in expressing the meaning of your information, ease the authoring of your documentation, and facilitate publishing.

3)  A strategy for organizing maps (which represent your publications), sub-maps and content references (conrefs), with the ultimate goal of re-purposing content.

This should also include:

a. A convention for standardized usage of conditional attributes and attribute values

b. A convention for standardized usage of linking, eliminating uncontrolled cross-linking between files, using methods such as map hierarchies and relationship tables
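To make the convention for conditional attributes a little more concrete, here is a minimal Python sketch that checks a topic against an agreed list of attribute values. The attribute values (such as "admin" or "pro"), the sample topic, and the function name find_convention_violations are all hypothetical; your own content model would supply the real vocabulary, and a production setup would more likely enforce it in the schema or with Schematron.

```python
import xml.etree.ElementTree as ET

# Hypothetical controlled vocabulary; your own content model defines the real values.
ALLOWED_VALUES = {
    "audience": {"admin", "end-user"},
    "platform": {"linux", "windows"},
    "product": {"basic", "pro"},
}


def find_convention_violations(xml_text):
    """Return (tag, attribute, value) tuples that fall outside the vocabulary."""
    violations = []
    for element in ET.fromstring(xml_text).iter():
        for attribute, allowed in ALLOWED_VALUES.items():
            # DITA conditional attributes may hold several space-separated values.
            for value in element.get(attribute, "").split():
                if value not in allowed:
                    violations.append((element.tag, attribute, value))
    return violations


# Made-up sample topic with two deliberate violations.
SAMPLE_TOPIC = """
<topic id="install">
  <title>Installing</title>
  <body>
    <p audience="admin">Run the installer as root.</p>
    <p product="pro enterprise">Enable clustering.</p>
    <p platform="Windows">Double-click setup.exe.</p>
  </body>
</topic>
"""

violations = find_convention_violations(SAMPLE_TOPIC)
for tag, attribute, value in violations:
    print(f"<{tag}> uses {attribute}='{value}', which is not in the agreed list")
```

The check is case-sensitive on purpose: "Windows" and "windows" would be treated as two different conditions when filtering, which is exactly the kind of drift a convention is meant to prevent.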


4)  A metadata model or subject scheme model which defines your taxonomy and how you will classify your content. For example, taxonomy and classification will allow your content to be filtered for different products, information types, and user roles in knowledge base and help systems.

Metadata is very important not just for ensuring the accessibility and searchability of your content, but also for your style sheet development – defining how your content will appear, whether in a PDF, online help, or in a mobile ePub. For example, there may be many types of publication information that you would like to include in your PDF or other publishing formats. This information needs to be populated in your DITA files using metadata. Your metadata model often impacts how your CMS and other tools should be configured.

5)  Templates for new topics and publications so that your authors can get up and running and create new content as quickly and easily as possible. Using the right tagging and the right content model (from the last step) will help you not only make the process easier for new authors, but also make the maintenance of your style sheets a lot more efficient. When you use a very consistent content model, it is relatively easy to enforce the consistent usage of tags by using templates, schematron and RELAX NG rules. This allows you to drive down maintenance costs and reduce headaches when you go online with your style sheets.

Reality Check: Your information architecture and content model will continue to evolve. More often than not, down the road there will be different outputs that you want to provide, whether a new type of content like release notes or a different format like HTML5. Your content needs will change as time goes on. Your content model is a process, and your information architecture will evolve as you migrate more content. You will need to stay in touch with your information architect to ensure your system is evolving in the way you want to go.

Are we finished? Of course not! Tune in next time for “Conversion to XML: What to Do First.”

4 Necessary Steps for Your Content Audit 

Joe Gelb

Yehudit Lindblom

A crucial and central part of planning for DITA is developing your information architecture. This requires a content audit that will help you build your content model for the future.

A content audit includes four main steps:

I. Review your existing content.

Reviewing your current content will help you formulate your tagging rules and policy and a common map assembly plan. Based on your current content, you can identify your various topic information types, the common internal content structure for each information type, how topics are organized in your current publications, and conditional attribute usage.

II. Identify the delivery outputs and structure you will be using and your customer’s use-case scenarios for accessing your content.

Without knowing how content will be read and used, you won’t know how to structure it. For example:

  • Where will customers and employees expect to see your information? (In an online knowledge base? In print? In mobile ebooks?)
  • Do your customers use search? Will they be using a table of contents? Will they benefit from an index?
  • Will users go straight to the topic they need, or will they read it within a specific flow? Keep in mind that when moving to structure, your topics will need to be more “stand-alone” than they were before.

III. Determine where DITA specializations will be beneficial for your organization.

A DITA specialization means customizing the DITA DTDs – that is, the DITA rule sheets – in a way that will make authoring and formatting your content easier. The content audit and tagging policy will help you arrive at DITA specializations that can make authoring easier and more semantically correct. Many organizations try to keep DITA specializations to a minimum, as they can create extra work to deploy your DTDs and style sheets to your authors and to your toolset. Many times, however, specializations have the potential to save so much time for authors and for your style sheet development that they are worthwhile.

This cost/benefit analysis for specializations is not always clear. You can best arrive at the proper balance with the help of a good information architecture consultant.

IV. Identify the reuse opportunities for your content.

Content reuse is a key benefit of DITA. But to benefit from reuse, you need to identify potential reuse opportunities in your content, on the chapter, topic, element and even phrase level.

Reuse often comes with a cost of additional complexity for authors, but it can also provide significant cost savings in authoring time and localization costs. You need to decide what types of content can be reused, which reuse opportunities are worth the cost of additional complexity, and then plan for how to implement your reuse strategy. Generally, the larger your team of writers, the harder it is to implement and maintain a complex reuse strategy.

There is a wide range of content reuse you should consider.

  • Reusing topics by linking them into multiple publications.
  • Reusing topics for different contexts by conditionalizing their content based on attributes such as product, audience and platform.
  • Reusing commonly used text such as notes and warnings across multiple topics using content references (conrefs).
  • Reusing commonly used terms such as product name. By reusing commonly used phrases you can standardize their usage, easily change them if necessary in only one place, and conditionalize them to change based on how they are used in different publications or product lines.
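To illustrate the conref style of reuse from the list above, here is a simplified Python sketch of what resolving a content reference involves. It is not how any particular DITA processor works: the in-memory FILES dictionary, the "Acme Router" product name, and the function names are assumptions made for illustration, and real processors apply many more rules. The only part taken from DITA itself is the conref address form file.dita#topicid/elementid.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical shared topic that holds reusable elements.
FILES = {
    "shared.dita": """
<topic id="shared">
  <title>Shared content</title>
  <body>
    <note id="power_warning" type="warning">Disconnect power before servicing.</note>
    <p>The <ph id="product_name">Acme Router</ph> is reusable too.</p>
  </body>
</topic>
""",
}


def find_target(conref):
    """Look up 'file.dita#topicid/elementid' in the in-memory FILES dictionary."""
    file_name, _, fragment = conref.partition("#")
    topic_id, _, element_id = fragment.partition("/")
    root = ET.fromstring(FILES[file_name])
    if root.get("id") != topic_id:
        raise KeyError(f"no topic '{topic_id}' in {file_name}")
    for element in root.iter():
        if element.get("id") == element_id:
            return element
    raise KeyError(f"no element '{element_id}' in {file_name}")


def resolve_conrefs(xml_text):
    """Copy the content of each conref target into the referencing element."""
    root = ET.fromstring(xml_text)
    for element in list(root.iter()):
        conref = element.attrib.pop("conref", None)
        if conref is None:
            continue
        target = find_target(conref)
        element.text = target.text
        # Carry over the target's attributes, except its id.
        element.attrib.update({k: v for k, v in target.attrib.items() if k != "id"})
        element.extend([copy.deepcopy(child) for child in target])
    return root


# Made-up topic that pulls in the shared warning and the shared product name.
TOPIC = """
<topic id="maintenance">
  <title>Replacing the fan</title>
  <body>
    <note conref="shared.dita#shared/power_warning"/>
    <p>Open the <ph conref="shared.dita#shared/product_name"/> cover.</p>
  </body>
</topic>
"""

resolved = resolve_conrefs(TOPIC)
print(ET.tostring(resolved, encoding="unicode"))
```

Because the warning and the product name each live in one place, changing them in the shared topic changes every publication that references them, which is the standardization benefit described above.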

We hope this has been helpful for you. Let us know if you would like to hear more about any one of these steps, and tune in for our next blog post: The 5 Basic Components of your Content Model.

The 5 Crucial Roles in your DITA Project Team

Joe Gelb

Yehudit Lindblom

Making the decision to migrate to DITA is the first step in what can be a challenging journey. In the next series of blog posts, we will focus on the initial stages of creating a DITA project plan that will ensure your success.

A project team that can plan your DITA migration will be crucial to its success. Every member of the team will bring his or her own expertise, experience, and perspective to your project. To succeed in the long term, your team needs to include:

1) Main project stakeholders

High-level managers and those in your company who are funding your project need to be involved in your project from the very beginning. They don’t need to be in every meeting, but they need to be kept informed.

Stakeholders should include the project sponsor, who is funding the project, and can include representatives from the groups responsible for localization, learning and training, marketing, and support.

When you start your project, consider who in your company might watch your progress with interest and who might want to join in later. Make sure to get wider consensus and buy-in from potential stakeholders. You will need their support down the road!

2) Key technical writers

Your writers are the main users of your system and the producers of your content. The key technical writers will need to become evangelists for this initiative who will help other writers join later on, so they must be technically savvy. Your project team must include at least one or two writers who are:

a) Familiar with your content

b) Open to learning the new methods and tools

These writers do not need to be the most senior or entrenched writers in your team. They do need to be eager to learn the new tools and be prime movers in the process.

You need to be careful about who is on your team – it needs to include people who are optimistic and raring to go, people who are ready and willing to roll up their sleeves and work with you to reach the right solution.

3) Information architect

The information architect in your team needs to have a good understanding of the main body of your content and how it is (or should be) used. (Sometimes the writer and the architect are the same person, if it’s a small team.)

Often an external information architecture consultant can be very helpful in training your internal information architect and writers to perform a content audit and to help you to come up with the tagging policies and content models that you will use moving forward. However, do not leave this knowledge with the consultant — this is knowledge and experience you will need to bring in-house. Get training so that you have an internal information architect from the beginning.

4) Technical tools “guru”

You will need someone on your team who really understands your new toolset and the vendors you will be working with — someone who will understand your customizations and how everything works together.

This “guru” may be your writer or your information architect, someone from your IT staff, or the Publications Manager who managed the previous system. Someone on the team needs to understand how everything fits together, so that if something later needs to be tweaked or adjusted (or even breaks), you have a team member who can work well with the vendors and speak their language, while also working well with your own IT department, which will be hosting your environment.

This member of your team will also be a key evangelist in your organization, explaining what to expect in terms of changes to procedures and processes and getting your IT team on board.

5) DITA implementation consultant/trusted advisor

There are many aspects to a DITA project that you may not be familiar with. At some point you will need to hire or contract with someone who will:

  • Help you build a comprehensive project plan, including the dependencies of different pieces of your project
  • Help you coordinate with your external resources. These include the Information Architecture consultant, your software vendors, conversion experts, and style sheet developers.
    In the project plan, all these parts work in parallel. The better and more experienced your project manager is, the shorter your development cycle will be.
  • Have a broad and detailed view of all the moving parts and be focused on the success of your entire implementation
  • Bring experience to your team.

Now that you have built your project team, it’s time to train them. Tune in for our next blog, “Planning Your DITA Project: The 3 Pre-Project Training Goals.”

Inside DITA: Line Breaks

Leora Betesh, Suite Solutions

One of the advantages of authoring content in DITA is the separation between content and presentation. This feature sometimes frustrates writers new to DITA who are used to adjusting layout and styling themselves. As my colleague Katriel Reichman is known to say, “Writing in DITA is like raising teenagers – you just got to learn how to let go!” We have to learn to let the style sheets handle all the presentation issues.

Occasionally, there is a need for authors to tweak presentation within the DITA source. One example is deciding where a line break should occur. Often this decision has to be made by a human being rather than by a computer.

Bad Line Breaks

Say I have the title: “Specifications, Features and Limitations of the DITA Accelerator by Suite Solutions.” I want the company name “Suite Solutions” to stay together on a line, and not break up as happened by default – see the screen shot below.

[Screenshot: bad line break]


To solve this, I could configure the transform to always keep the words “Suite” and “Solutions” together on a line. But this would require the author to know in advance all the words that can or can’t break on a line. In addition, this would get quite complicated with localized content, or even multi-lingual content.

PDF renderers have a built-in algorithm for calculating line breaks. The renderer will first look for a space in the text, then a hyphen, and so on. But in our case, I don’t want the renderer to break at the space.

This is a case where the author needs to have control of line breaks from within the DITA source. This can be accomplished using XML entities created for this purpose.

Non-Breaking Space

The first XML entity we will introduce is the non-breaking space. When we replace the space between “Suite” and “Solutions” with a non-breaking space, the renderer will be prevented from breaking the line here. The XML entity is &#160; (the character U+00A0). Many XML editors have built-in tools for inserting this character.
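As a rough illustration, the Python sketch below parses the example title with the numeric character reference &#160; between “Suite” and “Solutions” and shows that the parser delivers a single non-breaking space character (U+00A0) rather than an ordinary space. The variable names are only for illustration; the title markup is the example from this post.

```python
import xml.etree.ElementTree as ET

# The example title, with &#160; instead of a plain space in the company name.
TITLE_XML = (
    "<title>Specifications, Features and Limitations of the "
    "DITA Accelerator by Suite&#160;Solutions</title>"
)

title_text = ET.fromstring(TITLE_XML).text

# The parser turns the reference into the single character U+00A0.
print(repr(title_text[-15:]))  # 'Suite\xa0Solutions'
```

The renderer sees one unbreakable unit where the reader sees two words separated by a normal-looking space.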

Zero Width Space

In other cases, rather than tell the renderer what words must be kept together, we may choose instead to tell it where it may break. Let’s say I have a long computer name or hostname included in my title, such as: “About the host c-61-123-45-67.hsd1.co.hostname.net”. By default the renderer will not know where it may break the hostname. The screenshot below shows the results with my current transform:

[Screenshot: zero-width space]


I probably want at least the “hostname” part of the text to stay together, but here a non-breaking space won’t help me, as I don’t want any spaces in the hostname. This is where I might use a zero-width space. A zero-width space gives the line-break algorithm a place where it may break, but it won’t be visible at all in the presentation of the text. The XML entity for the zero-width space is &#8203; (the character U+200B).

Here is the zero-width space applied in the DITA source:

[Screenshot: zero-width space applied in the DITA source]
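For readers who want to see the markup itself, here is a small Python sketch that builds the title and writes it out with the zero-width spaces as visible references. Placing a break opportunity after every dot is an assumption made for this sketch, as is the helper name allow_breaks_after_dots; you may prefer fewer break points in your own content.

```python
import xml.etree.ElementTree as ET

ZERO_WIDTH_SPACE = "\u200b"


def allow_breaks_after_dots(hostname):
    """Hypothetical helper: permit a line break after each dot in a hostname."""
    return hostname.replace(".", "." + ZERO_WIDTH_SPACE)


title = ET.Element("title")
title.text = "About the host " + allow_breaks_after_dots(
    "c-61-123-45-67.hsd1.co.hostname.net"
)

# Serializing as ASCII writes the invisible character as a visible reference.
title_xml = ET.tostring(title, encoding="us-ascii").decode("ascii")
print(title_xml)
```

With breaks allowed only after the dots, the word “hostname” itself can never be split by one of these break opportunities.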


Zero-Width No-Break Space

Another useful XML entity is the zero-width no-break space, also known as a word-joiner. This entity is similar to the non-breaking space in that it indicates that the text may not be broken at this point. The difference between the zero-width no-break space and the non-breaking space is that the zero-width no-break space will not add any visible space in the output. The XML entity for the zero-width no-break space is &#8288; (the character U+2060).
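Since all three characters are invisible or look like a plain space, it can help to confirm what a reference actually resolves to. The following Python sketch parses each reference and reports its code point and official Unicode name. The helper name describe and the use of a ph element as a wrapper are just choices made for this example.

```python
import unicodedata
import xml.etree.ElementTree as ET

# The three numeric character references discussed in this post.
REFERENCES = ["&#160;", "&#8203;", "&#8288;"]


def describe(reference):
    """Parse one numeric character reference; return (code point, Unicode name)."""
    character = ET.fromstring(f"<ph>{reference}</ph>").text
    return f"U+{ord(character):04X}", unicodedata.name(character)


for reference in REFERENCES:
    code_point, name = describe(reference)
    print(reference, code_point, name)
```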

What are other scenarios where your authoring team has needed to tweak presentation via the DITA source? Let us know in the comments below.