What’s the difference between HTML and XML files?

By GeaSpeak Team | 2021-07-05

As part of the localization process, we deal with many file types other than the traditional .doc or .docx. Why is it important to know what these extensions mean?

This time, let’s focus on HTML and XML files. Understanding how they are created and used can help when you lack context or are unsure about a phrase or sentence you are localizing (e.g., whether it is a title, a header or a button).

Let’s take, for instance, the localization of a website. In that case, we will be working with an HTML file. But what does HTML mean? It stands for “Hypertext Markup Language”.

“Hypertext” refers to text that contains links (hyperlinks): words, phrases or images on the page that you can click to jump to another document or to a different section of the current page. Links that jump to a section of the same page are often called anchor (or in-page) links. As for “Markup Language”, it refers to the way tags are used to define the structure and elements of the webpage.
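To make the distinction between the two kinds of links concrete, here is a minimal sketch in Python, using only the standard library's html.parser module, that lists the links in a snippet of HTML and labels each one. The `LinkCollector` class and the sample markup are hypothetical, and the rule it applies (an href starting with “#” points to a section of the same page) is a simplifying assumption: real pages can also point back to themselves with relative or absolute URLs.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag and labels where it points (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []  # list of (href, kind) tuples

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href is None:
            return
        # Simplifying assumption: only hrefs starting with "#" stay on the same page.
        kind = "same page" if href.startswith("#") else "other document"
        self.links.append((href, kind))


# Hypothetical sample markup, for illustration only.
sample = (
    '<p>Read the <a href="#faq">FAQ</a> or visit '
    '<a href="https://example.com/docs">the docs</a>.</p>'
)

collector = LinkCollector()
collector.feed(sample)
collector.close()
for href, kind in collector.links:
    print(f"{kind}: {href}")
# same page: #faq
# other document: https://example.com/docs
```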

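In HTML, anything enclosed between “<” and “>” characters is a tag, and the set of available tags is predefined by the HTML standard. Sequences enclosed between “&” and “;”, such as &amp; or &nbsp;, are not tags but character references: they stand for special characters (in these cases, “&” and a non-breaking space). For example:

<p><strong>Herramientas para desarrolladores</strong><br /></p>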

This set of tags tells us that the phrase “Herramientas para desarrolladores” (“Developer tools”) forms a paragraph on the webpage (<p>), that it is marked as important and is therefore displayed in bold by default (<strong>), and that it is followed by a line break (<br />) before the paragraph ends, which moves anything after it onto a new line. If this phrase were a title, we would see it enclosed in heading tags such as <h1></h1> instead. In fact, HTML provides six heading levels, from <h1> (the most important) down to <h6>.
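When all you have is the raw HTML file, a short script can show which tags surround each piece of text, which is exactly the kind of context described above. The following is a minimal sketch using Python's standard html.parser module; the `ContextExtractor` class and the sample markup (apart from the “Herramientas para desarrolladores” line) are hypothetical, written only for illustration. With its default settings, html.parser also decodes character references, so &amp; comes out as “&”.

```python
from html.parser import HTMLParser

# Partial list of HTML "void" tags, which never get a closing tag.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class ContextExtractor(HTMLParser):
    """Records each piece of text together with the tags that enclose it (hypothetical helper)."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default: "&amp;" is decoded to "&"
        self.stack = []     # currently open tags, outermost first
        self.segments = []  # list of (text, tag path) tuples

    def handle_starttag(self, tag, attrs):
        # Void tags such as <br> or <br /> enclose no text, so they are not tracked.
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop back to the matching opening tag (tolerates unclosed inner tags).
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.segments.append((text, " > ".join(self.stack)))


# Hypothetical page fragment; the second line is the example discussed above.
sample = (
    "<h1>Recursos</h1>"
    "<p><strong>Herramientas para desarrolladores</strong><br /></p>"
    "<p>Guías &amp; tutoriales</p>"
)

extractor = ContextExtractor()
extractor.feed(sample)
extractor.close()
for text, path in extractor.segments:
    print(f"{path:<12} {text}")
```

In the output, a path like “p > strong” indicates emphasized text inside a paragraph, while “h1” marks a top-level title.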

Now let’s see what XML means. XML stands for “Extensible Markup Language”, and its syntax is very similar to HTML’s. One of the main differences is that XML tags are not predefined: they are “invented” by the author of the document. So tags such as <br> or <p> have no built-in meaning in an XML file and can serve whatever function the author gives them.
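Because XML tags have no built-in meaning, the tag and attribute names chosen by the author are often the best context clue a localizer has. Below is a minimal sketch using Python's standard xml.etree.ElementTree module; the XML sample and every tag and attribute in it (`screen`, `title`, `button`, `p`, `id`, `action`, `unit`) are hypothetical, invented for this illustration. Note that in this invented file the author uses <p> to hold a price, not a paragraph.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML file: all tag and attribute names were invented by its "author".
SAMPLE = """<screen id="settings">
  <title>Herramientas para desarrolladores</title>
  <button action="save">Guardar</button>
  <p unit="EUR">9.99</p>
</screen>"""

root = ET.fromstring(SAMPLE)

# Each child's tag name and attributes hint at how its text is used.
entries = [(child.tag, child.attrib, child.text) for child in root]
for tag, attrib, text in entries:
    print(tag, attrib, text)
```

Here, <button> suggests a short UI label (“Guardar” is Spanish for “Save”), while <p unit="EUR"> suggests a number that probably should not be translated at all.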

Another difference is that HTML was designed to display data, focusing on how it looks, whereas XML was designed to carry data, focusing on the data itself. Because XML stores data as plain text, independently of any particular software, it makes it easier to exchange data between otherwise incompatible systems and to upgrade to new operating systems, applications or browsers without losing data. The format is called “Extensible” because authors can define their own tags to suit their data.

We hope this brief insight helps you the next time you are localizing a website or software and have HTML or XML files as reference. Open them with a standard text editor or a browser, and remember: the code in these files is human-readable, and it can be your ally when it comes to clarifying context.