Andrei's blog

Use your own XML generator

TL;DR: It is easy to convert JSON into XML, and JSON tends to be easier to work with. You can avoid an xml library with a short function.

XML is the markup language with <tags> and a parent of HTML, SVG, RSS, and protocols like XMPP. So for simplicity assume we are trying to make an HTML page.

Reading (parsing) formats is hard, but writing is easy. For writing, ugly string concatenation gets the job done, but concatenation embeds itself deeply into the code: even for a minor change, you have to run around tweaking all the strings in the script.

Usually you would use a library or some kind of template system, but those are confusing, overkill, or to inflexible especially for beginners. You need to go look up the docs, figure out new syntax etc.

But XML (at it's core) is simple. It is a tree of tags each with attributes and children tags. Trees as data-structures are simple and familiar: most categorizations, files systems, and hierarchies are trees. Folders, with folders. Boxes, with boxes. Tree branches with tree branches.

<parent attribute1="a1" attribute2="a2" attribute3="etc.">
  <child attribute1="child1">
    <subchild>...</subchild>
    <subchild>...</subchild>
    ...
  </child>
  ...
</parent>

Most tree data-formats (JSON, YAML, INI, XML etc.) are similar and easy to convert into each other. Given that Python and JavaScript have dictionaries (JSON) built into them, it makes sense to go create the tree using builtin tools, and perform a conversion later.

The conversion would be one recursive function (here it is in Python):

element = {
    "tag" : "div",
    "attributes" : {
        "id":"myelement"
    },
    "children" : [
        {"tag":"p", "attributes":[], "children":["World!"]},  # child element
        "<script>alert('Hello')</script>" # also child element
    ]
}

# I am using the JSON/dictionary structure {"tag":"tag-type","attributes":{...},"children":[...]}

def convertDictToXML(element): # element is dict or string
    if isinstance(element, str):
        return element;
    else:
        result = "<" + element["tag"]
        for key in element["attributes"]:
            result+=" "+key+"="+element["attributes"][key]
        result+=">\n"
        for child in element["children"]:
            result+=convertDictToXML(child)+"\n"
        result += "</" + element["tag"] + ">"
        return result

print(convertDictToXML(element))

python3 script.py > file.html or Copy Paste

Any programmer should be able to script this in 5 minutes.

When you use this approach, you understand fully how everything works and how to modify it, because you made it yourself. And if something needs a special conversion, you can always convert it to a string manually and add as a child into the structure (as I did with the script tag in the example). Whatever other special edge case for formatting you have, you can always add another if statement into the converter. If you want indentation in the output, adding a depth counter into a converter is much easier than adding it into your main code.

Note: Be careful with copying elements. Remember, Python doesn't deep copy it's objects (nor do other languages)

Note: If you want the style attribute to be a dictionary, add an edge case in the for loop.

Writing a converter for yourself might help you appreciate or understand why the real xml library is designed the way it is, making it easier to use on the next project.

Unfortunately writing your own converter will backfire the moment you need to read what you generated. Parsing XML (or any other format) is edge-case hell and best left to libraries.

And if you are going to say something about efficiency, or devs forgetting how everything works when they come back later, or that making a tree is much harder in lower level languages, just remember that you can use a library or whatever you want. I am just pointing out an (in my opinion underrated) way to do it.

- very qualified XML expert