HTMLtoMochiDOM: HTML to MochiDOM script

by Matt Harrison

MochiKit's DOM utilities

MochiKit, a library that aims to "make JavaScript suck less" (read: it provides AJAX and other JavaScript functionality), includes DOM utilities for creating HTML programmatically rather than embedding it in a string. Building the markup programmatically has a few benefits, such as automatic escaping of entities and guaranteed well-formed HTML. But it can seem like extra work for those used to writing their HTML by hand. Or say you are a web developer who gets design code from a GUI person: now you have to convert their pretty HTML to the DOM API, which is tedious and error-prone. So I wrote a script that converts well-formed chunks of HTML (that is, HTML that also parses as XML) to JavaScript code for MochiKit.

Here's a quick example. Say my friendly GUI developer gave me the following HTML and I wanted to place it in a div using MochiKit.

<div class="section" id="description">
<h1><a name="description">Description</a></h1>
<p>As you probably know, the DOM APIs are some of the most painful
Java-inspired APIs you'll run across from a highly dynamic language.  
Don't worry about that though, because they provide a reasonable basis 
to build something that sucks a lot less.</p></div>

The corresponding JavaScript code using MochiKit is this:

DIV({'class': 'section', 'id': 'description'},
 H1(null,
  A({'name': 'description'},
   "Description"
  )
 ),
 P(null,
  "As you probably know, the DOM APIs are some of the most painful Java-inspired
APIs you'll run across from a highly dynamic language.  Don't worry about that
though, because they provide a reasonable basis to build something that
sucks a lot less."
 )
);
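
Note that the paragraph text in the call above keeps the line breaks of the original HTML, and a double-quoted JavaScript string literal cannot contain a raw line break. One possible way to handle this is sketched below, assuming Python 3 and its standard json module. The helper name js_string is hypothetical and is not part of the script on this page.

```python
import json


def js_string(text):
    """Return text as a single-line, double-quoted JavaScript string literal."""
    # json.dumps escapes double quotes, backslashes, and control characters
    # such as newlines; with its default settings the result is also a valid
    # JavaScript string literal.
    return json.dumps(text)


paragraph = 'Java-inspired\nAPIs you\'ll run across, or "painful" ones'
print(js_string(paragraph))
```

Since browsers collapse runs of whitespace in ordinary text anyway, replacing each newline with a space before quoting would be another reasonable choice.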

If you want to do this by hand, by all means go ahead. But I've written a script that takes well-formed HTML and spits out the JavaScript code for you.

Script to convert HTML to MochiDOM

The script is written in Python and requires the ElementTree XML parsing library.

To run the script, save the contents of the file shown below as htmlToMochiDom.py. Typing
python htmlToMochiDom.py
runs the 7 unit tests included in the code. If you pass a filename as an additional argument, the script skips the tests and instead prints the JavaScript code (or tells you that your HTML wasn't well-formed). For example, if my UI person had sent me a snippet of HTML called
snippet.html
I would type
python htmlToMochiDom.py snippet.html
to generate JavaScript from it.
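
Hand-written HTML often fails the well-formedness requirement in small ways. The following is a minimal sketch, assuming Python 3 and the standard-library xml.etree.ElementTree module, of how a snippet could be checked before conversion. The function name is_well_formed is hypothetical and does not appear in the script.

```python
import xml.etree.ElementTree as ET


def is_well_formed(snippet):
    """Return True if snippet parses as XML with a single root element."""
    try:
        ET.fromstring(snippet)
        return True
    except ET.ParseError:
        return False


print(is_well_formed("<div><br/></div>"))      # True: self-closed tag
print(is_well_formed("<div><br></div>"))       # False: unclosed <br>
print(is_well_formed("<p>a&nbsp;b</p>"))       # False: entity not defined in XML
print(is_well_formed("<p>one</p><p>two</p>"))  # False: two root elements
```

Typical fixes are closing empty tags such as br and img, replacing named entities like nbsp with numeric ones, and wrapping sibling elements in a single container.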

This is a live Document

Feel free to edit this document as necessary (it's in a wiki). If you have questions, comments, or suggestions, you can either email me (mharrison at spikesource DOT com) or create a FAQ section on this page. Enjoy, and have fun creating JavaScript!

htmlToMochiDom.py

#!/usr/bin/env python
# Copyright(c) 2004, SpikeSource Inc. All Rights Reserved.
# Licensed under the Open Source License version 2.1
# (See http://www.spikesource.com/license.html)

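try:
    # ElementTree has been bundled with Python since version 2.5
    from xml.etree.ElementTree import XML
except ImportError:
    # fall back to the standalone ElementTree package
    from elementtree.ElementTree import XML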
import sys
import unittest

__author__ = "matt harrison"
__license__ = "osl 2.1"
__version__ = "0.01"

INDENT_START=0  #adjust these two values for spacing...
INDENT_INCR=1
ELEMENTS = ["A", "DIV", "INPUT", "SPAN", "TABLE", "TBODY", "THEAD", "TFOOT",
            "TR", "TD", "TH", "UL", "OL", "LI", "H1", "H2", "H3", "BR", "HR",
            "LABEL", "TEXTAREA", "FORM", "P", "IMG"]

def _getFileContent(filename):
    # the with block closes the file even if read() fails
    with open(filename, 'r') as fin:
        return fin.read()

def _getElementTree(content):
    try:
        return XML(content)
    except Exception:
        # XML() raises a parse error when the markup is not well-formed
        print("Please make sure content is valid XML:\n%s" % content)
        sys.exit(1)
    
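def _getAttr(elem):
    if elem.attrib:
        # sort by attribute name so the output does not depend on dict ordering
        pairs = ["%r: %r" % (name, value)
                 for name, value in sorted(elem.attrib.items())]
        return "{%s}" % ", ".join(pairs)
    return "null"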

def _elemToDOM(elem, indent=INDENT_START):
    if elem.tag.upper() in ELEMENTS:
        attrs =  _getAttr(elem)
        children = _getChildren(elem, indent+INDENT_INCR)
        if not children:
            if attrs == "null":
                #just close current element
                return "%s%s()"%(indent*" ", elem.tag.upper())
            else:
                #has attribute and no children
                return "%s%s(%s)"%(indent*" ", elem.tag.upper(), attrs)
        else:
            return "%s%s(%s,\n%s\n%s)"%(indent*" ", elem.tag.upper(),
                                        attrs, children, indent*" ")
    return ""

def _getChildren(elem, indent):
    content = []
    if elem.text and elem.text.strip(): #strip to not include newlines...
        content.append('%s"%s"'%(indent*" ",elem.text))
    for child in elem:
        if child.tag.upper() in ELEMENTS:
            content.append(_elemToDOM(child, indent))
        #deal with mixed content... this is a little weird
        #see http://effbot.org/zone/element-infoset.htm#mixed-content
        if child.tail and child.tail.strip():
            content.append('%s"%s"'%(indent*" ",child.tail))
    if content:
        return ",\n".join(content)
    return ""

def textToDOM(text):
    # parse the markup, then emit one MochiKit call tree ending in ";"
    tree = _getElementTree(text)
    dom = _elemToDOM(tree)
    return dom+";"

class TestDom(unittest.TestCase):
    def testSimple(self):
        html = "<table/>"
        dom = textToDOM(html)
        self.assertEqual(dom, "TABLE();")

    def testSimple2(self):
        html = "<table><tr/></table>"
        dom = textToDOM(html)
        self.assertEqual(dom, """TABLE(null,
 TR()
);""")

    def testMultipleChildren(self):
        html = "<table><tr/><tr/></table>"
        dom = textToDOM(html)
        self.assertEqual(dom, """TABLE(null,
 TR(),
 TR()
);""")

    def testAttr(self):
        html = "<table id='foo' bar='baz'/>"
        dom = textToDOM(html)
        self.assertEqual(dom, "TABLE({'bar': 'baz', 'id': 'foo'});")

    def testText(self):
        html = "<table>text</table>"
        dom = textToDOM(html)
        self.assertEqual(dom, """TABLE(null,
 "text"
);""")

    def testMixedContent(self):
        html = "<div>content<a/>more content</div>"
        dom = textToDOM(html)
        self.assertEqual(dom, """DIV(null,
 "content",
 A(),
 "more content"
);""")
        
        
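    def t1estBadXML(self):
        # Disabled: the "t1est" prefix hides this method from unittest.
        # textToDOM calls sys.exit(1) on malformed XML, so the failure
        # surfaces as SystemExit rather than as a parser error.
        html = "<table id='foo' bar='baz'>"
        self.assertRaises(SystemExit, textToDOM, html)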


    def testLonger(self):
        txt = """<div class="section" id="description">
<h1><a name="description">Description</a></h1>
<p>As you probably know, the DOM APIs are some of the most painful Java-inspired
APIs you'll run across from a highly dynamic language.  Don't worry about that
though, because they provide a reasonable basis to build something that
sucks a lot less.</p></div>"""
        dom = textToDOM(txt)
        self.assertEqual(dom, """DIV({'class': 'section', 'id': 'description'},
 H1(null,
  A({'name': 'description'},
   "Description"
  )
 ),
 P(null,
  "As you probably know, the DOM APIs are some of the most painful Java-inspired
APIs you'll run across from a highly dynamic language.  Don't worry about that
though, because they provide a reasonable basis to build something that
sucks a lot less."
 )
);""")



if __name__ == '__main__':
    if len(sys.argv)>1:
        print(textToDOM(_getFileContent(sys.argv[1])))
    else:
        unittest.main()
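
The mixed-content handling in _getChildren depends on how ElementTree stores text: text before the first child lives in the parent's text attribute, while text that follows a child's closing tag lives in that child's tail attribute. A small sketch of this, assuming Python 3 and the standard-library xml.etree.ElementTree module, follows. The helper iter_content is hypothetical and only illustrates the traversal order.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<div>content<a/>more content</div>")
child = root[0]

# Text before the first child belongs to the parent element.
print(repr(root.text))    # 'content'
# Text after a child's closing tag is stored on that child as its tail.
print(repr(child.tail))   # 'more content'
# The empty <a/> element has no text of its own.
print(repr(child.text))   # None


def iter_content(elem):
    """Yield text chunks and child elements in document order."""
    if elem.text:
        yield elem.text
    for sub in elem:
        yield sub
        if sub.tail:
            yield sub.tail


parts = [p if isinstance(p, str) else "<%s>" % p.tag
         for p in iter_content(root)]
print(parts)  # ['content', '<a>', 'more content']
```

This is the same order in which the script emits the quoted strings and nested calls in its testMixedContent case.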


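TextMate Modification

For those of you using TextMate, modify the bottom chunk of the script as follows so that, when no filename is given, it converts HTML read from standard input instead of running the unit tests: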

if __name__ == '__main__':
    if len(sys.argv)>1:
        print(textToDOM(_getFileContent(sys.argv[1])))
    else:
        print(textToDOM(sys.stdin.read()))
        #unittest.main()
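
The same fallback can be tried outside an editor. The sketch below assumes Python 3 and assumes the editor pipes the selected text to the script's standard input. The function name read_source is hypothetical, and an in-memory stream stands in for the piped selection.

```python
import io
import sys


def read_source(argv, stdin=sys.stdin):
    """Return HTML from the file named in argv[1], or from stdin if absent."""
    if len(argv) > 1:
        with open(argv[1], "r") as fin:
            return fin.read()
    return stdin.read()


# Simulate an editor piping the selected text to the script on stdin.
selection = io.StringIO("<div><a/></div>")
print(read_source(["htmlToMochiDom.py"], stdin=selection))
```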

This page was last modified 20:38, 6 June 2007.