Web Junk


INS and DEL elements re-discovered and used for version control. HTML document's version history kept within itself.

HTML pages contain the structure and store their content in separate files. This is easier for everyone to deal with and splits your page into logical pieces.

HTML Version Control

Summary: INS and DEL elements re-discovered and used for version control. HTML document's version history kept within itself.

I re-discovered the INS and DEL elements in the HTML 4.01 specification last week and realized that they could be used for version control of an HTML document. The document itself could be a history of all the changes made to it. The key piece that makes this possible is the datetime attribute of the elements. If it is unspecified, you don't know when a change was made, so all changes blend together and this method of version control becomes impossible. The W3C did the elements a great disservice by supplying examples in the specification that lack the datetime attribute.
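To make the role of the datetime attribute concrete, here is a minimal sketch in Python (not the jQuery code behind the demo) that scans a document, collects the distinct change timestamps, and counts INS/DEL elements that cannot be placed in the history because they carry no datetime. The names `ChangeScanner` and `scan_changes` and the sample markup are made up for illustration, and sorting the stamps as strings assumes they all share one format and time zone.

```python
from html.parser import HTMLParser


class ChangeScanner(HTMLParser):
    """Collect datetime values from INS/DEL elements and count undated ones."""

    def __init__(self):
        super().__init__()
        self.timestamps = set()
        self.undated = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag not in ("ins", "del"):
            return
        stamp = dict(attrs).get("datetime")
        if stamp:
            self.timestamps.add(stamp)
        else:
            self.undated += 1


def scan_changes(html):
    """Return (sorted list of change timestamps, number of undated changes)."""
    scanner = ChangeScanner()
    scanner.feed(html)
    scanner.close()
    return sorted(scanner.timestamps), scanner.undated


# Hypothetical sample: the deletion of "noon" has no datetime attribute.
sample = (
    '<p>The meeting is on <del datetime="2009-03-02T10:00:00Z">Monday</del>'
    '<ins datetime="2009-03-02T10:00:00Z">Tuesday</ins> at '
    '<del>noon</del><ins datetime="2009-03-05T09:30:00Z">1pm</ins>.</p>'
)
versions, undated = scan_changes(sample)
print(versions)
print(undated)
```

Each distinct timestamp can be treated as one version of the document; anything counted in `undated` is a change with no known place in the timeline.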

So I hacked on some code that uses jQuery and JavaScript to go through a versioned HTML document and apply the changes from various days. You can see it in action here. You select a day and then click "Apply Changes Up Until This Date" to cycle through all of the changes made up until that date. Any text highlighted in red has not yet been deleted; it will be deleted at a future date. Any text in green was inserted as part of a change. Click "Apply All Changes" to have it cycle through all the versions and apply their changes. This will turn the HTML document into the latest version.
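The same idea can be sketched outside the browser. The Python below is an illustration under stated assumptions, not the demo's code: it rebuilds the document as it stood at a cutoff date by keeping an insertion only once its datetime has passed and keeping a deletion's text only until its datetime has passed. `VersionRenderer` and `render_as_of` are hypothetical names, stamps are assumed to be ISO 8601 with a time zone, undated changes are treated as never applied, and comments and doctypes are dropped for brevity.

```python
from datetime import datetime
from html.parser import HTMLParser


def parse_stamp(value):
    # Assumes ISO 8601 with a zone; "Z" is rewritten so fromisoformat accepts it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


class VersionRenderer(HTMLParser):
    """Re-emit a document with INS/DEL changes applied up to a cutoff."""

    def __init__(self, cutoff):
        super().__init__(convert_charrefs=False)
        self.cutoff = cutoff
        self.out = []
        self.hidden = []  # one flag per open ins/del: True means "suppress content"

    def _suppressed(self):
        return any(self.hidden)

    def handle_starttag(self, tag, attrs):
        if tag in ("ins", "del"):
            stamp = dict(attrs).get("datetime")
            # Assumption: a change without a datetime counts as not yet applied.
            applied = stamp is not None and parse_stamp(stamp) <= self.cutoff
            # An insertion shows once applied; a deletion shows until applied.
            visible = applied if tag == "ins" else not applied
            self.hidden.append(not visible)
            return
        if not self._suppressed():
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        if tag not in ("ins", "del") and not self._suppressed():
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in ("ins", "del"):
            if self.hidden:
                self.hidden.pop()
            return
        if not self._suppressed():
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self._suppressed():
            self.out.append(data)

    def handle_entityref(self, name):
        if not self._suppressed():
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if not self._suppressed():
            self.out.append(f"&#{name};")


def render_as_of(html, cutoff):
    """Return the document text as it stood at the given ISO 8601 cutoff."""
    renderer = VersionRenderer(parse_stamp(cutoff))
    renderer.feed(html)
    renderer.close()
    return "".join(renderer.out)


# Hypothetical document with two dated rounds of changes.
document = (
    '<p>The meeting is on <del datetime="2009-03-02T10:00:00Z">Monday</del>'
    '<ins datetime="2009-03-02T10:00:00Z">Tuesday</ins> at '
    '<del datetime="2009-03-05T09:30:00Z">noon</del>'
    '<ins datetime="2009-03-05T09:30:00Z">1pm</ins>.</p>'
)
for day in ("2009-03-01T00:00:00Z", "2009-03-03T00:00:00Z", "2009-03-06T00:00:00Z"):
    print(day, render_as_of(document, day))
```

Passing a cutoff later than every stamp corresponds to "Apply All Changes": the result is the latest version with the INS and DEL wrappers stripped.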

This was all a hack and is not very well explained on the page itself. I'm starting to code a PHP script that will generate the HTML document as it stood on various dates, to make it much easier to see the differences.

I was excited about this because it allows version control software to become aware of the structure of HTML. Instead of the software having to define new methods and data structures to save changes made to an HTML document, it can use the HTML document itself. For example, SVN and Git store meta-data about the files in separate folders like .svn and .git. Using the INS and DEL elements, you can leave all the changes in the document; the meta-data is already in the document. The advantage is that you can move the HTML document around and it will always contain its own history of changes. You could have this by default and you wouldn't have to worry about saving files with names like "my_document.html.version2" or "my_document.html.version3".

The trouble with HTML is that you need to wear a fucking biohazard suit to deal with it. You should not, and really cannot, touch it directly; there are too many things that can go wrong. It's the same thing with XML. If you are writing it by hand, something will inevitably go wrong. You'll forget to close a tag, or one of the attribute values will be incorrect, or you'll just get confused by it all. We no longer use regular expressions to parse HTML and XML; we use proper parsers instead. Unfortunately we haven't gotten to that point with regular web design, where the visual editors usually produce awful code.

I'll be creating a biohazard suit for editing HTML that will have integrated version control. This is a spare time project, just like everything else I work on, so don't expect too much too soon. Dealing with the versions and grouping the changes won't be difficult; the hard part is the user-facing portion, where the versions will be created by editing the document. I'm thinking of using an already existing HTML/rich-text editor like TinyMCE or whatever it's called.

Assembling An HTML Page Out Of Components

Summary: HTML pages contain the structure and store their content in separate files. This is easier for everyone to deal with and splits your page into logical pieces.

I hacked on another web junk project last night that separates content from structure. Current templating systems all separate content and structure from the presentation layer, but they don't go the one step further that makes it all worthwhile. When I'm working on a website, there are portions of it where particular content goes. I want to write out this content as plain-text, maybe with a few HTML tags like italics and bold inserted, and then have it inserted into a particular part of a webpage.

This isn't too hard, templating systems already do the find/replace stuff. However, I wanted the plain-text to be processed and HTML tags to be added to it automatically. Basically markup without me doing any markup. For example, I have a file whose content is used as a list. Every paragraph in the file is a list item in an unordered list. In another file, every paragraph should fit into an HTML paragraph tag.
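As a rough sketch of that idea (in Python, and not the project's actual code), the structure below is a template with named slots, and each slot is filled from a plain-text source run through its own processor: one turns every paragraph into a list item, the other wraps every paragraph in a paragraph tag. The slot names, sample text and function names are all hypothetical, and the text is held in strings here where a real setup would read it from the separate files. Content is inserted as-is so that inline tags like italics survive, which assumes the files are trusted.

```python
import re
from string import Template


def split_paragraphs(text):
    """Split plain text into paragraphs separated by one or more blank lines."""
    blocks = re.split(r"\n\s*\n", text.strip())
    # Collapse line breaks inside a paragraph into single spaces.
    return [" ".join(block.split()) for block in blocks if block.strip()]


def as_unordered_list(text):
    """Every paragraph becomes one item of an unordered list."""
    items = "".join(f"<li>{p}</li>" for p in split_paragraphs(text))
    return f"<ul>{items}</ul>"


def as_paragraphs(text):
    """Every paragraph is wrapped in an HTML paragraph tag."""
    return "".join(f"<p>{p}</p>" for p in split_paragraphs(text))


def assemble(structure, slots):
    """slots maps a placeholder name to (plain text, processor function)."""
    rendered = {name: process(text) for name, (text, process) in slots.items()}
    return structure.substitute(rendered)


# Hypothetical page structure with two content slots.
PAGE = Template('<div id="features">$features</div><div id="about">$about</div>')

features_txt = "Fast to edit\n\nNo markup to learn\n"
about_txt = "We build <em>small</em> tools.\n\nWrite to us\nany time.\n"

page = assemble(PAGE, {
    "features": (features_txt, as_unordered_list),
    "about": (about_txt, as_paragraphs),
})
print(page)
```

The structure never changes when a collaborator edits the text; only the processor chosen for each slot decides what markup the paragraphs receive.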

The advantage for me is that if I have to collaborate with someone, I can send them the plain-text file and they can edit it without learning some fancy markup language or HTML. All they need to do is be able to write out sentences and paragraphs. Then I just drop the file back in and the proper content is generated. This is much easier than going through the file manually, looking for content differences and adding HTML tags.

This requires a biohazard suit for generating the HTML and for processing the plain-text content. The classes for processing the plain-text content are finite state machines: they read a line, check for some conditions and then do something. I'll post the code soon.
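Until the real code is posted, here is a guess at what such a line-reading state machine could look like, written in Python purely for illustration. It has two states, between paragraphs and inside one; each line is read, checked for being blank or not, and the machine either opens a tag, appends text, or closes the tag. The class name `ParagraphMachine` and the `process` helper are hypothetical.

```python
class ParagraphMachine:
    """Two-state machine: BETWEEN paragraphs or INSIDE one."""

    BETWEEN, INSIDE = "between", "inside"

    def __init__(self, open_tag, close_tag):
        self.open_tag = open_tag
        self.close_tag = close_tag
        self.state = self.BETWEEN
        self.output = []

    def feed_line(self, line):
        text = line.strip()
        if self.state == self.BETWEEN:
            if text:
                # First non-blank line starts a new paragraph.
                self.output.append(self.open_tag + text)
                self.state = self.INSIDE
            # Blank lines between paragraphs are ignored.
        else:
            if text:
                # Continuation line: join it to the paragraph with a space.
                self.output.append(" " + text)
            else:
                # Blank line ends the current paragraph.
                self.output.append(self.close_tag)
                self.state = self.BETWEEN

    def finish(self):
        """Close a paragraph left open at end of input and return the HTML."""
        if self.state == self.INSIDE:
            self.output.append(self.close_tag)
            self.state = self.BETWEEN
        return "".join(self.output)


def process(text, open_tag, close_tag):
    machine = ParagraphMachine(open_tag, close_tag)
    for line in text.splitlines():
        machine.feed_line(line)
    return machine.finish()


list_items = process("First item\n\nSecond item,\ncontinued\n", "<li>", "</li>")
print(list_items)
```

Swapping the tag pair is enough to reuse the same machine for list items or paragraphs; a processor with more kinds of blocks would add states rather than more conditions per line.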

The hard part isn't inserting the content into the structure; it's deciding how finely you want to split up the content. One section could be split into many different sub-sections and each sub-section could contain several paragraphs. Traditional CMSes (Content Management Systems) work on a per-page basis. They'll let you edit all of the content for the main part of the page, but they won't let you split that content into sub-sections or edit those separately.

Anyway, I'll have to write up some documentation and a tutorial on exactly how this all works. It'll probably be released under the GNU GPL version 3 license.
