Saturday, August 04, 2007

Implementing Comments + A Lightweight CMS Data Format

So I followed Lee's suggestion from my last article and had a look at Tim Bray's article on implementing a comment system for posts without a database. It describes a file-based approach with three folders where comment fragments live: incoming, rejected, and accepted.

My initial reaction was that this is massive overkill for the task. On further reflection I have come to the same conclusion. :-)

Instead I think I might write all of the comments for a given post to a single file. Since editing and deletion of comments will be uncommon activities, contention should not be an issue. Maybe you wouldn't want to run Slashdot on such a system, but it should do for any site with a normal volume of comments.

When a reader contributes a comment it'll be threaded in the appropriate physical place in the file, but marked with the status "incoming". This action might optionally send the editor an email. An editorial function will be available to view all such comments. They can be "approved" or "rejected", with the status marked accordingly. When the article is built for output to normal (non-editor) readers, only approved comments will be gathered.
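
The moderation flow above could be sketched roughly like this. It is only an illustration under my own assumptions, not a final design: the field names (`id`, `status`, `replies`), the nested-list layout, and every function name are made up for the example, and the optional email to the editor is left out.

```python
VALID_STATUSES = ("incoming", "approved", "rejected")


def walk(thread):
    """Yield every comment in the thread, depth first, in stored order."""
    for comment in thread:
        yield comment
        yield from walk(comment["replies"])


def add_comment(thread, author, text, parent_id=None):
    """Store a new comment under its parent, marked "incoming"."""
    new_id = max((c["id"] for c in walk(thread)), default=0) + 1
    comment = {"id": new_id, "author": author, "text": text,
               "status": "incoming", "replies": []}
    if parent_id is None:
        thread.append(comment)  # top-level comment on the post
    else:
        parent = next((c for c in walk(thread) if c["id"] == parent_id), None)
        if parent is None:
            raise KeyError(parent_id)
        parent["replies"].append(comment)  # threaded next to its parent
    return new_id


def set_status(thread, comment_id, status):
    """Editorial action: mark a comment approved or rejected."""
    if status not in VALID_STATUSES:
        raise ValueError(status)
    for comment in walk(thread):
        if comment["id"] == comment_id:
            comment["status"] = status
            return
    raise KeyError(comment_id)


def pending(thread):
    """Comments the editor still has to review."""
    return [c for c in walk(thread) if c["status"] == "incoming"]


def approved_only(thread):
    """Copy of the thread for normal readers: approved comments only.

    Assumption: replies to a comment that is not approved are hidden too.
    """
    return [dict(c, replies=approved_only(c["replies"]))
            for c in thread if c["status"] == "approved"]
```

Rejected comments stay in the data, so an editor can still change their mind later; they are simply skipped when the reader view is built.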

Physically, comments will be stored like this, using my previous file example:
/article/2007/07/21/My_New_Article.dat
/article/2007/07/21/Another_Article.dat
/article/2007/07/21/Another_Article.comment.dat
/article/2007/07/22/Short_Dissertation.dat
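
Going by the listing above, the comment file sits next to its article and differs only in the `.comment.dat` suffix. A small sketch of that mapping, where the helper name `comment_path` is hypothetical:

```python
from pathlib import PurePosixPath


def comment_path(article_path):
    """Return the path of the comment file that sits beside an article file."""
    article = PurePosixPath(article_path)
    # Another_Article.dat -> Another_Article.comment.dat
    return article.with_name(article.stem + ".comment.dat")


print(comment_path("/article/2007/07/21/Another_Article.dat"))
```

An article with no comments yet, such as My_New_Article.dat in the listing, simply has no such file.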


I'd rather have 30 comments in a single file, regardless of their status, than have them in 30 different files scattered over three folders.

As far as formatting goes, I followed up on Helge's recommendation for YAML. Believe it or not, this looked too heavyweight for me; I may as well use XML. But if you do need a pure-Python YAML implementation, get PyYAML. If you prefer a wrapper around a C library that does the same job, use PySyck, which provides Windows binaries.

A much simpler format is JSON, which provides only two container structures: arrays (lists, in Python terms) and objects (dictionaries), holding strings, numbers, booleans and null. But that's enough for pretty well anything I can think of. It is minimal, well-formed, readable and open. Not to mention the parsers are about a thousand times simpler and a hundred times faster than those for XML. (Yes, I am picking these numbers out of thin air.)
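
To make the lists-and-dictionaries point concrete, here is a minimal sketch of a comment thread written to and read back from a file. It uses the `json` module from the Python standard library purely for illustration, not any particular library I might end up choosing, and the field names and the helpers `save_comments` and `load_comments` are assumptions for the example.

```python
import json


def save_comments(path, comments):
    """Write the whole comment thread for one post to a single file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(comments, f, indent=2)


def load_comments(path):
    """Read the comment thread back as plain lists and dictionaries."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# A thread is a list of dictionaries; replies nest as further lists.
example = [
    {"author": "Ann", "status": "approved", "text": "Nice post", "replies": [
        {"author": "Bob", "status": "incoming", "text": "Agreed", "replies": []},
    ]},
]
```

Because the structure is nothing but lists, dictionaries and strings, what comes back from `load_comments` compares equal to what went into `save_comments`.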

This comparison of the Python implementations reveals that Python-cjson is way fast but demjson is more tolerant of edge cases. The latter should be fine for the relatively small data payloads I need, and has the benefit of being a pure-Python implementation.

So it looks like I'm heading down the rosy JSON path. I still reserve the right to use XML in the content of the JSON data structures. We'll see if that is necessary.
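
If XML does turn out to be necessary, one way it could sit inside the JSON is as an ordinary string value that gets parsed separately. This is a sketch under that assumption; the `body` field and the sample markup are invented for the example.

```python
import json
import xml.etree.ElementTree as ET

# The comment body is a well-formed XML fragment stored as a JSON string.
record = {"author": "Ann", "status": "approved",
          "body": "<p>Nice <em>article</em>!</p>"}

encoded = json.dumps(record)   # JSON text, markup escaped as needed
decoded = json.loads(encoded)  # back to a dictionary

body = ET.fromstring(decoded["body"])   # parse the embedded fragment
plain_text = "".join(body.itertext())   # text content without the tags
```

The JSON layer never needs to understand the markup; it only carries it.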
