
Building Corpus Juris

Written by Raymond Santos Estrella on Tuesday, 26 June 2012. Posted in 2012

At the very heart of Corpus Juris is an idea, an ideal view of how easy it should be to find a legal text. There are a few obvious places to look on the web when you’re looking for laws or jurisprudence. Of course there are the official websites of the House of Representatives and the Senate. Then you have gov.ph and the Official Gazette. Then there are the unofficial sources, LawPhil at the forefront of them. For offline CD-based access you have CD Asia’s Lex Libris. But none of them look good to me. They’re all an ugly, unsightly mess. Why? Because there’s no one there who cares about the presentation of the written word. No one in their IT departments cares about things like proper typography or good web design. None of the aforementioned entities, I believe, cares as much as I do about presenting this huge wealth of information to the public in a manner appropriate to the medium they are presenting it in.

Well, I care.

And that is why I started this whole thing so long ago. It’s been almost four years since I first started taking steps in the direction that I believe will be in the best interest of the public with regard to being able to freely access these legal texts.

I’ve got something really awesome and mind-blowing for the next step in the site’s evolution. Scratch that last. I’m sure this will be a revolutionary change for this medium and I believe I’ve already got most of the tools I need to deliver this quickly. My only constraint at the moment seems to be time. I’m reviewing for the bar exams and so seriously short on time that I sometimes think I might miss this chance to work on the update.

I think I messed up on my first few iterations of the site. I was naïve in thinking that my current webhost would be able to deliver the kind of performance that the backend of a database-driven site needs. Don’t get me wrong, I truly believe that databases are clearly the future in terms of handling the content of the web and I’ve spent more than one night staying up debating the “regression” to a static-based website. However, until I can afford a better webhost that will not artificially and arbitrarily limit my MySQL and PHP memory, continuing down this path will certainly lead me to an extremely sluggish and poor-performing site, prone to breakdowns, rendering errors, and server instance crashes.

By going back to a static-based website design, I’ll at least ensure one thing: faster, easier updating. This is due to the removal of two of the more time-intensive parts of the current updating process:

  1. Transfer of edited HTML code into the CMS and then manipulating its date-time stamps, metadata, etc.; and
  2. Updating remote server databases with local server databases.

The former is time- and effort-intensive on my part, while the latter is time-intensive in terms of the bandwidth involved and, as I see it, will only get worse in the future as the database grows to ever larger sizes.

There are pitfalls to this approach, though, and I’ve tried to mitigate them with a lot more conscientious planning than I did with the original site design.

First is that the next design should stand the test of time for quite a while. I believe that the design should be frozen for at least two years, and even then it should be updated quite sparingly, probably only to take modest advantage of any new developments in HTML, CSS, and JavaScript.

You see, this design will be seen on ALL the pages, as if it were a template in a modern PHP-MySQL based CMS, only it will be deployed as-is on the HTML files themselves. This may seem like the same disastrous misstep you’ve probably seen on ugly sites like LawPhil and ChanRobles.

I’ve foreseen this obstacle and I think the best way to tackle it is to make a compromise between static HTML and server-rendered PHP. By making all pages hybrid PHP and HTML, I can still easily make changes to the underlying site design while keeping the core document intact and free from changes. This requires a certain modularity to the underlying non-document code.

By separating the code into header, navigation, footer, and main document, I can keep all my current work stable and just add the new code to call separate PHP files for the header, nav, footer, etc. Any changes to those non-document modules can be made in their respective files and not across the whole site. This makes the whole thing a lot more dynamic and any tweaks to the template will be much, much easier to do. In fact, I foresee that it won’t be too unlike using a CMS’s template.

If any of you have seen the CSS Zen Garden designs, this is the same principle at work, only on an expanded scale in that I’m not only changing the CSS but also certain non-document elements of the HTML as well. Thus, should the need arise that I have to add a new menu element, I can just change the navigation PHP file without making site-wide changes that would take maybe half an hour to propagate around the whole site by means of Dreamweaver’s find-replace function and the hours it would take to upload a copy of the whole site to the server.
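The post describes doing this with PHP files called from each page. As a rough illustration of the same modular idea, here is a minimal sketch in Python rather than PHP. The fragment contents and the `render_page` helper are hypothetical and are not the site's actual code.

```python
# Hypothetical shared fragments. On the site described above these would
# live in separate PHP files (header, nav, footer) called by every page.
FRAGMENTS = {
    "header": "<header><h1>Corpus Juris</h1></header>",
    "nav": '<nav><a href="/">Home</a></nav>',
    "footer": "<footer>Footer text</footer>",
}


def render_page(document_html, fragments=None):
    """Wrap an unchanged core document in the shared site fragments."""
    if fragments is None:
        fragments = FRAGMENTS
    parts = [
        fragments["header"],
        fragments["nav"],
        "<main>" + document_html + "</main>",
        fragments["footer"],
    ]
    return "\n".join(parts)


if __name__ == "__main__":
    doc = "<p>Text of a law goes here.</p>"
    print(render_page(doc))

    # Adding a menu item means editing one fragment, not every page.
    new_nav = '<nav><a href="/">Home</a> <a href="/laws">Laws</a></nav>'
    print(render_page(doc, dict(FRAGMENTS, nav=new_nav)))
```

The core document string is passed through untouched, which is the property the hybrid approach is after: only the shared fragments change when the template changes.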

It is in that last point that I find LawPhil’s implementation to be weak and myopic. I’ve looked at their site’s organization and it’s the old way of doing things. It would probably take over a day to do any major site redesign, which is probably why their site has stagnated the way it has and looks just about the same as it did back when people were still on dial-up. It’s shit. There’s a ton of information there but believe me, it’s shit underneath. Oh, have I mentioned the ugliness of the underlying code? Sure, everything renders reasonably correctly on modern browsers but underneath, it’s a plethora of code that I’m pretty sure would just fail any W3C HTML validation test. There are misused tags, unused tags, typographical errors, archaic tags, and just plain nastiness that gives me the creeps whenever I right-click on a page and select view page source. I did mention, however, that it renders okay but really, that’s a testament to the incredible work of the browser-makers on automatically fixing stupid programming errors on the part of web designers. That is nothing to be proud of. We’re in the 2010s and these guys are probably still coding in FrontPage 9x or worse, Notepad. It makes me cringe at the thought of how much waste is going into keeping their site going.

Second, I love optimizing code.

Even more than writing it from scratch, I love looking at someone else’s work and doing some editing. If I were really into writing and journalism, I think I would have loved to be an editor someday. As things stand, I absolutely love doing the programmer’s equivalent.

Looking at the web’s sources of these texts, you can see a lot of superfluous crud in them. Again, I’ll cite miscellaneous archaic tags that are no longer in any of the modern specs used for web pages. Please, guys, those BLINK tags are gone. They went the way of the dodo about the same time the internet bubble burst. BIG tags? Obsolete in HTML5, with sizing folded into CSS, and SMALL is no longer meant as a purely presentational tag either. And all these tab stops, extra spaces, double tags (<i><i>what?!</i></i>), copyright comments (<!-- hello -->), and multiple DIV nestings all add up to make bloated code.
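To make that kind of crud concrete, here is a small sketch that counts a few of the problems listed above using Python's standard `html.parser` module. The `SUSPECT_TAGS` set is an illustrative subset, and `CrudCounter` and `count_crud` are hypothetical names. This is not the tooling actually used on the site.

```python
from collections import Counter
from html.parser import HTMLParser

# Illustrative subset of tags flagged in the text, not an exhaustive list.
SUSPECT_TAGS = {"blink", "big"}
# Void elements never get a closing tag, so they are kept off the stack.
VOID_TAGS = {"br", "hr", "img", "meta", "link", "input"}


class CrudCounter(HTMLParser):
    """Count obsolete tags, self-nested tags, and comments in a page."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()
        self._open = []  # stack of currently open tag names

    def handle_starttag(self, tag, attrs):
        if tag in SUSPECT_TAGS:
            self.counts["obsolete:" + tag] += 1
        if tag in VOID_TAGS:
            return
        # Same tag opened while it is already the innermost open element,
        # as in <i><i>...</i></i>.
        if self._open and self._open[-1] == tag:
            self.counts["doubled:" + tag] += 1
        self._open.append(tag)

    def handle_endtag(self, tag):
        if tag in self._open:
            # Pop back to (and including) the matching open tag.
            while self._open.pop() != tag:
                pass

    def handle_comment(self, data):
        self.counts["comment"] += 1


def count_crud(html):
    parser = CrudCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


if __name__ == "__main__":
    sample = "<p><i><i>what?!</i></i><blink>x</blink><!-- hello --></p>"
    for key, value in sorted(count_crud(sample).items()):
        print(key, value)
```

A counter like this only reports problems. Actually cleaning a page (removing the tags while keeping the text) is a separate and more delicate step.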

Try to open the source code of a document from the Supreme Court itself (sc.judiciary.gov.ph) and you’ll see what I’m talking about. Yes, that’s actually a document written in Microsoft Word (which is fine) and then saved as a webpage using the File » Save As… dialog. Any smart guy can see all this extraneous information that adds nothing to the document. I took about ten of these pages, totaling about 2 MB altogether, and optimized them. I removed all the detritus Word puts in, separated the document from its presentational elements, which all went into a single CSS file shared by all the files, and just plain cleaned it up so that it would parse correctly on any modern browser and pass HTML validation tests. That original 2 MB dropped to around 300 KB. That’s an 85% decrease in size!

By my calculations (yes, there’s math to back this up which I won’t go into right now), all published jurisprudence and laws up to, say, R.A. No. 10,000 would hover around the 1.2 GB mark when expressed in an XML-like language. This includes any HTML presentational markup tags such as boldface, italics, superscripts and subscripts, footnotes and endnotes, paragraph breaks and alignments. Left to the other guys and their stupid programming skills, this would be worth 8 GB and still look ugly. This is assuming the 85% reduction, so that 1.2 GB is the 15% that remains. Even if we were to be conservative and assume only an 80% reduction, the other guys would still have about 6 GB of files! This just can’t be right.
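The size figures above can be checked with a few lines of arithmetic. This sketch assumes decimal units (2 MB taken as 2000 KB), and the function names are made up for illustration. The 1.2 GB estimate itself is taken from the text as given and is not derived here.

```python
def reduction(original, optimized):
    """Fraction of the original size removed by optimization."""
    return 1 - optimized / original


def unoptimized_size(optimized, reduction_fraction):
    """Size before optimization, given the optimized size and the reduction."""
    return optimized / (1 - reduction_fraction)


if __name__ == "__main__":
    # Ten Word-exported pages: about 2 MB (assumed 2000 KB) down to 300 KB.
    print(round(reduction(2000, 300), 2))         # 0.85

    # 1.2 GB optimized corpus, scaled back up at 85% and 80% reductions.
    print(round(unoptimized_size(1.2, 0.85), 2))  # 8.0
    print(round(unoptimized_size(1.2, 0.80), 2))  # 6.0
```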

Indeed, in this day and age, people don’t care much about the bloat. We have ever faster processing, bigger storage, and faster connections but not everyone has access to these advances in technology. Some still have thin pipes or are in places or situations where a suboptimal internet connection is the norm (think mobile devices and slow cellular data service). It is specifically for these instances, secondarily, that I have endeavored to make documents on Corpus Juris as lean and optimized as possible. I want it to be as miserly with bandwidth and computational resources as is feasible.

Remember I mentioned the thin pipe as a secondary reason for optimization. What’s my main motivation? It seems silly to say it now but hear me out: it’s because I couldn’t sleep if it were otherwise. Honestly, it may very well be an obsessive-compulsive thing. I know in most cases, just having validated code is more than enough. But truly, I would think about it all night, and probably be driven nuts, if I knew there was more performance I could eke out of the code by making it faster, smaller, and more precise. I know hardly anyone will ever look at the HTML code itself, but that’s not the point. If I’m sure I’ve made good, optimized code behind everything rendered on screen, then I’ll be able to sleep soundly at night knowing that I have this exquisite code work sitting underneath it all. Most will not understand this. But I am sure good programmers, like good craftsmen, care about keeping things neat, even in the parts that no one else will ever see.

There is, however, another more tangible benefit to all this effort: search. In this day and age, one can never be too invested in ensuring that pages are well indexed by all the major search engines. This is at the heart of it all. Not everyone really knows how to find what they’re looking for and there’s only a limited number of things one can do to help (indices, search books, date organization, tables, etc.) but nothing beats a good search engine.

Now, it’s almost a given that properly formatted HTML will be a big plus against improperly formatted HTML when it comes to search rankings. It just goes a long way toward ensuring that the information can be found by people looking for it. At the end of the day, I guess that’s the whole point to this whole thing, isn’t it? Ensure that the public can easily access and find this invaluable information.


