HtmlSource - a new DBO driver for CakePHP

Posted by Jad on December 06, 2007

OK, OK, I’ve been slacking on this blog again, but I’ll save that for another post, where I’ll announce some major changes I’ve been thinking about lately. For today, I’d like to introduce a new DBO source driver: HtmlSource. It is fully functional but still lacks some of the features I have planned for it.

So what’s an HTML DBO driver, you ask?

Simply put, it’s a way to treat any HTML page like a database and be able to retrieve (scrape) certain parts using an SQL-like command:

SELECT href, title FROM a WHERE class="submit"
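To make the idea concrete, here is a rough sketch of what that query asks for, written with PHP’s built-in DOM extension. This is not how HtmlSource is implemented; `selectFromHtml()` and the sample markup are made up for illustration, the code assumes PHP 7 or later, and the WHERE clause is treated as an exact attribute match.

```php
<?php
// Hypothetical helper, not part of HtmlSource: roughly what
// SELECT href, title FROM a WHERE class="submit" asks for.
function selectFromHtml(string $html, string $tag, array $fields, array $where): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; keep parser warnings quiet while loading.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $rows = [];
    foreach ($doc->getElementsByTagName($tag) as $node) {
        // WHERE: every listed attribute must match exactly.
        foreach ($where as $attr => $value) {
            if ($node->getAttribute($attr) !== $value) {
                continue 2; // condition failed, skip this element
            }
        }
        // SELECT: pick the requested attributes ("" if one is missing).
        $row = [];
        foreach ($fields as $field) {
            $row[$field] = $node->getAttribute($field);
        }
        $rows[] = $row;
    }
    return $rows;
}

// Made-up sample page with one matching and one non-matching link.
$html = '<html><body><a class="submit" href="/save" title="Save">Save</a>'
      . '<a class="cancel" href="/back" title="Back">Back</a></body></html>';

print_r(selectFromHtml($html, 'a', ['href', 'title'], ['class' => 'submit']));
```

Each matching element comes back as one "row" keyed by the selected attribute names, which is the shape a database-style result set would have. An element with several classes (say `class="submit big"`) would not match this exact comparison.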


Domain TLD Parser

Posted by Jad on September 17, 2007

Parsing URLs in PHP isn’t perfect. Don’t get me wrong, it does the job when it comes to breaking a URL into its logical parts, but it has no option to split the host into domain name, TLD and sub-domain(s). Most probably that is because new TLDs come out from time to time, and the PHP developers want to avoid updating the parsing function with every new TLD release.

To overcome this limitation, and because I needed a way to extract the domain, sub-domain and TLD from any given URL, I came up with the following class: Domain TLD Parser

It parses hosts with all kinds of different TLDs, even the country-specific ones like ‘.co.za’, ‘.ne.jp’ or ‘.ltd.uk’. Here is an example:

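<?php
// Load the class; stop right away if the file is missing.
require_once '/path/to/domain_tld_parser.class.php';

// HTTP_REFERER is optional and client-supplied, so fall back to an empty string.
$url = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '';

$domain = new DomainTldParser();
echo '<pre>';
// Escape the dump, since the referer is untrusted input.
echo htmlspecialchars(print_r($domain->parse($url), true));
echo '</pre>';
?>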
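
For a feel of what splitting a host involves, here is a minimal sketch of the general technique: match the host against a list of known suffixes, longest first. It is not the Domain TLD Parser class itself; `splitHost()`, the short suffix list and the example URL are assumptions made up for illustration, and the code assumes PHP 7.1 or later.

```php
<?php
// Hypothetical sketch, not the DomainTldParser class: split a host using a
// hand-picked suffix list. A real parser needs a much longer, maintained list.
function splitHost(string $host, array $suffixes): ?array
{
    $host = strtolower(rtrim($host, '.'));
    // Try longer suffixes first so "co.za" wins over a plain "za".
    usort($suffixes, function ($a, $b) {
        return strlen($b) - strlen($a);
    });

    foreach ($suffixes as $suffix) {
        $tail = '.' . $suffix;
        // The host must end with ".suffix" and have something in front of it.
        if (strlen($host) > strlen($tail) && substr($host, -strlen($tail)) === $tail) {
            $labels = explode('.', substr($host, 0, -strlen($tail)));
            $domain = array_pop($labels); // last label before the suffix
            return [
                'subdomain' => implode('.', $labels),
                'domain'    => $domain,
                'tld'       => $suffix,
            ];
        }
    }
    return null; // suffix not in the list
}

// Illustrative list only; it covers the examples mentioned above and little else.
$suffixes = ['com', 'uk', 'za', 'jp', 'co.za', 'ne.jp', 'ltd.uk'];
$host = parse_url('http://blog.example.co.za/some/page', PHP_URL_HOST);
print_r(splitHost($host, $suffixes));
```

Sorting the suffixes by length is what keeps ‘.co.za’ from being read as a domain called "co" under ‘.za’. The weak spot is the list itself, which has to be kept current as new TLDs appear.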