UGN Security
Posted By: Gremelin Utilizing SE Friendly URLs in PHP - 07/01/07 04:04 AM
Disclaimer
Please note that all code contained in this thread was created by, and belongs to, VNC Web Design and Development. It may be freely used for non-commercial purposes, with the agreement that credit will be visibly identifiable within your application and within the code. Commercial applications must obtain permission before utilizing the code.

This article may be reproduced and republished with prior written permission from myself. The code is provided here as an example, but it should fully work "out of the box".

Onward!
One thing that I've always been an advocate of is web standardization. A loose part of this is SE Friendly URLs, which do away with certain characters in page URLs that Search Engines (or poorly coded web browsers) dislike.

These characters include, but are not limited to:
&, ?, =

There are several ways to go about this, and I'll introduce two that I've used. These aren't the only ways to do it, but they are rather simple and efficient.

Option A, mod_rewrite
This is not necessarily my favorite method, but it works and it works well; in fact, our IRC information page here on UGN utilizes it.

Please note that there are MANY ways to do this via mod_rewrite, and I'm sure there are more efficient ways than the one I use below, but this is a good starting point for allowing SE Friendly URLs via mod_rewrite:

Example .htaccess entry:
Code
# Tell mod_rewrite we're wanting to utilize it
RewriteEngine on

# SE Friendly URLs
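RewriteRule ^irc/([^/]+)/([^/]+)\.php$ /irc.php?section=$1&channel=$2 [L]


This allows SE Friendly URLs on a script named irc.php: the first ([^/]+) captures the "section" as $1 and the second ([^/]+) captures the "channel" as $2. The pattern is anchored and the dot is escaped so that only paths ending in a literal .php match, and [L] stops further rule processing once the rule applies.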

So, accessing the page as:
http://www.undergroundnews.com/irc/chat/staff.php

You'll see that the section is "chat" and the channel is #staff. The .php is just there as a virtual extension and isn't strictly needed (but it is there nonetheless).
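
To make the capture groups concrete, here is a rough PHP sketch that mimics what such a rule does to a request path. The function name map_irc_url is hypothetical, and the one-segment pattern used here is an assumption; Apache does this work itself, so nothing like it is needed in a real setup.

```php
<?php
// Hypothetical helper: mimics an irc/<section>/<channel>.php rewrite in plain PHP.
// Returns the internal target URL, or null when the path does not match.
function map_irc_url($path) {
	// Two path segments after irc/, then a literal ".php" extension.
	if (preg_match('#^irc/([^/]+)/([^/]+)\.php$#', $path, $m)) {
		return "/irc.php?section=" . $m[1] . "&channel=" . $m[2];
	}
	return null;
}

echo map_irc_url("irc/chat/staff.php"), "\n"; // /irc.php?section=chat&channel=staff
```

A path without the two segments, such as irc/chat, returns null here, which corresponds to Apache leaving the request untouched.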

Option B, path_info
Another way of doing this is via the PATH_INFO server variable; I like this method more, as it lets every option be handled in PHP, is a lot easier to adjust, and is far more powerful.

Users of Apache2 will at times need to "turn on" path info in their .htaccess file:
Code
AcceptPathInfo On


The code I tend to use for reading Path Info in PHP is:
Code
// Path Info Translation
// ------------------------------
// Take the path from the URL.
	$path = isset($_SERVER["PATH_INFO"]) ? strip_tags(addslashes(htmlspecialchars($_SERVER["PATH_INFO"]))) : "";

// Build an array from the path.
	$translation = preg_split("/[\/]+/", $path);
	unset($translation['0']);


This reads the path after your script name (in this case articles.php) and splits it into an array. Once the array is built, the first element is unset, as it will always be empty (the path starts with a slash), so there is no point in keeping it.
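
To see what the split leaves behind, here is a small sketch that wraps the same two lines in a function. The name split_path_info and the sample path are assumptions for illustration only.

```php
<?php
// Hypothetical wrapper around the preg_split / unset pair.
function split_path_info($path) {
	// Split on one or more slashes; a leading slash yields an empty first element.
	$translation = preg_split("/[\/]+/", $path);
	unset($translation[0]);
	return $translation;
}

print_r(split_path_info("/category/21/page/1"));
// Keys start at 1: [1] => category, [2] => 21, [3] => page, [4] => 1
```

Because the pattern matches runs of slashes, an accidental double slash in the URL does not create an empty element in the middle of the array.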

Now, translating these into our script is done via:
Code
// Split the array into useable chunks.
	if($translation["1"] == "category") { $category = (int)$translation["2"]; }
	elseif($translation["1"] == "task") { $task = strip_tags(addslashes(htmlspecialchars($translation["2"]))); }
	elseif($translation["1"] == "article") { $article = (int)$translation["2"]; }
	if($translation["3"] == "page") { $page = (int)$translation["4"]; }


This basically reads: if element 1 of the array (the first path segment) is one of the 3 possible keywords (category, task, or article), pass the value of element 2 to the script. If element 3 is the keyword "page", pass the value of element 4 to the script as the page number.

This looks like one of the following:
articles.php/category/21/page/1
articles.php/category/21
articles.php/article/54
articles.php/task/rss
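
The translation step can also be wrapped in a function that returns every value at once. This is only a minimal sketch: parse_route is a hypothetical helper, and returning null for missing pieces is my assumption rather than something the code above does.

```php
<?php
// Hypothetical helper: turns the split path into named values.
// Missing pieces come back as null instead of raising notices.
function parse_route(array $translation) {
	$route = array("category" => null, "task" => null, "article" => null, "page" => null);
	$key   = isset($translation[1]) ? $translation[1] : "";
	$value = isset($translation[2]) ? $translation[2] : "";

	if ($key == "category")    { $route["category"] = (int)$value; }
	elseif ($key == "task")    { $route["task"] = strip_tags(addslashes(htmlspecialchars($value))); }
	elseif ($key == "article") { $route["article"] = (int)$value; }

	// Optional trailing /page/N pair.
	if (isset($translation[3], $translation[4]) && $translation[3] == "page") {
		$route["page"] = (int)$translation[4];
	}
	return $route;
}

$route = parse_route(array(1 => "category", 2 => "21", 3 => "page", 4 => "1"));
echo $route["category"], " / ", $route["page"], "\n"; // 21 / 1
```

Keeping the values in one array avoids relying on variables that may never have been set when the URL has no matching segment.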

You could also pass a virtual extension (.html, .php, etc.) if you'd like; however, you'd want to make sure the script filters it out so it isn't passed to the parser.
Posted By: Gremelin Re: Utilizing SE Friendly URLs in PHP - 07/01/07 04:11 AM
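
One cautious way to filter a virtual extension is to remove it only when it sits at the very end of the path. The sketch below assumes a hypothetical strip_virtual_extension helper and an example list of extensions; neither comes from the code in this thread.

```php
<?php
// Hypothetical helper: drops one trailing virtual extension from the path.
// The extension list is an assumption; adjust it to whatever you advertise.
function strip_virtual_extension($path, array $extensions = array(".html", ".rss")) {
	foreach ($extensions as $ext) {
		$len = strlen($ext);
		if ($len > 0 && substr($path, -$len) === $ext) {
			return substr($path, 0, -$len);
		}
	}
	return $path;
}

echo strip_virtual_extension("/article/54.html"), "\n"; // /article/54
echo strip_virtual_extension("/task/rss"), "\n";        // /task/rss
```

Matching only the tail means a segment that merely contains the text of an extension is left alone.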
Originally Posted by Gizmo
Code
// Split the array into useable chunks.
	if($translation["1"] == "category") { $category = (int)$translation["2"]; }
	elseif($translation["1"] == "task") { $task = strip_tags(addslashes(htmlspecialchars($translation["2"]))); }
	elseif($translation["1"] == "article") { $article = (int)$translation["2"]; }
	if($translation["3"] == "page") { $page = (int)$translation["4"]; }


I figure I should explain the "security" I have in place here and what the different items mean:

(int) casts the value to an integer; input with no leading digits becomes 0
strip_tags strips any HTML and PHP tags
addslashes adds a \ before single quotes, double quotes, backslashes, and NUL bytes
htmlspecialchars converts the HTML special characters (&, <, >, and ") to their entity equivalents (&amp;, &lt;, &gt;, and &quot;)
Posted By: Gremelin Re: Utilizing SE Friendly URLs in PHP - 07/01/07 06:57 AM
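
A quick way to see the effect of each item is to run it on some sample input. The calls below are all standard PHP built-ins; the input strings are made up for illustration.

```php
<?php
// Each call is a PHP built-in; the inputs are invented examples.
var_dump((int)"21abc");                 // int(21): only the leading digits survive
var_dump((int)"abc");                   // int(0): no leading digits
echo strip_tags("<b>rss</b>"), "\n";    // rss
echo addslashes("it's"), "\n";          // it\'s
echo htmlspecialchars("<a & b>"), "\n"; // &lt;a &amp; b&gt;
```

Note that when htmlspecialchars runs first, as in the nested call quoted above, the angle brackets are already encoded by the time strip_tags sees the string, so strip_tags has nothing left to remove.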
For anyone wondering what my actual code block looks like (including the virtual extension):
Code
// Path Info Translation
// ------------------------------
// Let's set some variables
	$fake_html_extension = ".html";
	$fake_rss_extension = ".rss";

// Take the path from the URL.
	$path = isset($_SERVER["PATH_INFO"]) ? strip_tags(addslashes(htmlspecialchars($_SERVER["PATH_INFO"]))) : "";

// Let's weed out any "baddies"
//	$replaces = array(".xml", ".rss", ".php", ".html", ".htm", ".shtml");
	$replaces = array($fake_html_extension, $fake_rss_extension);
	$path = str_replace($replaces, "", $path);

// Build an array from the path.
	$translation = preg_split("/[\/]+/", $path);
	unset($translation['0']);

// Split the array into useable chunks.
	if($translation["1"] == "category") { $category = (int)$translation["2"]; }
	elseif($translation["1"] == "task") { $task = strip_tags(addslashes(htmlspecialchars($translation["2"]))); }
	elseif($translation["1"] == "article") { $article = (int)$translation["2"]; }
	if($translation["3"] == "page") { $page = (int)$translation["4"]; }
// ------------------------------
// End Path Info Translation


As long as element 3 of the array (the third path segment) isn't "page", you can use it as a "virtual extension" or even for SEO text, since the parser ignores anything placed there. So by running a title through my seo_titles function you can build URLs such as:
articles.php/category/8/text.html
articles.php/article/53/text.html
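
Building such a URL might look roughly like the sketch below. Both build_article_url and its slug rule (lowercase, runs of non-alphanumerics collapsed to "-") are assumptions for illustration and are not the seo function from this thread.

```php
<?php
// Hypothetical helper: builds a path-info URL with a readable tail.
// The slug rule is an assumption: lowercase, non-alphanumerics become "-".
function build_article_url($id, $title) {
	$slug = strtolower(trim($title));
	$slug = preg_replace('/[^a-z0-9]+/', '-', $slug);
	$slug = trim($slug, '-');
	return "articles.php/article/" . (int)$id . "/" . $slug . ".html";
}

echo build_article_url(53, "Utilizing SE Friendly URLs"), "\n";
// articles.php/article/53/utilizing-se-friendly-urls.html
```

One thing to watch for: a title that reduces to exactly "page" would collide with the paging keyword, so you may want to special-case it.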

My seo_titles function and my "sanitize" function, make_sane (for ensuring no "invalid" data is passed to the URLs), are as follows:

Code
function shorten_length($str, $start, $end) {
	if(strlen($str) > $end) { $str = substr($str, $start, $end); }
	return($str);
}

function make_sane($str) {
// Encode special characters once, including both quote styles.
	$str = htmlentities($str, ENT_QUOTES);

// Swap typographic-quote entities for their plain equivalents.
	$patterns = array("&rsquo;", "&ldquo;", "&rdquo;");
	$replaces = array("&#039;", "&quot;", "&quot;");
	$str = str_replace($patterns, $replaces, $str);

	return($str);
}

function seo_titles($str, $type, $shorten) {
// Let's eliminate bad content
	$str = make_sane($str);

// Strip entities first, then any punctuation that is left.
	$patterns = array("&quot;", "&#039;", "&lt;", "&gt;", "&amp;", "\"", "\\", "|", "[", "{", "]", "}", "?", "!", "@", "#", "$", "%", "^", "&", "*", "(", ")", "+", "=", ";", ":", ",", ".", "'");
	$str = strtolower(str_replace($patterns, "", $str));

// Shorten only after stripping so an entity is never cut in half.
	if($shorten == 1) { $str = shorten_length($str, 0, 50); }

	$patterns = array(" ", "%20");
	if($type == 1) { $replaces = "-"; }
	else { $replaces = "_"; }

	$str = str_replace($patterns, $replaces, $str);
	return($str);
}