The majority of the mod_rewrite examples you will find on the internet relate to .htaccess. I didn't want to use a .htaccess file; instead I wanted to write the rules in either vhost.conf or httpd.conf. Using .htaccess carries a performance hit on your server, which is one of the reasons I didn't want to use it. The general recommendation is also not to use .htaccess if you don't need to.
Although the rewrite rules in .htaccess are almost identical, they didn't quite work for me in httpd.conf. I needed a series of rewrite rules to handle dynamic page content, namely to make the URLs search engine friendly, such as turning page.php?id=2 into page-2.html.
Mod_Rewrite can be complex, particularly in its use of regular expressions (regex). A regular expression is a string used to describe a pattern. Regular expressions can be used for server side validation of user submitted input or for rewrite rules in Apache. If you haven't used them before, their first sight might instil the same wonder as a wall filled with strange, alien hieroglyphics. Where do you even start to translate your dynamic URL into this odd language? Fortunately there is a key to the characters to help decipher a regex string or to translate your URL into a regex pattern.
| Character | Meaning |
|---|---|
| ^ | Start matching from this point |
| $ | End matching at this point |
| . | Any single character, equivalent to a wildcard (note: the dot will not match the character that denotes a new line, i.e. \n). Be careful when using the wildcard, particularly in validation, as you may not want to match every character |
| [ ] | Denotes a character class. Will match any one of the characters included between the square brackets, as in [xyz] will match any of x or y or z, not all three together. Note the dot is not a wildcard if used between square brackets, it's simply treated as a dot. |
| \| | Or |
| ? | Optional |
| + | Match one or more times |
| * | Match zero to infinite number of times |
| { } | Curly braces are used to specify a specific number of times to match |
| ( ) | Used for Grouping |
| \ | Use before a character to escape or negate its special meaning, as in \$ \. \+ |
| - | Range for matching, as in [0-9] numeric characters or [a-z] lowercase characters |
When some of these characters are used in combination with each other, their meaning may change:
| Character | Meaning |
|---|---|
| [^ ] | Negated character class: matches any one character not listed, as in [^xyz], which matches any character that is not x, y or z |
Some common examples:
| Character | Meaning |
|---|---|
| [0-9] | Numeric, will match any one numeric character |
| [a-z] | Lower case alphabetic |
| [A-Z] | Upper case alphabetic |
| [a-zA-Z] | Alphabetic (upper and lower case) |
| [^0-9] | Not numeric |
| [0-9a-fA-F] | Matches a single hexadecimal character |
| "[^"]*" | Matches text between double quotes |
| ([^/]+) | Match any folder name |
Shorthand characters can also be used in pattern matching. You might be familiar with some of these from your PHP scripts. The majority of these will not be used in Mod_Rewrite; I include them only for completeness and so that we can refer back to them later.
| Character | Meaning |
|---|---|
| \d | Matches a single numeric character |
| \t | Matches a tab character (ASCII 0x09) |
| \r | Matches carriage return (ASCII 0x0D) |
| \n | Matches line feed (ASCII 0x0A) |
| \A | Only ever matches at the start of a string |
| \Z | Only ever matches at the end of a string |
| \b | Matches at a word boundary |
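| \w | Matches a single word character (letter, digit or underscore) |
| \B | Matches at a position that is not a word boundary |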
The ^ and the $ are known as anchors. Anchors match a position before, after or between characters.
When you start to look at examples of using regex, the terminology, the metacharacters and their meanings become a lot easier to understand. Let's look at some simple examples first before applying what we know to Mod_Rewrite.
In testing our examples we will use PHP's function preg_match. Here we will define two variables $pattern (the pattern to test) and $match (the string we apply the pattern matching to). We pass both variables to the PHP function. The function will return 0 if there is no match and 1 if there is a match.
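As a minimal sketch of that test setup (the pattern and the sample string here are assumptions chosen for illustration): PHP expects delimiters around the pattern, and preg_match can also return false if the pattern itself is invalid.

```php
<?php
// Hypothetical values for illustration only.
$pattern = '/^[0-9a-fA-F]$/'; // the slashes are PHP's pattern delimiters
$match   = 'c';               // the string we test against the pattern

// preg_match returns 1 for a match, 0 for no match (false on error).
$result = preg_match($pattern, $match);
echo $result, PHP_EOL;
```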
This will match one alphabetic character. It will fail if there is more than one character, such as "sa". It will fail if the string contains a non-alphabetic character. It will fail if the letter is in upper case.
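A sketch of a test that behaves this way, assuming the pattern is a single lowercase class between the two anchors (the sample strings are made up):

```php
<?php
// Assumed pattern: exactly one lowercase letter between ^ and $.
$pattern = '/^[a-z]$/';

foreach (['s', 'sa', '5', 'S'] as $match) {
    // Prints 1 for a match and 0 for no match.
    echo $match, ' => ', preg_match($pattern, $match), PHP_EOL;
}
```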
This will match a single uppercase or lowercase letter. Any other character will fail.
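One way to write this, assuming both letter ranges are placed in the same class (sample strings are illustrative):

```php
<?php
// Assumed pattern: one letter, upper or lower case.
$pattern = '/^[a-zA-Z]$/';

foreach (['s', 'S', '7', 'sS'] as $match) {
    echo $match, ' => ', preg_match($pattern, $match), PHP_EOL;
}
```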
Using curly braces we can define how many characters to match. In this example any three letters will match, but the match will fail if there are only two letters or more than three.
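A sketch of the three-letter case, assuming the count is written as {3} after the letter class:

```php
<?php
// Assumed pattern: exactly three letters.
$pattern = '/^[a-zA-Z]{3}$/';

foreach (['abc', 'ab', 'abcd', 'ab1'] as $match) {
    echo $match, ' => ', preg_match($pattern, $match), PHP_EOL;
}
```

A range can be given in the same way, for instance {2,4} for two to four letters.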
By using "+" instead of the curly braces we say the match can occur one to infinite times, i.e. this will match any word or string composed of letters.
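A sketch of this, assuming the same letter class followed by + (sample strings are illustrative):

```php
<?php
// Assumed pattern: one or more letters and nothing else.
$pattern = '/^[a-zA-Z]+$/';

foreach (['a', 'Lotus', 'Lotus123', ''] as $match) {
    echo '"', $match, '" => ', preg_match($pattern, $match), PHP_EOL;
}
```

The empty string fails here because + needs at least one character; with * it would pass.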
Remember the caret ^ negates a class. In the following example only characters which are not letters will match. This includes symbols like the comma etc.
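A sketch of such a negated class, assuming it is combined with + and both anchors (sample strings are illustrative):

```php
<?php
// Assumed pattern: one or more characters, none of which is a letter.
// The first ^ is the anchor; the ^ inside [ ] negates the class.
$pattern = '/^[^a-zA-Z]+$/';

foreach (['123', ',;!', 'abc', 'a1'] as $match) {
    echo $match, ' => ', preg_match($pattern, $match), PHP_EOL;
}
```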
The following will match any number of alphanumeric characters.
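A sketch, assuming letters and digits share one class and that at least one character is required (use * instead of + if the empty string should also pass):

```php
<?php
// Assumed pattern: one or more letters or digits.
$pattern = '/^[a-zA-Z0-9]+$/';

foreach (['Lotus', 'Lotus123', '2006', 'Lotus 123'] as $match) {
    echo $match, ' => ', preg_match($pattern, $match), PHP_EOL;
}
```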
Our pattern might have different elements we want to match. Let's add extra classes. In this example Lotus and Lotus123 will match. We've made the 123 optional (note how we have grouped it with ( ) brackets).
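A sketch of a pattern that behaves this way, assuming a letter class followed by an optional group of digits:

```php
<?php
// Assumed pattern: letters, then an optional group of digits.
$pattern = '/^[a-zA-Z]+([0-9]+)?$/';

foreach (['Lotus', 'Lotus123', 'Lotus 123', '123'] as $match) {
    echo $match, ' => ', preg_match($pattern, $match), PHP_EOL;
}
```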
In the following example we introduce another shorthand, \s, to denote a space. As we have added ? to it, i.e. \s?, we are saying it is optional. This pattern would match Lotus, Lotus123 and Lotus 123.
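A sketch with the optional space added, under the same assumptions about the classes used:

```php
<?php
// Assumed pattern: letters, an optional whitespace character, optional digits.
$pattern = '/^[a-zA-Z]+\s?([0-9]+)?$/';

foreach (['Lotus', 'Lotus123', 'Lotus 123', 'Lotus  123'] as $match) {
    echo '"', $match, '" => ', preg_match($pattern, $match), PHP_EOL;
}
```

The last sample contains two spaces and fails, because \s? allows at most one.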
Of course, if we wanted to match the word Lotus and only that word, we would use the following. This will match Lotus, Lotus123, Lotus 123, lotus, lotus123 and lotus 123. But because the digits are still matched by a class, it would also match Lotus 345.
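A sketch of this, assuming the word is spelled out with a small class for the first letter so that both Lotus and lotus pass:

```php
<?php
// Assumed pattern: the literal word, optional space, optional digits.
$pattern = '/^[Ll]otus\s?([0-9]+)?$/';

$samples = ['Lotus', 'Lotus123', 'Lotus 123', 'lotus', 'lotus123',
            'lotus 123', 'Lotus 345', 'Elan 123'];

foreach ($samples as $match) {
    echo $match, ' => ', preg_match($pattern, $match), PHP_EOL;
}
```

To accept 123 and nothing else, the digit class would be replaced by the literal, as in (123)?.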
In these examples the [0-9] could equally have been written as \d, indicating a digit.
These are the basics, but working through them should enable us to read and understand regular expressions. We can now understand the quantifiers (*, ?, +), the anchors (^, $, \b, \B) and the other metacharacters used in regex pattern matching.
Let's now apply what we have learned to examples using Apache's mod_rewrite. At first we will just examine the pattern matching; we will then apply it to the rewrite syntax. Don't practice on your live server because, if you're unfamiliar with mod_rewrite and regex, your rules might produce unexpected results.
The basic syntax for Apache mod_rewrite in httpd.conf is:
RewriteEngine on
RewriteRule ^PatternToMatch$ WhatToDo
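For example, the rule below will match a request for the web page ella.html and rewrite it to mark.html. In httpd.conf or a virtual host the pattern is matched against the URL path including its leading slash, and the dot is escaped so that it matches only a literal dot:
RewriteEngine on
RewriteRule ^/ella\.html$ /mark.html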
i.e. the URL will say ella.html but the content served up will be mark.html.
The first part (^PatternToMatch$) is what a user will type in as a URL or click on to follow a link. For search engine indexing it is better if this link looks like a static page rather than a dynamic one. mod_rewrite is a cloaking device: our "bird of prey" PHP pages can disappear and reappear as static HTML.
When the user clicks the link for a static HTML page, mod_rewrite will apply the matching rules we've given it and display the content from our dynamic PHP page.
Let's suppose we have a blog. The actual URL of a post might be blog.php?id=122. We might prefer a user to link to it as follows: blog.php/2006/12/02/here-it-is.html
So in the URL we are looking for a specific match. Let's build it up.
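The pattern can be tried out with preg_match before it goes anywhere near Apache. The sketch below is a hypothetical reconstruction: the digit counts, the four groups and the leading slash (which assumes the rule will live in httpd.conf or a virtual host rather than in .htaccess) are all assumptions. The # delimiter is used so that the slashes in the path need no escaping.

```php
<?php
// Hypothetical pattern: year / month / day / post name, each in its own group.
$pattern = '#^/blog\.php/([0-9]{4})/([0-9]{2})/([0-9]{2})/([^/]+)\.html$#';
$match   = '/blog.php/2006/12/02/here-it-is.html';

if (preg_match($pattern, $match, $groups) === 1) {
    // $groups[0] holds the whole match; $groups[1] to $groups[4] hold the ( ) groups.
    echo 'year:  ', $groups[1], PHP_EOL;
    echo 'month: ', $groups[2], PHP_EOL;
    echo 'day:   ', $groups[3], PHP_EOL;
    echo 'name:  ', $groups[4], PHP_EOL;
}
```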
The more you define the greater the load on the server.
Anything between ( ) brackets in our pattern can be used as a variable in the substitution. In our example we have four, which will be known as $1, $2, $3 and $4. We can pass these to the substitution, as they are the variables our PHP script needs in order to run without giving a 404 error. The URL for our PHP script will look like this:
blog.php?date=$1-$2-$3&name=$4
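How the backreferences are filled in can be imitated in PHP with preg_replace, which expands $1 to $4 in much the same way. This is only a sketch of the idea, not what Apache runs internally, and the pattern is the same hypothetical one with four groups and a leading slash.

```php
<?php
// Hypothetical pattern with four groups: year, month, day, post name.
$pattern      = '#^/blog\.php/([0-9]{4})/([0-9]{2})/([0-9]{2})/([^/]+)\.html$#';
$substitution = 'blog.php?date=$1-$2-$3&name=$4'; // single quotes keep $1 literal
$request      = '/blog.php/2006/12/02/here-it-is.html';

// Each $n in the substitution is replaced by the text captured by group n.
$rewritten = preg_replace($pattern, $substitution, $request);
echo $rewritten, PHP_EOL; // blog.php?date=2006-12-02&name=here-it-is
```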
ReWriteBase /archive/
Command Flags
| Flag | Meaning |
|---|---|
| [R] | Redirect. Write as [R=301], for example, to set the status code (the default is 302) |
| [F] | Forces the URL to be forbidden (returns a 403 response) |
| [G] | Gone. Results in a 410 response |
| [L] | The last rule. Stops processing any further rules in the current pass. Use this at the end of every rewrite rule that isn't chained to the next one. |
| [N] | Run the rules again from the start |
| [C] | Chains the rule with the next one |
| [NC] | No case. Make the rule case insensitive |
When you change a URL to directory level, remember that your URLs for CSS, JavaScript, images etc. need to use absolute rather than relative paths or they won't be found.
© Eriginal Ltd 2011, all rights reserved