We need to tell Apache where and what to rewrite. You have two options here. More commonly, you can place the code in a .htaccess file. This is nothing more than a plain text file which, when present in a directory, will be interpreted by Apache. You may already be familiar with .htaccess files, as they allow you to set all kinds of server options, such as a custom 404 error page. The other alternative, available only if you have root access, is to place the code inside your httpd.conf. See load issues for more information.
Throughout this tutorial, I am assuming you are using a .htaccess file in the root of your domain (i.e. http://example.com/.htaccess), unless otherwise stated.
Initiate Rewrite Engine
Before we can rewrite, there is one option we must first set: FollowSymLinks. This is a security feature of the rewrite engine, and you will not be able to rewrite without this option. In most cases it will already be set in the server's httpd.conf, but you can safely state it again.
Additionally, if you rely on directory indexes (listings), you must also enable the Indexes option.
The final but most straightforward requirement is to turn the rewrite engine on. I recommend you begin all your rewriting code with the following three lines, but you may not need the first two:
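A minimal sketch of those three lines, assuming your host permits Options to be changed from .htaccess (the server's AllowOverride setting must allow it; otherwise Apache answers with a 500 error):

```apache
# Required for per-directory (.htaccess) rewriting
Options +FollowSymLinks
# Only needed if you rely on directory listings
Options +Indexes
# Switch the rewrite engine on
RewriteEngine On
```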
The RewriteBase directive can explicitly set the base URL for your rewrites. If you wish to start in the root of your domain, you would include the following line before your RewriteRule:
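For a .htaccess in the domain root, that line would look like this (a sketch; adjust the path if your rules live in a subdirectory):

```apache
# Relative destinations in the rules that follow are resolved from the domain root
RewriteBase /
```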
Rewrites are done according to the rules you specify. You create a rule using the following syntax:
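The general shape of a rule is shown below; PATTERN, DESTINATION and FLAGS are placeholders rather than real values, and the bracketed flags part is optional:

```apache
# Template only: replace the three placeholders with real values
RewriteRule PATTERN DESTINATION [FLAGS]
```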
Requests that match PATTERN will be rewritten to DESTINATION. Note that the string compared against PATTERN is the requested path on the server, without the host, the query string, or the leading forward slash.
For example, if you have a RewriteRule in domain.com/.htaccess and you make a request for the URL http://domain.com/dir/file.php, the string that will be compared to PATTERN is just dir/file.php. If you need to access the query string in your rewrite rules, see the advanced section, which is a continuation of this tutorial.
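As a concrete sketch, here is a rule for that request written in the root .htaccess; other.php is a hypothetical target:

```apache
# In http://domain.com/.htaccess
RewriteEngine On
# Tested against "dir/file.php": note there is no leading slash in the pattern
RewriteRule ^dir/file\.php$ other.php
```

Writing the pattern as ^/dir/file\.php$ would never match in a .htaccess file, because the leading slash has already been removed before the comparison.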
At the most basic level, our PATTERN can be a simple string such as:
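For instance, with the hypothetical page names old.html and new.html:

```apache
# A literal string: requests containing old.html are served from new.html.
# Unanchored, so it would also match e.g. bold.html; anchors (^ and $) come later.
RewriteRule old.html new.html
```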
If you want to move a number of pages, it may not be practical to create a RewriteRule for every page. This is where regular expression (regex) patterns come into play. You may be familiar with wildcard notation (*) to match anything; a regex is merely an extremely powerful extension of this.
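A small illustration, with hypothetical directory and file names, of one regex rule covering many pages at once:

```apache
# Roughly the wildcard old-section/*.html written as a regex:
# .* matches any run of characters, \. is a literal dot
RewriteRule ^old-section/.*\.html$ moved.html
```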
For those of you familiar with regex, this tutorial continues below. If you are not completely comfortable with them, I suggest you visit the guide to regular expressions first and return to this tutorial when you are.
Consider this simple scenario: you have articles stored in a database and a PHP script that retrieves them. The script articles.php takes the article ID in a GET parameter, for example /articles.php?id=24. You could already write a rule to put this in a nicer format: assuming we change our script to output URLs in the format /articles/24, you can come up with a RewriteRule that will rewrite such a request back to the script:
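Such a rule could look like the following sketch. It is deliberately left unanchored, which is what the note below refers to:

```apache
# Pretty URL articles/24 -> the real script (unanchored for simplicity)
RewriteRule articles/24 articles.php?id=24
```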
NB: For the sake of simplicity, the original example did not use ^ and $ to anchor the string. From now on I will include them, as at this stage you should be familiar with their meaning: articles/24 alone would also match /a_different_dir/particles/243. Obviously that would give us undesired results, so we must anchor the string using the start (^) and end ($) symbols to ensure we only match the requests we want.
But what if the ID is 23? Or 25? Or 100? Now our regex pattern becomes really useful, as we can match any number using a character range:
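With a character range, the rule might be written as follows (the parentheses are explained in the next paragraph):

```apache
# [0-9]+ matches one or more digits; the parentheses capture them as $1
RewriteRule ^articles/([0-9]+)$ articles.php?id=$1
```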
The parentheses (that's brackets to you and me) around the regex for the number mean we capture the match into a backreference. By itself, being able to create patterns that match an unknown is of limited use; we want to be able to use the unknown in our destination, and we can do that with a backreference. These take the form $N, where the value matched by the first set of parentheses is $1, the next is $2, and so on.
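To illustrate $2, here is a sketch with a second captured value. The page parameter is a hypothetical addition; articles.php as described above only takes id:

```apache
# articles/24/page/3 -> articles.php?id=24&page=3
# $1 is the first captured group (the ID), $2 the second (the page number)
RewriteRule ^articles/([0-9]+)/page/([0-9]+)$ articles.php?id=$1&page=$2
```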
Our new pattern will match any number you wish to choose and rewrite it to the articles.php script.
But why bother? All it does is make a URL look a bit nicer, doesn't it? That is only the tip of the iceberg. Once upon a time, the general consensus was that search engines did not bother spidering the same page twice, regardless of the query string. In other words, if the bot reached articles.php?id=1, it would not consider articles.php?id=2 to be a different page.
Obviously this had drastic implications, and with the massive rise in dynamic content, such behaviour would be entirely infeasible on today's internet. I am not an SEO expert myself, so I cannot tell you whether articles.php?id=24 and articles/24 alone differ in search engine rankings. However, you can benefit from having your keywords in the URL: compare articles.php?id=24 to articles/24-medieval-pie-throwing-competitions.html. Which would you prefer?
You cannot use mod_rewrite to place your keywords or article title in the URL; you can, however, alter your script to output URLs in that format and use mod_rewrite to rewrite them back to the articles.php script.
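A sketch of such a rule, assuming URLs of the form articles/24-some-keywords.html:

```apache
# articles/24-medieval-pie-throwing-competitions.html -> articles.php?id=24
# ([0-9]+) captures the ID as $1; (.*) holds the keywords as $2, which is not used
RewriteRule ^articles/([0-9]+)(.*)\.html$ articles.php?id=$1
```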
NB: The keywords will now be in the backreference $2; however, we do not need them in the destination. We are just using the brackets to group the .* expression.
This rule will now accept articles/24[ANYTHING CAN GO HERE].html and rewrite it to our script.
A "flag" goes at the end of the line and alters the behaviour of the rewrite. The common ones are listed and explained below:
- L - Last: when a RewriteRule has been applied and has the L flag, the rewrite engine will stop processing any remaining rules that may follow. See [L] Last Flag for more details on this behaviour.
- R - Redirect: a standard RewriteRule is transparent to the user. However, you can use the R flag to send back a redirect status code (3xx) and the new location, which causes the client's browser to update the address bar with the destination. This flag takes the status code as a parameter: by default, R sends a 302 (moved temporarily) header, whereas R=301 sends a 301 (moved permanently) header. To maintain search engine rankings, you will want to use the latter. See moving pages (301).
- QSA - Query String Append: on dynamic pages, the query string contains the GET data (i.e. everything after the ?). If you do not include a query string in the destination, the original query string is automatically transferred. If you do specify a query string in the destination, this overwrites the original. If you wish to add to the query string and keep original data intact, you must use the QSA flag.
- NC - No case: makes the comparison case insensitive.
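Three independent one-line sketches of these flags follow. Each assumes RewriteEngine On is already set, the file names are hypothetical, and the rules are not meant to be combined in one file:

```apache
# R=301 (with L): permanently redirect a moved page; the address bar shows /new.html
RewriteRule ^old\.html$ /new.html [R=301,L]

# QSA: articles/24?page=2 becomes articles.php?id=24&page=2 instead of losing page=2
RewriteRule ^articles/([0-9]+)$ articles.php?id=$1 [QSA]

# NC: also matches Articles/24 or ARTICLES/24
RewriteRule ^articles/([0-9]+)$ articles.php?id=$1 [NC]
```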
There are many more flags giving you access to more advanced options, such as setting cookies, setting environment variables and sending MIME types. These are explained in the documentation, and as they are not commonly used, I won't bore you with the details.
If you use a flag, it must be enclosed in square brackets. Multiple flags are possible but must be separated by commas. To use the NC and L flags on our pretty URLs for the articles script:
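Applied to the keyword-friendly rule, that might look like this:

```apache
# One pair of square brackets, flags separated by a comma and no spaces
RewriteRule ^articles/([0-9]+)(.*)\.html$ articles.php?id=$1 [NC,L]
```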