Life of a Front-end WordPress Request

Have you ever asked yourself what happens when you hit a URL on a WordPress website?

Here’s the very simplified version of the story:

  1. WordPress environment is loaded (core, plugins, theme)
  2. WordPress looks at URL and builds some query arguments based on it
  3. Obtained query arguments are used to run a \WP_Query (known as "main query")
  4. Based on the "type" of the query (eg. "single", "archive"…), WordPress chooses a template file among the ones available in theme
  5. Template file is loaded to display page

WP request flow 1

There are thousands of lines of code executed in this process, but in this article I’ll concentrate on what happens in points 2 and 3 in the list above.

The "main" WordPress function

The process by which WordPress sets the main query arguments according to the current URL is the very core of WordPress operation. The fact that the class responsible for these tasks is named WP is proof of that, and the method where everything happens is named main().

That method is quite simple. Below you can see its entire code:

public function main($query_args = '') {
    $this->init();
    $this->parse_request($query_args);
    $this->send_headers();
    $this->query_posts();
    $this->handle_404();
    $this->register_globals();

    do_action_ref_array('wp', array(&$this));
}

These 9 lines of code are powering a quarter of the web.

I already mentioned that this article will focus on how WordPress goes from a URL to a WP_Query. This happens in one of the methods called inside the main() method: WP::parse_request().

From a URL to a query

Let’s add some details to the "request flow" we’ve already seen, focusing on what we’ll discuss:

  1. WordPress environment is loaded (core, plugins, theme)
  2. WordPress looks at URL and builds some query arguments based on it
     • Rewrite rules are loaded from the database
     • Current URL is compared to each of the rules to find the one that matches
     • The query variables stored with the matched rule are merged with any variable in the URL query string
     • Any variable that is not a valid WP_Query argument is stripped out
     • Resulting variables are stored in the WP::$query_vars property and will be used for the main query
  3. Obtained query arguments are used to run a \WP_Query (known as "main query")
  4. Based on the "type" of the query (eg. "single", "archive"…), WordPress chooses a template file among the ones available in theme
  5. Template file is loaded to display page

WP request flow 2

Please note that to make things simpler I’ve assumed pretty permalinks are enabled.

In short, what’s in between URL and query args are rewrite rules.

What are rewrite rules?

A rewrite rule is something that maps a pretty URL to an ugly URL.

Let’s take a step back.

To display information, WordPress needs to know what information you want to show, and the way you tell WordPress is the URL.

If you visit a URL like http://example.com/?pagename=sample-page, you are telling WordPress you want to see the page "Sample Page".

That URL is an ugly URL. The pretty version of it is: http://example.com/sample-page/.

A rewrite rule tells WordPress that a pretty URL should map to an ugly URL. The way rewrite rules do that is via regular expressions.

To better understand this process, we can see the function that you have to use in WordPress to add rewrite rules: add_rewrite_rule().

This is an example of it in use:

add_rewrite_rule('^page/([^/]+)/?', 'index.php?pagename=$matches[1]');

The first argument represents the pretty URL and the second argument is the related ugly URL.
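To make that mapping concrete, here is a minimal sketch in plain PHP of how a rule of this kind could be resolved. It is not WordPress core code: the function name resolve_rewrite_rule() is hypothetical, and core has its own, more careful implementation. The idea is only that the regular expression is matched against the request path, and each $matches[n] placeholder in the ugly URL is replaced with the corresponding captured group.

```php
<?php
// Hypothetical helper, not part of WordPress: resolves one rewrite rule
// against a request path and returns the resulting query variables,
// or null when the rule does not match.
function resolve_rewrite_rule(string $regex, string $target, string $path): ?array
{
    // Match the "pretty" side of the rule against the request path.
    if (!preg_match('#' . $regex . '#', $path, $matches)) {
        return null;
    }

    // Replace every $matches[n] placeholder with the captured group.
    $ugly = preg_replace_callback(
        '/\$matches\[(\d+)\]/',
        function (array $m) use ($matches) {
            return urlencode($matches[(int) $m[1]] ?? '');
        },
        $target
    );

    // Keep only the query string of the "ugly" URL and parse it.
    $query = (string) parse_url($ugly, PHP_URL_QUERY);
    parse_str($query, $vars);

    return $vars;
}

$vars = resolve_rewrite_rule(
    '^page/([^/]+)/?',
    'index.php?pagename=$matches[1]',
    'page/sample-page/'
);
// $vars is now ['pagename' => 'sample-page']
```

Note that the result is still a flat list of query string variables, which is a detail that becomes important later in this article.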

Where do rewrite rules come from?

Even when you don’t add any rules using add_rewrite_rule, WordPress still resolves pretty permalinks.

In fact, a vanilla WordPress installation comes with a set of rewrite rules already set.

They are the rules for:

  • Posts
  • Pages
  • Date and author archives
  • Default taxonomies (categories and tags)

Some of them can be modified from the WordPress backend in the Settings > Permalinks page.

When you register a custom post type or a custom taxonomy, you can set the rewrite rules for them (using the rewrite option), and if you don’t, WordPress will set some defaults for you.

Any rewrite rule you register via the rewrite API (of which add_rewrite_rule is part) is added to these default rules.

Two traps of rewrite rules

There are two important things that are worth noting:

  1. Rewrite rules are stored in database — any change in code that acts on rules will not work until rules are stored (this process is known as "flushing rewrite rules")
  2. As we saw, rewrite rules map pretty URLs to ugly URLs, not URLs to query arguments

The consequences of the first point are well (and sadly) known. A piece of code that registers post types or taxonomies, or uses the rewrite API, needs to flush the rewrite rules to work.

For a plugin (or theme), flushing may be done calling flush_rewrite_rules on activation and again on deactivation. This is because the process of flushing is expensive, and can’t be done on every request.
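As a rough sketch, flushing on activation and deactivation could look like the code below. All the rewrite_demo_* names are hypothetical and exist only for this example; register_activation_hook(), register_deactivation_hook() and flush_rewrite_rules() are the WordPress functions involved.

```php
<?php
/*
 * Plugin Name: Rewrite Demo (hypothetical example)
 */

// Hypothetical function that holds all the rewrite API calls of the plugin.
function rewrite_demo_add_rules()
{
    add_rewrite_rule('^page/([^/]+)/?', 'index.php?pagename=$matches[1]', 'top');
}

// Runs once, when the plugin is activated.
function rewrite_demo_activate()
{
    // The rules have to be registered before flushing,
    // otherwise there is nothing new to store.
    rewrite_demo_add_rules();
    flush_rewrite_rules();
}

// Runs once, when the plugin is deactivated.
function rewrite_demo_deactivate()
{
    flush_rewrite_rules();
}

// Wire the hooks only when the file is loaded by WordPress
// (ABSPATH is a constant defined by WordPress itself).
if (defined('ABSPATH')) {
    add_action('init', 'rewrite_demo_add_rules');
    register_activation_hook(__FILE__, 'rewrite_demo_activate');
    register_deactivation_hook(__FILE__, 'rewrite_demo_deactivate');
}
```

The rules are registered again inside the activation callback because, in that request, the plugin was not loaded yet when "init" fired.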

Another way to flush rewrite rules is by visiting the "Settings > Permalinks" page in the WordPress admin, which isn’t possible via code.

This is annoying, but what the second point in the list above implies is probably worse and may not be immediately clear. Only "simple" query variables can be used in rewrite rules. By "simple", I mean scalar variables (strings, numbers, booleans), but not arrays, for example.

A challenging example

Let’s assume you want to implement a URL like: http://example.com/2016-jan-02-afternoon/ where "afternoon" might either be "morning", "evening" or "night".

Let’s also assume that you want this URL to pull a list of posts published on the day present in the URL, but only in the specific part of the day at the end of the URL.

If you have some experience with WordPress development, you know that a query like that is a job for a date query.

The problem is that date query has to be set with an array, and as I said above, that’s not possible directly with a rewrite rule.

How can we implement this feature? Is it even possible? Yes, it is possible, but it requires some work. We need to:

  1. Register a rewrite rule that sets two custom query variables
  2. Ensure these variables are recognized as valid and not stripped out during request parsing
  3. Hook into pre_get_posts with a callback that sets the proper date query when those query variables are present in the request
  4. Be sure to flush rewrite rules after having put the code in place

It’s quite a lot of work, and the code below shows how the limitations of rewrite rules can make your life hard:

// Let's add rules on init
add_action('init', function() {

    $custom_date_format = '([0-9]{4}-[a-z]{3}-[0-9]{2})';
    $timespan_format = '(morning|afternoon|evening|night)';

    // This ensures that custom vars are recognized as valid query vars
    add_rewrite_tag('%customdate%', $custom_date_format);
    add_rewrite_tag('%timespan%', $timespan_format);

    // This adds the rule
    add_rewrite_rule(
        "^{$custom_date_format}-{$timespan_format}/?",
        'index.php?customdate=$matches[1]&timespan=$matches[2]',
        'top'
    );
});

// Now alter the main query if custom vars are set
add_action('pre_get_posts', function(WP_Query $query) {

    // We want to act only on frontend and only main query
    if (is_admin() || !$query->is_main_query()) {
        return;
    }

    // A map from the timespan string to actual hours array
    $hours = [
        'morning'   => range(6, 11),
        'afternoon' => range(12, 17),
        'evening'   => range(18, 23),
        'night'     => range(0, 5)
    ];

    // Get the custom vars, if available
    $customdate = $query->get('customdate');
    $timespan = $query->get('timespan');

    // If the vars are not set, this is not a query we're interested in
    if (!$customdate || !$timespan) {
        return;
    }

    // Get UNIX timestamp from the query var
    $timestamp = strtotime($customdate);

    // Do nothing if have the wrong values
    if (!$timestamp || !isset($hours[$timespan])) {
        return;
    }

    // Reset query variables, because `WP_Query` does nothing with
    // 'customdate' or 'timespan', so it's better remove them
    $query->init();

    // Set date query based on custom vars
    $query->set('date_query', [
        [
            'year'  => date('Y', $timestamp),
            'month' => date('m', $timestamp),
            'day'   => date('d', $timestamp)
        ],
        [
            'hour'    => $hours[$timespan],
            'compare' => 'IN'
        ],
        'relation' => 'AND'
    ]);
});

Even if you don’t understand what every line does, you can surely understand that this is a lot of code. Surely more code than one may expect for such a task.

And still we have to flush rewrite rules.

Imagine a better world

Our example implies a bit of logic per se, but WordPress makes things difficult. To resolve URLs to "data" relevant for the application, other frameworks and CMSes implement what is known as a routing system.

Such a system is based on routes, which map URLs to an action or controller, rather than URLs to other URLs as WordPress does.

If we had a routing system in WordPress, it should map URLs to query arguments arrays. Something like:

add_frontend_route(
    '^([0-9]{4}-[a-z]{3}-[0-9]{2})-(morning|afternoon|evening|night)$',
    function(array $matches) {
        $hours = [
            'morning'   => range(6, 11),
            'afternoon' => range(12, 17),
            'evening'   => range(18, 23),
            'night'     => range(0, 5),
        ];

        $timestamp = strtotime($matches[1]);

        return [
            'date_query' => [
                [
                    'year'    => date('Y', $timestamp),
                    'month'   => date('m', $timestamp),
                    'day'     => date('d', $timestamp)
                ],
                [
                    'hour'    => $hours[$matches[2]],
                    'compare' => 'IN'
                ],
                'relation' => 'AND'
            ]
        ];
    }
);

The fictional code above maps a URL to some query arguments, loading the route in memory. And that’s it, without the need to flush rewrite rules.

…too bad that the add_frontend_route function does not exist.

Is routing something that’s possible in WordPress?

Yes, it is. To understand the how, we need a deeper understanding of what happens when WordPress parses the request.

One important thing to know about is the filter hook "do_parse_request".

It’s fired at the top of the WP::parse_request() method, and if the callbacks hooked there return a false value, the request is not parsed at all.

Our visual overview becomes something like:

WP request flow 3

It means that when WordPress is instructed to skip request parsing via the "do_parse_request" filter, the main WP_Query is triggered anyway, and everything continues to work as usual. The only difference is that no query variable is parsed from the URL.

You may ask: "If no query variable is parsed from the URL, what are the variables that are used?"

The answer is that the variables used are the ones stored in the WP::$query_vars object property.

That object property is initially set to an empty array, so when the "do_parse_request" filter returns false, the main query is run with an empty array as an argument, which results in showing the home page of the website.

Let’s experiment

If we always return false on "do_parse_request" filter with:

add_filter('do_parse_request', '__return_false');

…WordPress will always show the home page, no matter what URL of the website we visit.

A more interesting experiment is setting the WP::$query_vars variable to something arbitrary and doing a return false on the filter.

To set query vars we could use global $wp, which is the variable that holds the WP class instance, but it’s not needed because the "do_parse_request" filter passes a second argument:

add_filter('do_parse_request', function($do_parse, $wp) {
    // Setup some query vars...
    $wp->query_vars = ['post_type' => 'page'];

    // ...and don't let WordPress parse the request
    return false;
}, 10, 2);

Now, no matter the URL of the website we visit, we will always see an archive of our pages.

The experiments above prove that to set main query arguments to something arbitrary, we just need 2 things:

  • Store the query variables we want to use in the WP::$query_vars object property
  • Return false on the "do_parse_request" filter

That’s cool, however we are ignoring the URL, and a real routing system can’t be decoupled from the current URL.

If we want to implement such a thing, we first need to retrieve the current URL, and only after that can we set some routes to compare the URL to.

Retrieving the current URL in WordPress

You may know that in PHP code, to retrieve the current URL, we usually look in the $_SERVER['REQUEST_URI'] variable.

We can surely do that, but in that way we couple our code to a global variable that is hard to reproduce in a command line context, such as unit tests.

Surprisingly, WordPress doesn’t have a function to retrieve the current URL, but we can use the add_query_arg() function to get it.

That function is normally used to add query string variables to a URL, but if no URL is passed to it then the current URL is used.

$current_url = esc_url_raw(add_query_arg([]));

Note: I escaped the URL because it was recently discovered that using add_query_arg() without passing any URL may be a security risk. The risk is averted by escaping the obtained URL.

Anyway, if we visit a URL such as http://example.com/foo/bar/, the function above returns /foo/bar/.

This is quite fine for our purposes, however we also need to take care of the case where WordPress is installed in a subfolder.

Let’s assume that WordPress is installed in http://example.com/blog/ and we visit the URL http://example.com/blog/sample-page/. What we want to use in our routing system is /sample-page/, but add_query_arg() returns /blog/sample-page/.

Now we need to strip any URL path that is present in the home URL.

function get_current_url() {
    // Get current URL path, stripping out slashes on boundaries
    $current_url = trim(esc_url_raw(add_query_arg([])), '/');
    // Get the path of the home URL, stripping out slashes on boundaries
    // (the cast covers a home URL without a path, where parse_url() returns null)
    $home_path = trim((string) parse_url(home_url(), PHP_URL_PATH), '/');
    // If a URL part exists, and the current URL part starts with it...
    if ($home_path && strpos($current_url, $home_path) === 0) {
        // ... just remove the home URL path from the current URL path
        $current_url = trim(substr($current_url, strlen($home_path)), '/');
    }

    return $current_url;
}
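One edge case is worth a note: a plain strpos() prefix check also matches when the home path is only the beginning of a longer segment, for example a home path of blog and a request path of blogging/foo. Below is a sketch, in plain PHP with no WordPress calls, of the same stripping logic with a check on the segment boundary. The function name strip_home_path() and its two string parameters are assumptions made for illustration; in the function above, those values come from add_query_arg() and home_url().

```php
<?php
// Hypothetical helper: removes the home path from the request path,
// but only when the home path matches whole path segments.
function strip_home_path(string $requestPath, string $homePath): string
{
    $requestPath = trim($requestPath, '/');
    $homePath = trim($homePath, '/');

    // WordPress installed at the domain root: nothing to strip.
    if ($homePath === '') {
        return $requestPath;
    }

    if (strpos($requestPath, $homePath) === 0) {
        $rest = (string) substr($requestPath, strlen($homePath));

        // Strip only if the prefix ends at a segment boundary:
        // end of string, a slash, or the start of the query string.
        if ($rest === '' || $rest[0] === '/' || $rest[0] === '?') {
            return trim($rest, '/');
        }
    }

    return $requestPath;
}

$stripped = strip_home_path('/blog/sample-page/', '/blog/');
$untouched = strip_home_path('/blogging/foo/', '/blog/');
// $stripped is 'sample-page', $untouched is 'blogging/foo'
```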

Towards a WordPress routing system

Now that we know how to get the current URL, what we need is a way to add some routes and compare them to the URL.

A clever enough way to add routes could be to provide a hook to register our routes. This is a standard way to do things in WordPress, so anyone interacting with our code will not be confused.

The code may look something like this:

add_filter('do_parse_request', function($do_parse, $wp) {
    $routes = []; // Let's initialize an empty array of routes

    $current_url = get_current_url(); // This is the function we wrote above

    // Users can add routes using the 'routing_add_routes' hook
    $routes = apply_filters('routing_add_routes', $routes, $current_url);

    // If there are no routes, just let WordPress do its work...
    if (empty($routes) || !is_array($routes)) {
        return $do_parse;
    }

    // Get query vars (we will write the parse_routes() function soon)
    $query_vars = parse_routes($routes, $current_url);

    // If parse_routes() returns an array of query arguments as we expect...
    if (is_array($query_vars)) {
        // ...we set query vars in WP object
        $wp->query_vars = $query_vars;

        // Fire an action, this may be useful later to know when a route matched
        do_action('routing_matched_vars', $query_vars);

        // Finally return false to stop WordPress from parsing the request
        return false;
    }

    // In other cases, we just let WordPress do its work...
    return $do_parse;
}, 30, 2);

The code above, and the comments in it, should make clear what we are doing.

We add a callback to the "do_parse_request" filter, and inside of it we trigger a filter that lets users add some routes.

After that, if we have some routes, we parse them. Parse means that we compare routes to the current URL to see which one matches.

If a match is found, the matching function (that we are about to write) returns an array of variables that we can store in WP::$query_vars before returning false, just like we did earlier.

Parsing routes

We can probably figure out different ways to compare a URL to some routes, but the most flexible way is to use regular expressions.

We don’t need to invent this syntax, and it is very powerful for enforcing rules and obtaining variables from the matching. This is also the method that WordPress uses for rewrite rules, so users will be familiar with it; but in contrast to WordPress, we will map URLs to query variables and not URLs to another URL.

Let’s write the code:

function parse_routes($routes, $url) {
    // Strips any query vars in the URL because we only need the path
    $urlParts = explode('?', $url, 2);
    $urlPath = trim($urlParts[0], '/');

    // If the URLs have query string vars, eg. ?preview=1
    // we store them, for later usage
    $urlVars = [];
    if (isset($urlParts[1])) {
        parse_str($urlParts[1], $urlVars);
    }

    // Parse the routes
    foreach($routes as $pattern => $callback) {
        // If we found a match...
        if (preg_match('~' . trim($pattern, '/') . '~', $urlPath, $matches)) {
            // ...we call callback stored in the route to obtain query vars
            $routeVars = $callback( $matches );

            // If callback returns an array as we expect...
            if (is_array($routeVars)) {
                // ...we return an array obtained merging the vars returned by route
                // with any query string var present in the URL
                return array_merge($routeVars, $urlVars);
            }
        }
    }
}

The function above uses preg_match to match the route pattern with the URL path.

In case of a match, the $matches array from preg_match is then passed to a callback stored in the route to obtain an array of query arguments.

Finally, the returned arguments are merged with any query variable present in the URL.

The function assumes that $routes is an array where each item key is the regular expression pattern, and the item value is a callback that receives matches from preg_match and returns an array of query arguments.
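Two details of this approach deserve attention. Because the pattern is handed to preg_match() as it is, a route matches whenever the pattern is found anywhere in the path, unless it is anchored with ^ and $. And because the loop returns on the first match, the order in which routes are added matters. The following plain PHP sketch shows the effect of anchoring; the path and patterns are made up for the example.

```php
<?php
// An example path, already trimmed of leading and trailing slashes.
$path = 'shop/product/latest';

// Unanchored pattern: it matches, because "product/latest"
// appears somewhere inside the path.
$loose = preg_match('~([^/]+)/latest~', $path, $looseMatches);

// Anchored pattern: it does not match, because the path
// has an extra leading segment ("shop").
$strict = preg_match('~^([^/]+)/latest$~', $path, $strictMatches);

// $loose is 1 and $looseMatches[1] is 'product'; $strict is 0
```

Also note that the tilde is used as the regular expression delimiter, so a pattern that contains a literal ~ would need to escape it.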

The missing piece is something that allows us to add such routes.

Adding routes

Earlier we imagined a better world where WordPress had an add_frontend_route function to add routes.

Now we have the chance to write that function:

function add_frontend_route($pattern, callable $callback) {
    add_filter('routing_add_routes', function($routes) use($pattern, $callback) {
        $routes[$pattern] = $callback;

        return $routes;
    });
}

It’s as simple as that.

In the code we wrote to implement the routing system, we were firing the filter "routing_add_routes" to allow some code to add routes.

The function we just wrote uses that hook to add a route to the routes array, setting the route pattern as the key and the route callback as the value. That’s exactly what the parse_routes function we wrote is expecting.

Final touches

Our routing system is overriding the WordPress way to handle URLs. There are cases when this is problematic.

An example is the WordPress dashboard where our system might break things. We can prevent this by not running our system when is_admin() is true.

However, in WordPress, is_admin() is true also for AJAX requests, and we probably want to allow our system to run on AJAX requests.

The conditional code to allow the system to run would be something like:

$allowed = !is_admin() || (defined('DOING_AJAX') && DOING_AJAX);

Another thing that may conflict with our system is the WordPress canonical redirect. Considering that WordPress does not recognize URLs matched via our system (because we are not registering them using WordPress core features), it may redirect them using the redirect_canonical function.

That function is hooked in the "template_redirect" hook, so a solution would be to remove that hook when a route is matched.

We are already firing an action, "routing_matched_vars", when a route matches. We can use that hook to remove redirect_canonical from the "template_redirect" action:

add_action('routing_matched_vars', function() {
    remove_action('template_redirect', 'redirect_canonical');
});

Putting the pieces together

At this point we have all the pieces to create a routing system in WordPress. We can put them together in a plugin to better reuse it in different websites.

The code below is without comments to save space, but all of the lines of code below were already discussed in this article. It’s also available as a Gist.

<?php
/*
 * Plugin Name: Routing WP
 * Author: Giuseppe Mazzapica
 * Description: A routing system for WordPress
 */
namespace RoutingWP;

function get_current_url() {
    $current_url = trim(esc_url_raw(add_query_arg([])), '/');
    $home_path = trim((string) parse_url(home_url(), PHP_URL_PATH), '/');
    if ($home_path && strpos($current_url, $home_path) === 0) {
        $current_url = trim(substr($current_url, strlen($home_path)), '/');
    }
    return $current_url;
}

function add_frontend_route($pattern, callable $callback) {
    add_filter('routing_add_routes', function($routes) use($pattern, $callback) {
        $routes[$pattern] = $callback;
        return $routes;
    });
}

$allowed = !is_admin() || (defined('DOING_AJAX') && DOING_AJAX);

$allowed and add_filter('do_parse_request', function($do_parse, $wp) {
    $routes = [];
    $current_url = get_current_url();
    $routes = apply_filters('routing_add_routes', $routes, $current_url);
    if (empty($routes) || !is_array($routes)) {
        return $do_parse;
    }
    $urlParts = explode('?', $current_url, 2);
    $urlPath = trim($urlParts[0], '/');
    $urlVars = [];
    if (isset($urlParts[1])) {
        parse_str($urlParts[1], $urlVars);
    }
    $query_vars = null;
    foreach($routes as $pattern => $callback) {
        if (preg_match('~' . trim($pattern, '/') . '~', $urlPath, $matches)) {
            $routeVars = $callback($matches);
            if (is_array($routeVars)) {
                $query_vars = array_merge($routeVars, $urlVars);
                break;
            }
        }
    }
    if (is_array($query_vars)) {
        $wp->query_vars = $query_vars;
        do_action('routing_matched_vars', $query_vars);
        return false;
    }
    return $do_parse;
}, 30, 2);

$allowed and add_action('routing_matched_vars', function() {
    remove_action('template_redirect', 'redirect_canonical');
}, 30);

unset($allowed);

This is the whole plugin, and it is everything we need to build our routing system.

How to use the routing plugin

After the plugin is active, using it is quite easy.

The only function you need to interact with is add_frontend_route. You need to provide a first argument, the regular expression pattern that will match URLs, and a second argument that’s a callback that receives the matches array and has to return an array of query vars.

For example, add the following route:

RoutingWP\add_frontend_route('^([^/]+)/latest$', function($matches) {
    return [
        'post_type'      => $matches[1],
        'posts_per_page' => 5,
        'orderby'        => 'date',
        'order'          => 'desc'
    ];
});

Visiting the URL example.com/post/latest will show the latest 5 posts, just like it will show the latest 5 products if you visit example.com/product/latest.

Do you remember the route we wrote under the "imagine a better world" section earlier?

You can use it, and it will work (just remember to add the namespace). It seems that the better world is here.

Room for improving

We can learn things about URL routing by looking at frameworks and CMSes out there that are already using it.

For example, most routing systems differentiate routes not only based on URL path, but also on HTTP method, because a GET request is usually different from a POST request, even when sent to the same URL.

We can also use hosts to differentiate routes, so that api.example.com/foo/bar is considered different from www.example.com/foo/bar.
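As a rough idea of how this could look, here is a sketch in plain PHP of a matcher that takes method and host into account. It is not part of the plugin above: the function match_route() and the shape of the route arrays ('method', 'host', 'pattern', 'callback' keys) are assumptions made for this example. In a real request the method and host would probably come from $_SERVER['REQUEST_METHOD'] and $_SERVER['HTTP_HOST'].

```php
<?php
// Hypothetical matcher that also considers HTTP method and host.
// Each route is an array with 'method', 'pattern' and 'callback' keys,
// plus an optional 'host' key.
function match_route(array $routes, string $method, string $host, string $path): ?array
{
    foreach ($routes as $route) {
        // Skip routes registered for a different HTTP method.
        if (strtoupper($route['method']) !== strtoupper($method)) {
            continue;
        }
        // Skip routes bound to a different host (host names are case-insensitive).
        if (isset($route['host']) && strcasecmp($route['host'], $host) !== 0) {
            continue;
        }
        if (preg_match('~' . $route['pattern'] . '~', trim($path, '/'), $matches)) {
            $vars = $route['callback']($matches);
            if (is_array($vars)) {
                return $vars;
            }
        }
    }

    return null;
}

$routes = [
    [
        'method'   => 'GET',
        'host'     => 'api.example.com',
        'pattern'  => '^foo/bar$',
        'callback' => function (array $matches) {
            return ['post_type' => 'page'];
        },
    ],
    [
        'method'   => 'GET',
        'pattern'  => '^foo/bar$',
        'callback' => function (array $matches) {
            return ['post_type' => 'post'];
        },
    ],
];

$apiVars = match_route($routes, 'GET', 'api.example.com', '/foo/bar/');
$wwwVars = match_route($routes, 'GET', 'www.example.com', '/foo/bar/');
$postVars = match_route($routes, 'POST', 'www.example.com', '/foo/bar/');
// $apiVars is ['post_type' => 'page'], $wwwVars is ['post_type' => 'post'],
// $postVars is null because no route is registered for POST
```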

Moreover, our plugin calls preg_match for every route we add. Regular expression functions are pretty slow and we can improve the performance of our plugin if we use a library like FastRoute to match our routes to the URL.

Finally, we can improve our plugin by returning WP_Error objects or throwing exceptions when unexpected things happen.
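For instance, the plugin silently ignores a route whose callback does not return an array. A stricter variant could throw instead. The sketch below is plain PHP; the InvalidRouteResult class and the call_route() function are hypothetical names, not something the plugin defines.

```php
<?php
// Hypothetical exception type for the routing plugin.
class InvalidRouteResult extends UnexpectedValueException
{
}

// Calls a route callback and insists on an array of query vars.
function call_route(string $pattern, callable $callback, array $matches): array
{
    $vars = $callback($matches);

    if (!is_array($vars)) {
        throw new InvalidRouteResult(
            sprintf('Route "%s" must return an array, %s returned.', $pattern, gettype($vars))
        );
    }

    return $vars;
}

try {
    call_route('^broken$', function (array $matches) {
        return 'not an array';
    }, ['broken']);
    $error = null;
} catch (InvalidRouteResult $e) {
    // The message names the offending route, which helps debugging.
    $error = $e->getMessage();
}
```

Whether to throw or to skip the route is a design choice: throwing makes mistakes visible during development, while skipping keeps a broken route from taking the whole page down.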

…or maybe just use Cortex

Recently I updated a library of mine, named Cortex.

It implements a routing system using the same concepts I presented in this article, but the actual code is quite different.

In its latest version it uses the FastRoute library and has some additional features: differentiate routes by HTTP method and host, route groups, redirect routes, and more.

Summary

In this article we saw how WordPress creates the main query arguments starting from the current URL. We discovered that rewrite rules are what WP uses to build query arguments according to the current URL, and saw two annoying issues that affect rewrite rules.

After that, taking inspiration from other software, we imagined a routing system for WordPress that could solve the rewrite rules issues. Finally, step by step, we implemented the system we imagined, giving it the shape of a plugin.
