Faceted Navigation: SEO Best Practices for eCommerce Sites
What is faceted navigation?
Ever shopped online? Then you’ve come across faceted navigation, perhaps without realising what it was!
If you own an ecommerce site, you may well be using faceted navigation already, to make choices easier for your customers and to organise your products in ways that will appeal to them. Faceted navigation essentially means letting users choose filters and sorting preferences, or “facets”, to browse through all your products.
Faceted navigation also exists on publisher sites (sorting articles by date, author or topic, for example), job sites and large sites in general: the larger the site, the more likely it is that some sort of faceted navigation will exist to make finding content easier for users.
For the purposes of this blog, we are focusing on ecommerce sites, but the solutions remain the same.
For every product that you have, there will be variations in size, colour, brand and more - or you will give people the option to sort them by price, popularity or reviews. These options are what constitute faceted navigation. For example:
So in this example, I can filter by price, product type, brand etc. to be shown the results that I want to see. As a user, this is enjoyable: I like Raleigh products, I need some sort of pump and I don’t want to spend more than £25. Perfect.
Now, these are excellent for your customers: they can tailor a product category page to fit their specific requirements and tastes. They can, however, lead to a myriad of issues for search engines, and therefore hurt how well your website ranks in search engine results pages (SERPs). That affects your online performance - and ultimately how much money you make from your online store.
Below, we will go over what these issues are, and what solutions you can implement to make your website as search engine friendly as possible to keep traffic and sales coming to the website.
Why do faceted navigation options cause problems for SEO?
Now the issue lies in the fact that for every single one of these options, a new URL is created. So, as an example, say I’m on a website that sells clothing, including socks.
Essentially: if you have a category page for socks on an ecommerce website, it stands to reason that you’re going to have various filters depending on size, material, colour etc.
Suddenly, with my preference for red, wool socks in a size 4, https://www.example.com/socks becomes:
And the infinite variations depending on what order I add my preferences:
And so on.
Add to that the fact that I may also want to sort them by price, each of these becomes:
Ad infinitum. This could be true of any preference in size, material, colour, sorting etc. - so your one URL for the category page “Socks” suddenly becomes a list of hundreds, thousands or even millions of URLs with the same content, metadata and products, depending on how many filters and sorting options you have.
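To get a feel for how quickly this multiplies, here is a minimal Python sketch. It assumes the filters are passed as query-string parameters; the parameter names (colour, material, size, sort) and the sort values are made up for illustration, and your own site may build its URLs differently.

```python
from itertools import permutations
from urllib.parse import urlencode

BASE_URL = "https://www.example.com/socks"

# Hypothetical filter selections for one shopper (parameter names are assumed).
selected = {"colour": "red", "material": "wool", "size": "4"}


def ordered_variants(base_url, filters):
    """Return one URL per possible ordering of the chosen filters."""
    urls = []
    for order in permutations(filters.items()):
        urls.append(f"{base_url}?{urlencode(order)}")
    return urls


variants = ordered_variants(BASE_URL, selected)
for url in variants:
    print(url)

# Each of those URLs can also carry a sort option (hypothetical values).
sort_options = ["price_asc", "price_desc", "popularity"]
with_sorting = [f"{url}&sort={s}" for url in variants for s in sort_options]
print(len(variants), "filter orderings,", len(with_sorting), "more once sorting is added")
```

Three filters already give 6 orderings, and 18 further URLs once three sort options are added - all for a single shopper’s choice on a single category page.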
And then this happens with every single other category page on your website. Suddenly that moderately sized ecommerce website with approx 500 products and 30 categories becomes an absolute behemoth of a website with millions of URLs - most of which are essentially the same.
For instance, ASOS have fixed this issue for all of their filtering options. But there are so many ways to access the product category “Women’s socks” that, even with all of the colour/size etc. variations blocked (as we’ll show you later), they still have over 100 pages indexed for that one category:
Faceted navigation is an issue for SEO performance for 3 reasons:
1. Duplicate Content
We know that search engines, notably Google, absolutely hate duplicate content, or what they perceive as duplicate content. This is because it can look as though you are trying to spam the search engine results pages with as much of the same content as you possibly can, in the hope that one of these pages will rank.
“Look, I have a million pages on socks, I must be great at selling socks!” Google may treat this as a breach of trust and potentially penalise your pages, and thus your site. Search engines are getting better at identifying these sorts of pages themselves and won’t necessarily penalise you, but it’s still best to avoid any confusion and any potential for “punishment”: make it as easy as possible for crawlers to understand what’s going on on your website, and you’ll likely be “rewarded” for it.
2. Crawl Budget
The second issue is a matter of crawl budget. Google has vast servers and “bots” or “spiders” that go around the world wide web crawling pages. If they can crawl a page and understand it, they will index it and store it on their servers (if they can’t understand it, they won’t).
Then, with their specific algorithm which is constantly being improved, when a person looks for a specific query using a search engine, they will look through their index and serve the pages that they think are most relevant to that search query (hence the existence of SEO: making it as clear as day what your pages are to fit the algorithm and rank better).
However, the crawl budget per website is not unlimited - search engines will not crawl the same URLs over and over again, particularly if they are all duplicates. Crawlers can also get “trapped” working through all of these different permutations and give up altogether. So if you’re using up a lot of your crawl budget on the same, unimportant URLs, more important pages may be “sacrificed” along the way.
That’s why you want to prevent search engines from crawling these, so that they focus on your primary category pages, home page, important content on your blog, FAQs etc, rather than just wasting all of this budget on a myriad of similar sock pages. Socks probably aren’t even your most valuable product.
3. Unoptimised Links
We know that links are important for SEO. They’re one of the 4 key pillars of performance in search engine results pages, as links are essentially votes of confidence that tell search engines that your website is trustworthy.
“Look, this page says it’s selling socks, these 19 credible publications that talk about the best socks to look out for in 2021 are pointing to this web page, it must be true”.
However, if you have a million pages on socks, these links could very well be pointing anywhere (as in, on any one of the sock page variations), rather than your primary category page: https://www.example.com/socks, thereby diluting your digital PR and link building efforts on a number of pages that aren’t actually important to you.
So, now we know why faceted navigation can cause issues for search engines and prevent your website from performing as well as it should in SERPs: it looks like you’re trying to duplicate content, you’re wasting crawl budget and you’re diluting all link building efforts.
How do I know if faceted navigation is causing issues on my site?
You can start by looking in Google Search Console. In the index coverage report, you can see what URLs are indexed, which are presenting issues etc:
So here you will see if these pages are being indexed.
You can also do what I did above and run a site: search in Google. For a given category such as https://www.example.com/socks, you would type site:https://www.example.com/socks into Google. It will show you roughly how many URLs containing that address are indexed, so you can see whether there are thousands of them and whether or not the permutations are in there.
Finally, you can do what is known as log file analysis. For this, you need to ask for your site’s server logs.
This basically tells you everything that has been accessing your site - you will literally be able to see what pages search engines are crawling, how often etc. You may come to find that you are surprised by these, as you may see that some thoroughly unimportant pages (for example some of these faceted nav pages) are being crawled in favour of much more important content on your site.
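As a rough illustration of what log file analysis involves, the Python sketch below counts Googlebot requests per path and separates clean URLs from faceted ones. It assumes the common “combined” access log format and that facets show up as query strings; the sample lines are invented, and a user-agent string can be faked, so treat this as a first pass rather than a definitive audit.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Pulls the request path and the user agent out of a combined-format log line.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_hits(lines):
    """Count Googlebot requests per path, split into clean and faceted URLs."""
    clean, faceted = Counter(), Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match or "Googlebot" not in match.group("agent"):
            continue  # skip unparseable lines and non-Googlebot traffic
        parts = urlsplit(match.group("path"))
        (faceted if parts.query else clean)[parts.path] += 1
    return clean, faceted


GOOGLEBOT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
BROWSER = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"

# Invented sample lines, for illustration only.
sample_log = [
    f'66.249.66.1 - - [10/Mar/2021:10:00:00 +0000] "GET /socks?colour=red&size=4 HTTP/1.1" 200 5120 "-" "{GOOGLEBOT}"',
    f'66.249.66.1 - - [10/Mar/2021:10:00:05 +0000] "GET /socks?size=4&colour=red HTTP/1.1" 200 5120 "-" "{GOOGLEBOT}"',
    f'66.249.66.1 - - [10/Mar/2021:10:00:09 +0000] "GET /socks HTTP/1.1" 200 5120 "-" "{GOOGLEBOT}"',
    f'203.0.113.7 - - [10/Mar/2021:10:01:00 +0000] "GET /socks?colour=red HTTP/1.1" 200 5120 "-" "{BROWSER}"',
]

clean, faceted = googlebot_hits(sample_log)
print("clean URL hits:", dict(clean))
print("faceted URL hits:", dict(faceted))
```

If the faceted counts dwarf the clean ones for your key categories, that is a sign that crawl budget is going to the wrong place.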
Here is how to do that, how to diagnose and then fix issues.
So essentially, if GSC and log file analysis show that search engines are wasting a lot of crawl budget on unnecessary pages, and Google Analytics shows very little traffic and no sales on those pages, you’ll want to implement the fixes below.
If, however, a faceted page is getting a lot of traffic and a decent number of sales, you may still want to apply the fixes below to the “faceted navigation” version of the page, but definitely create a dedicated landing page on the topic (like the little black dress example below) that people can find via search engines.
So how do I fix faceted navigation issues?
There are multiple ways to deal with this, and most can be used in conjunction with each other in order to provide an absolutely foolproof solution to make it categorically clear that although you know that these other pages exist, this is the URL that you want Google (or other search engines) to index.
1. Canonical Tags
To avoid looking as though you want to spam the search engines with a myriad of pages on the same topic, you want to tell them “hey Google, sorry, this is the page I want you to rank: https://www.example.com/socks and all of the rest is fodder depending on other people’s preferences. I had to do it for them to find what they want, but please ignore all of those other pages, this is my number one page on socks”.
=> Enter the canonical tag: this is exactly what it tells search engines.
You’ll want to tell your developers to do this for both product pages (as there will presumably be multiple variants of the same product) and category pages if there are filters and sorting options available.
On ecommerce sites this is applicable to:
- Categories: if there are multiple pathways to category pages, such as Women’s clothing > accessories > socks and another variation such as Accessories > women’s accessories > socks which both create URLs, you’ll want to add a canonical to the main page (or simply make sure that they use the same URL within the menu rather than creating multiple URLs, but this can be difficult depending on the content management system being used). By this I mean that just because there are multiple entry points to a page in the menu, you don’t have to have a different URL for each - you can just use the same one each time.
- Categories and filtering/sorting: as mentioned previously, if there are multiple filtering or sorting options that change the URL, then you’ll want to canonicalise to the main category page URL.
- Product variations (in different colours or sizes etc.): if someone’s on your sock page and chooses the sock in blue and a size 4, make sure that this doesn’t create a new URL (and if it does, canonicalise it). The exception is the very specific circumstance in which there’s actually quite a lot of search demand for the product in that particular variation. In that case, give it its own landing page - but that’s pretty rare. Take little black dresses: it’s such a well-known “trope” that every woman “needs” to own one that many fashion retailers have a category page on their site just for them. Likewise, if you sold waterproof socks, I’d add these as a subcategory, as this is something that people actively search for.
So for example, on this bike pumps page, which I have filtered by products under £17 and then sorted the products by ascending price:
So this is to say, “yes this URL exists for users, but please do not index this. The bike pump category page is the sole priority here Google, don’t worry about wasting your time here”.
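For a sense of what the tag itself looks like, here is a small Python sketch that builds one for a filtered URL. It assumes that every filter and sort option lives in the query string, so that stripping the query leaves the main category URL; the bike pumps URL and its parameter names are hypothetical.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_link(url):
    """Build a canonical <link> tag that points at the URL minus query and fragment."""
    parts = urlsplit(url)
    canonical = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    return f'<link rel="canonical" href="{escape(canonical, quote=True)}">'


# Hypothetical filtered and sorted category URL.
filtered_url = "https://www.example.com/bike-pumps?max_price=17&sort=price_asc"
print(canonical_link(filtered_url))
```

The resulting tag belongs in the <head> of the filtered page, and the main category page would normally carry the same tag pointing at itself.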
Your developers will know how to do this, and some content management systems will do it automatically. It depends on your specific circumstances, but it is entirely possible on Magento, Shopify, BigCommerce, Drupal, WooCommerce and the other popular ecommerce content management systems. If you are building (or have built) a website using a bespoke CMS, make sure to have a conversation about this with your developers so that they implement a fix that works for your store.
Regardless of whether you implement the following potential fixes, the canonical tag should be your number 1 priority and go-to method of solving this issue - the rest are just secondary ways of tackling it. Search engines treat the canonical tag as a strong hint rather than a command, but it helps keep the duplicate pages out of the index and also helps with the issue of link equity dilution. It doesn’t necessarily mean that these pages won’t be crawled or waste crawl budget, though.
2. Robots.txt & no index and no follow tags
Previously, you’d have been told to add specific parameters that you don’t want indexed into your robots.txt file. This is a file on your website that essentially acts as a guideline to search engines to tell them what to look at and not look at on your website (as it’s one of the first things they’ll look at when they go onto your website).
This should point to your sitemap.xml (a map of the important pages on your site, which should absolutely not include your faceted navigation pages). It can also contain directives such as:
Which basically means: User agent (crawlers such as search engines) * (which means all search engines). Disallow (don’t crawl) the specific folders mentioned.
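To see how a crawler reads rules like these, the sketch below feeds a small robots.txt to Python’s built-in urllib.robotparser and asks whether two URLs may be fetched. The folder names (/checkout/ and /search/) and the sitemap URL are placeholders chosen for illustration, not a recommendation for your site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the folder names are placeholders.
ROBOTS_TXT = """\
User-agent: *
Disallow: /checkout/
Disallow: /search/

Sitemap: https://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

category_allowed = parser.can_fetch("Googlebot", "https://www.example.com/socks")
search_allowed = parser.can_fetch("Googlebot", "https://www.example.com/search/red-socks")

print("category page crawlable:", category_allowed)
print("internal search crawlable:", search_allowed)
print("sitemaps:", parser.site_maps())
```

Note that this standard-library parser does simple prefix matching, so it is only suitable for folder-style rules like the ones above, not for wildcard patterns.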
Now this works for crawling with Google - it’s what is known as a “polite crawler” (less polite crawlers may simply ignore all of this and crawl those subfolders anyway). It is not a guarantee against indexing, though: a Disallow rule only stops the page from being crawled, and a blocked URL can still end up in the index if other pages link to it.
However, I’d still recommend adding the specific parameters that you’d like to block in here, as it’s just another element in your toolbox and another signal sent to prevent search engines from crawling these pages in the first place. So for example, you could add the parameter *?material=* or whatever your materials parameter looks like (plus all the others). Here’s best practice for robots.txt so you know what to add.
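Wildcard rules are easy to get subtly wrong, so it can help to test them against a few URLs first. The Python sketch below is a simplified matcher, assuming the wildcard behaviour that Google documents (“*” matches any run of characters, an optional trailing “$” anchors the end); it is not a full robots.txt implementation, and “material” is just the example parameter name from above.

```python
import re


def robots_pattern_to_regex(pattern):
    """Translate a robots.txt path pattern ('*' wildcard, optional '$' anchor) to a regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(path_and_query, disallow_patterns):
    """True if any Disallow pattern matches from the start of the path."""
    return any(
        robots_pattern_to_regex(p).match(path_and_query) for p in disallow_patterns
    )


# The first pattern only catches "material" as the first parameter;
# the second catches it anywhere after that.
DISALLOW = ["*?material=*", "*&material=*"]

for candidate in ["/socks", "/socks?material=wool", "/socks?colour=red&material=wool"]:
    print(candidate, "->", "blocked" if is_blocked(candidate, DISALLOW) else "allowed")
```

The second pattern matters because the order of parameters changes with the order in which filters are selected.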
So people now use “noindex tags” on pages that they do not want to be indexed. This is essentially just a small piece of HTML code placed on these pages which tells search engine bots not to index them (hence: noindex). The reason that this is not the number one solution is that it does not stop search engines from crawling the pages and thus wasting crawl budget; it just tells them not to index them. In fact, a crawler has to be able to fetch a page to see the tag at all, so noindex has no effect on URLs that are blocked in robots.txt. It also doesn’t really help with the dilution of link equity - but again, it is yet another signal to search engines to tell them not to index these pages.
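As a sketch of how a template might decide when to output the tag, the Python function below returns a noindex meta tag for any URL that carries a facet parameter. The parameter names are assumptions for illustration; you would swap in whatever your own filters and sorting options use.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical facet parameter names; replace with the ones your site uses.
FACET_PARAMS = {"colour", "material", "size", "sort"}


def robots_meta_tag(url):
    """Return a noindex meta tag for faceted URLs, or an empty string otherwise."""
    query = parse_qs(urlsplit(url).query)
    if FACET_PARAMS.intersection(query):
        return '<meta name="robots" content="noindex">'
    return ""


print(repr(robots_meta_tag("https://www.example.com/socks?colour=red")))
print(repr(robots_meta_tag("https://www.example.com/socks")))
```

Like the canonical tag, the meta tag goes in the <head> of the page.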
Rel=nofollow tags
Add rel="nofollow" to all links that point to unnecessary URLs. This minimises the crawler's discovery of those URLs and therefore reduces the potential for wasted crawl budget.
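In practice this usually means changing the template that renders your filter links. A minimal Python sketch of the idea, assuming that any link with a query string is a facet link (which may not hold on every site):

```python
from html import escape
from urllib.parse import urlsplit


def filter_link(href, label):
    """Render an anchor tag, adding rel="nofollow" when the target has a query string."""
    rel = ' rel="nofollow"' if urlsplit(href).query else ""
    return f'<a href="{escape(href, quote=True)}"{rel}>{escape(label)}</a>'


print(filter_link("/socks", "Socks"))
print(filter_link("/socks?colour=red", "Red"))
```

Links to the main category page stay untouched, so crawlers can still follow them as normal.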
3. Blocking URL parameters in Google Search Console
Google Search Console has a handy legacy tool to deal with these sorts of parameters too - you can effectively tell Google not to crawl them. You can find this here:
From here, you add the parameters that you’d like to warn Google about, such as:
Other bits that can help
- If using a bespoke CMS, or even some popular CMSs, ask your developers if there is a way of filtering and sorting products that simply doesn’t create a new URL. This is the simplest fix, but it typically requires forethought before the site is built and can be quite difficult to implement after go-live.
So now we know what faceted navigation is, why it’s great for users but potentially bad for SEO, how to see if it’s affecting your website’s performance and how to solve any issues. I hope this has been helpful!