
This discussion is addressed mostly to michael-e and Allen, because I didn't find another way to contact them. Sorry, but it seems there is no way to update an already closed issue, so I decided to open a discussion about it.

To confirm the issue I can provide screenshots from the Fetch as Googlebot tool with results for the same page with and without R=301. However, will that be enough? Obviously, I can't give you access to my Google account. :)

Original issue: #514 Real 404 response is missed

A relevant discussion from some time ago: Trailing slash rule in .htaccess

It’s usually best to raise something like this in the forum first, so we can gauge whether or not it’s actually a bug, and then open an issue referring back to the discussion.

I can confirm, using Fetch as Googlebot (or CURL, or my browser), that if I make a request to a URL that does not exist but omit the trailing slash:

http://www.bekonscot.co.uk/zombies

Then a 301 Moved Permanently response is given, and the updated URL is provided as the new location:

http://www.bekonscot.co.uk/zombies/

However, if I view the bogus URL with a trailing slash:

http://www.bekonscot.co.uk/zombies/

Then a 404 Not Found response is given.

A 301 is served when I view both a bogus and a legitimate URL without a trailing slash. Both of these are expected results in my opinion. Google will interpret the 301 response as a permanent move and will index the new URL.
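
For reference, a rewrite rule of roughly this shape in .htaccess produces the behaviour described above. This is only a sketch of the kind of trailing-slash rule being discussed, not necessarily the exact rule Symphony ships:

    RewriteEngine On
    # If the request is not an existing file and has no trailing slash,
    # 301-redirect to the same URL with the slash appended.
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteRule ^(.*[^/])$ /$1/ [R=301,L]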

Don’t forget that Fetch as Googlebot probably just performs the first request; it doesn’t follow the 301 to request the correct URL. That is not to say that Google itself does not, but the experimental Fetch as Googlebot “Labs” tool does not. Your browser, on the other hand, and CURL (with CURLOPT_FOLLOWLOCATION set to true) will follow the 301 redirect, and you’ll end up with the eventual response of either 200 or 404.
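
If you want to see both responses yourself, here is a quick sketch using PHP's cURL extension (the URL is just the example from above, and the expected codes assume the behaviour described in this thread):

    <?php
    // Request the slash-less URL and stop at the first response (expect 301),
    // then repeat while following redirects (expect 200 or 404).
    $url = 'http://www.bekonscot.co.uk/zombies';

    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_NOBODY, true);          // headers are enough
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false); // stop at the first response
    curl_exec($ch);
    echo 'First response: ' . curl_getinfo($ch, CURLINFO_HTTP_CODE) . "\n";

    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // now follow the redirect
    curl_exec($ch);
    echo 'Final response: ' . curl_getinfo($ch, CURLINFO_HTTP_CODE) . "\n";
    curl_close($ch);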

Is your concern that Google itself is indexing URLs without trailing slashes because it is not correctly interpreting the 301 response, and thereby creating duplicate URLs for the same content?

Ouch, sorry about that. Somehow I missed that discussion before filing the bug.

Regarding Fetch as Googlebot, I found it via a link in the Google help page about 404 errors:

You can use Fetch as Googlebot (or other tools available on the web) to verify whether the URL is actually returning the correct code.

So this makes me think that if Google recommends this tool for such checks, it's fair to assume it behaves exactly as Google itself does.

Though I can't confirm that URLs are being duplicated for the same content, according to the same Google help page it's possible:

Returning a code other than 404 or 410 for a non-existent page (or redirecting users to another page, such as the homepage, instead of returning a 404) can be problematic. Firstly, it tells search engines that there’s a real page at that URL. As a result, that URL may be crawled and its content indexed.

My concerns were exactly about crawl errors in Google Webmaster Tools for pages that were removed quite a long time ago, which made me think that Google is still trying to index them.

No worries, the forum isn’t easy to search.

You can use Fetch as Googlebot (or other tools available on the web) to verify whether the URL is actually returning the correct code.

It depends on what you consider to be “the correct code”. In this instance, technically speaking, the 301 response is the correct code.

Google (and the others) are pretty good at detecting duplicate URLs and establishing the canonical version (trailing slashes, with/without www, etc.). I had a quick look at the WordPress and Drupal communities, and various people have found the lack of a 301 redirect to be harmful.

So I’m inclined to say that the 301 method is the correct choice, since we’ve not seen any harmful effects in the search engines themselves, yet (Drupal and Joomla) site owners without the 301 redirect have had problems.

Thank you, Nick, for explaining our point so precisely and in such detail. You are my hero!

I got your point. It’s quite reasonable, but now I’m just wondering how to explain the situation with the Google crawler errors (non-existing pages it still tries to index). Can we consider them errors caused by the 301 redirection or not? Any opinions are welcome.

Anyway, time will give us the answer: I’ll keep an eye on the crawler’s status this month to ensure no errors occur now that I have removed R=301 from the .htaccess.
