Types of 404 errors
A 404 status header tells search engines that a page does not exist. 404s can prove useful when you want to de-index a page, that is, remove it from search engines like Google. While learning how to use 404 headers effectively is a must, especially on dynamically generated sites, understanding soft 404s is at least as important. Soft 404s can severely damage your site’s search rankings, but you can use Google Webmaster Tools to pick them up and sort them out before they cause your site real damage.
When a browser or search engine requests a page, the usual response carries what’s called a 200 response header. This header lets the requester know that the page exists and will be returned. As stated above, a 404 header tells a browser or search engine that the page does not exist (which can be for any number of reasons). A soft 404, however, is what you get when 404 headers are not properly configured: the content tells users the page does not exist, yet the server still returns it with a 200 response code. It’s like saying “the page is here and all is good” and “the page no longer exists” at the same time. As you can probably gather, this is incredibly confusing: how can a page exist and yet not exist?
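To make the three cases concrete, here is a minimal sketch in PHP that labels a response as a normal 200, a real 404 or a likely soft 404. The function name classify_response and the list of hint phrases are assumptions made for illustration. Search engines use their own, far more sophisticated detection, so treat this as a rough heuristic rather than a description of how Google works.

```php
<?php
// Phrases that suggest a "not found" page. Purely illustrative:
// real error pages use all sorts of wording.
const NOT_FOUND_HINTS = ['page not found', 'no longer exists'];

/**
 * Label a response by its status code and visible text.
 * Hypothetical helper, not part of PHP or any search engine API.
 */
function classify_response(int $status, string $body): string
{
    if ($status === 404) {
        return '404';            // the header itself says "not found"
    }
    if ($status === 200) {
        // Compare against the visible text only, ignoring markup and case.
        $text = strtolower(strip_tags($body));
        foreach (NOT_FOUND_HINTS as $hint) {
            if (strpos($text, $hint) !== false) {
                return 'soft 404';   // header says "fine", content says "gone"
            }
        }
        return '200';
    }
    return 'other';
}
```

For example, classify_response(200, '<h1>Page not found</h1>') is labelled a soft 404, while the same body sent with a 404 status is labelled a plain 404.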
Confused? … So are search engines
While search engines use incredibly complicated programs and algorithms to build their index of pages, they still rely on very simple signals in places, including the response codes of web pages. If you return a soft 404 on a page you are returning a serious error, and search engines have little choice but to penalise the page and eventually drop its rank. If you don’t sort out the problem quickly, your page will slowly slide towards the bottom of the Google index and eventually drop out, at which point it can take a long time to recover. In terms of priority, deal with soft 404s as soon as you see any listed: don’t wait, find what is causing them and fix them.
Soft 404s – the real error
Many web designers and web design agencies will come to you with lines like “we can help you fix 404 errors which harm your search rankings”… rubbish! 404s don’t harm your search rankings; soft 404s do, and the two should be treated as completely separate things. If you ask them “what is the difference between a 404 and a soft 404?” and they can’t give you a clear answer, they are either exaggerating your site’s problems or, worse still, they don’t actually know the difference. While you can have HTML syntax errors and other small errors within your page, soft 404s are probably the most serious error you can have on a page.
The causes of soft 404s
The causes of soft 404s are many and varied. Paginated catalogue pages can cause them very easily, as can URL parameters, and so can stray whitespace in your PHP code: any output sent before the header() call normally stops the 404 header from being set, so the page goes out as a 200. Returning a 404 header but not killing the script with die() or similar can also be a major cause, because the rest of the page is still sent after the header. You would be surprised at the simple things that can cause soft 404s, but as long as you regularly check the errors listed within Google Webmaster Tools you should be able to figure out the cause very quickly and modify your code appropriately. Another big cause is including a file using a PHP include or similar without returning a 404 response code. Having a 404 page that says something like “page not found” and just including it within a page does not send a 404 to search engines. It all comes down to returning a correct 404 response header. You can do this pretty easily and dynamically using PHP and basic logic within your scripts, such as: if the URL path /file does not exist, then send the 404 header.
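The “if the path does not exist, send the 404 header” logic described above could look something like the sketch below. Everything named here is hypothetical: the PAGES route table, the route() and dispatch() functions and the template file names (including 404.php) are made up for illustration, so adapt them to however your own site maps URLs to files.

```php
<?php
// Hypothetical route table: URL path => template file.
const PAGES = [
    '/'      => 'home.php',
    '/about' => 'about.php',
];

/**
 * Decide which status code and template a path should get.
 * Unknown paths get a real 404 status, not a 200 with an error message.
 */
function route(string $path, array $pages): array
{
    if (isset($pages[$path])) {
        return [200, $pages[$path]];
    }
    return [404, '404.php'];
}

/**
 * Send the status first, then the template, then stop the script
 * so nothing else is output after the error page.
 */
function dispatch(string $requestUri, array $pages, string $templateDir): void
{
    // Strip the query string so URL parameters do not affect the lookup.
    $path = parse_url($requestUri, PHP_URL_PATH);
    [$status, $template] = route(is_string($path) ? $path : '', $pages);

    http_response_code($status);   // must run before any output is sent
    include $templateDir . '/' . $template;
    exit;                          // same effect as die()
}
```

A front controller would then call something like dispatch($_SERVER['REQUEST_URI'], PAGES, __DIR__ . '/templates'), where the templates directory is again an assumption. The key design point is the order: status code first, error page second, exit last. Keeping the opening <?php tag on the very first line of the file, with nothing before it, avoids the stray whitespace problem mentioned above.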