WordPress, Nginx, virtual robots.txt and 404

While working on my new site screenart.media I ran into a strange problem: Google Search Console complained that robots.txt was returning a 404 – yet I could see the correct content in the browser! What followed was kind of unexpected.

While setting up Nginx I followed various guides on the web – but was primarily inspired by https://codex.wordpress.org/Nginx, which suggested this configuration:

location = /robots.txt {
    allow all;
    log_not_found off;
    access_log off;
}

Looks nice, right? Everyone can access it, and it doesn't spam the logs.

Wrong.

The problem stemmed from my setup: PHP generates the robots.txt on demand. I've installed a very nice plugin called “XML Sitemap & Google News feeds” that does all the heavy lifting for me. This also means there is no physical robots.txt file on disk.
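A virtual robots.txt only works if the request actually reaches WordPress. In a typical WordPress/Nginx setup (following the codex layout; this is a sketch, not my exact config), that routing happens in the catch-all location:

```nginx
# Catch-all: hand anything that is not a physical file over to WordPress.
# An exact-match block like "location = /robots.txt" is more specific,
# so requests for robots.txt never reach this fallback.
location / {
    try_files $uri $uri/ /index.php?$args;
}
```

That specificity is exactly what bit me below.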

Now, when robots.txt is requested through Nginx, that config rule matches the exact location. Nginx can't find a physical file and sets HTTP status code 404 – but a moment later the next fallback rule kicks in and generates the content anyway. The result: I could see the correct content, yet the response carried a 404 status. Googlebot stopped at the 404 and didn't even try to read the content. Damn.

The solution was quite easy – at least once you see it. Thanks to this article, I could finally end my headaches. It suggests this snippet for my situation:

location = /robots.txt {
    try_files $uri $uri/ /index.php?$args;
    access_log off;
    log_not_found off;
}

Yes, you just have to add a try_files directive so that Nginx will at least ask PHP for an answer before giving up and throwing in the towel.
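For completeness, here is how the fixed block sits next to a PHP handler in a minimal sketch of a WordPress server block (the fastcgi socket path is an assumption – use whatever your php-fpm actually listens on):

```nginx
# Exact match still wins, but now falls through to WordPress
# when no physical robots.txt exists on disk.
location = /robots.txt {
    try_files $uri $uri/ /index.php?$args;
    access_log off;
    log_not_found off;
}

# PHP handler that ends up generating the virtual robots.txt.
# Socket path is illustrative; adjust to your php-fpm configuration.
location ~ \.php$ {
    include fastcgi_params;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    fastcgi_pass unix:/run/php/php-fpm.sock;
}
```

After reloading Nginx, a quick `curl -I https://your-site/robots.txt` should show a 200 status line instead of a 404.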


Also published on Medium.