To find out why a URL isn't indexable, please take a look at the indexability report. The "Indexability" column in the report shows the reason for each URL that is not indexable. The possible reasons are explained in more detail further down in this article.
1. Disallow via robots.txt
The robots.txt file can be used to tell bots which parts of your website they should or shouldn’t visit. The robots.txt is only a guideline, but reputable bots will follow its directives. If a URL or directory is set to "disallow" in the robots.txt file, search engine bots won’t crawl it when they visit your website. Please note that a URL set to "disallow" might still be indexed if external links are pointing to it. The robots.txt file must follow certain syntax rules. For more information, please check the Google specifications.
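As a rough illustration of how a "disallow" rule is evaluated, the following sketch uses Python's standard `urllib.robotparser` module. The robots.txt content, the example.com URLs, and the helper name `is_crawl_allowed` are assumptions made up for this example. Python's parser may also differ in details from how individual search engines interpret the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used only for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /internal/
"""


def is_crawl_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent crawl url."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines, so no network
    # request is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


blocked = is_crawl_allowed(
    ROBOTS_TXT, "Googlebot", "https://www.example.com/internal/report.html"
)
allowed = is_crawl_allowed(
    ROBOTS_TXT, "Googlebot", "https://www.example.com/blog/post.html"
)
print(blocked, allowed)
```

A result of `False` only means that the URL would not be crawled. As noted above, it can still end up in the index if external links point to it.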
2. The canonical tag refers to another page
To avoid duplicate content issues, Google introduced the canonical tag. With this tag, website owners can define which pages are canonical and which aren’t. Only canonical URLs will be indexed, so if your page should be indexable but isn’t, make sure it has a self-referential canonical tag. If the canonical tag points to a different URL, only that URL will be indexed. Common pitfalls with the canonical tag are missing trailing slashes, protocol mix-ups (http rather than https), and slip-ups with the subdomain (URL with or without "www"). Make sure your canonical URL exactly matches the URL you want to see indexed. You can check your URL in the single page analysis to see if your canonical tag has been set correctly.
The canonical tag can be set in the <head> part of the source code or in the HTTP header.
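The pitfalls mentioned above (trailing slash, protocol, "www") can be checked with a small script. The following is a minimal sketch using only the Python standard library. It assumes absolute URLs and a canonical tag in the HTML source, and the names `find_canonical` and `canonical_mismatches` as well as the sample markup are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class CanonicalFinder(HTMLParser):
    """Collects the href of the first <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel and self.canonical is None:
            self.canonical = attr.get("href")


def find_canonical(html):
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


def strip_www(host):
    return host[4:] if host.startswith("www.") else host


def canonical_mismatches(page_url, canonical_url):
    """List the ways in which canonical_url differs from page_url."""
    page, canon = urlsplit(page_url), urlsplit(canonical_url)
    page_host, canon_host = page.hostname or "", canon.hostname or ""
    problems = []
    if page.scheme != canon.scheme:
        problems.append("protocol")
    if page_host != canon_host:
        same_site = strip_www(page_host) == strip_www(canon_host)
        problems.append("www subdomain" if same_site else "host")
    if page.path != canon.path:
        same_path = page.path.rstrip("/") == canon.path.rstrip("/")
        problems.append("trailing slash" if same_path else "path")
    return problems


# Hypothetical page source and URL, for illustration only.
HTML = '<html><head><link rel="canonical" href="http://example.com/shoes"></head></html>'
PAGE_URL = "https://www.example.com/shoes/"

canonical = find_canonical(HTML)
print(canonical_mismatches(PAGE_URL, canonical))
```

An empty list means the canonical tag is self-referential. Any entry in the list indicates that the canonical tag points to a URL other than the one being checked.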
3. Redirects
Ideally, only URLs that answer with the status code 200 are indexed. If your page answers with a different status code, such as a 3xx redirect, it will eventually be removed from the index. If you are using temporary redirects (302), make sure that your URLs answer with the status code 200 again once the temporary issue has passed. If the redirect is permanent, fix your internal links so that they point directly to the new target URL.
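When fixing internal links for permanent redirects, it helps to know where a redirected URL finally ends up. The sketch below assumes you already have a mapping of redirecting URLs to their targets (for example, exported from a crawl). The function name `resolve_final_target`, the `max_hops` limit, and the sample paths are hypothetical.

```python
def resolve_final_target(url, redirects, max_hops=10):
    """Follow a redirect mapping until a URL that no longer redirects is reached.

    redirects maps a redirecting URL to its target URL.
    Raises ValueError on redirect loops or overly long chains.
    """
    seen = {url}
    hops = 0
    while url in redirects:
        if hops >= max_hops:
            raise ValueError("too many redirect hops")
        url = redirects[url]
        hops += 1
        if url in seen:
            raise ValueError("redirect loop detected")
        seen.add(url)
    return url


# Hypothetical redirect map, for illustration only.
REDIRECTS = {"/old-page": "/interim-page", "/interim-page": "/new-page"}

final_target = resolve_final_target("/old-page", REDIRECTS)
print(final_target)
```

Internal links that currently point to "/old-page" would then be updated to point to the resolved target directly.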
4. Meta tag "robots"
By setting the meta tag "robots" to "noindex", you can tell search engines that your page should not be indexed. The "noindex" directive can be set in a robots meta tag in the <head> part of the source code or in the HTTP header (X-Robots-Tag).
If a URL is set to "noindex", crawling of that URL must be allowed in the robots.txt. Search engine crawlers cannot read the noindex directive if they are disallowed from crawling that URL.
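To illustrate the two places where "noindex" can appear, here is a minimal sketch that looks for it in both the robots meta tag and the X-Robots-Tag HTTP header. It is a simplification: it ignores bot-specific meta tags and user-agent prefixes in the header value. The function name `has_noindex` and the sample inputs are hypothetical.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the directives of all <meta name="robots"> elements."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            for directive in content.split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())


def has_noindex(html, headers):
    """Return True if noindex is set in the meta tag or the HTTP header."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    header_directives = set()
    for name, value in headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == "x-robots-tag":
            header_directives.update(d.strip().lower() for d in value.split(","))
    return "noindex" in finder.directives or "noindex" in header_directives


# Hypothetical inputs, for illustration only.
via_meta = has_noindex('<head><meta name="robots" content="noindex, follow"></head>', {})
via_header = has_noindex("<head></head>", {"X-Robots-Tag": "noindex"})
print(via_meta, via_header)
```

A check like this only reflects what a crawler would see if it is allowed to fetch the URL, which is why the URL must not be disallowed in the robots.txt.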
5. Broken pages
Pages that answer with a 4xx or 5xx HTTP status code also won't get indexed by search engines. If an indexed page suddenly answers with an error status code, search engines will usually keep trying for a while to see if the page becomes available again before removing it from the index. Make sure to analyze your website regularly to catch and fix any broken pages right away.
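The status code rules described in this article can be summarized in a small helper for your own regular checks. This is only a sketch: the function name `indexability_by_status` and the returned labels are hypothetical and do not correspond to the values shown in the indexability report.

```python
def indexability_by_status(status_code: int) -> str:
    """Map an HTTP status code to a rough indexability label."""
    if status_code == 200:
        return "indexable (status 200)"
    if 300 <= status_code < 400:
        return "not indexable (redirect)"
    if 400 <= status_code < 500:
        return "not indexable (broken page, client error)"
    if 500 <= status_code < 600:
        return "not indexable (broken page, server error)"
    return "unclear (unexpected status code)"


for code in (200, 302, 404, 503):
    print(code, indexability_by_status(code))
```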