Optimize Your Website: Analyzing Crawled but Not Indexed Pages
Table of Contents
- Introduction
- Understanding Crawled but Currently Not Indexed Pages
- Exporting Data for Analysis
- Analyzing Internal Links
- Checking for Text Content
- Analyzing GSC Clicks and Impressions
- Filtering URLs for Active Pages
- Excluding Feed URLs
- Dealing with Pages without Internal Links
- Conclusion
Understanding Crawled but Currently Not Indexed Pages
In SEO, it is important to understand why certain pages on a website are crawled by search engines but not indexed. This often indicates that the content lacks value. When Google crawls a page, it may determine that the content is not relevant or is unlikely to be served to users. In such cases, Google Search Console reports the page as "Crawled - currently not indexed" (distinct from "Discovered - currently not indexed," where the URL is known but has not been crawled yet). In this article, we will explore a trick for understanding these pages and analyzing them effectively.
Introduction
Hi, my name is Daniel Foley Carter, an SEO expert from SEO-audits.io. Today, I want to share a useful trick that will help you gain insights into pages that are currently not indexed but have been crawled by search engines. Pages that are crawled but not indexed often lack value and may not appear in search engine results. By using the method I will demonstrate, you can export data, filter it efficiently, and identify the pages you need to take action on. However, keep in mind that Google's data may not always be up to date, and some crawled pages may no longer be valid. Let's dive into the process and learn how to effectively analyze these pages.
Understanding Crawled but Currently Not Indexed Pages
Before we delve into the process, let's first understand what it means when a page is crawled but currently not indexed. When Google's bots crawl a website, they examine the content and assess its relevance and value. If Google finds a page that it believes does not provide value to users or is unlikely to be served in search results, Search Console reports it as "Crawled - currently not indexed." This means that although Google has crawled the page, it has not included it in its index. The similarly named "Discovered - currently not indexed" status is different: there, Google knows the URL exists but has not crawled it yet.
Exporting Data for Analysis
To effectively analyze pages that are crawled but not indexed, we need to export the necessary data and filter it accordingly. Let's start by navigating to the indexing section in Google Search Console. Go to the "Pages" tab, and focus on the "Crawled - currently not indexed" section. Export this data to a Google Sheet for further analysis. Make sure to select the appropriate URLs based on your analysis needs.
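If you would rather script the analysis than work only in a spreadsheet, the exported table can be read with a few lines of Python. This is a minimal sketch, not part of the original walkthrough: the column names "URL" and "Last crawled" and the sample rows are assumptions about the export layout, so adjust them to match your file.

```python
import csv
import io

# Hypothetical sample of an exported table. The column names "URL" and
# "Last crawled" are assumptions; check them against your own export.
SAMPLE_EXPORT = """URL,Last crawled
https://example.com/blog/post-a/,2024-01-10
https://example.com/blog/post-a/feed/,2024-01-09
https://example.com/tag/misc/,2023-11-02
https://example.com/tag/misc/,2023-11-02
"""

def load_urls(csv_text, url_column="URL"):
    """Return the URLs from exported CSV text, de-duplicated, in file order."""
    reader = csv.DictReader(io.StringIO(csv_text))
    seen = set()
    urls = []
    for row in reader:
        url = (row.get(url_column) or "").strip()
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls

urls = load_urls(SAMPLE_EXPORT)
```

For a real file, pass the text of the exported CSV to `load_urls` instead of the sample string.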
Analyzing Internal Links
An essential aspect of understanding crawled but currently not indexed pages is analyzing their internal link structure. Internal links play a crucial role in signaling the importance of webpages to search engines. In the Google Sheet containing the exported data, create a column labeled "Internal Links." Using the VLOOKUP function, look up each crawled but not indexed URL in a second sheet that lists the pages on your website (for example, data from a site crawl). This will help identify which pages lack internal links, which can indicate low relevance or poor discoverability.
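The same lookup can be sketched in Python, where a dictionary plays the role of the VLOOKUP range. Everything here is illustrative: the `crawl_inlinks` mapping stands in for whatever crawl data you have, and the URLs and counts are made up.

```python
def vlookup(key, table, default=0):
    """Exact-match lookup, similar in spirit to VLOOKUP(key, range, col, FALSE).

    Returns `default` where a spreadsheet would show #N/A.
    """
    return table.get(key, default)

# Hypothetical data: URLs from the "Crawled - currently not indexed" export.
not_indexed_urls = [
    "https://example.com/blog/post-a/",
    "https://example.com/tag/misc/",
    "https://example.com/wp-content/uploads/img.png",
]

# Hypothetical crawl data: URL -> number of internal links pointing at it.
crawl_inlinks = {
    "https://example.com/blog/post-a/": 14,
    "https://example.com/tag/misc/": 2,
}

# The "Internal Links" column: one looked-up value per exported URL.
internal_links = {url: vlookup(url, crawl_inlinks) for url in not_indexed_urls}

# URLs with no internal links found in the crawl data.
no_internal_links = [url for url, count in internal_links.items() if count == 0]
```

Note that a URL missing from the crawl data is treated here as having zero internal links, which is a simplifying assumption.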
Checking for Text Content
Another important factor to consider when analyzing crawled but not indexed pages is the presence of text content. Text-based content is vital for search engines to understand the context and relevance of a page. In the Google Sheet, create a new column labeled "Text Content." Again, utilize the VLOOKUP function to check if the URLs of the crawled but not indexed pages have associated text content. Pages that lack text content may suffer from a lack of value, making them less likely to be indexed by search engines.
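If you do not already have a text or word-count column to look up, one rough way to produce it is to count the visible words in each page's HTML. The sketch below is an assumption about how such a column could be built, not the method from the walkthrough. It uses only Python's standard `html.parser` and ignores the contents of `<script>` and `<style>` tags.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)

def word_count(html):
    """Return the number of whitespace-separated words of visible text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return len(" ".join(parser.chunks).split())

# Hypothetical pages: one with a little text, one with only an image.
page_with_text = (
    "<html><head><style>p{color:red}</style></head>"
    "<body><p>Hello world</p><script>var x = 1;</script></body></html>"
)
page_without_text = "<html><body><img src='a.png'></body></html>"

text_words = word_count(page_with_text)
empty_words = word_count(page_without_text)
```

A count of zero, or close to it, flags the page as a candidate for the "no text content" group.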
Analyzing GSC Clicks and Impressions
To gain more insight into the performance and potential value of crawled but not indexed pages, we can analyze Google Search Console (GSC) data. Export the click and impression data for the past 16 months from the GSC "Performance" section. Add this data to a separate sheet labeled "GSC Click Data." Then, using VLOOKUP, cross-reference the URLs of the crawled but not indexed pages with the URLs in the GSC click data. This shows each page's past clicks and impressions, which helps determine whether it was indexed and earning traffic at some point.
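The cross-reference can also be sketched in Python. The rows below are invented, and the `(page, clicks, impressions)` layout is an assumption about how you have arranged the performance export. Rows are summed per page first, in case the same URL appears more than once.

```python
from collections import defaultdict

# Hypothetical performance rows: (page, clicks, impressions).
gsc_rows = [
    ("https://example.com/blog/post-a/", 12, 340),
    ("https://example.com/blog/post-a/", 3, 60),
    ("https://example.com/tag/misc/", 0, 5),
]

def totals_by_page(rows):
    """Sum clicks and impressions per page URL."""
    totals = defaultdict(lambda: [0, 0])
    for page, clicks, impressions in rows:
        totals[page][0] += clicks
        totals[page][1] += impressions
    return {page: (c, i) for page, (c, i) in totals.items()}

page_totals = totals_by_page(gsc_rows)

# Hypothetical list of crawled but not indexed URLs to cross-reference.
not_indexed_urls = [
    "https://example.com/blog/post-a/",
    "https://example.com/tag/misc/",
    "https://example.com/orphan/",
]

# (clicks, impressions) per URL; (0, 0) when the page has no performance data.
report = {url: page_totals.get(url, (0, 0)) for url in not_indexed_urls}
```

Pages with non-zero totals have appeared in search results before, so they are usually worth a closer look than pages with no history at all.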
Filtering URLs for Active Pages
Once you have gathered all the necessary data, it's time to filter the URLs and identify active pages that need attention. Start by focusing on the URLs that return an HTTP status code of 200, indicating a successful page response. Filter the data on this criterion: these pages are still live, so they are the ones that could be indexed and provide value to users, whereas URLs that redirect or return errors need no content work. This initial filtering step narrows the list to URLs that may still hold value.
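As a minimal sketch of this filter, assuming you already have a status code for each URL (for example, from a site crawl), the step reduces to keeping the rows whose status is 200. The rows below are made up for illustration.

```python
# Hypothetical rows: (url, http_status) pairs, e.g. taken from a site crawl.
rows = [
    ("https://example.com/blog/post-a/", 200),
    ("https://example.com/old-page/", 404),
    ("https://example.com/moved/", 301),
    ("https://example.com/tag/misc/", 200),
]

def live_pages(rows):
    """Keep only URLs whose status code is exactly 200."""
    return [url for url, status in rows if status == 200]

active = live_pages(rows)
```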
Excluding Feed URLs
As you filter the URLs, it's important to exclude feed URLs from your analysis. Feed URLs typically do not provide valuable content and should not be indexed. Remove these URLs from the list to avoid unnecessary analysis and focus on the relevant pages that require attention.
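One way to script this exclusion is to treat any URL with a `feed` path segment as a feed URL. That pattern is an assumption (it matches the common `/feed/` layout), so adapt it if your site exposes feeds differently. Matching on whole path segments avoids dropping pages such as `/feedback/` by accident.

```python
from urllib.parse import urlsplit

def is_feed_url(url):
    """True if any path segment of the URL is exactly 'feed'."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    return "feed" in segments

# Hypothetical URLs for illustration.
urls = [
    "https://example.com/blog/post-a/",
    "https://example.com/blog/post-a/feed/",
    "https://example.com/feed",
    "https://example.com/feedback/",
]

without_feeds = [url for url in urls if not is_feed_url(url)]
```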
Dealing with Pages without Internal Links
After filtering out feed URLs and any pages that no longer return a 200 response, it's essential to assess the pages that have no internal links. These pages may be anomalies, or they may be generated by plugins, media URLs, or structural issues in the site architecture. Determine whether these pages have any value or relevance to your website goals. If not, keep Google from recrawling or indexing them by using appropriate control measures, such as updating your robots.txt file or adjusting your site structure.
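Before relying on robots.txt rules, it can help to check which of the unwanted URLs a given set of rules would actually block. The sketch below uses Python's standard `urllib.robotparser`. The rules and URLs are hypothetical, and this parser applies simple prefix matching, so its verdicts may differ from Google's own robots.txt handling for more complex patterns.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /tag/
Disallow: /wp-content/uploads/
"""

def blocked_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs that the given rules disallow for `user_agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]

# Hypothetical URLs: two unwanted pages and one that should stay crawlable.
candidates = [
    "https://example.com/tag/misc/",
    "https://example.com/wp-content/uploads/img.png",
    "https://example.com/blog/post-a/",
]

blocked = blocked_urls(ROBOTS_TXT, candidates)
```

Keep in mind that robots.txt controls crawling rather than indexing, so it is only one of the measures available.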
Conclusion
Analyzing crawled but currently not indexed pages is crucial for optimizing your website's performance and overall SEO strategy. By exporting and filtering relevant data, such as internal links, text content, and performance metrics, you can identify pages that may need further action. Remember to exclude feed URLs and focus on active pages that hold value. By understanding and addressing these pages effectively, you can enhance your website's visibility and improve its chances of ranking well in search engines.