To set up the analysis, open the Project Settings and navigate to Advanced analysis.
1. How to analyze
1.1 How many URLs should be analyzed?
Define how many URLs you want to crawl in this project. If you have several projects, you can distribute your account's URL capacity among them.
To see your account's overall URL limit and its current usage, have a look at Projects in your Account Settings.
1.2 How fast should the analysis be?
You can define your crawling speed by adjusting the number of parallel requests (from 1 to 10).
To raise the number of parallel requests beyond 10, you must first verify your page. Download the authentication file, upload it to your root folder, and click Check authentication.
Once your page is verified, you can raise the number of parallel requests up to 100. Please note that more parallel requests require more server resources, which could lead to 5xx status codes. Fewer parallel requests require fewer server resources.
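To illustrate why the number of parallel requests matters for your server, here is a minimal Python sketch. It is not how our crawler is implemented: `crawl`, `FakeServer` and `peak_load` are hypothetical names, and the "server" is simulated so that the example runs without any network access. It only shows that the parallel requests setting caps how many requests hit the server at the same moment.

```python
import asyncio


async def crawl(urls, parallel_requests, fetch):
    """Fetch all URLs with at most `parallel_requests` in flight at once."""
    limiter = asyncio.Semaphore(parallel_requests)

    async def fetch_one(url):
        async with limiter:  # wait here until a request slot is free
            return await fetch(url)

    return await asyncio.gather(*(fetch_one(url) for url in urls))


class FakeServer:
    """Simulated server that records the peak number of concurrent requests."""

    def __init__(self):
        self.active = 0
        self.peak = 0

    async def fetch(self, url):
        self.active += 1
        self.peak = max(self.peak, self.active)
        await asyncio.sleep(0.01)  # pretend the server needs time to respond
        self.active -= 1
        return 200


def peak_load(parallel_requests, url_count=30):
    """Return the highest number of simultaneous requests the server saw."""
    server = FakeServer()
    urls = [f"https://domain.com/page-{i}" for i in range(url_count)]
    asyncio.run(crawl(urls, parallel_requests, server.fetch))
    return server.peak


if __name__ == "__main__":
    for setting in (1, 10):
        print(setting, "parallel requests -> peak load", peak_load(setting))
```

With a setting of 1 the simulated server never handles more than one request at a time; with a setting of 10 it has to handle ten at once, which is the extra load the note above refers to.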
1.3 Login Data
Is your website under construction and protected with a password? No problem: you can add your .htaccess authentication credentials in the Project Settings.
Once you have entered the login credentials, click Test settings to check whether the crawler can access your domain. If the test succeeds, you are good to go.
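If you would like to check the credentials yourself before entering them, the following Python sketch may help. It assumes that your .htaccess protection uses HTTP Basic authentication, which is the most common setup; `basic_auth_header` and `can_access` are hypothetical helper names, and the URL and credentials are placeholders.

```python
import base64
import urllib.error
import urllib.request


def basic_auth_header(username, password):
    """Build the value of the Authorization header for HTTP Basic auth."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def can_access(url, username, password, timeout=10):
    """Return True if the URL answers with status 200 for these credentials."""
    request = urllib.request.Request(
        url, headers={"Authorization": basic_auth_header(username, password)}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError:
        # For example 401 (wrong credentials) or 403 (access forbidden).
        return False


if __name__ == "__main__":
    # Placeholder values: replace them with your own domain and login data.
    print(can_access("https://domain.com/", "user", "pass"))
```

If this returns False for credentials you believe are correct, the protection may use a different mechanism than Basic authentication.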
1.4 Robots.txt Behaviour
Robots.txt behaviour options:
1.5 Analysis user-agent
The user-agent option determines the name the crawler identifies itself with. You can enter RyteBot or GoogleBot as the user-agent; RyteBot is the default. If you want to whitelist or track only our service, you can give the crawler an individual name (e.g. crawler123abc).
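As a rough sketch of how an individual crawler name can be tracked on your side, the Python example below filters access log lines by user-agent. It assumes your server writes the common "combined" log format, in which the user-agent is the last quoted field; `requests_from_crawler` is a hypothetical helper, and the log lines are made up for illustration.

```python
import re

# Matches every double-quoted field in a log line.
_QUOTED_FIELD = re.compile(r'"([^"]*)"')


def requests_from_crawler(log_lines, crawler_name):
    """Return the log lines whose user-agent field contains `crawler_name`."""
    hits = []
    for line in log_lines:
        quoted = _QUOTED_FIELD.findall(line)
        # Assumption: the user-agent is the last quoted field of the line.
        if quoted and crawler_name.lower() in quoted[-1].lower():
            hits.append(line)
    return hits


SAMPLE_LOG = [
    '203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] "GET /wiki/ HTTP/1.1" 200 512 "-" "crawler123abc"',
    '198.51.100.2 - - [10/Oct/2023:13:55:40 +0000] "GET / HTTP/1.1" 200 1024 "-" "Mozilla/5.0"',
]

if __name__ == "__main__":
    for line in requests_from_crawler(SAMPLE_LOG, "crawler123abc"):
        print(line)
```

Because the name is unique to your project, the matching lines can only come from your own analysis, not from other crawlers.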
2. What to analyze
2.1 Homepage URL
The Homepage URL defines what Ryte treats as the homepage of your domain and gives the crawler its starting point. If your homepage is not indexable, please have a look here.
2.2 Analyze Subdomains
Analyze all found subdomains and show them as part of the main domain in reports. If you deactivate this feature, they will be treated as "external links".
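The effect of this setting can be pictured with a small Python sketch. It is an illustration only, not the crawler's actual logic; `classify_link` is a hypothetical function, and "internal" is used here as shorthand for "shown as part of the main domain".

```python
from urllib.parse import urlsplit


def classify_link(url, main_domain, analyze_subdomains):
    """Classify a link as 'internal' or 'external' relative to main_domain."""
    host = (urlsplit(url).hostname or "").lower()
    main = main_domain.lower()
    if host == main:
        return "internal"
    if host.endswith("." + main):
        # A subdomain: part of the project only if the feature is active.
        return "internal" if analyze_subdomains else "external"
    return "external"


if __name__ == "__main__":
    link = "https://blog.domain.com/post"
    print(classify_link(link, "domain.com", analyze_subdomains=True))
    print(classify_link(link, "domain.com", analyze_subdomains=False))
```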
2.3 Analyze Sitemap.xml
Analyze your sitemap.xml for errors and optimization potential. If you use many sitemap.xml files (20+), you can deactivate this function to speed up the overall crawl.
2.4 Sitemap URLs
By default, our crawler looks for the sitemap.xml in the root folder (domain.com/sitemap.xml). If your sitemap is located elsewhere or has a different name, you can add its URL to the settings so the crawler will follow it.
You can add as many sitemaps as needed (one per line).
If your sitemap_index.xml is not recognized correctly, please add each sitemap's URL from the index.
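If your index lists many sitemaps, you do not have to copy them by hand. The following Python sketch extracts every sitemap URL from a sitemap index and prints them one per line, ready to paste into the settings. It assumes the index follows the sitemaps.org protocol (`sitemapindex`, `sitemap` and `loc` elements); `sitemap_urls_from_index` is a hypothetical helper and the URLs in the example are placeholders.

```python
import xml.etree.ElementTree as ET

# XML namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls_from_index(index_xml):
    """Return the <loc> value of every <sitemap> entry in a sitemap index."""
    root = ET.fromstring(index_xml)
    return [
        loc.text.strip()
        for loc in root.findall("sm:sitemap/sm:loc", SITEMAP_NS)
        if loc.text
    ]


EXAMPLE_INDEX = """
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://domain.com/sitemap-wiki.xml</loc></sitemap>
  <sitemap><loc>https://domain.com/sitemap-magazine.xml</loc></sitemap>
</sitemapindex>
""".strip()

if __name__ == "__main__":
    # One URL per line, matching the input format of the Sitemap URLs field.
    print("\n".join(sitemap_urls_from_index(EXAMPLE_INDEX)))
```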
2.5 Ignore GET Parameters
Define GET parameters here that will be removed automatically from URLs found on your website. This is useful to avoid unnecessary URL variations caused by session IDs or tracking parameters. Downside: issues like duplicate content might not be discovered.
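The following Python sketch shows what removing GET parameters means in practice. It is an illustration under assumptions, not the crawler's own code: `strip_parameters` is a hypothetical function, and `sessionid` and `utm_source` are example parameter names.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_parameters(url, ignored):
    """Return `url` without the GET parameters whose names are in `ignored`."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in ignored
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


if __name__ == "__main__":
    ignored = {"sessionid", "utm_source"}
    print(strip_parameters(
        "https://domain.com/shoes?color=red&sessionid=abc123&utm_source=news",
        ignored,
    ))
    # prints https://domain.com/shoes?color=red
```

Two URLs that differ only in an ignored parameter end up as the same URL after stripping. That is why the crawl contains fewer variations, and also why duplicate content caused by exactly those parameters can go unnoticed.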
3. Ignore / Include URLs
3.1 Exclude URLs (blacklist)
You can exclude URLs from your crawl by adding blacklist rules. In this example, we want to exclude the Magazine and our Wiki, which we achieve by blacklisting the "subfolders". The rules should look like this:
These rules exclude all URLs containing /wiki/ or /magazine/. Please note that this does not take the folder hierarchy into account: domain.com/wiki/ is excluded, and so is domain.com/subfolder/wiki/.
You can apply a rule at any depth if you want only certain pages to be excluded, e.g.:
You can add as many rules as you like.
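The "containing" behaviour described above can be sketched in a few lines of Python. This assumes plain substring matching, which is our reading of "all URLs containing /wiki/"; it does not show the exact rule syntax of the settings field, and `is_excluded` is a hypothetical function.

```python
def is_excluded(url, blacklist):
    """Return True if the URL contains at least one blacklist rule."""
    return any(rule in url for rule in blacklist)


if __name__ == "__main__":
    rules = ["/wiki/", "/magazine/"]
    for url in (
        "https://domain.com/wiki/",
        "https://domain.com/subfolder/wiki/",
        "https://domain.com/products/",
    ):
        print(url, "-> excluded" if is_excluded(url, rules) else "-> crawled")
```

A deeper rule such as /subfolder/wiki/ would match only the second URL, which is how a rule can be limited to a certain depth.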
3.2 Include URLs (whitelist)
The whitelist works like the blacklist, but in the opposite direction: only URLs that match a rule are crawled. Use it if you need "include only" rules.
In this example, we want to crawl ONLY our Magazine and Wiki. We achieve that by whitelisting each "subfolder":
Please note that this will also whitelist domain.com/subfolder/wiki/, so you may need to make the rule more specific:
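As a rough illustration of the difference between a broad and a more specific whitelist, consider the Python sketch below. It again assumes plain substring matching and does not reproduce the exact rule syntax of the settings field; `urls_to_crawl` is a hypothetical function, and including the domain in the rule is just one conceivable way to make a rule more specific.

```python
def urls_to_crawl(urls, whitelist):
    """Keep only the URLs that contain at least one whitelist rule."""
    return [url for url in urls if any(rule in url for rule in whitelist)]


FOUND_URLS = [
    "https://domain.com/wiki/seo",
    "https://domain.com/magazine/issue-1",
    "https://domain.com/subfolder/wiki/draft",
    "https://domain.com/products/",
]

# Broad rules match the folder name at any depth.
BROAD = ["/wiki/", "/magazine/"]
# More specific rules (assumption): match the folder directly below the domain.
SPECIFIC = ["domain.com/wiki/", "domain.com/magazine/"]

if __name__ == "__main__":
    print(urls_to_crawl(FOUND_URLS, BROAD))
    print(urls_to_crawl(FOUND_URLS, SPECIFIC))
```

The broad rules keep three of the four URLs, including the one under /subfolder/wiki/; the more specific rules keep only the two top-level folders.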
4. General settings
In the General Settings, you can change the project's Name and Slug.
Name: The name of your project as it is displayed within your Ryte account (default: domain).
Slug: The short notation of your project as it is displayed within the URL.
ATTENTION: Changing the slug does not affect the project domain; it only changes the display URL of the project. This feature is important if you analyze the same domain multiple times in your Ryte account.