In a recent post, I put together a Complete Graphic and Web Design Stack, a guide meant to be my go-to resource for everything graphic design. One of its sections highlights web crawler tools. Well, one tool: Screaming Frog. It’s the industry standard. This post discusses Screaming Frog in depth. In it, you’ll learn what a web crawler is, why crawlers are important, and how to use Screaming Frog on your website.
What is a Web Crawler?
A web crawler (or spider) is a software program written to index every item on a website. I’ve found Screaming Frog to be the best product available for the job. It’s a downloadable program that crawls a website’s files: CSS, images, links, scripts, and webpages, and lets you review each of them with ease. Its reports highlight individual flaws that can hurt speed, SEO, and overall performance. These flaws include 404 errors and broken links, as well as duplicate, missing, or unoptimized page titles, URLs, H1s, H2s, meta descriptions, and images. Screaming Frog, and web crawlers in general, surface these issues so you are aware of them. Fixing them is up to you; the changes will need to be made in your site’s code.
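To make the idea concrete, here is a minimal sketch of the discovery step every crawler performs: parsing a page and collecting the links and assets it should visit next. The HTML snippet and class name are my own illustration, not anything from Screaming Frog.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Gather the URLs a crawler would queue up next:
    anchors, images, scripts, and stylesheets."""
    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.found.append(attrs["href"])
        elif tag in ("img", "script") and attrs.get("src"):
            self.found.append(attrs["src"])
        elif tag == "link" and attrs.get("href"):
            self.found.append(attrs["href"])

page = ('<html><head><link rel="stylesheet" href="style.css"></head>'
        '<body><a href="/about">About</a><img src="logo.png" alt="Logo"></body></html>')

collector = LinkCollector()
collector.feed(page)
print(collector.found)  # prints ['style.css', '/about', 'logo.png']
```

A real crawler repeats this on every URL it finds, which is exactly how Screaming Frog ends up with a complete inventory of your site.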
Let’s dig in.
How to Download and Install
Go to https://www.screamingfrog.co.uk/seo-spider/ and download the appropriate version for your operating system (Mac, Windows, or Ubuntu). Once downloaded, run the installer. The free version lets you crawl up to 500 URLs; if you’d like to crawl more than that, you will need to purchase a license.
Crawl Your First Website
Now that you have downloaded and installed Screaming Frog, you are ready to crawl your first site. We’ll be using my portfolio site as an example, but feel free to use any domain you want.
Out of the box, the program looks a little intimidating, but once you run your first crawl you will become much more comfortable with it.
Enter the domain https://michaellutjen.com in the URL bar and click “Start”.
Once the crawl starts, you will see a progress bar on the right showing how complete it is. Depending on the size of the site, the crawl may take anywhere from a minute to several. Once it reaches 100%, you’re ready to start analyzing. 😍
Note – If you want to crawl additional subdomains (for example, a blog at ‘blog.website.com’), you need to check the Crawl All Subdomains box under Configuration > Spider.
Alright, let’s jump in.
Identify any Errors and Check Response Codes
I always take a look at the ‘Response Codes’ tab first to see if there are any errors and/or omissions. Typical statuses are as follows:
- 200: OK
- 301: Permanent redirect
- 302: Temporary redirect
- 404: Not found
- 500: Server error
- 503: Unavailable
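When I work through a crawl report, I mentally sort those codes into buckets. Here’s a rough sketch of that triage as a script; the labels and the `triage` function are my own convention, not part of Screaming Frog.

```python
# Status codes as listed above, paired with a rough triage label.
STATUS_MEANINGS = {
    200: "OK",
    301: "Permanent redirect",
    302: "Temporary redirect",
    404: "Not found",
    500: "Server error",
    503: "Unavailable",
}

def triage(code):
    """404s and server errors need fixing; redirects are worth a
    review; 200s are fine and can be skipped."""
    if code == 404 or code >= 500:
        return "fix"
    if code in (301, 302):
        return "review"
    return "ok"

for code, meaning in STATUS_MEANINGS.items():
    print(f"{code} {meaning}: {triage(code)}")
```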
The most important of these are 404 errors, which arise when a page or file is missing or linked incorrectly on the site. You should work to eliminate every 404, as each one gives your visitors a poor experience (a broken link) and reflects negatively on your site with Google.
Next are 301s. A 301 is not necessarily bad, but they can turn ugly quickly. Too many redirects can slow down your site, lowering both your user experience and your page rank on Google. Over time, you can remove older redirects that are no longer needed.
A 200 code means that everything is OK and behaving as it should. You can move on.
Analyze Your URLs
Within the URL tab in Screaming Frog, you can analyze every single URL on your site. Here you can check for non-ASCII characters, underscores, uppercase letters, and URLs over 115 characters. Each of these can cause issues with Google, and cleaning them up can benefit your ranking.
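Those four checks are simple enough to express in a few lines of code. This is a sketch of my own, not Screaming Frog’s logic; the example URL is made up.

```python
def url_issues(url):
    """Flag the same URL problems the URL tab surfaces."""
    issues = []
    if any(ord(ch) > 127 for ch in url):
        issues.append("non-ASCII characters")
    if "_" in url:
        issues.append("underscores")
    if any(ch.isupper() for ch in url):
        issues.append("uppercase")
    if len(url) > 115:
        issues.append("over 115 characters")
    return issues

# A deliberately bad example URL: mixed case, an underscore, and an accent.
print(url_issues("https://example.com/My_Blog/café"))
# prints ['non-ASCII characters', 'underscores', 'uppercase']
```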
Review Your Page Titles
Next, we take a look at page titles. Page titles are one of the most important items on a webpage. Each should be unique and include the keyword that the page is geared towards, and any duplicates should be fixed as soon as possible. Keep each title under 60 characters. Note that titles are truncated by pixel width in search results, so not all letters and numbers are created equal. For example, the number 1 and the letter I are thin, while the numbers 2 through 9 and most letters are wider and take up more space. Be sure to double-check the title tab once you’ve deployed your updates.
Check Your Meta Descriptions
Meta descriptions allow a search engine to understand the purpose of any given page, and they show up in search results for the end user. Best practice is to keep them under 160 characters; anything over that will be truncated and won’t be seen. Each description should be unique so it is geared towards only the page it represents.
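Both rules, unique titles under 60 characters and meta descriptions under 160, are easy to sanity-check in bulk once you have your pages’ metadata in hand. The `pages` data structure and page content below are invented for this sketch.

```python
def audit_snippets(pages):
    """pages maps URL -> (title, meta description).
    Return per-URL issues against the 60/160 character rules
    and flag any duplicated titles."""
    problems = {}
    seen_titles = {}
    for url, (title, meta) in pages.items():
        issues = []
        if len(title) > 60:
            issues.append("title over 60 characters")
        if title in seen_titles:
            issues.append(f"duplicate title (also on {seen_titles[title]})")
        else:
            seen_titles[title] = url
        if len(meta) > 160:
            issues.append("meta description over 160 characters")
        if issues:
            problems[url] = issues
    return problems

pages = {
    "/": ("Home | Acme Design", "Portfolio and services of Acme Design."),
    "/blog": ("Home | Acme Design", "Design articles and tutorials."),
}
print(audit_snippets(pages))  # flags the duplicate title on /blog
```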
Audit Your Images
One of the most visible aspects of your website is its images. Whether vector or raster, your users will no doubt take notice of your image choices. There are a couple of items to keep in mind when looking at the list of images used on your site.
First and foremost, I recommend selecting the “Missing Alt Text” filter, which surfaces every image that lacks alt text. Alt text lets users with screen readers understand what each image represents, so be as specific as possible. It should also relate to the keyword the page is trying to rank for, which helps Google narrow down the page’s purpose. And be sure to keep the character count under 100.
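For a sense of what that filter is doing under the hood, here is a small illustrative auditor built on Python’s standard HTML parser. The class name and sample markup are mine, and it only checks for a missing `alt` attribute, which is the simplest version of the rule.

```python
from html.parser import HTMLParser

class AltAudit(HTMLParser):
    """Collect the src of every <img> tag with no alt attribute at all."""
    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and "alt" not in attrs:
            self.missing.append(attrs.get("src", "(no src)"))

html = '<p><img src="logo.png" alt="Company logo"> <img src="banner.jpg"></p>'
audit = AltAudit()
audit.feed(html)
print(audit.missing)  # prints ['banner.jpg']
```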
The second filter I recommend is “Over 100 kb”. Oftentimes, images are bulky and not sized properly. Selecting this filter isolates larger images that you may be able to reduce in size, and smaller images make for faster page loads. There is a balance between page load and high-quality images, and every site will strike it differently. Use the list created here as a starting point.
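If you keep your site’s images in a local folder, you can build the same starting-point list yourself. This is a sketch of my own; the folder path in the comment is just an example.

```python
import os

def oversized_images(folder, limit_kb=100):
    """Walk a local folder and list image files over the size limit,
    mirroring the idea behind the 'Over 100 kb' filter."""
    hits = []
    for root, _dirs, files in os.walk(folder):
        for name in files:
            if name.lower().endswith((".png", ".jpg", ".jpeg", ".gif", ".webp")):
                path = os.path.join(root, name)
                if os.path.getsize(path) > limit_kb * 1024:
                    hits.append(path)
    return hits

# Example: oversized_images("static/images") returns the paths worth compressing.
```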
Generate an XML Sitemap
In the age of search engines, XML sitemaps have become essential for every website. Search engines use these documents to confirm that they have found all relevant content on the site. The Yoast plugin for WordPress has this built in: Post, Page, and Category each get their own sitemap, included in a composite index. If you are using WordPress, I highly recommend adding Yoast, if only for this purpose; it updates automatically when you publish a page or post.
If you aren’t using WordPress, Screaming Frog can generate an XML sitemap for you. You’ll have to upload it to your site’s root directory, and any time you add a new page you will have to manually include that page as well.
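You could also script the sitemap yourself, since the format is simple: a `urlset` element containing one `url`/`loc` entry per page. Here is a minimal sketch using Python’s standard library; the URLs are placeholders.

```python
import xml.etree.ElementTree as ET

def build_sitemap(urls):
    """Build a minimal sitemap.xml string from a list of page URLs."""
    urlset = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")

print(build_sitemap(["https://example.com/", "https://example.com/about"]))
```

A real sitemap can also carry optional `lastmod`, `changefreq`, and `priority` fields per entry, but the `loc` element alone is valid.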
Final Thoughts
Screaming Frog is an excellent tool to have in your toolkit. It lets you analyze the technical elements of your site: response codes, URLs, page titles, and more. Be sure to start with the free version. It’s proven to be a valuable resource on my websites, and I’m sure you will find it valuable on yours as well. Combined with Google PageSpeed Insights and Google Analytics, you will not only be improving your page speed, SEO, and performance, but also staying ahead of your competition.