mTrawl – A free gift for the web community

For a number of years the landscape of free website checkers hasn't really moved forward much. Since the introduction of Xenu's Link Sleuth it seems like everyone declared game over and stopped trying to build free link checkers and other quality tools.

We don't want that to be the case. Our tools, frankly, aren't for everyone (yes, I really did just type that). If you're a professional web developer working on many sites, or the owner of quite a large site, then DeepTrawl and CloudTrawl make complete sense. You need pro tools, because your time is very valuable. But if you're an amateur web developer, or a pro in the making, you may simply not have the cash to splash out on something commercial. You may want something free. That doesn't mean the tool you use should be slow, unpolished, or lacking in essential features.

This is why we've created mTrawl. It's based on DeepTrawl, just with fewer features. It's aimed squarely at those who need something to check their website works right, but can't justify spending money on a product or service.

mTrawl

So what does mTrawl do? Check it out:

It’s a link checker

Install it on your PC or Mac & click start. mTrawl rips through every page of your site (yes, every page, there are no limits) & finds the broken links. It shows every broken link on every page in a really easy-to-read report. It even gives you the line number where each broken link was found.

It’s a validator

This is where mTrawl really excels. If you just want a free link checker there are many options, but none we're aware of also validate your HTML. While mTrawl is checking each page for broken links it's also validating every page, just like the W3C validator does. Why would you want this? Well, validation is really, really important. It checks your code is correct, i.e. compliant with a standard (HTML 5, XHTML or HTML 4.01). If you have errors in your code they could damage your SEO or mean your site doesn't render correctly in different browsers.
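
To make that concrete, here's a contrived snippet of the kind of thing a validator flags, followed by a corrected version (both examples are made up purely for illustration):

<!-- Invalid: a <ul> may only contain <li> elements, and the <b> tag is never closed -->
<ul>
<p>First item</p>
<li><b>Second item</li>
</ul>

<!-- Valid HTML 5: list items wrapped in <li>, and the emphasis tag properly closed -->
<ul>
<li>First item</li>
<li><strong>Second item</strong></li>
</ul>

mTrawl flags problems like these as it validates each page.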

It’s really very, very polished

It may have way fewer features than its sister, DeepTrawl, but it's just as polished. Just because it's a free product doesn't mean it should look bad, crash, only work with some older operating systems, or generally just annoy you with little niggles. Everything we know about producing a successful commercial product also went into mTrawl, so you'll feel good using it. Of course we hope that one day you'll love it enough to move over to one of our pro products. Until then, happy trawling!

You can download mTrawl here.

Adding a description meta tag in WordPress

Having a description meta tag in your pages is really good for SEO. The description goes in the head section of your HTML & looks like this:

<meta name="description" content="Description of your page." />

The description is often displayed below the title of the page in search results. Although it generally isn't used directly as a ranking factor for your page, it is still important to SEO. If the description is poor, searchers are less likely to click. Since modern search engines actually use click-through rate as a ranking factor, the description can influence your page's rank indirectly! This makes it very important to have a good, compelling description for your pages which includes your most important keywords.

When we released DeepTrawl 4, we added a cool new SEO feature which takes your list of keywords and checks every page in your site to make sure at least one page is optimized for each keyword. Among other ranking factors, this feature makes sure the keyword is in the page's description tag.

When we started pushing our own long tail SEO more we discovered that adding a description meta tag in WordPress is not supported by default. Damn. This seems like an odd omission. There are a few plugins that attempt to fix this but we wanted to go plugin-free (we won't get into why here).

The solution we discovered is surprisingly simple. We now use the Excerpt feature which is built into WordPress and output its content as the description. The Excerpt box is shown below the text entry box in the WordPress editor:

The excerpt becomes your description meta tag in WordPress

Before using this technique you should know that the excerpt can be used for other purposes, and you should be sure you’re fine with using the same text for the page description and any of these:

1) It can be used in place of the full post text in RSS feeds.

2) Depending on your theme it can be displayed in places like search results and archives.

Most of the time, the same thing you’d want Google showing for your page description is probably what you’d want in these places, but it’s best to go in with full knowledge. More on the above can be found here.
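
As an illustration of the second point, many themes print the excerpt in their archive or search templates with a loop something like the one below (a minimal sketch, not taken from any particular theme; the template tags are standard WordPress ones):

<?php if (have_posts()) : while (have_posts()) : the_post(); ?>
<h2><a href="<?php the_permalink(); ?>"><?php the_title(); ?></a></h2>
<?php the_excerpt(); // prints the same excerpt text we're about to reuse as the meta description ?>
<?php endwhile; endif; ?>

If your theme does something like this, the text you type into the Excerpt box will appear there as well as in your description tag.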

Adding the code to your template

We'll assume at this point that you're comfortable making basic edits to your WordPress template. Once you've added the code below you're pretty much done. That's why we love this technique: it's stunningly simple.

1) Open the header.php file in your theme

2) Insert the following code right after the <title></title> tags:

<?php
if (is_single()) {
    $post = $wp_query->post;
    // Use the excerpt (with any tags stripped) as the meta description, if one was entered.
    $descrip = strip_tags($post->post_excerpt);
    if ($descrip != "") {
        echo '<meta name="description" content="' . $descrip . '">';
    }
}
?>

The code above adds a description tag containing the excerpt text if there is an excerpt and if the viewer is looking at a single post (e.g. it won’t be added to your blog front page).

Once you’re done the code should be below your title tag, like this:

<title><?php wp_title('', true, 'right'); ?></title>
<?php
if (is_single()) {
    $post = $wp_query->post;
    // Use the excerpt (with any tags stripped) as the meta description, if one was entered.
    $descrip = strip_tags($post->post_excerpt);
    if ($descrip != "") {
        echo '<meta name="description" content="' . $descrip . '">';
    }
}
?>

Now save your template and you’re ready to test it out.

Adding descriptions to posts

You can now add a description meta tag in WordPress posts very easily. Just update your posts by adding an excerpt. Remember your descriptions should be engaging: they should make searchers want to click your link. Also remember to include your keywords, just don't be spammy :) .

Also, since you’re clearly into SEO, remember that DeepTrawl can make the process of performing on site SEO much easier. It checks you have a page optimized for each of your keywords. This allows the SEO process to be all about writing great content, not worrying about the minutiae of on page SEO. Learn more here.

Before we end, thanks should go to the people who helped us figure out the code above, it was based on the discussion here.


An SEO URL tweak: adding keywords to URLs in WordPress

In DeepTrawl 4 we added a new keyword SEO feature. It checks that a page of your site is perfectly optimized for each keyword on your list. This encouraged us to up our own long tail SEO game.

One of the things the feature checks is that a page has the required keyword in its URL, like this:

http://www.example.com/your-keyword.htm

The reasons for doing this are quite well documented. Having your most important keywords in a page's URL (what we call an SEO URL) helps you to increase your rankings in search results and helps click-throughs, since Google highlights the keywords in the URL (this subtly signals to the searcher that the page is more relevant). Adding keywords to URLs really is a no-brainer; even Google's own Matt Cutts thinks it's a good idea.

We found it very easy to follow this policy for the parts of our sites that run on flat HTML files. For the parts in WordPress this is a little harder, but with the steps below you can get WordPress to play ball and become an SEO URL master.

Step 1: Change your permalink settings

"Permalinks" refers to the URL scheme used by WordPress. It can use many different schemes; the default is probably the worst, just giving a unique ID for each article:

http://www.example.com/?p=123

Although WordPress can give you a much better scheme with a couple of clicks, we're going to show you how to create the best automatic permalink structure, which can then be tweaked for each post by adding keywords to URLs manually. By default, your untweaked URLs will look like this:

http://www.example.com/your-post-title
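
For reference, the permalink structure is built from WordPress structure tags; here are a few common ones and the kind of URL each produces (the example.com addresses, the "news" category and the dates are just illustrations):

/%postname% -> http://www.example.com/your-post-title
/%category%/%postname% -> http://www.example.com/news/your-post-title
/%year%/%monthnum%/%postname% -> http://www.example.com/2014/03/your-post-title

In the steps below we'll use the simplest of these, /%postname%, because it keeps your keywords right at the front of the URL.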

So, let’s get started:

1a) First, log into WordPress and go to Settings > Permalink Settings. You'll see a screen like this:

Adding keywords to URLs starts with the permalink settings

1b) Click the Custom Structure button and enter this:

/%postname%

Now click Save Changes.

At this point you may see a message like the following:

htaccessWarning

If you don't see this message you can move on to step 3. If you do see it, it means WordPress didn't have the permissions needed to change your .htaccess file. It's important to get this resolved right away, but you'll need to be able to log into your server using SSH or similar. If you don't have access to do that, reset and save your permalink settings & you'll be back where you were. If you do see the message and you can SSH into your server, continue with step 2.

Step 2: Amend your .htaccess file if needed

This section assumes you have WordPress installed on Linux and are comfortable using a command line at a basic level. If that doesn't apply to you, we recommend you revert your permalink settings in the WordPress dashboard & stop here.

2a) Log into your WordPress server using SSH or similar

2b) cd to the root directory of your WordPress install. It may be something like this:

/home/webadmin/site_name/html

2c) Edit your .htaccess file, or create one if it doesn't exist (we prefer to use vi for this, but tastes vary)

2d) If the file already contains some settings, move to the end and paste in the directive given to you in the WordPress dashboard. It's best to paste from the dashboard as the exact text may vary, but the directive we were given looks like this:

<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
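
If you're curious what those directives actually do, here's the same block with our own comments added (Apache only allows comments on their own lines; the annotations are ours, not part of what WordPress generates):

<IfModule mod_rewrite.c>
# Only apply these rules if the mod_rewrite module is available
RewriteEngine On
RewriteBase /
# If the requested path is not an existing file...
RewriteCond %{REQUEST_FILENAME} !-f
# ...and not an existing directory...
RewriteCond %{REQUEST_FILENAME} !-d
# ...hand the request to WordPress's index.php, which resolves the pretty URL
RewriteRule . /index.php [L]
</IfModule>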

Now you should be all set. To make sure, go to the main page of your blog and click one of the post titles. As long as the post appears OK, everything should be fine: the URL scheme and the rewriting directives are now working together. Phew!

We can't help putting in a plug here: having a link checker like our own DeepTrawl or CloudTrawl really helps with peace of mind for changes like this ;)

Step 3: Control the URL for each post

At this point each post in your blog will have changed to an SEO URL scheme; the links will all be pointing to URLs like:

http://www.example.com/your-post-title

This is great, but post titles aren't always the best default for your SEO URL. It's best not to have more than 5 words in any URL. Much more than that and the most important words look diluted, and long URLs can look spammy, both to visitors and to search engines.
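
For example (both addresses are made up), rather than letting a long title become the slug:

http://www.example.com/the-ten-things-you-really-need-to-know-about-blue-widgets

you'd trim it down to the keywords that matter:

http://www.example.com/blue-widgets-guide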

It's a good idea to manually override the URL of a post when it gets too long. Luckily, WordPress makes this easy.

3a) Create a new test post

3b) Click Save Draft

3c) You can now see the permalink box underneath your post title:

Add your SEO URL here

This makes changing your URL really easy. Generally keywords are separated with hyphens (-).

So… you're done. You can now enjoy all the SEO benefits of adding keywords to URLs.

Don’t forget; DeepTrawl can make your on site SEO much easier. You can learn more here.


How to spell check web pages effectively

Making sure to spell check web pages is really important. This isn’t a new issue. The problem of spelling errors is as old as the written word, but the modern web does present some unique challenges. When you review a page on the screen, you need to make sure you’re checking all the text. This is sometimes tricky. Take a look at this example:

spelling

Can you see the spelling error? Try looking again. Still no? Here it is…

spellMenu

The reason you couldn't see the spelling error in the (made up) example is that it's hidden until the mouse rolls over the menu, and this is a common issue when spell checking sites.

If you spell check web pages manually in your browser you need to be extremely careful to check the “hidden” areas of the page. Often these are simply menus, where there isn’t too much text to get wrong, but in some sites these hidden areas can be a lot bigger….

When you spell check web pages, do you check the tabs too?

In the above example there's a large amount of text hidden behind those tabs. In fact, taken together they contain more text than the rest of the page. So the first lesson is: beware of tabs, menus and other hidden areas; they often contain text that needs to be spell checked.

The following sections contain hints on checking other parts of pages which, in some cases, are completely hidden when you view them in the browser, but they're some of the most important parts of the content.

The title

The <title> tags in the <head> section of a web page contain the text that will appear in a couple of places:

1) The browser tab used to open the page:

title

2) Search engine results. Search engines use the title tag in their listings. This makes the title an incredibly important page element to check!

search

The description

This is a slightly more obscure part of a page. It never appears in the browser when the page is viewed. The description is found at the top of your HTML, like this:

<html>
<head>
<title>Your title</title>
<meta name="description" content="Text describing the content of the page">
</head>

The description is often used by the search engines to describe the page, enticing the viewer to click on it. Here’s an example of how it appears:


description

Since the title and description may make someone decide whether or not to view your page, getting the spelling right here is very important.

Image alts

Image alts are put into the HTML of your page with images. They provide a description of what's in the image. They look something like this:

<img src="imageFile.png" alt="A description of what's in the image" />

These are fairly important because image alts are shown by some web browsers when a visitor's mouse is positioned over the image. They're also used in indexing images to be shown in systems like Google's image search. Finally, image alts can sometimes be read out as an alternative to the image by screen reader software.

How to handle these

If you're spell checking manually by opening pages in your browser it's important to check tabs, menus etc. to find hidden content. It's also important to do right click > View page source and make sure all the hidden areas (title, description, image alts) are spelled correctly.

There is also a better way. Our tool, DeepTrawl, will automatically check all the content of every page and all of the above. With a single click you can find the spelling errors in an entire site.

spell check web pages

DeepTrawl v4 is here

v4

It’s been a little while since DeepTrawl has seen a major upgrade so this one is truly huge. Here’s an overview of the most important new features.

HTML 5 validation

This is something we're very proud to announce. HTML 5 validation (along with XHTML & HTML 4.01) is now baked in. In fact it's exactly the same HTML 5 validation you'll get from the W3C's own validator; we use the same validator code the W3C uses for HTML 5. Of course, as with every DeepTrawl check, this works on your entire site with one click. More.

CSS validation

This is a brand new feature in v4. When a site is trawled all of the internal, external and inline styles are read, just like the HTML. CSS validation shows you all the errors alongside all the site's other issues. But we've gone a lot further than just validation. CSS is now a first class citizen – user added checks can now analyze CSS, and its import, font & image links are checked. More.
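
For instance, a stylesheet like this made-up one contains exactly the three kinds of links described above, each of which can be checked like any other link:

@import url("base.css"); /* an import link */
@font-face { font-family: "Brand"; src: url("fonts/brand.woff"); } /* a font link */
.hero { background-image: url("img/hero.png"); } /* an image link */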

Keyword SEO

This is such a big feature there's too much to go through in this post – so here's a very quick overview. The first step towards getting to the top of the search results is really good on-site SEO. You need to make sure each keyword you want to rank for has at least one really well optimized page. That page must have the keyword in the title, h1 and body text, and there are a lot more details to check.
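
As a rough sketch (the keyword "blue widgets" and the markup are invented for illustration), a page optimized for a keyword looks something like this:

<!-- URL: http://www.example.com/blue-widgets -->
<head>
<title>Blue Widgets | Example Store</title>
<meta name="description" content="Our full range of blue widgets, with free delivery.">
</head>
<body>
<h1>Blue Widgets</h1>
<p>Body text that mentions blue widgets naturally, without stuffing...</p>
</body>

The Keyword SEO check looks for details like these on every page, for every keyword on your list.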

There are a few tools out there to make sure a single page is optimized for a single keyword – that’s relatively easy to do. The problem is anything more than a tiny site will have a lot of pages and a long list of keywords it wants to target. That’s where DeepTrawl 4’s Keyword SEO check comes in. It checks every page in your site against all of your keywords and gives you a report showing which page is optimized for each keyword. If there’s no perfectly optimized page it will show the closest matches and tell you how to optimize them.

As a bonus, because this works like every other check in DeepTrawl, it's run every time you trawl your site, so you can spot if your SEO efforts are being wrecked by changes later in the site's life.

This is a completely unique feature – we haven't seen any other tool which handles on-site SEO for every keyword and every page in your site. More.

(Much) better HTML exports

We've upgraded HTML exports hugely. They're now beautifully rendered in HTML 5 & CSS. We've also added branding options – you can now add your own logo, colors and text, making HTML exports perfect for sharing with clients.

Dependency checks

DeepTrawl has always checked for broken links. In fact for a long time it was known mainly as a link checker. Starting in v4 we've added the dependency check, which does link checking for things like JavaScript, font & iframe imports. Ever seen a page with a broken CSS import? We bet you have, and would wager the page was ugly as hell – now you never have to torture your visitors like this. More.

Better UI

The interface in DeepTrawl is now a lot sexier. We've gone for a cross platform look & feel that works really well on all modern platforms like Windows 7, 8 & OS X Mavericks. Besides looking better the interface now also works better. We've added features like Chrome-style reorderable tabs and a new Monitor which pops out from the bottom of the screen, instead of being in a separate window.

Improved filters

DeepTrawl has had filter tabs for a while now – they allow you to see pages with specific errors in a new tab. We've enhanced them with two new features:

1) It’s now possible to hide all other errors in the filter tab.

2) Filter tabs can now be used to filter by URL. This allows you to show only errors in specific parts of your site or even zero in on a single page in the error results.

Check analytics code

Did you remember to put your analytics tracking code in *every* … *single* … *page* in your site? 100% sure? Now DeepTrawl can tell you which pages you’ve missed.

Try it

There are many more new features and enhancements (in fact, there are more than twice as many as we've covered here). We suggest you try it out.

Why Google should use plus for rankings – and almost certainly will

Recently there was a bit of a stir caused by Moz.com discovering that there's a correlation between sharing on Google+ and the ranking of pages. Their data shows that pages with more shares tend to get higher rankings.

They made the case that Google+ is actually something of an SEO machine – that each post is like a mini SEO'd article. This seemed very odd to us. Google owns all that data – they don't actually need to crawl it with their spider to index it. It's already in a database sitting in one of their data centers. Now, maybe they do just crawl it because everything isn't always as joined up as we might imagine, but that seems unlikely. As the article points out, shares on + are indexed way faster than most pages, indicating some kind of special sauce.

But here's the really interesting part: the article explicitly stated that giving a +1 to a page wasn't directly influencing its ranking. In a response to the article on Hacker News (later appended to the article), Google's own Matt Cutts even took the time to explicitly rule this out:

It is not the +1’s themselves that are causing the high rankings of posts but the fact that most +1’s on a site result in a shared post on Google+, which creates a followed link back to the post. It’s instant organic link building.

This got us thinking… why on earth wouldn't Google use +1's to directly influence page rankings? The obvious answer is that it would be open to manipulation by spammers. That's true, but by all accounts Google's previous most powerful weapon (PageRank) has been demoted in relevance because it's being spammed way too much. Generating spammy links is just too easy these days.

The advantage of +1's is that they're tied to your Google account, which gives the search giant a lot of really useful details it could use to home in on and ignore the spammers. For example they could:

- Not count any +1’s generated in the x days after an account was created

- Discount any from accounts which have +1'd a number of sites deemed to be spammy

- Weight against +1’s from non-verified accounts.

That's just a quick thought experiment. Given time one could generate hundreds of spammy metrics using the data Google holds about all of us. Hell – I might seriously demote the +1's of anyone who'd Googled phrases like "black hat seo" in the past – ooh – creepy!

But slightly sinister jokes aside – it's clear that this data could help Google create a better search. Even if they aren't using +1's now, it's a good bet they will in the future. So, if there weren't enough benefits already, installing those +1 buttons now is probably a good idea – you might just see a direct rankings boost from it in the future.

The world’s biggest companies have boatloads of broken links

Since we recently released CloudTrawl we decided to undertake some research to prove just how valuable it is. The uptime of major websites, and the damage to reputation and profits downtime causes, has been written about extensively, so we decided to go a different way. Every web user has seen a broken link; they often make our blood boil & frequently people will leave a site on seeing one, assuming the content they're seeking simply doesn't exist. 404 has become the new blue screen of death. Broken links are a real risk to reputation & profit, but we've never seen a comprehensive study on just how common they are in major sites.

We decided to undertake that study and to perform it on the group of sites whose owners aren’t lacking in resources: the Fortune 500.

The Results

Here’s a big figure to open with:

Fortune500_1

You read that right: 92% of the sites in our sample included at least one broken link & most had several. 68% had more than 10 broken links, 49% had more than 50, and a surprising 43% of Fortune 500 sites had more than 100 broken links.

We also broke down the number of pages which had broken links against the total number of pages in each site. A stunning 13% of all pages in Fortune 500 sites have at least one broken link (many pages have several).

Fortune500_2

What isn’t shown in the figures is the importance of some of these links. We saw examples of broken links to annual reports, quarterly statements, social presences (e.g. broken Facebook links) & external + internal news articles. Perhaps most worrying were the unreachable legal notices & terms & conditions documents. Along with making users leave the sites (& possibly making lawyers pass out!) these things are bad for search engine optimization. Google won’t be able to find these pages & sites will be penalized.

Our Method

To get a fair cross section of the Fortune 500 we chose 100 companies at random across the set. We entered their names into Google and picked the first US / international result owned by that company. This resulted in a mix of sites. Some were corporate (company news, quarterly statements etc.) and some were online presences for customers (stores & marketing). We rejected any sites which CloudTrawl didn't finish crawling in 5 hours or which contained more than 5,000 pages (these can sometimes spawn loops in page generation and unfairly bias results; search engines also stop crawling sites if they think this is happening).

To eliminate false positives we quality checked results both randomly and where sites contained a high percentage of broken links. To make sure the headline figures weren't biased we only checked links (not images) and only checked for 404 & 410 HTTP error codes, ignoring server timeouts etc. as these can sometimes be temporary.

Conclusion

Although there are some big headline figures above, the one that troubles us most is the 13%. Essentially we're saying that more than 1 in 10 Fortune 500 web pages has a severe bug that's waiting to pop up and grab unsuspecting users.

Next time you see a 404 error you’ll at least have the consolation that they’re proven to be really common. Of course we do give webmasters the tools to fix these issues – and I think we’ve presented a decisive demonstration of why they’re needed.

Note: feel free to use the infographics in this post; we hereby release them for use on other sites.

OMG; We’ve Launched!

It’s a proud day over at CloudTrawl.com; we just launched the full live service!

We'd love it if you signed up for the free trial, and we're all ears for new feature requests & suggestions.

So, what made it into the first version? CloudTrawl is designed to watch out for stuff that goes wrong on its own, even if you don't change your site. So for the first version we have:

- Link Checking (we check every page of your site, daily or weekly)

- Uptime Monitoring (we check your site is online every 30 seconds, 24×7)

We also have features like complete history charting, the ability to share site reports and settings with colleagues & customers, very cool looking real time views for uptime checks, the ability to “Start Now” for link checks, image validation and a lot more.

Even this tidy set of features is really just the tip of the iceberg of what's planned for CloudTrawl. The ultimate goal: monitor absolutely everything that could go wrong with your site on its own. Over time we'll be adding more checks, and we'd love for you to tell us what extra features and checks you think CloudTrawl should have.

Happy Trawling!

Last bug is fixed!

This is a real development milestone. All of the code for CloudTrawl v1 has been written for a while and we’ve been focused entirely on testing. Our testing has included a lot of steps:

1. Automated testing; we now have a massive suite of automated tests which can be run at the click of a button

2. Functional testing; making sure every feature works as described and they all hang together well

3. Cross browser testing; making sure the interface works across browsers and operating systems

4. Scale; running up hundreds or thousands of uptime checks and hundreds of link & image checks simultaneously to make sure the system performs well with lots of people using it (if I can think of a way to make this not boring it deserves a blog post all of its own).

5. Third party testing; we got the guys over at TestLab² to do a barrage of tests to make sure we hadn’t missed anything.

And then this evening it finally happened… the last known bug was fixed. So "OMG", it's very nearly time to open the champagne and hit the release button. Watch this space!

Will crawling affect your Google Analytics?

This is a question we've been asked a few times. Many users want to know if the hits generated when their pages are checked for broken links will show up in Analytics. The answer for both CloudTrawl and DeepTrawl is no, it won't be a problem. For reasons I've written about previously, neither product executes the JavaScript on your pages. Since Google Analytics relies on JavaScript to count page views, it has no way of knowing that we've visited a page, so our visits won't show up.
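
To see why, it helps to look at what a tracking snippet does. The lines below are an abbreviated sketch of Google's analytics.js page-view calls (the loader boilerplate is omitted and UA-XXXXX-Y is a placeholder ID); a page view is only recorded when this script actually runs:

<script>
// Nothing is reported to Analytics unless a real browser executes this code.
ga('create', 'UA-XXXXX-Y', 'auto');
ga('send', 'pageview');
</script>

A crawler that fetches the HTML without running JavaScript never fires that pageview hit, which is why a trawl leaves your Analytics numbers untouched.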

As a side note, in the next version of DeepTrawl we're planning to implement a way to make sure all of your pages contain analytics tracking code. Until then it's relatively easy to do this check yourself using DeepTrawl's ability to add your own new checks.