It’s been a little while since DeepTrawl has seen a major upgrade, so this one is truly huge. Here’s an overview of the most important new features.
HTML5 validation
This is something we’re very proud to announce. HTML5 validation is now baked in. In fact it’s exactly the same HTML5 validation you get from the W3C. Of course, as with every DeepTrawl check, this works on your entire site with one click. More.
CSS validation
This is a brand new feature in v4. When a site is trawled, all of the internal, external and inline styles are read, just like the HTML. CSS validation shows you all the errors alongside all the site’s other issues. But we’ve gone a lot further than just validation. CSS is now a first class citizen – user-added checks can now analyze CSS, and its import, font & image links are checked. More.
Keyword SEO
This is such a big feature there’s too much to go through in this post – so here’s a very quick overview. The first step towards getting to the top of the search results is really good on-site SEO. You need to make sure each keyword you want to rank for has at least one really well optimized page. That page must have the keyword in the title, h1 and body text, and there are a lot more details to check.
There are a few tools out there to make sure a single page is optimized for a single keyword – that’s relatively easy to do. The problem is anything more than a tiny site will have a lot of pages and a long list of keywords it wants to target. That’s where DeepTrawl 4’s Keyword SEO check comes in. It checks every page in your site against all of your keywords and gives you a report showing which page is optimized for each keyword. If there’s no perfectly optimized page it will show the closest matches and tell you how to optimize them.
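To give a rough idea of what such a check involves, a page can be scored for a keyword based on where the keyword appears. This is only an illustrative sketch – DeepTrawl’s actual scoring rules are its own, and the weights below are made up:

```python
import re

def keyword_score(html, keyword):
    """Score how well a single page is optimized for one keyword.

    Simplified illustration only: +3 for the keyword in <title>,
    +2 for the main <h1>, +1 for the body text. Real checks look
    at many more details.
    """
    kw = keyword.lower()
    page = html.lower()
    score = 0
    title = re.search(r"<title>(.*?)</title>", page, re.S)
    if title and kw in title.group(1):
        score += 3  # keyword in the page title weighs most
    h1 = re.search(r"<h1[^>]*>(.*?)</h1>", page, re.S)
    if h1 and kw in h1.group(1):
        score += 2  # keyword in the main heading
    body = re.sub(r"<[^>]+>", " ", page)  # strip tags to get body text
    if kw in body:
        score += 1  # keyword appears somewhere in the copy
    return score

def best_page(pages, keyword):
    """Return (url, score) of the page best optimized for a keyword."""
    return max(((url, keyword_score(html, keyword)) for url, html in pages.items()),
               key=lambda t: t[1])
```

Run across every page and every keyword, the best-match table this produces is essentially the shape of the report described above.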
As a bonus, because this is like every other check in DeepTrawl, it’s run every time you trawl your site, so you can spot if your SEO efforts are being wrecked by changes later in the site’s life.
This is a completely unique feature – we haven’t seen any other tool which handles the on-site SEO for every keyword and every page in your site. More.
(Much) better html exports
We’ve upgraded HTML exports hugely. They’re now beautifully rendered in HTML5 & CSS. We’ve also added branding options – you can now add your own logo, colors and text, making HTML exports perfect for sharing with clients.
The interface in DeepTrawl is now a lot sexier. We’ve gone for a cross-platform look & feel that works really well on all modern platforms like Windows 7, 8 & OS X Mavericks. Besides looking better, the interface now also works better. We’ve added features like Chrome-style reorderable tabs and a new Monitor which pops out from the bottom of the screen instead of being in a separate window.
DeepTrawl has had filter tabs for a while now – they allow you to see pages with specific errors in a new tab. We’ve enhanced them with two new features:
1) It’s now possible to hide all other errors in the filter tab.
2) Filter tabs can now be used to filter by URL. This allows you to show only errors in specific parts of your site, or even zero in on a single page in the error results.
Check analytics code
Did you remember to put your analytics tracking code in *every* … *single* … *page* in your site? 100% sure? Now DeepTrawl can tell you which pages you’ve missed.
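For a sense of what this check does, here’s a minimal sketch of the same idea: scan a local copy of your site for a tracking snippet. The snippet URL below is just an example – substitute whatever your analytics provider gives you:

```python
import os

# Example snippet to look for - swap in your own analytics ID or script URL.
TRACKING_SNIPPET = "google-analytics.com/analytics.js"

def pages_missing_tracking(root):
    """Walk a local copy of the site; list HTML files lacking the snippet."""
    missing = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if name.endswith((".html", ".htm")):
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8", errors="ignore") as f:
                    if TRACKING_SNIPPET not in f.read():
                        missing.append(path)
    return missing
```

DeepTrawl does the equivalent against the live site, so pages generated by a CMS are covered too.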
There are many more new features and enhancements (in fact v4 has more than twice as many new features and enhancements as any previous release). We suggest you try it out.
Recently there was a bit of a stir caused by Moz.com discovering that there’s a correlation between sharing on Google+ and the ranking of pages. Their data shows that pages with more shares tend to get higher rankings.
They made the case that Google+ is actually something of an SEO machine – that each post is like a mini SEO’d article. This seemed very odd to us. Google own all that data – they don’t actually need to crawl it all with their spider to index it. It’s already in a database sitting in one of their data centers. Now, maybe they do just crawl it because everything isn’t always as joined up as we might imagine, but that seems unlikely. As the article points out, shares on + are indexed way faster than most pages, indicating some kind of special sauce.
But here’s the really interesting part: the article explicitly stated that giving a +1 to a page wasn’t directly influencing its ranking. In a response to the article on Hacker News (later appended to the article), Google’s own Matt Cutts even took the time to explicitly rule this out:
It is not the +1’s themselves that are causing the high rankings of posts but the fact that most +1’s on a site result in a shared post on Google+, which creates a followed link back to the post. It’s instant organic link building.
This got us thinking… why on earth wouldn’t Google use +1’s to directly influence page rankings? The obvious answer is that it would be open to manipulation by spammers. That’s true, but by all accounts Google’s previously most powerful weapon (PageRank) has been demoted in relevance because it’s being spammed way too much. Generating spammy links is just too easy these days.
The advantage of +1’s is that they’re tied to your Google account, which gives the search giant a lot of really useful details to home in on and ignore the spammers. For example they could:
- Not count any +1’s generated in the x days after an account was created
- Discount any by accounts which have +1’d a number of sites deemed to be spammy
- Weight against +1’s from non-verified accounts.
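Sticking with the thought experiment, those rules could be sketched roughly like this. Every field and threshold here is entirely made up for illustration – nobody outside Google knows what signals they actually use:

```python
from datetime import date

def plus_one_weight(account, today=None, min_age_days=30):
    """Toy weighting of a single +1, following the rules listed above.

    `account` is a hypothetical record; the 30-day cutoff and the
    multipliers are arbitrary example values.
    """
    today = today or date.today()
    if (today - account["created"]).days < min_age_days:
        return 0.0                    # brand-new account: ignore entirely
    weight = 1.0
    if account["spammy_plus_ones"] > 5:
        weight *= 0.1                 # has +1'd sites deemed spammy
    if not account["verified"]:
        weight *= 0.5                 # weight against non-verified accounts
    return weight
```

Even three crude rules like these would make bulk-buying +1’s far more expensive than bulk-buying links.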
That’s just a quick thought experiment. Given time one could generate hundreds of spammy metrics using the data Google holds about all of us. Hell – I might seriously demote the +1’s of anyone who’d Googled phrases like “black hat seo” in the past – ooh – creepy!
But slightly sinister jokes aside, it’s clear that this data could help Google create a better search. Even if they aren’t using +1’s now, it’s a good bet they will in the future. So, if there weren’t enough benefits already, installing those +1 buttons now is probably a good idea – you might just see a direct rankings boost from it in the future.
Since we recently released CloudTrawl we decided to undertake some research to prove just how valuable it is. The uptime of major websites, and the damage to reputation and profits that downtime causes, has been written about extensively, so we decided to go a different way. Every web user has seen a broken link; they often make our blood boil, and people will frequently leave a site on seeing one, assuming the content they’re seeking simply doesn’t exist. 404 has become the new blue screen of death. Broken links are a real risk to reputation & profit, but we’ve never seen a comprehensive study on just how common they are in major sites.
We decided to undertake that study and to perform it on the group of sites whose owners aren’t lacking in resources: the Fortune 500.
Here’s a big figure to open with:
You read that right: 92% of the sites in our sample included at least one broken link, & most had several. 68% had more than 10 broken links, 49% had more than 50 and a surprising 43% of Fortune 500 sites had more than 100 broken links.
We also broke down the number of pages which had broken links against the total number of pages in each site. A stunning 13% of all pages in Fortune 500 sites have at least one broken link (many pages have several).
What isn’t shown in the figures is the importance of some of these links. We saw examples of broken links to annual reports, quarterly statements, social presences (e.g. broken Facebook links) & external and internal news articles. Perhaps most worrying were the unreachable legal notices & terms & conditions documents. Along with making users leave the sites (& possibly making lawyers pass out!), these things are bad for search engine optimization. Google won’t be able to find these pages & sites will be penalized.
To get a fair cross section of the Fortune 500 we chose 100 companies at random across the set. We entered their names into Google and picked the first US / international result owned by that company. This resulted in a mix of sites. Some were corporate (company news, quarterly statements etc.) and some were online presences for customers (stores & marketing). We rejected any sites which CloudTrawl didn’t finish crawling in 5 hours, or which contained more than 5,000 pages (these can sometimes spawn loops in page generation and unfairly bias results; search engines also stop crawling sites if they think this is happening).
To eliminate false positives we quality-checked results both randomly and where sites contained a high percentage of broken links. To make sure the headline figures weren’t biased we only checked links (not images) and only counted 404 & 410 HTTP error codes, ignoring server timeouts etc. as these can sometimes be temporary.
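For the curious, the counting rule we used boils down to a few lines. This is an illustrative sketch of the methodology, not CloudTrawl’s actual crawler:

```python
import urllib.request
import urllib.error

HARD_DEAD = {404, 410}  # only these count as broken; other failures may be temporary

def classify(status):
    """Map an HTTP status code to 'broken' or 'ok' under the study's rules."""
    return "broken" if status in HARD_DEAD else "ok"

def is_broken(url, timeout=10):
    """Fetch a URL; return True only for a hard 404/410 failure.

    Timeouts, DNS failures and connection errors are deliberately
    ignored, matching the methodology described above.
    """
    try:
        req = urllib.request.Request(url, method="HEAD")
        urllib.request.urlopen(req, timeout=timeout)
        return False
    except urllib.error.HTTPError as e:
        return classify(e.code) == "broken"
    except (urllib.error.URLError, OSError):
        return False
```

Being this conservative means the headline figures, if anything, understate how many links are actually dead.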
Although there are some big headline figures above, the one that troubles us most is the 13%. Essentially we’re saying that more than 1 in 10 Fortune 500 web pages has a severe bug that’s waiting to pop up and grab unsuspecting users.
Next time you see a 404 error you’ll at least have the consolation that they’re proven to be really common. Of course we do give webmasters the tools to fix these issues – and I think we’ve presented a decisive demonstration of why they’re needed.
Note: feel free to use the infographics in this post; we hereby release them for use on other sites.
It’s a proud day over at CloudTrawl.com; we just launched the full live service!
We’d love it if you signed up for the free trial, and we’re all ears for new feature requests & suggestions.
So, what made it into the first version? CloudTrawl is designed to watch out for stuff that goes wrong on its own, even if you don’t change your site. So for the first version we have:
- Link Checking (we check every page of your site, daily or weekly)
- Uptime Monitoring (we check your site is online every 30 seconds, 24×7)
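At its core an uptime probe is simple; here’s a minimal sketch of the idea. CloudTrawl’s real checks run from multiple data centers with alerting on top, but the loop below shows the principle:

```python
import time
import urllib.request

def check_once(url, timeout=10):
    """One uptime probe: True if the site answers with a 2xx/3xx response."""
    try:
        resp = urllib.request.urlopen(url, timeout=timeout)
        return 200 <= resp.status < 400
    except Exception:
        return False  # timeout, refused connection, DNS failure, 4xx/5xx...

def monitor(url, interval=30, probes=None):
    """Probe `url` every `interval` seconds (forever if probes is None)."""
    n = 0
    while probes is None or n < probes:
        up = check_once(url)
        print(f"{time.strftime('%H:%M:%S')} {url} {'UP' if up else 'DOWN'}")
        n += 1
        if probes is None or n < probes:
            time.sleep(interval)
```

The hard parts in practice are doing this for thousands of sites at once, from several locations, without false alarms – which is where the real service earns its keep.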
We also have features like complete history charting, the ability to share site reports and settings with colleagues & customers, very cool-looking real-time views for uptime checks, the ability to “Start Now” for link checks, image validation and a lot more.
Even this tidy set of features is really just the tip of the iceberg of what’s planned for CloudTrawl. The ultimate goal: monitor absolutely everything that could go wrong with your site on its own. Over time we’ll be adding more checks, and we’d love for you to tell us what extra features and checks you think CloudTrawl should have.
This is a real development milestone. All of the code for CloudTrawl v1 has been written for a while and we’ve been focused entirely on testing. Our testing has included a lot of steps:
1. Automated testing; we now have a massive suite of automated tests which can be run at the click of a button
2. Functional testing; making sure every feature works as described and they all hang together well
3. Cross browser testing; making sure the interface works across browsers and operating systems
4. Scale testing; running up hundreds or thousands of uptime checks and hundreds of link & image checks simultaneously to make sure the system performs well with lots of people using it (if I can think of a way to make this not boring it deserves a blog post all of its own).
5. Third party testing; we got the guys over at TestLab² to do a barrage of tests to make sure we hadn’t missed anything.
And then this evening it finally happened… the last known bug was fixed. So “OMG”, it’s so nearly time to open the champagne and hit the release button. Watch this space!
As a side note, in the next version of DeepTrawl we’re planning to implement a way to make sure all of your pages contain analytics tracking code. Until then it’s relatively easy to do this check yourself using DeepTrawl’s ability to add your own new checks.
… there’s even a TED video devoted to them!
For those not in the know AWS is a Cloud hosting provider; they allow their customers to use servers on a pay as you go basis, starting up and shutting them down quickly and paying by the hour.
Some of their customers are traditional web sites, some are web applications. In both cases the beauty is that extra web servers can be added almost instantly to cope when peak load comes along, i.e. when lots and lots of people are using the site.
So what’s so special about CloudTrawl that we need this? Are we expecting 100 users to log on one hour and then 10,000,000 the next? Well no, probably not.
The answer lies in the type of things CloudTrawl does:
1) Uptime Checking
This is nice and consistent. At launch we’ll have three servers doing this job, based in the US, Ireland and Japan. That number will grow, but not overnight; as we get more customers we can add more.
2) Link Checking
This is the big reason we need a true Cloud service to run on, but it’s not obvious at first sight. With other online link checking services we’ve seen, you set up your account and your site is scanned perhaps once a day, once a week or once a month. That’s nice and consistent, right? Surely we can balance all of that out and just add servers as we need them? Nope, afraid not. We have an awesome button that rides right over that idea:
That little Start Now button means our service needs to be truly flexible. One minute we could be checking 10 sites for broken links, the next minute it could be 1,000.
So we needed to make sure we’d always have enough servers to do all that work and that’s why we’re running on AWS. We can automatically start up as many servers as we need to do the work and our customers don’t have to wait around.
If they’re worried their site might have broken links they can always hit Start Now and see CloudTrawl checking their site in real time and even fix the errors as they come in.
Pretty cool, huh?
So what’s the lesson for the web community? Well, the requirement to scale your site can come when you least expect it. Once your site is gaining some popularity it may be time to start seriously wondering: will one server always be enough? What if I suddenly get linked to from the BBC, CNN or Slashdot?
Luckily scaling isn’t necessarily that hard. For example, if you have a site running static HTML, Amazon’s EC2 is pretty easy to set up for scaling. If you’re into WordPress, services like WP Engine are designed to scale automatically for you. It’s not that old-fashioned single-server hosting is dead, but if you think there’s a chance you might see a big spike in traffic some day, now is a great time to start looking into options.
Something rather depressing landed in my inbox today. I subscribe to the mailing list over at Startups.com. They send through offers every few days, some of them are pretty cool. Then there’s this:
It seems it’s now possible to pay to get Facebook likes. Presumably the company offers the people giving the likes money or some other sweetener to encourage them to hit the blue button of power.
This strikes me as very much the same thing as black hat SEO: tricking the system to seem popular and sell more stuff. It’s a shame really; we’re planning on putting like buttons on CloudTrawl.com when it goes live, but with people buying likes, doesn’t that cheapen the whole thing just a little?
I’d like to tell you a little about what we’re working on right now. In the past we’ve had quite a few requests from web consultants who’d like to be able to share exported reports from DeepTrawl with their own branding attached.
That’s been a priority for CloudTrawl from the beginning. If you’re a consultant working in web design we think CloudTrawl is exactly the kind of thing you’ll want to use and share with your clients, because:
- CloudTrawl proves the site you’ve created is consistently available & functioning well
- It allows both them and you to rest easy knowing if there’s a problem you’ll be alerted
- CloudTrawl is an awesome value-add service you can provide & shows you care about their site
So how are we going to make this work? Perhaps by allowing you to export a PDF containing a report and manually email it to your client? Nope, that’s so last century.
Surely the best way would be to allow them to log into CloudTrawl directly, see reports themselves and optionally allow them to change settings so they can get alerts for things like downtime and broken links.
That’s exactly what we’re doing. We’re implementing a feature called site sharing. Your CloudTrawl account could contain perhaps tens or hundreds of sites, all being constantly monitored. You can choose to share any one of these with anyone. If they don’t already have a CloudTrawl account we’ll automatically send them an email inviting them to create one for free; they’ll then be able to see and interact with the reports and settings you’ve shared with them.
As an added bonus when that user logs in they’ll see your branding.
That’s some serious added value for your clients. For a low monthly fee you’ll be able to add all the sites under your care, share their reports with your customers and prove you care about their site. Feel the love!