This post is part of Made @ HubSpot, an in-house thought leadership series where we draw lessons from experiments conducted by our own HubSpotters.
Have you ever tried carrying your clean laundry upstairs by hand, only to have pieces keep falling out of the huge pile in your arms? Growing organic website traffic can feel a lot like that.
Your content calendar is full of new ideas, but with every new page you publish, an older page drops in the search engine rankings.
Getting SEO traffic is difficult, but sustaining SEO traffic is a whole other ball game. Content tends to “expire” over time, whether because of new content from competitors, constantly changing search engine algorithms, or a variety of other reasons.
You’re working hard to lift the entire website up, but if you’re not careful, traffic keeps slipping out along the way.
Recently, the two of us (Alex Birkett and Braden Becker) developed a way to detect this traffic loss automatically, at scale, before it even occurs.
The problem with traffic growth
At HubSpot, we grow our organic traffic by taking two trips from the laundry room instead of one.
The first trip carries new content targeting new keywords we haven’t ranked for yet.
The second trip carries updated content: we use a portion of our editorial calendar to find which content is losing the most traffic (and leads) and add new content and SEO-focused tweaks so those pages better serve certain keywords. It’s a concept we (and many other marketers) call “historical optimization.”
There is one problem with this growth strategy, however.
As our website’s traffic grows, tracking every single page becomes an unwieldy process, and choosing the right pages to update is even harder.
Last year, we wondered if there was a way to find blog posts whose organic traffic was merely at risk of declining, which would diversify our selection of updates and potentially make traffic more stable as our blog grows.
Restoring traffic versus protecting traffic
Before we talk about how you could possibly restore traffic that hasn’t been lost yet, let’s look at the benefits of doing so.
When you view a page’s performance, it’s easy to spot a decrease in traffic. For most growth-minded marketers, a downward traffic trend line is hard to ignore, and nothing is as satisfying as seeing that trend rebound.
But there’s a cost to restoring traffic after the fact: since you can’t know where you’ll lose traffic until you’ve lost it, the period between the drop and the recovery sacrifices leads, demos, free users, subscribers, or whichever growth metric you capture from your most engaged visitors.
You can see this in the organic trend chart below for a single blog post. Even though the traffic was recovered, you missed the opportunity to support your sales efforts downstream in the meantime.
If you had a way to find and protect (or even grow) a page’s traffic before it needed to be restored, you wouldn’t have to make the sacrifice shown in the image above. The question is: how?
How to forecast falling traffic
To our delight, we didn’t need a crystal ball to predict traffic attrition. What we did need was SEO data suggesting that traffic to certain blog posts might soon disappear if nothing was done. (We also had to write a script that could pull this data for the entire website; more on that in a minute.)
High keyword rankings generate organic traffic for a website, and the lion’s share of that traffic goes to pages that rank on the first page of results. The reward is even larger for keywords that attract an especially high number of searches per month.
If a blog post slips off Google’s first page for a high-volume keyword like that, it’s toast.
Given the relationship between keywords, keyword search volume, ranking position, and organic traffic, we knew a decline in rankings would be the prelude to a loss of traffic.
And luckily, the SEO tools available to us can show us rankings dropping over time:
The image above shows a table of keywords that a single blog post is ranking for.
This blog post ranks in position 14 for one of those keywords (page 1 of Google consists of positions 1-10). The red boxes highlight that ranking position as well as the keyword’s high volume of 40,000 monthly searches.
Even sadder than this article’s position-14 ranking is how it got there.
As the teal trend line above shows, this blog post once ranked prominently but kept dropping over the following weeks. The post’s traffic confirmed what we saw: a noticeable drop in organic page views shortly after the post fell off page 1 for that keyword.
You can see where this is going. We wanted to catch these ranking declines as pages were about to leave page 1, and protect the traffic we were at risk of losing. And we wanted to do this automatically, for dozens of blog posts at once.
The “at risk” traffic tool
The way the at-risk tool works is actually pretty simple. We thought about it in three parts:
- Where do we get our input data from?
- How do we clean it?
- What outcomes of this data will enable us to make better decisions when optimizing content?
First, where do we get the data from?
1. Keyword data from SEMrush
What we wanted was domain-level keyword research data: every keyword that hubspot.com (specifically blog.hubspot.com) ranks for, plus the associated data for each of those keywords.
Fields valuable to us include our current search engine ranking, our previous ranking, the keyword’s monthly search volume, and potentially its value (estimated via keyword difficulty or CPC).
To get this data, we used the SEMrush API (specifically the Domain Organic Search Keywords report):
Then, using R, a programming language popular with statisticians and analysts as well as marketers (in particular, we use the ‘httr’ library to work with APIs), we pulled the top 10,000 keywords driving traffic to blog.hubspot.com (as well as our Spanish, German, French, and Portuguese properties). We currently do this once a quarter.
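A minimal sketch of such an API call, shown here in Python rather than R for illustration. The parameter and column names follow SEMrush’s public analytics API and should be verified against its current docs; the API key is a placeholder.

```python
# Sketch: building the SEMrush "Domain Organic Search Keywords" request.
# Parameter names (type, display_limit) and export columns (Ph/Po/Pp/Nq/Cp/Kd)
# are based on SEMrush's analytics API and should be checked against your plan.
from urllib.parse import urlencode

SEMRUSH_ENDPOINT = "https://api.semrush.com/"

def build_domain_organic_url(api_key, domain, limit=10000):
    """Return the report URL for a domain's organic keywords."""
    params = {
        "type": "domain_organic",   # the Domain Organic Search Keywords report
        "key": api_key,
        "domain": domain,
        "display_limit": limit,     # top N keywords by traffic
        # Ph=keyword, Po=current position, Pp=previous position,
        # Nq=monthly search volume, Cp=CPC, Kd=keyword difficulty
        "export_columns": "Ph,Po,Pp,Nq,Cp,Kd",
    }
    return SEMRUSH_ENDPOINT + "?" + urlencode(params)

url = build_domain_organic_url("YOUR_API_KEY", "blog.hubspot.com")
# Fetch the URL with any HTTP client; the response is semicolon-delimited text.
```

In R, the equivalent request would go through `httr::GET` against the same endpoint.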
That’s a lot of raw data, and it’s useless on its own, so we need to clean it and get it into a format that’s useful to us.
So how do we clean up the data and build formulas that tell us which content needs updating?
2. Cleaning the data and creating the formulas
We do most of the data cleaning in our R script as well. Before our data ever hits another storage destination (be it Sheets or a database table), most of it is cleaned and formatted the way we want.
We do this with a few short lines of code:
After fetching those 10,000 rows of keyword data from the API, our code parses them into a readable format and builds them into a data table. We then subtract the current ranking from the previous ranking to get the ranking difference (so if we used to rank in position 4 and now rank in position 9, the ranking difference is -5).
We then filter so that only keywords with a negative ranking difference remain (i.e., only keywords we’ve lost rankings for, not ones we’ve gained or that have stayed the same).
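Sketched in Python with made-up sample rows (the authors’ actual script is in R), the parsing, differencing, and filtering steps look roughly like this:

```python
# Sketch of the cleaning step, assuming a semicolon-delimited API response.
# The sample rows and column names below are illustrative, not real data.
import csv
import io

raw = (
    "keyword;position;previous_position;volume\n"
    "crm software;9;4;40000\n"      # fell from position 4 to 9
    "email templates;3;3;22000\n"   # unchanged
    "sales funnel;6;8;9900\n"       # improved from 8 to 6
)

rows = list(csv.DictReader(io.StringIO(raw), delimiter=";"))

# Previous minus current: dropping from position 4 to 9 gives -5.
for row in rows:
    row["ranking_diff"] = int(row["previous_position"]) - int(row["position"])

# Keep only keywords that lost rank (a negative difference).
losing = [row for row in rows if row["ranking_diff"] < 0]
```

Here only “crm software” survives the filter, with a ranking difference of -5.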
We then send this cleaned and filtered data table to Google Sheets, where we apply tons of custom formulas and conditional formatting.
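A hedged sketch of that hand-off: the gspread client, credentials file, and sheet name below are illustrative assumptions, not the actual pipeline, and the network call is guarded so the outline runs without credentials.

```python
# Sketch of pushing the cleaned table to Google Sheets.
def to_sheet_values(rows, columns):
    """Lay out dict rows as a header row plus value rows, the shape
    the Sheets update call expects."""
    return [columns] + [[row[col] for col in columns] for row in rows]

losing = [{"keyword": "crm software", "position": 9, "ranking_diff": -5}]
values = to_sheet_values(losing, ["keyword", "position", "ranking_diff"])

if False:  # flip on once a Google service account is configured
    import gspread
    gc = gspread.service_account(filename="service_account.json")
    sheet = gc.open("At-Risk Keywords").sheet1  # hypothetical sheet name
    sheet.update(values)  # writes header + rows starting at A1
```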
Finally, we had to know: what is the output, and how do we actually use it to make decisions when optimizing content?
3. The tool’s output on at-risk content: how we make decisions
Based on the input columns (keyword, current position, historical position, position difference, and monthly search volume) and the formulas we built, we calculate a categorical variable as the output.
A URL/row can be one of the following:
- “AT RISK”
- “VOLATILE”
- Empty (no value)
Empty outputs, or rows with no value, mean we can essentially ignore those URLs for now. They haven’t lost significant rankings, or they were already on page 2 of Google.
“Volatile” means the page is losing rank but isn’t old enough to warrant action. New pages constantly jump around in the rankings as they age; at some point, they build enough “topical authority” to generally stay in place for a while. For content that supports a product launch or an otherwise important marketing campaign, we might give these posts some TLC while they’re still maturing, so it’s worthwhile to label them.
“At risk” is mostly what we’re looking for: blog posts that were published more than six months ago, have dropped in the rankings, and now rank between positions 8 and 10 for a high-volume keyword. We see this as the “red zone” for decaying content, fewer than three positions away from falling from page 1 to page 2 of Google.
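The labeling logic described above can be sketched as follows. The six-month and position 8-10 thresholds come from this post; the function and field names are illustrative.

```python
# Sketch of the three-way labeling of each keyword row.
from datetime import date

def label_keyword(position, ranking_diff, publish_date, today):
    """Return 'AT RISK', 'VOLATILE', or '' for one keyword row."""
    if ranking_diff >= 0 or position > 10:
        return ""               # no loss of rank, or already off page 1
    age_days = (today - publish_date).days
    if age_days < 180:
        return "VOLATILE"       # young posts bounce around naturally
    if 8 <= position <= 10:
        return "AT RISK"        # the page-1 "red zone"
    return ""

today = date(2020, 1, 1)
label_keyword(9, -5, date(2019, 1, 1), today)   # old post in the red zone
label_keyword(9, -5, date(2019, 12, 1), today)  # same drop, but still maturing
label_keyword(3, -1, date(2019, 1, 1), today)   # small slip, safely on page 1
```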
The spreadsheet formula for these three labels is essentially a compound IF statement that looks for page-1 rankings, a negative ranking difference, and the distance between the publish date and the current day.
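An illustrative reconstruction of such a compound IF, assuming the current position sits in column C, the ranking difference in column E, and the publish date in column G (all hypothetical column assignments), might look like:

```
=IF(AND(E2<0, C2>=8, C2<=10, TODAY()-G2>180), "AT RISK",
   IF(AND(E2<0, C2<=10, TODAY()-G2<=180), "VOLATILE", ""))
```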
What we learned
In short, it works! The tool described above has regularly, if not frequently, been worked into our workflow. However, not every preemptive update saves the traffic in time. In the example below, a blog post fell off page 1 after being refreshed and only later climbed back to a higher position.
And that’s okay.
We have no control over when and how often Google decides to recrawl and re-rank a page.
Of course, you can resubmit the URL to Google and ask for a recrawl (an extra step that can be worthwhile for critical or time-sensitive content). But the goal is to minimize the time this content underperforms and stop the bleeding, even if it means leaving a quick recovery to chance.
And while you’ll never really know how many page views, leads, signups, or subscribers you’re losing on each page, the precautions you take now will save you the time you’d otherwise spend figuring out why all of your website’s traffic dropped last week.