# How to Filter Out Fake Referrals and Other Google Analytics Spam

If you work with Google Analytics, chances are you’ve run into some of these websites in your Referrals report (Acquisition > All Traffic > Referrals) lately:
• semalt.semalt.com
• buttons-for-website.com
• forum.topic31342700.darodar.com
• make-money-online.7makemoneyonline.com
• anticrawler.org
What are these sites and why are they linking to your site? Well, actually, they’re not linking to you at all. These sites represent fake referrals. They are created in your Google Analytics account to trick you into visiting spammy websites. If you open one of these URLs in your browser, you will likely be redirected to an online store, marketing scam or malware site. Nice, right?
In this post, we look at the impact these spam sites have on your metrics as well as steps you can take to eliminate the spam from Google Analytics. If you are looking for ways to filter this spam out of your Megalytic reports, see: Removing Semalt and other Referer-Spam from Megalytic Reports.

What’s the impact of a little spam data on your Google Analytics metrics? If you run a large website with tens of thousands of visitors or more per day, then maybe not much. However, if your site is smaller, there’s a good chance spam may be seriously skewing your metrics.

Below is the Acquisition > All Traffic > Referrals report from a small non-profit. I’ve checked the spam referral sources, and clicked “Plot Rows” to see the daily level of traffic from these spammers.

In the table above, you can see that spam accounts for the top two slots in this Referrals Report! Not only is this annoying, but it messes up the metrics pretty badly.

To analyze the spam’s impact on the non-profit’s metrics, I exported this table into Excel and did some calculations. In the results below, the Sources highlighted in yellow are spam referrals; the two summary lines at the bottom show metric calculations with and without spam.

The first thing to note is that 144 out of 283 referral Sessions are spam – that’s 50.9%! The impact on small websites like this one is huge, as these spam visits throw off the engagement metrics. As you can see from the spreadsheet, the Bounce Rate for most spam referral sources is 100%, the Pages/Session is close to 1.0 and the Avg. Session Duration is close to 0.00. When more than 50% of the referral traffic is spam, it seriously drags down the engagement numbers and gives you a false impression of the quality of your traffic.

Compare the Bounce Rate, Pages/Session, and Avg. Session Duration for “Including Spam” vs “Excluding Spam” (numbers inside the red rectangle). The spam is making these metrics look much worse than they really are. Bounce Rate, for example, is reading 77.74%. But, when we exclude the spam, the Bounce Rate is a much better 55.4%.
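To make the with-and-without comparison concrete, here is a minimal Python sketch of the same spreadsheet arithmetic. The rows, the `SourceRow` type and the helper names are made up for illustration and are not the non-profit's actual export; only the 144-out-of-283 figure comes from the report discussed above.

```python
from dataclasses import dataclass


@dataclass
class SourceRow:
    source: str
    sessions: int
    bounces: int      # sessions that viewed a single page
    is_spam: bool     # flagged by hand, as in the highlighted spreadsheet


def bounce_rate(rows):
    """Session-weighted bounce rate, as a percentage."""
    sessions = sum(r.sessions for r in rows)
    bounces = sum(r.bounces for r in rows)
    return 100.0 * bounces / sessions if sessions else 0.0


def spam_share(rows):
    """Percentage of all sessions that come from rows flagged as spam."""
    total = sum(r.sessions for r in rows)
    spam = sum(r.sessions for r in rows if r.is_spam)
    return 100.0 * spam / total if total else 0.0


# Hypothetical rows, for illustration only.
rows = [
    SourceRow("semalt.semalt.com", 60, 60, True),
    SourceRow("buttons-for-website.com", 40, 40, True),
    SourceRow("example-partner.org", 70, 35, False),
    SourceRow("example-news.com", 30, 18, False),
]
clean = [r for r in rows if not r.is_spam]

print(f"Spam share of sessions:     {spam_share(rows):.1f}%")
print(f"Bounce rate including spam: {bounce_rate(rows):.1f}%")
print(f"Bounce rate excluding spam: {bounce_rate(clean):.1f}%")

# The one figure that can be checked from the post itself: 144 of 283 sessions.
print(f"Spam share in the post:     {100 * 144 / 283:.1f}%")
```

Note that the rates are weighted by sessions. Averaging the per-source percentages directly would give a different, misleading number.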

Other than exporting data to Excel and re-calculating all the numbers, is there any way we can stop these spam referrals from polluting our Google Analytics data?

### Filtering Out Google Analytics Spam

The techniques for removing spam rely on using Google Analytics View Filters. I first read about these techniques in this excellent article from the Analytics Edge blog: Removing Referral Spam from Google Analytics.

As explained in that article, there are two basic groups of spammers using two different techniques, and you need to use slightly different filters to combat each technique.

#### Eliminating Ghost Referrals

The first group is what people are starting to call “Ghost Referrals.” These are referrals generated in your reports by fake visits. In this scenario, the spammers don’t even visit your website. Instead, they transmit spammy data directly to Google Analytics that gets added to your reports.

To start cleaning this up, we create a new view and then add some filters. As shown below, you can create a view in the Admin section of Google Analytics. Pick the Account and Property where you want to create a spam-free view. [Note: Views do not contain historical data older than the date on which they are created. If you create a view on Jan 2nd, there will be no data in that view prior to Jan 2nd. So, this new spam-free view will not help clean up the historical data – only the new data coming in.]

Next, we are going to create a list of the valid hostnames that should be showing up in your Google Analytics reports. The key to removing ghost referrals is that they come from hostnames that are not yours – and you can use that weakness to filter them out.

Below is a list of the valid hostnames of visits to our Megalytic website:

• megalytic.com
• blog.megalytic.com
• support.megalytic.com
• translate.googleusercontent.com

Note the last one – translate.googleusercontent.com. This is the hostname that shows when a user views your website through Google Translate – you do not want to filter that out.

If you are not sure of your list of valid hostnames, you can look at the Audience > Technology > Network report and select Hostname as the primary dimension. Set a long time range in the calendar – like a year or more if you have that much data. This will ensure that you capture all the valid hostnames.

Here is what that report looks like for Megalytic. The valid hostnames have little red arrows next to them. The rest (e.g., apple.com, iedit.ilovevitaly.com) are from spammers!

Once you have your list of valid hostnames, put them in a single line of text, separated by the “|” (OR) character, with no “|” at the very end, since a trailing “|” adds an empty alternative that matches every hostname. Also put a backslash in front of each “.” (period) character so that it matches a literal dot. This creates a regular expression that will match on your good hostnames and exclude all the spammer hostnames.

For example, here is what we use:

megalytic\.com|blog\.megalytic\.com|old\.megalytic\.com|forums\.megalytic\.com|client\.megalytic\.com
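If you would rather generate the expression than type it by hand, the following Python sketch shows one way to do it. The function name `build_hostname_filter` is hypothetical, and the hostnames are the ones listed earlier in this post. It also demonstrates why a stray “|” at the end is dangerous, using Python's `re` module as a stand-in for Google Analytics' own matching.

```python
import re


def build_hostname_filter(hostnames):
    """Escape each '.' and join the hostnames with '|' (OR)."""
    cleaned = [h.strip().lower() for h in hostnames if h.strip()]
    # Blank entries are skipped, so the pattern never ends in a stray '|'.
    return "|".join(h.replace(".", r"\.") for h in cleaned)


valid_hosts = [
    "megalytic.com",
    "blog.megalytic.com",
    "support.megalytic.com",
    "translate.googleusercontent.com",
]
pattern = build_hostname_filter(valid_hosts)
print(pattern)

# A trailing '|' adds an empty alternative, which matches any hostname at all.
broken = pattern + "|"
spam_host = "iedit.ilovevitaly.com"
print(bool(re.search(pattern, spam_host)))  # False: the spammer is rejected
print(bool(re.search(broken, spam_host)))   # True: the spammer slips through
```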

Before you put this filter expression to use we recommend that you build a segment to test it out on your historical data to see how it looks. Filters permanently alter the data in a view, so it’s a good idea to test filter expressions using non-permanent segments on your historical data before using them in filters.

Another benefit of testing your filter expression in a segment is that you can use this segment to look at your historical data without the ghost referrals.

Here is the testing segment created for Megalytic, which we named “My Hosts.”

And here are the results, filtered by using the “My Hosts” segment:

As you can see, some of the sessions have been filtered out – the “My Hosts” segment has 19,934 Sessions vs 20,235 in “All Sessions.”

Next, apply the “My Hosts” segment to your Audience > Technology > Network report and select Hostname as the primary dimension. Check to see that only valid hostnames are showing up. Below are the results for Megalytic.
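A rough offline version of this check is to export the Hostname report and run the expression over it yourself. The sketch below assumes partial, case-insensitive matching (my understanding of how these filters behave by default), and the session counts and the `split_by_hostname` helper are invented for illustration.

```python
import re


def split_by_hostname(hostname_sessions, include_pattern):
    """Split {hostname: sessions} into rows an Include filter keeps and drops."""
    regex = re.compile(include_pattern, re.IGNORECASE)
    kept, dropped = {}, {}
    for host, sessions in hostname_sessions.items():
        # search() is a partial match, so subdomains of a listed host also pass.
        target = kept if regex.search(host) else dropped
        target[host] = sessions
    return kept, dropped


# Hypothetical export of the Hostname report.
report = {
    "megalytic.com": 950,
    "blog.megalytic.com": 120,
    "apple.com": 15,
    "iedit.ilovevitaly.com": 30,
    "(not set)": 10,
}
include = r"megalytic\.com|translate\.googleusercontent\.com"

kept, dropped = split_by_hostname(report, include)
print("kept:", sorted(kept), sum(kept.values()), "sessions")
print("dropped:", sorted(dropped), sum(dropped.values()), "sessions")
```

Anything in the dropped list that you recognize as your own is a hostname you forgot to include.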

Once you are confident your filter expression is working correctly, add it to your new view. We called our new view “Spam Free.” You can see below how we selected the “Filters” section to create a filter on this view, and then pasted in our filter expression as a Custom Filter Type. Make sure to select “Include” and to filter on the “Hostname” field.

Save this filter and you should be all set. This new view will now exclude all ghost-referral spam. Unfortunately, filtered views only include data from the date they were created. So, you cannot use this view to look back at historical data. However, you can use the segment “My Hosts” created during the testing process to view spam-free historical results.

#### Eliminating the non-Ghost Referral Spam

Unlike the ghost referrals, some of the spammer bots, like Semalt, actually visit your website. These will not be removed using the hostname filter described above. To remove these, you will need to create another filter that will exclude a list of known referral spam domains.

So, to clarify, the first filter INCLUDES only your valid hostnames. That kills the ghost-referral spammers. This second filter will EXCLUDE known spammer domains.

To find the non-ghost spammers visiting your website, open Acquisition > All Traffic > Referrals and add Hostname as a secondary dimension. Spam sources where the Hostname is valid (in our case, megalytic.com) are the non-ghost spammer domains we need to exclude.
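The same split can be sketched in Python if you export the Referrals report with Hostname as the secondary dimension. The rows below are invented and the `classify` helper is hypothetical. Note that the second bucket only narrows the search: it contains legitimate referrers as well as non-ghost spammers, so it still needs a manual review.

```python
import re

# Assumed include expression for the valid hostnames (partial, case-insensitive).
VALID_HOSTS = re.compile(
    r"megalytic\.com|translate\.googleusercontent\.com", re.IGNORECASE
)


def classify(rows):
    """rows: iterable of (source, hostname, sessions) tuples.

    Returns (ghosts, visited): sources seen with a foreign hostname, and
    sources seen with one of your own hostnames.
    """
    ghosts, visited = [], []
    for source, hostname, _sessions in rows:
        if VALID_HOSTS.search(hostname):
            visited.append(source)   # really hit the site: review by hand
        else:
            ghosts.append(source)    # the hostname filter already removes these
    return ghosts, visited


# Hypothetical export rows.
rows = [
    ("semalt.semalt.com", "megalytic.com", 12),
    ("example-ghost-spam.com", "apple.com", 9),
    ("buttons-for-website.com", "blog.megalytic.com", 7),
    ("example-partner.org", "megalytic.com", 20),
]
ghosts, visited = classify(rows)
print("ghosts:", ghosts)
print("to review:", visited)
```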

From this list, you can see that semalt.com and buttons-for-website.com should go on our list. As before, create a filter, but this time use Referral as the Filter Field, and the filter is:

semalt\.com|buttons-for-website\.com
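To see how the two filters work together, here is a small Python simulation. It is only a sketch: `passes_filters` is a hypothetical helper, and it assumes both expressions are applied as partial, case-insensitive matches, which is my understanding of the default behavior.

```python
import re

# Filter 1: INCLUDE only valid hostnames (removes ghost referrals).
INCLUDE_HOSTS = re.compile(
    r"megalytic\.com|translate\.googleusercontent\.com", re.IGNORECASE
)
# Filter 2: EXCLUDE known spam referrers (removes the bots that really visit).
EXCLUDE_REFERRALS = re.compile(
    r"semalt\.com|buttons-for-website\.com", re.IGNORECASE
)


def passes_filters(hostname, referral):
    """True if a session would survive both filters."""
    if not INCLUDE_HOSTS.search(hostname):
        return False  # foreign hostname: ghost referral
    if EXCLUDE_REFERRALS.search(referral):
        return False  # valid hostname, but a known spam referrer
    return True


# Hypothetical sessions as (hostname, referral) pairs.
for hostname, referral in [
    ("megalytic.com", "semalt.semalt.com"),
    ("megalytic.com", "example-partner.org"),
    ("apple.com", "example-partner.org"),
]:
    print(hostname, referral, passes_filters(hostname, referral))
```

Because the match is partial, `semalt\.com` also catches subdomains such as semalt.semalt.com without listing them separately.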

As shown below, we name this filter “Exclude non-Ghost Referral Spam.”

You should check your Acquisition > All Traffic > Referrals report periodically to identify any new spam referral domains that start showing up. Add these new ones to your filter as necessary to keep your data as spam-free as possible.
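As the list grows, readers in the comments below report running into a 255-character limit on a single filter pattern, which forces them to split the list across several filters. The sketch below packs domains into as few patterns as possible under a limit. The `chunk_patterns` helper is hypothetical, and the 255 default is taken from those reader reports rather than from anything I have verified.

```python
def chunk_patterns(domains, limit=255):
    """Pack escaped domains into '|'-joined patterns no longer than `limit`."""
    patterns, current = [], ""
    for domain in sorted(set(domains)):
        escaped = domain.replace(".", r"\.")
        if len(escaped) > limit:
            raise ValueError(f"{domain!r} alone exceeds the {limit}-character limit")
        candidate = escaped if not current else current + "|" + escaped
        if len(candidate) <= limit:
            current = candidate
        else:
            # Current pattern is full: start a new one for the next filter.
            patterns.append(current)
            current = escaped
    if current:
        patterns.append(current)
    return patterns


spam_domains = ["semalt.com", "buttons-for-website.com", "darodar.com"]

# A tiny limit is used here only so the split is visible with three domains.
for i, pattern in enumerate(chunk_patterns(spam_domains, limit=30), start=1):
    print(f"filter {i}: {pattern}")
```

Each resulting pattern would go into its own Exclude filter on the view.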

Another approach to filtering out the non-ghost spammers is to stop them from visiting your website at all. If you are hosting your website on the Apache web server, this kind of blocking can be accomplished by modifying the .htaccess file, as described here: How to block referrer spam traffic.
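For a sense of what such .htaccess rules typically look like, the Python sketch below generates a common mod_rewrite pattern that answers requests from listed referrers with 403 Forbidden. This is an assumption about a typical setup, not necessarily what the linked article recommends, and the `htaccess_rules` helper is hypothetical. Test any generated rules on a staging site before deploying them.

```python
def htaccess_rules(spam_domains):
    """Build mod_rewrite rules that return 403 for the given referrer domains."""
    domains = [d.strip().lower() for d in spam_domains if d.strip()]
    if not domains:
        # With no conditions the RewriteRule would block every request,
        # so emit nothing at all for an empty list.
        return ""
    lines = ["RewriteEngine On"]
    for i, domain in enumerate(domains):
        escaped = domain.replace(".", r"\.")
        # Conditions are ORed together; the last one must not carry OR.
        flags = "[NC,OR]" if i < len(domains) - 1 else "[NC]"
        lines.append(f"RewriteCond %{{HTTP_REFERER}} {escaped} {flags}")
    lines.append("RewriteRule .* - [F]")
    return "\n".join(lines)


print(htaccess_rules(["semalt.com", "buttons-for-website.com"]))
```

Keep in mind that this only stops bots that actually request your pages. Ghost referrals never touch your server, so they still need the hostname filter.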

If you are running WordPress, there is now a plugin that will do this for you: SpamReferrerBlock. One advantage of using this plugin is that they claim to keep a “blacklist” of domains that are spammers and filter those visits for you, so you do not have to keep your filters up to date.

### Conclusions

Referral spam is becoming a serious problem and I expect that Google will soon introduce new features to help us protect the integrity of our Google Analytics data. Until then, you can use the filtering techniques described in this post to create a view that is relatively free of referral spam.

### Appendix

It’s been almost 2 years since I wrote this, and the Google Analytics spam problem is still with us! If you are looking for more details on this subject, I suggest that you check out Carlos Escalera’s post: Ultimate Guide to Getting Rid of All the Spam in Google Analytics.

Update on March 23, 2017 …
I’ve seen a few articles indicating that Google is taking action to solve this problem. If you have noticed an improvement, let me know in the comments.

## 222 Comments on “How to Filter Out Fake Referrals and Other Google Analytics Spam”

1. markdhansen

Hi Dave – thanks for the comment. I think there is some confusion, this blog post is about “fake referrals” – not self-referrals.

Regarding exclude filters – yes, I agree that they are dangerous when not used carefully, but really there is no harm in excluding hostnames that are not yours. So, if the hostname “ilovevitaly.com” is showing up on my GA report, that’s got to be spam – there is no other way a domain other than some variation on megalytic.com or a google translate domain could be legitimately showing up in the reports, is there?

1. markdhansen

It’s not actually apple.com. I don’t have my GA tracking for megalytic.com running on the domain apple.com. Somebody is just spamming google analytics to make it look like my GA tracking code is running on apple.com.

1. markdhansen

They have to include something in the hostname. But, the actual spammy part is what they put in the “source” – so when you look at your GA reports, it looks like you have traffic being referred by something like “buttons-for-website.com” and then you go visit that site.

1. Bill

Mark, first off thanks for the great article. My question is regarding the list of Non-Ghost Referral Spam. My referral traffic list is 2800 sources long. A lot of them seem to be from Poland (.pl extension). What can I use to help me determine if it is a legitimate source? Thanks

1. markdhansen

Hi Bill,

Great question! Other than going to the website and looking at it to see if there is really a link to your site there, I don’t know how you can. It would be great if somebody put together a list of known spamming referral URLs.

1. Bill

If I want to eliminate all Poland referral traffic, is there an expression I can use in the filter for .pl? like *.pl ?
Thanks

1. markdhansen

Yeah, you can exclude on the regex for ending in .pl, which I think would be \.pl$. The $ on the end is needed to anchor the match to the end of the domain.

1. Bill

Thanks for the help so far. One more question with regard to Non-Ghost referral spam. Are sources such as google.com, msn.com, linkedin.com, facebook.com, etc all considered legitimate? I ask because there are other recognizable sources such as imdb.com on this list that I know do not include legit links to my page. Should I attempt to remove them using your method or do I want to be more careful with sources such as google.com so that I do not remove legitimate data?

2. markdhansen

Sure, it’s possible that you have legitimate referrals from all those sources – although google.com should probably be showing up as somesubdomain.google.com. Referrals from ghost spammers that are not actually visiting your website and using legitimate domains like msn.com, apple.com, etc. – these are blocked using the hostname filter, not by excluding the referral domain.

2. Anthony

Hey Mark. Thanks for the great walk through advice in this post. I’ve just implemented these filters on a few of the websites I manage. I’m just wondering why you wouldn’t apply the filters to a normal view that has historical data? After testing with a segment of course. Saves having to turn the advanced segment on. I always keep an untouched view and then a second view in which I apply filters like these to.

1. markdhansen

Yes – I think you *should* apply as a filter, after testing as a segment. Sorry if that was not clear in the article!

3. bridget0439

Hi Mark, can I just confirm that as you are using filters, this will correct historic data as well? (I have previously added specific sites to exclusion lists, but that doesn’t get rid of historic data cluttering up my accounts, so am keen to try your method!)

1. markdhansen

Hi Bridget. Filters will not fix historic data. However, you can use a segment with the same conditions as the filter and apply it to your reports to clean up the historic data. Filters permanently remove data from views. Segments only work on the reports they are applied to. Make sense?

4. Capital SEO

Thanks Mark for putting this guide together.

I think a lot of us were fooled by a spike in site traffic, later to reveal it’s these dirt-bags using our trackers to send fake referral data.

At least the analytics industry has caught on to these Spammers.

I hear Google is taking steps in the back-end to help us out.

We’ll see!

1. markdhansen

My pleasure – glad it was helpful. It really is a dirt-bag tactic isn’t it? Hard to imagine that these spammers actually make money doing this, but they must or it wouldn’t have grown into such a problem!

5. CloudNo9

So basically the information Google Analytics provides about my visitors is worth crap. Good to know. I might as well implement more trustworthy counter software into my site.

1. markdhansen

Well, I wouldn’t say that! There is a spam problem right now; but your Google Analytics data is still extremely useful. Just like your email is useful even if you get some spam.

6. @throwsknives

Thank you for the advice. The problem I’m having is that in 78 percent of my site sessions (which I think are all spam), the hostname is not set. For another approach, I have tried to set up a filter that would exclude any site visit less than, say, 30 seconds. That seems like it would nail all the spammers. If some non-spammers are caught in the filter, I can honestly say that anyone who visits my site for less than 30 seconds isn’t of much benefit anyway. But I couldn’t figure out how to set up such a filter. Ideas? Thank you!

1. markdhansen

Martin – I do not recommend that, as you will filter out all single page visits. Whenever a user comes to your website and visits only 1 page, the time on site will be 0 – even if they spend 30 minutes reading your single page.

7. Victoria

Hi Mark,

Thanks for the post – it is awesome!

I’m trying to include the valid hostnames in the filter, but I only have the option to exclude under the filter field. Am I missing something here? I have full admin so should not be causing an issue! :/

8. Phil Hunter

This looks like a great solution to eliminating spammy referrals. I can’t get past the ‘Create new view’ stage. All the historical data is completely missing!

1. markdhansen

Hi Phil,

When you create a new view, it will only include data going forward from the time it was created. You cannot apply filters to historical data.

For historical data, apply segments, as described in the article.

— Mark

9. Ashish Monga

Hi,

Thanks for the great article. I have just tried this on two of my sites. The segment worked perfectly on 1 of my websites, where my hostname did not have the “www”. However, for the second site, I am having some issues.

1. In my second site, I have 2 versions of my valid hostname showing up – http://www.mysite.com & mysite.com

2. Another domain that is showing up here is the domain of my blog website. There is a bit of backlinking between the 2 sites and they are hosted on the same shared server, but they are different domains (not subdomains), so not sure why it shows up as a hostname, with a fairly significant proportion of traffic.

The regex I am using is

However, when I put this, I get 100% in the summary that shows on the right. The correct figure is around 96%. When I set-up the segment using the “include” option instead of Regex, I am able to get the 96% without any problem. However, the problem is when creating the “filter” I can only use Regex with include. I am fairly sure there is something not right with my RegEx so would appreciate if you could tell me what might be the issue here.

1. markdhansen

Looks like you have an extra “|” on the end. That might be causing the problem, as OR blank might be matching on everything.

10. Luke Miller

Really appreciate this article! Using the regular expressions to more quickly block a list of 26 spam referral sites from 60+ client analytic accounts was really helpful. It would be nice if it was not limited to 255 characters, as I have to break them into two separate filters, but still sped the process up a lot! There seem to be more and more of these spammers every day, and the list grows constantly.

I wrote a quick article on my website about the .htaccess trick (http://www.lukeamiller.net/blocking-semalt-buttons-for-website/), but I think this will be a more approachable method for many users.

1. markdhansen

Hi Luke,

I’m glad the article was helpful. Yes, the 255 character limitation is frustrating – especially with so many new spammers showing up! I like your piece and left a comment.

1. Luke Miller

Definitely! Thank you for the comment as well! Yea that character limit is a real pain, I identified over 25 spam referrals on client site analytic reports that I have to start filtering and a larger limit would be helpful!

11. Daniel Kratohvil

Finally a serious article solving the issue of fake-referrals. I have tried a lot of filtering solutions posted on other blogs, but they never worked permanently and every week I had to update the filter patterns to include the new spams. Thank you so much for publishing this great resource.

Great article, thank you so much! But my view is not working (0 sessions since I implemented it) although it works when I use segments. Do you have an idea of what I could have done wrong? thx

1. markdhansen

It might take a few hours for data to start showing up in the new view. And, the view will not contain any historical data.

13. Michael Berry

I’ve noticed that I’m now getting direct spam that’s using the twitter URL shortening service. You can’t filter out the referrer because “t.co” is also a source of real traffic. I’ve had to use a new filter based on the Request URI. I’m still in the process of testing this in a test view, but the filter verification results seemed okay.

1. Michael Berry

Good question. In the Behaviour -> Site Content -> All Pages view, a few pages appear that don’t exist, e.g. /www.spamweb101.com/post12976. Most of these have the full referrer as “www.spamweb101.com” but a few had t.co/dFGYeRR6e as the full referrer (where this link would normally forward to the top level of my site). Looking back at this now, there were only 3 entries in the logs with t.co versus 20 for the other referrer. So not that big an issue, it just annoyed me.

Oh, and thanks for the article easily the best one I’ve found.

1. markdhansen

Interesting. But, what I can’t understand is how the spammer knows your t.co URL – unless they specifically did some research on you and the site. That would be a lot of work just to dump spam in your GA account!

14. BGz

Hi Mark,

First of all thanks for this tutorial. I have a question, what about (not set) hostname? I am not sure what these visits are? Your solution excludes those visits from analytics but these are real visits with a number of page views and time spent on site. Please advise.

I would like to add that non-ghost referral traffic comes from real bots visiting your site, therefore using one of your legitimate hostnames as you said. The real problem here is that they can put a significant load on your servers apart from polluting analytics. The proper way to eliminate the problem is to exclude them permanently by making your site inaccessible to them. You can, and should, block them with your htaccess file or within your nginx configuration files and they will stop appearing in your analytics data.

1. markdhansen

Hostname should not be (not set) unless there is some bug in your tracking code. GA automatically populates it from your domain, so in order to get (not set), you must be setting it in your tracking code and passing in nulls.

Yes – not all the spam is ghost referral traffic, that is true. That just has to be filtered out the hard way! Or, blocked in .htaccess.

15. Ling

Hi Mark, Thanks for this very useful post — so I have referral spam from some known sources (making up about 18% of sessions). When I tried to verify my filter, I get the following “This filter would not have changed your data. Either the filter configuration is incorrect, or the set of sampled data is too small.”

Might you know what’s going on here?

Thanks

1. markdhansen

The filter might be configured wrong. Post the regex and I can have a look. Or, you can set up a new view and try the filter on the new view and see what happens. Trying it on a new view prevents you from messing up your existing view.

1. Ling Fu

HI Mark,

Sorry for the delayed response. Here’s what the regex looks like –

.*(semalt|iminent|100dollars-seo|buttons-for-website|best-seo-offer|best-seo-solution|buttons-for-your-website).com|sitevaluation.org.*

PS: I created a segment for these and they show up on there, but in admin/filters it gives me that “This filter would not have changed your data. Either the filter configuration is incorrect, or the set of sampled data is too small.” message

1. markdhansen

Hi Ling,

I tested your regex in a segment and it seems to work fine. I have also created a test filter to try it out. Those warning messages from the Filter Editor – I have found that they are not always so accurate. I think that your regex should work OK. Anyway, you can check back with me in a few days if you want to see how the test filter is working for me 🙂

— Mark

1. Marketing

Hi Mark,

I am using a similar format, but it doesn’t seem to work: .*(semalt|buttons-for-website|success-seo|buttons-on-your-website|Get-Free-Traffic-Now).com|forum.20.smailik|guardlink.org.*

16. Lauren McLaughlin

Hi Mark,
Quick question–I noticed an abrupt spike in ORGANIC search traffic, yet am seeing no discrepancies in the “hostname” reports, do you have any idea where this could be coming from?
I wouldn’t be surprised to see this traffic increase happening within my Referral reports, but the majority of the spike has come from organic, with an increase in direct as well. I’ve been thinking the visits were all coming from bots, but my bounce rate is showing these visitors to be staying on my page for quite some time. I’m at a loss. Any advice?

1. markdhansen

Ghost spammers (who never visit your website) can also spoof organic traffic (although this is less common). Although that would usually show up as a foreign Hostname.

Maybe it is real traffic? That would be a good thing 🙂

17. mphdavidson

Thanks for this Mark. Referral spam has been a thorn in my side for some time now, and this was a huge help. Plenty of detail, well-explained, and screenshots to boot.

18. Raju

I love this! It was really painful to keep on adding all those domains in Filter exclude! Finally this will resolve the issue 🙂

19. Lindsay

One of my websites is getting a lot of referral spam. When looking up the hostnames under network, I found a lot of traffic from (not set). Could this be spam? I tested the hostname filter you recommended and all of the (not set) was removed. Did I solve the problem? Or do you think it’s my tracking code?

1. markdhansen

Probably spam. If there was a problem with your tracking code not setting the hostname, that would be very unusual; and you would probably see it everywhere.

20. Elli Puukangas

Thank you soooo much. This has been bugging me for a while. Maybe from now on I can more confidently look at my traffic and not worry about all the spam. I’ve gone and used the htaccess method as well to stop at least some of these shitty people from entering my site.

21. Sandeep Kumar

Hi..

My case is a bit different.
For one of my clients, I tried everything, but everything was in vain.
I tried to exclude through hostname and referral URL, but every time I verify, it says
“This filter would not have changed your data. Either the filter configuration is incorrect, or the set of sampled data is too small.” Although there is a large amount of such traffic.

Then I tried to separate this traffic through advanced segments. I tried to create a segment based on hostname and referral path filters, but no data appeared in these segments. Then I tried a source filter and it works, but the problem is that there is no option in the profile filter to exclude data by Source.

In the referral traffic report, when I tried to see hostname as a secondary dimension, it is “not set”, and the referral path for most is “/”.

Seriously, I am frustrated. I have excluded such data for many websites, but this is an exceptional case. Would you help me?

1. Amri CelakaRoa

I have the exact same problem that you have… if you find the solution could you please share with me.?
I tried all the methods suggested in forums and sites, but nothing worked for this problem… 🙁

1. Sandeep Kumar

Hi..
Although I could not exclude this data with a profile filter, I did it through an advanced segment. I created a new segment and named it “fake traffic”, went to Conditions in advanced segments, selected Hostname in place of Ad Content, and entered (not set) in the query box. The match type remains “contains”.

Surprisingly, it removes not only the referral data but also the spam direct hits as well.
All the direct traffic excluded by the segment has a 100% bounce rate and the landing page is the home page (which proves it is spam data).
Similarly, you can create another segment for genuine traffic; everything remains the same, just change the match type to “Does not contain”.

2. markdhansen

A couple things might help. (1) Google might tell you that the filter “would not have changed your data” – even though the filter is fine. The filter verification test is not very accurate. Try it and see if it works anyway. (2) Filters do not affect historic data – only the data that comes in after the filter was turned on.

Please post your filters here if you think there might be something wrong.

22. HelloArtsy.com

Perfect! Thanks so much for such a detailed explanation. I can’t believe google hasn’t done something about this yet. These spammers have really messed up my analytics. It seems so obvious who the major spammers are. Thanks again for your easy to follow post!

1. markdhansen

Glad it helped. I think Google will solve this eventually, but it is a difficult problem for them to solve in the general case. So, it may take some time.

23. Jen F

Sorry if I am asking a question which has already been asked but…if it is not feasible to filter data by adding valid hostnames and it is feasible to filter using ghost referrers names – how does one do this, and is it possible to filter out multiple ghost referrers in one go? Ghost referrers are accounting for nearly two thirds of the traffic to the site I am looking at, yet when I previewed a filter it said it wouldn’t have changed the data (?). Would you be able to include a screenshot to demonstrate for example how you would filter out data from trafficmonetize.org and another, say 4webmasters.org, please? Any help would be great!

1. markdhansen

The ghosts have the wrong hostname. So, you just filter them out by eliminating traffic with hostnames that are different from yours. The details are explained in the article.

If trafficmonetize.org and 4webmasters.org are ghosts, then just the hostname filter will work. Otherwise, you need to filter on the Source to exclude: (trafficmonetize.org|4webmasters.org).

24. Ross Turner

So if I want to block not only social-buttons.com, but also www1.social-buttons.com, www2.social-buttons.com, www3.social-buttons.com, etc. (I’m seeing a lot of domains use “www1”, “www2”, etc.), how do I do that in a segment? Thanks!

1. Ross Turner

Thanks. I created a segment to filter spam (screenshot: https://goo.gl/dvueAS). Although it looks like I might have set it up wrong, based on how you said I should do it. I currently have this:

It looks like I should do this instead?

.*(smailik.com)|.*(trafficmonetize.org)|, etc.

Is that correct?

1. Marketing

Can you please confirm how I end this segment, and if it’s correctly written? Ex. Do I put a period, star or ??

.*(social-buttons.com)| .*(semalt.com)| .*(buttons-for-website.com)|
.*(event-tracking.com)| .*(best-seo-offer.com)| .*(get-free-traffoc-now.com)|
.*(free-share-buttons.com)| .*(100dollars-seo.com)| .*(best-seo-solution.com)|
.*(semaltmedia.com)

1. L.E.Henry

In my case, I found ‘trafficmonetizer.org’ as a spam Source. Note the ‘r’ at the end of ‘monetizer.’ Here’s my current RegEx on Source:
(best|100dollars|success)-seo|(videos|buttons)-for|anticrawler|musica-gratis|semalt|forum69|7makemoney|sharebutton|ranksonic|sitevaluation|dailyrank|4webmasters|(traffic|web)monetize|social-buttons

25. Myke black

Great article. Thanks for posting it. We have a lot of sites to monitor, and adding filters for all of them takes a lot of effort. Using this method means that if the spammers add new hostnames to their repertoire you have to go to each filter view and add the new hostname – new ones seem to pop up about once a month (but 4webmasters.org is still the biggest culprit for us).

What I found works better is to filter out countries as described in this article: https://moz.com/blog/how-to-stop-spam-bots-from-ruining-your-analytics-referral-data – this method also filters out spam bot hits to your website as well as ghost referrals.

But really the best solution would be if Google just did something about it themselves and blocked analytics referrals from these spammers and retrospectively deleted the data from all website statistics. This would not only clean up everyone’s data, but would also send a message to the spammers that this tactic is destined to fail.

26. Krissy

Do I need to exclude or include hostnames that include google? E.g. I have a few hostnames google.fr and google.es. I’m assuming these are valid hostnames but you didn’t mention anything about including them so I’m hesitant. Thanks!

1. markdhansen

google.fr and google.es shouldn’t show up as Hostnames. I’d guess that is spam. The only way you should see google in a hostname is for the google translate site.

Hi – I followed all of the instructions you provided but for some reason my “Spam Free” view is not showing any hits. I’ve tried visiting my site and flipping back and forth between the Spam Free and All Web Site Data views, and my hits show up in the All but not in Spam Free. I’ve checked and double-checked my filter input and can’t figure out what could be causing this problem. Any help would be greatly appreciated!

Sure, here it is.

1. markdhansen

Looks like it should work. Try using just “distressedchildren” as your filter. If that works, then try distressedchildren.org, etc. – like that to weed out where the bug is.

Do this with a segment first – so you can see results right away. Once it is working, then try a filter.

Remember, filters only impact data from today onward. There will be no historical data affected by the filter.

28. Angela Charles

Filtering spam works fairly well, but you have to keep up with new spam referrals every month. Have you come across any limitations to the number of Exclude filters you can write in Google Analytics? I’ve got 6 on one of our sites and it seems Google won’t let me write any more. Does that sound right?

1. markdhansen

No, that does not sound right. I have views with many more than 6 filters and they work correctly. I’m not sure what the absolute limit is.

1. Angela Charles

Thanks for response. Not sure what happened but we went back into the account with the problem and were able to add more filters. Google must have been having a bad day. Thanks.

29. Judith Andrea Manriquez

Thank you. I was able to follow this pretty easily with no background knowledge. Much appreciated.

30. Matthew J. Bigbee

Thanks for this great article!

What do you do in regards to eliminating Ghost Referrals when they are hidden as Hostname (Not Set) ? screen shot below of both Network Hostname & Referral Hostname

1. Calvin Sauer

If I’m not mistaken, by only including your own hostnames, these should be filtered out too. You don’t need to do anything special to filter those out.

1. Matthew J. Bigbee

Thanks Calvin, that makes sense, I was just concerned that something in the (Not Set) might be of real value. I am getting a lot of daily traffic from the (Not Set). Would be interested in seeing what is behind the (Not Set)

1. Paul Keep

Hey Matthew, have you found out anything about this? I have the same question. I have a significant amount of traffic coming from (Not Set) hostname. I want to make sure I’m not filtering out valid traffic. Thanks!

2. markdhansen

Use a secondary dimension to look at the source/medium of your (not set) hostname traffic. Do the sources look real or like spammers?
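If the report is exported, the same check can be run offline. The sketch below is only an illustration: the rows, field names, and domains are hypothetical, not the layout of a real Google Analytics export. It totals sessions per source/medium for the (not set) hostname so the largest sources can be inspected first.

```python
from collections import Counter

# Hypothetical exported rows; field names and values are assumptions for illustration.
rows = [
    {"hostname": "(not set)", "source_medium": "free-share-buttons.example / referral", "sessions": 40},
    {"hostname": "(not set)", "source_medium": "(direct) / (none)", "sessions": 12},
    {"hostname": "www.example.org", "source_medium": "google / organic", "sessions": 90},
    {"hostname": "(not set)", "source_medium": "free-share-buttons.example / referral", "sessions": 8},
]

# Sum sessions per source/medium, looking only at hits whose hostname is (not set).
by_source = Counter()
for row in rows:
    if row["hostname"] == "(not set)":
        by_source[row["source_medium"]] += row["sessions"]

# List the sources from largest to smallest so the suspicious ones stand out.
for source_medium, sessions in by_source.most_common():
    print(f"{sessions:5d}  {source_medium}")
```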

2. markdhansen

Yes – Calvin is correct about that. The techniques shown in this article will also remove the (not set) hostname sessions.

1. Alfonso Fernández

Hi, Mark,
I am concerned about the following (though I am not sure it is correct): maybe some of the traffic assigned to the (not set) hostname with source/medium = direct/none comes from an https link, because the referrer is missing when an https link points to an http page.
If I am right, shouldn’t we take that into account when building the filter?
My apologies if I am missing something
Alfonso

1. markdhansen

Even if there is no referrer, there should be a hostname. Hostname is set by the Google Analytics code running on your site. Unless the hits are coming from spammers through the Google Analytics Measurement Protocol.

31. Rdh H

Hi Mark,
Great article very helpful! After going through this set up I am still getting hits from (not set). Any ideas?

32. nmckean

In your regex, would megalytic.com not also do the job of picking up all subdomains? So “megalytic.com|blog.megalytic.com|old.megalytic.com|forums.megalytic.com|client.meglytic.com|translate.googleusercontent.com|support.megalytic.com” just needs to be “megalytic.com|googleusercontent” (removing translate. will include cache. also).
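One way to check a shortened pattern like this before touching a filter is to test it locally. The sketch below assumes the filter pattern is matched anywhere in the hostname (unanchored), which is how Python's re.search behaves; the sample hostnames other than the megalytic.com ones are made up for illustration.

```python
import re

# Sample hostnames; apple.com and (not set) stand in for typical ghost-spam values.
hostnames = [
    "megalytic.com",
    "blog.megalytic.com",
    "support.megalytic.com",
    "translate.googleusercontent.com",
    "apple.com",
    "(not set)",
]

# Shortened pattern from the comment, with the dot escaped so it matches a literal dot only.
pattern = re.compile(r"megalytic\.com|googleusercontent")

def is_valid_hostname(hostname: str) -> bool:
    # re.search matches anywhere in the string, so subdomains match without being listed.
    return pattern.search(hostname) is not None

kept = [h for h in hostnames if is_valid_hostname(h)]
print(kept)
```

Note that an unanchored pattern also accepts any hostname that merely contains the text, for example a made-up megalytic.com.spam.example, so the shorter form is a little more permissive than listing each subdomain.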

33. Hannah Bock

Hi, I tried to exclude the domains, semalt.com|buttons-for-website.com and did the check filter option and it told me it would have no effect on my data, even though I had about 300 visits from buttons-for-websites.com this month. I’ve attached two screen shots, one of my data and one of the filter. Any ideas on what I’m doing wrong? Thanks! Hannah

1. L.E.Henry

Hannah, I’ve been working on this spam removal problem for several days…patching together information from this Megalytics post by Mark and the Analytics Edge post he references. My best guess is that verify filter says data wouldn’t change for 2 reasons: 1. use “Source” as the dimension not “Referral” for the RegEx match 2. If you recently made a new View, which is the best practice, then you don’t have much data in it yet; is it possible none of your site’s identified spammers have struck since you created the new View?

2. markdhansen

Hannah, I don’t think the “Filter Verification” is very accurate, honestly. I usually ignore that message, and have found that filters work anyway. And, like L.E. Henry says, if you are setting up a filter on a new view, then there isn’t much data for Google to work with in doing an impact estimate.

34. Paul Keep

Fantastic article thank you so much! Also, just came across some “event spam”. Any idea how to get rid of this, or to check to see if it is impacting user engagement levels?

1. markdhansen

The same techniques described here should work for removing event spam. For the ghost-spam use a Hostname filter. For the real visitor spammers, you need to build an exclusion list of referrers or sources. You can exclude based on event category also.

35. Georgi Georgiev

Hi Mark. That’s all good when you have only a couple or say, 10 websites to deal with. You spend an hour every month and clean them up. However, when you have dozens or 100s of sites to cover? It seems Google is not very keen on helping us solve this for now, so I developed a tool you might want to check out if you manage many Google Analytics accounts/properties: https://www.analytics-toolkit.com/auto-spam-filters/

It’s a fully-automatic, set-and-forget solution to the issue of referrer spam of all kinds.

1. markdhansen

That’s cool. I hope to have time to try out this tool soon!

I think Google will eventually solve this problem, but (1) it isn’t easy to solve “in general”; (2) Google is a huge organization with multiple priorities, and it takes time for things to rise to the top of the priority list and get acted on.

36. Bobby Webster

Hi, wow incredible post! I followed the directions exactly. It appears it is filtering out the ghost referrals, but it appears to be affecting my e-commerce tracking in a weird way?

I setup one filter to Include Hostname using the following filter pattern:

I then setup another filter for non ghost bots to Exclude – Filter Field – Referral like this:

semalt.semalt.com|buttons-for-website.com|best-seo-offer.com|buttons-for-your-website.com|search.mywebsearch.com|100dollars-seo.com|make-money-online.7makemoneyonline.com

As you can see, the top is NON- filtered for the month. The bottom is filtered. For some reason the filtered organic traffic is slightly less but direct is more? Another weird thing is there is 1 less transaction for organic, for Direct traffic there is 1 less transaction and for CPC there is 1 less transaction.

Any insight you could provide would be great. I was really happy to find a solution to this issue, but obviously it’s an issue if it’s reporting different transactions and revenue numbers.

Bobby

1. markdhansen

Well, I guess that something in your segment is removing sessions with transactions, which are definitely not SPAM! So, I would try by flipping the segment (turn exclude to include) and see what sources in your segment have those transactions, and then do not exclude those.

1. Bobby Webster

Hi Mark, thanks for the reply. For some reason I didn’t get the update?
Anyway, sorry but I don’t understand how that would help? I would understand your point if it was a difference in referral sales, but this is a difference in Organic, direct and PPC transactions? My filter is just excluding referral spam sites like you explained.

1. Lin Song

Just add that hostname in question to your “Exclude” filter (besides semalt, etc.) as mentioned in the article. If it doesn’t work, try using “Campaign Source” instead of “Referral” as the Filter Field.

37. Georgi Georgiev

Hi Mark,

That’s one solid article! I wonder how I didn’t come across it earlier, as when I was researching the issue back in March most articles I came across were full of misinformation (like using the “referral exclusion” feature, relying solely on htaccess, etc.). Not yours, though!

I’d like to plug a solution that I’ve developed that applies all filters you’d need and also keeps them up to date, across 100s of web properties with one click of the mouse. Since I manage big accounts with many properties in them, developing such a tool was the only viable solution for me; otherwise I’d have to spend several hours a week dealing with this nuisance.

I hope you and others in a similar situation will find it useful: https://www.analytics-toolkit.com/auto-spam-filters/

38. Jonathan Soifer

The “Eliminating Ghost Referrals” part worked out just fine.

For some reason the “Non-Ghost” part isn’t working for me.

I’m using this Regex:

floating-share-buttons.com|site8.free-floating-buttons.com|site7.free-floating-buttons.com|www.Get-Free-Traffic-Now.com|forum.topic65098423.darodar.com|chinese-amezon.com|get-free-social-traffic.com|hongfanji.com|www.event-tracking.com

Creating a New Filter > Custom > Exclude > Referral

Do you have any idea why this might not be working?

1. markdhansen

How do you know that it is not working? Remember, filters do not fix historical data – they only filter new data coming in after the filter is added.

To get a clean view of your historical data, create a segment using the same Regex; and apply the segment to the view that has the spam.
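The same exclusion can also be tried on exported historical data outside Google Analytics. The sketch below is an illustration under assumptions: the domain list is the one from the comment above, the session rows are invented, and case-insensitive matching is an assumption rather than something confirmed in this thread.

```python
import re

# Spam domains from the comment above.
SPAM_DOMAINS = [
    "floating-share-buttons.com",
    "site8.free-floating-buttons.com",
    "site7.free-floating-buttons.com",
    "www.Get-Free-Traffic-Now.com",
    "forum.topic65098423.darodar.com",
    "chinese-amezon.com",
    "get-free-social-traffic.com",
    "hongfanji.com",
    "www.event-tracking.com",
]

# re.escape makes each dot literal; IGNORECASE is an assumption about how matching should behave.
SPAM_PATTERN = re.compile("|".join(re.escape(d) for d in SPAM_DOMAINS), re.IGNORECASE)

# Hypothetical (source, sessions) rows standing in for an exported report.
rows = [
    ("google", 120),
    ("floating-share-buttons.com", 40),
    ("www.get-free-traffic-now.com", 25),
    ("partner-site.example", 15),
]

# Keep only rows whose source does not match any spam domain.
clean_rows = [(source, n) for source, n in rows if not SPAM_PATTERN.search(source)]
clean_sessions = sum(n for _, n in clean_rows)
spam_sessions = sum(n for _, n in rows) - clean_sessions

print(f"clean sessions: {clean_sessions}, spam sessions: {spam_sessions}")
```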

1. Jonathan Soifer

Sorry it took me so long to get back to you markdhansen.
It is working perfectly fine. Thank you.

39. Emir Nisic

Mark, this is great!

However, I do have one question. After setting up the filter which includes valid hostnames, I am not seeing any conversions? Could this be due to the filter and if so, how would I go about solving that?

And would having two views make sense? As in, one to track traffic with the filter mentioned above and then the other one where I would track conversions?

If you have any suggestions I would really appreciate it.

Thanks!

1. markdhansen

The filter should not prevent conversions from showing up. Something must be wrong. You shouldn’t need 2 views.

1. markdhansen

You are welcome, glad this was helpful. I’m not sure what’s taking Google so long to provide some counter-measures to combat the spam problem. It is very frustrating.

1. markdhansen

Yep, screenresolution being (not set) is another way to identify spammers. They are getting more clever, though, and starting to add values for that dimension (among others).

40. John

Is there a difference when building the spam exclusion filter if you use Filter Field: Campaign Source instead of Referral? Campaign Source seems to be working. I also filter traffic that doesn’t have a screen resolution set and traffic from a few of the more annoying countries. Is this a good idea?

1. markdhansen

Sure, those kinds of filters can work. The reason that I suggest using hostname is that it is simple. You don’t have to filter out all the sources of spam, just filter on your hostname.

1. markdhansen

Of course there is a difference. Referral will exclude traffic from the specific sites (the referral URLs that you use in the filter – e.g., ‘freestuff.com’) that GA tracks using the Referral dimension; and Campaign Source will exclude traffic from the specific sources (e.g., ‘Facebook’). Sometimes, the same data appears in Referral and Campaign Source – e.g., ‘freestuff.com’ – but not always.

So, putting the spammer sites in Campaign Source might work, but I think it makes more sense to put them in the Referral.

1. Scott Mulholland

Mark – I recently added a hostname filter to a new view and when comparing the results for the new view and the RAW data for the last month I noticed that for the hostname filtered view there are 0 referral – the Acq. > All Traffic > Referrals just has 0’s across the board. In the raw data I do see spam referrals but I also do see valid referrals from sites our site is listed on. Any idea why the hostname (which ours is simple, it’s just the url of the main site) are removing all referrals?

2. markdhansen

Hi Scott – valid referrals should not be blocked by a hostname filter. Check that you are using the correct hostname. It should match whatever is the hostname dimension on the valid referrals you are seeing in the RAW data. On some sites, there can be multiple valid hostnames – for example foobar.com and foobar.org might both be valid domains for the same website. In such cases, you will need to match a regex for the hostname, such as foobar.com|foobar.org

41. chsweb

It is worth noting more clearly that when you set up the new View with the new “My Hosts” filter that all of your data gets zeroed out. I think this is because the new View & Filter cannot parse historical data, so if you don’t get lots of site visits, when you go to the new View, Google Analytics will show zero results.

I have seen this happen on two sites, and I thought I did something wrong, so I deleted the View. Has anyone else seen their Google Analytics stats zeroed out after applying the new View & Filter?

1. markdhansen

Views do not contain historical data older than the date on which they are created. This has nothing to do with the filter. So if you create a view on Jan 2nd, there will be no data in that view prior to Jan 2nd.

1. chsweb

Yes, true. It would help folks following the steps above if this bit of information was added or clarified.

At first, I thought something was wrong, because I forgot that new Views in Google Analytics do not contain old data. The steps above show data there, which most people will not see (unless they get lots of hourly visits), which will cause them (like me) to think that something was wrong.

The article needs to account for “Confused” people, or first-time Google Analytics users.

2. scottmcandrew

Thanks for writing this and fielding all these questions Mark. Regarding the new View and Data, I wanted one clarification. In your example above, now that the View exists, if one were to add additional domains to the ‘Exclude non-Ghost Referral Spam’ filter on a later date, you would still be able to view data back to when the view was created (you wouldn’t ‘start over’ by appending the filter). Is that correct? (Thanks again!)

1. markdhansen

The filter will not “start over” – so you don’t lose any data. But the filter changes will only be applied to new data coming in. So the additional domain exclusions that you add don’t work retroactively on the data that is already in the view. They will only be applied to new data coming in from the time you add them.

2. markdhansen

Right, you will not start over. But only the new data will have the additional domains excluded. View filters are not retro-active.

3. chsweb

I get it, but I suspect many people will follow the directions, word for word, multiple times, and end up frustrated that they can’t see data – not knowing how Views work. They will think they are doing something wrong.

I’m suggesting that a clarification be added to the article to save the next person some time and potential frustration. Google Analytics is not at all intuitive, readers will need a little help here.

1. markdhansen

OK, I agree, and added a note about this in the paragraph beginning “To start cleaning this up …”. Thanks for the suggestion!

42. Bill Angelos

In the past I’ve looked into only using hostname data that was the same as my sites and I’ve found this does not work. The problem I have is that a site where I have all the spam bots filtered and can see the real domains that I know are real referrals and the rest is organic traffic, none of the hostnames showing are for my site. What happened was I had 2 weeks of 0 traffic show up in my analytics under this hostname filter. Again the referrals I am getting are from sites I know and have seen the link that is on their page to my site.

I have seen where on one of my domains that this was showing up in the past but for some reason it no longer displays my domain in the hostname area for new traffic.

Suggestions?

1. markdhansen

The hostname dimension tracks the host of the website being visited, not of the referrer – so it should always be the same as your website domain. If you are seeing 0 traffic using a hostname filter, you must be using a hostname that is different from what analytics is recording for your website. What does GA list as the hostname for your site?

2. markdhansen

How did you implement the host filter? Sometimes people accidentally filter on the referrer – and yet, that will eliminate valid traffic! You need to filter on the hostname dimension in analytics.

Hi, am I right in thinking that results will only be seen going forwards as opposed to retrospectively?

44. seonewtool

Email blacklist are the easiest way to reduce spam messages. Seo blacklist check will check over 100 DNS based blacklists on a server IP address.

45. Todd Jamieson

Curious what some other SEOs/marketers are doing regarding historical data? Are any of you using some sort of master SPAM list to normalize the old historical data. We have one client in particular that once we normalized the 2015 data saw a 28k drop in traffic.

46. nkanalyse

Hi
Thanks a lot for the article! But I have been having a similar problem:

I have been getting a huge amount of direct traffic from the US with ~99% bounce rates, <1s session duration, and valid hostnames (my domain itself)! What do you suggest in such a situation? The visits are not from a particular city or service provider, so there is no definite pattern to filter on. Though it showed traits of ghost spam, setting up a filter that only includes visits with a custom dimension set to a certain value (ga('set', 'dimension1', 'xxx')) resulted in all these visits still coming through! Is there anything I should be checking?

47. Pooja Kshirsagar

Hi, thanks for the article. Actually, I get spam traffic with 100% bounce rate from a lot of websites. And, the number of sessions coming from them are between 1-3. Now, if I want to block these spam websites, it will take a long time for me to copy and paste every site address and block the domains. Please suggest me what should I do.
Thanks,
Pooja

1. markdhansen

As described in the article – start by removing the ghost referrals using a filter that INCLUDES only your valid hostnames. That should eliminate a large percentage of them. After that, do you still have a LOT of spam websites? Too many to manage manually?

48. Lorenzitto

Awesome, but I really hope there will be an automatic filter creator, because it could be so damn slow to check every single spam link manually…

1. markdhansen

There are a few tools, but I haven’t tried any of them, so I cannot make a recommendation. But, a quick Google search came up with:

In my opinion, Google needs to step up and help us solve this by creating and maintaining a blacklist and enabling GA users to opt-in to blacklist filtering. Honestly, I am not sure why they haven’t done it yet.

49. David Thomason

Hello

Thank you for the great explanation of fake referral traffic. Two of the links were sending fake referrals to my website. Your blog post helped me remove that fake referral traffic. I hope it will not come back.

Thank You
David

50. Nazar

Hello! By adding our hostnames I cannot understand how the spamming sites can be filtered out. Also, can we also restrict the spammers by tick marking the “extract all known bots and spiders”? Well, adding filters worked out! I cannot see the spammers anymore in my analytics report.

Nazar

https://www.chetaru.com/

1. markdhansen

Spammers often use a bogus hostname like apple.com, or something else. When you filter to remove hits with invalid hostnames, you remove that type of spam.

51. Nico

What do you do when the referral domain is the real reddit domain? You can’t filter it out without blocking the real one.

1. Nico

Yeah, but you would be filtering out real reddit referrals. These people are spoofing the redit(dot)com domain, and I don’t know how to filter out the fake one because the domain looks exactly the same (no fake letters like ɢoogle or lifehacĸer)

1. markdhansen

Actually, no. Read the part of the article about “Eliminating Ghost Referrals” again. hostname tracks the domain(s) where your site is running. Depending on how you are set up, it might have a few valid values, like ‘mycompany.com’ and ‘www.mycompany.com’. But it should never be equal to some other person’s site (like reddit.com or apple.com).

The hostname for all legitimate traffic on your site should be ‘nicoblog.org’. Legitimate referrals from Reddit will have source = ‘reddit.com’ and hostname = ‘nicoblog.org’. You should filter out any traffic that has hostname != ‘nicoblog.org’.
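To make the distinction concrete, here is a minimal sketch with made-up hits. It assumes each hit carries a source and a hostname, and that the www variant of the domain is also valid, which may not apply to every setup. The real Reddit referral survives because its hostname is the site's own; the spoofed ones do not.

```python
import re

# Hypothetical hits: all claim reddit.com or google as source, but hostnames differ.
hits = [
    {"source": "reddit.com", "hostname": "nicoblog.org"},   # real visit from a Reddit link
    {"source": "reddit.com", "hostname": "apple.com"},      # ghost hit with a bogus hostname
    {"source": "reddit.com", "hostname": "(not set)"},      # ghost hit with no hostname
    {"source": "google", "hostname": "nicoblog.org"},       # real organic visit
]

# Include-only pattern for the site's own hostname; the optional www. prefix is an assumption.
VALID_HOSTNAME = re.compile(r"^(www\.)?nicoblog\.org$")

# Keep a hit only when its hostname is valid, regardless of what the source claims.
kept = [hit for hit in hits if VALID_HOSTNAME.search(hit["hostname"])]

for hit in kept:
    print(hit["source"], "->", hit["hostname"])
```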

1. hikingmike

Those are from a year ago. I still see them pop up every once in a while. They don’t last long, but they spike so high that it makes the graph pretty unhelpful.

52. Cardi Chievo

I have a view affected by ghost referrals that already has an include filter that uses URL as the filter field (it includes only a section of the website).
If I add the hostname filter, will the two be treated as AND or OR?
I fear that when I apply the second filter, Google will overwrite the first one.

1. markdhansen

You should apply the edited filter to a test view so you can confirm that it works as expected before applying it to data that you care about. Secondly, try the steps that Google recommends for verifying your filter: https://support.google.com/analytics/answer/6046990

Multiple filters will be treated like AND conditions and applied in order. So, if the first filter passes (is included), but the hostname fails, then the hit will be excluded from the View. This is usually the behavior that you want. For example, I have this as an INCLUDE Hostname filter at the bottom of a list of filters applied to my no-spam view on megalytic.com:

megalytic|brandapitool|digitalbrandmine

So, after filtering out a few IPs (in previous steps), then I throw out everything that has an invalid hostname.
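The AND behavior can be pictured with a small sketch. This is a simplified model, not how Google Analytics is implemented: it handles only include and exclude filters, and the IP addresses and hits are hypothetical. Only the hostname pattern comes from the reply above.

```python
import re
from typing import Callable, Dict, List

Hit = Dict[str, str]
Filter = Callable[[Hit], bool]  # returns True when the hit survives the filter

def exclude(field: str, pattern: str) -> Filter:
    rx = re.compile(pattern)
    # The hit survives only if the field does NOT match.
    return lambda hit: rx.search(hit.get(field, "")) is None

def include(field: str, pattern: str) -> Filter:
    rx = re.compile(pattern)
    # The hit survives only if the field DOES match.
    return lambda hit: rx.search(hit.get(field, "")) is not None

def passes_all(hit: Hit, filters: List[Filter]) -> bool:
    # all() stops at the first failing filter, so filters run in order and combine as AND.
    return all(f(hit) for f in filters)

filters = [
    exclude("ip", r"^203\.0\.113\.7$"),  # hypothetical internal IP address
    include("hostname", r"megalytic|brandapitool|digitalbrandmine"),
]

hits = [
    {"ip": "198.51.100.4", "hostname": "megalytic.com"},  # passes both filters
    {"ip": "203.0.113.7", "hostname": "megalytic.com"},   # dropped by the IP exclude
    {"ip": "198.51.100.9", "hostname": "apple.com"},      # dropped by the hostname include
]

kept = [hit for hit in hits if passes_all(hit, filters)]
print(kept)
```

Adding a second include filter therefore narrows the view further rather than replacing the first one.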

53. hikingmike

FYI, Google changed the method for adding the regex – Now you need to select Hostname, contains, then only enter a single hostname. Then click OR to add more possible hostnames in the same way. It will dropdown an autocomplete based on what you start typing and you can select one of the choices that matches. This method is used instead of putting them all on one line with | separators and escaped periods.

54. Jabir Mohamed - SEO Specialist

Hey Mark,

Thanks for putting this piece together. Ever since our website got some more exposure in the SERPs (organically) our website has gotten bombarded with this fake/spam traffic. This article has definitely cleared it up for me.

Thanks!

1. markdhansen

Great! I’m glad to hear that. Although, it’s frustrating to hear that this problem is still going on. I thought that Google Analytics had made a lot of progress in filtering out the fake hits.