How to Filter Out Fake Referrals and Other Google Analytics Spam

If you work with Google Analytics, chances are you’ve run into some of these websites in your Referrals report (Acquisition > All Traffic > Referrals) lately:
  • semalt.semalt.com
  • buttons-for-website.com
  • forum.topic31342700.darodar.com
  • make-money-online.7makemoneyonline.com
  • anticrawler.org
What are these sites and why are they linking to your site? Well, actually, they’re not linking to you at all. These sites represent fake referrals. They are created in your Google Analytics account to trick you into visiting spammy websites. If you open one of these URLs in your browser, you will likely be redirected to an online store, marketing scam or malware site. Nice, right?
Welcome to the world of Google Analytics spam, where spammers mess with your Google Analytics data to drive traffic to shady websites!
In this post, we look at the impact these spam sites have on your metrics as well as steps you can take to eliminate the spam from Google Analytics. If you are looking for ways to filter this spam out of your Megalytic reports, see: Removing Semalt and other Referer-Spam from Megalytic Reports.

Google Analytics Spam

 

Is Google Analytics Spam Messing Up Your Metrics? (Probably)

What’s the impact of a little spam data on your Google Analytics metrics? If you run a large website with tens of thousands of visitors or more per day, then maybe not much. However, if your site is smaller, there’s a good chance spam may be seriously skewing your metrics.

Below is the Acquisition > All Traffic > Referrals report from a small non-profit. I’ve checked the spam referral sources, and clicked “Plot Rows” to see the daily level of traffic from these spammers.

 

Google Analytics Referrals Reports Showing Spam

 

In the table above, you can see that spam accounts for the top two slots in this Referrals Report! Not only is this annoying, but it messes up the metrics pretty badly.

To analyze the spam’s impact on the non-profit’s metrics, I exported this table into Excel and did some calculations. In the results below, the Sources highlighted in yellow are spam referrals; the two summary lines at the bottom show metric calculations with and without spam.

 

Spreadsheet Analysis of Google Analytics Spam

 

The first thing to note is that 144 out of 283 referral Sessions are spam – that’s 50.9%! The impact on small websites like this one is huge as these spam visits throw off the engagement metrics. As you can see from the spreadsheet, the Bounce Rate for most spam referrals sources is 100%, the Pages/Session is close to 1.0 and the Avg. Session Duration is close to 0.00. When more than 50% of the referral traffic is spam, it is seriously dragging down the engagement numbers and giving you a false impression of the quality of your traffic.

Compare the Bounce Rate, Pages/Session, and Avg. Session Duration for “Including Spam” vs “Excluding Spam” (numbers inside the red rectangle). The spam is making these metrics look much worse than they really are. Bounce Rate, for example, is reading 77.74%. But, when we exclude the spam, the Bounce Rate is a much better 55.4%.
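The before-and-after figures can be cross-checked with a little arithmetic. The sketch below, in Python, uses only the numbers quoted above (283 referral sessions, 144 of them spam, bounce rates of 77.74% and 55.4%). It assumes that bounce rate is simply bounces divided by sessions; the variable names are illustrative and not part of any Google Analytics export.

```python
# Figures quoted in the text above.
total_sessions = 283
spam_sessions = 144
bounce_rate_including_spam = 0.7774
bounce_rate_excluding_spam = 0.554

clean_sessions = total_sessions - spam_sessions
spam_share = spam_sessions / total_sessions

# Assumption: bounce rate = bounces / sessions, so the number of bounces can be
# recovered by multiplying each rate by the session count it was computed over.
total_bounces = bounce_rate_including_spam * total_sessions
clean_bounces = bounce_rate_excluding_spam * clean_sessions
spam_bounces = total_bounces - clean_bounces
implied_spam_bounce_rate = spam_bounces / spam_sessions

print(f"Spam share of referral sessions: {spam_share:.1%}")
print(f"Implied bounce rate of spam sessions: {implied_spam_bounce_rate:.1%}")
```

The implied bounce rate for the spam sessions comes out just under 100%, which is consistent with the per-source figures in the spreadsheet.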

Other than exporting data to Excel and re-calculating all the numbers, is there any way we can stop these spam referrals from polluting our Google Analytics data?

Filtering Out Google Analytics Spam

The techniques for removing spam rely on using Google Analytics View Filters. I first read about these techniques in this excellent article from the Analytics Edge blog: Removing Referral Spam from Google Analytics.

As explained in that article, there are two basic groups of spammers using two different techniques, and you need to use slightly different filters to combat each technique.

Eliminating Ghost Referrals

The first group is what people are starting to call “Ghost Referrals.” These are referrals generated in your reports by fake visits. In this scenario, the spammers don’t even visit your website. Instead, they transmit spammy data directly to Google Analytics that gets added to your reports.

To start cleaning this up, we create a new view and then add some filters. As shown below, you can create a view in the Admin section of Google Analytics. Pick the Account and Property where you want to create a spam-free view. [Note: Views do not contain historical data older than the date on which they are created. If you create a view on Jan 2nd, there will be no data in that view prior to Jan 2nd. So, this new spam-free view will not help clean up the historical data – only the new data coming in.]

 

Creating a New View in Google Analytics

 

Next, we are going to create a list of the valid hostnames that should be showing up in your Google Analytics reports. The key to removing ghost referrals is that they come from hostnames that are not yours – and you can use that weakness to filter them out.

Below is a list of the valid hostnames of visits to our Megalytic website:

  • megalytic.com
  • blog.megalytic.com
  • support.megalytic.com
  • translate.googleusercontent.com

Note the last one – translate.googleusercontent.com. This is the hostname that shows when a user views your website through Google Translate – you do not want to filter that out.

If you are not sure of your list of valid hostnames, you can look at the Audience > Technology > Network report and select Hostname as the primary dimension. Set a long time range in the calendar – like a year or more if you have that much data. This will ensure that you capture all the valid hostnames.

Here is what that report looks like for Megalytic. The valid hostnames have little red arrows next to them. The rest (e.g., apple.com, iedit.ilovevitaly.com) are from spammers!

 

Google Analytics Report on Hostnames

 

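Once you have your list of valid hostnames, put them in a single line of text, separated by the “|” (OR) character. Also put a backslash in front of every “.” (period) character, so that it matches a literal dot rather than any character. Do not leave a “|” at the start or end of the line, because an empty alternative matches every hostname. This creates a regular expression that will match on your good hostnames and exclude all the spammer hostnames.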

For example, here is what we use:

megalytic\.com|blog\.megalytic\.com|old\.megalytic\.com|forums\.megalytic\.com|client\.megalytic\.com|translate\.googleusercontent\.com|support\.megalytic\.com
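If your hostname list is long, the escaping can be scripted instead of typed by hand. The following is a minimal Python sketch of the recipe described above: escape each period, then join with “|”. The helper names `build_include_pattern` and `is_kept` are made up for this example, and the use of `re.search` assumes the filter keeps a hit when the pattern matches anywhere in the hostname.

```python
import re


def build_include_pattern(hostnames):
    """Escape each "." with a backslash and join the hostnames with "|"."""
    escaped = [host.replace(".", r"\.") for host in hostnames]
    return "|".join(escaped)


def is_kept(pattern, hostname):
    # Assumption: a hit is kept when the pattern matches anywhere in the hostname.
    return re.search(pattern, hostname) is not None


valid_hostnames = [
    "megalytic.com",
    "blog.megalytic.com",
    "support.megalytic.com",
    "translate.googleusercontent.com",
]
include_pattern = build_include_pattern(valid_hostnames)
print(include_pattern)

print(is_kept(include_pattern, "blog.megalytic.com"))     # a valid hostname
print(is_kept(include_pattern, "iedit.ilovevitaly.com"))  # a spammer hostname

# A stray "|" at the end adds an empty alternative, which matches any string,
# so the filter would no longer exclude anything.
broken_pattern = include_pattern + "|"
print(is_kept(broken_pattern, "iedit.ilovevitaly.com"))
```

Because the pattern is not anchored, it also matches any hostname that merely contains one of your domains, which is why the subdomains are kept even if you list only the bare domain.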

Before you put this filter expression to use we recommend that you build a segment to test it out on your historical data to see how it looks. Filters permanently alter the data in a view, so it’s a good idea to test filter expressions using non-permanent segments on your historical data before using them in filters.

Another benefit of testing your filter expression in a segment is that you can use this segment to look at your historical data without the ghost referrals.
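The same kind of dry run can be done offline on an exported Hostname report, which makes it easy to review exactly which hostnames the expression would throw away. This is only a sketch: the rows and session counts below are invented for illustration, and `split_by_hostname` is a hypothetical helper, not a Google Analytics feature.

```python
import re


def split_by_hostname(rows, include_pattern):
    """Split (hostname, sessions) rows into those the pattern keeps and drops."""
    regex = re.compile(include_pattern)
    kept, dropped = [], []
    for hostname, sessions in rows:
        if regex.search(hostname):
            kept.append((hostname, sessions))
        else:
            dropped.append((hostname, sessions))
    return kept, dropped


# Hypothetical export of the Hostname report; the session counts are made up.
sample_rows = [
    ("megalytic.com", 900),
    ("blog.megalytic.com", 300),
    ("translate.googleusercontent.com", 5),
    ("apple.com", 40),
    ("iedit.ilovevitaly.com", 25),
]

pattern = r"megalytic\.com|translate\.googleusercontent\.com"
kept, dropped = split_by_hostname(sample_rows, pattern)

kept_sessions = sum(sessions for _, sessions in kept)
dropped_sessions = sum(sessions for _, sessions in dropped)

print(f"Kept {kept_sessions} sessions, dropped {dropped_sessions}")
for hostname, sessions in dropped:
    print(f"  would be filtered out: {hostname} ({sessions} sessions)")
```

If a hostname you recognize as your own shows up in the dropped list, add it to the expression before turning it into a filter.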

Here is the testing segment created for Megalytic, which we named “My Hosts.”

 

Google Analytics Use Segment to Test Filter

 

And here are the results, filtered by using the “My Hosts” segment:

 

Google Analytics Report Comparing Segments

 

As you can see, some of the sessions have been filtered out – the “My Hosts” segment has 19,934 Sessions vs 20,235 in “All Sessions.”

Next, apply the “My Hosts” segment to your Audience > Technology > Network report and select Hostname as the primary dimension. Check to see that only valid hostnames are showing up. Below are the results for Megalytic.

 

Google Analytics Segment Filters Out Spam

 

Once you are confident your filter expression is working correctly, add it to your new view. We called our new view “Spam Free.” You can see below how we selected the “Filters” section to create a filter on this view, and then pasted in our filter expression as a Custom Filter Type. Make sure to select “Include” and to filter on the “Hostname” field.

 

Google Analytics Filter Definition with Hostnames

 

Save this filter and you should be all set. This new view will now exclude all ghost-referral spam. Unfortunately, filtered views only include data from the date they were created. So, you cannot use this view to look back at historical data. However, you can use the segment “My Hosts” created during the testing process to view spam-free historical results.

Eliminating the non-Ghost Referral Spam

Unlike the ghost referrals, some of the spammer bots, like Semalt, actually visit your website. These will not be removed using the hostname filter described above. To remove these, you will need to create another filter that will exclude a list of known referral spam domains.

So, to clarify, the first filter INCLUDES only your valid hostnames. That kills the ghost-referral spammers. This second filter will EXCLUDE known spammer domains.

To find the non-ghost spammers visiting your website, open Acquisition > All Traffic > Referrals and add Hostname as a secondary dimension. Spam sources where the Hostname is valid (in our case, megalytic.com) are the non-ghost spammer domains we need to exclude.
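The rule in the previous paragraph can be written down as a small check: for a source you have already judged to be spam, a valid hostname means it is a non-ghost spammer, and anything else is a ghost. In this Python sketch the (source, hostname) pairs are hypothetical examples rather than rows from the report, and the hostname pattern is a shortened version of the one built earlier.

```python
import re

# Shortened version of the valid-hostname expression from the previous section.
VALID_HOSTNAMES = re.compile(r"megalytic\.com|translate\.googleusercontent\.com")


def classify_spam_source(hostname):
    """Classify a spam referral by the hostname recorded alongside it."""
    if VALID_HOSTNAMES.search(hostname):
        return "non-ghost"  # the bot really loaded a page; needs an exclude filter
    return "ghost"          # fake hit; already removed by the hostname filter


# Hypothetical (source, hostname) pairs, for illustration only.
spam_rows = [
    ("semalt.semalt.com", "megalytic.com"),
    ("buttons-for-website.com", "megalytic.com"),
    ("forum.topic31342700.darodar.com", "apple.com"),
]

for source, hostname in spam_rows:
    print(f"{source}: {classify_spam_source(hostname)}")
```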

 

Google Analytics Spam Referral Sites - Non Ghosts

 

From this list, you can see that semalt.com and buttons-for-website.com should go on our list. As before, create a filter, but this time use Referral as the Filter Field, and the filter is:

semalt\.com|buttons-for-website\.com

As shown below, we name this filter “Exclude non-Ghost Referral Spam.”

 

Google Analytics Filter to Exclude Spam Referrals

 

You should check your Acquisition > All Traffic > Referrals report periodically to identify any new spam referral domains that start showing up. Add these new ones to your filter as necessary to keep your data as spam-free as possible.
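To see how the two filters work together, and what happens when a new domain is added to the exclude list, here is a rough Python simulation. It is not how Google Analytics is implemented; the function names are invented, the hostname pattern is shortened, and `spam-example.invalid` is a placeholder domain standing in for whatever new spammer you discover.

```python
import re

INCLUDE_HOSTNAMES = r"megalytic\.com|translate\.googleusercontent\.com"
spam_domains = ["semalt.com", "buttons-for-website.com"]


def build_exclude_pattern(domains):
    """Escape the periods and join the spam domains with "|"."""
    return "|".join(domain.replace(".", r"\.") for domain in domains)


def survives_filters(hostname, source, exclude_pattern):
    """Apply the include filter first, then the exclude filter."""
    if not re.search(INCLUDE_HOSTNAMES, hostname):
        return False  # ghost referral: hostname is not one of ours
    if re.search(exclude_pattern, source):
        return False  # non-ghost spam: source is on the exclude list
    return True


exclude_pattern = build_exclude_pattern(spam_domains)
print(exclude_pattern)

print(survives_filters("megalytic.com", "google.com", exclude_pattern))
print(survives_filters("megalytic.com", "semalt.semalt.com", exclude_pattern))
print(survives_filters("apple.com", "google.com", exclude_pattern))

# When a new spam domain turns up, add it and rebuild the expression.
spam_domains.append("spam-example.invalid")  # placeholder, not a real spammer
updated_pattern = build_exclude_pattern(spam_domains)
print(survives_filters("megalytic.com", "spam-example.invalid", updated_pattern))
```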

Another approach to filtering out the non-ghost spammers is to stop them from visiting your website at all. If you are hosting your website on the Apache web server, this kind of blocking can be accomplished by modifying the .htaccess file, as described here: How to block referrer spam traffic.

If you are running WordPress, there is now a plugin that will do this for you: SpamReferrerBlock. One advantage of using this plugin is that they claim to keep a “blacklist” of domains that are spammers and filter those visits for you, so you do not have to keep your filters up to date.
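The server-side idea is not tied to Apache or WordPress: the server looks at the Referer header of each request and refuses requests that claim to come from a known spam domain. The Python sketch below only illustrates that logic. It is not the .htaccess rule set from the linked article and not how the plugin works, and the function names and the two-entry blocklist are assumptions made for the example.

```python
from urllib.parse import urlparse

# Example blocklist taken from the exclude filter above.
BLOCKED_DOMAINS = {"semalt.com", "buttons-for-website.com"}


def is_blocked_referrer(referer_header, blocked_domains=BLOCKED_DOMAINS):
    """Return True if the Referer header points at a blocked domain or subdomain."""
    host = urlparse(referer_header or "").hostname
    if host is None:
        return False  # no referrer, or not a full URL
    return any(
        host == domain or host.endswith("." + domain)
        for domain in blocked_domains
    )


def status_for_request(referer_header):
    """HTTP status a server might return: 403 for blocked referrers, else 200."""
    return 403 if is_blocked_referrer(referer_header) else 200


print(status_for_request("http://semalt.semalt.com/crawler.php"))
print(status_for_request("https://www.google.com/"))
print(status_for_request(""))
```

Matching on the exact domain or a “.domain” suffix avoids blocking an unrelated site whose name merely ends with the same letters.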

Conclusions

Referral spam is becoming a serious problem and I expect that Google will soon introduce new features to help us protect the integrity of our Google Analytics data. Until then, you can use the filtering techniques described in this post to create a view that is relatively free of referral spam.

Appendix

It’s been almost 2 years since I wrote this, and the Google Analytics spam problem is still with us! If you are looking for more details on this subject, I suggest that you check out Carlos Escalera’s post: Ultimate Guide to Getting Rid of All the Spam in Google Analytics.

Update on March 23, 2017 …
I’ve seen a few articles indicating that Google is taking action to solve this problem. If you have noticed an improvement, let me know in the comments.


  • Dave Fimek

    Few things:

    Self referrals are a sign of a larger problem. If you have self referrals that means quite a number of things that need to be addressed.

    Creating an exclude filter is EXTREMELY bad as a solution here. If I visit your site and trigger the issue where I’m now a self referral, or “ghost referral” as you call it, your filter will REMOVE me completely from your data despite my being valuable traffic. It’s kind of like shooting the horse for throwing a shoe. The overwhelming majority of users filtered out in the manner listed above will be quality traffic.

    For self referrals you need to:

    – Discover the issue. Is it a sizable chunk of traffic?

    – Review your site implementation to resolve the issue. It will be something technical.

    – With the main issue fixed, add your own site to the referral exclusion list of your web property:

    https://support.google.com/analytics/answer/2795830?hl=en

    • markdhansen

      Hi Dave – thanks for the comment. I think there is some confusion, this blog post is about “fake referrals” – not self-referrals.

      Regarding exclude filters – yes, I agree that they are dangerous when not used carefully, but really there is no harm in excluding hostnames that are not yours. So, if the hostname “ilovevitaly.com” is showing up on my GA report, that’s got to be spam. There is no other way a domain other than some variation on megalytic.com or a Google Translate domain could be legitimately showing up in the reports, is there?

  • How is apple.com a ghost spammer?

    • markdhansen

      It’s not actually apple.com. I don’t have my GA tracking for megalytic.com running on the domain apple.com. Somebody is just spamming Google Analytics to make it look like my GA tracking code is running on apple.com.

      • So the spammer has Host: "apple.com" in their POST requests to GA (using your GA tracking ID). That’s a weird spamming tactic.

        • markdhansen

          They have to include something in the hostname. But the actual spammy part is what they put in the “source” – so when you look at your GA reports, it looks like you have traffic being referred by something like “buttons-for-website.com” and then you go visit that site.

  • Bill

    Mark, first off thanks for the great article. My question is regarding
    the list of Non-Ghost Referral Spam. My referral traffic list is 2800
    sources long. A lot of them seem to be from Poland (.pl extension). What
    can I use to help me determine if it is a legitimate source? Thanks

    • markdhansen

      Hi Bill,

      Great question! Other than going to the website and looking at it to see if there is really a link to your site there, I don’t know how you can. It would be great if somebody put together a list of known spamming referral URLs.

      • Bill

        If I want to eliminate all Poland referral traffic, is there an expression I can use in the filter for .pl? like *.pl ?
        Thanks

        • markdhansen

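          Yeah, you can exclude with a regex that matches anything ending in .pl, which I think would be \.pl$ – the backslash makes the dot literal, and the $ on the end anchors the match to the end of the source. (A bare *.pl is a file-style wildcard, not a valid regex.)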

          • Bill

            Thanks for the help so far. One more question with regard to Non-Ghost referral spam. Are sources such as google.com, msn.com, linkedin.com, facebook.com, etc all considered legitimate? I ask because there are other recognizable sources such as imdb.com on this list that I know do not include legit links to my page. Should I attempt to remove them using your method or do I want to be more careful with sources such as google.com so that I do not remove legitimate data?

          • markdhansen

            Sure, it’s possible that you have legitimate referrals from all those sources – although google.com should probably be showing up as somesubdomain.google.com. Referrals from ghost spammers that are not actually visiting your website and are using legitimate domains like msn.com, apple.com, etc. are blocked using the hostname filter, not by excluding the referral domain.

  • Hey Mark. Thanks for the great walk through advice in this post. I’ve just implemented these filters on a few of the websites I manage. I’m just wondering why you wouldn’t apply the filters to a normal view that has historical data? After testing with a segment of course. Saves having to turn the advanced segment on. I always keep an untouched view and then a second view in which I apply filters like these to.

    • markdhansen

      Yes – I think you *should* apply as a filter, after testing as a segment. Sorry if that was not clear in the article!

  • Lifesaver! Have been searching everywhere how to add multiple exclusions at once!

    • markdhansen

      Glad this was useful!

  • Hi Mark, can I just confirm that as you are using filters, this will correct historic data as well? (I have previously added specific sites to exclusion lists, but that doesn’t get rid of historic data cluttering up my accounts, so am keen to try your method!)

    • markdhansen

      Hi Bridget. Filters will not fix historic data. However, you can use a segment with the same conditions as the filter and apply it to your reports to clean up the historic data. Filters permanently remove data from views. Segments only work on the reports they are applied to. Make sense?

      • Thanks Mark! (Sorry to be so long replying when you were so prompt!)

  • Awesome description, worked out great! Thank you so much!

    • markdhansen

      Great! Glad this worked for you 🙂

  • Capital SEO

    Thanks Mark for putting this guide together.

    I think a lot of us were fooled by a spike in site traffic, later to reveal it’s these dirt-bags using our trackers to send fake referral data.

    At least the analytics industry has caught on to these Spammers.

    I hear Google is taking steps in the back-end to help us out.

    We’ll see!

    • markdhansen

      My pleasure – glad it was helpful. It really is a dirt-bag tactic isn’t it? Hard to imagine that these spammers actually make money doing this, but they must or it wouldn’t have grown into such a problem!

  • CloudNo9

    So basically the information Google Analytics provides about my visitors is worth crap. Good to know. I might as well implement more trustworthy counter software into my site.

    • markdhansen

      Well, I wouldn’t say that! There is a spam problem right now; but your Google Analytics data is still extremely useful. Just like your email is useful even if you get some spam.

  • Martin Johncox

    Thank you for the advice. The problem I’m having is that in 78 percent of my site sessions (which I think are all spam), the hostname is not set. For another approach, I have tried to set up a filter that would exclude any site visit less than, say, 30 seconds. That seems like it would nail all the spammers. If some non-spammers are caught in the filter, I can honestly say that anyone who visits my site for less than 30 seconds isn’t of much benefit anyway. But I couldn’t figure out how to set up such a filter. Ideas? Thank you!

    • markdhansen

      Martin – I do not recommend that, as you will filter out all single page visits. Whenever a user comes to your website and visits only 1 page, the time on site will be 0 – even if they spend 30 minutes reading your single page.

  • Victoria

    Hi Mark,

    Thanks for the post – it is awesome!

    I’m trying to include the valid hostnames in the filter, but I only have the option to exclude under the filter field. Am I missing something here? I have full admin so should not be causing an issue! :/

    • markdhansen

      Hi Victoria,

      You should have both, but the UI can be a little confusing. Here’s a little video to show where the include/exclude buttons are and how they work: http://quick.as/0ZpxIrpb6

      — Mark

  • Phil Hunter

    This looks like a great solution to eliminating spammy referrals. I can’t get past the ‘Create new view’ stage. All the historical data is completely missing!

    • markdhansen

      Hi Phil,

      When you create a new view, it will only include data going forward from the time it was created. You cannot apply filters to historical data.

      For historical data, apply segments, as described in the article.

      — Mark

      • Phil Hunter

        Thanks. I will give that a try.

  • Roger Spinner

    Great post. Thank you very much!

  • Hi,

    Thanks for the great article. I have just tried this on two of my sites. The segment worked perfectly on 1 of my websites, where my hostname did not have the “www”. However, for the second site, I am having some issues.

    1. In my second site, I have 2 versions of my valid hostname showing up – http://www.mysite.com & mysite.com

    2. Another domain that is showing up here is the domain of my blog website. There is a bit of backlinking between the 2 sites and they are hosted on the same shared server, but they are different domains (not subdomains), so not sure why it shows up as a hostname, with a fairly significant proportion of traffic.

    The regex I am using is

    http://www.mysite.com|mysite.com|translate.googleusercontent.com|www.myblogsiteonsamehost.com|

    However, when I put this, I get 100% in the summary that shows on the right. The correct figure is around 96%. When I set-up the segment using the “include” option instead of Regex, I am able to get the 96% without any problem. However, the problem is when creating the “filter” I can only use Regex with include. I am fairly sure there is something not right with my RegEx so would appreciate if you could tell me what might be the issue here.

    • markdhansen

      Looks like you have an extra “|” on the end. That might be causing the problem, as OR blank might be matching on everything.

  • Luke Miller

    Really appreciate this article! Using the regular expressions to more quickly block a list of 26 spam referral sites from 60+ client analytic accounts was really helpful. It would be nice if it was not limited to 255 characters, as I have to break them into two separate filters, but still sped the process up a lot! There seem to be more and more of these spammers every day, and the list grows constantly.

    I wrote a quick article on my website about the .htaccess trick (http://www.lukeamiller.net/blocking-semalt-buttons-for-website/), but I think this will be a more approachable method for many users.

    • markdhansen

      Hi Luke,

      I’m glad the article was helpful. Yes, the 255 character limitation is frustrating – especially with so many new spammers showing up! I like your piece and left a comment.

      • Luke Miller

        Definitely! Thank you for the comment as well! Yea that character limit is a real pain, I identified over 25 spam referrals on client site analytic reports that I have to start filtering and a larger limit would be helpful!

  • Finally a serious article solving the issue of fake-referrals. I have tried a lot of filtering solutions posted on other blogs, but they never worked permanently and every week I had to update the filter patterns to include the new spams. Thank you so much for publishing this great resource.

    • markdhansen

      Thanks – glad this article helped out!

  • muscade

    Great article, thank you so much! But my view is not working (0 sessions since I implemented it) although it works when I use segments. Do you have an idea of what I could have done wrong? thx

    • markdhansen

      It might take a few hours for data to start showing up in the new view. And, the view will not contain any historical data.

  • BRILLIANT POST! Thanks so much! It has changed everything for us with GA.

    • markdhansen

      Hey, Thanks!! Glad it helped 🙂

  • I’ve noticed that I’m now getting direct spam that’s using the Twitter URL shortening service. You can’t filter out the referrer because “t.co” is also a source of real traffic. I’m currently trying to filter it out via the Request URI, but filter verification suggests that my filter isn’t working. I might just try it on my test view and see if it works.

    • markdhansen

      How do you know it is spam if it is coming from t.co?

      • Good question. in the Behaviour -> Site Content -> All Pages view a few pages appear that don’t exist. E.g /www.spamweb101.com/post12976 most of these have the full referrer as “www.spamweb101.com” but a few had t.co/dFGYeRR6e as the full referrer (where this link would normally forward to the top level of my site).

        • markdhansen

          Interesting. But, what I can’t understand is how the spammer knows your t.co URL – unless they specifically did some research on you and the site. That would be a lot of work just to dump spam in your GA account!

          Glad you like the article. If it was helpful, spread the word and link to it 🙂

  • BGz

    Hi Mark,

    First of all thanks for this tutorial. I have a question, what about (not set) hostname? Your solution excludes those visits from analytics but these are real visits with a number of page views and time spent on site. Please advise.

    I would like to add that non-ghost referral traffic comes from real bots visiting your site, and therefore uses one of your legitimate hostnames, as you said. The real problem here is that it can put significant load on your servers, apart from polluting analytics. The proper way to eliminate the problem is to exclude them permanently by making your site inaccessible to them. You can, and should, block them with your htaccess file or within your nginx configuration files and they will stop appearing in your analytics data.

    • markdhansen

      Hostname should not be (not set) unless there is some bug in your tracking code. GA automatically populates it from your domain, so in order to get (not set), you must be setting it in your tracking code and passing in nulls.

      Yes – not all the spam is ghost referral traffic, that is true. That just has to be filtered out the hard way! Or, blocked in .htaccess.

  • Ling

    Hi Mark, Thanks for this very useful post — so I have referral spam from some known sources (making up about 18% of sessions). When I tried to verify my filter, I get the following “This filter would not have changed your data. Either the filter configuration is incorrect, or the set of sampled data is too small.”

    Might you know what’s going on here?

    Thanks

    • markdhansen

      The filter might be configured wrong. Post the regex and I can have a look. Or, you can set up a new view and try the filter on the new view and see what happens. Trying it on a new view prevents you from messing up your existing view.

      • Ling Fu

        HI Mark,

        Sorry for the delayed response. Here’s what the regex looks like –

        .*(semalt|iminent|100dollars-seo|buttons-for-website|best-seo-offer|best-seo-solution|buttons-for-your-website).com|sitevaluation.org.*

        • markdhansen

          Hi Ling,

          I tested your regex in a segment and it seems to work fine. I have also created a test filter to try it out. Those warning messages from the Filter Editor are, I have found, not always accurate. I think that your regex should work OK. You can check back with me in a few days if you want to see how the test filter is working for me 🙂

          — Mark

          • Ling Fu

            Thanks so much for your help Mark !

          • Marketing

            Hi Mark,

            I am using a similar format, but it doesn’t seem to work: .*(semalt|buttons-for-website|success-seo|buttons-on-your-website|Get-Free-Traffic-Now).com|forum.20.smailik|guardlink.org.*

            Can you please help me out?

  • Lauren McLaughlin

    Hi Mark,
    Quick question–I noticed an abrupt spike in ORGANIC search traffic, yet am seeing no discrepancies in the “hostname” reports, do you have any idea where this could be coming from?
    I wouldn’t be surprised to see this traffic increase happening within my Referral reports, but the majority of the spike has come from organic, with an increase in direct as well. I’ve been thinking the visits were all coming from bots, but my bounce rate is showing these visitors to be staying on my page for quite some time. I’m at a loss. Any advice?

    • markdhansen

      Ghost spammers (who never visit your website) can also spoof organic traffic, although this is less common, and it would usually show up with a foreign Hostname.

      Maybe it is real traffic? That would be a good thing 🙂

  • mphdavidson

    Thanks for this Mark. Referral spam has been a thorn in my side for some time now, and this was a huge help. Plenty of detail, well-explained, and screenshots to boot.

    • markdhansen

      Thanks – glad it helped!

  • Raju

    I love this! It was really painful to keep on adding all those domains in Filter exclude! Finally this will resolve the issue 🙂

    • markdhansen

      Great – glad it helped!

  • Lindsay

    One of my websites is getting a lot of referral spam. When looking up the hostnames under Network, I found a lot of traffic from (not set). Could this be spam? I tested the hostname filter you recommended and all of the (not set) was removed. Did I solve the problem? Or do you think it’s my tracking code?

    • markdhansen

      Probably spam. If there was a problem with your tracking code not setting the hostname, that would be very unusual; and you would probably see it everywhere.

  • Ban Travis

    Google Analytics is the enterprise-class web analytics solution that gives you rich insights into your website traffic and marketing effectiveness.

    Ban@Googleanalyticsspamremover

  • Thank you for providing a very useful and easy to follow tutorial! -George

    • markdhansen

      Glad it helped!

  • Thank you soooo much. This has been bugging me for a while. Maybe from now on I can more confidently look at my traffic and not worry about all the spam. I’ve gone and used the htaccess method as well to stop at least some of these shitty people from entering my site.

    • markdhansen

      You are welcome!

  • Sandeep Kumar

    Hi..

    My case is a bit different.
    For one of my clients, I tried everything, but it was all in vain.
    I tried to exclude through hostname and referral URL, but every time I verify the filter it says
    “This filter would not have changed your data. Either the filter configuration is incorrect, or the set of sampled data is too small.” This is despite there being a large amount of such traffic.

    Then I tried to separate this traffic through advanced segments. I tried to create a segment based on hostname and referral path, but no data appeared in these segments. Then I tried a source condition and it worked, but the problem is that there is no option in the profile filter to exclude data by Source.

    In the referral traffic report, when I add hostname as a secondary dimension it is “not set” and the referral path for most rows is “/”.

    Seriously, I am frustrated. I have excluded such data for many websites, but this is an exceptional case. Would you help me?

    • Amri CelakaRoa

      I have the exact same problem that you have… if you find the solution, could you please share it with me?
      I tried all the methods suggested in forums and sites, but nothing worked for this problem… 🙁

      • Sandeep Kumar

        Hi..
        Although I could not exclude this data with a profile filter, I managed it with an advanced segment. I created a new segment and named it “fake traffic”, went to Conditions in the advanced segment editor, selected Hostname in place of Ad Content, and entered (not set) as the value. The match type stays “contains.”

        Surprisingly, it removes not only the referral data but also the spam direct hits.
        All the direct traffic excluded by the segment has a 100% bounce rate and the landing page is the home page (which suggests it is spam data).
        Similarly, you can create another segment for genuine traffic: everything remains the same, just change the match type to “Does not contain.”

        Hope it is helpful

        • Amri CelakaRoa

          thanks.. I’ll try it shortly…

    • markdhansen

      A couple things might help. (1) Google might tell you that the filter “would not have changed your data” – even though the filter is fine. The filter verification test is not very accurate. Try it and see if it works anyway. (2) Filters do not affect historic data – only the data that comes in after the filter was turned on.

      Please post your filters here if you think there might be something wrong.

  • Analytics Guard

    Thanks for producing this! We actually based a lot of our solution around this guide so thank you again! 🙂

    • markdhansen

      Awesome – glad it helped! Spread the word, link to the article 🙂

  • HelloArtsy.com

    Perfect! Thanks so much for such a detailed explanation. I can’t believe google hasn’t done something about this yet. These spammers have really messed up my analytics. It seems so obvious who the major spammers are. Thanks again for your easy to follow post!

    • markdhansen

      Glad it helped. I think Google will solve this eventually, but it is a difficult problem for them to solve in the general case. So, it may take some time.

  • Jen F

    Sorry if I am asking a question which has already been asked but…if it is not feasible to filter data by adding valid hostnames and it is feasible to filter using ghost referrers names – how does one do this, and is it possible to filter out multiple ghost referrers in one go? Ghost referrers are accounting for nearly two thirds of the traffic to the site I am looking at, yet when I previewed a filter it said it wouldn’t have changed the data (?). Would you be able to include a screenshot to demonstrate for example how you would filter out data from trafficmonetize.org and another, say 4webmasters.org, please? Any help would be great!

    • markdhansen

      The ghosts have the wrong hostname. So, you just filter them out by eliminating traffic with hostnames that are different from yours. The details are explained in the article.

      If trafficmonetize.org and 4webmasters.org are ghosts, then just the hostname filter will work. Otherwise, you need to filter on the Source to exclude: (trafficmonetize.org|4webmasters.org).
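
      A minimal sketch of what such a Source exclusion does, written in Python rather than in Google Analytics itself. It assumes the pattern is applied as a partial (search-style) match. The function name and the sample sources are made up for illustration, and the dots are escaped so that they only match literal periods.

```python
import re

# Pattern from the comment above, with the dots escaped.
# Assumption: the filter does a partial (search-style) match on Source.
SPAM_SOURCES = re.compile(r"(trafficmonetize\.org|4webmasters\.org)")

def keep_hit(source: str) -> bool:
    """Return True if an Exclude filter on Source would keep this hit."""
    return SPAM_SOURCES.search(source) is None

# Hypothetical sample sources.
for source in ["google", "4webmasters.org", "site1.trafficmonetize.org"]:
    print(source, "->", "kept" if keep_hit(source) else "excluded")
```

      Running the same pattern as a segment first shows the effect on existing data before it is committed to a filter.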

  • Russ Tanner

    So if I want to block not only social-buttons.com, but also www1.social-buttons.com, www2.social-buttons.com, www3.social-buttons.com, etc. (I’m seeing a lot of domains use “www1”, “www2”, etc.), how do I do that in a segment? Thanks!

    • markdhansen

      use a regex (see: https://support.google.com/analytics/answer/1034324?hl=en), like this:

      .*(social-buttons.com)

      That matches anything ending in “social-buttons.com”
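
      To see how that pattern behaves on the www1/www2 style subdomains, here is a small Python sketch. Python's re module stands in for the Google Analytics matcher, which is an assumption, and the sample sources are made up. The stricter variant is only an illustration and not something from the article: it escapes the dots and anchors the end of the string.

```python
import re

loose = re.compile(r".*(social-buttons.com)")        # pattern as posted above
strict = re.compile(r"(^|\.)social-buttons\.com$")   # hypothetical stricter variant

samples = [
    "social-buttons.com",
    "www1.social-buttons.com",
    "www2.social-buttons.com",
    "free-social-buttons.com",     # a different domain that the loose pattern also hits
    "social-buttonsXcom.example",  # hit only because "." is unescaped
]

for s in samples:
    print(f"{s:30} loose={bool(loose.search(s))} strict={bool(strict.search(s))}")
```

      For a spam exclusion list the looser pattern is usually harmless, since catching look-alike domains is rarely a problem there.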

      • Russ Tanner

        Thanks. I created a segment to filter spam (screenshot: https://goo.gl/dvueAS). Although it looks like I might have set it up wrong, based on how you said I should do it. I currently have this:

        smailik.org|trafficmonetize.org|4webmasters.org|100dollars-seo.com|free-share-buttons.com|social-buttons.com|buttons-for-website.com|buttons-for-your-website.com|best-seo-offer.com|Get-Free-Traffic-Now.com|100dollars-seo.com|event-tracking.com|dailyrank.net|webmonetizer.net|sitevaluation.org|free-social-buttons.com|semaltmedia.com|social-buttons.com|buy-cheap-online.info|free-share-buttons.com|anticrawler.org|googlsucks.com|webmaster-traffic.com|fiverr.com|myhairtransplantmd.com|brighthouse.rr.com|semalt.com|7makemoneyonline.com|kambasoft.com|hulfingtonpost.com|darodar.com|sandicor.com|put-your-site-here.com|7makemoneyonline.com|anticrawler.org|seoanalyses.com|best-seo-solution.com|savetubevideo.com|srecorder.com|descargar-musica-gratis.net|baixar-musicas-gratis.com

        It looks like I should do this instead?

        .*(smailik.com)|.*(trafficmonetize.org)|, etc.

        Is that correct?

        • markdhansen

          Yes, you need the .* and not just .

          • Marketing

            Can you please confirm how I end this segment, and if it’s correctly written? Ex. Do I put a period, star or ??

            .*(social-buttons.com)| .*(semalt.com)| .*(buttons-for-website.com)|
            .*(event-tracking.com)| .*(best-seo-offer.com)| .*(get-free-traffoc-now.com)|
            .*(buttons-for-your-website.com)| .*(success-seo.com)| .*(buy-cheap-online.info)|
            .*(free-share-buttons.com)| .*(100dollars-seo.com)| .*(best-seo-solution.com)|
            .*(semaltmedia.com)

        • In my case, I found ‘trafficmonetizer.org’ as a spam Source. Note the ‘r’ at the end of ‘monetizer.’ Here’s my current RegEx on Source:
          (best|100dollars|success)-seo|(videos|buttons)-for|anticrawler|musica-gratis|semalt|forum69|7makemoney|sharebutton|ranksonic|sitevaluation|dailyrank|4webmasters|(traffic|web)monetize|social-buttons
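
          A compact pattern like this is easy to check offline before pasting it into a filter. The sketch below is only an illustration: it uses Python's re module as a stand-in for the Google Analytics matcher (an assumption), and the list of sources is collected from domains mentioned elsewhere in this thread plus one legitimate source.

```python
import re

# Source regex exactly as posted in the comment above.
SPAM = re.compile(
    r"(best|100dollars|success)-seo|(videos|buttons)-for|anticrawler|"
    r"musica-gratis|semalt|forum69|7makemoney|sharebutton|ranksonic|"
    r"sitevaluation|dailyrank|4webmasters|(traffic|web)monetize|social-buttons"
)

sources = [
    "trafficmonetizer.org",
    "webmonetizer.net",
    "best-seo-offer.com",
    "buttons-for-website.com",
    "free-share-buttons.com",
    "darodar.com",
    "google",
]

# Anything listed here would NOT be excluded by the pattern.
unmatched = [s for s in sources if not SPAM.search(s)]
print("not covered:", unmatched)
```

          In this sample the pattern does not cover free-share-buttons.com (the alternative is written as sharebutton, without the hyphen) or darodar.com, so a quick check like this can reveal gaps before the filter goes live.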

  • Myke black

    Great article. Thanks for posting it. We have a lot of sites to monitor, and adding filters for all of them takes a lot of effort. Using this method means that if the spammers add new hostnames to their repertoire, you have to go to each filter view and add the new hostname. New ones seem to pop up about once a month (but 4webmasters.org is still the biggest culprit for us).

    What I found works better is to filter out countries as described in this article: https://moz.com/blog/how-to-stop-spam-bots-from-ruining-your-analytics-referral-data – this method also filters out spam bot hits to your website as well as ghost referrals.

    But really the best solution would be if Google just did something about it themselves and blocked analytics referrals from these spammers and retrospectively deleted the data from all website statistics. This would not only clean up everyone’s data, but would also send a message to the spammers that this tactic is destined to fail.

  • Krissy

    Do I need to exclude or include hostnames that include google? E.g. I have a few hostnames google.fr and google.es. I’m assuming these are valid hostnames but you didn’t mention anything about including them so I’m hesitant. Thanks!

    • markdhansen

      google.fr and google.es shouldn’t show up as Hostnames. I’d guess that is spam. The only way you should see google in a hostname is for the google translate site.

  • Decadent Dragon Snacks

    Hi – I followed all of the instructions you provided but for some reason my “Spam Free” view is not showing any hits. I’ve tried visiting my site and flipping back and forth between the Spam Free and All Web Site Data views, and my hits show up in the All but not in Spam Free. I’ve checked and double-checked my filter input and can’t figure out what could be causing this problem. Any help would be greatly appreciated!

    • markdhansen

      Can you post a screen-shot of your filter?

      • Decadent Dragon Snacks

        Sure, here it is.

        The filter text is: distressedchildren.org|mydci.distressedchildren.org|translate.googleusercontent.com

        • markdhansen

          Looks like it should work. Try using just “distressedchildren” as your filter. If that works, then try distressedchildren.org, etc. – like that to weed out where the bug is.

          Do this with a segment first – so you can see results right away. Once it is working, then try a filter.

          Remember, filters only impact data from today onward. There will be no historical data affected by the filter.
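
          The "start simple, then add pieces" approach can also be tried outside of Google Analytics. This is a rough sketch, assuming search-style matching and a made-up list of hostnames as they might appear in the Hostname report. The patterns are the ones from this exchange, tried from loosest to the full filter text.

```python
import re

# Made-up sample of values from the Hostname dimension.
hostnames = [
    "distressedchildren.org",
    "www.distressedchildren.org",
    "mydci.distressedchildren.org",
    "translate.googleusercontent.com",
    "(not set)",
    "apple.com",
]

# Patterns to try in order, from the loosest to the full filter text.
candidates = [
    r"distressedchildren",
    r"distressedchildren.org",
    r"distressedchildren.org|mydci.distressedchildren.org"
    r"|translate.googleusercontent.com",
]

def included(pattern, names):
    """Hostnames an Include filter with this pattern would let through."""
    rx = re.compile(pattern)
    return [n for n in names if rx.search(n)]

for p in candidates:
    print(p, "->", included(p, hostnames))
```

          If a step lets through nothing at all, the piece added in that step is the one to inspect.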

  • This was super helpful. Thanks for putting all of this in one place.

    • markdhansen

      Great – glad it helped!

  • Angela Charles

    Filtering spam works fairly well, but you have to keep up with new spam referrals every month. Have you come across any limitations to the number of Exclude filters you can write in Google Analytics? I’ve got 6 on one of our sites and it seems Google won’t let me write any more. Does that sound right?

    • markdhansen

      No, that does not sound right. I have views with many more than 6 filters and they work correctly. I’m not sure what the absolute limit is.

      • Thanks for response. Not sure what happened but we went back into the account with the problem and were able to add more filters. Google must have been having a bad day. Thanks.

  • Judith Andrea Manriquez

    Thank you. I was able to follow this pretty easily with no background knowledge. Much appreciated.

    • markdhansen

      You are very welcome!

  • Matthew J. Bigbee

    Thanks for this great article!

    What do you do in regards to eliminating Ghost Referrals when they are hidden as Hostname (Not Set) ? screen shot below of both Network Hostname & Referral Hostname

    • Calvin Sauer

      If I’m not mistaken, by only including your own hostnames, these should be filtered out too. You don’t need to do anything special to filter those out.

      • Matthew J. Bigbee

        Thanks Calvin, that makes sense, I was just concerned that something in the (Not Set) might be of real value. I am getting a lot of daily traffic from the (Not Set). Would be interested in seeing what is behind the (Not Set)

      • markdhansen

        Yes – Calvin is correct about that. The techniques shown in this article will also remove the (not set) hostname sessions.

    • markdhansen

      Hostname being (not set) must either be spam or something is wrong with the tracking code installed on the website. Hostname is usually set automatically; but if you are setting it manually for some reason, there might be a bug that is making these (not set).

      The more likely explanation is SPAM, as described here: https://productforums.google.com/forum/#!msg/analytics/Diifpmuy9tY/FlZb9rQ1tAkJ

      • Alfonso Fernández

        Hi, Mark,
        I am concerned about the following (though I am not sure it is correct): some of the traffic assigned to the (not set) hostname with source/medium = direct/none could come from an https link, because the referrer is missing when an https link points to an http page.
        If I am right, shouldn’t we take that into account when building the filter?
        My apologies if I am missing something
        Alfonso

        • markdhansen

          Even if there is no referrer, there should be a hostname. Hostname is set by the Google Analytics code running on your site. Unless the hits are coming from spammers through the Google Analytics Measurement Protocol.

  • Rdh H

    Hi Mark,
    Great article very helpful! After going through this set up I am still getting hits from (not set). Any ideas?

    • markdhansen

      Which dimension are you seeing at (not set)? Is it the Hostname?

  • Natalie

    This was a big help. Thank you!

    • markdhansen

      Glad it was helpful 🙂

  • nmckean

    In your regex, would megalytic.com not also do the job of picking up all subdomains? So “megalytic.com|blog.megalytic.com|old.megalytic.com|forums.megalytic.com|client.meglytic.com|translate.googleusercontent.com|support.megalytic.com” just needs to be “megalytic.com|googleusercontent” (removing translate. will include cache. also).

    • markdhansen

      Yes, you are right about that!

  • Hannah Bock

    Hi, I tried to exclude the domains, semalt.com|buttons-for-website.com and did the check filter option and it told me it would have no effect on my data, even though I had about 300 visits from buttons-for-websites.com this month. I’ve attached two screen shots, one of my data and one of the filter. Any ideas on what I’m doing wrong? Thanks! Hannah

    • Hannah, I’ve been working on this spam removal problem for several days…patching together information from this Megalytics post by Mark and the Analytics Edge post he references. My best guess is that verify filter says data wouldn’t change for 2 reasons: 1. use “Source” as the dimension not “Referral” for the RegEx match 2. If you recently made a new View, which is the best practice, then you don’t have much data in it yet; is it possible none of your site’s identified spammers have struck since you created the new View?

    • markdhansen

      Hannah, I don’t think the “Filter Verification” is very accurate, honestly. I usually ignore that message, and have found that filters work anyway. And, like L.E. Henry says, if you are setting up a filter on a new view, then there isn’t much data for Google to work with in doing an impact estimate.

  • Fantastic article thank you so much! Also, just came across some “event spam”. Any idea how to get rid of this, or to check to see if it is impacting user engagement levels?

    • markdhansen

      The same techniques described here should work for removing event spam. For the ghost-spam use a Hostname filter. For the real visitor spammers, you need to build an exclusion list of referrers or sources. You can exclude based on event category also.

  • Hi Mark. That’s all good when you have only a couple or say, 10 websites to deal with. You spend an hour every month and clean them up. However, when you have dozens or 100s of sites to cover? It seems Google is not very keen on helping us solve this for now, so I developed a tool you might want to check out if you manage many Google Analytics accounts/properties: https://www.analytics-toolkit.com/auto-spam-filters/

    It’s a fully-automatic, set-and-forget solution to the issue of referrer spam of all kinds.

    • markdhansen

      That’s cool. I hope to have time to try out this tool soon!

      I think Google will eventually solve this problem, but (1) it isn’t easy to solve “in general”; (2) Google is a huge organization with multiple priorities, and it takes time for things to rise to the top of the priority list and get acted on.

  • Great Article!! Thanks for sharing it! We are facing this problem, and hopefully this post helps!!!

    • markdhansen

      Glad you liked it!

  • Hi Mark,

    EXCELLENT information. Thank you!

    I hope you don’t mind but I have added further to your information and created a video with a step-by-step instruction on how to filter out the referrer spam data so you have a clean view of your analytics

    http://seo-company-bristol.com/how-to-remove-google-analytics-spam/

    • markdhansen

      Glad you like this. Awesome that you made a video, I think that will help many people!

  • Bobby Webster

    Hi, wow incredible post! I followed the directions exactly. It appears it is filtering out the ghost referrals, but it appears to be affecting my e-commerce tracking in a weird way?

    I setup one filter to Include Hostname using the following filter pattern:

    http://www.kooksheaders.com|kooksheaders.com|translate.googleusercontent.com|kooks.edreamz.com

    I then setup another filter for non ghost bots to Exclude – Filter Field – Referral like this:

    semalt.semalt.com|buttons-for-website.com|best-seo-offer.com|buttons-for-your-website.com|search.mywebsearch.com|100dollars-seo.com|make-money-online.7makemoneyonline.com

    As you can see, the top is NON- filtered for the month. The bottom is filtered. For some reason the filtered organic traffic is slightly less but direct is more? Another weird thing is there is 1 less transaction for organic, for Direct traffic there is 1 less transaction and for CPC there is 1 less transaction.

    Any insight you could provide would be great. I was really happy to find a solution to this issue, but obviously it’s an issue if it’s reporting different transactions and revenue numbers.

    Bobby

    • markdhansen

      Well, I guess that something in your segment is removing sessions with transactions, which are definitely not SPAM! So, I would try by flipping the segment (turn exclude to include) and see what sources in your segment have those transactions, and then do not exclude those.

      • Bobby Webster

        Hi Mark, thanks for the reply. For some reason I didn’t get the update?
        Anyway, sorry but I don’t understand how that would help? I would understand your point if it was a difference in referral sales, but this is a difference in Organic, direct and PPC transactions? My filter is just excluding referral spam sites like you explained.

  • Kanu Gupta

    great post very helpful!! thanks a lot.

    • markdhansen

      Glad you liked it!

  • John Scheer

    Would love to hire someone to help on this! Going in circles on a couple sites…

  • FYI – A new Ghost Referral (sexyali dot com) is abusing the hostname “translate dot googleusercontent dot com”:

    • markdhansen

      Thanks for letting us know. That is really LOW!

    • Christian Taylor

      I am seeing this too. Have you found a way to filter this out?

      • Just add that hostname in question to your “Exclude” filter (besides semalt, etc.) as mentioned in the article. If it doesn’t work, try using “Campaign Source” instead of “Referral” as the Filter Field.

        • Christian Taylor

          Thanks!

      • markdhansen

        You will need to add sexyali.com to your Excludes list for sources.

  • Hi Mark,

    That’s one solid article! I wonder how I didn’t come across it earlier, as when I was researching the issue back in March most articles I came across were full of misinformation (like using the “referral exclusion” feature, relying solely on htaccess, etc.). Not yours, though!

    I’d like to plug a solution that I’ve developed that applies all filters you’d need and also keeps them up to date, across 100s of web properties with one click of the mouse. Since I manage big accounts with many properties in them, developing such a tool was the only viable solution for me, otherwise I’d have to spend several hours a week dealing with this nuisance.

    I hope you and others in a similar situation will find it useful: https://www.analytics-toolkit.com/auto-spam-filters/

  • I started to put together a list of the non-ghost referral domains and created the RegEx needed to add to the filter. The sheet is here: http://bit.ly/NonGhostSpam
    You can submit new domains for the list here: http://bit.ly/NewDomainForm

  • Jonathan Soifer

    Hi, thank you for this article.

    The “Eliminating Ghost Referrals” part worked out just fine.

    For some reason the “Non-Ghost” part isn’t working for me.

    I’m using this Regex:

    floating-share-buttons.com|site8.free-floating-buttons.com|site7.free-floating-buttons.com|www.Get-Free-Traffic-Now.com|forum.topic65098423.darodar.com|chinese-amezon.com|get-free-social-traffic.com|hongfanji.com|www.event-tracking.com

    Creating a New Filter > Custom > Exclude > Referral

    Do you have any idea why this might not be working?

    • markdhansen

      How do you know that it is not working? Remember, filters do not fix historical data – they only filter new data coming in after the filter is added.

      To get a clean view of your historical data, create a segment using the same Regex; and apply the segment to the view that has the spam.

      • Jan Hansen

        Thanks!

      • Jonathan Soifer

        Sorry it took me so long to get back to you markdhansen.
        It is working perfectly fine. Thank you.

  • Emir Nisic

    Mark, this is great!

    However, I do have one question. After setting up the filter which includes valid hostnames, I am not seeing any conversions? Could this be due to the filter and if so, how would I go about solving that?

    And would having two views make sense? As in, one to track traffic with the filter mentioned above and then the other one where I would track conversions?

    If you have any suggestions I would really appreciate it.

    Thanks!

    • markdhansen

      The filter should not prevent conversions from showing up. Something must be wrong. You shouldn’t need 2 views.

  • Thanks for this nice explainer — and you have to wonder how Google Analytics can go on for such a long time with zero new functionality added to combat this spam. I suppose we get what we pay for?

    • markdhansen

      You are welcome, glad this was helpful. I’m not sure what’s taking Google so long to provide some counter-measures to combat the spam problem. It is very frustrating.

  • ddofborg

    Another interesting way I found how to clean the spam is by adding a screenresolution filter, which should be not (not set).

    Another free tool, which adds 550+ domains plus some extra filters: https://www.adwordsrobot.com/en/tools/ga-referrer-spam-killer

    • markdhansen

      Yep, screenresolution being (not set) is another way to identify spammers. They are getting more clever, though, and starting to add values for that dimension (among others).

  • shashi

    Thanks for such a great post..It helped me a lot..

    • markdhansen

      Awesome! Glad it was useful.

  • John

    Is there a difference when building the spam exclusion filter if you use Filter Field: Campaign Source instead of Referral? Campaign Source seems to be working. I also filter traffic that doesn’t have a screen resolution set and traffic from a few of the more annoying countries. Is this a good idea?

    • markdhansen

      Sure, those kinds of filters can work. The reason that I suggest using hostname is that it is simple. You don’t have to filter out all the sources of spam, just filter on your hostname.

      • Doc Kodam

        So there isn’t a difference between using Referral and using Campaign Source?

        • markdhansen

          Of course there is a difference. Referral will exclude traffic from the specific sites (the referral URLs that you use in the filter – e.g., ‘freestuff.com’) that GA tracks using the Referral dimension; and Campaign Source will exclude traffic from the specific sources (e.g., ‘Facebook’). Sometimes, the same data appears in Referral and Campaign Source – e.g., ‘freestuff.com’ – but not always.

          So, putting the spammer sites in Campaign Source might work, but I think it makes more sense to put them in the Referral.

          • Scott Mulholland

            Mark – I recently added a hostname filter to a new view and when comparing the results for the new view and the RAW data for the last month I noticed that for the hostname filtered view there are 0 referral – the Acq. > All Traffic > Referrals just has 0’s across the board. In the raw data I do see spam referrals but I also do see valid referrals from sites our site is listed on. Any idea why the hostname (which ours is simple, it’s just the url of the main site) are removing all referrals?

          • markdhansen

            Hi Scott – valid referrals should not be blocked by a hostname filter. Check that you are using the correct hostname. It should match whatever is the hostname dimension on the valid referrals you are seeing in the RAW data. On some sites, there can be multiple valid hostnames – for example foobar.com and foobar.org might both be valid domains for the same website. In such cases, you will need to match a regex for the hostname, such as foobar.com|foobar.org

  • chsweb

    It is worth noting more clearly that when you set up the new View with the new “My Hosts” filter that all of your data gets zeroed out. I think this is because the new View & Filter cannot parse historical data, so if you don’t get lots of site visits, when you go to the new View, Google Analytics will show zero results.

    I have seen this happen on two sites, and I thought I did something wrong, so I deleted the View. Has anyone else seen their Google Analytics stats zeroed out after applying the new View & Filter?

    • markdhansen

      Views do not contain historical data older than the date on which they are created. This has nothing to do with the filter. So if you create a view on Jan 2nd, there will be no data in that view prior to Jan 2nd.

      • chsweb

        Yes, true. It would help folks following the steps above if this bit of information was added or clarified.

        At first, I thought something was wrong, because I forgot that new Views in Google Analytics do not contain old data. The steps above show data there, which most people will not see (unless they get lots of hourly visits), which will cause them (like me) to think that something was wrong.

        The article needs to account for “Confused” people, or first-time Google Analytics users.

      • scottmcandrew

        Thanks for writing this and fielding all these questions Mark. Regarding the new View and Data, I wanted one clarification. In your example above, now that the View exists, if one were to add additional domains to the ‘Exclude non-Ghost Referral Spam’ filter on a later date, you would still be able to view data back to when the view was created (you wouldn’t ‘start over’ by appending the filter). Is that correct? (Thanks again!)

        • markdhansen

          The filter will not “start over” – so you don’t lose any data. But, the filter changes will only be applied to new data coming in. So the additional domain exclusions that you add don’t work retroactively on the data that is already in the view. They will only be applied to new data coming in from the time you add them.

        • markdhansen

          Right, you will not start over. But only the new data will have the additional domains excluded. View filters are not retro-active.
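
          A toy model of that behavior, in Python, may make it concrete. Everything in it is hypothetical (the dates, the source names, and the idea of storing one "added on" date per exclusion). It only illustrates the rule stated above: an exclusion affects hits received after it was added, and hits already stored in the view stay as they are.

```python
from datetime import date

# Hypothetical hits arriving in a view: (date received, source).
hits = [
    (date(2016, 1, 2), "google"),
    (date(2016, 1, 3), "spam-a.example"),
    (date(2016, 1, 5), "spam-b.example"),
    (date(2016, 1, 9), "spam-b.example"),
    (date(2016, 1, 10), "google"),
]

# Each exclusion only applies from the day it was added to the filter.
exclusions = {
    "spam-a.example": date(2016, 1, 2),  # in the filter from the start
    "spam-b.example": date(2016, 1, 8),  # appended to the filter later
}

def stored_in_view(hits, exclusions):
    """Return the hits that end up stored in the view."""
    kept = []
    for day, source in hits:
        added = exclusions.get(source)
        if added is not None and day >= added:
            continue  # dropped at processing time
        kept.append((day, source))
    return kept

kept = stored_in_view(hits, exclusions)
for day, source in kept:
    print(day, source)
```

          The spam-b.example hit from before the exclusion was appended remains in the view, which is why a segment is needed to clean up reports on older data.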

      • chsweb

        I get it, but I suspect many people will follow the directions, word for word, multiple times, and end up frustrated that they can’t see data – not knowing how Views work. They will think they are doing something wrong.

        I’m suggesting that a clarification be added to the article to save the next person some time and potential frustration. Google Analytics is not at all intuitive, readers will need a little help here.

        • markdhansen

          OK, I agree, and added a note about this in the paragraph beginning “To start cleaning this up …”. Thanks for the suggestion!

  • In the past I’ve looked into only using hostname data that was the same as my sites and I’ve found this does not work. The problem I have is that a site where I have all the spam bots filtered and can see the real domains that I know are real referrals and the rest is organic traffic, none of the hostnames showing are for my site. What happened was I had 2 weeks of 0 traffic show up in my analytics under this hostname filter. Again the referrals I am getting are from sites I know and have seen the link that is on their page to my site.

    I have seen where on one of my domains that this was showing up in the past but for some reason it no longer displays my domain in the hostname area for new traffic.

    Suggestions?

    • markdhansen

      The hostname dimension tracks the host of the website being visited, not of the referrer – so it should always be the same as your website domain. If you are seeing 0 traffic using a hostname filter, you must be using a hostname that is different from what analytics is recording for your website. What does GA list as the hostname for your site?

    • markdhansen

      How did you implement the host filter? Sometimes people accidentally filter on the referrer – and yet, that will eliminate valid traffic! You need to filter on the hostname dimension in analytics.

  • Jenny Maden

    Hi, am I right in thinking that results will only be seen going forwards as opposed to retrospectively?

    • Jenny, don’t know if you got an answer but, at least in the free version of Google Analytics, filters only work moving forward, not in arrears.

    • markdhansen

      Tim is right – only forwards. Use segments to filter out data retroactively in your reports.

  • BuySaleandTrade

    Thanks for this article! Just done it on mines

    • markdhansen

      Glad it helped!

  • Todd Jamieson

    Curious what some other SEOs/marketers are doing regarding historical data? Are any of you using some sort of master SPAM list to normalize the old historical data. We have one client in particular that once we normalized the 2015 data saw a 28k drop in traffic.

  • nkanalyse

    Hi
    Thanks a lot for the article! But I have been having a similar problem:

    I have been getting a huge amount of direct traffic from the US with ~99% bounce rates, <1s session duration, and valid hostnames (my own domain)! What do you suggest in such a situation? The visits are not from a particular city or service provider, so there is no definite pattern to filter on. Although it showed traits of ghost spam, setting up a filter that only includes visits with a custom dimension set to a certain value (ga('set', 'dimension1', 'xxx')) still let all these visits through! Is there anything I should be checking?

  • Pooja Kshirsagar

    Hi, thanks for the article. Actually, I get spam traffic with 100% bounce rate from a lot of websites, and the number of sessions coming from each of them is between 1 and 3. Now, if I want to block these spam websites, it will take a long time for me to copy and paste every site address and block the domains. Please suggest what I should do.
    Thanks,
    Pooja

    • markdhansen

      As described in the article – start by removing the ghost referrals using a filter that INCLUDES only your valid hostnames. That should eliminate a large percentage of them. After that, do you still have a LOT of spam websites? Too many to manage manually?

  • Lorenzitto

    Awesome, but I really hope there will be an automatic filter creator, because checking every single spam link manually is so damn slow…

  • You just made my life and my analytics a lot easier. Thank you.

    • markdhansen

      Glad that the article was helpful!

  • Hello

    Thank you for the great explanation of fake referral traffic. Two links were sending fake referrals to my website. Your blog helped me remove that fake referral traffic from my website. I hope it will not come back.

    Thank You
    David

    • markdhansen

      Great! Glad that the article helped you out!

  • Thanks, this article was extremely helpful!

    • markdhansen

      You are very welcome!

  • Nazar

    Adding filters to your website can be of great value. Ghost spammers suddenly increase our traffic to a great extent. Thanks for the blog!

    Nazar

    https://www.chetaru.com/

    • markdhansen

      Glad that you found it helpful!

  • Nazar

    Hello! I cannot understand how adding our hostnames filters out the spamming sites. Also, can we restrict the spammers by ticking “exclude all known bots and spiders”? Well, adding filters worked out! I cannot see the spammers anymore in my analytics report.

    Nazar

    https://www.chetaru.com/

    • markdhansen

      Spammers often use a bogus hostname like apple.com, or something else. When you filter to remove hits with invalid hostnames, you remove that type of spam.

  • Nico

    What do you do when the referral domain is Reddit’s real domain? You can’t filter it out without blocking the real one.

    • markdhansen

      Can you filter based on hostname?

      • Nico

        Yeah, but you would be filtering out real Reddit referrals. These people are spoofing the reddit(dot)com domain, and I don’t know how to filter the fake one because the domain looks exactly the same (no fake letters like ɢoogle or lifehacĸer).

        • markdhansen

          Actually, no. Read the part of the article about “Eliminating Ghost Referrals” again. hostname tracks the domain(s) where your site is running. Depending on how you are set up, it might have a few valid values, like ‘mycompany.com’ and ‘www.mycompany.com’. But it should never be equal to some other person’s site (like reddit.com or apple.com).

          The hostname for all legitimate traffic on your site should be ‘nicoblog.org’. Legitimate referrals from Reddit will have source = ‘reddit.com’ and hostname = ‘nicoblog.org’. You should filter out any traffic that has hostname != ‘nicoblog.org’.
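
          A small Python sketch of that distinction, with made-up hits that carry only the two dimensions discussed here. The dictionary layout is an assumption for illustration and is not how Google Analytics stores data.

```python
# Hypothetical hits; only hostname and source are modeled.
hits = [
    {"hostname": "nicoblog.org", "source": "reddit.com"},  # real Reddit referral
    {"hostname": "reddit.com", "source": "reddit.com"},    # ghost hit, spoofed source
    {"hostname": "(not set)", "source": "reddit.com"},     # ghost hit, no hostname
    {"hostname": "nicoblog.org", "source": "google"},      # ordinary search visit
]

VALID_HOSTNAMES = {"nicoblog.org"}

# Include only hits whose hostname is one of the site's own hostnames.
kept = [h for h in hits if h["hostname"] in VALID_HOSTNAMES]

for h in kept:
    print(h)
```

          The real Reddit referral survives because the filter never looks at the source, only at the hostname.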

          • Nico

            Gotcha thanks, i get it now. Great article!

  • I’m surprised Google still hasn’t done anything re this pesky problem, they have been aware of it for some time now…

  • Cardi Chievo

    I have a view affected by ghost referrals that already has an include filter using URL as the filter field (it includes only a section of the website).
    If I add the hostname filter, will the two be treated as AND or OR?
    I fear that when I apply the second filter, Google will overwrite the first one.

    • markdhansen

      You should apply the edited filter to a test view so you can confirm that it works as expected before applying it to data that you care about. Secondly, try the steps that Google recommends for verifying your filter: https://support.google.com/analytics/answer/6046990

      Multiple filters will be treated like AND conditions and applied in order. So, if the first filter passes (is included), but the hostname fails, then the hit will be excluded from the View. This is usually the behavior that you want. For example, I have this as an INCLUDE Hostname filter at the bottom of a list of filters applied to my no-spam view on megalytic.com:

      megalytic|brandapitool|digitalbrandmine

      So, after filtering out a few IPs (in previous steps), then I throw out everything that has an invalid hostname.
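
      The ordering described here can be sketched as a simple chain in Python. The IP address, the URI pattern, and the field names are hypothetical. Only the hostname pattern is taken from this comment. A hit has to survive every filter in the list, which is the AND behavior.

```python
import re

# (kind, field, pattern), applied top to bottom.
FILTERS = [
    ("exclude", "ip", r"^203\.0\.113\.7$"),   # hypothetical internal IP
    ("include", "uri", r"^/blog/"),           # hypothetical site section
    ("include", "hostname", r"megalytic|brandapitool|digitalbrandmine"),
]

def passes(hit, filters):
    """Apply the filters in order; the hit must survive every one."""
    for kind, field, pattern in filters:
        matched = re.search(pattern, hit[field]) is not None
        if kind == "include" and not matched:
            return False
        if kind == "exclude" and matched:
            return False
    return True

hits = [
    {"ip": "198.51.100.1", "uri": "/blog/post", "hostname": "megalytic.com"},
    {"ip": "198.51.100.2", "uri": "/blog/post", "hostname": "apple.com"},
    {"ip": "198.51.100.3", "uri": "/pricing", "hostname": "megalytic.com"},
    {"ip": "203.0.113.7", "uri": "/blog/post", "hostname": "megalytic.com"},
]

results = [passes(h, FILTERS) for h in hits]
print(results)
```

      Only the first hit is kept: the others fail the hostname, the section, or the IP check respectively, and adding the hostname filter does not replace the earlier ones.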

  • FYI, Google changed the method for adding the regex – Now you need to select Hostname, contains, then only enter a single hostname. Then click OR to add more possible hostnames in the same way. It will dropdown an autocomplete based on what you start typing and you can select one of the choices that matches. This method is used instead of putting them all on one line with | separators and escaped periods.
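
    For comparison, here is a rough Python sketch of the two styles side by side: several "contains" conditions joined with OR, and the older single-line pattern with | separators and escaped periods. The hostnames are hypothetical, and the function is only a model of the interface described above, not a Google Analytics API.

```python
import re

# Hypothetical list of valid hostnames for one site.
VALID = ["example.com", "translate.googleusercontent.com"]

def matches_any_contains(hostname, allowed):
    """Model of several 'Hostname contains X' conditions joined with OR."""
    return any(a in hostname for a in allowed)

# Older style: escaped hostnames joined with "|" into one pattern.
pattern = re.compile("|".join(re.escape(a) for a in VALID))

samples = [
    "www.example.com",
    "translate.googleusercontent.com",
    "examplexcom.net",
    "(not set)",
]

for host in samples:
    print(host, matches_any_contains(host, VALID), bool(pattern.search(host)))
```

    With escaped periods the two approaches agree on every sample here, so moving an existing pattern over should mostly be a matter of splitting it at the | separators.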