Spam is back in Google Analytics. A little nostalgia for Universal Analytics? I’ve come across it on several projects; on some it is barely visible, on others much more so. Mostly, it inflates the numbers of smaller projects.
A bit of background. This is a sample from a developer debug property where I filtered out most of the human traffic; only about three of my own visits are left in there. So what you’re realistically seeing is 99% spam.
- Who’s affected? Most GA4 properties.
- Does everyone have to deal with this? Probably not. On large GA4 properties with a lot of traffic you won’t even notice it, because it makes up only a fraction of a percent of the traffic. But it’s good to have clean data.
- What about small sites? It’s worse there; it can skew the data and look odd in reports.
- Is this a big problem yet? No, but it’s getting worse. It’s good to know about it. There’s no need to get too stressed about it.
What does this type of spam look like in GA4?
A warning in advance: there are several types of spam from different creators, and for the examples I have chosen just the two that I encounter most often.
What do you see in the following image?
Spam in GA4 traffic report (session scope)
- Traffic in the Traffic acquisition report has no source
- Not a lot of “users”.
- Engaged sessions: “0”
- It has no sessions, “0” (session level), which is due to the ignore_referrer = true parameter
- It makes a very noticeable spike in the data.
Spam in GA4 export to Google BigQuery
The user’s first_visit / session_start carries a source that points at the spam target. Here that is the site “urlumbrella.com”, which tries to sell you exactly this: fake traffic sent to Google Analytics 4, for bosses who don’t care about real performance, only numbers. Spamming all GA4 properties doubles as its promotion. This also explains what you will see in the later images: they try to partly emulate a user and fake a long time on the page.
In Google BigQuery you also don’t see any of the other parameters that you normally send with your data, which implies that this traffic never went through your website or server, but was sent directly to GA4. This means that this spam can’t be eliminated by protection on the server / hosting, in Google Tag Manager or in server-side GTM. Most likely a crawler (a bot that crawls the internet) visits your site once, and the spam then arrives over the following days.
The reason I don’t have this data in my main GA4 property is that I already recognized the initial visit as spam and diverted it into this GA4 developer property. The other interesting thing is that the referral is not even a real referral, because source and medium are sent as hardcoded event parameters.
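The “missing parameters” observation can be turned into a rough check on exported rows. This is only a minimal sketch: it assumes the usual shape of the GA4 BigQuery export, where `event_params` is a list of key/value records, and the parameter names `page_type` and `site_version` are hypothetical stand-ins for whatever your own tagging always sends.

```python
from typing import Any

# Hypothetical custom parameters that this site's own tagging adds to every hit.
EXPECTED_PARAMS = frozenset({"page_type", "site_version"})


def param_keys(event: dict[str, Any]) -> set[str]:
    """Return the parameter keys of one exported event row."""
    return {param["key"] for param in event.get("event_params", [])}


def looks_injected(event: dict[str, Any],
                   expected: frozenset[str] = EXPECTED_PARAMS) -> bool:
    """True when none of the parameters the site normally sends is present."""
    return expected.isdisjoint(param_keys(event))


real_hit = {
    "event_name": "page_view",
    "event_params": [
        {"key": "page_location", "value": {"string_value": "https://example.com/"}},
        {"key": "page_type", "value": {"string_value": "home"}},
    ],
}
spam_hit = {
    "event_name": "page_view",
    "event_params": [
        {"key": "page_location", "value": {"string_value": "https://example.com/"}},
        {"key": "source", "value": {"string_value": "urlumbrella.com"}},
        {"key": "medium", "value": {"string_value": "referral"}},
    ],
}

print(looks_injected(real_hit), looks_injected(spam_hit))
```

A hit flagged this way is only a candidate: a tagging gap on your own site would produce the same signature, so treat it as a hint, not a verdict.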
Spam in GA4 report from User acquisition (Source of first user visit)
Because the ignore_referrer = true parameter is used, this spam data is only visible at the first user source / medium level.
Spam in GA4 event report
In the following image you can see a lot of page_view events, around 30-32. Also interesting is the number of first_visit and session_start events, which comes out at 1.1 where it should normally be 1.
Spam in GA4 site report
Here you can see that all of the traffic went to the homepage of the site. The important thing here is the time on page, which is practically ten minutes. That is something that repeats across multiple sites.
Spam in GA4 demographics report
A bit of demographics… the traffic is spread across different states, with very similar times.
Spam in GA4 technology report
And then there are the beautifully balanced numbers.
- Operating systems evenly split. Their versions are different.
- All from desktop
- Browsers with practically the same distribution, also with unique versions.
- The resolution is close to the classic distribution in normal traffic.
Another type of referral spam in GA4.
For example, traffic from:
- news.grets.store / referral
- static.seders.website / referral
- rida.tokyo / referral
- info.seders.website / referral
- kar.razas.site / referral
- trast.mantero.online / referral
- game.fertuk.site / referral
- ofer.bartikus.site / referral
- garold.dertus.site / referral
The difference is that these are not as smart: they don’t fake time on page or a large number of pageviews. As a bonus, they also send a “scroll” event. They are also visible in the session source / medium view of the traffic report.
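If you want to flag these sources in exported data or in a report dump, a simple host match is enough to start with. A minimal sketch, assuming you maintain the list yourself; the function name is my own, and the list contains only the hosts mentioned in this article.

```python
# Hosts named in this article; extend the set with whatever shows up in your data.
SPAM_HOSTS = frozenset({
    "urlumbrella.com",
    "news.grets.store",
    "static.seders.website",
    "rida.tokyo",
    "info.seders.website",
    "kar.razas.site",
    "trast.mantero.online",
    "game.fertuk.site",
    "ofer.bartikus.site",
    "garold.dertus.site",
})


def is_spam_source(source: str, spam_hosts: frozenset[str] = SPAM_HOSTS) -> bool:
    """True when the source equals a listed host or is a subdomain of one."""
    host = source.strip().lower().rstrip(".")
    return any(host == spam or host.endswith("." + spam) for spam in spam_hosts)


print(is_spam_source("news.grets.store"), is_spam_source("google"))
```

Keep in mind that a list like this is always one step behind; it helps with cleaning reports after the fact, not with prevention.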
There will surely be plenty of other spam from different creators, but these two types are probably the best known.
How to prevent referral spam in GA4?
Theory
The chance that you will stop the attacker’s robot from entering the website the first time is very small: it only has to change its IP address and use a headless browser (this can be detected, but it is harder to determine that it is not a real person), and it will look like a completely normal user. Therefore, there is a very high chance that the attacker’s bot will learn your GA4 measurement ID (“G-something” + domain). As far as is currently known, this happens when you measure in Google Consent Mode v2 advanced mode (you collect statistical data even if the visitor does not agree to storing cookies in the browser). Good news for people who do not collect statistical data without consent (the script is only loaded after consent): they will probably avoid this spam :), because the robot will not find their GA4 measurement ID. If you don’t have a consent bar at all, then you should be ashamed and you deserve the spam 😉.
When it sends the spam, the attacker’s robot does not visit your website at all; it only sends simulated traffic directly from its own server to Google Analytics. So there is no way to prevent it: neither exclusions nor protection on the server / hosting or in GTM itself will help.
And I use that in my defense. When I modify the event, or set the traffic_type parameter via GTM, the attacker does not send this change along, so I can then exclude all traffic where this value is missing. As far as I know, there is currently no other defense than this procedure. That is also down to the limits of the GA4 settings, where only traffic_type, and no other parameter, can be used in a data filter.
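The idea can be pictured with a tiny model. This is only an illustration of the logic, not how GA4 implements its data filters: events are plain dictionaries of parameters, and the marker value “my_domain” is just an example of a value that only your own tagging knows.

```python
from typing import Iterable

Event = dict[str, str]


def passes_include_only(params: Event, marker: str) -> bool:
    """An event survives only when it carries the expected traffic_type value."""
    return params.get("traffic_type") == marker


def split_events(events: Iterable[Event], marker: str) -> tuple[list[Event], list[Event]]:
    """Separate events into (kept, dropped) the way an include-only filter would."""
    kept: list[Event] = []
    dropped: list[Event] = []
    for params in events:
        (kept if passes_include_only(params, marker) else dropped).append(params)
    return kept, dropped


events = [
    {"event_name": "page_view", "traffic_type": "my_domain"},  # tagged by the site
    {"event_name": "page_view", "source": "urlumbrella.com"},  # sent straight to GA4
]
kept, dropped = split_events(events, marker="my_domain")
print(len(kept), len(dropped))
```

The spammer can copy your measurement ID, but as long as it does not also copy the marker, its hits land in the dropped group.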
Method:
Warning
If you do not check the setup properly before finally activating the filter, a bad implementation can permanently delete the new data sent to Google Analytics 4.
Creating a custom traffic_type variable.
1. This is done in GA4 > Administration > Property settings > Data display > Custom definitions.
2. Click the “Create custom dimension” button.
3. Enter “traffic_type” as the event parameter.
4. As the name, enter “traffic_type_cd”. I add _cd at the end so that I can tell it apart later.
5. And then “Save”.
Filling the variable in traffic_type with your value.
- Version in GA4 UI admin (Recommended)
- GA4 UI admin > Data collection and modification > Data Streams > Select your data stream > Events > Modify events > Create
a. Set it up according to the picture.
b. Create a name of your own, for example “traffic_type antispam”.
c. Enter “traffic_type” in the parameters, set the condition to “does not match regular expression” and enter “..*” as the value.
d. At the bottom, under the modified parameters, enter “traffic_type” and a value such as “my_domain” or anything else you come up with.
- Version in GTM. This implementation has one problem, but also an advantage. Advantage: if you have a lot of GA4 properties, you don’t need to set each one up manually. Disadvantage: GTM cannot read the value set by the IP address filter in GA4, so that value is overwritten when you set the traffic_type parameter in GTM. I do not recommend this version if you use the IP address exclusion implemented in the GA4 admin UI.
- GA4 + GTM combo: in GA4, use the modification that fills in the empty traffic_type, and let GTM only exclude the things you don’t want. Personally, I don’t recommend excluding data; I prefer to forward the things I know I don’t want directly to a GA4 developer property (a separate GA4 property).
- GA4 + sGTM version. You can modify “traffic_type” in server-side GTM, which avoids the problem that traffic_type is normally visible to the user. Here, no one but you knows about the modification made in sGTM, because from there the data goes directly to Google’s servers. This is pretty much the ultimate protection. It’s not for everyone, but if someone regularly spams you with data, it’s a solution. However, there is a risk that your budget will be spent on spam, so it is good to cut such spam before it reaches sGTM.
- Advanced risks. If you use the Measurement Protocol or measure mobile applications, you have to add traffic_type there manually, otherwise this data will be discarded because it will not pass the filter.
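To see why the “does not match regular expression” + “..*” condition from the GA4 UI version only touches hits without a traffic_type, here is a small model of that rule. It is a sketch under two assumptions of mine: a missing parameter behaves like an empty string, and the function name and “internal” value are illustrative. The pattern means “one character, then anything”, so every non-empty value matches and is left alone.

```python
import re

# The pattern from the rule: any single character followed by anything.
NON_EMPTY = re.compile(r"..*")


def apply_modify_rule(params: dict[str, str], marker: str = "my_domain") -> dict[str, str]:
    """Fill traffic_type with the marker only when it is missing or empty."""
    current = params.get("traffic_type", "")
    if NON_EMPTY.fullmatch(current) is None:
        return {**params, "traffic_type": marker}
    return dict(params)


print(apply_modify_rule({"page_location": "https://example.com/"}))
print(apply_modify_rule({"traffic_type": "internal"}))
```

This is also why the GA4 UI version plays well with IP-based internal traffic: a value that is already filled in matches the pattern, so the rule does not overwrite it.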
Creating a filter that allows only data marked with the appropriate traffic_type and turning on testing
- GA4 UI admin > Data collection and modification > Data filters > Create Filter > Internal Traffic
- According to the picture
- In “Data filter name”, enter a name for the filter
- Set “Include only”
- Set traffic_type to your value from the previous steps, in my case “my_domain”
- Set the filter state to “Testing”
- Save.
Please wait 3-4 days before checking
It is very important to collect data to verify the correctness of the settings.
Check that everything works as it should.
- Go to GA4 UI > Reports > Engagement > Events
- Set the date range so that it covers at least two full days since the initial setup.
- Click “+” and select “Test data filter name”
- Check the result
- Right above the table, in “Rows per page:”, select 250
- It should then look like this: for every event in the first column, the name of the filter should be filled in as the value in the second column. If so, everything is OK.
- If a value is missing in the second column, I recommend checking whether it is spam (which is correct) or whether you are missing a measurement somewhere (Measurement Protocol, mobile application, etc.). If there is an unmarked event and it is not spam, then something is wrong and you should not activate the antispam filter.
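With many events, the check above is easier on an exported copy of the table. A minimal sketch, assuming you have the rows as (event name, test data filter name) pairs; how a missing value is displayed is my assumption, so both an empty string and “(not set)” are treated as missing here.

```python
from typing import Optional

# Assumed spellings of "no value" in the second column.
MISSING_VALUES = {"", "(not set)"}


def unmarked_events(rows: list[tuple[str, Optional[str]]]) -> list[str]:
    """Return the event names that appear at least once without a filter name."""
    names: set[str] = set()
    for event_name, filter_name in rows:
        if filter_name is None or filter_name.strip() in MISSING_VALUES:
            names.add(event_name)
    return sorted(names)


rows = [
    ("page_view", "antispam"),
    ("scroll", "(not set)"),
    ("session_start", ""),
    ("purchase", None),
    ("page_view", "antispam"),
]
print(unmarked_events(rows))
```

Every name this returns needs a manual decision: spam is fine, anything else means the marker is missing somewhere in your own measurement.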
If everything is OK, we turn on the spam filter
Go back to GA4 UI admin > Data collection and modification > Data filters > Select your filter, switch it from Testing to Active and save. This turns the filter on.
Automation is not yet possible.
So far, the GA4 Admin API has no functions that would allow you to “modify an event” and add a traffic filter.
Is it enough?
Against exactly that type of spam attack, yes, but it is generally good to have additional layers of protection:
1. In both GTM and sGTM you can use the bot detection variable template from the great Markus Baersch.
What a lot of people don’t do is block marketing bots from running unnecessarily and thus get cleaner data.
2. Detection of headless browsers. I’m researching this, and it’s quite challenging to figure out what’s a robot and what’s a real person.
3. User testing via Google reCAPTCHA. It will tell you in retrospect whether it was a bot.
4. You can detect spam traffic on the server and pass the result to GTM via the dataLayer (a dataLayer push placed before the GTM snippet, without the “event” parameter, so that it arrives before consent is evaluated).
5. Allow measurement only on the given domain. This cuts off translations from Google Translate and similar services, but even that can be fixed with a little effort.
I am quite a believer in not throwing away such traffic if possible, but redirecting it to the developer profile of Google analytics 4. The main advantage of such an approach is that it can happen that sometimes you accidentally mark real traffic as spam, but if the data is only redirected to the GA4 dev property, then you still have data.
6. Retrospectively clean your data … in GA4, exclude data from important reports via segments. It’s more work, but you can save the data retroactively. Spam data will remain there, but it is not necessary to report it ;).
7. Data cleaning in Google BigQuery etc. This is a special category where you need to know what fake traffic looks like, because that will save you a lot of time spent on meaningless numbers during analyses.
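For point 5 in the list above, the check itself is simple; the work is in deciding which hosts count as yours. A minimal sketch, with a hypothetical function name and `example.com` standing in for your domain. The translated-page host in the example is only illustrative of a proxy domain that would fail the check.

```python
from typing import Iterable
from urllib.parse import urlsplit


def is_own_host(page_location: str, allowed_domains: Iterable[str]) -> bool:
    """True when the page URL is on an allowed domain or one of its subdomains."""
    host = urlsplit(page_location).hostname  # lowercased, None when absent
    if host is None:
        return False
    return any(host == d or host.endswith("." + d) for d in allowed_domains)


allowed = {"example.com"}
print(is_own_host("https://www.example.com/pricing", allowed))
print(is_own_host("https://example-com.translate.goog/pricing", allowed))
```

If you do want to keep translated pages, add the proxy host to the allowed set explicitly instead of loosening the match.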
What about the other guides? Sorry, but they don’t work.
Excluding the referral in the GA4 data stream only masks this fake traffic; it does not clean it. It won’t exclude it: the spam will still be there, you just won’t recognize it. In addition, it is not even a real referral. A fake referral is sent via source and medium (something like a UTM that says in text that it is a referral, even though it is not), so you have no way to exclude it anyway and it will not work.
Marking traffic through the exclusion of IP addresses only works for known IP addresses; you will not catch anything new. You will always be one step behind in spam protection, so it is not a permanent cure.
Where can SPAM harm you?
- It can throw off, for example, your advertising campaigns on Google Ads, Meta and other systems.
- It can change your A/B test results. Here you should clean the data of spam, super users, developer test scripts, availability checks etc. before evaluation.
- For data analysts… it increases the cost of data processing and storage. It impairs the purity and credibility of the results.
Have you already found spam in Google Analytics 4?
Let me know in the comments on social networks. The same goes for questions or other experiences with GA4 spam.