Web analytics

Reflection on the future of web analytics.

A thought exercise on where web analytics will go over the next few years.

Data measurement in the browser

Data in users’ browsers will not be collected synchronously [1]. Instead, it will be stored for the user in private, sandboxed storage at the subdomain or domain level. This could already be built on a PWA today. The data will be sent as a package only after the user consents to sending it (opt-in), and only to the given domain, not to advertising systems. If the user does not approve sending the measured data within a week, all of it will be deleted. Even before that, the package will be limited in size, because it is kept on the user’s device and has to save space.

Often, only changes to the data will be sent. It makes little sense to send the number of displayed colors or the monitor resolution with every analytics hit. Data will typically be divided into static and dynamic at each level. For example, a package containing the resolution will be sent once together with a checksum; as long as the checksum does not change, the data will not be sent again. Page information works the same way: detailed data about the page is sent once, and further data about what the user did on that page refers to it only by checksum.

The size of the data package will be limited, and all parameters and other personal data will be cropped (PII restrictions, and anonymization by truncating the characteristics that make a visit unique). Websites with stronger security settings will be granted a larger size limit. Goals and other parts of the analytics will often be pre-calculated on the user’s device, so that only the final summary is sent and not the underlying data. Device requirements will go up, but the volume of data sent will go down.
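The idea of a local, consent-gated buffer with checksummed static data can be sketched roughly as follows. This is only an illustration under assumptions: the class `LocalBuffer`, its method names, and the 16-character SHA-256 checksum are hypothetical, and a real implementation would live in the browser rather than in Python.

```python
import hashlib
import json
import time
from typing import Optional

WEEK_SECONDS = 7 * 24 * 60 * 60  # retention window mentioned in the text


def checksum(payload: dict) -> str:
    """Short, stable checksum of a payload (key order does not matter)."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


class LocalBuffer:
    """Hypothetical per-domain buffer kept on the user's device."""

    def __init__(self, domain: str):
        self.domain = domain
        self.pending = []              # (timestamp, hit) pairs not yet sent
        self.queued_checksums = set()  # static packages already queued or sent

    def record(self, static: dict, event: dict, now: Optional[float] = None) -> None:
        """Store a hit locally; full static data is attached only once."""
        now = time.time() if now is None else now
        digest = checksum(static)
        hit = {"static_ref": digest, "event": event}
        if digest not in self.queued_checksums:
            hit["static"] = static
            self.queued_checksums.add(digest)
        self.pending.append((now, hit))

    def purge_if_expired(self, now: Optional[float] = None) -> bool:
        """Delete everything if the oldest unsent hit is older than a week."""
        now = time.time() if now is None else now
        if self.pending and now - self.pending[0][0] > WEEK_SECONDS:
            self.pending.clear()
            self.queued_checksums.clear()
            return True
        return False

    def flush(self, consent: bool, now: Optional[float] = None) -> list:
        """Return the package for self.domain, or nothing without opt-in."""
        self.purge_if_expired(now)
        if not consent:
            return []
        package = [hit for _, hit in self.pending]
        self.pending.clear()
        return package
```

In this sketch the second hit with the same resolution and color depth carries only the `static_ref` checksum, and nothing leaves the buffer until `flush` is called with `consent=True`.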

Data for advertising systems

If there is data for advertising systems, it will be sent by the server (server-side measurement) and not by the user. The owner of the advertising system will no longer exchange anything directly with the user (sandbox). It will only hold data bound to a key that belongs to the user, who can discard that key at any time. If the user is willing, however, they can identify themselves with the same key on another website (see the Privacy Sandbox description and discussion). Importantly, today’s server-side measurement will be limited as well. At the moment it is mostly a way of “bypassing” restrictions, because cookies are set via HTTP headers rather than created through the JavaScript API in the browser. It is already possible to measure data for advertising systems without JavaScript, the user has almost no defense against it, and therefore it will be cut back. Interestingly, I think first-party cookies will remain, but nobody outside the domain will be able to reach them.
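A rough sketch of the key idea, assuming a random token as the key: the advertising system stores data only under the key, the user can present the same key on several websites, and discarding the key breaks the link. The names `UserKeyring` and `AdSystemStore` are hypothetical and do not describe any existing Privacy Sandbox API.

```python
import secrets


class UserKeyring:
    """Hypothetical user-side holder of the advertising key."""

    def __init__(self):
        self.key = secrets.token_hex(16)

    def discard(self) -> None:
        """Throw the key away; a fresh key cannot be linked to the old data."""
        self.key = secrets.token_hex(16)


class AdSystemStore:
    """Hypothetical advertising-system store: data is bound to keys only."""

    def __init__(self):
        self._profiles = {}  # key -> list of (site, segment)

    def add(self, key: str, site: str, segment: str) -> None:
        """Called by the website's server, never by the user's browser."""
        self._profiles.setdefault(key, []).append((site, segment))

    def lookup(self, key: str) -> list:
        return list(self._profiles.get(key, []))


user = UserKeyring()
store = AdSystemStore()
store.add(user.key, "shop.example", "running-shoes")
store.add(user.key, "news.example", "sports")   # same key, another website
known_before = store.lookup(user.key)           # two entries
user.discard()
known_after = store.lookup(user.key)            # empty: the link is gone
```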

Personal data is not free.

The user gets extra benefits for sending data.
Users will not get a discount right after turning on measurement; they will first have to accumulate enough data as “credit” to exchange for the discount. A user with more data will be valued more. There could be credit multipliers along the lines of: I have LinkedIn and Facebook connected to this advertising ID, so I provide additional data. Let’s call it a “loyalty system of advertising with elements of gamification” 🙂. The GDPR says that no advantage may be given in exchange for personal data. But looked at realistically, social networks already practically sell personal data through targeting and earn more the more they know about you. Of course, it depends on the point of view, and perhaps the “loyalty program” will turn out to be the loophole.
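The accumulated credit with a multiplier could work something like this. All numbers here (the threshold of 100 and the bonus of 0.5 per connected account) are made up for illustration, as are the class and method names.

```python
from dataclasses import dataclass, field

DISCOUNT_THRESHOLD = 100.0  # hypothetical credit needed for one discount
CONNECTED_BONUS = 0.5       # hypothetical extra multiplier per linked account


@dataclass
class CreditAccount:
    connected_accounts: set = field(default_factory=set)
    credit: float = 0.0

    @property
    def multiplier(self) -> float:
        """Each connected account (e.g. LinkedIn, Facebook) raises the rate."""
        return 1.0 + CONNECTED_BONUS * len(self.connected_accounts)

    def add_data(self, data_points: int) -> None:
        """Sending data earns credit, scaled by the current multiplier."""
        self.credit += data_points * self.multiplier

    def redeem_discount(self) -> bool:
        """A discount is available only once enough credit has accumulated."""
        if self.credit < DISCOUNT_THRESHOLD:
            return False
        self.credit -= DISCOUNT_THRESHOLD
        return True
```

With these assumed values, a user who sends 40 data points gets no discount yet; after connecting two accounts, the next 40 data points count double and the discount becomes available.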

Enhanced data

In addition to this high-quality but uncertain personal data linked to advertising (JavaScript in the browser), there will be data about server-side use of the site, accurate but truncated in detail. This data will not depend on JavaScript or the browser; in practice it will be an analysis of “logs”. There will also be statistics on how many “data enhanced” users we have.
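A minimal sketch of such log analysis, assuming access logs in the common/combined log format. The truncation rules used here (dropping the query string, zeroing the last octet of an IPv4 address) are one possible choice, not something the text prescribes.

```python
import ipaddress
import re
from collections import Counter

# Matches the start of a common/combined log format line.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3})'
)


def truncate_ip(ip: str) -> str:
    """Keep only the /24 network for IPv4 (or /48 for IPv6)."""
    addr = ipaddress.ip_address(ip)
    prefix = 24 if addr.version == 4 else 48
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


def strip_query(path: str) -> str:
    """Drop query parameters, which may carry personal data."""
    return path.split("?", 1)[0]


def summarize(lines) -> dict:
    """Count successful page requests and approximate distinct visitors."""
    pages = Counter()
    visitors = set()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match or match["status"] != "200":
            continue
        pages[strip_query(match["path"])] += 1
        visitors.add(truncate_ip(match["ip"]))
    return {"pageviews": dict(pages), "approx_visitors": len(visitors)}
```

Because the addresses are truncated before counting, two visitors from the same /24 network collapse into one, which is the kind of accuracy that gets traded away for privacy.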

Consequences of measurement restrictions

Overall, this will speed up websites and cut back what third-party ads and tools can do. Everything will happen on the first-party page: external content will be served via a CNAME record, and each tool will be hosted on a subdomain of the website. There will be tools for automated management of CNAME records, so that only “connectors” need to be plugged in.

Buying and selling data for ad targeting

The next step could be buying and selling data. Ad-targeting data will be bought and sold. The owner of the site and of the advertising link will be able to choose whether to sell all the data to the advertising system, or whether the advertising system may use the data only within the owner’s domain and nowhere else. A large company, on the other hand, might combine, say, 5 large websites into one advertising package and target internally, while nobody else, including the owner of the advertising system, will be able to use data about these users elsewhere. I can even imagine this data being priced differently: a person with a higher income will be worth more to the advertising system, will have a higher customer lifetime value, and will be more lucrative for advertising than an internet user with a lower income. This will create an imbalance: people with less money and people who value privacy will simply not receive rewards from the advertising system and will generally pay more. Yes, it is quite utopian, but for how much money would people trade their personal data? How much is it worth to you that the company hosting your email on a large server does not use its content for advertising purposes, so that it stays just yours?

The data will be hosted on the website domain and not in the advertising system

Advertising data will be stored in a sandbox (Docker) on the CNAME domain, together with a database and an API that the advertising system can talk to at the level at which users are targeted. The advertising system will not hold the data itself; the data will be decentralized across these sandboxes. Each sandbox will be hosted in the cloud, on a subdomain of the site.
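As a sketch of what “communicating at the targeting level” might mean, the sandbox below answers questions about segments but never hands out its raw records. The class `TargetingSandbox`, its methods, and the allow-list of advertising systems are assumptions made for illustration.

```python
class TargetingSandbox:
    """Hypothetical first-party sandbox hosted on a subdomain of the site."""

    def __init__(self, allowed_ad_systems):
        self._allowed = set(allowed_ad_systems)
        self._segments = {}  # segment name -> set of user keys, never exported

    def add_user(self, user_key: str, segment: str) -> None:
        """Called by the website itself when it classifies a user."""
        self._segments.setdefault(segment, set()).add(user_key)

    def _check(self, ad_system: str) -> None:
        if ad_system not in self._allowed:
            raise PermissionError(f"{ad_system} may not query this sandbox")

    def is_in_segment(self, ad_system: str, user_key: str, segment: str) -> bool:
        """Targeting-level answer: yes or no, without exposing the profile."""
        self._check(ad_system)
        return user_key in self._segments.get(segment, set())

    def segment_size(self, ad_system: str, segment: str) -> int:
        """Aggregate answer: how many users could be targeted."""
        self._check(ad_system)
        return len(self._segments.get(segment, set()))
```

The advertising system can decide whether to show an ad to a given key, or estimate reach, while the list of users stays inside the sandbox on the site's own subdomain.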

It will be expensive

So far, the owners of Google Analytics accounts have “paid” with their data (GA and Google Ads / DoubleClick), and thanks to its good integration with the Google Ads advertising system, GA could be called an acquisition tool for Google Ads. Google is fed 96% by advertising. If these integrations and data submissions are cut out, GA will become a paid tool. Google has no obligation to give other companies a free tool to make money with. That is why it will earn money by hosting such advertising systems in the Google cloud. Website owners will again be pushed into using its services, but this time through easy integration. You pay, you connect a CNAME domain, and everything works. And every month you pay according to the amount of data. There will of course be a very small free allowance of data, and beyond that the monthly fee will rise with the number of users.
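The pricing model described above (a small free allowance, then a fee that grows with the number of users) might look like this. The allowance of 1,000 users and the price of 5.0 per started block of 1,000 users are invented numbers, not anything Google has announced.

```python
FREE_USERS = 1_000          # hypothetical free monthly allowance
PRICE_PER_1000_USERS = 5.0  # hypothetical price per started block of 1,000 users


def monthly_fee(users: int) -> float:
    """Fee for one month: free up to the allowance, then per started block."""
    billable = max(0, users - FREE_USERS)
    blocks = -(-billable // 1000)  # ceiling division without importing math
    return blocks * PRICE_PER_1000_USERS
```

Under these assumptions a site with 1,000 monthly users pays nothing, one with 1,001 users pays for one block, and one with 25,000 users pays for 24 blocks.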

Open-source managed by large companies

Overall, this data collection system, meaning the API between the user (browser), the web server, and then the advertising system, should ideally be open source, so that all advertising systems are equal when it comes to data collection, with a unified approach to collecting data and sending it to advertising systems. It will then be easier to handle displaying the ad to the end user. Large trusted systems will not have to display the ad themselves; they can simply take money for selling targeting, or a percentage of the ad price, even if the ad is sold by another ad system. As a result, advertising space can be sold more freely through several different systems, and advertising will be able to tell more of a story, as programmatic / RTB can already do.

Of course, the question is, what do you think?

This reflection is based partly on what can already be measured and partly on current and likely future restrictions.

The article was originally written as a Facebook post; the post and its discussion are here.