Fighting the Fog - Machine Learning versus Unconsented Data

William Woods | 9 May 2025

The Data Dilemma We Face 

Digital marketing attribution and web analytics have never been more powerful, or more frustrating. As privacy rules tighten and cookie tracking gets sidelined, those of us working with analytics are left squinting at our dashboards wondering, “Where did half my tracking data go?” 

The main issue? Missing channel attribution. When users opt out of tracking, the ‘where-did-they-come-from’ bit disappears. Was it Email, Organic, Paid Social? No idea. And that missing piece can break your entire attribution model, leading you to overfund underperformers or overlook your most effective channels. 

Google’s Consent Mode offers some help, filling gaps using statistical modelling. But it’s a broad-brush fix. Sometimes, you need precision, and that’s where custom machine learning for marketing attribution comes in. It provides a smarter, tailored way to reconstruct the full user journey, even when traditional tracking fails. 


The Shrinking Data Window 

Let’s talk scale. Your starting point might be 100% of user traffic, but as tracking restrictions kick in, the amount of observable data starts shrinking rapidly: 

  • GDPR, CCPA, and similar regulations reduce visibility by around 20%, leaving you with just 80%. 
  • Adblockers knock that down further to 64%. 
  • Apple ITP, Firefox, and other privacy-first browsers can drop you to 45%. 
  • By the time we hit Chrome’s expected 2025 updates, you might only see 19% of your original traffic. 
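Those figures compound; each restriction chips away at whatever the previous one left behind. Here's a quick back-of-the-envelope sketch, where the per-layer retention rates are assumptions chosen only to roughly match the percentages above, not measured values.

```python
# Illustrative retention rates per privacy layer (assumed, not measured),
# picked to roughly reproduce the figures above.
retention_by_layer = {
    "GDPR / CCPA consent": 0.80,
    "Adblockers": 0.80,
    "ITP and privacy-first browsers": 0.70,
    "Chrome's 2025 changes": 0.42,
}

visible = 1.0
for layer, retention in retention_by_layer.items():
    visible *= retention
    print(f"After {layer}: {visible:.0%} of traffic still observable")
# -> roughly 80%, 64%, 45% and 19%
```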

The consequences are serious: unreliable KPIs in analytics tools, difficulties in attributing ROI, weakening retargeting performance, and the erosion of data-driven marketing altogether. 


Teaching a Model to Connect the Dots 

What if you could train a model to learn from user journeys where the channel is known, and then use that knowledge to predict the missing bits in journeys where the channel is blank? 

That’s the idea behind this project. I created a machine learning model that learns to recognise patterns in both the summary of a journey and the step-by-step flow of events. Think of it like training a detective: it spots patterns in known cases and uses that to solve new mysteries. 

Unlike approaches that lean on statistical averages, the model learns patterns across user behaviour, campaign metadata, and temporal sequences. That said, it reflects the distribution of channels seen in training, so more common channels will naturally have stronger learned representations. 


Feeding the Model 

Everything starts with BigQuery. Specifically, I'm working with Google Analytics 4 (GA4) data exported to BigQuery. 
 
The GA4 BigQuery export contains detailed event-level data from your website or app without relying on cookies for tracking. But what makes this data particularly powerful for modelling isn't just the standard GA4 parameters; it's the custom dimensions that businesses can define and pass with each event. 
 
For example, an e-commerce site might pass custom dimensions for product price brackets, while a content site might track content topics or reading time thresholds. When these custom dimensions are incorporated into the model alongside standard GA4 parameters, they create more accurate channel predictions by adding business context to behavioural signals. 
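As a rough illustration, here's how that raw material might be pulled out of the export with the BigQuery Python client. The project and dataset names, the date range, and the 'price_bracket' custom dimension are placeholders; swap in your own.

```python
from google.cloud import bigquery

client = bigquery.Client()

# Pull event-level rows from the GA4 export, including one custom dimension
# stored in event_params (the key name here is just an example).
query = """
SELECT
  user_pseudo_id,
  event_timestamp,
  event_name,
  traffic_source.medium AS first_touch_medium,
  (SELECT value.string_value
   FROM UNNEST(event_params)
   WHERE key = 'price_bracket') AS price_bracket
FROM `your-project.analytics_123456789.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20250101' AND '20250331'
"""

events = client.query(query).to_dataframe()
```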

I group those events by user and line them up in the order they happened. For each journey, I create two views: 

  • Aggregated features: a summary snapshot of behaviour across the journey 
  • Sequential features: the journey in full, step-by-step, to catch patterns over time 

I convert all of this into dense numerical arrays using a handy scikit-learn tool called ‘DictVectorizer’, which translates a mix of categorical and numerical features into a standardised format the model can process. This effectively turns complex user journey data into a structured numerical matrix suitable for training. 
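To make that concrete, here's a minimal sketch of how the two views might be built from an events DataFrame like the one pulled above. DictVectorizer handles the aggregated view; the sequential view uses zero-padded one-hot event vectors so the padded steps can be masked later. Feature names like 'price_bracket' are placeholders.

```python
import numpy as np
from sklearn.feature_extraction import DictVectorizer

# Order each user's events chronologically and split into journeys.
events = events.sort_values(["user_pseudo_id", "event_timestamp"])
journeys = list(events.groupby("user_pseudo_id"))

# Aggregated view: one summary dict per journey.
agg_dicts = []
for _, journey in journeys:
    features = {"n_events": len(journey)}
    # How often each event type occurred, e.g. "event_page_view": 7
    for name, count in journey["event_name"].value_counts().items():
        features[f"event_{name}"] = int(count)
    # A custom dimension as a categorical feature (placeholder name).
    if journey["price_bracket"].notna().any():
        features["price_bracket"] = journey["price_bracket"].mode().iat[0]
    agg_dicts.append(features)

# DictVectorizer turns the mixed categorical/numeric dicts into a dense matrix.
vectoriser = DictVectorizer(sparse=False)
X_agg = vectoriser.fit_transform(agg_dicts)

# Sequential view: one-hot event vectors per step, zero-padded to MAX_STEPS.
# All-zero rows are what the Masking layer will skip later.
MAX_STEPS = 50
event_codes = {name: i for i, name in enumerate(events["event_name"].unique())}
X_seq = np.zeros((len(journeys), MAX_STEPS, len(event_codes)))
for j, (_, journey) in enumerate(journeys):
    for t, name in enumerate(journey["event_name"].iloc[:MAX_STEPS]):
        X_seq[j, t, event_codes[name]] = 1.0
```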
 
By using both the standard GA4 export and your unique custom dimensions, the model effectively learns the specific patterns of your business and customers, not just generic browsing behaviours. 


Under the Bonnet 

Now to dive a bit into the ‘technicals’. The model has two parallel branches. An aggregated branch captures high-level frequency signals (e.g. how often a user interacted with a campaign or used a specific device), while the sequential branch preserves event order to pick up temporal dependencies (e.g. campaign -> browse -> purchase). 

  1. Aggregated features branch: goes through a Dense (fully connected) layer with 128 neurons and a ReLU activation. This distils the whole journey into a kind of behaviour summary. 
  2. Sequential features branch: starts with a Masking layer to skip over padded steps, then feeds into an LSTM (Long Short-Term Memory) layer with 128 units. LSTMs are brilliant at learning from sequences, perfect for time-based user journeys. 

I then combine the outputs of both branches (with a Concatenate layer) and send them through a final Dense layer with a softmax activation, which outputs a probability for each channel; the highest-probability channel becomes the prediction. 
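For the technically minded, here's roughly what that dual-branch architecture might look like in Keras (the functional API). The number of channels is an assumption; everything else mirrors the description above.

```python
from tensorflow.keras import layers, Model

n_agg_features = X_agg.shape[1]                        # from the feature sketch above
max_steps, n_step_features = X_seq.shape[1], X_seq.shape[2]
n_channels = 6                                         # e.g. Email, Organic, Paid Social... (assumed)

# Branch 1: the aggregated journey summary.
agg_in = layers.Input(shape=(n_agg_features,), name="aggregated")
agg_branch = layers.Dense(128, activation="relu")(agg_in)

# Branch 2: the step-by-step sequence, with all-zero padded steps masked out.
seq_in = layers.Input(shape=(max_steps, n_step_features), name="sequential")
masked = layers.Masking(mask_value=0.0)(seq_in)
seq_branch = layers.LSTM(128)(masked)

# Merge both views and classify the channel.
merged = layers.Concatenate()([agg_branch, seq_branch])
channel_probs = layers.Dense(n_channels, activation="softmax")(merged)

model = Model(inputs=[agg_in, seq_in], outputs=channel_probs)
model.summary()
```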


Training the Model 

I use the middle portion of each journey for labelling and feature extraction. This helps avoid edge effects like noisy landing or exit events and focuses the model on the most stable behavioural signals. To keep the evaluation honest, I use an 80:20 train-validation split. 

  • Optimiser: Adam (a favourite in the ML world) 
  • Learning rate: 0.001 
  • Loss function: categorical cross-entropy 
  • Epochs: 10 
  • Batch size: 32 
  • Safety net: early stopping (so I don’t overcook it) 
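Putting those settings together, a minimal compile-and-fit sketch (carrying on from the architecture sketch above) might look like this. The y_channels labels are a placeholder for integer-coded channels from the journeys where the channel is known.

```python
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.utils import to_categorical

# Integer channel labels for journeys with a known channel (placeholder name).
y = to_categorical(y_channels, num_classes=n_channels)

model.compile(
    optimizer=Adam(learning_rate=0.001),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)

model.fit(
    [X_agg, X_seq], y,
    validation_split=0.2,   # the 80:20 train-validation split
    epochs=10,
    batch_size=32,
    callbacks=[EarlyStopping(patience=2, restore_best_weights=True)],  # the safety net
)
```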

Every neuron learns to combine features, apply weights, and spot patterns. The LSTM layer especially learns how behaviour unfolds over time, which is super handy for predicting what a journey should look like, even when some data is missing. 


Imputation in Action 

Once the model’s trained, it’s ready to do its real job: predicting the missing channel info. 

It takes incomplete user journeys, runs the same feature extraction process, and produces a likely channel classification. The script then writes that back into the data, so you end up with a far more complete view. 
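In code, the imputation step is little more than a predict-and-write-back pass. The channel labels and the 0.9 confidence cut-off below are illustrative; the important part is that journeys with missing channels go through exactly the same feature pipeline as the training data.

```python
import numpy as np

# X_agg_missing / X_seq_missing: features for journeys with no channel,
# built with the same pipeline as the training data.
probs = model.predict([X_agg_missing, X_seq_missing])
best = probs.argmax(axis=1)
confidence = probs.max(axis=1)

# Map class indices back to channel names (placeholder labels) and only
# keep predictions the model is confident about.
channel_names = np.array(["Email", "Organic", "Paid Social",
                          "Paid Search", "Direct", "Referral"])
imputed_channel = np.where(confidence >= 0.9, channel_names[best], "unattributed")
```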

Using event-level data from our BigQuery instance on hookflash.co.uk, I trained a model that predicts the primary channel with 99.2% accuracy. And that’s based on a relatively small dataset. 

For websites with larger volumes, from thousands to hundreds of thousands of users with rich user journeys, that accuracy can climb toward 99.99%. In real terms, it means you could recover around 40% of missing channel data with a high degree of confidence. 

Basically, it’s like putting on glasses: the blurry stuff suddenly becomes much clearer. 


Why It Matters 

If your channel data is riddled with holes, your campaign reporting (and budget allocation) is basically running on assumptions. This model brings back clarity. 

What makes it powerful is that it’s not a one-size-fits-all fix. It learns from your data and understands the nuances of your user journeys. Plus, because it uses both high-level summaries and fine-grained sequences, it balances context with detail. 

It’s explainable, reliable, and designed to give you confidence, not just in your models, but in your marketing decisions. 


In a Nutshell 

I built a deep learning, dual-input model that helps fight back against data loss by imputing missing channel values in user journeys. By combining a zoomed-out and zoomed-in view of behaviour, the model creates predictions that are both accurate and insightful. 

With this approach, you can: 

  • Sharpen your attribution accuracy 
  • Recover lost insights 
  • Make better budget decisions 

All without having to sacrifice user privacy or your sanity. 


Fancy Giving It a Go? 

If this piques your interest and you want to try it on your own data, get in contact with our team.
Even in a privacy-first world, there’s still room for clever, ethical insights. 

Your analytics deserve better than missing data. 


For the Technically Curious 

If some of the technical terms caught your interest, or you want to understand the deeper mechanics behind the model, here are a few curated resources: 

  • What’s the difference between a batch and an epoch? 

Clarifies how training iterations are structured, which is useful for anyone trying to grasp what “10 epochs with batch size 32” actually means. 
https://machinelearningmastery.com/difference-between-a-batch-and-an-epoch/
 

  • What does ReLU mean? 

Explains why this activation function is often used in neural networks to introduce non-linearity. 
https://machinelearningmastery.com/rectified-linear-activation-function-for-deep-learning-neural-networks/
 


Want to have a chat? 

Chat through our services with our team today and find out how we can help.

Contact us