Sampling in Google Analytics occurs when the program takes a subset of your website traffic data and reports on the trends found in that sample. Website owners and marketers rely on this data to see how well their website is performing and to analyze any changes that may have happened to the website.
How is Google Analytics Data Sampled?
When you look at a Standard Report, you will see unsampled data. That means if you look at “All Traffic” during any given period of time, you will see the correct numbers as reported by Google. However, once you apply a filter, custom report, or segment to this data (organic traffic, blog traffic, pages with more than 1,000 visits, etc.), your data becomes sampled. You will know your data is sampled when you see the yellow box in the upper right-hand corner, next to the date range.
By default, this sample number is set by Google at 250,000 visits (not pageviews). However, Google does allow you to adjust this number up to 500,000. If you have a large amount of data, keep in mind this can really slow down the processing time.
When Sampling Can Be a Problem
For small websites with minimal traffic, sampling isn’t much of an issue. However, for high-traffic websites (think of Amazon, which most likely receives over 250,000 visits in less than a day), the percentage of visits included in the sample gets lower and lower. When that percentage is particularly small, as in less than 2%, you’ll begin to see similar numbers across the board. This doesn’t only happen with such low percentages, either. You can see the same trend with higher percentages as well (I’ve seen it with percentages up to 75%).
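To make the shrinking percentage concrete, here is a rough sketch of the arithmetic. It assumes the sampled share is simply the sample size divided by the total visits in the date range, capped at 100%; Google's actual sampling logic may differ, and the function name `sampling_rate` is made up for illustration.

```python
def sampling_rate(total_visits: int, sample_size: int = 250_000) -> float:
    """Approximate share of visits a sampled report is based on.

    Assumption: the share is sample_size / total_visits, capped at 1.0.
    The 250,000 default is the sample size mentioned in the article.
    """
    if total_visits <= 0:
        raise ValueError("total_visits must be positive")
    return min(1.0, sample_size / total_visits)


if __name__ == "__main__":
    # Hypothetical traffic levels, from a small site to a very large one.
    for visits in (100_000, 1_000_000, 12_500_000):
        print(f"{visits:>12,} visits -> {sampling_rate(visits):.1%} of visits sampled")
```

Under this simplification, raising the sample size to 500,000 doubles the sampled share for any site above that threshold, which is why the slider helps but does not remove the problem for very large sites.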
Another issue is that different date ranges produce different results. For example, when I look at just November data for this particular profile, I see 1,084 visits.
Now, let’s say I want to compare December to November.
Excuse me, but how did my November visits go from 1,084 to only 332? The first report is based on a sample of 70% of visits. The comparison covers a larger date range, so it is based on a sample of only 40% of total visits.
Generally, when you pull the data the same way each time, the numbers remain consistent. Every time I look at November data only, I see 1,084 visits. However, the numbers do change slightly when I select a different primary dimension. And the difference between the two ways of selecting a date range? For this profile, comparing single-month data yields a 1,095% increase (from 1,084 to 12,963 visits). Comparing to the previous period directly in Analytics yields a 4,374% increase (from 332 to 14,854 visits). Quite the difference.
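The two growth figures come from the same percent-change formula applied to different sampled totals. A minimal sketch, using only the visit counts quoted above (the helper name `percent_change` is illustrative):

```python
def percent_change(old: float, new: float) -> float:
    """Percentage change from old to new: (new - old) / old * 100."""
    if old == 0:
        raise ValueError("old value must be non-zero")
    return (new - old) / old * 100


# November vs. December, each month pulled on its own.
single_month = percent_change(1_084, 12_963)

# The same two months, compared directly in one larger date range.
compared_in_ga = percent_change(332, 14_854)

if __name__ == "__main__":
    print(f"Single-month pulls:  {single_month:,.1f}% increase")
    print(f"Direct comparison:   {compared_in_ga:,.1f}% increase")
```

The formula is identical in both cases. Only the inputs differ, and those differ because each report was built from a different sample.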
So How Does the Issue Get Fixed?
In short, unless you want to give Google $150,000 a year for a Premium account with unsampled data, you will always receive sampled data. For the 99% of us who choose not to pay this fee, there are other things you can try:
- Use smaller date ranges – A smaller date range contains fewer visits, so the sample covers a higher percentage of the total. Keep this data in a spreadsheet (whether for a client report or your own records) and rely on this more accurate data instead.
- Use a Standard Report when possible – This includes real time reports, audience reports, traffic source reports, content reports, and conversion reports.
- Always look at data the same way – If you look at the total visits under “All Pages” with a specific filter, make sure to always use that same view. Comparing data from different sections of Google Analytics will most likely yield different data.
- Create different profiles with filters – For SEOs, applying an “Organic Traffic” advanced segment or looking at the Organic traffic under Keywords means you’ll have sampled data. If you create a new profile with a filter that only captures organic visits, the Standard Reports in this profile will remain unsampled (any additional segments or filters on top of this will still yield sampled data).
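For the first tip, one way to keep date ranges small is to break a long reporting period into calendar months and pull each month separately. The sketch below only generates the date ranges; it does not talk to Google Analytics, and the function name `month_chunks` and the example year are hypothetical.

```python
from datetime import date, timedelta


def month_chunks(start: date, end: date) -> list[tuple[date, date]]:
    """Split [start, end] into consecutive ranges that never cross a month boundary."""
    if start > end:
        raise ValueError("start must not be after end")
    chunks = []
    cursor = start
    while cursor <= end:
        # Find the first day of the month after the cursor's month.
        if cursor.month == 12:
            next_month = date(cursor.year + 1, 1, 1)
        else:
            next_month = date(cursor.year, cursor.month + 1, 1)
        # The chunk ends on the last day of the month or on `end`, whichever is earlier.
        chunk_end = min(end, next_month - timedelta(days=1))
        chunks.append((cursor, chunk_end))
        cursor = next_month
    return chunks


if __name__ == "__main__":
    # Example period with an assumed year: November through December.
    for first, last in month_chunks(date(2012, 11, 1), date(2012, 12, 31)):
        print(f"Pull report for {first} to {last}")
```

Each range can then be entered in the date picker on its own and the result recorded in the spreadsheet, instead of relying on one heavily sampled report for the whole period.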