You Can Now Filter Bots & Spiders in Google Analytics

While there have been complicated workarounds for discounting bots in Google Analytics, Google has never offered a way for users to filter bots automatically. However, as more bots execute JavaScript, and therefore run the Google Analytics script on the page, some users’ data is becoming skewed, particularly when a site is hit hard by a bot gone berserk.
Fortunately for advertisers, Google is finally giving us the option.  Google Analytics has announced a new way to filter bot traffic automatically with a new setting in the management user interface.
The new option will automatically remove the “known bots” from reporting in Google Analytics.  This is still turned off by default, so users will need to turn the option on in order to utilize it.  However, there are reports that it doesn’t work retroactively.
It can be applied on a site by site basis, so you can enable filtering of bots on some sites, while leaving bots included in reporting on other sites.
When viewing the site you wish to filter bots on, click the “Admin” link at the top of the page and in the View column, select “View Settings”.

The option is now available for all Google Analytics users.
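
To make the idea of a per-site "known bots" setting concrete, here is a minimal Python sketch. It is not how Google Analytics implements the feature: the marker list, the `user_agent` field, and the function names are all assumptions made for illustration.

```python
# Hypothetical user-agent markers; the actual "known bots" list used by
# Google Analytics is not given in this article.
KNOWN_BOT_MARKERS = ("googlebot", "bingbot", "spider", "crawler")


def is_known_bot(user_agent):
    """Return True if the user agent contains any of the assumed bot markers."""
    ua = user_agent.lower()
    return any(marker in ua for marker in KNOWN_BOT_MARKERS)


def filter_hits(hits, exclude_bots):
    """Drop bot hits only when the per-site setting is switched on."""
    if not exclude_bots:
        return list(hits)
    return [hit for hit in hits if not is_known_bot(hit["user_agent"])]


hits = [
    {"page": "/blog/", "user_agent": "Mozilla/5.0 (Windows NT 6.1) Firefox/30.0"},
    {"page": "/blog/", "user_agent": "Mozilla/5.0 (compatible; Googlebot/2.1)"},
]

print(len(filter_hits(hits, exclude_bots=False)))  # setting off: every hit is kept
print(len(filter_hits(hits, exclude_bots=True)))   # setting on: the bot hit is dropped
```

Because the flag is passed per call, one site can filter bots while another keeps them in its reports, which mirrors the site-by-site behavior described above.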

How Google Analytics Works

 Information

Understanding the Google Analytics architecture, meaning how it collects data, processes data, and creates reports, is the key to understanding many of the advanced topics that we will discuss later in another article.
Google Analytics is no longer a simple “hit collector” for websites, but rather an information aggregation system that collects data from standard websites, mobile websites, Adobe Air applications, and iPhone and Android apps. Google has progressively added more data collection methods as technology has driven new and different ways of distributing content to people.

Data Collection and Processing

Google Analytics uses a common data collection technique called page tags. A page tag is a small piece of JavaScript that you must place on all the website pages you want to track. We affectionately call this code the Google Analytics Tracking Code, or GATC for short. If you do not place the code on a page, Google Analytics will not track that page.
The data collection process begins when a visitor requests a page from the web server. The server responds by sending the requested page back to the visitor’s browser. As the browser processes the data, it contacts other servers that may host parts of the requested page, like images, videos, or script files. This is the case with the GATC.
When the visitor’s browser reaches the GATC, the code begins to execute. During execution, the GATC identifies attributes of the visitor and her browsing environment, such as how many times she’s been to the site, where she came from, her operating system, her web browser, etc.
After collecting the appropriate data, the GATC sets (or updates, depending on the situation) a number of first-party cookies, which are discussed later in this section. The cookies store information about the visitor. After creating the cookies on the visitor’s machine, the tracking code waits to send the visitor data back to the Google Analytics server.
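
As a rough illustration of what such a cookie can hold, the Python sketch below splits a `__utma` value of the form that appears in this article's sample request. The field order used here (domain hash, visitor ID, first, previous, and current visit timestamps, visit count) is an assumption that the article does not state.

```python
from datetime import datetime, timezone

# Sample __utma value in the format used by this article's request example.
utma = "32856364.1914824586.1269919681.1269919681.1269919681.1"

# Assumed field order; not confirmed by the article.
domain_hash, visitor_id, first, previous, current, visit_count = utma.split(".")

# The last field is read as the number of visits so far.
print("visits so far:", int(visit_count))

# The timestamp fields are read as Unix epoch seconds.
first_visit = datetime.fromtimestamp(int(first), tz=timezone.utc)
print("first visit (UTC):", first_visit.date())
```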
While the data is collected and the cookies are set, the browser is actively downloading a file named ga.js from a Google Analytics server. All of the code that Google Analytics needs to function is contained within ga.js.
Once the ga.js file is loaded in the browser, the data that was collected is sent to Google in the form of a page view. A page view indicates that a visitor has viewed a certain page on the website. There are other types of data, like events and e-commerce data, that can be sent to Google Analytics.
The page view is transmitted to the Google Analytics server via a request for an invisible GIF file named __utm.gif. Each piece of information the GATC has collected is sent as a query-string parameter in the __utm.gif request, as shown below:
=cutroni.com&utmcs=UTF-8&utmsr=1152x720&utmsc=24-bit&utmul=en-us
&utmje=1&utmfl=10.0%20r42&utmdt=Analytics%20Talk%20by%20Justin%20Cutroni
&utmhid=465405990&utmr=-&utmp=%2Fblog%2F&utmac=UA-XXXX-1
&utmcc=__utma%3D32856364.1914824586.1269919681.1269919681.1269919681.1%3B%2B
__utmz%3D32856364.1269919681.1.1.utmcsr%3D(direct)%7Cutmccn%3D(direct)%7Cutmcmd%3D(none)%3B&gaq=1
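
A request like this can be taken apart with any query-string parser. The Python sketch below uses a shortened query string assembled from parameters in the samples in this article. The readings of `utmhn`, `utmdt`, and `utmp` as host name, page title, and page path are assumptions that the article does not spell out.

```python
from urllib.parse import parse_qs

# Shortened query string built from parameters in this article's samples.
query = (
    "utmhn=cutroni.com&utmcs=UTF-8&utmsr=1152x720&utmsc=24-bit&utmul=en-us"
    "&utmdt=Analytics%20Talk%20by%20Justin%20Cutroni&utmp=%2Fblog%2F&utmac=UA-XXXX-1"
)

# parse_qs returns a list of values per key; keep the first value of each.
params = {key: values[0] for key, values in parse_qs(query).items()}

print(params["utmhn"])  # assumed to be the host name
print(params["utmdt"])  # assumed to be the page title, percent-decoded
print(params["utmp"])   # assumed to be the page path, percent-decoded
```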


When the Google Analytics server receives this page view, it stores the data in some type of temporary data storage. Google has not indicated exactly how the data is stored, but we know that there is some type of storage for the raw data. Think of this data storage as a large text file or a log file.
Each line in the log file contains numerous attributes of the page view sent to Google.
This includes:
• When the data was collected (date and time)
• Where the visitor came from (referring website, search engine, etc.)
• How many times the visitor has been to the site (number of visits)
• Where the visitor is located (geographic location)
• Who the visitor is (IP address)
After the page view is stored in the log file, the data collection process is complete. The data collection and data processing components of Google Analytics are separate. This ensures Google Analytics will always collect data, even if the data processing engine is undergoing maintenance.
The next step is data processing. At a regular interval, approximately every 3 hours, Google Analytics processes the data in the log file, although processing time does fluctuate. Google Analytics does not process data in real time. While data is normally processed about every 3 hours, it’s not normally complete until 24 hours after collection, because the entire day’s data is reprocessed after it has been collected.
Be aware that this processing behavior can lead to inaccurate intraday metrics. It is best to avoid using Google Analytics for real-time or intraday reporting.
During processing, each line in the log file is split into pieces, one piece for each attribute of the page view. Here’s a sample log file; this is not an actual data storage line from Google Analytics, but a representation:


65.57.245.11 http://www.google.com [28/Jan/2014:19:05:06 -0600]
"GET __utm.gif?utmwv=4.6.5&utmn=1881501226&utmhn=cutroni.com&utmcs=UTF-8
&utmsr=1152x720&utmsc=24-bit&utmul=en-us&utmje=1&utmfl=10.0%20r42
&utmdt=Analytics%20Talk%20by%20Justin%20Cutroni&utmhid=465405990&utmr=-
&utmp=%2Fblog%2F&utmac=UA-XXXX-11
&utmcc=__utma%3D32856364.1914824586.1269919681.1269919681.1269919681.1%3B%2B
__utmz%3D32856364.1269919681.1.1.utmcsr%3D(direct)%7Cutmccn%3D(direct)%7Cutmcmd%3D(none)%3B&gaq=1"
"__utma=32856364.1914824586.1269919681.1269919681.1269919681.1; __utmb=100957269;
__utmc=100957269; __utmz=100957269.1164157501.1.1.utmccn=(direct)|utmcsr=(direct)|utmcmd=(none)"


While most of this data is difficult to understand, a few things stand out. The date and time (Jan 28, 2014 at 19:05:06) and the IP address of the visitor (65.57.245.11) are easily identifiable.
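
The splitting step can be sketched in Python as follows. The line is a shortened version of the representation above, and the column layout assumed by the regular expression (IP, URL, bracketed timestamp, quoted GET request) is a guess made for illustration, not Google's actual storage format.

```python
import re
from datetime import datetime
from urllib.parse import parse_qs

# Shortened version of the representative log line.
line = (
    "65.57.245.11 http://www.google.com [28/Jan/2014:19:05:06 -0600] "
    '"GET __utm.gif?utmhn=cutroni.com&utmul=en-us&utmp=%2Fblog%2F"'
)

# Assumed layout: IP, URL, [timestamp], "GET __utm.gif?query".
pattern = re.compile(
    r'^(?P<ip>\S+) (?P<url>\S+) \[(?P<time>[^\]]+)\] '
    r'"GET __utm\.gif\?(?P<query>[^"]*)"'
)

match = pattern.match(line)
fields = {
    "visitor_ip": match["ip"],
    "url": match["url"],  # the article does not say what this column means
    "timestamp": datetime.strptime(match["time"], "%d/%b/%Y:%H:%M:%S %z"),
}

# Each query-string parameter becomes its own field.
fields.update({key: values[0] for key, values in parse_qs(match["query"]).items()})

print(fields["visitor_ip"])
print(fields["timestamp"].isoformat())
print(fields["utmp"])
```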
Google Analytics turns each piece of data in the log file record into a data element called a field. Later, the fields will be transformed into dimensions. For example, the IP address becomes the Visitor IP field. The city that the visitor is visiting from becomes the Visitor City field and the City dimension.
It’s important to understand that each page view has many, many attributes and that each one is stored in a different field or dimension. Later, Google Analytics will use fields to manipulate the data and dimensions to build the reports.
After each line has been broken into fields and dimensions, the configuration settings are applied to the data. 

This includes features like:
• Site search
• Goals and funnels
• Filters
Finally, after all of the settings have been applied, the data is stored in the database.
Once the data is in the database, the process is complete. When you (or any other user) request a report, the appropriate data is retrieved from the database and sent to the browser.
Once Google Analytics has processed the data and stored it in the database, it can never be changed. This means historical data can never be altered or reprocessed. Any mistakes made during setup or configuration can permanently affect the quality of the data. It is critical to avoid configuration mistakes, as there is no way to undo data issues.
This also means that any configuration changes made to Google Analytics will not alter historical data. Changes will only affect future data, not past data.
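
The toy Python sketch below illustrates why a configuration change only affects future data. The in-memory `database` list, the `process` function, and the path-based exclude filter are hypothetical stand-ins, not Google Analytics internals.

```python
# Hypothetical in-memory stand-in for the report database.
database = []


def process(raw_hits, exclude_paths):
    """Apply the current filter settings, then store the surviving hits for good."""
    for hit in raw_hits:
        if hit["path"] not in exclude_paths:
            database.append(dict(hit))


# Day 1 is processed before any filter is configured.
process(
    [{"day": 1, "path": "/blog/"}, {"day": 1, "path": "/admin/"}],
    exclude_paths=set(),
)

# A filter is added afterwards; it applies only to data processed from now on.
process(
    [{"day": 2, "path": "/blog/"}, {"day": 2, "path": "/admin/"}],
    exclude_paths={"/admin/"},
)

# The day 1 "/admin/" row is still present: stored data is never reprocessed.
for row in database:
    print(row)
```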

Reports


When you log in to Google Analytics to view a report, Google Analytics creates that report in real time. Reports are created by comparing a dimension, like the Visitor City, to a numerical piece of information called a metric. Metrics include common web analytics numbers like visits, page views, bounce rate, conversion rate, revenue, etc. When viewed alone, a metric provides a site-wide total for that metric. But when viewed compared to a dimension, the metric represents the total for that specific dimension. For example, a website may have a conversion rate of 2.87%. The metric in this case is conversion rate and the value is 2.87%. However, if you view conversion rate based on the City dimension, Google Analytics will display the conversion rate for each city in the database.
Each row in such a report is a different value for the City dimension. Notice that there are many columns, or metrics, in the report. Google Analytics can associate many different metrics with a single dimension. Almost every report is created in the same manner: Google Analytics displays various metrics for a given dimension. If you are interested in a certain metric that Google Analytics does not include in a report, you can create a custom report to display that metric for the dimension.
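
The relationship between a metric and a dimension can be sketched with a few lines of Python. The visit records and city names below are made-up sample data, and `conversion_rate` is a hypothetical helper, not a Google Analytics function.

```python
from collections import defaultdict

# Made-up processed visits with one dimension (city) and a conversion flag.
visits = [
    {"city": "Boston", "converted": True},
    {"city": "Boston", "converted": False},
    {"city": "Denver", "converted": False},
    {"city": "Denver", "converted": False},
]


def conversion_rate(rows):
    """Conversions divided by visits, as a percentage."""
    return 100.0 * sum(row["converted"] for row in rows) / len(rows)


# Metric viewed alone: a single site-wide value.
print(f"All visits: {conversion_rate(visits):.2f}%")

# Metric viewed against the City dimension: one value per city.
by_city = defaultdict(list)
for row in visits:
    by_city[row["city"]].append(row)

for city, rows in sorted(by_city.items()):
    print(f"{city}: {conversion_rate(rows):.2f}%")
```

Adding another metric to the report would amount to computing one more column for each city from the same grouped rows.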