When Firefox Preview shipped, it also marked the official launch of Glean, our new mobile product analytics & telemetry solution true to Mozilla’s values. This post goes into how we got there and what its design principles are.
In the last few years, Firefox development has become increasingly data-driven. Mozilla’s larger data engineering team builds & maintains most of the technical infrastructure that makes this possible, from the Firefox telemetry code to the Firefox data platform and hosting analysis tools. While data about our products is crucial, Mozilla has a rare approach to data collection, following our privacy principles. This includes requiring data review for every new piece of data collection to ensure we are upholding our principles, even when it makes our jobs harder.
One great success story for us is having the Firefox telemetry data described in machine-readable and clearly structured form. This encourages best practices like mandatory documentation, steers us towards lean data practices and enables automatic data processing, from generating tables to powering tools like our measurement dashboard or the Firefox probe dictionary.
However, we also learned lessons about what didn’t work so well. While the data types we used were flexible, they were hard to interpret. For example, we use plain numbers to store counts, generic histograms to store multiple timespan measures and allow for custom JSON submissions for uncovered use-cases. The flexibility of these data types means it takes work to understand how to use them for different use-cases & leaves room for accidental error on the instrumentation side. Furthermore, it requires manual effort in interpreting & analysing these data points. We noticed that we could benefit from introducing higher-level data types that are closer to what we want to measure — like data types for “counters” and “timing distributions”.
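To make the idea of higher-level data types a bit more concrete, here is a minimal Kotlin sketch. It is not the Glean SDK: the `Counter` and `TimingDistribution` classes, their method names and the bucket boundaries are all assumptions made up for illustration. The point is only that a type which knows what it measures can reject misuse and carries its own interpretation.

```kotlin
// Hypothetical sketch, not the Glean SDK: all names and behavior are assumptions.

/** A counter can only go up, so it cannot be accidentally overwritten or decremented. */
class Counter {
    var value: Long = 0
        private set

    fun add(amount: Long = 1) {
        require(amount > 0) { "counters only accept positive increments" }
        value += amount
    }
}

/** A timing distribution sorts durations into fixed buckets with a known unit (ms). */
class TimingDistribution(private val bucketUpperBoundsMs: List<Long>) {
    // One slot per bucket, plus a final overflow slot.
    val counts = LongArray(bucketUpperBoundsMs.size + 1)

    fun accumulate(durationMs: Long) {
        require(durationMs >= 0) { "durations cannot be negative" }
        val index = bucketUpperBoundsMs.indexOfFirst { durationMs <= it }
        counts[if (index == -1) bucketUpperBoundsMs.size else index]++
    }
}

fun main() {
    val tabsOpened = Counter()
    tabsOpened.add()
    tabsOpened.add(2)
    println(tabsOpened.value) // 3

    val pageLoad = TimingDistribution(listOf(100L, 1_000L, 10_000L))
    pageLoad.accumulate(80)
    pageLoad.accumulate(950)
    pageLoad.accumulate(60_000)
    println(pageLoad.counts.toList()) // [1, 1, 0, 1]
}
```

Compared to a plain number or a generic histogram, neither type leaves it to the analyst to guess the unit or whether a value was meant as a count.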
Another factor was that our mobile product infrastructure was not yet well integrated with the Firefox telemetry infrastructure above. Different products used different analytics solutions & different versions of our own mobile telemetry code, across Android & iOS. Also, our own mobile telemetry code did not describe its metrics in machine-readable form. This meant analysis was potentially different for each product & new instrumentations were higher effort. Integrating new products into the Firefox telemetry infrastructure meant substantial manual effort.
From reviewing the situation, one main question came up: What if we could provide one consistent telemetry SDK for our mobile products, bringing the benefits of our Firefox telemetry infrastructure but without the above-mentioned drawbacks?
In 2018, we looked at how we could integrate Mozilla’s mobile products better. Putting together what we learned from our existing Firefox Telemetry system, feedback from various user interviews and what we found mattered for our mobile teams, we decided to reboot our telemetry and product analytics solution for mobile. We took input from a cross-functional set of people (data science, engineering, product management, QA and others) to form a complete picture of what was required.
From that, we set out to build an end-to-end solution called Glean, consisting of different pieces:
Our main goal was to support our typical mobile analytics & engineering use-cases efficiently, which came down to the following principles:
To make sure that what we build is true to Mozilla’s values, encourages best practices and is sustainable to work with, we added these principles:
One crucial design choice here was to use higher-level metric types for the collected metrics, while not supporting free-form submissions. This choice allows us to focus the Glean end-to-end solution on clearly structured, well-understood & automatable data and enables us to scale analytics capabilities more efficiently for the whole organization.
So how does this work out in practice? To have a more concrete example, let’s say we want to introduce a new metric to understand how many times new tabs are opened in a browser.
In Glean, this starts from declaring that metric in a YAML file. In this case we’ll add a new “counter” metric:
browser.usage:
  tab_opened:
    type: counter
    description: Count how often a new tab is opened. …
    …
Now from here, an API is automatically generated that the product code can use to record when something happens:
import org.mozilla.yourApplication.GleanMetrics.BrowserUsage
…
override fun tabOpened() {
    BrowserUsage.tabOpened.add()
    …
}
That’s it. Everything else is handled internally by the SDK, from storing the data to packaging it up correctly and sending it out.
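For readers who want a rough mental model of that record, store and package flow, here is a small self-contained Kotlin sketch. Everything in it is a hypothetical stand-in: `MetricStore`, `CounterMetric`, `buildPayload` and the payload shape are assumptions for illustration and do not reflect how the Glean SDK is actually implemented or what it sends.

```kotlin
// Hypothetical sketch of a record -> store -> package flow.
// None of these names are the Glean SDK's real internals.

/** In-memory stand-in for the SDK's storage, keyed by "category.name". */
object MetricStore {
    private val counters = mutableMapOf<String, Long>()

    fun addToCounter(id: String, amount: Long) {
        counters[id] = (counters[id] ?: 0L) + amount
    }

    /** Returns the stored values and clears them, as if a payload had been sent. */
    fun snapshotAndClear(): Map<String, Long> {
        val snapshot = counters.toMap()
        counters.clear()
        return snapshot
    }
}

/** What a counter handle might look like to product code. */
class CounterMetric(private val category: String, private val name: String) {
    fun add(amount: Long = 1) = MetricStore.addToCounter("$category.$name", amount)
}

/** Stand-in for the object generated from the YAML declaration. */
object BrowserUsage {
    val tabOpened = CounterMetric(category = "browser.usage", name = "tab_opened")
}

/** Packages the stored counters into a small JSON document (made-up shape). */
fun buildPayload(): String {
    val body = MetricStore.snapshotAndClear().entries
        .joinToString(",") { (id, value) -> "\"$id\":$value" }
    return "{\"counter\":{$body}}"
}

fun main() {
    BrowserUsage.tabOpened.add()
    BrowserUsage.tabOpened.add()
    println(buildPayload()) // {"counter":{"browser.usage.tab_opened":2}}
}
```

The product code only ever touches `BrowserUsage.tabOpened.add()`; storage and packaging stay behind that one call, which is the division of labour described above.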
This new metric can then be unit-tested or verified in real-time, using a web interface to confirm the data is coming in. Once the product change is live, data starts coming in and shows up in standard data sets. From there it is available to query using SQL through Redash, our generic go-to data analysis tool. Other tools can also later integrate it, like the measurement dashboard or Amplitude.
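As a rough idea of what unit-testing such an instrumentation can look like, here is a minimal sketch in plain Kotlin. It does not use the Glean SDK or a test framework: `TestableCounter`, its `valueForTesting()` accessor and `TabManager` are hypothetical names invented for this example, standing in for a generated metric and the product code that records to it.

```kotlin
// Hypothetical sketch: a counter stand-in, product code and a plain-Kotlin check.

class TestableCounter {
    private var value = 0L

    fun add(amount: Long = 1) {
        value += amount
    }

    /** Test-only accessor; the name is made up for this sketch. */
    fun valueForTesting(): Long = value
}

/** Product code under test: records a count each time a tab is opened. */
class TabManager(private val tabOpened: TestableCounter) {
    private val tabs = mutableListOf<String>()

    fun openTab(url: String) {
        tabs += url
        tabOpened.add()
    }
}

fun main() {
    val counter = TestableCounter()
    val manager = TabManager(counter)

    manager.openTab("https://example.com")
    manager.openTab("https://example.org")

    // The instrumentation should have recorded exactly one count per opened tab.
    check(counter.valueForTesting() == 2L) { "expected 2 recorded tab opens" }
    println("tab_opened instrumentation verified")
}
```

In a real project the same assertion would live in the product’s regular test suite, next to the code that triggers the recording.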
Of course there is a set of other metric types available, including events, dates & times and other typical use cases.
Want to see how this looks in code? You can take a look at the Glean Android sample app, especially the metrics.yaml file and its main activity.
The first version of the Glean solution went live to support the launch of Firefox Preview, with initial SDK support for Android applications & a priority set of data tools. iOS support for the SDK is already planned for 2019, as is improved & expanded integration with different analysis tools. We are also actively considering support for desktop platforms, to make Glean a true cross-platform analytics SDK.
If you’re interested in learning more, you can check out:
We’ll certainly expand on more technical details in upcoming blog posts.
While this project took contributions from a lot of people, I especially want to call out Frank Bertsch (data engineering lead), Alessio Placitelli (Glean SDK lead) and Michael Droettboom (data engineer & SDK engineer). Without their substantial contributions to design & implementation, this project would not have been possible.
Introducing Glean — Telemetry for humans was originally published in Georg Fritzsche on Medium, where people are continuing the conversation by highlighting and responding to this story.