An introduction to analyzing your email data using SQL and a Redshift datastore.
Using data sent to a Redshift database from Vero, we’ll put together a chart that will help you start to conceptualize how you could build out a detailed and accurate Email Dashboard for your organization.
This will also further the skills you need to analyze your email marketing data when joining it up with the rest of your customer data, using your own data warehouse and SQL charting tools.
A few weeks ago, we talked about how you could graph your Net Churn using SQL. This was to introduce you to the world of data warehousing and detailed analysis using your own data.
The post generated huge interest so this week we’re focusing on email, an area in which we aim to set the benchmark for what’s possible.
Charting open rates for your latest newsletter campaigns
Here’s an example of the chart we’ll be building.
What you’ll need to do this yourself
Here’s the setup we use to get accurate and complete data and to chart it beautifully:
- We use Vero to send all of Vero’s emails. Vero’s campaign management platform helps us create and manage our customer journey so we know who we’re sending what, and when. We send both automated and newsletter email campaigns with Vero.
- Segment’s integration automatically plugs into Vero and allows Vero to send back data on all email deliveries, opens, clicks, unsubscribes and so on into their platform.
- RJ Metrics’ Pipeline product syncs all of our Segment behavioral data to a Redshift database.
- Our data warehouse is hosted on a basic Amazon Web Services (AWS) Redshift cluster.
- We chart our reports using Periscope Data.
- We also use our own database with data about our users (in this case, we won’t be using this data, but we will in the future). Our application uses a mixture of datastores, but a PostgreSQL database contains the most relevant data.
This might sound complex, but each of these steps takes between one and five clicks to get running. Once running, you have a really robust pipeline of user interaction data that you can use to build out sophisticated analyses any time you want.
Building your ‘deliveries’ and ‘opens’ data models
When you’re getting started with your SQL email analysis, the hardest part is getting your data in order.
The first step to creating the chart above is to create two SQL views, one for your deliveries and one for your opens. These views will represent a nice, clean view of all opens and all deliveries in your datastore, transformed and filtered to remove any junk. We’ll then use these views to build out our charts.
Let’s dive into deliveries first.
Using the setup described above, your email data ends up in a Redshift database table via Segment. It is part of a large table, track, representing every user action that you have tracked via Segment (note: this includes any on-site activity, not just email deliveries).
For this analysis, create an SQL view called vero_analysis_deliveries_base. As you can see, this view returns all of the raw fields you need.
create or replace view vero_analysis_deliveries_base as (
  -- Raw fields from the Segment track table, before any filtering
  select
    veroproduction.track.event as event,
    veroproduction.track.user_id as user_id,
    veroproduction.track.context__traits__email as context__traits_email,
    veroproduction.track.properties__campaign_name as properties__campaign_name,
    veroproduction.track.properties__email_subject as properties__email_subject,
    veroproduction.track.properties__email_type as properties__email_type,
    veroproduction.track.timestamp as timestamp
  from
    veroproduction.track
)
This view can now be queried anytime and has only the data you need to create your email analysis charts.
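As a quick sanity check, you could count the rows in the base view by event name. This is only a minimal sketch: it assumes the vero_analysis_deliveries_base view exists as described above, and the event_count alias is just an illustrative name.

```sql
-- Count tracked events by name to see what the base view contains.
-- Assumes vero_analysis_deliveries_base has been created as described above.
select
  event,
  count(*) as event_count   -- illustrative alias
from
  vero_analysis_deliveries_base
group by
  event
order by
  event_count desc;
```

Because the track table holds every Segment event, you should expect to see on-site events listed alongside the email events.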
The next step is to create an SQL view called vero_analysis_deliveries_filtered. In this view, select everything from the previous view and filter out anything you don’t need. Here the filter is small, but it’s worth highlighting this step to help you learn to structure your data better.
create or replace view vero_analysis_deliveries_filtered as (
  select * from vero_analysis_deliveries_base
  where
    vero_analysis_deliveries_base.event = 'Email Delivered'
    and vero_analysis_deliveries_base.timestamp is not null
)
At this point, these views include all the data we need to create our tables, so it’s time to transform and normalize the formats for various columns (such as date columns).
In this case we only want to see Email Delivered events (not the other Segment events), which is why the filtered view excludes everything else.
To transform the data, create a view called vero_analysis_deliveries_transformed. This view will select everything from the vero_analysis_deliveries_filtered view and transform various columns, such as casting the timestamp column into a proper timestamp type.
create or replace view vero_analysis_deliveries_transformed as (
  -- Friendlier column names, with the timestamp cast to a timestamp type
  select
    vero_analysis_deliveries_filtered.user_id as user_id,
    vero_analysis_deliveries_filtered.context__traits_email as user_email,
    vero_analysis_deliveries_filtered.properties__campaign_name as campaign_name,
    vero_analysis_deliveries_filtered.properties__email_subject as campaign_subject,
    vero_analysis_deliveries_filtered.properties__email_type as campaign_type,
    vero_analysis_deliveries_filtered.timestamp::timestamp as date
  from
    vero_analysis_deliveries_filtered
)
That’s it! This is the final deliveries view you need to do the analysis. At this point the data is accurate and clean, and its columns are formatted in a way that makes it easy to query any time.
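The chart also needs a matching view for opens, which this post doesn’t walk through. Here is a rough sketch that collapses the same base, filter and transform steps into a single view. Both the view name vero_analysis_opens_transformed and the 'Email Opened' event name are assumptions made for illustration, so check the event names in your own track table before relying on it.

```sql
-- Hypothetical opens view, mirroring the three deliveries views in one step.
-- Assumptions: the view name and the 'Email Opened' event name are
-- illustrative only; verify them against your own track table.
create or replace view vero_analysis_opens_transformed as (
  select
    veroproduction.track.user_id as user_id,
    veroproduction.track.context__traits__email as user_email,
    veroproduction.track.properties__campaign_name as campaign_name,
    veroproduction.track.properties__email_subject as campaign_subject,
    veroproduction.track.properties__email_type as campaign_type,
    veroproduction.track.timestamp::timestamp as date
  from
    veroproduction.track
  where
    veroproduction.track.event = 'Email Opened'        -- assumed event name
    and veroproduction.track.timestamp is not null
);
```

Splitting this into separate base, filtered and transformed views, as we did for deliveries, works just as well and keeps each step easier to debug.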
Creating your chart
Now that you’ve got everything you need, you can begin charting.
To create the chart mentioned in this post you need to query a table that has three columns:
- deliveries (the number of emails that were delivered)
- opens (the number of emails that were opened)
- open_rate (the percentage open rate)
Here’s the SQL to do this.
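with latest_newsletters as (
  select campaign_name, max(date) as date
  from vero_analysis_deliveries_transformed
  where campaign_type = 'newsletter'
    and campaign_name not like '%CLONE%'
  group by campaign_name
  order by date desc limit 10
),
opens as (
  select campaign_name, count(user_email) as opens
  from vero_analysis_opens_transformed -- assumed name for your opens view
  group by campaign_name
),
deliveries as (
  select campaign_name, count(user_email) as deliveries
  from vero_analysis_deliveries_transformed group by campaign_name
)
select deliveries.campaign_name, deliveries.deliveries, opens.opens,
  (opens.opens::decimal / deliveries.deliveries::decimal) as open_rate
from opens join deliveries
  on opens.campaign_name = deliveries.campaign_name
where opens.campaign_name in (select campaign_name from latest_newsletters)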
This query consists of three key steps.
Firstly, this SQL creates a temporary table that returns only the ten most recent newsletters that you’ve sent. It does this by looking at the deliveries table and, for each campaign (grouped by the campaign name), returning the time of the latest (max) email that was delivered.
You can then order by this field and return just ten campaigns. It’s also important, for this example, to filter down to just the newsletters and exclude other automated campaigns (since you’ll have data on all of your emails from Vero).
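To see this first step in isolation, here is a small self-contained sketch that runs against made-up rows rather than your real views. The campaign names, dates and the 'automated' campaign type are all hypothetical, and limit 2 stands in for the limit of ten described above.

```sql
-- Self-contained sketch with hypothetical rows; limit 2 stands in for ten.
with sample_deliveries as (
  select 'March News' as campaign_name, 'newsletter' as campaign_type,
         '2016-03-01 09:00'::timestamp as date
  union all select 'March News', 'newsletter', '2016-03-01 09:05'::timestamp
  union all select 'April News', 'newsletter', '2016-04-01 09:00'::timestamp
  union all select 'April News CLONE', 'newsletter', '2016-04-02 09:00'::timestamp
  union all select 'Welcome', 'automated', '2016-04-03 09:00'::timestamp
  union all select 'Feb News', 'newsletter', '2016-02-01 09:00'::timestamp
)
select
  campaign_name,
  max(date) as date              -- time of the latest delivery per campaign
from sample_deliveries
where campaign_type = 'newsletter'
  and campaign_name not like '%CLONE%'
group by campaign_name
order by date desc
limit 2;
-- Returns April News, then March News.
```

The cloned campaign and the automated campaign are filtered out before grouping, so they never compete for a place in the list.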
Secondly, this SQL creates temporary tables that count how many deliveries and opens there were per campaign. Each of these temporary tables is separate. They return just the campaign_name and the count for the type of interaction we’re looking at.
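One detail worth knowing about these counts: count(user_email) counts every row, so a recipient who opens the same email twice is counted twice. The sketch below uses hypothetical rows to show the difference between that and count(distinct user_email). The distinct variant is not what this post uses; it is shown only as an option if you would rather chart unique opens.

```sql
-- Hypothetical rows: a@example.com opens 'April News' twice.
with sample_opens as (
  select 'April News' as campaign_name, 'a@example.com' as user_email
  union all select 'April News', 'a@example.com'
  union all select 'April News', 'b@example.com'
  union all select 'March News', 'a@example.com'
)
select
  campaign_name,
  count(user_email) as opens,                  -- every open event
  count(distinct user_email) as unique_opens   -- at most one per recipient
from sample_opens
group by campaign_name
order by campaign_name;
-- April News: opens = 3, unique_opens = 2
-- March News: opens = 1, unique_opens = 1
```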
Finally, we combine the deliveries and opens temporary tables to return the campaign_name, the count of emails delivered and the count of emails opened. As part of this step we can do some maths to return a decimal number representing the percentage of emails that were opened.
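The casts to decimal in that calculation matter, because dividing one integer count by another truncates the result. A minimal illustration, using hypothetical counts of 250 opens and 1,000 deliveries:

```sql
-- Hypothetical counts: 250 opens out of 1000 deliveries.
select
  250 / 1000 as integer_division,               -- 0, because integer division truncates
  (250::decimal / 1000::decimal) as open_rate;  -- 0.25, possibly shown with trailing zeros
```

Multiply open_rate by 100 if your charting tool expects a percentage rather than a fraction.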
That’s it! The final output is a table that you can use to create the graph above.
I charted my example using a bar chart to help compare absolute opens with absolute deliveries. I then used a line chart to highlight large peaks in open rate on a second Y axis. This gives a clearer perspective of each metric.
This post was just an appetizer. This is a very basic analysis of your email data and forms the basis for much more complex, useful and interesting analysis.
If this looks useful to you, check out Vero.
If you are a Vero user and want to learn more about how you can do this, drop us an email. We’d love to talk about it.
Share your thoughts and ideas in the comments below! I’d love to see what sorts of analyses you’ve built!