This project contains code related to running PPC analysis. As of this writing, that includes only AdWords.
The analysis is broken down into 4 distinct steps:
- Get the data
- Format and enrich the data (including the creation of an
event log
akaelog
) - Create data frames for tabulation/visualization
- Create visualizations - tables (.csv files) and graphical plots
An event log can be thought of as a chronological ledger. Whenever an event happens, it is marked in the ledger. A good example of this is a transaction log whenever someone orders from Power Supply:
Date | Transaction Amount | Recurring? | Meals | User ID | User Name | Campaign ID | Campaign Name |
---|---|---|---|---|---|---|---|
2016-01-01 06:04:15 UTC | $50.00 | Yes | Paleo - 5 Lunches, 5 Dinners | 1 | Alice | 1 | DC Paleo Performers |
2016-02-01 10:29:12 UTC | $50.00 | Yes | Paleo - 5 Lunches, 5 Dinners | 1 | Alice | 1 | DC Paleo Performers |
2016-02-05 00:10:40 UTC | $200.00 | No | Mixitarian - 3 Lunches, 3 Dinners | 2 | Bob | 4 | LA Paleo Performers |
In the above transaction log, every time a single person takes an action, it's logged as an event with the exact second the event occurred. Pretty basic stuff.
But it doesn't always need to be that a single event is logged at a single point in time down to the second. It can be that all events that happen in a day are logged togther. For example, AdWords reports their events in aggregate, and you can specify the granularity (e.g. daily):
Date | Impressions | Clicks | Cost | Campaign ID | Campaign Name |
---|---|---|---|---|---|
2016-01-01 | 1000 | 10 | $10.00 | 1 | DC Paleo Performers |
2016-02-01 | 5000 | 80 | $88.00 | 1 | DC Paleo Performers |
2016-02-05 | 5200 | 60 | $54.00 | 4 | LA Paleo Performers |
If we want to look at these events together, we can combine them into a single event log. Just two simple rules for doing so:
- Formatting for common columns must match (e.g. Campaign ID, Campaign Name and Date) so that we can group the data together
- Variables that don't apply to certain observations must be marked as NA.
Here's an example of how the two sample Event Logs can be combined together, once we make sure the date columns are in a similar format of YYYY-MM-DD.
Date | Transaction Amount | Recurring? | Meals | User ID | User Name | Campaign ID | Campaign Name | Impressions | Clicks | Cost |
---|---|---|---|---|---|---|---|---|---|---|
2016-01-01 | $50.00 | Yes | Paleo - 5 Lunches, 5 Dinners | 1 | Alice | 1 | DC Paleo Performers | NA | NA | NA |
2016-02-01 | $50.00 | Yes | Paleo - 5 Lunches, 5 Dinners | 1 | Alice | 1 | DC Paleo Performers | NA | NA | NA |
2016-02-05 | $200.00 | No | Mixitarian - 3 Lunches, 3 Dinners | 2 | Bob | 4 | LA Paleo Performers | NA | NA | NA |
2016-01-01 | NA | NA | NA | NA | NA | 1 | DC Paleo Performers | 1000 | 10 | $10.00 |
2016-02-01 | NA | NA | NA | NA | NA | 1 | DC Paleo Performers | 5000 | 80 | $88.00 |
2016-02-05 | NA | NA | NA | NA | NA | 4 | LA Paleo Performers | 5200 | 60 | $54.00 |
This enables us to group columns together to quickly visualize the data. For example, let's ignore the NA values and group by Campaign Name and sum the columns for Impressions, Cost, Clicks and Transaction Amount:
library(dplyr)
event_log %>%
group_by(campaign_name) %>%
summarize(impressions=sum(impressions,na.rm=TRUE),
cost=sum(cost,na.rm=TRUE),
clicks=sum(clicks,na.rm=TRUE),
amount=sum(transaction_amount,na.rm=TRUE),
)
The result would be:
Campaign Name | impressions | cost | clicks | amount |
---|---|---|---|---|
DC Paleo Performers | 6000 | $100.00 | 90 | 100 |
LA Paleo Performers | 5200 | $200.00 | 60 | 200 |
In our AdWords analysis, we create an Event Log that ties together keyword events from AdWords and transaction events from the database powering Power Supply's ordering application. Since AdWords doesn't know anything about the application's users, and the application knows nothing about AdWords campaigns, we need something to tie users back to the campaign where they originated.
To tie these two data sources together, we use Mixpanel to capture information about the user when they click on an ad. We use that user information to tie the user's transaction behavior to the ad's cost.
In other words, we pull from 3 different data sources (AdWords, Mixpanel and Power Supply's orders app) to create a unified event log. The resulting event log can be created in a single line of code:
keywords_elog <- create_event_log(from=start_date, to=end_date)
The resulting data frame includes the following variables that can be grouped and/or aggregated to create tables and visualizations:
- campaign_id - The ID of the campaign. Not intended to be human readable. Used to join campaign data to other data sets based on ID.
- campaign_name - The name of the campaign.
- user_name - The user's name
- user_id - The user's ID in the orders app
- discount_amount - If this observation is a purchase, this is the amount of any discount applied.
- credit_used - If this observation is a purchase, this is the amount of any credit applied.
- sales_tax_amount - If this observation is a purchase, this is the amount of any sales tax applied.
- refund_amount - If this observation is a refund, this is the amount of the refund.
- retail_price - If this observation is a purchase, this is the retail price (not including any discounts or credits) of the item.
- money_in_the_bank_paid_to_us - If this observation is a purchase, this is the amount of money paid to use after discounts and credits.
- purchase_type - - If this observation is a purchase, was it a recurring purchase or a one-off purchase? NOTE: recurring does not necessarily mean they paid more than once, but that they selected the "recurring" option to get 5% off their order.
- X.browser - During the user's first order, this is the browser the user was using.
- X.browser_version - During the user's first order, this is the version of the browser the user was using.
- X.city - During the user's first order, this is the city the user was in, based on the users's IP.
- X.current_url - During the user's first order, this is the URL a users's browser was on when completing the event.
- X.initial_referrer - During the user's first order, this is the URL that the user was initially referred from.
- X.initial_referring_domain - During the user's first order, this is the domain that the user was initially referred from.
- X.os - During the user's first order, this is the OS that the user was using.
- X.referrer - During the user's first order, this is the URL of the site the user was most recently referred from
- X.referring_domain - During the user's first order, this is the domain of the site the user was most recently referred from
- X.region - During the user's first order, this is the region the user was in at the time of the first purchase.
- X.screen_height - During the user's first order, this is the screen height of the screen the user was using.
- X.screen_width - During the user's first order, this is the screen width of the screen the user was using.
- X.search_engine - During the user's first order, this is the search engine the user was using to land on the site
- latest_ad_awcreativeid - This is the ID of the latest ad (aka Creative) the user had seen before placing their first order.
- keyword - This is the name of the most recent keyword that the user had matched with their search term, and clicked on before placing their first order.
- match_type - This is the match type (
Broad
,Exact
orPhrase
) of the most recent keyword that the user had matched with their search term, and clicked on before placing their first order. - latest_ad_awnetwork - This is the network on which the user had seen an before placing their first order. Values are
g
(Google),s
(Search Partners) andd
(Display Network) - latest_ad_awposition - The position of the ad the user had clicked on before placing their first order, with a value like "1t2"(this means page 1, top, position 2)
- device - The device the user was using to place their first order.
- latest_ad_psgeo - The PSGeo (power supply geo) that the user was located in (according to campaign targetting) when placing their first order.
- latest_ad_search - The full search string (i.e. everything after the
?
in a tracking URL) associated with the most recent ad the user had experienced. - latest_ad_utm_medium - The medium on which the user had seen an ad before making their first purchase. (e.g. cpc = cost per click advertising)
- latest_ad_utm_source - The source from which the user had seen an ad before making their first purchase. (e.g. Google)
- orderId - The ID of the order in our orders application
** More Data Fields to be defined as they are needed. For explanation, reach out to Chris Salvato ([email protected]) and then add the definition to the list below in README.md**
- utm_campaign
- utm_medium
- utm_source
- X.device
- ad_group_id
- date
- day_of_week
- keyword_state
- ad_group_name
- cost
- network
- est_search_impression_share
- est_search_impression_share_lost_rank
- impressions
- clicks
- average_position
- quality_score
- landing_page_experience
- week
- referred_users_ordering
- new_referred_users
- total_referred_users
- new_referree_sales
- new_referree_orders
- referred_users_meals
- referred_users_orders
- referred_users_discounted_price
- referred_users_price_amount
- referred_users_transaction_amount
- influencer_meals
- influencer_orders
- influencer_discounted_price
- influencer_price_amount
- influencer_transaction_amount
Here's a short glossary of data that we pull from various sources.
To be written...
To be written...
To be written...
Several functions within this project pull data from external APIs such as Mixpanel and AdWords. To manage the credentials for these APIs, we keep all credentials in YAML files to read the data. Since these YAML files contain sensitive data, they are not included in our git repository.
To run the code within this project, you will need the following files. They can be obtained by reaching out to Chris (@chris on Slack; [email protected] via email):
- adwords_credentials.yml
- .google.auth.RData (for instant google Authentication)
- mixpanel_credentials.yml
- jdbc_transactions_database_config.txt
- jdbc_data_warehouse_database_config.txt