Sending out Starbucks offers in a more intelligent manner
Introduction
Once every few days, Starbucks sends an offer to users of its mobile app. An offer can be a mere advertisement for a beverage, a real offer such as a discount or BOGO (buy one, get one free), or an informational offer that simply shares product information. Some users might not receive any offer during certain weeks. By targeting offers better, Starbucks can increase the chance that a customer opens an offer after receiving it and ultimately completes a transaction. Well-targeted offers can also improve customer loyalty by keeping customers informed about the latest products. The question, then, is how to send offers in a smarter way: how do we maximize the probability that a customer opens an offer and completes the transaction? To answer it, we'll analyze the Starbucks historical dataset and see what insights we can extract from it.
The case we discuss here is a real-life marketing strategy study based on the simulated data set that mimics customer behaviour on the Starbucks rewards mobile app.
Business Context
- The solution here aims to analyse how people make purchasing decisions and how those decisions are influenced by promotional offers.
- Every individual in the dataset has some hidden attributes that influence their buying patterns and are related to their observable characteristics. Individuals produce different events, including receiving offers, opening offers, and making purchases.
- There are three types of offers that can be sent: buy-one-get-one (BOGO), discount, and informational. In a BOGO offer, a user needs to spend a certain amount to get a reward equal to that threshold amount. In a discount, a user gains a reward equal to a fraction of the amount spent. In an informational offer, there is no reward, but neither is there a required amount that the user is expected to spend. Offers can be delivered via multiple channels.
Project Goal
Based on the context above, this project tries to answer the questions below:
- What are the main features influencing the effectiveness of an offer on the Starbucks app?
- Could the data provided, namely offer characteristics and user demographics, predict whether a user would take up an offer?
Data Dictionary
The data is contained in three files:
- portfolio.json — file describes the characteristics of each offer, including its duration and the amount a customer needs to spend to complete it (difficulty).
- profile.json — file contains customer demographic data including their age, gender, income, and when they created an account on the Starbucks rewards mobile application.
- transcript.json — file describes customer purchases and when they received, viewed, and completed an offer. An offer is only successful when a customer both views an offer and meets or exceeds its difficulty within the offer’s duration.
Here is the schema and explanation of each variable in the files:
portfolio.json
- id (string) — offer id
- offer_type (string) — type of offer, i.e. BOGO, discount, informational
- difficulty (int) — minimum required spend to complete an offer
- reward (int) — reward given for completing an offer
- duration (int) — time for offer to be open, in days
- channels (list of strings)
profile.json
- age (int) — age of the customer
- became_member_on (int) — date when customer created an app account
- gender (str) — gender of the customer (note some entries contain ‘O’ for other rather than M or F)
- id (str) — customer id
- income (float) — customer’s income
transcript.json
- event (str) — record description (i.e. transaction, offer received, offer viewed, etc.)
- person (str) — customer id
- time (int) — time in hours since start of test. The data begins at time t=0
- value — (dict of strings) — either an offer id or transaction amount depending on the record
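The three files are line-delimited JSON, so they can be loaded with pandas. A minimal sketch (the file paths and the `load_dataset` helper name are my own assumptions):

```python
import pandas as pd

def load_dataset(path):
    """Load one of the line-delimited JSON files into a DataFrame."""
    return pd.read_json(path, orient='records', lines=True)

# assumed file locations
# portfolio = load_dataset('data/portfolio.json')
# profile = load_dataset('data/profile.json')
# transcript = load_dataset('data/transcript.json')
```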
Data Exploration
To analyze the problem in the next sections, we first need to explore the datasets: checking for missing values, visualizing the data distributions, and so on. This gives us a better understanding of what the data looks like and how we can featurize it to make it ready for modelling.

As shown above, there are no missing values in the portfolio dataset. The channels column needs to be one-hot encoded, and the id column should be renamed to offer_id.
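These two cleaning steps can be sketched as follows (a hedged sketch; the `clean_portfolio` name is my own, and I assume `channels` holds lists of strings as described in the data dictionary):

```python
import pandas as pd

def clean_portfolio(portfolio):
    """One-hot encode the channels column and rename id to offer_id."""
    # turn each channel in the list into its own 0/1 column
    channel_dummies = portfolio['channels'].str.join('|').str.get_dummies()
    cleaned = pd.concat([portfolio.drop(columns=['channels']), channel_dummies],
                        axis=1)
    return cleaned.rename(columns={'id': 'offer_id'})
```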

Viewing the first several rows of the dataset shows that missing values in the age column are encoded as 118. We therefore need to:
- Drop the rows with missing gender and income data

As it turns out, the rows with age equal to 118 are also missing gender and income, so it is reasonable to simply drop those rows in the following steps before building the model.
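A minimal sketch of that cleaning step (the `clean_profile` name is my own assumption):

```python
import pandas as pd

def clean_profile(profile):
    """Drop rows with the placeholder age 118, which also lack
    gender and income, then drop any remaining missing demographics."""
    cleaned = profile[profile['age'] != 118].copy()
    return cleaned.dropna(subset=['gender', 'income'])
```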
Let's take a quick look at the income and age distributions in the dataset.


The age distribution plot shows that the median customer age is 60 and that most customers fall between 40 and 70. The income distribution plot shows that, taking 70K as the median income, there are more customers earning less than 70K than earning more. The plots also show that the minimum and maximum incomes for male and female customers are approximately the same, but there are slightly more male customers than female customers at the low-income level.
Next, let's take a quick look at the transcript dataset.

The cleaning and preprocessing of the transcript can be done as follows :
- Convert time in hours to time in days
- Process the value column, i.e. split it into separate columns based on the event column.
- Segregate offer and transaction data
Since the value column bundles several pieces of information that should be extracted for clearer and easier analysis, we first perform some basic manipulation on the dataset.
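The steps above can be sketched as follows. This is a hedged sketch: the `clean_transcript` name is my own, and the two key spellings `'offer id'` and `'offer_id'` inside the value dicts are an assumption based on the raw Udacity dataset.

```python
import pandas as pd

def clean_transcript(transcript):
    """Convert time in hours to days and unpack the value dict."""
    out = transcript.copy()
    out['time'] = out['time'] / 24.0  # hours -> days
    # the raw data uses both 'offer id' and 'offer_id' as dict keys
    out['offer_id'] = out['value'].apply(
        lambda v: v.get('offer id', v.get('offer_id')))
    out['amount'] = out['value'].apply(lambda v: v.get('amount'))
    return out.drop(columns=['value'])
```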

Data Preprocessing
In order to identify the main drivers of an effective offer, I have to first define what an ‘effective’ offer is within the Starbucks app. We also need to process the data to merge the events of each specific offer sent so as to find out which offer was received, viewed and finally completed with a transaction.
We can divide our customers into 4 main groups:
1. People who are influenced and successfully complete — effective offers:
- `offer received` -> `offer viewed` -> `transaction` -> `offer completed` (BOGO/discount offers)
- `offer received` -> `offer viewed` -> `transaction` (informational offers — must be within the validity period of the offer)
2. People who received and viewed an offer but did not successfully complete — ineffective offers:
- `offer received` -> `offer viewed`
3. People who purchase/complete offers regardless of awareness of any offers:
- `transaction`
- `offer received` -> `transaction` -> `offer completed` -> `offer viewed`
- `transaction` -> `offer received` -> `offer completed` -> `offer viewed`
- `offer received` -> `transaction` -> `offer viewed` -> `offer completed`
- `offer received` -> `transaction` (informational offers)
- `offer received` -> `transaction` -> `offer viewed` (informational offers)
4. People who received offers but no action taken:
- `offer received`
I would have to separate out the people in group 2 from people in group 4, as people in group 2 may have viewed an offer but did not take any action, whereas people in group 4 did not even have an offer viewed event.
Separating the people of group 1 (effective offers) and people who purchase/complete offers regardless of awareness of any offers (group 3) is particularly tricky. For people in group 3, a conversion is invalid (i.e., not a successful conversion from an offer) if an offer completed or transaction occurs before an offer viewed. There also may be scenarios where an offer completed occurs after the offer is viewed, but a transaction was done prior to the offer being viewed. In this instance, the offer may have been completed, but it is also not a valid conversion.
Defining the target variable effective offer:
We know that group 1 customers will be our target variable “effective_offer” = 1, but there are many ineffective offer definitions for groups 2–4.
So what would we define as an ineffective offer? As already stated above, group 2 would be within our definition of an ineffective offer; where a user is aware of an offer, but the offer is ineffective as it does not convert the user into a customer. So group 2 can be defined as our target variable “effective_offer” = 0.
What about group 3 and group 4? Group 3 consists of users who may have received offers but would have purchased regardless. From the business point of view, we would not want to be sending them any offers.
Meanwhile, group 4 users would be considered low priority customers, as they do not do any action, regardless of whether they receive offers or not.

We know that there are 4 types of events: offer completed, offer received, offer viewed and transaction. But we have seen that our data has no offer_id associated with transactions, because it is not recorded in the transcript event data. Thus, the first objective in data preprocessing is to define a methodology for assigning offer_ids to specific transactions.
Moreover, BOGO and discount offers have an offer completed event when they are completed, whereas informational offers do not have this event associated with them.
Thus,
1) For BOGO and discount offers, an offer is effective if the following events are recorded in the right time sequence:
offer received -> offer viewed -> transaction -> offer completed
2) For an informational offer, an offer is effective if:
offer received -> offer viewed -> transaction

Next, after assigning offer ids to transactions, we need to extract the transactions that were completed after the offer was received and viewed. Since we have already filled in every transaction's offer id, we can extract the transactions converted from offers by checking whether the offer id of the event before the transaction matches the transaction's offer id.
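This matching step can be sketched with a simplified version of the logic (the column names `pre_offer_id` and `completed_offer` follow the notebook's code; the full logic also restricts matches to the offer's validity window, which this sketch omits):

```python
import pandas as pd

def flag_converted_transactions(transcript):
    """Flag a transaction as converted when the event just before it,
    for the same person in time order, carries the same offer_id."""
    df = transcript.sort_values(['person', 'time']).copy()
    df['pre_offer_id'] = df.groupby('person')['offer_id'].shift(1)
    df['completed_offer'] = (
        (df['event'] == 'transaction') &
        (df['offer_id'] == df['pre_offer_id'])
    ).astype(int)
    return df
```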


# join back the 'offer received' events which were filtered out in the previous step
offer_received = transcript_processed[transcript_processed['event']=='offer received']
offer_received['pre_offer_id'] = np.nan
offer_received['completed_offer'] = np.nan
transcript_processed = offer_received.append(transactions_after_viewed).sort_values(['person','time'])
Finally, as in the above code, we join back the offer received data to form the final processed transcript dataset.
Since different offer types have different completion consequences (for example, informational offers carry no reward), we separate the transcript data by offer type for easier analysis.

Within each offer type, we can already separate every unique person-offer_id in group 1 from the others using our completed_offer column. Since we have flagged all conversion events (a transaction or offer completed event, depending on offer type) occurring after an offer viewed event, we can be sure that whichever conversion events are flagged with completed_offer=1 belong at least to the first group (people who are influenced and successfully convert: effective offers).
For BOGO and discount offers, we only consider offer completed events as the conversion events, while for informational offers the transaction event serves as the conversion event.

Now we can look into separating the group 2 and group 4 unique person-offer_ids for BOGO and discount offers. We will separate out the customers who only viewed the offers, with no transaction or completion at the end, from the customers who only received the offer without viewing it.
def not_converted(df):
    # subset offer ids that have transactions or conversions by person and offer_id
    conversion_ids = df[['person','offer_id']][(df['event']=='transaction') | (df['event']=='offer completed')].groupby(['person','offer_id']).count().reset_index()
    # check for unique person-offer_id pairs that consist of offers received
    offers_received_only = df[['person','offer_id']][df['event']=='offer received'].groupby(['person','offer_id']).count().reset_index()
    # create merged dataset to differentiate groups
    check_merge = conversion_ids.merge(offers_received_only, how='right', on=['person','offer_id'], indicator=True)
    return check_merge

# check how many are in either group
check_merge_bogo = not_converted(bogo)
print('For BOGO offers:')
print(check_merge_bogo.groupby(['_merge']).count())

check_merge_discount = not_converted(discount)
print('For Discount offers:')
print(check_merge_discount.groupby(['_merge']).count())

There are a fair number of unique person-offer_id pairs that have offer received events, but no conversion events. These would be considered offers in group 2 and 4 within each offer type.
Then, based on the merged dataset above, we can separate the customers who only viewed the offer after receiving it from the customers who never even opened it.
grp_2_4_bogo = check_merge_bogo[check_merge_bogo['_merge'] == 'right_only']
grp_2_4_bogo = grp_2_4_bogo.merge(transcript_processed,how='left',on=['person','offer_id'])
grp_2_bogo = grp_2_4_bogo[['person','offer_id']][grp_2_4_bogo['event'] == 'offer viewed'].groupby(['person','offer_id']).count().reset_index()
grp_2_4_bogo.drop(['_merge'], axis=1, inplace=True)
grp_4_bogo=grp_2_4_bogo.merge(grp_2_bogo[['person','offer_id']],how='left',indicator=True)
grp_4_bogo=grp_4_bogo[grp_4_bogo['_merge']=='left_only'].copy()
We perform the same manipulation for both the BOGO and discount offers.
Group 3 people are everyone in the converted ids who do not have an offer viewed prior — hence, they would be people with the transaction and offer_completed events but no offer viewed event prior. For BOGO and discount offers, they would be people with offer completed events that have completed_offer != 1.
# subset the completions that have no connection with a viewed offer
grp3_bogo = bogo[['person','offer_id']][(bogo['event']=='offer completed') & (bogo['completed_offer']!=1)].groupby(['person','offer_id']).count().reset_index()
grp3_discount = discount[['person','offer_id']][(discount['event']=='offer completed') & (discount['completed_offer']!=1)].groupby(['person','offer_id']).count().reset_index()
Next, we have to consider the effective and ineffective offers depending on the group type. As already elaborated above, any unique person-offer_id belonging to group 1 can be considered in our target variable effective_offer=1 group. Meanwhile, group 2 is in our target variable effective_offer=0 group.

Now we have successfully prepared the target variables for our BOGO and discount datasets.
As for informational offers, an offer only counts as effective when the transaction is completed within the offer's duration. You can check the entire logic in the Python notebook linked here.
Now that we have subset all our datasets into effective and ineffective offers depending on offer type, we can append the datasets accordingly into datasets for modelling.
Feature engineering
After the basic processing, the next step is to look into the features and see where we can create useful new ones.
- The became_member_on column is in date format. To extract meaningful insight from it, we can convert it into a feature indicating membership tenure, which may help predict whether the customer will take up an offer.
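The conversion can be sketched as follows. The `as_of` reference date and the function name are my own assumptions; only the `became_member_on` integer format (e.g. 20170715) and the `membership_tenure_days` feature name come from the article.

```python
import pandas as pd

def add_membership_tenure(profile, as_of='2018-08-01'):
    """Convert became_member_on (e.g. 20170715) into tenure in days."""
    out = profile.copy()
    joined = pd.to_datetime(out['became_member_on'].astype(str),
                            format='%Y%m%d')
    out['membership_tenure_days'] = (pd.Timestamp(as_of) - joined).dt.days
    return out
```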

- During further data exploration, I discovered that a person can receive the same offer multiple times. The number of times a customer received each offer can also affect whether the offer is effective.
# get the count of offers received per person, put into a separate dataset
df_offer_received_cnt = transcript_processed[transcript_processed['event']=='offer received'].groupby(['person','offer_id','time']).count()['event'].reset_index()
# rename columns
df_offer_received_cnt.rename(columns={'event':'offer_received_cnt'}, inplace=True)
# drop unnecessary columns
df_offer_received_cnt.drop(['time'], axis=1, inplace=True)
# ensure only unique person-offer_id pairs
df_offer_received_cnt = df_offer_received_cnt.groupby(['person','offer_id']).sum().reset_index()

- Subtract the transactions that are not related to any offer
# filter the dataset down to the invalid transactions,
# i.e. those not related to any offer
transactions_not_related = transcript_processed[(transcript_processed['event']=='transaction') & (transcript_processed['completed_offer']==0)].groupby(['person','offer_id'])['amount'].sum().reset_index()
transactions_not_related.rename(columns={'amount':'amount_invalid'}, inplace=True)
- Merge the temporary datasets created above, drop rows with missing gender values, create dummy variables for the gender column, and split the channels column into categorical variables.
# merge to get offers received count and invalid amount transacted
offers_bogo = offers_bogo.merge(df_offer_received_cnt[['person','offer_id','offer_received_cnt']], how='left', on=['person','offer_id'])
offers_bogo = offers_bogo.merge(transactions_not_related[['person','offer_id','amount_invalid']], how='left', on=['person','offer_id'])
# fill missing values for amount_invalid with 0
offers_bogo['amount_invalid'] = offers_bogo['amount_invalid'].fillna(value=0)
offers_bogo.dropna(inplace=True)
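The gender dummy-variable step mentioned in the bullet above can be sketched like this (the `encode_gender` name is my own; the M/F/O categories come from the data dictionary):

```python
import pandas as pd

def encode_gender(df):
    """Create 0/1 dummy columns for the gender categories M, F and O."""
    dummies = pd.get_dummies(df['gender'], prefix='gender')
    return pd.concat([df.drop(columns=['gender']), dummies], axis=1)
```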


Building model
After pre-processing the data, the next step is to implement models to figure out which factors most affect whether a customer will respond to an offer. The models also attempt to predict whether a customer will respond to each type of offer.
Since we have 3 offer types, we build 3 different models. Each is effectively a supervised binary classification model.
I decided to compare the performance of a simple decision tree classifier, as a baseline model, with an ensemble random forest classifier. We selected tree-based models because we also want the models to be interpretable. Random forest, a bagging ensemble of decision trees, aims for higher accuracy than a single tree and serves as the alternate model to compare against the baseline.
Model implementation preparation
- Prepare the data set: set the feature variables and target columns

- Split the data into training and test sets

- Create a function to execute the model for different offer types
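The three preparation steps can be sketched together as a generic runner (a hedged sketch; the function name, split ratio, and random seed are my own assumptions, and the notebook's actual splits and metrics may differ):

```python
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

def run_model(clf, X, y, test_size=0.2, random_state=42):
    """Fit a classifier on one offer type's data and report
    accuracy and F1 on a held-out test set."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state)
    clf.fit(X_train, y_train)
    pred = clf.predict(X_test)
    return {'accuracy': accuracy_score(y_test, pred),
            'f1': f1_score(y_test, pred)}
```

The same function can then be called once per offer-type dataset (BOGO, discount, informational) with either the decision tree or the random forest.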


Initializing the baseline models
At this point, we will first use default parameters for the baseline model and will tune the parameters in the later tuning steps if needed.
- BOGO model

As shown above, the accuracy of both models is good for an initial implementation. However, both models' F1 scores are below 80%, which may improve with tuning in later steps; the random forest performs better than the decision tree classifier. This indicates that the RF model is slightly better than the DT at not misclassifying negative events as positive (that is, at not misclassifying people on whom offers are ineffective as people on whom offers would be effective).
Our model is predicting the positive case (i.e. where an offer is effective) more accurately compared to predicting the negative cases (i.e. where an offer is ineffective), which is expected given the uneven classes. We are perhaps not as concerned with these misclassifications since we don’t mind sending people more offers than they would have liked; we would rather not miss anyone on which an offer would have been effective.
Therefore, we select the random forest, which has slightly better accuracy, for now.
- Discount Offer model

As shown above, the random forest performs slightly better than the decision tree.
- Informational offer model

One potential reason for the worse performance is my key assumption of assigning the conversion events to be transactions that occur only after an offer is viewed and within the specified duration; I might have missed some valuable information by removing the transactions that occur regardless. We can see this in the overall sample size, which is about half that of the datasets for the other 2 offer types: only about 5K samples compared to about 10K each for BOGO and discount.
Model tuning
This section will attempt to tune the parameters of the initial model to get higher performance. In the tuning section, we will first use GridSearch to search for parameters that are likely to get better model performance before experimenting with removing or adding features to improve model performance.
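A grid search over the random forest's parameters can be sketched as follows. The parameter grid values here are illustrative assumptions, not the exact grid used in the notebook:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

def tune_random_forest(X_train, y_train):
    """Search a small, illustrative parameter grid for the random forest."""
    param_grid = {
        'n_estimators': [50, 100],
        'max_depth': [5, 10, None],
        'min_samples_leaf': [1, 5],
    }
    search = GridSearchCV(RandomForestClassifier(random_state=42),
                          param_grid, scoring='f1', cv=3)
    search.fit(X_train, y_train)
    return search.best_estimator_, search.best_params_
```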

- BOGO Offer model

Use the optimized parameters to rerun the model from the previous steps.

Compare the results with the previous initial model.

As shown in the comparison above, after using the tuned parameters, the test accuracy slightly improved from 82.8316% to 82.8723%, and the F1 score increased from 76.70% to 77.57%.
We repeat the same steps for the discount offer data.



As shown in the comparison above, after using the tuned parameters, the accuracy of the model increased slightly, from 87.07% to 87.39%, and the F1 score improved from 81.71% to 82.08%.
We do the same for the informational offer.



As shown in the comparison above, after using the tuned parameters, the test accuracy improved slightly from 74.87% to 75.23%, while the F1 score decreased slightly from 68.23% to 67.14%.
Finally, I wanted to see whether removing the amount_invalid variable would help; we had noted it was sparse and hence might not be useful for predicting offer effectiveness. I removed the feature from the data prep and retrained the model using the same optimal parameters found via GridSearch, with the DT model as a baseline. However, the results did not improve the accuracy by a meaningful amount, so we kept the tuned model as our final model.
View the feature importance
Next, we'll look at the models' results and investigate feature importance to see what insight we can get into the main factors that decide whether customers respond to offers.

Display the feature importance based on the model we have.
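Extracting the importances from a fitted tree-based model can be sketched like this (the helper name is my own assumption; `feature_importances_` is the standard scikit-learn attribute):

```python
import pandas as pd

def feature_importance_table(model, feature_names):
    """Rank features by a fitted tree model's feature_importances_."""
    return (pd.DataFrame({'feature': feature_names,
                          'importance': model.feature_importances_})
            .sort_values('importance', ascending=False)
            .reset_index(drop=True))
```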



As shown above, for all three offer types the most important factor affecting whether an offer is eventually responded to is the length of membership: the longer a customer has been a Starbucks member, the more likely they are to respond to an offer they receive. The second and third most important factors are age and income, which makes intuitive sense. The number of offers a customer received also strongly affects the response.
Conclusion & Next steps
Conclusion
Overall, I found this project challenging, mainly due to the structure of the data in the `transcript` dataset. I had started out with 2 business questions:
1. What are the main features influencing the effectiveness of an offer on the Starbucks app?
2. Could the data provided, namely offer characteristics and user demographics, predict whether a user would take up an offer?
a. Reflection:
i. Question 1 findings:
For Question 1, the feature importances given by all 3 models agreed that membership tenure is the biggest predictor of the effectiveness of an offer.
For all three models, the top 3 variables were the same — membership tenure, income and age. However, income and age switched orders depending on offer type.
For BOGO and discount offers, the distributions of feature importances were similar to each other. For informational offers, the distribution was slightly more balanced, with income the second most important variable.
ii. Question 2 findings:
My decision to use 3 separate models to predict the effectiveness of each offer type ended up with good accuracy for the BOGO and discount models (82.87% for BOGO and 87.38% for discount), while slightly less accurate performance for informational offers (75.23%). However, I would regard 75% as acceptable in a business setting, as for informational offers, there is no cost involved to inform users of a product.
Meanwhile, for the BOGO and discount models, I am quite happy with the 80%-plus accuracy, as in a business setting that would be acceptable for deciding whom to show offers to; even if the model misclassifies a few people, the overall revenue increase might justify the mistakes.
b. Main challenges and potential improvement:
When analysing and building the machine learning models to answer the above questions, reflections on my main challenges and findings are as follows:
i. Attribution framework for assigning offer_ids for transactions:
In order to answer Question 1, I had to first define what an 'effective offer' means using the transactional records. This proved to be the trickiest portion of the project. I had to define a funnel for what an effective conversion would look like, as we had data on both effective and non-effective conversions. In effect, I was designing an attribution model for the conversion events (`offer completed` and `transaction` events) based on the events that occurred prior for each person.
I ended up having to separate the users into 4 different pools, based on their actions in the transcript data:
- Group 1: People who are influenced by offers and thus purchase/complete the offer (successful/effective conversion of an offer)
- Group 2: People who receive and view an offer but are not influenced, and thus have no conversion event (ineffective conversion of an offer)
- Group 3: People who have conversion events but were not actually influenced by an offer
- Group 4: People who receive offers but take no view or action
Even after separating the groups, it was challenging to assign the people in group 3 based on the transactional data. I had to define the event space where the right sequence of events would occur before I could assign an offer id to transactions (which did not have an offer_id), essentially designing an event/sequence-based attribution window.
ii. Feature engineering:
I had to decide which features to engineer, and how, for an effective model. In the end, our engineered membership-tenure feature was indeed the most important one.
iii. Model implementation decisions:
I made the decision to build 3 separate models depending on offer type, based on my definition of the problem statement: since I wanted to discover what drives an effective offer, I thought it made more sense to remove noise from the data by separating it into the offer types. The decision turned out to be a good one, as the separate BOGO and discount models achieved good test scores.
For the informational model, the accuracy was slightly worse, as we had fewer records overall. As elaborated above, I believe that with more data the accuracy could have been higher.
An additional note on model selection — I selected tree-based models as I wanted to assess feature importance, but I could have extended this study further by testing a parametric/ regression model (e.g. logistic regression for classification tasks). The weights of the coefficients from a regression model might have been interesting to contrast with the feature importance of a tree-based model, given that both models have different ways of analysing the data. The feature `membership_tenure_days` might not have been the highest weighted feature, in contrast to how it was in this study.
Further Improvements and Experimentation
Due to time constraints, I didn't get a chance to try other enhancements in the model-tuning step. For example, I could experiment more with feature engineering to see whether any new features improve the model, or try removing some features to see how that affects performance.
Also, the analysis so far focuses on customers who successfully complete a transaction after receiving an offer; there should be more insight in the other cases, where customers complete transactions regardless of the offer. If we could gain insight into those cases, maybe we could send more offers to those customers.
In addition, I could apply unsupervised learning to cluster customers based on the information we are given, to see whether any group of customers shares specific characteristics that make them more likely to respond to an offer.
To see more details on the analysis, you can find my Github link here.