A/B Testing for a Conservation Campaign¶

Differential messaging variants were run, and the engagement rates of subscribed users (the target population) were analyzed to statistically evaluate the association between the type of messaging (the variants) and user engagement.

There are two test groups for the A/B test:

  1. Personalized Group: Subscribed users who were messaged personally via email, in the form of a newsletter customized to their interests.
  2. Generic Group: Users who were exposed to online advertisements posted on various social channels. These messages are generalized and sent at the same time and day to all subscribed users.

Objective¶

The objective of the analysis is to determine whether personalized messaging was a successful strategy, to assess whether the difference between the test groups is statistically significant, and, if so, to identify which factors boost engagement.

Imports¶

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
import statsmodels.formula.api as smf
import scipy.stats as stats
from statsmodels.stats.power import NormalIndPower

%matplotlib inline 
In [3]:
df = pd.read_csv('conservation_dataset.csv')
df.head()
Out[3]:
Unnamed: 0 user id message_type engaged total_messages_seen most engagement day most engagement hour
0 0 1069124 Personalized False 130 Monday 20
1 1 1119715 Personalized False 93 Tuesday 22
2 2 1144181 Personalized False 21 Tuesday 18
3 3 1435133 Personalized False 355 Tuesday 10
4 4 1015700 Personalized False 276 Friday 14

EDA¶

In [6]:
df.columns
Out[6]:
Index(['Unnamed: 0', 'user id', 'message_type', 'engaged',
       'total_messages_seen', 'most engagement day', 'most engagement hour'],
      dtype='object')
In [7]:
df.shape
Out[7]:
(588101, 7)
In [8]:
# Check for missing values and data types
print(df.info())
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 588101 entries, 0 to 588100
Data columns (total 7 columns):
 #   Column                Non-Null Count   Dtype 
---  ------                --------------   ----- 
 0   Unnamed: 0            588101 non-null  int64 
 1   user id               588101 non-null  int64 
 2   message_type          588101 non-null  object
 3   engaged               588101 non-null  bool  
 4   total_messages_seen   588101 non-null  int64 
 5   most engagement day   588101 non-null  object
 6   most engagement hour  588101 non-null  int64 
dtypes: bool(1), int64(4), object(2)
memory usage: 27.5+ MB
None
In [9]:
df['user id'].nunique() ## Check whether there are any duplicate user IDs
Out[9]:
588101
In [10]:
df.drop(['Unnamed: 0', 'user id'],axis=1,inplace=True)
df.head()
Out[10]:
message_type engaged total_messages_seen most engagement day most engagement hour
0 Personalized False 130 Monday 20
1 Personalized False 93 Tuesday 22
2 Personalized False 21 Tuesday 18
3 Personalized False 355 Tuesday 10
4 Personalized False 276 Friday 14
In [11]:
df['message_type'].value_counts(normalize=True) ## To check for imbalance in our data
Out[11]:
message_type
Personalized    0.96
Generic         0.04
Name: proportion, dtype: float64

Since our data is highly imbalanced, we first perform a Chi-Square test rather than a z-test or t-test, whose results can be misleading given how heavily the data skews toward the Personalized group (giving a false impression and exaggerating its success). Unlike parametric tests (like the T-Test), the Chi-Square test does not assume normality, which makes it useful for categorical data comparisons even when groups are imbalanced.

First, we check whether a Chi-Square test can be used on our data. For this, we need to calculate the expected frequency for each category (each cell in our contingency table) and make sure these numbers are at least 5. If even one expected frequency is less than 5, the Chi-Square test won't work reliably.¶
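For reference, the expected count for each cell under the null hypothesis of independence is

expected(i, j) = (row i total × column j total) / grand total

which is exactly what the np.outer broadcasting step in the next cell computes.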

In [14]:
# Step 1: Create a Contingency Table
observed_frequencies = pd.crosstab(df['message_type'], df['engaged'])

# Step 2: Calculate Expected Frequencies
row_totals = observed_frequencies.sum(axis=1)
col_totals = observed_frequencies.sum(axis=0)
grand_total = observed_frequencies.values.sum()

# Calculate expected frequencies using broadcasting
expected = np.outer(row_totals, col_totals) / grand_total
expected_frequencies = pd.DataFrame(expected, index=observed_frequencies.index, columns=observed_frequencies.columns)

# Step 3: Check if all expected frequencies are ≥ 5
min_expected_frequency = expected_frequencies.min().min()

print("Observed Frequencies:")
print(observed_frequencies)
print("\nExpected Frequencies:")
print(expected_frequencies)
print(f"\nMinimum Expected Frequency: {min_expected_frequency}")

# Decision: Can Chi-Square Test be performed?
if min_expected_frequency >= 5:
    print("Chi-Square Test can be performed.")
else:
    print("Chi-Square Test cannot be performed. Consider Fisher's Exact Test.")
Observed Frequencies:
engaged        False  True 
message_type               
Generic        23104    420
Personalized  550154  14423

Expected Frequencies:
engaged              False        True 
message_type                           
Generic        22930.28101    593.71899
Personalized  550327.71899  14249.28101

Minimum Expected Frequency: 593.7189904455187
Chi-Square Test can be performed.
In [15]:
# Make sure to import the library
from scipy.stats import chi2_contingency

observed = [
    [23104, 420],      # Generic group
    [550154, 14423]    # Personalized group
]

# Perform the Chi-Square Test
chi2, p_value, dof, expected = chi2_contingency(observed)

print("Chi-Square Statistic:", chi2)
print("P-Value:", p_value)
print("Degrees of Freedom:", dof)
print("\nExpected Frequencies (calculated by SciPy):")
print(expected)

# Interpretation
if p_value < 0.05:
    print("Result: Significant difference detected (reject null hypothesis).")
else:
    print("Result: No significant difference detected (fail to reject null hypothesis).")
Chi-Square Statistic: 54.00582388368525
P-Value: 1.998962306339e-13
Degrees of Freedom: 1

Expected Frequencies (calculated by SciPy):
[[ 22930.28100955    593.71899045]
 [550327.71899045  14249.28100955]]
Result: Significant difference detected (reject null hypothesis).
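With nearly 600,000 observations, even a tiny difference yields a vanishingly small p-value, so it is worth pairing the test with an effect-size measure. Below is a minimal sketch (an addition, not part of the original test output) computing Cramér's V from the chi2 statistic and table already in scope:

# Effect size companion to the chi-square test: Cramér's V
# V = sqrt(chi2 / (n * (min(rows, cols) - 1))); for a 2x2 table this is the phi coefficient
n = sum(sum(row) for row in observed)      # total observations (588,101)
k = min(len(observed), len(observed[0]))   # smaller table dimension (2)
cramers_v = (chi2 / (n * (k - 1))) ** 0.5
print(f"Cramér's V: {cramers_v:.4f}")

Here V comes out to roughly 0.01: the association is statistically reliable but small per user, which separates statistical significance from practical magnitude when samples are this large.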

Bootstrap sampling¶

Bootstrap sampling makes the T-Test more reliable when dealing with small samples, imbalance, or unknown distributions, and it ensures the results better reflect the real variability of engagement across Generic vs. Personalized messages. Since the dataset is highly imbalanced between the two groups, we'll use the bootstrap method, which repeatedly resamples the data with replacement to create new samples. By applying this approach, we can approximate the sampling distribution of a statistic and derive insights about the overall population.

In [17]:
# Subset of test groups
generic_group = df[df['message_type']=='Generic']['engaged']
personalized_group = df[df['message_type']=='Personalized']['engaged']

boot_personalized = []
boot_generic = []

# Draw 1,000 bootstrap resamples of each group (same size as the group,
# sampled with replacement) and record each resample's mean engagement rate
for i in range(1000):
    boot_personalized.append(personalized_group.sample(frac=1, replace=True).mean())
    boot_generic.append(generic_group.sample(frac=1, replace=True).mean())

boot_personalized = pd.DataFrame(boot_personalized)
boot_generic = pd.DataFrame(boot_generic)

boot_personalized.plot(kind='density')
boot_generic.plot(kind='density')
Out[17]:
<Axes: ylabel='Density'>

Interpretation of the First Image (Density Plots)¶

  1. Higher Peak for Personalized Messages

    • The top density plot (Personalized messages) has a higher peak, suggesting less variability in engagement rates.
    • Engagement with Personalized messaging tends to be consistent and predictable.
  2. Lower Peak & Wider Spread for Generic Messages

    • The bottom density plot (Generic messages) has a flatter shape, meaning engagement rates fluctuate more.
    • Some Generic messages perform well, but engagement is less stable across bootstrapped samples.
  3. Overall Comparison

    • Personalized messaging produces a more reliable engagement rate, with fewer extreme variations.
    • Generic messaging has unpredictable fluctuations, making it harder to optimize based on expected engagement.
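A common companion to these density plots is a percentile confidence interval computed from the same bootstrap draws. A minimal sketch (an addition), reusing boot_personalized and boot_generic from the cell above:

# 95% percentile confidence interval for the difference in engagement rates
diff = boot_personalized[0] - boot_generic[0]
ci_low, ci_high = np.percentile(diff, [2.5, 97.5])
print(f"Mean difference (Personalized - Generic): {diff.mean():.4f}")
print(f"95% bootstrap CI: [{ci_low:.4f}, {ci_high:.4f}]")

If the interval excludes zero, it supports the same conclusion as the formal test below.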

T-Test¶

In [20]:
from scipy.stats import ttest_ind

# Perform Welch's T-Test (assuming unequal variance)
t_stat, p_value = ttest_ind(boot_personalized[0], boot_generic[0], equal_var=False)

# Display results
print("Welch's T-Test on Bootstrap Samples:")
print("T-Statistic:", t_stat)
print("P-Value:", p_value)

# Interpretation
if p_value < 0.05:
    print("Result: Significant difference detected (reject null hypothesis).")
else:
    print("Result: No significant difference detected (fail to reject null hypothesis).")
Welch's T-Test on Bootstrap Samples:
T-Statistic: 280.33372908894046
P-Value: 0.0
Result: Significant difference detected (reject null hypothesis).

What This Means¶

  • Personalized messaging is significantly more effective → Users engage at a much higher rate compared to generic messaging.
  • Generic messaging underperforms drastically → Engagement is statistically much lower, suggesting it may not be an effective strategy.
  • The difference is highly reliable → The large T-Statistic (280.33) confirms that the gap is not just due to randomness; Personalized messages truly outperform Generic ones.

Campaign Implications¶

  • Prioritize Personalized Messaging → Since engagement is statistically higher, shift more resources toward targeted, customized outreach.
  • Optimize Message Personalization Further → Test variations in personalized content to maximize impact.
  • Reconsider Generic Messaging Strategy → Since it underperforms, either phase it out or refine its structure to make it more engaging.
  • Experiment with Timing & Frequency → Now that we know personalization works, optimize when and how often messages are sent (see the power-analysis sketch below for sizing such follow-up tests).
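As an aside, the NormalIndPower import from the setup cell is never exercised in this notebook. Below is a hedged sketch of how it could size the follow-up experiments suggested above; the 5% significance level and 80% power are conventional assumptions, and the input rates approximate those implied by the observed contingency table (Generic ≈ 1.8% engaged, Personalized ≈ 2.6%).

from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

# Assumed rates: baseline ~1.8% (Generic) vs. target ~2.6% (Personalized)
effect_size = proportion_effectsize(0.026, 0.018)
n_per_group = NormalIndPower().solve_power(effect_size=effect_size,
                                           alpha=0.05, power=0.8, ratio=1.0)
print(f"Required sample size per group: {n_per_group:.0f}")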

Analyzing the data further to see which specific factors drive engagement with personalized messages.¶

How does the number of messages seen drive engagement among our target population?¶

In [24]:
# Create figure and axes
fig, ax = plt.subplots(1, 2, figsize=(12, 6))

# Observed Frequencies Plot
sns.barplot(x=observed_frequencies.index, y=observed_frequencies[False], ax=ax[0], color="red", label="Not Engaged")
sns.barplot(x=observed_frequencies.index, y=observed_frequencies[True], ax=ax[0], color="blue", label="Engaged")
ax[0].set_title("Observed Engagement Frequencies")
ax[0].set_xlabel("Message Type")
ax[0].set_ylabel("Count")
ax[0].legend()

# Expected Frequencies Plot
sns.barplot(x=expected_frequencies.index, y=expected_frequencies[False], ax=ax[1], color="red", label="Not Engaged")
sns.barplot(x=expected_frequencies.index, y=expected_frequencies[True], ax=ax[1], color="blue", label="Engaged")
ax[1].set_title("Expected Engagement Frequencies")
ax[1].set_xlabel("Message Type")
ax[1].set_ylabel("Count")
ax[1].legend()

# Adjust layout and display
plt.tight_layout()
plt.show()
[Figure: side-by-side bar charts of observed vs. expected engagement frequencies by message type]
In [25]:
# Box plot to compare total_messages_seen for engaged vs. not engaged
plt.figure(figsize=(16, 12))  # Increase figure size for better clarity

# Create the box plot with mean markers
sns.boxplot(x='engaged', y='total_messages_seen', data=df, showmeans=True, meanprops={"marker":"o", "markerfacecolor":"red", "markeredgecolor":"black"})

# Adjust y-axis limits if needed
plt.ylim(0, 2000)  # Modify this based on the data range

plt.title("Comparison of Total Messages Seen by Engagement Status", fontsize=14)
plt.xlabel("Engaged", fontsize=12)
plt.ylabel("Total Messages Seen", fontsize=12)

plt.show()
[Figure: box plot comparing total messages seen by engagement status]

Key Observations:¶

  1. Median (the line inside the box):

    • Both groups, engaged and not engaged, have similar medians, and both are very low (close to zero).
    • This indicates that for most users, the number of messages they see is quite limited.
  2. Boxes (middle range of data):

    • The height of each box shows the IQR, i.e., where the middle 50% of the group's data falls.
    • Both boxes have roughly the same height, meaning the spread of messages seen is similar for engaged and not engaged users.
    • Both IQRs are also small, meaning that most users have limited exposure to messages.
  3. Outliers (dots outside the box):

    • There are many outliers—these are people who saw way more messages than most others.
    • For example, some people saw hundreds or even thousands of messages! These could be super rare cases or situations where someone was bombarded by campaign messages.
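To back the visual reading with numbers, a one-line summary (a small addition to the original notebook) prints the quartiles the box plot is drawn from:

# Numeric summary behind the box plot: quartiles per engagement status
print(df.groupby('engaged')['total_messages_seen'].describe())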

What Does This Mean?¶

  • The box plot suggests that the number of messages seen doesn’t have a big impact on whether someone engages or not. Both engaged and not engaged groups look pretty similar in terms of how many messages they’ve seen.

What Can We Do Next?¶

  • This tells us that spamming people with more messages might not be enough to get them to engage. Instead, focus on how those messages are crafted (e.g., personalized vs. generic), or look at other factors like the time of day they saw the messages.

Analyzing Engagement trends across different days of the week for our conservation campaign.¶

In [28]:
# Create a contingency table for 'engaged' and 'most engagement day'
contingency_table = pd.crosstab(df['engaged'], df['most engagement day'])

# Perform Chi-Square Test
chi2, p, dof, expected = stats.chi2_contingency(contingency_table)

# Display results
print(f"Chi-Square Test Statistic: {chi2}")
print(f"P-value: {p}")
print(f"Degrees of Freedom: {dof}")

# Check if we can reject the null hypothesis
if p < 0.05:
    print("Conclusion: Reject the null hypothesis. There is an association between 'engaged' and 'most engagement day'.")
else:
    print("Conclusion: Fail to reject the null hypothesis. No significant association between 'engaged' and 'most engagement day'.")
Chi-Square Test Statistic: 410.0478857936585
P-value: 1.932184379244731e-85
Degrees of Freedom: 6
Conclusion: Reject the null hypothesis. There is an association between 'engaged' and 'most engagement day'.
In [29]:
# Plotting the contingency table with annotations
plt.figure(figsize=(10, 6))
sns.heatmap(contingency_table, annot=True, fmt="d", cmap="coolwarm", cbar=True)
plt.title("Contingency Table of 'engaged' and 'most engagement day'")
plt.xlabel("Most Engagement Day")
plt.ylabel("Engagement")
plt.savefig('heatmap_most_engagement_day.png',dpi=300)
plt.show()
[Figure: heatmap of the contingency table of 'engaged' vs. 'most engagement day']

The heatmap provides a clear view of engagement trends across the days of the week for our conservation campaign.


Key Takeaways from the Data:¶

Highest Engagement (Monday & Tuesday)

  • Monday has the highest True engagements (2,857), followed by Tuesday (2,312).
  • If we are running a conservation campaign, Monday and Tuesday seem to be the best days to send messages because engagement is at its peak.
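One caveat: raw counts favor days that simply have more users assigned to them. As a quick rate-based cross-check (a sketch, not part of the original cells), a row-normalized crosstab gives the per-day engagement rate:

# Engagement rate per day (row-normalized), complementing the raw counts
day_rates = pd.crosstab(df['most engagement day'], df['engaged'], normalize='index')
print(day_rates[True].sort_values(ascending=False))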

How This Affects Your A/B Test (Personalized vs. Generic Messages)¶

  1. Personalized Messaging Advantage

    • If Personalized messages dominate engagement on peak days (Monday, Tuesday), it reinforces the idea that tailoring messages improves response.
    • You should focus on refining message personalization strategies on these days.
  2. Generic Messaging Struggles

    • If Generic messaging engagement remains low across all days, it may indicate that generic conservation appeals aren’t resonating with users.
    • Consider tweaking the messaging format or testing different content strategies.
  3. Strategic Adjustments

    • Prioritize high-engagement days (Monday, Tuesday) for personalized outreach.
    • Reduce focus on Saturday, or experiment with a different approach (maybe urgency-driven messaging).
    • If Generic messaging shows spikes on specific days, analyze those cases to determine what worked.

Since 'most engagement day' affects the engagement of our target population, it makes sense to analyze 'most engagement hour' next. If the time of day also plays a role, we can target users precisely when they're most receptive.

Best time in a day for messaging our target population for boosting engagement¶

In [33]:
# Create a contingency table
contingency_table = pd.crosstab([df['most engagement hour'], df['message_type']], df['engaged'])

# Perform Chi-Square Test
chi2, p_value, dof, expected = chi2_contingency(contingency_table)

# Display results
print("Chi-Square Statistic:", chi2)
print("P-Value:", p_value)
print("Degrees of Freedom:", dof)
print("\nExpected Frequencies:")
print(expected)

# Interpretation
if p_value < 0.05:
    print("Result: Significant difference detected (reject null hypothesis).")
else:
    print("Result: No significant difference detected (fail to reject null hypothesis).")
Chi-Square Statistic: 496.7428872096697
P-Value: 2.163815581745976e-76
Degrees of Freedom: 47

Expected Frequencies:
[[2.21270778e+02 5.72922168e+00]
 [5.17500688e+03 1.33993119e+02]
 [1.82280333e+02 4.71966720e+00]
 [4.49852265e+03 1.16477348e+02]
 [1.76431766e+02 4.56823403e+00]
 [5.02196938e+03 1.30030617e+02]
 [8.67537413e+01 2.24625872e+00]
 [2.52463135e+03 6.53686527e+01]
 [2.72933119e+01 7.06688137e-01]
 [6.76484230e+02 1.75157703e+01]
 [2.24195062e+01 5.80493827e-01]
 [7.23272764e+02 1.87272356e+01]
 [8.09051745e+01 2.09482555e+00]
 [1.93490086e+03 5.00991411e+01]
 [2.31018390e+02 5.98161030e+00]
 [6.01232670e+03 1.55673301e+02]
 [6.42367590e+02 1.66324101e+01]
 [1.65397470e+04 4.28253011e+02]
 [1.17166289e+03 3.03371122e+01]
 [2.90498314e+04 7.52168566e+02]
 [1.44752029e+03 3.74797101e+01]
 [3.65087037e+04 9.45296339e+02]
 [2.00898271e+03 5.20172946e+01]
 [4.30347295e+04 1.11427052e+03]
 [2.00800794e+03 5.19920558e+01]
 [4.40962444e+04 1.14175564e+03]
 [2.11523167e+03 5.47683306e+01]
 [4.43370104e+04 1.14798964e+03]
 [1.82182857e+03 4.71714331e+01]
 [4.26740679e+04 1.10493214e+03]
 [1.78186336e+03 4.61366398e+01]
 [4.17733886e+04 1.08161143e+03]
 [1.56351687e+03 4.04831347e+01]
 [3.50553348e+04 9.07665195e+02]
 [1.34809465e+03 3.49053462e+01]
 [3.27568480e+04 8.48151959e+02]
 [1.23892141e+03 3.20785936e+01]
 [3.02682829e+04 7.83717144e+02]
 [1.15314243e+03 2.98575738e+01]
 [2.84328076e+04 7.36192367e+02]
 [1.04981775e+03 2.71822544e+01]
 [2.71431986e+04 7.02801352e+02]
 [1.05371679e+03 2.72832099e+01]
 [2.81657231e+04 7.29276918e+02]
 [8.93855964e+02 2.31440365e+01]
 [2.48710304e+04 6.43969565e+02]
 [6.03377144e+02 1.56228556e+01]
 [1.90536560e+04 4.93344036e+02]]
Result: Significant difference detected (reject null hypothesis).
In [34]:
# Pivot table for heatmap
heatmap_data = df.pivot_table(values='engaged', index='message_type', columns='most engagement hour', aggfunc='sum')

# Create heatmap
plt.figure(figsize=(10, 6))
sns.heatmap(heatmap_data, annot=True, fmt="d", cmap="coolwarm", cbar=True)
plt.title("Peak Engagement Hours by Message Type")
plt.xlabel("Hour of the Day")
plt.ylabel("Message Type")
plt.savefig('heatmap_most_engagement_hours.png')
plt.show()
[Figure: heatmap of peak engagement hours by message type]

The heatmap visually confirms that engagement varies significantly by hour, and that Personalized messages consistently outperform Generic messages throughout the day.

Key Observations:¶

  1. Peak Engagement Hours (9 AM – 6 PM)

    • Personalized messages drive the highest engagement during mid-morning to evening hours.
    • Engagement peaks around 2 PM – 4 PM, making it the most effective time to send messages.
    • Generic messages have minimal engagement even during peak hours.
  2. Low Engagement Hours (Early Morning & Late Night)

    • Engagement is lowest between 12 AM – 8 AM, meaning sending conservation messages overnight is not effective.
    • Both Personalized and Generic messages show weak engagement in very late-night hours (10 PM – 12 AM).
  3. Personalized vs. Generic Performance

    • Personalized messaging consistently outperforms Generic messaging across all hours.
    • At peak engagement times (2 PM – 4 PM), Personalized messages have over 1,000 engagements, while Generic messages have fewer than 50.
    • Generic messages struggle to gain engagement, no matter the hour.

For the Conservation Campaign¶

  • Prioritize sending messages between 9 AM – 6 PM, with a focus on peak hours (2–4 PM).
  • Avoid sending messages overnight (12 AM – 8 AM), as engagement is minimal.
  • Invest more in Personalized messaging, since Generic messages receive very little engagement at any hour.
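The heatmap and list above are based on raw engagement counts, which favor hours with more traffic. A rate-based cross-check (a sketch, not part of the original cells) takes the mean of the boolean engaged column per hour instead:

# Mean engagement rate per hour and message type (a rate, not a raw count)
hourly_rates = df.pivot_table(values='engaged', index='message_type',
                              columns='most engagement hour', aggfunc='mean')
print(hourly_rates.round(3))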

Logistic Regression for Engagement Analysis¶

What is Logistic Regression?¶

Logistic regression is a statistical model used to predict the probability of a binary outcome, such as whether a user engages (True) or does not engage (False). Unlike linear regression, which predicts continuous values, logistic regression estimates the likelihood of an event occurring.

What We Are Testing¶

In this model, we aim to predict whether a user engages with conservation messaging, based on:

  • Total Messages Seen (total_messages_seen) → The number of messages a user has received.
  • Most Engagement Hour (most engagement hour) → The time of day when engagement is recorded.

The dependent variable (Y) is:

  • engaged → The target outcome (whether a user engaged or not).

The independent variables (X) are:

  • total_messages_seen → The exposure level to messages.
  • most engagement hour → The time-based influence on engagement.

Why We Add a Constant (sm.add_constant(X))¶

Adding a constant ensures the model includes an intercept term, which allows it to account for baseline probabilities when no other factors are influencing engagement.
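Concretely, the model being fit is

log(p / (1 - p)) = β₀ + β₁ · total_messages_seen + β₂ · (most engagement hour)

where p is the probability that a user engages and β₀ is the intercept term supplied by sm.add_constant.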

Interpreting the Logistic Regression Output¶

  • Coefficients (coef) → Show how much each factor influences engagement.
    • Positive values suggest an increase in engagement probability.
    • Negative values suggest a decrease in engagement probability.
  • P-Value (P>|z|) → Determines statistical significance.
    • If P < 0.05, the factor has a significant effect on engagement.
  • Pseudo R-Squared (Pseudo R-squ.) → Measures how well the model explains variations in engagement.
  • Log-Likelihood (LL-Null vs. LL-Model) → Compares the model’s performance against a baseline with no predictors.

How This Helps Your Conservation Campaign¶

  • Optimize Messaging Frequency → Find the optimal number of messages before engagement plateaus.
  • Time-Based Strategy → Identify the best hours to send messages for maximum engagement.
  • Improve Personalization → Understand how different exposure levels influence engagement probability.
  • Make Data-Driven Decisions → Use statistical evidence to refine your outreach strategy.

By leveraging logistic regression, we gain a data-driven approach to optimizing conservation messaging for maximum engagement.

Regression Analysis¶

Hypotheses for Logistic Regression¶

Null Hypothesis (H₀)¶

β₁ = 0 → There is no relationship between total messages and engagement rate.

Alternative Hypothesis (H₁)¶

β₁ ≠ 0 → There is a significant relationship between total messages seen and engagement rate.

In [38]:
# Selecting the independent (X) and dependent (Y) variables
X = df[['total_messages_seen', 'most engagement hour']]  # Independent variables
Y = df['engaged']  # Dependent variable (engagement outcome)

# Adding a constant for the intercept term
X = sm.add_constant(X)

# Fit the logistic regression model
logit_model = sm.Logit(Y, X)
result = logit_model.fit()

# Print the summary
print(result.summary())
Optimization terminated successfully.
         Current function value: 0.108765
         Iterations 8
                           Logit Regression Results                           
==============================================================================
Dep. Variable:                engaged   No. Observations:               588101
Model:                          Logit   Df Residuals:                   588098
Method:                           MLE   Df Model:                            2
Date:                Tue, 22 Jul 2025   Pseudo R-squ.:                 0.07655
Time:                        15:58:03   Log-Likelihood:                -63965.
converged:                       True   LL-Null:                       -69267.
Covariance Type:            nonrobust   LLR p-value:                     0.000
========================================================================================
                           coef    std err          z      P>|z|      [0.025      0.975]
----------------------------------------------------------------------------------------
const                   -4.5402      0.030   -153.098      0.000      -4.598      -4.482
total_messages_seen      0.0102   9.87e-05    103.359      0.000       0.010       0.010
most engagement hour     0.0327      0.002     17.860      0.000       0.029       0.036
========================================================================================

Interpreting the Logistic Regression Results:¶

Key Findings¶

  • Total Messages Seen (coef = 0.0102) → Positive effect. More messages seen means a higher chance of engagement; even though the effect is small, it is statistically significant.
  • Most Engagement Hour (coef = 0.0327) → Positive effect. Some hours are better for engagement than others (as we found out in the heatmap); a later hour (e.g., late afternoon) increases engagement likelihood.
  • Constant (coef = -4.5402) → Baseline probability. Before considering messages seen or engagement hour, engagement is very low.
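Log-odds coefficients are easier to read as odds ratios. A minimal sketch (an addition, reusing the fitted result from the regression cell above):

# Convert log-odds coefficients to odds ratios with 95% confidence intervals
conf = result.conf_int()
odds = pd.DataFrame({'odds_ratio': np.exp(result.params),
                     'ci_lower': np.exp(conf[0]),
                     'ci_upper': np.exp(conf[1])})
print(odds)

For example, exp(0.0102) ≈ 1.010, so each additional message seen multiplies the odds of engagement by about 1%.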

What This Means for Our Conservation Campaign¶

  • Sending more messages increases engagement, but only slightly. We might need more message personalization instead of just increasing message volume.
  • Timing matters → Certain hours increase engagement probability, so focus on high-engagement times (e.g., afternoons/evenings).
  • Engagement is naturally low → Without intervention, users are unlikely to engage (const = -4.54). This means you need strong messaging strategies to improve user interactions.
  • Pseudo R-Squared (0.07655) → The model explains about 7.7% of the variability in engagement. While this is modest, it suggests other factors (like message content or audience targeting) play a role.

In [40]:
# Generate predicted probabilities
predictions = result.predict(X)  
# Sort values for a smooth line
sorted_idx = np.argsort(df['total_messages_seen'])
sorted_messages = df['total_messages_seen'][sorted_idx]
sorted_predictions = predictions[sorted_idx]

# Line plot
plt.figure(figsize=(10, 6))
sns.lineplot(x=sorted_messages, y=sorted_predictions, color='blue')
plt.title("Smoothed Predicted Probability of Engagement vs. Total Messages Seen")
plt.xlabel("Total Messages Seen")
plt.ylabel("Predicted Probability of Engagement")
plt.grid(True)
plt.savefig('predictplot.png',dpi=300)
plt.show()
[Figure: smoothed predicted probability of engagement vs. total messages seen]

Key Interpretations¶

  1. Steep Increase at Low Message Counts (0–500 messages seen)

    • Engagement probability starts near 0 when users have seen very few messages.
    • As users see more messages (up to around 500 messages), their predicted probability of engagement rises sharply.
    • This suggests that early exposure to messages plays a critical role in engagement.
  2. Flattening Beyond ~500 Messages

    • After reaching around 500 messages seen, the probability stabilizes and approaches close to 1.0.
    • This indicates a saturation point where additional message exposure no longer significantly increases engagement.
    • Sending too many messages beyond this point may not improve engagement further—you might hit diminishing returns.
  3. High Engagement Probability at Upper Limits

    • For users who see 1000+ messages, the probability of engagement is already very high (~0.9 to 1.0).
    • This means that highly exposed users are already likely to engage, and extra messaging may not be necessary.
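As a sanity check on the ~500-message saturation point: the logistic curve crosses p = 0.5 where the linear predictor is zero, i.e. where total_messages_seen = (4.5402 - 0.0327 × hour) / 0.0102. For typical engagement hours this works out to roughly 400–445 messages, consistent with the steep rise flattening out before 500.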

Implications for Your Conservation Campaign¶

  • Target users who have seen fewer than 500 messages and focus efforts there, rather than over-saturating high-exposure users.
  • Since engagement probability is low at the start, early personalized outreach may help boost engagement faster.
  • Avoid excessive messaging beyond roughly 500 messages: users past that point already have a high engagement probability, so resources might be better allocated elsewhere.
  • Other factors, such as diminishing returns (i.e., how engagement rates behave with successive personalized messages), should be tested with follow-up A/B tests. We also need to consider ROI and maintenance costs, and whether a novelty effect is at play.