# Statistical Interactions (Testing a Potential Moderator)

```
Author: Samuel M.H. <samuel.mh@gmail.com>
Date: 31-01-2016
```

## Instructions

The final assignment deals with testing a potential moderator. When testing a potential moderator, we ask whether the association between two constructs differs across subgroups within the sample.

Run an ANOVA, Chi-Square Test or correlation coefficient that includes a moderator.

## What to submit:

Following completion of the steps described above, create a blog entry where you submit syntax used to test moderation (copied and pasted from your program) along with corresponding output and a few sentences of interpretation.

## Dataset

- National Epidemiological Survey on Alcohol and Related Conditions (NESARC)
- CSV file
- File description

```
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import scipy.stats  # import the submodule explicitly so scipy.stats.pearsonr is available
import seaborn as sns
```

## Test of correlation

I am testing whether there is a linear relationship (Pearson correlation) between:

- The number of drinks of any alcohol usually consumed on days when the respondent drank alcohol in the last 12 months.
- The number of drinks the respondent can hold without feeling intoxicated.

I also test whether sex moderates that relationship.
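For reference, the Pearson coefficient for n paired observations is usually defined as follows. The symbols are standard notation, not names taken from the dataset: x_i and y_i are the two measurements for respondent i, and the barred symbols are the sample means.

$$
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
$$

It ranges from -1 to 1, and values near 0 indicate little linear association.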

### Ingesting and curating the data

```
#Load only the three columns needed
data = pd.read_csv('../datasets/NESARC/nesarc_pds.csv', usecols=['S2AQ8B','S2AQ11','SEX'])
#Select: non-numeric entries become NaN and the code 99 is treated as missing
df1 = pd.DataFrame()
df1['drinks_usually'] = pd.to_numeric(data['S2AQ8B'],errors='coerce').replace(99, np.nan)
df1['drinks_till_drunk'] = pd.to_numeric(data['S2AQ11'],errors='coerce').replace(99, np.nan)
df1['sex'] = data['SEX']
#Drop rows with a missing value in any column
df1 = df1.dropna()
print(df1.shape)
```

```
def corr(x,y,df):
    # Pearson correlation between columns x and y of df, plus a scatter plot with a fitted line
    r,p = scipy.stats.pearsonr(df[x],df[y])
    print('Correlation coefficient (r): {0}'.format(r))
    print('p-value: {0}'.format(p))
    sns.lmplot(x=x, y=y, data=df)
```

```
corr('drinks_usually','drinks_till_drunk',df1)
```

There is a medium positive correlation (r ≈ 0.44). The p-value is reported as 0, meaning it is too small to display, so the association is very unlikely to be due to chance alone.

## Moderator

Let's test whether sex moderates the relationship.

```
#Split data by sex
df_male = df1[(df1['sex']==1)]
df_female = df1[(df1['sex']==2)]
```
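As an aside, the same split can be sketched with `groupby`, which computes one correlation per group without creating separate data frames. The small `toy` frame below is made up purely to show the mechanics; it is not NESARC data, and the name `r_by_sex` is only illustrative.

```python
import pandas as pd

# Made-up values for illustration only (not taken from NESARC).
toy = pd.DataFrame({
    'sex': [1, 1, 1, 1, 2, 2, 2, 2],
    'drinks_usually': [2, 4, 6, 8, 1, 2, 3, 4],
    'drinks_till_drunk': [3, 5, 6, 9, 2, 2, 4, 5],
})

# One Pearson coefficient per sex code (Series.corr defaults to Pearson).
r_by_sex = {
    sex: group['drinks_usually'].corr(group['drinks_till_drunk'])
    for sex, group in toy.groupby('sex')
}

for sex, r in r_by_sex.items():
    print('sex={0}: r={1:.3f}'.format(sex, r))
```

This gives only the coefficients. The `corr` helper above is still needed for the p-values and plots.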

### Males

```
corr('drinks_usually','drinks_till_drunk',df_male)
```

### Females

```
corr('drinks_usually','drinks_till_drunk',df_female)
```

## Summary

Both correlations are statistically significant, but they are only weak to moderate, so a straight line describes the relationship only loosely.

The correlation between usual consumption and alcohol tolerance is stronger among women than among men.

So in this case, sex is a moderator of the relationship between the number of drinks a person usually drinks and the number of drinks a person can take before feeling intoxicated, because the strength of the relationship differs between the two groups.
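Comparing the two coefficients by eye is informal. One common way to check whether two independent correlations really differ is Fisher's r-to-z test, which was not part of the original analysis. The sketch below assumes the two groups are independent samples. The function name `compare_correlations` and the numbers passed to it are placeholders, not the values obtained above.

```python
import math

def compare_correlations(r1, n1, r2, n2):
    """Two-sided Fisher r-to-z test for two independent Pearson correlations."""
    # Fisher transformation: atanh(r) is approximately normal with variance 1/(n-3).
    z1 = math.atanh(r1)
    z2 = math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    # Two-sided p-value under the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# Placeholder inputs: substitute the r values and group sizes printed for males and females.
z, p = compare_correlations(0.40, 500, 0.50, 500)
print('z = {0:.3f}, p-value = {1:.4f}'.format(z, p))
```

A small p-value here would support the claim that the strength of the relationship differs by sex.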
