In this biological project, we are interested in analyzing the effects of different characteristics of female crabs on the number of their satellites. Satellites are additional male crabs that reside near a female crab whose nest is shared. We considered various characteristics such as color, spine structure, width, weight, and the presence of satellites. The dataset contains both continuous and categorical variables.
We begin by creating a correlation plot to explore the relationships between these variables. As shown in the following plot, a strong relationship exists between weight and width. The plot also suggests a relationship between the color and spine structure of the crabs.
To investigate the possibility of an outlier, we examine the relationship between weight and width by plotting these two variables for each satellite, as shown below:
No evidence of an outlier is present, and given the correlation between width and weight, including both variables in the model would not result in a significant contribution from the weight variable. We proceed by fitting a logistic regression model to the data. To select the appropriate model, we employ the AIC and BIC criteria using the backward, forward, and stepwise methods.
mod <- glm(sat ~ as.factor(color) + as.factor(spine) + width
+ as.factor(color) * as.factor(spine)
+ as.factor(color) * width + as.factor(spine) * width
+ as.factor(color) * as.factor(spine) * width,
family = binomial, data = data)
step(mod) #BACKWARDS-AIC
backward <- step(mod , k = log(13) ) #BACKWARDS-BIC
mod2 <- glm(sat ~ 1, family = binomial, data = data)
forward <- step(mod2, scope = list ( lower = y ~ 1,
upper = formula (mod)), direction="forward") #FORWARD
stepwise<-step(mod2, scope = list ( lower = y ~ 1,
upper = formula (mod)), direction="both", k = log(13) ) #STEPWISE
Based on the results, we ended up with two following models:
\[model1: sat \sim width\] \[model1: sat \sim width+color\]
Now, we compare these two models by ANOVA and can conclude that at the \(\alpha-\)level \(0.01\) the model \(sat \sim width\) is the appropriate model with the \(AIC=192.66.\) The Normal QQ plot do not also reveal any problem. We can now estimate the coefficients and use this model as a prediction equation.
mod1 <- glm(sat ~ width, family = binomial, data = data)
mod2 <- glm(sat ~ width + as.factor(color), family = binomial, data = data)
anova(mod1, mod2, test = "Chisq")
## Analysis of Deviance Table
##
## Model 1: sat ~ width
## Model 2: sat ~ width + as.factor(color)
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 166 188.66
## 2 163 181.00 3 7.6583 0.05363 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
qqnorm(residuals(mod1), main="Normal Plot of Residuals")
summary(mod1)
##
## Call:
## glm(formula = sat ~ width, family = binomial, data = data)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.0358 -1.0379 0.5498 0.9047 1.6862
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -12.3337 2.6483 -4.657 3.20e-06 ***
## width 0.4973 0.1026 4.848 1.25e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 218.99 on 167 degrees of freedom
## Residual deviance: 188.66 on 166 degrees of freedom
## AIC: 192.66
##
## Number of Fisher Scoring iterations: 4
Thus, we predict the probability \(\pi\) of obtaining a satellite by following equation \[\mathrm{log(\frac{\pi}{1-\pi})}=-12.334+0.497\texttt{width}.\]
The following plot demonstrates the statistical model to calculate the probability of obtaining a satellite. This model uses the width of a crab. We now utilize the model to calculate the likelihood of obtaining the satellite for each crab.
width <- seq(20, 35, by=0.1)
prob <- exp(-12.334+0.497*width)/(1+exp(-12.334+0.497*width))
pred <- data.frame(cbind(width, prob))
ggplot(pred) +
geom_point(aes(x=width, y=prob)) +
labs(title="Prediction Model", x="Width (cm)", y="Probability of having at least one satellite") +
theme(plot.title=element_text(size=15, face="bold"),
axis.text.x=element_text(size=10),
axis.text.y=element_text(size=10),
axis.title.x=element_text(size=12),
axis.title.y=element_text(size=12))