To celebrate the launch of our new desktop application, I thought it would be fun to use Terrene Desktop to analyze a classic dataset, the passengers on the Titanic, and to predict whether my family would have survived had we been on board. Data scientists often use this dataset as a sample, and we have often used it in demos because it is small and familiar. The data for the Titanic passengers looks like this. It records whether each passenger survived, what passenger class they were in (Pclass), how many siblings and spouses they travelled with (SibSp), how many parents and children they travelled with (Parch), and several other variables.
Throughout the rest of this post, you will see dashboards and charts that I made using Terrene, as shown in the GIF below. With just a few clicks, Terrene let me filter my data into the sub-groups I discuss later and create the visualizations displayed in those dashboards.
Breaking Down the Numbers
Who was on the Titanic?
Before we dive into what affected survival, we should look at the general demographics of the Titanic passengers.
From this initial analysis, we can see that the passengers on the Titanic were largely between the ages of 20 and 40, that ~65% were male, and, most surprising to me, that they were largely travelling alone.
What Affected Survival?
Now that we have quickly looked at the demographics on board the Titanic, we can start looking into who survived and what affected a passenger's chances of survival.
The first thing I did to analyze the effects on survival was use Terrene's Feature Importance charts. These charts use a machine learning model to estimate the influence of each variable on an outcome (in this case, survival). As you can see, when we look at the overall population, Age and Sex each contribute ~35% of the outcome, meaning roughly 70% of the influence on whether a passenger survived came from age and sex alone. With this information, I decided to break the data down into 4 groups: the overall population, adult males, adult females, and minors.
From here we can see what impacted each group's odds of survival. For adult males, it was almost entirely linked to age, whereas for adult females age and passenger class were equally influential.
For minors, a few variables were influential. Passenger class and age were important but, interestingly, how many siblings they had with them also influenced their odds of survival. Later I will dive into whether having more siblings was beneficial or detrimental to survival.
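The feature importance charts above come from Terrene's own model, which this post does not describe. One common way to estimate this kind of influence is permutation importance: shuffle one column at a time and measure how much a model's accuracy drops. The sketch below is only an illustration of that idea, not Terrene's method. The toy rows, the rule-based stand-in model, and the age cutoff of 18 are all assumptions made for the example.

```python
import random

# Hypothetical toy rows (NOT the real Titanic data), using the dataset's column names.
ROWS = [
    {"Sex": "female", "Age": 29, "Pclass": 1, "Survived": 1},
    {"Sex": "female", "Age": 35, "Pclass": 3, "Survived": 1},
    {"Sex": "female", "Age": 8,  "Pclass": 2, "Survived": 1},
    {"Sex": "male",   "Age": 6,  "Pclass": 3, "Survived": 1},
    {"Sex": "male",   "Age": 30, "Pclass": 1, "Survived": 0},
    {"Sex": "male",   "Age": 45, "Pclass": 2, "Survived": 0},
    {"Sex": "male",   "Age": 22, "Pclass": 3, "Survived": 0},
    {"Sex": "male",   "Age": 51, "Pclass": 3, "Survived": 0},
]


def predict(row):
    """Stand-in model (an assumption): 'women and children first'. Ignores Pclass."""
    return 1 if row["Sex"] == "female" or row["Age"] < 18 else 0


def accuracy(rows):
    """Fraction of rows where the stand-in model matches the recorded outcome."""
    return sum(predict(r) == r["Survived"] for r in rows) / len(rows)


def permutation_importance(rows, feature, repeats=200, seed=0):
    """Mean drop in accuracy after shuffling a single column across rows."""
    rng = random.Random(seed)
    baseline = accuracy(rows)
    total_drop = 0.0
    for _ in range(repeats):
        values = [r[feature] for r in rows]
        rng.shuffle(values)
        shuffled = [{**r, feature: v} for r, v in zip(rows, values)]
        total_drop += baseline - accuracy(shuffled)
    return total_drop / repeats


importances = {f: permutation_importance(ROWS, f) for f in ("Sex", "Age", "Pclass")}

# Normalise to shares so the values read like the percentages in a chart.
total = sum(importances.values())
shares = {f: value / total for f, value in importances.items()}

for feature, share in shares.items():
    print(f"{feature}: {share:.0%}")
```

Because the stand-in model never looks at Pclass, shuffling that column cannot change its predictions, so its share comes out as zero. A model fitted to the real data would spread the influence differently.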
In the pie charts below we see the breakdown of who survived and who did not. One thing we notice is that females had a significantly better chance of surviving than males: 77% of adult females survived, but only 18% of adult males did. Historically this makes sense. At the time it was women and children first into the lifeboats, which explains what we saw earlier with Sex being one of the most influential variables for survival. What I did find interesting is that minors show a similar trend, with 69% of females surviving compared with only 40% of males.
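The percentages above are survival rates within each group. If you wanted to reproduce that kind of breakdown by hand, a minimal sketch might look like the following. The sample rows are made up for illustration, and the age cutoff of 18 for "minor" is my assumption for this example, since the post does not define one.

```python
from collections import defaultdict

ADULT_AGE = 18  # assumed cutoff between minors and adults


def group_of(passenger):
    """Assign a passenger to 'minor', 'adult female', or 'adult male'."""
    if passenger["Age"] < ADULT_AGE:
        return "minor"
    return "adult " + passenger["Sex"]


def survival_rates(passengers):
    """Return {group: fraction of that group that survived}."""
    counts = defaultdict(lambda: [0, 0])  # group -> [survivors, total]
    for p in passengers:
        group = group_of(p)
        counts[group][0] += p["Survived"]
        counts[group][1] += 1
    return {group: survived / total for group, (survived, total) in counts.items()}


# Hypothetical sample rows (NOT the real Titanic data).
# Rows with no recorded age would need separate handling before this step.
sample = [
    {"Sex": "female", "Age": 31, "Survived": 1},
    {"Sex": "female", "Age": 24, "Survived": 1},
    {"Sex": "female", "Age": 52, "Survived": 1},
    {"Sex": "female", "Age": 40, "Survived": 0},
    {"Sex": "male",   "Age": 27, "Survived": 1},
    {"Sex": "male",   "Age": 33, "Survived": 0},
    {"Sex": "male",   "Age": 45, "Survived": 0},
    {"Sex": "male",   "Age": 19, "Survived": 0},
    {"Sex": "male",   "Age": 60, "Survived": 0},
    {"Sex": "female", "Age": 9,  "Survived": 1},
    {"Sex": "male",   "Age": 12, "Survived": 0},
]

rates = survival_rates(sample)
for group, rate in sorted(rates.items()):
    print(f"{group}: {rate:.0%}")
```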
Beyond the influence of age and gender, we noticed that passenger class influenced the survival chances of minors and adult female passengers. When we dive into survival by passenger class, we can see that among minors and adult females, first- and second-class passengers had a significantly better chance of surviving than their third-class counterparts. However, this trend is not present among adult male passengers, where every class had a low chance of survival.
We also saw that in minors the number of siblings impacted their odds of survival. Once we look into the data we see that those with a higher number of siblings travelling with them had a lower chance of survival than those with small families.
However, when I dig into the data itself, I see that every passenger with 3 or more siblings was a 3rd class passenger. So, more than likely, the number of siblings did not directly influence survival. Rather, a 6 or 7 person family was less likely to buy 1st or 2nd class tickets, and it was the passenger class that influenced survival.
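A quick way to check for this kind of hidden link between two variables is to cross-tabulate them. The sketch below counts passenger class separately for large and small sibling groups. The rows are hypothetical, and the threshold of 3 simply mirrors the "3 or more siblings" cut mentioned above.

```python
from collections import Counter

# Hypothetical toy rows (NOT the real Titanic data).
MINORS = [
    {"SibSp": 0, "Pclass": 1},
    {"SibSp": 1, "Pclass": 2},
    {"SibSp": 1, "Pclass": 3},
    {"SibSp": 2, "Pclass": 2},
    {"SibSp": 3, "Pclass": 3},
    {"SibSp": 4, "Pclass": 3},
    {"SibSp": 5, "Pclass": 3},
]


def class_counts_by_family_size(rows, large_family=3):
    """Count passengers per class, split by whether SibSp >= large_family."""
    table = {"large": Counter(), "small": Counter()}
    for row in rows:
        size = "large" if row["SibSp"] >= large_family else "small"
        table[size][row["Pclass"]] += 1
    return table


table = class_counts_by_family_size(MINORS)
for size, counts in table.items():
    print(size, dict(sorted(counts.items())))
```

If the "large" row only ever contains one class, as it does in this toy data, then family size and class cannot be separated, and any apparent effect of siblings may really be an effect of class.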
Predicting My Survival
Based on what we have seen so far, I am not likely to survive as an adult male. But I will build a machine learning model and also test how my family would have fared if they travelled with me.
Sadly, my father and I would likely not have survived the Titanic disaster, but it looks like my sister and mother would have.
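I built my model in Terrene, so there was no code involved. For readers curious about what a survival model could look like under the hood, here is a rough sketch of a tiny logistic regression trained with gradient descent in plain Python. It is not the model Terrene built. The training rows, the feature encoding, the age cutoff of 18, and the two example passengers are all hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def encode(sex, age, pclass):
    """Assumed encoding: bias, is_female, is_minor (under 18), is_third_class."""
    return [
        1.0,
        1.0 if sex == "female" else 0.0,
        1.0 if age < 18 else 0.0,
        1.0 if pclass == 3 else 0.0,
    ]


def train(X, y, lr=0.5, epochs=2000):
    """Fit logistic regression weights by batch gradient descent on log loss."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


def survival_probability(w, sex, age, pclass):
    features = encode(sex, age, pclass)
    return sigmoid(sum(wj * xj for wj, xj in zip(w, features)))


# Hypothetical training rows (NOT the real Titanic data): sex, age, class, survived.
TRAINING = [
    ("female", 29, 1, 1), ("female", 40, 2, 1), ("female", 33, 3, 0),
    ("female", 24, 3, 1), ("female", 10, 2, 1), ("male", 35, 1, 0),
    ("male", 28, 1, 1),   ("male", 50, 2, 0),   ("male", 22, 3, 0),
    ("male", 41, 3, 0),   ("male", 9, 2, 1),    ("male", 4, 3, 0),
]

X = [encode(sex, age, pclass) for sex, age, pclass, _ in TRAINING]
y = [survived for _, _, _, survived in TRAINING]
weights = train(X, y)

# Two made-up adult passengers in second class, differing only by sex.
p_female = survival_probability(weights, "female", 45, 2)
p_male = survival_probability(weights, "male", 45, 2)
print(f"female: {p_female:.2f}, male: {p_male:.2f}")
```

With so few rows the exact probabilities mean very little. The point is only that the model learns the same pattern we saw in the charts, with sex and class pulling the predicted odds in opposite directions.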
What Else Can Terrene Do?
This was a fun way to use Terrene and showcase some of its features, but there are plenty of other, more relevant, uses for it. We have helped companies forecast sales and customer demand, predict energy usage across Ontario, and even predict the risk of someone defaulting on a loan.
If you want to try it for yourself you can start a 14-day free trial at our website https://terrene.co.
Thanks for reading!