Gonzalo’s Portfolio
Hello! Welcome to my data science portfolio.
I hope you like it. If you would like to contact me, here is my LinkedIn.
Project 1: Exploratory Data Analysis: Personal Key Indicators of Heart Disease
Take a look at the Kaggle URL, where you can download the data.
According to the CDC, heart disease is one of the leading causes of death for people of most races in the US (African Americans, American Indians and Alaska Natives, and white people). About half of all Americans (47%) have at least 1 of 3 key risk factors for heart disease: high blood pressure, high cholesterol, and smoking. Other key indicators include diabetic status, obesity (high BMI), not getting enough physical activity, or drinking too much alcohol. Detecting and preventing the factors that have the greatest impact on heart disease is very important in healthcare. Computational developments, in turn, allow the application of machine learning methods to detect “patterns” in the data that can predict a patient’s condition.
Project 2: ICU Transfers 2019-2021
- Using the highcharter library, we developed a time-series visualization of ICU transfers in a region of Spain
- The chart is interactive: you can switch between periods and more than 20 bases
- A summary of different parameters for each base
- A boxplot to visualize the distribution of transfer records, overall and per year
- Total number of transfers per year and base
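The per-base summary and the yearly totals listed above come down to two group-by aggregations. The project itself uses highcharter (an R library); the sketch below is in plain Python and only illustrates the aggregation step. The record layout, the base names, and the numbers are made up for the example and are not taken from the real dataset.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical records: (year, base, transfers recorded in one period).
records = [
    (2019, "Base A", 12),
    (2019, "Base A", 8),
    (2019, "Base B", 5),
    (2020, "Base A", 20),
    (2020, "Base B", 9),
    (2020, "Base B", 11),
]

def totals_by_year_and_base(rows):
    """Sum the transfers for every (year, base) pair."""
    totals = defaultdict(int)
    for year, base, transfers in rows:
        totals[(year, base)] += transfers
    return dict(totals)

def summary_by_base(rows):
    """Return min, median, mean and max of the transfer counts per base."""
    per_base = defaultdict(list)
    for _, base, transfers in rows:
        per_base[base].append(transfers)
    return {
        base: {
            "min": min(values),
            "median": median(values),
            "mean": mean(values),
            "max": max(values),
        }
        for base, values in per_base.items()
    }

totals = totals_by_year_and_base(records)
summary = summary_by_base(records)
```

The same five-number style summary per base is what a boxplot draws, so the output of `summary_by_base` is a quick way to sanity-check the chart.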
Project 3: Customer churn prediction
- An Exploratory Data Analysis (EDA) was performed to understand the data and to compare between population groups
- The two groups were also compared graphically: density plots and boxplots for continuous variables, bar plots otherwise
- Resampling and feature engineering were important because the dataset is imbalanced
- Finally, we used an XGBoost model for classification and tuned its hyperparameters with grid search to improve performance
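Two of the steps above, resampling an imbalanced dataset and grid search, can be sketched without any library. This is a minimal illustration under stated assumptions, not the project's code: random oversampling stands in for whatever resampling method was actually used, the data are toy values, and `toy_score` is a hypothetical placeholder for training an XGBoost model and returning a validation metric. The parameter names `max_depth` and `learning_rate` are only examples of hyperparameters one might tune.

```python
import random
from collections import Counter
from itertools import product

def random_oversample(rows, labels, seed=0):
    """Duplicate random minority-class rows until every class matches the largest one."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels

def grid_search(score_fn, param_grid):
    """Try every parameter combination and return the best one with its score."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Toy data: 8 customers who stayed (0) and 2 who churned (1).
X = [[i] for i in range(10)]
y = [0] * 8 + [1] * 2
X_bal, y_bal = random_oversample(X, y)
balanced_counts = Counter(y_bal)

# Stand-in scoring function (an assumption): a real run would train a model
# with these parameters and return a cross-validated metric instead.
def toy_score(params):
    return -abs(params["max_depth"] - 4) - abs(params["learning_rate"] - 0.1)

best_params, best_score = grid_search(
    toy_score, {"max_depth": [2, 4, 6], "learning_rate": [0.01, 0.1, 0.3]}
)
```

In practice the oversampling would be applied to the training split only, so that duplicated rows do not leak into the data used to score each parameter combination.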
Project 4: Bootcamp DPhi Gender Determination by Morphometry of Eyes
Problem Statement
The anthropometric analysis of the human face is an essential study for performing craniofacial plastic and reconstructive surgeries. Facial anthropometrics are affected by various factors such as age, gender, ethnicity, socioeconomic status, environment, and region.
Our objective is to build a model that takes an image of a patient’s eye and determines whether the patient is male or female.
For this problem we built a deep learning model using TensorFlow and Keras. It correctly labeled 92.4512% of the images in the competition and ranked 37th out of 91.
Project 5: Forecast of Ethereum using LSTM Neural Network
Problem Statement
The high variance of cryptocurrency prices is one of the main problems.
Our objective is to forecast this time series accurately.
For this problem we use a recurrent neural network, a powerful type of neural network designed to handle sequence dependence. The Long Short-Term Memory (LSTM) network is a type of recurrent neural network used in deep learning because very large architectures can be successfully trained.
We obtained a Root Mean Squared Error (RMSE) of 0.1; RMSE measures the difference between the predicted and the real values.
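To make the RMSE figure concrete, here is a small sketch of how a price series is usually turned into supervised samples for a sequence model and how RMSE is computed. The prices, the `look_back` window length, and the naive baseline are assumptions chosen for the example; they are not the project's data or its LSTM.

```python
import math

def make_windows(series, look_back):
    """Split a series into (input window, next value) pairs for supervised training."""
    inputs, targets = [], []
    for i in range(len(series) - look_back):
        inputs.append(series[i:i + look_back])
        targets.append(series[i + look_back])
    return inputs, targets

def rmse(actual, predicted):
    """Root Mean Squared Error between two equally long sequences."""
    if len(actual) != len(predicted):
        raise ValueError("sequences must have the same length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Made-up prices, not real Ethereum data.
prices = [10.0, 11.0, 12.0, 13.0, 14.0]
X, y = make_windows(prices, look_back=3)

# Naive baseline forecast: repeat the last value of each window.
naive = [window[-1] for window in X]
baseline_rmse = rmse(y, naive)
```

RMSE is expressed in the same units as the series it is computed on, so a value is only meaningful next to the scale of the data and a simple baseline like the one above.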
Project 6: Breast cancer detection using Random Forest
(recommended to open in Colab)
Breast cancer is a type of cancer that starts in the breast. Cancer starts when cells begin to grow out of control. Breast cancer cells usually form a tumor that can often be seen on an x-ray or felt as a lump. Breast cancer occurs almost entirely in women, but men can get breast cancer, too. A benign tumor is a tumor that does not invade its surrounding tissue or spread around the body. A malignant tumor is a tumor that may invade its surrounding tissue or spread around the body.
In this project we develop a random forest solution with an emphasis on SHAP (SHapley Additive exPlanations), a unified approach to explaining the output of any machine learning model. SHAP explanation
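The idea behind SHAP can be shown on a tiny example by computing exact Shapley values: each feature's contribution is its average marginal effect over all subsets of the other features. The sketch below is an illustration only. It uses a toy linear model with made-up weights, instance, and baseline rather than the project's random forest, and it enumerates every subset, which is feasible only for a handful of features; the shap library relies on much faster algorithms for tree models.

```python
from itertools import combinations
from math import factorial

def shapley_values(value_fn, n_features):
    """Exact Shapley values by enumerating every coalition of features.

    value_fn takes a frozenset of feature indices and returns the model
    output when only those features are "present".
    """
    players = range(n_features)
    phi = [0.0] * n_features
    for i in players:
        others = [j for j in players if j != i]
        for size in range(len(others) + 1):
            # Shapley weight for coalitions of this size.
            weight = (
                factorial(size) * factorial(n_features - size - 1)
                / factorial(n_features)
            )
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi

# Toy linear "model" (an assumption, not the project's random forest).
weights = [2.0, -1.0, 0.5]
x = [1.0, 3.0, 4.0]          # instance to explain
baseline = [0.0, 1.0, 2.0]   # reference values used for absent features

def model_output(present):
    """Model output with absent features replaced by their baseline value."""
    return sum(
        w * (x[j] if j in present else baseline[j])
        for j, w in enumerate(weights)
    )

phi = shapley_values(model_output, len(weights))
full = model_output(frozenset(range(3)))
empty = model_output(frozenset())
```

For a linear model each value reduces to the weight times the distance of the feature from its baseline, and the values always sum to the difference between the prediction for the instance and the baseline prediction. That additivity is what lets SHAP plots break a single prediction into per-feature contributions.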
Thank you for visiting :)