Projects Summary
Machine Learning
Tools and main topics covered in the different projects:
Feature Engineering: Data cleaning, feature preparation, feature selection (RFECV, correlation heatmaps) and feature engineering
Supervised Learning: LinearRegression (uni- and multivariate), Logistic Regression (simple and one-versus-all method), Neural Networks, Support Vector Machines, DecisionTree, RandomForest, ExtraTrees, Bagging, AdaBoost, GradientBoosting, VotingClassifier, GaussianNB, K-nearest Neighbors
Unsupervised Learning: K-means clustering, Principal Component Analysis
NLP: Sentiment analysis with Naive Bayes, regression with LinearRegression and RandomForestRegressor on bag-of-words features, topic modelling with Latent Dirichlet Allocation and Non-negative Matrix Factorisation (with TF-IDF), dimensionality reduction with truncated singular value decomposition (SVD)
Error Metrics: ROC AUC, Accuracy, Precision, Recall, F1, MCC (classification); R2, Explained Variance, MSE, MAE (regression)
Validation & Model Selection: Holdout, cross-validation, hyperparameter search with GridSearchCV
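
As an illustration of the classification metrics listed under Error Metrics, the sketch below computes accuracy, precision, recall, F1 and MCC from the confusion counts of a binary classifier. It is a minimal, standard-library-only example and not code taken from the projects; the function names and the toy labels are hypothetical. In practice, scikit-learn's `sklearn.metrics` module provides equivalents such as `precision_score`, `recall_score`, `f1_score` and `matthews_corrcoef`.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels, where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(y_true, y_pred):
    """Return accuracy, precision, recall, F1 and MCC as a dictionary."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "mcc": mcc}


# Toy labels (hypothetical): 3 true positives, 3 true negatives,
# 1 false positive and 1 false negative.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

MCC uses all four confusion counts, which is why it is often preferred over accuracy on imbalanced data.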
SQL & Web Scraping
Tools and main topics covered in the different projects:
Scraping: Data mining using APIs and web scraping
SQLite Basic: Clean, upload, join and aggregate
SQLite Advanced: Advanced joins and subqueries, split tables, create relations and indexes
DBDesigner: Design and create a database from scratch
Spark: Data analysis with Spark
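
The SQLite topics above (relations, indexes, joins, aggregation and subqueries) can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration on an in-memory database; the `authors` and `books` tables and their rows are hypothetical and do not come from the projects.

```python
import sqlite3

# In-memory database, so the example leaves no file behind.
conn = sqlite3.connect(":memory:")

# Two related tables plus an index on the foreign-key column.
conn.executescript("""
CREATE TABLE authors (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE books (
    id        INTEGER PRIMARY KEY,
    author_id INTEGER NOT NULL REFERENCES authors(id),
    title     TEXT NOT NULL,
    price     REAL NOT NULL
);
CREATE INDEX idx_books_author ON books(author_id);
""")

conn.executemany(
    "INSERT INTO authors (id, name) VALUES (?, ?)",
    [(1, "Ada"), (2, "Grace")],
)
conn.executemany(
    "INSERT INTO books (author_id, title, price) VALUES (?, ?, ?)",
    [(1, "A", 10.0), (1, "B", 20.0), (2, "C", 30.0)],
)

# Join and aggregate: number of books and average price per author.
rows = conn.execute("""
SELECT a.name, COUNT(b.id) AS n_books, AVG(b.price) AS avg_price
FROM authors AS a
JOIN books AS b ON b.author_id = a.id
GROUP BY a.id, a.name
ORDER BY a.name
""").fetchall()

# Subquery: titles priced above the overall average price.
expensive = conn.execute("""
SELECT title
FROM books
WHERE price > (SELECT AVG(price) FROM books)
ORDER BY title
""").fetchall()

conn.close()
print(rows)
print(expensive)
```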
Data Cleaning, Exploration and Visualisation
Tools and main topics covered in the different projects:
Matplotlib: Handling data with dictionaries and visualisation
Pandas: Data analysis, processing and visualisation
Seaborn: Advanced visualisation
Statistics: Hypothesis testing with Pearson's r and linear regression
Basemap: Visualising geographic data
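
To illustrate the Statistics entry above, the sketch below computes Pearson's r and a least-squares line for a small sample, then tests the null hypothesis of no correlation. It makes two assumptions that are not from the projects: the data are made up, and the p-value comes from a permutation test (swapped in here to keep the example standard-library-only) rather than the analytic p-value that `scipy.stats.pearsonr` reports. The function names are hypothetical.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def permutation_p_value(x, y, n_permutations=2000, seed=0):
    """Two-sided permutation p-value for H0: x and y are uncorrelated."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break any pairing between x and y
        if abs(pearson_r(x, shuffled)) >= observed:
            extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)


# Made-up sample with a nearly linear relationship.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

r = pearson_r(x, y)
slope, intercept = fit_line(x, y)
p_value = permutation_p_value(x, y)
print(f"r={r:.3f}, slope={slope:.3f}, intercept={intercept:.3f}, p={p_value:.4f}")
```

A small p-value indicates that a correlation this strong would rarely arise if the pairing between `x` and `y` were random.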

Data Science Portfolio by Bruno Henriques

explore data patterns, construct robust models, predict future trends

(based on DataQuest & DataCamp online courses)