Tools | Main topics covered in the different projects: |
---|---|
Feature Engineering: | Data cleaning, feature preparation (RFECV, heatmap, corr), selection and engineering |
Supervised Learning: | LinearRegression (uni- and multivariate), Logistic Regression (simple and one-vs-all method), Neural Networks, DecisionTree, RandomForest, ExtraTrees, Bagging, AdaBoost, GradientBoosting, VotingClassifier, GaussianNB, K-nearest Neighbors, Support Vector Machines |
Unsupervised Learning: | K-means clustering, Principal Component Analysis |
NLP: | Sentiment analysis with Naive Bayes, regression with LinearRegression and RandomForestRegressor on bag-of-words features, topic modelling with Latent Dirichlet Allocation and Non-negative Matrix Factorisation (with TF-IDF), dimensionality reduction with truncated singular value decomposition (SVD) |
Error Metrics: | Classification: ROC AUC, Accuracy, Precision, Recall, F1, MCC; Regression: R², Explained Variance, MSE, MAE |
Validation & Model Selection: | Holdout, cross-validation, hyperparameter search with GridSearchCV |
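
As a rough illustration of the classification metrics listed above, here is a minimal sketch that computes accuracy, precision, recall, F1 and MCC from 0/1 labels in plain Python. The function name `binary_metrics` and the sample labels are made up for this example. The projects themselves presumably rely on library implementations rather than hand-written ones.

```python
import math


def binary_metrics(y_true, y_pred):
    """Return accuracy, precision, recall, F1 and MCC for binary 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts.
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    # Matthews correlation coefficient; defined as 0 when the denominator is 0.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "mcc": mcc}


# Hypothetical labels, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

MCC uses all four cells of the confusion matrix, which is why it is often preferred over accuracy on imbalanced classes.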

Tools | Main topics covered in the different projects: |
---|---|
Scraping: | Data mining using APIs and web scraping |
SQLite Basic: | Clean, upload, join and aggregate |
SQLite Advanced: | Advanced joins and subqueries, split tables, create relations and indexes |
DBDesigner: | Design and create a database from scratch |
Spark: | Data analysis with Spark |
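
The SQLite topics above (creating relations and indexes, joining, aggregating) can be sketched with Python's built-in `sqlite3` module. The `authors` and `books` tables and their contents are invented for illustration and are not taken from the projects.

```python
import sqlite3

# In-memory database, so the example leaves no file behind.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Two related tables plus an index on the foreign-key column.
conn.executescript("""
CREATE TABLE authors (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE books (
    id        INTEGER PRIMARY KEY,
    author_id INTEGER NOT NULL REFERENCES authors(id),
    title     TEXT NOT NULL,
    price     REAL NOT NULL
);
CREATE INDEX idx_books_author ON books(author_id);
""")

conn.executemany(
    "INSERT INTO authors (id, name) VALUES (?, ?)",
    [(1, "Ada"), (2, "Grace")],
)
conn.executemany(
    "INSERT INTO books (author_id, title, price) VALUES (?, ?, ?)",
    [(1, "Book A", 10.0), (1, "Book B", 20.0), (2, "Book C", 30.0)],
)

# Join the tables and aggregate per author.
rows = conn.execute("""
SELECT a.name, COUNT(b.id) AS n_books, AVG(b.price) AS avg_price
FROM authors AS a
JOIN books AS b ON b.author_id = a.id
GROUP BY a.id, a.name
ORDER BY a.name
""").fetchall()
conn.close()

for name, n_books, avg_price in rows:
    print(name, n_books, avg_price)
```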

Tools | Main topics covered in the different projects: |
---|---|
Matplotlib: | Handling data with dictionaries and visualisation |
Pandas: | Data analysis, processing and visualisation |
Seaborn: | Advanced Visualisation with Seaborn |
Statistics: | Hypothesis testing with Pearson's r and linear regression |
Basemap: | Visualising geographic data |
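
To make the statistics entry above concrete, the following sketch computes Pearson's r and a least-squares line by hand. The helper name `pearson_and_fit` and the data points are hypothetical. For an actual hypothesis test a p-value is also needed, which `scipy.stats.pearsonr` returns alongside the coefficient.

```python
import math


def pearson_and_fit(x, y):
    """Return (r, slope, intercept) for paired samples x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products around the means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)

    r = sxy / math.sqrt(sxx * syy)       # Pearson correlation coefficient
    slope = sxy / sxx                    # least-squares slope
    intercept = mean_y - slope * mean_x  # least-squares intercept
    return r, slope, intercept


# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r, slope, intercept = pearson_and_fit(x, y)
print(f"r={r:.4f}, slope={slope:.2f}, intercept={intercept:.2f}")
```

The sketch assumes both samples have non-zero variance. A constant `x` or `y` would cause a division by zero.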

Explore data patterns, construct robust models, predict future trends
(based on DataQuest & DataCamp online courses)