This page collects most of the topics I've covered in a self-set curriculum as I study the field of data science (with a strong focus on machine learning). Bullets without a link are topics I plan to get to but won't write about in the immediate future. Links labeled "coming soon" are posts currently in progress.
Machine Learning
The General ML Framework
- Organizing machine learning projects: project management guidelines
- Building machine learning products: a problem well-defined is a problem half-solved.
- Preparing data for a machine learning model
- Feature selection
- Evaluating a machine learning model
- Hyperparameter tuning
- Learning from imbalanced data
- Effective testing for machine learning systems
- A simple solution for monitoring ML systems
- Building machine learning pipelines
Machine Learning Models
Classification
Classification algorithms are used when you have a dataset of observations and would like to use the features associated with each observation to predict its class.
Example: Predict the type of flower when provided with information on sepal length, sepal width, color, petal width, and petal length. (A minimal code sketch follows the list below.)
- Naive Bayes
- Logistic Regression
- Decision Trees
- K-Nearest Neighbors
- Support Vector Machines
- Random Forests
- Boosted Trees
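As a minimal sketch of the shared workflow, here is a logistic regression classifier fit on scikit-learn's built-in iris dataset; the choice of model and dataset is mine, purely for illustration, and any of the algorithms listed above exposes the same fit/predict interface.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# The classic iris dataset: sepal/petal measurements and a flower class label
X, y = load_iris(return_X_y=True)

# Hold out a test set so we can evaluate on observations the model hasn't seen
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# Fit a logistic regression classifier (any algorithm listed above would slot in here)
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

# Predict the class of unseen flowers and check accuracy
print(accuracy_score(y_test, clf.predict(X_test)))
```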
Regression
Regression algorithms are used when you have a dataset of observations and would like to use the features to predict a continuous output.
Example: Predict the price of a house using the following features: square footage, number of rooms, zip code, age of house, and school district. (A minimal code sketch follows the list below.)
- Linear Regression
- Polynomial Regression
- Decision Trees
- K-Nearest Neighbors
- Random Forests
- Boosted Trees
- Gaussian Process Regression
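Below is a minimal sketch using synthetic data standing in for a few of the house features above; the feature ranges and price formula are made up for illustration, and the regressors listed above all follow the same fit/predict pattern in scikit-learn.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

# Synthetic data standing in for house features: [square footage, number of rooms, age of house]
rng = np.random.default_rng(0)
X = rng.uniform([500, 1, 0], [4000, 8, 100], size=(200, 3))

# Price as a noisy linear function of the features (purely illustrative)
y = 150 * X[:, 0] + 10_000 * X[:, 1] - 500 * X[:, 2] + rng.normal(0, 20_000, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Fit a linear regression; any regressor listed above exposes the same fit/predict API
reg = LinearRegression().fit(X_train, y_train)
print(mean_absolute_error(y_test, reg.predict(X_test)))
```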
Clustering
Clustering is a popular technique for finding groups, or segments, of similar observations in your data. It is an unsupervised learning approach: rather than training the algorithm on labeled examples of what you'd like it to do, you let it explore the data and surface structure on its own. (A minimal code sketch follows the list below.)
- K-means clustering
- Soft clustering with Gaussian mixture models
- Density-based spatial clustering of applications with noise (DBSCAN)
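As a minimal sketch, here is K-means applied to synthetic blob data; the dataset is generated here purely for illustration. Note that we never pass labels to the algorithm, yet it recovers the segments on its own.

```python
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

# Generate unlabeled data with three natural groupings
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

# K-means only sees the features; it discovers the segments on its own
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

print(labels[:10])              # cluster assignment for the first few points
print(kmeans.cluster_centers_)  # learned segment centers
```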
Dimensionality Reduction
When we're building machine learning models, we sometimes deal with datasets that have well over 1,000 or even 10,000 dimensions. While this allows us to account for many features, those features are often redundant, and the curse of dimensionality makes it harder to separate the true signal from the noise. Dimensionality reduction techniques shrink the feature space while preserving as much information as possible; they are also very convenient for visualizing higher-dimensional datasets in two or three dimensions. This paper provides a great overview of the different techniques available for dimensionality reduction.
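As a minimal sketch, here is PCA (one common technique) projecting scikit-learn's 64-dimensional digits dataset down to two dimensions for visualization; the dataset choice is mine, purely for illustration.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

# 8x8 digit images flattened into 64-dimensional feature vectors
X, _ = load_digits(return_X_y=True)

# Project onto the two directions of maximum variance for visualization
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

print(X.shape, "->", X_2d.shape)
print(pca.explained_variance_ratio_)  # share of variance each component retains
```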
Neural Networks
Neural networks are one of the most popular approaches to machine learning today, achieving impressive performance on a wide variety of tasks. Often described as universal function approximators, they are flexible enough to be applied to nearly any learning problem. (A minimal code sketch follows the list below.)
- Foundation
- Training
- Convolutional neural networks
- Introduction to convolutional neural networks
- Common architectures in convolutional neural networks
- Image segmentation
- Semantic image segmentation
- Instance image segmentation
- Evaluating image segmentation models
- Object detection
- One-stage methods: YOLO and SSD
- Two-stage methods: Faster R-CNN
- Evaluating object detection models
- Facial recognition
- Recurrent neural networks
- Introduction to recurrent neural networks
- Gated recurrent units: Introducing intentional memory
- Long short-term memory networks: Learning what to remember and what to forget
- Attention mechanisms
- Transformer networks
- Transfer learning
- Image recognition
- Natural language processing
- One-shot learning
- Siamese networks
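As a minimal sketch of the feed-forward idea underlying all of the architectures above, here is a small multi-layer perceptron trained with scikit-learn; the layer sizes and dataset are arbitrary choices for illustration.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Handwritten digits: 64 input features, 10 output classes
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# A small feed-forward network with two hidden layers; backpropagation handles training
net = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0)
net.fit(X_train, y_train)

print(net.score(X_test, y_test))  # accuracy on held-out digits
```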
Reinforcement Learning
Reinforcement learning is an approach to machine learning in which an agent learns to accomplish a task by interacting with an environment and receiving rewards. "Good" behavior is reinforced via the reward signal, so the approach is more accurately described as a method of reward maximization. This book is the canonical resource for learning RL. (A minimal code sketch follows the list below.)
- Overview of reinforcement learning
- Planning in a stochastic environment
- Learning in a stochastic environment
- Implementations of Monte Carlo and temporal difference learning methods
- Generalizing value functions for large state-spaces
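As a minimal sketch of the temporal difference methods referenced above, here is tabular Q-learning on a toy "corridor" environment I've made up for illustration: the agent is rewarded only for reaching the goal state and learns its action values from that reward signal.

```python
import numpy as np

# A tiny deterministic "corridor" environment: states 0..4, actions 0 (left) / 1 (right).
# The agent starts at state 0 and receives a reward of +1 only upon reaching state 4.
N_STATES, N_ACTIONS, GOAL = 5, 2, 4

def step(state, action):
    next_state = min(state + 1, GOAL) if action == 1 else max(state - 1, 0)
    reward = 1.0 if next_state == GOAL else 0.0
    done = next_state == GOAL
    return next_state, reward, done

# Tabular Q-learning: learn action values from sampled experience
Q = np.zeros((N_STATES, N_ACTIONS))
alpha, gamma, epsilon = 0.1, 0.9, 0.2
rng = np.random.default_rng(0)

for episode in range(200):
    state, done = 0, False
    while not done:
        # Epsilon-greedy exploration: mostly exploit, occasionally try a random action
        action = rng.integers(N_ACTIONS) if rng.random() < epsilon else int(np.argmax(Q[state]))
        next_state, reward, done = step(state, action)
        # Temporal-difference update toward the reward plus discounted future value
        Q[state, action] += alpha * (reward + gamma * np.max(Q[next_state]) - Q[state, action])
        state = next_state

print(Q)  # learned values: moving right, toward the goal, dominates in every state
```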
Machine Learning Applications
- Building a recommendation system with collaborative filtering
Natural Language Processing
- Preprocessing text data for NLP
- TF-IDF Vectorization
Data Visualization
The following are external links to useful resources. At this time, I haven't written any blog posts on data visualization, but I wanted to save a few external posts for future reference.
- Effectively Using Matplotlib - Chris Moffitt
- Visualization with Matplotlib - Jake VanderPlas
- Fundamentals of Data Visualization