This project involves extracting new Spotify releases via Spotify’s Web API, transforming, and then loading them into a PostgreSQL database. The entire pipeline was orchestrated by Airflow on Docker to run on a daily schedule.
Check out the repo here: https://github.com/samuel-lwl/spotify_etl_public.
In this project, I explored the Huggingface framework and fine-tuned a pre-trained distilBERT model to outperform the default Huggingface sentiment classification model on the task of sentiment analysis of tweets. The dataset used was the Sentiment140 dataset. I also trained a bi-LSTM model in TensorFlow for comparison.
Lastly, I scraped twitter for recent tweets related to Elon Musk and used the fine-tuned transformer to predict their sentiments.
In this project, I explored and visualised public housing resale prices using Plotly from 2017 to 2021 to learn about recent market trends - specifically, the impact of COVID on the market. I also analysed possible factors affecting resale prices.
Supervised learning models were then trained on past data using Scikit-learn and PyTorch and tested on newer unseen transactions, achieving results less than 10% away from the true values on average. Lastly, unsupervised learning models were used to identify interesting patterns in the data by clustering transactions to form house segments.
In this short project, I attempted to detect outliers in product names using a combination of word processing techniques and a couple of unsupervised learning models with varying success.
This project involves predicting whether or not a customer is likely to leave a company. Several supervised learning models were trained and compared on their results, with the best model obtaining 80% accuracy.