Wenxian Fei - Software Engineering & Data Science Portfolio

Recommendation System for Yelp Based on Text Mining

Collaborative Filtering / Latent Dirichlet Allocation: Improving rating prediction accuracy by extracting users' preferences along multiple dimensions from their review text.

Automation of Market Segmentation and User Personas

Based on large-scale user app installation data, we built an automated machine learning process for T-Mobile to segment its users and create persona labels, with the goal of improving its mobile advertising services. We conducted social network analysis to cluster users, applied topic modeling (LDA) to apps, and extracted distinctive keywords for each community with linear sum assignment to build user personas. The personas show multi-dimensional characteristics of the mobile audience in each community, such as popular apps, genres, and topics.
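One plausible reading of the linear sum assignment step is matching each user community to a distinct LDA topic so that the total community-topic affinity is maximized; the matched topic's top words then serve as that community's persona keywords. The sketch below assumes that interpretation. The affinity matrix, the function name `assign_topics`, and the keyword lists are hypothetical, and it solves the assignment exactly by brute force, which only suits a handful of communities (`scipy.optimize.linear_sum_assignment` with `maximize=True` solves the same problem efficiently at scale).

```python
from itertools import permutations


def assign_topics(affinity):
    """Give each community a distinct topic, maximizing total affinity.

    affinity[i][j] is a hypothetical score of how strongly LDA topic j is
    represented among the apps installed by community i. Assumes there are
    at least as many topics as communities.
    Returns ({community: topic}, total_score).
    """
    n_communities = len(affinity)
    n_topics = len(affinity[0])
    best, best_score = None, float("-inf")
    # Enumerate every injective community -> topic mapping (exact, O(n!)).
    for topics in permutations(range(n_topics), n_communities):
        score = sum(affinity[i][t] for i, t in enumerate(topics))
        if score > best_score:
            best, best_score = topics, score
    return dict(enumerate(best)), best_score


if __name__ == "__main__":
    # Hypothetical community x topic affinities and topic keywords.
    affinity = [
        [0.1, 0.7, 0.2],
        [0.6, 0.3, 0.1],
        [0.2, 0.2, 0.6],
    ]
    topic_keywords = [["fitness", "health"], ["games", "puzzle"], ["finance", "bank"]]
    assignment, score = assign_topics(affinity)
    personas = {c: topic_keywords[t] for c, t in assignment.items()}
    print(assignment, round(score, 2), personas)
```

Because each topic can be used only once, two communities cannot end up with the same persona keywords, which is what makes the extracted keywords distinctive rather than just the globally most popular ones.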

Sparkify Data Pipeline and User Retention Analysis Based on AWS and Spark

Built a data warehouse for Sparkify user activity logs on Redshift, configured Airflow for scheduling, used Spark to predict user retention, and connected the results to a Tableau dashboard.

E-Commerce Retailer Sales Business Intelligence System Design and Implementation

Dimensional Model & Tableau: Designed a sales BI system for retailer XYZ, covering dimensional model design, automated staging from source tables to target tables, data cleaning and preprocessing in SSIS/SSMS, and Tableau dashboard design and business analysis.

Airbnb Housing Price Prediction

Ridge Regression / Random Forest: Providing homeowners with an estimate of an appropriate Airbnb rental rate.
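As an illustration of the ridge regression half of this project, the sketch below fits ridge regression in closed form, w = (XᵀX + λI)⁻¹Xᵀy on centered data, so the intercept is left unpenalized. It is a generic textbook version, not the project's code; the function name `fit_ridge` and the toy data are hypothetical, and in practice the columns of X would be listing features such as size or location.

```python
import numpy as np


def fit_ridge(X, y, lam=1.0):
    """Closed-form ridge regression with an unpenalized intercept.

    X: (n_samples, n_features) feature matrix, y: (n_samples,) targets
    (e.g. hypothetical listing features and rental prices).
    Returns (weights, intercept).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Center features and target so the L2 penalty does not shrink the intercept.
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    n_features = X.shape[1]
    # Solve (Xc^T Xc + lam * I) w = Xc^T yc instead of forming an explicit inverse.
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(n_features), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b


if __name__ == "__main__":
    X = [[1], [2], [3], [4]]
    y = [2, 4, 6, 8]
    print(fit_ridge(X, y, lam=0.0))  # ordinary least squares: slope 2, intercept 0
    print(fit_ridge(X, y, lam=5.0))  # penalty shrinks the slope to 1, intercept 2.5
```

Increasing `lam` trades a little bias for lower variance, which helps when listing features are numerous or strongly correlated; a random forest, the other model named above, would instead capture nonlinear interactions without this penalty.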

One-Month Data Challenge

Cracking data challenges within one month! Topics include fraud detection, user segmentation, A/B testing, clustering-based recommendation systems, and more.

Dognition User Retention Data Visualization and Business Analysis

Tableau: Visualized user activity tracking data for Dognition to analyze the company's marketing strategy.

Customer Churn Rate Prediction in Telecommunication Industry

Logistic Regression / Random Forest / KNN: Identifying customers who are likely to stop using the service, and analyzing the top factors that influence user retention.

San Francisco Crime Rate Analysis and Modelling Based on Spark

This project analyzes crime rates in San Francisco to provide travel recommendations. The raw data comes from the official SF government website.

D3.js Data Visualization: Seattle Airbnb Rental Analysis

Using HTML/D3.js to build a heatmap of popular ratings.