Jiayu (Alice) Wu 伍佳昱
I currently work as a Machine Learning Research Scientist at Stratifyd Inc., working with Dr. Liu on natural language modeling for an AI-powered business intelligence platform that provides automated real-time customer analysis for business owners. Prior to that, I graduated from UCLA with an MS degree in Statistics, supervised by Professor Y. Wu. I also hold Bachelor's degrees in Economics and in Arts from Xiamen University.
I am particularly interested in structured representation learning via generative learning, attention mechanisms, graph networks, etc., to facilitate the understanding of often unbalanced and noisy industrial data. My ongoing project is on semi-supervised Extreme Label Classification on text data.
E-mail: wujiayu86@outlook.com / jiayuwu@ucla.edu
GitHub: https://github.com/Alice86
LinkedIn: https://www.linkedin.com/in/alice-wu-170146139/
Education
09/2017 - 06/2019 Master of Science in Statistics $\quad$ GPA: 3.96 $\quad$ UCLA
Thesis: Leveraging Label Information in Representation Learning for Multi-label Text Classification
· Introduce two designs of label-enhanced representation learning, the Label-embedding Attention Model (LEAM) and the Conditional Variational Document Model (CVDM), with applications to real-world datasets
· Use the annotated labels both as auxiliary information to guide the learning of task-tangent data representations and as supervision in the classification stage
09/2014 - 07/2017 Bachelor of Economics in Statistics $\quad$ GPA: 3.91 $\quad$ Rank: 1/21 $\quad$ XMU-WISE
Thesis: Provincial Development of Chinese E-commerce with Related Factors (中国省域电子商务发展规模及其影响因素)
· Analyzed e-commerce development across 31 regions of China with 12 features, using PCA and regression
09/2013 - 07/2017 Bachelor of Arts in English $\quad$ GPA: 3.84 $\quad$ Rank: 5/99 $\quad$ XMU-CFLC
Thesis: Pauses in Chinese-English Consecutive Interpreting by Novice Learners (口译初学者汉英交替传译中的停顿)
· Analyzed features of typical pauses, using 244 pauses observed in simulated interpreting experiments, to discuss major challenges for interpreting learners
Working Experience
10/2018-07/2019 Stratifyd Inc., R&D Team, Machine Learning Research Intern
07/2019-present Stratifyd Inc., R&D Team, Machine Learning Research Scientist
• Design and deploy machine learning models for an AI-powered business intelligence platform that provides automated real-time customer analysis for business owners [Python, PyTorch, Gensim, spaCy]
• Develop a scalable and generalizable Extreme Label Learning (XML) framework, consisting of label-aware data representation, hierarchical label representation and parallelizable neural network classifiers, for the Semi-auto Taxonomy feature, improving F1 by around 0.1 and the coverage rate by around 0.06
• Investigate various model designs, including Attention, Generative Learning, Graph Network, etc., for effective document representations relevant to downstream classification tasks, leveraging label information and label relational structure to facilitate the understanding of unbalanced and noisy data
• Contribute to weekly team discussions and journal readings with the research and development team
06/2016-07/2016 ICBC Bills Discounting, Information Management Dept., Shanghai
• Wrote market and industry analyses for internal briefings and national publications [R, Excel], e.g.:
Jiayu Wu. Analyses on the Background and Impact of Establishment of the National Commercial Paper Exchange [J]. China Urban Finance, 2016(8): 28-31
• Performed data cleaning and feature selection on financial and industrial data for database maintenance and analysis
08/2015-09/2015 CCPIT Xiamen (CIFIT Committee), International Relations Dept., Xiamen
· Assisted in organizing the 14th World Business Leader Roundtable on cross-border e-commerce
· Contacted and arranged reception for VIPs from Alibaba, Amazon, Google, PayPal, OTTO, etc.
· Led and trained a group of 12 volunteers for reception and liaison of guests
Research Projects
Learning 3D Grid Cell Patterns as Vector Representation of Self-position Feb. 2019
• Learned hexagonal 3D grid patterns for navigation with only generic assumptions about the algebra and geometry of the representational scheme for position and motion [TensorFlow]
• Proofread the paper and contributed the above experimental results to the appendix of the ICLR 2019 paper by Gao et al., ‘Learning grid cells as vector representation of self-position with matrix representation of self-motion’
Machine Learning with Application on Vision and Cognition Oct. 2018
• Implement machine learning algorithms such as boosting, SVM, PCA, neural nets and MCMC [Python, R]
• Replicate state-of-the-art deep learning methods including ConvNet, ResNet, VAE, DCGAN and ZSL [TensorFlow]
Link to GitHub Directory
Gender Difference in Movie Genre Preferences – Factor Analysis (FA) on Ordinal Data May 2018
• Examine methodology for nonlinear FA, including polychoric FA and optimal scaling by homogeneity analysis
• Apply nonlinear FA to ordinal survey data with 10 features and 955 observations, and identify two latent factors, preference for the storyline and preference for the scene, to discuss gender difference [R]
Link to Project Report
PCR Quantification with a State Space Model Estimated by the EM Algorithm, June 2018
· Built a Gaussian state space model with time-dependent variance for PCR quantification based on branching theory
· Derived the EM algorithm for model estimation and implemented it in R on simulated and real data
Research on Neyman-Pearson (NP) Classification Algorithms (group project), Mar. 2018
· Discussed three approaches to control type I error for binary classification with simulation experiments
· Applied AdaBoost with the NP umbrella algorithm to predict whether a customer will respond to a promotion plan
· Reduced type I error by 0.3 and improved testing accuracy by 0.2 on 22,400 observations with 6 features
Behavioral Patterns of Sina Microblog On-line Celebrities (group work), May 2016
· Analyzed behavioral patterns and popularity using 7,000 microblog posts by 50 on-line celebrities
· Identified peak hours, optimal post length, and correlations of popularity with originality and text sentiment
Opening Strategy of Gated Neighborhood Based on Urban Traffic Network Evaluating Index (CUMCM Contest, Problem B: the opening of gated neighborhood and its influence on traffic capacity), Oct. 2016, with Hui Ma, Sijia Li.
· Built a performance evaluation model with capacity and efficiency indexes based on graph theory
· Analyzed system changes with a traffic assignment model to evaluate performance [MATLAB]
Skills
Programming: Python (PyTorch, TensorFlow, scikit-learn, Matplotlib), R (ggplot2, Rcpp), MySQL, MATLAB
Statistical Modeling: generalized linear models, latent variable models, multidimensional scaling, etc.
Machine Learning: regularization, kernel machines, boosting, neural variational inference, MCMC, etc.
Publications
· Jiayu Wu. Analyses on the Background and Impact of Establishment of the National Paper Exchange [J]. China Urban Finance, 2016(8): 28-31
Link to online magazine
· Jiayu Wu, Xiaozheng Wang. Discussion on Unified Nationwide Bill Market - from the Perspective of Establishment of the National Paper Exchange [J]. Journal of Shanghai Finance University, 2016(5):67-72.
Awards
CUMCM 1st Prize of Fujian Division (2016)
National Scholarship (Ministry of Education, 2016)
1st Prize Academic Scholarship (XMU, 2015, 2014)
Excellent Graduate (XMU, 2017)
Excellent Merit Student (XMU, 2016, 2015, 2014)
Dean’s List Awards (WISE, 2016, 2015)
Dean’s List Awards (CFLC, 2016)
Courses
Pattern Recognition and Machine Learning (A+)
Nonparametric Function Estimation and Modeling (A+)
Matrix Algebra and Optimization (A+)
Statistical Modeling and Learning in Vision and Cognition (A)
Monte Carlo Methods for Optimization (A)
Statistical Modeling and Learning (A)
Applied Probability (A)
Research Design, Sampling, and Analysis (A)
Certifications
SQL - Stanford Online $\qquad$ License: d1a297cf388d47b891e58065cc23c9ee $\qquad$ Aug. 2017
R Programming - Coursera $\qquad$ License: DZ4XS8HX3SKK $\qquad$ Aug. 2016