# Lanfeng Pan

PhD in Statistics

I graduated from the Department of Statistics at Iowa State University, where I was advised by Dr. Yehua Li. My research interests include clustering, missing data analysis, and multiple testing.

I completed my Master's and Bachelor's degrees at Renmin University of China. My Master's advisor was Dr. Xiaoling Lu, and my research at that time focused on data mining and matrix factorization.

The programming languages I use most are R and Julia. I have used R for 10 years and Julia for 6 years. I use R for plotting, reporting, and small projects; when I need to do heavy computation, I turn to Julia for its higher performance.

## Education

• Ph.D. in Statistics, Iowa State University, 2012 – 2017.
• Master's in Statistics, Renmin University of China, 2010 – 2012.
• Bachelor's in Statistics, Renmin University of China, 2006 – 2010.

## Skills

• 10 years of experience with R
• 6 years of experience with Julia
• 6 years of experience with Linux shell, git, and PBS (Portable Batch System)

## Awards

• First Place in the 15th Annual Data Mining Cup out of 125 teams from 28 countries, May 2014.

Predicted the probability of item returns from customer and item information in an online shopping problem. I was in charge of tuning the C5.0 algorithm, which gave the best performance. See details and news reports at

## Papers

• PAN, L., LI, Y., HE, K., LI, Y. and LI, Y. (2018). Latent Gaussian Mixture Models For Nationwide Kidney Transplant Center Evaluation. (Submitted, under review). arXiv:1703.03753.

Assessed the service quality of nationwide kidney transplant facilities and provided important guidance for policymakers. To guarantee a fair comparison between facilities with very different numbers of patients, we proposed modeling the facilities as random effects with a Gaussian mixture distribution. Our model avoids estimating a separate variance for each facility, so it is more stable than fixed-effect models. Furthermore, we compared facilities directly based on their effects, so our identification rule is superior to those based on p-values.

• LU, X., SI, J., PAN, L. and ZHAO, Y. (2011). Imputation of missing data using ensemble algorithms. 2011 Eighth International Conference on Fuzzy Systems and Knowledge Discovery, Shanghai, pp. 1312–1315.

## Work experience

• Research Scientist, Eli Lilly and Company, 2017 – present.

• Research Assistant, 2014 – 2017.

• Intern at Novartis Pharmaceuticals, NJ, May 2015 – August 2015.

Project 1: Built a Shiny-based user interface for data analysis and visualization on a remote server. Project 2: Modeled decades of labor investment in pharmaceutical projects to predict future labor investment, and built an interactive visualization app for these data.

• Agriculture Experiment Station Consulting Group, May 2014 – July 2014.

• Teaching Assistant, August 2012 – May 2014.

## Presentation

• PAN, L., LI, Y., HE, K., LI, Y. and LI, Y. (2015). Generalized Linear Mixed Model with Normal Mixture Random Effects. Joint Statistical Meetings. ASA. Seattle, WA, USA, Aug. 2015.

## Research Interests

• Data Mining
• Multiple Testing: False discovery rate control, graphical testing.
• Clustering: Normal mixture models, subgroup analysis
• Missing Data: Inverse probability weighting, fractional imputation
• Nonparametrics
• Data Visualization

## Software Packages

Implements kernel density estimation and kernel regression. In particular, the package handles bounded kernel estimation using beta and gamma kernels and can choose the bandwidth via cross-validation.
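The beta-kernel idea can be sketched in a few lines. This is a minimal Python illustration, not the package's own code or API: it assumes the data lie strictly inside (0, 1), uses a common form of the beta kernel in which the kernel at evaluation point x is the Beta(x/h + 1, (1 − x)/h + 1) density, and picks the bandwidth h by leave-one-out likelihood cross-validation, which is only one of several cross-validation criteria a package might use. All function names here are hypothetical, and the gamma kernel is not shown.

```python
import math

def beta_pdf(t, a, b):
    """Density of Beta(a, b) at t in (0, 1), computed on the log scale for stability."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(t) + (b - 1) * math.log(1 - t))

def beta_kde(x, data, h):
    """Beta-kernel density estimate at x in [0, 1] with bandwidth h (hypothetical helper).

    The kernel shape changes with x, so it never places mass outside [0, 1].
    """
    a = x / h + 1
    b = (1 - x) / h + 1
    return sum(beta_pdf(t, a, b) for t in data) / len(data)

def loo_log_likelihood(data, h):
    """Leave-one-out log-likelihood: score each point with the estimate built from the others."""
    total = 0.0
    for i, xi in enumerate(data):
        others = data[:i] + data[i + 1:]
        total += math.log(beta_kde(xi, others, h))
    return total

def choose_bandwidth(data, grid):
    """Pick the bandwidth in `grid` that maximizes the leave-one-out log-likelihood."""
    return max(grid, key=lambda h: loo_log_likelihood(data, h))

if __name__ == "__main__":
    sample = [0.05, 0.1, 0.12, 0.3, 0.35, 0.5, 0.8, 0.9]
    h = choose_bandwidth(sample, [0.02, 0.05, 0.1, 0.2, 0.5])
    print("chosen bandwidth:", h)
    print("density at 0.1:", beta_kde(0.1, sample, h))
```

Because each beta kernel is supported on [0, 1], the estimate puts no mass outside the data's range, which avoids the boundary leakage a symmetric Gaussian kernel would have near 0 and 1.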

• LatentGaussianMixtureModel.jl

Fits a generalized linear mixed model with Gaussian mixture random effects and selects the number of mixture components. It can further conduct a multiple test to detect heterogeneity while controlling the false discovery rate.

Implements many useful and handy R functions in Julia. The purpose is to provide better statistical functions for the Julia language and to make it easy to translate R code into Julia.

Implements the Kasahara-Shimotsu test to decide the number of components in a Gaussian mixture model.

An R package that solves the nonnegative matrix factorization problem using coordinate descent.
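Coordinate descent for nonnegative matrix factorization can be sketched as follows. This is a Python illustration, not the R package's implementation: it assumes the squared Frobenius loss ||V − WH||², and it updates one entry of H at a time by setting it to the minimizer of that one-variable quadratic, clipped at zero. W is updated the same way by applying the routine to the transposed problem V^T ≈ H^T W^T. Function names such as `nmf_cd` are hypothetical.

```python
import random

def matmul(A, B):
    """Plain list-of-lists matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def frob_loss(V, W, H):
    """Squared Frobenius norm of V - WH."""
    WH = matmul(W, H)
    return sum((V[i][j] - WH[i][j]) ** 2 for i in range(len(V)) for j in range(len(V[0])))

def update_H(V, W, H):
    """One coordinate-descent sweep over the entries of H with W held fixed (in place)."""
    WtW = matmul(transpose(W), W)  # r x r
    WtV = matmul(transpose(W), V)  # r x n
    r, n = len(H), len(H[0])
    for j in range(n):
        for k in range(r):
            if WtW[k][k] == 0:
                continue  # column k of W is zero; H[k][j] does not affect the loss
            # Partial derivative of 0.5 * ||V - WH||^2 with respect to H[k][j]
            grad = sum(WtW[k][l] * H[l][j] for l in range(r)) - WtV[k][j]
            # Exact minimizer of the 1-D quadratic, projected onto H[k][j] >= 0
            H[k][j] = max(0.0, H[k][j] - grad / WtW[k][k])
    return H

def nmf_cd(V, r, iters=200, seed=0):
    """Factor a nonnegative m x n matrix V into W (m x r) and H (r x n)."""
    rng = random.Random(seed)
    m, n = len(V), len(V[0])
    W = [[rng.random() for _ in range(r)] for _ in range(m)]
    H = [[rng.random() for _ in range(n)] for _ in range(r)]
    for _ in range(iters):
        H = update_H(V, W, H)
        # Update W by treating W^T as the "H" of the transposed problem
        W = transpose(update_H(transpose(V), transpose(H), transpose(W)))
    return W, H

if __name__ == "__main__":
    V = [[1.0, 0.5, 2.0], [0.2, 3.0, 1.0], [2.0, 1.0, 0.0], [0.5, 0.5, 0.5]]
    W, H = nmf_cd(V, r=2)
    print("final loss:", frob_loss(V, W, H))
```

Because every coordinate update exactly minimizes the loss in that one variable, the objective cannot increase from one sweep to the next, which is a main reason coordinate descent is attractive for this problem.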

Ports the Yeppp! library to Julia, significantly speeding up several basic arithmetic operations.