Course Projects


Functional Data Analysis (FDA)
Three separate projects were completed on functional data. Project 1 focuses on fitting data with basis functions and on cross-validation. Project 2 covers dimension reduction through functional PCA (fPCA) and dynamic fPCA. Project 3 covers scalar-on-function and function-on-function regression and Gaussian processes.

Coefficient estimates with confidence intervals (left) and error-covariance matrix (right) for the functional ANOVA model in project 3.

Categorical Data Analysis (CDA)
The final project for CDA focused mainly on model fitting (logit-based, probit, and Bayesian models), estimation, interpretation of coefficients, and goodness of fit. Problems on several grouped data sets were solved in R. See the final project.

Predicted probability of belief in heaven (left) and probability distribution for each category (right).

Social Network Analysis (SNA)
The final project for social network analysis focuses on calculating and interpreting network-based measures, community detection, and visualization for GitHub users. It also contains a simulation step for random graph, small-world, and Barabási-Albert (BA) models. In addition, it demonstrates data manipulation from SQL server to Python and R. For more details, please refer to the final project.

Community detection results from the Louvain algorithm (left) and the fast greedy algorithm (right).

Multivariate Data Analysis
This final project focuses on the disease prevalence and control dataset, applying principal component analysis, factor analysis, and cluster analysis. Multidimensional scaling is also carried out to study the similarities between groups. For more, please refer to the final project.

Radar plot of disease prevalence for four representative areas (left), disease prevalence of obesity (middle), and clustering result (right).



Course Presentations


Selected topics in statistics and probabilities
For the final project, my teammate and I reviewed the basic concepts and the non-normal approximation for unbounded exchangeable pairs, mainly based on Shao and Zhang (2019). For more details, refer to the final presentation.

High dimensional statistics
The presentation was group work. I completed the part running from computational approaches to the square-root Lasso (convex and non-convex; Bunea et al., 2011; Shen et al., 2016) to other aspects of the square-root Lasso (the grouped square-root Lasso). For more details, refer to the final presentation.

Social network analysis
For the final presentation I completed a review of tricks for econometrics under the big data paradigm, based on the work of Hal Varian (2014). It discusses algorithmic problems (overfitting, multicollinearity), classification, longitudinal data analysis, and modeling uncertainty; see the final presentation.



Undergraduate Thesis


Statistical analysis of the giant panda's behavioral study
The background of this work comes from historical records of pandas at the China Conservation and Research Center for the Giant Panda, where I volunteered.

Photos taken during volunteering activity

My undergraduate thesis focuses on behavioral research on the giant panda through statistical analysis. To study similarity and common behavior patterns among individuals, the research applied time series analysis, hypothesis testing, factor analysis, clustering, and time series clustering. Evident behavioral patterns related to tense and oestrus behavior were found among the pandas, and appropriate interpretations are made and compared. For the final presentation click here.

Left: the spectral clustering result is similar to multidimensional scaling (MDS) in grouping pandas. Right: time series clustering result for rest behavior using GAK distance; two prototypes are found. One type rests steadily over the year, while the other rests more during winter.


Contact


Email
marcelineliu(at)hotmail(dot)com

Address
1088 Xueyuan Avenue, Shenzhen 518055, P.R. China

© 2021 Marceline Liu.
Last Modified in October 2021.