I am a senior undergraduate student in the School of Mathematical Sciences (SMS) at Peking University (PKU), majoring in Probability and Statistics. My GPA is 3.92/4, ranking 1st out of 217 in SMS, with an average score above 95 in core courses, and I have a solid mathematical foundation, especially in Statistics. My research interests include Theoretical Reinforcement Learning, Machine Learning, and their real-world applications such as Genomics; I am interested in both theoretical and practical problems. I plan to pursue a PhD in the U.S. At PKU I am supervised by Prof. Hao Ge at BICMR, with whom I did research on RNA Velocity and its applications; we are now working on Maternal Cell Contamination Correction for PGT-M and PGT-A datasets. In the summer and autumn of 2021, I was supervised by Prof. Mengdi Wang for a research internship, during which we studied the statistical optimality of an FQE-style Policy Gradient estimator; we are now working on Off-Policy Evaluation (OPE) with general function approximation.
Email  /  English: TOEFL 111 (R29, L30, S25, W27); GRE 325 (V156, Q169, AW3.5)  /  Coding: Python, R, Matlab
Off-Policy Evaluation (OPE) estimates the cumulative reward of a given target policy using data generated by a possibly unknown behavior policy. Among various algorithms, we dig into Fitted Q Evaluation (FQE), which iteratively estimates Q functions by solving a regression problem that can be cast as empirical loss minimization over a given function class. For simplicity of proof, previous work typically considered either the tabular case, where both the state space and the action space are finite, or linear function approximation, where the function class is the linear span of some feature map. General function classes have been considered, but mostly in a non-parametric way. We focus on parametric differentiable function classes and aim to extend as many results from the linear case as possible to more general functions, most importantly certain neural networks. Assuming some smoothness conditions on this function class, we prove that the error of the FQE estimator is asymptotically normal with an explicit variance expression, of which the variance in the linear case is a special case. We prove two finite-sample upper bounds on the estimation error: the first is variance-aware, tighter, and depends on the asymptotic variance; the other is a worst-case, reward-free bound whose dominating term depends heavily on a restricted chi-square divergence over a certain function subset. We further propose two bootstrapped FQE estimators with general parametric function approximation; the bootstrap error is asymptotically normal, and bootstrapping allows us to establish asymptotic confidence intervals for the policy value. Finally, we prove that the variance of our estimator matches the Cramer-Rao lower bound, which implies that our estimator is asymptotically efficient.
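To make the iteration concrete, below is the standard FQE template that the regression step above refers to (a schematic sketch with generic notation, not necessarily the exact formulation of our paper: F is the parametric function class, gamma the discount factor, {(s_i, a_i, r_i, s_i')} the off-policy data, pi the target policy, and xi_0 the initial state distribution):

```latex
\hat{Q}_{k+1} \in \arg\min_{f \in \mathcal{F}}
  \frac{1}{n} \sum_{i=1}^{n}
  \Big( f(s_i, a_i) - r_i
        - \gamma\, \mathbb{E}_{a' \sim \pi(\cdot \mid s_i')} \big[ \hat{Q}_k(s_i', a') \big] \Big)^2,
\qquad
\hat{v}^{\pi} = \mathbb{E}_{s_0 \sim \xi_0,\ a_0 \sim \pi(\cdot \mid s_0)} \big[ \hat{Q}_K(s_0, a_0) \big].
```

The smoothness and divergence conditions mentioned above are what control how errors propagate across these K regression rounds.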
Off-Policy Learning has proven to be an efficient learning paradigm compared with possibly costly or dangerous on-policy learning methods. Inspired by Fitted Q Evaluation, we develop the Fitted Policy Gradient (FPG) algorithm, which estimates the Policy Gradient (PG) rather than the policy value (expected cumulative reward), but uses a similar scheme of iteratively solving for Q functions and the gradients of Q functions. We give a tight finite-sample upper bound on the error of our FPG estimator, whose dominating term depends heavily on a chi-square divergence over a certain function class that measures the distribution mismatch; in the tabular case we obtain an even sharper bound. We further prove that the error of the FPG estimator is asymptotically normal and that its variance matches the Cramer-Rao lower bound, which implies that the FPG estimator is asymptotically efficient. Finally, we use the FPG estimator for policy optimization and, under Lipschitz and PL conditions, provide the sample complexity of learning an epsilon-optimal policy. Several experiments empirically demonstrate its efficiency and robustness against large distribution shift. We further propose an implementable bootstrapped version of the FPG estimator and show its distributional consistency.
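The "gradients of Q functions" step can be motivated by the following standard identities in a discounted MDP (a sketch with generic notation; pi_theta is the parametrized target policy, P the transition kernel, and xi_0 the initial state distribution, so this is the textbook gradient Bellman recursion and policy gradient theorem rather than the exact estimator of our paper):

```latex
\nabla_{\theta} Q^{\pi_\theta}(s, a)
  = \gamma\, \mathbb{E}\Big[ \nabla_{\theta} Q^{\pi_\theta}(s', a')
      + Q^{\pi_\theta}(s', a')\, \nabla_{\theta} \log \pi_\theta(a' \mid s') \,\Big|\, s, a \Big],
  \quad s' \sim P(\cdot \mid s, a),\ a' \sim \pi_\theta(\cdot \mid s'),

\nabla_{\theta} v^{\pi_\theta}
  = \mathbb{E}_{s_0 \sim \xi_0,\ a_0 \sim \pi_\theta(\cdot \mid s_0)}
    \Big[ \nabla_{\theta} Q^{\pi_\theta}(s_0, a_0)
      + Q^{\pi_\theta}(s_0, a_0)\, \nabla_{\theta} \log \pi_\theta(a_0 \mid s_0) \Big].
```

An FQE-style scheme can therefore fit Q and its gradient alternately from the same off-policy data, which is the template FPG builds on.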
I have spent half a year collaborating with biology researchers on the phenomenon of maternal cell contamination in prenatal screening and diagnosis. We aim to detect and correct the influence of maternal contamination in prenatal analysis, especially in variant calling. The main difficulty lies in the many sources of unknown randomness and the inconsistency across batches of data. Beyond a Bayesian model as the central tool, we improve the model's flexibility by enabling it to adaptively identify the upstream and downstream regions of pathogenic sites, and we reduce its non-robustness by repeatedly subsampling these sites. In dozens of trials on culture medium samples with various contamination rates and allele dropout rates, our method shows advantages over the traditional methods adopted by the current variant calling pipeline.
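The repeated-subsampling idea can be illustrated as follows (a minimal, hypothetical sketch: site_signals, toy_site_estimate, and the median aggregation are illustrative placeholders, not our actual Bayesian model):

```python
import numpy as np

def toy_site_estimate(signals):
    """Toy placeholder for a per-fit contamination estimate
    (NOT the actual Bayesian model): averages a per-site signal."""
    return float(np.mean(signals))

def robust_contamination_estimate(site_signals, n_rounds=200, frac=0.8, seed=0):
    """Repeatedly subsample the informative sites, refit the estimate on each
    subsample, and aggregate with the median, so that a handful of aberrant
    sites (e.g. affected by allele dropout) cannot dominate the final call."""
    rng = np.random.default_rng(seed)
    site_signals = np.asarray(site_signals, dtype=float)
    k = max(1, int(frac * len(site_signals)))
    estimates = [
        toy_site_estimate(rng.choice(site_signals, size=k, replace=False))
        for _ in range(n_rounds)
    ]
    return float(np.median(estimates))
```

The point is only the aggregation pattern: refitting on many random subsets of sites and taking a robust summary limits the influence of a few aberrant sites.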
My first research project aims at exploiting RNA Velocity, a brand new toolkit invented to recover the dynamical information lost during ScRNA-Seq, to enhance the performance of inference of Gene Regulatory Network. During the project, I independently completed work of literature search, algorithm comparison and coding. Among various GRN Inference methods, we selected several dynamical methods relating to velocity estimation with difference method or kernel smoothing and compared the accuracy with or without RNA Velocity. We find although the increments being data-dependent, RNA Velocity, does restores certain temporal information and improves the prediction effectiveness.
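To illustrate the general idea of plugging velocity estimates into dynamical GRN inference (a hypothetical sketch: the ridge regression, the name grn_scores_from_velocity, and the edge-scoring rule are illustrative, not the specific methods we compared), one can treat RNA Velocity as a proxy for the time derivative of expression and regress each gene's velocity on the expression of candidate regulators:

```python
import numpy as np
from sklearn.linear_model import Ridge

def grn_scores_from_velocity(expression, velocity, alpha=1.0):
    """Score candidate regulator -> target edges by regressing each gene's
    RNA velocity (a proxy for d(expression)/dt) on the expression of all
    other genes; |coefficient| serves as a simple edge score.
    expression, velocity: (n_cells, n_genes) arrays."""
    n_genes = expression.shape[1]
    scores = np.zeros((n_genes, n_genes))  # scores[i, j]: regulator i -> target j
    for j in range(n_genes):
        regulators = np.delete(np.arange(n_genes), j)
        model = Ridge(alpha=alpha).fit(expression[:, regulators], velocity[:, j])
        scores[regulators, j] = np.abs(model.coef_)
    return scores
```

Replacing finite-difference or kernel-smoothed pseudotime derivatives with velocity estimates in such dynamical scores is the kind of comparison the project carried out.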