RUIQI

Ruiqi Zhang

I am a senior in the School of Mathematical Sciences (SMS) at Peking University (PKU), majoring in Probability and Statistics. My GPA is 3.92/4, ranking 1/217 in SMS, with an average score above 95 in core courses. I have a solid mathematical foundation, especially in statistics. My research interests include theoretical reinforcement learning, machine learning, and their real-world applications, such as genomics. I am interested in both theoretical and practical problems. I am going to pursue my PhD in the U.S. At PKU I am supervised by Prof. Hao Ge at BICMR. We did research on RNA velocity and its applications, and we are now working on maternal cell contamination correction in the pGT_M and pGT_A datasets. In summer and autumn 2021, I was supervised by Prof. Mengdi Wang for my research internship. We studied the statistical optimality of an FQE-style policy gradient estimator, and we are now working on Off-Policy Evaluation (OPE) with general function approximation.

Email  /  English: TOEFL 111(R29,L30,S25,W27); GRE 325(V156,Q169,AW3.5)  /  Coding: Python, R, Matlab

News

  • 06/2022, I graduated from the School of Mathematical Sciences at Peking University.
  • 03/2022, Two papers were accepted by The 39th International Conference on Machine Learning (ICML 2022).
  • 03/2022, I accepted an offer from UC Berkeley and will pursue my Ph.D. in the Department of Statistics there.
  • 03/2022, I reviewed four papers for ICML 2022.
  • 03/2022, Two papers were accepted by The Multi-disciplinary Conference on Reinforcement Learning and Decision Making (RLDM) 2022.
  • 01/2022, Two papers were submitted to ICML 2022.
  • 07/2021, I started working remotely with Professor Mengdi Wang in the Department of Electrical and Computer Engineering at Princeton University.
  • 07/2020, I started working with Professor Hao Ge at the Biomedical Pioneering Center at Peking University.
  • 09/2018, I was admitted to the School of Mathematical Sciences at Peking University.
    Research

    Undergraduate Thesis:

    1. Maternal Cell Contamination Correction in Non-invasive Preimplantation Genetic Test for Monogenic Disease and Aneuploidy Based on Bayesian Model.
      Ruiqi Zhang

    Working Papers:

    1. Statistical Theory of Fitted Policy Evaluation for Bellman Equations with General Risk Functions.
      Ruiqi Zhang, Xuezhou Zhang, Chengzhuo Ni, Csaba Szepesvari, Mengdi Wang

    2. Sample Efficient Off-Policy Policy Gradient via Multiple Marginalized Importance Sampling.
      Zijun Zeng, Ruiqi Zhang

    Conference Papers:

    1. Off-Policy Fitted Q-Evaluation with Differentiable Function Approximators: Z-Estimation and Inference Theory.
      Ruiqi Zhang, Xuezhou Zhang, Chengzhuo Ni, Mengdi Wang
      ICML 2022 | RLDM 2022 | Paper | Talk

    2. Optimal Estimation of Off-Policy Policy Gradient via Double Fitted Iteration.
      Chengzhuo Ni, Ruiqi Zhang, Xiang Ji, Xuezhou Zhang, Mengdi Wang
      ICML 2022 | RLDM 2022 | Paper

    Recent Projects

  • Off Policy Evaluation with General Function Approximation
  • Off-policy evaluation (OPE) estimates the cumulative reward of a target policy using data generated by a possibly unknown behavior policy. Among the various algorithms, we study Fitted Q-Evaluation (FQE), which iteratively estimates Q-functions by solving a sequence of regression problems. Each regression can be cast as empirical loss minimization over a function class. For simplicity of proof, previous research typically considered either the tabular case, where both the state space and the action space are finite, or linear function approximation, where the function class is the linear space spanned by a feature map. General function classes have been considered, but mostly in a non-parametric way. We focus on parametric differentiable function classes and aim to extend as many results as possible from the linear case to more general functions, most importantly certain neural networks. Under smoothness conditions on the function class, we prove that the error of the FQE estimator is asymptotically normal with an explicit variance expression; the variance in the linear case is a special case of this general form. We prove two finite-sample upper bounds on the estimation error. The first is variance-aware: it is tighter and depends on the asymptotic variance. The second is a worst-case, reward-free bound whose dominant term depends heavily on a restricted chi-square divergence over a function subset. We further propose two bootstrap FQE estimators with general parametric function approximation. The bootstrapping error is asymptotically normal, and bootstrapping lets us construct asymptotic confidence intervals for the policy value. Finally, we prove that the variance of our estimator matches the Cramér-Rao lower bound, which implies that the estimator is asymptotically efficient.

  • Fitted Policy Gradient Estimator and its Statistical Optimality
  • Off-policy learning has proven to be an efficient learning paradigm compared with on-policy methods, which can be costly or dangerous. We develop the Fitted Policy Gradient (FPG) algorithm, inspired by Fitted Q-Evaluation. FPG estimates the policy gradient (PG) rather than the policy value (expected cumulative reward), but similarly solves for the Q-functions and their gradients iteratively. We give a tight finite-sample upper bound on the error of our FPG estimator. The dominant term of the bound depends heavily on a chi-square divergence over a function class, which measures the distribution mismatch. In the tabular case we obtain an even sharper bound. We further prove that the error of the FPG estimator is asymptotically normal and that its variance matches the Cramér-Rao lower bound, which implies that the FPG estimator is asymptotically efficient. Finally, we use the FPG estimator for policy optimization and, under a Lipschitz condition and a PL condition, provide the sample complexity of learning an epsilon-optimal policy. We run several experiments and show empirically that the estimator is efficient and robust to large distribution shift. We further propose an implementable bootstrap version of the FPG estimator and show its distributional consistency.

  • Maternal Cell Contamination Correction
  • I spent half a year working with biology researchers on maternal cell contamination in prenatal screening and diagnosis. We aim to detect and correct the influence of maternal contamination in prenatal analysis, especially on variant calling. The main difficulty lies in the many unknown sources of randomness and the inconsistency across batches of data. With a Bayesian model as the central tool, we improve the model's flexibility by enabling it to adaptively identify the upstream and downstream regions of pathogenic sites, and improve its robustness by repeatedly subsampling these sites. In dozens of trials on culture medium with various contamination rates and allele drop-out rates, our method shows an advantage over the traditional methods adopted by current variant calling pipelines.

  • RNA Velocity
  • My first research project aimed to exploit RNA velocity, a recently developed toolkit for recovering the dynamical information lost during scRNA-seq, to improve the inference of gene regulatory networks (GRNs). During the project, I independently carried out the literature search, algorithm comparison, and coding. Among the various GRN inference methods, we selected several dynamical methods that estimate velocity by difference methods or kernel smoothing, and compared their accuracy with and without RNA velocity. We find that, although the improvement is data-dependent, RNA velocity does restore some temporal information and improves prediction.
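
To make the fitted Q regression described under Off Policy Evaluation above more concrete, here is a minimal sketch in Python with NumPy. It covers only the simplest setting (finite horizon, one-hot features, so each regression is linear) and is not the general differentiable function class studied in the papers. The toy MDP, the helper names `phi`, `target_policy`, `step`, `exact_value`, and the small ridge term are all assumptions made for illustration.

```python
import numpy as np

N_STATES, N_ACTIONS, HORIZON = 2, 2, 3  # hypothetical toy problem


def phi(s, a):
    """One-hot feature map over (state, action) pairs (illustrative choice)."""
    v = np.zeros(N_STATES * N_ACTIONS)
    v[s * N_ACTIONS + a] = 1.0
    return v


def target_policy(s):
    """Action probabilities of the policy being evaluated (assumed)."""
    return np.array([0.2, 0.8])


def step(s, a):
    """Toy deterministic dynamics: returns (reward, next_state)."""
    return s + 0.5 * a, a


def fitted_q_evaluation(transitions, phi, policy, n_actions, horizon, ridge=1e-6):
    """Backward sequence of ridge regressions, one per time step.

    transitions: list of (s, a, r, s_next) collected by any behavior policy.
    Returns weights w_0, ..., w_{horizon-1} with Q_h(s, a) ~ phi(s, a) @ w_h.
    """
    X = np.array([phi(s, a) for s, a, _, _ in transitions])
    rewards = np.array([r for _, _, r, _ in transitions])
    # Expected next-step features under the target policy.
    X_next = np.array([
        sum(policy(s2)[b] * phi(s2, b) for b in range(n_actions))
        for _, _, _, s2 in transitions
    ])
    gram = X.T @ X + ridge * np.eye(X.shape[1])
    w = np.zeros(X.shape[1])  # Q at the terminal step is zero
    weights = []
    for _ in range(horizon):
        targets = rewards + X_next @ w  # regression targets r + V_{h+1}(s')
        w = np.linalg.solve(gram, X.T @ targets)
        weights.append(w)
    return weights[::-1]  # reorder so that weights[0] corresponds to h = 0


def exact_value(s0):
    """Ground-truth value by dynamic programming on the known toy MDP."""
    q = np.zeros((N_STATES, N_ACTIONS))
    for _ in range(HORIZON):
        v = np.array([target_policy(s) @ q[s] for s in range(N_STATES)])
        new_q = np.zeros_like(q)
        for s in range(N_STATES):
            for a in range(N_ACTIONS):
                r, s2 = step(s, a)
                new_q[s, a] = r + v[s2]
        q = new_q
    return target_policy(s0) @ q[s0]


# Offline data covering every (state, action) pair five times.
data = [(s, a, *step(s, a)) for s in range(N_STATES) for a in range(N_ACTIONS)] * 5

weights = fitted_q_evaluation(data, phi, target_policy, N_ACTIONS, HORIZON)
estimate = sum(target_policy(0)[a] * (phi(0, a) @ weights[0]) for a in range(N_ACTIONS))
truth = exact_value(0)
print(f"FQE estimate: {estimate:.4f}, exact value: {truth:.4f}")
```

Because the toy data are deterministic and cover every state-action pair, the estimate should agree with the dynamic-programming value up to the tiny bias introduced by the ridge term. With noisy data or a richer function class, the regression step would be replaced by the corresponding empirical loss minimization.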

    Courses

  • Mathematical Analysis 1/2/3: 97/94/98.
  • Advanced Algebra 1/2/ Abstract Algebra/ Geometry: 96/93/92/95.
  • Complex Analysis/ Real Analysis/ ODE: 94/98/97.
  • General Physics 1/2: 96/100.
  • Probability/ Mathematical Statistics/ Stochastic Process: 100/98.5/99.
  • Regression Analysis/ Survival Analysis: 99/93.
  • Data Structure/ Machine Learning: 94/97.
  • Measure Theory/ Mathematical Models/ Financial Mathematics/ Time Series: 97/93/94/93.
  • High Dimensional Probability/ Non-parametric Statistics/ Bayesian Statistics: 88/87/89.
  • French 1/2/3: 97/92/95.
    Awards

  • 2021: Huawei Scholarship.

  • 2021,2020,2019: Honor Student in PKU SMS.

  • 2021: H Prize in American Mathematical Contest in Modeling.

  • 2020: Qin and Jin Scholarship.

  • 2020: The Second Prize in the Chinese Mathematics Competition.

  • 2019: Huawei Scholarship.

  • 2019: The Second Prize in the Chinese Physics Competition.

  • 2019: Fangzheng Scholarship.

  • 2018: Runner Up in Freshmen Debate Competition in Autumn in PKU.
