Statistics and data science encompasses the art of modeling, summarizing, and dissecting data, using mathematics and computational tools to make informed predictions and decisions. Our Doctor of Philosophy in Statistics and Data Science is designed to equip students with a well-rounded education in the theories and methodologies of these disciplines, enabling them to address challenges across various domains. This Ph.D. program empowers students to engage in pioneering research in theory, methodology, or practical applications. It fosters the development of advanced research and computational skills, positioning our graduates to excel in competitive roles within academic, governmental, and corporate settings.
Welcome to the Department of Statistics and Data Science at Mohamed bin Zayed University of Artificial Intelligence (MBZUAI). Our department uniquely integrates rigorous statistical foundations with cutting-edge artificial intelligence (AI). At MBZUAI, our faculty and researchers emphasize the critical role of statistical theory and modern data science in ensuring reliable and effective AI solutions. We equip our students with the expertise to tackle real-world challenges, including efficient data handling, uncertainty quantification, and responsible decision-making.
As AI increasingly transforms industries and society, statistics plays a pivotal role in guiding the responsible and informed development of these technologies. Our department leads the development of innovative AI methods, advancing their reliability and extending their applications to diverse domains such as life sciences, social sciences, healthcare, economics, and environmental studies. We actively explore how statistical approaches can enhance AI’s potential, ensuring trustworthy and impactful solutions across these critical areas.
Join us at MBZUAI in advancing the critical synergy between statistics, data science, and AI.
Department Chair of Statistics and Data Science, and Visiting Professor of Statistics and Data Science
National Qualifications Framework – three strands
The program learning outcomes (PLOs) are aligned with the Emirates Qualifications Framework and, as such, are divided into the following learning outcomes strands: knowledge (K), skills (S), and responsibility (R).
Program learning outcomes
Upon completion of the program requirements, graduates will be able to:
The PLOs are mapped to the National Qualifications Framework Level Eight (8) qualification and categorized into three domains (knowledge, skill, and responsibility) as per the National Qualifications Framework set by the UAE National Qualifications Centre (NQC) and the Ministry of Higher Education and Scientific Research (MoHESR):
PLOs | Knowledge | Skill | Responsibility |
---|---|---|---|
PLO 01 | K | S | R |
PLO 02 | K | – | R |
PLO 03 | K | S | R |
PLO 04 | K | – | – |
PLO 05 | K | S | R |
The minimum degree requirement for the Doctor of Philosophy in Statistics and Data Science is 60 credits, as follows:
Component | Number of courses | Credit hours |
---|---|---|
Core | 4 | 16 |
Electives | At least two, depending on credit hours | 8 |
Internship | At least one internship of at least three months’ duration must be satisfactorily completed as a graduation requirement | 2 |
Advanced research methods | 1 | 2 |
Research thesis | 1 | 32 |
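The credit breakdown in the table above can be checked with a short calculation. The sketch below simply sums the credit hours listed for each component; the dictionary keys are illustrative labels chosen here, not official program identifiers.

```python
# Credit hours per component, copied from the degree requirements table.
# The keys are illustrative labels, not official identifiers.
credit_hours = {
    "core": 16,
    "electives": 8,
    "internship": 2,
    "advanced_research_methods": 2,
    "research_thesis": 32,
}

total_credits = sum(credit_hours.values())
non_thesis_credits = total_credits - credit_hours["research_thesis"]

print(f"Total: {total_credits} credits")
print(f"Coursework and internship: {non_thesis_credits} credits")
print(f"Research thesis: {credit_hours['research_thesis']} credits")
```

The components add up to the stated minimum of 60 credits, of which 32 come from the research thesis and 28 from courses and the internship.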
The Doctor of Philosophy in Statistics and Data Science is primarily a research-based degree. The purpose of coursework is to equip students with the correct skill set, enabling them to complete their research project (thesis) successfully. Students are required to take the mandatory core courses. They must select a minimum of two (2) electives.
To accommodate students from diverse academic backgrounds, the program offers flexibility in course selection. The decision on the courses to be taken will be made in consultation with the student’s supervisory panel, which will comprise two or more faculty members. The supervisory panel will help design a personalized coursework plan for each student by considering their prior academic record and experience, as well as the planned research project.
All students must take the following core courses:
Code | Course title | Credit hours |
---|---|---|
INT899 | Ph.D. Internship. Assumed knowledge: Prior to undertaking an internship opportunity, students must have successfully completed 24 credit hours. Course description: The MBZUAI internship with industry is intended to provide the student with hands-on experience, blending practical experiences with academic learning. | 2 |
RES899 | Advanced Research Methods. Course description: This course will prepare students to produce professional-quality academic research and solve practical research challenges based on innovative and ethical research principles. This course will provide exposure to a variety of research topics related to AI, research integrity, AI ethics, and organizational challenges. Students will learn to assess their own research projects, scrutinize the research methods and metrics used in their research, and critically examine the ethical implications of their work. They will learn about the peer-review process, participate in reviewing their classmates’ work, and learn best practices for oral and written presentation of research. After completing the course, students will have the skills to develop a research methodology and conduct research that is rigorous and ethical. | 2 |
SDS899 | Statistics and Data Science Ph.D. Research Thesis. Assumed knowledge: Coursework plus a pass in the qualifying exam. Course description: Ph.D. thesis research exposes students to cutting-edge and unsolved research problems, where they are required to propose new solutions and significantly contribute towards the body of knowledge. Students pursue an independent research study, under the guidance of a supervisory panel, for a period of three (3) to four (4) years. Ph.D. thesis research helps train graduates to become leaders in their chosen area of research through partly supervised study, eventually transforming them into researchers who can work independently or interdependently to carry out cutting-edge research. | 32 |
SDS8101 | Advanced Probability Theory and Stochastic Processes. Assumed knowledge: Calculus and linear algebra. A calculus-based introduction to probability theory. Course description: This course provides advanced coverage of probability theory and stochastic processes, which are essential for modeling a variety of real-world scenarios involving uncertainty and randomness. Students will become experts in the language of probability theory, enabling them to effectively analyze and address complex challenges in both pure and applied sciences. Through practical problem solving, they will harness the power of probabilistic thinking to derive insightful solutions. | 4 |
SDS8102 | Advanced Mathematical Statistics. Assumed knowledge: Calculus and linear algebra. Measure-based probability theory. An introductory course in statistics is recommended. Course description: This course explores statistics based on the principles of probability theory. It takes an in-depth look at the mathematical foundations of decision theory and provides students with tools and techniques for constructing estimators, hypothesis tests, and confidence regions. The course also focuses on the exploration of empirical process theory, emphasizing key optimality results of likelihood methods. In addition, the course highlights several key areas of statistics, such as resampling techniques and nonparametric statistics. | 4 |
SDS8103 | Statistical Learning. Assumed knowledge: The course requires a good level of mathematical maturity. Students are expected to have a working knowledge of core concepts in probability theory (basic properties of probabilities such as union bounds, and of conditional expectations, such as the tower property; basic inequalities, such as Markov’s and Jensen’s), statistics (confidence intervals, hypothesis testing), and linear algebra (matrix-vector operations, eigenvalues and eigenvectors; basic inequalities, such as Cauchy-Schwarz’s and Hölder’s). Previous exposure to machine learning is recommended. Prerequisite course/s: SDS8102 Advanced Mathematical Statistics. Course description: This course delves into the theoretical core of statistical learning through a rigorous exploration of algorithmic paradigms, statistical learning frameworks, and optimization strategies. This course provides a principled foundation, enriched by the study of high-dimensional probability and statistics, preparing students to navigate and innovate within the statistical learning landscape. | 4 |
SDS8104 | Computation in the Era of Big Data. Course description: This course prepares Ph.D. students to tackle scientific questions using data within the modern data science life cycle. It emphasizes critical thinking, computational methodology, and trustworthy, reproducible practices. Through open-ended labs in Python, students will work through exploratory data analysis, model formulation and validation, simulation, and interpretation, while critically examining the assumptions underpinning standard statistical models and methods. The class will culminate in a group project. | 4 |
The Ph.D. research thesis exposes students to cutting-edge and unsolved research problems in the field of statistics and data science, where they are required to propose new solutions and significantly contribute to the body of knowledge. Students pursue an independent research study, under the guidance of a supervisory panel, for a period of three (3) to four (4) years.
Code | Course title | Credit hours |
---|---|---|
SDS899 | Statistics and Data Science Ph.D. Thesis. Assumed knowledge or preparation: Coursework and a pass in the qualifying exam. Course description: Ph.D. thesis research exposes students to cutting-edge and unsolved research problems, where they are required to propose new solutions and significantly contribute towards the body of knowledge. Students pursue an independent research study, under the guidance of a supervisory panel, for a period of three (3) to four (4) years. Ph.D. thesis research helps train graduates to become leaders in their chosen area of research through partly supervised study, eventually transforming them into researchers who can work independently or interdependently to carry out cutting-edge research. | 32 |
RES899 | Advanced Research Methods. Course description: This course will prepare students to produce professional-quality academic research and solve practical research challenges based on innovative and ethical research principles. This course will provide exposure to a variety of research topics related to AI, research integrity, AI ethics, and organizational challenges. Students will learn to assess their own research projects, scrutinize the research methods and metrics used in their research, and critically examine the ethical implications of their work. They will learn about the peer-review process, participate in reviewing their classmates’ work, and learn best practices for oral and written presentation of research. After completing the course, students will have the skills to develop a research methodology and conduct research that is rigorous and ethical. | 2 |
The MBZUAI internship with industry is intended to provide the student with hands-on experience, blending practical experiences with academic learning.
Code | Course title | Credit hours |
---|---|---|
INT899 | Ph.D. Internship (up to four months). Assumed knowledge: Prior to undertaking an internship opportunity, students must have successfully completed 24 credit hours. Course description: The MBZUAI internship with industry is intended to provide the student with hands-on experience, blending practical experiences with academic learning. | 2 |
Students will select a minimum of two elective courses, with a total of eight (or more) credit hours based on interest, proposed research thesis, and career aspirations, in consultation with their supervisory panel. The elective courses available for the Doctor of Philosophy in Statistics and Data Science are listed below.
Code | Course title | Credit hours |
---|---|---|
ML804 | Advanced Topics in Continuous Optimization. Assumed knowledge: Basic optimization class. Basics of linear algebra, calculus, trigonometry, and probability and statistics. Proficiency in Python and PyTorch. Course description: The course covers advanced topics in continuous optimization, such as stochastic gradient descent and its variants, methods that use more than first-order information, primal-dual methods, and methods for composite problems. Participants will read the current state-of-the-art relevant literature and prepare presentations to the other students. Participants will explore how the presented methods work for optimization problems that arise in various fields of machine learning and test them in real-world optimization formulations to get a deeper understanding of the challenges being discussed. | 4 |
ML806 | Advanced Topics in Reinforcement Learning. Assumed knowledge: Good understanding of basic reinforcement learning (RL). Basics of linear algebra, calculus, trigonometry, and probability and statistics. Proficiency in Python and good knowledge of the PyTorch library. Course description: The course covers advanced topics in reinforcement learning (RL). Participants will read the current state-of-the-art relevant literature and prepare presentations to the other students. Participants will explore how the presented methods work in simplified computing environments to get a deeper understanding of the challenges that are being discussed. Topics discussed include exploration, imitation learning, hierarchical RL, and multi-agent RL in both competitive and collaborative settings. The course will also explore multitask and transfer learning in the RL setting. | 4 |
ML808 | Causality and Machine Learning. Assumed knowledge: Basic knowledge of linear algebra, probability, and statistical inference. Basics of machine learning. Basics of Python (or MATLAB) or PyTorch. Course description: In the past decades, interesting advances were made in machine learning, philosophy, and statistics for tackling long-standing causality problems, including how to discover causal knowledge from observational data, known as causal discovery, and how to infer the effect of interventions. Furthermore, it has recently been shown that the causal perspective may facilitate understanding and solving various machine learning/artificial intelligence problems such as transfer learning, semi-supervised learning, out-of-distribution prediction, disentanglement, and adversarial vulnerability. This course is concerned with understanding causality, learning causality from observational data, and using causality to tackle a large class of learning problems. The course will include topics like graphical models, causal inference, causal discovery, and counterfactual reasoning. It will also discuss how we can learn causal representations, perform transfer learning, and understand deep generative models. | 4 |
ML813 | Dimensionality Reduction and Manifold Learning. Assumed knowledge: Advanced calculus, and probability and statistics. Proficiency in programming. Foundations of machine learning. Good knowledge of optimization tools. Course description: The course focuses on building foundations and introducing recent advances in dimensionality reduction and manifold learning, which are key topics in machine learning. This course builds upon fundamental concepts in machine learning and assumes familiarity with concepts in optimization and advanced calculus. The course covers advanced topics in spectral, probabilistic, and neural network-based dimensionality reduction and manifold learning, as well as contrastive learning and disentangled representation learning. Students will be engaged through coursework, assignments, presentations, and projects. | 4 |
ML8102 | Advanced Machine Learning. Assumed knowledge: This Ph.D.-level course assumes familiarity with core concepts in probability theory, including random variables and basic stochastic processes. Students should have prior exposure to fundamental machine learning methods and neural networks, as well as comfort with Python programming, ideally using frameworks such as PyTorch. Basic understanding of ordinary differential equations (ODEs) and elementary numerical methods is advantageous but not strictly required. Advanced concepts such as measure theory, stochastic differential equations, and optimal transport will be briefly reviewed during the course, though some prior exposure would be beneficial to fully engage with the material. Prerequisite course/s: ML8101 Foundations of Machine Learning. Course description: This advanced course offers an in-depth exploration of diffusion models, flow matching, and consistency models, essential tools for state-of-the-art generative AI. Beginning with foundational principles, students will gain a rigorous understanding of diffusion processes, stochastic differential equations, and discrete Markov chains, ensuring a robust conceptual framework. We will examine classical and modern diffusion-based generative techniques, such as denoising diffusion probabilistic models (DDPM) and score-based generative modeling (SGM), alongside detailed mathematical derivations and convergence analysis. The course progresses to flow matching, elucidating the connections and contrasts with diffusion methods through explicit mathematical formulations, focusing on optimal transport theory, continuous normalizing flows, and numerical solutions of differential equations governing generative processes. We will then dive deeply into consistency models, analyzing their theoretical foundations, fast sampling techniques, and how they bridge diffusion and flow-based approaches. The course incorporates practical implementations and case studies in Python, ensuring students achieve both theoretical depth and applied proficiency. Upon completion, participants will be equipped with comprehensive knowledge of the mathematics, theory, and practical applications behind diffusion and flow-based generative models, preparing them for advanced research or industry innovation. | 2 |
ML8501 | Algorithms for Big Data. Assumed knowledge: Good knowledge of calculus, linear algebra, and probability and statistics. Course description: This course is an advanced course on algorithms for big data that involves the use of randomized methods, such as sketching, to provide dimensionality reduction. It also discusses topics such as subspace embeddings and low-rank approximation. The course lies at the intersection of machine learning and statistics. | 2 |
ML8505 | Tiny Machine Learning. Assumed knowledge: Understanding of calculus, algebra, and probability and statistics. Proficiency in Python or similar language. Basic knowledge of machine learning and deep learning. Course description: This comprehensive Ph.D.-level course explores the intricacies of modern machine learning (ML), with a specific focus on TinyML, efficient machine learning and deep learning. Through an integration of lectures, readings, and practical labs, students will be exposed to the evolution of TinyML from its traditional roots to the deep learning era. This course will introduce efficient AI computing strategies that facilitate robust deep learning applications on devices with limited resources. We will explore various techniques including model compression, pruning, quantization, neural architecture search, as well as strategies for distributed training, such as data and model parallelism, gradient compression, and methods for adapting models directly on devices. Additionally, the course will focus on specific acceleration approaches tailored for large language models and diffusion models. Participants will gain practical experience in implementing large language models (LLMs) on standard laptops. | 2 |
ML8509 | Collaborative Learning. Assumed knowledge: Understanding of machine learning (ML) principles and basic algorithms. Good knowledge of multivariate calculus, linear algebra, optimization, probability, and algorithms. Proficiency in some ML frameworks, e.g., PyTorch and TensorFlow. Course description: This graduate course explores a modern branch of machine learning: collaborative learning (CL). In CL, models are trained across multiple devices or organizations without requiring centralized data collection, with an emphasis on efficiency, robustness, and privacy preservation. CL encompasses approaches such as federated learning, split learning, and decentralized training. It integrates ideas from supervised and unsupervised learning, distributed and edge computing, optimization, communication compression, privacy preservation, and systems design. The field is rapidly evolving, with early production frameworks (e.g., TensorFlow Federated, Flower) and active research addressing both theoretical foundations and practical challenges. This course familiarizes students with key developments and practices. Evaluation is based primarily on students’ paper presentations and a final project chosen by each student, encouraging hands-on engagement with cutting-edge research and applications. | 4 |
SDS8501 | Deep Learning Theory. Assumed knowledge: Advanced course on mathematical statistics. Advanced course on optimization. Course description: Deep learning (DL) is an extremely important applied science that, at present, is poorly understood theoretically. We know that neural networks work well but cannot fully explain why. Nevertheless, in recent years, there has been a rapid growth of publications that shed light on the new mathematics underlying DL, and we now see many interesting connections between DL and various other fields, e.g., approximation theory, differential equations, information theory, random matrix theory, and statistical physics. This course aims to introduce students to these cutting-edge developments. | 4 |
SDS8502 | Generative Models. Assumed knowledge: Fundamental courses on machine learning and statistics. Prerequisite course/s: Fundamental courses on machine learning and statistics. Course description: This comprehensive course delves into the intricate theoretical and mathematical underpinnings of generative modeling techniques. It offers a meticulous exploration of a wide spectrum of models, ranging from the fundamental normalizing flows to the advanced denoising diffusion models. Participants will gain a profound understanding of the core principles, enabling them to harness the power of generative modeling for diverse applications. Through a blend of theoretical insights and practical applications, this course equips learners with the knowledge and skills necessary to navigate the cutting-edge landscape of generative modeling, paving the way for innovation and creativity in fields like artificial intelligence, computer vision, and data science. | 4 |
SDS8503 | High-Dimensional Probability and Statistics. Assumed knowledge: Advanced course on probability theory. Advanced course on classical mathematical statistics. Course description: High-dimensional probability investigates the behavior of high-dimensional random objects such as random vectors and random matrices, with an emphasis on quantifying the role of the dimension. The course also introduces some contemporary statistical problems including principal component analysis in high dimensions, sparse linear regression, and community detection. | 4 |
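The elective rule combines two conditions: at least two elective courses and at least eight credit hours in total. Because some electives carry two credit hours and others four, two courses alone do not always satisfy it. The sketch below is an illustrative check only. The credit values are copied from the elective table above, while the function name `meets_elective_requirement` and the constants are hypothetical, and the check does not model prerequisites or supervisory panel approval.

```python
# Credit hours for each elective, copied from the elective table above.
ELECTIVE_CREDITS = {
    "ML804": 4, "ML806": 4, "ML808": 4, "ML813": 4,
    "ML8102": 2, "ML8501": 2, "ML8505": 2, "ML8509": 4,
    "SDS8501": 4, "SDS8502": 4, "SDS8503": 4,
}

MIN_ELECTIVES = 2          # "a minimum of two elective courses"
MIN_ELECTIVE_CREDITS = 8   # "a total of eight (or more) credit hours"


def meets_elective_requirement(selected):
    """Return True if the selection has at least two distinct listed
    electives whose credit hours total at least eight."""
    codes = set(selected)  # count each course once
    unknown = codes - set(ELECTIVE_CREDITS)
    if unknown:
        raise ValueError(f"Not in the elective table: {sorted(unknown)}")
    total = sum(ELECTIVE_CREDITS[code] for code in codes)
    return len(codes) >= MIN_ELECTIVES and total >= MIN_ELECTIVE_CREDITS


# Two four-credit electives satisfy both conditions.
print(meets_elective_requirement(["ML808", "SDS8503"]))
# Two two-credit electives reach only four credit hours.
print(meets_elective_requirement(["ML8501", "ML8505"]))
# Adding a four-credit elective brings the total to eight.
print(meets_elective_requirement(["ML8501", "ML8505", "ML808"]))
```

In practice, the final selection is made in consultation with the supervisory panel, so a check like this only covers the numeric part of the requirement.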
MBZUAI accepts applicants who hold a completed degree in a STEM field such as computer science, electrical engineering, computer engineering, mathematics, physics, or other relevant science or engineering major that demonstrates academic distinction in a discipline appropriate for the doctoral degree – either:
Applicants must provide their completed degree certificates and official transcripts when submitting their application. Senior-level students can apply initially with a copy of their official transcript and expected graduation letter and, upon admission, must submit the official completed degree certificate and transcript. A degree attestation from the UAE Ministry of Higher Education and Scientific Research (for degrees from the UAE) or a Certificate of Recognition (for degrees acquired outside the UAE) should also be furnished within the student’s first semester at MBZUAI.
All applicants whose first language is not English must demonstrate proficiency in English through one of the following:
*Exams must be administered at an approved physical test center. Home Edition exams are not accepted.
English language proficiency waiver eligibility
Applicants may qualify for a waiver if they meet one of the following conditions:
English language requirement deadline: Proof of English language proficiency should be submitted by the application deadline. However, for those who require more time to satisfy this requirement, there is a final deadline of March 1.
Submission of GRE scores is optional for all applicants but will be considered a plus during the evaluation.
In a 500- to 1000-word essay, explain why you would like to pursue a graduate degree at MBZUAI and include the following information:
The research statement is a document that summarizes the potential research project an applicant is interested in working on and clearly justifies the research gap that the applicant would like to fill during the course of their study. It must be presented in the context of existing literature and provide an overview of how the applicant aims to investigate the underlying research project, as well as predict the expected outcomes. It should mention the relevance and suitability of the applicant’s background and experience to the project and highlight the project’s scientific and commercial significance. The research statement should include the following details:
Applicants are expected to write the research statement independently. MBZUAI faculty will NOT help write it for the purpose of the application. The MBZUAI Admission Committee will review the submitted document and use it as one of the measures to gauge and assess applicants’ skills.
Applicants will be required to nominate referees who can recommend their application. Ph.D. applicants should have a minimum of three (3) referees, of whom at least one was a previous course instructor or faculty/researcher advisor and the others were current or previous work supervisors.
To avoid issues and delays in the provision of the recommendation, applicants must inform their referees of their nomination beforehand and provide the referees’ accurate information in the online application portal. Automated notifications will be sent out to the referees upon application submission.
Within 10 days of submitting your application, you will receive an invitation to book and complete an online screening exam that assesses knowledge and skills relevant to your chosen field. While you may choose to opt out of the screening exam, this is only recommended for applicants whose profiles already demonstrate strong evidence of the skills assessed in the exam.
Exam topics
Math: Calculus, probability theory, linear algebra, and trigonometry.
Programming: Knowledge of specific programming concepts and principles such as algorithms, data structures, logic, OOP, and recursion, as well as language-specific knowledge of Python.
Machine learning: Supervised and unsupervised learning, neural networks, and optimization.
Applicants are highly encouraged to complete the following online courses to further improve their qualifications:
For more information regarding the screening exam (e.g. process, opting out criteria, and technical specifications), register on the application portal here, and view this knowledge article.
A select number of applicants may be invited to an interview with faculty as part of the screening process. The time and instructions for this will be communicated to applicants in a timely manner.
Applicants should submit only one application per admission cycle; multiple submissions are discouraged.
Application portal opens | Priority deadline* | Final deadline | Decision notification date | Offer response deadline |
---|---|---|---|---|
September 1, 2025 (8 a.m. GST) | November 15, 2025 (5 p.m. GST) | December 15, 2025 (5 p.m. GST) | March 15, 2026 (5 p.m. GST) | April 15, 2026 |
* Applications submitted by the priority deadline will be reviewed first. While all applications submitted by the final deadline (December 15, 2025) will be considered, applying by the priority deadline is strongly encouraged. Admissions are highly competitive and space in the incoming cohort is limited.
Detailed information on the application process and scholarships is available here.