Statistics and Data Science encompasses the art of modelling, summarizing, and dissecting data, utilizing the powers of mathematics and computational tools to make informed predictions and decisions. Our Doctor of Philosophy (PhD) program in Statistics and Data Science has been meticulously crafted to equip students with a well-rounded education in the theories and methodologies of these disciplines, enabling them to address challenges across various domains.
This Ph.D. program empowers students to engage in pioneering research, whether in the realms of theory, methodology, or practical applications. It fosters the development of advanced research and computational skills, positioning our graduates to excel in competitive roles within academic, governmental, and corporate settings.
Deadlines for Applications for Fall 2025:
Priority deadline for early decision: 28th February 2025 (5:00 PM UAE time)
Regular deadline: 31st May 2025 (5:00 PM UAE time)
The minimum degree requirement for the Doctor of Philosophy in Statistics and Data Science is 60 credits, distributed as follows:
Component | Number of Courses | Credit Hours |
---|---|---|
Core | 4 | 16 |
Electives | 2 | 8 |
Internship | At least one internship of up to four months’ duration must be satisfactorily completed as a graduation requirement | 2 |
Advanced Research Methods | 1 | 2 |
Research Thesis | 1 | 32 |
The Doctor of Philosophy in Statistics and Data Science is primarily a research-based degree. The purpose of coursework is to equip students with the right skill set so that they can successfully complete their research project (thesis). Students are required to take STADS801, STADS802, STADS803, and STADS804 as mandatory courses.
Code | Course Title and Description | Credit Hours |
---|---|---|
STADS801 | Advanced Probability Theory and Stochastic Processes: This course provides advanced coverage of probability theory and stochastic processes, which are essential for modeling a variety of real-world scenarios involving uncertainty and randomness. Students will become experts in the language of probability theory, enabling them to effectively analyze and address complex challenges in both pure and applied sciences. Through practical problem solving, they will harness the power of probabilistic thinking to derive insightful solutions. | 4 |
STADS802 | Advanced Mathematical Statistics: This course explores statistics based on the principles of probability theory. It takes an in-depth look at the mathematical foundations of decision theory and provides students with tools and techniques for constructing estimators, hypothesis tests, and confidence regions. The course also focuses on the exploration of empirical process theory, emphasizing key optimality results of likelihood methods. In addition, the course highlights several key areas of statistics, such as resampling techniques and nonparametric statistics. | 4 |
STADS803 | Statistical Learning: This course delves into the theoretical core of statistical learning through a rigorous exploration of algorithmic paradigms, statistical learning frameworks, and optimization strategies. It provides a principled foundation, enriched by the study of high-dimensional probability and statistics, preparing students to navigate and innovate within the statistical learning landscape. | 4 |
STADS804 | Computation in the Era of Big Data: This course prepares PhD students to tackle scientific questions using data within the modern data science life cycle. It emphasizes critical thinking, computational methodology, and trustworthy, reproducible practices. Through open-ended labs in Python, students will explore exploratory data analysis, model formulation and validation, simulation, and interpretation, while critically examining the assumptions underpinning standard statistical models and methods. The class will culminate in a group project. | 4 |
Students will select a minimum of two elective courses, with a total of eight (or more) credit hours. Electives must be selected based on interest, proposed research thesis, and career aspirations, in consultation with the student’s supervisory panel. The elective courses available for the Doctor of Philosophy in Statistics and Data Science are listed in the table below:
Code | Course Title and Description | Credit Hours |
---|---|---|
ML802 | Advanced Machine Learning: This course is designed to explore recent breakthroughs in machine learning and provide students with the necessary skills to conduct research and advance the field of machine learning. It will cover highly specialized topics related to large-scale optimization for real-world problems, including Large-Scale Training of Kernel Methods, Sparse Learning, Bilevel Optimization, Black Box Optimization, and Spiking Neural Networks. Prior knowledge of fundamental concepts in machine learning, optimization, and statistics is assumed. | 4 |
ML804 | Advanced Topics in Continuous Optimization: The course covers advanced topics in continuous optimization, such as stochastic gradient descent and its variants, methods that use more than first-order information, primal-dual methods, and methods for composite problems. Participants will read the current state-of-the-art relevant literature and prepare presentations for the other students. Participants will explore how the presented methods work for optimization problems that arise in various fields of Machine Learning and test them in real-world optimization formulations to get a deeper understanding of the challenges being discussed. | 4 |
ML806 | Advanced Topics in Reinforcement Learning: The course covers advanced topics in Reinforcement Learning (RL). Participants will read the current state-of-the-art relevant literature and prepare presentations for the other students. Participants will explore how the presented methods work in simplified computing environments to get a deeper understanding of the challenges that are being discussed. Topics discussed include exploration, imitation learning, hierarchical RL, and multi-agent RL in both competitive and collaborative settings. The course will also explore multitask and transfer learning in the RL setting. | 4 |
ML807 | Federated Learning: This is a graduate course in a new branch of machine learning: federated learning (FL). In FL, machine learning models are trained on mobile devices with an explicit effort to preserve the privacy of users’ data. FL combines supervised machine learning, privacy, distributed and edge computing, optimization, communication compression, and systems. This is a new and fast-growing field with few theoretical results and early production systems (e.g., TensorFlow Federated and FedML). This course aims for students to become familiar with the field’s key developments and practices, namely optimization methods for FL and techniques to address communication bottlenecks, systems and data heterogeneities, client selection, robustness, fairness, personalization, and privacy aspects of FL. The evaluation of the course relies heavily on students’ paper presentations and the final project selected by the student. | 4 |
ML808 | Advanced Topics on Causality and Machine Learning: In the past decades, interesting advances were made in machine learning, philosophy, and statistics for tackling long-standing causality problems, including how to discover causal knowledge from observational data, known as causal discovery, and how to infer the effect of interventions. Furthermore, it has recently been shown that the causal perspective may facilitate understanding and solving various machine learning / artificial intelligence problems such as transfer learning, semi-supervised learning, out-of-distribution prediction, disentanglement, and adversarial vulnerability. This course is concerned with understanding causality, learning causality from observational data, and using causality to tackle a large class of learning problems. The course will include topics like graphical models, causal inference, causal discovery, and counterfactual reasoning. It will also discuss how we can learn causal representations, perform transfer learning, and understand deep generative models. | 4 |
ML812 | Advanced Topics in Algorithms for Big Data: This is an advanced course on algorithms for big data that involves the use of randomized methods, such as sketching and sampling, to provide dimensionality reduction. It also discusses topics such as subspace embeddings, low-rank approximation, L1 regression, and data streams. The course lies at the intersection of machine learning and statistics. | 4 |
ML813 | Topics in Dimensionality Reduction and Manifold Learning: This course focuses on building foundations and introducing recent advances in dimensionality reduction and manifold learning, important topics in machine learning. This course builds upon fundamental concepts in machine learning and additionally assumes familiarity with concepts in optimization and mathematics. The course covers advanced topics in spectral, probabilistic, and neural network-based dimensionality reduction and manifold learning. Students will be engaged through coursework, assignments, and projects. | 4 |
STADS804 | Generative Models: This course prepares PhD students to tackle scientific questions using data within the modern data science life cycle. It emphasizes critical thinking, computational methodology, and trustworthy, reproducible practices. Through open-ended labs in Python, students will explore exploratory data analysis, model formulation and validation, simulation, and interpretation, while critically examining the assumptions underpinning standard statistical models and methods. The class will culminate in a group project. | 4 |
STADS805 | Deep Learning Theory: Deep Learning (DL) is an extremely important applied science that, at present, is poorly understood theoretically. We know that neural networks work well but cannot fully explain why. Nevertheless, in recent years, there has been a rapid growth of publications that shed light on the new mathematics underlying DL, and we now see many interesting connections between DL and various other fields, e.g., approximation theory, differential equations, information theory, random matrix theory, and statistical physics. This course aims to introduce students to these cutting-edge developments. | 4 |
STADS807 | High-Dimensional Probability and Statistics: High-dimensional probability investigates the behavior of high-dimensional random objects, such as random vectors and random matrices, with an emphasis on quantifying the role of the dimension. The course also introduces some contemporary statistical problems, including principal component analysis in high dimensions, sparse linear regression, and community detection. | 4 |
The Ph.D. thesis exposes students to cutting-edge and unsolved research problems in the field of Statistics and Data Science, where they are required to propose new solutions and significantly contribute towards the body of knowledge. Students pursue an independent research study, under the guidance of a supervisory panel, for a period of three to four years.
Code | Course Title and Description | Credit Hours |
---|---|---|
RES899 | Advanced Research Methods: This course will prepare students to produce professional-quality research and solve a practical research challenge in an organization based on an innovative, sustainable, and entrepreneurial research topic. This course will provide exposure to a variety of special topics, research integrity, ethics, organizational challenges, and needs related to various disciplines. Students will design and implement a research project suitable for conference presentation or journal submission relevant to their field of interest, in addition to peer-reviewing a paper. The instructor, and guest lecturers, as appropriate, will present topics necessary to develop well-rounded researchers, innovators, and entrepreneurs in the AI disciplines. | 2 |
STADS899 | PhD in Statistics and Data Science PhD Theses: PhD in Statistics and Data Science thesis research exposes students to cutting-edge and unsolved research problems, where they are required to propose new solutions and significantly contribute towards the body of knowledge. Students pursue an independent research study, under the guidance of a supervisory panel, for a period of 3 to 4 years. PhD thesis research helps train graduates to become leaders in their chosen area of research through partly supervised study, eventually transforming them into researchers who can work independently or interdependently to carry out cutting-edge research. | 32 |
The MBZUAI internship with industry is intended to provide the student with hands-on experience, blending practical experiences with academic learning.
Code | Course Title and Description | Credit Hours |
---|---|---|
INT899 | Internship: The MBZUAI internship with industry is intended to provide the student with hands-on experience, blending practical experiences with academic learning. For PhD students the internship should be 3 months in length. Hours should align with the working hours of the host organization and the internship should directly relate to the student’s research at MBZUAI. | 2 |
MBZUAI accepts applicants from all nationalities who have a completed degree in a STEM field such as Computer Science, Electrical Engineering, Computer Engineering, Mathematics, Physics, or other relevant Science or Engineering major that demonstrates academic distinction in a discipline appropriate for the doctoral degree – either:
Applicants must provide their completed degree certificates and official transcripts when submitting their application. Senior-level students can apply initially with a copy of their official transcript and expected graduation letter and upon admission must submit the official completed degree certificate and transcript. A degree attestation from UAE MoE (for degrees from the UAE) or Certificate of Recognition from UAE MoE (for degrees acquired outside the UAE) should also be furnished within students’ first semester at MBZUAI.
All submitted documents must either be originally in English or include legal English translations.
Additionally, official academic documents should be stamped and signed by the university authorities.
Each applicant must show proof of English language ability by providing valid certificate copies of either of the following:
TOEFL iBT and IELTS academic certificates are valid for two (2) years from the date of the exam while EmSAT results are valid for eighteen (18) months. Only standard versions (i.e. conducted at physical test centers) of the accepted English language proficiency exams will be considered.
Waiver requests will be processed for eligible applicants who are citizens (by passport or nationality) of the UK, the USA, Australia, or New Zealand and who completed their studies, from K-12 through their bachelor’s degree and master’s degree (if applicable), in those same countries. They need to submit notarized copies of their documents during the application stage and attested documents upon admission. Waiver decisions will be given within seven (7) days after receiving all requirements.
Submission of GRE scores is optional for all applicants but will be considered a plus during the evaluation.
In a 500- to 1000-word essay, explain why you would like to pursue a graduate degree at MBZUAI and include the following information:
The research statement is a document that summarizes the potential research project an applicant is interested in working on and clearly justifies the research gap the applicant would like to fill during the course of their study. It must be presented in the context of existing literature and provide an overview of how the applicant aims to investigate the underlying research project, as well as the expected outcomes. It should mention the relevance and suitability of the applicant’s background and experience to the project and highlight the project’s scientific and commercial significance. The research statement should include the following details:
Applicants are expected to write the research statement independently. MBZUAI faculty will NOT help write it for the purpose of the application. The MBZUAI Admission Committee will review the submitted document and use it as one of the measures to gauge and assess applicants’ skills.
Applicants will be required to nominate referees who can recommend their application. Ph.D. applicants should have a minimum of three (3) referees, of whom at least one was a previous course instructor or faculty/research advisor and the others were current or previous work supervisors.
To avoid issues and delays in the provision of the recommendation, applicants have to inform their referees of their nomination beforehand and provide the latter’s accurate information in the online application portal. Automated notifications will be sent out to the referees upon application submission.
All applicants with complete files, including the required number of recommendations, will be invited to participate in an online screening exam to assess their knowledge and skills. Completion of the exam is not mandatory but highly encouraged as it would provide additional information to the evaluation committee. Waiving the exam is only recommended for those students who can provide strong evidence of their research capability, subject matter expertise, and technical skills.
Exam Topics
Math: Calculus, probability theory, linear algebra, trigonometry and optimization
Machine learning: Machine learning algorithms and concepts such as linear regression, decision trees, loss functions, support vector machines, classification, regression, clustering, convolutional neural networks, dimensionality reduction, neural networks and unsupervised learning
Programming: Knowledge surrounding specific programming concepts and principles such as algorithms, data structures, logic, OOP, and recursion as well as language–specific knowledge of Python
Applicants are highly encouraged to complete the following online courses to further improve their qualifications:
The exam instructions are available here
A select number of applicants may be invited to an interview with faculty as part of the screening process. The time and instructions for the interview will be communicated to applicants in a timely manner.