MBZUAI’s Computational Biology department has hosted the first Data Carpentry workshop in the UAE – part of a global initiative to teach foundational computational and data science skills to researchers.
The Genomics Data Carpentry workshop took place at MBZUAI’s Masdar City campus last week, hosting 35 participants from academia and industry across two days as they learned fundamental data skills needed to conduct research, such as data tidiness, cloud computing, the command line, and data wrangling.
Data Carpentry is a subsection of The Carpentries, which also includes Software Carpentry and Library Carpentry. Together, they share a mission to ‘provide researchers high-quality, domain-specific training covering the full lifecycle of data-driven research’.
“We aimed to equip participants with essential foundational skills in genomics data science for capacity building in the country,” says Aziz Khan, Assistant Professor of Computational Biology at MBZUAI, certified Carpentries instructor, and organizer of the workshop.
“Genomics research has traditionally been divided into two domains: the wet lab, where experiments are conducted, and the computational side, where data is analyzed. Often, these two worlds don’t speak the same language — wet lab scientists may not fully grasp the data analysis, and computational scientists may not understand the experimental context.
“That’s changing now with the advent of more data-driven science, and biology is not just biology anymore — it’s computational biology. So, a lot of wet lab researchers are very eager to learn dry-lab skills, such as data analysis. This is one of the key reasons we organized this workshop to help bridge that gap.”
Participants included master’s students, Ph.D. students and postdocs from MBZUAI; students from other UAE universities, including NYU Abu Dhabi, Khalifa University, and United Arab Emirates University in Al Ain; and participants from industry. Some 70% of people taking part were female, and several were Emirati nationals.
The workshop was hosted with the support and help of instructors and helpers from MBZUAI and NYU Abu Dhabi. Khan was joined by Nizar Drou, senior research scientist at NYU Abu Dhabi – who shared instructor duties – as well as helpers from both MBZUAI (Luiz Maniero) and NYU Abu Dhabi (Muhammad Arshad, Nadine Hosny El Said, Jayaram Radhakrishnan, Nabil Rahiman, and Giuseppe-Antonio Saldi). Together they took participants through a series of sessions designed to refine their project organization and data analysis.
“We started with the project organization and metadata,” explains Khan. “If your metadata annotations are wrong, then downstream data analyses are going to be problematic, so we trained them on how to organize a project and better annotate data and metadata using the FAIR data principles: Findable, Accessible, Interoperable, Reusable.
“From there, we showed them some problematic metadata and asked them to identify the problem and then went on to show them what to do when you get data from the sequencing facility. This is where you need to process, and as the datasets are so large — gigabytes in size — you need to use a command line. So, we trained them how to manipulate and process files and data at scale using the Shell/Bash.
“From the command line we went on to learn about the cloud and how to process datasets in the cloud, and then we dived into the data itself – learning what the sequencing data looks like, what each line means in a sequencing read file, how to assess the quality of the data and perform downstream processing. And finally, we went on to bioinformatics analysis, from reference indexing and mapping to a reference genome to mutation identification and visualization.”
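For readers unfamiliar with the read files Khan describes, the short Python sketch below illustrates the idea. It is not taken from the workshop materials: it assumes the common four-line FASTQ layout (header, sequence, separator, quality string) and Phred+33 quality encoding, and the function names and the example read are hypothetical.

```python
def parse_fastq(lines):
    """Yield (read_id, sequence, quality_scores) for each four-line FASTQ record."""
    # Drop blank lines and trailing newlines before grouping into records.
    lines = [ln.rstrip("\n") for ln in lines if ln.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ input must contain four lines per read")
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        # Line 1 starts with '@' and names the read; line 3 starts with '+'.
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record starting at line {i + 1}")
        # Line 2 (bases) and line 4 (qualities) must be the same length.
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header}")
        # Assumption: Phred+33 encoding, so score = ASCII code minus 33.
        yield header[1:], seq, [ord(c) - 33 for c in qual]


def mean_quality(scores):
    """Average Phred score of one read; a crude per-read quality summary."""
    return sum(scores) / len(scores)


# Hypothetical single-read example: six high-quality bases and one poor one.
example = ["@read1\n", "GATTACA\n", "+\n", "IIIIII#\n"]
for read_id, seq, scores in parse_fastq(example):
    print(read_id, seq, scores, round(mean_quality(scores), 2))
```

In practice, dedicated quality-control tools report far richer statistics than a per-read mean; the sketch only shows what each of the four lines in a record carries.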
Feedback for the workshop was overwhelmingly positive, with a 93% ‘promoter score’ on the post-workshop survey – the score that shows how likely students are to recommend the course to others.
“This workshop was everything I had hoped for and more,” said Rahaf M. Ahmad, Ph.D. candidate at UAE University. “What truly stood out was the passion and clarity of the instructors – particularly their ability to break down complex genomic workflows into clear, hands-on steps. It wasn’t just a learning experience; it was a shift in perspective.”
“It was an incredibly informative workshop,” added Lina Utenova, research assistant at NYU Abu Dhabi’s Center for Genomics and Systems Biology. “Every step in primary metagenomics data analysis – from reads trimming to biologically meaningful data extraction – was shown and explained. I appreciated being able to ask questions any time from any of the many helpers and instructors present throughout the session.”
Ph.D. candidate at Khalifa University, Hamda Alshehhi, agreed that the instructors and helpers were a great help, “providing clear guidance and patient support throughout”.
“The workshop strengthened my skills in genomic data analysis and reproducible research,” she added. “This experience has boosted my confidence and deepened my appreciation for good research practices.”
MBZUAI Ph.D. candidate Roba Al Majzoub also said the workshop has given her the right tools to improve her research. “Seeing firsthand how each preprocessing step shapes data integrity was eye-opening,” she explained. “The early cleaning choices make or break downstream genomic analyses, so I now understand and am able to perform better data preprocessing.”
For Khan, one of the biggest successes of the workshop was building students’ awareness of their strengths and weaknesses, as highlighted by the pre- and post-workshop surveys.
“It was very interesting because we asked participants pre-workshop about their confidence levels on various aspects, such as programming, data handling, reproducibility and other things,” he says. “We then asked their confidence levels post-workshop. For most parameters such as programming confidence and handling raw data, they reported a significant increase (70-90%) in confidence pre- to post-workshop. But in reproducibility they reported a drop in confidence.
“This showed awareness on their part and shows what they need to think about moving forward, which is very important in making positive changes.”
With such glowing feedback, Khan says he is planning to run more workshops in the future.
“It’s likely that we will do more such workshops next year and beyond, and perhaps we will do more domain-specific, advanced hands-on sessions,” he says. “We may bring in more AI with language models and show how they can help researchers advance their programming skills.
“Ultimately, we want to help participants build good habits from day one,” says Khan. “If you start with a messy project structure, that disorganization tends to carry through the entire project. By the time you realize things aren’t working, it’s often too late to fix them properly. But if you’re organized from the beginning, even when something doesn’t go as planned, you’ll be in a strong position to troubleshoot and recover.
“Reproducibility and good organization of data, code, and projects shouldn’t wait until publication — they should be part of the research process from the very start.”