The energy demands of artificial intelligence are rarely out of the spotlight, but discussion and debate on the matter have grown more intense of late, with COP29 in Azerbaijan and ADIPEC in Abu Dhabi both taking place in recent weeks.
For all the good work AI does, its enormous energy requirements continue to call its long-term viability into question, with calls for energy-efficient solutions and alternatives growing in volume.
To put the problem into perspective, AI, data centers and cryptocurrency consumed nearly 2% of the world’s electricity in 2022, and this could double by 2026. Training an LLM such as GPT-3 is estimated to use as much energy as around 130 US homes consume in a year – and training GPT-4 took around 50 times more electricity than that. What’s more, AI uses around 33 times more energy to complete a task than task-specific software would, due to the significant computing power it requires.
The consequences for the environment can be severe due to the significant carbon emissions involved. In 2019, research found that training a single common large AI model can produce as much CO2 as five cars emit over their entire lifetimes. And the problem has only worsened since then. Microsoft recently announced that its CO2 emissions have risen nearly 30% since 2020 due to data center expansion, and Google said that its greenhouse gas emissions in 2023 were almost 50% higher than in 2019, largely due to the energy demand tied to data centers. It’s little surprise that both companies have turned to nuclear energy in recent weeks in search of lower-carbon ways to power AI.
The need for greater energy efficiency in AI is clear, and many institutions, including MBZUAI, are taking steps to address it. This is a key focus for the University’s recently launched computer science department, led by Xiaosong Ma, acting department chair and professor.
“Every day [AI] is burning a substantial and growing fraction of the total data center energy or total compute cycles in the world,” she says. “I did some energy-related research 13 years ago, and we were working on eight-core CPUs. The power for the whole thing was about 110-120 watts. Nowadays, a top-of-the-line AMD server with 96 cores is about 400 watts. And if you add DRAM, perhaps 450. A single NVIDIA H100 card can be 700 watts. And we typically have eight of them within a node.
“You can imagine how much energy consumption is added by these new AI applications and model training.”
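To put those figures together: the eight H100 cards in a single node can draw around 5,600 watts on their own, more than ten times the roughly 450 watts she cites for a fully loaded 96-core CPU server with its DRAM.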
Xiaosong’s research is concerned with finding energy efficiency in these ‘cards’ — hardware that houses the GPU chip and other supporting components.
“We look at the resource utilization under a microscope,” she says. “You are doing a lot of things, especially when you are doing distributed training or inference, so the work involves complex workflows. There’s computation doing number crunching, there’s memory access, there’s communication between the GPU cards within a single node and across the nodes, and so on.
“When you are doing one thing, the other resources are sitting there, not being used well — especially these expensive and power-hungry GPU cards. So my recent research is looking at all these streams of processing across different micro-batches in training.”
By maximizing the overlap between these operations, Xiaosong believes savings of up to 39% can be made in both cost and carbon footprint.
“That’s how we attack,” she says. “We make the same thing run faster by better overlapping the fine-grained tasks, and that hopefully will reduce the total time people need to run these huge machines.”
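To make the overlap principle concrete, here is a minimal sketch, assuming PyTorch and a CUDA GPU; the function and its structure are illustrative, not the department’s actual code. It overlaps only data transfer with computation, copying the next micro-batch to the GPU on a side stream while the current one is being processed; the research Xiaosong describes extends the same idea to memory access and inter-GPU communication.

```python
import torch

def run_micro_batches(model, micro_batches):
    """Overlap each micro-batch's host-to-device copy with the previous
    micro-batch's computation. For the copies to be truly asynchronous,
    the input tensors should live in pinned host memory."""
    device = next(model.parameters()).device
    copy_stream = torch.cuda.Stream()  # side stream used only for transfers
    with torch.cuda.stream(copy_stream):
        current = micro_batches[0].to(device, non_blocking=True)
    outputs = []
    for i in range(len(micro_batches)):
        # Wait until the current micro-batch has finished copying.
        torch.cuda.current_stream().wait_stream(copy_stream)
        # Kick off the next copy on the side stream ...
        if i + 1 < len(micro_batches):
            with torch.cuda.stream(copy_stream):
                nxt = micro_batches[i + 1].to(device, non_blocking=True)
        # ... while the default stream computes on the current batch.
        outputs.append(model(current))
        if i + 1 < len(micro_batches):
            current = nxt
    return outputs
```

Because the transfer hides behind the computation, the GPU spends less time idle, which is exactly the kind of waste her team puts under the microscope.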
Working alongside Professor Xiaosong in the computer science department is assistant professor Abdulrahman Mahmoud.
“My work is in the hardware architecture arena,” he says. “So, how do you manipulate bits [units of data that are the building blocks for larger data units]? How do you move data around? What can you do to reduce energy at the hardware level without suffering at the model level in terms of computation accuracy?
“I look at how much information you can bring down to just a few bits. When you have fewer bits you can shuffle them around and use them in a more energy efficient way. So instead of having to move 32 bits around, you can just move 4 bits, which is an 8x improvement.
“So it’s about finding techniques that can bring down overall energy usage, which would also help with carbon efficiency.”
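As a rough illustration of the bit-level idea, the sketch below packs 32-bit floats into 4-bit integers and back using generic uniform quantization; the scheme and numbers are assumptions for illustration, not Mahmoud’s specific technique.

```python
import numpy as np

def quantize_4bit(x):
    """Map float32 values onto 16 signed levels (int4) and pack two
    values per byte: 8x less data to move than float32."""
    scale = np.abs(x).max() / 7.0          # int4 covers -8..7
    q = np.clip(np.round(x / scale), -8, 7).astype(np.int8)
    nibbles = (q & 0x0F).astype(np.uint8)  # two's-complement nibbles
    packed = nibbles[0::2] | (nibbles[1::2] << 4)  # assumes even length
    return packed, scale

def dequantize_4bit(packed, scale):
    """Unpack the nibbles, sign-extend them, and rescale to floats."""
    lo = (packed & 0x0F).astype(np.int8)
    hi = (packed >> 4).astype(np.int8)
    lo[lo > 7] -= 16                       # restore the sign bit
    hi[hi > 7] -= 16
    q = np.empty(lo.size + hi.size, dtype=np.int8)
    q[0::2], q[1::2] = lo, hi
    return q.astype(np.float32) * scale

weights = np.random.randn(1024).astype(np.float32)  # 4,096 bytes
packed, scale = quantize_4bit(weights)              # 512 bytes
recovered = dequantize_4bit(packed, scale)          # lossy approximation
```

The price is precision: the recovered values are only approximations, which is why the research challenge is keeping model accuracy intact as the bit widths shrink.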
Mahmoud also sees the manufacturing process as an opportunity to improve carbon efficiency.
“Manufacturing all this hardware carries a large carbon footprint,” he adds. “If you’re building these chips, for example, you’re putting a lot of capex into the manufacturing process. So if we can make them last longer, stay performant and stay relevant for a longer period of time, then not only does the upfront cost amortize over time and save you money, but the carbon footprint comes down too, because less manufacturing energy is spent and less CO2 is generated.”
Another angle the department is exploring is the viability of ‘tiny’ models. The idea is that as models get bigger and we gain a better understanding of how they work, we can find ways to ‘prune’ them into smaller ones.
“You have ‘mixture of experts’ (MoE) models, where your question can be routed to a sequence of much smaller, but more specialized models,” explains Xiaosong. “For example, if you ask a question about football, it’s not supposed to have anything to do with someone asking about classical music or medical publications. So all of these can be split, and hopefully the overall efficiency will be better.”
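A minimal sketch of that routing idea, assuming PyTorch; the sizes, top-1 routing and expert design are illustrative assumptions, not a specific MBZUAI model.

```python
import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    """A toy mixture-of-experts layer: a small gating network sends each
    input to one specialized expert, so only a fraction of the total
    parameters do work per query."""
    def __init__(self, dim=64, num_experts=4):
        super().__init__()
        self.gate = nn.Linear(dim, num_experts)  # the router
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
            for _ in range(num_experts)
        )

    def forward(self, x):  # x: (batch, dim)
        scores = self.gate(x).softmax(dim=-1)
        top1 = scores.argmax(dim=-1)   # pick one expert per input
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = top1 == i
            if mask.any():             # run only the chosen expert
                out[mask] = expert(x[mask])
        return out

moe = TinyMoE()
y = moe(torch.randn(8, 64))  # each row is handled by a single expert
```

Because only the selected expert runs for each input, the compute per query scales with the size of one expert rather than with the full model.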
A related project concerns better caching of inference results. “Many questions we ask will be quite similar,” Xiaosong adds. “So if you could find the similarity very early in the workflow, you could hopefully get existing data results from somewhere, which could also help a lot.
“If a huge model is split into these tiny independent units, and people’s questions are understood better and more accurately routed, we could get a very good experience using these chatbots without actually going through all the layers and parameters.”
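A toy sketch of such a cache follows, assuming an `embed` function that maps text to unit-length vectors (for instance, from any sentence-embedding model) and a similarity threshold; both are illustrative assumptions, not the project’s actual design.

```python
import numpy as np

class SemanticCache:
    """Reuse a stored answer when a new question closely matches an
    earlier one, instead of running the full model again."""
    def __init__(self, embed, threshold=0.95):
        self.embed = embed            # maps text -> unit vector
        self.threshold = threshold
        self.keys, self.values = [], []

    def lookup(self, question):
        if not self.keys:
            return None
        q = self.embed(question)
        sims = np.array(self.keys) @ q   # cosine similarity (unit vectors)
        best = sims.argmax()
        return self.values[best] if sims[best] >= self.threshold else None

    def store(self, question, answer):
        self.keys.append(self.embed(question))
        self.values.append(answer)

def answer(question, cache, run_model):
    cached = cache.lookup(question)
    if cached is not None:
        return cached                 # skip the expensive forward pass
    result = run_model(question)
    cache.store(question, result)
    return result
```

A query that clears the threshold returns the stored answer immediately, skipping the trip through all the model’s layers and parameters.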
This energy efficiency drive does not belong to the computer science department alone, with MBZUAI committed to raising awareness about energy-efficient AI. At COP28, for instance, the University presented insights on how AI can transform climate action by developing AI that is less energy dependent. Students have also presented energy-efficient solutions at hackathons, while faculty members have led several research projects to improve efficiency and sustainability.
One example is the development of the Artificial Intelligence Operating System (AIOS) for decarbonization, led by assistant professor of machine learning Qirong Ho. AIOS is designed to reduce the high cost of AI programs by decreasing the amount of energy expended in their development. It helped to create Vicuna, an alternative to chatbots like ChatGPT that was developed at a fraction of their monetary and carbon costs while retaining 90% of their subjective quality.
Meanwhile, Martin Takáč, deputy chair of the machine learning department and associate professor, is striving to help the UAE achieve its 2050 Net Zero goals by optimizing smart grids through a technique he and his team developed that combines reinforcement learning and federated learning. They are continuing to explore energy-efficient machine learning applications across the energy value chain.
Together, these projects and many more are contributing to solving one of the most pressing challenges facing AI today, reaffirming MBZUAI’s intention to develop AI solutions that benefit both people and the planet.