Transformers of the handwritten word

Monday, December 25, 2023

Handwriting is an ancient technology. Perhaps the oldest evidence of writing comes from artifacts found in what is present-day Iraq that feature characters of the Sumerian language. These pieces are thought to have been penned more than 5,000 years ago. Millennia later, even with access to gadgets like keyboards and speech-to-text software, many people still make language legible the old-fashioned way: by writing it down by hand.

A team of researchers at MBZUAI is combining ancient and contemporary technologies in an artificial intelligence program that can learn a person's handwriting style and generate text scrawled in what looks like their hand.

The inventors were recently granted a patent by the United States Patent and Trademark Office for the tool, which could help people who have injuries that prevent them from taking up a pen. It could also be used to efficiently generate a large amount of data to improve machine learning models’ ability to process handwritten script.

Can it be done?

Like many scientific endeavors, the project started with curiosity, said Hisham Cholakkal, assistant professor of computer vision at MBZUAI and one of the inventors of the technology: “We wanted to know if you gave a model a few samples of someone’s handwriting if the model could learn about the style of that person and then write anything in the handwriting style of that person.”

Cholakkal and colleagues shared their initial research findings in 2021 at the International Conference on Computer Vision (ICCV).

The team also included Assistant Professor of Computer Vision Rao Muhammad Anwer, Associate Professor of Computer Vision Salman Khan, Deputy Department Chair of Computer Vision and Professor of Computer Vision Fahad Shahbaz Khan, and Ankan Kumar Bhunia.

In that presentation, the researchers noted that previous approaches to mimicking a person’s handwriting style had been developed using a machine learning technique called a generative adversarial network, or GAN.

Handwriting generated by GANs captures the overall, general style of a writer, such as the slant with which a person composes letters or the width of the strokes that make up letters. But GANs struggle to recreate how people form individual characters and the lines, known as ligatures, that tie characters together.

Instead of GANs, the researchers used vision transformers, which are a type of neural network designed for computer vision tasks. Their study was the first use of vision transformers to mimic handwriting.

The vision transformer-based approach differs from the earlier GAN-based methods in that transformers can process what are known as long-range dependencies: meaningful relationships between parts of an image that are physically distant from each other.

“To mimic someone’s handwriting style, we want to look at the whole text, and only then will we start to understand how the writer ligated characters, how the writer connected letters, or spaced words,” Fahad Khan said. “All these tasks require a kind of global receptive field, which is not easy using convolutional neural networks. We identified this gap in existing methods and adopted this transformer-based method.”
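The difference between a local and a global receptive field can be illustrated with a toy example. The sketch below is not the researchers' HWT model. It assumes a line of handwriting has been cut into eight patches, each described by a made-up two-number feature vector, and it uses a simplified single-head self-attention with no learned weights. The function names (conv1d_influence, self_attention) and the patch values are hypothetical and chosen only for illustration.

```python
import math


def conv1d_influence(length, kernel_size, layers):
    """For each output position, list the input positions that can affect it
    after stacking `layers` convolutions with an odd `kernel_size`."""
    radius = (kernel_size // 2) * layers
    return [
        set(range(max(0, i - radius), min(length, i + radius + 1)))
        for i in range(length)
    ]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Simplified single-head self-attention.

    Assumption: queries, keys, and values are the tokens themselves
    (identity projections); a real transformer learns these projections.
    Returns the mixed outputs and the attention weights per position.
    """
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    outputs, weights = [], []
    for query in tokens:
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in tokens]
        w = softmax(scores)
        weights.append(w)
        outputs.append(
            [sum(w[j] * tokens[j][d] for j in range(len(tokens))) for d in range(dim)]
        )
    return outputs, weights


# Hypothetical features for eight patches along one line of handwriting.
patches = [
    [0.9, 0.1], [0.2, 0.4], [0.1, 0.3], [0.5, 0.5],
    [0.3, 0.2], [0.4, 0.1], [0.2, 0.6], [0.8, 0.2],
]

# One 3-wide convolution: the first patch only "sees" itself and its neighbor.
local_view = conv1d_influence(len(patches), kernel_size=3, layers=1)[0]

# One attention layer: the first patch assigns a weight to every patch,
# including the last one at the far end of the line.
_, attention = self_attention(patches)
global_view = attention[0]

print("conv sees positions:", sorted(local_view))
print("attention weights:", [round(w, 3) for w in global_view])
```

In this sketch, a single convolution connects the first patch only to its immediate neighbor, and reaching the far end of the line would take many stacked layers. A single attention layer gives every patch a nonzero weight on every other patch, which is the kind of whole-text view the quote describes.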

While the initial study focused on generating handwriting in English, the researchers are also interested in applying their technology to other languages, like Arabic, which is challenging to analyze because of the way Arabic letters are connected in handwritten script.

Even better than the real thing?

In the study, the scientists compared their handwritten text image generation approach, which they shorten to HWT, to two other handwriting generation technologies. They showed text generated by the three models to 100 people and asked which one they preferred. The participants in the study preferred HWT to the other text generators 81% of the time.

“We also showed the handwriting mimicking that was generated to humans to compare it to the benchmark, and to our surprise the result of the generated handwriting was quite good. They could not distinguish the mimicked handwriting from the actual handwriting, and it was satisfying to see that kind of validation of the performance,” Salman Khan said.

The researchers’ model doesn’t require much data to be trained. A few paragraphs of original handwriting were all it needed.

But there is also always a risk to innovation. “We are very cautious about it because it could be misused,” Anwer said. “Handwriting represents a person’s identity, so we are thinking carefully about this before deploying it.”

And while there are risks, new findings can also raise awareness of potential threats. “It’s important to be aware that it’s possible to use AI to generate handwriting that matches the style of an individual,” Cholakkal said.

 
